Method | In-sample ϵATE | Out-of-sample ϵATE |
---|---|---|
BNN | 0.37 ± 0.03 | 0.42 ± 0.03 |
BLR | 0.72 ± 0.04 | 0.93 ± 0.05 |
TARNet | 0.26 ± 0.01 | 0.28 ± 0.01 |
CFR MMD | 0.30 ± 0.01 | 0.31 ± 0.01 |
CFR WASS | 0.25 ± 0.01 | 0.27 ± 0.01 |
GANITE | 0.43 ± 0.05 | 0.49 ± 0.05 |
Dragonnet | 0.14 ± 0.01 | 0.21 ± 0.01 |
CEVAE | 0.34 ± 0.01 | 0.46 ± 0.02 |
BART | 0.47 ± 0.02 | 0.66 ± 0.03 |
BCAUS IPTW | 0.30 ± 0.01 | 0.60 ± 0.02 |
BCAUS DR | 0.13 ± 0.00 | 0.29 ± 0.01 |
- We include BART for comparison even though it is not based on a neural network. ϵATE (lower is better) is the mean absolute error between the estimated ATE and the ground-truth ATE. Abbreviations: BNN, Balancing Neural Network [21]; BLR, Balancing Linear Regression [21]; TARNet, Treatment-Agnostic Representation Network [22]; CFR, Counterfactual Regression [22]; GANITE, Generative Adversarial Nets for Inference of Individualized Treatment Effects [24]; Dragonnet [19]; CEVAE, Causal Effect Variational Autoencoder [23]; BART, Bayesian Additive Regression Trees [16]. The in-sample value is computed on 672 examples (training + cross-validation), and the out-of-sample value is computed on the 75 examples in the hold-out set. The reported uncertainty is the standard error across 1000 realizations. The performance of BCAUS is comparable to that of the other models.
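
As an illustration of how the ϵATE entries and their uncertainties could be computed, the following is a minimal sketch, not the authors' code. It assumes that each realization yields one estimated ATE and one ground-truth ATE, that ϵATE is the mean of the per-realization absolute errors, and that the uncertainty is the sample standard deviation of those errors divided by the square root of the number of realizations. The function name `epsilon_ate` and the toy values are hypothetical.

```python
import numpy as np


def epsilon_ate(ate_est, ate_true):
    """Return (mean absolute ATE error, standard error) across realizations.

    Assumption: one estimated and one ground-truth ATE per realization.
    """
    est = np.asarray(ate_est, dtype=float)
    true = np.asarray(ate_true, dtype=float)
    if est.shape != true.shape:
        raise ValueError("ate_est and ate_true must have the same shape")

    # Absolute error of the ATE estimate in each realization.
    abs_err = np.abs(est - true)

    # Mean absolute error over realizations.
    mean_err = abs_err.mean()

    # Standard error of that mean (sample standard deviation, ddof=1).
    std_err = abs_err.std(ddof=1) / np.sqrt(abs_err.size)
    return mean_err, std_err


# Toy example with four hypothetical realizations (not data from the table).
est = [4.1, 3.8, 4.3, 4.0]
true = [4.0, 4.0, 4.0, 4.0]
mean_err, std_err = epsilon_ate(est, true)
print(f"eps_ATE = {mean_err:.2f} ± {std_err:.2f}")
```

Under these assumptions, the same function would be called once on the in-sample estimates and once on the hold-out estimates to produce the two columns.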