📚 https://arxiv.org/abs/2011.04216
🧑💻 GitHub: microsoft/dowhy
Today I tested the full DoWhy pipeline on a synthetic dataset.
The goal was to compare a naive correlation-based estimate against a confounder-adjusted causal estimate,
and then assess the robustness of that estimate using placebo and subset refutation methods.
🔧 Seed fixed at 42 for reproducibility
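For context, the setup looked roughly like this. It is a minimal sketch, not my exact script: beta=10, four common causes (W0–W3), and 10,000 samples are assumptions chosen to match the numbers reported below.

```python
import numpy as np
import dowhy.datasets
from dowhy import CausalModel

np.random.seed(42)  # fixed seed for reproducibility

# Synthetic data with a known linear treatment effect.
# beta=10, four common causes (W0-W3), and 10,000 rows are assumptions.
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=4,
    num_samples=10_000,
    treatment_is_binary=True,
)
df = data["df"]

# Causal model: treatment, outcome, and a graph encoding the confounders
model = CausalModel(
    data=df,
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)
```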
| Method | Estimated Effect | Notes |
|---|---|---|
| Naive Linear Regression | 15.48 | Overestimates due to unadjusted confounding |
| DoWhy (Backdoor + Linear Regression) | 9.999 | Effect after adjusting for confounders (W0–W3) |
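The two estimates in the table were produced along these lines, continuing from the sketch above (the column names v0 and y are linear_dataset defaults and an assumption here; the naive model simply ignores W0–W3):

```python
from sklearn.linear_model import LinearRegression

# Naive estimate: regress the outcome on the treatment alone, ignoring W0-W3.
# "v0" and "y" are linear_dataset's default column names (assumption).
X = df[["v0"]].astype(float)
naive = LinearRegression().fit(X, df["y"])
print("Naive estimate:", naive.coef_[0])

# DoWhy estimate: identify a backdoor estimand, then adjust for W0-W3
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
)
print("DoWhy estimate:", estimate.value)
```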
🧪 Refutation 1: Placebo Treatment Refuter
Estimated effect: 9.999
New effect (placebo): 0.0038
p-value: 0.86
→ When the real treatment is replaced with a random (placebo) variable, the estimated effect drops to essentially zero.
This indicates the original causal effect was not due to chance, a strong positive sign.
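These numbers come from DoWhy's refute_estimate with the placebo treatment refuter, continuing from the objects above (placebo_type="permute" is the usual choice and an assumption here):

```python
# Placebo refuter: replace the real treatment with a permuted one and re-estimate.
# If the pipeline is sound, the new effect should be close to zero.
placebo = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute",
)
print(placebo)  # prints the original effect, the new (placebo) effect, and a p-value
```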
🧪 Refutation 2: Data Subset Refuter
Estimated effect: 9.999
New effect (subset): 9.999
p-value: 0.92
→ Even when the estimation is repeated on random subsets of the data, the effect remains essentially unchanged.
This suggests the estimate is stable and generalizes across different samples.
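The subset check uses the data subset refuter in the same way (subset_fraction=0.8 is an assumed value, not necessarily the one from my run):

```python
# Subset refuter: re-estimate the effect on random subsets of the data.
# A stable estimate should barely move across subsets.
subset = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="data_subset_refuter",
    subset_fraction=0.8,
)
print(subset)  # prints the original effect, the subset effect, and a p-value
```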
📎 Image: bar chart comparing the naive regression estimate (15.48) with the DoWhy backdoor-adjusted estimate (9.999).
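The chart itself is plain matplotlib over the two numbers from the table (a sketch; the dashed line marks the assumed true effect of 10):

```python
import matplotlib.pyplot as plt

methods = ["Naive OLS", "DoWhy (backdoor)"]
effects = [15.48, 9.999]

plt.bar(methods, effects, color=["salmon", "seagreen"])
plt.axhline(10, linestyle="--", color="gray", label="assumed true effect (beta=10)")
plt.ylabel("Estimated treatment effect")
plt.title("Naive vs. confounder-adjusted estimate")
plt.legend()
plt.savefig("effect_comparison.png", dpi=150)
```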
This experiment showed how naive analysis can overestimate effects,
and how DoWhy enables more disciplined and robust causal inference.
The refutation results further support that the estimated effect is not due to chance (placebo test) and is stable across subsamples (subset test).
✅ DoWhy enforces transparency and repeatability — a core need in real-world causal analysis.