📚 https://arxiv.org/abs/2011.04216
🧑‍💻 GitHub: microsoft/dowhy

✅ Day 3 – Causal Estimation in Action (w/ Refutation Tests)

Today I tested the full DoWhy pipeline on a synthetic dataset.
The goal was to compare a naive correlation-based estimate with a confounder-adjusted causal estimate,
and then assess the robustness of that estimate using placebo and subset refutation methods.


📌 Experimental Setup

🔧 Seed fixed at 42 for reproducibility
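
To make the setup concrete, here is a minimal sketch of how such an experiment is typically wired up with DoWhy's built-in synthetic data generator. The generator parameters (`beta=10`, four common causes, sample size) are my assumptions, chosen to be consistent with the numbers reported below:

```python
import numpy as np
import dowhy.datasets
from dowhy import CausalModel

np.random.seed(42)  # seed fixed at 42 for reproducibility, as in the post

# Assumed parameters: a true effect (beta) of 10 and four common
# causes W0..W3, consistent with the adjusted estimate of ~10 below
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=4,
    num_samples=10_000,
    treatment_is_binary=True,
)
df = data["df"]

# Build the causal model from the generated graph
model = CausalModel(
    data=df,
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)
```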


📊 Step-by-Step Results

| Method | Estimated Effect | Notes |
|---|---|---|
| Naive Linear Regression | 15.48 | Overestimates due to unadjusted confounding |
| DoWhy (Backdoor + Linear Regression) | 9.999 | Effect after adjusting for confounders (W0–W3) |
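
A sketch of how the two estimates can be produced, continuing from the setup above. The column names `y` and `v0` follow `linear_dataset` defaults and are an assumption here:

```python
import statsmodels.formula.api as smf

# Naive estimate: regress outcome on treatment alone, ignoring W0..W3
# (cast the binary treatment to int so the coefficient is named "v0")
naive = smf.ols("y ~ v0", data=df.assign(v0=df["v0"].astype(int))).fit()
print("Naive effect:", naive.params["v0"])  # overestimated (~15.5)

# DoWhy: identify the estimand from the graph, then adjust for the
# backdoor set via linear regression
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
)
print("Adjusted effect:", estimate.value)  # ~10
```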

🔍 Refutation Test Results

✅ Refute: Placebo Treatment

Estimated effect: 9.999
New effect (placebo): 0.0038
p-value: 0.86

→ When the treatment is replaced with a random placebo variable, the estimated effect collapses to nearly zero.
This suggests the original estimate is not an artifact of the estimation procedure, a strong positive sign.
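
This check corresponds to DoWhy's `placebo_treatment_refuter`; a minimal sketch, where `placebo_type="permute"` is my assumption about the configuration:

```python
# Replace the treatment with a permuted (random) placebo and re-run
# the estimation; a trustworthy estimate should collapse toward zero
placebo_refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute",
)
print(placebo_refutation)  # reports the new effect and a p-value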


✅ Refute: Subset of Data

Estimated effect: 9.999
New effect (subset): 9.999
p-value: 0.92

→ Even on a random subset of the data, the effect remains nearly identical.
This suggests the estimate is stable and not driven by a handful of observations.
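
This check corresponds to DoWhy's `data_subset_refuter`; a sketch, with `subset_fraction=0.8` assumed (it is the library default):

```python
# Re-estimate the effect on random subsets of the rows; a stable
# estimate should barely move across subsamples
subset_refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="data_subset_refuter",
    subset_fraction=0.8,
)
print(subset_refutation)
```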


📈 Visual Summary

Bar chart comparing the naive estimate, the DoWhy backdoor-adjusted estimate, and the placebo effect.

📎 Image:
Effect Estimates Barplot
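
A minimal matplotlib sketch for reproducing the chart, using the values from the table above; the dashed line marks the assumed true effect of 10 from the synthetic setup:

```python
import matplotlib.pyplot as plt

# Headline numbers from the results table
labels = ["Naive OLS", "DoWhy backdoor", "Placebo"]
effects = [15.48, 9.999, 0.0038]

fig, ax = plt.subplots()
ax.bar(labels, effects)
ax.axhline(10, linestyle="--", label="Assumed true effect (10)")
ax.set_ylabel("Estimated effect")
ax.set_title("Effect Estimates")
ax.legend()
plt.show()
```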


🧠 Final Thoughts

This experiment showed how naive analysis can overestimate effects,
and how DoWhy enables more disciplined and robust causal inference.

The refutation results further support that the estimated effect is robust: it vanishes under a placebo treatment and remains stable on random subsets of the data.

✅ DoWhy enforces transparency and repeatability — a core need in real-world causal analysis.