📚 https://arxiv.org/abs/2011.04216
🧑‍💻 GitHub: microsoft/dowhy

✅ Day 3 – Causal Estimation in Action (w/ Refutation Tests)

Today I tested the full DoWhy pipeline on a synthetic dataset.
The goal was to compare a naive correlation-based estimate with a confounder-adjusted causal estimate,
and then assess the robustness of that estimate using placebo and subset refutation methods.


📌 Experimental Setup

🔧 Seed fixed at 42 for reproducibility
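
To make the setup concrete, here is a minimal sketch of how such an experiment is typically wired up with DoWhy's built-in synthetic data generator. The generator parameters (`beta=10`, four common causes, sample size) are my assumptions, chosen to be consistent with the numbers reported below:

```python
import numpy as np
import dowhy.datasets
from dowhy import CausalModel

np.random.seed(42)  # seed fixed at 42 for reproducibility, as in the post

# Assumed parameters: a true effect (beta) of 10 and four common
# causes W0..W3, consistent with the adjusted estimate of ~10 below
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=4,
    num_samples=10_000,
    treatment_is_binary=True,
)
df = data["df"]

# Build the causal model from the generated graph
model = CausalModel(
    data=df,
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)
```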


📊 Step-by-Step Results

| Method | Estimated Effect | Notes |
|---|---|---|
| Naive Linear Regression | 15.48 | Overestimates due to unadjusted confounding |
| DoWhy (Backdoor + Linear Regression) | 9.999 | Effect after adjusting for confounders (W0–W3) |
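
A sketch of how the two estimates can be produced, continuing from the setup above. The column names `y` and `v0` follow `linear_dataset` defaults and are an assumption here:

```python
import statsmodels.formula.api as smf

# Naive estimate: regress outcome on treatment alone, ignoring W0..W3
# (cast the binary treatment to int so the coefficient is named "v0")
naive = smf.ols("y ~ v0", data=df.assign(v0=df["v0"].astype(int))).fit()
print("Naive effect:", naive.params["v0"])  # overestimated (~15.5)

# DoWhy: identify the estimand from the graph, then adjust for the
# backdoor set via linear regression
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
)
print("Adjusted effect:", estimate.value)  # ~10
```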

🔍 Refutation Test Results

✅ Refute: Placebo Treatment

Estimated effect: 9.999
New effect (placebo): 0.0038
p-value: 0.86

→ When the treatment is replaced with a random placebo variable, the estimated effect collapses to nearly zero.
This suggests the original estimate is not an artifact of the estimation procedure, a strong positive sign.
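
This check corresponds to DoWhy's `placebo_treatment_refuter`; a minimal sketch, where `placebo_type="permute"` is my assumption about the configuration:

```python
# Replace the treatment with a permuted (random) placebo and re-run
# the estimation; a trustworthy estimate should collapse toward zero
placebo_refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute",
)
print(placebo_refutation)  # reports the new effect and a p-value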


✅ Refute: Subset of Data

Estimated effect: 9.999
New effect (subset): 9.999
p-value: 0.92

→ Even on a random subset of the data, the effect remains nearly identical.
This suggests the estimate is stable and not driven by a handful of observations.
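
This check corresponds to DoWhy's `data_subset_refuter`; a sketch, with `subset_fraction=0.8` assumed (it is the library default):

```python
# Re-estimate the effect on random subsets of the rows; a stable
# estimate should barely move across subsamples
subset_refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="data_subset_refuter",
    subset_fraction=0.8,
)
print(subset_refutation)
```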


📈 Visual Summary

Bar chart comparing the naive estimate, the DoWhy backdoor-adjusted estimate, and the placebo effect.

📎 Image:
Effect Estimates Barplot
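
A minimal matplotlib sketch for reproducing the chart, using the values from the table above; the dashed line marks the assumed true effect of 10 from the synthetic setup:

```python
import matplotlib.pyplot as plt

# Headline numbers from the results table
labels = ["Naive OLS", "DoWhy backdoor", "Placebo"]
effects = [15.48, 9.999, 0.0038]

fig, ax = plt.subplots()
ax.bar(labels, effects)
ax.axhline(10, linestyle="--", label="Assumed true effect (10)")
ax.set_ylabel("Estimated effect")
ax.set_title("Effect Estimates")
ax.legend()
plt.show()
```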


🧠 Final Thoughts

This experiment showed how naive analysis can overestimate effects,
and how DoWhy enables more disciplined and robust causal inference.

The refutation results further support that the estimated effect is robust: it vanishes under a placebo treatment and remains stable on random subsets of the data.

✅ DoWhy enforces transparency and repeatability — a core need in real-world causal analysis.