📚 https://arxiv.org/abs/2006.11239
🏆 Published in NeurIPS 2020

✅ Day 1 – Abstract & Introduction

📌 Background & Motivation

Deep generative models (GANs, VAEs, autoregressive, flow-based) achieved strong results but had critical weaknesses:
- VAE: blurry samples from variational approximations
- GAN: unstable training, mode collapse
- Flow-based: heavy inductive biases, complex designs
These issues motivated a new direction for stable, high-quality generation.

📌 Core Idea

Reframe generation as a denoising process.
Forward process: add Gaussian noise step by step until data becomes pure noise.
Reverse process: learn to remove noise progressively, reconstructing data from random noise.
Gaussian formulation enables simple neural network training.

📌 Main Contributions

Produces high-quality synthesis, rivaling or beating GANs.
Stable training without adversarial tricks.
Simple MSE objective → predict noise directly.
Shows a theoretical link between diffusion models, denoising score matching, and Langevin dynamics.

📌 Early Results

CIFAR-10: IS = 9.46, FID = 3.17 (state-of-the-art at the time).
LSUN 256×256: rivaled ProgressiveGAN in sample quality.

📌 Key Takeaways (Day 1)

Diffusion models redefine generation as noise removal.
Avoid major drawbacks of GANs/VAEs with stable training.
Achieve SOTA results with a simple and interpretable framework.

🧠 Final Thoughts (Day 1)

Day 1 shows how diffusion models emerged as a clean, stable alternative to adversarial or variational approaches.
The elegance of turning generation into progressive denoising laid the foundation for their rapid adoption in vision tasks.