📌 Paper Info
- Title: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Authors: Mingxing Tan, Quoc V. Le
- Link: [arXiv:1905.11946](https://arxiv.org/abs/1905.11946)
- Published: ICML 2019 (Google AI)
🧠 Day 3 – Deep Dive into EfficientNet-B0 Architecture & Scaling Coefficients
Today’s study focused on understanding how the EfficientNet-B0 baseline is constructed via NAS, and how the compound scaling coefficients \( \alpha, \beta, \gamma \) are derived and applied in practice.
⚙️ EfficientNet-B0: NAS-Based Baseline
EfficientNet-B0 is designed with Neural Architecture Search (NAS), using the same search space as MnasNet but optimizing FLOPs (target: 400M) rather than hardware latency.
The architecture balances accuracy and efficiency using a multi-objective function:
\[\text{Objective} = \text{ACC}(M) \cdot \left( \frac{\text{FLOPs}(M)}{T} \right)^w\]
- \( T = 400M \): FLOPs target
- \( w = -0.07 \): trade-off factor between accuracy and cost
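To make the trade-off concrete, here is a tiny numeric sketch of the objective (the accuracy and FLOPs values are made up for illustration):

```python
# Hypothetical numbers showing how the reward trades accuracy against FLOPs.
T, w = 400e6, -0.07

def objective(acc, flops):
    return acc * (flops / T) ** w

print(objective(0.760, 400e6))  # on-target model: penalty factor is 1.0, reward = 0.760
print(objective(0.765, 800e6))  # 2x the FLOPs budget: 0.765 * 2**-0.07 ≈ 0.729
```

With \( w = -0.07 \), doubling FLOPs only shaves about 5% off the reward, so the search mildly penalizes cost rather than hard-capping it.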
🔧 Core Components:
- MBConv blocks (Mobile Inverted Bottlenecks), sketched in code after this list
- Squeeze-and-Excitation (SE) modules for channel-wise attention
- Expansion ratio:
  - MBConv1 (expansion = 1) in the first MBConv stage
  - MBConv6 (expansion = 6) in all subsequent stages
- Skip connections only when stride = 1 and input/output shapes match
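Putting these pieces together, here is a minimal PyTorch-style sketch of an MBConv block with SE. The layer names and the SE ratio (0.25 of the expanded channels) are my assumptions for illustration, not the official implementation; the SiLU activation is PyTorch's name for the Swish used in the paper.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Global average pool -> reduce -> expand -> sigmoid gate (channel attention)."""
    def __init__(self, channels, se_ratio=0.25):
        super().__init__()
        squeezed = max(1, int(channels * se_ratio))
        self.reduce = nn.Conv2d(channels, squeezed, kernel_size=1)
        self.expand = nn.Conv2d(squeezed, channels, kernel_size=1)
        self.act = nn.SiLU()  # Swish in the paper

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)                   # squeeze to B x C x 1 x 1
        s = torch.sigmoid(self.expand(self.act(self.reduce(s))))
        return x * s                                           # rescale channels

class MBConv(nn.Module):
    """Inverted bottleneck: 1x1 expand -> depthwise conv -> SE -> 1x1 project."""
    def __init__(self, in_ch, out_ch, expansion=6, kernel_size=3, stride=1):
        super().__init__()
        mid = in_ch * expansion
        self.use_skip = stride == 1 and in_ch == out_ch        # skip-connection rule
        layers = []
        if expansion != 1:                                     # MBConv1 has no expansion conv
            layers += [nn.Conv2d(in_ch, mid, 1, bias=False),
                       nn.BatchNorm2d(mid), nn.SiLU()]
        layers += [nn.Conv2d(mid, mid, kernel_size, stride,
                             padding=kernel_size // 2, groups=mid, bias=False),
                   nn.BatchNorm2d(mid), nn.SiLU(),
                   SqueezeExcite(mid),
                   nn.Conv2d(mid, out_ch, 1, bias=False),
                   nn.BatchNorm2d(out_ch)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

x = torch.randn(1, 16, 56, 56)
print(MBConv(16, 16, expansion=6, stride=1)(x).shape)  # torch.Size([1, 16, 56, 56])
```

Note how the identity skip is only taken when the stride is 1 and the channel counts match, exactly the rule listed above.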
🧮 Compound Scaling Revisited
After defining B0, the paper introduces compound scaling, where model dimensions grow in a coordinated manner:
\[\text{depth} \propto \alpha^{\phi}, \quad \text{width} \propto \beta^{\phi}, \quad \text{resolution} \propto \gamma^{\phi}\]
- \( \phi \): user-specified coefficient that controls how much extra compute to spend
- \( \alpha = 1.2 \), \( \beta = 1.1 \), \( \gamma = 1.15 \) (found via a small grid search on B0 with \( \phi = 1 \))
- Subject to the constraint:
\[\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2\]
Because FLOPs scale roughly with \( d \cdot w^2 \cdot r^2 \) (linearly in depth, quadratically in width and resolution), total FLOPs grow by \( (\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi} \approx 2^{\phi} \): they double with each unit increase in \( \phi \), making the scaling predictable and efficient.
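A quick sketch of how the coefficients translate into per-model multipliers (the rounding here is simplified; the official B1–B7 configs adjust the resulting values, so the printed resolutions only approximate the published ones):

```python
# Compound scaling from the B0 baseline (base input resolution 224).
alpha, beta, gamma = 1.2, 1.1, 1.15

print(alpha * beta**2 * gamma**2)        # ≈ 1.92, close to the target of 2

def compound_scale(phi, base_resolution=224):
    depth_mult = alpha ** phi            # multiplies the number of layers per stage
    width_mult = beta ** phi             # multiplies the channel counts
    resolution = round(base_resolution * gamma ** phi)
    return depth_mult, width_mult, resolution

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}x{r}")
```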
💡 Why Find Coefficients on a Small Model?
- Searching for optimal \( \alpha, \beta, \gamma \) on large models is expensive
- EfficientNet finds them once on B0 (with \( \phi = 1 \)), then fixes them and increases \( \phi \) to obtain B1–B7
- This reduces search cost while keeping scaling behavior consistent
🔍 Key Insights
- EfficientNet-B0 is not manually designed, but NAS-optimized under computational constraints
- MBConv blocks with SE units provide expressive yet efficient computation
- The compound scaling method provides a unified, constraint-aware way to scale networks
- FLOPs increase roughly as \( 2^{\phi} \), while keeping architecture balanced
💬 Personal Reflection
Starting from a small, well-designed base model (B0) and then scaling it uniformly with simple coefficients is both elegant and practical.
Instead of hand-engineering each model variant, EfficientNet grows predictably in all three dimensions, delivering state-of-the-art accuracy with fewer resources.
🔖 This post is part of an ongoing paper review series for deeper learning and long-term retention!