Today’s session focused on Section 4.3: Scaling Results, which evaluates how the EfficientNet family (B0–B7) performs as the compound scaling coefficient \( \phi \) increases. The paper compares accuracy, parameter count, and computational cost (FLOPs), and provides practical guidance on selecting the right model for different resource settings.
The compound scaling method is applied as follows:
\[\text{depth: } d = \alpha^{\phi}, \quad \text{width: } w = \beta^{\phi}, \quad \text{resolution: } r = \gamma^{\phi}\]

subject to the constraint \( \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2 \), so that each increment of \( \phi \) roughly doubles FLOPs. With the paper’s grid-searched values (\( \alpha = 1.2, \beta = 1.1, \gamma = 1.15 \)), this gives a balanced and predictable scaling of model complexity.
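To make the arithmetic concrete, here is a minimal sketch (my own, not from the paper) that plugs the grid-searched coefficients into the formula and confirms that each step of \( \phi \) multiplies FLOPs by roughly two:

```python
# Compound scaling sketch: compute the depth/width/resolution multipliers
# for a given compound coefficient phi, using the alpha/beta/gamma values
# the paper finds via grid search at phi = 1.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float) -> dict:
    """Return the three scaling multipliers for a given phi."""
    return {
        "depth": ALPHA ** phi,       # layers grow by alpha^phi
        "width": BETA ** phi,        # channels grow by beta^phi
        "resolution": GAMMA ** phi,  # input size grows by gamma^phi
    }

# FLOPs scale roughly with depth * width^2 * resolution^2, so one step of
# phi multiplies FLOPs by alpha * beta^2 * gamma^2 ≈ 1.92, i.e. about 2x.
for phi in range(4):
    s = compound_scale(phi)
    flops_factor = s["depth"] * s["width"] ** 2 * s["resolution"] ** 2
    print(f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
          f"resolution x{s['resolution']:.2f}, ~FLOPs x{flops_factor:.2f}")
```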
Table 2 in the paper presents a clear trend (selected rows reproduced below):
| Model | Params | FLOPs | Top-1 Acc (%) |
|---|---|---|---|
| B0 | 5.3M | 0.39B | 77.1 |
| B3 | 12M | 1.8B | 81.6 |
| B7 | 66M | 37B | 84.3 |
At what point do returns diminish?
Gains flatten noticeably around B4 to B5: from there on, roughly doubling FLOPs buys less than one percentage point of top-1 accuracy.
This suggests that for many real-world applications, mid-sized models (B2–B4) offer the best cost-performance trade-off, as the quick calculation below makes concrete.
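Here is a back-of-the-envelope check (my own calculation, using only the three Table 2 rows quoted above) of how much extra compute each point of top-1 accuracy costs:

```python
# Marginal cost of accuracy, from the Table 2 rows quoted above:
# (model, params in millions, FLOPs in billions, top-1 accuracy in %).
rows = [
    ("B0", 5.3, 0.39, 77.1),
    ("B3", 12.0, 1.8, 81.6),
    ("B7", 66.0, 37.0, 84.3),
]

for (a, _, flops_a, acc_a), (b, _, flops_b, acc_b) in zip(rows, rows[1:]):
    gain = acc_b - acc_a          # top-1 points gained
    cost = flops_b - flops_a      # extra billions of FLOPs spent
    print(f"{a} -> {b}: +{gain:.1f} top-1 for +{cost:.2f}B FLOPs "
          f"= {cost / gain:.2f}B FLOPs per point")
```

Going from B0 to B3 costs about 0.31B FLOPs per point of accuracy; going from B3 to B7 costs about 13B FLOPs per point, a roughly 40x worse exchange rate.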
In my analysis, EfficientNet-B3 stands out as the most cost-effective option: it reaches 81.6% top-1 accuracy with only 1.8B FLOPs, about 5% of B7’s compute for roughly 97% of its accuracy.
This makes it ideal for use cases where high accuracy is needed without massive computational resources.
The compound scaling framework shows its strength here: with a single set of scaling coefficients, EfficientNet scales smoothly from edge-device models (B0/B1) to high-end configurations (B6/B7).
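As a practical illustration (my own sketch, not from the paper): torchvision ships the whole family as `efficientnet_b0` through `efficientnet_b7`, so under a rough FLOP budget, model selection reduces to a table lookup. The FLOP numbers below are the paper’s Table 2 values:

```python
# Sketch: pick the largest EfficientNet variant that fits a FLOP budget.
# Assumes the efficientnet_b0..b7 constructors in torchvision (>= 0.13).
from torchvision import models

# Approximate FLOPs in billions, from Table 2 of the paper.
FLOPS = {"b0": 0.39, "b1": 0.70, "b2": 1.0, "b3": 1.8,
         "b4": 4.2, "b5": 9.9, "b6": 19.0, "b7": 37.0}

def pick_model(budget_bflops: float):
    """Return the largest variant whose FLOPs fit within the budget."""
    fitting = [v for v, f in FLOPS.items() if f <= budget_bflops]
    if not fitting:
        raise ValueError("budget too small even for B0")
    ctor = getattr(models, f"efficientnet_{fitting[-1]}")
    return ctor(weights=None)  # pass pretrained weights here if desired

model = pick_model(budget_bflops=2.0)  # lands on B3 under a 2 BFLOP budget
```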
By analyzing Table 2, I’ve gained a clearer understanding of how to choose the right model for a given resource budget, and of the point at which adding more compute no longer justifies the cost.
🔖 Stay tuned for Day 5, where I’ll dive into the Ablation Study (Section 4.4) and explore why compound scaling outperforms single-dimension scaling!