📌 Paper Info

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Tan & Le, ICML 2019)

✅ Day 6 – My Reproduction Results

Today, I completed and summarized my reproduction experiments for EfficientNet’s compound scaling strategy.
By testing base, depth-only, width-only, and compound-scaled models on CIFAR-10, I confirmed that compound scaling consistently delivers the highest validation accuracy and the lowest loss, even though the gap is small due to the dataset's simplicity.
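
For context, compound scaling ties depth, width, and resolution to a single coefficient φ instead of scaling one dimension at a time. Here is a minimal sketch of the rule, using the α, β, γ values reported in the EfficientNet paper (my actual experiment code in the repo may differ in details):

```python
import math

# Coefficients from the EfficientNet paper (Tan & Le, 2019):
# depth alpha=1.2, width beta=1.1, resolution gamma=1.15,
# found by grid search under the constraint alpha * beta^2 * gamma^2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float):
    """Return (depth, width, resolution) multipliers for a given phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def scale_repeats(base_repeats: int, depth_mult: float) -> int:
    # Block repeat counts are rounded up, as in the reference implementation.
    return int(math.ceil(base_repeats * depth_mult))

if __name__ == "__main__":
    d, w, r = compound_scale(phi=1.0)
    print(f"depth x{d:.3f}, width x{w:.3f}, resolution x{r:.3f}")
```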


📈 Summary of Experimental Results

| Model      | FLOPs (MMac) | Params (M) | Val Acc (Best) | Val Loss (Lowest) |
|------------|--------------|------------|----------------|-------------------|
| Base (B0)  | 408.93       | 4.02       | 93.82%         | 0.2045            |
| Depth-only | 533.91       | 4.02       | 93.85%         | 0.1951            |
| Width-only | 578.40       | 4.02       | 93.69%         | 0.1936            |
| Compound   | 838.07       | 4.02       | 93.98%         | 0.1924            |

📌 While the differences were small, compound scaling still showed the best overall performance.
I expect the gap to widen on more complex datasets such as CIFAR-100 or TinyImageNet.
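
For reproducibility, this is roughly how FLOPs/params numbers in the MMac format above can be measured. A minimal sketch, assuming the ptflops package and torchvision's efficientnet_b0 as a stand-in for my scaled variants (the actual measurement code lives in the repo linked below):

```python
from torchvision.models import efficientnet_b0
from ptflops import get_model_complexity_info

# Stand-in model; the real base/depth/width/compound variants are in the repo.
model = efficientnet_b0(num_classes=10)

# CIFAR-10 input resolution (3, 32, 32); ptflops reports multiply-accumulates (MACs).
macs, params = get_model_complexity_info(
    model, (3, 32, 32), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}, Params: {params}")
```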

Visualizations and training logs are available in my GitHub repo under Paper-Review/vision/02_efficientnet/.



🔭 What’s Next: Pose-Based Action Recognition

Starting today, I’m moving on to a new topic:
Pose-based Action Recognition, focusing on dance genre classification (e.g., hip-hop, waacking).

I believe this shift from model scaling to skeleton-based temporal modeling will give me practical insight into human-centric vision, especially for motion and genre classification.
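
To make that plan concrete, here is a toy sketch of what skeleton-based temporal modeling could look like: a GRU running over per-frame keypoint vectors. Everything here (17 COCO keypoints, 64 frames, two genres, the PoseGRUClassifier name) is a hypothetical placeholder, not the method from the upcoming paper:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: T frames, J=17 COCO keypoints, (x, y) per joint.
T, J = 64, 17
NUM_GENRES = 2  # e.g. hip-hop vs. waacking

class PoseGRUClassifier(nn.Module):
    """Toy skeleton-sequence classifier: flatten joints per frame, run a GRU."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(input_size=J * 2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_GENRES)

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, T, J, 2) keypoint coordinates
        x = poses.flatten(2)      # (batch, T, J*2) per-frame feature vectors
        _, h = self.gru(x)        # h: (1, batch, hidden) final hidden state
        return self.head(h[-1])   # (batch, NUM_GENRES) genre logits

logits = PoseGRUClassifier()(torch.randn(4, T, J, 2))
print(logits.shape)  # torch.Size([4, 2])
```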


✅ TL;DR

📍 Wrapped up my EfficientNet scaling experiments (base vs depth vs width vs compound)
📍 Confirmed compound scaling performs best (even on CIFAR-10)
📍 Ready to explore new tasks: Pose-based Action Recognition using keypoints
📍 Next paper: Action Recognition using Pose Estimation (2019)

Stay tuned for pose modeling, temporal sequence classification, and experiments with dance video data!