To reduce computation, MobileNetV2 employs depthwise separable convolutions, splitting a standard convolution into:
- a 3×3 depthwise convolution that filters each input channel independently, and
- a 1×1 pointwise convolution that mixes information across channels.

This reduces computation by a factor of 8–9× (with 3×3 kernels) while maintaining accuracy, making it ideal for mobile devices.
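As a quick sanity check on that 8–9× figure, here is a minimal PyTorch sketch comparing the multiply-accumulate cost of a standard convolution with its depthwise separable counterpart. The channel counts and feature-map size are illustrative values chosen for this example, not numbers from the paper:

```python
import torch.nn as nn

# Illustrative shapes (not from the paper): 64 -> 128 channels, 3x3 kernel, 32x32 feature map.
C_in, C_out, K, H, W = 64, 128, 3, 32, 32

# Standard convolution cost: K*K * C_in * C_out multiply-accumulates per spatial position.
standard_macs = K * K * C_in * C_out * H * W

# Depthwise separable = depthwise (K*K * C_in) + pointwise (C_in * C_out) per position.
separable_macs = (K * K * C_in + C_in * C_out) * H * W

print(f"standard:  {standard_macs:,} MACs")
print(f"separable: {separable_macs:,} MACs")
print(f"reduction: {standard_macs / separable_macs:.1f}x")  # ~8.4x here, i.e. the 8-9x range

# The corresponding PyTorch layers:
standard = nn.Conv2d(C_in, C_out, K, padding=1)
separable = nn.Sequential(
    nn.Conv2d(C_in, C_in, K, padding=1, groups=C_in),  # depthwise
    nn.Conv2d(C_in, C_out, kernel_size=1),             # pointwise (1x1)
)
```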
MobileNetV2 removes the ReLU activation from the final 1×1 projection layer in the bottleneck block.
Why? Because ReLU can destroy information in low-dimensional spaces by zeroing out values.
By keeping this layer linear, the architecture preserves more information while still introducing non-linearity earlier in the block.
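A minimal sketch of what this looks like in PyTorch (the layer widths here are illustrative): the projection is a plain 1×1 convolution followed by batch norm, with no activation afterwards, so the low-dimensional output is never clipped by ReLU:

```python
import torch.nn as nn

expanded_channels, bottleneck_channels = 384, 64  # illustrative widths

# Linear bottleneck: 1x1 projection with BatchNorm but *no* ReLU6 afterwards.
projection = nn.Sequential(
    nn.Conv2d(expanded_channels, bottleneck_channels, kernel_size=1, bias=False),
    nn.BatchNorm2d(bottleneck_channels),
    # intentionally no activation here
)
```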
Unlike traditional residual blocks (e.g., ResNet), which skip across wide layers, MobileNetV2 places its shortcut connections between the compressed bottleneck layers, which is why the design is called an inverted residual.

Residual connections are only applied if:
- stride = 1, and
- the input and output channels match.

Each MobileNetV2 block follows this sequence:
Input → 1×1 Conv (Expansion, ReLU6)
→ 3×3 Depthwise Conv (Stride s, ReLU6)
→ 1×1 Conv (Projection, Linear)
→ + Residual connection (if stride = 1 and input/output dims match)
The expansion factor t is commonly set to 6. This structure is lightweight, modular, and extremely efficient for mobile environments.
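Putting the sequence together, here is a minimal PyTorch sketch of an inverted residual block. It is a simplified rendering of the structure described above, not the reference implementation; `expand_ratio` corresponds to t:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expansion (1x1, ReLU6) -> depthwise (3x3, ReLU6) -> linear projection (1x1)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand_ratio: int = 6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Residual connection only when stride = 1 and input/output dims match.
        self.use_residual = stride == 1 and in_ch == out_ch

        layers = []
        if expand_ratio != 1:
            layers += [  # 1x1 expansion
                nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True),
            ]
        layers += [
            # 3x3 depthwise convolution with stride s
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection (no activation)
            nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out

# Example: a stride-1 block with matching dims uses the skip connection.
x = torch.randn(1, 32, 56, 56)
y = InvertedResidual(32, 32, stride=1, expand_ratio=6)(x)
print(y.shape)  # torch.Size([1, 32, 56, 56])
```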
Detailed markdown summary:
github.com/hojjang98/Paper-Review
The architecture design feels even more elegant now that Iβve broken down its components.
The use of linear bottlenecks and depthwise convolutions shows how theory and engineering blend together.
Next, I plan to look at experiments, ablation results, and comparisons with MobileNetV1 and ShuffleNet.