π§ Daily Study Log [2025-06-29]
Today was fully dedicated to optimizing ensemble strategies for the SCU_Competition.
I experimented with weighted voting, seed diversity, and OOF-based stacking to push the modelβs generalization performance.
π SCU_Competition β Submission 26β30
Focus: Enhancing AUC score with tuned ensemble strategies and classic feature refinement
Model: LGBMClassifier
, soft voting ensemble, OOF-based stacking
Strategies Applied:
- Submission 26: Weighted soft voting (model_11: model_9 = 7:3)
- Submission 27: VotingClassifier with seed-tuned model_11/9/21 (weights 5:3:2)
- Submission 28: OOF-based StackingClassifier (meta model: LGBM)
- Submission 29: Clean model with only essential features (classic structure)
- Submission 30: Engineered interaction features based on top variables
Key Observations:
- Submission 30 achieved Kaggle AUC 0.8968, the best so far
- Overfitting in submission 28 (CV AUC 0.9995 vs. Kaggle AUC 0.8363) proved OOF stacking was risky
- Simpler structures with meaningful features continued to perform more reliably
β
Takeaways
- Ensemble synergy depends on model diversity, not quantity
- High local CV scores can mislead β real test is on unseen (Kaggle) data
- Strategic feature engineering often outperforms complex modeling tricks
π― Next Steps
- Submission 31: Combine classic model with filtered feature interactions
- Revisit meta-feature stacking with regularization and robust validation
- Log feature importances and start SHAP-based interpretation
β
TL;DR
π Voting: Weighted ensemble of well-tuned models gave stable results
π Stacking: OOF-based meta-model overfitted β needs better regularization
π Classic: Clean features + smart interaction boosted performance (best AUC 0.8968)