Today marks the official completion of my Titanic survival prediction project: my very first full-cycle machine learning portfolio piece!
From basic EDA to advanced ensemble models, I went through every stage: feature engineering, model tuning, validation, and multiple Kaggle submissions. After 8 different submissions, tons of experiments, and hours of learning, I'm happy to close this one out 🎉
⸻
Final Outcomes
- Built and compared models: RandomForest, XGBoost, GradientBoosting, CatBoost
- Created ensemble models (sketched after this list):
  - Soft Voting (RF + GB + XGB)
  - Stacking (meta-learner: Logistic Regression)
- Tuned hyperparameters with GridSearchCV and Optuna (Optuna example below)
- Applied feature selection (SelectFromModel) and analyzed its impact on performance (snippet below)
- Best local validation accuracy: 0.8146
- Best public Kaggle score: 0.77990
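For anyone curious what the two ensembles look like in code, here is a minimal sketch with scikit-learn and XGBoost. It assumes prepared `X_train` / `y_train` arrays, and the hyperparameters are illustrative placeholders rather than the tuned values from my runs.

```python
# Minimal ensemble sketch; X_train / y_train are assumed to be the
# prepared Titanic features and labels. Hyperparameters are placeholders.
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

base_models = [
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
]

# Soft voting: average the predicted class probabilities of RF + GB + XGB.
voting = VotingClassifier(estimators=base_models, voting="soft")

# Stacking: out-of-fold predictions of the base models become the input
# features of a Logistic Regression meta-learner.
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

voting.fit(X_train, y_train)
stacking.fit(X_train, y_train)
```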
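And here is roughly how an Optuna study for one of the base models looks. Again a sketch, not my exact setup: the parameter ranges and trial count are arbitrary choices for illustration.

```python
# Optuna tuning sketch for the RandomForest; search space is illustrative.
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(**params, random_state=42)
    # Optuna maximizes the mean 5-fold CV accuracy.
    return cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```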
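The feature-selection step is nearly a one-liner with SelectFromModel. The median-importance threshold below is just one reasonable choice, not necessarily the one that performed best for me.

```python
# SelectFromModel sketch: keep features whose RandomForest importance
# is above the median; the threshold is an illustrative choice.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=300, random_state=42),
    threshold="median",
)
X_train_sel = selector.fit_transform(X_train, y_train)
print("kept", selector.get_support().sum(), "of", X_train.shape[1], "features")
```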
📦 Repo & Results
🧠 What I Took Away
- Validation score ≠ Kaggle score: always validate broadly (see the cross-validation snippet after this list)
- Ensemble methods are powerful but must be tuned carefully
- Logging every experiment helped me stay on track and learn from failures
- Reproducibility matters, especially when projects get bigger!
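What "validate broadly" meant for me in practice: look at the spread across folds, not a single split. A quick sketch, reusing the assumed stacking model and `X_train` / `y_train` from above:

```python
# Stratified 5-fold CV: report mean and spread, not one lucky split.
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(stacking, X_train, y_train, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```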
🚀 Onward!
The Titanic project gave me a strong foundation in real-world ML workflows.
Now it's time to take these skills to a new domain: possibly healthcare, time series, or tabular competitions on Kaggle.
Titanic may be done, but the journey continues 🚢