Today marks the official completion of my Titanic survival prediction project: my very first full-cycle machine learning portfolio piece!
From basic EDA to advanced ensemble models, I went through every stage: feature engineering, model tuning, validation, and multiple Kaggle submissions. After 8 different submissions, tons of experiments, and hours of learning, I'm happy to close this one out 🎉
⸻
Final Outcomes
- Built and compared models: RandomForest, XGBoost, GradientBoosting, CatBoost
- Created ensemble models (sketched after this list):
  - Soft Voting (RF + GB + XGB)
  - Stacking (meta-learner: Logistic Regression)
- Tuned hyperparameters with GridSearchCV and Optuna (Optuna example below)
- Applied feature selection (SelectFromModel) and analyzed its impact on performance (snippet below)
- Best local validation accuracy: 0.8146
- Best public Kaggle score: 0.77990
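For anyone curious what the two ensembles look like in code, here is a minimal sketch with scikit-learn and XGBoost. It assumes prepared `X_train` / `y_train` arrays, and the hyperparameters are illustrative placeholders rather than the tuned values from my runs.

```python
# Minimal ensemble sketch; X_train / y_train are assumed to be the
# prepared Titanic features and labels. Hyperparameters are placeholders.
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

base_models = [
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
]

# Soft voting: average the predicted class probabilities of RF + GB + XGB.
voting = VotingClassifier(estimators=base_models, voting="soft")

# Stacking: out-of-fold predictions of the base models become the input
# features of a Logistic Regression meta-learner.
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

voting.fit(X_train, y_train)
stacking.fit(X_train, y_train)
```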
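And here is roughly how an Optuna study for one of the base models looks. Again a sketch, not my exact setup: the parameter ranges and trial count are arbitrary choices for illustration.

```python
# Optuna tuning sketch for the RandomForest; search space is illustrative.
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(**params, random_state=42)
    # Optuna maximizes the mean 5-fold CV accuracy.
    return cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```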
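The feature-selection step is nearly a one-liner with SelectFromModel. The median-importance threshold below is just one reasonable choice, not necessarily the one that performed best for me.

```python
# SelectFromModel sketch: keep features whose RandomForest importance
# is above the median; the threshold is an illustrative choice.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=300, random_state=42),
    threshold="median",
)
X_train_sel = selector.fit_transform(X_train, y_train)
print("kept", selector.get_support().sum(), "of", X_train.shape[1], "features")
```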
📦 Repo & Results
🧠 What I Took Away
- Validation score ≠ Kaggle score: always validate broadly (see the cross-validation snippet after this list)
- Ensemble methods are powerful but must be tuned carefully
- Logging every experiment helped me stay on track and learn from failures
- Reproducibility matters, especially when projects get bigger!
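What "validate broadly" meant for me in practice: look at the spread across folds, not a single split. A quick sketch, reusing the assumed stacking model and `X_train` / `y_train` from above:

```python
# Stratified 5-fold CV: report mean and spread, not one lucky split.
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(stacking, X_train, y_train, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```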
🚀 Onward!
The Titanic project gave me a strong foundation in real-world ML workflows.
Now it's time to take these skills to a new domain: possibly healthcare, time series, or tabular competitions on Kaggle.
Titanic may be done, but the journey continues 🚢