Today, my food image classifier reached its highest accuracy yet: 59%.
This wasn't just a technical achievement; it was also a reminder that how you collect data matters as much as how you train a model.
After hitting 56% with 100 images per class, I decided to push the data frontier further: I expanded my crawler's search queries from broad terms like "food" to more specific ones like "fruit apple" or "vegetables carrot".
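The query-expansion step can be sketched like this. A minimal illustration only: `build_queries` and the category-to-item mapping are my own naming for this post, and the actual Selenium download step is left as a stub.

```python
def build_queries(categories):
    """Expand broad category names into more specific search queries.

    The mapping below is illustrative; the real crawler reads its
    item lists from a config file.
    """
    specific = {
        "fruit": ["apple", "banana", "kiwi", "pineapple"],
        "vegetables": ["carrot", "capsicum"],
    }
    queries = []
    for cat in categories:
        for item in specific.get(cat, []):
            # e.g. "fruit apple" instead of just "food"
            queries.append(f"{cat} {item}")
    return queries

if __name__ == "__main__":
    for q in build_queries(["fruit", "vegetables"]):
        print(q)
        # download_images(q, limit=100)  # hypothetical Selenium crawler call
```

The point of the specific queries is that Google returns far cleaner, less ambiguous images for "fruit apple" than for a broad term like "food".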
But I also stopped to think:
Is it okay to use all these Google Images?
While this dataset is entirely self-collected using Selenium-based crawlers, I made sure to respect copyright concerns:
- None of the image files will be uploaded to GitHub.
- The dataset is for non-commercial, academic use only.
- I explicitly note in the README that the copyright of all images belongs to their original owners.

You can reproduce the dataset using the script I provide, and that's as far as I'll go in "sharing" the data.
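To keep the image files out of GitHub, the repository's `.gitignore` can simply exclude them. The paths here are illustrative; adjust them to the actual dataset layout.

```gitignore
# Never commit crawled images; only the crawler script is shared
data/images/
*.jpg
*.jpeg
*.png
```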
Per-class results so far:
- kiwi, banana, pineapple: F1-scores above 0.84
- capsicum, apple: still struggling, the next focus for improvement

The model still uses MobileNetV2, with no fine-tuning yet.
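For context, per-class F1 is just the harmonic mean of precision and recall. A quick dependency-free sketch; the counts below are made up for illustration, not the classifier's real confusion matrix:

```python
def f1_score(tp, fp, fn):
    """Per-class F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Illustrative numbers only: 88 correct, 15 false alarms, 12 misses
print(round(f1_score(tp=88, fp=15, fn=12), 2))
```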
But itβs learning. Clearly.
So what's next?
- Unfreeze the base model (`base_model.trainable = True`) and fine-tune to push past 60%
- Try `EfficientNetB0` and compare
- Add Grad-CAM or a Streamlit demo to make this interactive

Better data still beats better models, but better ethical data is what lasts.
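The unfreezing step (`base_model.trainable = True`) can be sketched in Keras like this. Two assumptions to flag: `weights=None` is only there to keep the sketch download-free (the real model loads "imagenet"), and unfreezing just the last 20 layers is an arbitrary starting point I haven't tuned yet.

```python
import tensorflow as tf

# Same backbone the classifier uses (weights=None keeps this
# sketch download-free; the real model loads weights="imagenet").
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None
)

# Unfreeze the backbone, then re-freeze all but the last 20 layers
# so fine-tuning only adjusts the top of the network.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

trainable = sum(layer.trainable for layer in base_model.layers)
print(trainable)  # number of layers left trainable
```

Fine-tuning with a much lower learning rate than the initial training run is the usual next move, so the pretrained features aren't destroyed in the first few epochs.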
This project helped me grow as both a practitioner and a responsible engineer.
The next 1% will be harder, but it'll be built on the solid foundation I now have.