Today, my food image classifier reached its highest accuracy yet: 59%.
This wasn't just a technical achievement; it was also a reminder that how you collect data matters as much as how you train a model.
After hitting 56% with 100 images per class, I decided to push the data frontier further: I expanded my crawler's search queries from broad terms like "food" to more specific ones like "fruit apple" or "vegetables carrot".
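The query-expansion step can be sketched like this. A minimal illustration only: `build_queries` and the category-to-item mapping are my own naming for this post, and the actual Selenium download step is left as a stub.

```python
def build_queries(categories):
    """Expand broad category names into more specific search queries.

    The mapping below is illustrative; the real crawler reads its
    item lists from a config file.
    """
    specific = {
        "fruit": ["apple", "banana", "kiwi", "pineapple"],
        "vegetables": ["carrot", "capsicum"],
    }
    queries = []
    for cat in categories:
        for item in specific.get(cat, []):
            # e.g. "fruit apple" instead of just "food"
            queries.append(f"{cat} {item}")
    return queries

if __name__ == "__main__":
    for q in build_queries(["fruit", "vegetables"]):
        print(q)
        # download_images(q, limit=100)  # hypothetical Selenium crawler call
```

The point of the specific queries is that Google returns far cleaner, less ambiguous images for "fruit apple" than for a broad term like "food".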
But I also stopped to think:
Is it okay to use all these Google Images?
While this dataset is entirely self-collected using Selenium-based crawlers, I made sure to respect copyright concerns:
- None of the image files will be uploaded to GitHub.
- The dataset is for non-commercial, academic use only.
- I explicitly note in the README that the copyright of all images belongs to their original owners.

You can reproduce the dataset using the script I provide, and that's as far as I'll go in "sharing" the data.
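To keep the image files out of GitHub, the repository's `.gitignore` can simply exclude them. The paths here are illustrative; adjust them to the actual dataset layout.

```gitignore
# Never commit crawled images; only the crawler script is shared
data/images/
*.jpg
*.jpeg
*.png
```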
Per-class results so far:
- kiwi, banana, pineapple: F1-scores above 0.84
- capsicum, apple: still struggling, the next focus for improvement

The model still uses MobileNetV2, with no fine-tuning yet.
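For context, per-class F1 is just the harmonic mean of precision and recall. A quick dependency-free sketch; the counts below are made up for illustration, not the classifier's real confusion matrix:

```python
def f1_score(tp, fp, fn):
    """Per-class F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Illustrative numbers only: 88 correct, 15 false alarms, 12 misses
print(round(f1_score(tp=88, fp=15, fn=12), 2))
```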
But itβs learning. Clearly.
So what's next?
- Unfreeze the base model (`base_model.trainable = True`) and fine-tune to push past 60%
- Try `EfficientNetB0` and compare
- Add Grad-CAM or a Streamlit demo to make this interactive

Better data still beats better models, but better ethical data is what lasts.
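The unfreezing step (`base_model.trainable = True`) can be sketched in Keras like this. Two assumptions to flag: `weights=None` is only there to keep the sketch download-free (the real model loads "imagenet"), and unfreezing just the last 20 layers is an arbitrary starting point I haven't tuned yet.

```python
import tensorflow as tf

# Same backbone the classifier uses (weights=None keeps this
# sketch download-free; the real model loads weights="imagenet").
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None
)

# Unfreeze the backbone, then re-freeze all but the last 20 layers
# so fine-tuning only adjusts the top of the network.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

trainable = sum(layer.trainable for layer in base_model.layers)
print(trainable)  # number of layers left trainable
```

Fine-tuning with a much lower learning rate than the initial training run is the usual next move, so the pretrained features aren't destroyed in the first few epochs.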
This project helped me grow as both a practitioner and a responsible engineer.
The next 1% will be harder, but it'll be built on the solid foundation I now have.