📚 https://arxiv.org/abs/2403.15377
🏆 Published in ECCV 2024

📄 InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

✨ Key Contributions


🎯 Problem Definition


🧠 Method / Architecture

  1. Progressive Multimodal Training
  2. Model Design

🧪 Experiments & Results


🚫 Limitations


🔭 Future Ideas


🔁 Personal Reflections