🧠 Daily Study Log [2025-10-09]
Developed a new idea, continued paper study on InternVideo2, advanced the pose-based dance evaluation research project, and refined resumes for internship applications.
đź’ˇ Idea
📖 Paper Study – InternVideo2 (ECCV 2024, Day 2)
- Reviewed "InternVideo2: Scaling Foundation Models for Multimodal Video Understanding".
- Key Takeaways:
  - Unified spatiotemporal–multimodal architecture combining video, text, and audio.
  - Two main modules:
    - Cross-Modal Fusion Module (CMFM): enables adaptive information exchange across modalities.
    - Hierarchical Temporal Encoder (HTE): efficiently captures long-term motion.
  - Trains with progressive multitask pretraining: reconstruction, alignment, and generation.
- Findings:
  - Improved temporal reasoning, cross-modal alignment, and video–language generalization.
  - Still requires large-scale data and compute.
- Future Direction:
  - Explore adapter-based fine-tuning and lightweight multimodal extensions.
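
To make the adapter idea concrete for myself, here is a minimal pure-Python sketch of a generic residual bottleneck adapter. This is not InternVideo2's method or code; the class name `BottleneckAdapter`, the helper `matvec`, and the sizes are all hypothetical. It only illustrates why adapters are cheap: a small down-projection and up-projection are trained while the backbone stays frozen, and a zero-initialized up-projection makes the adapter start as an identity mapping.

```python
import random


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class BottleneckAdapter:
    """Generic residual adapter: x + W_up * relu(W_down * x). Illustrative only."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: small random weights, shape (bottleneck, dim).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection: zeros, shape (dim, bottleneck), so the adapter
        # initially leaves the frozen backbone's features unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_down, x)]  # ReLU
        update = matvec(self.w_up, hidden)
        return [xi + ui for xi, ui in zip(x, update)]  # residual connection

    def num_params(self):
        # Two weight matrices of size dim * bottleneck each (no biases here).
        return 2 * len(self.w_down) * len(self.w_up)


adapter = BottleneckAdapter(dim=8, bottleneck=2)
features = [float(i) for i in range(8)]
print(adapter(features) == features)  # identity before any training
print(adapter.num_params(), "adapter weights vs", 8 * 8, "for a full 8x8 layer")
```

With a realistic hidden size the gap is much larger, which is the appeal for the compute constraint noted under Findings.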
🧠Research Progress – Pose-Based Dance Evaluation
- Planned next actions for the dance evaluation study:
  - Collect additional motion data for upcoming experiments.
  - Identify candidate academic conferences for submission.
  - Survey dance generation models for comparative analysis.
- Additional Ideas:
  - Extract the dominant movement BPM from pose data and compare it with the music's BPM to measure alignment accuracy.
  - Measure BPM error per body part to analyze rhythmic precision.
  - Include expert qualitative feedback to identify key skeletal points.
- Implemented a new code module to test these ideas.
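
The BPM ideas above could be sketched roughly as follows. This is a simplified illustration, not the actual module: it assumes 2D joint tracks sampled at a fixed frame rate, uses joint speed as the motion signal, and picks the tempo from the strongest autocorrelation lag. The function names (`speed_series`, `estimate_bpm`, `bpm_error_by_part`) and the 60 to 200 BPM search range are hypothetical choices.

```python
import math


def speed_series(track, fps):
    """Per-frame speed of one joint; track is a list of (x, y) positions."""
    return [math.dist(a, b) * fps for a, b in zip(track, track[1:])]


def estimate_bpm(signal, fps, bpm_min=60.0, bpm_max=200.0):
    """Return the tempo whose period maximizes the signal's autocorrelation."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    # Convert the BPM search range into a range of lags (in frames).
    lag_min = max(1, round(fps * 60.0 / bpm_max))
    lag_max = min(len(centered) - 1, round(fps * 60.0 / bpm_min))
    if lag_max < lag_min:
        raise ValueError("signal too short for the requested BPM range")

    def autocorr(lag):
        # Unnormalized (biased) autocorrelation; favors the shortest period.
        return sum(a * b for a, b in zip(centered, centered[lag:]))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return 60.0 * fps / best_lag


def bpm_error_by_part(joint_tracks, fps, music_bpm):
    """Absolute BPM error per body part; joint_tracks maps name -> track."""
    return {
        name: abs(estimate_bpm(speed_series(track, fps), fps) - music_bpm)
        for name, track in joint_tracks.items()
    }
```

One caveat with this sketch: speed peaks twice per back-and-forth swing, so a limb oscillating at the beat can register at double the tempo. Comparing against half and double the music BPM, or using a signed velocity component instead of speed, would be worth testing.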
📝 Resume Update
- Updated and refined both Korean and English resumes.
  - Improved structure, flow, and expression.
  - Added recent projects, competition results, and research activities.
✅ TL;DR
📍 Proposed “Cup Sense” – a playful self-tracking hygiene idea
📍 Studied InternVideo2 (Day 2) – unified multimodal video foundation model (ECCV 2024)
📍 Advanced dance evaluation research with new BPM-based analysis
📍 Polished bilingual resume for internship applications