🧠 Daily Study Log [2025-09-23]
Submitted an assignment, explored a new healthcare idea, studied the Vision Transformer paper (Day 1), and advanced the dance-beat alignment pipeline.

📄 Assignment Submission

Submitted a business plan project.
Fortunately, the assignment included an “idea” section → reused one of the prepared ideas.

💡 Idea

Proposed Smoke-Free Journey.
Concept: Enhance smoking cessation by embedding supportive interventions into daily travel routines.

📖 Paper Study – Vision Transformer (ViT)

Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021)
Day 1 – Abstract & Introduction (Summary)
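- CNNs have dominated CV for about a decade. Their built-in inductive biases (locality, translation equivariance) help on smaller datasets, but the paper argues they are not required once enough training data is available.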
- Transformers in NLP scale efficiently and achieve state-of-the-art results.
- Core question: Can the same Transformer be applied directly to vision, without convolutions?
- Core idea: Treat an image as a sequence of flattened patches, feed them into a standard Transformer with a [CLS] token and positional embeddings.
- Contributions:
  1. Introduced Vision Transformer (ViT), a pure Transformer applied directly to sequences of image patches for large-scale image classification.
  2. Showed ViT rivals or surpasses state-of-the-art CNNs when pre-trained on very large datasets (e.g., ImageNet-21k, JFT-300M) and then transferred to downstream benchmarks.
  3. Showed that large-scale pre-training can outweigh the inductive biases built into CNNs, echoing the scaling trend seen in NLP.
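
To make the core idea concrete, here is a minimal sketch of the patch step only: splitting an image into fixed-size patches and flattening each one into a token. It uses plain nested lists rather than a tensor library, and the helper names (patchify, sequence_length) are my own, not from the paper. The learned linear projection, [CLS] embedding, and positional embeddings are left out.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are returned in row-major order; each patch is a flat list
    of patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append the C channel values
            patches.append(flat)
    return patches


def sequence_length(height, width, patch_size, with_cls=True):
    """Number of tokens the Transformer sees: N = HW / P^2, plus one for [CLS]."""
    n_patches = (height // patch_size) * (width // patch_size)
    return n_patches + 1 if with_cls else n_patches


# Toy 4x4 image with 3 channels; each pixel stores its (row, col, 0).
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)
print(len(patches), len(patches[0]))   # 4 12
print(sequence_length(224, 224, 16))   # 197
```

For a 224x224 RGB input with 16x16 patches this gives 196 patch tokens of dimension 16*16*3 = 768, plus the [CLS] token.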

🎬 Paper Study – Dance Beat Alignment

Built and tested a skeleton + beat alignment pipeline.
Methods implemented:
- Extracted skeletons from both real and generated videos using MediaPipe Pose.
- Used librosa to detect tempo/beat times → overlaid on skeleton trajectories.
- Implemented BPM sweep (50–150 BPM) → generated error curves → identified optimal BPM (minimum error point).
- Defined alignment score based on mean/std error.
- Checked the metric by comparing the original music against a tempo-shifted version (128 BPM).
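
A rough sketch of how the BPM sweep and error statistics could look, for reference. This is not the exact code from today. It assumes motion events (e.g., timestamps of sharp pose changes) have already been extracted as a list of seconds, that the beat grid starts at a known offset, and that error is measured as distance to the nearest beat normalized by the beat period, so a denser grid does not win automatically. All function names are hypothetical.

```python
import statistics


def beat_errors(event_times, bpm, offset=0.0):
    """Distance from each event to its nearest beat, as a fraction of the
    beat period (0.0 = exactly on a beat, 0.5 = exactly between beats)."""
    period = 60.0 / bpm
    errors = []
    for t in event_times:
        phase = ((t - offset) / period) % 1.0  # position inside the beat
        errors.append(min(phase, 1.0 - phase))
    return errors


def bpm_sweep(event_times, bpm_min=50, bpm_max=150, step=1):
    """Error curve over candidate tempos: (bpm, mean error, std of error)."""
    curve = []
    for bpm in range(bpm_min, bpm_max + 1, step):
        errs = beat_errors(event_times, bpm)
        curve.append((bpm, statistics.fmean(errs), statistics.pstdev(errs)))
    return curve


def best_bpm(curve):
    """Tempo at the minimum of the mean-error curve."""
    return min(curve, key=lambda row: row[1])[0]


# Synthetic check: events every 0.5 s correspond to 120 BPM.
events = [0.5 * k for k in range(16)]
curve = bpm_sweep(events)
print(best_bpm(curve))  # 120
```

Without the normalization by beat period, raw error in seconds shrinks as BPM rises simply because beats get closer together, so the minimum would drift toward the top of the sweep range.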
Progress Today:
- Pipeline fully established (skeleton extraction + beat overlay + BPM sweep).
- Error metric defined and partially validated.
Next Steps:
- Apply BPM sweep to multiple generated dance videos.
- Compare “on-beat” vs “off-beat” generated videos directly.
- Scale up to ~100 generated videos for benchmark construction.

✅ TL;DR

📍 Submitted business plan assignment using a prepared idea
📍 New idea: Smoke-Free Journey for healthcare + daily life
📍 ViT study (Day 1): Patch-based Transformer rivaling CNNs
📍 Dance-beat alignment pipeline completed → ready for large-scale testing