📚 https://doi.org/10.1038/s41598-024-83475-4
✅ Day 2 – Literature & Method Review: Foundations Behind the Framework
Today’s reading focused on the foundations of the proposed scoring framework:
how previous motion analysis methods compare, and how this paper constructs a full pipeline from skeleton data to explainable output.
📚 What I Read – From Motion Capture to Alignment Logic
📌 Key Literature Threads
- Motion Capture Types:
  - Wearable: Infrared (lab-accurate), IMU (portable but sparse)
  - Markerless: RGB-D (e.g., Kinect), 2D/3D pose estimation (OpenPose, HRNet, BlazePose)
- Evaluation Approaches:
  - Rule-based: hard thresholds from experts (simple but not generalizable)
  - Similarity metrics: DTW, cosine, Euclidean (produce a score but offer little interpretability)
  - Model-based: regressors/classifiers learn to predict performance → higher accuracy, but often black-box
⚙️ Method Overview
- MediaPipe is used to extract 3D skeleton coordinates (33 joints)
- 18 angles are computed from body parts (shoulder, elbow, hip, knee, torso)
- Each angle is calculated from the two limb vectors \(\vec{v}_1\) and \(\vec{v}_2\) that meet at the joint (e.g., upper arm and forearm at the elbow):
\[A_i = \arccos \left( \frac{\vec{v}_1 \cdot \vec{v}_2}{|\vec{v}_1||\vec{v}_2|} \right) \cdot \frac{180}{\pi}\]
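To make the formula concrete, here is a minimal NumPy sketch of the same computation. The `joint_angle` helper name is mine; the landmark indices in the comment are the standard MediaPipe Pose ones.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees), formed by keypoints a-b-c."""
    v1 = np.asarray(a) - np.asarray(b)  # limb vector from the joint to keypoint a
    v2 = np.asarray(c) - np.asarray(b)  # limb vector from the joint to keypoint c
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against floating-point drift
    return float(np.degrees(np.arccos(cos_theta)))

# Example: right elbow angle from MediaPipe Pose landmarks
# (12 = right shoulder, 14 = right elbow, 16 = right wrist)
# elbow = joint_angle(lm[12], lm[14], lm[16])
```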
🔹 Feature Alignment
- Uses Dynamic Time Warping (DTW) to align motions temporally
- A 32-frame template is created from the top-performing sequence
- All others are aligned to this reference → compensates for speed/rhythm differences (a minimal sketch follows)
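A minimal sketch of this alignment step, assuming each sequence is a `(T, 18)` NumPy array of angle features and `template` is the 32-frame reference. This is plain textbook DTW, not necessarily the paper’s exact implementation:

```python
import numpy as np

def dtw_path(template, seq):
    """Classic DTW between (T, D) angle sequences; returns the warping path."""
    n, m = len(template), len(seq)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - seq[j - 1])  # per-frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover the optimal alignment path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def align_to_template(template, seq):
    """Warp seq onto the template's 32 frames by averaging matched frames."""
    path = dtw_path(template, seq)
    aligned = np.zeros_like(template)
    for i in range(len(template)):
        matched = [seq[j] for ti, j in path if ti == i]
        aligned[i] = np.mean(matched, axis=0)
    return aligned  # (32, 18): frame-by-frame comparable across performers
```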
🔹 Regression & Ensemble
- Base models: Linear, Lasso, SVM, KNN, Decision Tree, Random Forest, Bagging
- Uses adaptive weighting → models with lower RMSE get higher weights
- Final score = weighted average of all predictions (one plausible weighting is sketched below)
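A sketch of inverse-RMSE weighting over a subset of the base models. The paper only states that lower-RMSE models receive higher weights, so the exact `1/RMSE` normalization here is my assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "svm": SVR(),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
}

def fit_weighted_ensemble(X_train, y_train, X_val, y_val):
    """Fit every base model; weight each by inverse validation RMSE."""
    rmse = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        rmse[name] = np.sqrt(mean_squared_error(y_val, pred))
    inv = {name: 1.0 / r for name, r in rmse.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}  # lower RMSE → higher weight

def ensemble_predict(X, weights):
    """Final score = weighted average of all base-model predictions."""
    return sum(w * models[name].predict(X) for name, w in weights.items())
```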
🔹 Explainability (SHAP)
- Global SHAP: shows which features consistently influence scores
- Local SHAP: explains why a specific motion got its score
- Makes feedback joint-specific and interpretable (usage sketched below)
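And a hedged sketch of the SHAP step: a TreeExplainer over the fitted random forest from above, with `feature_names` assumed to hold the 18 angle labels:

```python
import shap

# Global view: which angle features consistently drive predicted scores
explainer = shap.TreeExplainer(models["rf"])
shap_values = explainer.shap_values(X_val)
shap.summary_plot(shap_values, X_val, feature_names=feature_names)

# Local view: why one specific motion sample received its score
shap.force_plot(explainer.expected_value, shap_values[0], X_val[0],
                feature_names=feature_names, matplotlib=True)
```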
💡 Why This Matters
This section gave me insight into:
- Why alignment isn’t just a preprocessing step — it’s the backbone of fair comparison
- How classic models (DT, SVM) can be repurposed for scoring if aligned features are strong
- The value of temporal normalization in movement analysis
- How interpretability doesn’t require deep nets — just good features and clear scoring logic
A solid skeleton feature design + fair alignment + transparent scoring = usable real-world evaluation system.
🛠️ What I’ll Implement Next
- Define 2D skeleton angles using MediaPipe keypoints
- Build a 32-frame DTW-based alignment system
- Train multiple regressors (e.g., Ridge, KNN, RF)
- Weight model outputs by validation RMSE
- Try SHAP (or cosine error maps) to highlight important joints
🔭 What’s Coming on Day 3
- Evaluate aligned vs. non-aligned features
- Compare model-based vs. similarity-based scoring
- Try visualizing SHAP impact over sequence heatmaps
- See if scoring outperforms average human inter-rater agreement
📝 Reflection
Today’s section helped ground the project:
alignment and angle feature design may actually matter more than model choice.
Also, I realized that explainability doesn’t have to wait until the end — it can shape how we choose and evaluate input features from the start.
The idea of using scoring models not just to assign labels but to generate feedback is now clearer than ever.