Each frame ( I(t) ) is processed to estimate 2D poses ( \hat{p}^i(t) ) using OpenPose.
Each dancer is tracked with the LDES tracker via a bounding box ( B^i(t) = (x, y, w, l) ), and a per-person histogram and motion information are maintained alongside it.
When tracking fails (e.g., due to occlusion), the algorithm detects overlap by checking directional changes in movement.
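To make the directional-change check concrete, here is a minimal sketch. The summary above does not give the paper's exact criterion, so the cosine test, the `cos_threshold` default, and the function name `direction_change_frames` are all my own assumptions. The input is the sequence of bounding-box centers for one dancer.

```python
import numpy as np

def direction_change_frames(centers, cos_threshold=0.0, min_speed=1e-6):
    """Return frame indices where the motion direction turns sharply.

    centers: sequence of (x, y) box centers, one per frame.
    cos_threshold: assumed cutoff; a cosine below it counts as a direction change.
    """
    centers = np.asarray(centers, dtype=float)
    # velocities[k] is the displacement from frame k to frame k + 1
    velocities = np.diff(centers, axis=0)
    flagged = []
    for k in range(1, len(velocities)):
        prev_v, cur_v = velocities[k - 1], velocities[k]
        prev_speed = np.linalg.norm(prev_v)
        cur_speed = np.linalg.norm(cur_v)
        # Skip near-stationary steps, where direction is not meaningful
        if prev_speed < min_speed or cur_speed < min_speed:
            continue
        cosine = float(np.dot(prev_v, cur_v) / (prev_speed * cur_speed))
        if cosine < cos_threshold:
            # k + 1 is the first frame observed after the turn
            flagged.append(k + 1)
    return flagged
```

With `cos_threshold=0.0`, only turns of more than 90 degrees are flagged; a real system would probably need smoothing over several frames, since dancers change direction often without any occlusion.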
The algorithm then predicts where the overlap will end and re-assigns the correct dancer by comparing appearance histograms in the predicted frame.
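The re-assignment step could look roughly like the sketch below. The notes above only say that appearance histograms are compared, so the per-channel color histogram, the histogram-intersection score, and the names `color_histogram` and `reassign_identity` are assumptions on my part, not the authors' implementation.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized per-channel histogram of an (H, W, 3) uint8 image patch."""
    patch = np.asarray(patch)
    channels = [
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(channels).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms (1 means identical)."""
    return float(np.minimum(h1, h2).sum())

def reassign_identity(stored_hist, candidate_patches):
    """Pick the candidate patch whose histogram best matches the stored one."""
    if len(candidate_patches) == 0:
        raise ValueError("need at least one candidate patch")
    scores = [
        histogram_intersection(stored_hist, color_histogram(p))
        for p in candidate_patches
    ]
    return int(np.argmax(scores)), scores
```

Here `stored_hist` would be the histogram kept for the dancer before the overlap, and `candidate_patches` the image crops of the boxes found in the predicted frame.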
After overlap ends, multiple poses might be present in the bounding box.
The pose most similar to the previous frame's histogram is selected to maintain temporal consistency.
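A rough sketch of this selection step follows. I am assuming each candidate pose is a `(K, 2)` array of `(x, y)` keypoints, that the appearance of a pose is taken from the tight box around its keypoints, and that similarity is histogram intersection; none of these details are stated in the summary above, and `select_pose` is a hypothetical name. Missing keypoints (which OpenPose can report as zeros) are not handled here.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized per-channel histogram of an (H, W, 3) uint8 image patch."""
    patch = np.asarray(patch)
    channels = [
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(channels).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def pose_patch(frame, keypoints):
    """Crop the tight axis-aligned box around a pose's keypoints."""
    kp = np.asarray(keypoints, dtype=float)
    height, width = frame.shape[:2]
    x0 = int(np.clip(np.floor(kp[:, 0].min()), 0, width - 1))
    y0 = int(np.clip(np.floor(kp[:, 1].min()), 0, height - 1))
    # Force the crop to be at least one pixel wide and tall
    x1 = int(np.clip(np.ceil(kp[:, 0].max()), x0 + 1, width))
    y1 = int(np.clip(np.ceil(kp[:, 1].max()), y0 + 1, height))
    return frame[y0:y1, x0:x1]

def select_pose(frame, poses, previous_hist):
    """Return the index of the pose whose crop best matches previous_hist."""
    scores = []
    for keypoints in poses:
        hist = color_histogram(pose_patch(frame, keypoints))
        # Histogram intersection: 1.0 means identical normalized histograms
        scores.append(float(np.minimum(hist, previous_hist).sum()))
    return int(np.argmax(scores))
```

`previous_hist` is the histogram stored for the dancer in the previous frame, and `poses` are the candidate poses that fall inside the dancer's bounding box.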
Detailed markdown summary:
github.com/hojjang98/Paper-Review
This section gave a clear overview of how the authors handle the multi-dancer tracking problem, which is critical for dance recognition.
The idea of using histogram-based re-identification after occlusion feels both lightweight and practical.
I still want to understand the LDES tracker in more detail; I will check citation [16] later for its internals.