Model Workflow
Collecting the data

I collected 40 videos per action, with each video being 30 frames long. For every frame I captured 33 pose landmarks, each with x, y, and z coordinates plus a visibility score, as well as 42 hand landmarks (21 each for the left and right hands), each with x, y, and z coordinates. MediaPipe provided the landmark detection, and OpenCV handled the video capture.
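The per-frame layout above can be sketched as a single flattened feature vector (a minimal illustration of the data shape, not the exact code used; the function and variable names are my own):

```python
# Per-frame feature layout described above:
#   33 pose landmarks * (x, y, z, visibility) = 132 values
#   21 left-hand + 21 right-hand landmarks * (x, y, z) = 126 values
POSE_LANDMARKS = 33
HAND_LANDMARKS = 21

def flatten_frame(pose, left_hand, right_hand):
    """Flatten one frame's landmarks into a single feature vector.

    pose:       list of (x, y, z, visibility) tuples, length 33
    left_hand:  list of (x, y, z) tuples, length 21 (zeros when no hand seen)
    right_hand: same layout as left_hand
    """
    features = []
    for x, y, z, vis in pose:
        features.extend([x, y, z, vis])
    for hand in (left_hand, right_hand):
        for x, y, z in hand:
            features.extend([x, y, z])
    return features

# A 30-frame video then becomes a sequence of 30 such vectors.
pose = [(0.0, 0.0, 0.0, 1.0)] * POSE_LANDMARKS
hand = [(0.0, 0.0, 0.0)] * HAND_LANDMARKS
frame_vec = flatten_frame(pose, hand, hand)
print(len(frame_vec))  # 33*4 + 2*21*3 = 258
```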

I currently have 26 actions collected, with plans to add more in the future.

Process the data

I removed the hand landmarks, as they were not needed for the model, but I still collected them in case I want to add more detail to the model later. I saved the pose-only landmarks into a new folder so that I knew which folder the model needed to pull its data from.
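A minimal sketch of that pose-only extraction step, assuming each frame was saved as a flat 258-value NumPy array with the 132 pose values first and the 126 hand values after (the folder layout and names here are illustrative):

```python
import numpy as np
from pathlib import Path

POSE_VALUES = 33 * 4  # x, y, z, visibility per pose landmark

def extract_pose_only(src_dir, dst_dir):
    """Copy every saved frame, keeping only the pose landmarks."""
    src, dst = Path(src_dir), Path(dst_dir)
    for npy in src.rglob("*.npy"):
        frame = np.load(npy)
        out = dst / npy.relative_to(src)       # mirror the folder structure
        out.parent.mkdir(parents=True, exist_ok=True)
        np.save(out, frame[:POSE_VALUES])      # drop the 126 hand values

# Quick check in a temporary directory
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "all_landmarks", Path(tmp) / "pose_only"
    src.mkdir()
    np.save(src / "0.npy", np.zeros(258))
    extract_pose_only(src, dst)
    print(np.load(dst / "0.npy").shape)  # (132,)
```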

For the pose landmarks, I applied mid-hip normalization and shoulder-width scaling to standardize the data regardless of a person's size or distance from the camera. This step noticeably improved the model's accuracy in classifying the actions.
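One way to implement that normalization, assuming each frame is stored as a (33, 4) array of (x, y, z, visibility) values (MediaPipe numbers the shoulders 11 and 12 and the hips 23 and 24; the function itself is a sketch, not the exact code used):

```python
import numpy as np

L_SHOULDER, R_SHOULDER = 11, 12
L_HIP, R_HIP = 23, 24

def normalize_pose(frame):
    """Center the pose on the mid-hip point, then scale by shoulder width."""
    coords = frame[:, :3].copy()              # leave visibility untouched
    mid_hip = (coords[L_HIP] + coords[R_HIP]) / 2.0
    coords -= mid_hip                         # translate: mid-hip -> origin
    shoulder_width = np.linalg.norm(coords[L_SHOULDER] - coords[R_SHOULDER])
    if shoulder_width > 1e-6:                 # guard against degenerate frames
        coords /= shoulder_width              # scale: shoulder width -> 1
    out = frame.copy()
    out[:, :3] = coords
    return out

frame = np.random.rand(33, 4)
norm = normalize_pose(frame)
# After normalizing, the mid-hip point sits at the origin.
print(np.allclose((norm[L_HIP, :3] + norm[R_HIP, :3]) / 2, 0))  # True
```

Because every skeleton ends up centered at the hips with a unit shoulder width, two people of different sizes performing the same action produce nearly identical landmark sequences.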

Training the model

Here is some of the output from training the model.



Testing the model

I tested each action that the model is able to classify to check how well it was being detected and classified. If the model was not performing adequately, I retrained it to try to improve its accuracy. When I wanted to add more actions for the model to detect, I simply started over and went back through the workflow.
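The per-action check described above could be sketched as a small accuracy breakdown, so weak classes stand out before deciding whether to retrain (the action labels here are illustrative, not from the actual dataset):

```python
from collections import defaultdict

def per_action_accuracy(y_true, y_pred):
    """Return the fraction of correct predictions for each action label."""
    correct, total = defaultdict(int), defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        total[true] += 1
        if true == pred:
            correct[true] += 1
    return {action: correct[action] / total[action] for action in total}

y_true = ["wave", "wave", "point", "point"]
y_pred = ["wave", "point", "point", "point"]
print(per_action_accuracy(y_true, y_pred))  # {'wave': 0.5, 'point': 1.0}
```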