Started collecting data for training the models, initially focusing on three main actions: high block, down block, and reverse punch. Each action had 20 videos with 30 frames per video. The model was not very accurate, both because the sample of data was small and because there were no transition actions, even though two of the actions had the same starting position.
Added more actions to the data, including transition actions such as ready position and sparring position. This helped to improve the accuracy of the model, as it could now differentiate between actions better; accuracy was around 60% on average. Also tried a different ratio for the training and testing data, with 80% of the data for training and 20% for testing instead of the previous 70-30 ratio. The MediaPipe Holistic model was used, with the left and right hand landmarks on top of the pose landmarks.
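As a rough illustration of the 80-20 split, the sketch below splits each action's videos separately so that every action appears in both the training and the testing set. The function name, the action labels, and the assumption of 20 videos per action at this stage are all hypothetical; the notes do not say how the split was actually done.

```python
import random

def split_by_action(videos_by_action, train_fraction=0.8, seed=0):
    """Split each action's videos into train and test lists.

    videos_by_action maps an action name to a list of video identifiers.
    Splitting per action keeps every action represented in both sets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for action, videos in videos_by_action.items():
        shuffled = list(videos)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(action, video) for video in shuffled[:cut]]
        test += [(action, video) for video in shuffled[cut:]]
    return train, test

# Hypothetical dataset: five actions with 20 videos each.
actions = ["high_block", "down_block", "reverse_punch",
           "ready_position", "sparring_position"]
dataset = {a: [f"{a}_{i:02d}" for i in range(20)] for a in actions}

train_set, test_set = split_by_action(dataset)
print(len(train_set), len(test_set))  # 80 20
```

Passing train_fraction=0.7 gives the earlier 70-30 ratio with the same code.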
The data no longer includes the left and right hand landmarks; only the pose landmarks from the MediaPipe Holistic model are used. This is also where sequence matching was implemented to check whether the user did the actions in the right order. Dense layers were added to the model to improve accuracy, along with dropout layers to reduce overfitting.
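One possible way to do the sequence matching is sketched below. It assumes the model emits one label per frame, collapses runs of identical labels into single actions, and then checks that the expected actions appear in order, allowing extra actions (such as transitions) in between. The function names and labels are illustrative, and the actual matching rule used in the project may differ.

```python
def collapse_predictions(frame_labels):
    """Merge consecutive duplicate labels into one action each."""
    actions = []
    for label in frame_labels:
        if not actions or actions[-1] != label:
            actions.append(label)
    return actions

def matches_in_order(performed, expected):
    """Return True if `expected` occurs within `performed` in the same order.

    Other actions may appear between the expected ones.
    """
    remaining = iter(performed)
    # `step in remaining` advances the iterator, so each expected step
    # must be found after the previous one.
    return all(step in remaining for step in expected)

# Hypothetical per-frame predictions from the model.
frames = (["ready_position"] * 3 + ["high_block"] * 4
          + ["ready_position"] * 2 + ["reverse_punch"] * 3)
performed = collapse_predictions(frames)

print(performed)
print(matches_in_order(performed, ["high_block", "reverse_punch"]))  # True
print(matches_in_order(performed, ["reverse_punch", "high_block"]))  # False
```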
The model has up to 26 actions, including transition actions. There are 14 distinct actions, two of which are transition actions; the 12 other actions have both left and right variations. The accuracy of the model improved significantly, with only two of the actions being misclassified part of the time: the backfist and the inside outside block, in both left and right variations. This is where I realized that normalizing the data would help improve accuracy, so the landmarks were normalized relative to the mid-hip and shoulder-width scaling was implemented, which helped improve accuracy even more.
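A minimal sketch of this kind of normalization is shown below, assuming each frame is a list of 33 (x, y, z) pose landmarks in MediaPipe's ordering (shoulders at indices 11 and 12, hips at 23 and 24). The function name and the eps guard are assumptions; the project's actual implementation may handle visibility values or the z coordinate differently.

```python
import math

# MediaPipe Pose landmark indices for the shoulders and hips.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_HIP, RIGHT_HIP = 23, 24

def normalize_pose(landmarks, eps=1e-6):
    """Center (x, y, z) landmarks on the mid-hip and scale by shoulder width."""
    left_hip, right_hip = landmarks[LEFT_HIP], landmarks[RIGHT_HIP]
    mid_hip = tuple((a + b) / 2 for a, b in zip(left_hip, right_hip))
    shoulder_width = math.dist(landmarks[LEFT_SHOULDER], landmarks[RIGHT_SHOULDER])
    scale = max(shoulder_width, eps)  # avoid dividing by zero on degenerate frames
    return [tuple((c - m) / scale for c, m in zip(point, mid_hip))
            for point in landmarks]
```

After this step the mid-hip sits at the origin and the shoulders are one unit apart, so the features no longer depend on where the person stands in the frame or how far they are from the camera.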
The beginning of the feedback system was also implemented, where the user gets real-time feedback on their actions based on the model's predictions. However, the feedback system is very rudimentary, and the order of the actions is not independently checked apart from one sequence match to another. Three main sequences were checked for the feedback system, but feedback was given for only one of them.
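To make the idea concrete, here is a hedged sketch of how feedback could be produced for all three sequences rather than one. It picks the reference sequence closest to what the user performed, using Python's difflib.SequenceMatcher, and reports the first step that deviates. The reference sequences, names, and message wording are hypothetical, since the notes do not list the real sequences.

```python
from difflib import SequenceMatcher

# Hypothetical reference sequences; the real ones are not listed in these notes.
REFERENCE_SEQUENCES = {
    "sequence_a": ["ready_position", "high_block", "reverse_punch"],
    "sequence_b": ["ready_position", "down_block", "reverse_punch"],
    "sequence_c": ["sparring_position", "backfist", "reverse_punch"],
}

def closest_sequence(performed, references):
    """Return the name of the reference sequence most similar to `performed`."""
    return max(
        references,
        key=lambda name: SequenceMatcher(None, performed, references[name]).ratio(),
    )

def feedback(performed, expected):
    """Describe the first point where `performed` departs from `expected`."""
    for position, (done, wanted) in enumerate(zip(performed, expected), start=1):
        if done != wanted:
            return f"Step {position}: expected {wanted}, got {done}."
    if len(performed) < len(expected):
        return f"Next action: {expected[len(performed)]}."
    return "Sequence completed in the right order."

performed = ["ready_position", "down_block"]
name = closest_sequence(performed, REFERENCE_SEQUENCES)
print(name, "->", feedback(performed, REFERENCE_SEQUENCES[name]))
```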