When machine vision and deep learning started to become popular, there was an interesting yo-yoing in model sizes. At first, the new literature produced bigger and bigger models as everyone raced for the most accurate solution. Then, once the technology started to commercialize and accuracy became strong, there was a huge impetus to make models smaller and smaller, probably so that they could run on phones and edge devices. The result was lightning-fast, minimalistic models that achieved 90% of the performance of models hundreds of times their size. Now of course, like a yo-yo, the opposite is happening again, as some natural language processing models are being researched with an unprecedented number of parameters. But which to choose? The most accurate? Or the ones that can run on my phone (or both)?
At Theia, we are faced with an identical dilemma. Most sites preach that accuracy is by far the most important factor; however, there is always a tradeoff. In order for them to run the most accurate model, they need pretty substantial computer hardware, and they have to wait quite a bit longer for the analysis to occur. Even in instances where there is no question that accuracy is critical (like in clinical biomechanics), there is still demand for faster analyses!
Like any ML research company, we have examined the effect of model size on the accuracy and speed of our solution. In general, model size is changed by reducing or increasing the number of parameters in the model. Although speed isn’t a huge priority for us (because accuracy is), there is a time and place for faster models, as in other ML settings. For instance, think about it from the perspective of a child coming into a clinic for an assessment. Even if the lightweight version is only used for visualization (i.e., seeing the child’s skeleton), showing that to the child or the parent without making them wait could drastically improve the experience of coming to the clinic. Although not having to apply any instrumentation already changes the experience in general (kids not having to wear spandex, etc.), this is something that we really need to think about in order to have widespread adoption of movement analysis.
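To make the parameter-count lever concrete, here is a minimal sketch that counts the weights in a toy stack of 3x3 convolution layers and compares it against a version with half as many channels per layer. The layer widths and the helper names (`conv_params`, `network_params`) are hypothetical and chosen purely for illustration; this is not our architecture, and narrowing layers is only one of several ways to shrink a model.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases in one 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def network_params(widths, in_ch=3, kernel=3):
    """Total parameters in a plain stack of conv layers.

    `widths` lists the output channels of each layer (illustrative values only).
    """
    total = 0
    for out_ch in widths:
        total += conv_params(in_ch, out_ch, kernel)
        in_ch = out_ch  # the next layer consumes this layer's output
    return total


full = network_params([64, 128, 256])  # hypothetical "large" model
half = network_params([32, 64, 128])   # same depth, half the channels

print(full, half, round(full / half, 2))
```

Halving the channel count cuts the parameter count by close to a factor of four in this toy stack, because each layer's weight count depends on the product of its input and output channels.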
Let’s dive into model performance and take a close look. The first model is our standard large joint detector, and the second is a pruned lightweight version. As you can see in the video below, the pruned version analyzes way faster (approximately 3x). Also, when you look at the skeleton output, I believe that both models pass the face validity test.
(Speed comparison, large versus small)
(Eye test, large versus small)
(Joint angles, small versus large)
When we examine the data a little more closely, the results are not so convincing. The first metric of data quality that we look at is an internal measure called confidence, which describes how closely our joints triangulate from the different camera perspectives. The confidence of the lightweight model (for basic movements) is between 15% and 40% lower than that of the regular production version, so we are seeing a significant degradation in accuracy. When we look at joint angles, the effects aren’t as pronounced, but we still see differences. For the knee, the flexion angles are basically identical. However, we see larger differences in internal rotation and adduction, with the smaller model showing more erratic and noisy data, while the larger model shows highly repeatable and, honestly, more believable signals. Though I haven’t fully looked into the cause of this, my educated guess is that measuring off-axis angles requires high granularity and accuracy. Errors of a few pixels (which are very possible with the smaller models) could easily add up when we are looking at secondary and tertiary joint angles.
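To give a feel for why a few pixels matter, here is a small sketch of how a pixel error in one camera shows up as a triangulation mismatch. It is not our confidence metric; it just measures the gap between the two viewing rays that are supposed to meet at the joint. Everything in it is an assumption made for illustration: two idealized pinhole cameras facing the same direction, a focal length of 1000 pixels, a 4 m baseline, and a joint 5 m away.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(point, cam_center, focal_px):
    """Pinhole projection for a camera looking along +z (pixels from image centre)."""
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return focal_px * x / z, focal_px * y / z


def pixel_to_ray(u, v, focal_px):
    """Direction of the viewing ray through pixel (u, v)."""
    return (u / focal_px, v / focal_px, 1.0)


def ray_gap(o1, d1, o2, d2):
    """Shortest distance between two non-parallel rays, in scene units."""
    n = cross(d1, d2)
    baseline = tuple(b - a for a, b in zip(o1, o2))
    return abs(dot(baseline, n)) / math.sqrt(dot(n, n))


def gap_for_pixel_error(err_px, focal_px=1000.0):
    """Ray gap (metres) when camera B's detection is off by err_px vertically."""
    cam_a, cam_b = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)  # assumed camera positions
    joint = (2.0, 0.5, 5.0)                          # assumed true joint position
    ua, va = project(joint, cam_a, focal_px)
    ub, vb = project(joint, cam_b, focal_px)
    ray_a = pixel_to_ray(ua, va, focal_px)
    ray_b = pixel_to_ray(ub, vb + err_px, focal_px)
    return ray_gap(cam_a, ray_a, cam_b, ray_b)


for err in (0, 1, 3):
    print(err, "px ->", round(gap_for_pixel_error(err) * 1000, 1), "mm")
```

In this toy setup the rays meet exactly when the detection is perfect, and the gap grows with the pixel error. Note that a horizontal error along the baseline would leave the rays intersecting but at the wrong depth, so a gap like this captures only part of the error.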
So do we meet in the middle somewhere and make a model that is a bit faster and a bit less accurate, while testing against the larger model as the gold standard? Our answer to that is no! It really comes down to what our goals are and what type of company we want to be. Because our focus is on biomechanics, where signal quality and accuracy are critical, we go with the slower model and have people wait. Now, what about the kid who wants to see the skeleton and only has a minute? Well, our solution (which isn’t out yet) is to release both models and allow the experimenter to use the lighter-weight version for visualizations where time is critical, then reanalyze after the fact with the more accurate version. Going real-time means compromising on biomechanics, which isn’t something we are prepared to do.