By Marcus Brown

Algorithm Development and Avoiding Bias

Summary: Avoiding bias in algorithm development and training is a top priority. We treat data, architecture, training procedure, and testing as critical components, and we continually expand how we test and retest this process so that it improves over time. Our goal is to provide a system that accurately and reliably measures human movement, and these processes, among others, are essential for recognizing and eliminating factors that can degrade signal quality.


Avoiding bias in machine learning is a contentious and important issue. Regardless of the application, special care must go into sourcing data and training algorithms to avoid biases that affect groups differently. The goal of this blog is to provide some background on how we do this, and to show that we take it seriously. Unfortunately, because of the proprietary nature of our algorithms and data, a comprehensive review can't be presented here. Instead, we offer a more general overview.


To give a bit of context on deep learning algorithms: I consider them to be completely indifferent. What I mean by this is that they effectively learn what you tell them to learn. There are a lot of details to this, of course, such as the input data, how the data is manipulated, how the algorithms are developed, how they are trained, and, of course, how they are tested and used. For example, consider an algorithm that is supposed to identify the location of fruits in an image. If we only train it on images of apples, it will probably be pretty good at finding apples! However, when presented with an image that contains a pear, it will probably miscategorize the pear as an apple. In this extremely basic but real example, it doesn't mean that the algorithm prefers apples to pears (and to be honest, both fruits are good to me!), but that it has learned what we told it to learn; in this case it has misunderstood its objective, and the consequence is an error. Though this example is basic, this type of issue occurs widely in deep learning, and avoiding it takes special attention and care (and oddly, it doesn't require teaching the algorithm about pears for it to know that an apple is not a pear…). For motion capture, this means considering the entire algorithm and data stack in the context of human motion measurement.
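To make the apple-and-pear failure concrete, here is a minimal, hypothetical sketch in Python. The toy classifier, the two made-up features, and every number are invented for illustration; the point is only that a model can answer solely with classes it has seen, so an unseen pear lands on the nearest trained class with misleading confidence.

```python
# A minimal, hypothetical sketch of the "apple vs. pear" failure mode.
# The features (hue, elongation) and all values are invented for illustration.
import numpy as np

# Toy training data: the model has only ever seen apples and bananas.
X_train = np.array([
    [0.95, 0.50],  # apple: red hue, round
    [0.90, 0.45],  # apple
    [0.15, 0.90],  # banana: yellow hue, elongated
    [0.20, 0.85],  # banana
])
y_train = np.array(["apple", "apple", "banana", "banana"])

def predict(x):
    """Nearest-class-mean classifier: it can ONLY answer with a trained class."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(means - x, axis=1)
    # Softmax over negative distance as a crude "confidence" score.
    conf = np.exp(-dists) / np.exp(-dists).sum()
    i = dists.argmin()
    return classes[i], conf[i]

# A pear was never in training; its features sit closest to "apple",
# so the model answers "apple" with misleading confidence.
pear = np.array([0.60, 0.55])
print(predict(pear))  # -> ('apple', ~0.55): a confident, wrong answer
```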


For starters, the input is assessed critically. In our application, since we consume images, this means the images need to be varied and diverse. That diversity has a few components: the poses people are in, the environments they are captured in, the clothing they are wearing, and their appearance. This has always been a top priority for us over the years. We constantly examine our data for variety across these factors, and we have put a huge amount of effort into expanding our datasets to be even more diverse. This has required adding millions of images to better capture challenging poses and conditions that could introduce bias in how our pose algorithm operates. For us, no image is off the table, from children at a ski lodge to mud wrestling; we prioritize finding challenging images with variability in the underlying pose. Though resource intensive, this has allowed our system to perform better in unfamiliar environments.
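As one hypothetical illustration of what such a diversity audit could look like, the sketch below tallies a dataset's metadata along the factors named above. The metadata schema, categories, and tooling are all assumptions made for the example, not a description of our actual pipeline.

```python
# A hypothetical sketch of a dataset diversity audit: tally images along
# the factors discussed above and report each category's share of the data.
# The metadata fields and values are invented for illustration.
from collections import Counter

images = [
    {"pose": "running",   "environment": "lab",     "clothing": "athletic"},
    {"pose": "running",   "environment": "outdoor", "clothing": "loose"},
    {"pose": "crouching", "environment": "outdoor", "clothing": "athletic"},
    # ... in practice, millions of entries
]

for factor in ("pose", "environment", "clothing"):
    counts = Counter(img[factor] for img in images)
    total = sum(counts.values())
    for category, n in counts.most_common():
        print(f"{factor:12s} {category:10s} {n / total:5.1%}")
    # Categories with a very small share would trigger targeted collection
    # of new images to rebalance the dataset.
```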


Another factor is the algorithms themselves, which includes their underlying architecture, how they are trained, and what they produce. Without getting into too much detail, we begin with algorithms that are very big, which enables learning the complexity of our data set. Smaller (and, sadly, faster) algorithms can generalize well, but they struggle to learn the subtleties present within large data sets. For us to capture and use the information we are carefully curating within our expansive sets, we need algorithms with the capacity to learn these features. The larger algorithms do take longer to train and are slower to run analysis, but we deeply value data integrity and accuracy and would not compromise these values for an increase in speed (for more information on the 'speed vs. accuracy tradeoff', visit our previous blog post here).

With respect to algorithm training, we employ best practices and stay up to date with the latest training techniques from the literature. This is important because algorithms can become overtrained (often called overfitting), meaning they specialize to the data set they are taught on (remember our apple and pear example) and perform poorly when presented with an image that isn't similar to that data set. Furthermore, we continually expand what our algorithms produce in terms of the number of keypoints. When we started this long journey, we tracked 21 keypoints on the body; our current offering tracks 120+ keypoints. This allows us to capture granular details of motion that can't be measured with a sparser keypoint set or lighter algorithms, avoiding bias in the measurement caused by a model that isn't sufficiently complex to represent the motion at hand.

Finally, we employ machine learning training techniques that further our data diversity. These techniques involve augmenting images to produce more images, including distortions, rotations, and color changes, to name a few. The consequence is a greatly expanded data set that furthers the variety of images the algorithms are trained on, and consequently the performance of these algorithms on other data sets.
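The sketch below shows one common way such an augmentation pipeline can be built, here using torchvision purely for illustration; the specific transforms and parameter values are assumptions, not a description of our actual training stack.

```python
# A minimal augmentation sketch using torchvision (illustrative only).
# Each transform produces new training views: geometric distortion,
# small rotations, and color changes, as described above.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),  # distortion
    transforms.RandomRotation(degrees=15),                      # rotations
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),           # color changes
    transforms.ToTensor(),
])

sample = Image.new("RGB", (256, 256))  # stand-in for a real training image
augmented = augment(sample)            # a fresh random variant on every call
```

Note that in a keypoint setting, any geometric transform applied to the image must also be applied to the keypoint labels so that image and annotation stay consistent.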


Once the algorithms are trained, the final and critical portion of our process is exhaustive testing. This testing begins with machine learning evaluation on images that were not seen in the training data, including diverse images spanning different environments, motions, and appearances. Here we look purely at how our algorithms perform in an isolated (static image) setting, as opposed to a multi-camera biomechanics setup. This provides a starting point for judging how successful the training procedure was on unseen data. Following this assessment, we re-run our validation work, comparing the new algorithms both to previous versions and to marker-based data. This provides a litmus test against the accepted gold standard for a variety of motions. Finally, we analyze a significant number of trials, comparing the pose output of the newly trained model to that of the previous model. This lets us test in many environments and across more diverse motions than those contained in the validation data sets. Though this testing procedure is complex and demanding, it has previously flagged issues requiring correction, and it is critical to ensuring that our process can accurately and repeatably measure variable motions.
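As a rough illustration of the shape of such checks, the hypothetical sketch below computes a mean per-keypoint error on a held-out set and a new-versus-old drift measure. The placeholder arrays, the metric, and the thresholds implied are stand-ins for the example; the exact metrics we use are not described here.

```python
# A hypothetical sketch of a held-out evaluation and a regression check
# between model versions. All arrays are random stand-ins; in practice the
# predictions would come from the new model, the previous model, and a
# labeled held-out set.
import numpy as np

rng = np.random.default_rng(0)
# Shape: (num_images, num_keypoints, xy); 120 keypoints as mentioned above.
held_out_labels = rng.uniform(0, 1, size=(100, 120, 2))
new_preds = held_out_labels + rng.normal(0, 0.01, size=held_out_labels.shape)
old_preds = held_out_labels + rng.normal(0, 0.02, size=held_out_labels.shape)

def mean_keypoint_error(pred, ref):
    """Mean Euclidean distance between corresponding keypoints."""
    return np.linalg.norm(pred - ref, axis=-1).mean()

# 1) Accuracy on images never seen in training, for new and old models.
print("held-out error (new):", mean_keypoint_error(new_preds, held_out_labels))
print("held-out error (old):", mean_keypoint_error(old_preds, held_out_labels))

# 2) Regression check: how far the new model's output drifts from the old.
print("new-vs-old drift:    ", mean_keypoint_error(new_preds, old_preds))
# A worsened held-out error or a large drift flags the run for review.
```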


Sorry for the length of the read! For us, this is so important that it’s difficult not to expand and provide details on how we assess our technology. We firmly believe that this type of care translates into a system that performs reliably and accurately, and allows researchers to unlock human movement measurements in a trusted way. 


To learn more about Theia, click here to book a demo.
