A common challenge in human movement measurement is occlusion. In fact, it’s a challenge in all visual measurement systems! If it can’t be seen, how can we measure it?
Let’s start with what we mean by occlusion. The dictionary definition is to “block off”, but a synonym that really does it justice is “conceal”. In this context, we are talking about a particular part of the body being concealed. We like to break this down into a few distinct categories.
The first is self-occlusion, which occurs in every single capture volume regardless of how careful you are: parts of the body block other parts. As an example, imagine a video capturing the sagittal view of a subject. Most of the contralateral side is occluded by the near side, making it hard to see, and even harder to determine where key points are located.
The next type is object occlusion. It’s basically the same as self-occlusion, except the body part is hidden by an object rather than by another body part. Think of a person lifting a box in front of them, with the center of the body fully obscured, or a person sitting at a desk whose legs are not visible.
The final type we encounter is person-to-person occlusion, where closely interacting individuals occlude each other. This type is far more complex, because it often comes down to key point disentanglement, which is very hard and which I haven’t entirely figured out yet.
So, let’s focus on the first two types and assume that if we are doing multi-person tracking, the people aren’t interacting too closely (e.g., hugging or wrestling).
In marker-based modalities, handling this is actually pretty straightforward: the marker either drops out, or it is tracked by the other cameras in the volume. So an occlusion from one angle may not be a big deal (though in practice it’s more complicated, because occlusions really can affect tracking). But imagine if each camera could guess where markers are, based on the scene and other data. That is effectively the situation we are in. Even under occlusion, we are still guessing where that “marker”, or in our case salient point, is located. Going back to the sagittal view example above, we are still guessing the position of the contralateral side of the body.
Effectively, thinking only about the hip for now, if we are collecting with 8 cameras, we have 8 guesses of where that hip is located. So, what do we do? Surely the occluded view isn’t as good as the others, right?
We handle that with math and statistics. In addition to the location of each key point, we also extract a confidence: the estimated probability that the key point is actually at that location. So, for this occluded hip, if its confidence is lower (which it will be), its location will have to agree closely with the data measured from the other views in order for it to be included in the 3D analysis.
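To make that concrete, here is a minimal sketch of the idea in Python. Everything in it (the detection values, thresholds, and function names) is illustrative, not taken from any particular pipeline: high-confidence views are trusted outright, while a low-confidence view is kept only if it agrees with the consensus of the trusted views.

```python
from statistics import median

# Hypothetical per-camera 2D detections of one hip: (x, y, confidence).
detections = [
    (412.0, 305.0, 0.92),  # clear views agree with each other
    (415.0, 303.0, 0.90),
    (409.0, 307.0, 0.88),
    (300.0, 410.0, 0.21),  # occluded view: low confidence, far from consensus
]

CONF_KEEP = 0.5   # illustrative: trust detections above this confidence
AGREE_PX = 15.0   # illustrative: max pixel distance from consensus to keep a doubtful view

def filter_views(dets, conf_keep=CONF_KEEP, agree_px=AGREE_PX):
    """Keep confident detections; keep doubtful ones only if they agree."""
    trusted = [d for d in dets if d[2] >= conf_keep]
    if not trusted:
        return []  # nothing reliable to build a consensus from
    cx = median(d[0] for d in trusted)
    cy = median(d[1] for d in trusted)
    kept = list(trusted)
    for x, y, c in dets:
        if c < conf_keep and abs(x - cx) <= agree_px and abs(y - cy) <= agree_px:
            kept.append((x, y, c))
    return kept

kept = filter_views(detections)
# The occluded view disagrees with the consensus, so only 3 views remain.
```

The surviving views would then feed into the 3D triangulation; the design choice here is simply that a low-confidence guess has to earn its place by agreeing with the views we already trust.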
Now what happens if there are occlusions in multiple views, and the points we have inferred all have low confidence? It will likely result in unreliable tracking! Consider a volume where a person walks up to a wall, and the anterior part of the body is fully occluded in every view. As you can guess, the anterior points will track more poorly than if they were visible. So even though we guess well and use statistics, if we can’t see it, it’s not magic.
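One way to sketch that failure mode is a simple reliability gate. Assuming (illustratively) that triangulation needs at least two confident 2D views, a point occluded in every view gets flagged as unreliable instead of being reported blindly; the threshold and function name here are hypothetical.

```python
MIN_CONF = 0.5  # illustrative confidence cutoff

def point_is_reliable(view_confidences, min_conf=MIN_CONF, min_views=2):
    """Flag a 3D point as reliable only if enough cameras saw it clearly."""
    return sum(c >= min_conf for c in view_confidences) >= min_views

point_is_reliable([0.9, 0.85, 0.3, 0.2])   # two clear views remain: reliable
point_is_reliable([0.3, 0.25, 0.2, 0.15])  # occluded everywhere: unreliable
```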