Summary: All data is stored and analyzed locally, so the user is responsible for the entire data life cycle. Using the C drive (or “boot drive”) for short term storage, an SSD or network drive for medium term storage, and a mass storage device for archiving will ensure that the data is properly managed for analysis and re-analysis, allow the site’s IT protocols and security requirements for sensitive data to be followed, and limit the likelihood of data loss.
Managing the data generated by a markerless motion capture system can be daunting. The videos are large, they contain identifiable subject information, and the raw data needs to be preserved for reprocessing. As a result, the data storage best practices used for other motion capture modalities do not apply. Over the last five years, we have come up with our own best practices that help us manage our vast data sets, and our hope is that by sharing some of this information we can help you manage yours more easily.
Theia3D is a local application: All analysis is performed on site - the data is not sent to us for processing and storage. The consequence of this is that the user is responsible for all aspects of the data life cycle - including recording, security, and archiving. Depending on the application and sensitivity of the data, these processes can vary from site to site. That being said, we consider all video data sensitive, and the remainder of this blog assumes that this is the case for you as well!
If you are collecting a lot of data like the Kevin P Granata Biomechanics Lab, data management and storage will be a big concern.
Recording: Keep in mind that Theia3D does not actually record video data - this is typically handled by the camera manufacturer’s recording software. We recommend that when the data is initially recorded, it is saved directly to the C drive of the machine. This is important because writing to the local drive is generally faster and keeps the videos in a location specific to the user. Saving to a network drive can result in corrupted videos and is not considered best practice. If multiple users have access to the machine, the permissions for each user apply - they can only save and view their own data. Furthermore, any collection is likely to include some trials that are not usable. Having the data stored locally gives you an opportunity to review and remove them before the data is moved elsewhere. Despite the size of the videos being recorded, the C drive of a modern machine should be more than capable of managing these data sizes. Please note that the computer itself should be set up based on access, data management and other organization-specific IT policies. This helps ensure your team complies with the applicable research ethics board (REB) and protected health information (PHI) rules and regulations.
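Before a collection day, it can be worth confirming that the recording drive really does have room for the session. The snippet below is a minimal sketch of that check in Python, not part of Theia3D or any camera software. The function name `has_room_for_session` and the default safety factor of 2 are our own illustrative assumptions.

```python
import shutil
from pathlib import Path


def has_room_for_session(record_dir, expected_bytes, safety_factor=2.0):
    """Return True if the drive holding record_dir can fit the session.

    expected_bytes is your own estimate of the session size.
    safety_factor is a hypothetical margin for extra or repeated trials.
    """
    usage = shutil.disk_usage(Path(record_dir))  # (total, used, free) in bytes
    return usage.free >= expected_bytes * safety_factor
```

For example, `has_room_for_session("C:/", 30 * 10**9)` would check that the C drive has at least 60 GB free for a session expected to produce about 30 GB of video.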
Data Cleaning and Review: At this point, you have recorded data for one day, with some calibration files and motion trials for one or more subjects. The file names are consistent and in a state where they can be automatically formatted for consumption (see our post on good practices for data organization). Now we recommend that you run the calibration and, if it is successful, batch analyze all of the data from that day (another recent blog post summarizes best practices). It’s important to do this as you go because, depending on the machine, processing video data can be time consuming. However, since the processing is automatic and doesn’t require much input, setting up the processing pipeline and getting it started should take less than 15 minutes. We strongly recommend that you analyze data as you go: collecting a huge amount of data and waiting until the end of the study to analyze it, only to find that there were mistakes in the collection protocol, is very time consuming and completely avoidable. Once the data is processed, run your analysis scripts and review your output variables. If you haven’t built any analysis scripts, you should do that now. This serves two purposes. First, it ensures that the study is well organized and designed, with the trials and the outcome variables specified prior to data collection, which for us is best practice and good science. Second, it provides the opportunity to review a smaller subset of the data. If there are issues here, correcting them does not take a lot of time, and unrecoverable errors (such as a bumped camera) can be accounted for in subsequent data collections.
Saving data locally will allow you to batch process video files for a quick QA check after your initial data collection.
Post Analysis: Once you have your final analyzed data (typically C3D files) and formatted data (videos + calibration), the data should be moved to medium term storage while the study continues. This location could be another SSD on that machine, a network drive administered by IT, or in some cases, it can remain on the C drive. We typically recommend a location other than the C drive because, depending on how many users are recording on that machine, the drive can fill up quickly. Please consult your IT team and lab policies directly to remain compliant with any data storage requirements here.
One important aspect of this medium term storage solution is that it should have fast transfer speeds to the local machine. External SSDs are inexpensive and serve this purpose well. We make two copies of the data here in case there is a drive failure. If you have had a change of heart regarding some analysis preferences, the fast transfer speed allows you to fully re-analyze the data after the fact without long transfer times. If external SSDs are not available or permissible at your organization, using a network drive with considerations for data access and subject sensitivity is the next best option. Security here is important - make sure that the selected medium accounts for access and data sensitivity, and adheres to the organization’s IT policies on both.
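Making two copies is only useful if both copies are intact. As a rough sketch of how this step could be scripted, the Python below copies a session folder to each destination and compares SHA-256 checksums after every file is written. The function names and the folder layout are hypothetical, and your IT policies may call for different tools.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_session(source_dir, destinations):
    """Copy source_dir into each destination and verify every file.

    Returns the number of verified file copies. Raises OSError if a
    copy does not match its source.
    """
    source_dir = Path(source_dir)
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    verified = 0
    for destination in destinations:
        target_root = Path(destination) / source_dir.name
        for source_file in files:
            target_file = target_root / source_file.relative_to(source_dir)
            target_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source_file, target_file)  # keeps timestamps
            if sha256_of(source_file) != sha256_of(target_file):
                raise OSError(f"Checksum mismatch for {target_file}")
            verified += 1
    return verified
```

A call such as `copy_session("C:/data/session_01", ["E:/study", "F:/study"])` (paths are placeholders) would leave one verified copy of the session on each of the two drives.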
| Cameras | Trial Length | Image Resolution | Frame Rate | Estimated File Size |
| --- | --- | --- | --- | --- |
| 8 | 10 s | 1080p | 60 Hz | 540 MB |
As you can see, even a short ten second data collection can generate more than half a gigabyte of data. Managing, storing and maintaining these data are critically important!
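To get a feel for how this adds up over a collection day, the table row can be scaled to other setups. The Python below is a back-of-the-envelope sketch only: it assumes file size grows linearly with camera count, trial length and frame rate at the same 1080p resolution and compression as the row above. Real file sizes depend on the camera software and codec, so treat the result as a planning estimate.

```python
# Reference values taken from the table row above.
REFERENCE_MB = 540.0
REFERENCE_CAMERAS = 8
REFERENCE_SECONDS = 10.0
REFERENCE_FPS = 60.0


def estimate_trial_mb(cameras, seconds, fps):
    """Scale the reference trial linearly (an assumption, not a measurement)."""
    scale = (
        (cameras / REFERENCE_CAMERAS)
        * (seconds / REFERENCE_SECONDS)
        * (fps / REFERENCE_FPS)
    )
    return REFERENCE_MB * scale


def estimate_session_gb(trials, cameras, seconds, fps):
    """Total size of a session in gigabytes (1 GB = 1000 MB)."""
    return trials * estimate_trial_mb(cameras, seconds, fps) / 1000.0
```

Under these assumptions, a hypothetical session of 50 trials recorded with the same settings as the table comes to roughly 27 GB, and the reference trial works out to about 6.75 MB per second per camera.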
Post Study Archiving: Move the data from the medium term storage to the long term storage, preserving the raw and analyzed data of the study in an archive. The location of the mass storage, its setup, and its security should be chosen based on the individual IT requirements of the organization and the sensitivity of the data. One unique feature of Theia3D is that all recorded data is backwards compatible. As we evolve our algorithms, add features, and improve tracking, it’s possible and highly recommended to re-analyze all the data. Since this occurs infrequently, a mass storage device with a lot of space works well even if its transfer speeds are slower. Fast transfer speeds are always better, but in this case the size of the storage takes priority. In addition to making re-analysis possible, the archive provides a common location for raw data even after the experimenter has left. Future studies that require normal data, for instance, may be able to use this archived data instead of re-collecting it, which can save an enormous amount of time.
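Because archived data may sit untouched for years before it is re-analyzed, it helps to be able to confirm that nothing has been damaged or gone missing in the meantime. One possible approach, sketched below in Python, is to store a checksum manifest alongside the archive and re-check it before re-analysis. The file name `manifest.json` and both function names are hypothetical choices for illustration.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive_dir):
    """Record a checksum for every file in the archive."""
    archive_dir = Path(archive_dir)
    entries = {
        path.relative_to(archive_dir).as_posix(): file_sha256(path)
        for path in sorted(archive_dir.rglob("*"))
        if path.is_file() and path.name != MANIFEST_NAME
    }
    (archive_dir / MANIFEST_NAME).write_text(json.dumps(entries, indent=2))
    return entries


def find_damaged_files(archive_dir):
    """Return files that are missing or no longer match the manifest."""
    archive_dir = Path(archive_dir)
    expected = json.loads((archive_dir / MANIFEST_NAME).read_text())
    damaged = []
    for relative_path, checksum in expected.items():
        candidate = archive_dir / relative_path
        if not candidate.is_file() or file_sha256(candidate) != checksum:
            damaged.append(relative_path)
    return damaged
```

Running `write_manifest` once when the study is archived and `find_damaged_files` before any later re-analysis gives an early warning if a file needs to be restored from a second copy.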
We have evolved these guidelines based on years of experience and, honestly, a lot of trial and error. Adhering to these principles will help ensure that a study runs smoothly, data can be reprocessed, no sensitive data is compromised, and no data is lost, whether from drive failure or staff turnover. Ultimately, we hope that following these guidelines will allow you to get the most out of your data and your markerless system.