Making Data FAIR with Lab Automation
Intro
Quality data is crucial for drug discovery, regardless of whether AI is involved. It creates a positive feedback loop, enhancing the design and synthesis of new molecules. However, a major challenge in AI drug discovery is the limited availability and quality of public data for training ML algorithms, which turns that positive feedback loop into a negative one. This issue also arises on a smaller scale when using internal data for molecule synthesis. Automation can help mitigate this problem by producing data that follows the FAIR principles: findable, accessible, interoperable, and reusable.
Where’s my data?
To design molecules efficiently, you need to find data easily. Cluttered email chains, unformatted Excel files, and random Prism files take time to decipher. An automated workflow can:
Automatically upload data to the correct folder from your instrument
Link metadata to the relevant folder or molecule with plate barcoding
Enable easy searching with an integrated LIMS
Standardize data into a readable format
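As a rough illustration of the first two points, here is a minimal Python sketch that files an instrument export into a folder derived from the plate barcode in its filename. The barcode format (a project code, a dash, and a six-digit plate number), the folder layout, and the function names are all assumptions made for this example, not the behavior of any particular instrument or LIMS.

```python
import re
import shutil
from pathlib import Path

# Hypothetical barcode convention: project code, dash, six-digit plate number.
BARCODE_PATTERN = re.compile(r"(?P<project>[A-Z]{2,6})-(?P<plate>\d{6})")


def destination_for(filename: str, root: Path) -> Path:
    """Return the folder an export belongs in, based on the barcode in its name."""
    match = BARCODE_PATTERN.search(filename)
    if match is None:
        # No recognizable barcode: park the file where a person can review it.
        return root / "unsorted"
    return root / match["project"] / match["plate"]


def file_export(source: Path, root: Path) -> Path:
    """Copy one instrument export into its barcode-derived folder."""
    target_dir = destination_for(source.name, root)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```

With this convention, a file named KIN-004200_abs.csv would land in a KIN/004200 folder under the chosen root, and anything without a barcode would go to an unsorted folder instead of being lost.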
Unlock data for others to see
Data isn’t useful if it isn’t accessible. It can stay locked away on a scientist’s computer, undecipherable by anyone else. Findable data is by nature more accessible, so findability and accessibility overlap. Automation can improve accessibility by:
Linking raw data to plate layouts and sample lists
Analyzing data automatically
Linking raw data, metadata, and analyzed data to protocols or entities being tested
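To make the linking step concrete, here is a small Python sketch that joins raw well readings to a plate layout and then averages the readings per sample. The dictionary shapes, the "UNASSIGNED" label, and the function names are assumptions for illustration; a real pipeline would read these from your instrument export and sample list.

```python
from statistics import mean


def link_reads_to_layout(reads, layout):
    """Attach a sample ID to every well reading.

    reads:  {"A1": 0.52, ...}      raw values keyed by well
    layout: {"A1": "CMPD-1", ...}  sample IDs keyed by well
    Wells missing from the layout are labeled "UNASSIGNED" rather than dropped.
    """
    return [
        {"well": well, "sample_id": layout.get(well, "UNASSIGNED"), "value": value}
        for well, value in reads.items()
    ]


def mean_by_sample(linked):
    """Average the linked readings for each sample ID."""
    grouped = {}
    for row in linked:
        grouped.setdefault(row["sample_id"], []).append(row["value"])
    return {sample: mean(values) for sample, values in grouped.items()}
```

Keeping unassigned wells visible, instead of silently discarding them, makes gaps in the plate layout easy to spot later.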
The tool I’ve used the most for findable and accessible data is CDD Vault. I highly recommend requesting a demo if any of this resonates with you.
How to get repeatable results
A 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist’s experiments. Automation can enhance repeatability within your organization through:
Robotic liquid handling for reduced variability
Accessible protocols for consistent execution
A LIMS for tracking reagent use
Instrument process data for monitoring environmental variables
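One simple way to put a number on variability is the percent coefficient of variation (%CV) across replicates. The Python sketch below is an illustration only: the function names are made up for this example, and the 20% cutoff is an arbitrary placeholder, so pick a threshold that suits your assay.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Percent coefficient of variation: 100 * sample standard deviation / mean."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    average = mean(replicates)
    if average == 0:
        raise ValueError("%CV is undefined when the mean is zero")
    return 100 * stdev(replicates) / abs(average)


def flag_variable_samples(replicates_by_sample, max_cv=20.0):
    """Return the sample IDs whose replicate %CV exceeds max_cv.

    max_cv=20.0 is an assumed placeholder, not a recommended standard.
    """
    return sorted(
        sample
        for sample, values in replicates_by_sample.items()
        if percent_cv(values) > max_cv
    )
```

Running a check like this on every plate gives an early warning when a protocol or instrument starts to drift, before the data reaches a decision-maker.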
Reduce experiment reruns, reuse and recycle good data
High-quality data is valuable, both internally and externally. In a capitalist economy, much of that data is generated by private companies and kept proprietary. While there are efforts like the Human Immunome Project to build accessible datasets, publicly available data is still lacking. Consider the following:
Reuse internal data to save time and resources for internal projects
Share high-quality data for the greater good, preferably for free
Explore data licensing as a revenue stream if free isn’t an option
Conclusion
Data quality directly impacts scientific conclusions. Automation enables the production of FAIR data, which is findable, accessible, interoperable, and reusable. By prioritizing FAIR data principles, we can accelerate scientific progress.