Drug Discovery ML Pipelines Explained (explained with Coronavirus example)
Common Pitfalls of ML people going into Bio (and how to avoid them)
UPDATE: At the request of a client, the GitHub repo detailing the full backend for a drug discovery pipeline has been set to private, with access granted only to specific individuals. The outline will still be available publicly.
The Environment
Final Quality Control
As with any drug discovery machine learning pipeline, it’s always important to double-check that the top hits make sense. This is usually done by checking whether the molecular structure of the compound in question fits into the binding pocket of the molecule of interest, using a tool like Mol* (pronounced mol-star).
Bio-specific libraries
In the bioinformatics/cheminformatics space, there are plenty of libraries and tools out there that might be new to you.
Albumentations - If you’re doing anything with microscopy or radiography, make sure to include the albumentations library in your setup. It was built to collect many of the common image augmentations used by Kaggle-competition-winning teams.
DeepChem -
Running your First Experiments
Forking and Contributing
Cited as:
@article{mcateer2020ddmlcovid,
title = "Drug Discovery ML Pipelines Explained (explained with Coronavirus example)",
author = "McAteer, Matthew",
journal = "matthewmcateer.me",
year = "2020",
url = "https://matthewmcateer.me/blog/drug-discovery-ml-covid/"
}
If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.
See you in the next post 😄