Drug Discovery ML Pipelines Explained (explained with Coronavirus example)

Common Pitfalls of ML people going into Bio (and how to avoid them)

UPDATE: At the request of a client, the GitHub repo detailing the full backend for a drug discovery pipeline has been set to private, with access granted only to specific individuals. The outline will still be available publicly.

The Environment

Final Quality Control

As with any drug discovery machine learning pipeline, it’s always important to double-check that the top hits make sense. This is usually done by checking whether the molecular structure of the compound in question fits into the binding pocket of the molecule of interest. A 3D structure viewer like Mol* (pronounced mol-star) is a good tool for that.
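Before opening every top hit in a viewer, a rough geometric pre-filter can flag poses that are obviously outside the pocket. The sketch below is only an illustration and not part of the pipeline described in this post: the function name `fraction_in_pocket`, the 4.5 Å contact cutoff, and the toy coordinates are all assumptions made for the example. It does not replace visual inspection in Mol*.

```python
import math


def fraction_in_pocket(ligand_atoms, pocket_atoms, cutoff=4.5):
    """Return the fraction of ligand atoms within `cutoff` angstroms of any pocket atom.

    Both arguments are lists of (x, y, z) coordinates in angstroms.
    The 4.5 A default is an assumed contact threshold, not a value from the post.
    """
    if not ligand_atoms:
        raise ValueError("ligand_atoms is empty")
    near = 0
    for atom in ligand_atoms:
        # An atom counts as "in the pocket" if at least one pocket atom is close.
        if any(math.dist(atom, p) <= cutoff for p in pocket_atoms):
            near += 1
    return near / len(ligand_atoms)


# Toy coordinates (made up for illustration, not real structural data).
pocket = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
docked_pose = [(1.0, 1.0, 1.0), (2.0, 1.0, 0.5), (1.5, 2.0, 1.0)]
stray_pose = [(1.0, 1.0, 1.0), (20.0, 20.0, 20.0)]

print(fraction_in_pocket(docked_pose, pocket))  # 1.0
print(fraction_in_pocket(stray_pose, pocket))   # 0.5
```

A low fraction suggests the pose is worth a closer look (or can be discarded) before spending time on manual inspection; a high fraction says nothing about binding quality and still needs the visual check.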

Bio-specific libraries

In the bioinformatics/cheminformatics space, there are plenty of libraries and tools out there that might be new to you.

Albumentations - If you’re working with microscopy or radiography images, make sure to include the albumentations library in your setup. It was built to collect many of the common image augmentations used by Kaggle-competition-winning teams.

DeepChem -

Running your First Experiments

Forking and Contributing


Cited as:

@article{mcateer2020ddmlcovid,
    title = "Drug Discovery ML Pipelines Explained (explained with Coronavirus example)",
    author = "McAteer, Matthew",
    journal = "matthewmcateer.me",
    year = "2020",
    url = "https://matthewmcateer.me/blog/drug-discovery-ml-covid/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄
