Using Deep Learning and Python to Cure Diseases
This is a follow-up piece. If you missed Part I, dig in here for the baseline:
Using Deep Learning to Invent Medicine
As you will recall from Part I, I am attempting one of the hottest Kaggle.com competitions right now, solo. The game is predicting protein binding affinity: will these molecules bind or not?
Drug companies are crowdsourcing different methods (mostly machine learning models) for digesting SMILES data (text strings that encode molecular structure), building representations of that data, using those representations to train a prediction machine (a model), and then using that model to find the best candidate compounds to develop in real life.
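To make "creating representations of the data" a little more concrete, here is a minimal sketch of one very simple representation: split a SMILES string into tokens and count them against a fixed vocabulary. The `tokenize` and `count_vector` helpers and the tiny `vocab` are my own illustrative assumptions, not the competition's method, and real pipelines usually use much richer features (fingerprints, graphs, learned embeddings).

```python
import re
from collections import Counter

# Assumption: a simplified tokenizer. It keeps two-letter halogens (Cl, Br)
# and bracketed atoms like [NH+] together, and treats everything else as
# single-character tokens.
_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")

def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return _TOKEN.findall(smiles)

def count_vector(smiles, vocab):
    """Count how often each vocabulary token appears in the SMILES string."""
    counts = Counter(tokenize(smiles))
    return [counts[token] for token in vocab]

# Acetyl chloride as a toy input; the vocabulary is hypothetical.
tokens = tokenize("CC(=O)Cl")
vocab = ["C", "O", "Cl", "=", "(", ")"]
vector = count_vector("CC(=O)Cl", vocab)

print(tokens)  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(vector)  # [2, 1, 1, 1, 1, 1]
```

A fixed-length vector like this is the kind of thing a model can consume, even if this particular one throws away almost all of the structure.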
Using math to save lives.
Since the dataset is massive and unbalanced, I need to:
work with a much smaller subset of the data to build my v1 pipeline
ensure my tiny sample is representative of the full population of 295,000,000 rows
ensure my v1 pipeline can actually scale enough to “see” a sufficient share of the training data and learn to generalize
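One common way to keep a small sample representative of an unbalanced dataset is stratified sampling: draw the same fraction from each class so the class ratio survives. The sketch below is a minimal in-memory version using only the standard library. The `stratified_sample` helper, the `binds` field, and the toy rows are all hypothetical stand-ins, and at 295,000,000 rows you would need a chunked or on-disk variant of the same idea.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample roughly `fraction` of the rows from each label group."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    sample = []
    for members in groups.values():
        # Keep at least one row per class so rare classes never vanish.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample

# Toy stand-in data: 1,000 rows with 1% positives (hypothetical "binds" label).
rows = [{"id": i, "binds": int(i % 100 == 0)} for i in range(1000)]

subset = stratified_sample(rows, lambda r: r["binds"], fraction=0.1)
positives = sum(r["binds"] for r in subset)

print(len(subset), positives)  # 100 1
```

The 10% subset keeps the same 1% positive rate as the toy population, which is the property I want from my tiny sample before trusting anything the v1 pipeline tells me.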
If I build a super powerful machine at predicting …



