Life in the Singularity

Predicting Medicine into Existence - Part III

Matt McDonagh
Jul 05, 2024

As mentioned in Part I and Part II, our current mission is to use molecular data to predict promising compounds. The end goal is to predict medicines.

Last time, we got our prediction pipeline flowing and made our first official competition submission.

Now we are optimizing.

Here's a quick overview of what we have working today:

  1. Data Input: SMILES strings representing molecular structures.

  2. Feature Transformation

    • Graph Representation: SMILES are converted into molecular graphs (nodes = atoms, edges = bonds) with node features (atomic number, mass, etc.). These are fed to the Graph Attention Network (GAT).

    • Fingerprint Representation: SMILES are converted into fixed-length molecular fingerprints. These are fed to the Gated Recurrent Unit (GRU).

  3. Model Ensemble: A GAT model and a GRU model each predict binding affinity (framed as binary classification), and their outputs are averaged.

  4. Cross-Validation: The models are evaluated using 5-fold cross-validation.

  5. Generate Test Predictions
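
To make steps 3 and 4 concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual code. The function names `kfold_indices` and `average_ensemble`, the shuffling seed, and the sample probabilities are all hypothetical, and it assumes plain shuffled folds (the real pipeline may well stratify by label) and that both models output one probability per molecule.

```python
import random


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation.

    Assumption: plain shuffled folds, no stratification by label.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads any remainder so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx


def average_ensemble(gat_probs, gru_probs):
    """Average two models' predicted probabilities element-wise."""
    if len(gat_probs) != len(gru_probs):
        raise ValueError("both models must score the same molecules")
    return [(a + b) / 2.0 for a, b in zip(gat_probs, gru_probs)]


if __name__ == "__main__":
    # Each molecule lands in the validation fold exactly once.
    for fold, (train_idx, val_idx) in enumerate(kfold_indices(10)):
        print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} val")

    # Made-up probabilities for three molecules, one list per model.
    gat_probs = [0.9, 0.2, 0.6]
    gru_probs = [0.7, 0.4, 0.2]
    print(average_ensemble(gat_probs, gru_probs))
```

A simple mean gives both models equal say. A weighted average is a common alternative when one model validates noticeably better than the other.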

The system is working, and I’ve …
