Predicting Medicine into Existence - Part III
As mentioned in Part I and Part II, our current mission is to use molecular data to predict which compounds could become medicines.
Last time out, we got our prediction pipeline flowing and made our first official competition submission.
Now we are optimizing.
Here's a quick overview of what we have working today:
Data Input: SMILES strings representing molecular structures.
Feature Transformation
Graph Representation: SMILES are converted into molecular graphs (nodes = atoms, edges = bonds) with node features (atomic number, mass, etc.). These are fed to the Graph Attention Network (GAT).
Fingerprint Representation: SMILES are converted into fixed-length molecular fingerprints. These are fed to the Gated Recurrent Unit (GRU).
Model Ensemble: A GAT model and a GRU model each predict binding affinity (framed as binary classification), and their outputs are averaged.
Cross-Validation: The models are evaluated using 5-fold cross-validation.
Generate Test Predictions
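To make the ensemble and cross-validation steps in the overview concrete, here is a minimal sketch in plain Python. It is not the actual competition code: the function names (average_ensemble, k_fold_indices, accuracy) are hypothetical, the probabilities are made-up stand-ins for real GAT and GRU outputs, and accuracy is used only as a placeholder metric.

```python
import random
from statistics import mean


def average_ensemble(gat_probs, gru_probs):
    """Average two models' predicted probabilities element-wise."""
    if len(gat_probs) != len(gru_probs):
        raise ValueError("both models must score the same molecules")
    return [(a + b) / 2.0 for a, b in zip(gat_probs, gru_probs)]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def accuracy(probs, labels, threshold=0.5):
    """Fraction of thresholded predictions that match the binary labels."""
    predictions = [1 if p >= threshold else 0 for p in probs]
    return mean(1.0 if p == y else 0.0 for p, y in zip(predictions, labels))


# Made-up scores for ten molecules (assumption: 1 = binds, 0 = does not bind).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
gat_probs = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.7, 0.3, 0.55, 0.45]
gru_probs = [0.7, 0.4, 0.5, 0.9, 0.2, 0.3, 0.6, 0.5, 0.65, 0.35]

ensemble_probs = average_ensemble(gat_probs, gru_probs)

# In the real pipeline each fold would retrain both models on train_idx;
# here we only score the fixed toy predictions on each validation fold.
fold_scores = []
for train_idx, val_idx in k_fold_indices(len(labels), k=5):
    fold_probs = [ensemble_probs[i] for i in val_idx]
    fold_labels = [labels[i] for i in val_idx]
    fold_scores.append(accuracy(fold_probs, fold_labels))

print("per-fold accuracy:", fold_scores)
print("mean accuracy:", mean(fold_scores))
```

Simple averaging treats both models as equally trustworthy. A weighted average would be a natural next thing to try if one model clearly outperforms the other on the validation folds.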
The system is working, and I’ve …


