Using AI to Win Math Competitions

Nov 28, 2024

I’ve mentioned in previous articles that learning data science and specializing in data engineering was the best choice I made when I first got started on Wall Street.

Most of the other bankers were studying up on valuation methodologies, learning about industry bottlenecks and opportunities, and getting very handy with PowerPoint / Excel.

I went hardcore Python / SQL because I wanted to mechanically read financial statements and see the entire stock market. I thought it would be an advantage to build analytic workspaces that held more than just numbers from Excel files: sentiment from company filings, sentiment from industry analysts via conference call transcripts and research reports, and sentiment from the “market” via X/Twitter and Reddit conversation data.

Along with 3 partners we launched Throne Capital more than a decade ago using this approach.

Eventually I learned about Bitcoin and realized actively managing money was just generating commissions for our prime broker, lawyer time and legal filings, Bloomberg terminal fees… all so we could generate a lower return on capital because we were stuck (due to our prospectus) only being able to invest in the equities markets.

Nowadays I write business intelligence software, run a revenue strategy and operations consultancy and I also work on a contract basis for Fortune 500 companies on multiple fronts including product development and go-to-market strategy.

None of that scratches the engineering and problem-solving itch I picked up back in my finance days, so for that I turn to Kaggle Data Science competitions.

Kaggle is a popular online community platform for data scientists and machine learning enthusiasts. It's a hub where users can find and publish datasets, explore and build models in a web-based data science environment, collaborate with fellow data scientists and machine learning engineers, and participate in competitions to solve data science challenges.

Kaggle is a nerd olympics mixed with nerd coding camp.

Last summer I wrote a 4-part series covering my entry into a $50,000 competition to use ML to predict new medicines into existence. I had absolutely zero domain experience and my final submission score showed it: bottom 20% of the nearly 2,000 teams.

Lately I’ve been working on a challenge closer to my expertise: solve national-level math challenges using artificial intelligence models.

The ability to reason mathematically is a critical milestone for AI.

Mathematical reasoning is the foundation for solving many complex problems, from engineering marvels to intricate financial models. However, current AI capabilities are limited in this area.

The AI Mathematical Olympiad (AIMO) Prize is a $10mn fund to spur the open development of AI models capable of performing as well as top human participants in the International Mathematical Olympiad (IMO).

This second AIMO Progress Prize competition (the one I am talking about right now) has 110 math problems in algebra, combinatorics, geometry and number theory. The difficulty has been increased from the first competition, and the problems are now around the National Olympiad level. The problems have also been designed to be 'AI hard' in terms of the mathematical reasoning required, which was tested against current open LLMs' capabilities.

My Approach

LLMs don’t read an entire problem the way humans do. They tokenize it, and even with incredible context windows, caches and other memory solutions… they don’t process it the way a human mind does.

They don’t natively create symbols to “hold” the information they are taking in as they read the ENTIRE problem. For a human, those symbols morph and move as you learn more and finish reading the problem.

My plan is to write a Python script that loops through all the questions in the challenge:

Take the question as an input.
Ask the LLM about each Q: “what am I being asked to answer, what are the steps to solve this problem, what concepts are required to answer this question completely and correctly, etc…” That’s Output 1, or O1.
Feed O1 to the LLM to generate an answer to the question from the loop, explicitly telling it to return executable Python code with formatting notes. That’s O2.
Extract the Python code from O2, run it and take the output. That’s O3.
O3 is the first answer… but in my solution we’re going to loop through different prompts and get a few different answers for each Q.
Finally, define functions to filter out bad answers (too long, not formatted in the way Kaggle wants, etc.) and then use the LLM to decide which answer to ultimately serve.
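
The "extract the Python from O2, run it, take the output as O3" step could be sketched roughly like this. It is a minimal illustration, not the actual competition script: the names extract_code and run_code are hypothetical, and it assumes the LLM wraps its code in a fenced block and that a 10-second timeout per attempt is acceptable.

```python
import re
import subprocess
import sys
from typing import Optional

# The fence marker is built from single backticks so this sketch can itself
# be pasted inside fenced text without breaking it.
FENCE = "`" * 3

# Assumption: the LLM returns its code inside a fenced block,
# optionally tagged "python" or "py".
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(llm_output: str) -> Optional[str]:
    """Return the last fenced code block found in O2, or None if there is none."""
    blocks = CODE_BLOCK.findall(llm_output)
    return blocks[-1].strip() if blocks else None


def run_code(code: str, timeout_s: float = 10.0) -> Optional[str]:
    """Run the code in a separate interpreter and return its stdout (O3).

    Returns None if the code crashes or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Running each snippet in its own interpreter with a timeout means a crashing or runaway attempt returns None instead of taking down the main loop, so a later filter can simply drop it.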

I’m telling it to think in steps. I’m telling it to use symbolic reasoning. I’m telling it to leverage Python.

In essence, this script uses a loop of LLM generation, code extraction, code execution, and answer filtering to iteratively refine its solution to the math problems. It combines the strengths of both the LLM (for reasoning and generating code) and symbolic math/Python execution (for precise calculations) to increase its accuracy.
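
As a rough sketch of that loop, the orchestration might look like the code below. Everything named here is hypothetical: generate stands in for whatever LLM call is used, solve stands in for the "extract code and run it" step, and the two solve prompts are made-up examples. The filter also assumes the expected answer is a non-negative integer below 1000, which should be checked against the competition rules.

```python
from typing import Callable, List, Optional

# O1 prompt, paraphrasing the questions listed in the steps above.
PLAN_PROMPT = (
    "What am I being asked to answer? What are the steps to solve this "
    "problem? What concepts are required to answer it completely and "
    "correctly?\n\nProblem: {question}"
)

# Hypothetical prompt variants; each one yields one candidate answer.
SOLVE_PROMPTS = [
    "Think in steps and use symbolic reasoning. Return executable Python "
    "that prints only the final answer.\n\nPlan: {plan}\n\nProblem: {question}",
    "Solve by direct computation in Python. Return executable Python "
    "that prints only the final answer.\n\nPlan: {plan}\n\nProblem: {question}",
]


def parse_answer(raw: Optional[str], max_chars: int = 20) -> Optional[int]:
    """Filter out bad outputs: missing, too long, or not a valid integer.

    Assumption: a valid answer is an integer in the range 0..999.
    """
    if raw is None or len(raw) > max_chars:
        return None
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    return value if 0 <= value < 1000 else None


def candidate_answers(
    question: str,
    generate: Callable[[str], str],
    solve: Callable[[str], Optional[str]],
) -> List[int]:
    """Collect the filtered answers for one question across all prompts."""
    plan = generate(PLAN_PROMPT.format(question=question))  # O1
    answers = []
    for template in SOLVE_PROMPTS:
        o2 = generate(template.format(plan=plan, question=question))  # O2
        o3 = solve(o2)  # O3: output of running the extracted code
        value = parse_answer(o3)
        if value is not None:
            answers.append(value)
    return answers
```

Passing generate and solve in as plain callables keeps the loop independent of any particular model or sandbox, which makes it easy to swap in a competing design later.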

In future versions (I have 4 months left) I will layer in more complicated methods for selecting the final answer. I have considered different voting methods, even creating a “math council” with different personalities and math styles to highlight the best answers from their perspective.
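
For illustration, the simplest voting method is a plain majority vote over the candidate answers. This is only one possible option, not necessarily the method the final solution will use, and the function name majority_vote is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[int]) -> Optional[int]:
    """Return the most frequent answer, or None if there are no candidates.

    Ties go to the answer that appeared first, because Counter.most_common
    keeps equal counts in first-seen order.
    """
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

A "math council" could plug into the same shape: each council member contributes one or more candidate answers, and the vote (possibly weighted) picks among them.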

What I’ve built so far is close to top 20% already.

I am doing much better than bottom 20% this time, as you can see.

Now my goal is to climb into the top 100. My plan is to build out 3 or 4 competing designs and use all the time I have to test different training methods, prompt engineering and other variables.

This will be a multi-part series!

Thank you for helping us grow Life in the Singularity by sharing.

I started this letter in May 2023 to track all the accelerating changes in AI/ML, robotics, quantum computing and the rest of the technologies accelerating humanity forward into the future.

Our brilliant audience includes engineers and executives, incredible technologists, Fortune 500 board members and thousands of people who want to use technology to maximize the utility in their lives.
