START: Self-taught Reasoner with Tools
The last few months in AI have been incredible.
The reinforcement learning renaissance is underway. Then we figured out how to give models near-perfect (and near-infinite) memory. All of this led to the breakthrough of giving models a mechanism to self-reward:
Self-Rewarding Reasoning Large Language Models (SR-LLMs)
"There are decades where nothing happens; and there are weeks where decades happen"
All of that in the span of 30 days.
Now, we are giving these reasoning models a large and powerful toolbox to use as they learn and work.
The START paper was released by Alibaba Group, the team behind the famous Qwen series of models, which are currently battling DeepSeek for the title of king of Chinese AI.
START, which is short for Self-taught Reasoner with Tools, moves beyond simply prompting LLMs to reason; it provides a mechanism for them to ground their reasoning in verifiable computations, learn to use tools efficiently, and do so in a way that is more scalable and accessible than previous app…




