START: Self-taught Reasoner with Tools
The last few months in AI have been incredible.
The reinforcement learning renaissance is underway. Then we figured out how to give models near-perfect (and near-infinite) memory. All of this led to the breakthrough of giving models a mechanism to self-reward:
Self-Rewarding Reasoning Large Language Models (SR-LLMs)
"There are decades where nothing happens; and there are weeks where decades happen"
All of that in the span of 30 days.
Now, we are giving these reasoning models a large and powerful tool…