Mistral AI Releases Leanstral 1.5: Apache-2.0 Lean 4 Code Agent Model Solving 587 of 672 PutnamBench Problems

Today, Mistral AI released Leanstral 1.5. It is a code agent model built for Lean 4. The release targets automated theorem proving and proof engineering. Weights are available under Apache 2.0. A free API endpoint, leanstral-1-5, is live.
Leanstral 1.5 updates the previous Leanstral-2603 model. It belongs to the Mistral Small 4 family.
What is Leanstral 1.5
Leanstral 1.5 is a code agent model for Lean 4, a proof assistant. A proof assistant checks every logical step mechanically. Lean 4 can express objects such as perfectoid spaces and properties of Rust fragments.
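To make "checks every logical step mechanically" concrete, here is a minimal Lean 4 illustration. It is not taken from Mistral's release. Both statements rely only on the Lean core library, and the file compiles only if every step is accepted by the checker.

```lean
-- A closed arithmetic fact: Lean evaluates both sides and accepts `rfl`.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A general statement over all natural numbers, proved by citing a core lemma.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

If either proof term were wrong, the Lean compiler would report an error instead of accepting the file. That compiler feedback is the signal a prover model works against.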
The architecture is a mixture-of-experts, or MoE. An MoE routes each token to a few specialized sub-networks. This keeps compute down while total capacity remains large. Leanstral uses 128 experts, with 4 active per token.
Total size is 119B parameters, with 6.5B active per token. The context length is 256k tokens. Input is multimodal, accepting text and images. The output is text only.
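The routing idea can be sketched in a few lines. This is a generic top-k router, not Leanstral's actual implementation, which the announcement does not describe. The softmax over the selected experts is an assumed and common choice, and the logits are random placeholders. Only the figures 128, 4, 119B, and 6.5B come from the article.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 4          # experts active per token, from the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


# Placeholder router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

# Fractions implied by the reported figures.
active_expert_fraction = TOP_K / NUM_EXPERTS  # 4 of 128 experts
active_param_fraction = 6.5 / 119             # 6.5B of 119B parameters

print(assignment)
print(f"experts active per token: {active_expert_fraction:.2%}")
print(f"parameters active per token: {active_param_fraction:.2%}")
```

The two fractions differ because not all parameters live in the experts. Shared components such as attention are used for every token.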
How Mistral trained Leanstral 1.5
Training proceeds in three stages: mid-training, supervised fine-tuning, and reinforcement learning with CISPO. Two reinforcement learning environments shaped the model's agentic behavior.
In the multi-turn environment, the model receives a theorem statement. It has to prove or disprove it. It submits a proof, then reads the Lean compiler's response. It refines each attempt until it succeeds or exhausts its budget.
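The propose, check, and refine cycle described above can be sketched as a simple loop. Everything here is hypothetical: `propose` stands in for the model, `check` stands in for the Lean compiler, and the budget is counted in attempts, which the article does not specify. The toy stubs do not call Lean.

```python
from typing import Callable, Optional, Tuple


def prove_with_feedback(
    statement: str,
    propose: Callable[[str, str], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 8,
) -> Optional[str]:
    """Propose a proof, read the checker's feedback, and retry.

    Returns the first accepted proof, or None once the budget is spent.
    """
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(statement, feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate
    return None


# Toy stand-ins so the loop can run without a model or a Lean toolchain.
def toy_check(proof: str) -> Tuple[bool, str]:
    if proof.endswith("rfl"):
        return True, ""
    return False, "error: unsolved goals"


def toy_propose(statement: str, feedback: str) -> str:
    # First attempt is deliberately wrong; the second reacts to feedback.
    return "by rfl" if feedback else "by sorry"


result = prove_with_feedback("2 + 2 = 4", toy_propose, toy_check)
print(result)
```

The key design point is that the checker's message is fed back into the next proposal, so each attempt is conditioned on the previous failure.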
In the code agent environment, Leanstral works within a file system. It edits files, runs bash commands, and runs a Lean language server. That server reveals goals, errors, and type information in real time.
This allows it to fill in small proofs, construct auxiliary lemmas, and keep going through context compaction. Compaction compresses earlier context so that longer tasks still fit the window. Correctness is verified by Mistral's fork of SafeVerify against the target theorems.
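Context compaction can be illustrated with a small sketch. The article does not say how Leanstral compacts its context, so this is one plausible scheme under stated assumptions: older messages are replaced by a single summary once the history exceeds the window. The whitespace token counter and the `summarize` callable are hypothetical placeholders.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def compact(
    history: List[str],
    window: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary when the history overflows.

    The most recent `keep_recent` messages are always kept verbatim.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(count_tokens(m) for m in history)
    if total <= window or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent


# Toy history: five steps of four "tokens" each, with a window of ten.
history = [f"step {i} output text" for i in range(5)]
compacted = compact(
    history,
    window=10,
    keep_recent=2,
    summarize=lambda msgs: f"[summary of {len(msgs)} earlier steps]",
)
print(compacted)
```

In a real agent the summary would itself be produced by the model, and the recent messages would include the latest compiler output.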
Benchmarks and performance
The Mistral team reports that Leanstral 1.5 saturates miniF2F. It reaches 100% on both the validation and test sets. It solves 587 of 672 PutnamBench problems.
The model sets a new state of the art on the FATE-H and FATE-X algebra benchmarks. Mistral reports 87% on FATE-H and 34% on FATE-X. On FLTEval, pass@1 increases from 21.9 to 28.9. Pass@8 increases from 31.9 to 43.2.
FLTEval is built from real pull requests to the Fermat's Last Theorem project. On it, Leanstral beats Opus 4.6's 39.6 at one-seventh of the cost. It also extends its lead over open-source models by three to ten times. Pass@8 means eight attempts are allowed per problem.
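The pass@k definition above translates directly into code. This sketch uses made-up attempt outcomes, not FLTEval data, and computes the plain empirical rate. Published results may use a different estimator.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Percentage of problems solved by at least one of the first k attempts."""
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return 100.0 * solved / len(results)


# Toy outcomes for four hypothetical problems, four attempts each.
toy_results = {
    "p1": [False, True, False, False],
    "p2": [False, False, False, False],
    "p3": [True, True, False, True],
    "p4": [False, False, False, True],
}

score_at_1 = pass_at_k(toy_results, 1)  # only p3 succeeds on its first try
score_at_4 = pass_at_k(toy_results, 4)  # p1, p3, and p4 succeed within four tries
print(score_at_1, score_at_4)
```

Allowing more attempts can only keep or raise the score, which is why pass@8 is always at least pass@1.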
| Benchmark | Leanstral 1.5 | Details |
|---|---|---|
| miniF2F (val + test) | 100% | Saturated, according to Mistral |
| PutnamBench | 587/672 | ~$4 per problem |
| FATE-H | 87% | New state of the art |
| FATE-X | 34% | New state of the art |
| FLTEval pass@1 | 28.9 | Up from 21.9 |
| FLTEval pass@8 | 43.2 | Beats Opus 4.6's 39.6 |
On PutnamBench, Leanstral edges out Seed-Prover 1.5 by 7 problems. It does so for about $4 per problem. Mistral estimates the maximum Seed-Prover setup at around $300 or more per problem.
That setup uses a budget of 10 H20-days per problem. Mistral also compares against Goedel-Architect and AxProverBase. It notes that Aleph Prover costs about $54 to $68 per problem.
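The cost figures above are easier to compare as ratios. The sketch below uses only the numbers reported in the article. Since the Seed-Prover estimate is "$300 or more", its ratio should be read as a lower bound.

```python
LEANSTRAL_COST = 4.0              # USD per problem, as reported
SEED_PROVER_COST = 300.0          # USD per problem, Mistral's estimate ("or more")
ALEPH_COST_RANGE = (54.0, 68.0)   # USD per problem, as reported

# How many times more each system costs per problem than Leanstral.
seed_ratio = SEED_PROVER_COST / LEANSTRAL_COST
aleph_ratio = tuple(c / LEANSTRAL_COST for c in ALEPH_COST_RANGE)

# Seed-Prover's solved count implied by "edges out by 7 problems".
seed_prover_solved = 587 - 7

print(f"Seed-Prover 1.5: at least {seed_ratio:.0f}x the cost per problem")
print(f"Aleph Prover: {aleph_ratio[0]:.1f}x to {aleph_ratio[1]:.1f}x")
print(f"Implied Seed-Prover 1.5 score: {seed_prover_solved}/672")
```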
Test-time scaling is the model's defining behavior. Raising the token budget per attempt raises PutnamBench pass@8. The Mistral team reports 44 problems solved at a 50k budget, 244 at 200k, 493 at 1M, and 587 at 4M.
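The reported scaling points can be tabulated as solve rates and step-to-step gains. This sketch uses only the four data points quoted above. It does not interpolate between them.

```python
TOTAL_PROBLEMS = 672

# Token budget per attempt -> PutnamBench problems solved, as reported.
solved_by_budget = {
    50_000: 44,
    200_000: 244,
    1_000_000: 493,
    4_000_000: 587,
}

# Solve rate at each budget, as a percentage of all 672 problems.
rates = {
    budget: round(100 * solved / TOTAL_PROBLEMS, 1)
    for budget, solved in solved_by_budget.items()
}

# Additional problems solved at each step up in budget.
counts = list(solved_by_budget.values())
gains = [later - earlier for earlier, later in zip(counts, counts[1:])]

for budget, solved in solved_by_budget.items():
    print(f"{budget:>9,} tokens: {solved:>3} solved ({rates[budget]}%)")
print("gains per step:", gains)
```

The gains shrink at the last step, from 249 additional problems between 200k and 1M to 94 between 1M and 4M, so returns diminish as the budget grows.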



