MAI-Thinking-1

MAI-Thinking-1 is Microsoft AI's first in-house reasoning model: a sparse Mixture-of-Experts model with 35B active and roughly 1T total parameters, built on the MAI-Base-1 base model and trained from scratch without distillation from third-party models. Developed with Microsoft's Hill-Climbing Machine pipeline, it was pre-trained on 30T tokens of clean, commercially licensed, human-generated data, followed by 3.55T mid-training tokens. It was then post-trained with reinforcement learning across STEM, agentic coding, and helpfulness/safety specialists, which were consolidated into a single model. Microsoft reports strong mathematical reasoning and software-engineering results for a model of this size: 97.0% on AIME 2025 and 52.8% on SWE-Bench Pro, where it is described as competitive with Claude Opus 4.6. It supports a 256k-token context window, function calling, and developer instructions, and is reported to be preferred over Claude Sonnet 4.6 in blind side-by-side human evaluations.
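To put the headline figures in proportion, the short sketch below works out what share of the parameters is active per token and how many tokens were used across pre-training and mid-training. It uses only the numbers quoted in the description; the total parameter count is given there as approximately 1T, so the resulting share is approximate too. The constant and function names are illustrative and do not come from Microsoft.

```python
# Figures quoted in the model description. TOTAL_PARAMS is approximate ("~1T").
ACTIVE_PARAMS = 35e9        # parameters active per token
TOTAL_PARAMS = 1e12         # total parameters (approximate)
PRETRAIN_TOKENS = 30e12     # pre-training tokens
MIDTRAIN_TOKENS = 3.55e12   # mid-training tokens


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that is active for each token."""
    return active / total


def combined_tokens(pretrain: float, midtrain: float) -> float:
    """Tokens seen across pre-training and mid-training."""
    return pretrain + midtrain


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
tokens = combined_tokens(PRETRAIN_TOKENS, MIDTRAIN_TOKENS)

print(f"Active share per token: {frac:.1%}")
print(f"Pre- plus mid-training tokens: {tokens / 1e12:.2f}T")
```

Under these figures, roughly 3.5% of the parameters are active for any given token, and the model saw about 33.55T tokens before post-training.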

Benchmark results

| Benchmark | Score | Tags | Source |
| --- | --- | --- | --- |
| AdvancedIF | 85.0% | self-reported | llm-stats |
| AIME 2025 | 97.0% | self-reported | llm-stats |
| AIME 2026 | 94.5% | self-reported | llm-stats |
| AIR-Bench | 88.0% | self-reported | llm-stats |
| BFCL-v3 | 72.0% | self-reported | llm-stats |
| CorpusQA | 82.0% | self-reported | llm-stats |
| CyberSecEval 4 | 63.0% | self-reported | llm-stats |
| GPQA | 84.2% | self-reported | llm-stats |
| GraphWalks | 90.0% | self-reported | llm-stats |
| HealthBench Professional | 35.0% | self-reported | llm-stats |
| HMMT Feb 26 | 84.9% | self-reported | llm-stats |
| IFBench | 69.0% | self-reported | llm-stats |
| LiveCodeBench v6 | 87.7% | self-reported | llm-stats |
| LongBench v2 | 61.0% | self-reported | llm-stats |
| LongFact | 98.0% | self-reported | llm-stats |
| MedXpertQA | 43.0% | self-reported | llm-stats |
| MMLU-Pro | 85.0% | self-reported | llm-stats |
| Multi-Challenge | 53.0% | self-reported | llm-stats |
| SimpleQA Verified | 31.0% | self-reported | llm-stats |
| SWE-Bench Pro | 52.8% | self-reported | llm-stats |
| SWE-Bench Verified | 73.5% | self-reported | llm-stats |
| Terminal-Bench 2.0 | 46.0% | self-reported | llm-stats |
| TruthfulQA | 88.0% | self-reported | llm-stats |