Granite 3.3 8B Instruct
Granite 3.3 models feature enhanced reasoning capabilities and support for Fill-in-the-Middle (FIM) code completion. They are built on a foundation of open-source instruction datasets with permissive licenses, alongside internally curated synthetic datasets tailored for long-context problem-solving. These models preserve the key strengths of previous Granite versions, including support for a 128K context length, strong performance in retrieval-augmented generation (RAG) and function calling, and controls for response length and originality. Granite 3.3 also delivers competitive results across general, enterprise, and safety benchmarks. Released as open source, the models are available under the Apache 2.0 license.
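Fill-in-the-Middle means the model sees the code before and after a gap and generates the missing span. Below is a minimal Python sketch of how a prefix-suffix-middle (PSM) prompt is commonly assembled for FIM-capable models.

It rests on a few assumptions. The sentinel strings `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` are StarCoder-style placeholders, not confirmed Granite 3.3 tokens; check the model's tokenizer configuration for the actual special tokens. The helper names `FimTokens`, `build_fim_prompt` and `splice` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    # ASSUMPTION: StarCoder-style sentinel names used as placeholders.
    # Replace them with the special tokens from the model's tokenizer.
    prefix: str = "<fim_prefix>"
    suffix: str = "<fim_suffix>"
    middle: str = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str, tokens: FimTokens = FimTokens()) -> str:
    """Assemble a prefix-suffix-middle (PSM) prompt.

    The model is expected to continue generating after the middle
    sentinel, producing the code that belongs between `prefix` and `suffix`.
    """
    return f"{tokens.prefix}{prefix}{tokens.suffix}{suffix}{tokens.middle}"


def splice(prefix: str, middle: str, suffix: str) -> str:
    """Reinsert the generated middle span to recover the completed source."""
    return prefix + middle + suffix


if __name__ == "__main__":
    before = "def add(a, b):\n    "
    after = "\n\nprint(add(2, 3))\n"
    print(build_fim_prompt(before, after))
    # Stand-in for a model response; no model is called here.
    print(splice(before, "return a + b", after))
```

The sentinels are kept in a small frozen dataclass so they can be swapped for the real token strings without touching the prompt-building logic.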
Benchmark results
All scores below are self-reported, as listed on llm-stats.

| Benchmark | Score |
|---|---|
| AIME 2024 | 81.2% |
| AlpacaEval 2.0 | 62.7% |
| Arena Hard | 57.6% |
| AttaQ | 88.5% |
| BIG-Bench Hard | 69.1% |
| DROP | 59.4% |
| GSM8K | 80.9% |
| HumanEval | 89.7% |
| HumanEval+ | 86.1% |
| IFEval | 74.8% |
| MATH-500 | 69.0% |
| MMLU | 65.5% |
| PopQA | 26.2% |
| TruthfulQA | 66.9% |