Granite 3.3 8B Instruct

Granite 3.3 models feature enhanced reasoning capabilities and support for Fill-in-the-Middle (FIM) code completion. They are built on a foundation of open-source instruction datasets with permissive licenses, alongside internally curated synthetic datasets tailored for long-context problem-solving. These models preserve the key strengths of previous Granite versions, including support for a 128K context length, strong performance in retrieval-augmented generation (RAG) and function calling, and controls for response length and originality. Granite 3.3 also delivers competitive results across general, enterprise, and safety benchmarks. Released as open source, the models are available under the Apache 2.0 license.
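As an illustration of the Fill-in-the-Middle support mentioned above, the sketch below shows how a FIM prompt is typically laid out: the code before the gap, the code after the gap, then a marker asking the model to produce the missing middle. The function names and the sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` are assumptions for illustration, not taken from this page. Check the model's tokenizer for the exact special tokens before relying on them.

```python
def build_fim_prompt(
    prefix: str,
    suffix: str,
    prefix_tok: str = "<fim_prefix>",   # assumed sentinel, verify against the tokenizer
    suffix_tok: str = "<fim_suffix>",   # assumed sentinel, verify against the tokenizer
    middle_tok: str = "<fim_middle>",   # assumed sentinel, verify against the tokenizer
) -> str:
    """Lay out the code around a gap in prefix-suffix-middle order."""
    return f"{prefix_tok}{prefix}{suffix_tok}{suffix}{middle_tok}"


def splice_completion(prefix: str, completion: str, suffix: str) -> str:
    """Put the generated middle back between the original prefix and suffix."""
    return prefix + completion + suffix


# Hypothetical example: the body of `add` is the gap to fill.
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(2, 3))\n"
prompt = build_fim_prompt(prefix, suffix)
```

The prompt string would be sent to the model as raw text, and whatever it generates after the final marker is spliced back in with `splice_completion`.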

Benchmark results

Benchmark        Score   Tags            Source
AIME 2024        81.2%   self-reported   llm-stats
AlpacaEval 2.0   62.7%   self-reported   llm-stats
Arena Hard       57.6%   self-reported   llm-stats
AttaQ            88.5%   self-reported   llm-stats
BIG-Bench Hard   69.1%   self-reported   llm-stats
DROP             59.4%   self-reported   llm-stats
GSM8k            80.9%   self-reported   llm-stats
HumanEval        89.7%   self-reported   llm-stats
HumanEval+       86.1%   self-reported   llm-stats
IFEval           74.8%   self-reported   llm-stats
MATH-500         69.0%   self-reported   llm-stats
MMLU             65.5%   self-reported   llm-stats
PopQA            26.2%   self-reported   llm-stats
TruthfulQA       66.9%   self-reported   llm-stats