Granite 3.3 8B Base
Granite-3.3-8B-Base is a decoder-only language model with a 128K-token context window. It improves on Granite-3.1-8B-Base by adding Fill-in-the-Middle (FIM) support through specialized tokens, which lets the model generate content conditioned on both a prefix and a suffix. This makes it well suited to code completion tasks.
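FIM comes down to how the prompt is laid out. The prefix and suffix are wrapped in special tokens, and the model is asked to produce the text that belongs between them. The minimal Python sketch below assumes StarCoder-style token strings (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and prefix-suffix-middle (PSM) ordering. This page confirms neither, so check the special tokens defined in the model's tokenizer before relying on them. `FimTokens`, `build_fim_prompt`, and `fill_in` are hypothetical helpers. The "generated" middle in the demo is hard-coded and was not produced by the model.

```python
from dataclasses import dataclass


# ASSUMPTION: StarCoder-style FIM token strings. Verify the actual special
# tokens in the Granite-3.3-8B-Base tokenizer before using this.
@dataclass(frozen=True)
class FimTokens:
    prefix: str = "<fim_prefix>"
    suffix: str = "<fim_suffix>"
    middle: str = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str, tokens: FimTokens = FimTokens()) -> str:
    """Lay out the code in prefix-suffix-middle order. The model is expected
    to continue generating after the final `middle` token."""
    return f"{tokens.prefix}{prefix}{tokens.suffix}{suffix}{tokens.middle}"


def fill_in(prefix: str, suffix: str, generated_middle: str) -> str:
    """Splice a generated middle back between the original prefix and suffix."""
    return prefix + generated_middle + suffix


if __name__ == "__main__":
    before = "def add(a, b):\n"
    after = "\n\nprint(add(2, 3))\n"
    print(build_fim_prompt(before, after))
    # Hypothetical completion; in practice this text comes from the model.
    middle = "    return a + b"
    print(fill_in(before, after, middle))
```

The token strings are kept in a small frozen dataclass so that a different FIM vocabulary can be swapped in without touching the prompt-building logic.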
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AGIEval | 49.3% | self-reported llm-stats |
| AIME 2024 | 81.2% | self-reported llm-stats |
| AlpacaEval 2.0 | 62.7% | self-reported llm-stats |
| ARC-C | 50.8% | self-reported llm-stats |
| Arena Hard | 57.6% | self-reported llm-stats |
| AttaQ | 88.5% | self-reported llm-stats |
| BIG-Bench Hard | 69.1% | self-reported llm-stats |
| DROP | 36.1% | self-reported llm-stats |
| GSM8k | 59.0% | self-reported llm-stats |
| HellaSwag | 80.1% | self-reported llm-stats |
| HumanEval | 89.7% | self-reported llm-stats |
| HumanEval+ | 86.1% | self-reported llm-stats |
| IFEval | 74.8% | self-reported llm-stats |
| MATH-500 | 69.0% | self-reported llm-stats |
| MMLU | 63.9% | self-reported llm-stats |
| NQ | 36.5% | self-reported llm-stats |
| PopQA | 26.2% | self-reported llm-stats |
| TriviaQA | 78.2% | self-reported llm-stats |
| TruthfulQA | 52.1% | self-reported llm-stats |
| Winogrande | 74.4% | self-reported llm-stats |