TLDR9+ (test)
A large-scale summarization dataset containing over 9 million training instances extracted from Reddit, designed for extreme summarization (generating one-sentence summaries with high compression and abstraction). It is more than twice as large as previously proposed datasets.
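To make "high compression and abstraction" concrete, here is a minimal Python sketch of two rough proxies: a token-level compression ratio and the fraction of summary words absent from the source. Both function names, the whitespace tokenization, and the example post are illustrative assumptions. They are not the dataset authors' definitions, and they are not the metric behind the benchmark score.

```python
def compression_ratio(source: str, summary: str) -> float:
    """Source length divided by summary length, in whitespace tokens.

    Hypothetical helper: whitespace splitting is an assumption, not the
    tokenization used by the dataset authors.
    """
    summary_tokens = summary.split()
    if not summary_tokens:
        raise ValueError("summary must contain at least one token")
    return len(source.split()) / len(summary_tokens)


def novel_unigram_fraction(source: str, summary: str) -> float:
    """Fraction of lowercased summary tokens that never occur in the source.

    A crude stand-in for abstraction: higher means the summary reuses
    fewer of the source's exact words.
    """
    source_vocab = {token.lower() for token in source.split()}
    summary_tokens = [token.lower() for token in summary.split()]
    if not summary_tokens:
        raise ValueError("summary must contain at least one token")
    novel = [token for token in summary_tokens if token not in source_vocab]
    return len(novel) / len(summary_tokens)


# Made-up example post and one-sentence summary (not taken from the dataset).
post = (
    "I spent three weeks planning a surprise party for my roommate "
    "and she was out of town that weekend"
)
tldr = "planned a party nobody attended"

print(compression_ratio(post, tldr))       # 19 source tokens / 5 summary tokens
print(novel_unigram_fraction(post, tldr))  # 3 of 5 summary tokens are new
```

A higher compression ratio means a shorter summary relative to its source; a higher novel-word fraction suggests more rewording than copying.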
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: language, summarization. Language: en. Verified by llm-stats: no.