LongBlocks Long Context Multilingual Question Answering Dataset
License: CC BY-SA 4.0
LongBlocks is a long-context multilingual synthetic dataset released in 2026 by the University of Lisbon, the Instituto de Telecomunicações, TransPerfect, and other institutions. It contains approximately 194,000 long-context question-answer examples built from long-document corpora such as books, web pages, Wikipedia, arXiv papers, source code, and community Q&A.
Data Fields:
- id: String, a unique instance identifier (only used to recover restricted book data; null for other sources).
- document: String, the content of the long source document (null for restricted book data).
- source: String, the name of the source corpus.
- language: String, the natural language or programming language of the example.
- question: String, the long-context question.
- answer: String, a reference answer that has been filtered for authenticity.
- response_Qwen3-Next-80B-A3B / response_Qwen3.5-27B / response_Nemotron-3-Nano-30B-A3B: Strings, the responses generated by the corresponding teacher models.
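As a rough illustration of how these fields fit together, the sketch below handles records as plain Python dictionaries. It is not an official loader. The sample records, the values used for source and language, the lowercase key "question", the prompt layout, and the helper names (has_document, to_prompt, teacher_responses) are all assumptions made for the example. The only details taken from the field list are the field names and the fact that document is null for restricted book data.

```python
from typing import Any, Dict, List

# Teacher-response columns as named in the field list above.
TEACHER_FIELDS = (
    "response_Qwen3-Next-80B-A3B",
    "response_Qwen3.5-27B",
    "response_Nemotron-3-Nano-30B-A3B",
)

Record = Dict[str, Any]


def has_document(record: Record) -> bool:
    """True when the source text is present (it is null for restricted book data)."""
    return record.get("document") is not None


def to_prompt(record: Record) -> str:
    """Join document and question into one prompt. The layout is hypothetical."""
    return f"{record['document']}\n\nQuestion: {record['question']}"


def teacher_responses(record: Record) -> Dict[str, str]:
    """Collect whichever teacher responses are present on the record."""
    return {
        name: record[name]
        for name in TEACHER_FIELDS
        if record.get(name) is not None
    }


# Hypothetical records for illustration only; none of these values come from the dataset.
SAMPLE: List[Record] = [
    {
        "id": None,
        "document": "def add(a, b):\n    return a + b\n",
        "source": "code",
        "language": "python",
        "question": "What does add return?",
        "answer": "The sum of a and b.",
        "response_Qwen3.5-27B": "It returns a + b.",
    },
    {
        "id": "book-0001",
        "document": None,  # restricted book: text must be recovered separately via id
        "source": "books",
        "language": "pt",
        "question": "Quem narra a história?",
        "answer": "O protagonista.",
    },
]

# Keep only the records whose document text is actually available.
usable = [r for r in SAMPLE if has_document(r)]
prompts = [to_prompt(r) for r in usable]
```

Filtering on document before building prompts keeps restricted book entries out of the way until their text has been recovered through id.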