WikiText Long Term Dependency Language Modeling Dataset
The WikiText long-term dependency language modeling dataset contains over 100 million English tokens extracted from Wikipedia's verified Good and Featured articles.
The dataset comes in two versions: WikiText-2 and WikiText-103. Compared with the Penn Treebank (PTB), its vocabulary is much larger, and each sample preserves the full original article, making it well suited to language modeling that depends on long-term context.
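Because full articles are preserved, a common first step is to split the token stream back into per-article units. The sketch below assumes the standard WikiText file conventions (a top-level " = Title = " heading opens each article, deeper " = = Section = = " headings stay inside it, and rare words appear as the `<unk>` token in the tokenized version); the helper name `split_articles` and the sample lines are illustrative, not part of the dataset's official tooling.

```python
import re

def split_articles(lines):
    """Group WikiText-style lines into (title, body) articles, assuming
    the top-level ' = Title = ' heading convention."""
    articles, title, buf = [], None, []
    # Exactly one '=' pair marks a new article; ' = = ... = = ' is a subsection.
    heading = re.compile(r"^ = ([^=].*?) = $")
    for line in lines:
        m = heading.match(line)
        if m:
            if title is not None:
                articles.append((title, "".join(buf)))
            title, buf = m.group(1), []
        elif title is not None:
            buf.append(line)
    if title is not None:
        articles.append((title, "".join(buf)))
    return articles

# Hypothetical sample in WikiText's tokenized format.
sample = [
    " = Valkyria Chronicles III = \n",
    " Senjo no Valkyria 3 is a tactical role @-@ playing game . \n",
    " = = Gameplay = = \n",
    " The player controls units on a map with <unk> mechanics . \n",
]
print(split_articles(sample)[0][0])  # -> Valkyria Chronicles III
```

Keeping articles whole, rather than shuffling sentences, is what lets models trained on WikiText exploit dependencies that span paragraphs.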
The dataset was released by Salesforce Research in 2016; its authors are Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. The accompanying paper is "Pointer Sentinel Mixture Models".