Enriching Word Vectors with Subword Information

Piotr Bojanowski; Edouard Grave; Armand Joulin; Tomas Mikolov

Abstract

Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
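
The core idea of the abstract, a word represented as a bag of character n-grams whose vectors are summed, can be illustrated with a minimal Python sketch. Several details here are assumptions made for illustration and are not stated in the abstract: the `<` and `>` boundary markers, the n-gram lengths 3 to 6, the dimension of 8, and the hypothetical `ngram_vector` helper, which returns a deterministic pseudo-random vector in place of an embedding learned by the skipgram objective.

```python
import random
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    The markers '<' and '>' and the default range 3..6 are assumptions
    made for this sketch.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def ngram_vector(gram, dim=8):
    """Hypothetical stand-in for a learned n-gram embedding.

    A real model would look up trained parameters; here each n-gram gets
    a deterministic pseudo-random vector seeded by a CRC32 of its text.
    """
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word, dim=8):
    """Represent a word as the sum of its character n-gram vectors."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        for k, x in enumerate(ngram_vector(gram, dim)):
            vec[k] += x
    return vec


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    print(word_vector("wherever"))
```

Because the vector is built from n-grams rather than looked up by whole word, `word_vector` returns a result for any string, including one never seen in training, which is the out-of-vocabulary property the abstract describes.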

Code Repositories

FengJiaChunFromSYSU/fastText
GitHubSprint/fasttext4j
labdac/charlacompling
amymariaparker2401/new
bung87/fastText
M155K4R4/fastText
pommedeterresautee/fastrtext
plasticityai/magnitude (pytorch)
ericxsun/fastText
luckyPT/jvm-ml (tf)
vinhkhuc/JFastText
mrzzy/np-dl-assign-2 (tf)
ulf1/augtxt
Omerktn/fastText-iterative
currentsapi/fastlangid
Nim-NLP/fastText
Kinetikm/fastTextRelearning
SarangShaikh201/fastText
divisionai/fastText
rmenegaux/fastDNA
lmd1993/fastTextBoost
kpu/fastertext
explosion/floret
mwydmuch/extremeText (tf)
indix/whatthelang
bamtercelboo/cw2vec
trietnm2/sent2vec4j
jen1995/fastText
luhuiguo/jfasttext
facebookresearch/fastText (Official)
wyfish/fastText
tshev/faster-FastText
DW-yejing/fasttext4j-jdk6
linkfluence/fastText4j
dbaumgarten/FToDTF (tf)
zhang2010hao/cw2vec-pytorch (pytorch)
hufscapstone/Fast_text

Benchmarks

Benchmark                  Methodology  Metrics
word-similarity-on-ws353   SkipGram     Spearman's Rho: 61.0
