HyperAI

Distributed Representations of Sentences and Documents

Quoc V. Le; Tomas Mikolov

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to text, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words, and they ignore the semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representation. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
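The two bag-of-words weaknesses, and the idea of a dense document vector trained to predict the document's words, can be illustrated with a small pure-Python sketch. The first part checks that a count vector cannot tell "dog bites man" from "man bites dog", and that one-hot vectors for "powerful", "strong" and "Paris" are all the same distance apart. The second part is a toy version of the distributed-bag-of-words variant of Paragraph Vector (PV-DBOW): each document gets a vector that, through a full softmax over the vocabulary, is trained by plain SGD to predict the words it contains. This is not the authors' code; the function names, toy corpus, vector size, learning rate and epoch count are illustrative assumptions, and the loop simply visits every word instead of sampling.

```python
import math
import random


def bow_vector(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary; word order is lost."""
    return [tokens.count(w) for w in vocab]


def one_hot(word, vocab):
    """One-hot vector: every word gets its own, unrelated dimension."""
    return [1.0 if w == word else 0.0 for w in vocab]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def word_probs(doc_vec, W, b):
    """Probability of each vocabulary word given one document vector."""
    logits = [sum(wk * dk for wk, dk in zip(W[j], doc_vec)) + b[j]
              for j in range(len(W))]
    return softmax(logits)


def train_pv_dbow(docs, dim=8, epochs=200, lr=0.1, seed=0):
    """Toy PV-DBOW: learn one dense vector per document by plain SGD on
    -log p(word | document) for every word occurrence in the document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    D = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    W = [[0.0] * dim for _ in vocab]  # output weights, one row per word
    b = [0.0] * len(vocab)            # output biases
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            for word in doc:
                p = word_probs(D[d], W, b)
                grad_d = [0.0] * dim
                for j in range(len(vocab)):
                    # d(-log p[target]) / d(logit_j) = p_j - 1[j == target]
                    g = p[j] - (1.0 if j == index[word] else 0.0)
                    for k in range(dim):
                        grad_d[k] += g * W[j][k]  # uses W before its update
                        W[j][k] -= lr * g * D[d][k]
                    b[j] -= lr * g
                for k in range(dim):
                    D[d][k] -= lr * grad_d[k]
    return vocab, D, W, b


if __name__ == "__main__":
    words = ["Paris", "powerful", "strong"]
    vec = {w: one_hot(w, words) for w in words}
    # Both distances are sqrt(2): one-hot vectors carry no notion of similarity.
    print(math.dist(vec["powerful"], vec["strong"]),
          math.dist(vec["powerful"], vec["Paris"]))

    order = ["bites", "dog", "man"]
    # True: the two sentences get identical bag-of-words vectors.
    print(bow_vector("dog bites man".split(), order)
          == bow_vector("man bites dog".split(), order))

    docs = [["cats", "purr", "meow"], ["dogs", "bark", "woof"]]
    vocab, D, W, b = train_pv_dbow(docs)
    for d, doc in enumerate(docs):
        p = word_probs(D[d], W, b)
        # Probability mass the document vector puts on its own words.
        print(d, round(sum(p[vocab.index(w)] for w in doc), 3))
```

The full softmax keeps the gradient easy to read but scans the whole vocabulary for every prediction, which is only acceptable for a toy vocabulary like this one; practical implementations use cheaper approximations.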

Code Repositories

GitHub repositories that mention this paper (framework shown where listed):

bombdiggity/paper-bag (TensorFlow)
jimmy6727/Informd (TensorFlow)
julian-risch/ICADL2018 (TensorFlow)
hithisisdhara/doc2vec (PyTorch)
inejc/paragraph-vectors (PyTorch)
kr900910/supreme_court_opinion (TensorFlow)
tsandefer/capstone_2 (TensorFlow)
tsandefer/dsi_capstone_2 (TensorFlow)
eske/multivec
Nalydy/doc2vec
ibrahimsharaf/doc2vec
g-k-l/dsi-arxiv-recommender
dhyeon/ingredient-vectors (PyTorch)
kramamur/sentiment-analysis
slme1109/lyrics-generator (TensorFlow)
rvstraalen/doc2vec-workshop
YinpeiDai/NAUM (TensorFlow)
kinimod23/NMT_Project

Benchmarks

| Benchmark | Method | MAP | MRR |
|---|---|---|---|
| Question answering on QASent | Paragraph vector | 0.5213 | 0.6023 |
| Question answering on QASent | Paragraph vector (lexical overlap + dist output) | 0.6762 | 0.7514 |
| Question answering on WikiQA | Paragraph vector | 0.5110 | 0.5160 |
| Question answering on WikiQA | Paragraph vector (lexical overlap + dist output) | 0.5976 | 0.6058 |
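MAP (mean average precision) and MRR (mean reciprocal rank) are standard ranking metrics for answer selection: each question comes with candidate sentences ranked by the model, some of them labelled correct. The sketch below shows how the two metrics are commonly computed from such ranked 0/1 labels. It is not the evaluation script behind the reported numbers; the helper names are hypothetical, and dropping questions that have no correct candidate is an assumed convention.

```python
def average_precision(labels):
    """AP for one question; labels are 1 (correct) / 0 in ranked order."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this correct answer's rank
    return score / hits


def reciprocal_rank(labels):
    """1 / rank of the first correct answer."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    raise ValueError("question has no correct answer")


def map_and_mrr(questions):
    """Average AP and RR over questions with at least one correct candidate.

    Skipping all-negative questions is an assumption, not a given protocol.
    """
    kept = [q for q in questions if any(q)]
    mean_ap = sum(average_precision(q) for q in kept) / len(kept)
    mrr = sum(reciprocal_rank(q) for q in kept) / len(kept)
    return mean_ap, mrr


if __name__ == "__main__":
    # Each list holds one question's candidates, sorted by model score.
    ranked = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
    print(map_and_mrr(ranked))  # MAP = 19/24 (about 0.792), MRR = 0.75
```

MAP rewards placing every correct answer high, while MRR only looks at the first correct answer, which is why the two can diverge for questions with several correct candidates.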
