3 months ago

Fewer features perform well at Native Language Identification task

Çağrı Çöltekin, Taraka Rama

Abstract

This paper describes our results at the NLI Shared Task 2017. We participated in the essay, speech, and fusion tracks, which use text, speech, and i-vectors to identify the native language of the writer or speaker of a given input. In the essay track, a linear SVM system using word bigrams and character 7-grams performed best. In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using i-vectors and text features from speech transcriptions. In the fusion track, we experimented with systems that combined i-vectors with higher-order n-gram features, a system that combined i-vectors with word unigrams, a mean probability ensemble, and a stacked ensemble system. Our finding is that word unigrams in combination with i-vectors achieve a higher score than systems trained with a larger number of n-gram features. Our best-performing systems achieved F1-scores of 87.16%, 83.33%, and 91.75% on the essay track, the speech track, and the fusion track, respectively.
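To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of word and character n-gram extraction and of a mean probability ensemble. It is not the authors' implementation: the function names, the whitespace tokenization, the "w:"/"c:" feature prefixes, and the example labels are assumptions made for illustration, and the SVM and LDA classifiers themselves are not reproduced. The abstract also does not say whether only n-grams of exactly the stated order or all orders up to it were used; the sketch extracts a single order.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return overlapping character n-grams of exactly length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return overlapping word n-grams of length n (whitespace tokenization)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, word_n=2, char_n=7):
    """Count word and character n-grams in one sparse feature dictionary.

    The "w:" and "c:" prefixes are an illustrative convention that keeps
    the two feature families from colliding.
    """
    feats = Counter()
    feats.update("w:" + g for g in word_ngrams(text, word_n))
    feats.update("c:" + g for g in char_ngrams(text, char_n))
    return feats


def mean_probability_ensemble(prob_dists):
    """Average per-class probabilities across systems and pick the argmax.

    prob_dists is a list of dicts mapping each label to a probability;
    every dict is assumed to contain the same labels.
    """
    labels = prob_dists[0].keys()
    mean = {lab: sum(d[lab] for d in prob_dists) / len(prob_dists)
            for lab in labels}
    return max(mean, key=mean.get), mean


if __name__ == "__main__":
    feats = ngram_features("the cat sat on the mat")
    print(len(feats), "distinct n-gram features")

    # Hypothetical outputs of two systems for one test item.
    text_system = {"TUR": 0.6, "GER": 0.4}
    ivector_system = {"TUR": 0.3, "GER": 0.7}
    label, mean = mean_probability_ensemble([text_system, ivector_system])
    print(label, mean)
```

In a full system, feature dictionaries like these would be vectorized and passed to a classifier, and the ensemble would average the probability outputs of the trained text-based and i-vector-based models.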

Benchmarks

Benchmark: native-language-identification-on-italki-nli
Methodology: Tubasfs
Metrics: Average F1: 0.5807
