


Morfessor-enriched features and multilingual training for canonical morphological segmentation

Mikko Kurimo, Mathias Creutz, Sami Virpioja, Stig-Arne Grönroos, Aku Rouhe


Abstract

In our submission to the SIGMORPHON 2022 Shared Task on Morpheme Segmentation, we study whether an unsupervised morphological segmentation method, Morfessor, can help in a supervised setting. Previous research has shown the effectiveness of this approach in semi-supervised settings with small amounts of labeled data. The data sizes in the current tasks differ: the amount of word-level annotated training data is much larger, but the amount of sentence-level annotated training data remains small. Our approach is to pre-segment the input data for a neural sequence-to-sequence model with the unsupervised method. Because the unsupervised method can be trained on raw text, we use Wikipedia to increase the amount of training data. In addition, we train multilingual models for the sentence-level task. The results for the Morfessor-enriched features are mixed: they benefit all three sentence-level tasks but only some of the word-level tasks. Multilingual training yields considerable improvements over the monolingual sentence-level models, but it negates the effect of the enriched features.
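The pre-segmentation step can be illustrated with a minimal sketch. Morfessor Baseline segments a word by Viterbi search under a unigram model over its morph lexicon, and the resulting surface segmentation can then be exposed to a character-level sequence-to-sequence encoder as extra per-character features. The toy lexicon counts, the unknown-character penalty, and the B/I tag encoding below are hypothetical choices for illustration; they are not the authors' configuration and not the `morfessor` package API.

```python
import math
from typing import Dict, List, Tuple


def viterbi_segment(word: str, morph_counts: Dict[str, int], max_len: int = 10) -> List[str]:
    """Split `word` into the lowest-cost morph sequence under a unigram morph model."""
    total = max(1, sum(morph_counts.values()))
    # Hypothetical penalty for a single character missing from the lexicon,
    # so that every word can still be segmented.
    unk_cost = math.log(total) + 10.0
    n = len(word)
    best = [0.0] + [math.inf] * n  # best[i]: lowest cost of segmenting word[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last morph in word[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            morph = word[start:end]
            if morph in morph_counts:
                cost = -math.log(morph_counts[morph] / total)
            elif end - start == 1:
                cost = unk_cost
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Follow back-pointers from the end of the word to recover the morphs.
    morphs: List[str] = []
    end = n
    while end > 0:
        morphs.append(word[back[end]:end])
        end = back[end]
    return morphs[::-1]


def enriched_features(morphs: List[str]) -> List[Tuple[str, str]]:
    """Pair each character with 'B' (a morph starts here) or 'I' (inside a morph)."""
    return [(ch, "B" if i == 0 else "I") for morph in morphs for i, ch in enumerate(morph)]


if __name__ == "__main__":
    counts = {"un": 50, "break": 20, "able": 40}  # hypothetical toy lexicon
    morphs = viterbi_segment("unbreakable", counts)
    print(morphs)  # ['un', 'break', 'able']
    print(enriched_features(morphs))
```

Morfessor produces a surface segmentation (substrings of the word), whereas the task asks for canonical segmentation, so in a setup like this the tags act only as hints to the neural model, which still has to generate the canonical morphemes itself.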

Benchmarks

Benchmark: Morpheme segmentation on UniMorph 4.0

Method | F1 macro avg (subtask 2) | Levenshtein distance (subtask 2, lower is better) | Macro avg (subtask 1)
AUUH_C | 70.76 | 35.94 | n/a
Bidirectional GRU + Morfessor features (AUUH_F) | 66.73 | 36.35 | 93.72
AUUH_E | 73.21 | 31.05 | n/a
AUUH_D | 72.75 | 36.38 | n/a
AUUH_B | 89.77 | 3.50 | n/a
AUUH_A | 89.00 | 4.08 | n/a
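The two subtask-2 metrics reported above can be approximated with a short sketch. This is not the shared task's official evaluation script: it assumes each prediction and reference is given as a list of morphemes, computes F1 over the multiset of morphemes for a single item, and computes character-level Levenshtein distance between the joined segmentation strings. The `" @@"` separator used when joining is an assumption for illustration.

```python
from collections import Counter
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def morpheme_f1(pred: List[str], gold: List[str]) -> float:
    """F1 over the multiset of predicted vs. gold morphemes for one item."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["happy", "ness"]  # canonical: the stem is restored to "happy"
    pred = ["happi", "ness"]  # surface-style prediction
    print(morpheme_f1(pred, gold))                          # 0.5
    print(levenshtein(" @@".join(pred), " @@".join(gold)))  # 1
```

Per-item scores like these would then be averaged into aggregate figures; how the official script aggregates them (for example, what the macro average runs over) is not specified on this page.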
