3 months ago

LATTE: Lattice ATTentive Encoding for Character-based Word Segmentation

Manabu Okumura, Kotaro Funakoshi, Hidetaka Kamigaito, Thodsaporn Chay-intr

Abstract

A character sequence admits one or more segmentation alternatives. This segmentation ambiguity can weaken word segmentation performance, and handling it properly reduces ambiguous decisions on word boundaries. Previous work has achieved remarkable segmentation performance and alleviated the ambiguity problem by incorporating a lattice, which can capture segmentation alternatives, along with graph-based and pre-trained models. However, the multi-granularity information in a lattice, including characters and words, may not be attentively exploited when the lattice is encoded with such models. To strengthen multi-granularity representations in a lattice, we propose the Lattice ATTentive Encoding (LATTE) method for character-based word segmentation. Our model employs the lattice structure to handle segmentation alternatives and uses graph neural networks together with an attention mechanism to attentively extract multi-granularity representations from the lattice, which complement the character representations. Our experimental results demonstrate improvements in segmentation performance on the BCCWJ, CTB6, and BEST2010 datasets in three languages: Japanese, Chinese, and Thai.
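To make the notion of segmentation alternatives concrete, the following is a minimal sketch of a word lattice built over a character sequence, followed by an enumeration of every segmentation the lattice admits. It is not the authors' implementation and includes neither the graph neural network nor the attention component. The toy lexicon, the `max_len` limit, and the function names `build_lattice` and `enumerate_segmentations` are assumptions made purely for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (start offset, end offset, surface form)


def build_lattice(chars: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Collect lattice edges: every single character, plus every
    substring of up to max_len characters found in the lexicon."""
    edges: List[Edge] = []
    n = len(chars)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            piece = chars[start:end]
            # Single characters are always kept so a full path always exists.
            if end - start == 1 or piece in lexicon:
                edges.append((start, end, piece))
    return edges


def enumerate_segmentations(chars: str, edges: List[Edge]) -> List[List[str]]:
    """Depth-first walk over the lattice from offset 0 to len(chars)."""
    by_start: Dict[int, List[Tuple[int, str]]] = {}
    for start, end, word in edges:
        by_start.setdefault(start, []).append((end, word))

    results: List[List[str]] = []

    def walk(pos: int, path: List[str]) -> None:
        if pos == len(chars):
            results.append(list(path))
            return
        for end, word in by_start.get(pos, []):
            path.append(word)
            walk(end, path)
            path.pop()

    walk(0, [])
    return results


# Hypothetical toy lexicon; a real system would use a large dictionary.
toy_lexicon = {"研究", "研究生", "生命"}
sentence = "研究生命"
lattice = build_lattice(sentence, toy_lexicon)
segmentations = enumerate_segmentations(sentence, lattice)
for seg in segmentations:
    print(" / ".join(seg))
```

With this toy lexicon the four-character string yields five candidate segmentations, including both 研究 / 生命 and 研究生 / 命, which is the kind of ambiguity a lattice-based encoder has to resolve.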

Benchmarks

Benchmark | Methodology | Metrics
chinese-word-segmentation-on-ctb6 | LATTE (Linguistic units, lattices, PTMs, GNNs) | F1: 98.07
japanese-word-segmentation-on-bccwj | LATTE (Linguistic units, lattices, PTMs, GNNs) | F1-score (Word): 0.9936
thai-word-tokenization-on-best-2010 | LATTE (Linguistic units, lattices, PTMs, GNNs) | F1-Score: 0.9907
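The listing does not say how these F1 values were computed, and the scores appear on different scales (a percentage for CTB6, a fraction for BCCWJ and BEST2010). As a rough illustration only, the sketch below computes the commonly used word-level F1, in which a predicted word counts as correct when its character span exactly matches a gold span. The function names `to_spans` and `word_f1` are hypothetical, and the evaluation scripts actually used for each benchmark may differ.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word sequence into a set of (start, end) character spans."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def word_f1(gold: List[str], pred: List[str]) -> float:
    """Word-level F1: harmonic mean of span precision and span recall."""
    gold_spans = to_spans(gold)
    pred_spans = to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["研究", "生命"]
print(word_f1(gold, ["研究", "生命"]))      # exact match
print(word_f1(gold, ["研究", "生", "命"]))  # one of three predicted words is correct
print(word_f1(gold, ["研究生", "命"]))      # no span matches
```

Under this definition a single misplaced boundary usually invalidates two words at once, which is why word-level F1 is a stricter measure than per-character boundary accuracy.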
