Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble
Sarana Nutanong, Ekapol Chuangsuwanich, Raheem Sarwar, Wannaphong Phatthiyaphaibun, Peerat Limkonchotiwat

Abstract
Like many natural language processing tasks, Thai word segmentation is domain-dependent. Researchers have relied on transfer learning to adapt an existing model to a new domain. However, this approach is inapplicable when we can interact only with the input and output layers of a model, a setting known as a "black box". We propose a filter-and-refine solution based on the stacked-ensemble learning paradigm to address this black-box limitation. We conducted extensive experiments comparing our method against state-of-the-art models and transfer learning. The results show that the proposed solution is an effective domain adaptation method and performs comparably to transfer learning.
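The filter-and-refine idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the black-box segmenter exposes a per-character probability that a word starts at that position, and that a separate refiner, trained on target-domain data, takes the black-box output as an input feature (the stacking step). The names `filter_and_refine`, `to_words`, `black_box`, `refiner`, and the thresholds `low` and `high` are all hypothetical.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
# a black box maps text to one word-start probability per character,
# and a refiner re-scores a single position given the black-box probability.
BlackBox = Callable[[str], List[float]]
Refiner = Callable[[str, int, float], float]


def filter_and_refine(
    text: str,
    black_box: BlackBox,
    refiner: Refiner,
    low: float = 0.3,
    high: float = 0.7,
) -> List[bool]:
    """Return one word-start flag per character of `text`."""
    probs = black_box(text)
    starts = []
    for i, p in enumerate(probs):
        # Filter: only positions where the black box is uncertain are re-scored.
        if low < p < high:
            # Refine: the stacked model sees the black-box output as a feature.
            p = refiner(text, i, p)
        starts.append(p >= 0.5)
    if starts:
        starts[0] = True  # the first character always begins a word
    return starts


def to_words(text: str, starts: List[bool]) -> List[str]:
    """Group characters into words using the word-start flags."""
    words: List[str] = []
    for ch, is_start in zip(text, starts):
        if is_start or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Toy stand-ins for a real segmenter and a trained refiner.
def toy_black_box(text: str) -> List[float]:
    return [1.0, 0.5, 0.9, 0.1][: len(text)]


def toy_refiner(text: str, index: int, prob: float) -> float:
    return 0.0  # pretend the target-domain model rejects the boundary


flags = filter_and_refine("abcd", toy_black_box, toy_refiner)
words = to_words("abcd", flags)
```

In this toy run only the second character falls inside the uncertain band, so it is the only position passed to the refiner; the confident predictions of the black box are kept unchanged, and the black box itself is never retrained.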
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| thai-word-segmentation-on-ws160 | Stacked Ensemble (CRF) | F1-score: 0.952 |
| thai-word-tokenization-on-best-2010 | Stacked Ensemble (CRF) | F1-score: 0.9812 |