Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation
DOI: https://doi.org/10.14313/JAMRIS/3-2024/20
Keywords: machine translation, intelligent virtual assistants, natural language understanding
Abstract
In this study, we explore the implications of dataset limitations in semantic knowledge-driven machine translation (MT) for intelligent virtual assistants (IVA). Our approach diverges from traditional single-best translation techniques: it uses a multi-variant MT method that generates multiple valid translations per input sentence through a constrained beam search. This method extends beyond the typical constraints of specific verb ontologies and is embedded within a broader semantic knowledge framework.
We evaluate the performance of multi-variant MT models in translating training sets for Natural Language Understanding (NLU) models. These models are applied to semantically diverse datasets, including a detailed evaluation using the standard MultiATIS++ dataset. The results of this evaluation indicate that while the multi-variant MT method is promising, its impact on intent classification (IC) accuracy is limited when applied to conventional datasets such as MultiATIS++. However, our findings underscore that the effectiveness of multi-variant translation is closely associated with the diversity and suitability of the datasets used.
Finally, we provide an in-depth analysis focused on generating variant-aware NLU datasets. This analysis aims to offer guidance on enhancing NLU models through semantically rich and variant-sensitive datasets, maximizing the advantages of multi-variant MT.
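As a rough illustration of how a beam search can return several constraint-satisfying variants instead of a single best output, the sketch below runs an n-best beam search over a toy next-token table. Everything in it is an assumption made for illustration: the table TOY_MODEL, the function n_best_constrained, and the simple rule of discarding finished hypotheses that lack a required token are hypothetical and are not the method or the models evaluated in the article.

```python
import heapq
import math

# Hypothetical toy "model": next-token log-probabilities keyed by the
# previous token. A real MT decoder would condition on the source sentence.
TOY_MODEL = {
    "<s>": {"turn": math.log(0.6), "switch": math.log(0.4)},
    "turn": {"on": math.log(0.7), "off": math.log(0.3)},
    "switch": {"on": math.log(0.5), "off": math.log(0.5)},
    "on": {"</s>": 0.0},
    "off": {"</s>": 0.0},
}


def n_best_constrained(required, beam_size=4, n_best=2, max_len=5):
    """Return up to n_best finished hypotheses that contain `required`."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for tok, logp in TOY_MODEL.get(tokens[-1], {}).items():
                hyp = (score + logp, tokens + [tok])
                if tok == "</s>":
                    # Simplest possible constraint: keep a finished
                    # hypothesis only if the required token appears in it.
                    if required in hyp[1]:
                        finished.append(hyp)
                else:
                    candidates.append(hyp)
        # Keep the beam_size highest-scoring unfinished hypotheses.
        beams = heapq.nlargest(beam_size, candidates, key=lambda h: h[0])
        if not beams:
            break
    best = heapq.nlargest(n_best, finished, key=lambda h: h[0])
    # Strip the <s> and </s> markers before returning the variants.
    return [" ".join(tokens[1:-1]) for _, tokens in best]


variants = n_best_constrained("on")
print(variants)
```

Each returned variant could then be paired with the intent label of the source utterance to enlarge a translated NLU training set. Practical constrained decoders usually track constraint progress during the search rather than filtering at the end, so this should be read only as a minimal sketch of the idea.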
License
Copyright (c) 2024 Journal of Automation, Mobile Robotics and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors retain copyright. Authors grant the journal a non-exclusive right to publish the article. Articles are published under the CC BY-NC-ND 4.0 licence.


