Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation

Authors

M. Sowański, J. Hosciłowicz, A. Janicki

Keywords: machine translation, intelligent virtual assistants, natural language understanding

Abstract

In this study, we explore the implications of dataset limitations in semantic knowledge-driven machine translation (MT) for intelligent virtual assistants (IVAs). Our approach diverges from traditional single-best translation techniques: it uses a multi-variant MT method that generates multiple valid translations per input sentence through a constrained beam search. The method is not limited to specific verb ontologies and is instead embedded within a broader semantic knowledge framework.
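
As a rough illustration of how a beam search can return several valid outputs instead of a single best one, the following minimal sketch runs a beam search over a toy next-token table and keeps only finished hypotheses that pass a validity check. The names `beam_search_variants`, `TOY_MODEL`, and `contains_on` are hypothetical, the probabilities are made up for illustration, and checking the constraint only on finished hypotheses is a simplification; it is not the constraint mechanism used in the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (log-probability, tokens without EOS)


def beam_search_variants(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    beam_size: int,
    num_variants: int,
    max_len: int,
    is_valid: Callable[[List[str]], bool],
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return up to num_variants finished hypotheses that satisfy is_valid."""
    beams: List[Hypothesis] = [(0.0, [])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            for tok, prob in next_token_probs(tokens).items():
                if prob <= 0.0:
                    continue
                new_score = score + math.log(prob)
                if tok == eos:
                    # Simplified constraint: filter complete hypotheses only.
                    if is_valid(tokens):
                        finished.append((new_score, tokens))
                else:
                    candidates.append((new_score, tokens + [tok]))
        if not candidates:
            break
        # Keep the beam_size most probable unfinished hypotheses.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda h: h[0], reverse=True)
    return finished[:num_variants]


# Hypothetical toy "model": prefix -> next-token distribution (made-up values).
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"turn": 0.6, "switch": 0.4},
    ("turn",): {"on": 0.7, "off": 0.3},
    ("switch",): {"on": 0.8, "off": 0.2},
}


def toy_next_token_probs(prefix: List[str]) -> Dict[str, float]:
    # Any prefix not in the table ends the sentence.
    return TOY_MODEL.get(tuple(prefix), {"</s>": 1.0})


def contains_on(tokens: List[str]) -> bool:
    # Hypothetical semantic constraint: the output must express "on".
    return "on" in tokens


variants = beam_search_variants(
    toy_next_token_probs, beam_size=4, num_variants=2, max_len=3,
    is_valid=contains_on,
)
for log_prob, tokens in variants:
    print(" ".join(tokens), round(math.exp(log_prob), 2))
```

Setting `num_variants=1` reduces the sketch to single-best decoding, which makes the contrast with the multi-variant setting easy to see.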

We evaluate the performance of multi-variant MT models in translating training sets for Natural Language Understanding (NLU) models. These models are applied to semantically diverse datasets, including a detailed evaluation on the standard MultiATIS++ dataset. The results of this evaluation indicate that, while the multi-variant MT method is promising, its impact on intent classification (IC) accuracy is limited when applied to conventional datasets such as MultiATIS++. However, our findings underscore that the effectiveness of multi-variant translation is closely associated with the diversity and suitability of the datasets used.
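
To make the idea of translating an NLU training set with several variants per utterance more concrete, here is a minimal sketch in which every translated variant inherits the intent label of its source utterance. The function `expand_with_variants`, the lookup table `TOY_TRANSLATIONS`, and the example utterances are hypothetical stand-ins for a real multi-variant MT system and dataset; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, intent label)


def expand_with_variants(
    dataset: List[Example],
    translate_variants: Callable[[str], List[str]],
    max_variants: int,
) -> List[Example]:
    """Translate each utterance into up to max_variants labeled examples."""
    expanded: List[Example] = []
    for utterance, intent in dataset:
        seen = set()
        for variant in translate_variants(utterance)[:max_variants]:
            # Skip duplicate variants of the same source utterance.
            if variant not in seen:
                seen.add(variant)
                expanded.append((variant, intent))
    return expanded


# Hypothetical English-to-Spanish variants standing in for an MT system.
TOY_TRANSLATIONS: Dict[str, List[str]] = {
    "show me the flights to boston": [
        "muéstrame los vuelos a boston",
        "enséñame los vuelos a boston",
    ],
    "what is the fare": ["cuál es la tarifa"],
}


def toy_translate_variants(utterance: str) -> List[str]:
    return TOY_TRANSLATIONS.get(utterance, [])


source_data: List[Example] = [
    ("show me the flights to boston", "atis_flight"),
    ("what is the fare", "atis_airfare"),
]

single_best = expand_with_variants(source_data, toy_translate_variants, 1)
multi_variant = expand_with_variants(source_data, toy_translate_variants, 2)
print(len(single_best), len(multi_variant))
```

In this toy setting the multi-variant set is larger only for utterances that actually have more than one valid translation, which mirrors the point that the benefit depends on the diversity of the dataset.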

Finally, we provide an in-depth analysis focused on generating variant-aware NLU datasets. This analysis aims to offer guidance on enhancing NLU models through semantically rich and variant-sensitive datasets, maximizing the advantages of multi-variant MT.

Published: 03.09.2024
Issue: Vol. 18, No. 3 (2024)
Section: Articles

How to Cite

Sowański, M., Hosciłowicz, J., & Janicki, A. (2024). Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation. Journal of Automation, Mobile Robotics and Intelligent Systems, 18(3), 39-48. https://doi.org/10.14313/JAMRIS/3-2024/20