Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation

Marcin Sowański; Jakub Hosciłowicz; Artur Janicki

doi:10.14313/JAMRIS/3-2024/20

Authors

Marcin Sowański TCL Research Europe, Poland
https://orcid.org/0000-0002-9360-1395
Jakub Hosciłowicz Samsung R&D Institute, Poland
Artur Janicki Warsaw University of Technology, Poland
https://orcid.org/0000-0002-9937-4402

DOI:

https://doi.org/10.14313/JAMRIS/3-2024/20

Keywords:

machine translation, intelligent virtual assistants, natural language understanding

Abstract

In this study, we explore the implications of dataset limitations in semantic knowledge-driven machine translation (MT) for intelligent virtual assistants (IVA). Our approach departs from traditional single-best translation techniques. It uses a multi-variant MT method that generates multiple valid translations per input sentence through a constrained beam search. This method extends beyond the typical constraints of specific verb ontologies by situating translation within a broader semantic knowledge framework.
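The abstract names constrained beam search as the mechanism that yields several valid translations per input sentence. The block below is a minimal, generic sketch of that idea under stated assumptions. It is not the authors' implementation.

Several names and details are hypothetical and exist only for illustration:

- `next_log_probs` stands in for an MT decoder's next-token scores.
- `TOY_TABLE` is a hand-written target-side distribution.
- The constraint is modelled as a set of required target tokens (a lexical constraint).

```python
import math
from typing import Callable, Dict, List, Set, Tuple

EOS = "</s>"
# Hypothetical decoder interface: maps a target prefix to next-token log-probabilities.
NextLogProbs = Callable[[Tuple[str, ...]], Dict[str, float]]

# Hypothetical toy distribution standing in for a real MT model's output.
TOY_TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"turn": math.log(0.6), "switch": math.log(0.4)},
    ("turn",): {"on": math.log(0.7), "the": math.log(0.3)},
    ("switch",): {"on": 0.0},
    ("turn", "on"): {"the": 0.0},
    ("switch", "on"): {"the": 0.0},
    ("turn", "the"): {"light": 0.0},
    ("turn", "on", "the"): {"light": math.log(0.5), "lamp": math.log(0.5)},
    ("switch", "on", "the"): {"light": math.log(0.5), "lamp": math.log(0.5)},
    ("turn", "the", "light"): {"on": 0.0},
}


def toy_next_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not listed in the table ends the sentence with probability 1.
    return TOY_TABLE.get(prefix, {EOS: 0.0})


def constrained_beam_search(
    next_log_probs: NextLogProbs,
    required: Set[str],
    beam_size: int = 4,
    max_len: int = 10,
    n_variants: int = 3,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return up to n_variants finished hypotheses containing every required token."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beams
            for tok, lp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == EOS:
                # Constraint check: keep a finished variant only if it satisfies all constraints.
                if required <= set(seq):
                    finished.append((seq, score))
            elif len(beams) < beam_size:
                # Keep only the best beam_size unfinished hypotheses.
                beams.append((seq, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:n_variants]


if __name__ == "__main__":
    for seq, score in constrained_beam_search(toy_next_log_probs, {"light"}, beam_size=5):
        print(" ".join(t for t in seq if t != EOS), round(math.exp(score), 2))
```

With `beam_size=5`, this toy setup returns three variants:

- "turn on the light"
- "switch on the light"
- "turn the light on"

The "lamp" hypotheses are discarded because they violate the constraint. With the default `beam_size=4`, the lower-scoring "turn the light on" is pruned before it finishes, so only two variants remain. This illustrates how beam width bounds the number of variants a constrained search can return.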

We evaluate how well multi-variant MT models perform when used to translate training sets for natural language understanding (NLU) models. We apply them to semantically diverse datasets, including a detailed evaluation on the standard MultiATIS++ dataset. The results indicate that the multi-variant MT method is promising. However, its impact on intent classification (IC) accuracy is limited when applied to conventional datasets such as MultiATIS++. Our findings also underscore that the effectiveness of multi-variant translation is closely tied to the diversity and suitability of the datasets used.

Finally, we provide an in-depth analysis focused on generating variant-aware NLU datasets. This analysis aims to offer guidance on enhancing NLU models through semantically rich, variant-sensitive datasets that maximize the advantages of multi-variant MT.
