The Architecture of the Intellectual System for Determining the Degree of Uniqueness of Armenian Text in a Multilingual Environment

Petrosyan Gevorg

Summary

Key words: natural language processing, interlingual borrowing, plagiarism, text originality, language model, sentence embedding, transformer

The problem of determining the uniqueness of texts in a multilingual environment has become more pressing as translation and paraphrasing tools have become readily available. Traditional methods for detecting monolingual borrowings rely on word matches and cannot capture semantic correspondences between typologically different languages such as Armenian, English, and Russian. Without identifying multilingual borrowings, the degree of text uniqueness cannot be determined objectively and accurately. This work presents the architecture of an intellectual system for determining the degree of uniqueness of Armenian texts, with a focus on detecting multilingual borrowings in the Armenian–English and Armenian–Russian language pairs. The proposed architecture organizes the search for borrowings into two levels. At the first level, possible sources are retrieved based on the most informative parts of speech, which keeps the selection of candidate texts fast and sufficiently accurate. At the second level, semantic analysis is performed using a multilingual transformer-based model that maps sentences from different languages into a common vector space. Structural analysis is also performed at this level, using a method based on Markov chains.
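The two-level flow described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation. Every function name, the Jaccard overlap used for candidate selection, the 0.2 threshold, and the cosine comparison of transition probabilities are assumptions made for illustration only. The sketch also assumes that informative terms have already been mapped to a shared key space across languages, and that sentence vectors come from some multilingual encoder that is not shown here.

```python
from collections import Counter
from math import sqrt


def jaccard(a, b):
    """Overlap of two term collections as |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def select_candidates(query_terms, sources, threshold=0.2):
    """Level 1 (assumed form): keep sources whose informative terms
    overlap with the query's terms at or above the threshold."""
    return [name for name, terms in sources.items()
            if jaccard(query_terms, terms) >= threshold]


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transition_probs(tags):
    """First-order Markov chain over a tag sequence: P(next | current)."""
    pair_counts = Counter(zip(tags, tags[1:]))
    from_counts = Counter(tags[:-1])
    return {pair: count / from_counts[pair[0]]
            for pair, count in pair_counts.items()}


def structural_similarity(tags_a, tags_b):
    """Level 2, structural part (assumed form): compare the transition
    probabilities of two tag sequences with cosine similarity."""
    pa, pb = transition_probs(tags_a), transition_probs(tags_b)
    keys = sorted(set(pa) | set(pb))
    return cosine([pa.get(k, 0.0) for k in keys],
                  [pb.get(k, 0.0) for k in keys])


# Hypothetical toy data: term lists stand in for informative parts of speech.
sources = {
    "doc_a": ["model", "language", "text"],
    "doc_b": ["river", "bridge"],
}
candidates = select_candidates(["language", "model", "plagiarism"], sources)

# Level 2, semantic part: the vectors below are placeholders for sentence
# embeddings produced by a multilingual encoder (not included in this sketch).
semantic_score = cosine([0.2, 0.7, 0.1], [0.25, 0.65, 0.05])
structure_score = structural_similarity(
    ["NOUN", "VERB", "NOUN"], ["NOUN", "VERB", "ADJ", "NOUN"]
)
```

Passing only the level-1 candidates to the more expensive sentence-level comparison is what keeps the second level tractable. How the semantic and structural scores are combined into a single degree of uniqueness is not stated in the summary, so the sketch leaves them separate.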

DOI: https://doi.org/10.58726/27382923-2025.2-66