ANALYSIS OF THE HOMOGENEITY OF TWO DIGITAL TEXT PORTRAITS

Authors

 

Kosimova N.O., Ph.D. student, Department of Information Systems and Technologies, Technological University of Tajikistan, Dushanbe, Republic of Tajikistan, nilufar_k@inbox.ru

Mirzoev S.Kh., Doctor of Technical Sciences, Associate Professor, Department of Informatics, Tajik National University, Dushanbe, Republic of Tajikistan, saidalo.mirzoev.1967@mail.ru

 

Abstract

 

  The article presents a method for analyzing the homogeneity of digital portraits of texts and proposes a technique for identifying and comparing similar texts. The technique can also be applied to texts by other authors and in other languages. For the homogeneity analysis, two works by each of five Russian writers were randomly selected as a model collection, giving 10 texts in total. The selected works were preprocessed before the calculations. For each work, two digital portraits of the text were formed, based on the frequency distributions of letter unigrams and letter bigrams, respectively. The pairwise distances between the digital portraits of the works were then calculated by a distance formula, separately for unigrams and for bigrams. The results were collected in two tables, each containing the 45 pairwise distances between the 10 digital portraits. The homogeneity of the authors' works was analyzed on the basis of these tables: the calculated distances were used to test the hypothesis that two works by the same author are "homogeneous" and two works by different authors are "heterogeneous". The proposed methodology may be useful for identifying the authors of works and for comparing similar works.
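  The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the preprocessing rules or the distance formula, so the sketch assumes that preprocessing means lowercasing and keeping only Russian letters, that bigrams are counted within words, and that the distance is the sum of absolute differences between relative frequencies. The names ALPHABET, normalize, portrait, distance and pairwise_distances are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: the portraits are built over the Russian alphabet.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def normalize(text):
    """Assumed preprocessing: lowercase, drop every non-letter, split into words."""
    words = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch in ALPHABET)
        if word:
            words.append(word)
    return words


def portrait(text, n):
    """Relative frequencies of letter n-grams (n=1 unigrams, n=2 bigrams).

    Assumption: n-grams are counted inside words and do not span word gaps.
    """
    counts = Counter(
        word[i:i + n]
        for word in normalize(text)
        for i in range(len(word) - n + 1)
    )
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def distance(p, q):
    """Assumed distance: sum of absolute frequency differences (0 to 2)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def pairwise_distances(texts, n):
    """Distance for every unordered pair of texts; 10 texts give 45 pairs."""
    portraits = {name: portrait(body, n) for name, body in texts.items()}
    return {
        (a, b): distance(portraits[a], portraits[b])
        for a, b in combinations(sorted(portraits), 2)
    }
```

  Calling pairwise_distances(texts, 1) and pairwise_distances(texts, 2) on a dictionary of 10 named texts would yield the two sets of 45 distances, one for unigrams and one for bigrams. Under the hypothesis being tested, pairs of works by the same author would be expected to show smaller distances than pairs by different authors.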

 

Keywords

 

text, digital portrait, distances, comparisons, application prospects

 

References

 

  1. Usmanov Z.D. Algorithm for tuning a clusterer of discrete random variables // DAN RT. – V. 60, No. 9. – P. 392-397.
  2. Usmanov Z.D. Classifier of discrete random variables // DAN RT. – 2017. – V. 60, No. 7-8. – P. 291-300.
  3. Usmanov Z.D. Evaluation of the effectiveness of the use of a classifier for the attribution of printed text // DAN RT. – 2020. – V. 63, No. 3-4. – P. 172-179.
  4. Rudman J. The state of authorship attribution studies: Some problems and solutions // Computers and the Humanities. – 1998. – Vol. 31. – P. 351-365.

 

Publication date

2023-10-28