ON TEXT AUTHORSHIP RECOGNITION BASED ON THE FREQUENCY OF WORD UNIGRAMS

Authors

           Ashurova Sh.N., senior teacher, Chair of Programming and Information Systems, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, sh.nurulloevna@gmail.com.

Annotation

           The article addresses the problem of recognizing the authors of works of Tajik literature, treating classical poetry, modern poetry, and modern prose separately. The model collection of texts under consideration consists of 30 works by 15 authors. Each work is associated with a digital portrait, defined as the frequency distribution of the word unigrams it contains. Z.D. Usmanov’s classifier, which identifies the authors of textual information based on the frequency of word unigrams, is used as the tool for solving the problem. The effectiveness of the classifier in this setting is established. It is concluded that identifying the author of a text by its digital portrait, i.e. the frequency distribution of word unigrams, is more successful for poetic works than for prose. The method can serve as an alternative to other methods of authorship recognition for Tajik texts, provided that the text under consideration belongs to an author whose works are already in the database.
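           The idea of a digital portrait built from word unigram frequencies can be illustrated with a minimal sketch. It is not Z.D. Usmanov’s classifier, whose decision rule is not given in this annotation. As a stand-in it assumes a plain nearest-portrait rule with an L1 distance between frequency distributions. The function names (unigram_portrait, portrait_distance, attribute_author) and the tokenization by the regular expression \w+ are likewise assumptions made for illustration.

```python
import re
from collections import Counter


def unigram_portrait(text):
    """Return the relative frequency of each word (unigram) in the text."""
    # Assumed tokenization: lowercase, words are runs of letters/digits.
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def portrait_distance(p, q):
    """L1 distance between two portraits (an assumed stand-in metric)."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def attribute_author(text, known_works):
    """Return the author of the known work whose portrait is closest.

    known_works is a list of (author, work_text) pairs, i.e. the database
    of works with established authorship.
    """
    target = unigram_portrait(text)
    best_author, best_distance = None, float("inf")
    for author, work in known_works:
        distance = portrait_distance(target, unigram_portrait(work))
        if distance < best_distance:
            best_author, best_distance = author, distance
    return best_author
```

           In this sketch a text can only be attributed to an author who is already represented in known_works, which mirrors the condition stated at the end of the annotation.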

Key words

    Tajik language, text, poetry, prose, frequency, classifier, identification.

Language 

English

Year

2020

Type

technical

Page

15

References

  1. Ashurova Sh.N. Efficiency evaluation of using words bigram for a text identification – Materials of the international scientific and practical conference of TUT “The role of ICT in the innovative development of the economy of the Republic of Tajikistan” – Dushanbe: Bahmanrud, 2017, pp. 292-297.
  2. Ashurova Sh.N. Efficiency evaluation of using words trigram for a text identification – Bulletin of the Technological University of Tajikistan. 2017. No. 4 (31). pp. 51-58.
  3. Ashurova Sh.N., Kosimov A.A. Efficiency evaluation of using words unigram for a text identification – Proceedings of the Academy of Sciences of the Republic of Tajikistan. Department of physico-mathematical, chemical, geological and technical sciences. 2017. No. 2 (167). pp. 49-54.
  4. Karimov A.A. On the digital portrait of textual information – Polytechnic Bulletin, 2019, 1 (45), Series: Intelligence, Innovation, Investment, pp. 7-10.
  5. Kayumov M.M. On the digital portrait of textual information based on the frequency of punctuation marks – Polytechnic Bulletin, 2019, 1 (45), Series: Intelligence, Innovation, Investment, pp. 20-23.
  6. Kosimov A.A., Bakhteev K.S. On recognition of the author of a text fragment // News of the Academy of Sciences of the Republic of Tajikistan. Department of Physical, Mathematical, Chemical, Geological and Technical Sciences, 2019.
  7. Kosimov A.A., Bakhteev K.S. The use of a specific digital portrait to identify authors of works // News of the Academy of Sciences of the Republic of Tajikistan. Department of Physical, Mathematical, Chemical, Geological and Technical Sciences, 2019.
  8. Usmanov Z.D. About one digital portrait of the text and its application – Polytechnic Bulletin, 2019, 3 (47). Series: intelligence, innovation, investment.
  9. Usmanov Z.D. Classifier of discrete random variables // Reports of the Academy of Sciences of the Republic of Tajikistan, 2017, vol. 60, No. 7-8, pp. 291-300.
  10. Usmanov Z.D., Kosimov A.A. Digital Image of “Shahnameh” (“Book of Kings”) by A. Firdausi – Reports of the Academy of Sciences of the Republic of Tajikistan, 2014, vol. 57, No. 6, pp. 471-476.
  11. Usmanov Z.D., Kosimov A.A. On the applicability of the γ-classifier to the recognition of authorship and themes of works of art // Materials of the twenty-second scientific-practical seminar “New information technologies in automated systems”, Moscow, 2019, pp. 174-178.
  12. Usmanov Z.D., Kosimov A.A. On the issue of automatic recognition of authorship and styles of works of Tajik-Persian fiction // Reports of the Academy of Sciences of the Republic of Tajikistan, 2019.
  13. Usmanov Z.D., Kosimov A.A. On the recognition of authorship of the Tajik text – Reports of the Academy of Sciences of the Republic of Tajikistan, 2016, vol. 59, No. 3-4, pp. 114-119.
  14. Usmanov Z.D., Kosimov A.A. The frequency of bigrams in Tajik literature – Reports of the Academy of Sciences of the Republic of Tajikistan, 2016, vol. 59, No. 1-2, pp. 28-32.
  15. Usmanov Z.D., Kosimov A.A. The frequency of letters of Tajik literature – Reports of the Academy of Sciences of the Republic of Tajikistan, 2015, vol. 58, No. 2, pp. 112-115.
  16. Usmanov Z.D., Soliev O.M. The problem of the layout of characters on a computer keyboard. – Dushanbe: Irfon, 2010, 104 p.