MODERN TEXT CLASSIFICATION METHODS BASED ON MACHINE LEARNING ALGORITHMS

Authors

         Nizamitdinov A.I. – Doctor of Philosophy (PhD) in Statistics, lecturer at the Department of Digital Economy, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, ahlidin@gmail.com

         Jaborova Sh.A. – fourth-year student in the specialty of technology and information systems in economics, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, jaborova232000@gmail.com

Annotation

        The article deals with the application of machine learning to text analysis tasks. Particular attention is paid to approaches that can be used effectively to extract information from natural language text. The stages and levels of text analysis are considered, along with the possibility of applying machine learning methods at each of them. The text classification problems are solved with machine learning algorithms such as logistic regression, K-nearest neighbors, decision trees, Random Forest, boosting algorithms (CatBoost, XGBoost), linear discriminants and neural networks. It is concluded that these algorithms achieve fairly high scores on text classification problems compared with other approaches to classification. To evaluate the efficiency of the algorithms, a confusion matrix was used, which shows the accuracy of model predictions and the degree of error in classification problems. On Russian-language data from the news site lenta.ru, the machine learning procedure demonstrated an efficiency of about 60-86% in the analysis of parts of speech in sentences on various topics.
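        The confusion-matrix evaluation mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the three topic labels and the toy predictions are hypothetical and were invented only to show how the matrix and the accuracy derived from it are computed.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: number of items whose true label is labels[i]
    and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of correct predictions: diagonal sum divided by the total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical topic labels and classifier outputs (illustration only,
# not data from the article).
labels = ["economy", "sport", "culture"]
y_true = ["economy", "economy", "sport", "sport",
          "culture", "culture", "sport", "economy"]
y_pred = ["economy", "sport", "sport", "sport",
          "culture", "economy", "sport", "economy"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

for label, row in zip(labels, cm):
    print(f"{label:>8}: {row}")
print(f"accuracy = {acc:.2f}")
```

        Rows correspond to true classes and columns to predicted classes, so off-diagonal cells show which topics the classifier confuses with each other.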

Key words

  algorithm, machine learning, text classification, data processing, text analysis, text preprocessing

Language

English

Type

technical

Year

2023

Page

31

References

    1. Aggarwal C. and Zhai C. A survey of text classification algorithms. Springer, 2012. P. 163-222.
    2. Artificial intelligence, machine learning and deep learning: [Electronic resource]. URL: https://velog.io/@gabie0208/1.1-Artificial-intelligence-machine-learning-and-deep-learning.
    3. Bengfort B., Bilbro R. and Ojeda T. Applied Text Analysis with Python. O'Reilly Media, Inc., 2018. 368 p.
    4. James G., Witten D., Hastie T. and Tibshirani R. An Introduction to Statistical Learning: with Applications in R. New York: Springer, 2013.
    5. Korde V. and Mahender C. Text classification and classifiers: A survey. International Journal of Artificial Intelligence & Applications (IJAIA), 2012. 3 (2). P. 85-99.
    6. Niharika S., Latha V. and Lavanya D. A Survey on Text Categorization. International Journal of Computer Trends and Technology, 2012. Vol. 3, Issue 1.
    7. NLP Tutorial for Text Classification in Python: [Electronic resource]. URL: https://medium.com/analytics-vidhya/nlp-tutorial-for-text-classification-in-python-8f19cd17b49e.
    8. Wilcox A. and Hripcsak G. Classification algorithms applied to narrative reports. 1999. P. 455.

Publication date

2023-10-19