{"id":3143,"date":"2023-10-19T10:42:37","date_gmt":"2023-10-19T05:42:37","guid":{"rendered":"http:\/\/vestnik.polytech.tj\/?p=3143"},"modified":"2023-10-19T10:45:12","modified_gmt":"2023-10-19T05:45:12","slug":"modern-text-classification-methods-based-on-machine-learning-algorithms","status":"publish","type":"post","link":"https:\/\/vestnik.polytech.tj\/?p=3143&lang=en","title":{"rendered":"MODERN TEXT CLASSIFICATION METHODS BASED ON MACHINE  LEARNING ALGORITHMS"},"content":{"rendered":"<p><!--vcv no format--><!-- vcwb\/dynamicElementComment:cffa340e --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-cffa340e\" data-vce-do-apply=\"all el-cffa340e\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:ac597629 --><\/p>\n<div class=\"vce-col vce-col--md-78p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-ac597629\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-ac597629\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-ac597629\"><!-- vcwb\/dynamicElementComment:f8072807 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-f8072807\" data-vce-do-apply=\"all el-f8072807\">\n<p><strong><span style=\"font-size: 14pt;\">Authors<\/span><\/strong><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<\/strong><strong>Nizamitdinov A.I. 
&#8212; <\/strong><em>Doctor of Philosophy (PhD) in Statistics, lecturer at the Digital Economy <\/em><em>department, Polytechnic Institute of Tajik Technical University, Khujand, <\/em><em>Republic of Tajikistan, <\/em><a href=\"mailto:ahlidin@gmail.com\"><em>ahlidin@gmail.com<\/em><\/a><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Jaborova Sh.A.<\/strong> &#8212; <em>4<sup>th<\/sup>-year student in the specialty of technology and information systems in <\/em><em>economy, Polytechnic Institute of Tajik Technical University, Khujand, <\/em><em>Republic of Tajikistan, <\/em><a href=\"mailto:jaborova232000@gmail.com\"><em>jaborova232000@gmail.com<\/em><\/a><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Annotation<\/strong><\/span><\/p>\n<p><em>&nbsp; &nbsp; &nbsp; &nbsp; The article deals with the application of machine learning to text analysis tasks. Particular attention is paid to approaches that can be used effectively to extract information from natural language text. The stages and levels of text analysis are considered, along with the possibility of applying machine learning methods at each of them. The text classification problems are solved with machine learning algorithms such as logistic regression, K-nearest neighbors, decision trees, Random Forest, boosting algorithms (CatBoost, XGBoost), linear discriminants and neural networks. It is concluded that solving text classification problems with these algorithms yields fairly high scores compared to other approaches to classification problems. To evaluate the efficiency of the algorithms, a confusion matrix was used, which shows the accuracy of model prediction and the degree of errors in classification problems. 
The machine learning procedure demonstrated an efficiency of about 60-86% in the analysis of parts of speech in sentences on various topics (using Russian as an example), based on data from the news site lenta.ru.<\/em><\/p>\n<p><span style=\"font-size: 14pt;\"><strong><em>Keywords<\/em><\/strong><\/span><\/p>\n<p><em>&nbsp; algorithm, machine learning, text classification, data processing, text analysis, text preprocessing<\/em><\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:f8072807 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:ac597629 --><!-- vcwb\/dynamicElementComment:affa17a6 --><\/p>\n<div class=\"vce-col vce-col--md-22p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last\" id=\"el-affa17a6\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-affa17a6\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-affa17a6\"><!-- vcwb\/dynamicElementComment:ad6def1d --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-ad6def1d\" data-vce-do-apply=\"all el-ad6def1d\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Language<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">English<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:ad6def1d --><!-- vcwb\/dynamicElementComment:2a874f9f --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-2a874f9f\" data-vce-do-apply=\"all el-2a874f9f\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Type<\/p>\n<p style=\"line-height: 1;\"><span 
style=\"font-weight: 400; font-style: normal;\">technical<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:2a874f9f --><!-- vcwb\/dynamicElementComment:dd4929a8 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-dd4929a8\" data-vce-do-apply=\"all el-dd4929a8\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Year<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">2023<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:dd4929a8 --><!-- vcwb\/dynamicElementComment:a4548e92 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-a4548e92\" data-vce-do-apply=\"all el-a4548e92\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Page<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400;\">31<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:a4548e92 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:affa17a6 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:cffa340e --><!-- vcwb\/dynamicElementComment:ef87f1ff --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-ef87f1ff\" data-vce-do-apply=\"all el-ef87f1ff\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:19b25799 --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first 
vce-col--xl-first\" id=\"el-19b25799\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-19b25799\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-19b25799\"><!-- vcwb\/dynamicElementComment:580b4ef2 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-580b4ef2\" data-vce-do-apply=\"all el-580b4ef2\">\n<p><span style=\"font-size: 14pt;\"><strong>References<\/strong><\/span><\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li><em>Aggarwal C. and Zhai C. A survey of text classification algorithms. Springer, 2012, pp. 163\u2013222.<\/em><\/li>\n<li><em>Artificial intelligence, machine learning and deep learning: [Electronic resource]. URL: https:\/\/velog.io\/@gabie0208\/1.1-Artificial-intelligence-machine-learning-and-deep-learning.<\/em><\/li>\n<li><em>Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda. Applied Text Analysis with Python. O&#8217;Reilly Media, Inc., 2018. 368 p.<\/em><\/li>\n<li><em>Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An introduction to statistical learning: with applications in R. New York: Springer, 2013.<\/em><\/li>\n<li><em>Korde V. and Mahender C. Text classification and classifiers: A survey. International Journal of Artificial Intelligence &amp; Applications (IJAIA), 2012, 3 (2), pp. 85\u201399.<\/em><\/li>\n<li><em>Niharika S., Latha V. and Lavanya D. A Survey on Text Categorization. International Journal of Computer Trends and Technology, 2012, Volume 3, Issue 1.<\/em><\/li>\n<li><em>NLP Tutorial for Text Classification in Python: [Electronic resource]. URL: https:\/\/medium.com\/analytics-vidhya\/nlp-tutorial-for-text-classification-in-python-8f19cd17b49e.<\/em><\/li>\n<li><em>Wilcox A. 
and Hripcsak G. Classification algorithms applied to narrative reports. 1999. p. 455.<\/em><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:580b4ef2 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:19b25799 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:ef87f1ff --><!-- vcwb\/dynamicElementComment:d8d58525 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-d8d58525\" data-vce-do-apply=\"all el-d8d58525\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:b303e553 --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-b303e553\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-b303e553\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-b303e553\"><!-- vcwb\/dynamicElementComment:a5e19941 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-a5e19941\" data-vce-do-apply=\"all el-a5e19941\">\n<h2><strong><span style=\"font-size: 14pt;\">Publication date<\/span><\/strong><\/h2>\n<p>2023-10-19<\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:a5e19941 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:b303e553 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d8d58525 --><!--vcv no format--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Authors &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Nizamitdinov A.I. 
&#8212; Doctor of Philosophy (PhD) in Statistics, lecturer at the Digital Economy department, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, ahlidin@gmail.com &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Jaborova Sh.A. &#8212; 4th-year student in the specialty of technology and information systems in economy, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, jaborova232000@gmail.com Annotation &nbsp; &nbsp; &nbsp; &nbsp; The article deals with the application of machine learning to text analysis tasks. Particular attention is paid to approaches that can be used effectively to extract information&hellip;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[161],"tags":[372],"class_list":["post-3143","post","type-post","status-publish","format-standard","hentry","category-bulletin-of-pittu-2022","tag-bulletin-of-pittu-2022-1"],"acf":[],"featured_image_src":null,"author_info":{"display_name":"Ilhomjon 
Qodirov","author_link":"https:\/\/vestnik.polytech.tj\/?author=3"},"_links":{"self":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3143","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3143"}],"version-history":[{"count":3,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3143\/revisions"}],"predecessor-version":[{"id":3146,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3143\/revisions\/3146"}],"wp:attachment":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3143"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3143"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}