{"id":3045,"date":"2023-10-11T15:49:29","date_gmt":"2023-10-11T10:49:29","guid":{"rendered":"http:\/\/vestnik.polytech.tj\/?p=3045"},"modified":"2023-10-11T15:53:58","modified_gmt":"2023-10-11T10:53:58","slug":"development-of-the-tajik-speech-corpora-for-solving-some-problems-of-computer-linguistics","status":"publish","type":"post","link":"https:\/\/vestnik.polytech.tj\/?p=3045&lang=en","title":{"rendered":"DEVELOPMENT OF THE TAJIK SPEECH CORPORA FOR SOLVING  SOME PROBLEMS OF COMPUTER LINGUISTICS"},"content":{"rendered":"<p><!--vcv no format--><!-- vcwb\/dynamicElementComment:8a5f924d --><!-- \/vcwb\/dynamicElementComment:8a5f924d --><!-- vcwb\/dynamicElementComment:abf58696 --><!-- \/vcwb\/dynamicElementComment:abf58696 --><!-- vcwb\/dynamicElementComment:c136c645 --><!-- \/vcwb\/dynamicElementComment:c136c645 --><!-- vcwb\/dynamicElementComment:d34b4b81 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-d34b4b81\" data-vce-do-apply=\"all el-d34b4b81\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:596d0f77 --><\/p>\n<div class=\"vce-col vce-col--md-78p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-596d0f77\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-596d0f77\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-596d0f77\"><!-- vcwb\/dynamicElementComment:9c7ee82e --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-9c7ee82e\" data-vce-do-apply=\"all el-9c7ee82e\">\n<p><strong><span style=\"font-size: 14pt;\">Authors<\/span><\/strong><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <\/strong><strong>Khudoiberdiev H.A. 
\u2013<\/strong> <em>Candidate of Physics and Mathematics, Associate Professor, <\/em><em>Department of Programming and Information Systems, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, <\/em><a href=\"mailto:tajlingvo@gmail.com\"><em>tajlingvo@gmail.com<\/em><\/a><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; Muzafarov D.Z. \u2013<\/strong> <em>Candidate of Physics and Mathematics, Associate Professor, Department<\/em> <em>of Programming, Khujand State University rova, Khujand, Republic of Tajikistan, <\/em><a href=\"mailto:muzafarov.dilshod@gmail.com\"><em>muzafarov.dilshod@gmail.com<\/em><\/a><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; Ashurova Sh.N.<\/strong> <em>\u2013 Senior Lecturer, Department of Programming and Information Systems, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, <\/em><a href=\"mailto:shnurulloevna@gmail.com\"><em>shnurulloevna@gmail.com<\/em><\/a><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Annotation<\/strong><\/span><\/p>\n<p><em>&nbsp; &nbsp; &nbsp; &nbsp; <\/em><em>The article proposes a scientific concept and planning stages for the development of a Tajik speech corpus. The purpose of creating such a corpus is to solve important problems of computational linguistics related to voice control, speech synthesis, and speech recognition. The authors note that these issues have been insufficiently studied for Tajik compared with English and Russian. The main proposed methods include automatic processing of text elements, preliminary analysis of audio data, and formation of a corpus database. The planned corpus comprises 1000 hours of speech recordings obtained from different speakers, taking into account age and gender. Software modules for processing the corpus will then be developed on its basis, including modules for voice control of computer tools and for automatic speech synthesis and recognition. 
The proposed approaches are based on modern methods of mathematical modeling, data analysis, and artificial intelligence. The research results may find wide application in science, education, and industry in the Republic of Tajikistan. Implementing the proposed approach would make it possible to solve important problems of Tajik speech processing, such as voice control and automatic speech synthesis and recognition. The developed corpus can serve as a foundation for research and development in computational linguistics for the Tajik language.<\/em><\/p>\n<p><span style=\"font-size: 14pt;\"><strong><em>Key words<\/em><\/strong><\/span><\/p>\n<p><em> Tajik language, text corpus, speech corpus, speech data analysis, speech technologies, speech recognition.<\/em><\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:9c7ee82e --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:596d0f77 --><!-- vcwb\/dynamicElementComment:61705926 --><\/p>\n<div class=\"vce-col vce-col--md-22p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last\" id=\"el-61705926\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-61705926\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-61705926\"><!-- vcwb\/dynamicElementComment:ad359293 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-ad359293\" data-vce-do-apply=\"all el-ad359293\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Language<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">english<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:ad359293 --><!-- 
vcwb\/dynamicElementComment:0febd15d --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-0febd15d\" data-vce-do-apply=\"all el-0febd15d\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Type<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">technical<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:0febd15d --><!-- vcwb\/dynamicElementComment:d62b94c7 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-d62b94c7\" data-vce-do-apply=\"all el-d62b94c7\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Year<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">2023<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d62b94c7 --><!-- vcwb\/dynamicElementComment:dffa5a14 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-dffa5a14\" data-vce-do-apply=\"all el-dffa5a14\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<p style=\"line-height: 1;\">Page<\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400;\">14<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:dffa5a14 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:61705926 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d34b4b81 --><!-- vcwb\/dynamicElementComment:89b21327 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-89b21327\" 
data-vce-do-apply=\"all el-89b21327\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:4b0de5bb --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-4b0de5bb\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-4b0de5bb\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-4b0de5bb\"><!-- vcwb\/dynamicElementComment:d02b3ef5 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-d02b3ef5\" data-vce-do-apply=\"all el-d02b3ef5\">\n<p><span style=\"font-size: 14pt;\"><strong>References<\/strong><\/span><\/p>\n<ol>\n<li><em>Tajik language pack for spell checking in Microsoft Office. Usmanov Z.D., Soliev O.M., Khudoyberdiev Kh.A., Dovudov G.M. \/\/ Patent registered 4201200235 dated 04.10.2012. Research Center of the Ministry of Economic Development and Trade of the Republic of Tatarstan.<\/em><\/li>\n<\/ol>\n<ol start=\"2\">\n<li><em>Usmanov Z.D., Dovudov G.M. Formation of the base of morphs of the Tajik language. Monograph. \u2013 Dushanbe: &#171;Donish&#187;, 2014. 110 p.<\/em><\/li>\n<\/ol>\n<ol start=\"4\">\n<li><em>Usmanov Z.D., Khudoiberdiev Kh.A. Experience of computer synthesis of Tajik speech according to the text. Monograph. Technological University of Tajikistan, Khujand branch. \u2013 Dushanbe: &#171;Irfon&#187;, 2010. 145 p.<\/em><\/li>\n<li><em>Usmanov Z.D., Soliev O.M. Keyboard layout problem. Monograph. Technological University of Tajikistan. \u2013 Dushanbe: &#171;Irfon&#187;, 2010. 
104 p.<\/em><\/li>\n<li><em>Khudoiberdiev H.A., Muzafarov D.Z., Ashurova Sh.N.<\/em><em> Development of the Tajik speech corpora for solving some problems of computer linguistics.<\/em><\/li>\n<li><em>Usmanov Z.D., Soliev O.M., Khudoyberdiev Kh.A., Dovudov G.M. Automatic system TajSpell-2.0 for checking the spelling of the Tajik language in the MS Office 2010-2019 office suite. \u2013 Certificate of state registration of information resource, Republic of Tajikistan. No. 4202000456 dated 07\/30\/2020.<\/em><\/li>\n<li><em>Usmonov Z.J., Khudoyberdiev Kh.A. Nizomkhoi hudkori korcardi ma&#8217;lumot bo zaboni tojiki. Monograph. \u2013 Khujand: &#171;Irfon&#187;, 2022. 186 p.<\/em><\/li>\n<li><em>Khudoiberdiev Kh.A. Web application \u201cAutomatic information processing systems in the Tajik language\u201d www.tajlingvo.tj. \u2013 Certificate of state registration of information resource, Republic of Tajikistan. No. 4202200496 dated 04\/28\/2022.<\/em><\/li>\n<li><em>Khudoiberdiev Kh.A., Soliev O.M., Soliev P.A., Dovudov G.M., Nazarov A.A. Web application Tajik translator www.tarjumon.tj. \u2013 Certificate of state registration of information resource, Republic of Tajikistan. No. 
4202100482 dated 12\/03\/2021.<\/em><\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d02b3ef5 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:4b0de5bb --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:89b21327 --><!-- vcwb\/dynamicElementComment:bef31103 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-bef31103\" data-vce-do-apply=\"all el-bef31103\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:d2bd0ec3 --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-d2bd0ec3\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-d2bd0ec3\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-d2bd0ec3\"><!-- vcwb\/dynamicElementComment:dc8ad5a6 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-dc8ad5a6\" data-vce-do-apply=\"all el-dc8ad5a6\">\n<h2><strong><span style=\"font-size: 14pt;\">Publication date<\/span><\/strong><\/h2>\n<p>2023-10-11<\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:dc8ad5a6 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d2bd0ec3 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:bef31103 --><!--vcv no format--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Authors &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Khudoiberdiev H.A. 
&#8212; Candidate of Physics and Mathematics, Associate Professor, Department of programming and information systems, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, tajlingvo@gmail.com &nbsp; &nbsp; &nbsp; &nbsp; Muzafarov D.Z. &#8212; Candidate of Physics and Mathematics, Associate Professor, Department of programming, Khujand State University rova, Khujand, Republic of Tajikistan, muzafarov.dilshod@gmail.com &nbsp; &nbsp; &nbsp; Ashurova Sh.N. \u2013 Senior Lecturer, Department of Programming and information systems, Polytechnic Institute of Tajik Technical University, Khujand, Republic of Tajikistan, shnurulloevna@gmail.com Annotation &nbsp; &nbsp; &nbsp; &nbsp; The&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[406],"tags":[418],"class_list":["post-3045","post","type-post","status-publish","format-standard","hentry","category-bulletin_of_pittu-2023","tag-bulletin-of-pittu-2023-2"],"acf":[],"featured_image_src":null,"author_info":{"display_name":"ilhomjonqodirov02","author_link":"https:\/\/vestnik.polytech.tj\/?author=1"},"_links":{"self":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3045"}],"version-history":[{"count":3,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3045\/revisions"}],"predecessor-version":[{"id":3048,"href":"https:\/\/vestni
k.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/3045\/revisions\/3048"}],"wp:attachment":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}