{"id":1966,"date":"2023-09-21T14:03:21","date_gmt":"2023-09-21T09:03:21","guid":{"rendered":"http:\/\/vestnik.polytech.tj\/?p=1966"},"modified":"2023-09-21T14:41:04","modified_gmt":"2023-09-21T09:41:04","slug":"tajik-russian-parallel-corpus-development-and-description","status":"publish","type":"post","link":"https:\/\/vestnik.polytech.tj\/?p=1966&lang=en","title":{"rendered":"TAJIK-RUSSIAN PARALLEL CORPUS: DEVELOPMENT AND DESCRIPTION"},"content":{"rendered":"<p><!--vcv no format--><!-- vcwb\/dynamicElementComment:6c840908 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-6c840908\" data-vce-do-apply=\"all el-6c840908\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:72c02f4a --><\/p>\n<div class=\"vce-col vce-col--md-78p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-72c02f4a\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-72c02f4a\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-72c02f4a\"><!-- vcwb\/dynamicElementComment:1571ec1d --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-1571ec1d\" data-vce-do-apply=\"all el-1571ec1d\">\n<p><strong><span style=\"font-size: 14pt;\">Authors<\/span><\/strong><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Khudoyberdiev H.A. <\/strong>\u2013 <em>Candidate of Physical and Mathematical Sciences, <\/em><em>Associate Professor, Department of Programming and Information Technologies, <\/em><em>Polytechnic Institute of Tajik Technical University.<\/em><\/p>\n<p><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Nazarov A.A. 
\u2013 <\/strong><em>Senior Lecturer, Department of Programming and Information Technologies, Polytechnic Institute of Tajik Technical University.<\/em><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Abstract<\/strong><\/span><\/p>\n<p><em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;This paper presents the first stage in the development of a Tajik-Russian parallel corpus for machine translation of text from Tajik into Russian. It describes the general structure of the corpus, the structure of its text data, the algorithms used, and the automatic management of the corpus with the authors&#8217; program Taj-Rus-Corp. The following development tasks have been analyzed and completed: selection of correct texts; preprocessing; analysis of text sources; text comparison; design of data processing algorithms; development of the Taj-Rus-Corp program with text search capabilities; entry of the prepared texts into the parallel corpus; statistical analysis of the data; and creation of experimental machine translation modules. 
The authors conclude that further development of the parallel corpus will facilitate machine translation of text from Tajik into Russian.<\/em><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Keywords<\/strong><\/span><\/p>\n<p><em>&nbsp; &nbsp; &nbsp; &nbsp; Tajik language, Russian language, parallel corpus, text analysis, software, database, machine translation.<\/em><\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:1571ec1d --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:72c02f4a --><!-- vcwb\/dynamicElementComment:ad08b0f9 --><\/p>\n<div class=\"vce-col vce-col--md-22p vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last\" id=\"el-ad08b0f9\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-ad08b0f9\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-ad08b0f9\"><!-- vcwb\/dynamicElementComment:30b7bb4b --><!-- \/vcwb\/dynamicElementComment:30b7bb4b --><!-- vcwb\/dynamicElementComment:921da9cb --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-921da9cb\" data-vce-do-apply=\"all el-921da9cb\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%; border-style: solid; border-color: #63c6f7;\">\n<p style=\"line-height: 1;\"><span style=\"font-size: 14pt;\">Language&nbsp;<\/span><\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">English<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:921da9cb --><!-- vcwb\/dynamicElementComment:30b7bb4b --><!-- \/vcwb\/dynamicElementComment:30b7bb4b --><!-- vcwb\/dynamicElementComment:b6427d6b --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-b6427d6b\" 
data-vce-do-apply=\"all el-b6427d6b\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%; border-style: solid; border-color: #63c6f7;\">\n<p style=\"line-height: 1;\"><span style=\"font-size: 14pt;\">Type<\/span><\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">technical<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:b6427d6b --><!-- vcwb\/dynamicElementComment:30b7bb4b --><!-- \/vcwb\/dynamicElementComment:30b7bb4b --><!-- vcwb\/dynamicElementComment:ace3b6ef --><!-- \/vcwb\/dynamicElementComment:ace3b6ef --><!-- vcwb\/dynamicElementComment:30b7bb4b --><!-- \/vcwb\/dynamicElementComment:30b7bb4b --><!-- vcwb\/dynamicElementComment:a90045f0 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-a90045f0\" data-vce-do-apply=\"all el-a90045f0\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%; border-style: solid; border-color: #63c6f7;\">\n<p style=\"line-height: 1;\"><span style=\"font-size: 18.6667px;\">Year<\/span><\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 400; font-style: normal;\">2019<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:a90045f0 --><!-- vcwb\/dynamicElementComment:ace3b6ef --><!-- \/vcwb\/dynamicElementComment:ace3b6ef --><!-- vcwb\/dynamicElementComment:30b7bb4b --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-30b7bb4b\" data-vce-do-apply=\"all el-30b7bb4b\">\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 100%; border-style: solid; border-color: #63c6f7;\">\n<p style=\"line-height: 1;\"><span style=\"font-size: 14pt;\">Page<\/span><\/p>\n<p style=\"line-height: 1;\"><span style=\"font-weight: 
400;\">12<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:30b7bb4b --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:ad08b0f9 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:6c840908 --><!-- vcwb\/dynamicElementComment:549297f0 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-549297f0\" data-vce-do-apply=\"all el-549297f0\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:5d094d66 --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-5d094d66\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-5d094d66\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-5d094d66\"><!-- vcwb\/dynamicElementComment:69e2a836 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-69e2a836\" data-vce-do-apply=\"all el-69e2a836\">\n<p><span style=\"font-size: 14pt;\"><strong><em>References<\/em><\/strong><\/span><\/p>\n<ol>\n<li><em>Rastorgueva V.S. Essays on Tajik dialectology. Stalinabad: Publishing House of the Academy of Sciences of the Tajik SSR, 1956. 80 p.<\/em><\/li>\n<li><em>Zakharov V.P. Corpus linguistics. St. Petersburg: SPbU, 2005.<\/em><\/li>\n<li><em>Usmanov Z.D. On the ordered alphabetical coding of words of natural languages. Reports of the Academy of Sciences of the Republic of Tajikistan, 2012, vol. 55, \u2116 7, pp. 545\u2013548.<\/em><\/li>\n<li><em>Khudoyberdiev Kh.A. On automatic conversion of Tajik text to standard graphics. 
Reports of the Academy of Sciences of the Republic of Tajikistan, 2014, vol. 57, \u2116 3, pp. 210\u2013214.<\/em><\/li>\n<li><em>Usmanov Z.D., Dovudov G.M. Morphological analysis of word forms of the Tajik language (monograph). Dushanbe: Donish, 2015. 130 p.<\/em><\/li>\n<li><em>Khudoyberdiev Kh.A., Soliev O.M. Linguistic thesaurus of the Tajik language. New Information Technologies in Automated Systems. Moscow: MIEM HSE, 2017, pp. 103\u2013106.<\/em><\/li>\n<li><em>Khudoyberdiev Kh.A., Rakhmonov Z.A. Logical structure and analysis of machine translation artifacts. Herald of KPITTU named after M.S. Osimi. Khujand, 2018, \u2116 2 (7), pp. 7\u201311.<\/em><\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:69e2a836 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:5d094d66 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:549297f0 --><!-- vcwb\/dynamicElementComment:1e7f0180 --><\/p>\n<div class=\"vce-row-container\" data-vce-boxed-width=\"true\">\n<div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-1e7f0180\" data-vce-do-apply=\"all el-1e7f0180\">\n<div class=\"vce-row-content\" data-vce-element-content=\"true\"><!-- vcwb\/dynamicElementComment:d7005e8a --><\/p>\n<div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-d7005e8a\">\n<div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-d7005e8a\">\n<div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-d7005e8a\"><!-- vcwb\/dynamicElementComment:144bac53 --><\/p>\n<div class=\"vce-text-block\">\n<div class=\"vce-text-block-wrapper vce\" id=\"el-144bac53\" data-vce-do-apply=\"all el-144bac53\">\n<p><span style=\"font-size: 14pt;\"><strong>Publication 
date<\/strong><\/span><\/p>\n<p>09\/21\/2023<\/p>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:144bac53 --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:d7005e8a --><\/div>\n<\/div>\n<\/div>\n<p><!-- \/vcwb\/dynamicElementComment:1e7f0180 --><!--vcv no format--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Authors &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Khudoyberdiev H.A. \u2013 Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Programming and Information Technologies, Polytechnic Institute of Tajik Technical University. &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Nazarov A.A. \u2013 Senior Lecturer, Department of Programming and Information Technologies, Polytechnic Institute of Tajik Technical University. Abstract &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;This paper presents the first stage in the development of a Tajik-Russian parallel corpus for machine translation of text from Tajik into Russian. 
The general structure of the&hellip;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143],"tags":[175],"class_list":["post-1966","post","type-post","status-publish","format-standard","hentry","category-bulletin-of-pittu-2019","tag-bulletin-of-pittu-2019-1"],"acf":[],"featured_image_src":null,"author_info":{"display_name":"Ilhomjon Qodirov","author_link":"https:\/\/vestnik.polytech.tj\/?author=3"},"_links":{"self":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/1966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1966"}],"version-history":[{"count":3,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/1966\/revisions"}],"predecessor-version":[{"id":1975,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=\/wp\/v2\/posts\/1966\/revisions\/1975"}],"wp:attachment":[{"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vestnik.polytech.tj\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}