DATASET — THE BASIS OF ANALYSIS AND NEURAL NETWORK TRAINING

Authors

Ashurova Shabnam Nurulloevna, Senior Lecturer at the Department of Programming and Information Systems, Polytechnic Institute of the Tajik Technical University named after academician M.S. Osimi, Khujand, Republic of Tajikistan, sh.nurulloevna@gmail.com
Solieva Mehrangez Tolibovna, PhD doctoral student, Polytechnic Institute of the Tajik Technical University named after academician M.S. Osimi, Khujand, Republic of Tajikistan, smehrangez92@gmail.com

Abstract

This article discusses the concept of a “dataset,” the main types of datasets, their uses, free data sources, and methods of collection. It also examines the study and use of datasets in data analysis and automation, and their role in building artificial intelligence systems and machine learning models. A dataset is defined here as a collection of structured and processed data. The article describes the main types of datasets (simple records, graph structures, and ordered data sets), analyzes manual and automated methods for collecting them, and covers the stages of data processing, from error cleaning to splitting the data into training, validation, and test sets. The importance of data volume and quality for neural network training performance is emphasized. The growth of multimodal datasets, which combine text, image, audio, and numeric data, has become a key trend in the development of digital technologies and artificial intelligence. Understanding the structure, properties, and methods of creating such datasets has significant practical and theoretical value for analysts, developers, and AI engineers. The article is intended for researchers, analysts, and practitioners: the methods and principles of working with datasets and training neural networks that it presents support effective analysis and modeling.

Keywords

dataset, data graph, open data, predictive model, neural networks, neural systems, information.

References

1. Ashurzoda B.Kh. Issues of Joint Speech Recognition and Key Word Search / B.Kh. Ashurzoda // Message of the National University of Tajikistan. Section of Natural Sciences. – Dushanbe. – 2018. – No. 2(33). – P. 53-57.

2. Dataset for machine learning and data analysis: what it is, types, where to get datasets. URL: https://practicum.yandex.ru/blog/dataset-dlya-mashinnogo-obucheniya-i-analiza/?ysclid=mhadrioint775746258 (access date: 28.08.2025).

3. Dataset for machine learning. URL: https://practicum.yandex.ru/blog/dataset-dlya-mashinnogo-obucheniya-i-analiza/?ysclid=mhadrioint775746258 (access date: 28.08.2025).

4. Dataset: types, applications, collection of the best. URL: https://gb.ru/blog/dataset/?ysclid=mhaiigcejg831141712 (access date: 25.10.2025).

5. Jalolov T. A. “Data analysis and building predictive models.” Proceedings of the International Conference on Information Technologies, 2024, pp. 12–18.

6. Kaggle Datasets. URL: https://www.kaggle.com/datasets (access date: 28.08.2025).

7. Khudoyberdiev H.A. Amplification of the Speech Recognition Process Based on the Tajik Language / H.A. Khudoyberdiev, B.Kh. Ashurzoda // Polytechnic Bulletin. Series: Intelligence. Innovations. Investments. – 2022. – No. 2(58). – P. 39-42. – EDN VNMJGH.

8. Khudoyberdiev Kh. A. Design and software implementation of automatic transliteration in a digital library / Kh. A. Khudoyberdiev, M. P. Muzaffarov, F. E. Mirzozoda // Bulletin of PITTU named after academician M.S. Osimi. – 2022. – No. 1(22). – P. 7-15.

9. Nazarov A. I. “Using datasets for neural network training.” Dushanbe: National University of Tajikistan, 2022, pp. 33–41.

10. Rakhimzoda S. M. “Data collections and their application in data analysis.” Journal of Tajik Technical University, 2023, No. 2, pp. 45–52.

11. World Bank Open Data. URL: https://data.worldbank.org (access date: 28.08.2025).

Publish date

2026-03-31