A Method for Generating Synthetic Data Based on Genetic Algorithms
Scientific Proceedings of Vanadzor State University Natural and Exact Sciences (ISSN 2738-2923)
2024, Vol. 1
A Method for Generating Synthetic Data Based on Genetic Algorithms for Modeling Credit Risk
Garnik Arakelyan
Summary
Key words: logistic regression, kNN, genetic algorithm, mutation, data grouping, correlation
Every company, including banks and credit organizations, operates in an unstable environment and, because it lacks complete information about that environment, may incur significant losses. One of the main sources of such losses is credit risk, and various mathematical models are built to manage it. However, modeling is often hampered by an insufficient number of observations.
Related studies by other researchers were reviewed. This work attempts to generate synthetic data from a small number of real credit observations, so that the data can be used to train machine learning models that require large datasets.
The synthetic data were generated using the logic of genetic algorithms, concepts from Darwin’s theory of evolution, and machine learning methods that do not require large amounts of data. The quality of the generated data was assessed with statistical methods.
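As a rough illustration of how genetic-algorithm logic can produce synthetic credit records, the following is a minimal Python sketch built entirely on assumptions; it is not the author’s method. It assumes each record is a list of numeric features with a 0/1 default label, groups records by label, creates offspring by uniform crossover of two parents with the same label followed by Gaussian mutation, and keeps a child only if a kNN vote over the real data agrees with its label (a stand-in for selection). The names `generate`, `crossover`, `mutate`, `knn_label` and `pearson`, the mutation rate and the 0.1 noise scale are hypothetical. Comparing the Pearson correlation between two features on real and synthetic data is shown as one simple statistical quality check.

```python
import math
import random
import statistics

# Assumed record layout (hypothetical): (features: list[float], label: 0 or 1)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def knn_label(x, real, k):
    """Majority label among the k real records closest to x (Euclidean distance)."""
    nearest = sorted(real, key=lambda r: math.dist(x, r[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def crossover(a, b, rng):
    """Uniform crossover: each feature is copied from one of the two parents."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def mutate(x, stds, rate, rng):
    """Gaussian mutation: each feature is perturbed with probability `rate`."""
    return [xi + rng.gauss(0.0, 0.1 * s) if rng.random() < rate else xi
            for xi, s in zip(x, stds)]


def generate(real, n_new, k=3, rate=0.2, seed=0):
    """Breed up to n_new synthetic records from a small real sample."""
    rng = random.Random(seed)
    # Data grouping: parents are drawn only from records sharing a label.
    groups = {lab: [x for x, l in real if l == lab] for lab in (0, 1)}
    n_feat = len(real[0][0])
    stds = [statistics.pstdev([x[j] for x, _ in real]) for j in range(n_feat)]
    synthetic, attempts = [], 0
    while len(synthetic) < n_new and attempts < 100 * n_new:
        attempts += 1
        label = rng.choice(real)[1]  # sample labels in their real proportion
        if len(groups[label]) < 2:
            continue
        a, b = rng.sample(groups[label], 2)
        child = mutate(crossover(a, b, rng), stds, rate, rng)
        # Selection (assumption): keep the child only if kNN on real data agrees.
        if knn_label(child, real, k) == label:
            synthetic.append((child, label))
    return synthetic


if __name__ == "__main__":
    # Toy, randomly generated "real" sample; not data from the study.
    rng = random.Random(42)
    real = []
    for _ in range(40):
        income = rng.gauss(50.0, 10.0)
        debt = 0.4 * income + rng.gauss(0.0, 4.0)
        real.append(([income, debt], 1 if debt > 0.45 * income else 0))
    fake = generate(real, 200)
    corr = lambda rows: pearson([x[0] for x, _ in rows], [x[1] for x, _ in rows])
    print(f"real r={corr(real):.2f}  synthetic r={corr(fake):.2f}  n={len(fake)}")
```

Uniform crossover keeps every synthetic feature value inside the range seen in one of the parents, while mutation adds small variation; the kNN filter discards offspring that drift into the region of the opposite class. A closer match between the real and synthetic correlations suggests that the generated data preserve that relationship, though a fuller assessment would compare more statistics.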
The results are of practical use and demonstrate that any bank or credit organization can build a high-quality credit risk management solution even when only a small amount of data is available.
DOI: https://doi.org/10.58726/27382923-ne2024.1-8