AUTOMATED MACHINE LEARNING INFORMATION SYSTEMS IN THE CREDIT SCORING SERVICE - Student Scientific Forum

XIII International Student Scientific Conference "Student Scientific Forum - 2021"

Yurchak V.A. 1
1 FGAOU VO RUT (MIIT) (Russian University of Transport)
The text of this work is published without images and formulas.
The full version of the work is available in PDF format under the "Work files" tab.

Introduction

Credit scoring is the most popular form of borrower assessment in countries with a developed financial infrastructure and financial market. Scoring is a system for assessing a client's creditworthiness (credit risk) that applies numerical statistical methods to a variety of indicators. Implementing scoring requires a clear understanding of which of these indicators the model takes into account and which it does not.

The method depends on a well-constructed mathematical model, which not only produces the assessment result but also relates it to the potential risk the client poses to the bank.

Determining creditworthiness with a scoring system gives a financial institution more advantages than the traditional methodology of assessing solvency. These include higher profitability of banking activities, achieved by minimizing defaulted loans, and the ability to expand the range of credit services by minimizing the risks of introducing new products.

Russia's financial system is still young and developing, so the country's banking sector more often relies on studying the client's solvency, a method that takes far fewer factors into account than credit scoring. As a result, it is currently quite difficult to find a domestic scoring system that works as an automated information system (AIS).

Building a scoring system requires a database of credit cases, research in this area, and sufficiently accurate mathematical models; the shortage of these explains the lack of scoring systems in Russia. Since 2005, a database of credit files and information about borrowers with existing or already repaid loans has been accumulated by the National Credit History Bureau (NBKI), which at the time of writing stores more than 400 million records. In February 2013, the NBKI and FICO (USA) proposed a new credit scoring model intended to assess possible fraudulent actions by potential borrowers.

AIS for building scoring models

Today there are already many AIS for building scoring models worldwide; the most recent of them use machine learning algorithms and artificial neural networks. The most commonly used AIS are Loginom Scorecard Modeler, HES GiniMachine, Deductor Credit Scorecard Modeler, and SAS Credit Scoring.

Loginom Scorecard Modeler is used to develop scorecards for application (questionnaire), behavioral, and collection scoring. The solution automates the entire process of creating customer scorecards, from data preparation to modeling and monitoring. The resulting scorecards can be easily integrated into a borrower assessment process based on the Loginom Decision Maker software or any other software solution.

SAS Credit Scoring is a system for assessing the creditworthiness of an individual or legal entity: it analyzes the data of a potential borrower and gives the final answer on whether to grant a loan. The system includes a set of methods and tools for predicting customer behavior and estimating the probability that customers will default. It also includes tools for processing and storing information and forming data marts, a wide range of analytical tools for building and analyzing credit scoring models, and an extensive reporting system for evaluating the performance of scoring models and the state of the bank's loan portfolio.

Deductor Credit Scorecard Modeler, like Loginom Scorecard Modeler, is a comprehensive solution that automates the process of building scorecards. It quantifies the risks associated with a client based on hundreds of borrower characteristics and predicts the probability that the loan will be repaid. The product accumulates information from various sources of structured data and supports in-depth analysis of the resulting business information, with visualization and report generation. It is aimed at professional analysts and leading specialists of large and medium-sized organizations, professional agencies and independent specialists, and researchers at universities and research institutes.

GiniMachine is a platform for assessing the creditworthiness of borrowers that uses machine learning algorithms. It automatically builds scoring models from historical customer data and evaluates borrowers according to the criteria that matter to the business. The platform can easily be reconfigured to meet the changing requirements of the bank's business processes, and the depth of historical data used in the analysis can be specified.

The main distinguishing feature of all the AIS mentioned above is the use of machine learning methods and artificial neural networks (ANN).

Given these advantages, the purpose of this article is to study how machine learning methods work in automated information systems (AIS) for the rapid assessment of potential bank customers (borrowers) applying for a loan.

Description of GiniMachine

Developed by the Belarusian company HiEnd Systems (HES) in 2016, the GiniMachine system is a platform for assessing the creditworthiness of borrowers, automatically building analytical models and predicting the probability of customer default.

The main difference from competing AIS is the design of the software package itself: its machine learning algorithms are trained on historical customer data, so a predictive model with high quality indicators is built within minutes. The platform is highly flexible, which makes it possible to test dozens of hypotheses about customers and quickly create effective business models.

The advantages of GiniMachine include:

- ability to evaluate clients with insufficient and / or missing credit history;

- analysis of unstructured customer data;

- the models built by the system's algorithms are stronger and have greater predictive power than competitors' solutions that rely on logistic regression.

Credit scoring on the example of GiniMachine

To build the model, you need to upload at least 1000 records of previously issued loans, each reflecting the final result: the loan was repaid or is overdue (see Fig. 1).

Figure 1. Data on previously issued loans with the status: repaid or overdue

Figure 1 shows the required data, which the client provides as .xls or .csv files. The table consists of columns (attributes, parameters) and rows (records). The training sample should contain only the data that is known at the time of making a decision on the loan application. Each loan must have a separate row. If a borrower has taken more than one loan, there must be one row per loan.

The table cells can contain data of any format (text, numbers, and dates); empty cells are also allowed and indicate that some data is missing.

The table does not need to include customers' personal data (full name, identification numbers, document numbers); if such data is present, the program ignores it.
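The data requirements above can be sketched as a small validation step. This is only an illustration in Python, not GiniMachine's actual import logic: the column names (`full_name`, `passport_number`, `status`), the status values, and the helper `prepare_training_rows` are all hypothetical.

```python
import csv
import io

# Hypothetical column names; a real export will use its own headers.
PERSONAL_COLUMNS = {"full_name", "passport_number"}
TARGET_COLUMN = "status"      # assumed values: "repaid" or "overdue"
MIN_RECORDS = 1000            # minimum sample size stated in the text


def prepare_training_rows(csv_text, min_records=MIN_RECORDS):
    """Read loan records (one row per loan) and drop personal data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        # Personal data is not needed for scoring, so it is discarded.
        cleaned = {k: v for k, v in record.items() if k not in PERSONAL_COLUMNS}
        # Empty attribute cells are allowed, but the loan outcome must be known.
        if cleaned.get(TARGET_COLUMN) not in ("repaid", "overdue"):
            raise ValueError(f"unknown loan status: {cleaned.get(TARGET_COLUMN)!r}")
        rows.append(cleaned)
    if len(rows) < min_records:
        raise ValueError(f"need at least {min_records} loans, got {len(rows)}")
    return rows


# A repeat borrower appears once per loan; the income cell of the first loan is empty.
sample = (
    "full_name,age,income,status\n"
    "A. Ivanov,34,,repaid\n"
    "A. Ivanov,35,52000,overdue\n"
)
rows = prepare_training_rows(sample, min_records=2)
```

The `min_records` parameter is lowered in the example only so that a two-row sample passes; with the default it enforces the 1000-record minimum.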

The next step in building a credit scoring model is that GiniMachine automatically divides the data provided by the client into a training sample and a test sample in a 70/30 ratio. As mentioned above, the training sample contains only data that is known at the time of making a decision on the loan application. A model is then built on the training sample and tested on the test sample. The main output of the resulting model is a score from 0 to 1, which, under certain assumptions, can be interpreted as a prediction of the probability of loan repayment [2, p. 6].
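A 70/30 split of this kind can be sketched in a few lines of Python. The source does not say how GiniMachine chooses which records go into which sample, so the random shuffle, the fixed seed, and the function name `split_train_test` are assumptions made for illustration.

```python
import random


def split_train_test(records, train_share=0.7, seed=42):
    """Shuffle the records and split them into training and test samples."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


loans = list(range(1000))        # stand-ins for 1000 loan records
train, test = split_train_test(loans)
print(len(train), len(test))     # 700 300
```

The model is fitted only on `train`; `test` is held back so that quality indicators are measured on loans the model has not seen.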

A probability expressed as a real number cannot be compared directly with an outcome of the form "loan repaid" or "default occurred" (except when the probability is predicted as 1.0 for good borrowers and 0.0 for bad ones). To assess the quality of the model, we therefore use the Gini index, one of the most effective indicators for evaluating predictive models. The Gini index is closely related to the ROC curve, which shows how many "uncertain" predictions the model makes. For our analytical data, the ROC curve takes the following form (see Figure 2) [2, p. 6].

Figure 2. View of the ROC curve in accordance with the data on previously issued loans

As can be seen from Figure 2, the Gini index is 0.87. According to generally accepted industry standards, a Gini coefficient above 0.6 on the test set means that the predictive quality of the model is rated as "high" [2, p. 7].
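The link between the Gini index and the ROC curve can be made concrete with the standard relation Gini = 2 * AUC - 1, where AUC is the area under the ROC curve. The sketch below computes AUC as the share of (good, bad) borrower pairs that the score ranks correctly; the function names and the toy data are illustrative and are not taken from GiniMachine.

```python
def roc_auc(labels, scores):
    """AUC as the share of (good, bad) pairs ranked correctly; ties count 0.5."""
    good = [s for y, s in zip(labels, scores) if y == 1]   # repaid loans
    bad = [s for y, s in zip(labels, scores) if y == 0]    # overdue loans
    if not good or not bad:
        raise ValueError("both classes are required")
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


def gini_index(labels, scores):
    """Gini index derived from the area under the ROC curve."""
    return 2.0 * roc_auc(labels, scores) - 1.0


labels = [1, 1, 1, 0, 1, 0]                 # 1 = repaid, 0 = overdue
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # predicted repayment probability
print(gini_index(labels, scores))           # 0.75
```

In the toy data, 7 of the 8 good/bad pairs are ordered correctly, so AUC is 0.875 and the Gini index is 0.75. A model that separates the classes perfectly reaches 1.0, and a model that ranks at random is close to 0.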

The score produced by the resulting predictive model, ranging from 0 to 1, can be used, under certain assumptions, as the probability of loan repayment. These probabilities can be used "as is", for example, to calculate the average repayment amount, or as part of a more complex statistical model used to support business decisions [2, p. 9].
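As a worked example of using the scores "as is", the expected repayment can be estimated by weighting each loan amount by its predicted repayment probability. The amounts and scores below are invented for illustration, and the calculation assumes that a loan is either repaid in full or not at all.

```python
def expected_repayment(portfolio):
    """Sum of loan amounts weighted by the predicted repayment probability."""
    return sum(amount * score for amount, score in portfolio)


# (loan amount, model score) pairs; hypothetical values.
portfolio = [(1000.0, 0.5), (2000.0, 0.75), (4000.0, 0.25)]

total = expected_repayment(portfolio)    # 500 + 1500 + 1000 = 3000.0
average = total / len(portfolio)         # 1000.0 per loan
print(total, average)
```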

In practice, predictive models in lending are used to make specific decisions on credit transactions: "approve" or "reject".

The natural way to convert a probability value into such a decision is to set a threshold: all probabilities that exceed the threshold are interpreted as "approve", and all probabilities below it as "reject" [2, p. 10].
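The threshold rule can be written as a one-line function. The default threshold of 0.5 is a hypothetical value, and the source does not say how a score exactly equal to the threshold is handled; this sketch assumes it is rejected.

```python
def decide(score, threshold=0.5):
    """Map a predicted repayment probability to a lending decision."""
    # Assumption: a score exactly equal to the threshold is rejected.
    return "approve" if score > threshold else "reject"


decisions = [decide(s, threshold=0.6) for s in (0.9, 0.6, 0.3)]
print(decisions)    # ['approve', 'reject', 'reject']
```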

The GiniMachine information system lets you select a threshold value interactively by moving a slider. The values of all the statistics listed above, including those in percentage terms, are recalculated dynamically in real time [2, p. 10]. Figure 3 shows an example of the corresponding interface, with an error matrix constructed from the given analytical data [4, p. 2].

Figure 3. Error matrix in accordance with data on previously issued loans
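An error matrix of this kind can be recomputed for any threshold with a short function. This is a sketch of the general technique rather than the GiniMachine implementation; the cell names and the toy data are assumptions.

```python
def error_matrix(labels, scores, threshold):
    """Count decisions at a threshold; label 1 = repaid, 0 = overdue."""
    counts = {"approved_good": 0, "approved_bad": 0,
              "rejected_good": 0, "rejected_bad": 0}
    for label, score in zip(labels, scores):
        decision = "approved" if score > threshold else "rejected"
        quality = "good" if label == 1 else "bad"
        counts[f"{decision}_{quality}"] += 1
    return counts


labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

loose = error_matrix(labels, scores, threshold=0.5)
strict = error_matrix(labels, scores, threshold=0.65)
print(loose)    # one bad borrower is approved
print(strict)   # no bad borrowers are approved
```

Raising the threshold from 0.5 to 0.65 moves the bad borrower with score 0.6 from "approved" to "rejected", while the scores themselves, and therefore any ranking-based quality measure, stay unchanged.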

Although the choice of the threshold value significantly affects what actual decisions will be made by the model, the quality of the model itself, as measured by indicators such as the Gini index, does not depend on the choice of the threshold value [2, p. 11].

Conclusion

In conclusion, modern machine learning algorithms make it possible to achieve a Gini index up to 20% higher than that of competitors' software. Creating a full scoring model takes up to 10 minutes, and calculating a score takes less than 1 second. The full process of building and implementing the model takes about two hours, which makes it possible to process more customer data sets during a standard eight-hour working day in Russia.

In addition, GiniMachine, unlike competing AIS, allows you to rely on historical customer data and to set a specific time interval for it. After building the model, GiniMachine automatically creates a detailed report on its quality, containing all the necessary statistics from the calculations, including the Gini index, a list of parameters, and a set of graphs.

The system allows you to transfer the created scoring model to the cloud or integrate it into your own business solution; the models are ready for calculations immediately after they are built.

References:

1. Jambiev G. A., Shutenko A. A., Tsurikov A. N. Assessment of the creditworthiness of potential bank customers using an artificial neural network. 2020. No. 14 (17). pp. 6-12. URL: https://apni.ru/article/1057-otsenka-kreditosposobnosti-potentsialnikh (accessed: 27.02.2021). - Text: electronic.

2. Instructions for using the GiniMachine information system: file of instructions for use. - Moscow, 2021. - 14 p. - URL: https://my-files.su/Save/usq80m/Инструкция on the use of GM. html. pdf (accessed: 27.02.2021). - Text: electronic.

3. Credit scoring based on artificial intelligence and machine learning in the GiniMachine information system: file of instructions for use. - Moscow, 2021. - 14 p. - URL: https://my-files.su/Save/kyx194/GiniMachine_RU.pdf (accessed: 27.02.2021). - Text: electronic.

4. GiniMachine information system portal (for building scoring models of lending to customers (borrowers)). - URL: https://demo.ginimachine.com/login (accessed: 27.02.2021).

5. Vereskun V. D., Tsurikov A. N. Information and control systems in scientific research and in production: study guide. Rostov-on-Don: FGBOU VO RSUPS, 2016. 76 p.

6. Bogdanov A. L., Dulya I. S. Application of neural networks in solving the credit scoring problem. Vestnik Tomskogo gosudarstvennogo universiteta. Economy. 2018. No. 44. pp. 173-183.

7. Features of consumer lending in the bank: file of instructions for use. - Moscow, 2021. - 21 p. - URL: https://studbooks.net/1227279/bankovskoe_delo/rekomendatsii_sovershenstvovaniyu_kreditovaniya_malogo_srednego_biznesa_bankah (accessed: 27.02.2021). - Text: electronic.
