Expert Systems With Applications 191 (2022) 116236
Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa

Multi-classification assessment of bank personal credit risk based on multi-source information fusion

Tianhui Wang*, Renjing Liu, Guohua Qi
Xi'an Jiaotong University, School of Management, Xi'an 710049, PR China

Keywords: Personal credit risk; Multi-classification assessment; Information fusion; D-S evidence theory

ABSTRACT
There have been many studies of machine learning and data mining algorithms aimed at improving credit risk assessment, but few methods are both universal and efficient. This paper proposes a new multi-classification assessment model of personal credit risk based on information fusion (MIFCA) that combines six machine learning algorithms. The MIFCA model can simultaneously integrate the advantages of multiple classifiers and reduce the interference of uncertain information. To verify the MIFCA model, we use a real data set collected from a commercial bank in China. Experimental results show that the MIFCA model stands out in two respects across various assessment criteria: it achieves higher accuracy in multi-classification assessment, and it is suitable for various risk assessments and is therefore universally applicable. In addition, the results of this research can provide references for banks and other financial institutions to strengthen their risk prevention and control capabilities, improve their credit risk identification capabilities, and avoid financial losses.

1. Introduction

In recent years, with economic development, the concept of advance consumption has become more and more accepted by the public, and people pay more attention to the pursuit of material life.
The increasingly strong demand for personal loans has led to the continuous emergence of various fraudulent means and loan fraud in the lending process, which seriously restricts the development of banks and many other financial institutions. At the same time, in order to attract business, banks relax their control of loan terms and mortgage rates, which artificially increases the potential default risk of loans. Personal credit risk has become the main risk in the operation of commercial banks (Xiao, Xiao, & Wang, 2016). Credit assessment is an effective and key method for banks and other financial institutions to carry out risk management (Ince & Aktan, 2009). It provides appropriate guidance for issuing loans and reduces risk in the financial field. The importance of an efficient and convenient credit assessment model for a bank's current operations and future development is therefore self-evident. Personal credit risk assessment is a classification study of the level of personal credit: it evaluates the credit quality of borrowers according to their relevant characteristic indicators, and financial institutions judge whether to provide financial support to a borrower according to the borrower's repayment ability and repayment credit. Modeling approaches in credit assessment have drawn much scholarly attention. In the early stages of credit assessment, statistical methods were the mainstream modeling approach, such as linear discriminant analysis (LDA) and logistic regression (LR). Durand first proposed the use of discriminant functions for the classification of personal consumption in credit classification and credit risk assessment (Durand, 1941). Hand and Henley (1997) devised a technique to estimate the factors that affect credit score and used logistic regression to predict the credit class of the customer. Abid et al. (2016) found that the LR model outperforms LDA on a Tunisian commercial bank dataset.
LDA and LR are frequently used because of their comprehensibility and easy implementation (Caigny, Coussement, & Bock, 2018). However, statistical methods typically rest on strong assumptions, which limits their performance when these assumptions are violated or when the methods are applied to large datasets. With the continuous growth of demand for credit, machine learning (ML) methods have been introduced to the credit assessment domain (Davis, Edelman, & Gammerman, 1992). In contrast with statistical methods, these intelligent methods do not assume particular data distributions. ML algorithms include decision trees (DT), random forests, support vector machines (SVM), KNN, BP neural networks and so on (Bensic, Sarlija, & Zekic-Susac, 2005; Bahnsen, Aouada, & Ottersten, 2015; Breiman, 2001; Huang, Chen, & Wang, 2007; Henley & Hand, 1996; Desai, Crook, & Overstreet, 1996; West, 2000; Chen & Guestrin, 2016). Several experimental studies have demonstrated the advantages of ML-based methods over statistical ones in many applications. Baesens et al. (2003) applied SVM, along with other classifiers, to several enterprise credit risk datasets; they reported that SVM performs well in comparison with other algorithms but does not always give the best performance. Akkoç (2012) proposed a neuro-fuzzy based system that outperformed commonly used models such as ANN, LDA and LR with respect to estimated misclassification cost and average correct classification rate.

* Corresponding author. E-mail addresses: tianhui_wang2011@163.com (T. Wang), renjingl@mail.xjtu.edu.cn (R. Liu), qiguohua@cmbc.com.cn (G. Qi).
https://doi.org/10.1016/j.eswa.2021.116236
Received 11 April 2021; Received in revised form 13 November 2021; Accepted 13 November 2021; Available online 27 November 2021.
0957-4174/© 2021 Elsevier Ltd. All rights reserved.
Despite their superior performance, no single classifier performs well for all problems; usually each performs well only for certain problems (Xu, Krzyzak, & Suen, 1992; Xia et al., 2018). Numerous studies have found that individual classifiers show only moderate performance when contrasted with ensemble ones (Xu, Krzyzak, & Suen, 1992; Xia et al., 2018). Hsieh et al. (2009) proposed an efficient ensemble classifier of support vector machines, Bayesian networks and neural networks. Another ensemble method for credit scoring, whose classifiers are selected on the basis of their classification performance, was proposed by Feng et al. (2018). Moreover, Xia et al. (2020) developed a tree-based overfitting-cautious heterogeneous ensemble model for credit scoring that assigns weights to base models dynamically according to an overfitting measure. Zhang et al. (2021) proposed a hybrid ensemble model with voting-based outlier detection and balanced sampling to achieve superior predictive power for credit scoring (Zhang, Yang, & Zhang, 2021). Many studies draw the similar conclusion that ensemble models outperform individual models in most cases; ensemble methods have thus become a research hotspot in credit scoring in the past few years (Woźniak et al., 2014). Therefore, we propose a new multi-classification credit assessment model (MIFCA), which fully considers the diversity and complementarity of the basic assessment models and integrates six different basic credit evaluation models with the help of D-S evidence theory. The MIFCA model offers numerous advantages, such as short computation time, faster convergence, robustness to outliers, high efficiency, comparatively little parameter tuning and a more lightweight ensemble framework (Zhang et al., 2019).
The main contributions of this study can be summarized as follows: (1) This paper constructs a new multi-classification credit assessment model, which provides a five-class credit assessment, finer-grained than the two-class assessment used in most studies. (2) This paper innovatively fuses six different types of classifiers by means of D-S evidence theory, making full use of the complementarity of the classifiers to reduce the uncertainty of the MIFCA model, so as to improve the overall accuracy and robustness of the multi-classification assessment to the maximum extent. (3) More importantly, the MIFCA model has fewer parameters to tune, because the base classifiers mostly adopt conventionally-set parameters, which skips the time-consuming process of parameter selection; hence, the efficiency and usability of the model are significantly improved. Above all, MIFCA is superior in efficiency and can be a good practical tool in the big data era. (4) The assessment performance of MIFCA is verified on real data sets, and it is compared with classic statistical approaches, machine learning approaches (including common machine learning and ensemble learning approaches) and advanced approaches on four indicators: accuracy, precision, recall, and F1-score. The empirical results indicate that the proposed MIFCA model performs better than the comparative credit assessment models across different assessment metrics, and represents an efficient and universal credit assessment method that enriches credit assessment research and promotes good fintech practices. The remainder of this paper is structured as follows. Section 2 provides a comprehensive summary of modeling approaches for credit assessment models. Section 3 describes the process of the proposed personal credit risk assessment.
Section 4 describes the application to bank personal credit assessment, including the data description, assessment criteria and data preparation. Section 5 presents the experimental results of the various credit assessment models and compares their performance under several commonly used assessment criteria. Finally, conclusions are provided in Section 6.

2. Theoretical model

In this section, we describe the advantages and disadvantages of the selected base models and introduce their basic principles. We aim to construct an efficient, universal and lightweight multi-classification credit assessment model. The model needs to integrate the advantages of a variety of basic credit assessment models, simplify the time-consuming process of parameter optimization as much as possible, and reduce the redundant information of the fusion model. The core issue at this stage lies in the choice of the components of the multiple classifier system. To meet these research requirements, six base models with different assessment characteristics are selected for the proposed method: DT, RF, SVM, KNN, BP and XGBoost. The reasons for selecting these methods are summarized as follows:

Classical. The decision tree is a classical and commonly used method. Although it is sensitive to missing data and tends to ignore correlations between attributes in the data set, it can produce feasible and effective results for large data sources in a relatively short time.

Popularity. RF is widely considered in current credit scoring studies and is even advocated as a benchmark model replacing LR (Lessmann et al., 2015). Compared with other algorithms, it has the advantage of detecting interactions between features during training, and it handles unbalanced data sets well. However, its classification performance is weaker on small or low-dimensional data.

Robustness and Generalization.
SVM has good robustness and generalization ability and can handle problems such as small samples, high-dimensional feature spaces and nonlinearities. However, like decision trees, it is sensitive to missing data, and there is no universal solution for nonlinear problems.

Simple and Efficient. KNN is simple and effective and is well suited to the automatic classification of class domains with large sample sizes. Its disadvantages are high computational cost, unsuitability for high-dimensional feature spaces, and sensitivity to unbalanced data.

High performance. BP offers high classification accuracy, strong robustness to noise and good fault tolerance. It can fully approximate complex nonlinear relationships and can be quickly adjusted to adapt to new problem settings. Its disadvantages are that a large number of parameters are required and the output results are difficult to interpret.

Efficiency. Computational cost also matters for credit scoring: being able to derive an applicant's credit score and make a decision immediately can be a core competence for financial institutions, so computational cost should be considered in modeling. XGBoost is an ensemble model with good computational performance. First, regularization terms are added to the cost function to reduce the complexity of the model; second, it can be processed in parallel, which greatly reduces the amount of computation; third, it is flexible, supporting user-defined objective and evaluation functions. Its shortcomings include high requirements on data structure and unsuitability for high-dimensional feature data.

Below is a brief introduction to the mechanism of each classifier used in our proposal.
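Before the per-classifier details, the selection above can be made concrete with a small sketch of how the six base models might be instantiated at (mostly) default parameters, as the paper prescribes. This is a sketch under stated assumptions, not the authors' code: `GradientBoostingClassifier` stands in for XGBoost so that only scikit-learn is needed (swap in `xgboost.XGBClassifier` if that library is available), and the toy data merely mimics a five-level credit label.

```python
# Sketch: six base classifiers at (mostly) default parameters.
# GradientBoostingClassifier is a stand-in for XGBoost; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the five-level credit data (the real set has 27,520 records).
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),  # Gaussian kernel, as in Sec. 2.3
    "KNN": KNeighborsClassifier(),                               # default K=5, well under 20
    "BP": MLPClassifier(max_iter=1000, random_state=0),
    "XGB": GradientBoostingClassifier(random_state=0),           # stand-in for XGBoost
}

# Each model's class-probability vector later serves as evidence for D-S fusion.
probas = {}
for name, model in base_models.items():
    model.fit(X_tr, y_tr)
    probas[name] = model.predict_proba(X_te)  # shape (n_test, 5)
```

Keeping the constructors at their defaults mirrors the paper's design choice of skipping expensive hyperparameter tuning in the base layer.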
2.1. Decision tree

Decision Tree (Davis, Edelman, & Gammerman, 1992) is a classic machine learning algorithm and one of the most commonly used classification algorithms. It intuitively shows the rules hidden in the variables of the data through a tree structure. A decision tree consists of a root node, several internal nodes and several leaf nodes, as shown in Fig. 1. Each path from the root node to a leaf node constitutes a rule: the features of the nodes along the path correspond to the conditions of the rule, and the category of the leaf node corresponds to the result of the rule. The algorithm selects the best features recursively and partitions the data according to these features, so that each subset of the data obtains the best classification result. First, all training data are placed in the root node, the best feature is selected, and the training set is divided into several subsets according to that feature, so that each subset is classified as well as possible under the current feature. When a subset can basically be classified correctly, a leaf node is set and the subset is assigned to it; when a subset cannot be classified correctly, another feature is selected to continue the classification and corresponding nodes are set. This is repeated until all data are basically classified correctly or all features have been used. Finally, every record has a corresponding classification, and the decision tree is generated (Bahnsen, Aouada, & Ottersten, 2015; Xia et al., 2017).

Fig. 1. Decision tree structure diagram.

2.2. Random forest

The random forest algorithm was originally proposed by Leo Breiman on the basis of the bagging method (Breiman, 2001). In recent years it has been widely applied to classification and regression problems in many fields. Random forest is an ensemble learning algorithm with decision trees as base classifiers; on top of bagging, it introduces random feature selection at the node splits of each decision tree. Its core idea is to use the voting mechanism of multiple decision trees to complete classification or prediction: in classification, the decision results of the individual trees are taken as votes, and each sample is finally assigned to the category with the most votes. The modeling process of random forest is as follows:

(1) First, use Bootstrap sampling to generate k data sets from the original data set; each data set contains N observations and P independent variables.

(2) Second, construct a CART decision tree on each sampled data set. At each tree node, p features (p < P) are randomly selected from the original P features, and the split feature and split point are selected from the feature subspace composed of these p features. The selection criterion is the maximum decrease in impurity for classification problems, or the maximum decrease in mean square error (MSE) for regression problems. This process is repeated, constructing tree nodes one by one, until a stopping condition is reached.

(3) Finally, for the random forest of k CART trees, the voting method is used for classification, and the category with the most votes is taken as the final result.

2.3. Support vector machine

The support vector machine is a supervised machine learning algorithm first proposed by the Soviet scholar Vladimir N. Vapnik and colleagues (Cortes & Vapnik, 1995). It is a classifier developed from the generalized portrait algorithm in pattern recognition, which was subsequently generalized further (Huang et al., 2007). The linear SVM with hard margin, the nonlinear SVM and the nonlinear SVM with soft margin were established successively (Hens & Tiwari, 2012). The core idea of the algorithm is to use a "hyperplane" formed from support vectors to separate the different classes of data, and to select the optimal hyperplane among the many candidates. The selection is based on distance and is formulated as an objective function of the distances from all sample points to the hyperplane, as shown in Eq. (1):

J(w, b) = \arg\max_{w,b} \min_i (d_i)   (1)

where d_i is the distance from sample point i to a given separating plane, \min_i(d_i) is the minimum distance between all sample points and that plane, and \arg\max_{w,b}\min_i(d_i) finds the hyperplane with the widest "separating band" among all separating planes; w and b are the parameters of the linear separating plane. Assuming the separating plane is expressed as w^T x + b = 0, the distance d_i from a point to the plane is given by Eq. (2):

d_i = \frac{|w^T x_i + b|}{\|w\|}   (2)

where \|w\| is the L2 norm of the vector w, that is, \|w\| = \sqrt{w_1^2 + w_2^2 + \cdots + w_p^2}. Using the Lagrange multiplier method, the original objective function can be converted into the dual problem of Eq. (3):

\min_{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i,
\quad \text{s.t.} \sum_{i=1}^{n} \alpha_i y_i = 0, \ \alpha_i \ge 0   (3)

where (x_i \cdot x_j) is the inner product of two sample points. Minimizing Eq. (3) over the known sample points yields the Lagrange multipliers \alpha_i, from which the parameters w and b of the separating plane w^T x + b = 0 are computed as in Eq. (4):

\hat{w} = \sum_{i=1}^{n} \hat{\alpha}_i y_i x_i, \qquad \hat{b} = y_j - \sum_{i=1}^{n} \hat{\alpha}_i y_i (x_i \cdot x_j)   (4)

In this paper, we treat credit assessment as a nonlinear problem and use a Gaussian kernel to optimize the hyperplane (Ling, Cao, & Zhang, 2012; Harris, 2015).

2.4. K-nearest neighbors

The K-nearest neighbor algorithm, as the name suggests, searches for the K nearest samples of known category to predict the category of an unknown sample. "Nearest" is measured by the distance or similarity between points; in general, the integer K is not greater than 20. KNN is a lazy learning algorithm: it does not actively generalize from the input samples but builds a model only when a new sample needs to be classified. Its classification principle is based on the distance between samples: the smaller the distance (the greater the similarity), the more similar the samples. For a single sample, if the surrounding samples mostly belong to a certain category, the sample has a high probability of belonging to the same category, and a new sample is therefore assigned to the nearest category (Abdelmoula, 2015; Cover & Hart, 1967; Hand & Henley, 1997; Henley & Hand, 1996).

Suppose X and Y are two n-dimensional objects, defined as

X = (X_1, X_2, \cdots, X_n), \qquad Y = (Y_1, Y_2, \cdots, Y_n).

Taking X and Y as examples, three distance measures are mainly used in the KNN algorithm: the Euclidean distance d_1, the Manhattan distance d_2 and the Minkowski distance d_3, written as

d_1 = \sqrt{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + \cdots + (X_n - Y_n)^2} = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2}   (5)

d_2 = \sum_{i=1}^{n} |X_i - Y_i|   (6)

d_3 = \left( \sum_{i=1}^{n} |X_i - Y_i|^p \right)^{1/p}   (7)

The KNN algorithm is easy to understand and simple to implement. Its disadvantages are the large amount of computation and erroneous results when the sample data are unbalanced.

2.5. BP neural network

The BP neural network is a multilayer feedforward neural network proposed by Rumelhart et al. in 1986. The basic idea of the algorithm is to transform the problem of learning an input-output mapping into a nonlinear optimization problem solved by gradient descent: the network weights are modified through iterative operations to minimize the mean square error between the network output and the expected output. The BP neural network is a feedforward network trained by error back-propagation. It has good generalization ability and advantages in pattern recognition and pattern classification (West, 2000). The structure of the BP neural network is shown in Fig. 2.

Fig. 2. BP neural network structure diagram.

2.6. XGBoost

XGBoost uses iterative operations to combine a large number of weak classifiers into a strong classifier and thus achieve accurate classification results (Chen & Guestrin, 2016). XGBoost is a classic boosting method; the purpose of boosting is to construct a strong classifier by integrating many weak classifiers, and XGBoost uses CART regression trees as weak learners. Building on the GBDT algorithm, XGBoost performs a second-order Taylor expansion of the loss function and adds a regularization term, which effectively avoids overfitting and speeds up convergence. The idea of XGBoost is to continuously add new decision trees that fit the residuals of the previous predictions, so that the residuals between the predicted and true values keep decreasing, thereby improving prediction accuracy. As shown in Eq. (8), the prediction can be expressed in additive form:

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F   (8)

where \hat{y}_i is the predicted value of the model, K the number of decision trees, f_k the kth submodel, x_i the ith input sample, and F the set of all decision trees. The objective function and regularization term of XGBoost are given in Eqs. (9) and (10):

L(\phi)^t = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \varphi(f_t)   (9)

\varphi(f_t) = \alpha T + \frac{1}{2} \beta \|\omega\|^2   (10)

where L(\phi)^t is the objective function of the tth iteration, \hat{y}_i^{(t-1)} is the predicted value after the first t-1 iterations, and \varphi(f_t) is the regularization term of the model at the tth iteration, which reduces overfitting; \alpha and \beta are regularization coefficients that prevent the decision trees from becoming too complex, and T is the number of leaf nodes of the model.

The objective function can be expanded by the Taylor formula:

L(\phi)^t \cong \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \alpha T + \frac{1}{2} \beta \sum_{j=1}^{T} \omega_j^2
\cong \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \beta \right) \omega_j^2 \right] + \alpha T   (11)

where g_i and h_i are the first and second derivatives of the loss with respect to sample x_i, \omega_j is the output value of the jth leaf node, and I_j is the subset of samples in the jth leaf node. The objective function is convex; setting its derivative with respect to \omega_j equal to zero yields the minimizer \omega_j^*, as shown in Eq. (12):

\omega_j^* = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \beta}   (12)

3. Construction of the multi-classification assessment model based on multi-source information fusion

3.1. Dempster-Shafer (D-S) evidence theory

Multi-source information fusion, usually abbreviated as information fusion and also known as data fusion, originated in military applications in the 1970s. It merges or integrates information or data from multiple sources at different levels of abstraction, ultimately obtaining more complete, reliable and accurate information or inferences (Shipp & Kuncheva, 2002; Finlay, 2011). Among information fusion techniques (Lee & Chen, 2005; Ling, Cao, & Zhang, 2012), Dempster-Shafer (D-S) evidence theory is widely used because of its superiority in handling uncertain information (Dempster, 1967; Shafer, 1976). D-S evidence theory can deal not only with uncertainty caused by randomness but also with uncertainty caused by fuzziness. A further advantage of the D-S method is that it requires neither prior probabilities nor conditional probability densities, so it can be used completely independently in uncertain assessments (Xu, Krzyzak, & Suen, 1992; Abellán & Mantas, 2014).

In D-S evidence theory, sets are generally used to represent propositions. Let θ denote a set of mutually exclusive and exhaustive elements, called the frame of discernment (recognition frame). The mass function assigns a probability to each subset of the frame, called the basic probability assignment:

m(\emptyset) = 0, \qquad \sum_{A \subset \theta} m(A) = 1, \qquad m(A) \ge 0   (13)

where \emptyset is the empty set and m(A) is the basic probability assigned to A, representing the degree of trust in A; the function m is a mass function. The belief function for any set of elements in the frame is defined as:

Bel(A) = \sum_{B \subset A} m(B), \quad \forall A \subset \theta   (14)

Bel(A) represents the total trust in A, that is, the sum of the masses of all subsets of A. Let m_1 and m_2 be two mass functions defined on the frame θ; the D-S combination rule for the two pieces of evidence is shown in Eq. (15).

Fig. 3. Flow chart of the proposed method.
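The combination rule of Eq. (15) can be sketched in Python for the special case used later in this paper, where each classifier's evidence is a basic probability assignment over the five singleton credit levels, so set intersection reduces to elementwise products. This is an illustrative sketch, not the authors' implementation; the function name and the class labels are ours.

```python
# Sketch of Dempster's combination rule (Eq. (15)) for BPAs concentrated
# on the same singleton hypotheses (the five credit levels).
def ds_combine(m1, m2):
    """Combine two BPAs defined on the same singleton hypotheses."""
    fused = {h: m1[h] * m2[h] for h in m1}  # S1 ∩ S2 = S holds only for equal singletons
    k = 1.0 - sum(fused.values())           # K1: total mass of conflicting pairs
    if k >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {h: v / (1.0 - k) for h, v in fused.items()}  # renormalize by 1 - K1

# Two hypothetical classifier outputs over five credit levels:
m_dt = {"A": 0.5, "B": 0.2, "C": 0.1, "D": 0.1, "E": 0.1}
m_rf = {"A": 0.4, "B": 0.3, "C": 0.1, "D": 0.1, "E": 0.1}

fused = ds_combine(m_dt, m_rf)  # agreement on "A" sharpens its mass
```

Because both sources favor level "A", the fused mass of "A" exceeds either input mass, which is exactly the reinforcing behavior the fusion step relies on.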
Wang et al. Expert Systems With Applications 191 (2022) 116236 ∑ m1 (C1 ) = S1 ∩S2 =S m1 (S1 )m2 (S2 ) 1 − K1 Table 1 Confusion matrix. (15) ∑ Where K1 = S1 ∩S2 =∅ m1 (S1 )m2 (S2 ) < 1, ∀C⊂θ, C ∕ = ∅, m(∅) = 0. Briefly, the rest of fusion calculation can be summarized as follows: ∑ Cn− 2 ∩Sn =Cn− 1 mn− 2 (Cn− 2 )mn (Sn ) mn− 1 (Cn− 1 ) = (16) 1 − K1 ∑ Kn− 1 mn− 2 (Cn− 2 )mn (Sn ) = Predicted position Predicted negative Actually positive Actually negative TP FP FN TN matrix which highlights various terms like True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN) which are further used to define various assessment metrics used in this study. For the multi-classification research of this article, on the basis of two classifications, each category is regarded as “positive” separately, and all other categories are regarded as “negative”. (17) Cn− 2 ∩Sn =∅ 3.2. The proposed multi-classification assessment model The process framework of the newly constructed multi-classification assessment model based on multi-source information fusion in this paper is shown in the Fig. 3. The calculation process is as follows: Step1: Preprocess the raw data set and select feature by Pearson correlation analysis; Step2: The preprocessed data set is divided into training data and testing data. The training data are used as input data into the six base classifiers. Note that the default values or common values are generally used for the basic model parameters, and excessive parameter tuning is not performed to reduce the calculation cost; Step3: Calculate the probability distribution value mi (Si ) of each base model at different classification levels on the training set. Accuracy = TP + TN TP + TN + FP + FN (18) Precision = TP TP + FP (19) Recall = TP TP + FN F1 − score = 2*Precision*Recall Precision + Recall (20) (21) Step4: Gradually fuse the probability distribution value of different classification levels DS evidence theory. 
For example, the probability distribution calculated by the DT model is m1(S1) and the probability distribution calculated by the RF model is m2(S2), where Si = (Si,BPA1, Si,BPA2, ⋯, Si,BPA5) denotes the basic probability assignment of classifier i over the five classification levels. Put m1(S1) and m2(S2) into Eq. (15) for D-S fusion to obtain the fused probability distribution m1(C1), C1 = (C1,BPA1, C1,BPA2, ⋯, C1,BPA5), over the classification levels. Then put m1(C1) and m3(S3) into Eq. (15) for the next round of D-S fusion. In this way, the fusion proceeds step by step, and the classification result is obtained from the final fusion result.

Accuracy is defined in Eq. (18), precision in Eq. (19), recall in Eq. (20), and the F1-score in Eq. (21). Accuracy is the proportion of correct predictions in the total sample, which reflects the overall performance of a classification model. Precision is the proportion of correctly predicted positive samples among all samples predicted positive, which reflects the reliability of the model's outputs. Recall is the proportion of samples correctly predicted positive among those actually positive, reflecting the coverage of the model's classification. The F1-score is the harmonic mean of recall and precision; it is a comprehensive assessment index and a very reliable index for evaluating imbalanced data (Visentini, Snidaro, & Foresti, 2016).

4. Empirical design

4.1. Credit datasets

The data set of this research comes from an anonymous commercial bank in China. All the data are real information drawn from customers' personal loan application records (Wang et al., 2019). The raw data set comprises 27,520 personal loan credit records of the commercial bank and 27 related variables. These variables are: x1: Customer ID; x2: Type of Loan Business; x3: Guarantee the Balance; x4: Account Connection Amount; x5: Security Guarantee Amount; x6: Whether Interest is Owed; x7: Whether Self-service Loan; x8: Type of Guarantee; x9: Safety Coefficient; x10: Collateral Value (yuan); x11: Guarantee Method; x12: Date Code; x13: Approval Deadline; x14: Whether Devalue Account; x15: Five-level Classification; x16: Industry Category; x17: Down Payment Amount; x18: Whether Personal Business Loan; x19: Whether Interest is Owed (regulatory standard); x20: Repayment Type; x21: Installment Repayment Method (numerical type); x22: Installment Repayment Method (discrete type); x23: Installment Repayment Cycle (numerical type); x24: Repayment Cycle (discrete type); x25: Number of Houses; x26: Month Property Costs; x27: Family Monthly Income.

4.2. Assessment criteria

To compare the performance of our proposal and the benchmarks comprehensively, we choose four representative assessment measures based on the confusion matrix: accuracy (ACC), precision (P), recall (R) and F1-score (F). Table 1 illustrates a confusion matrix.

4.3. Data preparation

Data quality is directly related to the quality of subsequent analysis conclusions, and data preprocessing is an important step in improving data quality. In the real world, preliminarily collected bank personal loan information is mostly incomplete, noisy and of low quality, which is not conducive to direct data analysis and mining. Therefore, after obtaining the raw data, it is generally necessary to analyze the data structure in order to eliminate duplicate and redundant variables; second, data cleaning is needed to deal with missing values and outliers; in addition, the rationality of variable selection must be analyzed. The aim of this section is to enhance data reliability through data preprocessing (Coussement et al., 2017).

4.3.1. Preliminary exploration of data structures

According to a preliminary analysis of the raw data set, the personal loan data of the bank in this study contain some valueless and redundant information, so it is necessary to filter and clean the variables of the raw data. Among them, we remove variable x1 because it carries no useful information. Variables x3, x4 and x5 contain redundant information: x3 and x4 contain the same information, and x5 is the product of x4 and x9, so x4 and x5 are deleted. There are two five-level classification attributes, one of which is retained. Variable x18 is unrelated to the characteristic attribute information and is deleted. We remove variables x6, x8 and x12 for the following reasons: x6 and x19 express the same information; x8 and x11 are expressed repeatedly, and the same is true of x12 and x13. We drop variables x20 and x22 because they have extremely high similarity with variable x21. Furthermore, we remove variables x7, x23 and x24 because they take only a single value in the data set. The resulting characteristic variables are shown in Table 2.

4.3.2. Data cleaning

(1) Missing values. The existence of missing values will affect the results of data analysis and mining. Owing to the particularity of the bank personal loan information data set, values may be left vacant because records were omitted, because personal privacy made collection inconvenient, or because of external causes such as failure of the data entry storage device.
Missing values will also affect the subsequent training of the big data algorithms. Therefore, it is necessary to handle them in light of the actual situation of the missing data, optimizing data quality to improve the effectiveness and accuracy of the algorithms. First, any variable with more than 55% of its values missing is removed; this applies to variable x16: Industry Category. Of the remaining variables, there are 1286 missing values for x26: Month Property Costs, 757 for x21: Installment Repayment Method, and 20 for x27: Family Monthly Income. Since these account for a relatively low proportion of the data, the records containing them can be deleted directly, as shown in Table 3.

(2) Outliers. The detection and handling of outliers is another important step of data preprocessing. Real data are complex in composition, and a small number of abnormally large or small values often occur. For algorithms sensitive to outliers, these lead to biased results and reduced effectiveness.
Since the bank personal loan credit data used in this article are non-normally distributed, the box percentile method is selected for outlier detection. For the data set in this paper, variable x26: Month Property Costs and variable x27: Family Monthly Income are screened for outliers. As shown in Table 3, there are 150 records of x26 below 20 yuan or above 20,000 yuan, and 6 records of x27 below 1500 yuan. These outlier records account for a relatively small proportion of the data and can be deleted directly.

Table 2
Bank personal loan credit attribute characteristic variable description.

Variable                            Type      Attribute
x17: Down Payment Amount            numeric   x17 ∈ (0, 7000000)
x3: Guarantee the Balance           numeric   x3 ∈ (0, 99000000)
x10: Collateral Value (yuan)        numeric   x10 ∈ (0, +∞)
x11: Guarantee Method               discrete  x11 = security, mortgage
x21: Installment Repayment Method   numeric   x21 = 1: equal principal and interest; x21 = 2: equal principal
x13: Approval Deadline              numeric   x13 ∈ (1, 11000)
x25: Number of Houses               numeric   x25 ∈ (0, 2)
x26: Month Property Costs           numeric   x26 ∈ (0, 350000)
x27: Family Monthly Income          numeric   x27 ∈ (1, +∞)
x19: Whether Interest is Owed       discrete  x19 = Y: arrears; x19 = N: no interest owed
x14: Whether Devalue Account        discrete  x14 = Y: impairment; x14 = N: no impairment
x9: Safety Coefficient              numeric   x9 = 100, 80, 75, 70, 60, 50
Y: Five-level Classification        discrete  Y = Normal, Secondary, Concern, Suspicious, Loss

Table 3
Pre-processing: the percentage of missing values and outliers.

Variable                            Missing values  Outliers            % of missing values and outliers  Processing method
x16: Industry Category              25,444          –                   92.45%                            Remove variable
x21: Installment Repayment Method   757             –                   2.75%                             Remove values
x26: Month Property Costs           1286            <20 or >20000: 150  5.21%                             Remove values
x27: Family Monthly Income          20              <1500: 6            0.10%                             Remove values

(3) Recoding of discrete variables. As can be seen from Table 1, many variables are discrete. These variables cannot be used directly for data analysis, so the character-type discrete variables must be recoded and converted into dummy variables.
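The cleaning and recoding steps above can be sketched with pandas. The thresholds mirror the text, but the frame and column names below are illustrative stand-ins for the bank data set, which is not public:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop any variable with more than 55% missing values (here x16).
    df = df.loc[:, df.isna().mean() <= 0.55]
    # Drop the remaining records that contain missing values.
    df = df.dropna()
    # Remove outlier records using the fixed bounds from Table 3.
    df = df[(df["x26"] >= 20) & (df["x26"] <= 20000)]
    df = df[df["x27"] >= 1500]
    # Recode character-type discrete variables into dummy variables.
    return pd.get_dummies(df, columns=["x11", "x19", "x14"])
```

The order matters: the high-missing column is dropped before row-wise deletion so that its vacancies do not wipe out otherwise complete records.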
As shown in Table 1, for example, x11: Guarantee Method, x19: Whether Interest is Owed, x14: Whether Devalue Account and Y: Five-level Classification are recoded.

4.3.3. Variable correlation analysis

Table 4 shows the Pearson correlation coefficients between the variables x and Y. It can be seen that in the real data set adopted in this paper, most variables show low to medium correlation with Y, and there are no unreasonable highly correlated variables. Since the data structure in this paper is nonlinear, even variables with small correlation coefficients carry important information; they may have obvious effects when acting together, so deleting them would be unreasonable. After correlation analysis, we consider the above variables reasonable (Chen and Li, 2010).

4.3.4. Processing of imbalanced data sets

In practice, commercial banks generally have more loan customers with good credit than with bad credit. In this data set, the category with a normal credit rating accounts for up to 96% of the records, a serious imbalance. An imbalanced personal loan data set biases classification results towards the majority categories, which seriously affects the accuracy of classification predictions (Fernández, Jesus, & Herrera, 2010; Du et al., 2020). Therefore, the cleaned data must be rebalanced. This paper adopts the SMOTE algorithm proposed by Chawla in 2002, an improved technique based on random oversampling (Junior et al., 2020). It is currently a common method for processing imbalanced data and is widely recognized in the academic literature (Yeh and Lien, 2009).
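A minimal, self-contained sketch of SMOTE-style oversampling follows (numpy only; the paper relies on an existing SMOTE implementation and does not state its configuration, and the data here are synthetic):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    # Minimal SMOTE sketch: for each new sample, pick a random minority
    # point, pick one of its k nearest minority neighbours, and create
    # a synthetic point by linear interpolation between the two.
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# 96%/4% imbalance, mimicking the cleaned loan data set
rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(960, 4))
X_minority = rng.normal(3.0, 1.0, size=(40, 4))
X_new = smote_oversample(X_minority, n_new=920)  # balance the classes
```

Because synthetic points are interpolations between real minority samples, they stay inside the minority region rather than duplicating existing records as plain random oversampling would.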
The core idea of the algorithm is to analyze the minority-class samples and add new artificially synthesized samples to the data set, so that the categories in the raw data are no longer seriously out of balance.

Table 4
Pearson correlation coefficient of variable x and Y.

Variable  x17   x3     x10   x11   x21    x13   x25    x26    x27   x19   x14   x9
r         0.16  0.244  0.08  0.31  −0.34  0.37  −0.39  0.452  0.59  0.64  0.67  0.11

5. Experimental results

In this section, experimental results are presented to validate the advantages of the proposed model over comparative classifiers and to demonstrate its effectiveness. All experiments used Python 3.7 on a PC with a 3.0 GHz Intel Core i7 processor and 32 GB of RAM, running the Microsoft Windows 10 operating system. The comparison of the various classifiers with our proposed approach is reported in Tables 5–7 and Figs. 4–12 in terms of accuracy, precision, recall, F1-score and confusion matrix heat maps. Section 5.1 compares the results of the six base classifiers. Section 5.2 compares our method with other classical methods (Tripathi et al., 2021).

5.1. Classification results of base classifiers

This section assesses the six base classifiers, viz. Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), BP Neural Network (BP) and XGboost, in their ability to assess credit classification. In Section 5.1.1, we compare the assessment performance of the models on the five classification levels through accuracy, recall and F1-score. In Section 5.1.2, the overall classification performance of the models is measured from the confusion matrix heat maps and overall accuracy.

5.1.1. Assessment performance of the model at each classification level

The accuracy, recall and F1-score results of each model under the five credit classifications are shown in Table 5 and Figs. 4–8. The figures and tables clearly show the classification performance of the MIFCA model constructed in this paper and of the six base models under the five credit classifications.

In credit Normal Category, as shown in Fig. 4, the XGboost model has the highest accuracy at 87.69%, followed by the SVM model (86.64%) and the MIFCA model (85.92%), and the KNN model (62.29%) is the worst. The BP neural network model has the highest recall at 91.34%, followed by the RF model (87.68%) and the XGboost model (86.17%); the worst is the KNN model (76.32%). The highest F1-score is the BP neural network model's (87.00%), followed by the XGboost model (86.92%) and the MIFCA model (86.91%); the lowest is the KNN model's (68.59%). It can be seen that, in the classification assessment of credit Normal Category, the BP neural network shows excellent classification coverage and strong reliability on the overall imbalanced data, followed by the XGboost model, whose classification outputs are highly reliable. Next is the MIFCA model: although its recall for Normal Category is not very high, its comprehensive F1-score is second only to XGboost. The worst classifier here is the KNN model, whose assessment index values are uniformly low.

In credit Secondary Category, as shown in Fig. 5, the accuracy of the MIFCA model is the highest (87.15%), followed by the BP neural network (84.89%), the KNN model (83.38%) and the DT model (56.17%). The MIFCA model has the highest recall (88.27%), followed by the DT model (82.29%) and the KNN model (77.70%). The highest F1-score is the MIFCA model's (87.71%), followed by the KNN model (80.44%) and the DT model (66.77%). In the classification assessment of credit Secondary Category, the MIFCA model has the best assessment performance. The second is the KNN model, which has higher accuracy but comparatively low recall. The recall of the DT model is 82.29%, second only to the MIFCA model (88.27%), but its classification accuracy is the lowest, giving it the lowest comprehensive F1-score.

In credit Concern Category, as shown in Fig. 6, the BP neural network has the highest accuracy at 89.22%, followed by the RF model (86.27%) and the MIFCA model (84.80%), with the KNN model (75.25%) the worst. The SVM model has the highest recall (89.91%), followed by the XGboost model (89.63%) and the MIFCA model (83.78%), with the KNN model (72.07%) the worst. The highest F1-score is the MIFCA model's (86.69%), followed by the BP neural network (86.15%) and the XGboost model (85.97%), with the KNN model (73.62%) the worst. Judging by the comprehensive assessment indices, the best overall performer is the MIFCA model constructed in this paper. Among the other models, the BP neural network has the highest accuracy, but its recall is much lower than the SVM model's; the SVM model has the highest recall, but its accuracy is only higher than that of the worst model, KNN. On a comprehensive assessment, the MIFCA model constructed in this paper has better stability and overall performance in terms of accuracy, recall and F1-score.

Table 5
Accuracy, precision, recall and F1-score of six base classifiers and MIFCA model under five credit classifications.
Model     ACC     C1: P/R/F             C2: P/R/F             C3: P/R/F             C4: P/R/F             C5: P/R/F
DT        0.7295  0.7876/0.8639/0.8240  0.5617/0.8229/0.6677  0.8407/0.7940/0.8167  0.6319/0.7356/0.6798  0.8168/0.5478/0.6558
RF        0.7860  0.8329/0.8769/0.8543  0.7204/0.8171/0.7657  0.8627/0.8341/0.8482  0.7493/0.7885/0.7684  0.7583/0.6395/0.6938
SVM       0.7615  0.8665/0.8188/0.8419  0.6801/0.7965/0.7337  0.7647/0.8991/0.8265  0.7650/0.6526/0.7043  0.6718/0.6667/0.6692
KNN       0.7415  0.6229/0.7632/0.6859  0.8338/0.7770/0.8044  0.7525/0.7207/0.7362  0.7702/0.7126/0.7403  0.7354/0.7372/0.7363
BP        0.7830  0.8305/0.9134/0.8700  0.8489/0.7294/0.7846  0.8922/0.8330/0.8615  0.7311/0.7568/0.7437  0.6031/0.6771/0.6380
XGboost   0.8025  0.8769/0.8617/0.8692  0.8111/0.7951/0.8030  0.8260/0.8963/0.8597  0.7676/0.7313/0.7490  0.6921/0.7234/0.7074
MIFCA     0.8415  0.8592/0.7792/0.8691  0.8715/0.8827/0.8771  0.8980/0.8378/0.8669  0.8225/0.8726/0.8468  0.8215/0.8468/0.8340

Note: DT–XGboost are the base models and MIFCA is the proposed model. C1 represents normal category; C2 represents secondary category; C3 represents concern category; C4 represents suspicious category; C5 represents loss category.

Table 6
The overall accuracy of the personal credit assessment models.

Model                      DT      RF      SVM     KNN     BP      XGboost  MIFCA
Accuracy                   72.95%  78.60%  76.15%  74.15%  78.30%  80.25%   84.15%
Accuracy improvement rate  15.28%  6.99%   10.44%  13.42%  7.41%   4.79%    —

Table 7
Comparison of the performance achieved by various approaches of credit assessment under five credit classifications.
Model     ACC     C1: P/R/F             C2: P/R/F             C3: P/R/F             C4: P/R/F             C5: P/R/F
LDA       0.6125  0.7477/0.8074/0.7764  0.1428/0.9006/0.2464  0.7987/0.7630/0.7804  0.7020/0.5086/0.5898  0.6755/0.4521/0.5416
LR        0.6321  0.6399/0.8668/0.7363  0.3533/0.8162/0.4931  0.8755/0.7117/0.7852  0.6694/0.5280/0.5904  0.6237/0.4681/0.5348
LightGBM  0.8034  0.8641/0.8452/0.8545  0.8547/0.8080/0.8307  0.8216/0.8608/0.8408  0.7928/0.7166/0.7528  0.6837/0.7966/0.7359
CatBoost  0.8085  0.8963/0.8398/0.8671  0.8520/0.8110/0.8310  0.8091/0.8890/0.8472  0.8008/0.7168/0.7565  0.6843/0.8006/0.7379
PLTR      0.7923  0.8340/0.8828/0.8577  0.7228/0.8484/0.7806  0.8782/0.8435/0.8605  0.7601/0.7801/0.7699  0.7670/0.6452/0.7008
OCHE      0.8234  0.8348/0.8797/0.8566  0.8082/0.8901/0.8471  0.8813/0.8447/0.8626  0.7883/0.7946/0.7914  0.8045/0.7260/0.7633
MIFCA     0.8415  0.8592/0.7792/0.8691  0.8715/0.8827/0.8771  0.8980/0.8378/0.8669  0.8225/0.8726/0.8468  0.8215/0.8468/0.8340

Note: LDA and LR are statistical models; LightGBM and CatBoost are ensemble models; PLTR and OCHE are advanced methods; MIFCA is the proposed model.

Fig. 4. The assessment classification performance of each model on credit Normal Category. From left to right in the figure are the accuracy, recall and F1-score of each credit assessment model on credit Normal Category.

Fig. 5. The assessment classification performance of each model on credit Secondary Category. From left to right in the figure are the accuracy, recall and F1-score of each credit assessment model on credit Secondary Category.

Fig. 6. The assessment classification performance of each model on credit Concern Category. From left to right in the figure are the accuracy, recall and F1-score of each credit assessment model on credit Concern Category.

Fig. 7. The assessment classification performance of each model on credit Suspicious Category. From left to right in the figure are the accuracy, recall and F1-score of each credit assessment model on credit Suspicious Category.

Fig. 8.
The assessment classification performance of each model on credit Loss Category. From left to right in the figure are the accuracy, recall and F1-score of each credit assessment model on credit Loss Category.

In credit Suspicious Category, as shown in Fig. 7, the highest accuracy is the MIFCA model's (82.25%), followed by the KNN model (77.02%) and the XGboost model (76.76%), with the DT model (63.19%) the worst. The highest recall is the MIFCA model's (87.26%), followed by the RF model (78.85%) and the BP neural network model (75.68%); the worst recall is the SVM model's (65.26%). The highest F1-score is the MIFCA model's (84.68%), followed by the RF model (76.84%) and the XGboost model (74.37%). In the assessment classification of credit Suspicious Category, the assessment performance indices of the MIFCA model are all the highest, while the classification effect of the DT model is relatively poor compared with the other models.

In credit Loss Category, as shown in Fig. 8, the DT model has the highest accuracy at 81.68%, followed by the MIFCA model (80.15%) and the RF model (75.83%), with the BP neural network model (60.31%) the worst. The highest recall is the MIFCA model's, followed by the KNN model (73.72%) and the XGboost model (72.34%), with the DT model (54.78%) the worst. The highest F1-score is the MIFCA model's (83.40%), followed by the KNN model's (73.63%); the lowest is the BP neural network model's (63.80%). In credit Loss Category, based on the assessment index values, the overall performance of the MIFCA model constructed in this paper is the best, the KNN model is second only to MIFCA, and the BP neural network model has the worst assessment performance.

The above analysis shows that, across the five credit classifications, the DT and KNN models most frequently record the lowest assessment index values among the compared models.

Fig. 9.
Confusion matrix heat maps of the six base models.

Fig. 10. Confusion matrix heat map of the MIFCA model.

The BP neural network model has the highest recall in Normal Category and the highest accuracy in Concern Category, but its recall in Secondary Category and its accuracy in Loss Category are the lowest among the models. The RF and XGboost models are better than the DT, KNN, SVM and BP neural network models in classification performance across the five credit categories.

5.1.2. Overall assessment performance of the model

For the overall assessment performance, the confusion matrix results and the overall accuracy of each model are used in this part. A confusion matrix heat map is used to visualize the confusion matrix of each model: the deeper the color of a square on the diagonal, the higher the classification accuracy for that category. As shown in Figs. 9–10, in the confusion matrix heat map of the DT model, the diagonal square is darkest in C3 and lighter in C2 and C4, indicating that the DT model classifies C3 best and C2 and C4 poorly. Similarly, the classification performance of the RF model in C2 is slightly weaker than in the other categories; the SVM model performs poorly in C5; the KNN model performs worst in C2 and C5; and the BP and XGboost models perform poorly in C5. The confusion matrix heat map of MIFCA shows that the proposed model has good classification performance in all five categories, and its overall classification effect is better than that of the six base models.

Fig. 11. Overall accuracy of credit assessment models.
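The confusion-matrix evaluation behind these heat maps can be sketched in plain numpy; the heat maps in Figs. 9–10 are simply renderings of this matrix. The five-level labels and predictions below are synthetic placeholders, not the paper's data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    # cm[i, j] = number of samples of true class i predicted as class j;
    # the diagonal holds the correctly classified counts that the heat
    # maps render as dark squares.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    # Per-class precision, recall and F1-score, as in Eqs. (19)-(21).
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # correct / predicted as that class
    recall = tp / cm.sum(axis=1)      # correct / actually that class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy five-level example (C1..C5 encoded as 0..4)
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 1, 1, 1, 2, 2, 3, 4, 4, 4]
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()    # Eq. (18): overall accuracy
```

Off-diagonal entries show which categories a model confuses, which is exactly the per-category weakness the text reads off the heat maps.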
The assessment performance of the SVM model in Normal Category is second only to the BP neural network model, and its recall in Concern Category is the highest, but its recall in Suspicious Category is the lowest.

Fig. 12. Confusion matrix heat maps of the other credit assessment models.

This greatly remedies the weakness of the base models in the classification performance of some categories. The overall accuracy of each model is shown in Table 6 and Fig. 11. It can be seen that the overall assessment accuracy of the MIFCA model constructed in this paper reaches 84.15%. Compared with the multi-classification personal credit assessment models built on decision tree, random forest, support vector machine, K-nearest neighbors, BP neural network and XGboost, the accuracy is significantly improved (Wang, Hao, Ma, & Jiang, 2011). Among them, the accuracy over the decision tree model is improved the most, by 15.28%, and even over the XGboost model, the best-performing base model, the accuracy is improved by 4.79%.

In summary, when the base classifiers perform multi-classification credit assessment, each classification algorithm performs differently, producing different levels of assessment error in different categories, so the base classifiers are complementary. The MIFCA model constructed in this paper makes full use of this complementarity among multiple sources of information, fuses the redundant information of the classifiers, and reduces the overall uncertainty of the model, thereby improving the classification accuracy.
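The evidence-fusion step behind this uncertainty reduction can be sketched as follows. This is a simplified reading of Dempster's combination rule (the paper's Eq. (15) is not reproduced here), assuming each classifier assigns mass only to the five singleton classes:

```python
import numpy as np

def dempster_combine(m1, m2):
    # Combine two basic probability assignments over the five credit
    # levels. Mass on agreeing classes is kept; conflicting mass K is
    # normalized away, which is how fusion reduces uncertainty.
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    agreement = m1 * m2                 # mass where both sources agree
    if agreement.sum() == 0.0:
        raise ValueError("total conflict: evidence cannot be combined")
    k = 1.0 - agreement.sum()           # conflict between the sources
    return agreement / (1.0 - k)

def fuse_classifiers(bpas):
    # Stepwise fusion as described in Section 3: fuse the first two
    # classifiers, then fuse that result with the third, and so on.
    fused = np.asarray(bpas[0], float)
    for m in bpas[1:]:
        fused = dempster_combine(fused, m)
    return fused

# Three classifiers that lean towards class C1 with different confidence
bpas = [[0.5, 0.2, 0.1, 0.1, 0.1],
        [0.4, 0.3, 0.1, 0.1, 0.1],
        [0.6, 0.1, 0.1, 0.1, 0.1]]
fused = fuse_classifiers(bpas)   # fused mass concentrates on C1
```

Even though no single classifier is very confident, the fused assignment places most of the mass on the class the sources agree on, illustrating why the combined model can beat each base classifier.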
More importantly, the MIFCA model fully integrates the excellent performance of multiple classifiers and is more universal across different types of data structures.

5.2. Comparison of MIFCA with other approaches of credit assessment

We select six classic and commonly used credit assessment models from traditional statistical methods, ensemble methods and advanced methods to compare and analyze the assessment results. Among them, LDA and LR are the two most common statistical benchmark models in credit assessment (Dumitrescu et al., 2021); LightGBM and CatBoost are classic ensemble credit assessment methods; and PLTR and OCHE are two recently proposed credit assessment methods.

In previous studies, several typical classification models have been applied, such as LDA (Fisher, 1936) and LR (Hand & Kelly, 2002). LDA, as a classification approach, finds a linear combination of features that characterizes the classes of objects. LR, a widely used statistical modeling technique, builds a model with a classification outcome and has proven to be a powerful algorithm (Lee et al., 2006). LightGBM is an advanced GBDT-based algorithm developed by Ke et al. (2018); experiments have shown that it is superior to the original GBDT while consuming less computation, and Ke et al. even demonstrated that LightGBM outperforms XGBoost on a variety of datasets. CatBoost, developed by Prokhorenkova, Gusev, Vorobev, Dorogush, and Gulin (2018), is a powerful open-source GBDT-based technique that achieves promising results in a variety of ML tasks; its authors claimed that it outperforms existing GBDT techniques. OCHE is a novel credit scoring model based on selective heterogeneous ensembles, developed by Xia et al. (2020); it employs advanced tree-based classifiers as base models and fully considers overfitting in the ensemble selection stage. PLTR is a high-performance and interpretable credit scoring method proposed by Dumitrescu et al. (2021), which uses information from decision trees to improve the performance of logistic regression; it captures non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model (Dumitrescu et al., 2021).

Table 7 gives a comparative study between these classical approaches and our proposed MIFCA model on the datasets obtained in this paper. As can be seen from Table 7 and Fig. 12, the two traditional statistical credit assessment methods perform poorly, which is consistent with the conclusions of existing studies. The two ensemble methods obtain relatively similar results, with relatively poor classification of the C5 category; evidently, the ensemble methods outperform general machine learning algorithms. As for the latest credit assessment methods, the performance of PLTR is similar to RF, with poor classification in the C2, C4 and C5 categories. Although PLTR's evaluation results are weaker than MIFCA's, its interpretability is better than that of the machine learning algorithms. The credit assessment performance of OCHE is slightly weaker than MIFCA's; on the whole its performance across the five classifications is relatively balanced, but the overall classification accuracy of MIFCA is better, and the structure of the MIFCA model is more concise than that of OCHE. The above experimental analysis shows that the credit assessment model constructed in this paper performs well both overall and in each category.

6. Conclusion

Personal credit assessment is particularly important for banks and other financial institutions: every 1% increase in assessment accuracy can spare banks and other financial institutions huge losses. Numerous classification methods have been proposed to handle the problem of credit scoring. However, compared with the many two-class credit assessment studies, there are few studies on universal multi-classification credit assessment. This paper presents a new multi-classification credit assessment model (MIFCA). The MIFCA model takes full account of the complementarity of the basic assessment models and reduces model uncertainty by virtue of D-S evidence theory, thereby improving the accuracy and robustness of the classifier. More importantly, MIFCA is a lightweight credit evaluation model with few parameters and low computational cost, which meets the requirements of efficiency and universality. We further conducted several experiments to verify its performance. MIFCA and the benchmarks are evaluated on four metrics using a real-world personal credit risk data set from a commercial bank in China. Regarding predictive accuracy, the results demonstrate that the proposed model significantly outperforms most of the benchmark models. Both at the five credit rating levels and at the overall level, the MIFCA model shows high assessment accuracy. It can be seen that the MIFCA model can fully integrate the excellent performance of a variety of classifiers, is suitable for different credit risk data, and has broad practical value. It can provide an important reference for the credit risk control of commercial banks.

CRediT authorship contribution statement

Tianhui Wang: Conceptualization, Methodology, Software, Validation, Writing – original draft, Writing – review & editing. Renjing Liu: Supervision, Writing – review & editing. Guohua Qi: Data curation, Investigation.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was partially supported by the Major Project of the National Social Science Fund of China (18ZDA104).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eswa.2021.116236.

References

Hens, A. B., & Tiwari, M. K. (2012). Computational time reduction for credit scoring: An integrated approach based on support vector machine and stratified sampling method. Expert Systems with Applications, 39(8), 6774–6781.

Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society, 160(3), 523–541.

Harris, T. (2015). Credit scoring using the clustered support vector machine. Expert Systems with Applications, 42, 741–750.

Henley, W. E., & Hand, D. J. (1996). A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, 45(1), 77–95.

Ince, H., & Aktan, B. (2009). A comparison of data mining techniques for credit scoring in banking: A managerial perspective. Journal of Business Economics and Management, 10, 233–240.

Junior, L. M., Nardini, F. M., Renso, C., Trani, R., & Macedo, J. A. (2020). A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems. Expert Systems with Applications, 152, 113351.

Lessmann, S., Baesens, B., Seow, H.-V., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247, 124–136.

Ling, Y., Cao, Q., & Zhang, H. (2012). Credit scoring using multi-kernel support vector machine and chaos particle swarm optimization.
International Journal of Computational Intelligence & Applications, 11(3), 6774–7142.

Lee, T.-S., Chiu, C.-C., Chou, Y.-C., & Lu, C.-J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis, 50, 1113–1130.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., & Liu, T.-Y. (2018). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3149–3157).

Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications, 28(4), 743–752.

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (pp. 6638–6648).

Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.

Shipp, C. A., & Kuncheva, L. I. (2002). Relationships between combination methods and measures of diversity in combining classifiers. Information Fusion, 3, 135–148.

Tripathi, D., Edla, D. R., Bablani, A., Shukla, A. K., & Reddy, B. R. (2021). Experimental analysis of machine learning methods for credit score classification. Progress in Artificial Intelligence, 1–27.

Visentini, I., Snidaro, L., & Foresti, G. L. (2016). Diversity-aware classifier ensemble selection via f-score. Information Fusion, 28, 24–43.

Woźniak, M., Graña, M., & Corchado, E. (2014). A survey of multiple classifier systems as hybrid systems. Information Fusion, 16, 3–17.

Wang, G., Hao, J., Ma, J., & Jiang, H. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38, 223–230.

Wang, B., Kong, Y., Zhang, Y., Liu, D., & Ning, L. (2019). Integration of unsupervised and supervised machine learning algorithms for credit risk assessment.
Expert Systems with Applications, 128, 301–315.

West, D. (2000). Neural network credit scoring models. Computers & Operations Research, 27(11–12), 1131–1152.

Xu, L., Krzyzak, A., & Suen, C. Y. (1992). Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems, Man, and Cybernetics, 22(3), 418–435.

Xia, Y., Zhao, J., He, L., Li, Y., & Niu, M. (2020). A novel tree-based dynamic heterogeneous ensemble method for credit scoring. Expert Systems with Applications, 159, 113615. https://doi.org/10.1016/j.eswa.2020.113615

Xiao, H., Xiao, Z., & Wang, Y. (2016). Ensemble classification based on supervised clustering for credit scoring. Applied Soft Computing, 43, 73–86.

Xia, Y., Liu, C., Li, Y., & Liu, N. (2017). A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. Expert Systems with Applications, 78, 225–241.

Xia, Y., Liu, C., Da, B., & Xie, F. (2018). A novel heterogeneous ensemble credit scoring model based on bstacking approach. Expert Systems with Applications, 93, 182–199.

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473–2480.

Zhang, W., He, H., & Zhang, S. (2019). A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring. Expert Systems with Applications, 121, 221–232.

Zhang, W., Yang, D., & Zhang, S. (2021). A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring. Expert Systems with Applications, 174, 114744.

Abid, L., Masmoudi, A., & Zouari-Ghorbel, S. (2016). The consumer loan's payment default predictive model: An application of the logistic regression and the discriminant analysis in a Tunisian commercial bank. Journal of the Knowledge Economy, 9(3), 948–962.

Abdelmoula, A. K. (2015).
Bank credit risk analysis with k-nearest-neighbor classifier: Case of tunisian banks. Journal of Accounting & Management Information Systems, 14 (1), 79–106. Abellán, J., & Mantas, C. J. (2014). Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Systems with Applications, 41(8), 3825–3830. Akko, S. (2012). An empirical comparison of conventional techniques, neural networks and the three-stage hybrid adaptive neuro fuzzy inference system (anfis) model for credit scoring analysis: The case of turkish credit card data. European Journal of Operational Research, 222(1), 168–178. Bahnsen, A. C., Aouada, D., & Ottersten, B. (2015). Example-dependent cost-sensitive decision trees. Expert Systems with Applications, 42(19), 6609–6619. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. Bensic, M., Sarlija, N., & Zekic-Susac, M. (2005). Modelling small-business credit scoring by using logistic regression, neural networks and decision trees. Intelligent Systems in Accounting, Finance & Management: International Journal, 13, 133–150. Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., & Vanthienen, J. S. (2003). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6), 627–635. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297. Coussement, K., Lessman, S., & Verstraeten, G. (2017). A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication. Decision Support Systems, 95, 27–36. Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery & Data Mining (pp. 785–794). ACM. Caigny, A. D., Coussement, K., & Bock, K. W. D. (2018). 
A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research, 269(2), 760–772. Chen, F. L., & Li, F. C. (2010). Combination of feature selection approaches with SVM in credit scoring. Expert Systems with Applications, 37(7), 4902–4909. Dumitrescu, E., Hué, S., Hurlin, C., et al. (2021). Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2021.06.053 Du, G., Zhang, J., Luo, Z., Ma, F., & Li, S. (2020). Joint imbalanced classification and feature selection for hospital readmissions. Knowledge-Based Systems, 200, 106020. Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21–27. Davis, R. H., Edelman, D. B., & Gammerman, A. J. (1992). Machine-learning algorithms for credit-card applications. IMA Journal of Management Mathematics, 4(1), 43–51. Durand, D. (1941). Risk Elements in consumer Installment financing. New York: National Bureau of Economic Research. Desai, V. S., Crook, J. N., & Overstreet, G. A., Jr (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1), 24–37. Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, 38(2), 325–339. Feng, Xiaodong, Xiao, Zhi, Zhong, Bo, et al. (2018). Dynamic ensemble classification for credit scoring using soft probability. Applied Soft Computing, 65, 139–151. Fernández, M. J. Jesus, F. Herrera. (2010). On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced datasets. Information Sciences, 180(8), 1268-1291. Finlay, S. (2011). Multiple classifier architectures and their application to credit risk assessment. 
European Journal of Operational Research, 210, 368–378. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2), 179–188. Hsieh, N. C., Hung, L. P., & Ho, C. L. (2009). A data driven ensemble classifier for credit scoring analysis. Expert Systems with Applications, 37(1), 534–545. Hand, D. J., & Kelly, M. G. (2002). Superscorecards. IMA Journal of Management Mathematics, 13(4), 273–281. Huang, C.-L., Chen, M.-C., & Wang, C.-J. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications, 33, 847–856. 15