Expert Systems With Applications 191 (2022) 116236
Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Multi-classification assessment of bank personal credit risk based on
multi-source information fusion
Tianhui Wang *,1, Renjing Liu, Guohua Qi
Xi’an Jiaotong University, School of Management, Xi’an 710049, PR China
ARTICLE INFO
Keywords:
Personal credit risk
Multi-classification assessment
Information fusion
D-S evidence theory
ABSTRACT
Many studies have applied machine learning and data mining algorithms to
improve credit risk assessment. However, few methods are both universal
and efficient. This paper proposes a new multi-classification assessment
model of personal credit risk based on the theory of information fusion
(MIFCA), built on six machine learning algorithms. The MIFCA model can
simultaneously integrate the advantages of multiple classifiers and reduce
the interference of uncertain information. To verify the MIFCA model, we
use a real data set collected from a commercial bank in China. Experimental
results show that the MIFCA model stands out in two respects across various
assessment criteria. One is that it has higher accuracy for
multi-classification assessment, and the other is that it is suitable for
various risk assessments and has universal applicability. In addition, the
results of this research can provide references for banks and other
financial institutions to strengthen their risk prevention and control
capabilities, improve their credit risk identification capabilities, and
avoid financial losses.
1. Introduction
In recent years, with the development of the economy, the concept of
advance consumption has become increasingly accepted by the public,
and people pay more attention to the pursuit of material life. The
increasingly strong demand for personal loans has led to the continuous
emergence of various forms of loan fraud in the lending process, which
seriously restricts the development of banks and many other financial
institutions. At the same time, in order to win business, banks relax
their control over loan terms and mortgage rates, which artificially
increases the potential default risk of loans. Personal credit risk has
become the main risk in the operation of commercial banks (Xiao, Xiao,
& Wang, 2016). Credit assessment is an effective and key method for
banks and other financial institutions to carry out risk management
(Ince & Aktan, 2009). It provides appropriate guidance for issuing
loans and reduces risk in the financial field. Therefore, the
importance of an efficient and convenient credit assessment model to a
bank's current operations and future development is self-evident.
Personal credit risk assessment is a classification study of the level
of personal credit. It evaluates the credit quality of borrowers
according to their relevant characteristic indicators. Financial
institutions decide whether to provide financial support to a borrower
according to the borrower's repayment ability and creditworthiness.
Modeling approaches in credit assessment have drawn much scholarly
attention.
In the early stages of credit assessment, statistical methods such as
linear discriminant analysis (LDA) and logistic regression (LR) were the
mainstream modeling approach. Durand first proposed the use of
discriminant functions for credit classification and credit risk
assessment in personal consumer lending (Durand, 1941). Hand and Henley
(1997) devised a technique to estimate the factors that affect credit
score and used logistic regression to predict the credit class of the
customer. Abid et al. (2016) found that the LR model outperforms LDA on
a Tunisian commercial bank dataset. LDA and LR are frequently used
because of their comprehensibility and easy implementation (Caigny,
Coussement, & Bock, 2018). However, statistical methods typically rely
on strong assumptions, which limits their performance when these
assumptions are violated or when they are applied to large datasets.
With the continuous growth of people's demand for credit, machine
learning (ML) methods have been introduced into the credit assessment
domain (Davis, Edelman, & Gammerman, 1992). In contrast with
statistical methods, intelligent methods do not assume certain data
* Corresponding author.
E-mail addresses: tianhui_wang2011@163.com (T. Wang), renjingl@mail.xjtu.edu.cn (R. Liu), qiguohua@cmbc.com.cn (G. Qi).
1 https://orcid.org/0000-0003-1984-5102
https://doi.org/10.1016/j.eswa.2021.116236
Received 11 April 2021; Received in revised form 13 November 2021; Accepted 13 November 2021
Available online 27 November 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.
distributions. ML algorithms mainly include decision tree (DT), random
forest (RF), support vector machine (SVM), K-nearest neighbors (KNN),
back-propagation (BP) neural network and so on (Bensic, Sarlija, &
Zekic-Susac, 2005; Bahnsen, Aouada, & Ottersten, 2015; Breiman, 2001;
Huang, Chen, & Wang, 2007; Henley & Hand, 1996; Desai, Crook, &
Overstreet, 1996; West, 2000; Chen & Guestrin, 2016). Some experimental
studies have demonstrated the advantages of ML-based methods relative to
statistical ones in many applications. Baesens et al. (2003) applied
SVM, along with other classifiers, to several enterprise credit risk
datasets. They reported that SVM performs well in comparison with other
algorithms but does not always give the best performance. Akkoç (2012)
proposed a neuro-fuzzy based system that performed better than commonly
used models such as ANN, LDA and LR with respect to estimated
misclassification cost and average correct classification rate. Despite
such strong results, no single classifier performs well on all
problems; each usually performs well only on some specific problems
(Xu, Krzyzak, & Suen, 1992; Xia et al., 2018).
Numerous studies have found that individual classifiers show only
moderate performance compared with ensemble ones (Xu, Krzyzak, & Suen,
1992; Xia et al., 2018). Hsieh et al. (2009) proposed an efficient
ensemble classifier combining support vector machine, Bayesian network
and neural networks. Feng et al. (2018) proposed another ensemble method
for credit scoring, in which classifiers are selected on the basis of
their classification performance. Moreover, Xia et al. (2020) developed
a tree-based overfitting-cautious heterogeneous ensemble model for
credit scoring; the proposed method can assign weights to base models
dynamically according to the overfitting measure. Zhang, Yang, and Zhang
(2021) proposed a hybrid ensemble model with voting-based outlier
detection and balanced sampling to achieve superior predictive power for
credit scoring. Many studies draw the similar conclusion that ensemble
models outperform individual models in most cases. Thus, ensemble
methods have become a research hotspot in credit scoring in the past few
years (Woźniak et al., 2014).
Therefore, we propose a new multi-classification credit assessment
model (MIFCA), which fully considers the diversity and complementarity
of the basic assessment models and integrates six different basic
credit evaluation models with the help of D-S evidence theory. The
MIFCA model offers numerous advantages, such as short computational
time, faster convergence, robustness to outliers, high efficiency,
comparatively little parameter tuning and a more lightweight ensemble
framework (Zhang et al., 2019).
The main contributions of this study can be summarized as follows:
(1) This paper constructs a new multi-classification credit assessment
model, which is expected to provide a more accurate five-class credit
assessment than the two-class assessment used in most studies.
(2) This paper fuses six different types of classifiers using D-S
evidence theory and makes full use of the complementarity of the
classifiers to reduce the uncertainty of the MIFCA model, so as to
improve the overall accuracy and robustness of the multi-classification
assessment as far as possible.
(3) More importantly, the MIFCA model has few parameters to tune,
because the base classifiers mostly adopt conventional parameter
settings, which skips the time-consuming process of selecting proper
parameters. Hence, the efficiency and usability of the model are
significantly improved. Above all, MIFCA is efficient and can be a good
practical tool in the big data era.
(4) The assessment performance of MIFCA is verified on real data sets
and compared with classic statistical approaches, machine learning
approaches (including common machine learning and ensemble learning
approaches) and advanced approaches on four indicators: accuracy,
precision, recall and F1-score. The empirical results indicate that the
proposed MIFCA model performs better than the comparative credit
assessment models across different assessment metrics and represents an
efficient and universal credit assessment method that enriches credit
assessment research and promotes good fintech practices.
The remainder of this paper is structured as follows. Section 2
provides a comprehensive summary of modeling approaches for credit
assessment models. Section 3 describes the process of the proposed
personal credit risk assessment. Section 4 describes the application to
bank personal credit assessment, including the data description,
assessment criteria and data preparation. Section 5 presents the
experimental results of various credit assessment models and compares
their performance under several commonly used assessment criteria.
Finally, the conclusions are provided in Section 6.
2. Theoretical model
In this section, we describe the advantages and disadvantages of the
selected basic models and introduce their basic principles. We aim to
construct an efficient, universal and lightweight multi-classification
credit assessment model. The model needs to integrate the advantages of
a variety of basic credit assessment models, simplify the time-consuming
process of parameter optimization as much as possible, and reduce the
redundant information of the fusion model. The core issue at this stage
lies in the choice of the components of the multiple classifier system.
To meet the research requirements, six base models with different
assessment performance are selected for the proposed method, namely DT,
RF, SVM, KNN, BP and XGBoost. The reasons for selecting these methods
are summarized as follows:
Classical. Decision tree is a classical and commonly used method.
Although it is sensitive to missing data and tends to ignore the
correlation between attributes in the data set, it can produce feasible
and effective results for large data sources in a relatively short time.
Popularity. RF is extensively considered in current credit scoring
studies and is even advocated as a benchmark model replacing LR
(Lessmann et al., 2015). Compared with other algorithms, it has the
advantage of detecting interactions between features during training,
and it handles unbalanced data sets well. However, its classification
performance is poor on small or low-dimensional data.
Robustness and Generalization. SVM has good robustness and
generalization ability, and can handle small samples, high-dimensional
feature spaces and nonlinearities. However, like decision trees, it is
sensitive to missing data, and there is no universal solution for
nonlinear problems.
Simple and Efficient. KNN is simple and effective, and is well suited
to automatic classification of class domains with large sample sizes.
Its disadvantages are that it has a high computational cost, is not
suitable for high-dimensional feature spaces and is sensitive to
unbalanced data.
High-performance. BP offers high classification accuracy, strong
robustness to noise and good fault tolerance. It can also approximate
complex nonlinear relationships and, in particular, can be quickly
adjusted to adapt to new problem situations. The disadvantages are that
a large number of parameters are required and the output results are
difficult to explain.
Efficiency. Computational cost is also important for credit scoring. It
can serve as a core competence if financial institutions can derive the
credit score of an applicant and make decisions immediately, which
implies that computational cost should also be considered in modeling.
XGBoost is an ensemble model with a low computational cost. First,
regularization terms are added to the cost function to reduce the
complexity of the model. Second, it can be run in parallel, which
greatly reduces computation time. Third, it is flexible and supports
user-defined objective functions and evaluation functions. However, it
also has shortcomings, such as demanding requirements on data structure
and being unsuitable for high-dimensional feature data.
Below is a brief introduction of the mechanism of the classifiers used
in our proposal.
2.1. Decision tree
Decision Tree (Davis, Edelman, & Gammerman, 1992), a classic algorithm
in machine learning, is one of the most commonly used classification
algorithms. It intuitively shows the hidden rules of all variables in
the data through a tree structure. A decision tree consists of a root
node, several internal nodes and several leaf nodes, as shown in Fig. 1.
Each path from the root node to a leaf node constitutes a rule: the
features of the nodes along the path correspond to the conditions of the
rule, and the category of the leaf node corresponds to the result of the
rule. The decision tree algorithm selects the best features recursively
and partitions the data according to those features, so that each subset
of the data has the best classification result. First, all training data
are placed in the root node and the best feature is selected; the
training data set is then divided into several subsets according to that
feature, so that each subset has the best classification under the
current feature. When a subset can basically be classified correctly, a
leaf node is created and the subset is assigned to it; when a subset
cannot be classified correctly, another feature is selected to continue
the classification and the corresponding nodes are created. This is
repeated until all data are basically classified correctly or all
features have been used. Finally, each piece of data has a corresponding
classification, and the decision tree is generated (Bahnsen, Aouada, &
Ottersten, 2015; Xia et al., 2017).
Fig. 1. Decision tree structure diagram.
2.2. Random forest
The random forest algorithm was originally proposed by Leo Breiman based
on the bagging method (Breiman, 2001). In recent years, it has been
widely used for classification and regression problems in various
fields. Random forest is an ensemble learning algorithm that uses the
decision tree as its base classifier. On the basis of bagging, it
introduces random feature selection at the node splits of each decision
tree. The core idea of random forest is to use the voting mechanism of
multiple decision trees to solve classification or prediction problems.
In classification, the decisions of the individual trees are taken as
votes, and each sample is assigned to the category that receives the
most votes. The modeling process of random forest is as follows:
(1) First, use the Bootstrap sampling method to generate k data sets
from the original data set; each data set contains N observations
and D independent variables.
(2) Second, construct a CART decision tree on each sampled data set.
For each tree node, P features (P < D) are randomly selected from
the original D features, and then the split feature and split point
are selected from the feature subspace composed of the P features.
The selection criterion is the maximum decrease in impurity for
classification problems or the maximum decrease in mean square
error (MSE) for regression problems. The above process is repeated
to construct tree nodes one by one until the stop condition is
reached.
(3) Finally, for the random forest of k CART trees, the voting method
is used for classification problems, and the category with the
most votes is taken as the final result.
2.3. Support vector machine
Support vector machine is a supervised machine learning algorithm. It
was first proposed by the Soviet scholar Vladimir N. Vapnik et al.
(Cortes & Vapnik, 1995). It is a classifier developed from the
generalized portrait algorithm in pattern recognition, which was
subsequently developed further (Huang et al., 2007). The linear SVM with
hard margin, the nonlinear SVM and the nonlinear SVM with soft margin
have been established successively (Hens & Tiwari, 2012). The core idea
of the algorithm is to use the "hyperplane" formed by some support
vectors to separate different types of data, and to select the optimal
"hyperplane" from the many candidate "hyperplanes". Selecting the
"hyperplane" requires a distance measure, which leads to an objective
function based on the distances from all sample points to the
"hyperplane". The objective function of the SVM model is shown in
Eq. (1).
$$J(w, b, i) = \arg\max_{w,b} \min_{i} (d_i) \tag{1}$$
Where $d_i$ represents the distance from sample point $i$ to a fixed
segmentation surface; $\min_i(d_i)$ represents the minimum distance
between all sample points and a segmentation plane;
$\arg\max_{w,b} \min_i(d_i)$ represents finding the "hyperplane" with
the widest "segmentation band" among all segmentation planes; and $w$
and $b$ represent the parameters of the linear partition plane. Assuming
that the linear partitioning plane is expressed as $w'x + b = 0$, the
distance $d_i$ from a point to the partitioning plane can be expressed
as Eq. (2).
$$d_i = \frac{|w' x_i + b|}{\|w\|} \tag{2}$$
$\|w\|$ represents the L2 (Euclidean) norm of the vector $w$, that is,
$\|w\| = \sqrt{w_1^2 + w_2^2 + \cdots + w_p^2}$. The problem can be
solved by using the Lagrangian multiplier method to convert the original
objective function into Eq. (3).
$$\begin{cases}
\min_{\alpha} \left( \dfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i \right) \\
\text{s.t. } \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0
\end{cases} \tag{3}$$
Where $(x_i \cdot x_j)$ represents the inner product of two sample
points. Finally, calculate the minimum value of
$\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i$
from the known sample points, and use the resulting values of the
Lagrange multipliers $\alpha_i$ to calculate the parameters $w$ and $b$
of the segmentation surface $w'x + b = 0$, as shown in Eq. (4).
$$\begin{cases}
\hat{w} = \sum_{i=1}^{n} \hat{\alpha}_i y_i x_i \\
\hat{b} = y_j - \sum_{i=1}^{n} \hat{\alpha}_i y_i (x_i \cdot x_j)
\end{cases} \tag{4}$$
In this paper, we define the credit assessment as a nonlinear problem
and use a Gaussian kernel to optimize the hyperplane (Ling, Cao, &
Zhang, 2012; Harris, 2015).
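To make the geometric quantities in this subsection concrete, the following minimal Python sketch evaluates the point-to-plane distance of Eq. (2), the smallest distance min(d_i) that Eq. (1) seeks to maximize, and a Gaussian kernel value. It is only an illustration under stated assumptions: the function names, the toy points and the kernel width `gamma` are hypothetical and are not taken from the paper.

```python
import math

def hyperplane_distance(w, b, x):
    """Distance from point x to the plane w'x + b = 0, as in Eq. (2)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))  # L2 norm of w
    return abs(dot + b) / norm

def margin(w, b, points):
    """Smallest distance over all sample points, i.e. min(d_i) in Eq. (1)."""
    return min(hyperplane_distance(w, b, x) for x in points)

def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); gamma is an assumed value."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Toy points and plane parameters, chosen only for illustration.
points = [(3.0, 4.0), (0.0, 0.0), (1.0, 1.0)]
w, b = (3.0, 4.0), -5.0

distances = [hyperplane_distance(w, b, x) for x in points]
print(distances)
print(margin(w, b, points))
print(gaussian_kernel((1.0, 2.0), (1.0, 2.0)))
```

A full SVM solver would additionally search over w and b (or the multipliers in Eq. (3)); the sketch only evaluates the quantities for one fixed candidate plane.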
2.4. K-nearest neighbors
The K-nearest neighbor algorithm, as the name suggests, searches for
the K nearest samples of known category in order to predict the category
of an unknown sample. "Nearest" is measured by the distance or
similarity between sample points. In general, the integer K is not
greater than 20. This algorithm is a kind of lazy learning algorithm: it
does not actively summarize the input samples, but carries out modeling
only when new samples need to be classified. The classification
principle of KNN is based on the distance between samples: the smaller
the distance, or the greater the similarity, the more alike two samples
are. For a single sample, if the surrounding samples belong to a certain
category, there is a high probability that the sample belongs to the
same category, and the new sample is assigned to the category of its
nearest neighbors (Abdelmoula, 2015; Cover & Hart, 1967; Hand & Henley,
1997; Henley & Hand, 1996).
Suppose there are two n-dimensional objects X and Y, defined as follows:
$$X = (X_1, X_2, \cdots, X_n)$$
$$Y = (Y_1, Y_2, \cdots, Y_n)$$
Taking X and Y as examples, three distance measures are mainly used in
the KNN algorithm, namely the Euclidean distance $d_1$, the Manhattan
distance $d_2$ and the Minkowski distance $d_3$. They are written as:
$$d_1 = \sqrt{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + \cdots + (X_n - Y_n)^2} = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \tag{5}$$
$$d_2 = \sum_{i=1}^{n} |X_i - Y_i| \tag{6}$$
$$d_3 = \sqrt[p]{\sum_{i=1}^{n} (|X_i - Y_i|)^p} \tag{7}$$
The KNN algorithm is easy to understand and simple to implement. Its
disadvantages are that the amount of computation is large and that the
results are prone to error when the sample data are unbalanced.
2.5. BP neural network
BP neural network is a multilayer feedforward neural network proposed
by Rumelhart et al. in 1986. The basic idea of the algorithm is to
transform the problem of learning the input-output mapping of the
network into a nonlinear optimization problem, which is solved with the
gradient descent algorithm. Gradient descent is used to modify the
network weights iteratively so as to minimize the mean square error
between the network output and the expected output. The network is
trained by propagating the error backward through its layers. It has
good generalization ability and is well suited to pattern recognition
and pattern classification (West, 2000). The structure of the BP neural
network is shown in Fig. 2.
Fig. 2. BP neural network structure diagram.
2.6. XGBoost
XGBoost uses the idea of iterative operations to transform a large
number of weak classifiers into a strong classifier to achieve accurate
classification results (Chen & Guestrin, 2016). XGBoost is a classic
Boosting method, and the purpose of a Boosting algorithm is to construct
a strong classifier by integrating many weak classifiers; XGBoost uses
CART regression trees as the weak learners. Based on the GBDT algorithm,
it performs a second-order Taylor expansion of the loss function and
adds a regularization term, which effectively avoids overfitting and
speeds up convergence. The idea of the XGBoost algorithm is to
continuously build new decision trees to fit the residuals of previous
predictions, so that the residuals between the predicted values and the
true values are continuously reduced, thereby improving the prediction
accuracy. As shown in Eq. (8), the model can be expressed in additive
form:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{8}$$
Where $\hat{y}_i$ represents the predicted value of the model; $K$
represents the number of decision trees; $f_k$ represents the kth
submodel; $x_i$ represents the ith input sample; and $F$ represents the
set of all decision trees. The objective function and regularization
term of XGBoost are shown in Eqs. (9) and (10):
$$L(\phi)^{t} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \varphi(f_k) \tag{9}$$
$$\varphi(f_k) = \alpha T + \frac{1}{2} \beta \|\omega\|^2 \tag{10}$$
Where $L(\phi)^{t}$ represents the objective function of the tth
iteration; $\hat{y}_i^{(t-1)}$ represents the predicted value after the
previous $t-1$ iterations; $\varphi(f_k)$ represents the regularization
term of the model of the tth iteration, which reduces overfitting;
$\alpha$ and $\beta$ represent regularization coefficients that prevent
the decision tree from becoming too complex; and $T$ represents the
number of leaf nodes in this model.
The objective function can be expanded by the Taylor formula:
$$L(\phi)^{t} \cong \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \alpha T + \frac{1}{2} \beta \sum_{j=1}^{T} \omega_j^2$$
$$\cong \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \beta \right) \omega_j^2 \right] + \alpha T \tag{11}$$
Where $g_i$ represents the first derivative of the loss function at
sample $x_i$; $h_i$ represents the second derivative of the loss
function at sample $x_i$; $\omega_j$ represents the output value of the
jth leaf node; and $I_j$ represents the subset of samples that fall in
the jth leaf node. The objective function is a convex function of
$\omega_j$. When its derivative with respect to $\omega_j$ is equal to
zero, the minimizer $\omega_j^*$ of the objective function is obtained,
as shown in Eq. (12):
$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \beta} \tag{12}$$
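The closed-form leaf weight of Eq. (12) can be checked numerically with a small Python sketch. It assumes a squared-error loss l = 0.5 (y - ŷ)^2, for which g_i = ŷ_i - y_i and h_i = 1; the loss choice, the toy targets and the function names are illustrative assumptions and are not specified in the paper.

```python
def squared_error_grad_hess(y_true, y_pred):
    """g_i and h_i for the assumed loss l = 0.5 * (y - y_hat)^2."""
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0 for _ in y_true]
    return g, h

def optimal_leaf_weight(g, h, leaf_indices, beta):
    """Eq. (12): w_j* = -sum(g_i) / (sum(h_i) + beta) over the samples in leaf j."""
    g_sum = sum(g[i] for i in leaf_indices)
    h_sum = sum(h[i] for i in leaf_indices)
    return -g_sum / (h_sum + beta)

def leaf_objective(g, h, leaf_indices, beta, weight):
    """Contribution of one leaf to Eq. (11), leaving out the constant alpha * T."""
    g_sum = sum(g[i] for i in leaf_indices)
    h_sum = sum(h[i] for i in leaf_indices)
    return g_sum * weight + 0.5 * (h_sum + beta) * weight * weight

# Toy targets and previous-round predictions (all zeros), for illustration only.
y_true = [1.0, 3.0, 2.0, 6.0]
y_pred = [0.0, 0.0, 0.0, 0.0]
g, h = squared_error_grad_hess(y_true, y_pred)

leaf = [0, 1, 2]   # indices of the samples assumed to fall in this leaf
beta = 1.0         # assumed regularization coefficient
w_star = optimal_leaf_weight(g, h, leaf, beta)
print(w_star, leaf_objective(g, h, leaf, beta, w_star))
```

Because the per-leaf objective is a convex quadratic in the leaf weight, moving the weight away from `w_star` in either direction increases `leaf_objective`, which is what setting the derivative to zero expresses.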
Abellán & Mantas, 2014).
In D-S evidence theory, sets are generally used to represent propo­
sitions, and it is assumed that θ represents a set of mutually exclusive
and exhaustive elements, which is called the recognition frame. The
mass function assigns a probability to each element in the recognition
frame, which is called the basic probability distribution, as follows:
(12)
3. Construction of multi-classification assessment model based
on multi - source information fusion
⎧
m(∅) = 0
⎪
⎨∑
m(A) = 1
⎪
⎩ A⊂θ
m(A) ≥ 0
3.1. Dempster-Shafer’s (D-S) evidence theory
Multi-source information fusion is usually abbreviated as informa­
tion fusion, also known as data fusion, which originated from the mili­
tary applications in the 1970s. It merges or integrates information or
data from multiple sources at different levels of abstraction, and ulti­
mately obtains more complete, reliable, and accurate information or
inferences (Shipp & Kuncheva, 2002; Finlay, 2011). In information
fusion technology (Lee & Chen, 2005; Ling, Cao, & Zhang, 2012),
Dempster-Shafer’s (D-S) evidence theory is widely used in information
fusion due to its superiority in handling uncertain information (Demp­
ster, 1967; Shafer, 1976). D-S evidence theory cannot only deal with the
uncertainty caused by randomness, but also deal with the uncertainty
caused by fuzziness. The potential advantage of D-S method is that it
does not require a priori probability and conditional probability density.
In view of the advantages in this respect, it can be used completely
independently in uncertain assessments (Xu, Krzyzak, & Suen, 1992;
(13)
∅ represents the empty set,m(A) represents the basic probability
allocation of A, and represents the degree of trust to A, where function m
is a mass function.
The mass function for any set of elements in the recognition box is
defined as follows:
∑
Bel(A) =
m(B), (∀A⊂θ)
(14)
B⊂A
Bel(A) represents the total trust in A, that is, the mass function of A is
the sum of all probabilities in A.
Let m1 and m2 be the two mass functions defined on the recognition
frame θ, the calculation of the D-S combination rule for the two evi­
dences is shown in Eq. (15).
Fig. 3. Flow chart of the proposed method.
5
T. Wang et al.
Expert Systems With Applications 191 (2022) 116236
∑
m1 (C1 ) =
S1 ∩S2 =S m1 (S1 )m2 (S2 )
1 − K1
Table 1
Confusion matrix.
(15)
∑
Where K1 = S1 ∩S2 =∅ m1 (S1 )m2 (S2 ) < 1, ∀C⊂θ, C ∕
= ∅, m(∅) = 0.
Briefly, the rest of fusion calculation can be summarized as follows:
∑
Cn− 2 ∩Sn =Cn− 1 mn− 2 (Cn− 2 )mn (Sn )
mn− 1 (Cn− 1 ) =
(16)
1 − K1
∑
Kn−
1
mn− 2 (Cn− 2 )mn (Sn )
=
Predicted position
Predicted negative
Actually positive
Actually negative
TP
FP
FN
TN
matrix which highlights various terms like True Positives (TP), False
Positives (FP), True Negatives (TN), False Negatives (FN) which are
further used to define various assessment metrics used in this study. For
the multi-classification research of this article, on the basis of two
classifications, each category is regarded as “positive” separately, and all
other categories are regarded as “negative”.
(17)
Cn− 2 ∩Sn =∅
3.2. The proposed multi-classification assessment model
The process framework of the newly constructed multi-classification
assessment model based on multi-source information fusion in this paper
is shown in the Fig. 3. The calculation process is as follows:
Step1: Preprocess the raw data set and select feature by Pearson
correlation analysis;
Step2: The preprocessed data set is divided into training data and
testing data. The training data are used as input data into the six base
classifiers. Note that the default values or common values are generally
used for the basic model parameters, and excessive parameter tuning is
not performed to reduce the calculation cost;
Step3: Calculate the probability distribution value mi (Si ) of each base
model at different classification levels on the training set.
Accuracy =
TP + TN
TP + TN + FP + FN
(18)
Precision =
TP
TP + FP
(19)
Recall =
TP
TP + FN
F1 − score =
2*Precision*Recall
Precision + Recall
(20)
(21)
Step4: Gradually fuse the probability distribution value of different
classification levels DS evidence theory. For example, the probability
distribution calculated by the DT model is m1 (S1 ), and the probability
distribution calculated by the FR model is m2 (S2 ). Put m1 (S1 ) and m2 (S2 )
into Eq. (15) for D-S fusion to obtain the probability distribution result
m1 (C1 ), C1 = (C1 BPA1 , C1 BPA2 , ⋯, C1 BPA5 , ) of different classifica­
tion levels. Then put m1 (C1 ) and m3 (S3 ) into Eq. (15) to for D-S fusion. In
this way, the fusion is carried out gradually, and the classification result
is obtained through the final fusion result.
The accuracy is defined in Eq. (18). Precision is defined in Eq. (19),
recall is defined in Eq. (20), and the F1-score is defined in Eq. (21).
Accuracy represents the proportion of all correct predictions in the total
sample, which reflects the overall performance of a classification model.
Precision represents the proportion of the number of correctly predicted
positive samples to the total predicted positive samples, which reflects
the reliability of the output of classification model results. The recall
represents the proportion of those correctly predicted to be positive to
those actually positive, reflecting the coverage degree of the model
classification effect. F1-score is the harmonic average of recall rate and
precision, which is a comprehensive assessment index and a very reli­
able index for evaluating imbalanced data (Visentini, Snidaro, & Foresti,
2016).
Si = (Si BPA1 , Si BPA2 , ⋯, Si BPA5 , )
4. Empirical design
4.1. Credit datasets
The data set of this research comes from an anonymous commercial bank in China. All the data are real information taken from customers' personal loan application records (Wang et al., 2019). The raw data set includes a total of 27,520 personal loan credit records of the commercial bank and 27 related variables. These variables are: x1: Customer ID, x2: Type of Loan Business, x3: Guarantee the Balance, x4: Account Connection Amount, x5: Security Guarantee Amount, x6: Whether Interest is Owed, x7: Whether Self-service Loan, x8: Type of Guarantee, x9: Safety Coefficient, x10: Collateral Value (yuan), x11: Guarantee Method, x12: Date Code, x13: Approval Deadline, x14: Whether Devalue Account, x15: Five-level Classification, x16: Industry Category, x17: Down Payment Amount, x18: Whether Personal Business Loan, x19: Whether Interest is Owed (regulatory standard), x20: Repayment Type, x21: Installment Repayment Method (numerical type), x22: Installment Repayment Method (discrete type), x23: Installment Repayment Cycle (numerical type), x24: Repayment Cycle (discrete type), x25: Number of Houses, x26: Month Property Costs, x27: Family Monthly Income.
4.3. Data preparation
Data quality directly affects the reliability of subsequent analysis, and data preprocessing is an important step in improving it. In practice, raw bank personal loan data are mostly incomplete, noisy, and of low quality, which makes them unsuitable for direct analysis and mining. Therefore, after obtaining the raw data, it is first necessary to analyze the data structure in order to eliminate duplicated and redundant variables; second, data cleaning is needed to deal with missing values and outliers; finally, the rationality of variable selection must be analyzed. The aim of this section is to enhance data reliability through data preprocessing (Coussement et al., 2017).
4.3.1. Preliminary exploration of data structures
According to the preliminary analysis of the raw data set, some variables in the bank's personal loan data carry no valuable information and others are redundant, so the variables of the raw data must be filtered and cleaned. We remove variable x1 because it carries no useful information. Variables x3, x4 and x5 are redundant: x3 and x4 contain the same information, and x5 is the product of x4 and x9, so x4 and x5 are deleted. There are two five-level classification attributes, of which one is retained. Variable x18 is unrelated to the characteristic attribute information and is deleted. We remove variables x6, x8 and x12 for the following reasons: variable x6
4.2. Assessment criteria
To compare the performance of our proposal and the benchmarks comprehensively, we choose four representative assessment measures based on the confusion matrix: accuracy (ACC), precision (P), recall (R), and F1-score (F). Table 1 illustrates a confusion
and variable x19 express the same information; variable x8 and variable x11 duplicate each other, and the same is true for variable x12 and variable x13. We drop variables x20 and x22 because they are extremely similar to variable x21. Furthermore, we remove variables x7, x23 and x24 because each takes only one value in the data set. The characteristic variables retained are shown in Table 2.
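The two structural checks described above (variables that take a single value, and variables that repeat another variable's content) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `find_constant_columns` and `find_duplicate_columns` and the toy values are assumptions introduced here, and only exact duplication is detected, not near-duplicates or derived variables such as x5.

```python
from typing import Dict, List, Tuple


def find_constant_columns(table: Dict[str, List]) -> List[str]:
    """Return the names of columns that hold a single distinct value."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


def find_duplicate_columns(table: Dict[str, List]) -> List[Tuple[str, str]]:
    """Return (kept, redundant) pairs of columns whose values are identical."""
    seen: Dict[tuple, str] = {}
    pairs: List[Tuple[str, str]] = []
    for name, values in table.items():
        key = tuple(values)
        if key in seen:
            pairs.append((seen[key], name))
        else:
            seen[key] = name
    return pairs


# Toy records; the values are invented for illustration only.
toy = {
    "x3": [100.0, 250.0, 80.0],
    "x4": [100.0, 250.0, 80.0],  # same content as x3
    "x7": ["N", "N", "N"],       # takes one value only
    "x10": [300.0, 500.0, 120.0],
}
constant = find_constant_columns(toy)
duplicates = find_duplicate_columns(toy)
```

On the toy table, x7 is flagged as constant and x4 as a copy of x3, which mirrors the kind of decision reported for these variables.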
Table 3
Pre-processing: The percentage of missing values and outliers.
4.3.2. Data cleaning
(1) Missing values. Missing values affect the results of data analysis and mining. In bank personal loan data, values may be missing because records were omitted, because applicants withheld private information during data collection, or because of external causes such as failure of the data entry or storage device. Missing values affect the subsequent training of the algorithms, so they must be handled according to the actual pattern of missingness in order to improve data quality and, in turn, the effectiveness and accuracy of the algorithm. First, variables with more than 55% missing values are removed; this applies to x16: Industry Category. Of the remaining variables, x26: Month Property Costs has 1286 missing values, x21: Installment Repayment Method has 757, and x27: Family Monthly Income has 20. These account for a relatively low proportion of the data, so the entries containing them are deleted directly, as shown in Table 3.
(2) Outliers. Detecting and handling outliers is another important step in data preprocessing. Real data are complex and often contain a small number of abnormally large or small values. For algorithms that are sensitive to outliers, such values bias the results and reduce effectiveness. Since the bank
| Variable | Variable name | Variable type | Attribute |
| --- | --- | --- | --- |
| x17 | Down Payment Amount | numeric | x17 ∈ (0, 7000000) |
| x3 | Guarantee the Balance | numeric | x3 ∈ (0, 99000000) |
| x10 | Collateral Value (yuan) | numeric | x10 ∈ (0, +∞) |
| x11 | Guarantee Method | discrete | x11 = security, mortgage |
| x21 | Installment Repayment Method | numeric | x21 = 1 represents equal amount of principal and interest; x21 = 2 represents equal principal |
| x13 | Approval Deadline | numeric | x13 ∈ (1, 11000) |
| x25 | Number of Houses | numeric | x25 ∈ (0, 2) |
| x26 | Month Property Costs | numeric | x26 ∈ (0, 350000) |
| x27 | Family Monthly Income | numeric | x27 ∈ (1, +∞) |
| x19 | Whether Interest is Owed | discrete | x19 = Y denotes arrears; x19 = N denotes no interest owed |
| x14 | Whether Devalue Account | discrete | x14 = Y denotes impairment; x14 = N means no impairment |
| x9 | Safety Coefficient | numeric | x9 = 100, 80, 75, 70, 60, 50 |
| Y | Five-Classification | discrete | Y = Normal, Secondary, Concern, Suspicious, Loss |
| Variable | Missing values | Outliers | % of missing values and outliers | Processing method |
| --- | --- | --- | --- | --- |
| Industry Category | 25,444 | – | 92.45% | Remove variable |
| Installment Repayment Method | 757 | – | 2.75% | Remove values |
| Month Property Costs | 1286 | <20 or >20000: 150 | 5.21% | Remove values |
| Family Monthly Income | 20 | <1500: 6 | 0.10% | Remove values |
personal loan credit data used in this article are not normally distributed, the box percentile method can be used to detect outliers. For the data set selected in this paper, variables x26: Month Property Costs and x27: Family Monthly Income must be screened for outliers. Table 3 shows 150 records in which x26 is below 20 yuan or above 20,000 yuan, and 6 records in which x27 is below 1500 yuan. These outlier records account for a relatively small proportion of the data and can be deleted directly.
(3) Recoding of discrete variables. Table 2 shows that several variables are discrete. These variables cannot be used directly for data analysis; the character-type discrete variables must be recoded and converted into dummy variables. For example, x11: Guarantee Method, x19: Whether Interest is Owed, x14: Whether Devalue Account and Y: Five-Classification are recoded in this way.
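The three cleaning steps can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the helper names `clean_records` and `one_hot` are hypothetical, records are assumed to be dictionaries keyed by variable name with `None` marking a missing value, and the outlier thresholds are the ones reported in Table 3.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def clean_records(records: List[Record]) -> List[Record]:
    """Drop entries with missing values or with outliers in x26 / x27."""
    kept = []
    for rec in records:
        # Step (1): delete the entry if a retained field is missing.
        if any(rec[field] is None for field in ("x21", "x26", "x27")):
            continue
        # Step (2): thresholds as reported in Table 3.
        if rec["x26"] < 20 or rec["x26"] > 20000:
            continue
        if rec["x27"] < 1500:
            continue
        kept.append(rec)
    return kept


def one_hot(values: List[str]) -> List[Dict[str, int]]:
    """Step (3): recode a character-type variable into dummy variables."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


# Toy records; the values are invented for illustration only.
toy = [
    {"x11": "mortgage", "x21": 1, "x26": 3000, "x27": 8000},
    {"x11": "security", "x21": None, "x26": 2500, "x27": 9000},  # missing x21
    {"x11": "mortgage", "x21": 2, "x26": 10, "x27": 7000},       # x26 outlier
    {"x11": "security", "x21": 1, "x26": 4000, "x27": 1200},     # x27 outlier
    {"x11": "security", "x21": 2, "x26": 20000, "x27": 1500},
]
cleaned = clean_records(toy)
dummies = one_hot([rec["x11"] for rec in cleaned])
```

The boundary values (20, 20000 and 1500) are treated as valid here; the text does not say whether the thresholds are inclusive, so that choice is an assumption.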
4.3.3. Variable correlation analysis
Table 4 shows the Pearson correlation coefficients between the variables x and Y. In the real data set adopted in this paper, most variables x show low or medium correlation with Y, and no variable shows an unreasonably high correlation. Since the data structure in this paper is nonlinear, even variables with small correlation coefficients can carry important information; they may have a clear effect when they act together, so deleting them would be unreasonable. After correlation analysis, we consider the above variables reasonable (Chen and Li, 2010).
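For reference, the Pearson coefficient between one variable and the label can be computed as in the sketch below. The function name `pearson_r` and the toy vectors are assumptions for illustration; the sketch also assumes that the five-level label Y has already been coded as integers, which the text implies but does not spell out.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (spread_x * spread_y)


# Toy data: one feature and an integer-coded label (invented values).
feature = [1.0, 2.0, 3.0, 4.0]
label = [1, 1, 2, 3]
r = pearson_r(feature, label)
```

Because the coefficient only measures linear association, a small value does not rule out a useful nonlinear relationship, which is the reason given above for keeping weakly correlated variables.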
Table 2
Bank personal loan credit attribute characteristic variable description.
4.3.4. Processing of imbalanced data sets
In practice, commercial banks generally have more loan customers with good credit than with bad credit. In this data set, the Normal credit category accounts for up to 96% of the records, so the data are seriously imbalanced. This imbalance biases the classification results towards the majority categories and seriously affects the accuracy of the predictions (Fernández, Jesus, & Herrera, 2010; Du et al., 2020). The cleaned data must therefore be rebalanced. This paper adopts the SMOTE algorithm proposed by Chawla in 2002, an improvement on random oversampling (Junior et al., 2020). It is currently a common method for processing imbalanced data and is widely accepted in the academic community (Yeh and Lien, 2009). The core idea of the algorithm is to analyze the minority-class samples, synthesize new samples from them, and add these artificial samples to the data set so that the categories in the raw data are no longer seriously out of balance.
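The interpolation step at the heart of SMOTE can be sketched as follows. This is a simplified, dependency-free illustration and not the implementation used in the experiments: the function name `smote_sketch`, its parameters, and the toy points are assumptions, and details of the full algorithm (per-class sampling ratios, handling of discrete features) are omitted.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_sketch(minority: List[Point], n_new: int, k: int = 5,
                 seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points between minority samples and neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (squared Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (invented values).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=10, k=2)
```

Each synthetic point lies on the segment between a real minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by that class.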
5. Experimental results
Table 2 (continued): x9 = 100, 80, 75, 70, 60, 50; Y = Normal, Secondary, Concern, Suspicious, Loss.
In this section, experimental results are presented to validate the advantages of the proposed model over other comparative
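Table 4
Pearson correlation coefficient of variable x and Y.
| Variable | x17 | x3 | x10 | x11 | x21 | x13 | x25 | x26 | x27 | x19 | x14 | x9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0.16 | 0.244 | 0.08 | 0.31 | −0.34 | 0.37 | −0.39 | 0.452 | 0.59 | 0.64 | 0.67 | 0.11 |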
seen that, in the classification assessment of the credit Normal Category, the BP neural network shows excellent classification coverage and strong reliability on the imbalanced data overall, followed by the XGboost model, whose classification output is highly reliable. Next is the MIFCA model: although its recall in the Normal Category is not very high, its comprehensive assessment index, the F1-score, is second only to that of XGboost. The KNN model has the worst classification effect, with low values on every assessment index.
In the credit Secondary Category, as shown in Fig. 5, the MIFCA model has the highest accuracy (87.15%), followed by the BP neural network (84.89%) and the KNN model (83.38%); the DT model is the lowest (56.17%). The MIFCA model also has the highest recall (88.27%), followed by the DT model (82.29%) and the KNN model (77.70%). The highest F1-score belongs to the MIFCA model (87.71%), followed by the KNN model (80.44%); the DT model is the lowest (66.77%). In the Secondary Category, the MIFCA model therefore has the best assessment performance, and the KNN model is second. The KNN model has high accuracy, but its recall is the lowest among several models. The recall of the DT model is 82.29%, second only to that of the MIFCA model (88.27%), but its classification accuracy is the lowest, which gives it the lowest comprehensive assessment index, the F1-score.
In the credit Concern Category, as shown in Fig. 6, the BP neural network has the highest accuracy at 89.22%, followed by the RF model (86.27%) and the MIFCA model (84.80%); the worst is the KNN model (75.25%). The SVM model has the highest recall (89.91%), followed by the XGboost model (89.63%) and the MIFCA model (83.78%); the worst is the KNN model (72.07%). The highest F1-score belongs to the MIFCA model (86.69%), followed by the BP neural network (86.15%) and the XGboost model (85.97%); the worst is the KNN model (73.62%). On the comprehensive assessment index, the best overall performance belongs to the MIFCA model constructed in this paper. Among the models, the BP neural network has the highest accuracy, but its recall is much lower than that of the SVM model. The SVM model has the highest recall, but its accuracy is higher only than that of the KNN model, the worst. Taken together, the MIFCA model constructed in this paper is more stable and performs better overall in terms of accuracy, recall, and F1-score.
classifiers and demonstrate the effectiveness of the proposed model. All of the experiments were run in Python 3.7 on a PC with a 3.0 GHz Intel Core i7 processor, 32 GB of RAM, and the Microsoft Windows 10 operating system.
The comparison of the results of the various classifiers with our proposed approach is presented in Tables 5–7 and Figs. 4–12 in terms of accuracy, precision, recall, F1-score and confusion matrix heat maps. In Section 5.1, we compare the results of the six base classifiers. Section 5.2 compares our method with other classical methods (Tripathi et al., 2021).
5.1. Classification results of base classifiers
This section aims to assess the performance of the six base classifiers
viz. Decision Tree (DT), Random Forest (RF), Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), BP Neural Network (BP) and
XGboost in their ability to assess credit classification. In Section 5.1.1,
we compare the assessment performance results of the model on the five
classification levels through accuracy, recall and F1-score. In Section
5.1.2, the overall classification performance of the model is measured
from the confusion matrix heat map and accuracy.
5.1.1. Assessment performance of the model at each classification level
The accuracy, recall, and F1-score of each model in the five credit categories are shown in Table 5 and Figs. 4–8; the per-category accuracy discussed below corresponds to the precision (P) reported in Table 5. These results show the classification performance of the MIFCA model constructed in this paper and of the six base models in each category.
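The per-category measures can be derived from a multi-class confusion matrix as in the sketch below. It is an illustration only: the function names and the toy 3×3 matrix are assumptions, and the convention that rows are true classes and columns are predicted classes is also an assumption, since the layout of the paper's confusion matrix is not restated here.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[row][c] for row in range(n))  # column sum
        actual = sum(cm[c])                              # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        results.append({"P": p, "R": r, "F": f})
    return results


def overall_accuracy(cm: List[List[int]]) -> float:
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy three-class confusion matrix (invented counts).
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(toy_cm)
accuracy = overall_accuracy(toy_cm)
```

For the first toy class, 8 of 10 predicted samples and 8 of 10 actual samples are correct, so precision, recall and F1 all equal 0.8, while the overall accuracy is 23/30.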
In the credit Normal Category, as shown in Fig. 4, the XGboost model has the highest accuracy at 87.69%, followed by the SVM model (86.64%) and the MIFCA model (85.92%); the KNN model (62.29%) is the worst. The BP neural network model has the highest recall at 91.3%, followed by the random forest model (87.68%) and the XGboost model (86.17%); the worst is the KNN model (76.32%). The highest F1-score belongs to the BP neural network model (87%), followed by the XGboost model (86.92%) and the MIFCA model (86.91%); the lowest is the KNN model (68.59%). It can be
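Table 5
Accuracy, precision, recall and F1-score of six base classifiers and MIFCA model under five credit classifications.
| Group | Model | ACC | Criterion | C1 | C2 | C3 | C4 | C5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Base models | DT | 0.7295 | P | 0.7876 | 0.5617 | 0.8407 | 0.6319 | 0.8168 |
| Base models | DT | 0.7295 | R | 0.8639 | 0.8229 | 0.7940 | 0.7356 | 0.5478 |
| Base models | DT | 0.7295 | F | 0.8240 | 0.6677 | 0.8167 | 0.6798 | 0.6558 |
| Base models | RF | 0.786 | P | 0.8329 | 0.7204 | 0.8627 | 0.7493 | 0.7583 |
| Base models | RF | 0.786 | R | 0.8769 | 0.8171 | 0.8341 | 0.7885 | 0.6395 |
| Base models | RF | 0.786 | F | 0.8543 | 0.7657 | 0.8482 | 0.7684 | 0.6938 |
| Base models | SVM | 0.7615 | P | 0.8665 | 0.6801 | 0.7647 | 0.7650 | 0.6718 |
| Base models | SVM | 0.7615 | R | 0.8188 | 0.7965 | 0.8991 | 0.6526 | 0.6667 |
| Base models | SVM | 0.7615 | F | 0.8419 | 0.7337 | 0.8265 | 0.7043 | 0.6692 |
| Base models | KNN | 0.7415 | P | 0.6229 | 0.8338 | 0.7525 | 0.7702 | 0.7354 |
| Base models | KNN | 0.7415 | R | 0.7632 | 0.7770 | 0.7207 | 0.7126 | 0.7372 |
| Base models | KNN | 0.7415 | F | 0.6859 | 0.8044 | 0.7362 | 0.7403 | 0.7363 |
| Base models | BP | 0.783 | P | 0.8305 | 0.8489 | 0.8922 | 0.7311 | 0.6031 |
| Base models | BP | 0.783 | R | 0.9134 | 0.7294 | 0.833 | 0.7568 | 0.6771 |
| Base models | BP | 0.783 | F | 0.8700 | 0.7846 | 0.8615 | 0.7437 | 0.6380 |
| Base models | XGboost | 0.8025 | P | 0.8769 | 0.8111 | 0.826 | 0.7676 | 0.6921 |
| Base models | XGboost | 0.8025 | R | 0.8617 | 0.7951 | 0.8963 | 0.7313 | 0.7234 |
| Base models | XGboost | 0.8025 | F | 0.6920 | 0.8030 | 0.8597 | 0.7490 | 0.7074 |
| The proposed | MIFCA | 0.8415 | P | 0.8592 | 0.8715 | 0.8980 | 0.8225 | 0.8215 |
| The proposed | MIFCA | 0.8415 | R | 0.7792 | 0.8827 | 0.8378 | 0.8726 | 0.8468 |
| The proposed | MIFCA | 0.8415 | F | 0.8691 | 0.8771 | 0.8669 | 0.8468 | 0.8340 |
Note: C1 represents normal category; C2 represents secondary category; C3 represents concern category; C4 represents suspicious category; C5 represents loss category.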
Table 6
The overall accuracy of the personal credit assessment model.
| Model Type | DT | RF | SVM | KNN | BP | XGboost | MIFCA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 72.95% | 78.6% | 76.15% | 74.15% | 78.3% | 80.25% | 84.15% |
| Accuracy improvement rate | 15.28% | 6.99% | 10.44% | 13.42% | 7.41% | 4.79% | – |
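Table 7
Comparison of the performance achieved by various approaches of credit assessment under five credit classifications.
| Group | Model | ACC | Criterion | C1 | C2 | C3 | C4 | C5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Statistical models | LDA | 0.6125 | P | 0.7477 | 0.1428 | 0.7987 | 0.7020 | 0.6755 |
| Statistical models | LDA | 0.6125 | R | 0.8074 | 0.9006 | 0.7630 | 0.5086 | 0.4521 |
| Statistical models | LDA | 0.6125 | F | 0.7764 | 0.2464 | 0.7804 | 0.5898 | 0.5416 |
| Statistical models | LR | 0.6321 | P | 0.6399 | 0.3533 | 0.8755 | 0.6694 | 0.6237 |
| Statistical models | LR | 0.6321 | R | 0.8668 | 0.8162 | 0.7117 | 0.528 | 0.4681 |
| Statistical models | LR | 0.6321 | F | 0.7363 | 0.4931 | 0.7852 | 0.5904 | 0.5348 |
| Ensemble models | LightGBM | 0.8034 | P | 0.8641 | 0.8547 | 0.8216 | 0.7928 | 0.6837 |
| Ensemble models | LightGBM | 0.8034 | R | 0.8452 | 0.808 | 0.8608 | 0.7166 | 0.7966 |
| Ensemble models | LightGBM | 0.8034 | F | 0.8545 | 0.8307 | 0.8408 | 0.7528 | 0.7359 |
| Ensemble models | CatBoost | 0.8085 | P | 0.8963 | 0.8520 | 0.8091 | 0.8008 | 0.6843 |
| Ensemble models | CatBoost | 0.8085 | R | 0.8398 | 0.8110 | 0.8890 | 0.7168 | 0.8006 |
| Ensemble models | CatBoost | 0.8085 | F | 0.8671 | 0.8310 | 0.8472 | 0.7565 | 0.7379 |
| Advanced methods | PLTR | 0.7923 | P | 0.8340 | 0.7228 | 0.8782 | 0.7601 | 0.767 |
| Advanced methods | PLTR | 0.7923 | R | 0.8828 | 0.8484 | 0.8435 | 0.7801 | 0.6452 |
| Advanced methods | PLTR | 0.7923 | F | 0.8577 | 0.7806 | 0.8605 | 0.7699 | 0.7008 |
| Advanced methods | OCHE | 0.8234 | P | 0.8348 | 0.8082 | 0.8813 | 0.7883 | 0.8045 |
| Advanced methods | OCHE | 0.8234 | R | 0.8797 | 0.8901 | 0.8447 | 0.7946 | 0.7260 |
| Advanced methods | OCHE | 0.8234 | F | 0.8566 | 0.8471 | 0.8626 | 0.7914 | 0.7633 |
| The proposed | MIFCA | 0.8415 | P | 0.8592 | 0.8715 | 0.8980 | 0.8225 | 0.8215 |
| The proposed | MIFCA | 0.8415 | R | 0.7792 | 0.8827 | 0.8378 | 0.8726 | 0.8468 |
| The proposed | MIFCA | 0.8415 | F | 0.8691 | 0.8771 | 0.8669 | 0.8468 | 0.8340 |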
Fig. 4. The assessment classification performance of each model on credit normal category. From left to right in the figure are the accuracy, recall, F1-score of each
credit assessment model on credit Normal Category.
Fig. 5. The assessment classification performance of each model on credit secondary category. From left to right in the figure are the accuracy, recall, F1-score of
each credit assessment model on credit Secondary Category.
Fig. 6. The assessment classification performance of each model on credit concern category. From left to right in the figure are the accuracy, recall, F1-score of each
credit assessment model on credit Concern Category.
Fig. 7. The assessment classification performance of each model on credit suspicious category. From left to right in the figure are the accuracy, recall, F1-score of
each credit assessment model on credit Suspicious Category.
Fig. 8. The assessment classification performance of each model on credit loss category. From left to right in the figure are the accuracy, recall, F1-score of each
credit assessment model on credit Loss Category.
In the credit Suspicious Category, as shown in Fig. 7, the MIFCA model has the highest accuracy (82.25%), followed by the KNN model (77.02%) and the XGboost model (76.76%); the DT model (63.19%) is the worst. The MIFCA model also has the highest recall (87.26%), followed by the RF model (78.85%) and the BP neural network model (75.68%); the SVM model has the worst recall (65.26%). The highest F1-score belongs to the MIFCA model (84.68%), followed by the RF model (76.84%) and the XGboost model (74.37%). In the Suspicious Category, the MIFCA model is the best on every assessment index, and the DT model classifies relatively poorly compared with the other models.
In the credit Loss Category, as shown in Fig. 8, the DT model has the highest accuracy at 81.68%, followed by the MIFCA model (80.15%) and the RF model (75.83%); the worst is the BP neural network model (60.31%). The MIFCA model has the highest recall, followed by the KNN model (73.72%) and the XGboost model (72.34%); the worst is the DT model (54.78%). The highest F1-score belongs to the MIFCA model (83.40%), followed by the KNN model (73.63%); the lowest is the BP neural network model (63.80%). Judged by these index values, the MIFCA model constructed in this paper has the best overall assessment performance in the Loss Category, the KNN model is second, and the BP neural network model is the worst.
Through the above analysis, it can be seen that across the five credit classifications the DT model and the KNN model most frequently have the lowest assessment index values. The assessment performance of the SVM model in the Normal Category is second only to that of the BP neural network model, and its recall in the Concern Category is the highest, but its recall in the Suspicious Category is the lowest. The BP neural network model has the highest recall in the Normal Category and the highest accuracy in the Concern Category, but it has the lowest recall in the Secondary Category and the lowest accuracy in the Loss Category. The RF and XGboost models outperform the DT, KNN, SVM and BP neural network models across the five credit categories.
Fig. 9. Confusion matrix heat maps of six base models.
Fig. 10. Confusion matrix heat maps of MIFCA model.
Fig. 11. Overall accuracy of credit assessment models.
Fig. 12. Confusion matrix heat maps of other credit assessment models.
5.1.2. Overall assessment performance of the model
For the overall assessment performance of the model, this part uses the confusion matrix results and the overall accuracy.
A confusion matrix heat map is used to visualize the confusion matrix results of each model. In the heat map, a darker square on the diagonal represents a higher classification accuracy for that category. As shown in Figs. 9–10, in the heat map of the DT model the blue square on the diagonal is darkest for C3 and lighter for C2 and C4, indicating that the DT model classifies C3 best and C2 and C4 poorly. Similarly, the classification performance of the RF model on C2 is slightly weaker than on the other categories; the SVM model performs poorly on C5; the KNN model performs worst on C2 and C5; and the BP and XGboost models perform poorly on C5. The heat map of MIFCA shows that the proposed model classifies all five categories well and that its overall classification effect is better than that of the six base models, which greatly improves
the weakness of the base models in the classification performance of some categories.
The overall accuracy of each model is shown in Table 6 and Fig. 11. The overall assessment accuracy of the MIFCA model constructed in this paper is as high as 84.1%. Compared with the multi-classification assessment models of bank personal credit built on decision tree, random forest, support vector machine, K-nearest neighbor, BP neural network and XGboost, the accuracy is significantly improved (Wang, Hao, Ma, & Jiang, 2011). The improvement is largest relative to the decision tree model, at 15.28%, and even relative to XGboost, the best-performing base model, accuracy increases by 4.79%.
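The improvement rates in Table 6 appear to be relative improvements, that is, the accuracy gain divided by the base model's accuracy. This reading is an inference from the reported numbers rather than a definition given in the text, and it reproduces the table to within rounding when the 84.1% figure quoted above is used. A short check, with the hypothetical helper `improvement_rate`:

```python
# Overall accuracies (in %) of the base models, as listed in Table 6.
base_accuracy = {
    "DT": 72.95,
    "RF": 78.6,
    "SVM": 76.15,
    "KNN": 74.15,
    "BP": 78.3,
    "XGboost": 80.25,
}
# Figure quoted in the text; Table 6 itself lists 84.15%.
mifca_accuracy = 84.1


def improvement_rate(base: float, proposed: float) -> float:
    """Relative accuracy improvement of the proposed model, in percent."""
    return (proposed - base) / base * 100


rates = {
    model: improvement_rate(acc, mifca_accuracy)
    for model, acc in base_accuracy.items()
}
for model, rate in rates.items():
    print(f"{model}: {rate:.2f}%")
```

For the decision tree, (84.1 − 72.95) / 72.95 is about 15.28%, matching the largest improvement reported above.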
In summary, when the base classifiers perform multi-classification credit assessment, each classification algorithm performs differently, which leads to different levels of assessment error in different categories, and the base classifiers complement one another. The MIFCA model constructed in this paper makes full use of the complementarity among multiple sources of information, fuses the redundant information of each classifier, and reduces the overall uncertainty of the model, thereby improving the classification accuracy. More importantly, the MIFCA model integrates the strengths of multiple classifiers and is more universal across different types of data structures.
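To make the fusion idea concrete, the sketch below applies Dempster's rule of combination from D-S evidence theory to the outputs of several classifiers. It is a minimal illustration under strong assumptions and not the MIFCA construction itself: each classifier's class probabilities are assumed to serve directly as a basic probability assignment (BPA) over single classes only, and the function names and toy masses are invented for the example.

```python
from typing import Dict, List


def dempster_combine(m1: Dict[str, float], m2: Dict[str, float]) -> Dict[str, float]:
    """Combine two BPAs whose focal elements are all single classes."""
    classes = set(m1) | set(m2)
    # With singleton focal elements, only identical classes intersect.
    agreement = {c: m1.get(c, 0.0) * m2.get(c, 0.0) for c in classes}
    # Equals 1 - K (K = conflicting mass) when both inputs sum to 1.
    total = sum(agreement.values())
    if total == 0:
        raise ValueError("the two sources are in total conflict")
    return {c: mass / total for c, mass in agreement.items()}


def fuse(bpas: List[Dict[str, float]]) -> Dict[str, float]:
    """Fold Dempster's rule over a list of classifier outputs."""
    fused = bpas[0]
    for bpa in bpas[1:]:
        fused = dempster_combine(fused, bpa)
    return fused


# Toy outputs of two classifiers for one applicant (invented values).
clf_a = {"C1": 0.6, "C2": 0.3, "C3": 0.1}
clf_b = {"C1": 0.5, "C2": 0.4, "C3": 0.1}
fused = fuse([clf_a, clf_b])
decision = max(fused, key=fused.get)
```

In the toy case both sources lean towards C1, and the fused mass on C1 (0.30 / 0.43, about 0.70) is higher than either input, which illustrates how agreement between sources reduces uncertainty.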
assessment methods, the performance of PLTR is similar to that of RF, and its classification effect is poor in the C2, C4 and C5 categories. Although the PLTR result is weaker than that of the MIFCA method, its interpretability is better than that of the machine learning algorithms. The credit assessment performance of OCHE is slightly weaker than that of MIFCA, and on the whole its performance across the five classifications is relatively balanced. However, the overall classification accuracy of MIFCA is better than that of OCHE, and the structure of the MIFCA model is more concise. The above experimental analysis shows that the credit assessment model constructed in this paper performs well both overall and in each category.
6. Conclusion
Personal credit assessment is particularly important for banks and other financial institutions. Every 1% increase in assessment accuracy can help banks and other financial institutions avoid huge losses. Numerous classification methods have been proposed to handle the problem of credit scoring. However, compared with two-classification credit assessment, there are few studies on universal multi-classification credit assessment. This paper presents a new multi-classification credit assessment model (MIFCA). The MIFCA model takes full account of the complementarity of the base assessment models and uses D-S evidence theory to reduce the uncertainty of the model, which improves the accuracy and robustness of the classifier. More importantly, MIFCA is a lightweight credit evaluation model with few parameters and low computational cost, which meets the requirements of high efficiency and universality. We conducted several experiments to verify its performance. MIFCA and the benchmarks were evaluated on four metrics using a real-world personal credit risk data set from a commercial bank in China. In terms of predictive accuracy, the results demonstrate that the proposed model significantly outperforms most of the benchmark models. Both at each of the five credit rating levels and overall, the MIFCA model shows high assessment accuracy. The MIFCA model can thus integrate the strengths of a variety of classifiers, which makes it suitable for different credit risk data and gives it broad practical value. It can provide an important reference for the credit risk control of commercial banks.
5.2. Comparison of MIFCA with other approaches of credit assessment
We select six classic and commonly used credit assessment models from the traditional statistical methods, the ensemble methods and the advanced methods to compare and analyze the assessment results. Among them, LDA and LR are the two most commonly used credit assessment benchmark models in statistics (Dumitrescu et al., 2021). LightGBM and CatBoost are classic ensemble credit assessment methods, and PLTR and OCHE are two recent credit assessment methods.
In previous studies, several typical classification models have been applied, such as LDA (Fisher, 1936) and LR (Hand & Kelly, 2002). LDA finds a linear combination of features that separates the classes of objects. LR is a widely used statistical modeling technique that builds a model with a categorical outcome and has proven to be a powerful algorithm (Lee et al., 2006). LightGBM is an advanced GBDT-based algorithm developed by Ke et al. (2018). Some experiments have shown that LightGBM is superior to the original GBDT while incurring lower computational cost; Ke et al. even demonstrated that LightGBM provides better results than XGBoost on a variety of data sets. CatBoost, developed by Prokhorenkova, Gusev, Vorobev, Dorogush, and Gulin (2018), is a powerful open-source GBDT-based technique that achieves promising results in a variety of ML tasks; its authors claim that CatBoost outperforms existing GBDT techniques. OCHE is a novel credit scoring model based on a selective heterogeneous ensemble, developed by Xia et al. (2020). It employs advanced tree-based classifiers as base models and fully considers overfitting in the ensemble selection stage. PLTR is a high-performance and interpretable credit scoring method proposed by Dumitrescu et al. (2021), which uses information from decision trees to improve the performance of logistic regression. PLTR captures non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model (Dumitrescu et al., 2021).
Table 7 compares other classical approaches in this domain with our proposed MIFCA model on the data set used in this paper. As can be seen from Table 7 and Fig. 12, the results of the two traditional statistical credit assessment models are poor, which is consistent with the conclusions of existing studies.
The credit assessment results obtained by the two ensemble methods are relatively similar, and their classification effect on the C5 category is relatively poor. The ensemble methods clearly perform better than the general machine learning algorithms. For the latest credit
CRediT authorship contribution statement
Tianhui Wang: Conceptualization, Methodology, Software, Valida­
tion, Writing – original draft, Writing – review & editing. Renjing Liu:
Supervision, Writing – review & editing. Guohua Qi: Data curation,
Investigation.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Acknowledgement
This work was partially supported by the Major Project of the Na­
tional Social Science Fund of China (18ZDA104).
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.eswa.2021.116236.
References
Hens, A. B., & Tiwari, M. K. (2012). Computational time reduction for credit scoring: An
integrated approach based on support vector machine and stratified sampling
method. Expert Systems with Applications, 39(8), 6774–6781.
Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit
scoring: A review. Journal of the Royal Statistical Society, 160(3), 523–541.
Harris, T. (2015). Credit scoring using the clustered support vector machine. Expert
Systems with Applications, 42, 741–750.
Henley, W. E., & Hand, D. J. (1996). A k-Nearest-Neighbour Classifier for Assessing
Consumer Credit Risk. The Statistician, 45(1), 77–95.
Ince, H., & Aktan, B. (2009). A comparison of data mining techniques for credit scoring in
banking: A managerial perspective. Journal of Business Economics and Management,
10, 233–240.
Junior, L. M., Nardini, F. M., Renso, C., Trani, R., & Macedo, J. A. (2020). A novel
approach to define the local region of dynamic selection techniques in imbalanced
credit scoring problems. Expert Systems with Applications, 152, 113351.
Lessmann, S., Baesens, B., Seow, H.-V., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European
Journal of Operational Research, 247, 124–136.
Ling, Y., Cao, Q., & Zhang, H. (2012). Credit scoring using multi-kernel support vector
machine and chaos particle swarm optimization. International Journal of
Computational Intelligence & Applications, 11(3), 6774–7142.
Lee, T.-S., Chiu, C.-C., Chou, Y.-C., & Lu, C.-J. (2006). Mining the customer credit using
classification and regression tree and multivariate adaptive regression splines.
Computational Statistics and Data Analysis, 50, 1113–1130.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., & Liu, T.-Y. (2018). LightGBM: A
Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information
Processing Systems (pp. 3149–3157).
Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural
networks and multivariate adaptive regression splines. Expert Systems with
Applications, 28(4), 743–752.
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018).
CatBoost: unbiased boosting with categorical features. In Advances in Neural
Information Processing Systems, 6638–6648.
Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.
Shipp, C. A., & Kuncheva, L. I. (2002). Relationships between combination methods and
measures of diversity in combining classifiers. Information fusion, 3, 135–148.
Tripathi, D., Edla, D. R., Bablani, A., Shukla, A. K., & Reddy, B. R. (2021). Experimental
analysis of machine learning methods for credit score classification. Progress in
Artificial Intelligence, 1–27.
Visentini, I., Snidaro, L., & Foresti, G. L. (2016). Diversity-aware classifier ensemble
selection via f-score. Information Fusion, 28, 24–43.
Woźniak, M., Graña, M., & Corchado, E. (2014). A survey of multiple classifier systems as
hybrid systems. Information Fusion, 16, 3–17.
Wang, G., Hao, J., Ma, J., & Jiang, H. (2011). A comparative assessment of ensemble
learning for credit scoring. Expert Systems with Applications, 38, 223–230.
Wang, B., Kong, Y., Zhang, Y., Liu, D., & Ning, L. (2019). Integration of Unsupervised and
Supervised Machine Learning Algorithms for Credit Risk Assessment. Expert Systems
with Applications, 128(AUG), 301–315.
West, D. (2000). Neural network credit scoring models. Computers & Operations Research,
27(11–12), 1131–1152.
Xu, L., Krzyzak, A., & Suen, C. Y. (1992). Methods of combining multiple classifiers and
their applications to handwriting recognition. IEEE Transactions on Information
Theory, 22(3), 418–435.
Xia, Y., Zhao, J., He, L., Li, Y., & Niu, M. (2020). A novel tree-based dynamic
heterogeneous ensemble method for credit scoring. Expert Systems with Applications,
159, 113615. https://doi.org/10.1016/j.eswa.2020.113615
Xiao, H., Xiao, Z., & Wang, Y. (2016). Ensemble classification based on supervised
clustering for credit scoring. Applied Soft Computing, 43, 73–86.
Xia, Y., Liu, C., Li, Y., & Liu, N. (2017). A boosted decision tree approach using bayesian
hyper-parameter optimization for credit scoring. Expert Systems with Applications, 78,
225–241.
Xia, Y., Liu, C., Da, B., & Xie, F. (2018). A novel heterogeneous ensemble credit scoring
model based on bstacking approach. Expert Systems with Applications, 93, 182–199.
Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the
predictive accuracy of probability of default of credit card clients. Expert Systems with
Applications, 36(2), 2473–2480.
Zhang, W., He, H., & Zhang, S. (2019). A novel multi-stage hybrid model with enhanced
multi-population niche genetic algorithm: An application in credit scoring. Expert
Systems with Applications, 121, 221–232.
Zhang, W., Yang, D., & Zhang, S. (2021). A new hybrid ensemble model with votingbased outlier detection and balanced sampling for credit scoring. Expert Systems with
Applications, 174, 114744.
Abid, L., Masmoudi, A., & Zouari-Ghorbel, S. (2016). The consumer loan’s payment
default predictive model: an application of the logistic regression and the
discriminant analysis in a tunisian commercial bank. Journal of the Knowledge
Economy, 9(3), 948-962.
Abdelmoula, A. K. (2015). Bank credit risk analysis with k-nearest-neighbor classifier:
Case of tunisian banks. Journal of Accounting & Management Information Systems, 14
(1), 79–106.
Abellán, J., & Mantas, C. J. (2014). Improving experimental studies about ensembles of
classifiers for bankruptcy prediction and credit scoring. Expert Systems with
Applications, 41(8), 3825–3830.
Akko, S. (2012). An empirical comparison of conventional techniques, neural networks
and the three-stage hybrid adaptive neuro fuzzy inference system (anfis) model for
credit scoring analysis: The case of turkish credit card data. European Journal of
Operational Research, 222(1), 168–178.
Bahnsen, A. C., Aouada, D., & Ottersten, B. (2015). Example-dependent cost-sensitive
decision trees. Expert Systems with Applications, 42(19), 6609–6619.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Bensic, M., Sarlija, N., & Zekic-Susac, M. (2005). Modelling small-business credit scoring
by using logistic regression, neural networks and decision trees. Intelligent Systems in
Accounting, Finance & Management: International Journal, 13, 133–150.
Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., & Vanthienen, J. S. (2003).
Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of
the Operational Research Society, 54(6), 627–635.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20,
273–297.
Coussement, K., Lessman, S., & Verstraeten, G. (2017). A comparative analysis of data
preparation algorithms for customer churn prediction: A case study in the
telecommunication. Decision Support Systems, 95, 27–36.
Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings
of the 22nd Acm Sigkdd International Conference on Knowledge Discovery & Data Mining
(pp. 785–794). ACM.
Caigny, A. D., Coussement, K., & Bock, K. W. D. (2018). A new hybrid classification
algorithm for customer churn prediction based on logistic regression and decision
trees. European Journal of Operational Research, 269(2), 760–772.
Chen, F. L., & Li, F. C. (2010). Combination of feature selection approaches with SVM in
credit scoring. Expert Systems with Applications, 37(7), 4902–4909.
Dumitrescu, E., Hué, S., Hurlin, C., et al. (2021). Machine learning for credit scoring:
Improving logistic regression with non-linear decision-tree effects. European Journal
of Operational Research. https://doi.org/10.1016/j.ejor.2021.06.053
Du, G., Zhang, J., Luo, Z., Ma, F., & Li, S. (2020). Joint imbalanced classification and
feature selection for hospital readmissions. Knowledge-Based Systems, 200, 106020.
Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13, 21–27.
Davis, R. H., Edelman, D. B., & Gammerman, A. J. (1992). Machine-learning algorithms
for credit-card applications. IMA Journal of Management Mathematics, 4(1), 43–51.
Durand, D. (1941). Risk Elements in consumer Installment financing. New York: National
Bureau of Economic Research.
Desai, V. S., Crook, J. N., & Overstreet, G. A., Jr (1996). A comparison of neural networks
and linear scoring models in the credit union environment. European Journal of
Operational Research, 95(1), 24–37.
Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping.
Annals of Mathematical Statistics, 38(2), 325–339.
Feng, Xiaodong, Xiao, Zhi, Zhong, Bo, et al. (2018). Dynamic ensemble classification for
credit scoring using soft probability. Applied Soft Computing, 65, 139–151.
Fernández, M. J. Jesus, F. Herrera. (2010). On the 2-tuples based genetic tuning
performance for fuzzy rule based classification systems in imbalanced datasets.
Information Sciences, 180(8), 1268-1291.
Finlay, S. (2011). Multiple classifier architectures and their application to credit risk
assessment. European Journal of Operational Research, 210, 368–378.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of
Human Genetics, 7(2), 179–188.
Hsieh, N. C., Hung, L. P., & Ho, C. L. (2009). A data driven ensemble classifier for credit
scoring analysis. Expert Systems with Applications, 37(1), 534–545.
Hand, D. J., & Kelly, M. G. (2002). Superscorecards. IMA Journal of Management
Mathematics, 13(4), 273–281.
Huang, C.-L., Chen, M.-C., & Wang, C.-J. (2007). Credit scoring with a data mining
approach based on support vector machines. Expert Systems with Applications, 33,
847–856.