Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

In recent years, peer-to-peer (P2P) lending, a new form of unsecured financing conducted over the Internet, has boomed in China, and it has brought unavoidable credit risk problems. A key challenge for P2P lending platforms is accurately predicting each borrower's default probability with a default prediction model, which helps the platform avoid credit risk. Traditional default prediction models based on machine learning and statistical learning do not meet the needs of P2P lending platforms because the credit data in data-driven P2P lending contain many missing values, are high-dimensional and are class-imbalanced, which makes it difficult to train a default risk prediction model effectively. To solve these problems, this paper proposes a new default risk prediction model based on heterogeneous ensemble learning. Three individual classifiers, extreme gradient boosting (XGBoost), a deep neural network (DNN) and logistic regression (LR), are combined with a linear weight ensemble strategy. In particular, the model is able to process missing values: after generating discrete and rank features, it adds the missing values to the model for self-training. Then, the hyperparameters are optimized by the XGBoost model to improve the performance of the prediction model. Finally, compared with the benchmark model, the proposed method significantly improves the accuracy of the prediction results. In conclusion, the prediction method proposed in this paper solves the class-imbalance problem.
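
The linear weight ensemble strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each of the three classifiers has already produced a default probability per loan and that the blend is a normalized weighted sum. The function name, the sample probabilities and the weights are hypothetical; the abstract does not report the weights or how they were chosen.

```python
from typing import List, Sequence


def linear_weighted_ensemble(
    prob_sets: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Blend per-model default probabilities with a normalized weighted sum.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not prob_sets:
        raise ValueError("need at least one model")
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    n_loans = len(prob_sets[0])
    if any(len(probs) != n_loans for probs in prob_sets):
        raise ValueError("all models must score the same loans")

    # Normalize so the blended value stays a valid probability in [0, 1].
    norm = [w / total for w in weights]
    return [
        sum(w * probs[i] for w, probs in zip(norm, prob_sets))
        for i in range(n_loans)
    ]


# Made-up default probabilities for three loans from each base classifier.
xgb_probs = [0.10, 0.80, 0.40]
dnn_probs = [0.20, 0.70, 0.50]
lr_probs = [0.30, 0.60, 0.30]

# Illustrative weights only (assumption): XGBoost 0.5, DNN 0.3, LR 0.2.
blended = linear_weighted_ensemble(
    [xgb_probs, dnn_probs, lr_probs], [0.5, 0.3, 0.2]
)
```

In practice the weights would be tuned on a validation set. Because the weights are normalized, the blended score always lies between the lowest and highest base-model probability for a given loan.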

Original language: English
Pages (from-to): 23-45
Number of pages: 23
Journal: World Wide Web
Volume: 23
Issue number: 1
State: Published - 1 Jan 2020
Externally published: Yes

Keywords

  • Default prediction
  • Feature engineering
  • Heterogeneous ensemble learning
  • Hyperparameter optimization
  • Imbalanced data
