Credit Risk Database: Credit Scoring Models for Thai SMEs
Abstract
This paper introduces the Credit Risk Database (CRD) for Thailand, a collection of financial and non-financial data for SME credit risk analysis. In line with the Bank of Thailand (BOT)'s strategic plan to develop a data ecosystem that helps reduce information asymmetry in the financial sector, the CRD is an initiative to make effective use of data that the BOT already collects from financial institutions as part of its supervisory mandate. Our first use case aims to improve SMEs' access to finance by building credit risk models that serve as a complementary tool, alongside lenders' internal credit scores, to help financial institutions and the Credit Guarantee Corporation assess SMEs' financial prospects. Because we focus on SMEs that are new borrowers, we use only SMEs' financial and non-financial data as explanatory variables and exclude past default-related data such as loan repayment behavior. Credit risk models of various methodologies are then built from CRD data to enable financial institutions to conduct effective risk-based pricing, offering different interest rates and loan terms. Statistical methods (namely logistic regression and credit scoring) and machine learning methods (namely decision tree and random forest) are used to build credit risk models that quantify an SME's one-year-forward probability of default. Out-of-sample results indicate that both the statistical and the machine learning models yield reasonably accurate probability of default predictions, with maximum Area Under the ROC Curve (AUC) values of approximately 70–80%. The random forest model performs best as measured by maximum AUC. However, the credit scoring model, developed from a logistic regression on weight-of-evidence (WoE) variables, achieves the second-best AUC and is easier for lenders to interpret and apply in practice.
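As a rough illustration of the credit scoring approach summarized above (a logistic regression on weight-of-evidence variables, judged by out-of-sample AUC), the following is a minimal, standard-library-only Python sketch. It is not the authors' implementation. The two binned attributes (`leverage`, `age`), their default rates, the simulated data, the 0.5 count smoothing, the gradient-descent fitting, and the scorecard scaling in `to_points` (600 points at 50:1 good:bad odds, 20 points to double the odds) are all hypothetical choices made for illustration.

```python
import math
import random


def woe_table(bins, labels):
    """Weight of evidence per bin: ln(share of non-defaulters / share of defaulters).

    Higher WoE means the bin holds relatively more non-defaulters (lower risk).
    0.5 is added to each bin count to avoid log(0) in sparse bins (an assumed smoothing choice).
    """
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    counts = {}
    for b, y in zip(bins, labels):
        good, bad = counts.get(b, (0, 0))
        counts[b] = (good + (1 - y), bad + y)
    return {
        b: math.log(((good + 0.5) / total_good) / ((bad + 0.5) / total_bad))
        for b, (good, bad) in counts.items()
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=1.0, epochs=400):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """AUC as the probability that a defaulter scores above a non-defaulter (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def to_points(prob_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Hypothetical scaling: base_score at base_odds (good:bad), +pdo points per doubling of odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log((1 - prob_default) / prob_default)


def simulate(n, rng):
    """Hypothetical SMEs: two binned attributes and a one-year default flag."""
    rows, labels = [], []
    for _ in range(n):
        leverage = rng.choice(["low", "mid", "high"])
        age = rng.choice(["young", "established"])
        p = 0.05 + {"low": 0.0, "mid": 0.05, "high": 0.15}[leverage] + (0.08 if age == "young" else 0.0)
        rows.append((leverage, age))
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def run_demo(seed=0):
    """Fit the WoE logit on simulated training data and report AUC on a hold-out sample."""
    rng = random.Random(seed)
    train_rows, train_y = simulate(1000, rng)
    test_rows, test_y = simulate(500, rng)
    # WoE tables are learned on the training sample only, then applied out of sample.
    tables = [woe_table([r[j] for r in train_rows], train_y) for j in range(2)]

    def encode(rows):
        # Replace each bin label with its training WoE (0.0 for bins unseen in training).
        return [[tables[j].get(r[j], 0.0) for j in range(2)] for r in rows]

    w, b = fit_logit(encode(train_rows), train_y)
    pds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in encode(test_rows)]
    return w, b, auc(pds, test_y), pds


if __name__ == "__main__":
    weights, intercept, test_auc, pds = run_demo()
    print("WoE coefficients:", weights, "out-of-sample AUC:", round(test_auc, 3))
    print("points for first test SME:", round(to_points(pds[0])))
```

Because WoE is defined so that higher values mean proportionally fewer defaulters, the fitted coefficients come out negative. This is part of what makes such a model easy to read: each attribute bin maps to a fixed contribution to the log-odds of default, and hence to a fixed number of scorecard points.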