Academic paper summary: Employing Explainable AI to Optimize the Return Target Function of a Loan Portfolio
Source: https://www.frontiersin.org/articles/10.3389/frai.2021.693022/full
Background
It is essential in finance to measure risk and to price it accurately. In the case of credit scoring, pricing is challenging because the loss distribution is asymmetric and fat-tailed: significant losses have a low probability, while small gains have a high probability. Since defaults are rare events, improving prediction accuracy is hard, and ML models would need to be very robust and trained on high-quality data (which is rarely the case with credit data) to do so. This creates three types of problems:
- “optimization in the tail of the loss distribution as such is difficult — because of data issues, error propagation, and accuracy restrictions”
- “optimizing for the lowest number of defaults conditional to some lower accuracy bound will often result in a suboptimal economic situation, the cost/benefit ratio of defaulters and non-defaulters being highly asymmetrical”
- “The soaring use of advanced ML techniques in finance with the desired goal of higher prediction accuracy renders the decision process increasingly opaque. This, in turn, conflicts with the demands for transparency and explainability issued by regulatory bodies and supervisory authorities in finance”
Defining the economic pay-off function
In the paper's convention the positive class is "default", so a false negative (a defaulting loan that is accepted) is far more expensive than a false positive (a non-defaulting loan that is rejected). In a loan portfolio, the cost of a few defaults is covered by the premiums paid by many non-defaulters. Therefore, “omitting a default is very “valuable” and might be done, from an economic point of view, at the cost of forgoing quite a lot of premium-paying business, that is, accepting a high false positive rate (FPR).” This situation creates a fundamental issue: “since accepted defaults are much more costly than forgone non-defaulting business, the optimum with regard to the accuracy of predicting the number of defaults (what ML usually optimized) never can be the same as the optimum with regard to predicting the highest payoff (what financial institutions want)”.
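To make the pay-off function concrete, here is a minimal sketch in Python. It uses the per-contract figures quoted later in this summary (0.149 USD of income on a repaid contract, 0.80 USD of loss on an accepted default); the function name and array layout are my own, not the paper's.

```python
import numpy as np

def portfolio_profit(accepted, defaulted, income=0.149, loss=0.80):
    """Economic pay-off of a set of lending decisions.

    accepted  : boolean array, True if the contract is granted
    defaulted : boolean array, True if the contract would default
    income    : premium earned on an accepted, non-defaulting contract (USD)
    loss      : loss on an accepted, defaulting contract (USD, face value net of recovery)

    Rejected contracts contribute nothing; accepted non-defaulters earn the
    premium, accepted defaulters cost the face value minus the recovery.
    """
    accepted = np.asarray(accepted, dtype=bool)
    defaulted = np.asarray(defaulted, dtype=bool)
    per_contract = np.where(defaulted, -loss, income)   # pay-off if accepted
    return float(np.sum(np.where(accepted, per_contract, 0.0)))

# Accepting one good contract earns 0.149 USD; accepting one defaulter costs 0.80 USD:
print(portfolio_profit([True, True, False], [False, True, False]))  # -0.651
```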
A naïve example where a credit business accepts all loans:
- With "default" as the positive class, no contract is ever flagged, so both the false positive rate and the true positive rate will be zero
- Since defaults are (relatively) rare events, the accuracy of this estimator will already be quite high, even though we accepted all of the defaulters
- Bringing down the number of accepted defaulters will increase the TPR, but since we now have to reject contracts, it will also increase the FPR, which results in an income loss (from increasingly rejecting non-defaulting business)
- Consequently, a substantial further gain in prediction accuracy is often simply not achievable, even when employing highly sophisticated ML techniques
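The accuracy/profit gap of this accept-all strategy is easy to demonstrate. Below is a short illustration that reuses the portfolio_profit sketch above on a made-up portfolio with a 5% default rate (an assumption for illustration only; the paper's dataset is different).

```python
import numpy as np

rng = np.random.default_rng(0)
n, default_rate = 1000, 0.05            # illustrative portfolio, not the paper's data
defaulted = rng.random(n) < default_rate

# Naive model: accept every contract, i.e. never predict "default".
accept_all = np.ones(n, dtype=bool)

# Accuracy equals the share of non-defaulters, so it is already high ...
accuracy = np.mean(~defaulted)
# ... even though every defaulter was accepted; the profit, in contrast,
# is dragged down by the asymmetric cost of those accepted defaults.
profit = portfolio_profit(accept_all, defaulted)   # sketch defined earlier
print(f"accuracy = {accuracy:.3f}, profit = {profit:.2f} USD")
```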
Training simple models to identify defaulting and non-defaulting loan contracts
The authors trained two credit scoring models (a logistic regression and a decision tree) to maximize profit rather than accuracy. This increases profits by 12–14% while keeping “both models as simple as possible: first, because this guarantees a certain level of ad hoc explainability as discussed in the previous section, and second, because more elaborate models often fail to significantly improve the models’ performance while quickly compromising its transparency.”
Logistic regression
Concept
In order to choose a threshold to maximize profit rather than accuracy, the authors “weigh each contract with the dollar cost resulting if it is wrongly classified, that is, we weigh the non-defaulting contracts by the income we lose, if we reject them [0.149 USD of income loss], and the defaulting contracts by the loss we incur if we accept them (0.80 USD, equal to the face value of the contract reduced by the recovery rate of 20%)”.
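A minimal sketch of this idea in Python with scikit-learn: fit a standard logistic regression, then sweep the acceptance threshold and keep the one that maximizes the dollar pay-off rather than accuracy. `X` (features) and `y` (a 0/1 NumPy array of default indicators) are placeholders for the paper's dataset; the grid of 101 thresholds and the helper name are my own choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def profit_for_threshold(p_default, defaulted, threshold, income=0.149, loss=0.80):
    """Profit of accepting every contract whose predicted default
    probability lies below `threshold`, using the paper's cost figures."""
    accepted = p_default < threshold
    per_contract = np.where(defaulted, -loss, income)
    return float(np.sum(per_contract[accepted]))

# X: feature matrix, y: 0/1 default indicator (placeholders for the real data).
model = LogisticRegression(max_iter=1000).fit(X, y)
p_default = model.predict_proba(X)[:, 1]

thresholds = np.linspace(0.0, 1.0, 101)
profits = [profit_for_threshold(p_default, y.astype(bool), t) for t in thresholds]
best = thresholds[int(np.argmax(profits))]
print(f"profit-maximizing threshold: {best:.2f}, profit: {max(profits):.2f} USD")
```

In the paper, this kind of sweep produces the pronounced profit maximum at a threshold of 0.17 described below.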
Results
The profit-maximizing logistic regression has a lower optimal threshold (0.17, meaning all contracts with a predicted default probability below 0.17 are accepted), whereas a logistic regression model optimized for accuracy gives a threshold of 0.4.
“At a threshold of 1 (accepting all contracts), we get the profit of the naive model 140.06. At a threshold of 0 (rejecting all contracts), we have no business at all and thus also zero profit. However, in between these two extremes, at the threshold 0.17, a very pronounced maximum is reached with a profit of USD 265.26.” Using the best threshold for the accuracy-oriented model, the resulting profit would be 221.25 USD.
The profit-maximizing model uses a quite low acceptance threshold: roughly 15% of all contracts are rejected, yet it correctly identifies and rejects more than half of the defaulting contracts. This comes at the cost of wrongly rejecting many of the good, non-defaulting contracts (nearly 10% of the good business is rejected). Consequently, the model’s accuracy is (slightly) reduced compared to the accuracy-maximizing model, but its profit increases considerably because it identifies many more of the defaulting contracts.
Decision Tree
Concept
The authors use a simple tree with a depth of 3 and grow it by minimizing the Gini impurity measure. They leave the classification threshold at 0.5, since only a depth of 3 is used. “We can tune the growth of the decision tree by using weights for the two classes of contracts in the training data. We do this by tuning the relative weight of the non-defaulting contracts relative to the defaulting ones in a range from 10⁻⁴ to 10⁴.” “A very low weight, that is, more or less neglecting non-defaulting contracts altogether, leads to a tree that basically rejects all contracts (corresponding to a logistic regression model with a threshold close to 0), whereas a high weight, that is, more or less neglecting the defaulting contracts, leads to a tree that accepts nearly all contracts (corresponding to the naive model or a logistic regression model with a threshold close to 1).” The tree that maximizes the bank’s profit weights the non-defaulting contracts by a factor of 0.2 relative to the defaulting contracts.
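Below is a sketch of this weight sweep, again assuming scikit-learn and the same placeholder X and y as in the logistic-regression snippet; the weight grid (17 points between 10⁻⁴ and 10⁴) and the random_state are my own choices, not the paper's exact settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# X: feature matrix, y: 0/1 default indicator (placeholders, as before).
weights = np.logspace(-4, 4, 17)        # relative weight of non-defaulting contracts
best_weight, best_profit = None, -np.inf

for w in weights:
    # class 0 = non-default, class 1 = default; Gini impurity, depth 3,
    # decision threshold left at the default of 0.5 as in the paper.
    tree = DecisionTreeClassifier(max_depth=3, criterion="gini",
                                  class_weight={0: w, 1: 1.0},
                                  random_state=0).fit(X, y)
    accepted = tree.predict(X) == 0      # accept contracts predicted as non-default
    profit = float(np.sum(np.where(y.astype(bool), -0.80, 0.149)[accepted]))
    if profit > best_profit:
        best_weight, best_profit = w, profit

print(f"best relative weight: {best_weight:.4g}, profit: {best_profit:.2f} USD")
```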
Results
As with the logistic regression model, the accuracy improvement over the naive model is small, but the other performance figures improve strongly. The profit-maximizing model obtains a profit of 282.54 USD (with an accuracy of 0.84), while the accuracy-maximizing model yields 224.78 USD with an accuracy of 0.91.
Conclusion
“To maximize profit, it is crucial to include the user’s target function in the choice of the best possible model and parameters. In the case of the logistic regression, we tuned the threshold distinguishing between accepted and rejected contracts to maximize the given profit target function. In case of the decision tree, we used weighting to balance the data not to equal shares of both types of contracts but to reflect the impact of the model’s correct and wrong decisions on the target function, the bank’s profit.” Therefore, instead of chasing small accuracy gains with black-box models, it is better to use simple algorithms whose results are easy to explain and whose decision rule is tuned to maximize profit. “We observe that the profit-maximizing models tend to reject surprisingly many of the contracts, that is, these models accept a lot of falsely rejected good business in order to sort out a few more of the defaulting contracts. This is because the cost of a wrongly accepted defaulting contract by far outweighs the loss incurred by falsely rejected good, non-defaulting contract (forgone business).”