Credit risk with R
Introduction.
Credit risk models play a pivotal role in the financial landscape, providing institutions with a systematic framework to assess and manage the risks associated with lending and investment activities. One crucial component of these models is the estimation of the probability of default, which quantifies the likelihood that a borrower will fail to meet their debt obligations. The relevance of credit risk models lies in their ability to enhance decision-making processes by offering a comprehensive understanding of the potential creditworthiness of individuals, companies, or financial instruments. By utilizing statistical methods, historical data, and various financial indicators, these models enable financial institutions to make informed decisions about lending, pricing, and portfolio management. The estimation of the probability of default not only aids in risk assessment but also supports regulatory compliance, allowing institutions to maintain a healthy balance between risk and return in their credit portfolios. In an ever-evolving financial landscape, the continual refinement and application of credit risk models are essential for fostering stability and resilience within the financial system.
1 Loan analysis.
The main objective of this section is to predict whether a person or client applying for a loan will repay it or not. This issue can be considered a classification problem, where based on information about the client and the characteristics of their loan application, the client is classified into one of two categories: whether they will default or not. There are various ways to approach this classification problem; in this section, we will use a simple machine learning model called logistic regression. The model is first estimated or trained and then evaluated using new data to determine how well it performs the classification.
Let’s load the required R packages first.
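The setup chunk is collapsed here; a sketch of the kind of packages it might load follows (the exact list is an assumption based on the functions used later in this section).
# Packages used throughout this section (illustrative list).
library(gmodels)  # CrossTable()
library(ggplot2)  # plots
library(dplyr)    # data manipulation
library(pROC)     # ROC curves and AUC
library(knitr)    # kable() tables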
This section relies on the DataCamp course Credit Risk Modeling in R by Lore Dirick. However, we incorporate a slightly different database and an extended analysis.
1.1 Explore the database.
Let’s load the data called loan_data_ARF.rds
and then understand its structure before conducting any further analysis. This database is available here.
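A minimal way to load the file and produce the structure summary shown below (the path is illustrative and assumes the .rds file sits in the working directory):
# Load the data and inspect its structure.
dat <- readRDS("loan_data_ARF.rds")
str(dat)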
'data.frame': 29092 obs. of 10 variables:
$ loan_status : int 0 0 0 0 0 0 1 0 1 0 ...
$ loan_amnt : int 5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
$ int_rate : num 10.6 11 13.5 11 11 ...
$ grade : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 ...
$ emp_length : int 10 25 13 3 9 11 0 3 3 0 ...
$ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4 ...
$ annual_inc : num 24000 12252 49200 36000 48000 ...
$ age : int 33 31 24 39 24 28 22 22 28 22 ...
$ sex : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 2 ...
$ region : Factor w/ 4 levels "E","N","S","W": 1 1 3 3 2 2 2 2 4 1 ...
This dataset could represent a typical collection of data from a financial institution, such as a bank or a company that uses credit channels to sell its products or services. It contains 29,092 observations across 10 variables. Each observation corresponds to the personal and loan characteristics of an individual loan.
A key variable, our dependent variable, is loan_status (which we will shortly rename loan_st). It indicates the loan status: a value of 0 represents no default, while a value of 1 represents default. A default occurs when a borrower fails to make timely payments, misses payments, or ceases payments on the interest or principal owed. The definition of default can vary depending on the goals and interests of the analysis. For the purposes of this study, we classify loans simply as either default or no default.
The variable loan_st
is dichotomous, or categorical. Our primary interest lies in predicting whether a new loan application will result in default or not.
Clearly, loan_data_ARF.rds
represents past information, as we know with certainty whether an individual defaulted (1) or not (0). Historical data is valuable for understanding how likely an individual is to default based on their personal and loan characteristics. It is also essential for training quantitative models to predict outcomes for new applicants and for evaluating the performance of our classifications.
A variable name is considered too long when a shorter name can convey the same purpose just as effectively. Let’s rename some of them.
Code
old_names new_names
1 loan_status loan_st
2 loan_amnt l_amnt
3 int_rate int
4 grade grade
5 emp_length emp_len
6 home_ownership home
7 annual_inc income
8 age age
9 sex sex
10 region region
New variable names have now been assigned.
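One possible way to apply the new names listed above (a sketch; the hidden code chunk may use a different approach, such as dplyr::rename()):
# Assign the shorter variable names, in the same order as the table above.
names(dat) <- c("loan_st", "l_amnt", "int", "grade", "emp_len",
                "home", "income", "age", "sex", "region")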
We can take a look at the information in different ways. For example, look at the first 10 rows out of 29,092.
loan_st l_amnt int grade emp_len home income age sex region
1 0 5000 10.65 B 10 RENT 24000 33 0 E
2 0 2400 10.99 C 25 RENT 12252 31 0 E
3 0 10000 13.49 C 13 RENT 49200 24 0 S
4 0 5000 10.99 A 3 RENT 36000 39 0 S
5 0 3000 10.99 E 9 RENT 48000 24 1 N
6 0 12000 12.69 B 11 OWN 75000 28 1 N
7 1 9000 13.49 C 0 RENT 30000 22 1 N
8 0 3000 9.91 B 3 RENT 15000 22 1 N
9 1 10000 10.65 B 3 RENT 100000 28 0 W
10 0 1000 16.29 D 0 RENT 28000 22 1 E
Note that sex is 1 for female and 0 for male. Now, instead of looking at the details of the first 10 rows, we can summarize the data with respect to home using the CrossTable() function.
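A call along these lines, assuming the gmodels package is loaded, could produce the one-way table below:
# One-way frequency table for home ownership.
CrossTable(dat$home)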
Cell Contents
|-------------------------|
| N |
| N / Table Total |
|-------------------------|
Total Observations in Table: 29092
| MORTGAGE | OTHER | OWN | RENT |
|-----------|-----------|-----------|-----------|
| 12002 | 97 | 2301 | 14692 |
| 0.413 | 0.003 | 0.079 | 0.505 |
|-----------|-----------|-----------|-----------|
We can also use CrossTable() to summarize two variables. In particular, instead of counting home ownership alone, we can add a second dimension such as loan_st. This allows us to create more informative tables.
Code
Cell Contents
|-------------------------|
| N |
| N / Row Total |
|-------------------------|
Total Observations in Table: 29092
| dat$loan_st
dat$home | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
MORTGAGE | 10821 | 1181 | 12002 |
| 0.902 | 0.098 | 0.413 |
-------------|-----------|-----------|-----------|
OTHER | 80 | 17 | 97 |
| 0.825 | 0.175 | 0.003 |
-------------|-----------|-----------|-----------|
OWN | 2049 | 252 | 2301 |
| 0.890 | 0.110 | 0.079 |
-------------|-----------|-----------|-----------|
RENT | 12915 | 1777 | 14692 |
| 0.879 | 0.121 | 0.505 |
-------------|-----------|-----------|-----------|
Column Total | 25865 | 3227 | 29092 |
-------------|-----------|-----------|-----------|
This table reveals defaults by home ownership. We can also use a histogram to see the distribution of a single variable; in this case, the interest rate.
Code
The following is a dotplot figure for the annual income.
Code
Figure 1.2 above appears suspicious. The horizontal axis shows a very large annual income value (6,000,000), and there are only a few individuals with extremely high incomes. We should investigate further to determine whether these are valid observations or errors in the original dataset.
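A simple filter along these lines could produce the listing below; the exact threshold used in the hidden code is an assumption:
# Inspect observations with unusually high annual incomes.
subset(dat, income > 1000000)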
loan_st l_amnt int grade emp_len home income age sex region
4861 0 12025 14.27 C 13 RENT 1782000 63 0 E
13931 0 10000 6.54 A 16 OWN 1200000 36 0 W
15386 0 1500 10.99 A 5 MORTGAGE 1900000 60 1 N
16713 0 12000 7.51 A 1 MORTGAGE 1200000 32 0 E
19486 0 5000 12.73 C 12 MORTGAGE 6000000 144 1 E
22811 0 10000 10.99 A 1 MORTGAGE 1200000 40 1 E
23361 0 6400 7.40 A 7 MORTGAGE 1440000 44 1 E
23683 0 6600 7.74 A 9 MORTGAGE 1362000 47 0 E
28468 0 8450 12.29 C 0 RENT 2039784 42 0 E
One individual (ID 19486) is not only extremely wealthy but also 144 years old. As a result, I have decided to remove these nine observations. Data cleaning is a common task when working with large datasets, and it is acceptable as long as the integrity of the data remains intact.
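A sketch of this cleaning step, using the row positions listed above (the hidden code may instead filter on an income or age threshold):
# Drop the nine suspicious observations identified above.
outlier_rows <- c(4861, 13931, 15386, 16713, 19486, 22811, 23361, 23683, 28468)
dat <- dat[-outlier_rows, ]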
Remember, the original dataset contains 29,092 rows. After removing 9 observations, we are left with 29,083 rows. Let’s take a look at the result.
Code
Somewhat better now.
We could also explore some gender issues.
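A dplyr pipeline along the following lines could produce the summary below; the column labels are assumptions based on the printed output:
# Income summary by sex.
dat %>%
  group_by(sex) %>%
  summarise(Mean = mean(income), Median = median(income),
            Min = min(income), Max = max(income), Count = n()) %>%
  mutate(Gender = ifelse(sex == "0", "Male", "Female"))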
Code
# A tibble: 2 × 7
sex Mean Median Min Max Count Gender
<fct> <dbl> <dbl> <dbl> <dbl> <int> <chr>
1 0 66021. 56000 4000 900000 13877 Male
2 1 67064. 57000 4200 948000 15206 Female
The table shows that females tend to have slightly higher average and median incomes compared to males. Additionally, the income distribution for females appears more variable, with a wider range of incomes. While the sample sizes for both groups are similar, females are slightly more represented. Overall, the data suggests a modest income advantage for females and greater variability in their earnings.
The better we understand the data, the better positioned we are to analyze it, gain deeper insights into the problem, and interpret the solutions more effectively. However, there is a risk of falling into the trap of overexploration, losing sight of the main problem and failing to address the objectives. Data cleaning, for instance, can be an ongoing task. Removing outliers with excessively high incomes is just one example of how data cleaning can be approached. For now, we pause the exploration and cleaning process and move on to the next section, where we estimate models that will help us achieve the outlined objectives.
1.2 Logistic models.
Logistic models allow us to make predictions about loan defaults. Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable like loan_st. In this case, the binary dependent variable is default (1) or no default (0). A good reference for this section is Hull (2020).
First, load the data and split it into two sets: (1) training and (2) test. The training set is for building and estimating models, and the test set is used to evaluate the model’s predictions on new data. When estimating models, it is common practice to separate the available data into two parts, training and test data, where the training data is used to estimate parameters (in-sample) and the test data is used to evaluate accuracy (out-of-sample). Because the test data is not used in the estimation, it provides a reliable indication of how well the model is likely to predict on new data. In sum, we train the model, we test the model, and once we are satisfied with the model’s performance on new data, we are ready to use it in real-life applications. If we ignored this split and used the whole database to estimate our models, we might succeed at explaining defaults in our database but fail to explain defaults for new loan applications.
Code
# It is convenient to set the loan status as factor.
dat$loan_st <- as.factor(dat$loan_st)
set.seed(567)
index_train <- cbind(runif(1 : nrow(dat), 0 , 1), c(1 : nrow(dat)))
index_train <- order(index_train[, 1])
index_train <- index_train[1: (2/3 * nrow(dat))]
# Create training set
train <- dat[index_train, ]
# Create test set
test <- dat[-index_train, ]
Variables as factors are useful for model estimation and data visualization. Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
We have 29,083 observations in dat. The code above randomly selects \(29083 \times (2/3) \approx 19388\) rows to form the train set. The test set contains the remaining \(29083-19388=9695\) rows. The random selection is highly recommended because dat may have some structure or sorting that could bias the model estimation and negatively impact the model test. For example, imagine that for some odd reason the database were sorted so that the first observations were all no-defaults. In that case, the training and test sets would not contain similar proportions of default and no-default cases, and the whole analysis could be distorted. Random selection replicates a real situation in which our database is unsorted, with a mix of characteristics.
Let’s see whether the recently created train and test sets have the same basic properties as dat.
Code
# Combine data into a list for processing
data_list <- list(dat = dat$loan_st, train = train$loan_st, test = test$loan_st)
# Compute proportions and combine into a data frame
prop <- do.call(rbind, lapply(data_list, function(x) prop.table(table(x))))
colnames(prop) <- c("no defaults", "defaults")
row.names(prop) <- c("dat_prop", "train_prop", "test_prop")
prop
no defaults defaults
dat_prop 0.8890417 0.1109583
train_prop 0.8882298 0.1117702
test_prop 0.8906653 0.1093347
The proportions of “defaults” and “no defaults” in the train and test datasets are very similar to those in the original dataset, indicating that both samples are representative. Specifically, the proportion of “defaults” in the train and test datasets closely matches that of the original dataset, as does the proportion of “no defaults.” This suggests that the random sampling process has preserved the overall distribution of the target variable, ensuring that the train and test sets reflect the characteristics of the full dataset.
Assume we think that the loan_st
depends on the age of the individual. We can estimate a simple logistic model to learn about the relationship between age and loan status.
\(\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{age}\).
Where:
\(p\) is the probability that
loan_st = 1
(e.g., the probability of default).\(\beta_0\) is the intercept, and \(\beta_1\) is the coefficient for the predictor variable
age
.
The dependent variable in logistic regression is expressed as: \(\log\left(\frac{p}{1-p}\right)\). This transformation, known as the logit function, maps probabilities \(p\) which range between 0 and 1 to the entire real number line \(-\infty\) to \(+\infty\). This allows the model to establish a linear relationship between the independent variable(s) (e.g., age) and the transformed dependent variable, making estimation feasible. The fraction \(\frac{p}{1-p}\), called the odds, represents the ratio of the probability of an event happening \(p\) to the probability of it not happening \(1-p\), making it a natural choice for modeling binary outcomes.
Let’s estimate the age model.
Code
Call: glm(formula = loan_st ~ age, family = "binomial", data = train)
Coefficients:
(Intercept) age
-1.90097 -0.00623
Degrees of Freedom: 19387 Total (i.e. Null); 19386 Residual
Null Deviance: 13580
Residual Deviance: 13580 AIC: 13580
The estimated model is: \(\log\left(\frac{p}{1-p}\right) = -1.90097 - 0.00623 \times \text{age}\).
The model suggests there is a negative relationship between age and loan status.
We can solve for \(p\) to find the estimate of the probability of default according to the age model: \(p = \frac{\exp(-1.90097 - 0.00623 \times \text{age})}{1 + \exp(-1.90097 - 0.00623 \times \text{age})}\). Let’s evaluate \(p\) for a couple of age values.
Substituting age = 18
: \(p = \frac{\exp(-1.90097 - 0.00623 \times 18)}{1 + \exp(-1.90097 - 0.00623 \times 18)}\),
\(p = \frac{\exp(-2.01311)}{1 + \exp(-2.01311)} \approx \frac{0.1333}{1 + 0.1333} \approx 0.1176\).
Substituting age = 60
: \(p = \frac{\exp(-1.90097 - 0.00623 \times 60)}{1 + \exp(-1.90097 - 0.00623 \times 60)}\),
\(p = \frac{\exp(-2.27477)}{1 + \exp(-2.27477)} \approx \frac{0.1037}{1 + 0.1037} \approx 0.0939\).
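These hand calculations can be checked in R with the built-in logistic function plogis(); with the rounded coefficients the values match the ones above up to rounding:
# Probability of default implied by the age model, p = plogis(b0 + b1 * age).
p_default <- function(age) plogis(-1.90097 - 0.00623 * age)
p_default(18)  # approximately 0.118
p_default(60)  # approximately 0.093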
The results of the default probability estimation evaluated at ages 18 and 60 are not very promising because the difference in probabilities is very small. This means that the model does not generate significantly differentiated default probabilities, even when evaluated over a wide age range. This can be problematic because it suggests that this is a poor model.
The following is a graphical representation of the logistic regression model’s output.
Code
# Define the function for the logistic regression model
logistic_function <- function(age) {
beta0 <- -1.90097
beta1 <- -0.00623
p <- exp(beta0 + beta1 * age) / (1 + exp(beta0 + beta1 * age))
return(p)
}
# Generate a sequence of ages
ages <- seq(-1000, 500, by = 1)
# Apply the logistic function to each age
p_values <- sapply(ages, logistic_function)
# Create the plot
library(ggplot2)
ggplot(data = data.frame(age = ages, p = p_values),
aes(x = age, y = p)) +
geom_line(color = "blue", size = 2) +
labs(x = "Age", y = "Probability (p)") +
theme_minimal()
The logistic curve derived from this model is unreasonable because the probability \(p\) changes very slowly with respect to age. This implies that, to observe the full range of probabilities from 0 to 1, we would require age values that are unrealistic or implausibly extreme. Such behavior suggests that the model is poorly specified, as it fails to generate meaningful distinctions in probabilities across a realistic range of ages.
The AIC value of the age model (13,580) is useful when comparing models. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. At the moment we cannot interpret the AIC simply because we only have one model and we cannot compare it with another AIC.
Let’s estimate another simple model where the interest rate is used as a predictor of loan_st. Remember that we are not conducting any prediction at this point; we are only estimating models using the training set.
Code
Call: glm(formula = loan_st ~ int, family = "binomial", data = train)
Coefficients:
(Intercept) int
-3.710 0.142
Degrees of Freedom: 19387 Total (i.e. Null); 19386 Residual
Null Deviance: 13580
Residual Deviance: 13210 AIC: 13220
The AIC is lower (13,220 versus 13,580), so we have a better model now.
Using a single predictor such as age or the interest rate is clearly a limited approach, so let’s add some more predictors. Let’s also introduce the summary() function to extract more information about the model estimation results. The logi_multi model below assumes that the loan status depends on the age, interest rate, grade, loan amount, and annual income.
Code
Call:
glm(formula = loan_st ~ age + int + grade + log(l_amnt) + log(income),
family = "binomial", data = train)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.996240 0.477911 4.177 2.95e-05 ***
age -0.002302 0.003825 -0.602 0.5473
int 0.038767 0.017249 2.247 0.0246 *
gradeB 0.503409 0.087435 5.758 8.54e-09 ***
gradeC 0.748229 0.117765 6.354 2.10e-10 ***
gradeD 0.964343 0.147283 6.548 5.85e-11 ***
gradeE 1.033442 0.190817 5.416 6.10e-08 ***
gradeF 1.619470 0.257900 6.279 3.40e-10 ***
gradeG 1.867494 0.440232 4.242 2.21e-05 ***
log(l_amnt) 0.015718 0.036341 0.433 0.6654
log(income) -0.470748 0.046423 -10.140 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13579 on 19387 degrees of freedom
Residual deviance: 13028 on 19377 degrees of freedom
AIC: 13050
Number of Fisher Scoring iterations: 5
Our multi-factor model works well. In logi_multi, the AIC value is the lowest so far (13,050 versus 13,220), so this should be considered the best in-sample model at the moment. The summary() function shows the significance levels of the estimators, but we are currently more interested in the goodness of fit of the models because we want to make predictions about loan_st. That is, we are interested in using a model to find out whether new applicants in the test set are expected to default or not, rather than in the applicants’ credit risk factors. This is why we concentrate on the AIC for now.
When a customer fills out a credit application form, we collect information, but we do not know for sure whether she or he will eventually default. A credit risk model can help us with this task.
1.3 Prediction and model evaluation.
Let’s evaluate the predictive performance of our three models: logi_age
, logi_int
, and logi_multi
, introduced in the previous subsection. We begin by selecting a single observation from the test set and asking each model to predict the loan_st
(loan status) for this individual. Specifically, we take the age of the first person in the test set, use it as input for the logi_age
model, and then compare the model’s predicted loan_st with the actual outcome recorded in the test set. Since the test set contains the true loan status for this individual, we can directly assess how well the model performed. Each model is expected to generate a potentially different prediction for the loan_st
. A good model should produce a prediction that matches the actual outcome for this observation, indicating its reliability in capturing the underlying patterns in the data.
Let’s call the first individual John Doe for practical purposes.
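In the hidden code, John_Doe is presumably just the first row of the test set, along these lines:
# Take the first applicant in the test set.
John_Doe <- test[1, ]
John_Doe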
loan_st l_amnt int grade emp_len home income age sex region
1 0 5000 10.65 B 10 RENT 24000 33 0 E
We know in advance that the loan_st
for this observation, taken from the test set, is 0. However, the models cannot know this because the test set was not used to estimate the logistic models. Instead, the models were trained using the training set. A good credit risk model should predict no default for this new applicant called John Doe.
The values of loan_st
in the test set are either 0 or 1. However, the logistic models predict the loan_st
as probability estimates within the range of 0 to 1, as illustrated in Figure 1.4. This means we would expect the estimated loan_st
for John Doe to be close to 0. But how close? We will address this issue later.
This code predicts the loan status for John Doe using three logistic regression models. Each prediction represents the probability that John Doe defaults on a loan, based on the respective model.
Code
# List of models
models <- list("logi_age" = logi_age, "logi_int" = logi_int, "logi_multi" = logi_multi)
# Predict the loan status for John Doe using all models
pred_John <- sapply(models, predict, newdata = John_Doe, type = "response")
# Convert to a table with appropriate column name
pred_John <- as.data.frame(pred_John)
colnames(pred_John) <- "Loan status predictions for John Doe."
# Display the predictions
pred_John
Loan status predictions for John Doe.
logi_age.1 0.10846236
logi_int.1 0.09996702
logi_multi.1 0.14461840
These values are low, as they are close to 0. We could interpret this as some ability of the models to correctly predict that John Doe will not default. However, we have already illustrated that at least the age model is highly problematic. Additionally, several questions remain unanswered and require further analysis. For example, how can we determine if the prediction is low enough to classify it as a non-default? We may need a cutoff value to make this decision. We will explore this issue later.
Another important consideration is: What about the rest of the cases in the test set? The test set contains 9,695 observations, yet the example above only evaluates the first one. Our interest lies in assessing the entire test set, not just John Doe. Fortunately, this issue is easy to resolve by modifying the newdata
parameter in the predict()
function. Specifically, instead of using newdata = John_Doe
, which refers to a single observation, we can replace it with newdata = test
, encompassing all 9,695 observations in the test set.
Code
# Predict the loan status with the three models.
pred_logi_age <- predict(logi_age, newdata = test, type = "response")
pred_logi_int <- predict(logi_int, newdata = test, type = "response")
pred_logi_multi <- predict(logi_multi, newdata = test,
type = "response")
pred_range <- rbind("logi_age" = range(pred_logi_age),
"logi_int" = range(pred_logi_int),
"logi_multi" = range(pred_logi_multi))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
min(loan_st) max(loan_st) AIC
logi_age 0.08417892 0.1165453 13580.65
logi_int 0.05019756 0.3982977 13215.61
logi_multi 0.01739107 0.4668004 13050.30
Now, instead of predicting for a single applicant, we have made predictions for all 9,695 applicants in the test set. The first two columns show the lowest and highest predicted loan_st for each model; the logistic models produce values in the range of 0 to 1, and here the predicted ranges are quite narrow.
Narrow ranges (i.e., a small difference between the highest and lowest predicted loan_st values) could be problematic because the model might struggle to discriminate between defaults (predictions closer to 1) and non-defaults (predictions closer to 0), as suggested in Figure 1.4. The higher the AIC, the worse the in-sample model, and the lower the AIC, the better the model. Here we see some consistency between in-sample and out-of-sample performance: the model with the widest prediction range also has the lowest AIC, indicating it is the best in-sample model.
Let’s explore all the predicted loan_st
values for the logi_age
. First let’s see all of them.
Code
# Convert the predictions to a data frame
pred_data <- data.frame(
Observation = seq_along(pred_logi_age), # Sequence for observation index
Predicted_Probabilities = pred_logi_age
)
# Create the plot
ggplot(pred_data, aes(x = Observation, y = Predicted_Probabilities)) +
geom_point(alpha = 0.6, color = "blue") + # Points for predictions
labs(x = "Observation (test set)",
y = "Default prediction") +
theme_minimal()
Now its distribution.
Code
The logi_age model fails to produce predictions spanning the range from 0 to 1. In fact, its predicted values are concentrated in a very narrow interval. As a consequence, this model fails to differentiate between default and no-default predictions.
Let’s visualize the predictions of the logi_int
and logi_multi
. We first collect all predictions in a single data frame just for convenience.
Now we plot the logi_int
and logi_multi
.
Code
Let’s add the age
model as well.
Code
Presumably, a model which considers all available predictors could be better for predicting the loan_st
.
Code
# Logistic regression model using all available predictors in the data set.
logi_full <- glm(loan_st ~ age + int + grade + log(l_amnt) +
log(income) + emp_len + home + sex +
region, family = "binomial", data = train)
# Loan status predictions for all test set elements.
pred_logi_full <- predict(logi_full, newdata = test, type = "response")
# Look at the predictions range.
range(pred_logi_full)
[1] 1.422469e-09 8.544424e-01
Now, the pred_logi_full prediction range is wider. A wider range means that some loan_st predictions are now much closer to 1 (and others much closer to 0). This is good because we need the model to be able to predict both no-defaults (0) and defaults (1). Let’s see a prediction comparison with respect to the rest of the models.
Code
pred_range <- rbind("logi_age" = range(pred_logi_age),
"logi_int" = range(pred_logi_int),
"logi_multi" = range(pred_logi_multi),
"logi_full" = range(pred_logi_full))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic, logi_full$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
min(loan_st) max(loan_st) AIC
logi_age 8.417892e-02 0.1165453 13580.65
logi_int 5.019756e-02 0.3982977 13215.61
logi_multi 1.739107e-02 0.4668004 13050.30
logi_full 1.422469e-09 0.8544424 10763.67
A graphical comparison of the new pred_logi_full
and pred_logi_multi
.
Code
And all of them together.
Code
The logi_full model predictions look better than those of the other models.
Another question is: how do we know whether these model predictions correspond to a default or a no-default? The loan status predictions go from 0 to 0.854 for the logi_full model. In the end, these loan status estimates need to be interpreted or classified as default or no-default, because that is what we are interested in. Are they close enough to 0 to consider a no-default? This issue is addressed by setting a cutoff rate so we can split all estimated loan statuses into 0 or 1.
Let us arbitrarily set a cutoff of 0.15 for now. This implies that any estimated loan status below 0.15 will be classified as 0 (no default), while any estimated loan status above 0.15 will be classified as 1 (default). This classification rule can guide financial firms in deciding whether to approve or reject new loan applications. Applications with an estimated loan status above 0.15 will be rejected, whereas those with an estimated loan status below 0.15 will be approved. Thus, the loan approval or rejection decision is now based on the model’s estimation.
Choosing a cutoff value as a classification and acceptance rule can be controversial, as managers may prefer to allocate more loans rather than fewer. In other words, managers may prefer more sales at the cost of a higher credit risk. Specifically, the higher the cutoff, the greater the number of loans allocated, since the allocation rule selects those applicants whose predicted loan status is below the cutoff.
An alternative that balances credit risk with managerial interests is to implement multiple cutoff ranges instead of relying on a single cutoff value, such as 0.15. For instance, a financial firm could accept loan applications with an estimated loan status between 0.15 and 0.25 but charge a higher interest rate to offset the increased credit risk. Alternatively, a different perspective might involve incorporating social criteria to support individuals typically excluded by the financial industry due to their economic circumstances. In such cases, the focus could shift toward identifying applicants with lower estimated loan statuses.
For now, our cutoff and allocation rule assume that any estimated loan status below 0.15 will be classified as 0 (no default), while any estimated loan status above 0.15 will be classified as 1 (default).
Graphically, it looks like this:
Code
# Convert the predictions to a data frame
pred_data <- data.frame(
Observation = seq_along(pred_logi_full), # Sequence for observation index
Predicted_Probabilities = pred_logi_full
)
# Create the plot
ggplot(pred_data, aes(x = Observation, y = Predicted_Probabilities)) +
geom_point(alpha = 0.6, color = "blue") + # Points for predictions
geom_hline(yintercept = 0.15, color = "red", size = 1) + # Horizontal line
labs(x = "Observation (test set)",
y = "Default prediction") +
theme_minimal()
Alternatively, we can illustrate the distribution.
Code
Here, predicted loan_st values to the left of the dashed line represent no-default predictions and values to the right of the dashed line represent default predictions. Let’s set up the rule to convert the estimated loan status into a binary (0 or 1) variable.
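The conversion can be done with a single ifelse() call at the 0.15 cutoff:
# Classify each predicted probability as default (1) or no default (0).
pred_cutoff_15 <- ifelse(pred_logi_full > 0.15, 1, 0)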
See how this transformation takes place:
Code
pred_logi_full pred_logi_full_rounded pred_cutoff_15
1 1.357995e-08 0.0000 0
2 2.986278e-08 0.0000 0
18 2.299645e-02 0.0230 0
26 2.403879e-01 0.2404 1
27 1.778986e-01 0.1779 1
28 2.600172e-02 0.0260 0
These are only the first 6 rows in the test set. We can see that the rule works as expected because every estimated loan status below 0.15 is now considered as 0 (no-default), and every estimated loan status above 0.15 is considered as 1 (default). The table above shows how we can create this binary variable given the logistic model prediction.
Note that the row numbers in the table above are 1, 2, 18, 26, 27 and 28 rather than 1, 2, 3, 4, 5 and 6, because the test rows were selected randomly out of dat. The row numbers in the table therefore correspond to each observation’s original position in dat.
Is logi_full a good model after all? We can add a new column to the previous table. This new variable represents what really happened with the loan. The first column is then the logistic model prediction, the second column the transformed binary variable given a cutoff of 0.15, and the third column what actually happened (default or no-default). Let’s take a sample of 10 observations to conduct the comparison easily. Note that the model correctly predicts a no-default in most cases. Rows 308 and 329 predict a default incorrectly, and row 323 predicts a default correctly.
Code
pred_logi_full pred_cutoff_15 test$loan_st
308 1.537106e-01 1 0
309 4.137654e-02 0 0
310 3.113926e-02 0 0
312 5.186962e-09 0 0
318 5.337092e-02 0 0
319 8.402239e-02 0 0
323 2.795535e-01 1 1
328 5.384112e-09 0 0
329 1.893745e-01 1 0
330 9.012136e-03 0 0
There is an easy way to evaluate the rest of the cases in the test set using a simple table called a confusion matrix.
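A confusion matrix like the one below can be produced with a basic table() call (a sketch; the hidden code may use a different helper):
# Confusion matrix: actual loan status versus the 0.15-cutoff classification.
table(Actual = test$loan_st, Predicted = pred_cutoff_15)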
Predicted
Actual 0 1
0 6554 2081
1 308 752
The logi_full model correctly predicts 6,554 no-defaults and 752 defaults. However, it also misclassifies 2,081 no-defaults as defaults and 308 defaults as no-defaults. How good are these results? Which of these four values is most important? These are questions we will address later. For now, it is important to note that different models and cutoff rates yield different confusion matrix results. Note that the sum of these four values equals 9,695, which corresponds to the total number of observations in the test set.
You may want to see relative and not absolute values. Let’s try the CrossTable()
function instead.
Code
Cell Contents
|-------------------------|
| N |
| N / Table Total |
|-------------------------|
Total Observations in Table: 9695
| Predicted
Actual | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 6554 | 2081 | 8635 |
| 0.676 | 0.215 | |
-------------|-----------|-----------|-----------|
1 | 308 | 752 | 1060 |
| 0.032 | 0.078 | |
-------------|-----------|-----------|-----------|
Column Total | 6862 | 2833 | 9695 |
-------------|-----------|-----------|-----------|
Here, 67.6% of all cases were correctly predicted as non-default (true negatives), while 21.5% were incorrectly predicted as default (false positives); together these make up 89.1% of cases and show that non-default is the majority class. Meanwhile, 7.8% of all cases were correctly predicted as default (true positives) and 3.2% were incorrectly predicted as non-default (false negatives), comprising the remaining 10.9% of cases and indicating that default is the minority class. The model therefore classifies non-defaults more accurately than defaults, although this should be read in the context of the imbalanced class distribution.
Instead of arbitrarily considering a cutoff of 0.15, we can follow a different approach. Suppose we are interested in classifying the 20% highest estimates (those closer to 1) of pred_logi_full as defaults. Equivalently, this means classifying the lowest 80% of the pred_logi_full estimates (those closer to 0) as no-defaults. Let’s calculate the new cutoff that meets this new criterion.
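The required cutoff is simply the 80th percentile of the predictions, mirroring the quantile() call used later for the 65% acceptance rate (the object name is illustrative):
# Cutoff such that the top 20% of predictions are classified as defaults.
cutoff_80 <- quantile(pred_logi_full, 0.8)
cutoff_80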
This new approach (taking the 20% highest estimates of pred_logi_full
as default) represents a cutoff of 0.1994621. Graphically:
Code
Now the cutoff is 0.1994621. This splits the loan status predictions into two parts: higher than the cutoff is a default, and lower than the cutoff is a no-default. Here are the cutoff values as a function of quantiles.
Code
cut_off
10% 0.00000
20% 0.01469
30% 0.02410
40% 0.03751
50% 0.06184
60% 0.09617
70% 0.14673
80% 0.19946
90% 0.29227
100% 0.85444
We can show a summary table similar to the one we built for the cutoff of 0.15. Here, we show the predictive ability of the logi_full model with the new cutoff of 0.1994621.
Code
Cell Contents
|-------------------------|
| N |
| N / Table Total |
|-------------------------|
Total Observations in Table: 9695
| pred_full_20
test$loan_st | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 7309 | 1326 | 8635 |
| 0.754 | 0.137 | |
-------------|-----------|-----------|-----------|
1 | 447 | 613 | 1060 |
| 0.046 | 0.063 | |
-------------|-----------|-----------|-----------|
Column Total | 7756 | 1939 | 9695 |
-------------|-----------|-----------|-----------|
With a cutoff of 0.1994621 we accept 7,309 + 447 = 7,756 applications, as those are the ones for which the model predicts no default. Previously, with a cutoff of 0.15, we accepted 6,554 + 308 = 6,862.
Let’s compare both confusion matrices:
Code
cat <- c("Correct no-default (true negatives).",
"False default (false positives).",
"False no-default (false negatives).",
"Correct default (true positives).")
cut_15 <- c("67.6%", "21.5%", "3.2%", "7.8%")
cut_1994621 <- c("75.4%", "13.7%", "4.6%", "6.3%")
cbind("Classification." = cat, cut_15, cut_1994621)
Classification. cut_15 cut_1994621
[1,] "Correct no-default (true negatives)." "67.6%" "75.4%"
[2,] "False default (false positives)." "21.5" "13.7%"
[3,] "False no-default (false negatives)." "3.2" "4.6%"
[4,] "Correct default (true positives)." "7.8%" "6.3%"
Type I and Type II errors are key concepts in hypothesis testing and statistical decision-making, particularly relevant in loan default prediction. A Type I error (false positive) occurs when good customers, who would not default, are incorrectly rejected. On the other hand, a Type II error (false negative) happens when bad customers, who will default, are mistakenly accepted.
The new cutoff of 0.1994621 improves the identification of no-defaults but worsens the identification of defaults. Additionally, the new cutoff results in fewer false positives but more false negatives. This suggests that there is a trade-off between the two outcomes.
We can also look at the detail of the 0.1994621 cutoff by comparing two columns: the one on the left with the actual loan status and the one on the right with the estimated loan status.
Code
test$loan_st pred_full_20 Did the model succeed?
398 0 0 TRUE
399 1 1 TRUE
404 0 1 FALSE
414 0 0 TRUE
417 1 1 TRUE
419 0 0 TRUE
420 0 0 TRUE
425 1 1 TRUE
431 0 0 TRUE
435 0 0 TRUE
In this sample, the model fails for 1 out of 10 individuals, a fairly good result. Now, let’s imagine we are a bank with 9,695 loan applications on our desk (or computer). Suppose we use the predictions from pred_full_20
to decide whether to accept a loan application. Our model-based acceptance rule is as follows: if pred_full_20 = 0
, the model predicts no default, and we accept the loan application. According to the excerpt from the table above, we correctly reject applications 399, 417, and 425 because those were indeed defaults. However, we incorrectly reject application 404, which did not default. Overall, using a model as the basis for a decision rule can lead to better outcomes than guessing or applying a random approval policy.
Let’s count how many applications are accepted and rejected according to our rule.
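Counting is straightforward once the binary variable pred_full_20 is available (a sketch of the hidden code):
# Number of accepted (predicted no-default) and rejected applications.
data.frame(total = length(pred_full_20),
           accept_20 = sum(pred_full_20 == 0),
           reject_20 = sum(pred_full_20 == 1))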
Code
total accept_20 reject_20
1 9695 7756 1939
By taking the top 20% of the pred_logi_full
estimates as defaults and the bottom 80% as non-defaults, we construct a rule that accepts 7,756 loan applications (80% of the total) and rejects 1,939 applications (20% of the total). The cutoff criterion determines the number of accepted applications, while the logi_full
model specifies which applications to accept or reject.
We can evaluate this loan acceptance/rejection rule by comparing it against the actual outcomes recorded in loan_st
. To begin, let’s illustrate the decision-making process based on the model’s estimates.
Code
test.loan_st pred_full_20 decision
1 0 0 accept
2 0 0 accept
18 1 0 accept
26 0 1 reject
27 0 0 accept
28 0 0 accept
30 0 0 accept
32 0 0 accept
34 0 1 reject
35 0 0 accept
36 0 0 accept
37 1 0 accept
We can add an evaluation column.
Code
# First 12 decisions (accepted and rejected) with an evaluation label.
head(data.frame(real_pred_20[, 1:2],
                decision = ifelse(real_pred_20$pred_full_20 == 0,
                                  "accept", "reject"),
                evaluation = ifelse(real_pred_20$pred_full_20 == 1 &
                                      real_pred_20$`test$loan_st` == 0,
                                    "we rejected a good customer",
                              ifelse(real_pred_20$pred_full_20 == 1,
                                     "we rejected a bad customer",
                              ifelse(real_pred_20$`test$loan_st` == 0,
                                     "good decision", "bad decision")))),
     12)
test.loan_st pred_full_20 decision evaluation
1 0 0 accept good decision
2 0 0 accept good decision
18 1 0 accept bad decision
26 0 1 reject we rejected a good customer
27 0 0 accept good decision
28 0 0 accept good decision
30 0 0 accept good decision
32 0 0 accept good decision
34 0 1 reject we rejected a good customer
35 0 0 accept good decision
36 0 0 accept good decision
37 1 0 accept bad decision
In the table above, we see the outcomes of the first 12 loan applications. According to our rule, we accept 10 applications and reject 2 (applications #26 and #34). A “good decision” refers to cases where we accept a loan application that does not default, meaning we correctly identify the applicant as creditworthy. Conversely, a “bad decision” occurs when we accept a loan application that defaults, leading to a financial loss. Additionally, there are instances where we reject applications from good customers who would not have defaulted, which is also undesirable because it represents lost opportunities for profitable lending.
While this table provides valuable insights, it is somewhat problematic because it incorporates a counterfactual approach. Specifically, we are evaluating the rejected applications based on their potential outcomes if they had been accepted. This requires us to hypothesize about what would have happened, which introduces assumptions and uncertainty, as these scenarios did not actually occur. A more practical and grounded approach is to focus on evaluating the rule based solely on the cases where we do accept loan applications. This is where the concept of the “bad rate” becomes useful. The bad rate measures the proportion of accepted loans that default, providing a straightforward and actionable metric. By concentrating on the actual outcomes of accepted applications, the bad rate allows us to assess the effectiveness of our rule without relying on speculative counterfactuals, making it a more reliable and convenient evaluation method.
Remember that the real_pred_20 object is important because it contains both the binary model predictions and the actual loan status; essentially, the model’s prediction and the corresponding real outcome. In the code below, we first filter for the accepted loans, which are the cases where the prediction equals 0, and keep the value of the first column, which is the actual loan status.
As a result, accepted_loans becomes a vector of zeros and ones: a value of 0 indicates a loan that did not default, while a value of 1 indicates a loan that defaulted. The length of the accepted_loans vector, 7,756, corresponds to the number of loans allocated based on the cutoff value of 0.1994621 and the predictions from the logi_full model.
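A sketch of this filtering step, mirroring the code shown later for the 65% acceptance rate:
# Keep the actual loan status of the accepted applications only.
accepted_loans <- real_pred_20[pred_full_20 == 0, 1]
length(accepted_loans)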
Code
[1] 7756
Note that the third and the tenth elements of accepted_loans are defaults; these correspond to applications #18 and #37, as expected. Now let’s evaluate not just these 12 applications but all 7,756 accepted ones. The bad rate is expressed as a percentage of the total:
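The bad rate is then the share of accepted loans that actually defaulted:
# Bad rate: accepted loan applications that were in fact defaults.
sum(accepted_loans == 1) / length(accepted_loans)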
Code
[1] 0.0576328
That is, by following the model-based rule, we accepted 7,756 loan applications, which represent 80% of the total. However, 5.76% of those accepted applications were in fact defaults. In particular, we accepted 447 loans that defaulted, so \(447/7756=0.0576328\).
Models are not perfect, but we are always interested in finding a good model that leads to a lower bad rate because (in principle) we do not want to accept many defaults. Keeping the same model, we could reduce this 5.76% by being stricter with loan applications, which in simple terms means reducing the acceptance rate. This alternative could be controversial, as a lower acceptance rate represents lower income (fewer customers, fewer sales) for the bank or financial firm.
Let’s consider we reduce the acceptance rate from 80% to 65% so we can evaluate the impact over the bad rate.
Code
# New cutoff value.
cutoff <- quantile(pred_logi_full, 0.65)
# Split the pred_logi_full into a binary variable.
pred_full_35 <- ifelse(pred_logi_full > cutoff, 1, 0)
# A data frame with real and predicted loan status.
real_pred_35 <- cbind.data.frame(test$loan_st, pred_full_35)
# Loans that we accept given these new rules.
accepted_loans <- real_pred_35[pred_full_35 == 0, 1]
# Bad rate (accepted loan applications that are defaults).
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
# Show the bad rate.
bad_rate
[1] 0.03713107
As expected, the bad rate is lower (from 5.76% to 3.71%). That is, the lower the acceptance rate, the lower the bad rate. In the extreme, if we accept 0 loan applications then our bad rate is zero, but doing so is equivalent to going out of business. We can create a function that, given a vector of predicted loan statuses, returns the bad rate for different cutoff values. This could be useful for building the bank strategy, and it will reveal the trade-off between the acceptance rate and the bad rate: the lower the acceptance rate, the lower the income (a bad thing) and the lower the bad rate (a good thing). So, which combination is optimal?
1.4 The bank strategy.
A bank could be interested in understanding the relationship between the acceptance rate and the bad rate, given a model that predicts the loan status.
Code
bank <- function(prob_of_def) {
# Pre-define the acceptance rates
accept_rate <- seq(1, 0, by = -0.05)
# Calculate cutoffs for each acceptance rate
cutoff <- quantile(prob_of_def, accept_rate)
# Calculate bad rates for each cutoff
bad_rate <- sapply(cutoff, function(thresh) {
pred_i <- ifelse(prob_of_def > thresh, 1, 0)
pred_as_good <- test$loan_st[pred_i == 0]
mean(pred_as_good == 1)
})
# Create the result table
table <- cbind(
accept_rate,
cutoff = round(cutoff, 4),
bad_rate = round(bad_rate, 4)
)
return(list(table = table, bad_rate = bad_rate, accept_rate = accept_rate, cutoff = cutoff))
}
We can evaluate this function for the logi_full
model and compare it to a less effective model, such as logi_age
. In principle, we expect the logi_full
model to demonstrate a more favorable relationship between the acceptance rate and the bad rate, that is, lower bad rates for a given acceptance rate. Financial institutions are generally interested in increasing the acceptance rate without significantly increasing the bad rate.
Let’s apply the function to the pred_logi_full
and the logi_age
.
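The object names below match those used in the plotting code further down; the hidden code may combine the two results into the single table shown next, but the basic calls would look like this:
# Acceptance-rate versus bad-rate tables for both models.
bank_logi_age  <- bank(pred_logi_age)
bank_logi_full <- bank(pred_logi_full)
bank_logi_full$table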
Code
accept_rate Bad_model_bad_rate Good_model_bad_rate.
100% 1.00 0.10933471 0.109334709
95% 0.95 0.10849715 0.090662324
90% 0.90 0.10849715 0.076790831
85% 0.85 0.10849715 0.066626214
80% 0.80 0.10547397 0.057632800
75% 0.75 0.10547397 0.050612020
70% 0.70 0.10396251 0.043619216
65% 0.65 0.10396251 0.037131070
60% 0.60 0.10092961 0.029396596
55% 0.55 0.10092961 0.023443361
50% 0.50 0.09933775 0.021039604
45% 0.45 0.10206865 0.018565207
40% 0.40 0.10206865 0.015471893
35% 0.35 0.09907039 0.011788977
30% 0.30 0.09972041 0.009281540
25% 0.25 0.10343554 0.005363036
20% 0.20 0.09722922 0.003610108
15% 0.15 0.09360190 0.001374570
10% 0.10 0.10125362 0.000000000
5% 0.05 0.10516605 0.000000000
0% 0.00 0.00000000 0.000000000
The full model is superior because it achieves a lower bad rate at any given acceptance rate. A plot can effectively highlight the key differences between the two models: logi_age
and logi_full
.
Code
# Combine the two datasets for easier comparison
bank_logi_full$model <- "logi_full"
bank_logi_age$model <- "logi_age"
combined_data <- bind_rows(bank_logi_full, bank_logi_age)
# Add vertical and horizontal lines for visualization
highlight_lines <- tibble(
model = c("logi_full", "logi_full", "logi_age", "logi_age"),
accept_rate = c(bank_logi_full[["accept_rate"]][8],
bank_logi_full[["accept_rate"]][5],
bank_logi_age[["accept_rate"]][8],
bank_logi_age[["accept_rate"]][5]),
bad_rate = c(bank_logi_full[["bad_rate"]][8],
bank_logi_full[["bad_rate"]][5],
bank_logi_age[["bad_rate"]][8],
bank_logi_age[["bad_rate"]][5]),
color = c("black", "red", "black", "red")
)
# Add a color column to distinguish facets
combined_data <- combined_data %>%
mutate(facet_color = ifelse(model == "logi_full", "blue", "orange"))
ggplot(combined_data, aes(x = accept_rate, y = bad_rate, color = facet_color)) +
geom_line(size = 1.2) + # Use facet-specific colors for the lines
geom_vline(data = highlight_lines, aes(xintercept = accept_rate, color = color),
linetype = "dashed", size = 1.1) +
geom_hline(data = highlight_lines, aes(yintercept = bad_rate, color = color),
linetype = "dashed", size = 1.1) +
facet_wrap(~model, nrow = 1) +
labs(x = "Acceptance rate", y = "Bad rate") +
theme_minimal(base_size = 14) +
theme(legend.position = "none") +
scale_color_identity() + # Use the colors directly without adding them to the legend
scale_x_continuous(breaks = c(0, 0.65, 0.8, 1)) +
scale_y_continuous(breaks = c(0, 0.0371, 0.0576, 0.1093))
The logi_full
model is superior because, at any acceptance rate, it achieves a lower bad rate compared to the logi_age
model. This advantage arises from the logi_full
model’s ability to identify defaults and non-defaults with greater precision. The value of a strong model lies in its capacity to support better business decisions, in this case, more accurate credit evaluations.
The ROC (Receiver Operating Characteristic) curve is a graphical representation used to evaluate the performance of a binary classification model, such as a credit risk model designed to predict defaults and non-defaults. In this context, the ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across the full range of classification cutoffs.
The area under the ROC curve (AUC) quantifies the model’s ability to discriminate between the two classes: a higher AUC indicates better performance. For example, a model with an AUC close to 1 suggests it can effectively distinguish between defaulting and non-defaulting borrowers, while an AUC of 0.5 indicates no better performance than random guessing. By analyzing the trade-off between sensitivity (correctly identifying defaults) and specificity (correctly identifying non-defaults), the ROC curve helps determine an optimal threshold for classification, balancing the cost of misclassifying borrowers.
Code
# Calculate ROC curves
ROC_logi_full <- roc(test$loan_st, pred_logi_full)
ROC_logi_age <- roc(test$loan_st, pred_logi_age)
# Draw the ROC curves on one plot
plot(ROC_logi_full, col = "blue", lwd = 3)
lines(ROC_logi_age, col = "orange", lwd = 3)
legend("bottomright",
legend = c("Full model AUC: 0.8213",
"Age model AUC: 0.5301"),
col = c("blue", "orange"),
lwd = 3)
As expected, in Figure 1.15 the area under the curve (AUC) is higher for the blue line which corresponds to the logi_full
model. We can calculate the exact values for both models:
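With pROC, the exact AUC values can be extracted directly from the roc objects computed above:
# Area under the ROC curve for each model.
auc(ROC_logi_full)
auc(ROC_logi_age)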
Area under the curve: 0.8213
Area under the curve: 0.5301
Note that the logi_age model has an AUC of 0.5301. This is close to a loan approval process in which we randomly accept and reject applications with no further analysis. In other words, the logi_age model is so weak that it is almost equivalent to using no model at all and accepting or rejecting loan applications at random.
Sensitivity refers to the model’s ability to correctly identify defaults, also known as true positives. Specificity, on the other hand, measures the model’s ability to correctly identify non-default loans, also referred to as true negatives. In practical terms, a highly sensitive model minimizes the risk of missing actual defaults, ensuring that loans likely to fail are identified and rejected. However, an extreme focus on sensitivity can lead to over-rejection, where many creditworthy applicants are misclassified as high risk. This conservative approach reduces potential defaults but can severely limit revenue, harming the firm’s profitability and growth.
Conversely, a model with high specificity ensures that most creditworthy borrowers are approved, maximizing loan issuance and short-term revenue. Yet, an exclusive focus on specificity may allow a significant number of risky loans to go undetected, resulting in a higher rate of defaults. Over time, the accumulation of bad debts could undermine the firm’s financial stability, potentially leading to insolvency. Thus, balancing sensitivity and specificity is essential: the former safeguards against default-related losses, while the latter promotes profitable lending. The optimal trade-off will depend on the firm’s risk appetite, financial goals, and regulatory constraints.
If a firm were to accept all loans, it would achieve 100% specificity by correctly identifying all non-default loans, but it would fail to identify defaults, leading to the maximum bad rate possible and significant financial losses. Conversely, if the firm were to reject all loans that might potentially default, it would achieve 100% sensitivity by correctly identifying all defaults. However, this would result in rejecting all loan applications, including creditworthy borrowers, and consequently, the firm would not issue any loans. While this would eliminate credit risk, it would also severely restrict the firm’s revenue and growth potential. In both cases, while some risks may be mitigated, the firm would face substantial operational challenges: one approach would expose the firm to financial instability due to the maximum bad rate, while the other would stifle business growth by not issuing any loans. Both scenarios emphasize the importance of balancing risk mitigation with sustainable business growth.
Figure 1.15 covers the full range of acceptance rates. Let’s evaluate the values of sensitivity and specificity at two acceptance rates that we have reviewed before: 65% and 80%.
Code
# Get the coordinates at the two cutoff values.
coords_full80 <- coords(ROC_logi_full, x = 0.20)
coords_age80 <- coords(ROC_logi_age, x = 0.20)
coords_full65 <- coords(ROC_logi_full, x = 0.35)
coords_age65 <- coords(ROC_logi_age, x = 0.35)
# Create a data frame combining the results.
results_df <- data.frame(
Model = c("logi_full", "logi_age", "logi_full", "logi_age"),
Acceptance_rate = 1-c(coords_full80$threshold, coords_age80$threshold,
coords_full65$threshold, coords_age65$threshold),
Sensitivity = c(coords_full80$sensitivity, coords_age80$sensitivity,
coords_full65$sensitivity, coords_age65$sensitivity),
Specificity = c(coords_full80$specificity, coords_age80$specificity,
coords_full65$specificity, coords_age65$specificity)
)
# Show the data frame.
print(results_df)
Model Acceptance_rate Sensitivity Specificity
1 logi_full 0.80 0.5773585 0.8473654
2 logi_age 0.80 0.0000000 1.0000000
3 logi_full 0.65 0.2754717 0.9579618
4 logi_age 0.65 0.0000000 1.0000000
Code
# Draw the ROC curve of the full model with the cutoff reference lines.
plot(ROC_logi_full, col = "blue", lwd = 3)
abline(v = coords_full80$specificity, col = "red", lty = 2, lwd = 2)
abline(h = coords_full80$sensitivity, col = "red", lty = 2, lwd = 2)
abline(v = coords_full65$specificity, col = "black", lty = 2, lwd = 2)
abline(h = coords_full65$sensitivity, col = "black", lty = 2, lwd = 2)
legend("bottomright",
legend = c("Acceptance rate of 80%",
"Acceptance rate of 65%"),
col = c("red", "black"),
lwd = 3, lty = 2)
Let’s check if these sensitivity and specificity results are correct.
The sensitivity value is: \(\text{Sensitivity} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}\).
The specificity value is: \(\text{Specificity} = \frac{\text{True negatives}}{\text{True negatives} + \text{False positives}}\).
For the case of an acceptance rate of 80%:
Code
Cell Contents
|-------------------------|
| N |
|-------------------------|
Total Observations in Table: 9695
| pred_full_20
test$loan_st | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 7317 | 1318 | 8635 |
-------------|-----------|-----------|-----------|
1 | 448 | 612 | 1060 |
-------------|-----------|-----------|-----------|
Column Total | 7765 | 1930 | 9695 |
-------------|-----------|-----------|-----------|
\(\text{Sensitivity} = \frac{612}{612 + 448} = 0.5773585\).
\(\text{Specificity} = \frac{7317}{7317 + 1318} = 0.8473654\).
For the case of an acceptance rate of 65%:
Code
Cell Contents
|-------------------------|
| N |
|-------------------------|
Total Observations in Table: 9695
| pred_full_35
test$loan_st | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 8272 | 363 | 8635 |
-------------|-----------|-----------|-----------|
1 | 768 | 292 | 1060 |
-------------|-----------|-----------|-----------|
Column Total | 9040 | 655 | 9695 |
-------------|-----------|-----------|-----------|
\(\text{Sensitivity} = \frac{292}{292 + 768} = 0.2754717\).
\(\text{Specificity} = \frac{8272}{8272 + 363} = 0.9579618\).
In a credit risk model that predicts default and non-default cases, there is an inherent trade-off between sensitivity and specificity. Increasing sensitivity often comes at the cost of reducing specificity, as lowering the cutoff for classifying defaults (or increasing the acceptance rate) may capture more actual defaulters but also increase the number of non-defaulting borrowers misclassified as defaulters (false positives). Conversely, raising the cutoff (or reducing the acceptance rate) improves specificity by reducing false positives but risks missing actual defaulters (false negatives), thereby lowering sensitivity.
A pure-random approval rule would look like this:
Code
Area under the curve: 0.5105
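The chunk is collapsed here; one way to build such a rule with the pROC package is sketched below. The seed and object names are assumptions, so the AUC printed above comes from the original hidden chunk and a different seed will give a slightly different value close to 0.5.
# A random scorer: each applicant gets a uniform random "probability" of default.
set.seed(123)                        # illustrative seed; the original seed is unknown.
pred_random <- runif(nrow(test))     # assumes the same test set used above.
ROC_random <- roc(test$loan_st, pred_random)
auc(ROC_random)                      # close to 0.5, as expected for a random rule.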
In theory, this random evaluation process would lead to an AUC of \(0.5 = (1 \times 1)/2\). In contrast, now imagine we have a perfect model:
Code
Area under the curve: 1
In a perfect model, the AUC is equal to 1 (\(1 \times 1\)). The model correctly identifies defaults (100% sensitivity) and, at the same time, correctly identifies non-defaults (100% specificity).
Edward Malthouse explains these concepts quite well:
2 The Merton model.
The Merton model is useful when we are interested in evaluating the credit risk of public firms. In short, it evaluates how likely it is that the value of the firm's future assets falls below a certain threshold represented by the firm's debt. By doing that, the model can estimate a probability of default, among other results. The model can also be used as a tool to propose changes (for example, in the balance sheet) to reduce the credit risk exposure of a firm.
The philosophy of these models goes back to Black and Scholes (1973), Merton (1973) and Merton (1974), who consider corporate liabilities as contingent claims on the assets of the firm.
2.1 Minimize a function in R.
The estimation of Merton's credit risk model requires the minimization of a function. In this section, we show a simple minimization example. Here are some useful online resources for further reference.
- FRM: How d2 in Black-Scholes becomes PD in Merton model.
- Normal Probability Distribution Graph. Interactive.
- How to use optim in R.
The objective function to be minimized is \(f(a,b)=\sum(a + bx_i-y_i)^2\). We have six values for \(x\) and \(y\), and we are expected to minimize the function \(f\) by finding the parameter values \(a\) and \(b\). The \(x_i\) and \(y_i\) data look like this:
Code
 | x | y |
---|---|---|
1 | 1 | 1 |
2 | 2 | 3 |
3 | 3 | 5 |
4 | 4 | 6 |
5 | 5 | 8 |
6 | 6 | 12 |
Graphically:
Let’s implement the \(f(a,b)=\sum(a + bx_i-y_i)^2\) minimization.
Code
# Function that calculates the residual sum of squares.
min.RSS <- function(data, par) {
with(data, sum((par[1] + par[2] * x - y)^2)) }
# Function minimization.
result <- optim(par = c(0, 1), min.RSS, data = dat)
a <- result$par[1]
b <- result$par[2]
f <- sum((a + b * dat$x - dat$y)^2)
# Minimization output.
ans <- data.frame(a, b, f)
kable(ans, caption = "Minimization results f(a,b).")
a | b | f |
---|---|---|
-1.266846 | 2.02862 | 2.819048 |
The minimization result is represented by a line, and the line is characterized by an intercept equal to \(-1.266846\) and a slope equal to \(2.02862\).
We can plot it and verify that this is indeed the right answer.
Code
The function that we minimize here is basically the OLS criterion. This is why the regression model leads to the same results.
Code
Call:
lm(formula = y ~ x, data = dat)
Residuals:
1 2 3 4 5 6
0.2381 0.2095 0.1810 -0.8476 -0.8762 1.0952
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.2667 0.7815 -1.621 0.180388
x 2.0286 0.2007 10.109 0.000539 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8395 on 4 degrees of freedom
Multiple R-squared: 0.9623, Adjusted R-squared: 0.9529
F-statistic: 102.2 on 1 and 4 DF, p-value: 0.000539
The function that we minimize in the estimation of the credit risk Merton model is more elaborate, but we follow the same principle as above.
2.2 The estimation.
For an introductory reference to understand the basics of credit see Brealey et al. (2020).
This is a nice introduction to the Merton model for credit risk assessment:
Let's follow example 24.3 in Hull (2015). We have five parameters: \(E_0=3\), \(\sigma_E=0.8\), \(rf=0.05\), \(T=1\), \(D=10\) that lead to an estimate of the value of the assets today \(V_0\) and the volatility of the assets \(\sigma_V\). With these seven parameters we can estimate the probability of default \(pd\), among other interesting results.
These are the known parameters.
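The chunk that defines them is collapsed; a minimal sketch consistent with the variable names used in the code below (E0, se, rf, TT, D) would be:
# Known parameters from example 24.3 in Hull (2015).
E0 <- 3     # Market value of the equity today.
se <- 0.8   # Volatility of the equity.
rf <- 0.05  # Risk-free rate.
TT <- 1     # Time to maturity in years.
D  <- 10    # Face value of the debt (the default threshold).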
We need equations 24.3 and 24.4 to estimate \(V_0\) and \(\sigma_V\).
Code
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
# Footnote 10 indicates that we should minimize F(x,y)^2+G(x,y)^2
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
# The minimization leads to the values of V0 and sv.
V0_sv <- optim(c(1, 1), min.footnote10)
# Define the values as parameters.
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
kable(data.frame("Market value" = V0,
"Volatility" = sv),
caption = "Assets.")
Market.value | Volatility |
---|---|
12.39578 | 0.2123041 |
Calculate the probability of default \(N(-d_2)\) as a function of \(V_0\) and \(\sigma_V\).
Code
[1] 0.1269396
The probability of default is 12.69396%.
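The collapsed chunk above presumably evaluates \(N(-d_2)\) at the estimated \(V_0\) and \(\sigma_V\); a minimal sketch (the object name pd is an assumption, chosen to match its use further below) is:
# Probability of default: N(-d2) evaluated at the estimated V0 and sv.
d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
d2 <- d1 - sv * sqrt(TT)
pd <- pnorm(-d2)
pd  # 0.1269396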
Let’s analyze the role of the values of \(d_1\), \(d_2\), \(N(d_1)\) and \(N(d_2)\). First, we isolate these values.
Code
d1 Nd1 d2 Nd2
1 1.353282 0.9120172 1.140978 0.8730604
Now, let’s illustrate how \(N(-d_2)\) leads to the probability of default given the properties of a standard normal distribution.
Code
xseq <- seq(-4, 4, 0.01)
densities <- dnorm(xseq, 0, 1) # Standard normal distribution.
# Graphical evaluation of N(-d_2).
plot(xseq, densities, xlab = "d values (-d_2 is the dotted line)",
ylab = "Density", type = "l", lwd = 2, cex = 2)
abline(h = 0)
polygon(c(xseq[xseq >= -d2], -d2), c(densities[xseq >= -d2], 0), col = "green")
polygon(c(xseq[xseq < -d2], -d2), c(densities[xseq < -d2], 0), col = "red")
legend("topleft", legend = c("N(d_2)=0.8730604
in green and
N(-d_2)=0.1269396 in red.
N(d_2)+N(-d_2)=1."),
bg = "white", bty = "n", cex = 0.9)
abline(v = -d2, lty = 2, lwd = 3)
Here you can find an alternative explanation about \(N(-d_2)\):
Now, let’s analyze the minimization that leads to \(V_0\) and \(\sigma_V\). We do not have analytic (closed-form) expressions to calculate \(V_0\) and \(\sigma_V\); instead, we have approximate values or estimates given the minimization of equations 24.3 and 24.4 in Hull (2015). In order to see how good this approximation is, we can retrieve the value of \([F(x,y)]^2+[G(x,y)]^2\) evaluated at \(x=V_0\) and \(y=\sigma_V\). This value is supposed to be positive as both \(F(x,y)\) and \(G(x,y)\) are squared, so we expect a positive number close to zero when the function is evaluated at \(x=V_0\) and \(y=\sigma_V\).
[1] 1.425797e-07
[1] 0
The minimization conducted in the function min.footnote10()
above worked fairly well as the value is zero in practical terms.
There is another way to see how the minimization works: a graphical analysis. In particular, we can verify whether \(V_0=12.39578\) and \(\sigma_V=0.2123041\) minimize the objective function. To illustrate this on two axes, we need to fix either \(V_0\) or \(\sigma_V\), and evaluate \([F(x,y)]^2+[G(x,y)]^2\) over different values of the other. Let's do that.
Code
# Fix sv and evaluate different values of V0.
V0.seq <- seq(from = 4, to = 20, length.out = 100)
sv.rep <- rep(sv, 100)
# Function to be evaluated.
FG <- function(V0, sv) { (eq24.3(V0, sv))^2 + (eq24.4(V0, sv))^2 }
# Apply the function with fixed sv and different V0 values.
FG.V0 <- mapply(FG, V0.seq, sv.rep)
We are ready to plot.
Code
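The plotting chunk is collapsed; a sketch that mirrors the \(\sigma_V\) plot shown further below would be:
# F(x,y)^2 + G(x,y)^2 evaluated over V0, holding sv fixed.
plot(V0.seq, FG.V0, type = "l", xlab = expression(paste(V[0])),
     ylab = expression(paste(F(x, y)^2 + G(x, y)^2)),
     cex.lab = 0.8, lwd = 3)
abline(v = V0, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(V0, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)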
Clearly, the minimization worked well here. For the sake of completeness, now we fix the \(V_0\) value and evaluate the function again.
Code
# Now fix V0, and evaluate different sv values.
sv.seq <- seq(0, 0.6, length.out = 100)
V0.rep <- rep(V0, 100)
# Evaluate the function with these parameters.
FG.sv <- mapply(FG, V0.rep, sv.seq)
# Plot the results.
plot(sv.seq, FG.sv, type = "l", xlab = expression(paste(sigma[V])),
ylab = expression(paste(F(x, y)^2 + G(x, y)^2)),
cex.lab = 0.8, lwd = 3)
abline(v = sv, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(sv, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)
We can show the same as before but in a 3D plot.
Code
Contour plots or level plots are a way to show a three-dimensional surface on a two-dimensional plane.
Code
The following plot cannot be seen in a PDF output, but it can be seen in a browser as an interactive plot. This green plot is nice but it is not interactive. We could create interactive three-dimensional plots with other R packages like plotly
and render them in an HTML or JavaScript environment.
Code
The Merton model allows us to calculate more values in this credit risk assessment.
Code
# Other results in the Merton's model.
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
ans <- data.frame(market_value_debt, pv_promised_payment_debt,
Expected_Loss, recovery_rate, pd)
kable(t(ans), caption = "Other results in the Merton's model.")
market_value_debt | 9.3957765 |
pv_promised_payment_debt | 9.5122942 |
Expected_Loss | 0.0122492 |
recovery_rate | 0.9035039 |
pd | 0.1269396 |
Could we figure out the implied bond yield spread? It would be similar to the fair interest rate that this firm has to pay given its probability of default and recovery rate. Take Hull’s equation 24.2 as a reference: \(\lambda(T)=S(T)/(1-R)\), and solve for \(S(T)\): \(S(T)=\lambda(T)\times(1-R)\), so \(S(T)=(0.12693963)\times(1-0.90350393)\). This is 122.4918 basis points more than the risk-free rate. Then, with five parameters we can calculate (among other things) the firm’s probability of default and the corresponding yield spread of this firm’s bond. It would be interesting to compare this value with the market yield spread and evaluate whether the market yield is over- or underestimated according to our theoretical value.
This is the spread function.
Code
spread <- function(E0, se, rf, TT, D) {
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
(sv * sqrt(TT))) - sv * sqrt(TT)))
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
spread <- pd * (1-recovery_rate)
spread
}
Let’s evaluate the spread function.
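The evaluation itself is collapsed; with the textbook parameters already defined above, the function should reproduce the expected loss computed earlier (about 122.49 basis points):
# Evaluate the spread at the parameters of example 24.3.
spread(E0, se, rf, TT, D)  # about 0.0122492, i.e. roughly 122.49 basis points.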
2.3 Inside view of Merton’s model.
The model assumes that the assets follow a geometric Brownian motion stochastic process. A regular Wiener stochastic process looks like this.
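The chunk with the Wiener path is collapsed; a minimal sketch of a standard Wiener process, assuming daily steps over one year, is:
# A standard Wiener process path: cumulative sum of normal increments.
set.seed(1)                                   # illustrative seed.
t <- seq(0, 1, length.out = 366)
W <- c(0, cumsum(rnorm(365, mean = 0, sd = sqrt(1 / 365))))
plot(t, W, type = "l", lwd = 2, xlab = "Time", ylab = "W(t)")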
Let’s go back to our main example and assume the daily evolution of the market value of the firm’s assets over a year. Here are 10 simulated paths of \(V_t\) from \(t=0,...,T\). The references for this section are Chapters 14 and 15 of Hull (2015).
Code
# Merton assumes V follows a geometric Brownian motion process.
V0.sim <- function(i) { # the argument i is not used here.
GBM(N = 365, T = 1, t0 = 0, x0 = V0, theta = 0.05, sigma = sv) }
set.seed(3) # Reproducibility
paths <- sapply(1:10, V0.sim) # Create 10 paths for V.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
points(rep(366, 8), paths[366, paths[366, ] > 10],
pch = 1, cex = 1.5, col = "green")
points(rep(366, 2), paths[366, paths[366, ] < 10],
pch = 1, cex = 1.5, col = "red")
These simulations can be interpreted as the evolution of the value of the firm’s assets in 10 parallel universes. In the past, some students have called this figure la gráfica de los pelitos (“the little-hairs plot”); this is fine as long as you understand the figure’s implications. The proper name is 10 simulated geometric Brownian motion processes. Let’s count the cases in which \(V_t<D\); note that here I use \(t\) and not \(T\), so we are counting the cases in which, at least at some time \(t\), the value of the assets was lower than 10.
The value of the assets was lower than $10 at some time in the second, fourth and ninth alternate universes. Let’s plot these cases.
Code
# There might be an easier way to select these cases.
matplot(paths[,which(colSums(paths < 10) > 0)], type = "l",
col = c("green", "blue", "red"), lwd = 2, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
The blue path was lower than 10 at some time, but at \(t=T\) it is clearly higher than 10. For this reason, the blue path does not represent a firm’s default. Only in two cases, green and red, are the assets not high enough to pay the debt at maturity. Thus, according to this simulation the probability of default is 20%. Here is how we can calculate the cases in which \(V_T<D\).
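The counting chunk is collapsed; a sketch that counts defaults at maturity over the 10 simulated paths is:
# Paths whose terminal value V_T is below the debt D.
sum(paths[366, ] < D)                  # 2 defaulting paths.
sum(paths[366, ] < D) / ncol(paths)    # simulated probability of default: 0.2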
A simulated probability of default of 20% may seem disappointing because it is significantly higher than 12.693963%. This is because the number of simulations is small. The following examples show how these values tend to converge as we increase the number of simulations from 10 to 100 and then to 100,000.
Here are 100 simulated paths of \(V_0\) to \(V_T\).
Code
set.seed(3) # Reproducibility
paths <- sapply(1:100, V0.sim) # Create 100 paths.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Given the simulation above, the probability of default is 17%. This is closer to 12.693963%.
Let’s see all the 17 cases that end in default.
Code
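The chunk is collapsed; a sketch that selects and plots the defaulting paths (those with \(V_T<D\)) would be:
# Select the paths that end below D and plot them.
default <- which(paths[366, ] < D)   # 17 paths with this seed (see the text above).
matplot(paths[, default], type = "l", lty = 1, col = seq_along(default),
        ylab = expression(paste(V[t])),
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")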
Now, let’s see the 11 cases in which \(V_t\) eventually went lower than $10 but at the end did not default.
Code
# V_t < 10
almost_default <- which(colSums(paths < 10) > 0)
# V_T < 10
default <- which((paths[366,] < 10))
# There should be an easier way to do this:
matplot(paths[,almost_default[which(almost_default %in%
default == FALSE)]],
type = "l", col = c(1:11), lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Here are 100,000 simulated values of \(V_T\) shown in a histogram. The reference for this section is section 15.1 of Hull (2015). Note that instead of drawing the complete paths, we can directly get the distribution of \(V_T\). The resulting distribution is log-normal.
Code
set.seed(13)
# 100,000 values of VT at once.
VT <- exp(rnorm(100000, log(V0) + (rf - (sv^2) / 2)*TT, sv*sqrt(TT)))
# Plot results.
h <- hist(VT, 100, plot = FALSE)
ccat <- cut(h$breaks, c(-Inf, D, Inf))
# You may also need to change the xlim if a big red square appears as changes in maturity may imply larger values of VT. I recommend trying without xlim and see.
plot(h, main = NULL, col = c("red", "blue")[ccat],
xlab = "Value of the assets at maturity", xlim = c(4, 30))
legend("topright",
legend = c("The red area is 12.657%, and represents
the simulated probability of default.
The probability of default of the model
following the textbook example is
N(-d_2)=12.69396%. As we can see,
both are very close."),
col = c("red", "blue"), pch = c(19), bg = "white", bty = "n", cex = 0.8)
The probability of default can be calculated as:
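The calculation is collapsed; it is simply the share of simulated terminal values below the debt:
# Simulated probability of default: proportion of V_T below D.
sum(VT < D) / length(VT)  # 0.12657 with this seed.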
Note that 0.12657 is now much closer to 0.12693963. Convergence achieved.
This firm can be represented as a European call option payoff. The payoff of a typical stock option is \(c=\max(S_T-K, 0)\), or equivalently in terms of Merton’s model: \(E_T=\max(V_T-D, 0)\).
Code
ET <- pmax(VT - D, 0) # payoff function of a typical call option.
plot(sort(VT), sort(ET), type = "l", lwd = 4, xlim = c(5, 20),
ylim = c(0, 10), xlab = "Simulated assets at maturity (V_T)",
ylab = "Simulated equity at maturity (E_T)")
abline(v = 10, lty = 2)
legend("left", legend = c("According to the model,
E_T=0 happens
in 12.657% of the
100,000 cases."),
bg = "white", bty = "n", cex = 0.9)
This diagram shows the firm’s market values in the future at maturity. This is because \(E_T\) is a function of \(V_T\) and \(D\). As long as \(V_T<D\), the value of \(E_T\) is 0. Otherwise, the value of \(E_T\) is \(V_T-D\), just as the accounting equation suggests but in future terms. Note that Merton’s model says nothing about the estimated value of \(E_T\); instead, we have an estimate of how likely it is that \(V_T<D\), or in other words how likely it is that \(E_T=0\).
Then, we know that \(E_T= \max(V_T-D, 0)\). The Black-Scholes-Merton formula can also estimate the theoretical value of \(E_0\). That is, if the observable market value given by the market capitalization available on any financial site is \(E_0=3\), we can retrieve the theoretical value of \(E_0\) and compare whether the firm’s observable market value is over- or undervalued. In other words, the Black-Scholes-Merton model can give us an estimate of the firm value. In particular, the model gives the value of the equity today as:
\(E_0=V_0N(d_1)-De^{-rT}N(d_2)\)
This is equation 24.3 in Hull (2015). To estimate \(E_0\), we need \(V_0\) and \(\sigma_V\), and these two values were already estimated with the implementation of the min.footnote10 function above. It is true that this minimization requires \(E_0\); the point here is to use the observable market capitalization \(E_0=3\) to find out the unobservable \(V_0\) and \(\sigma_V\) and then estimate the theoretical value of \(E_0\).
The code is:
Code
# Black-Scholes call.
E0.theo <- function(V0, D, rf, TT, sv) {
d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
d2 <- d1 - sv * sqrt(TT)
c <- V0 * pnorm(d1) - D * exp(-rf * TT) * pnorm(d2)
}
V0 <- 12.39578 # Assets.
D <- 10 # Debt.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
sv <- 0.2123041 # Volatility of assets.
(E0.theo(V0, D, rf, TT, sv))
[1] 3.000357
Then, we can argue that the market value of the firm is slightly undervalued since the theoretical value of the equity is 3.000357 compared with the observed 3. This approach opens the possibility of valuing firms. It also opens the possibility of comparing the book value of total assets in the balance sheet with the estimate \(V_0\). This allows us to understand the difference between the book value and the market value of the assets, not only of the equity. We can even implement a similar analysis for the market value of the debt, just as we did before.
2.4 The probability of default as a function of some parameters.
We can expand our analysis by evaluating how changes in the parameters change the probability of default in the context of the Merton model. This is interesting because we can move from estimating a probability of default to proposing changes in the firm’s parameters, such as its capital structure, to reduce the firm’s probability of default. First, we need the probability of default as a function.
Code
pd <- function(E0, se, rf, TT, D) {
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
(sv * sqrt(TT))) - sv * sqrt(TT)))
pd }
Now we have a convenient function: \(pd = f(E_0, \sigma_E, rf, T, D)\). Remember these five parameters are required to estimate \(V_0\) and \(\sigma_V\) before calculating \(pd\).
Now, we create vectors of parameters and evaluate the pd
function.
Code
l = 50 # Vectors of length 50.
# Create vectors of the parameters.
x.E <- seq(from = 1, to = 20, length.out = l)
x.rf <- seq(0, 4, length.out = l)
x.D <- seq(1, 20, length.out = l)
x.T <- seq(0.5, 20, length.out = l)
x.se <- seq(0, 3, length.out = l)
# Evaluate the pd at different values.
pd.E <- mapply(pd, x.E, se, rf, TT, D)
pd.rf <- mapply(pd, E0, se, x.rf, TT, D)
pd.D <- mapply(pd, E0, se, rf, TT, x.D)
pd.T <- mapply(pd, E0, se, rf, x.T, D)
pd.se <- mapply(pd, E0, x.se, rf, TT, D)
Let’s plot the probability of default as a function of equity. What we are doing here is evaluating the probability of default function 50 times, as \(E_0\) takes 50 values from 1 to 20. By doing this, we end up with 50 values of the probability of default.
Code
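The plotting chunk is collapsed; a minimal sketch of the pd-versus-equity plot would be:
# Probability of default as a function of the market value of equity.
plot(x.E, pd.E, type = "l", lwd = 3,
     xlab = "Market value of equity. Initially, E0=3",
     ylab = "Probability of default. Initially PD=12.69%")
abline(v = 3, lty = 2)
abline(h = 0.1269396, lty = 2)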
These relationships between the parameters and the probability of default are all non-linear. This is interesting because, according to the model, the current levels of the parameters matter when deciding which parameter might have a high or low impact on the probability of default.
Code
Probability of default as a function of debt. Note that adding 1 unit of debt and subtracting 1 unit of debt have different impacts on the probability of default.
Code
Probability of default as a function of maturity. Note that asking for a longer debt maturity (as in a negotiation) does not lead to a lower probability of default if we keep the rest of the firm’s parameters unchanged. Imagine we arrange a meeting with the bank manager and ask for more time to pay our debt. The bank manager should not (in principle) extend the debt maturity unless we commit to making some changes in the firm. For example, we can negotiate a longer maturity and promise to increase the firm’s equity.
Code
Probability of default as a function of the volatility of the equity \(\sigma_E\).
Code
In the previous plots we evaluated the probability of default function by changing one of the following parameters: \(E_0\), \(\sigma_E\), \(rf\), \(T\) or \(D\). We were able to do that because we constructed a vector of 50 different values for each of these five parameters. Note that \(V_0\) and \(\sigma_V\) do not remain fixed because they are, at the same time, a function of \(E_0\), \(\sigma_E\), \(rf\), \(T\) and \(D\). Now, let’s plot the probability of default as a function of \(E_0\) and \(\sigma_E\).
One alternative is to evaluate the probability of default function 50 times by taking the 50 pairs of values of \(E_0\) and \(\sigma_E\). By doing that, we end up with three vectors of size 50 that we can plot as a three-dimensional scatter plot.
Code
The red point represents the case of the original textbook example in which \(pd = f(E_0=3, \sigma_E=0.8,rf=0.05,T=1,D=10)=0.1269396\). Note that the red point is not part of the 50 plotted observations simply because there is no case in which the probability of default function is evaluated at \(E_0=3, \sigma_E=0.8\). Although the plot above is not incorrect, it might be incomplete as we are not showing all possible values of the probability of default. One alternative for a more complete plot is to evaluate the probability of default function over all possible combinations of \(E_0\) and \(\sigma_E\). If we do that, we will have 50 values for \(E_0\) and \(\sigma_E\) that represent the \(x\) and \(y\) axes, and a \(50 \times 50\) matrix containing the probability of default. This allows us to produce a surface plot and a contour plot.
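The grid evaluation behind the surface plot below is collapsed; a sketch consistent with the p_E_se matrix used by persp() is:
# Evaluate pd over all combinations of the 50 E0 values and the 50 sigma_E values.
p_E_se <- matrix(0, nrow = l, ncol = l)
for (i in 1:l) {
  for (j in 1:l) {
    p_E_se[i, j] <- pd(x.E[i], x.se[j], rf, TT, D)
  }
}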
Let’s plot the results.
Code
# Plot results.
p.ED1 <- persp(x.E, x.se, p_E_se, zlab = "pd",
xlab = "Equity at time zero",
ylab = "Volatility of equity", theta = 60, phi = 10,
expand = 0.5, col = "orange", shade = 0.2,
ticktype = "detailed")
# Add the original pd value as in the textbook example.
points(trans3d(E0, se, pd1, p.ED1), cex = 2, pch = 19, col = "blue")
Now the plot becomes a surface. A contour plot simplifies the reading of a three-dimensional plot by reducing it to two dimensions: \(x\) and \(y\) coordinates. The value of the probability of default is represented by the contour lines.
Code
The blue point is the original case. Note that the value is slightly higher than 0.1, which is consistent with the \(pd=0.1269396\).
Code
l = 40 # Vectors of length 40.
# Create vectors of the parameters.
x.E <- seq(from = 2, to = 4, length.out = l)
x.se <- seq(0.2, 1.5, length.out = l)
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
for(j in 1 : l){
p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
Code
plot_ly(type = "surface" , x = x.se, y = x.E , z = p_E_se ) %>%
layout(#title = "Minimization of f at V0=12.395 an sv=0.212",
scene = list(xaxis = list(title = "Volatility of equity"),
yaxis = list(title = "Market value of equity"),
zaxis = list(title = "Probability of default"))) %>%
hide_colorbar()
2.5 GoT: capital structure.
The problem. Daenerys Targaryen owns a big manufacturing firm that produces fire extinguishers. Surprisingly, the firm’s data corresponds exactly to the data in example 24.3 of Hull (2015). She is planning to ask the Iron Bank of Braavos for a loan using the peaceful, civilized way (without dragons). However, she would like to reduce the probability of default of the firm first; that way, she might negotiate better credit conditions. Daenerys has some understanding of very basic finance: she knows that she could either reduce the debt or increase the capital in order to reduce the probability of default. Doing both at the same time is clearly more expensive, so she is not very keen on it.
What is the best strategy to reduce the probability of default? Reduce the debt by 2 or increase the capital by 2?
Code
D.seq <- seq(from = 0, to = 13, by = 0.1)
# Evaluate pd function: E0 changes from 1 to 5; debt goes from 0 to 13.
E1 <- mapply(pd, 1, se, rf, TT, D.seq)
E2 <- mapply(pd, 2, se, rf, TT, D.seq)
E3 <- mapply(pd, 3, se, rf, TT, D.seq)
E4 <- mapply(pd, 4, se, rf, TT, D.seq)
E5 <- mapply(pd, 5, se, rf, TT, D.seq)
Plot the results.
Code
# We use these vectors for colors and legends in the following plots.
colors <- c("green", "purple", "black", "blue", "red")
leg <- c("E0=1", "E0=2", "E0=3 (initial value)", "E0=4", "E0=5")
# Plot.
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=10",
ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors,
bg = "white")
points(D, pd1, pch = 19, cex = 2)
Let’s illustrate the alternatives as clearly as possible.
Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=10",
ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd1, lty = 2)
abline(h = pd(E0, se, rf, TT, 8), lty = 2)
abline(h = pd(5, se, rf, TT, D), lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors,
bg = "white")
points(D, pd1, pch = 19, cex = 2)
points(D, pd(5, se, rf, TT, D), pch = 19, cex = 2, col = "red")
points(8, pd(3, se, rf, TT, 8), pch = 19, cex = 2, col = "yellow")
Consider a different initial situation. Now, \(E_0=2\) and \(D=2\) and the rest remains the same. In this case, the probability of default is now 7.13%. What is the best strategy to reduce the probability of default? Reduce the debt by 1 or increase the capital by 1?
Code
leg2 <- c("E0=1", "E0=2 (initial value)", "E0=3", "E0=4", "E0=5")
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=2",
ylab = "Probability of default. Initially PD=7.13%",
xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors,
bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
The solution can be illustrated as follows:
Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=2",
ylab = "Probability of default. Initially PD=7.13%",
xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(v = 1, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
abline(h = pd(2, se, rf, TT, 1), lty = 2)
abline(h = pd(E0, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors,
bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
points(1, pd(2, se, rf, TT, 1), pch = 19, cex = 2, col = "yellow")
points(2, pd(E0, se, rf, TT, 2), pch = 19, cex = 2)
Interesting.
Consider a third problem. In a parallel universe, Daenerys Targaryen’s firm has some short-term liquidity troubles. She needs either more cash now or some more time to pay her current debt. She would like to know which alternative leads to the lowest increase in the probability of default: add $2 more to her current debt, or ask for half a year more to pay her current debt?
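The chunk that builds the curves plotted below is collapsed; a sketch consistent with the legend (maturities from 0.5 to 2.5 years, the other parameters fixed) is:
# pd as a function of debt for five maturities.
T1 <- mapply(pd, E0, se, rf, 0.5, D.seq)
T2 <- mapply(pd, E0, se, rf, 1.0, D.seq)
T3 <- mapply(pd, E0, se, rf, 1.5, D.seq)
T4 <- mapply(pd, E0, se, rf, 2.0, D.seq)
T5 <- mapply(pd, E0, se, rf, 2.5, D.seq)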
Code
leg3 <- c("T=2.5", "T=2", "T=1.5", "T=1", "T=0.5")
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=6",
ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors,
bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
Let’s visualize the result.
Code
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=6",
ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 2, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 8), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors,
bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
points(8, pd(E0, se, rf, 1.5, 8), pch = 19, cex = 2, col = "yellow")
points(6, pd(E0, se, rf, 2, 6), pch = 19, cex = 2, col = "purple")
Nice.
3 The Gaussian copula model.
Copulas allow us to decompose a joint probability distribution into its marginals (which by themselves carry no information about the dependence) and a function which couples (hence the name) them together and thus allows us to specify the correlation separately. The copula is that coupling function. Here, we introduce the simplest type of copula into a very common problem in credit risk: the time to default.
3.1 The basics.
In a finance context, the variable \(x\) represents a firm’s future (unknown) performance measure that goes from \(-4\) to \(4\) on the horizontal axis. Strictly speaking, more extreme values of \(x\) are theoretically possible, but they are very rare, at least in real-life situations. At the moment, we do not care too much about what kind of performance measure this is; it could be liquidity, for example, solvency, or any other measure normalized to take values between \(-4\) and \(4\). In a statistics context, we can think of \(x\) as equivalent to the so-called \(z\)-score of the standardized normal distribution.
Code
# The standard normal distribution function is: y = f(x).
x <- seq(-4, 4, 0.001) # First define x.
y <- 1 / sqrt(2 * pi) * exp(-x ^ 2 / 2) # Define y as a function of x.
# Now plot.
plot(x, y, type = "l", lwd = 3, col = "blue" ,
ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
The copula model, including the Gaussian one, considers that the future firm performance measure \(x\) on the horizontal axis is related to the firm’s probability of default (from 0 to 1, or 0% to 100%), represented by the value of the cumulative distribution function pnorm()
of the normal distribution.
Assume the firm will default in six months if the firm performance \(x\) in six months is lower than a given threshold: \(\bar{x}=-2.5\). The threshold \(\bar{x}=-2.5\) in the Gaussian copula model is equivalent to the \(-d_2\) in the Merton model. Then, the firm’s six-month probability of default in math notation is \(N(-2.5)\), or pnorm(-2.5)=0.006209665
in R code, which is very low as it is close to 0. It is the red area below, which is 0.62% of the 100% possible area under the curve.
Code
Now assume the firm will default in one year if the firm performance \(x\) in one year is lower than a higher threshold: \(\bar{x}=-2\). Then, the one-year probability of default is \(N(-2)\) or pnorm(-2)=0.02275013
which is still low, but higher than the six-month probability of default. Note that the new red area is 2.27%.
Code
At the moment, we have not discussed how to estimate the threshold values \(\bar{x}\). However, we can say that the Gaussian copula model assumes that the probability of default increases as time passes, as if the firm would eventually default in the future. The six-month probability of default of 0.62% is low because it is not very likely that the future firm performance lies below -2.5 in six months. The one-year probability of default of 2.27% is a bit higher because it is more likely that the future firm performance lies below -2 in one year.
Then, the pnorm()
R function allows us to transform threshold values \(\bar{x}\) into probabilities of default. Transforming probabilities into \(\bar{x}\) is also possible, as the function qnorm()
is the inverse of pnorm()
.
From \(\bar{x}\) to probabilities:
From probabilities to \(\bar{x}\):
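The two collapsed chunks presumably do these conversions with pnorm() and qnorm(); a sketch covering both directions is:
# From thresholds x-bar to probabilities of default.
pnorm(-2.5)   # 0.006209665
pnorm(-2)     # 0.02275013
# From probabilities of default back to thresholds x-bar.
qnorm(0.006209665)  # -2.5
qnorm(0.02275013)   # -2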
Nice.
The function dnorm()
is relevant when we implement a graphical approach because it represents how frequent (or how likely) the values of \(x\) are under the standard normal distribution, so at extreme values of \(x\), say \(-4\) and \(4\), the value of dnorm()
is low. This function delivers the height of the standard normal distribution: dnorm(4)=0.0001338302
and dnorm(-4)=0.0001338302
. Given that the standard normal distribution is symmetric, in general dnorm(-x)=dnorm(x)
. The letter \(d\) in dnorm()
stands for density, and the density reaches its maximum at \(x=0\). When plotting the density values we get the bell-shaped normal curve.
See how these pnorm()
, qnorm()
, dnorm()
R functions work and relate:
Code
# Here, x has 11 values only.
x.summary <- c(-Inf, -4:4, Inf) # vector of x values to evaluate functions.
ans <- data.frame(x.summary, dnorm(x.summary),
pnorm(x.summary), qnorm(pnorm(x.summary)))
colnames(ans) <- c("x", "dnorm(x)=height", "pnorm(x)=pd", "qnorm(pd)=x")
kable(ans, caption = "Review of standard normal distribution functions.",
digits = 5)
x | dnorm(x)=height | pnorm(x)=pd | qnorm(pd)=x |
---|---|---|---|
-Inf | 0.00000 | 0.00000 | -Inf |
-4 | 0.00013 | 0.00003 | -4 |
-3 | 0.00443 | 0.00135 | -3 |
-2 | 0.05399 | 0.02275 | -2 |
-1 | 0.24197 | 0.15866 | -1 |
0 | 0.39894 | 0.50000 | 0 |
1 | 0.24197 | 0.84134 | 1 |
2 | 0.05399 | 0.97725 | 2 |
3 | 0.00443 | 0.99865 | 3 |
4 | 0.00013 | 0.99997 | 4 |
Inf | 0.00000 | 1.00000 | Inf |
You can type for example ?dnorm
in the RStudio console to see more details about this (and other) functions.
Note that the only distribution function that we are currently analyzing is the standard normal (Gaussian) distribution, the one with mean 0 and variance 1. There are other copula models that assume other kinds of distributions. These other copulas are useful when we are interested in modeling cases in which extreme values are more likely to happen than under the standard normal distribution. Understanding Gaussian copulas allows you to understand other, more elaborate copulas.
Instead of a density function, we can plot the cumulative probability distribution. Now, we do not need the \(N(\cdot)\) function to compute the probability as the vertical axis already represents the probability of default.
Code
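The chunk is collapsed; a sketch of the cumulative plot (the object P is reused by the chunks further below) would be:
# Cumulative distribution of the same x grid.
P <- pnorm(x)
plot(x, P, type = "l", lwd = 3, col = "blue", xlab = "x",
     ylab = "Cumulative probability function: pnorm(x)")
abline(h = c(0, 1), lty = 2)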
Both in the same plot.
3.2 Introduction to example 24.6.
Here, we start analyzing Hull (2015) example 24.6. We do not analyze the 10 firms yet as in the original example. Instead, we first make some sense of the data, the relevant analysis, the logic, and the model basics before dealing with the full example 24.6 in Hull (2015).
Recall that pnorm()
leads to a probability, whereas qnorm()
leads to a value of \(x\). According to Hull (2015), the cumulative probabilities of default of 1%, 3%, 6%, 10% and 15% are taken as given for maturities of 1, 2, 3, 4 and 5 years, respectively. To implement the model, we use this information to derive the corresponding threshold values \(\bar{x}\):
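The chunk that produces the table below is collapsed; the thresholds are simply qnorm() applied to the given cumulative probabilities (the vector name pd is an assumption, matching the column used later in the simulation section):
# Cumulative probabilities of default and the implied thresholds.
pd  <- c(0.01, 0.03, 0.06, 0.10, 0.15)  # given in Hull (2015).
x.y <- qnorm(pd)                         # thresholds x-bar per maturity.
round(x.y, 6)
# -2.326348 -1.880794 -1.554774 -1.281552 -1.036433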
Code
year | pd | x (values given in Hull) |
---|---|---|
1 | 0.01 | -2.326348 |
2 | 0.03 | -1.880794 |
3 | 0.06 | -1.554774 |
4 | 0.10 | -1.281552 |
5 | 0.15 | -1.036433 |
Note that the example assumes that the probability of default increases as we consider a longer maturity (from 1 year to 5 years). Probabilities above are cumulative, so they go from year 0 to year 1, from year 0 to year 2 and so on. To calculate the probability of default during a specific year we need to calculate the differences. In particular:
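The table below contains just the first differences of the cumulative probabilities; a one-line sketch:
# Default probability within each specific year (marginal probabilities).
diff(pd)  # 0.02 0.03 0.04 0.05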
Code
 | PD |
---|---|
x.y1 to x.y2 | 0.02 |
x.y2 to x.y3 | 0.03 |
x.y3 to x.y4 | 0.04 |
x.y4 to x.y5 | 0.05 |
This is how we can illustrate the case of a probability of default of 15% in 5 years. The green area represents the 15% of the whole area below the bell-shaped curve.
Code
plot(x, D, type = "l", col = 'black', lwd = 3, ylim = c(0, 0.4), xlab = "x",
ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = x.y[5], lty = 2)
polygon(c(x[x < x.y[5]], x.y[5]),
c(D[x < x.y[5]], 0), col = "green")
legend("topright", legend=c(
"pd(5years)=15%
The left green area
represents 15% of the
whole bell-shape.
N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 0.8, bty = "n", col = "green")
A complementary view is the cumulative probability function. Let’s illustrate the same case: a probability of default of 15% in 5 years. In this case, we do not need to calculate the area since the y-axis already represents the probability.
Code
plot(x, P, type = "l", col = 'black', lwd = 3, ylim = c(0, 1),
xlab = "x",
ylab = "Cumulative probability function: pnorm(x)")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2),
col = "green", lwd = 3)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2),
col = "green", lwd = 3)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("right", legend=c(
"pd(5years)=15%
N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 1, bty = "n", col = "green")
A closer view to the figure above to see the 3-year and 5-year cases:
Code
plot(x, P, type = "l", col = 'black', lwd = 5, ylim = c(0, 0.22),
xlim = c(-4, 0), ylab = "Cumulative probability function: pnorm(x)",
xlab = "x")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[3], length.out = 2), rep(pnorm(x.y[3]), 2),
col = "purple", lwd = 3, lty = 2)
lines(rep(x.y[3], 2), seq(0, pnorm(x.y[3]), length.out = 2),
col = "purple", lwd = 3, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2),
col = "green", lwd = 3, lty = 2)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2),
col = "green", lwd = 3, lty = 2)
points(x.y[3], 0.06, pch = 19, col = "purple", cex = 2)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("topleft", legend=c("pd(5years)=15%: N(-1.036433)=0.15",
"pd(3years)=6%: N(-1.554774)=0.06"),
pch = 19, col = c("green", "purple"), bg = "white", cex = 1, bty = "n")
Nice.
3.3 One firm.
Now let’s introduce a simulation approach. That is, instead of generating continuous values of \(x\) from \(-4\) to \(4\), we simulate many \(x\) values (10,000 in this case) that follow a standard normal distribution using the rnorm()
function. The simulation approach is especially useful when we are interested in replicating what happens in real-life situations because we can artificially replicate the observed behavior many times and understand what may happen in the future. In other words, x
was used before to characterize a perfect normal distribution. Now, we incorporate x.sim
, which behaves like a normal distribution but allows for some randomness, just as in real-life situations.
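The simulation chunk is collapsed; a minimal sketch (the seed is illustrative, so the exact draws will differ from the table below) is:
# Simulate N performance values from a standard normal distribution.
N <- 10000
set.seed(1)   # illustrative seed; the original seed is unknown.
x.sim <- rnorm(N, mean = 0, sd = 1)
head(x.sim)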
See the difference between both approaches to generate \(x\).
Code
x.sim | x |
---|---|
-1.7939058 | -4.000 |
-0.7515449 | -3.999 |
-0.3569900 | -3.998 |
0.9629299 | -3.997 |
0.3680735 | -3.996 |
1.0714250 | -3.995 |
Note that the probabilities of default per maturity in the simulated approach are close to the values of the previous section. In particular, \(0.01\) corresponds to \(0.0102\), and \(0.03\) corresponds to \(0.0307\). They do not match exactly simply because we are comparing theoretical versus simulated probabilities.
Code
# Function to calculate proportions that we understand as probabilities.
prop <- function(x) {
ans <- length(x.sim[x.sim <= x]) / N
}
pd.sim <- mapply(prop, x.y) # Apply the function.
ans <- data.frame(x.y, pd, pd.sim)
colnames(ans) <- c("x", "pd.theo", "pd.sim")
rownames(ans) <- c("y1", "y2", "y3", "y4", "y5")
kable(ans, caption = "Theoretic versus simulated probabilities of default.")
 | x | pd.theo | pd.sim |
---|---|---|---|
y1 | -2.326348 | 0.01 | 0.0102 |
y2 | -1.880794 | 0.03 | 0.0307 |
y3 | -1.554774 | 0.06 | 0.0629 |
y4 | -1.281552 | 0.10 | 0.1008 |
y5 | -1.036433 | 0.15 | 0.1480 |
Let’s view the results of the simulated approach. First in a histogram.
Code
# Some parameters we need to plot.
L <- c(-4, 4) # axis limits.
colors2 <- c("blue", "red", "purple", "pink", "green")
legend2 = c("pd(1year)=1.02%: x<=-2.326348",
"pd(2years)=3.07%: x<=-1.880794", "pd(3years)=6.29%: x<=-1.554774",
"pd(4years)=10.08%: x<=-1.281552", "pd(5years)=14.8%: x<=-1.036433")
# The histogram.
hist(x.sim, 500, xlim = L, ylim = c(0, 100), main = NULL, xlab = "x.sim")
abline(h = 0, lty = 2)
abline(v = x.y[1], lwd = 3, col = "blue")
abline(v = x.y[2], lwd = 3, col = "red")
abline(v = x.y[3], lwd = 3, col = "purple")
abline(v = x.y[4], lwd = 3, col = "pink")
abline(v = x.y[5], lwd = 3, col = "green")
legend("topright", legend = legend2, bg = "transparent",
text.col = colors2, cex = 0.9)
The area at the left hand side of each colored line represents the cumulative probability of default just as we explained before. In the same way, the area between two colored lines represents the probability of default in a specific period of time.
Now let’s see all the simulated data at once.
Code
plot(x.sim, ylab = "x.sim", pch = ".", ylim = c(-4, 4),
xlab = "Number of simulations")
abline(h = x.y[1], lwd = 2, col = "blue")
abline(h = x.y[2], lwd = 2, col = "red")
abline(h = x.y[3], lwd = 2, col = "purple")
abline(h = x.y[4], lwd = 2, col = "pink")
abline(h = x.y[5], lwd = 2, col = "green")
abline(v = 0, lty = 2)
abline(v = 10000, lty = 2)
# legend("topright", legend = legend2, bg = "white",
# text.col = colors2, cex = 0.9)
We normally follow a simulation approach because we can adapt the distribution function parameters to match what we see in real-life situations. This is how we can reproduce what may happen in the future and at the same time allow for some randomness.
In a different context, here we have an old video in which we drop 2,000 blue balls. Balls are dropped randomly given a normal distribution. Note how the bell-shaped distribution is formed as we increase the number of balls.
3.4 Two firms.
Consider the case in which we have two firms instead of one. The main difference now is that instead of simulating one firm we need two. Moreover, each firm follows a standard normal distribution function and both of them are correlated by a given correlation value so the firms are not independent. If they are not independent, then what happens to one firm has some impact on what happens to the other.
In a different context, here we have an old video in which we drop many blue balls. Balls are dropped randomly given a multivariate normal distribution. Note how the 3-D bell-shaped distribution is formed as we increase the number of balls.
The case of two firms is not the one presented in the Hull (2015) example, but it can help us visualize how the Gaussian copula approach works in a two-dimensional plot where the default correlation is 0.2.
The simulation of both firms’ performance measures is stored in x2
.
Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0.2 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
x2 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
x2 <- data.frame(x2)
colnames(x2) <- c("f1", "f2")
The matrix x2
has 10,000 simulated observations for each firm. In other words, we have 10,000 observations of the performance measure or \(z\)-score for two related firms. This matrix is big, but we can visualize the head (the first six observations).
 | f1 | f2 |
---|---|---|
1 | 0.0880039 | -2.8671108 |
2 | 0.0092849 | -1.1735732 |
3 | 0.0323809 | -0.5854275 |
4 | 1.5547886 | -0.0630241 |
5 | 0.9679767 | -0.3977597 |
6 | 0.7751329 | 0.8847116 |
Remember the cumulative probabilities of default are 1%, 3%, 6%, 10% and 15% for the maturities of 1, 2, 3, 4 and 5 years respectively. How do we extract those cases in which both firms will default in 5 years? In rows 15, 53, 61 and so on both firms default at the same time. Note that in all cases the \(x\) values are indeed below -1.036433.
Code
# These names are going to be useful later.
n.year <- c("year 1", "year 2", "year 3", "year 4", "year 5")
n.pd <- c("pd.y1", "pd.y2", "pd.y3", "pd.y4", "pd.y5")
n.f <- c("f1", "f2", "f1 default?", "f2 default?")
# Function to calculate cases in which firms default and probabilities.
fun.X <- function(x) {
both <- x2[x2$f2 < x & x2$f1 < x, ] # both default.
any <- x2[x2$f2 < x | x2$f1 < x, ] # at least one firm default.
onlyf1 <- x2[x2$f2 > x & x2$f1 < x, ] # only firm 1 default.
onlyf2 <- x2[x2$f1 > x & x2$f2 < x, ] # only firm 2 default.
one <- x2[(x2$f2 < x & x2$f1 > x | # only one firm default.
x2$f2 > x & x2$f1 < x),]
none <- x2[x2$f2 > x & x2$f1 > x, ] # no firm default.
# Gather results and probabilities in a list.
ans <- list(both = both, any = any, onlyf1 = onlyf1,
onlyf2 = onlyf2, one = one, none = none,
both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2)/ n),
one.pd = (nrow(one) / n), none.pd = (nrow(none) / n))
}
# X has all the relevant results for x2.
X <- mapply(fun.X, x.y)
See the cases in which both firms default in 5 years.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
15 | -1.123330 | -1.783569 | TRUE | TRUE |
53 | -2.724912 | -1.235614 | TRUE | TRUE |
61 | -1.968789 | -1.516563 | TRUE | TRUE |
108 | -2.121617 | -1.807062 | TRUE | TRUE |
156 | -1.559930 | -1.132879 | TRUE | TRUE |
157 | -1.194204 | -2.120824 | TRUE | TRUE |
In total, we have 343 cases in which both firms default at the same time. The first case is number 15, the second 53, the third 61, and so on. It is easy to get the total number of cases by counting the rows.
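A quick way to confirm the count, using the list X built above:
# Number of joint defaults at year 5.
nrow(X[["both", 5]])  # 343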
How do we extract those cases in which any firm will default in 5 years? That is, only firm 1, only firm 2, or both at the same time. This is a less strict condition, so we would expect more cases to match this new criterion compared with both
. Note that in all cases at least one firm is indeed below -1.036433. In rows 1, 2 and 10 firm 2 defaults. In row 15 both firms default. In rows 18 and 20 firm 1 and firm 2 default, respectively.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
1 | 0.0880039 | -2.8671108 | FALSE | TRUE |
2 | 0.0092849 | -1.1735732 | FALSE | TRUE |
10 | 1.4475216 | -1.1512274 | FALSE | TRUE |
15 | -1.1233296 | -1.7835692 | TRUE | TRUE |
18 | -1.5829965 | -0.7174298 | TRUE | FALSE |
20 | -0.9221666 | -1.9982869 | FALSE | TRUE |
Now we have 2,660 cases. Considerably more, as the |
condition is less strict than the &
. Let’s see the cases in which only firm 1 defaults.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
18 | -1.582996 | -0.7174298 | TRUE | FALSE |
34 | -2.842355 | 1.5400020 | TRUE | FALSE |
35 | -2.090320 | 0.7431275 | TRUE | FALSE |
36 | -1.600142 | 0.3082653 | TRUE | FALSE |
41 | -2.193496 | -0.8465008 | TRUE | FALSE |
49 | -1.764711 | -0.3914202 | TRUE | FALSE |
Only firm 2 defaults.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
1 | 0.0880039 | -2.867111 | FALSE | TRUE |
2 | 0.0092849 | -1.173573 | FALSE | TRUE |
10 | 1.4475216 | -1.151227 | FALSE | TRUE |
20 | -0.9221666 | -1.998287 | FALSE | TRUE |
22 | -0.1110555 | -1.348605 | FALSE | TRUE |
40 | 0.1058949 | -1.752620 | FALSE | TRUE |
Only one firm defaults at year 5.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
1 | 0.0880039 | -2.8671108 | FALSE | TRUE |
2 | 0.0092849 | -1.1735732 | FALSE | TRUE |
10 | 1.4475216 | -1.1512274 | FALSE | TRUE |
18 | -1.5829965 | -0.7174298 | TRUE | FALSE |
20 | -0.9221666 | -1.9982869 | FALSE | TRUE |
22 | -0.1110555 | -1.3486051 | FALSE | TRUE |
Lastly, when no firm defaults.
Code
 | f1 | f2 | f1 default? | f2 default? |
---|---|---|---|---|
3 | 0.0323809 | -0.5854275 | FALSE | FALSE |
4 | 1.5547886 | -0.0630241 | FALSE | FALSE |
5 | 0.9679767 | -0.3977597 | FALSE | FALSE |
6 | 0.7751329 | 0.8847116 | FALSE | FALSE |
7 | 1.3361498 | 1.0392145 | FALSE | FALSE |
8 | 1.3039572 | 1.5191303 | FALSE | FALSE |
Finally, probabilities.
Code
both.pd <- t(data.frame(X["both.pd",]))
any.pd <- t(data.frame(X["any.pd",]))
onlyf1.pd <- t(data.frame(X["onlyf1.pd",]))
onlyf2.pd <- t(data.frame(X["onlyf2.pd",]))
one.pd <- t(data.frame(X["one.pd",]))
none.pd <- t(data.frame(X["none.pd",]))
ans <- data.frame(both.pd, any.pd, onlyf1.pd, onlyf2.pd,
one.pd, none.pd)
rownames(ans) <- n.year
kable(ans, caption = "Probabilities of default.")
 | both.pd | any.pd | onlyf1.pd | onlyf2.pd | one.pd | none.pd |
---|---|---|---|---|---|---|
year 1 | 0.0003 | 0.0192 | 0.0103 | 0.0086 | 0.0189 | 0.9808 |
year 2 | 0.0019 | 0.0560 | 0.0282 | 0.0259 | 0.0541 | 0.9440 |
year 3 | 0.0066 | 0.1141 | 0.0561 | 0.0514 | 0.1075 | 0.8859 |
year 4 | 0.0176 | 0.1854 | 0.0864 | 0.0814 | 0.1678 | 0.8146 |
year 5 | 0.0343 | 0.2660 | 0.1210 | 0.1107 | 0.2317 | 0.7340 |
So interesting.
Note that 15% is the theoretical probability that a single firm defaults within 5 years. In the simulation, the corresponding marginal probabilities also include the cases in which both firms default: \(0.0343+0.1210=0.1553\) for firm 1 and \(0.0343+0.1107=0.1450\) for firm 2, both close to the theoretical 15%.
We can perform a simple test to verify that everything is alright. For example, this equation must hold: onlyf1+onlyf2=one
. Substituting for year 5: \(0.121+0.1107=0.2317\). This equation must hold as well: any-both=one
. Substituting for year 5: \(0.266-0.0343=0.2317\). As you can see, everything is alright.
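The same checks can be done in code, using the objects created above:
# Consistency checks for year 5.
onlyf1.pd[5] + onlyf2.pd[5]  # 0.2317, equal to one.pd[5].
any.pd[5] - both.pd[5]       # 0.2317 as well.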
Let’s visualize all 10,000 cases. Each dot represents a pair of Firm 1 and Firm 2 \(x\) values, and the dotted lines mark the threshold that corresponds to the 5-year probability of default.
Code
These 10,000 observations are highly concentrated around the mean, which is very close to zero. This can also be seen easily in the following density plot.
Code
This is the 3-D rayshader
video version of the plot above.
And a third interactive version.
Code
f <- function(x, y) dmnorm(cbind(x, y), c(0, 0), corr_pos02)
z <- outer(sort(x2[1:100,1]), sort(x2[1:100,2]), f)
#create contour plot
#contour(x2[1:100,1], x2[1:100,2], z)
plot_ly(type = "surface" , x = sort(x2[1:100,2]),
y = sort(x2[1:100,1]) , z = z ) %>%
layout(#title = "Surface",
scene = list(xaxis = list(title = "f1", range = c(-2,2)),
yaxis = list(title = "f2", range = c(-2,2)),
zaxis = list(title = "Density"))) %>%
hide_colorbar()
We can visualize the default cases. First, the case when both firms default at year 5.
Code
The values within the plot represent the number of cases. Here, we have 343 cases (out of 10,000) in which both firms default at the same time within 5 years. Note that this is a cumulative probability of default.
Now, the case in which any firm defaults in 5 years.
Code
Only firm 1 defaults in 5 years.
Code
Only firm 2 defaults in 5 years.
Code
This is the case in which only one firm defaults.
Code
This is the case in which neither firm defaults.
Code
It is a good idea to summarize all previous plots in one.
Code
# none
par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "None.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(none.pd[5]*n)), bty = "n")
# both
par(pty = "s")
plot(X[["both", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Both.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")
# any
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Any.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")
# onlyfirm1
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Only f1.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
# onlyfirm2
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Only f2.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
# one
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "One.")
abline(h = x.y[5], lty = 2)
abline(v = x.y[5], lty = 2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")
It is interesting to compare two different correlation values. Here, we compare 0 versus 0.2.
Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
X0 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
X0 <- data.frame(X0)
colnames(X0) <- c("f1", "f2")
Just as before, we create a function, evaluate it, and store results in X0
.
Code
fun.X0 <- function(x) {
both <- X0[X0$f2 < x & X0$f1 < x, ]
any <- X0[X0$f2 < x | X0$f1 < x, ]
onlyf1 <- X0[X0$f2 > x & X0$f1 < x, ]
onlyf2 <- X0[X0$f1 > x & X0$f2 < x, ]
one <- X0[(X0$f2 < x & X0$f1 > x |
X0$f2 > x & X0$f1 < x),]
none <- X0[X0$f2 > x & X0$f1 > x, ]
ans <- list(both = both, any = any, onlyf1 = onlyf1,
onlyf2 = onlyf2, one = one, none = none,
both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2) /n),
one.pd = (nrow(one) / n), none.pd = (nrow(none) / n)) }
X0 <- mapply(fun.X0, x.y)
none.pd0 <- data.frame(X0["none.pd",]) * n
both.pd0 <- data.frame(X0["both.pd",]) * n
onlyf1.pd0 <- data.frame(X0["onlyf1.pd",]) * n
onlyf2.pd0 <- data.frame(X0["onlyf2.pd",]) * n
A graphical analysis shows that with a correlation of 0.2 it is more likely that both firms default at the same time, and it is less likely that at least one firm defaults.
Code
par(mfrow=c(1, 2), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], pch = ".", xlim = L, ylim = L,
cex = 1, main = "Correlation = 0.2")
points(X[["both", 5]], pch = ".", col = "red")
points(X[["onlyf1", 5]], pch = ".", col = "purple")
points(X[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd[5]*n)), bty = "n")
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
par(pty = "s")
plot(X0[["none", 5]], pch = ".", xlim = L, ylim = L,
cex = 1, main = "Correlation = 0.")
points(X0[["both", 5]], pch = ".", col = "red")
points(X0[["onlyf1", 5]], pch = ".", col = "purple")
points(X0[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd0[5])), bty = "n")
legend("topright", legend = c(paste(none.pd0[5])), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd0[5])), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd0[5])), bty = "n")
Very interesting indeed.
The Gaussian copula is not the only copula model. Let's see how other copula families behave.
t-copula:
Gumbel copula:
Frank copula:
Clayton copula:
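These alternatives are not implemented in this document, but the copula package (assuming it is installed; it is not used elsewhere here) provides constructors for all of them. A minimal sketch of how the two-firm uniforms could be simulated under each family, with arbitrary illustrative parameters rather than values calibrated to the 0.2 Gaussian correlation:
Code
# Requires the copula package; parameter values below are purely illustrative.
library(copula)
set.seed(130575)
u_t       <- rCopula(n, tCopula(0.2, dim = 2, df = 4))   # t copula
u_gumbel  <- rCopula(n, gumbelCopula(1.5, dim = 2))      # Gumbel copula
u_frank   <- rCopula(n, frankCopula(2, dim = 2))         # Frank copula
u_clayton <- rCopula(n, claytonCopula(2, dim = 2))       # Clayton copula
# Map the uniforms back to the normal scale to compare with the Gaussian case.
x_clayton <- data.frame(qnorm(u_clayton))
colnames(x_clayton) <- c("f1", "f2")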
3.5 Ten firms.
The original Hull (2015) example considers a 10-firm case, and here we implement it following a simulation approach. First, we need a \(10\times10\) correlation matrix to produce the new \(x\) values using a multivariate normal random number generator. In the Hull (2015) example, the copula default correlation between each pair of companies is 0.2. The code below also has the option to vary the default correlation using a uniform random distribution.
The new \(10\times10\) correlation matrix is then:
Code
1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 |
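The chunk that builds this matrix is collapsed in the output; a minimal sketch of how it could be constructed follows. The object names m, n, and x are assumptions inferred from how they are used in the next chunk, and the uniform-random option mentioned above is omitted.
Code
m <- 10        # number of firms
n <- 1000000   # number of simulations, as stated in the text below
rho <- 0.2     # pairwise copula default correlation in Hull (2015)
x <- matrix(rho, nrow = m, ncol = m)  # presumably the Sigma passed to mvrnorm() below
diag(x) <- 1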
Now we can simulate from the multivariate normal distribution. The variable X10 contains 1,000,000 simulated values for each firm. We choose this larger number of simulations because otherwise the most restrictive conditions would not produce any observations.
Code
# Create the simulated cases.
set.seed(130575) # Reproducibility.
X10 <- mvrnorm(n, mu = rep(0, m), Sigma = x) # n = 1,000,000 draws for m = 10 firms.
X10 <- data.frame(X10) # 1,000,000 rows, one column per firm.
colnames(X10) <- c("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10")
kable(head(X10), caption = "10 firms' performance, 1,000,000 simulations.",
digits = 3)
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
0.449 | 2.362 | 1.716 | -0.350 | 0.807 | 1.578 | -0.085 | 1.295 | 0.634 | 1.086 |
0.795 | -2.315 | 0.722 | -0.607 | -0.514 | 0.540 | 1.620 | 2.240 | 0.310 | 1.184 |
0.513 | 0.318 | 0.256 | -0.146 | 0.668 | 1.071 | 0.929 | -1.511 | 1.701 | -1.908 |
0.643 | -1.120 | -0.818 | 0.398 | -0.985 | -2.094 | 0.412 | -0.001 | -0.699 | -0.831 |
-1.404 | -0.385 | 0.916 | 1.768 | 0.785 | 0.425 | -0.873 | -1.025 | -0.164 | -1.990 |
0.038 | -0.818 | 0.402 | -1.354 | -0.153 | -0.830 | -1.523 | -0.856 | -0.433 | -0.142 |
How do we extract the cases in which all 10 firms default within 5 years (at the same time)? Here are the first 6 of those cases. Note that in every one of them the X10 values are lower than \(-1.036433\).
Code
# With 10 firms it is easier to use dplyr's filter_all() than to write out
# every condition, although this could probably be simplified even further.
y5.all.2 <- filter_all(X10, all_vars(. < x.y[5]))
y4.all.2 <- filter_all(X10, all_vars(. < x.y[4]))
y3.all.2 <- filter_all(X10, all_vars(. < x.y[3]))
y2.all.2 <- filter_all(X10, all_vars(. < x.y[2]))
y1.all.2 <- filter_all(X10, all_vars(. < x.y[1]))
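As a side note, filter_all() and all_vars() are superseded in recent versions of dplyr. Assuming dplyr 1.0 or later, the same 5-year filter can be written with if_all(); a sketch:
Code
# Equivalent to y5.all.2, using the newer across-style verbs.
y5.all.2.alt <- X10 |>
  filter(if_all(everything(), ~ .x < x.y[5]))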
Let’s analyze the case of 5 years.
Code
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
-1.625 | -1.198 | -1.740 | -2.203 | -1.489 | -3.152 | -1.395 | -1.409 | -2.809 | -2.027 |
-1.410 | -3.567 | -1.849 | -2.273 | -1.336 | -1.638 | -1.289 | -2.135 | -2.671 | -1.675 |
-3.617 | -1.495 | -1.210 | -1.367 | -3.220 | -1.481 | -1.784 | -1.891 | -1.443 | -1.375 |
-2.624 | -1.162 | -2.479 | -2.613 | -1.929 | -1.734 | -3.395 | -2.635 | -2.289 | -1.593 |
-1.834 | -1.063 | -1.888 | -1.390 | -3.494 | -1.280 | -2.108 | -1.664 | -4.007 | -2.058 |
-2.083 | -1.800 | -2.085 | -2.775 | -1.437 | -2.610 | -1.215 | -1.739 | -1.219 | -2.841 |
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
How many cases are there?
Which are those 72 cases?
Code
11119 | 92221 | 252520 | 350035 | 402689 | 496478 | 574643 | 633402 | 701255 |
18645 | 131986 | 266761 | 354133 | 413425 | 511731 | 576215 | 643935 | 708339 |
45042 | 165394 | 284189 | 363067 | 423852 | 522630 | 585654 | 644895 | 727092 |
57098 | 201832 | 284680 | 374237 | 437629 | 523010 | 615497 | 647511 | 747221 |
57696 | 227833 | 302542 | 384869 | 438202 | 527750 | 630487 | 650216 | 789221 |
81982 | 248381 | 328106 | 387617 | 477385 | 560115 | 630626 | 688847 | 797265 |
In total, we only have 72 cases. That is, the 10 firms default at the same time within 5 years in 72 out of 1,000,000 total cases. For the shorter horizons the cases are even less frequent; in fact, there are zero cases for years 1, 2, and 3.
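For reference, both the count and the row indices reported above can be recovered directly from X10 and y5.all.2; a short sketch:
Code
nrow(y5.all.2)                             # number of all-default cases at 5 years
which(rowSums(X10 < x.y[5]) == ncol(X10))  # row indices of those cases in X10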
How do we extract those cases in which at least one of the 10 firms will default?
Here are the first 6 cases for the 5-year default.
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
0.795 | -2.315 | 0.722 | -0.607 | -0.514 | 0.540 | 1.620 | 2.240 | 0.310 | 1.184 |
0.513 | 0.318 | 0.256 | -0.146 | 0.668 | 1.071 | 0.929 | -1.511 | 1.701 | -1.908 |
0.643 | -1.120 | -0.818 | 0.398 | -0.985 | -2.094 | 0.412 | -0.001 | -0.699 | -0.831 |
-1.404 | -0.385 | 0.916 | 1.768 | 0.785 | 0.425 | -0.873 | -1.025 | -0.164 | -1.990 |
0.038 | -0.818 | 0.402 | -1.354 | -0.153 | -0.830 | -1.523 | -0.856 | -0.433 | -0.142 |
0.353 | -0.045 | -0.393 | -1.914 | -0.168 | -0.439 | -0.379 | -0.742 | -1.612 | -2.775 |
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE |
FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
How many cases are there?
In total, we have 682,148 cases. That is, at least one of the 10 firms defaults within 5 years in 682,148 of the 1,000,000 cases.
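The chunk that produces these cases is collapsed, but it presumably mirrors the all_vars() filter with any_vars(); the object names below are taken from the probability calculation that follows:
Code
# At least one of the 10 firms defaults within each horizon.
y1.any.2 <- filter_all(X10, any_vars(. < x.y[1]))
y2.any.2 <- filter_all(X10, any_vars(. < x.y[2]))
y3.any.2 <- filter_all(X10, any_vars(. < x.y[3]))
y4.any.2 <- filter_all(X10, any_vars(. < x.y[4]))
y5.any.2 <- filter_all(X10, any_vars(. < x.y[5]))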
And how do we convert these counts into probabilities?
Code
atleastone02 <- t(data.frame(nrow(y1.any.2), nrow(y2.any.2),
nrow(y3.any.2), nrow(y4.any.2), nrow(y5.any.2)))
all02 <- t(data.frame(nrow(y1.all.2), nrow(y2.all.2),
nrow(y3.all.2), nrow(y4.all.2), nrow(y5.all.2)))
res02 <- data.frame(all02 / n, atleastone02 / n)
rownames(res02) <- n.year
colnames(res02) <- c("All firms", "At least one")
kable(res02, caption = "Probabilities of default (10 firms, corr=0.2).")
All firms | At least one | |
---|---|---|
year 1 | 0.0e+00 | 0.086835 |
year 2 | 0.0e+00 | 0.226086 |
year 3 | 0.0e+00 | 0.386264 |
year 4 | 1.0e-05 | 0.543684 |
year 5 | 7.2e-05 | 0.682440 |
Let’s see the difference when we assume a different correlation matrix. In this case, the pairwise correlations vary randomly between 0.45 and 0.65.
Code
1.000 | 0.622 | 0.597 | 0.622 | 0.530 | 0.517 | 0.518 | 0.490 | 0.455 | 0.578 |
0.622 | 1.000 | 0.523 | 0.594 | 0.465 | 0.501 | 0.558 | 0.560 | 0.460 | 0.513 |
0.597 | 0.523 | 1.000 | 0.556 | 0.485 | 0.603 | 0.534 | 0.524 | 0.611 | 0.547 |
0.622 | 0.594 | 0.556 | 1.000 | 0.564 | 0.555 | 0.496 | 0.500 | 0.502 | 0.513 |
0.530 | 0.465 | 0.485 | 0.564 | 1.000 | 0.608 | 0.513 | 0.559 | 0.647 | 0.505 |
0.517 | 0.501 | 0.603 | 0.555 | 0.608 | 1.000 | 0.452 | 0.486 | 0.644 | 0.518 |
0.518 | 0.558 | 0.534 | 0.496 | 0.513 | 0.452 | 1.000 | 0.577 | 0.635 | 0.466 |
0.490 | 0.560 | 0.524 | 0.500 | 0.559 | 0.486 | 0.577 | 1.000 | 0.585 | 0.524 |
0.455 | 0.460 | 0.611 | 0.502 | 0.647 | 0.644 | 0.635 | 0.585 | 1.000 | 0.474 |
0.578 | 0.513 | 0.547 | 0.513 | 0.505 | 0.518 | 0.466 | 0.524 | 0.474 | 1.000 |
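The collapsed chunk presumably draws the off-diagonal entries from a uniform distribution between 0.45 and 0.65 and then symmetrizes the matrix. A sketch under that assumption (the seed and intermediate names are illustrative, and the values of Xr shown below come from the document's own hidden chunk, so this sketch will not reproduce them exactly):
Code
set.seed(130575)  # illustrative seed
corr_rand <- matrix(runif(m * m, min = 0.45, max = 0.65), m, m)
corr_rand[lower.tri(corr_rand)] <- t(corr_rand)[lower.tri(corr_rand)]  # symmetrize
diag(corr_rand) <- 1  # in practice, a positive-definiteness check may be needed
Xr <- data.frame(mvrnorm(n, mu = rep(0, m), Sigma = corr_rand))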
The resulting values of Xr are:
Code
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
0.673 | 1.507 | 0.776 | 1.038 | 1.997 | 0.557 | 1.759 | 1.578 | 2.529 | 1.326 |
-0.181 | -0.887 | -0.485 | 1.532 | 0.829 | 2.031 | 0.800 | 0.824 | 0.471 | 0.838 |
0.731 | -0.826 | 1.679 | -1.181 | -0.034 | -0.270 | 0.478 | 0.920 | 0.412 | 0.818 |
-0.852 | -0.288 | -0.362 | -0.457 | -1.596 | 0.005 | -0.153 | -1.029 | -1.703 | -0.914 |
0.692 | -0.689 | 0.906 | -1.140 | -0.080 | 0.082 | -0.968 | -2.028 | 0.089 | 0.256 |
-1.123 | -0.810 | -1.242 | -1.438 | -0.850 | -0.394 | -0.601 | -0.458 | 0.080 | -1.393 |
Now we extract, for each horizon, the cases in which all ten firms default at the same time and the cases in which at least one firm defaults.
Code
Xr <- as_tibble(Xr)
# All firms.
Xr.y1.all <- filter_all(Xr, all_vars(. < x.y[1]))
Xr.y2.all <- filter_all(Xr, all_vars(. < x.y[2]))
Xr.y3.all <- filter_all(Xr, all_vars(. < x.y[3]))
Xr.y4.all <- filter_all(Xr, all_vars(. < x.y[4]))
Xr.y5.all <- filter_all(Xr, all_vars(. < x.y[5]))
# At least one firm.
Xr.y1.any <- filter_all(Xr, any_vars(. < x.y[1]))
Xr.y2.any <- filter_all(Xr, any_vars(. < x.y[2]))
Xr.y3.any <- filter_all(Xr, any_vars(. < x.y[3]))
Xr.y4.any <- filter_all(Xr, any_vars(. < x.y[4]))
Xr.y5.any <- filter_all(Xr, any_vars(. < x.y[5]))
All firms default in 5,935 cases out of 1,000,000.
Let’s see the first 6 cases:
Code
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
-1.357 | -1.241 | -2.165 | -1.486 | -1.820 | -1.181 | -1.816 | -1.146 | -1.823 | -2.462 |
-1.874 | -1.751 | -2.281 | -1.390 | -2.020 | -2.189 | -1.752 | -2.580 | -2.062 | -2.731 |
-1.701 | -2.546 | -1.671 | -1.895 | -1.852 | -1.603 | -1.292 | -2.384 | -1.184 | -1.592 |
-1.443 | -1.452 | -2.499 | -1.561 | -1.166 | -2.392 | -1.658 | -1.369 | -3.522 | -2.372 |
-1.326 | -2.036 | -1.252 | -1.729 | -2.569 | -1.650 | -1.795 | -1.235 | -2.822 | -1.722 |
-2.314 | -2.703 | -2.063 | -3.262 | -1.551 | -2.172 | -2.205 | -3.088 | -2.130 | -1.908 |
Let’s verify that these are indeed default cases, with all values below the 5-year threshold.
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
At least one firm defaults in 497,987 out of 1,000,000 cases.
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
0.731 | -0.826 | 1.679 | -1.181 | -0.034 | -0.270 | 0.478 | 0.920 | 0.412 | 0.818 |
-0.852 | -0.288 | -0.362 | -0.457 | -1.596 | 0.005 | -0.153 | -1.029 | -1.703 | -0.914 |
0.692 | -0.689 | 0.906 | -1.140 | -0.080 | 0.082 | -0.968 | -2.028 | 0.089 | 0.256 |
-1.123 | -0.810 | -1.242 | -1.438 | -0.850 | -0.394 | -0.601 | -0.458 | 0.080 | -1.393 |
-1.089 | -1.470 | -1.292 | -2.389 | -1.046 | -1.960 | 0.653 | -1.052 | -1.027 | -1.017 |
-1.280 | -0.320 | -2.717 | -1.976 | -1.970 | -1.238 | -0.891 | -2.161 | -0.682 | -0.657 |
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE |
TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE |
TRUE | FALSE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE |
And the corresponding probabilities:
Code
allR <- t(data.frame(nrow(Xr.y1.all), nrow(Xr.y2.all),
nrow(Xr.y3.all), nrow(Xr.y4.all), nrow(Xr.y5.all)))
atleastoneR <- t(data.frame(nrow(Xr.y1.any), nrow(Xr.y2.any),
nrow(Xr.y3.any), nrow(Xr.y4.any), nrow(Xr.y5.any)))
resR <- data.frame(allR / n, atleastoneR / n)
ans <- data.frame(res02, resR)
rownames(ans) <- n.year
colnames(ans) <- c("All (corr=0.2)", "At least one (corr=0.2)",
"All (corr=rand)", "At least one (corr=rand)")
kable(ans, caption = "Probability of default.",
row.names = TRUE)
All (corr=0.2) | At least one (corr=0.2) | All (corr=rand) | At least one (corr=rand) | |
---|---|---|---|---|
year 1 | 0.0e+00 | 0.086835 | 0.000021 | 0.062784 |
year 2 | 0.0e+00 | 0.226086 | 0.000184 | 0.156169 |
year 3 | 0.0e+00 | 0.386264 | 0.000783 | 0.266124 |
year 4 | 1.0e-05 | 0.543684 | 0.002438 | 0.382602 |
year 5 | 7.2e-05 | 0.682440 | 0.006144 | 0.497513 |
4 Credit VaR example 24.7.
This replicates Hull (2015) example 24.7.
This is the Vasicek model (equation 24.10) written as a function.
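The function itself is collapsed, but under the assumptions of Hull's example 24.7 ($100 million of retail exposures, a 2% one-year probability of default, a 60% recovery rate, and a copula correlation of 0.1) it presumably looks something like the following sketch:
Code
# Hull's equation 24.10: worst-case default rate at confidence level conf.
vasicek <- function(conf, pd = 0.02, rho = 0.1) {
  pnorm((qnorm(pd) + sqrt(rho) * qnorm(conf)) / sqrt(1 - rho))
}
exposure <- 100   # millions
lgd <- 1 - 0.6    # loss given default, with a 60% recovery rate
CVaR.999 <- exposure * lgd * vasicek(0.999)  # should match the 99.9% value below
CVaR.99  <- exposure * lgd * vasicek(0.99)   # should match the 99% value below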
Let’s see if it works by evaluating equation 24.10 at the 99.9% and 99% confidence levels. The 1-year 99.9% and 99% credit VaR values (in millions) are:
Code
[1] 5.129484
[1] 3.294271
Let’s evaluate the model across all confidence levels between 0 and 1 by simulating 10,000 values.
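The collapsed chunk presumably draws 10,000 confidence levels and evaluates the implied loss at each one. A sketch reusing the vasicek() sketch above; the names Loss and l are taken from the plotting code below, while the seed is illustrative:
Code
set.seed(130575)                      # illustrative seed
l <- runif(10000)                     # 10,000 confidence levels in (0, 1)
Loss <- exposure * lgd * vasicek(l)   # simulated loss distribution, in millions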
Now, visually:
Code
hist(Loss, 100, xlim = c(0, 7), xlab = "Losses in millions", main = NULL)
abline(v = CVaR.999, lty = 2, col = "red", lwd = 2)
abline(v = CVaR.99, lty = 2, col = "blue", lwd = 2)
legend("topright", legend = c("1-year 99% credit VaR is 3.294271",
"1-year 99.9% credit VaR is 5.129484"),
col = c("blue", "red"), lwd = 2, lty = 2, bg = "white", cex = 0.8)
Code
# Density of the simulated losses, with the two credit VaR levels marked.
dat <- data.frame(Loss, l)
ggplot(dat, aes(x = Loss)) +
  geom_density(color = "darkblue", fill = "lightblue") +
  geom_vline(aes(xintercept = CVaR.999),
             color = "red", linetype = "dashed", linewidth = 1) +
  geom_vline(aes(xintercept = CVaR.99),
             color = "blue", linetype = "dashed", linewidth = 1) +
  scale_x_continuous(labels = scales::dollar)
Nice.
This document took 40.58 seconds to compile with Quarto version 1.5.54 and R version 4.4.1 (2024-06-14 ucrt).