Credit risk with R

Author

Dr. Martín Lozano.

Modified

January 2, 2024, 23:17:44.

Abstract
This material relies on the credit risk chapters of Hull (2015) and Lore Dirick's credit risk DataCamp course. Some mathematical background is skipped to emphasize the data analysis, model logic, discussion, graphical approach, and R coding (R Core Team 2024). Following the literate programming philosophy of Knuth (1984), the objective of this document is to explain to human beings what we want a computer to do. This is a work in progress and is under revision.



Introduction.

Credit risk models play a pivotal role in the financial landscape, providing institutions with a systematic framework to assess and manage the risks associated with lending and investment activities. One crucial component of these models is the estimation of the probability of default, which quantifies the likelihood that a borrower will fail to meet their debt obligations. The relevance of credit risk models lies in their ability to enhance decision-making processes by offering a comprehensive understanding of the potential creditworthiness of individuals, companies, or financial instruments. By utilizing statistical methods, historical data, and various financial indicators, these models enable financial institutions to make informed decisions about lending, pricing, and portfolio management. The estimation of the probability of default not only aids in risk assessment but also supports regulatory compliance, allowing institutions to maintain a healthy balance between risk and return in their credit portfolios. In an ever-evolving financial landscape, the continual refinement and application of credit risk models are essential for fostering stability and resilience within the financial system.

1 Loan analysis.

Code
# Logistic models
library(gmodels) # CrossTable()
library(ggplot2)
library(tidyr) # gather()
library(dplyr)
library(pROC) # roc
library(vembedr)

This section relies on the DataCamp course Credit Risk Modeling in R by Lore Dirick. However, we incorporate a slightly different database and an extended analysis.

1.1 Explore the database.

Let’s load the data called loan_data_ARF.rds and then understand its structure before conducting any further analysis. This database is available here.

Code
dat <- readRDS("loan_data_ARF.rds")
str(dat)
'data.frame':   29092 obs. of  10 variables:
 $ loan_status   : int  0 0 0 0 0 0 1 0 1 0 ...
 $ loan_amnt     : int  5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
 $ int_rate      : num  10.6 11 13.5 11 11 ...
 $ grade         : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 ...
 $ emp_length    : int  10 25 13 3 9 11 0 3 3 0 ...
 $ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4 ...
 $ annual_inc    : num  24000 12252 49200 36000 48000 ...
 $ age           : int  33 31 24 39 24 28 22 22 28 22 ...
 $ sex           : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 2 ...
 $ region        : Factor w/ 4 levels "E","N","S","W": 1 1 3 3 2 2 2 2 4 1 ...

This could be a typical database taken from any financial institution, such as a bank or a firm that uses credit channels to sell its products or services. Here we have 29,092 observations of 10 variables. Each observation corresponds to the personal and loan characteristics of one loan. An important variable, our dependent variable, is loan_status: a value of 0 means no default and a value of 1 means default. A default occurs when a borrower is unable to make timely payments, misses payments, or avoids or stops making payments on interest or principal owed. The precise definition of default depends on the interests and objectives of the analysis; here we simply classify loans as default or no default. This variable is therefore dichotomous or categorical. We are interested in predicting whether a new application will default or not.
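Before modeling, it may help to see how frequent defaults are in the sample. The following quick check (not part of the original course code) counts the two outcomes and computes the default proportion; note that at this point the column is still called loan_status.

Code
# Quick check (illustrative): how frequent are defaults in the sample?
table(dat$loan_status)  # counts of no-default (0) and default (1)
mean(dat$loan_status)   # proportion of defaults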

Clearly, loan_data_ARF.rds is past information, since we know with certainty whether each individual defaulted (1) or not (0). Past information helps us understand how likely it is that an individual will default given their personal and loan characteristics. It is also what we use to train our quantitative models to make predictions about new applicants, and to evaluate the performance of our classifications.

A variable name is too long when there exists a shorter name that equally conveys the purpose of the variable. Let’s rename some of them.

Code
old_names <- colnames(dat)
colnames(dat) <- c("loan_st", "l_amnt", "int", "grade", "emp_len", 
                   "home", "income", "age", "sex", "region")
data.frame(old_names, "new_names" = colnames(dat))
        old_names new_names
1     loan_status   loan_st
2       loan_amnt    l_amnt
3        int_rate       int
4           grade     grade
5      emp_length   emp_len
6  home_ownership      home
7      annual_inc    income
8             age       age
9             sex       sex
10         region    region

We can take a look at the information in different ways. For example, look at the first 10 rows out of 29,092.

Code
head(dat, 10)
   loan_st l_amnt   int grade emp_len home income age sex region
1        0   5000 10.65     B      10 RENT  24000  33   0      E
2        0   2400 10.99     C      25 RENT  12252  31   0      E
3        0  10000 13.49     C      13 RENT  49200  24   0      S
4        0   5000 10.99     A       3 RENT  36000  39   0      S
5        0   3000 10.99     E       9 RENT  48000  24   1      N
6        0  12000 12.69     B      11  OWN  75000  28   1      N
7        1   9000 13.49     C       0 RENT  30000  22   1      N
8        0   3000  9.91     B       3 RENT  15000  22   1      N
9        1  10000 10.65     B       3 RENT 100000  28   0      W
10       0   1000 16.29     D       0 RENT  28000  22   1      E

Note that sex is 1 for female and 0 for male. Now, instead of looking the details of the first 10 rows, we can summarize with respect to home with the CrossTable() function.

Code
CrossTable(dat$home)

 
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  29092 

 
          |  MORTGAGE |     OTHER |       OWN |      RENT | 
          |-----------|-----------|-----------|-----------|
          |     12002 |        97 |      2301 |     14692 | 
          |     0.413 |     0.003 |     0.079 |     0.505 | 
          |-----------|-----------|-----------|-----------|



 

We can also use CrossTable() to summarize two variables. In particular, instead of only counting home-ownership categories, we can add a second dimension like loan_st. This allows us to create more informative tables.

Code
CrossTable(dat$home, dat$loan_st, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|-------------------------|

 
Total Observations in Table:  29092 

 
             | dat$loan_st 
    dat$home |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
    MORTGAGE |     10821 |      1181 |     12002 | 
             |     0.902 |     0.098 |     0.413 | 
-------------|-----------|-----------|-----------|
       OTHER |        80 |        17 |        97 | 
             |     0.825 |     0.175 |     0.003 | 
-------------|-----------|-----------|-----------|
         OWN |      2049 |       252 |      2301 | 
             |     0.890 |     0.110 |     0.079 | 
-------------|-----------|-----------|-----------|
        RENT |     12915 |      1777 |     14692 | 
             |     0.879 |     0.121 |     0.505 | 
-------------|-----------|-----------|-----------|
Column Total |     25865 |      3227 |     29092 | 
-------------|-----------|-----------|-----------|

 

This table reveals defaults by home ownership. We can use histograms to see the distribution of a single variable; in this case, the interest rate.

Code
ggplot(dat, aes(x = int)) + 
  geom_histogram(aes(y=..density..), binwidth = 0.5, colour = "black", 
                 fill = "white") +
  labs(y = "Density", x = "Interest rate") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.1: Interest rate histogram.

The following is a boxplot figure for the annual income.

Code
ggplot(dat, aes(income)) + 
  geom_boxplot() +
  labs(y = "Density",
       x = "Annual income") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.2: Annual income boxplot.

The plot above looks suspicious. There is a very large annual income value on the horizontal axis (6,000,000), and a few individuals with very high incomes. We should explore further and investigate whether these are valid observations or simply mistakes in the original database.

Code
high_income <- dat[(dat$income > 1000000), ]
high_income
      loan_st l_amnt   int grade emp_len     home  income age sex region
4861        0  12025 14.27     C      13     RENT 1782000  63   0      E
13931       0  10000  6.54     A      16      OWN 1200000  36   0      W
15386       0   1500 10.99     A       5 MORTGAGE 1900000  60   1      N
16713       0  12000  7.51     A       1 MORTGAGE 1200000  32   0      E
19486       0   5000 12.73     C      12 MORTGAGE 6000000 144   1      E
22811       0  10000 10.99     A       1 MORTGAGE 1200000  40   1      E
23361       0   6400  7.40     A       7 MORTGAGE 1440000  44   1      E
23683       0   6600  7.74     A       9 MORTGAGE 1362000  47   0      E
28468       0   8450 12.29     C       0     RENT 2039784  42   0      E

One individual (row 19486) not only reports a very high income but is also 144 years old. So, my decision is to drop these 9 observations. Cleaning data is a common task when dealing with large databases, and it is fine as long as we do not alter the nature of the data.

Code
high_income_index <- data.frame(value = as.integer(rownames(high_income)))
dat <- dat[-high_income_index$value,]

Remember the database has originally 29,092 rows and now we are dropping 9 so we end up with 29,083. See the result.

Code
ggplot(dat, aes(income)) + 
  geom_boxplot() +
  labs(y = "Density", x = "Annual income") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.3: Annual income without extreme values.

Somewhat better now.

1.2 Logistic models.

Logistic models allow us to make predictions about loan defaults. Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable like loan_st. In this case, the binary dependent variable is default (1) or no default (0). A good reference for this section is Hull (2020).

First, load the data and split it into two sets: (1) training and (2) test. The training set is for building and estimating models, and the test set is used to evaluate our model predictions on new data. When estimating models, it is common practice to separate the available data into two parts, training and test data, where the training data is used to estimate parameters (in-sample) and the test data is used to evaluate accuracy (out-of-sample). Because the test data is not used in the estimation, it should provide a reliable indication of how well the model is likely to forecast on new data. In sum, we train the model, we test the model, and once we are satisfied with the model's performance on new data, we are ready to use it in real-life applications. If we ignore this split and use the whole database to estimate our models, we may succeed at explaining defaults in our database but fail to explain defaults for new loan applications.

Code
# It is convenient to set the loan status as factor.
dat$loan_st <- as.factor(dat$loan_st)
set.seed(567)
index_train <- cbind(runif(1 : nrow(dat), 0 , 1), c(1 : nrow(dat)))
index_train <- order(index_train[, 1])
index_train <- index_train[1: (2/3 * nrow(dat))]
# Create training set
train <- dat[index_train, ]
# Create test set
test <- dat[-index_train, ]
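The split above draws a uniform random number for each row, orders those numbers, and keeps the first two thirds of the shuffled indices. A perhaps more familiar idiom uses sample(); the sketch below is an alternative shown for reference only, and it uses different object names so it does not overwrite the train and test sets created above.

Code
# Alternative random two-thirds split using sample() (for reference only).
idx_alt   <- sample(nrow(dat), size = floor(2/3 * nrow(dat)))
train_alt <- dat[idx_alt, ]   # in-sample (estimation) data
test_alt  <- dat[-idx_alt, ]  # out-of-sample (evaluation) data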

We have 29,083 observations in dat. The code above randomly selects \(29083 \times (2/3)=19388\) rows to form the training set. The test set consists of the remaining \(29083-19388=9695\) rows. Random selection is highly recommended because dat may have some structure or sorting that could bias our model estimation and negatively impact our model test. For example, imagine that for some strange reason the database is sorted in such a way that the first observations are all no-defaults. If that were the case, the training and test sets would not contain similar proportions of default and no-default cases and we might distort the whole analysis. Random selection allows us to replicate a real situation in which our database is unsorted, with mixed characteristics.
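A quick sanity check of the resulting sizes (a minimal verification, not in the original code):

Code
# The training set should hold about two thirds of the rows.
c(train = nrow(train), test = nrow(test), total = nrow(dat))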

Let’s see if the recently created train and test sets have the same basic properties as dat.

Code
dat_prop <- table(dat$loan_st)/sum(table(dat$loan_st))
train_prop <- table(train$loan_st)/sum(table(train$loan_st))
test_prop <- table(test$loan_st)/sum(table(test$loan_st))
prop <- data.frame(rbind(dat_prop, train_prop, test_prop))
colnames(prop) <- c("no defaults", "defaults")
prop
           no defaults  defaults
dat_prop     0.8890417 0.1109583
train_prop   0.8882298 0.1117702
test_prop    0.8906653 0.1093347

Take a look at the training set.

Code
# See the data structure.
head(train)
      loan_st l_amnt   int grade emp_len     home income age sex region
21547       0   4000 10.25     B       6     RENT  26000  27   1      N
22625       0   5000 14.26     C       3     RENT 280000  36   1      N
20340       0  18000 18.30     F      10     RENT 121000  24   1      N
1911        0   5600  7.90     A       3     RENT  32000  22   0      W
4021        0   2700 14.27     C       5 MORTGAGE  88500  32   1      N
16887       0  15000 10.38     B      10 MORTGAGE  75000  23   0      W

Variables as factors are useful for model estimation and data visualization. Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.

Assume we think that the loan_st depends on the age of the individual. We can estimate a simple logistic model to learn about the relationship between age and loan status.

Code
# Fitting a simple logistic model.
logi_age <- glm(loan_st ~ age, family = "binomial", data = train)
logi_age

Call:  glm(formula = loan_st ~ age, family = "binomial", data = train)

Coefficients:
(Intercept)          age  
   -1.90097     -0.00623  

Degrees of Freedom: 19387 Total (i.e. Null);  19386 Residual
Null Deviance:      13580 
Residual Deviance: 13580    AIC: 13580

Apparently, there is a negative relationship between age and loan status. The AIC value (13,580) is useful when comparing models. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, the AIC is used to compare different candidate models and determine which one fits the data best. At the moment we cannot interpret the AIC simply because we only have one model, so there is nothing to compare it with.
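For a binary logistic regression, the reported AIC can be reproduced from the residual deviance plus a penalty of two per estimated parameter, since for ungrouped 0/1 data the residual deviance equals \(-2\) times the log-likelihood. A small check of this relationship:

Code
# AIC = -2*logLik + 2*k; for a binary logistic model the residual deviance
# equals -2*logLik, so the AIC is the deviance plus 2 per coefficient.
logi_age$deviance + 2 * length(coef(logi_age))
AIC(logi_age)  # should match the value above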

Let’s estimate another simple model where the interest rate is used as a predictor of loan_st. Remember that we are not conducting any prediction at this moment; we are only estimating models using the training set.

Code
# Build a glm model with variable interest rate as a predictor.
logi_int <- glm(formula = loan_st ~ int, family = "binomial", data = train)
# Print the parameter estimates.
logi_int

Call:  glm(formula = loan_st ~ int, family = "binomial", data = train)

Coefficients:
(Intercept)          int  
     -3.710        0.142  

Degrees of Freedom: 19387 Total (i.e. Null);  19386 Residual
Null Deviance:      13580 
Residual Deviance: 13210    AIC: 13220

The AIC is lower (13,220 versus 13,580), so we have a better model now.

Using a single predictor such as age or interest rate is clearly a limited approach. Let’s add some more predictors, and introduce the summary() function to extract more information about the estimation results. The logi_multi model below assumes that the loan status depends on the age, interest rate, grade, loan amount, and annual income.

Code
# Multiple variables in a logistic regression model.
logi_multi <- glm(loan_st ~ age + int + grade + log(l_amnt) + 
                  log(income) , family = "binomial", data = train)
# Obtain significance levels using summary().
summary(logi_multi)

Call:
glm(formula = loan_st ~ age + int + grade + log(l_amnt) + log(income), 
    family = "binomial", data = train)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.996240   0.477911   4.177 2.95e-05 ***
age         -0.002302   0.003825  -0.602   0.5473    
int          0.038767   0.017249   2.247   0.0246 *  
gradeB       0.503409   0.087435   5.758 8.54e-09 ***
gradeC       0.748229   0.117765   6.354 2.10e-10 ***
gradeD       0.964343   0.147283   6.548 5.85e-11 ***
gradeE       1.033442   0.190817   5.416 6.10e-08 ***
gradeF       1.619470   0.257900   6.279 3.40e-10 ***
gradeG       1.867494   0.440232   4.242 2.21e-05 ***
log(l_amnt)  0.015718   0.036341   0.433   0.6654    
log(income) -0.470748   0.046423 -10.140  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13579  on 19387  degrees of freedom
Residual deviance: 13028  on 19377  degrees of freedom
AIC: 13050

Number of Fisher Scoring iterations: 5

Our multi-factor model works well. In logi_multi, the AIC value is the lowest so far (13,050 versus 13,220), so this should be considered the best in-sample model at the moment. The summary() function shows the significance levels of the estimates, but we are currently more interested in the goodness of fit of the models because we want to make predictions about loan_st. That is, we are interested in using a model to find out whether new applicants in the test set are expected to default or not, rather than in the applicants’ credit risk factors themselves. This is why we concentrate on the AIC for now.

When a customer fills out a credit application form, we collect information but we do not know for sure whether he or she will eventually default. A credit risk model can help us with this task.

1.3 Prediction and model evaluation.

Let’s take our three models, logi_age, logi_int and logi_multi from the previous subsection, and carry out a simple prediction exercise. We start by identifying one observation in the test set and asking the models to predict its loan_st. That is, we take the first individual’s age, apply the logi_age model, and compare the predicted loan_st with what really happened. Remember that we know what really happened with this individual because we have that information in the test set. Each model is expected to produce a different loan_st prediction. If a model is good, the predicted loan_st will match what really happened.

Code
# Define one single observation in test_set.
John_Doe <- as.data.frame(test[1, ])
John_Doe
  loan_st l_amnt   int grade emp_len home income age sex region
1       0   5000 10.65     B      10 RENT  24000  33   0      E

We know in advance that the loan_st of this observation taken from the test set is 0. However, the models cannot know this simply because we did not use the test set to estimate the logistic models. Our models were estimated using the training set. A good credit risk model should predict a no default given this new applicant.

The values of loan_st in the test set are either 0 or 1. However, the logistic models estimate loan_st as values in the range of 0 to 1. This means that we would expect the estimated loan_st for this observation to be close to 0. But how close? We will deal with this issue later.
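To see where these values between 0 and 1 come from, recall that a logistic model passes the linear predictor through the logistic function, \(p = 1/(1+e^{-(\beta_0 + \beta_1\,\text{age})})\) in the case of logi_age. The sketch below reproduces that calculation by hand for John Doe; it should match the logi_age value obtained with predict() next.

Code
# Manual logistic prediction for John Doe with the age-only model.
eta <- coef(logi_age)[1] + coef(logi_age)[2] * John_Doe$age  # linear predictor
plogis(eta)  # logistic transformation, same as 1 / (1 + exp(-eta))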

Code
# Predict the loan status.
logi_age_pred <- predict(logi_age, newdata = John_Doe, type = "response")
logi_int_pred <- predict(logi_int, newdata = John_Doe, type = "response")
logi_multi_pred <- predict(logi_multi, newdata = John_Doe, type = "response")
# Collect all.
pred_John <- rbind("logi_age" = logi_age_pred, 
                     "logi_int" = logi_int_pred, 
                     "logi_multi" = logi_multi_pred)
# Prepare a table.
colnames(pred_John) <- "Loan status predictions for John Doe."
pred_John
           Loan status predictions for John Doe.
logi_age                              0.10846236
logi_int                              0.09996702
logi_multi                            0.14461840

These values are low as they are close to 0. We could interpret this as a certain ability of the models to predict this single case from the test set. However, several questions remain unanswered and require further analysis. For example: how can we determine whether the prediction is low enough to consider it a no-default? We need a cut-off value to decide. We will explore this issue later.

Another question is: what about the rest of the cases in the test set? We have 9,695 observations in the test set and in the example above we only tested the first one. We are interested in the entire test set, not only in John Doe. Fortunately, this issue is easy to address as we only need to change the newdata argument in the predict() function. In particular, instead of newdata = John_Doe, which is one observation, we can pass newdata = test, which is the entire 9,695-observation test set.

Code
# Predict the loan status with the three models.
pred_logi_age <- predict(logi_age, newdata = test, type = "response")
pred_logi_int <- predict(logi_int, newdata = test, type = "response")
pred_logi_multi <- predict(logi_multi, newdata = test, 
                           type = "response")

pred_range <- rbind("logi_age" = range(pred_logi_age), 
                     "logi_int" = range(pred_logi_int),
                     "logi_multi" = range(pred_logi_multi))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
           min(loan_st) max(loan_st)      AIC
logi_age     0.08417892    0.1165453 13580.65
logi_int     0.05019756    0.3982977 13215.61
logi_multi   0.01739107    0.4668004 13050.30

Now, instead of predicting for one single applicant we have produced predictions for all 9,695 applicants in the test set. The min(loan_st) column corresponds to the lowest predicted loan_st for each model and max(loan_st) to the highest. The logistic models produce values in the range of zero to one, and in this case the ranges are rather narrow.

Narrow ranges (a small difference between the highest and lowest predicted loan_st values) could be problematic because the model may not be able to discriminate between defaults (predictions closer to 1) and no-defaults (predictions closer to 0). The highest AIC corresponds to the worst in-sample model and the lowest AIC to the best. Here, we can see some in-sample and out-of-sample consistency because the best model according to the AIC is also the model with the widest prediction range.
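One simple way to quantify how spread out each model's predictions are (an illustrative check, not part of the original course code) is to compare their standard deviations:

Code
# Dispersion of the predicted loan_st values for each model.
sapply(list(logi_age   = pred_logi_age,
            logi_int   = pred_logi_int,
            logi_multi = pred_logi_multi), sd)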

Let’s explore all the predicted loan_st values for the logi_age:

Code
ggplot(data.frame(pred_logi_age), aes(x = pred_logi_age)) + 
  geom_density(fill = "red") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.4: Age model prediction histogram.

The logi_age model fails to produce predictions spanning the 0 to 1 range; in fact, its values are concentrated in a very small interval. As a consequence, this model fails to differentiate between default and no-default predictions.

Let’s visualize the predictions of the logi_int and logi_multi. We first collect all predictions in a single data frame just for convenience.

Code
pred_logi <- data.frame(cbind(pred_logi_age, pred_logi_int,
                                  pred_logi_multi))
pred_logi <- gather(pred_logi, key = "model",  value = "pred")

Now we plot the logi_int and logi_multi.

Code
ggplot(pred_logi[pred_logi$model != "pred_logi_age",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.5: Interest rate and multi models predictions histograms.

Let’s add the age model as well.

Code
ggplot(pred_logi, aes(x = pred, y = model, fill = model)) + 
  geom_boxplot() +
  labs(y = "Model", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())

Figure 1.6: Age, Interest rate and multi models predictions boxplot.

Presumably, a model which considers all available predictors could be better for predicting the loan_st.

Code
# Logistic regression model using all available predictors in the data set.
logi_full <- glm(loan_st ~ age + int + grade + log(l_amnt) + 
                   log(income) + emp_len + home + sex +
                   region, family = "binomial", data = train)
# Loan status predictions for all test set elements.
pred_logi_full <- predict(logi_full, newdata = test, type = "response")
# Look at the predictions range.
range(pred_logi_full)
[1] 1.422469e-09 8.544424e-01

Now the pred_logi_full prediction range is wider. A wider range means that some loan_st predictions are now much closer to 1. This is good because we need the model to be able to predict both no-defaults (0) and defaults (1). Let’s compare the predictions with those of the rest of the models.

Code
pred_range <- rbind("logi_age" = range(pred_logi_age), 
                     "logi_int" = range(pred_logi_int),
                     "logi_multi" = range(pred_logi_multi),
                     "logi_full" = range(pred_logi_full))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic, logi_full$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
           min(loan_st) max(loan_st)      AIC
logi_age   8.417892e-02    0.1165453 13580.65
logi_int   5.019756e-02    0.3982977 13215.61
logi_multi 1.739107e-02    0.4668004 13050.30
logi_full  1.422469e-09    0.8544424 10763.67
Code
pred_logi <- data.frame(cbind(pred_logi_age, pred_logi_int,
                                  pred_logi_multi, pred_logi_full))
pred_logi <- gather(pred_logi, key = "model",  value = "pred")

A graphical comparison of the new pred_logi_full and pred_logi_multi.

Code
ggplot(pred_logi[pred_logi$model != "pred_logi_age" &
                  pred_logi$model != "pred_logi_int" ,], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())

Figure 1.7: Multi and full models predictions histograms.

And all of them together.

Code
ggplot(pred_logi, aes(x = pred, y = model, fill = model)) + 
  geom_boxplot() +
  labs(y = "Model", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())

Figure 1.8: Age, Interest rate, multi and full models predictions boxplot.

The logi_full model predictions look better than those of the other models.

Another question is: how can we know whether these model predictions correspond to a default or a no-default? The loan status predictions go from 0 to 0.854 in the case of the logi_full model. In the end, these loan status estimates need to be interpreted or classified as a default or a no-default, because that classification is what we care about. Are they close enough to 0 to be considered a no-default? This issue is addressed by setting up a cut-off rate so we can split all estimated loan statuses into 0 or 1.

Let’s arbitrarily consider a cut-off of 0.15 for now. This means that every estimated loan status below 0.15 will be considered 0 (no-default), and every estimated loan status above 0.15 will be considered 1 (default). This classification rule can be used by a financial firm to decide whether to accept or reject new loan applications: reject applications with an estimated loan status above 0.15 and accept those with an estimated loan status below 0.15. The accept/reject decision now depends on a model estimate. However, this simple approach could be controversial, as managers may be interested in allocating more loans, not fewer. An alternative that meets both credit risk and managerial interests is to create several cut-off ranges instead of a single cut-off of 0.15. For example, the financial firm may accept loan applications with an estimated loan status between 0.15 and 0.25 at a higher interest rate to compensate for the additional credit risk. On the other hand, we may take a different view by incorporating a social criterion to help people who are naturally excluded by the financial industry given their economic condition. If that is the case, we may be interested in identifying applicants with a lower estimated loan status.
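As a sketch of the banded policy just described, cut() can split the predicted loan status into accept, accept-at-a-higher-rate, and reject groups. The 0.15 and 0.25 thresholds below are the illustrative values mentioned above, not a recommendation.

Code
# Illustrative three-band decision rule on the full-model predictions.
band <- cut(pred_logi_full,
            breaks = c(-Inf, 0.15, 0.25, Inf),
            labels = c("accept", "accept at higher rate", "reject"))
table(band)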

For now, consider that every estimated loan status below 0.15 will be considered as 0 (no-default), and every estimated loan status above 0.15 will be considered as 1 (default). Graphically it looks like this:

Code
ggplot(pred_logi[pred_logi$model == "pred_logi_full",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  geom_vline(xintercept = 0.15, linetype = "longdash") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())

Figure 1.9: Full models predictions histogram and cut-off.

Here, predicted loan_st values at the left of the dashed line represent no default predictions and values at the right of the dashed line represent default predictions. Let’s set up the rule to convert the estimated loan status into a binary (0 or 1) variable. See how this transformation takes place:

Code
# Make a binary predictions-vector using a cut-off of 15%
pred_cutoff_15 <- ifelse(pred_logi_full > 0.15, 1, 0)
head(cbind(pred_logi_full, "rounded" = round(pred_logi_full, 4), pred_cutoff_15))
   pred_logi_full rounded pred_cutoff_15
1    1.357995e-08  0.0000              0
2    2.986278e-08  0.0000              0
18   2.299645e-02  0.0230              0
26   2.403879e-01  0.2404              1
27   1.778986e-01  0.1779              1
28   2.600172e-02  0.0260              0

These are only the first 6 rows in the test set. We can see that the rule works as expected because every estimated loan status below 0.15 is now considered as 0 (no-default), and every estimated loan status above 0.15 is considered as 1 (default). The table above shows how we can create this binary variable given the logistic model prediction.

Note that the row numbers in the table above are 1, 2, 18, 26, 27 and 28. They are not 1, 2, 3, 4, 5 and 6 because the test rows were selected randomly out of dat, so the row numbers correspond to the original position in dat.

Is logi_full a good model after all? We can add a new column to the previous table. This new variable represents what really happened with the loan. Then, the first column is the logistic model prediction, the second column is the transformed binary variable given a cut-off of 0.15, and the third column is what actually happened (default or no-default). Let’s take a sample of 10 observations to make the comparison easier. Note that the model correctly predicts a no-default in most cases. Rows 308 and 329 predict a default incorrectly, and row 323 predicts a default correctly.

Code
# Let's take from rows 101 to 110.
(cbind(pred_logi_full, pred_cutoff_15, 
           as.data.frame.numeric(test$loan_st)))[101:110,]
    pred_logi_full pred_cutoff_15 test$loan_st
308   1.537106e-01              1            0
309   4.137654e-02              0            0
310   3.113926e-02              0            0
312   5.186962e-09              0            0
318   5.337092e-02              0            0
319   8.402239e-02              0            0
323   2.795535e-01              1            1
328   5.384112e-09              0            0
329   1.893745e-01              1            0
330   9.012136e-03              0            0

There is an easy way to evaluate the rest of the cases in the test set using a simple table called confusion matrix.

Code
# Construct a confusion matrix.
table(test$loan_st, pred_cutoff_15)
   pred_cutoff_15
       0    1
  0 6554 2081
  1  308  752

The logi_full model predicts 6,554 no-defaults correctly and 752 defaults correctly. However, it predicts 2,081 defaults that are in fact no-defaults and 308 no-defaults that are in fact defaults. How good are these results? Which of these four values is more important? These are questions we address later. For now, we can say that different models and different cut-off rates lead to different confusion matrices. Note that adding up all these values gives 9,695, which is the number of observations in the test set.
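Several summary measures can be derived from this confusion matrix. The sketch below (not part of the original course code) computes overall accuracy, sensitivity (the share of actual defaults predicted as default) and specificity (the share of actual no-defaults predicted as no-default) for the 0.15 cut-off.

Code
# Summary measures from the confusion matrix at the 0.15 cut-off.
cm <- table(test$loan_st, pred_cutoff_15)
accuracy    <- sum(diag(cm)) / sum(cm)         # share of correct predictions
sensitivity <- cm["1", "1"] / sum(cm["1", ])   # defaults correctly identified
specificity <- cm["0", "0"] / sum(cm["0", ])   # no-defaults correctly identified
data.frame(accuracy, sensitivity, specificity)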

You may want to see relative and not absolute values. Let’s try the CrossTable() function instead.

Code
CrossTable(test$loan_st, pred_cutoff_15, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | pred_cutoff_15 
test$loan_st |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      6554 |      2081 |      8635 | 
             |     0.759 |     0.241 |     0.891 | 
-------------|-----------|-----------|-----------|
           1 |       308 |       752 |      1060 | 
             |     0.291 |     0.709 |     0.109 | 
-------------|-----------|-----------|-----------|
Column Total |      6862 |      2833 |      9695 | 
-------------|-----------|-----------|-----------|

 

This table is more informative. The logi_full model correctly predicts no-default in 75.9% of the actual no-default cases; in other words, it fails to predict no-default in 24.1% of them. As for defaults, the model correctly predicts default in 70.9% of the actual default cases (not bad) and fails to predict default in 29.1% of them.

Instead of arbitrarily considering a cut-off of 0.15, we can follow a different approach. Suppose we are interested in treating the 20% highest pred_logi_full estimates (closer to 1) as defaults. Equivalently, this means treating the lowest 80% of the pred_logi_full estimates (closer to 0) as no-defaults. Let’s calculate the new cut-off that meets this criterion.

Code
# Cutoff definition.
cutoff <- quantile(pred_logi_full, 0.8)
cutoff
      80% 
0.1994621 

This new approach (taking the 20% highest estimates of pred_logi_full as default) represents a cut-off of 0.1994621. Graphically:

Code
ggplot(pred_logi[pred_logi$model == "pred_logi_full",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  geom_vline(xintercept = cutoff, linetype = "longdash") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())

Figure 1.10: Full model prediction histogram new cutoff of 0.1994621.

Now the cut-off is 0.1994621. This splits the loan status predictions into two parts: higher than the cut-off is a default, and lower than the cutoff is a no-default. Taking the lowest 80% estimates (closer to 0) as no-default is an arbitrary decision. Here are the cut-off values depending on this arbitrary decision.

Code
cutoff_all <- quantile(pred_logi_full, seq(0.1, 1, 0.1))
data.frame("cut_off" = round(cutoff_all,5))
     cut_off
10%  0.00000
20%  0.01469
30%  0.02410
40%  0.03751
50%  0.06184
60%  0.09617
70%  0.14673
80%  0.19946
90%  0.29227
100% 0.85444

We can show a summary table similar to the one for the 0.15 cut-off. Here, we show the predictive ability of the logi_full model with the new cut-off of 0.1994621.

Code
# Calculate the predictions with the same model and new cutoff.
pred_full_20 <- ifelse(pred_logi_full > cutoff, 1, 0)
# Show results in a confusion matrix.
CrossTable(test$loan_st, pred_full_20, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | pred_full_20 
test$loan_st |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      7309 |      1326 |      8635 | 
             |     0.846 |     0.154 |     0.891 | 
-------------|-----------|-----------|-----------|
           1 |       447 |       613 |      1060 | 
             |     0.422 |     0.578 |     0.109 | 
-------------|-----------|-----------|-----------|
Column Total |      7756 |      1939 |      9695 | 
-------------|-----------|-----------|-----------|

 

With a cut-off of 0.1994621 we accept 7,309 + 447 = 7,756 applications, since these are the ones for which the model predicts a no-default. We can compare both confusion matrices:

Code
cat <- c("correct no-default", "false default", 
         "false no-default", "correct default")
cut_15 <- c(0.759, 0.241, 0.291, 0.709)
cut_1994621 <- c(0.846, 0.154, 0.422, 0.578)
cbind("classification" = cat, cut_15, cut_1994621)
     classification       cut_15  cut_1994621
[1,] "correct no-default" "0.759" "0.846"    
[2,] "false default"      "0.241" "0.154"    
[3,] "false no-default"   "0.291" "0.422"    
[4,] "correct default"    "0.709" "0.578"    

The new cut-off of 0.1994621 improves the identification of no-defaults but worsens the identification of defaults. Put differently, the new cut-off produces fewer false defaults but more false no-defaults. Apparently, there is a trade-off here.

We can also look at the detail of the 0.1994621 cut-off by comparing two columns: the one on the left with the actual loan status, and the one on the right with the estimated loan status.

Code
# Comparative table in detail.
real_pred_20 <- cbind.data.frame(test$loan_st, pred_full_20,
                     "Did the model succeed?" = test$loan_st==pred_full_20)
# Show some values.
real_pred_20[131:140,]
    test$loan_st pred_full_20 Did the model succeed?
398            0            0                   TRUE
399            1            1                   TRUE
404            0            1                  FALSE
414            0            0                   TRUE
417            1            1                   TRUE
419            0            0                   TRUE
420            0            0                   TRUE
425            1            1                   TRUE
431            0            0                   TRUE
435            0            0                   TRUE

In this sample the model fails in 1 out of 10 individuals. Not bad at all. Let’s imagine we are a bank. We have a total of 9,695 loan applications on our desk (or computer). Assume we use the predictions from pred_full_20 to decide whether to accept each loan application. Our model-based acceptance rule is the following: if pred_full_20 = 0, the model estimates a no-default and we accept the loan application. According to the extract of the table above, we fortunately reject applications 399, 417 and 425, which were indeed defaults. However, we reject application 404 incorrectly because it did not default. In principle, using a model as the basis for a decision rule can lead to better results than guessing or a random approval rule.

Let’s count how many applications are accepted and rejected according to our rule.

Code
# Accepted.
accept_20 <- sum(pred_full_20 == 0)
# Rejected.
reject_20 <- sum(pred_full_20 == 1)
data.frame("total" = length(pred_full_20), accept_20, reject_20)
  total accept_20 reject_20
1  9695      7756      1939

Taking the 20% highest pred_logi_full estimates as defaults and the lowest 80% as no-defaults means that, by construction, we accept 7,756 loan applications (80% of the total) and reject 1,939 (20% of the total). The criterion determines the number of accepted applications, and the model determines which applications to accept or reject.

We can evaluate our loan accept/reject rule as we have the corresponding real values in loan_st. First, let’s illustrate the decision making process given the model estimates.

Code
# First 12 decisions (10 accepts and 2 rejects).
head(data.frame(real_pred_20[,1:2], 
                decision = ifelse(real_pred_20$pred_full_20 == 0, 
                                  "accept", "reject")), 12)
   test.loan_st pred_full_20 decision
1             0            0   accept
2             0            0   accept
18            1            0   accept
26            0            1   reject
27            0            0   accept
28            0            0   accept
30            0            0   accept
32            0            0   accept
34            0            1   reject
35            0            0   accept
36            0            0   accept
37            1            0   accept

We can add an evaluation column.

Code
# First 12 decisions with an evaluation column.
head(data.frame(real_pred_20[,1:2], 
                decision = ifelse(real_pred_20$pred_full_20 == 0, 
                                  "accept", "reject"),
                evaluation = ifelse(real_pred_20$pred_full_20 == 1, 
                                    "we rejected a good customer", 
 ifelse(real_pred_20$pred_full_20 == 0 & real_pred_20$`test$loan_st` == 0,
        "good decision", "bad decision"))), 12)
   test.loan_st pred_full_20 decision                  evaluation
1             0            0   accept               good decision
2             0            0   accept               good decision
18            1            0   accept                bad decision
26            0            1   reject we rejected a good customer
27            0            0   accept               good decision
28            0            0   accept               good decision
30            0            0   accept               good decision
32            0            0   accept               good decision
34            0            1   reject we rejected a good customer
35            0            0   accept               good decision
36            0            0   accept               good decision
37            1            0   accept                bad decision

In the table above we have the first 12 loan applications. According to our rule, we accept 10 applications and reject 2 (applications #26 and #34). A good decision means we accepted a loan that did not default. A bad decision means we accepted a loan application that defaulted. We also have some cases in which we rejected a good customer, which is not good either. The table is interesting although a bit problematic, as it incorporates a counterfactual approach: we are also evaluating the cases in which we reject the application. A more pragmatic approach is to evaluate our rule only on the cases in which we actually accept the loan application. This is basically a bad rate measure: how many accepted loan applications default?

Remember we accepted 10 and rejected 2 loans. We can identify which accepted loans defaulted.

Code
# We accept loans that the model predicts a no-default (0).
# In "accepted_loans" we know whether the accepted loans are in fact
# default or no-default.
accepted_loans <- real_pred_20[pred_full_20 == 0, 1]
# The code above says: if we accept the application, tell me what happened.
head(accepted_loans, 10)
 [1] 0 0 1 0 0 0 0 0 0 1
Levels: 0 1

Note that the third and the tenth accepted applications default; these are applications #18 and #37, as expected. Now let’s evaluate not 12 applications but all the accepted ones, which are 7,756. The bad rate is expressed as a proportion of the total accepted:

Code
# bad_rate is the proportion of accepted loans that are in fact default.
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
bad_rate
[1] 0.0576328

That is, by following the model-based rule we accepted 7,756 loan applications, which represent 80% of the total. However, 5.76% of those accepted applications were in fact defaults. In particular, we accepted 447 loans that defaulted, so \(447/7756=0.0576328\).
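The same number can be recovered directly from the predictions and the actual outcomes; a one-line verification sketch:

Code
# Bad rate: share of accepted applications (predicted no-default) that defaulted.
sum(test$loan_st == "1" & pred_full_20 == 0) / sum(pred_full_20 == 0)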

Models are not perfect, but we are always interested in finding a good model that leads to a lower bad rate because (in principle) we do not want to accept many defaults. If we keep the same model, we could reduce this 5.76% by being stricter with loan applications, which in simple terms means reducing the acceptance rate. This alternative could be controversial, as a lower acceptance rate represents lower income (fewer customers) for the bank or financial firm. In any case, suppose we reduce the acceptance rate from 80% to 65% so we can evaluate the impact on the bad rate.

Code
# New cutoff value.
cutoff <- quantile(pred_logi_full, 0.65)
# Split the pred_logi_full into a binary variable.
pred_full_35 <- ifelse(pred_logi_full > cutoff, 1, 0)
# A data frame with real and predicted loan status.
real_pred_35 <- cbind.data.frame(test$loan_st, pred_full_35)
# Loans that we accept given these new rules.
accepted_loans <- real_pred_35[pred_full_35 == 0, 1]
# Bad rate (accepted loan applications that are defaults).
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
# Show the bad rate.
bad_rate
[1] 0.03713107

As expected, the bad rate is lower (from 5.76% to 3.71%). That is, the lower the acceptance rate, the lower the bad rate. In the extreme, if we accept 0 loan applications our bad rate would be zero, but doing so is equivalent to going out of business. We can create a function that, given a vector of loan status predictions, returns the bad rate for different cut-off values. This could be useful for building the bank's strategy, since the function reveals the trade-off between the acceptance rate and the bad rate. In particular, the lower the acceptance rate, the lower the income (bad) and the lower the bad rate (good). So, which combination is optimal?

1.4 The bank strategy.

A bank could be interested to understand the relationship between the acceptance rate and the bad rate given a model that predicts the loan status.

Code
# Function.
bank <- function(prob_of_def){
  cutoff <- rep(NA, 21)
  bad_rate <- rep(NA, 21)
  accept_rate <- seq(1, 0, by = -0.05)
  for (i in 1:21){
    cutoff[i] <- quantile(prob_of_def, accept_rate[i])
    pred_i <- ifelse(prob_of_def > cutoff[i], 1, 0)
    pred_as_good <- test$loan_st[pred_i == 0]
    bad_rate[i] <- sum(pred_as_good == 1)/length(pred_as_good)}
  table <- cbind(accept_rate, cutoff = round(cutoff, 4), 
                 bad_rate = round(bad_rate, 4))
  return(list(table = table, bad_rate = bad_rate, 
              accept_rate = accept_rate, cutoff = cutoff))
  }

We can evaluate this function for logi_full and for a bad model like logi_age. In principle, we expect the logi_full model to have a more attractive relationship between the acceptance rate and the bad rate, that is, lower bad rates for a given acceptance rate. Any financial institution would be interested in increasing the acceptance rate without increasing the bad rate too much.

Let’s apply the function to the pred_logi_full and the logi_age.

Code
# Apply the bank function.
bank_logi_full <- bank(pred_logi_full)
bank_logi_age <- bank(pred_logi_age)

data.frame(accept_rate = bank_logi_age$accept_rate,
           "Bad_model_bad_rate" = bank_logi_age$bad_rate, 
           "Good_model_bad_rate)" = bank_logi_full$bad_rate)
   accept_rate Bad_model_bad_rate Good_model_bad_rate.
1         1.00         0.10933471          0.109334709
2         0.95         0.10849715          0.090662324
3         0.90         0.10849715          0.076790831
4         0.85         0.10849715          0.066626214
5         0.80         0.10547397          0.057632800
6         0.75         0.10547397          0.050612020
7         0.70         0.10396251          0.043619216
8         0.65         0.10396251          0.037131070
9         0.60         0.10092961          0.029396596
10        0.55         0.10092961          0.023443361
11        0.50         0.09933775          0.021039604
12        0.45         0.10206865          0.018565207
13        0.40         0.10206865          0.015471893
14        0.35         0.09907039          0.011788977
15        0.30         0.09972041          0.009281540
16        0.25         0.10343554          0.005363036
17        0.20         0.09722922          0.003610108
18        0.15         0.09360190          0.001374570
19        0.10         0.10125362          0.000000000
20        0.05         0.10516605          0.000000000
21        0.00         0.00000000          0.000000000

The full model is superior because at any acceptance rate we can reach a lower bad rate. A plot can reveal the main differences of these two models: logi_full and logi_age.

Code
# Plot the strategy functions
par(mfrow = c(1, 2))
plot(bank_logi_full$accept_rate, bank_logi_full$bad_rate, 
     type = "l", xlab = "Acceptance rate", ylab = "Bad rate", 
     lwd = 2, main = "logi_full")
abline(v = bank_logi_full[["accept_rate"]][8], lty = 2)
abline(h = bank_logi_full[["bad_rate"]][8], lty = 2)
abline(v = bank_logi_full[["accept_rate"]][5], lty = 2, col = "red")
abline(h = bank_logi_full[["bad_rate"]][5], lty = 2, col = "red")
plot(bank_logi_age$accept_rate, bank_logi_age$bad_rate,
     type = "l", xlab = "Acceptance rate", 
     ylab = "Bad rate", lwd = 2, main = "logi_age")
abline(v = bank_logi_age[["accept_rate"]][8], lty = 2)
abline(h = bank_logi_age[["bad_rate"]][8], lty = 2)
abline(v = bank_logi_age[["accept_rate"]][5], lty = 2, col = "red")
abline(h = bank_logi_age[["bad_rate"]][5], lty = 2, col = "red")

Figure 1.11: A good and a bad model.

The logi_full model is better because for any acceptance rate we can reach a lower bad rate compared with the logi_age. This is because the logi_full model can identify defaults and no defaults with higher precision compared with the logi_age. The value of a good model is that it can help us to make better business decisions, in this case better credit evaluation decisions.

The model's ability to predict defaults and no-defaults can be measured by the AUC. The AUC can be defined as the probability that the fitted model will score a randomly drawn positive sample (a default) higher than a randomly drawn negative sample (a no-default). AUC stands for area under the curve in the following context:

Code
ROC_logi_full <- roc(test$loan_st, pred_logi_full)
ROC_logi_age <- roc(test$loan_st, pred_logi_age)
# Draw the ROCs on one plot
plot(ROC_logi_full, col = "red")
lines(ROC_logi_age, col = "blue")

Figure 1.12: Full model in red, age model in blue.

Sensitivity is the model's ability to correctly identify defaults (true positives). Specificity is the model's ability to correctly identify no-default loans (true negatives).
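Before comparing the exact AUC values, the probabilistic interpretation given above can be checked with a small simulation (a rough sketch that ignores ties): draw many random default/no-default pairs from the test set and record how often the default receives the higher predicted loan status. The result should be close to the auc() value computed next for the full model.

Code
# Monte Carlo check of the AUC interpretation for the full model.
set.seed(1)
def_pred   <- pred_logi_full[test$loan_st == "1"]  # predictions for defaults
nodef_pred <- pred_logi_full[test$loan_st == "0"]  # predictions for no-defaults
mean(sample(def_pred, 10000, replace = TRUE) >
     sample(nodef_pred, 10000, replace = TRUE))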

As expected, the area under the curve (AUC) is higher for the red line which corresponds to the logi_full model. We can calculate the exact values:

Code
# Compute the AUCs:
auc(ROC_logi_full)
Area under the curve: 0.8213
Code
auc(ROC_logi_age)
Area under the curve: 0.5301

Note that the logi_age has an AUC of 0.5301. This is close to a loan approval process in which we randomly accept and reject with no further analysis. In other words, the logi_age is so bad that it is almost equivalent to using no model at all and accepting or rejecting loan applications based on a random rule. A purely random approval rule would look like this:

Code
set.seed(2020)
pred_rand_model <- runif(length(pred_logi_age))
ROC_rand <- roc(test$loan_st, pred_rand_model)
# Draw the ROCs on one plot
plot(ROC_rand, col = "orange")
auc(ROC_rand)
Area under the curve: 0.5105

Figure 1.13: Random rule model.

In theory, this random evaluation process would lead to an AUC of \(0.5 = (1 \times 1)/2\). In contrast, now imagine we have a perfect model:

Code
pred_perfect_model <- as.numeric(test$loan_st)
ROC_perfect <- roc(test$loan_st, pred_perfect_model)
plot(ROC_perfect)
auc(ROC_perfect)
Area under the curve: 1

Figure 1.14: Perfect model.

In a perfect model, the AUC is equal to 1 (\(1 \times 1\)). The model correctly identifies defaults (100% sensitivity) and at the same time correctly identifies no-defaults (100% specificity).

Edward Malthouse explains these concepts quite well:

Code
embed_url("https://youtu.be/HljSmQhLs8M")
Code
library(knitr)
library(plotly)
library(Sim.DiffProc)
library(scatterplot3d)
library(vembedr)

2 The Merton model.

The Merton model is useful when we are interested in evaluating the credit risk of public firms. In short, it evaluates how likely it is that the future value of the firm's assets falls below a certain threshold represented by the firm's debt. In doing so, the model estimates a probability of default, among other results. The model can also be used as a tool to propose changes (for example, in the balance sheet) to reduce a firm's credit risk exposure.
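In its standard formulation, equity at maturity \(T\) behaves like a European call option on the firm's assets with the face value of debt as the strike: \(E_T = \max(V_T - D, 0)\). Default therefore occurs at maturity when \(V_T < D\), and the probability of default is the probability of that event under the model's assumptions.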

The philosophy of these models goes back to Black and Scholes (1973), Merton (1973) and Merton (1974), who consider corporate liabilities as contingent claims on the assets of the firm.

2.1 Minimize a function in R.

The estimation of the Merton credit risk model requires minimizing a function. In this section, we show a simple minimization example. Here are some useful online resources for further reference.

The objective function to be minimized is \(f(a,b)=\sum(a + bx_i-y_i)^2\). We have six values for \(x\) and \(y\), and we are expected to minimize the function \(f\) by finding the parameter values \(a\) and \(b\). The \(x_i\) and \(y_i\) data looks like this:

Code
# Data.
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(1, 3, 5, 6, 8, 12))
kable(dat, caption = "Observations.", row.names = TRUE) # The table.
Observations.
x y
1 1 1
2 2 3
3 3 5
4 4 6
5 5 8
6 6 12

Graphically:

Code
# A scatter plot.
plot(y ~ x, data = dat, pch = 19, cex = 2)

Figure 2.1: Original data.

Let’s implement the \(f(a,b)=\sum(a + bx_i-y_i)^2\) minimization.

Code
# Function that calculates the residual sum of squares.
min.RSS <- function(data, par) {
  with(data, sum((par[1] + par[2] * x - y)^2)) }
# Function minimization.
result <- optim(par = c(0, 1), min.RSS, data = dat)
a <- result$par[1]
b <- result$par[2]
f <- sum((a + b * dat$x - dat$y)^2)
# Minimization output.
ans <- data.frame(a, b, f)
kable(ans, caption = "Minimization results f(a,b).")
Minimization results f(a,b).
a b f
-1.266846 2.02862 2.819048

The minimization result is represented by a line, and the line is characterized by an intercept equal to \(-1.266846\) and a slope equal to \(2.02862\).

We can plot it and verify that this is indeed the right answer.

Code
# View the results.
plot(y ~ x, data = dat, pch = 19, ylim = c(-2, 12), xlim = c(0, 6), cex = 2)
abline(a, b, col = "red", lwd = 3)
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)

Figure 2.2: Line that minimize the residual sum of squares.

The function that we minimize here is basically the OLS criterion. This is why the regression model leads to the same results.

Code
# The previous example is equivalent as a linear regression model.
reg <- lm(y ~ x, data = dat)
# See the results.
summary(reg)

Call:
lm(formula = y ~ x, data = dat)

Residuals:
      1       2       3       4       5       6 
 0.2381  0.2095  0.1810 -0.8476 -0.8762  1.0952 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.2667     0.7815  -1.621 0.180388    
x             2.0286     0.2007  10.109 0.000539 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8395 on 4 degrees of freedom
Multiple R-squared:  0.9623,    Adjusted R-squared:  0.9529 
F-statistic: 102.2 on 1 and 4 DF,  p-value: 0.000539

The function that we minimize in the estimation of the Merton credit risk model is more elaborate, but we follow the same principle as above.

2.2 The estimation.

For an introductory reference to understand the basics of credit see Brealey et al. (2020).

This is a nice introduction to the Merton model for credit risk assessment:

Code
embed_url("https://youtu.be/TjVvatkjlsM")

Let’s follow example 24.3 in Hull (2015). We have five known parameters, \(E_0=3\), \(\sigma_E=0.8\), \(r_f=0.05\), \(T=1\) and \(D=10\), which lead to estimates of the value of the assets today, \(V_0\), and the volatility of the assets, \(\sigma_V\). With these seven parameters we can estimate the probability of default \(pd\), among other interesting results.

These are the known parameters.

Code
# List of known parameters.
E0 <- 3 # Value of the equity as of today, this is the market capitalization.
se <- 0.8 # Stock return volatility.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
D <- 10 # Value of the debt. The Bloomberg function DDIS is useful.

We need equations 24.3 and 24.4 to estimate \(V_0\) and \(\sigma_V\).
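
In the notation used below, equation 24.3 values the equity as a call option on the assets and equation 24.4 links the equity volatility to the asset volatility:

\(E_0=V_0N(d_1)-De^{-rf\,T}N(d_2)\)

\(\sigma_E E_0=N(d_1)\sigma_V V_0\)

where \(d_1=\frac{ln(V_0/D)+(rf+\sigma_V^2/2)T}{\sigma_V\sqrt{T}}\) and \(d_2=d_1-\sigma_V\sqrt{T}\).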

Code
eq24.3 <- function(V0, sv) { 
  ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
  D * exp(-rf * TT) * pnorm(((log(V0 / D) +
  (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) { 
  ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
      sv * V0 - se * E0)) }
# Footnote 10 indicates that we should minimize F(x,y)^2+G(x,y)^2
min.footnote10 <- function(x) 
  (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
# The minimization leads to the values of V0 and sv.
V0_sv <- optim(c(1, 1), min.footnote10)
# Define the values as parameters.
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
kable(data.frame("Market value" = V0, 
                 "Volatility" = sv), 
      caption = "Assets.")
Assets.
Market.value Volatility
12.39578 0.2123041

Calculate the probability of default \(N(-d_2)\) as a function of \(V_0\) and \(\sigma_V\).

Code
PD <- function(V0, sv) { 
  pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) / 
             (sv * sqrt(TT))) - sv * sqrt(TT))) }
# Calculate the probability of default given the previous parameters.
pd <- PD(V0, sv)
pd
[1] 0.1269396

The probability of default is 12.69396%.

Let’s analyze the role of the values of \(d_1\), \(d_2\), \(N(d_1)\) and \(N(d_2)\). First, we isolate these values.

Code
# Inspect the role of d and N(d).
d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
d2 <- d1 - sv * sqrt(TT)
Nd1 <- pnorm(d1)
Nd2 <- pnorm(d2)
ans <- data.frame(d1, Nd1, d2, Nd2)
ans
        d1       Nd1       d2       Nd2
1 1.353282 0.9120172 1.140978 0.8730604

Now, let’s illustrate how \(N(-d_2)\) leads to the probability of default given the properties of a standard normal distribution.

Code
xseq <- seq(-4, 4, 0.01) 
densities <- dnorm(xseq, 0, 1) # Standard normal distribution.
# Graphical evaluation of N(-d_2).
plot(xseq, densities, xlab = "d values (-d_2 is the dotted line)", 
     ylab = "Density", type = "l", lwd = 2, cex = 2)
abline(h = 0)
polygon(c(xseq[xseq >= -d2], -d2), c(densities[xseq >= -d2], 0), col = "green")
polygon(c(xseq[xseq < -d2], -d2), c(densities[xseq < -d2], 0), col = "red")
legend("topleft", legend = c("N(d_2)=0.8730604 
in green and
N(-d_2)=0.1269396 in red.

N(d_2)+N(-d_2)=1."),
bg = "white", bty = "n", cex = 0.9)
abline(v = -d2, lty = 2, lwd = 3)

Figure 2.3: The red area represents the probability of default.

Here you can find an alternative explanation of \(N(-d_2)\):

Code
embed_url("https://youtu.be/lV0ytJYbVzc")

Now, let’s analyze the minimization that leads to \(V_0\) and \(\sigma_V\). We do not have analytic (closed-form) expressions to calculate \(V_0\) and \(\sigma_V\); instead, we have approximate values or estimates given the minimization of equations 24.3 and 24.4 in (Hull 2015). In order to see how good this approximation is, we can retrieve the value of \([F(x,y)]^2+[G(x,y)]^2\) evaluated at \(x=V_0\) and \(y=\sigma_V\). This value is non-negative because both \(F(x,y)\) and \(G(x,y)\) are squared, so we expect a number close to zero when evaluating at \(x=V_0\) and \(y=\sigma_V\).

Code
# Evaluate how good is the minimization.
min.footnote10(x = c(V0, sv))
[1] 1.425797e-07
Code
round(min.footnote10(x = c(V0, sv)), 6)
[1] 0

The minimization conducted in the function min.footnote10() above worked fairly well, as the value is zero for practical purposes.

There is another way to assess how well the minimization works: a graphical analysis. In particular, we can verify whether \(V_0=12.39578\) and \(\sigma_V=0.2123041\) minimize the objective function. To illustrate this on two axes, we need to fix either \(V_0\) or \(\sigma_V\), and evaluate \([F(x,y)]^2+[G(x,y)]^2\) at different values of the other. Let’s do that.

Code
# Fix sv and evaluate different values of V0.
V0.seq <- seq(from = 4, to = 20, length.out = 100)
sv.rep <- rep(sv, 100)
# Function to be evaluated.
FG <- function(V0, sv) { (eq24.3(V0, sv))^2 + (eq24.4(V0, sv))^2 }
# Apply the function with fixed sv and different V0 values.
FG.V0 <- mapply(FG, V0.seq, sv.rep)

We are ready to plot.

Code
# Plot the results.
plot(V0.seq, FG.V0, type = "l", xlab = expression(paste(V[0])),
     ylab = expression(paste(F(x,y)^2+G(x,y)^2)), cex.lab = 0.8, lwd = 3)
abline(v = V0, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(V0, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.4: Here we fix the volatility of the assets and change the value of the assets at time zero. The minimization looks fine.

Clearly, the minimization worked well here. For the sake of completeness, now we fix the \(V_0\) value and evaluate the function again.

Code
# Now fix V0, and evaluate different sv values.
sv.seq <- seq(0, 0.6, length.out = 100)
V0.rep <- rep(V0, 100)
# Evaluate the function with these parameters.
FG.sv <- mapply(FG, V0.rep, sv.seq)
# Plot the results.
plot(sv.seq, FG.sv, type = "l", xlab = expression(paste(sigma[V])), 
     ylab = expression(paste(F(x, y)^2 + G(x, y)^2)), 
     cex.lab = 0.8, lwd = 3)
abline(v = sv, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(sv, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.5: Here we fix the value of the assets at time zero and change the volatility of the assets. The minimization looks fine.

We can show the same as before but in a 3D plot.

Code
# outer evaluates the function for each combination of the vectors.
z <- outer(V0.seq, sv.seq, FG) # z is now 100x100 matrix.
persp(V0.seq, sv.seq, z, col = "springgreen", shade = 0, xlab = "V0",
      ylab = "sv", zlab = "Objective function.")

Figure 2.6: Minimization of the function.

Contour plots or level plots are a way to show a three-dimensional surface on a two-dimensional plane.

Code
# The nlevels is high to emphasize the minimum value in blue.
contour(V0.seq, sv.seq, z, xlab = expression(paste(V[0])), 
        ylab = expression(paste(sigma[V])), 
        lwd = 2, nlevels = 300)
points(V0, sv, pch = 19, col = "blue", cex = 2)

Figure 2.7: Minimization of the function: a contour view.

The green surface plot above is nice, but it is not interactive. We can produce interactive three-dimensional plots with other R packages like plotly and show them in an HTML context. The following plot cannot be seen in a PDF output, but it can be explored interactively in a browser.

Code
plot_ly(type = "surface" , x = sv.seq, y = V0.seq , z = z ) %>%
layout(#title = "Minimization of f at V0=12.395 an sv=0.212",
       scene = list(xaxis = list(title = "Assets volatility"), 
                    yaxis = list(title = "Assets", 
                    zaxis = list(title = "f=F^2+G^2"))))  %>%
hide_colorbar()
Figure 2.8: Minimization of the function: an interactive view.
Code
#add_markers(y = 12.3957765, x = 0.2123041, z = 0, inherit = TRUE)

The Merton model allows us to calculate more values in this credit risk assessment.

Code
# Other results in the Merton's model.
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
  pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
ans <- data.frame(market_value_debt, pv_promised_payment_debt, 
                  Expected_Loss, recovery_rate, pd)
kable(t(ans), caption = "Other results in the Merton's model.")
Other results in the Merton’s model.
market_value_debt 9.3957765
pv_promised_payment_debt 9.5122942
Expected_Loss 0.0122492
recovery_rate 0.9035039
pd 0.1269396

Could we figure out the implied bond yield spread? It would be similar to the fair interest rate that this firm has to pay given its probability of default and recovery rate. Take Hull’s equation 24.2 as a reference: \(\lambda(T)=S(T)/(1-R)\) and solve for \(S(T)\): \(S(T)=\lambda(T)\times(1-R)\), so \(S(T)=(0.12693963)\times(1-0.90350393)\). This is 122.4918 basis points more than the risk-free rate. Then, with five parameters we can calculate (among other things) the firm’s probability of default and the corresponding yield spread of this firm’s bond. It would be interesting to compare this value with the market yield spread and evaluate whether the market yield is over- or underestimated according to our theoretical value.
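
We can verify this arithmetic with the objects already computed above (a minimal sketch; pd and recovery_rate come from the previous code chunk).

Code
# Spread as lambda(T) x (1 - R), in decimal form and in basis points.
pd * (1 - recovery_rate)
10000 * pd * (1 - recovery_rate)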

This is the spread function.

Code
spread <- function(E0, se, rf, TT, D) { 
  eq24.3 <- function(V0, sv) { 
    ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - 
        D * exp(-rf * TT) * pnorm(((log(V0 / D) +
        (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
  eq24.4 <- function(V0, sv) { 
    ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
        sv * V0 - se * E0)) }
  min.footnote10 <- function(x) 
    (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
  # Estimate V0 and sv.
  V0_sv <- optim(c(1, 1), min.footnote10)
  V0 <- V0_sv$par[1]
  sv <- V0_sv$par[2]
  # Probability of default N(-d2).
  pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
                   (sv * sqrt(TT))) - sv * sqrt(TT)))
  # Expected loss, recovery rate and spread.
  market_value_debt <- V0 - E0
  pv_promised_payment_debt <- D * exp(-rf * TT)
  Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
    pv_promised_payment_debt
  recovery_rate <- 1 - (Expected_Loss / pd)
  spread <- pd * (1 - recovery_rate)
  spread
}

Let’s evaluate the spread function.

Code
spread(E0 = 3, se = 0.8, rf = 0.05, TT = 1, D = 10)
[1] 0.01224917

2.3 Inside view of Merton’s model.

The model assumes that the assets follow a geometric Brownian motion stochastic process. A regular Wiener stochastic process looks like this.

Code
embed_url("https://youtu.be/X-VRWeqG_I8")

Let’s go back to our main example and simulate the daily evolution of the market value of the firm’s assets over a year. Here are 10 simulated paths of \(V_t\) from \(t=0,...,T\). The reference for this section is (Hull 2015), Chapters 14 and 15.

Code
# Merton assumes V follows a geometric Brownian motion process.
V0.sim <- function(i) {  # the argument i is not used here.
  GBM(N = 365, T = 1, t0 = 0, x0 = V0, theta = 0.05, sigma = sv) }
set.seed(3) # Reproducibility  
paths <- sapply(1:10, V0.sim) # Create 10 paths for V.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
points(rep(366, 8), paths[366, paths[366, ] > 10], 
       pch = 1, cex = 1.5, col = "green")
points(rep(366, 2), paths[366, paths[366, ] < 10], 
       pch = 1, cex = 1.5, col = "red")

Figure 2.9: Simulation of 10 paths of the assets.

These simulations can be interpreted as the evolution of the value of the firm’s assets in 10 parallel universes. In the past, some students have called this figure la gráfica de los pelitos (the little-hairs chart); this is fine as long as you understand the figure’s implications. The right name is 10 simulated geometric Brownian motion processes. Let’s count the cases in which \(V_t<D\); note that here I use \(t\) and not \(T\), so we are counting the cases in which, at least at some time \(t\), the value of the assets was lower than 10.

Code
# Which path went lower than D?
which(colSums(paths < 10) > 0)
[1] 2 4 9

The value of the assets was lower than $10 at some time in the second, fourth and ninth alternate universes. Let’s plot these cases.

Code
# There might be an easier way to select these cases.
matplot(paths[,which(colSums(paths < 10) > 0)], type = "l", 
        col = c("green", "blue", "red"), lwd = 2, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")

Figure 2.10: Note that the blue path finally did not default. Recovery is possible, as in real life.

The blue path was lower than 10 at some time, but at \(t=T\) it is clearly higher than 10. For this reason, the blue path does not represent a default. Only in two cases, green and red, are the assets not high enough to pay the debt at maturity. Thus, according to this simulation the probability of default is 20%. Here is how we can count the cases in which \(V_T<D\).

Code
sum(paths[366,] < 10) / 10
[1] 0.2

The probability of default of 20% may seem disappointing because it is significantly higher than 12.693963%. This happens because the number of simulations is small. The following examples show how these values tend to converge as we increase the number of simulations from 10 to 100 and then to 100,000.
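
As a quick preview of this convergence (a minimal sketch that draws terminal values \(V_T\) directly from their log-normal distribution, introduced below, instead of full daily paths), we can tabulate the simulated probability of default for an increasing number of draws.

Code
# Simulated probability of default for an increasing number of draws.
set.seed(13)
n.draws <- c(10, 100, 1000, 10000, 100000)
pd.conv <- sapply(n.draws, function(n) {
  VT <- exp(rnorm(n, log(V0) + (rf - sv^2 / 2) * TT, sv * sqrt(TT)))
  sum(VT < D) / n })
data.frame(draws = n.draws, pd.sim = pd.conv)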

Here are 100 simulated paths of \(V_t\) from \(t=0\) to \(T\).

Code
set.seed(3) # Reproducibility  
paths <- sapply(1:100, V0.sim) # Create 100 paths.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")

Figure 2.11: Simulation of 100 paths of the assets.

Given the simulation above, the probability of default is 17%. This is closer to 12.693963%.

Code
sum(paths[366,] < 10) / 100
[1] 0.17

Let’s see all the 17 cases that end in default.

Code
# These are the 17 default cases.
matplot(paths[,which(paths[366,] < 10)], type = "l", col = "black", 
        lwd = 1, lty = 1, ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")

Figure 2.12: Default cases.

Now, let’s see the 11 cases in which \(V_t\) went below $10 at some point but the firm did not default in the end.

Code
# V_t < 10
almost_default <- which(colSums(paths < 10) > 0)
# V_T < 10
default <- which((paths[366,] < 10))
# There should be an easier way to do this:
matplot(paths[,almost_default[which(almost_default %in% 
                                      default == FALSE)]], 
        type = "l", col = c(1:11), lwd = 1, lty = 1, 
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")

Figure 2.13: All these paths went lower than 10 at some point, but in the end the value of the assets is higher than 10. These are parallel universes in which the firm finally survived after some drama.

Here are 100,000 simulated values of \(V_T\) shown in a histogram. The reference for this section is Section 15.1 in (Hull 2015). Note that instead of drawing the complete paths, we can directly draw from the distribution of \(V_T\). The resulting distribution is log-normal.

Code
set.seed(13)
# 100,000 values of VT at once.
VT <- exp(rnorm(100000, log(V0) + (rf - (sv^2) / 2)*TT, sv*sqrt(TT)))
# Plot results.
h <- hist(VT, 100, plot = FALSE)
ccat <- cut(h$breaks, c(-Inf, D, Inf))
# You may also need to change the xlim if a big red square appears as changes in maturity may imply larger values of VT. I recommend trying without xlim and see.
plot(h, main = NULL, col = c("red", "blue")[ccat], 
     xlab = "Value of the assets at maturity", xlim = c(4, 30))
legend("topright", 
legend = c("The red area is 12.657%, and represents
the simulated probability of default.
The probability of default of the model
following the textbook example is 
N(-d_2)=12.69396%. As we can see,
both are very close."),
col = c("red", "blue"), pch = c(19), bg = "white", bty = "n", cex = 0.8)

Figure 2.14: Histogram of 100,000 values of the assets at maturity.

The probability of default can be calculated as:

Code
sum(VT < 10) / 100000
[1] 0.12657

Note that 0.12657 is now much closer to the theoretical value of 0.12693963. Convergence achieved.

This firm’s equity can be represented as a European call option payoff. The payoff of a typical stock call option is \(c=max(S_T-K, 0)\), or equivalently in terms of Merton’s model: \(E_T=max(V_T-D, 0)\).

Code
ET <- pmax(VT - D, 0) # payoff function of a typical call option.
plot(sort(VT), sort(ET), type = "l", lwd = 4, xlim = c(5, 20), 
     ylim = c(0, 10), xlab = "Simulated assets at maturity (V_T)",
     ylab = "Simulated equity at maturity (E_T)")
abline(v = 10, lty = 2)
legend("left", legend = c("According to the model,
E_T=0 happens 
in 12.657% of the 
100,000 cases."),
bg = "white", bty = "n", cex = 0.9)

Figure 2.15: A view of the firm’s balance sheet in the future. Looks like a typical European call option payoff.

This diagram shows the firm’s market values in the future, at maturity, because \(E_T\) is a function of \(V_T\) and \(D\). As long as \(V_T<D\), the value of \(E_T=0\). Otherwise, \(E_T= V_T-D\), just as the accounting equation suggests but in future terms. Note that Merton’s model says nothing about the estimated value of \(E_T\); instead, we have an estimate of how likely it is that \(V_T<D\), or in other words how likely it is that \(E_T=0\).
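
We can confirm the proportion quoted in the legend directly from the simulated payoffs (a minimal sketch; ET and VT come from the chunk above).

Code
# Proportion of simulations in which the equity is wiped out at maturity.
mean(ET == 0)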

Then, we know that \(E_T= max(V_T-D, 0)\). The Black-Scholes-Merton formula can also estimate the theoretical value of \(E_0\). That is, if the observable market value given by the market capitalization available on any financial site is \(E_0=3\), we can retrieve the theoretical value of \(E_0\) and assess whether the firm’s observable market value is over- or undervalued. In other words, the Black-Scholes-Merton model can give us an estimate of the firm value. In particular, the model gives the value of the equity today as:

\(E_0=V_0N(d_1)-De^{-rT}N(d_2)\)

This is equation 24.3 in (Hull 2015). To estimate \(E_0\), we need \(V_0\) and \(\sigma_V\), and these two values were already estimated with the implementation of the min.footnote10 function above. It is true that this minimization requires \(E_0\); the point here is to use the observable market capitalization \(E_0=3\) to find the unobservable \(V_0\) and \(\sigma_V\) and then estimate the theoretical value of \(E_0\).

The code is:

Code
# Black-Scholes call.
E0.theo <- function(V0, D, rf, TT, sv) {
  d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
  d2 <- d1 - sv * sqrt(TT)
  c <- V0 * pnorm(d1) - D * exp(-rf * TT) * pnorm(d2)
}
V0 <- 12.39578 # Assets.
D <- 10 # Debt.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
sv <- 0.2123041 # Volatility of assets.
(E0.theo(V0, D, rf, TT, sv))
[1] 3.000357

Then, we can argue that the market value of the equity is slightly undervalued, since the theoretical value is 3.000357 compared with the observed 3. This approach opens the possibility of valuing firms. It also opens the possibility of comparing the book value of total assets in the balance sheet versus the estimate \(V_0\). This will allow us to understand the difference between book value and market value of the assets, not only of the equity. We can even implement a similar analysis for the market value of the debt, just as we did before.

2.4 The probability of default as a function of some parameters.

We can expand our analysis by evaluating how changes in the parameters change the probability of default in the context of the Merton model. This is interesting because we can move from estimating a probability of default to proposing changes in the firm’s parameters, like the capital structure, to reduce the firm’s probability of default. First, we need the probability of default as a function.

Code
pd <- function(E0, se, rf, TT, D) { 
  eq24.3 <- function(V0, sv) { 
    ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
        D * exp(-rf * TT) * pnorm(((log(V0 / D) +
        (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
  eq24.4 <- function(V0, sv) { 
    ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
        sv * V0 - se * E0)) }
  min.footnote10 <- function(x) 
    (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
  # Estimate V0 and sv.
  V0_sv <- optim(c(1, 1), min.footnote10)
  V0 <- V0_sv$par[1]
  sv <- V0_sv$par[2]
  # Probability of default N(-d2).
  pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
                   (sv * sqrt(TT))) - sv * sqrt(TT)))
  pd }

Now we have a convenient function: \(pd = f(E_0, \sigma_E, rf, T, D)\). Remember these five parameters are required to estimate \(V_0\) and \(\sigma_V\) before calculating \(pd\).

Code
# Evaluate the case of the textbook example.
pd1 <- pd(3, 0.8, 0.05, 1, 10)
pd1
[1] 0.1269396

Now, we create vectors of parameters and evaluate the pd function.

Code
l = 50 # Vectors of length 50.
# Create vectors of the parameters.
x.E <- seq(from = 1, to = 20, length.out = l)
x.rf <- seq(0, 4, length.out = l)
x.D <- seq(1, 20, length.out = l)
x.T <- seq(0.5, 20, length.out = l)
x.se <- seq(0, 3, length.out = l)
# Evaluate the pd at different values.
pd.E <- mapply(pd, x.E, se, rf, TT, D)
pd.rf <- mapply(pd, E0, se, x.rf, TT, D)
pd.D <- mapply(pd, E0, se, rf, TT, x.D)
pd.T <- mapply(pd, E0, se, rf, x.T, D)
pd.se <- mapply(pd, E0, x.se, rf, TT, D)

Let’s plot the probability of default as a function of equity. What we are doing here is evaluating the probability of default function 50 times, as \(E_0\) takes 50 values from 1 to 20. By doing this, we end up with 50 values of the probability of default.

Code
plot(x.E, pd.E, type = "l", ylab = "Probability of default", 
     xlab = "Equity at time zero", lwd = 3, ylim = c(0, 0.15))
lines(x.E, pd.E)
abline(v = E0, lty = 2)
abline(h = pd1, lty = 2)
points(E0, pd1, pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.16: The probability of default as a function of the equity at time zero.

These relationships between the parameters and the probability of default are all non-linear. This is interesting because, according to the model, the current levels of the parameters matter when deciding which parameter might have a high or low impact on the probability of default.
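
One way to make the role of the current parameter levels concrete is a local sensitivity check via finite differences around the textbook baseline (a minimal sketch; the step size h = 0.01 is an arbitrary choice).

Code
# Approximate sensitivities of the probability of default at the baseline.
h <- 0.01
base <- pd(E0, se, rf, TT, D)
sens <- c(E0 = (pd(E0 + h, se, rf, TT, D) - base) / h,
          se = (pd(E0, se + h, rf, TT, D) - base) / h,
          rf = (pd(E0, se, rf + h, TT, D) - base) / h,
          TT = (pd(E0, se, rf, TT + h, D) - base) / h,
          D  = (pd(E0, se, rf, TT, D + h) - base) / h)
round(sens, 4)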

Code
plot(x.rf, pd.rf, type = "l", ylab = "Probability of default", 
     xlab = "Risk-free rate", lwd = 3)
abline(h = 0, lty = 2)
abline(v = rf, lty = 2)
abline(h = pd1, lty = 2)
points(rf, pd1, pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.17: The probability of default as a function of the risk-free rate.

Probability of default as a function of debt. Note that adding or subtracting 1 unit of debt has a different impact on the probability of default.

Code
plot(x.D, pd.D, type = "l", ylab = "Probability of default", 
     xlab = "Debt", lwd = 3, ylim = c(0, 0.15))
abline(h = 0, lty = 2)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
points(D, pd1, pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.18: The probability of default as a function of the debt.

Probability of default as a function of maturity. Note that asking for a longer debt maturity (as in a negotiation) does not lead to a lower probability of default if we keep the rest of the firm’s parameters unchanged. Imagine we arrange a meeting with the bank manager and we ask for more time to pay our debt. The bank manager should not (in principle) extend the debt maturity unless we commit to making some changes in the firm. For example, we can negotiate a longer maturity and promise to increase the firm’s equity.

Code
plot(x.T, pd.T, type = "l", ylab = "Probability of default", 
     xlab = "Maturity", lwd = 3)
abline(h = 0, lty = 2)
abline(v = TT, lty = 2)
abline(h = pd1, lty = 2)
points(TT, pd1, pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.19: The probability of default as a function of the debt’s maturity.

Probability of default as a function of the volatility of the equity \(\sigma_E\).

Code
plot(x.se, pd.se, type = "l", ylab = "Probability of default",
     xlab = "Standard deviation of equity", lwd = 3)
abline(h = 0, lty = 2)
abline(v = se, lty = 2)
abline(h = pd1, lty = 2)
points(se, pd1, pch = 1, cex = 3, col = "red", lwd = 2)

Figure 2.20: The probability of default as a function of the volatility of the equity.

In the previous plots we evaluated the probability of default function by changing one of the following parameters: \(E_0\), \(\sigma_E\), \(rf\), \(T\) or \(D\). We were able to do that because we constructed a vector of 50 different values for each of these five parameters. Note that \(V_0\) and \(\sigma_V\) do not remain fixed because they are in turn a function of \(E_0\), \(\sigma_E\), \(rf\), \(T\) and \(D\). Now, let’s start by plotting the probability of default as a function of \(E_0\) and \(\sigma_E\).

One alternative is to evaluate the probability of default function 50 times by taking the 50 pairs of values of \(E_0\) and \(\sigma_E\). By doing that, we end up with three vectors of size 50 that we can plot as a three-dimensional scatter plot.

Code
ED <- mapply(pd, x.E, x.se, rf, TT, D)
p.ED <- scatterplot3d(x.E, x.se, ED, pch = 16, type = "h", color = 1:50, 
                      angle = 100, xlab = "Equity at time zero", 
                      ylab = "Volatility of equity", 
                      zlab = "Probability of default")
p.ED$points3d(E0, se, pd1, type = "h", col = "red", pch = 20, cex = 3)  

Figure 2.21: The probability of default as a function of the equity at time zero and the volatility of equity.

The red point represents the case of the original textbook example, in which \(pd = f(E_0=3, \sigma_E=0.8,rf=0.05,T=1,D=10)=0.1269396\). Note that the red point is not part of the 50 plotted observations simply because there is no case in which the probability of default function is evaluated at \(E_0=3, \sigma_E=0.8\). Although the plot above is not incorrect, it might be incomplete as we are not showing all possible values of the probability of default. An alternative, more complete plot evaluates the probability of default function at all possible combinations of \(E_0\) and \(\sigma_E\). If we do that, we will have 50 values for \(E_0\) and \(\sigma_E\) that represent the \(x\) and \(y\) axes, and a \(50 \times 50\) matrix containing the probability of default. This allows us to plot a surface plot and a contour plot.

Code
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
  for(j in 1 : l){
    p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
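
Answering the question in the comment above: a slightly shorter alternative (a sketch, not necessarily faster) is to vectorize pd() and let outer() build the matrix, as we did earlier with the FG function.

Code
# Same matrix via Vectorize() and outer(); both objects should agree.
pd.vec <- Vectorize(function(E, sE) pd(E, sE, rf, TT, D))
all.equal(p_E_se, outer(x.E, x.se, pd.vec))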

Let’s plot the results.

Code
# Plot results.
p.ED1 <- persp(x.E, x.se, p_E_se, zlab = "pd", 
               xlab = "Equity at time zero", 
               ylab = "Volatility of equity", theta = 60, phi = 10, 
               expand =  0.5, col = "orange", shade = 0.2, 
               ticktype = "detailed") 
# Add the original pd value as in the textbook example.
points(trans3d(E0, se, pd1, p.ED1), cex = 2, pch = 19, col = "blue")

Figure 2.22: The probability of default as a function of the equity at time zero and the volatility of equity: A plane or surface view.

Now the plot becomes a surface. A contour plot simplifies the reading of a three-dimensional plot by reducing it to two dimensions: \(x\) and \(y\) coordinates. The value of the probability of default is represented by the contour lines.

Code
contour(x.E, x.se, p_E_se, xlab = "Equity at time zero", 
        ylab = "Volatility of equity", lwd = 2)
points(E0, se, pch = 19, col = "blue", cex = 2)

Figure 2.23: The probability of default as a function of the equity at time zero and the volatility of equity: A contour view.

The blue point is the original case. Note that the value is slightly higher than 0.1, which is consistent with \(pd=0.1269396\).

Code
l = 40 # Vectors of length 40.
# Create vectors of the parameters.
x.E <- seq(from = 2, to = 4, length.out = l)
x.se <- seq(0.2, 1.5, length.out = l)
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
  for(j in 1 : l){
    p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
Code
plot_ly(type = "surface" , x = x.se, y = x.E , z = p_E_se ) %>%
layout(#title = "Minimization of f at V0=12.395 an sv=0.212",
       scene = list(xaxis = list(title = "Volatility of equity"), 
                    yaxis = list(title = "Market value of equity"), 
                    zaxis = list(title = "Probability of default")))  %>%
  hide_colorbar()
Figure 2.24: The probability of default as a function of the equity at time zero and the volatility of equity: An interactive view.
Code
#add_markers(y = 3, x = 0.8, z = 0.1269396, inherit = TRUE)

2.5 GoT: capital structure.

The problem. Daenerys Targaryen owns a big manufacturing firm that produces fire extinguishers. Surprisingly, the firm’s data corresponds exactly to the data in example 24.3 (Hull 2015). She is planning to ask the Iron Bank of Braavos for a loan in the peaceful, civilized way (without dragons). However, she would like to reduce the probability of default of the firm first; in that way she might negotiate better credit conditions. Daenerys has some understanding of very basic finance, so she knows that she could either reduce the debt or increase the capital in order to reduce the probability of default. Doing both at the same time is clearly more expensive, so she is not very keen on it.

What is the best strategy to reduce the probability of default? Reduce the debt by 2 or increase the capital by 2?

Code
D.seq <- seq(from = 0, to = 13, by = 0.1)
# Evaluate pd function: E0 changes from 1 to 5; debt goes from 0 to 13.
E1 <- mapply(pd, 1, se, rf, TT, D.seq)
E2 <- mapply(pd, 2, se, rf, TT, D.seq)
E3 <- mapply(pd, 3, se, rf, TT, D.seq)
E4 <- mapply(pd, 4, se, rf, TT, D.seq)
E5 <- mapply(pd, 5, se, rf, TT, D.seq)

Plot the results.

Code
# We use these vectors for colors and legends in the following plots.
colors <- c("green", "purple", "black", "blue", "red")
leg <- c("E0=1", "E0=2", "E0=3 (initial value)", "E0=4", "E0=5")
# Plot.
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=10",
     ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(D, pd1, pch = 19, cex = 2)

Figure 2.25: GoT problem. Probability of default and capital structure.

Let’s illustrate the alternatives as clearly as possible.

Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=10",
     ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd1, lty = 2)
abline(h = pd(E0, se, rf, TT, 8), lty = 2)
abline(h = pd(5, se, rf, TT, D), lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(D, pd1, pch = 19, cex = 2)
points(D, pd(5, se, rf, TT, D), pch = 19, cex = 2, col = "red")
points(8, pd(3, se, rf, TT, 8), pch = 19, cex = 2, col = "yellow")

Figure 2.26: GoT solution. Probability of default and capital structure.
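
To read the figure numerically, we can evaluate the two candidate strategies directly with the pd() function (a minimal sketch; the column labels are ours).

Code
# Initial situation versus the two strategies.
data.frame(initial = pd(3, se, rf, TT, 10),
           reduce_debt_by_2 = pd(3, se, rf, TT, 8),
           increase_equity_by_2 = pd(5, se, rf, TT, 10))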

Consider a different initial situation. Now, \(E_0=2\) and \(D=2\) and the rest remains the same. In this case, the probability of default is now 7.13%. What is the best strategy to reduce the probability of default? Reduce the debt by 1 or increase the capital by 1?

Code
leg2 <- c("E0=1", "E0=2 (initial value)", "E0=3", "E0=4", "E0=5")

plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=2",
     ylab = "Probability of default. Initially PD=7.13%", 
     xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors, 
       bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")

Figure 2.27: GoT problem 2. Probability of default and capital structure.

The solution can be illustrated as follows:

Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=2",
     ylab = "Probability of default. Initially PD=7.13%", 
     xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(v = 1, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
abline(h = pd(2, se, rf, TT, 1), lty = 2)
abline(h = pd(E0, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors, 
       bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
points(1, pd(2, se, rf, TT, 1), pch = 19, cex = 2, col = "yellow")
points(2, pd(E0, se, rf, TT, 2), pch = 19, cex = 2)

Figure 2.28: GoT solution 2. Probability of default and capital structure.

Interesting.

Consider a third problem. In a parallel universe, Daenerys Targaryen’s firm has some short-term liquidity troubles. Now the debt is \(D=6\) with a maturity of \(T=1.5\) years, so the initial probability of default is 20.33%. She needs either more cash now or some more time to pay her current debt. She would like to know which alternative leads to the lowest increase in the probability of default: add $2 more to her current debt, or ask for half a year more to pay her current debt?

Code
T1 <- mapply(pd, E0, se, rf, 0.5, D.seq)
T2 <- mapply(pd, E0, se, rf, 1, D.seq)
T3 <- mapply(pd, E0, se, rf, 1.5, D.seq)
T4 <- mapply(pd, E0, se, rf, 2, D.seq)
T5 <- mapply(pd, E0, se, rf, 2.5, D.seq)
Code
leg3 <- c("T=2.5", "T=2", "T=1.5", "T=1", "T=0.5")

plot(D.seq, T5, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=6",
     ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)

Figure 2.29: GoT problem 3. More debt or longer maturity?

Let’s visualize the result.

Code
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=6",
     ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 2, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 8), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
points(8, pd(E0, se, rf, 1.5, 8), pch = 19, cex = 2, col = "yellow")
points(6, pd(E0, se, rf, 2, 6), pch = 19, cex = 2, col = "purple")

Figure 2.30: GoT answer 3. More debt or longer maturity?
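
Again, the numbers behind the figure can be computed directly (a minimal sketch; the column labels are ours).

Code
# Initial situation versus the two alternatives.
data.frame(initial = pd(E0, se, rf, 1.5, 6),
           add_2_more_debt = pd(E0, se, rf, 1.5, 8),
           half_year_more = pd(E0, se, rf, 2, 6))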

Nice.

Code
# Copula models
library(MASS)
library(knitr)
library(viridis)
library(dplyr)
library(mnormt) # dmnorm()
library(ggplot2)
library(plotly)
library(rayshader)
library(vembedr)

3 The Gaussian copula model.

Copulas allow us to decompose a joint probability distribution into its marginals (which by themselves carry no correlation information) and a function which couples (hence the name) them together, and thus allows us to specify the correlation separately. The copula is that coupling function. Here, we apply the simplest type of copula to a very common problem in credit risk: the time to default.

3.1 The basics.

In a finance context, the variable \(x\) represents a firm’s future (unknown) performance measure that goes from \(-4\) to \(4\) on the horizontal axis. Strictly speaking, more extreme values of \(x\) like \(-\infty\) and \(+\infty\) are theoretically possible, but they happen quite infrequently, at least in real-life situations. At the moment, we do not care too much about what kind of performance measure this is; it could be liquidity, solvency, or any other measure that is normalized to take values between \(-4\) and \(4\). In a statistics context, we can think of \(x\) as equivalent to the so-called \(z\)-score of the standard normal distribution function.

Code
# The standard normal distribution function is: y = f(x).
x <- seq(-4, 4, 0.001) # First define x.
y <- 1 / sqrt(2 * pi) * exp(-x ^ 2 / 2) # Define y as a function of x.
# Now plot.
plot(x, y, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)

Figure 3.1: Standard normal distribution function.

The copula model, including the Gaussian one, considers that the future firm performance measure \(x\) on the horizontal axis is related to the firm’s probability of default (from 0 to 1, or 0% to 100%), represented by the value of the cumulative distribution function pnorm() of the normal distribution.

Assume the firm will default in six months if the firm performance \(x\) in six months is lower than a given threshold: \(\bar{x}=-2.5\). The threshold \(\bar{x}=-2.5\) in the Gaussian copula model is equivalent to \(-d_2\) in the Merton model. Then, the firm’s six-month probability of default in math notation is \(N(-2.5)\), or pnorm(-2.5)=0.006209665 in R code, which is very low as it is close to 0. It is the red area below, which is 0.62% of the 100% possible area under the curve.

Code
# Useful to construct next plots.
D <- dnorm(x) 
P <- pnorm(x)
Code
plot(x, D, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = -2.5, lty = 2)
polygon(c(x[x < -2.5], -2.5), c(D[x < -2.5], 0), col = "red")

Figure 3.2: Six months probability of default.

Now assume the firm will default in one year if the firm performance \(x\) in one year is lower than a higher threshold: \(\bar{x}=-2\). Then, the one-year probability of default is \(N(-2)\), or pnorm(-2)=0.02275013, which is still low but higher than the six-month probability of default. Note that the new red area is 2.27%.

Code
plot(x, D, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = -2, lty = 2)
polygon(c(x[x < -2], -2), c(D[x < -2], 0), col = "red")

Figure 3.3: One year probability of default.

At the moment, we have not discussed how to estimate the threshold values \(\bar{x}\). However, we can say that the Gaussian copula model assumes that the probability of default increases as time passes, as if the firm will eventually default in the future. The six-month probability of default of 0.62% is low because it is not very likely that the future firm performance lies below -2.5 in six months. The one-year probability of default of 2.27% is a bit higher because it is more likely that the future firm performance lies below -2 in one year.

Then, the pnorm() R function allows us to transform threshold values \(\bar{x}\) into probabilities of default. Transforming probabilities into \(\bar{x}\) is also possible, as the function qnorm() is the inverse of pnorm().

From \(\bar{x}\) to probabilities:

Code
pnorm(-2.5)
[1] 0.006209665
Code
pnorm(-2)
[1] 0.02275013

From probabilities to \(\bar{x}\):

Code
qnorm(0.006209665)
[1] -2.5
Code
qnorm(0.02275013)
[1] -2

Nice.

The function dnorm() is relevant when we implement a graphical approach because it represents how frequent (or how likely) these values are given the standard normal distribution, so at both extreme values of \(x\) the value of dnorm() is low. Here, extreme values of \(x\) can be represented by \(-4\) and \(4\). This function delivers the height of the standard normal distribution: dnorm(4)=0.0001338302 and dnorm(-4)=0.0001338302. Given that the standard normal function is symmetric, we have that in general dnorm(-x)=dnorm(x). The letter \(d\) in dnorm() stands for density, and the density is maximal at \(x=0\). When plotting the density values we get the bell-shaped normal curve.

See how these pnorm(), qnorm(), dnorm() R functions work and relate:

Code
# Here, x has 11 values only.
x.summary <- c(-Inf, -4:4, Inf) # vector of x values to evaluate functions.
ans <- data.frame(x.summary, dnorm(x.summary), 
                  pnorm(x.summary), qnorm(pnorm(x.summary)))
colnames(ans) <- c("x", "dnorm(x)=height", "pnorm(x)=pd", "qnorm(pd)=x")
kable(ans, caption = "Review of standard normal distribution functions.", 
      digits = 5)
Review of standard normal distribution functions.
x dnorm(x)=height pnorm(x)=pd qnorm(pd)=x
-Inf 0.00000 0.00000 -Inf
-4 0.00013 0.00003 -4
-3 0.00443 0.00135 -3
-2 0.05399 0.02275 -2
-1 0.24197 0.15866 -1
0 0.39894 0.50000 0
1 0.24197 0.84134 1
2 0.05399 0.97725 2
3 0.00443 0.99865 3
4 0.00013 0.99997 4
Inf 0.00000 1.00000 Inf

You can type, for example, ?dnorm in the RStudio console to see more details about this function (and others).

Note that the only distribution function that we are currently analyzing is the standard normal (Gaussian) distribution function, the one that has mean 0 and variance 1. There are other copula models that assume other kinds of distributions. These other kinds of copulas are useful when we are interested in modeling cases in which extreme values are more likely to happen than under the standard normal distribution. Understanding Gaussian copulas allows you to understand other, more elaborate copulas.
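
For instance (a minimal sketch; the Student-t distribution with 3 degrees of freedom is only an illustrative heavy-tailed choice, not part of Hull’s example), an extreme outcome such as \(x \le -4\) is far more likely under a heavy-tailed distribution than under the standard normal.

Code
# Probability of an extreme outcome under two distributions.
c(gaussian = pnorm(-4), student_t_3df = pt(-4, df = 3))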

Instead of a density function, we can plot the cumulative probability distribution. Now, we do not need the \(N(\cdot)\) function to compute the probability as the vertical axis already represents the probability of default.

Code
plot(x, P, type = "l", lwd = 3,
     ylab = "Cumulative probability function: pnorm(x)", xlab = "x")
abline(h = 0.5, lty = 2)
abline(v = 0, lty = 2)

Figure 3.4: The higher the x, the higher the probability of default.

Both in the same plot.

Code
plot(x, P, type = "l", lwd = 3,
     ylab = "pnorm(x) and dnorm(x)", xlab = "x")
lines(x, D, type = "l", lwd = 3, col = "blue")
abline(h = 0.5, lty = 2)
abline(v = 0, lty = 2)

Figure 3.5: Density and cumulative probability functions.

3.2 Introduction to example 24.6.

Here, we start analyzing Hull (2015) example 24.6. We do not analyze the 10 firms yet as in the original example. Instead, we first make sense of the data, the relevant analysis, the logic, and the model basics before dealing with the full features of example 24.6 in Hull (2015).

Recall that pnorm() leads to a probability, whereas qnorm() leads to a value of \(x\). According to Hull (2015), the cumulative probabilities of default of 1%, 3%, 6%, 10% and 15% are taken as given for the maturities of 1, 2, 3, 4 and 5 years respectively. To implement the model, we take this information to derive the corresponding threshold values \(\bar{x}\):

Code
pd <- c(0.01, 0.03, 0.06, 0.1, 0.15) # Probabilities of default per year.
x.y <- qnorm(pd) # Transform probabilities of default into x values.
ans <- data.frame(1:5, pd, x.y) # Gather the results.
colnames(ans) <- c("year", "pd", "x (values given in Hull)")
kable(ans, caption = "Main parameters.")
Main parameters.
year pd x (values given in Hull)
1 0.01 -2.326348
2 0.03 -1.880794
3 0.06 -1.554774
4 0.10 -1.281552
5 0.15 -1.036433

Note that the example assumes that the probability of default increases as we consider a longer maturity (from 1 year to 5 years). Probabilities above are cumulative, so they go from year 0 to year 1, from year 0 to year 2 and so on. To calculate the probability of default during a specific year we need to calculate the differences. In particular:

Code
ans <- data.frame(diff(pd))
colnames(ans) <- c("PD")
rownames(ans) <- c("x.y1 to x.y2", "x.y2 to x.y3", 
                   "x.y3 to x.y4", "x.y4 to x.y5")
kable(ans, caption = "PD at specific years.")
PD at specific years.
PD
x.y1 to x.y2 0.02
x.y2 to x.y3 0.03
x.y3 to x.y4 0.04
x.y4 to x.y5 0.05

This is how we can illustrate the case of a probability of default of 15% in 5 years. The green area represents 15% of the whole area below the bell-shaped curve.

Code
plot(x, D, type = "l", col = 'black', lwd = 3, ylim = c(0, 0.4), xlab = "x",
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = x.y[5], lty = 2)
polygon(c(x[x < x.y[5]], x.y[5]), 
        c(D[x < x.y[5]], 0), col = "green")

legend("topright", legend=c(
"pd(5years)=15%

The left green area
represents 15% of the
whole bell-shape.

N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 0.8, bty = "n", col = "green")

Figure 3.6: Gaussian density distribution function.

A complementary view is the cumulative probability function. Let’s illustrate the same case: a probability of default of 15% in 5 years. In this case, we do not need to calculate the area since the y-axis already represents the probability.

Code
plot(x, P, type = "l", col = 'black', lwd = 3, ylim = c(0, 1),
     xlab = "x",
     ylab = "Cumulative probability function: pnorm(x)")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2), 
      col = "green", lwd = 3)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2), 
      col = "green", lwd = 3)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("right", legend=c(
"pd(5years)=15%

N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 1, bty = "n", col = "green")

Figure 3.7: Gaussian probability distribution function.

A closer view of the figure above showing the 3-year and 5-year cases:

Code
plot(x, P, type = "l", col = 'black', lwd = 5, ylim = c(0, 0.22), 
     xlim = c(-4, 0), ylab = "Cumulative probability function: pnorm(x)",
     xlab = "x")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[3], length.out = 2), rep(pnorm(x.y[3]), 2), 
      col = "purple", lwd = 3, lty = 2)
lines(rep(x.y[3], 2), seq(0, pnorm(x.y[3]), length.out = 2), 
      col = "purple", lwd = 3, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2), 
      col = "green", lwd = 3, lty = 2)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2), 
      col = "green", lwd = 3, lty = 2)
points(x.y[3], 0.06, pch = 19, col = "purple", cex = 2)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("topleft", legend=c("pd(5years)=15%: N(-1.036433)=0.15",
                           "pd(3years)=6%: N(-1.554774)=0.06"),
pch = 19, col = c("green", "purple"), bg = "white", cex = 1, bty = "n")

Figure 3.8: Gaussian probability distribution function: Zoom version.

Nice.

3.3 One firm.

Now let’s introduce a simulation approach. That is, instead of generating continuous values of \(x\) from \(-4\) to \(4\), we simulate many \(x\) values (10,000 in this case) that follow a standard normal distribution using the rnorm() function. The simulation approach is especially useful when we are interested in replicating what happens in real-life situations, because we can artificially replicate the observed behavior many times and understand what may happen in the future. In other words, x was used before to characterize a perfect normal distribution. Now, we incorporate x.sim, which behaves as a normal distribution and allows some randomness just as in real-life situations.

See the difference between both approaches to generate \(x\).

Code
N <- 10000 # Number of simulated values.
set.seed(130575) # Reproducibility.
x.sim <- rnorm(N, 0, 1) # Simulation.
kable(head(cbind(x.sim, x)))
x.sim x
-1.7939058 -4.000
-0.7515449 -3.999
-0.3569900 -3.998
0.9629299 -3.997
0.3680735 -3.996
1.0714250 -3.995

Note that the probabilities of default per maturity in the simulated approach are close to the values of the previous section. In particular, \(0.01\) corresponds to \(0.0102\), and \(0.03\) corresponds to \(0.0307\). They do not match exactly simply because we are comparing theoretical versus simulated probabilities.

Code
# Function to calculate proportions that we understand as probabilities.
prop <- function(x) {
  ans <- length(x.sim[x.sim <= x]) / N
  }
pd.sim <- mapply(prop, x.y) # Apply the function.
ans <- data.frame(x.y, pd, pd.sim)
colnames(ans) <- c("x", "pd.theo", "pd.sim")
rownames(ans) <- c("y1", "y2", "y3", "y4", "y5")
kable(ans, caption = "Theoretic versus simulated probabilities of default.")
Theoretic versus simulated probabilities of default.
x pd.theo pd.sim
y1 -2.326348 0.01 0.0102
y2 -1.880794 0.03 0.0307
y3 -1.554774 0.06 0.0629
y4 -1.281552 0.10 0.1008
y5 -1.036433 0.15 0.1480

Let’s view the results of the simulated approach. First in a histogram.

Code
# Some parameters we need to plot.
L <- c(-4, 4) # axis limits.
colors2 <- c("blue", "red", "purple", "pink", "green")
legend2 = c("pd(1year)=1.02%: x<=-2.326348", 
"pd(2years)=3.07%: x<=-1.880794", "pd(3years)=6.29%: x<=-1.554774",
"pd(4years)=10.08%: x<=-1.281552", "pd(5years)=14.8%: x<=-1.036433")
# The histogram.
hist(x.sim, 500, xlim = L, ylim = c(0, 100), main = NULL, xlab = "x.sim")
abline(h = 0, lty = 2)
abline(v = x.y[1], lwd = 3, col = "blue")
abline(v = x.y[2], lwd = 3, col = "red")
abline(v = x.y[3], lwd = 3, col = "purple")
abline(v = x.y[4], lwd = 3, col = "pink")
abline(v = x.y[5], lwd = 3, col = "green")
legend("topright", legend = legend2, bg = "transparent", 
       text.col = colors2, cex = 0.9)

Figure 3.9: Simulated Gaussian distribution. Somewhat different from the theoretical one.

The area to the left of each colored line represents the cumulative probability of default, just as we explained before. In the same way, the area between two colored lines represents the probability of default during a specific period of time.
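
As a quick check (a minimal sketch using the objects computed above), the simulated probability of default during each specific year can be obtained as differences of the cumulative simulated probabilities, just as we did with the theoretical ones.

Code
# Simulated versus theoretical probability of default at specific years.
ans <- data.frame(diff(pd.sim), diff(pd))
colnames(ans) <- c("PD simulated", "PD theoretical")
rownames(ans) <- c("x.y1 to x.y2", "x.y2 to x.y3", 
                   "x.y3 to x.y4", "x.y4 to x.y5")
kable(ans, caption = "PD at specific years: simulated versus theoretical.")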

Now let’s see all the simulated data at once.

Code
plot(x.sim, ylab = "x.sim", pch = ".", ylim = c(-4, 4),
     xlab = "Number of simulaltions")
abline(h = x.y[1], lwd = 2, col = "blue")
abline(h = x.y[2], lwd = 2, col = "red")
abline(h = x.y[3], lwd = 2, col = "purple")
abline(h = x.y[4], lwd = 2, col = "pink")
abline(h = x.y[5], lwd = 2, col = "green")
abline(v = 0, lty = 2)
abline(v = 10000, lty = 2)
# legend("topright", legend = legend2, bg = "white", 
#        text.col = colors2, cex = 0.9)

Figure 3.10: An alternative view.

We normally conduct a simulation approach because we might adapt the distribution function parameters to match what we see in real-life situations. This is how we can reproduce what may happen in the future and at the same time allow for some randomness.

In a different context, here we have an old video in which we drop 2,000 blue balls. Balls are dropped randomly given a normal distribution. Note how the bell-shaped distribution forms as we increase the number of balls.

Code
embed_url("https://youtu.be/P1Ky6-a8DE0")

3.4 Two firms.

Consider the case in which we have two firms instead of one. The main difference now is that instead of simulating one firm we need two. Moreover, each firm follows a standard normal distribution function and both of them are correlated by a given correlation value so the firms are not independent. If they are not independent, then what happens to one firm has some impact on what happens to the other.

In a different context, here we have an old video in which we drop many blue balls. Balls are dropped randomly given a multivariate normal distribution. Note how the 3-D bell-shaped distribution forms as we increase the number of balls.

Code
embed_url("https://youtu.be/5iv5Zksf_zo")

The case of two firms is not the one presented in Hull’s (2015) example, but it can help us visualize how the Gaussian copula approach works in a two-dimensional plot where the default correlation is 0.2.

We store the simulation of both firms’ performance measures in x2.

Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0.2 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
x2 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
x2 <- data.frame(x2)
colnames(x2) <- c("f1", "f2")

The object x2 has 10,000 rows, one per simulation, and one column per firm. In other words, we have 10,000 observations of the performance measure or \(z\)-score for two firms that are related. This object is big, but we can visualize the head (the first six observations).

Code
kable(head(x2), caption = "Firms' performance x2.", row.names = TRUE)
Firms’ performance x2.
f1 f2
1 0.0880039 -2.8671108
2 0.0092849 -1.1735732
3 0.0323809 -0.5854275
4 1.5547886 -0.0630241
5 0.9679767 -0.3977597
6 0.7751329 0.8847116

Remember the cumulative probabilities of default are 1%, 3%, 6%, 10% and 15% for the maturities of 1, 2, 3, 4 and 5 years respectively. How do we extract those cases in which both firms will default in 5 years? In rows 15, 53, 61 and so on both firms default at the same time. Note that in all cases the \(x\) values are indeed below -1.036433.

Code
# These names are going to be useful later.
n.year <- c("year 1", "year 2", "year 3", "year 4", "year 5")
n.pd <- c("pd.y1", "pd.y2", "pd.y3", "pd.y4", "pd.y5")
n.f <- c("f1", "f2", "f1 default?", "f2 default?")

# Function to calculate cases in which firms default and probabilities.
fun.X <- function(x) {
  both <- x2[x2$f2 < x & x2$f1 < x, ] # both default.
  any <- x2[x2$f2 < x | x2$f1 < x, ] # at least one firm default.
  onlyf1 <- x2[x2$f2 > x & x2$f1 < x, ] # only firm 1 default.
  onlyf2 <- x2[x2$f1 > x & x2$f2 < x, ] # only firm 2 default.
  one <- x2[(x2$f2 < x & x2$f1 > x | # only one firm default.
                   x2$f2 > x & x2$f1 < x),]
  none <- x2[x2$f2 > x & x2$f1 > x, ] # no firm default.
# Gather results and probabilities in a list.
ans <- list(both = both, any = any, onlyf1 = onlyf1,
  onlyf2 = onlyf2, one = one, none = none,
  both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
  onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2)/ n),
  one.pd = (nrow(one) / n), none.pd = (nrow(none) / n))
}
# X has all the relevant results for x2.
X <- mapply(fun.X, x.y)

See the cases in which both firms default in 5 years.

Code
# Extract "both" cases, year 5.
both <- data.frame(X[["both", 5]], X[["both", 5]] < x.y[5])
colnames(both) <- n.f
kable(head(both), caption = "Cases in which both firms default in 5 years.", 
      row.names = TRUE)
Cases in which both firms default in 5 years.
f1 f2 f1 default? f2 default?
15 -1.123330 -1.783569 TRUE TRUE
53 -2.724912 -1.235614 TRUE TRUE
61 -1.968789 -1.516563 TRUE TRUE
108 -2.121617 -1.807062 TRUE TRUE
156 -1.559930 -1.132879 TRUE TRUE
157 -1.194204 -2.120824 TRUE TRUE

In total, we have 343 cases in which both firms default at the same time. The first case is number 15, the second 53, the third 61 and so on. It is easy to know the total number of cases if we count the rows.
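
For example (a minimal sketch using the X object created above):

Code
# Number of simulations in which both firms default within 5 years.
nrow(X[["both", 5]])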

How do we extract those cases in which any firm will default in 5 years? That is, only firm 1, only firm 2, or both at the same time. This is a less strict condition, so we would expect more cases to match this new criterion compared with both. Note that in all cases at least one firm is indeed below -1.036433. In rows 1, 2 and 10 firm 2 defaults. In row 15 both firms default. In rows 18 and 20 firm 1 and firm 2 default, respectively.

Code
any <- data.frame(X[["any", 5]], X[["any", 5]] < x.y[5])
colnames(any) <- n.f
kable(head(any), caption = "Cases in which at least one firm default.", 
      row.names = TRUE)
Cases in which at least one firm default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.8671108 FALSE TRUE
2 0.0092849 -1.1735732 FALSE TRUE
10 1.4475216 -1.1512274 FALSE TRUE
15 -1.1233296 -1.7835692 TRUE TRUE
18 -1.5829965 -0.7174298 TRUE FALSE
20 -0.9221666 -1.9982869 FALSE TRUE

Now we have 2,660 cases, considerably more, as the | (or) condition is less strict than & (and). Let’s see the cases in which only firm 1 defaults.

Code
onlyf1 <- data.frame(X[["onlyf1", 5]], X[["onlyf1", 5]] < x.y[5])
colnames(onlyf1) <- n.f
kable(head(onlyf1), caption = "Cases in which only firm 1 default.")
Cases in which only firm 1 default.
f1 f2 f1 default? f2 default?
18 -1.582996 -0.7174298 TRUE FALSE
34 -2.842355 1.5400020 TRUE FALSE
35 -2.090320 0.7431275 TRUE FALSE
36 -1.600142 0.3082653 TRUE FALSE
41 -2.193496 -0.8465008 TRUE FALSE
49 -1.764711 -0.3914202 TRUE FALSE

Only firm 2 defaults.

Code
onlyf2 <- data.frame(X[["onlyf2", 5]], X[["onlyf2", 5]] < x.y[5])
colnames(onlyf2) <- n.f
kable(head(onlyf2), caption = "Cases in which only firm 2 default.")
Cases in which only firm 2 default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.867111 FALSE TRUE
2 0.0092849 -1.173573 FALSE TRUE
10 1.4475216 -1.151227 FALSE TRUE
20 -0.9221666 -1.998287 FALSE TRUE
22 -0.1110555 -1.348605 FALSE TRUE
40 0.1058949 -1.752620 FALSE TRUE

Only one firm defaults at year 5.

Code
one <- data.frame(X[["one", 5]], X[["one", 5]] < x.y[5])
colnames(one) <- n.f
kable(head(one), caption = "Cases in which only one firm default.")
Cases in which only one firm default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.8671108 FALSE TRUE
2 0.0092849 -1.1735732 FALSE TRUE
10 1.4475216 -1.1512274 FALSE TRUE
18 -1.5829965 -0.7174298 TRUE FALSE
20 -0.9221666 -1.9982869 FALSE TRUE
22 -0.1110555 -1.3486051 FALSE TRUE

Lastly, when no firm defaults.

Code
none <- data.frame(X[["none", 5]], X[["none", 5]] < x.y[5])
colnames(none) <- n.f
kable(head(none), caption = "Cases in which no firm default.")
Cases in which no firm default.
f1 f2 f1 default? f2 default?
3 0.0323809 -0.5854275 FALSE FALSE
4 1.5547886 -0.0630241 FALSE FALSE
5 0.9679767 -0.3977597 FALSE FALSE
6 0.7751329 0.8847116 FALSE FALSE
7 1.3361498 1.0392145 FALSE FALSE
8 1.3039572 1.5191303 FALSE FALSE

Finally, probabilities.

Code
both.pd <- t(data.frame(X["both.pd",]))
any.pd <- t(data.frame(X["any.pd",]))
onlyf1.pd <- t(data.frame(X["onlyf1.pd",]))
onlyf2.pd <- t(data.frame(X["onlyf2.pd",]))
one.pd <- t(data.frame(X["one.pd",]))
none.pd <- t(data.frame(X["none.pd",]))
ans <- data.frame(both.pd, any.pd, onlyf1.pd, onlyf2.pd, 
                 one.pd, none.pd)
rownames(ans) <- n.year
kable(ans, caption = "Probabilities of default.")
Probabilities of default.
both.pd any.pd onlyf1.pd onlyf2.pd one.pd none.pd
year 1 0.0003 0.0192 0.0103 0.0086 0.0189 0.9808
year 2 0.0019 0.0560 0.0282 0.0259 0.0541 0.9440
year 3 0.0066 0.1141 0.0561 0.0514 0.1075 0.8859
year 4 0.0176 0.1854 0.0864 0.0814 0.1678 0.8146
year 5 0.0343 0.2660 0.1210 0.1107 0.2317 0.7340

So interesting.

Note that 15% is the theoretical cumulative probability that each firm defaults within 5 years. In the simulation, the marginal probability that firm 1 defaults is onlyf1.pd + both.pd, this is \(0.1210+0.0343=0.1553\), and for firm 2 it is onlyf2.pd + both.pd, this is \(0.1107+0.0343=0.1450\); both are close to the theoretical 15%.
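
As a quick check, the simulated marginal default probabilities can be computed directly from x2 and the year-5 threshold; both should be close to the theoretical 15%.

Code
# Simulated 5-year marginal probabilities of default.
mean(x2$f1 < x.y[5]) # firm 1.
mean(x2$f2 < x.y[5]) # firm 2.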

We can perform a simple test to verify that everything is consistent. For example, this identity must hold: onlyf1+onlyf2=one. Substituting for year 5: \(0.121+0.1107=0.2317\). This identity must hold as well: any-both=one. Substituting for year 5: \(0.266-0.0343=0.2317\).
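
These identities can also be verified in code, using the ans data frame assembled in the chunk above.

Code
# Both identities must hold for every year (up to floating-point tolerance).
all.equal(ans$onlyf1.pd + ans$onlyf2.pd, ans$one.pd)
all.equal(ans$any.pd - ans$both.pd, ans$one.pd)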

Let’s visualize all 10,000 cases. Each dot represents a couple of Firm 1 and Firm 2 \(x\) values and the dotted lines the threshold that represents the probability of default in 5 years.

Code
par(pty = "s") # Figures are shown in a perfect square (not a rectangle).
plot(x2, pch = ".", cex = 1) 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(nrow(x2))), bty = "n")

Figure 3.11: All 10,000 simulated cases.

These 10,000 observations are highly concentrated around the mean, which is very close to zero. This can also be seen easily in the following density plot.

Code
ggplot(x2, aes(x = f1, y = f2)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  colour = "white") +
  coord_fixed()

Figure 3.12: All 10,000 simulated cases: A density view.

This is the 3-D rayshader video version of the plot above.

Code
embed_url("https://youtu.be/3p-B19R8RwA")

And a third interactive version.

Code
f <- function(x, y) dmnorm(cbind(x, y), c(0, 0), corr_pos02)
z <- outer(sort(x2[1:100,1]), sort(x2[1:100,2]), f)

#create contour plot
#contour(x2[1:100,1], x2[1:100,2], z)
plot_ly(type = "surface" , x = sort(x2[1:100,2]), 
        y = sort(x2[1:100,1]) , z = z ) %>%
layout(#title = "Surface",
       scene = list(xaxis = list(title = "f1", range = c(-2,2)),
                    yaxis = list(title = "f2", range = c(-2,2)), 
                    zaxis = list(title = "Density"))) %>%
hide_colorbar()
Figure 3.13: An interactive view.

We can visualize the default cases. First, the case when both firms default at year 5.

Code
par(pty = "s")
plot(X[["both", 5]], xlim  = L, ylim = L, pch = ".", cex = 1) 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")

Figure 3.14: Both firms default at year 5.

The value shown in the plot is the number of cases: both firms default at the same time within 5 years in 343 of the 10,000 simulations. Note that this corresponds to a cumulative probability of default.

Now, the case in which any firm defaults in 5 years.

Code
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")

Figure 3.15: Any firm defaults in 5 years.

Only firm 1 defaults in 5 years.

Code
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")

Figure 3.16: Only firm 1 defaults in 5 years.

Only firm 2 defaults in 5 years.

Code
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")

Figure 3.17: Only firm 2 defaults in 5 years.

This is the case in which only one firm defaults.

Code
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(h = x.y[5], lty =2)
abline(v = x.y[5], lty =2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")

Figure 3.18: Only one firm defaults in 5 years.

This is the case in which neither firm defaults.

Code
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(h = x.y[5], lty =2)
abline(v = x.y[5], lty =2)
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")

Figure 3.19: No firm defaults in 5 years.

It is a good idea to summarize all previous plots in one.

Code
# none
par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "None.") 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(none.pd[5]*n)), bty = "n")
# both
par(pty = "s")
plot(X[["both", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Both.") 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")
# any
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Any.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")
# onlyfirm1
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Only f1.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
# onlyfirm2
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Only f2.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
# one
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "One.")
abline(h = x.y[5], lty = 2)
abline(v = x.y[5], lty = 2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")

Figure 3.20: Which firm defaults at year 5?

It is interesting to compare two different correlation values. Here, we compare 0 versus 0.2.

Code
m <- 2      # number of firms.
n <- 10000  # number of simulations.
rho_0 <- 0  # zero correlation, to contrast with the 0.2 case above.
corr_0 <- matrix(rep(rho_0, m * m), m, m) # correlation matrix.
diag(corr_0) <- 1
set.seed(130575)
X0 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_0)
X0 <- data.frame(X0)
colnames(X0) <- c("f1", "f2")

Just as before, we create a function, evaluate it, and store results in X0.

Code
# Same logic as fun.X, now applied to the zero-correlation draws in X0.
fun.X0 <- function(x) {
  both <- X0[X0$f2 < x & X0$f1 < x, ]
  any <- X0[X0$f2 < x | X0$f1 < x, ]
  onlyf1 <- X0[X0$f2 > x & X0$f1 < x, ]
  onlyf2 <- X0[X0$f1 > x & X0$f2 < x, ]
  one <- X0[(X0$f2 < x & X0$f1 > x) |
            (X0$f2 > x & X0$f1 < x), ]
  none <- X0[X0$f2 > x & X0$f1 > x, ]
  list(both = both, any = any, onlyf1 = onlyf1,
       onlyf2 = onlyf2, one = one, none = none,
       both.pd = nrow(both) / n, any.pd = nrow(any) / n,
       onlyf1.pd = nrow(onlyf1) / n, onlyf2.pd = nrow(onlyf2) / n,
       one.pd = nrow(one) / n, none.pd = nrow(none) / n)
}

# X0 is overwritten: it now stores the subsets and probabilities per year.
X0 <- mapply(fun.X0, x.y)

none.pd0 <- data.frame(X0["none.pd",]) * n
both.pd0 <- data.frame(X0["both.pd",]) * n

onlyf1.pd0 <- data.frame(X0["onlyf1.pd",]) * n
onlyf2.pd0 <- data.frame(X0["onlyf2.pd",]) * n

A graphical analysis shows that with a correlation of 0.2 it is more likely that both firms default at the same time, and less likely that at least one firm defaults, compared with the zero-correlation case.
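
The same comparison in counts (out of the 10,000 simulations, year 5) can be pulled from the objects already computed; a minimal check:

Code
# Year-5 counts of joint defaults and at-least-one defaults:
# copula correlation 0.2 (X) versus zero correlation (X0).
data.frame(correlation = c(0.2, 0),
           both = c(nrow(X[["both", 5]]), nrow(X0[["both", 5]])),
           any = c(nrow(X[["any", 5]]), nrow(X0[["any", 5]])))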

Code
par(mfrow=c(1, 2), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], pch = ".", xlim = L, ylim = L, 
     cex = 1, main = "Correlation = 0.2") 
points(X[["both", 5]], pch = ".", col = "red")
points(X[["onlyf1", 5]], pch = ".", col = "purple")
points(X[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd[5]*n)), bty = "n")
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
par(pty = "s")
plot(X0[["none", 5]], pch = ".", xlim = L, ylim = L,
     cex = 1, main = "Correlation = 0.") 
points(X0[["both", 5]], pch = ".", col = "red")
points(X0[["onlyf1", 5]], pch = ".", col = "purple")
points(X0[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd0[5])), bty = "n")
legend("topright", legend = c(paste(none.pd0[5])), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd0[5])), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd0[5])), bty = "n")
par(pty = "s")

Figure 3.21: Cases per quadrant. Dotted lines corresponds to year 5.

Very interesting indeed.

The Gaussian copula is not the only copula model. The videos below show how other copulas behave.

t-copula:

Code
embed_url("https://youtu.be/XZPVnanmEtw")

Gumbel copula:

Code
embed_url("https://youtu.be/Oq27fu86ahU")

Frank copula:

Code
embed_url("https://youtu.be/4OOF_FTrZ3w")

Clayton copula:

Code
embed_url("https://youtu.be/vtzkgnWjFQ4")

3.5 Ten firms.

The original Hull (2015) example considers 10 firms, and here we implement it following a simulation approach. First, we need a \(10\times10\) correlation matrix to produce the new \(x\) values with a multivariate normal random number generator. In the Hull (2015) example, the copula default correlation between each pair of companies is 0.2. The code further below also shows how to draw the pairwise correlations randomly from a uniform distribution.

The new \(10\times10\) correlation matrix is then:

Code
# Create the correlation matrix: 0.2 off the diagonal, 1 on the diagonal.
m <- 10      # number of firms.
n <- 1000000 # number of simulations per firm.
x <- matrix(rep(0.2, m * m), m, m)
ind <- lower.tri(x)
x[ind] <- t(x)[ind] # make the matrix symmetric (reused below for the random case).
diag(x) <- 1
kable(x, caption = "Correlation matrix 0.2.")
Correlation matrix 0.2.
1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0

Now, we can simulate the multivariate normal distribution. The simulated data X10 contains 1,000,000 draws for each of the 10 firms. We choose a large number of simulations because otherwise the most restrictive events, such as all ten firms defaulting, would hardly ever be observed.
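
As a rough back-of-the-envelope check, if the ten firms were independent, the probability that all of them default within 5 years would be \(0.15^{10}\), so even a million simulations would be expected to produce essentially no such joint defaults; the 0.2 correlation is what makes the event observable at all.

Code
# Expected number of all-ten-default cases in 1,000,000 simulations
# under the (counterfactual) assumption of independent firms.
0.15^10 * 1e6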

Code
# Create the simulated cases.
set.seed(130575) # Reproducibility.
X10 <- mvrnorm(n, mu = rep(0, m), Sigma = x)
X10 <- data.frame(X10) # 1,000,000 rows by 10 columns.
colnames(X10) <- c("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10")
kable(head(X10), caption = "10 firms' performance, 1,000,000 simulations.",
      digits = 3)
10 firms’ performance, 1,000,000 simulations.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
0.449 2.362 1.716 -0.350 0.807 1.578 -0.085 1.295 0.634 1.086
0.795 -2.315 0.722 -0.607 -0.514 0.540 1.620 2.240 0.310 1.184
0.513 0.318 0.256 -0.146 0.668 1.071 0.929 -1.511 1.701 -1.908
0.643 -1.120 -0.818 0.398 -0.985 -2.094 0.412 -0.001 -0.699 -0.831
-1.404 -0.385 0.916 1.768 0.785 0.425 -0.873 -1.025 -0.164 -1.990
0.038 -0.818 0.402 -1.354 -0.153 -0.830 -1.523 -0.856 -0.433 -0.142

How do we extract those cases in which all 10 firms will default in 5 years (at the same time)? Here are the first 6 of those cases. Note that in all cases the X10 values are lower than \(-1.036433\).

Code
# With 10 firms it is easier to use dplyr's filter_all() than to spell out
# the condition for every column by hand.
y5.all.2 <- filter_all(X10, all_vars(. < x.y[5]))
y4.all.2 <- filter_all(X10, all_vars(. < x.y[4])) 
y3.all.2 <- filter_all(X10, all_vars(. < x.y[3])) 
y2.all.2 <- filter_all(X10, all_vars(. < x.y[2])) 
y1.all.2 <- filter_all(X10, all_vars(. < x.y[1])) 
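
As a side note, filter_all() and all_vars() are superseded in recent dplyr releases; an equivalent sketch using if_all() (assuming dplyr 1.0 or later) would be:

Code
# Same year-5 filter written with the current dplyr idiom.
y5.all.alt <- filter(X10, if_all(everything(), ~ .x < x.y[5]))
nrow(y5.all.alt) # should equal nrow(y5.all.2).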

Let’s analyze the case of 5 years.

Code
kable(head(y5.all.2), 
      caption = "Cases in which all 10 firms default in five years.", 
      digits = 3)
Cases in which all 10 firms default in five years.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
-1.625 -1.198 -1.740 -2.203 -1.489 -3.152 -1.395 -1.409 -2.809 -2.027
-1.410 -3.567 -1.849 -2.273 -1.336 -1.638 -1.289 -2.135 -2.671 -1.675
-3.617 -1.495 -1.210 -1.367 -3.220 -1.481 -1.784 -1.891 -1.443 -1.375
-2.624 -1.162 -2.479 -2.613 -1.929 -1.734 -3.395 -2.635 -2.289 -1.593
-1.834 -1.063 -1.888 -1.390 -3.494 -1.280 -2.108 -1.664 -4.007 -2.058
-2.083 -1.800 -2.085 -2.775 -1.437 -2.610 -1.215 -1.739 -1.219 -2.841
Code
kable((head(y5.all.2) < x.y[5]), caption = "Check if all of them default.")
Check if all of them default.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

How many cases are there?

Code
nrow(y5.all.2)
[1] 72

Which are those 72 cases?

Code
# Matching on firm 1's simulated value is enough to identify the rows, since
# these values are effectively unique. The 72 indices are arranged in an
# 8-by-9 matrix for display.
kable(matrix(which(X10[,1] %in% y5.all.2[,1]), 8, 9), 
      caption = "Which of the 1,000,000 cases represent a default of all 
      firms in five years?")
Which of the 1,000,000 cases represent a default of all firms in five years?
11119 92221 252520 350035 402689 496478 574643 633402 701255
18645 131986 266761 354133 413425 511731 576215 643935 708339
45042 165394 284189 363067 423852 522630 585654 644895 727092
57098 201832 284680 374237 437629 523010 615497 647511 747221
57696 227833 302542 384869 438202 527750 630487 650216 789221
81982 248381 328106 387617 477385 560115 630626 688847 797265

In total, we have only 72 cases. That is, all 10 firms default at the same time within 5 years in 72 out of 1,000,000 simulations. For shorter horizons the event is even rarer; in fact, there are zero cases for years 1, 2 and 3.
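
We can confirm this by counting the rows of the filtered data frames for every horizon.

Code
# Number of cases (out of 1,000,000) in which all ten firms default,
# years 1 through 5.
sapply(list(y1.all.2, y2.all.2, y3.all.2, y4.all.2, y5.all.2), nrow)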

How do we extract those cases in which at least one of the 10 firms will default?

Code
y5.any.2 <- filter_all(X10, any_vars(. < x.y[5]))
y4.any.2 <- filter_all(X10, any_vars(. < x.y[4]))
y3.any.2 <- filter_all(X10, any_vars(. < x.y[3]))
y2.any.2 <- filter_all(X10, any_vars(. < x.y[2]))
y1.any.2 <- filter_all(X10, any_vars(. < x.y[1]))

Here are the first 6 cases for the 5-year default.

Code
kable(head(y5.any.2), caption = "At least one firm default in five years.",
      digits = 3)
At least one firm default in five years.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
0.795 -2.315 0.722 -0.607 -0.514 0.540 1.620 2.240 0.310 1.184
0.513 0.318 0.256 -0.146 0.668 1.071 0.929 -1.511 1.701 -1.908
0.643 -1.120 -0.818 0.398 -0.985 -2.094 0.412 -0.001 -0.699 -0.831
-1.404 -0.385 0.916 1.768 0.785 0.425 -0.873 -1.025 -0.164 -1.990
0.038 -0.818 0.402 -1.354 -0.153 -0.830 -1.523 -0.856 -0.433 -0.142
0.353 -0.045 -0.393 -1.914 -0.168 -0.439 -0.379 -0.742 -1.612 -2.775
Code
kable((head(y5.any.2) < x.y[5]), caption = "Check which one(s) default.")
Check which one(s) default.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE

How many cases are there?

Code
nrow(y5.any.2)
[1] 682440

In total, we have 682,440 cases. That is, at least one of the 10 firms defaults within 5 years in 682,440 out of 1,000,000 simulations.

And how do we convert these counts into probabilities?

Code
atleastone02 <- t(data.frame(nrow(y1.any.2), nrow(y2.any.2),
                nrow(y3.any.2), nrow(y4.any.2), nrow(y5.any.2)))
all02 <- t(data.frame(nrow(y1.all.2), nrow(y2.all.2),
                nrow(y3.all.2), nrow(y4.all.2), nrow(y5.all.2)))
res02 <- data.frame(all02 / n, atleastone02 / n)
rownames(res02) <- n.year
colnames(res02) <- c("All firms", "At least one")
kable(res02, caption = "Probabilities of default (10 firms, corr=0.2).")
Probabilities of default (10 firms, corr=0.2).
All firms At least one
year 1 0.0e+00 0.086835
year 2 0.0e+00 0.226086
year 3 0.0e+00 0.386264
year 4 1.0e-05 0.543684
year 5 7.2e-05 0.682440

Let’s see the difference when we assume a different correlation matrix. In this case, the pairwise correlations vary randomly between 0.45 and 0.65.

Code
m <- 10      # number of firms.
n <- 1000000 # number of simulations.
set.seed(130575)
x <- matrix(runif(m * m, 0.45, 0.65), m, m) # random pairwise correlations.
ind <- lower.tri(x)
x[ind] <- t(x)[ind] # make the matrix symmetric.
diag(x) <- 1
# A randomly filled symmetric matrix is not guaranteed to be positive
# definite; mvrnorm() would stop with an error if this draw were not.
kable(x, caption = "Correlation between 0.45 and 0.65.", digits = 3)
Correlation between 0.45 and 0.65.
1.000 0.622 0.597 0.622 0.530 0.517 0.518 0.490 0.455 0.578
0.622 1.000 0.523 0.594 0.465 0.501 0.558 0.560 0.460 0.513
0.597 0.523 1.000 0.556 0.485 0.603 0.534 0.524 0.611 0.547
0.622 0.594 0.556 1.000 0.564 0.555 0.496 0.500 0.502 0.513
0.530 0.465 0.485 0.564 1.000 0.608 0.513 0.559 0.647 0.505
0.517 0.501 0.603 0.555 0.608 1.000 0.452 0.486 0.644 0.518
0.518 0.558 0.534 0.496 0.513 0.452 1.000 0.577 0.635 0.466
0.490 0.560 0.524 0.500 0.559 0.486 0.577 1.000 0.585 0.524
0.455 0.460 0.611 0.502 0.647 0.644 0.635 0.585 1.000 0.474
0.578 0.513 0.547 0.513 0.505 0.518 0.466 0.524 0.474 1.000

The resulting values of Xr are:

Code
set.seed(130575)
Xr <- mvrnorm(n, mu = rep(0, m), Sigma = x)
Xr <- data.frame(Xr)
kable(head(Xr), caption = "Firms' performance.", digits = 3)
Firms’ performance.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
0.673 1.507 0.776 1.038 1.997 0.557 1.759 1.578 2.529 1.326
-0.181 -0.887 -0.485 1.532 0.829 2.031 0.800 0.824 0.471 0.838
0.731 -0.826 1.679 -1.181 -0.034 -0.270 0.478 0.920 0.412 0.818
-0.852 -0.288 -0.362 -0.457 -1.596 0.005 -0.153 -1.029 -1.703 -0.914
0.692 -0.689 0.906 -1.140 -0.080 0.082 -0.968 -2.028 0.089 0.256
-1.123 -0.810 -1.242 -1.438 -0.850 -0.394 -0.601 -0.458 0.080 -1.393

Extract the cases in which all ten firms default within 5 years, and the cases in which at least one firm defaults within 5 years.

Code
Xr <- as_tibble(Xr)
# All firms.
Xr.y1.all <- filter_all(Xr, all_vars(. < x.y[1])) 
Xr.y2.all <- filter_all(Xr, all_vars(. < x.y[2])) 
Xr.y3.all <- filter_all(Xr, all_vars(. < x.y[3])) 
Xr.y4.all <- filter_all(Xr, all_vars(. < x.y[4]))
Xr.y5.all <- filter_all(Xr, all_vars(. < x.y[5])) 
# At least one firm.
Xr.y1.any <- filter_all(Xr, any_vars(. < x.y[1]))
Xr.y2.any <- filter_all(Xr, any_vars(. < x.y[2]))
Xr.y3.any <- filter_all(Xr, any_vars(. < x.y[3]))
Xr.y4.any <- filter_all(Xr, any_vars(. < x.y[4]))
Xr.y5.any <- filter_all(Xr, any_vars(. < x.y[5]))

All ten firms default in 6,144 cases out of 1,000,000.

Code
nrow(Xr.y5.all)
[1] 6144

Let’s see the first 6 cases:

Code
kable(head(Xr.y5.all), 
      caption = "Cases in which all 10 firms default in five years.", 
      digits = 3)
Cases in which all 10 firms default in five years.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
-1.357 -1.241 -2.165 -1.486 -1.820 -1.181 -1.816 -1.146 -1.823 -2.462
-1.874 -1.751 -2.281 -1.390 -2.020 -2.189 -1.752 -2.580 -2.062 -2.731
-1.701 -2.546 -1.671 -1.895 -1.852 -1.603 -1.292 -2.384 -1.184 -1.592
-1.443 -1.452 -2.499 -1.561 -1.166 -2.392 -1.658 -1.369 -3.522 -2.372
-1.326 -2.036 -1.252 -1.729 -2.569 -1.650 -1.795 -1.235 -2.822 -1.722
-2.314 -2.703 -2.063 -3.262 -1.551 -2.172 -2.205 -3.088 -2.130 -1.908

Verify that those cases default.

Code
kable((head(Xr.y5.all) < x.y[5]), caption = "Check if all of them default.")
Check if all of them default.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

At least one firm defaults in 497,513 out of 1,000,000 cases.

Code
nrow(Xr.y5.any)
[1] 497513
Code
kable(head(Xr.y5.any), caption = "At least one firm default in five years.",
      digits = 3)
At least one firm default in five years.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
0.731 -0.826 1.679 -1.181 -0.034 -0.270 0.478 0.920 0.412 0.818
-0.852 -0.288 -0.362 -0.457 -1.596 0.005 -0.153 -1.029 -1.703 -0.914
0.692 -0.689 0.906 -1.140 -0.080 0.082 -0.968 -2.028 0.089 0.256
-1.123 -0.810 -1.242 -1.438 -0.850 -0.394 -0.601 -0.458 0.080 -1.393
-1.089 -1.470 -1.292 -2.389 -1.046 -1.960 0.653 -1.052 -1.027 -1.017
-1.280 -0.320 -2.717 -1.976 -1.970 -1.238 -0.891 -2.161 -0.682 -0.657
Code
kable((head(Xr.y5.any) < x.y[5]), caption = "Check which one(s) default.")
Check which one(s) default.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE

And the corresponding probabilities:

Code
allR <- t(data.frame(nrow(Xr.y1.all), nrow(Xr.y2.all),
        nrow(Xr.y3.all), nrow(Xr.y4.all), nrow(Xr.y5.all)))
atleastoneR <- t(data.frame(nrow(Xr.y1.any), nrow(Xr.y2.any),
        nrow(Xr.y3.any), nrow(Xr.y4.any), nrow(Xr.y5.any)))
resR <- data.frame(allR / n, atleastoneR / n)
ans <- data.frame(res02, resR)
rownames(ans) <- n.year
colnames(ans) <- c("All (corr=0.2)", "At least one (corr=0.2)",
                    "All (corr=rand)", "At least one (corr=rand)")
kable(ans, caption = "Probability of default.", 
      row.names = TRUE)
Probability of default.
All (corr=0.2) At least one (corr=0.2) All (corr=rand) At least one (corr=rand)
year 1 0.0e+00 0.086835 0.000021 0.062784
year 2 0.0e+00 0.226086 0.000184 0.156169
year 3 0.0e+00 0.386264 0.000783 0.266124
year 4 1.0e-05 0.543684 0.002438 0.382602
year 5 7.2e-05 0.682440 0.006144 0.497513

4 Credit VaR example 24.7.

This replicates Hull (2015) example 24.7.

This is the Vasicek model (Hull 2015, equation 24.10) written as an R function.
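
For reference, the formula implemented by the function below can be written as

\[
V(X) = N\!\left( \frac{N^{-1}(\text{PD}) + \sqrt{\rho}\, N^{-1}(X)}{\sqrt{1-\rho}} \right), \qquad
\text{Credit VaR} = L \, (1 - R) \, V(X),
\]

where \(N\) is the standard normal cumulative distribution function, PD the probability of default, \(\rho\) the copula correlation, \(X\) the confidence level, \(L\) the exposure and \(R\) the recovery rate. In the function, these correspond to pd, c, l, exp and r, respectively.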

Code
# Vasicek one-factor model (Hull 2015, equation 24.10).
# exp: exposure; pd: probability of default; r: recovery rate;
# c: copula correlation; l: confidence level.
CVaR <- function(exp, pd, r, c, l) {
  v <- pnorm((qnorm(pd) + sqrt(c) * qnorm(l)) / sqrt(1 - c)) # worst-case default rate.
  exp * v * (1 - r) # credit VaR.
}

Let’s verify that it works by evaluating equation 24.10 at the 99% and 99.9% confidence levels. The 1-year 99% and 99.9% credit VaRs are:

Code
CVaR.999 <- CVaR(100, 0.02, 0.6, 0.1, 0.999)
CVaR.99 <- CVaR(100, 0.02, 0.6, 0.1, 0.99)
CVaR.999
[1] 5.129484
Code
CVaR.99
[1] 3.294271

Let’s evaluate the model across all confidence levels by simulating 10,000 uniform draws between 0 and 1.

Code
set.seed(13)
l <- runif(10000, 0, 1)
Loss <- CVaR(100, 0.02, 0.6, 0.1, l)
sim.var999 <- sort(Loss)[10000 * 0.999]
sim.var99 <- sort(Loss)[10000 * 0.99]
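
Since the loss is a monotone function of the confidence level, the empirical quantiles of the simulated losses should be close to the closed-form values; a quick comparison:

Code
# Simulated quantiles versus the analytical credit VaR.
rbind(simulated = c(VaR99 = sim.var99, VaR999 = sim.var999),
      analytical = c(VaR99 = CVaR.99, VaR999 = CVaR.999))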

Now, visually:

Code
hist(Loss, 100, xlim = c(0, 7), xlab = "Losses in millions", main = NULL)
abline(v = CVaR.999, lty = 2, col = "red", lwd = 2)
abline(v = CVaR.99, lty = 2, col = "blue", lwd = 2)
legend("topright", legend = c("1-year 99% credit VaR is 3.294271", 
         "1-year 99.9% credit VaR is 5.129484"),
col = c("blue", "red"), lwd = 2, lty = 2, bg = "white", cex = 0.8)

Figure 4.1: Distribution of losses.
Code
dat <- data.frame(Loss, l)
ggplot(dat, aes(x = Loss, fill = "Losses in millions")) + 
  geom_density(color = "darkblue", fill = "lightblue") +
  geom_vline(aes(xintercept = CVaR.999), 
             color = "red", linetype = "dashed", size = 1) +
  geom_vline(aes(xintercept = CVaR.99 ),   
             color = "blue", linetype = "dashed", size = 1) +
  scale_x_continuous(labels = scales::dollar)

Figure 4.2: Distribution of losses with ggplot.

Nice.

This document took 44.41 seconds to compile in Quarto version 1.3.450, and R version 4.3.2 (2023-10-31 ucrt).

References.

Black, Fischer, and Myron Scholes. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81 (3): 637–54.
Brealey, Richard A, Stewart C Myers, Franklin Allen, and Pitabas Mohanty. 2020. Principles of Corporate Finance. 13th ed. McGraw-Hill Education.
Hull, John C. 2015. Options, Futures, and Other Derivatives. 9th ed. Prentice Hall.
———. 2020. Machine Learning in Business: An Introduction to the World of Data Science. Amazon Distribution.
Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.
Merton, Robert C. 1973. “Theory of Rational Option Pricing.” The Bell Journal of Economics and Management Science, 141–83.
———. 1974. “On the Pricing of Corporate Debt: The Risk Structure of Interest Rates.” The Journal of Finance 29 (2): 449–70.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.