Credit risk with R

Author

Dr. Martín Lozano.

Modified

December 29, 2024, 16:49:39.

Abstract
This material relies on Hull (2015) credit risk chapters and Lore Dirick's credit risk DataCamp course. Some mathematical background is skipped to emphasize the data analysis, model logic, discussion, graphical approach, and R coding (R Core Team 2024). As in the philosophy of Knuth (1984), the objective of this document is to explain to human beings what we want a computer to do, as literate programming. This is a work in progress and is under revision.



Introduction.

Credit risk models play a pivotal role in the financial landscape, providing institutions with a systematic framework to assess and manage the risks associated with lending and investment activities. One crucial component of these models is the estimation of the probability of default, which quantifies the likelihood that a borrower will fail to meet their debt obligations. The relevance of credit risk models lies in their ability to enhance decision-making processes by offering a comprehensive understanding of the potential creditworthiness of individuals, companies, or financial instruments. By utilizing statistical methods, historical data, and various financial indicators, these models enable financial institutions to make informed decisions about lending, pricing, and portfolio management. The estimation of the probability of default not only aids in risk assessment but also supports regulatory compliance, allowing institutions to maintain a healthy balance between risk and return in their credit portfolios. In an ever-evolving financial landscape, the continual refinement and application of credit risk models are essential for fostering stability and resilience within the financial system.

1 Loan analysis.

The main objective of this section is to predict whether a person or client applying for a loan will repay it or not. This issue can be considered a classification problem, where based on information about the client and the characteristics of their loan application, the client is classified into one of two categories: whether they will default or not. There are various ways to approach this classification problem; in this section, we will use a simple machine learning model called logistic regression. The model is first estimated or trained and then evaluated using new data to determine how well it performs the classification.

Let’s load the required R packages first.

Code
# Logistic models
library(gmodels) # CrossTable()
library(ggplot2)
library(tidyr) # gather()
library(dplyr)
library(pROC) # roc
library(vembedr)

This section relies on the DataCamp course Credit Risk Modeling in R by Lore Dirick. However, we incorporate a slightly different database and an extended analysis.

1.1 Explore the database.

Let’s load the data called loan_data_ARF.rds and then understand its structure before conducting any further analysis. This database is available here.

Code
dat <- readRDS("loan_data_ARF.rds")
str(dat)
'data.frame':   29092 obs. of  10 variables:
 $ loan_status   : int  0 0 0 0 0 0 1 0 1 0 ...
 $ loan_amnt     : int  5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
 $ int_rate      : num  10.6 11 13.5 11 11 ...
 $ grade         : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 ...
 $ emp_length    : int  10 25 13 3 9 11 0 3 3 0 ...
 $ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4 ...
 $ annual_inc    : num  24000 12252 49200 36000 48000 ...
 $ age           : int  33 31 24 39 24 28 22 22 28 22 ...
 $ sex           : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 2 ...
 $ region        : Factor w/ 4 levels "E","N","S","W": 1 1 3 3 2 2 2 2 4 1 ...

This dataset could represent a typical collection of data from a financial institution, such as a bank or a company that uses credit channels to sell its products or services. It contains 29,092 observations across 10 variables. Each observation corresponds to the personal and loan characteristics of an individual loan.

A key variable, our dependent variable, is loan_status, which indicates the loan status. A value of 0 represents no default, while a value of 1 represents default. A default occurs when a borrower fails to make timely payments, misses payments, or ceases payments on the interest or principal owed. The definition of default can vary depending on the goals and interests of the analysis. For the purposes of this study, we classify loans simply as either default or no default.

The variable loan_status is dichotomous, or categorical. Our primary interest lies in predicting whether a new loan application will result in default or not.

Clearly, loan_data_ARF.rds represents past information, as we know with certainty whether an individual defaulted (1) or not (0). Historical data is valuable for understanding how likely an individual is to default based on their personal and loan characteristics. It is also essential for training quantitative models to predict outcomes for new applicants and for evaluating the performance of our classifications.

A variable name is considered too long when a shorter name can convey the same purpose just as effectively. Let’s rename some of them.

Code
old_names <- colnames(dat)
colnames(dat) <- c("loan_st", "l_amnt", "int", "grade", "emp_len", 
                   "home", "income", "age", "sex", "region")
data.frame(old_names, "new_names" = colnames(dat))
        old_names new_names
1     loan_status   loan_st
2       loan_amnt    l_amnt
3        int_rate       int
4           grade     grade
5      emp_length   emp_len
6  home_ownership      home
7      annual_inc    income
8             age       age
9             sex       sex
10         region    region

New variable names have now been assigned.

We can take a look at the information in different ways. For example, look at the first 10 rows out of 29,092.

Code
head(dat, 10)
   loan_st l_amnt   int grade emp_len home income age sex region
1        0   5000 10.65     B      10 RENT  24000  33   0      E
2        0   2400 10.99     C      25 RENT  12252  31   0      E
3        0  10000 13.49     C      13 RENT  49200  24   0      S
4        0   5000 10.99     A       3 RENT  36000  39   0      S
5        0   3000 10.99     E       9 RENT  48000  24   1      N
6        0  12000 12.69     B      11  OWN  75000  28   1      N
7        1   9000 13.49     C       0 RENT  30000  22   1      N
8        0   3000  9.91     B       3 RENT  15000  22   1      N
9        1  10000 10.65     B       3 RENT 100000  28   0      W
10       0   1000 16.29     D       0 RENT  28000  22   1      E

Note that sex is 1 for female and 0 for male. Now, instead of looking at the details of the first 10 rows, we can summarize with respect to home with the CrossTable() function.

Code
CrossTable(dat$home)

 
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  29092 

 
          |  MORTGAGE |     OTHER |       OWN |      RENT | 
          |-----------|-----------|-----------|-----------|
          |     12002 |        97 |      2301 |     14692 | 
          |     0.413 |     0.003 |     0.079 |     0.505 | 
          |-----------|-----------|-----------|-----------|



 

We can also use CrossTable() to summarize two variables. In particular, instead of counting for home ownership we can add a second dimension like loan_st. This allows us to create more informative tables.

Code
CrossTable(dat$home, dat$loan_st, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|-------------------------|

 
Total Observations in Table:  29092 

 
             | dat$loan_st 
    dat$home |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
    MORTGAGE |     10821 |      1181 |     12002 | 
             |     0.902 |     0.098 |     0.413 | 
-------------|-----------|-----------|-----------|
       OTHER |        80 |        17 |        97 | 
             |     0.825 |     0.175 |     0.003 | 
-------------|-----------|-----------|-----------|
         OWN |      2049 |       252 |      2301 | 
             |     0.890 |     0.110 |     0.079 | 
-------------|-----------|-----------|-----------|
        RENT |     12915 |      1777 |     14692 | 
             |     0.879 |     0.121 |     0.505 | 
-------------|-----------|-----------|-----------|
Column Total |     25865 |      3227 |     29092 | 
-------------|-----------|-----------|-----------|

 

This table reveals defaults by home ownership. We can use histograms to see the distribution of a single variable; in this case, the interest rate distribution.

Code
ggplot(dat, aes(x = int)) + 
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5, colour = "black", 
                 fill = "white") +
  labs(y = "Density", x = "Interest rate") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.1: Interest rate histogram.

The following is a dotplot figure for the annual income.

Code
ggplot(dat, aes(income)) + 
  geom_dotplot(binwidth = 100000, fill = "blue") +
  labs(y = "Density",
       x = "Annual income") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.2: Annual income dotplot.

Figure 1.2 above appears suspicious. The horizontal axis shows a very large annual income value (6,000,000). Additionally, there are only a few individuals with extremely high incomes. We should investigate further to determine whether these are valid observations or errors in the original dataset.

Code
high_income <- dat[(dat$income > 1000000), ]
high_income
      loan_st l_amnt   int grade emp_len     home  income age sex region
4861        0  12025 14.27     C      13     RENT 1782000  63   0      E
13931       0  10000  6.54     A      16      OWN 1200000  36   0      W
15386       0   1500 10.99     A       5 MORTGAGE 1900000  60   1      N
16713       0  12000  7.51     A       1 MORTGAGE 1200000  32   0      E
19486       0   5000 12.73     C      12 MORTGAGE 6000000 144   1      E
22811       0  10000 10.99     A       1 MORTGAGE 1200000  40   1      E
23361       0   6400  7.40     A       7 MORTGAGE 1440000  44   1      E
23683       0   6600  7.74     A       9 MORTGAGE 1362000  47   0      E
28468       0   8450 12.29     C       0     RENT 2039784  42   0      E

One individual (ID 19486) is not only extremely wealthy but also 144 years old. As a result, I have decided to remove these nine observations. Data cleaning is a common task when working with large datasets, and it is acceptable as long as the integrity of the data remains intact.

Code
high_income_index <- data.frame(value = as.integer(rownames(high_income)))
dat <- dat[-high_income_index$value,]

Remember, the original dataset contains 29,092 rows. After removing 9 observations, we are left with 29,083 rows. Let’s take a look at the result.

Code
ggplot(dat, aes(income)) + 
  geom_dotplot(binwidth = 10000, fill = "blue") +
  labs(y = "Density", x = "Annual income") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.3: Annual income without extreme values.

Somewhat better now.

We could also explore some gender issues.

Code
summary_table <- dat |>
  group_by(sex) |>
  summarize(
    Mean = mean(income),
    Median = median(income),
    Min = min(income),
    Max = max(income),
    Count = n()
  ) |>
  mutate(
    Gender = case_when(
      sex == 0 ~ "Male",
      sex == 1 ~ "Female"
    )
 )
  
summary_table
# A tibble: 2 × 7
  sex     Mean Median   Min    Max Count Gender
  <fct>  <dbl>  <dbl> <dbl>  <dbl> <int> <chr> 
1 0     66021.  56000  4000 900000 13877 Male  
2 1     67064.  57000  4200 948000 15206 Female

The table shows that females tend to have slightly higher average and median incomes compared to males. Additionally, the income distribution for females appears more variable, with a wider range of incomes. While the sample sizes for both groups are similar, females are slightly more represented. Overall, the data suggests a modest income advantage for females and greater variability in their earnings.

The better we understand the data, the better positioned we are to analyze it, gain deeper insights into the problem, and interpret the solutions more effectively. However, there is a risk of falling into the trap of overexploration, losing sight of the main problem and failing to address the objectives. Data cleaning, for instance, can be an ongoing task. Removing outliers with excessively high incomes is just one example of how data cleaning can be approached. For now, we pause the exploration and cleaning process and move on to the next section, where we estimate models that will help us achieve the outlined objectives.

1.2 Logistic models.

Logistic models allow us to make predictions about loan defaults. Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable like loan_st. In this case, the binary dependent variable is default (1) or no default (0). A good reference for this section is Hull (2020).

First, load the data and split it into two sets: (1) training and (2) test. The training set is for building and estimating models, and the test set is used to evaluate our model predictions on new data. When estimating models, it is common practice to separate the available data into two parts, training and test data, where the training data is used to estimate parameters (in-sample) and the test data is used to evaluate accuracy (out-of-sample). Because the test data is not used in the estimation, it should provide a reliable indication of how well the model is likely to estimate or forecast on new data. In sum, we train the model, we test the model, and once we are satisfied with the model performance on new data, we are ready to use it in real-life applications. If we ignore this split and use the whole database to estimate our models, we may succeed at explaining defaults in our database but fail to explain defaults for new loan applications.

Code
# It is convenient to set the loan status as factor.
dat$loan_st <- as.factor(dat$loan_st)
set.seed(567)
index_train <- cbind(runif(1 : nrow(dat), 0 , 1), c(1 : nrow(dat)))
index_train <- order(index_train[, 1])
index_train <- index_train[1: (2/3 * nrow(dat))]
# Create training set
train <- dat[index_train, ]
# Create test set
test <- dat[-index_train, ]

Variables as factors are useful for model estimation and data visualization. Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
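For instance, after converting loan_st with as.factor() above, R stores it as a factor with two levels; a quick check:

Code
# loan_st is now a factor with levels "0" (no default) and "1" (default).
class(train$loan_st)
levels(train$loan_st)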

We have 29,083 observations in dat. The code above randomly selects two-thirds of the rows, \(29083 \times (2/3) \approx 19388\), to form the training set. The test set contains the remaining \(29083-19388=9695\) rows. The random selection is highly recommended, as dat may have some structure or sorting that could bias our model estimation and negatively impact our model test. For example, imagine that for some odd reason the database is sorted in such a way that the first observations are all no-defaults. If that were the case, then the training and test sets would not contain similar proportions of default and no-default cases, and we could distort the whole analysis. The random selection allows us to replicate a real situation in which our database is unsorted, with mixed characteristics.
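As a side note, an equivalent and perhaps more idiomatic way to draw a random two-thirds/one-third split is with sample(). This is only a sketch and is not used in the rest of the document; the particular rows drawn will differ from the split above.

Code
# Alternative split using sample() (sketch; not used below).
set.seed(567)
idx <- sample(nrow(dat), size = floor(2/3 * nrow(dat)))
train_alt <- dat[idx, ]
test_alt  <- dat[-idx, ]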

Let’s see if the recently created train and test sets have the same basic properties as dat.

Code
# Combine data into a list for processing
data_list <- list(dat = dat$loan_st, train = train$loan_st, test = test$loan_st)

# Compute proportions and combine into a data frame
prop <- do.call(rbind, lapply(data_list, function(x) prop.table(table(x))))
colnames(prop) <- c("no defaults", "defaults")
row.names(prop) <- c("dat_prop", "train_prop", "test_prop")
prop
           no defaults  defaults
dat_prop     0.8890417 0.1109583
train_prop   0.8882298 0.1117702
test_prop    0.8906653 0.1093347

The proportions of “defaults” and “no defaults” in the train and test datasets are very similar to those in the original dataset, indicating that both samples are representative. Specifically, the proportion of “defaults” in the train and test datasets closely matches that of the original dataset, as does the proportion of “no defaults.” This suggests that the random sampling process has preserved the overall distribution of the target variable, ensuring that the train and test sets reflect the characteristics of the full dataset.

Assume we believe that loan_st depends on the age of the individual. We can estimate a simple logistic model to learn about the relationship between age and loan status.

\(\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{age}\).

Where:

  • \(p\) is the probability that loan_st = 1 (e.g., the probability of default).

  • \(\beta_0\) is the intercept, and \(\beta_1\) is the coefficient for the predictor variable age.

The dependent variable in logistic regression is expressed as: \(\log\left(\frac{p}{1-p}\right)\). This transformation, known as the logit function, maps probabilities \(p\) which range between 0 and 1 to the entire real number line \(-\infty\) to \(+\infty\). This allows the model to establish a linear relationship between the independent variable(s) (e.g., age) and the transformed dependent variable, making estimation feasible. The fraction \(\frac{p}{1-p}\), called the odds, represents the ratio of the probability of an event happening \(p\) to the probability of it not happening \(1-p\), making it a natural choice for modeling binary outcomes.
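In base R, the logit and its inverse are available as qlogis() and plogis(), which helps build intuition for the transformation; a quick numerical check, independent of any fitted model:

Code
# qlogis() is the logit (log-odds); plogis() is its inverse (the logistic function).
p <- c(0.1, 0.5, 0.9)
qlogis(p)          # log(p / (1 - p)): about -2.197, 0.000, 2.197
plogis(qlogis(p))  # recovers 0.1, 0.5, 0.9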

Let’s estimate the age model.

Code
# Fitting a simple logistic model.
logi_age <- glm(loan_st ~ age, family = "binomial", data = train)
logi_age

Call:  glm(formula = loan_st ~ age, family = "binomial", data = train)

Coefficients:
(Intercept)          age  
   -1.90097     -0.00623  

Degrees of Freedom: 19387 Total (i.e. Null);  19386 Residual
Null Deviance:      13580 
Residual Deviance: 13580    AIC: 13580

The estimated model is: \(\log\left(\frac{p}{1-p}\right) = -1.90097 - 0.00623 \times \text{age}\).

The model suggests there is a negative relationship between age and loan status.

We can solve for \(p\) to find the estimate of the probability of default according to the age model: \(p = \frac{\exp(-1.90097 - 0.00623 \times \text{age})}{1 + \exp(-1.90097 - 0.00623 \times \text{age})}\). Let’s evaluate \(p\) for a couple of age values.

Substituting age = 18: \(p = \frac{\exp(-1.90097 - 0.00623 \times 18)}{1 + \exp(-1.90097 - 0.00623 \times 18)}\),

\(p = \frac{\exp(-2.01311)}{1 + \exp(-2.01311)} \approx \frac{0.1336}{1 + 0.1336} \approx 0.1178\).

Substituting age = 60: \(p = \frac{\exp(-1.90097 - 0.00623 \times 60)}{1 + \exp(-1.90097 - 0.00623 \times 60)}\),

\(p = \frac{\exp(-2.27477)}{1 + \exp(-2.27477)} \approx \frac{0.1028}{1 + 0.1028} \approx 0.0932\).
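These hand calculations can be reproduced with plogis(), the inverse logit, or directly with predict() on the fitted model; a quick check using the coefficients reported above:

Code
# Inverse logit of the linear predictor at ages 18 and 60.
plogis(-1.90097 - 0.00623 * c(18, 60))  # about 0.118 and 0.093
# Equivalent, using the fitted model object:
predict(logi_age, newdata = data.frame(age = c(18, 60)), type = "response")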

The results of the default probability estimation evaluated at ages 18 and 60 are not very promising because the difference between the two probabilities is very small. The model does not generate meaningfully differentiated default probabilities, even when evaluated over a wide age range, which suggests that it is a poor model.

The following is a graphical representation of the logistic regression model’s output.

Code
# Define the function for the logistic regression model
logistic_function <- function(age) {
  beta0 <- -1.90097
  beta1 <- -0.00623
  p <- exp(beta0 + beta1 * age) / (1 + exp(beta0 + beta1 * age))
  return(p)
}

# Generate a sequence of "ages" (deliberately including unrealistic values
# to trace the full sigmoid from 0 to 1)
ages <- seq(-1000, 500, by = 1)

# Apply the logistic function to each age
p_values <- sapply(ages, logistic_function)

# Create the plot
library(ggplot2)
ggplot(data = data.frame(age = ages, p = p_values), 
       aes(x = age, y = p)) +
  geom_line(color = "blue", size = 2) +
  labs(x = "Age", y = "Probability (p)") +
  theme_minimal()
Figure 1.4: Sigmoid curve or logistic curve: age model.

The logistic curve derived from this model is unreasonable because the probability \(p\) changes very slowly with respect to age. This implies that, to observe the full range of probabilities from 0 to 1, we would require age values that are unrealistic or implausibly extreme. Such behavior suggests that the model is poorly specified, as it fails to generate meaningful distinctions in probabilities across a realistic range of ages.

The AIC value of the age model (13,580) is useful when comparing models. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. At the moment we cannot interpret the AIC simply because we only have one model and we cannot compare it with another AIC.

Let’s estimate another simple model where the interest rate is used as a predictor of loan_st. Remember, we are not conducting any predictions yet; we are only estimating models using the training set.

Code
# Build a glm model with variable interest rate as a predictor.
logi_int <- glm(formula = loan_st ~ int, family = "binomial", data = train)
# Print the parameter estimates.
logi_int

Call:  glm(formula = loan_st ~ int, family = "binomial", data = train)

Coefficients:
(Intercept)          int  
     -3.710        0.142  

Degrees of Freedom: 19387 Total (i.e. Null);  19386 Residual
Null Deviance:      13580 
Residual Deviance: 13210    AIC: 13220

The AIC is lower (13,220 versus 13,580), so we have a better model now.
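The comparison can be made explicit with the AIC() function, which reports the AIC of several fitted models at once:

Code
# Lower AIC indicates a better fit, penalized for the number of parameters.
AIC(logi_age, logi_int)
# The values match the AICs reported in the model printouts above.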

Using a single predictor such as age or the interest rate is clearly a limited approach. Let’s add some more predictors. Also, let’s introduce the summary() function to extract more information about the model estimation results. The logi_multi model below assumes that the loan status depends on the age, interest rate, grade, loan amount, and annual income.

Code
# Multiple variables in a logistic regression model.
logi_multi <- glm(loan_st ~ age + int + grade + log(l_amnt) + 
                  log(income) , family = "binomial", data = train)
# Obtain significance levels using summary().
summary(logi_multi)

Call:
glm(formula = loan_st ~ age + int + grade + log(l_amnt) + log(income), 
    family = "binomial", data = train)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.996240   0.477911   4.177 2.95e-05 ***
age         -0.002302   0.003825  -0.602   0.5473    
int          0.038767   0.017249   2.247   0.0246 *  
gradeB       0.503409   0.087435   5.758 8.54e-09 ***
gradeC       0.748229   0.117765   6.354 2.10e-10 ***
gradeD       0.964343   0.147283   6.548 5.85e-11 ***
gradeE       1.033442   0.190817   5.416 6.10e-08 ***
gradeF       1.619470   0.257900   6.279 3.40e-10 ***
gradeG       1.867494   0.440232   4.242 2.21e-05 ***
log(l_amnt)  0.015718   0.036341   0.433   0.6654    
log(income) -0.470748   0.046423 -10.140  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13579  on 19387  degrees of freedom
Residual deviance: 13028  on 19377  degrees of freedom
AIC: 13050

Number of Fisher Scoring iterations: 5

Our multi-factor model works well. In logi_multi, the AIC value is the lowest so far (13,050 versus 13,220), so it should be considered the best in-sample model at the moment. The summary() function shows the significance levels of the estimates, but we are currently more interested in the goodness of fit of the models because we want to make predictions about loan_st. That is, we are interested in using a model to find out whether new applicants in the test set are expected to default, rather than in the applicants’ credit risk factors. This is why we concentrate on the AIC for now.

When a customer fills out a credit application form, we collect information, but we do not know for sure whether he or she will eventually default. A credit risk model can help us with this task.

1.3 Prediction and model evaluation.

Let’s evaluate the predictive performance of our three models: logi_age, logi_int, and logi_multi, introduced in the previous subsection. We begin by selecting a single observation from the test set and asking each model to predict the loan_st (loan status) for this individual. Specifically, we take the age of the first person in the test set, use it as input for the logi_age model, and then compare the model’s predicted loan_st with the actual outcome recorded in the test set. Since the test set contains the true loan status for this individual, we can directly assess how well the model performed. Each model is expected to generate a potentially different prediction for the loan_st. A good model should produce a prediction that matches the actual outcome for this observation, indicating its reliability in capturing the underlying patterns in the data.

Let’s call the first individual John Doe for practical purposes.

Code
# Define one single observation in test_set.
John_Doe <- as.data.frame(test[1, ])
John_Doe
  loan_st l_amnt   int grade emp_len home income age sex region
1       0   5000 10.65     B      10 RENT  24000  33   0      E

We know in advance that the loan_st for this observation, taken from the test set, is 0. However, the models cannot know this because the test set was not used to estimate the logistic models. Instead, the models were trained using the training set. A good credit risk model should predict no default for this new applicant called John Doe.

The values of loan_st in the test set are either 0 or 1. However, the logistic models predict the loan_st as probability estimates within the range of 0 to 1, as illustrated in Figure 1.4. This means we would expect the estimated loan_st for John Doe to be close to 0. But how close? We will address this issue later.

This code predicts the loan status for John Doe using three logistic regression models. Each prediction represents the probability that John Doe defaults on a loan, based on the respective model.

Code
# List of models
models <- list("logi_age" = logi_age, "logi_int" = logi_int, "logi_multi" = logi_multi)

# Predict the loan status for John Doe using all models
pred_John <- sapply(models, predict, newdata = John_Doe, type = "response")

# Convert to a table with appropriate column name
pred_John <- as.data.frame(pred_John)
colnames(pred_John) <- "Loan status predictions for John Doe."

# Display the predictions
pred_John
             Loan status predictions for John Doe.
logi_age.1                              0.10846236
logi_int.1                              0.09996702
logi_multi.1                            0.14461840

These values are low, as they are close to 0. We could interpret this as some ability of the models to correctly predict that John Doe will not default. However, we have already illustrated that at least the age model is highly problematic. Additionally, several questions remain unanswered and require further analysis. For example, how can we determine if the prediction is low enough to classify it as a non-default? We may need a cutoff value to make this decision. We will explore this issue later.

Another important consideration is: What about the rest of the cases in the test set? The test set contains 9,695 observations, yet the example above only evaluates the first one. Our interest lies in assessing the entire test set, not just John Doe. Fortunately, this issue is easy to resolve by modifying the newdata parameter in the predict() function. Specifically, instead of using newdata = John_Doe, which refers to a single observation, we can replace it with newdata = test, encompassing all 9,695 observations in the test set.

Code
# Predict the loan status with the three models.
pred_logi_age <- predict(logi_age, newdata = test, type = "response")
pred_logi_int <- predict(logi_int, newdata = test, type = "response")
pred_logi_multi <- predict(logi_multi, newdata = test,
                           type = "response")

pred_range <- rbind("logi_age" = range(pred_logi_age),
                     "logi_int" = range(pred_logi_int),
                     "logi_multi" = range(pred_logi_multi))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
           min(loan_st) max(loan_st)      AIC
logi_age     0.08417892    0.1165453 13580.65
logi_int     0.05019756    0.3982977 13215.61
logi_multi   0.01739107    0.4668004 13050.30

Now, instead of predicting for a single applicant, we have made predictions for all 9,695 applicants in the test set. The first two columns show the lowest and highest predicted loan_st for each model; the logistic models produce values in the range of 0 to 1, yet here the predicted ranges are quite narrow.

Narrow ranges (i.e., a small difference between the highest and lowest predicted loan_st values) could be problematic because the model might struggle to discriminate between defaults (predictions closer to 1) and non-defaults (predictions closer to 0), as suggested in Figure 1.4. The higher the AIC, the worse the in-sample model, and the lower the AIC, the better the model. Here, we see some consistency between in-sample and out-of-sample performance: the model with the widest prediction range also has the lowest AIC, indicating it is the best in-sample model.

Let’s explore the predicted loan_st values for the logi_age model. First, let’s plot all of them.

Code
# Convert the predictions to a data frame
pred_data <- data.frame(
  Observation = seq_along(pred_logi_age),  # Sequence for observation index
  Predicted_Probabilities = pred_logi_age
)

# Create the plot
ggplot(pred_data, aes(x = Observation, y = Predicted_Probabilities)) +
  geom_point(alpha = 0.6, color = "blue") +  # Points for predictions
  labs(x = "Observation (test set)", 
       y = "Default prediction") +
  theme_minimal()
Figure 1.5: Age model predictions.

Now its distribution.

Code
ggplot(data.frame(pred_logi_age), aes(x = pred_logi_age)) + 
  geom_density(fill = "red") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.6: Age model prediction histogram.

The logi_age model fails to produce predictions spanning the range from 0 to 1. In fact, its values are concentrated in a very narrow band. As a consequence, this model fails to differentiate between default and no-default predictions.
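A numeric summary of these predictions confirms how compressed they are:

Code
# All age-model predictions fall in a narrow band around 0.10.
summary(pred_logi_age)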

Let’s visualize the predictions of the logi_int and logi_multi. We first collect all predictions in a single data frame just for convenience.

Code
pred_logi <- data.frame(cbind(pred_logi_age, pred_logi_int,
                                  pred_logi_multi))
pred_logi <- gather(pred_logi, key = "model",  value = "pred")

Now we plot the logi_int and logi_multi.

Code
ggplot(pred_logi[pred_logi$model != "pred_logi_age",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.7: Interest rate and multi models predictions histograms.

Let’s add the age model as well.

Code
ggplot(pred_logi, aes(x = pred, y = model, fill = model)) + 
  geom_boxplot() +
  labs(y = "Model", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())
Figure 1.8: Age, Interest rate and multi models predictions boxplot.

Presumably, a model which considers all available predictors could be better for predicting the loan_st.

Code
# Logistic regression model using all available predictors in the data set.
logi_full <- glm(loan_st ~ age + int + grade + log(l_amnt) + 
                   log(income) + emp_len + home + sex +
                   region, family = "binomial", data = train)
# Loan status predictions for all test set elements.
pred_logi_full <- predict(logi_full, newdata = test, type = "response")
# Look at the predictions range.
range(pred_logi_full)
[1] 1.422469e-09 8.544424e-01

Now, the pred_logi_full prediction range is wider. A wider range means that some loan_st predictions get much closer to 1. This is good because we need the model to be able to predict both no-defaults (0) and defaults (1). Let’s see a prediction comparison with respect to the rest of the models.

Code
pred_range <- rbind("logi_age" = range(pred_logi_age), 
                     "logi_int" = range(pred_logi_int),
                     "logi_multi" = range(pred_logi_multi),
                     "logi_full" = range(pred_logi_full))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic, logi_full$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
           min(loan_st) max(loan_st)      AIC
logi_age   8.417892e-02    0.1165453 13580.65
logi_int   5.019756e-02    0.3982977 13215.61
logi_multi 1.739107e-02    0.4668004 13050.30
logi_full  1.422469e-09    0.8544424 10763.67
Code
pred_logi <- data.frame(cbind(pred_logi_age, pred_logi_int,
                                  pred_logi_multi, pred_logi_full))
pred_logi <- gather(pred_logi, key = "model",  value = "pred")

A graphical comparison of the new pred_logi_full and pred_logi_multi.

Code
ggplot(pred_logi[pred_logi$model != "pred_logi_age" &
                  pred_logi$model != "pred_logi_int" ,], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 1.9: Multi and full models predictions histograms.

And all of them together.

Code
ggplot(pred_logi, aes(x = pred, y = model, fill = model)) + 
  geom_boxplot() +
  labs(y = "Model", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())
Figure 1.10: Age, Interest rate, multi and full models predictions boxplot.

The logi_full model predictions look better than those of the other models.

Another question is: how can we know whether these model predictions correspond to a default or a no-default? The loan status predictions range from 0 to 0.854 in the case of the logi_full model. In the end, these estimates need to be classified as default or no-default, since that is what we care about. Are they close enough to 0 to be considered a no-default? This issue is addressed by setting up a cutoff rate so we can split all estimated loan statuses into 0 or 1.

Let us arbitrarily set a cutoff of 0.15 for now. This implies that any estimated loan status below 0.15 will be classified as 0 (no default), while any estimated loan status above 0.15 will be classified as 1 (default). This classification rule can guide financial firms in deciding whether to approve or reject new loan applications. Applications with an estimated loan status above 0.15 will be rejected, whereas those with an estimated loan status below 0.15 will be approved. Thus, the loan approval or rejection decision is now based on the model’s estimation.

Choosing a cutoff value as a classification and acceptance rule can be controversial, as managers may prefer to allocate more loans rather than fewer. In other words, managers may prefer more sales at the cost of a higher credit risk. Specifically, the higher the cutoff, the greater the number of loans allocated, since the allocation rule selects those applicants whose predicted loan status is below the cutoff.
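To make this relationship concrete, we can count how many applications each candidate cutoff would accept; a quick sketch using the full-model predictions computed above (the cutoff values are illustrative):

Code
# Number of accepted applications (predicted no-default) for several cutoffs.
cutoffs <- c(0.10, 0.15, 0.20, 0.30)
sapply(cutoffs, function(k) sum(pred_logi_full <= k))
# The higher the cutoff, the more applications are accepted.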

An alternative that balances credit risk with managerial interests is to implement multiple cutoff ranges instead of relying on a single cutoff value, such as 0.15. For instance, a financial firm could accept loan applications with an estimated loan status between 0.15 and 0.25 but charge a higher interest rate to offset the increased credit risk. Alternatively, a different perspective might involve incorporating social criteria to support individuals typically excluded by the financial industry due to their economic circumstances. In such cases, the focus could shift toward identifying applicants with lower estimated loan statuses.
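A minimal sketch of such a tiered rule, using cut() on the full-model predictions; the bands and the idea of charging a higher rate in the middle band are illustrative assumptions, not part of the analysis that follows:

Code
# Illustrative three-band policy: accept, accept at a higher interest rate, reject.
bands <- cut(pred_logi_full,
             breaks = c(-Inf, 0.15, 0.25, Inf),
             labels = c("accept", "accept_higher_rate", "reject"))
table(bands)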

For now, our cutoff and allocation rule assume that any estimated loan status below 0.15 will be classified as 0 (no default), while any estimated loan status above 0.15 will be classified as 1 (default).

Graphically, it looks like this:

Code
# Convert the predictions to a data frame
pred_data <- data.frame(
  Observation = seq_along(pred_logi_full),  # Sequence for observation index
  Predicted_Probabilities = pred_logi_full
)

# Create the plot
ggplot(pred_data, aes(x = Observation, y = Predicted_Probabilities)) +
  geom_point(alpha = 0.6, color = "blue") +  # Points for predictions
  geom_hline(yintercept = 0.15, color = "red", size = 1) +  # Horizontal line
  labs(x = "Observation (test set)", 
       y = "Default prediction") +
  theme_minimal()
Figure 1.11: Full model predictions and cutoff of 0.15.

Alternatively, we can illustrate the distribution.

Code
ggplot(pred_logi[pred_logi$model == "pred_logi_full",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  geom_vline(xintercept = 0.15, linetype = "longdash") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())
Figure 1.12: Full model predictions histogram and cutoff of 0.15.

Here, predicted loan_st values to the left of the dashed line represent no-default predictions, and values to the right of the dashed line represent default predictions. Let’s set up the rule to convert the estimated loan status into a binary (0 or 1) variable.

See how this transformation takes place:

Code
# Make a binary predictions-vector using a cutoff of 15%
pred_cutoff_15 <- ifelse(pred_logi_full > 0.15, 1, 0)
head(cbind(pred_logi_full, 
           "pred_logi_full_rounded" = round(pred_logi_full, 4), 
           pred_cutoff_15))
   pred_logi_full pred_logi_full_rounded pred_cutoff_15
1    1.357995e-08                 0.0000              0
2    2.986278e-08                 0.0000              0
18   2.299645e-02                 0.0230              0
26   2.403879e-01                 0.2404              1
27   1.778986e-01                 0.1779              1
28   2.600172e-02                 0.0260              0

These are only the first 6 rows in the test set. We can see that the rule works as expected because every estimated loan status below 0.15 is now considered as 0 (no-default), and every estimated loan status above 0.15 is considered as 1 (default). The table above shows how we can create this binary variable given the logistic model prediction.

Note that the row numbers in the table above are 1, 2, 18, 26, 27 and 28, not 1, 2, 3, 4, 5 and 6, because the test rows were selected randomly out of dat. The row numbers in the table above therefore correspond to the original positions in dat.

Is logi_full a good model after all? We can add a new column to the previous table. This new variable represents what really happened with the loan. Then, the first column is the logistic model prediction, the second column is the transformed binary variable given a cutoff of 0.15, and the third column is what actually happened (default or no-default). Let’s take a sample of 10 observations to conduct the comparison easily. Note that the model correctly predicts a no-default in most cases. Rows 308 and 329 predict a default incorrectly, and row 323 predicts a default correctly.

Code
# Let's take from rows 101 to 110.
(cbind(pred_logi_full, pred_cutoff_15, 
           as.data.frame.numeric(test$loan_st)))[101:110,]
    pred_logi_full pred_cutoff_15 test$loan_st
308   1.537106e-01              1            0
309   4.137654e-02              0            0
310   3.113926e-02              0            0
312   5.186962e-09              0            0
318   5.337092e-02              0            0
319   8.402239e-02              0            0
323   2.795535e-01              1            1
328   5.384112e-09              0            0
329   1.893745e-01              1            0
330   9.012136e-03              0            0

There is an easy way to evaluate the rest of the cases in the test set using a simple table called confusion matrix.

Code
table(Actual = test$loan_st, Predicted = pred_cutoff_15)
      Predicted
Actual    0    1
     0 6554 2081
     1  308  752

The logi_full model correctly predicts 6,554 no-defaults and 752 defaults. However, it also misclassifies 2,081 defaults as no-defaults and 308 no-defaults as defaults. How good are these results? Which of these four values is most important? These are questions we will address later. For now, it is important to note that different models and cutoff rates yield different confusion matrix results. Note that the sum of these four values equals 9,695, which corresponds to the total number of observations in the test set.
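One simple, although incomplete, summary of the confusion matrix is the overall accuracy, the share of predictions of either kind that are correct; a quick check using the objects already in memory:

Code
# Overall accuracy: correct predictions over total observations.
cm <- table(Actual = test$loan_st, Predicted = pred_cutoff_15)
sum(diag(cm)) / sum(cm)  # (6554 + 752) / 9695, roughly 0.75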

You may want to see relative and not absolute values. Let’s try the CrossTable() function instead.

Code
CrossTable(test$loan_st, pred_cutoff_15, 
           prop.r = FALSE, prop.c = FALSE, prop.t = TRUE, 
           prop.chisq = FALSE,
          dnn = c("Actual", "Predicted"))

 
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | Predicted 
      Actual |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      6554 |      2081 |      8635 | 
             |     0.676 |     0.215 |           | 
-------------|-----------|-----------|-----------|
           1 |       308 |       752 |      1060 | 
             |     0.032 |     0.078 |           | 
-------------|-----------|-----------|-----------|
Column Total |      6862 |      2833 |      9695 | 
-------------|-----------|-----------|-----------|

 

Here, 67.6% of all cases were correctly predicted as non-default (true negatives), while 21.5% were incorrectly predicted as default (false positives); together these make up 89.1% of cases and show that non-default is the majority class. Meanwhile, 7.8% of all cases were correctly predicted as default (true positives) and 3.2% were incorrectly predicted as non-default (false negatives), comprising the remaining 10.9% of cases, so default is the minority class. This suggests the model is better at predicting non-defaults (67.6% of all cases correct) than defaults (7.8% of all cases correct), although this should be read in the context of the imbalanced class distribution.

Instead of arbitrarily considering a cutoff of 0.15, we can follow a different approach. Now consider that we are interested in taking the 20% highest estimates (closer to 1) of pred_logi_full as defaults. Equivalently, this is to take the lowest 80% of pred_logi_full estimates (closer to 0) as no-defaults. Let’s calculate the new cutoff that meets this criterion.

Code
# Cutoff definition.
cutoff <- quantile(pred_logi_full, 0.8)
cutoff
      80% 
0.1994621 

This new approach (taking the 20% highest estimates of pred_logi_full as default) represents a cutoff of 0.1994621. Graphically:

Code
ggplot(pred_logi[pred_logi$model == "pred_logi_full",], 
       aes(x = pred, fill = model)) + 
  geom_density(alpha = 0.4) +
  geom_vline(xintercept = cutoff, linetype = "longdash") +
  labs(y = "Density", x = "Default prediction") +
  theme(legend.position = "none", legend.title = element_blank())
Figure 1.13: Full model prediction histogram new cutoff of 0.1994621.

Now the cutoff is 0.1994621. This splits the loan status predictions into two parts: higher than the cutoff is a default, and lower than the cutoff is a no-default. Here are the cutoff values as a function of quantiles.

Code
cutoff_all <- quantile(pred_logi_full, seq(0.1, 1, 0.1))
data.frame("cut_off" = round(cutoff_all,5))
     cut_off
10%  0.00000
20%  0.01469
30%  0.02410
40%  0.03751
50%  0.06184
60%  0.09617
70%  0.14673
80%  0.19946
90%  0.29227
100% 0.85444

We can show a similar summary table as we did with the cutoff of 0.15. Here, we show the predictive ability of the logi_full model with the new cutoff of 0.1994621.

Code
# Calculate the predictions with the same model and new cutoff.
pred_full_20 <- ifelse(pred_logi_full > cutoff, 1, 0)
# Show results in a confusion matrix.
CrossTable(test$loan_st, pred_full_20, prop.r = FALSE,
           prop.c = FALSE, prop.t = TRUE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | pred_full_20 
test$loan_st |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      7309 |      1326 |      8635 | 
             |     0.754 |     0.137 |           | 
-------------|-----------|-----------|-----------|
           1 |       447 |       613 |      1060 | 
             |     0.046 |     0.063 |           | 
-------------|-----------|-----------|-----------|
Column Total |      7756 |      1939 |      9695 | 
-------------|-----------|-----------|-----------|

 

With a cutoff of 0.1994621, we accept 7,309 + 447 = 7,756 applications, as those are the ones for which the model predicts no default. Previously, with a cutoff of 0.15, we accepted 6,554 + 308 = 6,862.

Let’s compare both confusion matrices:

Code
cat <- c("Correct no-default (true negatives).", 
         "False default (false positives).", 
         "False no-default (false negatives).", 
         "Correct default (true positives).")
cut_15 <- c("67.6%", "21.5%", "3.2%", "7.8%")
cut_1994621 <- c("75.4%", "13.7%", "4.6%", "6.3%")
cbind("Classification." = cat, cut_15, cut_1994621)
     Classification.                        cut_15  cut_1994621
[1,] "Correct no-default (true negatives)." "67.6%" "75.4%"    
[2,] "False default (false positives)."     "21.5"  "13.7%"    
[3,] "False no-default (false negatives)."  "3.2"   "4.6%"     
[4,] "Correct default (true positives)."    "7.8%"  "6.3%"     

Type I and Type II errors are key concepts in hypothesis testing and statistical decision-making, particularly relevant in loan default prediction. A Type I error (false positive) occurs when good customers, who would not default, are incorrectly rejected. On the other hand, a Type II error (false negative) happens when bad customers, who will default, are mistakenly accepted.

The new cutoff of 0.1994621 improves the identification of no-defaults but worsens the identification of defaults. Additionally, the new cutoff results in fewer false positives but more false negatives. This suggests that there is a trade-off between the two outcomes.
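A small sketch that computes both error rates for the two cutoffs, using the prediction vectors already in memory; here a "positive" means a predicted default, and each rate is expressed as a share of the actual non-defaulters (Type I) or actual defaulters (Type II):

Code
# Type I: good customers rejected; Type II: bad customers accepted.
error_rates <- function(actual, predicted) {
  cm <- table(actual, predicted)
  c(type_I  = cm["0", "1"] / sum(cm["0", ]),
    type_II = cm["1", "0"] / sum(cm["1", ]))
}
rbind("cutoff 0.15"      = error_rates(test$loan_st, pred_cutoff_15),
      "cutoff 0.1994621" = error_rates(test$loan_st, pred_full_20))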

We can also look at the details for the 0.1994621 cutoff by comparing two columns: the one on the left with the actual loan status, and the one on the right with the estimated loan status.

Code
# Comparative table in detail.
real_pred_20 <- cbind.data.frame(test$loan_st, pred_full_20,
                     "Did the model succeed?" = test$loan_st == pred_full_20)
# Show some values.
real_pred_20[131:140,]
    test$loan_st pred_full_20 Did the model succeed?
398            0            0                   TRUE
399            1            1                   TRUE
404            0            1                  FALSE
414            0            0                   TRUE
417            1            1                   TRUE
419            0            0                   TRUE
420            0            0                   TRUE
425            1            1                   TRUE
431            0            0                   TRUE
435            0            0                   TRUE

In this sample, the model fails for 1 out of 10 individuals, a fairly good result. Now, let’s imagine we are a bank with 9,695 loan applications on our desk (or computer). Suppose we use the predictions from pred_full_20 to decide whether to accept a loan application. Our model-based acceptance rule is as follows: if pred_full_20 = 0, the model predicts no default, and we accept the loan application. According to the excerpt from the table above, we correctly reject applications 399, 417, and 425 because those were indeed defaults. However, we incorrectly reject application 404, which did not default. Overall, using a model as the basis for a decision rule can lead to better outcomes than guessing or applying a random approval policy.

Let’s count how many applications are accepted and rejected according to our rule.

Code
# Accepted.
accept_20 <- sum(pred_full_20 == 0)
# Rejected.
reject_20 <- sum(pred_full_20 == 1)
data.frame("total" = length(pred_full_20), accept_20, reject_20)
  total accept_20 reject_20
1  9695      7756      1939

By taking the top 20% of the pred_logi_full estimates as defaults and the bottom 80% as non-defaults, we construct a rule that accepts 7,756 loan applications (80% of the total) and rejects 1,939 applications (20% of the total). The cutoff criterion determines the number of accepted applications, while the logi_full model specifies which applications to accept or reject.

We can evaluate this loan acceptance/rejection rule by comparing it against the actual outcomes recorded in loan_st. To begin, let’s illustrate the decision-making process based on the model’s estimates.

Code
# First 12 accept/reject decisions.
head(data.frame(real_pred_20[,1:2], 
                decision = ifelse(real_pred_20$pred_full_20 == 0, 
                                  "accept", "reject")), 12)
   test.loan_st pred_full_20 decision
1             0            0   accept
2             0            0   accept
18            1            0   accept
26            0            1   reject
27            0            0   accept
28            0            0   accept
30            0            0   accept
32            0            0   accept
34            0            1   reject
35            0            0   accept
36            0            0   accept
37            1            0   accept

We can add an evaluation column.

Code
# First 12 decisions with an evaluation column.
head(data.frame(real_pred_20[,1:2], 
                decision = ifelse(real_pred_20$pred_full_20 == 0, 
                                  "accept", "reject"),
                evaluation = ifelse(real_pred_20$pred_full_20 == 1, 
                                    "we rejected a good customer", 
 ifelse(real_pred_20$pred_full_20 == 0 & real_pred_20$`test$loan_st` == 0,
        "good decision", "bad decision"))), 12)
   test.loan_st pred_full_20 decision                  evaluation
1             0            0   accept               good decision
2             0            0   accept               good decision
18            1            0   accept                bad decision
26            0            1   reject we rejected a good customer
27            0            0   accept               good decision
28            0            0   accept               good decision
30            0            0   accept               good decision
32            0            0   accept               good decision
34            0            1   reject we rejected a good customer
35            0            0   accept               good decision
36            0            0   accept               good decision
37            1            0   accept                bad decision

In the table above, we see the outcomes of the first 12 loan applications. According to our rule, we accept 10 applications and reject 2 (applications #26 and #34). A “good decision” refers to cases where we accept a loan application that does not default, meaning we correctly identify the applicant as creditworthy. Conversely, a “bad decision” occurs when we accept a loan application that defaults, leading to a financial loss. Additionally, there are instances where we reject applications from good customers who would not have defaulted, which is also undesirable because it represents lost opportunities for profitable lending.

While this table provides valuable insights, it is somewhat problematic because it incorporates a counterfactual approach. Specifically, we are evaluating the rejected applications based on their potential outcomes if they had been accepted. This requires us to hypothesize about what would have happened, which introduces assumptions and uncertainty, as these scenarios did not actually occur. A more practical and grounded approach is to focus on evaluating the rule based solely on the cases where we do accept loan applications. This is where the concept of the “bad rate” becomes useful. The bad rate measures the proportion of accepted loans that default, providing a straightforward and actionable metric. By concentrating on the actual outcomes of accepted applications, the bad rate allows us to assess the effectiveness of our rule without relying on speculative counterfactuals, making it a more reliable and convenient evaluation method.

Remember that the real_pred_20 object is important because it contains both the binary model predictions and the actual loan status, essentially, the model’s prediction and the corresponding real outcome. In the code below, we first filter for accepted loans, which are the cases where the prediction equals 0. For those rows, we keep the value of the first column, which is the actual loan status.

As a result, accepted_loans becomes a vector of zeros and ones: a value of 0 indicates a loan that did not default, while a value of 1 indicates a loan that defaulted. The length of the accepted_loans vector, 7,756, corresponds to the number of loans allocated based on the cutoff value of 0.1994621 and the predictions from the logi_full model.

Code
# We accept loans that the model predicts a no-default (0).
# In "accepted_loans" we know whether the accepted loans are in fact
# default or no-default.
accepted_loans <- real_pred_20[pred_full_20 == 0, 1]
length(accepted_loans)
[1] 7756
In words: if we accept the application, the vector tells us what actually happened.

Code
head(accepted_loans, 10)
 [1] 0 0 1 0 0 0 0 0 0 1
Levels: 0 1

Note that the third and the tenth applications default. These are applications #18 and #37, as expected. Now let’s evaluate not just 12 applications but all 7,756 of them. The bad rate is expressed as a percentage of the total:

Code
# bad_rate is the proportion of accepted loans that are in fact default.
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
bad_rate
[1] 0.0576328

That is, by following the model-based rule, we accepted 7,756 loan applications, which represent 80% of the total. However, 5.76% of those accepted applications were in fact defaults. In particular, we accepted 447 loans that defaulted, so \(447/7756=0.0576328\).

Models are not perfect, but we are always interested in finding a good model that leads to a lower bad rate because (in principle) we do not want to accept many defaults. If we keep the same model, we could reduce this 5.76% by being more strict with loan applications, which in simple terms means reducing the acceptance rate. This alternative could be controversial, as a lower acceptance rate represents lower income (fewer customers, fewer sales) for the bank or financial firm.

Let’s consider reducing the acceptance rate from 80% to 65% so we can evaluate the impact on the bad rate.

Code
# New cutoff value.
cutoff <- quantile(pred_logi_full, 0.65)
# Split the pred_logi_full into a binary variable.
pred_full_35 <- ifelse(pred_logi_full > cutoff, 1, 0)
# A data frame with real and predicted loan status.
real_pred_35 <- cbind.data.frame(test$loan_st, pred_full_35)
# Loans that we accept given these new rules.
accepted_loans <- real_pred_35[pred_full_35 == 0, 1]
# Bad rate (accepted loan applications that are defaults).
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
# Show the bad rate.
bad_rate
[1] 0.03713107

As expected, the bad rate is lower (from 5.76% to 3.71%). That is, the lower the acceptance rate, the lower the bad rate. In the extreme, if we accept zero loan applications then our bad rate would be zero, but doing so is equivalent to going out of business. We can create a function that, given a vector of loan status predictions, returns the bad rate for different acceptance rates and cutoff values. This could be useful for building the bank’s strategy. The function will reveal the trade-off between the acceptance rate and the bad rate: the lower the acceptance rate, the lower the income (bad) and the lower the bad rate (good). So, which combination is optimal?

1.4 The bank strategy.

A bank could be interested in understanding the relationship between the acceptance rate and the bad rate given a model that predicts the loan status.

Code
bank <- function(prob_of_def) {
  # Pre-define the acceptance rates
  accept_rate <- seq(1, 0, by = -0.05)
  
  # Calculate cutoffs for each acceptance rate
  cutoff <- quantile(prob_of_def, accept_rate)
  
  # Calculate bad rates for each cutoff
  bad_rate <- sapply(cutoff, function(thresh) {
    pred_i <- ifelse(prob_of_def > thresh, 1, 0)
    pred_as_good <- test$loan_st[pred_i == 0]
    mean(pred_as_good == 1)
  })
  
  # Create the result table
  table <- cbind(
    accept_rate,
    cutoff = round(cutoff, 4),
    bad_rate = round(bad_rate, 4)
  )
  
  return(list(table = table, bad_rate = bad_rate, accept_rate = accept_rate, cutoff = cutoff))
}

We can evaluate this function for the logi_full model and compare it to a less effective model, such as logi_age. In principle, we expect the logi_full model to demonstrate a more favorable relationship between the acceptance rate and the bad rate, that is, lower bad rates for a given acceptance rate. Financial institutions are generally interested in increasing the acceptance rate without significantly increasing the bad rate.

Let’s apply the function to pred_logi_full and pred_logi_age.

Code
# Apply the bank function.
bank_logi_full <- bank(pred_logi_full)
bank_logi_age <- bank(pred_logi_age)

data.frame(accept_rate = bank_logi_age$accept_rate,
           "Bad_model_bad_rate" = bank_logi_age$bad_rate, 
           "Good_model_bad_rate" = bank_logi_full$bad_rate)
     accept_rate Bad_model_bad_rate Good_model_bad_rate
100%        1.00         0.10933471          0.109334709
95%         0.95         0.10849715          0.090662324
90%         0.90         0.10849715          0.076790831
85%         0.85         0.10849715          0.066626214
80%         0.80         0.10547397          0.057632800
75%         0.75         0.10547397          0.050612020
70%         0.70         0.10396251          0.043619216
65%         0.65         0.10396251          0.037131070
60%         0.60         0.10092961          0.029396596
55%         0.55         0.10092961          0.023443361
50%         0.50         0.09933775          0.021039604
45%         0.45         0.10206865          0.018565207
40%         0.40         0.10206865          0.015471893
35%         0.35         0.09907039          0.011788977
30%         0.30         0.09972041          0.009281540
25%         0.25         0.10343554          0.005363036
20%         0.20         0.09722922          0.003610108
15%         0.15         0.09360190          0.001374570
10%         0.10         0.10125362          0.000000000
5%          0.05         0.10516605          0.000000000
0%          0.00         0.00000000          0.000000000

The full model is superior because it achieves a lower bad rate at any given acceptance rate. A plot can effectively highlight the key differences between the two models: logi_age and logi_full.

Code
# Combine the two datasets for easier comparison
bank_logi_full$model <- "logi_full"
bank_logi_age$model <- "logi_age"
combined_data <- bind_rows(bank_logi_full, bank_logi_age)

# Add vertical and horizontal lines for visualization
highlight_lines <- tibble(
  model = c("logi_full", "logi_full", "logi_age", "logi_age"),
  accept_rate = c(bank_logi_full[["accept_rate"]][8],
                  bank_logi_full[["accept_rate"]][5],
                  bank_logi_age[["accept_rate"]][8],
                  bank_logi_age[["accept_rate"]][5]),
  bad_rate = c(bank_logi_full[["bad_rate"]][8],
               bank_logi_full[["bad_rate"]][5],
               bank_logi_age[["bad_rate"]][8],
               bank_logi_age[["bad_rate"]][5]),
  color = c("black", "red", "black", "red")
)

# Add a color column to distinguish facets
combined_data <- combined_data %>%
  mutate(facet_color = ifelse(model == "logi_full", "blue", "orange"))

ggplot(combined_data, aes(x = accept_rate, y = bad_rate, color = facet_color)) +
  geom_line(size = 1.2) +  # Use facet-specific colors for the lines
  geom_vline(data = highlight_lines, aes(xintercept = accept_rate, color = color),
             linetype = "dashed", size = 1.1) +
  geom_hline(data = highlight_lines, aes(yintercept = bad_rate, color = color),
             linetype = "dashed", size = 1.1) +
  facet_wrap(~model, nrow = 1) +
  labs(x = "Acceptance rate", y = "Bad rate") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none") +
  scale_color_identity() +  # Use the colors directly without adding them to the legend
  scale_x_continuous(breaks = c(0, 0.65, 0.8, 1)) +
  scale_y_continuous(breaks = c(0, 0.0371, 0.0576, 0.1093))
Figure 1.14: A bad and a good model.

The logi_full model is superior because, at any acceptance rate, it achieves a lower bad rate compared to the logi_age model. This advantage arises from the logi_full model’s ability to identify defaults and non-defaults with greater precision. The value of a strong model lies in its capacity to support better business decisions, in this case, more accurate credit evaluations.

The ROC (Receiver Operating Characteristic) curve is a graphical representation used to evaluate the performance of a binary classification model, such as a credit risk model designed to predict defaults and non-defaults. In this context, the ROC curve plots the true positive rate, or sensitivity, against the false positive rate (one minus specificity) across the full range of cutoff values, and hence acceptance rates.

The area under the ROC curve (AUC) quantifies the model’s ability to discriminate between the two classes: a higher AUC indicates better performance. For example, a model with an AUC close to 1 suggests it can effectively distinguish between defaulting and non-defaulting borrowers, while an AUC of 0.5 indicates no better performance than random guessing. By analyzing the trade-off between sensitivity (correctly identifying defaults) and specificity (correctly identifying non-defaults), the ROC curve helps determine an optimal threshold for classification, balancing the cost of misclassifying borrowers.

Code
# Calculate ROC curves
ROC_logi_full <- roc(test$loan_st, pred_logi_full)
ROC_logi_age <- roc(test$loan_st, pred_logi_age)

# Draw the ROC curves on one plot
plot(ROC_logi_full, col = "blue", lwd = 3)
lines(ROC_logi_age, col = "orange", lwd = 3)
legend("bottomright", 
       legend = c("Full model AUC: 0.8213", 
                  "Age model AUC: 0.5301"), 
       col = c("blue", "orange"), 
       lwd = 3)
Figure 1.15: ROC curves: Full model in blue, age model in orange.

As expected, in Figure 1.15 the area under the curve (AUC) is higher for the blue line which corresponds to the logi_full model. We can calculate the exact values for both models:

Code
# Compute the AUCs:
auc(ROC_logi_full)
Area under the curve: 0.8213
Code
auc(ROC_logi_age)
Area under the curve: 0.5301

Note that the logi_age model has an AUC of 0.5301. This is close to a loan approval process in which we randomly accept and reject applications with no further analysis. In other words, logi_age is so weak that it is almost equivalent to using no model at all and accepting or rejecting loan applications at random.

Sensitivity refers to the model’s ability to correctly identify defaults, also known as true positives. Specificity, on the other hand, measures the model’s ability to correctly identify non-default loans, also referred to as true negatives. In practical terms, a highly sensitive model minimizes the risk of missing actual defaults, ensuring that loans likely to fail are identified and rejected. However, an extreme focus on sensitivity can lead to over-rejection, where many creditworthy applicants are misclassified as high risk. This conservative approach reduces potential defaults but can severely limit revenue, harming the firm’s profitability and growth.

Conversely, a model with high specificity ensures that most creditworthy borrowers are approved, maximizing loan issuance and short-term revenue. Yet, an exclusive focus on specificity may allow a significant number of risky loans to go undetected, resulting in a higher rate of defaults. Over time, the accumulation of bad debts could undermine the firm’s financial stability, potentially leading to insolvency. Thus, balancing sensitivity and specificity is essential: the former safeguards against default-related losses, while the latter promotes profitable lending. The optimal trade-off will depend on the firm’s risk appetite, financial goals, and regulatory constraints.
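
To make these definitions concrete, here is a minimal sketch (not part of the original course material) of a helper that computes both measures from the actual loan statuses and a vector of predicted classes; the name sens_spec is ours.

Code
# Sensitivity and specificity from actual and predicted classes (sketch).
sens_spec <- function(actual, predicted) {
  tp <- sum(actual == 1 & predicted == 1) # defaults correctly flagged
  fn <- sum(actual == 1 & predicted == 0) # defaults missed
  tn <- sum(actual == 0 & predicted == 0) # non-defaults correctly accepted
  fp <- sum(actual == 0 & predicted == 1) # non-defaults wrongly rejected
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}
# For example, with the 0.2 cutoff used later in this section:
# sens_spec(test$loan_st, ifelse(pred_logi_full > 0.2, 1, 0))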

If a firm were to accept all loans, it would achieve 100% specificity by correctly identifying all non-default loans, but it would fail to identify defaults, leading to the maximum bad rate possible and significant financial losses. Conversely, if the firm were to reject all loans that might potentially default, it would achieve 100% sensitivity by correctly identifying all defaults. However, this would result in rejecting all loan applications, including creditworthy borrowers, and consequently, the firm would not issue any loans. While this would eliminate credit risk, it would also severely restrict the firm’s revenue and growth potential. In both cases, while some risks may be mitigated, the firm would face substantial operational challenges: one approach would expose the firm to financial instability due to the maximum bad rate, while the other would stifle business growth by not issuing any loans. Both scenarios emphasize the importance of balancing risk mitigation with sustainable business growth.
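
Using the hypothetical sens_spec() helper sketched above, these two extreme policies are easy to verify:

Code
# Accept everything: predict no default for every application.
sens_spec(test$loan_st, rep(0, nrow(test))) # sensitivity 0, specificity 1
# Reject everything: predict default for every application.
sens_spec(test$loan_st, rep(1, nrow(test))) # sensitivity 1, specificity 0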

Figure 1.15 covers the full range of acceptance rates. Let’s evaluate the values of sensitivity and specificity at the two acceptance rates we reviewed before: 65% and 80%.

Code
# Get the ROC coordinates.
coords_full80 <- coords(ROC_logi_full, x = 0.20)
coords_age80 <- coords(ROC_logi_age, x = 0.20)
coords_full65 <- coords(ROC_logi_full, x = 0.35)
coords_age65 <- coords(ROC_logi_age, x = 0.35)

# Create a data frame combining the results.
results_df <- data.frame(
  Model = c("logi_full", "logi_age", "logi_full", "logi_age"),
    Acceptance_rate = 1-c(coords_full80$threshold, coords_age80$threshold, 
                coords_full65$threshold, coords_age65$threshold),
  Sensitivity = c(coords_full80$sensitivity, coords_age80$sensitivity, 
                  coords_full65$sensitivity, coords_age65$sensitivity),
  Specificity = c(coords_full80$specificity, coords_age80$specificity, 
                  coords_full65$specificity, coords_age65$specificity)
)

# Show the data frame.
print(results_df)
      Model Acceptance_rate Sensitivity Specificity
1 logi_full            0.80   0.5773585   0.8473654
2  logi_age            0.80   0.0000000   1.0000000
3 logi_full            0.65   0.2754717   0.9579618
4  logi_age            0.65   0.0000000   1.0000000
Code
# Draw the ROCs on one plot with a title
plot(ROC_logi_full, col = "blue", lwd = 3)
abline(v = coords_full80$specificity, col = "red", lty = 2, lwd = 2)
abline(h = coords_full80$sensitivity, col = "red", lty = 2, lwd = 2)
abline(v = coords_full65$specificity, col = "black", lty = 2, lwd = 2)
abline(h = coords_full65$sensitivity, col = "black", lty = 2, lwd = 2)
legend("bottomright", 
       legend = c("Acceptance rate of 80%", 
                  "Acceptance rate of 65%"), 
       col = c("red", "black"), 
       lwd = 3, lty = 2)
Figure 1.16: Full model in blue, evaluated in two acceptance rates: 65% in black, 80% in red.

Let’s check if these sensitivity and specificity results are correct.

The sensitivity value is: \(\text{Sensitivity} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}\).

The specificity value is: \(\text{Specificity} = \frac{\text{True negatives}}{\text{True negatives} + \text{False positives}}\).

For the case of an acceptance rate of 80%:

Code
# Calculate the predictions with the same model and new cutoff.
pred_full_20 <- ifelse(pred_logi_full > 0.2, 1, 0)
# Show results in a confusion matrix.
CrossTable(test$loan_st, pred_full_20, prop.r = FALSE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | pred_full_20 
test$loan_st |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      7317 |      1318 |      8635 | 
-------------|-----------|-----------|-----------|
           1 |       448 |       612 |      1060 | 
-------------|-----------|-----------|-----------|
Column Total |      7765 |      1930 |      9695 | 
-------------|-----------|-----------|-----------|

 

\(\text{Sensitivity} = \frac{612}{612 + 448} = 0.5773585\).

\(\text{Specificity} = \frac{7317}{7317 + 1318} = 0.8473654\).

For the case of an acceptance rate of 65%:

Code
# Calculate the predictions with the same model and new cutoff.
pred_full_35 <- ifelse(pred_logi_full > 0.35, 1, 0)
# Show results in a confusion matrix.
CrossTable(test$loan_st, pred_full_35, prop.r = FALSE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|-------------------------|

 
Total Observations in Table:  9695 

 
             | pred_full_35 
test$loan_st |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |      8272 |       363 |      8635 | 
-------------|-----------|-----------|-----------|
           1 |       768 |       292 |      1060 | 
-------------|-----------|-----------|-----------|
Column Total |      9040 |       655 |      9695 | 
-------------|-----------|-----------|-----------|

 

\(\text{Sensitivity} = \frac{292}{292 + 768} = 0.2754717\).

\(\text{Specificity} = \frac{8272}{8272 + 363} = 0.9579618\).

In a credit risk model that predicts default and non-default cases, there is an inherent trade-off between sensitivity and specificity. Increasing sensitivity often comes at the cost of reducing specificity: lowering the cutoff for classifying defaults (which also lowers the acceptance rate) captures more actual defaulters but also increases the number of non-defaulting borrowers misclassified as defaulters (false positives). Conversely, raising the cutoff (which raises the acceptance rate) improves specificity by reducing false positives but risks missing actual defaulters (false negatives), thereby lowering sensitivity.
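
Assuming the hypothetical sens_spec() helper sketched earlier, this trade-off can be traced directly by sweeping the cutoff:

Code
# As the cutoff increases, fewer applications are classified as defaults,
# so sensitivity can only fall while specificity can only rise.
cutoffs <- c(0.10, 0.20, 0.35, 0.50)
sapply(cutoffs, function(k)
  sens_spec(test$loan_st, ifelse(pred_logi_full > k, 1, 0)))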

A pure-random approval rule would look like this:

Code
set.seed(2020)
pred_rand_model <- runif(length(pred_logi_age))
ROC_rand <- roc(test$loan_st, pred_rand_model)
# Draw the ROCs on one plot
plot(ROC_rand, col = "red")
auc(ROC_rand)
Area under the curve: 0.5105
Figure 1.17: Random rule model.

In theory, this random evaluation process would lead to an AUC of \(0.5 = (1 \times 1)/2\). In contrast, now imagine we have a perfect model:

Code
pred_perfect_model <- as.numeric(test$loan_st)
ROC_perfect <- roc(test$loan_st, pred_perfect_model)
plot(ROC_perfect)
auc(ROC_perfect)
Area under the curve: 1
Figure 1.18: Perfect model.

In a perfect model, the AUC is equal to 1 (\(1 \times 1\)). The model correctly identifies all defaults (100% sensitivity) and at the same time correctly identifies all non-defaults (100% specificity).

Edward Malthouse explains these concepts quite well:

Code
embed_url("https://youtu.be/HljSmQhLs8M")
Code
library(knitr)
library(plotly)
library(Sim.DiffProc)
library(scatterplot3d)
library(vembedr)

2 The Merton model.

The Merton model is useful when we are interested in evaluating the credit risk of public firms. In short, it evaluates how likely it is that the future value of the firm’s assets falls below a certain threshold represented by the firm’s debt. By doing so, the model can estimate a probability of default, among other results. The model can be used as a tool to propose changes (for example, in the balance sheet) to reduce the credit risk exposure of a firm.

The philosophy of these models goes back to Black and Scholes (1973), Merton (1973) and Merton (1974), who consider corporate liabilities as contingent claims on the assets of the firm.

2.1 Minimize a function in R.

The estimation of Merton’s credit risk model requires the minimization of a function. In this section, we show a simple minimization example. Here are some useful online resources for further reference.

The objective function to be minimized is \(f(a,b)=\sum(a + bx_i-y_i)^2\). We have six values for \(x\) and \(y\), and we are expected to minimize the function \(f\) by finding the parameter values \(a\) and \(b\). The \(x_i\) and \(y_i\) data looks like this:

Code
# Data.
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(1, 3, 5, 6, 8, 12))
kable(dat, caption = "Observations.", row.names = TRUE) # The table.
Observations.
x y
1 1 1
2 2 3
3 3 5
4 4 6
5 5 8
6 6 12

Graphically:

Code
# A scatter plot.
plot(y ~ x, data = dat, pch = 19, cex = 2)
Figure 2.1: Original data.

Let’s implement the \(f(a,b)=\sum(a + bx_i-y_i)^2\) minimization.

Code
# Function that calculates the residual sum of squares.
min.RSS <- function(data, par) {
  with(data, sum((par[1] + par[2] * x - y)^2)) }
# Function minimization.
result <- optim(par = c(0, 1), min.RSS, data = dat)
a <- result$par[1]
b <- result$par[2]
f <- sum((a + b * dat$x - dat$y)^2)
# Minimization output.
ans <- data.frame(a, b, f)
kable(ans, caption = "Minimization results f(a,b).")
Minimization results f(a,b).
a b f
-1.266846 2.02862 2.819048

The minimization result is represented by a line, and the line is characterized by an intercept equal to \(-1.266846\) and a slope equal to \(2.02862\).

We can plot it and verify that this is indeed the right answer.

Code
# View the results.
plot(y ~ x, data = dat, pch = 19, ylim = c(-2, 12), xlim = c(0, 6), cex = 2)
abline(a, b, col = "red", lwd = 3)
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)
Figure 2.2: Line that minimize the residual sum of squares.

The function that we minimize here is basically the OLS criterion. This is why the regression model leads to the same results.

Code
# The previous example is equivalent as a linear regression model.
reg <- lm(y ~ x, data = dat)
# See the results.
summary(reg)

Call:
lm(formula = y ~ x, data = dat)

Residuals:
      1       2       3       4       5       6 
 0.2381  0.2095  0.1810 -0.8476 -0.8762  1.0952 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.2667     0.7815  -1.621 0.180388    
x             2.0286     0.2007  10.109 0.000539 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8395 on 4 degrees of freedom
Multiple R-squared:  0.9623,    Adjusted R-squared:  0.9529 
F-statistic: 102.2 on 1 and 4 DF,  p-value: 0.000539

The function that we minimize in the estimation of the Merton credit risk model is more elaborate, but we follow the same principle as above.

2.2 The estimation.

For an introductory reference to understand the basics of credit see Brealey et al. (2020).

This is a nice introduction to the Merton model for credit risk assessment:

Code
embed_url("https://youtu.be/TjVvatkjlsM")

Let’s follow example 24.3 in (Hull 2015). We have five known parameters: \(E_0=3\), \(\sigma_E=0.8\), \(rf=0.05\), \(T=1\), \(D=10\), which lead to an estimate of the value of the assets today \(V_0\) and the volatility of the assets \(\sigma_V\). With these seven parameters we can estimate the probability of default \(pd\), among other interesting results.

These are the known parameters.

Code
# List of known parameters.
E0 <- 3 # Value of the equity as today, this is market capitalization.
se <- 0.8 # Stock return volatility.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
D <- 10 # Value of the debt. The Bloomberg function DDIS is useful.

We need equations 24.3 and 24.4 to estimate \(V_0\) and \(\sigma_V\).
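
For reference, these are the two conditions coded below. Equation 24.3 equates the Black-Scholes-Merton value of the equity (a call option on the assets) to the observed \(E_0\), and equation 24.4 links the equity and asset volatilities:

\(E_0=V_0N(d_1)-De^{-rT}N(d_2)\)

\(\sigma_E E_0=N(d_1)\sigma_V V_0\)

where \(d_1=\frac{\ln(V_0/D)+(rf+\sigma_V^2/2)T}{\sigma_V\sqrt{T}}\) and \(d_2=d_1-\sigma_V\sqrt{T}\).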

Code
eq24.3 <- function(V0, sv) { 
  ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
  D * exp(-rf * TT) * pnorm(((log(V0 / D) +
  (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) { 
  ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
      sv * V0 - se * E0)) }
# Footnote 10 indicates that we should minimize F(x,y)^2+G(x,y)^2
min.footnote10 <- function(x) 
  (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
# The minimization leads to the values of V0 and sv.
V0_sv <- optim(c(1, 1), min.footnote10)
# Define the values as parameters.
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
kable(data.frame("Market value" = V0, 
                 "Volatility" = sv), 
      caption = "Assets.")
Assets.
Market.value Volatility
12.39578 0.2123041

Calculate the probability of default \(N(-d_2)\) as a function of \(V_0\) and \(\sigma_V\).

Code
PD <- function(V0, sv) { 
  pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) / 
             (sv * sqrt(TT))) - sv * sqrt(TT))) }
# Calculate the probability of default given the previous parameters.
pd <- PD(V0, sv)
pd
[1] 0.1269396

The probability of default is 12.69396%.

Let’s analyze the role of the values of \(d_1\), \(d_2\), \(N(d_1)\) and \(N(d_2)\). First, we isolate these values.

Code
# Inspect the role of d and N(d).
d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
d2 <- d1 - sv * sqrt(TT)
Nd1 <- pnorm(d1)
Nd2 <- pnorm(d2)
ans <- data.frame(d1, Nd1, d2, Nd2)
ans
        d1       Nd1       d2       Nd2
1 1.353282 0.9120172 1.140978 0.8730604

Now, let’s illustrate how \(N(-d_2)\) leads to the probability of default given the properties of the standard normal distribution.

Code
xseq <- seq(-4, 4, 0.01) 
densities <- dnorm(xseq, 0, 1) # Standard normal distribution.
# Graphical evaluation of N(-d_2).
plot(xseq, densities, xlab = "d values (-d_2 is the dotted line)", 
     ylab = "Density", type = "l", lwd = 2, cex = 2)
abline(h = 0)
polygon(c(xseq[xseq >= -d2], -d2), c(densities[xseq >= -d2], 0), col = "green")
polygon(c(xseq[xseq < -d2], -d2), c(densities[xseq < -d2], 0), col = "red")
legend("topleft", legend = c("N(d_2)=0.8730604 
in green and
N(-d_2)=0.1269396 in red.

N(d_2)+N(-d_2)=1."),
bg = "white", bty = "n", cex = 0.9)
abline(v = -d2, lty = 2, lwd = 3)
Figure 2.3: The red area represents the probability of default.

Here you can find an alternative explanation about \(N(-d_2)\):

Code
embed_url("https://youtu.be/lV0ytJYbVzc")

Now, let’s analyze the minimization that leads to \(V_0\) and \(\sigma_V\). We do not have analytic (closed-form) expressions to calculate \(V_0\) and \(\sigma_V\); instead, we have approximate values or estimates given by the minimization of equations 24.3 and 24.4 in (Hull 2015). In order to see how good this approximation is, we can retrieve the value of \([F(x,y)]^2+[G(x,y)]^2\) evaluated at \(x=V_0\) and \(y=\sigma_V\). This value is necessarily non-negative because both \(F(x,y)\) and \(G(x,y)\) are squared, so we expect a number close to zero when evaluating at \(x=V_0\) and \(y=\sigma_V\).

Code
# Evaluate how good is the minimization.
min.footnote10(x = c(V0, sv))
[1] 1.425797e-07
Code
round(min.footnote10(x = c(V0, sv)), 6)
[1] 0

The minimization conducted in the function min.footnote10() above worked fairly well as the value is zero in practical terms.

There is another way to assess how well the minimization works: a graphical analysis. In particular, we can verify whether \(V_0=12.39578\) and \(\sigma_V=0.2123041\) minimize the objective function. To illustrate this in two axes, we need to fix either \(V_0\) or \(\sigma_V\) and evaluate \([F(x,y)]^2+[G(x,y)]^2\) over different values of the other. Let’s do that.

Code
# Fix sv and evaluate different values of V0.
V0.seq <- seq(from = 4, to = 20, length.out = 100)
sv.rep <- rep(sv, 100)
# Function to be evaluated.
FG <- function(V0, sv) { (eq24.3(V0, sv))^2 + (eq24.4(V0, sv))^2 }
# Apply the function with fixed sv and different V0 values.
FG.V0 <- mapply(FG, V0.seq, sv.rep)

We are ready to plot.

Code
# Plot the results.
plot(V0.seq, FG.V0, type = "l", xlab = expression(paste(V[0])),
     ylab = expression(paste(F(x,y)^2+G(x,y)^2)), cex.lab = 0.8, lwd = 3)
abline(v = V0, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(V0, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.4: Here we fix the volatility of the assets and change the value of the assets at time zero. The minimization looks fine.

Clearly, the minimization worked well here. For the sake of completeness, now we fix the \(V_0\) value and evaluate the function again.

Code
# Now fix V0, and evaluate different sv values.
sv.seq <- seq(0, 0.6, length.out = 100)
V0.rep <- rep(V0, 100)
# Evaluate the function with these parameters.
FG.sv <- mapply(FG, V0.rep, sv.seq)
# Plot the results.
plot(sv.seq, FG.sv, type = "l", xlab = expression(paste(sigma[V])), 
     ylab = expression(paste(F(x, y)^2 + G(x, y)^2)), 
     cex.lab = 0.8, lwd = 3)
abline(v = sv, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(sv, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.5: Here we fix the value of the assets at time zero and change the volatility of the assets. The minimization looks fine.

We can show the same as before but in a 3D plot.

Code
# outer evaluates the function for each combination of the vectors.
z <- outer(V0.seq, sv.seq, FG) # z is now 100x100 matrix.
persp(V0.seq, sv.seq, z, col = "springgreen", shade = 0, xlab = "V0",
      ylab = "sv", zlab = "Objective function.")
Figure 2.6: Minimization of the function.

Contour plots or level plots are a way to show a three-dimensional surface on a two-dimensional plane.

Code
# The nlevels is high to emphasize the minimum value in blue.
contour(V0.seq, sv.seq, z, xlab = expression(paste(V[0])), 
        ylab = expression(paste(sigma[V])), 
        lwd = 2, nlevels = 300)
points(V0, sv, pch = 19, col = "blue", cex = 2)
Figure 2.7: Minimization of the function: a contour view.

The green surface plot above is nice, but it is not interactive. We can produce interactive three-dimensional plots with other R packages such as plotly and show them in an HTML (JavaScript) environment. The following plot cannot be seen in a PDF output, but it can be explored interactively in a browser.

Code
plot_ly(type = "surface", x = sv.seq, y = V0.seq, z = z) %>%
layout(#title = "Minimization of f at V0=12.395 and sv=0.212",
       scene = list(xaxis = list(title = "Assets volatility"), 
                    yaxis = list(title = "Assets"), 
                    zaxis = list(title = "f=F^2+G^2")))  %>%
hide_colorbar()
Figure 2.8: Minimization of the function: an interactive view.
Code
#add_markers(y = 12.3957765, x = 0.2123041, z = 0, inherit = TRUE)

The Merton model allows us to calculate more values in this credit risk assessment.

Code
# Other results in the Merton's model.
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
  pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
ans <- data.frame(market_value_debt, pv_promised_payment_debt, 
                  Expected_Loss, recovery_rate, pd)
kable(t(ans), caption = "Other results in the Merton's model.")
Other results in the Merton’s model.
market_value_debt 9.3957765
pv_promised_payment_debt 9.5122942
Expected_Loss 0.0122492
recovery_rate 0.9035039
pd 0.1269396

Could we figure out the implied bond yield spread? It would be similar to the fair premium over the risk-free rate that this firm has to pay given its probability of default and recovery rate. Take Hull’s equation 24.2 as a reference, \(\lambda(T)=S(T)/(1-R)\), and solve for \(S(T)\): \(S(T)=\lambda(T)\times(1-R)\), so \(S(T)=(0.12693963)\times(1-0.90350393)\). This is 122.4918 basis points above the risk-free rate. Then, with five parameters we can calculate (among other things) the firm’s probability of default and the corresponding yield spread of this firm’s bond. It would be interesting to compare this value with the market yield spread and evaluate whether the market yield is over- or underestimated according to our theoretical value.
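
As a quick sanity check, the same number can be reproduced from the objects already computed above (here pd is still the scalar probability estimated earlier, not the function defined later in this document):

Code
# S(T) = lambda(T) x (1 - R), using the estimated pd and recovery rate.
pd * (1 - recovery_rate) # about 0.0122, i.e., roughly 122.49 basis points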

This is the spread function.

Code
spread <- function(E0, se, rf, TT, D) { 
  eq24.3 <- function(V0, sv) { 
  ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - 
      D * exp(-rf * TT) * pnorm(((log(V0 / D) +
  (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) { 
  ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
      sv * V0 - se * E0)) }
  min.footnote10 <- function(x) 
  (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
          (sv * sqrt(TT))) - sv * sqrt(TT)))

market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
  pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
spread <- pd * (1-recovery_rate)
spread
}

Let’s evaluate the spread function.

Code
spread(E0 = 3, se = 0.8, rf = 0.05, TT = 1, D = 10)
[1] 0.01224917

2.3 Inside view of the Merton model.

The model assumes that the assets follow a geometric Brownian motion stochastic process. A regular Wiener stochastic process looks like this.

Code
embed_url("https://youtu.be/X-VRWeqG_I8")

Let’s go back to our main example and simulate the daily evolution of the market value of the firm’s assets over one year. Here are 10 simulated paths of \(V_t\) from \(t=0,...,T\). The references for this section are Chapters 14 and 15 of (Hull 2015).
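
For reference, under the (risk-neutral) dynamics assumed here the assets follow

\(dV_t=rf\,V_t\,dt+\sigma_V V_t\,dW_t\)

which implies a log-normal distribution for the terminal value:

\(\ln V_T \sim N\left(\ln V_0+(rf-\sigma_V^2/2)T,\; \sigma_V^2 T\right)\)

This is why the GBM() call below uses a drift (theta) equal to the risk-free rate and a volatility equal to \(\sigma_V\), and why a later example can sample \(V_T\) directly from this distribution instead of simulating whole paths.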

Code
# Merton assumes V follows a geometric Brownian motion process.
V0.sim <- function(i) {  # the argument i is not used here.
  GBM(N = 365, T = 1, t0 = 0, x0 = V0, theta = 0.05, sigma = sv) }
set.seed(3) # Reproducibility  
paths <- sapply(1:10, V0.sim) # Create 10 paths for V.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
points(rep(366, 8), paths[366, paths[366, ] > 10], 
       pch = 1, cex = 1.5, col = "green")
points(rep(366, 2), paths[366, paths[366, ] < 10], 
       pch = 1, cex = 1.5, col = "red")
Figure 2.9: Simulation of 10 paths of the assets.

These simulations can be interpreted as the evolution of the value of the firm’s assets in 10 parallel universes. In the past, some students have called this figure la gráfica de los pelitos (the little-hairs chart); this is fine as long as you understand its implications. The proper name is 10 simulated geometric Brownian motion processes. Let’s count the cases in which \(V_t<D\); note that here I use \(t\) and not \(T\), so we are counting the cases in which the value of the assets was lower than 10 at least at some time \(t\).

Code
# Which path went lower than D?
which(colSums(paths < 10) > 0)
[1] 2 4 9

The value of the assets was lower than $10 at some time in the second, fourth and ninth alternate universes. Let’s plot these cases.

Code
# There might be an easier way to select these cases.
matplot(paths[,which(colSums(paths < 10) > 0)], type = "l", 
        col = c("green", "blue", "red"), lwd = 2, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Figure 2.10: Note that the blue path did not default in the end. Recovery is possible, as in real life.

The blue path fell below 10 at some point, but at \(t=T\) it is clearly higher than 10. For this reason, the blue path does not represent a default. Only in two cases, green and red, are the assets not high enough to pay the debt at maturity. Thus, according to this simulation the probability of default is 20%. Here is how we can count the cases in which \(V_T<D\).

Code
sum(paths[366,] < 10) / 10
[1] 0.2

A probability of default of 20% may seem disappointing because it is significantly higher than 12.693963%. This is because the number of simulations is small. The following examples show how these values converge as we increase the number of simulations from 10 to 100 and then to 100,000.

Here are 100 simulated paths of \(V_0\) to \(V_T\).

Code
set.seed(3) # Reproducibility  
paths <- sapply(1:100, V0.sim) # Create 100 paths.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Figure 2.11: Simulation of 100 paths of the assets.

Given the simulation above, the probability of default is 17%. This is closer to 12.693963%.

Code
sum(paths[366,] < 10) / 100
[1] 0.17

Let’s see all the 17 cases that end in default.

Code
# These are the 17 default cases.
matplot(paths[,which(paths[366,] < 10)], type = "l", col = "black", 
        lwd = 1, lty = 1, ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Figure 2.12: Default cases.

Now, let’s see the 11 cases in which \(V_t\) fell below $10 at some point but did not end in default.

Code
# V_t < 10
almost_default <- which(colSums(paths < 10) > 0)
# V_T < 10
default <- which((paths[366,] < 10))
# There should be an easier way to do this:
matplot(paths[,almost_default[which(almost_default %in% 
                                      default == FALSE)]], 
        type = "l", col = c(1:11), lwd = 1, lty = 1, 
        ylab = expression(paste(V[t])), 
        xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Figure 2.13: All these paths went lower than 10 at some point, but at the end the value of the assets is higher than 10. These are parallel universes in which the firm finally survived after some drama.

Here are 100,000 simulated values of \(V_T\) shown in a histogram. The reference for this section is Section 15.1 of (Hull 2015). Note that instead of drawing the complete paths, we can directly sample the distribution of \(V_T\). The resulting distribution is log-normal.

Code
set.seed(13)
# 100,000 values of VT at once.
VT <- exp(rnorm(100000, log(V0) + (rf - (sv^2) / 2)*TT, sv*sqrt(TT)))
# Plot results.
h <- hist(VT, 100, plot = FALSE)
ccat <- cut(h$breaks, c(-Inf, D, Inf))
# You may also need to change the xlim if a big red square appears as changes in maturity may imply larger values of VT. I recommend trying without xlim and see.
plot(h, main = NULL, col = c("red", "blue")[ccat], 
     xlab = "Value of the assets at maturity", xlim = c(4, 30))
legend("topright", 
legend = c("The red area is 12.657%, and represents
the simulated probability of default.
The probability of default of the model
following the textbook example is 
N(-d_2)=12.69396%. As we can see,
both are very close."),
col = c("red", "blue"), pch = c(19), bg = "white", bty = "n", cex = 0.8)
Figure 2.14: Histogram of 100,000 values of the assets at maturity.

The probability of default can be calculated as:

Code
sum(VT < 10) / 100000
[1] 0.12657

Note that 0.12657 is now much closer to 0.12693963. Convergence achieved.

This firm’s equity can be represented as a European call option payoff. The payoff of a typical stock option is \(c=\max(S_T-K, 0)\), or equivalently in terms of Merton’s model, \(E_T=\max(V_T-D, 0)\).

Code
ET <- pmax(VT - D, 0) # payoff function of a typical call option.
plot(sort(VT), sort(ET), type = "l", lwd = 4, xlim = c(5, 20), 
     ylim = c(0, 10), xlab = "Simulated assets at maturity (V_T)",
     ylab = "Simulated equity at maturity (E_T)")
abline(v = 10, lty = 2)
legend("left", legend = c("According to the model,
E_T=0 happens 
in 12.657% of the 
100,000 cases."),
bg = "white", bty = "n", cex = 0.9)
Figure 2.15: A view of the firm’s balance sheet in the future. Looks like a typical European call option payoff.

This diagram shows the firm’s market values in the future, at maturity, because \(E_T\) is a function of \(V_T\) and \(D\). As long as \(V_T<D\), the value of \(E_T\) is 0. Otherwise, \(E_T=V_T-D\), just as the accounting equation suggests but in future terms. Note that Merton’s model says nothing about the estimated value of \(E_T\); instead, we have an estimate of how likely it is that \(V_T<D\), or in other words, how likely it is that \(E_T=0\).

Then, we know that \(E_T=\max(V_T-D, 0)\). The Black-Scholes-Merton formula can also estimate the theoretical value of \(E_0\). That is, if the observable market value given by the market capitalization available on any financial site is \(E_0=3\), we can retrieve the theoretical value of \(E_0\) and compare whether the firm’s observable market value is over- or undervalued. In other words, the Black-Scholes-Merton model can give us an estimate of the firm’s value. In particular, the model gives the value of the equity today as:

\(E_0=V_0N(d_1)-De^{-rT}N(d_2)\)

This is equation 24.3 in (Hull 2015). To estimate \(E_0\), we need \(V_0\) and \(\sigma_V\), and these two values were already estimated above with the min.footnote10 function. It is true that this minimization requires \(E_0\); the point here is to use the observable market capitalization \(E_0=3\) to find the unobservable \(V_0\) and \(\sigma_V\), and then estimate the theoretical value of \(E_0\).

The code is:

Code
# Black-Scholes call.
E0.theo <- function(V0, D, rf, TT, sv) {
  d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
  d2 <- d1 - sv * sqrt(TT)
  c <- V0 * pnorm(d1) - D * exp(-rf * TT) * pnorm(d2)
}
V0 <- 12.39578 # Assets.
D <- 10 # Debt.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
sv <- 0.2123041 # Volatility of assets.
(E0.theo(V0, D, rf, TT, sv))
[1] 3.000357

Then, we can argue that the market value of the firm is slightly undervalued, since the theoretical value of the equity is 3.000357 compared with the observed 3. This approach opens the possibility of valuing firms. It also allows us to compare the book value of total assets in the balance sheet with the estimate of \(V_0\), so we can understand the difference between book value and market value of the assets, not only of the equity. We can even implement a similar analysis for the market value of the debt, just as we did before.
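
As a brief illustration of that last point (a sketch, not part of Hull’s example), the model-implied market value of the debt today can be recovered either as \(V_0-E_0\) or directly as the value of a risky bond, \(V_0N(-d_1)+De^{-rT}N(d_2)\); both expressions are algebraically equivalent.

Code
# Model-implied market value of the risky debt today, two equivalent routes
# (equal up to the numerical precision of the optim() estimates).
V0 - E0.theo(V0, D, rf, TT, sv)
V0 * pnorm(-d1) + D * exp(-rf * TT) * pnorm(d2)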

2.4 The probability of default as a function of some parameters.

We can expand our analysis by evaluating how changes in the parameters change the probability of default in the context of the Merton model. This is interesting because we can move from estimating a probability of default to proposing changes in the firm’s parameters, such as its capital structure, to reduce its probability of default. First, we need the probability of default as a function.

Code
pd <- function(E0, se, rf, TT, D) { 
  eq24.3 <- function(V0, sv) { 
  ((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
  D * exp(-rf * TT) * pnorm(((log(V0 / D) +
  (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) { 
  ((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
      sv * V0 - se * E0)) }
  min.footnote10 <- function(x) 
  (eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
          (sv * sqrt(TT))) - sv * sqrt(TT)))
pd }

Now we have a convenient function: \(pd = f(E_0, \sigma_E, rf, T, D)\). Remember these five parameters are required to estimate \(V_0\) and \(\sigma_V\) before calculating \(pd\).

Code
# Evaluate the case of the textbook example.
pd1 <- pd(3, 0.8, 0.05, 1, 10)
pd1
[1] 0.1269396

Now, we create vectors of parameters and evaluate the pd function.

Code
l = 50 # Vectors of length 50.
# Create vectors of the parameters.
x.E <- seq(from = 1, to = 20, length.out = l)
x.rf <- seq(0, 4, length.out = l)
x.D <- seq(1, 20, length.out = l)
x.T <- seq(0.5, 20, length.out = l)
x.se <- seq(0, 3, length.out = l)
# Evaluate the pd at different values.
pd.E <- mapply(pd, x.E, se, rf, TT, D)
pd.rf <- mapply(pd, E0, se, x.rf, TT, D)
pd.D <- mapply(pd, E0, se, rf, TT, x.D)
pd.T <- mapply(pd, E0, se, rf, x.T, D)
pd.se <- mapply(pd, E0, x.se, rf, TT, D)

Let’s plot the probability of default as a function of equity. What we are doing here is to evaluate the probability of default function 50 times, as the \(E_0\) has 50 values from 1 to 20. By doing this, we end up with 50 values of the probability of default.

Code
plot(x.E, pd.E, type = "l", ylab = "Probability of default", 
     xlab = "Equity at time zero", lwd = 3, ylim = c(0, 0.15))
lines(x.E, pd.E)
abline(v = E0, lty = 2)
abline(h = pd1, lty = 2)
points(E0, pd1, pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.16: The probability of default as a function of the equity at time zero.

These relationships between the parameters and the probability of default are all non-linear. This is interesting because according to the model, the current levels of the parameters are important when deciding which parameter might lead to a high or low impact over the probability of default.

Code
plot(x.rf, pd.rf, type = "l", ylab = "Probability of default", 
     xlab = "Risk-free rate", lwd = 3)
abline(h = 0, lty = 2)
abline(v = rf, lty = 2)
abline(h = pd1, lty = 2)
points(rf, pd1, pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.17: The probability of default as a function of the risk-free rate.

Probability of default as a function of debt. Note that adding or subtracting one unit of debt has a different impact on the probability of default, as quantified after the figure below.

Code
plot(x.D, pd.D, type = "l", ylab = "Probability of default", 
     xlab = "Debt", lwd = 3, ylim = c(0, 0.15))
abline(h = 0, lty = 2)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
points(D, pd1, pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.18: The probability of default as a function of the debt.
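
The asymmetry mentioned above is easy to quantify with the pd() function; here is a quick sketch using the base-case parameters already in memory (pd1 is the base-case probability of 0.1269396):

Code
# Compare the change in pd from adding versus removing one unit of debt
# around the base case D = 10. Both differences are positive because the
# pd is increasing in the debt, but they are not equal.
pd(E0, se, rf, TT, 11) - pd1 # effect of one more unit of debt
pd1 - pd(E0, se, rf, TT, 9)  # effect of one less unit of debt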

Probability of default as a function of maturity. Note that asking for a longer debt maturity (as in a negotiation) does not lead to a lower probability of default if we keep the rest of the firm’s parameters unchanged. Imagine we arrange a meeting with the bank manager and ask for more time to pay our debt. The bank manager should not (in principle) extend the debt maturity unless we commit to making some changes in the firm. For example, we can negotiate a longer maturity and promise to increase the firm’s equity.

Code
plot(x.T, pd.T, type = "l", ylab = "Probability of default", 
     xlab = "Maturity", lwd = 3)
abline(h = 0, lty = 2)
abline(v = TT, lty = 2)
abline(h = pd1, lty = 2)
points(TT, pd1, pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.19: The probability of default as a function of the debt’s maturity.

Probability of default as a function of the volatility of the equity \(\sigma_E\).

Code
plot(x.se, pd.se, type = "l", ylab = "Probability of default",
     xlab = "Standard deviation of equity", lwd = 3)
abline(h = 0, lty = 2)
abline(v = se, lty = 2)
abline(h = pd1, lty = 2)
points(se, pd1, pch = 1, cex = 3, col = "red", lwd = 2)
Figure 2.20: The probability of default as a function of the volatility of the equity.

In the previous plots we evaluated the probability of default function by changing one of the following parameters: \(E_0\), \(\sigma_E\), \(rf\), \(T\) or \(D\). We were able to do that because we constructed a vector of 50 different values for each of these five parameters. Note that \(V_0\) and \(\sigma_V\) do not remain fixed because they are themselves functions of \(E_0\), \(\sigma_E\), \(rf\), \(T\) and \(D\). Now, let’s start by plotting the probability of default as a function of \(E_0\) and \(\sigma_E\).

One alternative is to evaluate the probability of default function 50 times by taking the 50 pairs of values of \(E_0\) and \(\sigma_E\). By doing that, we end up with three vectors of size 50 that we can plot as a three-dimensional scatter plot.

Code
ED <- mapply(pd, x.E, x.se, rf, TT, D)
p.ED <- scatterplot3d(x.E, x.se, ED, pch = 16, type = "h", color = 1:50, 
                      angle = 100, xlab = "Equity at time zero", 
                      ylab = "Volatility of equity", 
                      zlab = "Probability of default")
p.ED$points3d(E0, se, pd1, type = "h", col = "red", pch = 20, cex = 3)  
Figure 2.21: The probability of default as a function of the equity at time zero and the volatility of equity.

The red point represents the case of the original textbook example in which \(pd = f(E_0=3, \sigma_E=0.8, rf=0.05, T=1, D=10)=0.1269396\). Note that the red point is not part of the 50 plotted observations simply because there is no case in which the probability of default function is evaluated at \(E_0=3, \sigma_E=0.8\). Although the plot above is not incorrect, it might be incomplete because we are not showing all possible values of the probability of default. One alternative for a more complete plot is to evaluate the probability of default function over all possible combinations of \(E_0\) and \(\sigma_E\). If we do that, we will have 50 values for \(E_0\) and \(\sigma_E\) that represent the \(x\) and \(y\) axes, and a \(50 \times 50\) matrix containing the probability of default. This allows us to draw a surface plot and a contour plot.

Code
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
  for(j in 1 : l){
    p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
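
Answering the question in the comment above: assuming the pd() function defined earlier, the same matrix can be filled without explicit loops by combining outer() with Vectorize():

Code
# Vectorize() is needed because pd() expects scalar arguments; outer() then
# evaluates every (E0, sigma_E) combination in a single call.
p_E_se_alt <- outer(x.E, x.se, Vectorize(function(e, s) pd(e, s, rf, TT, D)))
max(abs(p_E_se_alt - p_E_se)) # should be exactly zero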

Let’s plot the results.

Code
# Plot results.
p.ED1 <- persp(x.E, x.se, p_E_se, zlab = "pd", 
               xlab = "Equity at time zero", 
               ylab = "Volatility of equity", theta = 60, phi = 10, 
               expand =  0.5, col = "orange", shade = 0.2, 
               ticktype = "detailed") 
# Add the original pd value as in the textbook example.
points(trans3d(E0, se, pd1, p.ED1), cex = 2, pch = 19, col = "blue")
Figure 2.22: The probability of default as a function of the equity at time zero and the volatility of equity: A plane or surface view.

Now the plot becomes a surface. A contour plot simplifies the reading of a three-dimensional plot by reducing it to two dimensions: the \(x\) and \(y\) coordinates. The value of the probability of default is represented by the contour lines.

Code
contour(x.E, x.se, p_E_se, xlab = "Equity at time zero", 
        ylab = "Volatility of equity", lwd = 2)
points(E0, se, pch = 19, col = "blue", cex = 2)
Figure 2.23: The probability of default as a function of the equity at time zero and the volatility of equity: A contour view.

The blue point is the original case. Note that the value is slightly higher than 0.1, which is consistent with the \(pd=0.1269396\).

Code
l = 40 # Vectors of length 40.
# Create vectors of the parameters.
x.E <- seq(from = 2, to = 4, length.out = l)
x.se <- seq(0.2, 1.5, length.out = l)
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
  for(j in 1 : l){
    p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
Code
plot_ly(type = "surface" , x = x.se, y = x.E , z = p_E_se ) %>%
layout(#title = "Minimization of f at V0=12.395 an sv=0.212",
       scene = list(xaxis = list(title = "Volatility of equity"), 
                    yaxis = list(title = "Market value of equity"), 
                    zaxis = list(title = "Probability of default")))  %>%
  hide_colorbar()
Figure 2.24: The probability of default as a function of the equity at time zero and the volatility of equity: An interactive view.
Code
#add_markers(y = 3, x = 0.8, z = 0.1269396, inherit = TRUE)

2.5 GoT: capital structure.

The problem. Daenerys Targaryen owns a big manufacturing firm that produces fire extinguishers. Surprisingly, the firm’s data corresponds exactly to the data in example 24.3 of (Hull 2015). She is planning to ask the Iron Bank of Braavos for a loan using the peaceful, civilized way (without dragons). However, she would like to reduce the probability of default of the firm first; that way she might negotiate better credit conditions. Daenerys has some understanding of very basic finance: she knows that she could either reduce the debt or increase the capital in order to reduce the probability of default. Doing both at the same time is clearly more expensive, so she is not very keen on it.

What is the best strategy to reduce the probability of default? Reduce the debt by 2 or increase the capital by 2?
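
Before looking at the plots, the two alternatives can be compared directly with the pd() function (a quick numeric sketch):

Code
# Daenerys' two alternatives, starting from E0 = 3, D = 10 and pd = 12.69%.
pd(3, se, rf, TT, 8)  # reduce the debt by 2
pd(5, se, rf, TT, 10) # increase the capital by 2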

Code
D.seq <- seq(from = 0, to = 13, by = 0.1)
# Evaluate pd function: E0 changes from 1 to 5; debt goes from 0 to 13.
E1 <- mapply(pd, 1, se, rf, TT, D.seq)
E2 <- mapply(pd, 2, se, rf, TT, D.seq)
E3 <- mapply(pd, 3, se, rf, TT, D.seq)
E4 <- mapply(pd, 4, se, rf, TT, D.seq)
E5 <- mapply(pd, 5, se, rf, TT, D.seq)

Plot the results.

Code
# We use these vectors for colors and legends in the following plots.
colors <- c("green", "purple", "black", "blue", "red")
leg <- c("E0=1", "E0=2", "E0=3 (initial value)", "E0=4", "E0=5")
# Plot.
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=10",
     ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(D, pd1, pch = 19, cex = 2)
Figure 2.25: GoT problem. Probability of default and capital structure.

Let’s illustrate the alternatives as clearly as possible.

Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=10",
     ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd1, lty = 2)
abline(h = pd(E0, se, rf, TT, 8), lty = 2)
abline(h = pd(5, se, rf, TT, D), lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(D, pd1, pch = 19, cex = 2)
points(D, pd(5, se, rf, TT, D), pch = 19, cex = 2, col = "red")
points(8, pd(3, se, rf, TT, 8), pch = 19, cex = 2, col = "yellow")
Figure 2.26: GoT solution. Probability of default and capital structure.

Consider a different initial situation. Now, \(E_0=2\) and \(D=2\) and the rest remains the same. In this case, the probability of default is now 7.13%. What is the best strategy to reduce the probability of default? Reduce the debt by 1 or increase the capital by 1?

Code
leg2 <- c("E0=1", "E0=2 (initial value)", "E0=3", "E0=4", "E0=5")

plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=2",
     ylab = "Probability of default. Initially PD=7.13%", 
     xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors, 
       bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
Figure 2.27: GoT problem 2. Probability of default and capital structure.

The solution can be illustrated as follows:

Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=2",
     ylab = "Probability of default. Initially PD=7.13%", 
     xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(v = 1, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
abline(h = pd(2, se, rf, TT, 1), lty = 2)
abline(h = pd(E0, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors, 
       bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
points(1, pd(2, se, rf, TT, 1), pch = 19, cex = 2, col = "yellow")
points(2, pd(E0, se, rf, TT, 2), pch = 19, cex = 2)
Figure 2.28: GoT solution 2. Probability of default and capital structure.

Interesting.

Consider a third problem. In a parallel universe, Daenerys Targaryen’s firm has some short-term liquidity troubles. In this universe the current debt is \(D=6\) with maturity \(T=1.5\), so the probability of default is 20.33%. She needs either more cash now or some more time to pay her current debt. She would like to know which alternative leads to the smallest increase in the probability of default: add $2 more to her current debt, or ask for half a year more to pay it?
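
Before the plots, a direct numeric comparison of the two alternatives (a quick sketch):

Code
# The two alternatives, starting from E0 = 3, D = 6 and T = 1.5.
pd(E0, se, rf, 1.5, 6) # initial situation, about 20.33%
pd(E0, se, rf, 1.5, 8) # add 2 more to the current debt
pd(E0, se, rf, 2.0, 6) # half a year more time to pay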

Code
T1 <- mapply(pd, E0, se, rf, 0.5, D.seq)
T2 <- mapply(pd, E0, se, rf, 1, D.seq)
T3 <- mapply(pd, E0, se, rf, 1.5, D.seq)
T4 <- mapply(pd, E0, se, rf, 2, D.seq)
T5 <- mapply(pd, E0, se, rf, 2.5, D.seq)
Code
leg3 <- c("T=2.5", "T=2", "T=1.5", "T=1", "T=0.5")

plot(D.seq, T5, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=6",
     ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
Figure 2.29: GoT problem 3. More debt or longer maturity?

Let’s visualize the result.

Code
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
     xlab = "Debt value. Initially, D=6",
     ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 2, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 8), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors, 
       bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
points(8, pd(E0, se, rf, 1.5, 8), pch = 19, cex = 2, col = "yellow")
points(6, pd(E0, se, rf, 2, 6), pch = 19, cex = 2, col = "purple")
Figure 2.30: GoT answer 3. More debt or longer maturity?

Nice.

Code
# Copula models
library(MASS)
library(knitr)
library(viridis)
library(dplyr)
library(mnormt) #dnorm()
library(ggplot2)
library(plotly)
library(rayshader)
library(vembedr)

3 The Gaussian copula model.

Copulas allow us to decompose a joint probability distribution into its marginals (which by themselves carry no information about the dependence structure) and a function that couples (hence the name) them together, which allows us to specify the correlation separately. The copula is that coupling function. Here, we introduce the simplest type of copula through a very common problem in credit risk: the time to default.

3.1 The basics.

In a finance context, the variable \(x\) represents a firm’s future (unknown) performance measure that goes from \(-4\) to \(4\) on the horizontal axis. Strictly speaking, more extreme values of \(x\), approaching \(-\infty\) and \(+\infty\), are theoretically possible but happen very infrequently, at least in real-life situations. At the moment, we do not care too much about what kind of performance measure this is; it could be liquidity, for example, solvency, or any other measure that is normalized to take values between \(-4\) and \(4\). In a statistics context, we can think of \(x\) as equivalent to the so-called \(z\)-score of the standard normal distribution.

Code
# The standard normal distribution function is: y = f(x).
x <- seq(-4, 4, 0.001) # First define x.
y <- 1 / sqrt(2 * pi) * exp(-x ^ 2 / 2) # Define y as a function of x.
# Now plot.
plot(x, y, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
Figure 3.1: Standard normal distribution function.

The copula model, including the Gaussian one, considers that the future firm performance measure \(x\) on the horizontal axis is related to the firm’s probability of default (from 0 to 1, or 0% to 100%), represented by the value of the cumulative distribution function pnorm() of the normal distribution.

Assume the firm will default in six months if the firm performance \(x\) in six months is lower than a given threshold: \(\bar{x}=-2.5\). The threshold \(\bar{x}=-2.5\) in the Gaussian copula model is equivalent to \(-d_2\) in the Merton model. Then, the firm's six-month probability of default in math notation is \(N(-2.5)\), or pnorm(-2.5)=0.006209665 in R code, which is very low as it is close to 0. It is the red area below, which is 0.62% of the 100% possible area under the curve.

Code
# Useful to construct next plots.
D <- dnorm(x) 
P <- pnorm(x)
Code
plot(x, D, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = -2.5, lty = 2)
polygon(c(x[x < -2.5], -2.5), c(D[x < -2.5], 0), col = "red")
Figure 3.2: Six months probability of default.

Now assume the firm will default in one year if the firm performance \(x\) in one year is lower than a higher threshold: \(\bar{x}=-2\). Then, the one-year probability of default is \(N(-2)\) or pnorm(-2)=0.02275013, which is still low but higher than the six-month probability of default. Note that the new red area is 2.27%.

Code
plot(x, D, type = "l", lwd = 3, col = "blue" , 
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = -2, lty = 2)
polygon(c(x[x < -2], -2), c(D[x < -2], 0), col = "red")
Figure 3.3: One year probability of default.

At the moment, we have not discussed how to estimate the threshold values \(\bar{x}\). However, we can say that the Gaussian copula model assumes that the probability of default increases as time passes, as if the firm would eventually default given a long enough horizon. The six-month probability of default of 0.62% is low because it is not very likely that the future firm performance lies below -2.5 in six months. The one-year probability of default of 2.27% is a bit higher because it is more likely that the future firm performance lies below -2 in one year.

Then, the pnorm() R function allows us to transform threshold values \(\bar{x}\) into probabilities of default. Transforming probabilities back into \(\bar{x}\) values is also possible, since the function qnorm() is the inverse of pnorm().

From \(\bar{x}\) to probabilities:

Code
pnorm(-2.5)
[1] 0.006209665
Code
pnorm(-2)
[1] 0.02275013

From probabilities to \(\bar{x}\):

Code
qnorm(0.006209665)
[1] -2.5
Code
qnorm(0.02275013)
[1] -2

Nice.

The function dnorm() is relevant when we implement a graphical approach because it represents how frequent (or how likely) these values are given the standard normal distribution, so in both extreme values of \(x\) the value of dnorm() is low. Here, extreme values of \(x\) can be represented by \(-4\) and \(4\). This function delivers the height of the standard normal distribution, dnorm(4)=0.0001338302, and dnorm(-4)=0.0001338302. Given that the standard normal function is symmetrical, we have that in general: dnorm(-x)=dnorm(x). The letter \(d\) in dnorm() stands for density and it is maximum at \(x=0\). When plotting the density values we get the bell-shaped normal curve.

See how these pnorm(), qnorm(), dnorm() R functions work and relate:

Code
# Here, x has 11 values only.
x.summary <- c(-Inf, -4:4, Inf) # vector of x values to evaluate functions.
ans <- data.frame(x.summary, dnorm(x.summary), 
                  pnorm(x.summary), qnorm(pnorm(x.summary)))
colnames(ans) <- c("x", "dnorm(x)=height", "pnorm(x)=pd", "qnorm(pd)=x")
kable(ans, caption = "Review of standard normal distribution functions.", 
      digits = 5)
Review of standard normal distribution functions.
x dnorm(x)=height pnorm(x)=pd qnorm(pd)=x
-Inf 0.00000 0.00000 -Inf
-4 0.00013 0.00003 -4
-3 0.00443 0.00135 -3
-2 0.05399 0.02275 -2
-1 0.24197 0.15866 -1
0 0.39894 0.50000 0
1 0.24197 0.84134 1
2 0.05399 0.97725 2
3 0.00443 0.99865 3
4 0.00013 0.99997 4
Inf 0.00000 1.00000 Inf

You can type for example ?dnorm in the RStudio console to see more details about this (and other) functions.

Note that the only distribution function we are currently analyzing is the standard normal (Gaussian) distribution, the one with mean 0 and variance 1. There are other copula models that assume other kinds of distributions. These other copulas are useful when we are interested in modeling cases in which extreme values are more likely to happen than under the standard normal distribution. Understanding Gaussian copulas allows you to understand other, more elaborate copulas.
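
As a quick side note (not part of the Hull (2015) example), the following comparison hints at why those alternative copulas matter: under a fat-tailed distribution such as a Student-t with 3 degrees of freedom, an extreme outcome like \(x<-4\) is orders of magnitude more likely than under the standard normal.

Code
pnorm(-4)      # about 3.2e-05 under the standard normal.
pt(-4, df = 3) # about 0.014 under a Student-t with 3 degrees of freedom.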

Instead of a density function, we can plot the cumulative probability distribution. Now, we do not need the \(N(\cdot)\) function to compute the probability as the vertical axis already represents the probability of default.

Code
plot(x, P, type = "l", lwd = 3,
     ylab = "Cumulative probability function: pnorm(x)", xlab = "x")
abline(h = 0.5, lty = 2)
abline(v = 0, lty = 2)
Figure 3.4: The higher the x, the higher the probability of default.

Both in the same plot.

Code
plot(x, P, type = "l", lwd = 3,
     ylab = "pnorm(x) and dnorm(x)", xlab = "x")
lines(x, D, type = "l", lwd = 3, col = "blue")
abline(h = 0.5, lty = 2)
abline(v = 0, lty = 2)
Figure 3.5: Density and cumulative probability functions.

3.2 Introduction to example 24.6.

Here, we start analyzing Hull (2015) example 24.6. We do not analyze the 10 firms of the original example yet. Instead, we first make sense of the data, the relevant analysis, the logic, and the model basics before dealing with the full features of example 24.6 in Hull (2015).

Recall that pnorm() leads to a probability, whereas qnorm() leads to a value of \(x\). According to Hull (2015), the cumulative probabilities of default of 1%, 3%, 6%, 10% and 15% are taken as given for maturities of 1, 2, 3, 4 and 5 years respectively. To implement the model, we use this information to derive the corresponding threshold values \(\bar{x}\):

Code
pd <- c(0.01, 0.03, 0.06, 0.1, 0.15) # Probabilities of default per year.
x.y <- qnorm(pd) # Transform probabilities of default into x values.
ans <- data.frame(1:5, pd, x.y) # Gather the results.
colnames(ans) <- c("year", "pd", "x (values given in Hull)")
kable(ans, caption = "Main parameters.")
Main parameters.
year pd x (values given in Hull)
1 0.01 -2.326348
2 0.03 -1.880794
3 0.06 -1.554774
4 0.10 -1.281552
5 0.15 -1.036433

Note that the example assumes that the probability of default increases as we consider a longer maturity (from 1 year to 5 years). Probabilities above are cumulative, so they go from year 0 to year 1, from year 0 to year 2 and so on. To calculate the probability of default during a specific year we need to calculate the differences. In particular:

Code
ans <- data.frame(diff(pd))
colnames(ans) <- c("PD")
rownames(ans) <- c("x.y1 to x.y2", "x.y2 to x.y3", 
                   "x.y3 to x.y4", "x.y4 to x.y5")
kable(ans, caption = "PD at specific years.")
PD at specific years.
PD
x.y1 to x.y2 0.02
x.y2 to x.y3 0.03
x.y3 to x.y4 0.04
x.y4 to x.y5 0.05

This is how we can illustrate the case of a probability of default of 15% in 5 years. The green area represents 15% of the whole area below the bell-shaped curve.

Code
plot(x, D, type = "l", col = 'black', lwd = 3, ylim = c(0, 0.4), xlab = "x",
     ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = x.y[5], lty = 2)
polygon(c(x[x < x.y[5]], x.y[5]), 
        c(D[x < x.y[5]], 0), col = "green")

legend("topright", legend=c(
"pd(5years)=15%

The left green area
represents 15% of the
whole bell-shape.

N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 0.8, bty = "n", col = "green")
Figure 3.6: Gaussian density distribution function.

A complementary view is the cumulative probability function. Let’s illustrate the same case: a probability of default of 15% in 5 years. In this case, we do not need to calculate the area since the y-axis already represents the probability.

Code
plot(x, P, type = "l", col = 'black', lwd = 3, ylim = c(0, 1),
     xlab = "x",
     ylab = "Cumulative probability function: pnorm(x)")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2), 
      col = "green", lwd = 3)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2), 
      col = "green", lwd = 3)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("right", legend=c(
"pd(5years)=15%

N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 1, bty = "n", col = "green")
Figure 3.7: Gaussian probability distribution function.

A closer view of the figure above shows the 3-year and 5-year cases:

Code
plot(x, P, type = "l", col = 'black', lwd = 5, ylim = c(0, 0.22), 
     xlim = c(-4, 0), ylab = "Cumulative probability function: pnorm(x)",
     xlab = "x")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[3], length.out = 2), rep(pnorm(x.y[3]), 2), 
      col = "purple", lwd = 3, lty = 2)
lines(rep(x.y[3], 2), seq(0, pnorm(x.y[3]), length.out = 2), 
      col = "purple", lwd = 3, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2), 
      col = "green", lwd = 3, lty = 2)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2), 
      col = "green", lwd = 3, lty = 2)
points(x.y[3], 0.06, pch = 19, col = "purple", cex = 2)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("topleft", legend=c("pd(5years)=15%: N(-1.036433)=0.15",
                           "pd(3years)=6%: N(-1.554774)=0.06"),
pch = 19, col = c("green", "purple"), bg = "white", cex = 1, bty = "n")
Figure 3.8: Gaussian probability distribution function: Zoom version.

Nice.

3.3 One firm.

Now let's introduce a simulation approach. That is, instead of generating continuous values of \(x\) from \(-4\) to \(4\), we simulate many \(x\) values (10,000 in this case) that follow a standard normal distribution using the rnorm() function. The simulation approach is especially useful when we are interested in replicating real-life situations, because we can artificially reproduce the observed behavior many times and study what may happen in the future. In other words, x was used before to characterize a perfect normal distribution; now we introduce x.sim, which behaves like a normal distribution but allows for some randomness, just as in real-life situations.

See the difference between both approaches to generate \(x\).

Code
N <- 10000 # Number of simulated values.
set.seed(130575) # Reproducibility.
x.sim <- rnorm(N, 0, 1) # Simulation.
kable(cbind(x.sim = head(x.sim), x = head(x))) # head() avoids recycling the shorter x.
x.sim x
-1.7939058 -4.000
-0.7515449 -3.999
-0.3569900 -3.998
0.9629299 -3.997
0.3680735 -3.996
1.0714250 -3.995

Note that the probabilities of default per maturity in the simulated approach are close to the values of the previous section. In particular, the theoretical \(0.01\) corresponds to the simulated \(0.0102\), and \(0.03\) corresponds to \(0.0307\). They do not match exactly simply because we are comparing theoretical versus simulated probabilities.

Code
# Function to calculate proportions that we understand as probabilities.
prop <- function(x) {
  length(x.sim[x.sim <= x]) / N
}
pd.sim <- mapply(prop, x.y) # Apply the function.
ans <- data.frame(x.y, pd, pd.sim)
colnames(ans) <- c("x", "pd.theo", "pd.sim")
rownames(ans) <- c("y1", "y2", "y3", "y4", "y5")
kable(ans, caption = "Theoretic versus simulated probabilities of default.")
Theoretic versus simulated probabilities of default.
x pd.theo pd.sim
y1 -2.326348 0.01 0.0102
y2 -1.880794 0.03 0.0307
y3 -1.554774 0.06 0.0629
y4 -1.281552 0.10 0.1008
y5 -1.036433 0.15 0.1480

Let’s view the results of the simulated approach. First in a histogram.

Code
# Some parameters we need to plot.
L <- c(-4, 4) # axis limits.
colors2 <- c("blue", "red", "purple", "pink", "green")
legend2 = c("pd(1year)=1.02%: x<=-2.326348", 
"pd(2years)=3.07%: x<=-1.880794", "pd(3years)=6.29%: x<=-1.554774",
"pd(4years)=10.08%: x<=-1.281552", "pd(5years)=14.8%: x<=-1.036433")
# The histogram.
hist(x.sim, 500, xlim = L, ylim = c(0, 100), main = NULL, xlab = "x.sim")
abline(h = 0, lty = 2)
abline(v = x.y[1], lwd = 3, col = "blue")
abline(v = x.y[2], lwd = 3, col = "red")
abline(v = x.y[3], lwd = 3, col = "purple")
abline(v = x.y[4], lwd = 3, col = "pink")
abline(v = x.y[5], lwd = 3, col = "green")
legend("topright", legend = legend2, bg = "transparent", 
       text.col = colors2, cex = 0.9)
Figure 3.9: Simulated Gaussian distribution. Somewhat different from the theoretical one.

The area at the left hand side of each colored line represents the cumulative probability of default just as we explained before. In the same way, the area between two colored lines represents the probability of default in a specific period of time.
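
For example, the simulated probability of default between year 4 and year 5 is the proportion of x.sim values that fall between the pink and the green lines. With this seed it is \(0.1480-0.1008=0.0472\), close to the theoretical 0.05 computed earlier.

Code
# Proportion of simulated values between the year-4 and year-5 thresholds.
sum(x.sim > x.y[4] & x.sim <= x.y[5]) / N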

Now let’s see all the simulated data at once.

Code
plot(x.sim, ylab = "x.sim", pch = ".", ylim = c(-4, 4),
     xlab = "Number of simulations")
abline(h = x.y[1], lwd = 2, col = "blue")
abline(h = x.y[2], lwd = 2, col = "red")
abline(h = x.y[3], lwd = 2, col = "purple")
abline(h = x.y[4], lwd = 2, col = "pink")
abline(h = x.y[5], lwd = 2, col = "green")
abline(v = 0, lty = 2)
abline(v = 10000, lty = 2)
# legend("topright", legend = legend2, bg = "white", 
#        text.col = colors2, cex = 0.9)
Figure 3.10: An alternative view.

We normally take a simulation approach because we can adapt the distribution parameters to match what we observe in real-life situations. This is how we can reproduce what may happen in the future while at the same time allowing for some randomness.

In a different context, here we have an old video in which we drop 2,000 blue balls. Balls are dropped randomly given a normal distribution. Note how the bell-shape distribution is formed as we increase the number of balls.

Code
embed_url("https://youtu.be/P1Ky6-a8DE0")

3.4 Two firms.

Consider the case in which we have two firms instead of one. The main difference now is that instead of simulating one firm we need two. Moreover, each firm follows a standard normal distribution and the two are linked by a given correlation value, so the firms are not independent. If they are not independent, then what happens to one firm has some impact on what happens to the other.

In a different context, here we have an old video in which we drop many blue balls. Balls are dropped randomly given a multi-variate normal distribution. Note how the 3-D bell-shape distribution is formed as we increase the number of balls.

Code
embed_url("https://youtu.be/5iv5Zksf_zo")

The case of two firms is not the one presented in Hull (2015) example but it can help us to visualize how the Gaussian copula approach works in a two-dimension plot where the default correlation is 0.2.

The simulated performance measures of both firms are stored in x2.

Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0.2 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
x2 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
x2 <- data.frame(x2)
colnames(x2) <- c("f1", "f2")

The data frame x2 has 10,000 rows, one per simulation, with one column per firm. In other words, we have 10,000 observations of the performance measure or \(z\)-score for two related firms. This object is large, but we can visualize its head (the first six observations).

Code
kable(head(x2), caption = "Firms' performance x2.", row.names = TRUE)
Firms’ performance x2.
f1 f2
1 0.0880039 -2.8671108
2 0.0092849 -1.1735732
3 0.0323809 -0.5854275
4 1.5547886 -0.0630241
5 0.9679767 -0.3977597
6 0.7751329 0.8847116

Remember the cumulative probabilities of default are 1%, 3%, 6%, 10% and 15% for the maturities of 1, 2, 3, 4 and 5 years respectively. How do we extract those cases in which both firms will default in 5 years? In rows 15, 53, 61 and so on both firms default at the same time. Note that in all cases the \(x\) values are indeed below -1.036433.

Code
# These names are going to be useful later.
n.year <- c("year 1", "year 2", "year 3", "year 4", "year 5")
n.pd <- c("pd.y1", "pd.y2", "pd.y3", "pd.y4", "pd.y5")
n.f <- c("f1", "f2", "f1 default?", "f2 default?")

# Function to calculate cases in which firms default and probabilities.
fun.X <- function(x) {
  both <- x2[x2$f2 < x & x2$f1 < x, ] # both default.
  any <- x2[x2$f2 < x | x2$f1 < x, ] # at least one firm default.
  onlyf1 <- x2[x2$f2 > x & x2$f1 < x, ] # only firm 1 default.
  onlyf2 <- x2[x2$f1 > x & x2$f2 < x, ] # only firm 2 default.
  one <- x2[(x2$f2 < x & x2$f1 > x | # only one firm default.
                   x2$f2 > x & x2$f1 < x),]
  none <- x2[x2$f2 > x & x2$f1 > x, ] # no firm default.
# Gather results and probabilities in a list.
ans <- list(both = both, any = any, onlyf1 = onlyf1,
  onlyf2 = onlyf2, one = one, none = none,
  both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
  onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2)/ n),
  one.pd = (nrow(one) / n), none.pd = (nrow(none) / n))
}
# X has all the relevant results for x2.
X <- mapply(fun.X, x.y)

See the cases in which both firms default in 5 years.

Code
# Extract "both" cases, year 5.
both <- data.frame(X[["both", 5]], X[["both", 5]] < x.y[5])
colnames(both) <- n.f
kable(head(both), caption = "Cases in which both firms default in 5 years.", 
      row.names = TRUE)
Cases in which both firms default in 5 years.
f1 f2 f1 default? f2 default?
15 -1.123330 -1.783569 TRUE TRUE
53 -2.724912 -1.235614 TRUE TRUE
61 -1.968789 -1.516563 TRUE TRUE
108 -2.121617 -1.807062 TRUE TRUE
156 -1.559930 -1.132879 TRUE TRUE
157 -1.194204 -2.120824 TRUE TRUE

In total, we have 343 cases in which both firms default at the same time. The first case is number 15, the second 53, the third 61 and so on. It is easy to obtain the total by counting the number of rows, as in the quick check below.
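
A quick check, using the X object created above:

Code
nrow(X[["both", 5]]) # 343 cases with this seed.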

How do we extract those cases in which any firm will default in 5 years? That is, only firm 1, only firm 2, or both at the same time. This is a less strict condition, so we would expect more cases to match this new criterion compared with both. Note that in all cases at least one firm is indeed below -1.036433. In rows 1, 2 and 10 firm 2 defaults. In row 15 both firms default. In rows 18 and 20 firm 1 and firm 2 default, respectively.

Code
any <- data.frame(X[["any", 5]], X[["any", 5]] < x.y[5])
colnames(any) <- n.f
kable(head(any), caption = "Cases in which at least one firm default.", 
      row.names = TRUE)
Cases in which at least one firm default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.8671108 FALSE TRUE
2 0.0092849 -1.1735732 FALSE TRUE
10 1.4475216 -1.1512274 FALSE TRUE
15 -1.1233296 -1.7835692 TRUE TRUE
18 -1.5829965 -0.7174298 TRUE FALSE
20 -0.9221666 -1.9982869 FALSE TRUE

Now we have 2,660 cases, considerably more, as the | (or) condition is less strict than the & (and) condition. Let's see the cases in which only firm 1 defaults.

Code
onlyf1 <- data.frame(X[["onlyf1", 5]], X[["onlyf1", 5]] < x.y[5])
colnames(onlyf1) <- n.f
kable(head(onlyf1), caption = "Cases in which only firm 1 default.")
Cases in which only firm 1 default.
f1 f2 f1 default? f2 default?
18 -1.582996 -0.7174298 TRUE FALSE
34 -2.842355 1.5400020 TRUE FALSE
35 -2.090320 0.7431275 TRUE FALSE
36 -1.600142 0.3082653 TRUE FALSE
41 -2.193496 -0.8465008 TRUE FALSE
49 -1.764711 -0.3914202 TRUE FALSE

Only firm 2 defaults.

Code
onlyf2 <- data.frame(X[["onlyf2", 5]], X[["onlyf2", 5]] < x.y[5])
colnames(onlyf2) <- n.f
kable(head(onlyf2), caption = "Cases in which only firm 2 default.")
Cases in which only firm 2 default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.867111 FALSE TRUE
2 0.0092849 -1.173573 FALSE TRUE
10 1.4475216 -1.151227 FALSE TRUE
20 -0.9221666 -1.998287 FALSE TRUE
22 -0.1110555 -1.348605 FALSE TRUE
40 0.1058949 -1.752620 FALSE TRUE

Only one firm defaults at year 5.

Code
one <- data.frame(X[["one", 5]], X[["one", 5]] < x.y[5])
colnames(one) <- n.f
kable(head(one), caption = "Cases in which only one firm default.")
Cases in which only one firm default.
f1 f2 f1 default? f2 default?
1 0.0880039 -2.8671108 FALSE TRUE
2 0.0092849 -1.1735732 FALSE TRUE
10 1.4475216 -1.1512274 FALSE TRUE
18 -1.5829965 -0.7174298 TRUE FALSE
20 -0.9221666 -1.9982869 FALSE TRUE
22 -0.1110555 -1.3486051 FALSE TRUE

Lastly, when no firm defaults.

Code
none <- data.frame(X[["none", 5]], X[["none", 5]] < x.y[5])
colnames(none) <- n.f
kable(head(none), caption = "Cases in which no firm default.")
Cases in which no firm default.
f1 f2 f1 default? f2 default?
3 0.0323809 -0.5854275 FALSE FALSE
4 1.5547886 -0.0630241 FALSE FALSE
5 0.9679767 -0.3977597 FALSE FALSE
6 0.7751329 0.8847116 FALSE FALSE
7 1.3361498 1.0392145 FALSE FALSE
8 1.3039572 1.5191303 FALSE FALSE

Finally, probabilities.

Code
both.pd <- t(data.frame(X["both.pd",]))
any.pd <- t(data.frame(X["any.pd",]))
onlyf1.pd <- t(data.frame(X["onlyf1.pd",]))
onlyf2.pd <- t(data.frame(X["onlyf2.pd",]))
one.pd <- t(data.frame(X["one.pd",]))
none.pd <- t(data.frame(X["none.pd",]))
ans <- data.frame(both.pd, any.pd, onlyf1.pd, onlyf2.pd, 
                 one.pd, none.pd)
rownames(ans) <- n.year
kable(ans, caption = "Probabilities of default.")
Probabilities of default.
both.pd any.pd onlyf1.pd onlyf2.pd one.pd none.pd
year 1 0.0003 0.0192 0.0103 0.0086 0.0189 0.9808
year 2 0.0019 0.0560 0.0282 0.0259 0.0541 0.9440
year 3 0.0066 0.1141 0.0561 0.0514 0.1075 0.8859
year 4 0.0176 0.1854 0.0864 0.0814 0.1678 0.8146
year 5 0.0343 0.2660 0.1210 0.1107 0.2317 0.7340

So interesting.

Note that 15% is the theoretical probability that a given firm defaults within 5 years. In the simulation, the marginal probability that firm 1 defaults is onlyf1.pd + both.pd = 0.1210 + 0.0343 = 0.1553, and for firm 2 it is 0.1107 + 0.0343 = 0.1450, both close to the theoretical 15%. The onlyf1.pd and onlyf2.pd columns are smaller than 15% because they exclude the cases in which both firms default together.

We can perform a simple test to verify that everything is consistent. For example, this identity must hold: onlyf1 + onlyf2 = one. Substituting for year 5: \(0.1210+0.1107=0.2317\). This identity must hold as well: any − both = one. Substituting for year 5: \(0.2660-0.0343=0.2317\).
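
The same checks in code, using the probability objects created above:

Code
onlyf1.pd[5] + onlyf2.pd[5] # 0.2317, equal to one.pd[5].
any.pd[5] - both.pd[5]      # 0.2317 as well.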

Let's visualize all 10,000 cases. Each dot represents a pair of Firm 1 and Firm 2 \(x\) values, and the dotted lines represent the threshold associated with the 5-year probability of default.

Code
par(pty = "s") # Figures are shown in a perfect square (not a rectangle).
plot(x2, pch = ".", cex = 1) 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(nrow(x2))), bty = "n")
Figure 3.11: All 10,000 simulated cases.
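
Before moving to the density view, a quick numeric check on the simulated pairs (using x2 from the chunk above):

Code
colMeans(x2)      # both sample means are close to 0.
cor(x2$f1, x2$f2) # close to the assumed correlation of 0.2.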

These 10,000 observations are highly concentrated around the mean, which is very close to zero. This can also be seen easily in the following density plot.

Code
ggplot(x2, aes(x = f1, y = f2) ) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon", 
                  colour = "white") +
    coord_fixed()
Figure 3.12: All 10,000 simulated cases: A density view.

This is the 3-D rayshader video version of the plot above.

Code
embed_url("https://youtu.be/3p-B19R8RwA")

And a third interactive version.

Code
f <- function(x, y) dmnorm(cbind(x, y), c(0, 0), corr_pos02)
z <- outer(sort(x2[1:100,1]), sort(x2[1:100,2]), f)

#create contour plot
#contour(x2[1:100,1], x2[1:100,2], z)
plot_ly(type = "surface" , x = sort(x2[1:100,2]), 
        y = sort(x2[1:100,1]) , z = z ) %>%
layout(#title = "Surface",
       scene = list(xaxis = list(title = "f1", range = c(-2,2)),
                    yaxis = list(title = "f2", range = c(-2,2)), 
                    zaxis = list(title = "Density"))) %>%
hide_colorbar()
Figure 3.13: An interactive view.

We can visualize the default cases. First, the case when both firms default at year 5.

Code
par(pty = "s")
plot(X[["both", 5]], xlim  = L, ylim = L, pch = ".", cex = 1) 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")
Figure 3.14: Both firms default at year 5.

The values within the plot represent the number of cases. Here, we have 343 cases (out of 10,000) in which both firms default at the same time in 5 years. Note that this is a cumulative probability of default.

Now, the case in which any firm defaults in 5 years.

Code
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")
Figure 3.15: Any firm defaults in 5 years.

Only firm 1 defaults in 5 years.

Code
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
Figure 3.16: Only firm 1 defaults in 5 years.

Only firm 2 defaults in 5 years.

Code
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
Figure 3.17: Only firm 2 defaults in 5 years.

This is the case in which only one firm defaults.

Code
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(h = x.y[5], lty =2)
abline(v = x.y[5], lty =2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")
Figure 3.18: Only one firm defaults in 5 years.

This is the case in which neither firm defaults.

Code
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1)
abline(h = x.y[5], lty =2)
abline(v = x.y[5], lty =2)
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")
Figure 3.19: No firm defaults in 5 years.

It is a good idea to summarize all previous plots in one.

Code
# none
par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "None.") 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(none.pd[5]*n)), bty = "n")
# both
par(pty = "s")
plot(X[["both", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Both.") 
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")
# any
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Any.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")
# onlyfirm1
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Only f1.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
# onlyfirm2
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "Only f2.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
# one
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1, 
     main = "One.")
abline(h = x.y[5], lty = 2)
abline(v = x.y[5], lty = 2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")
Figure 3.20: Which firm defaults at year 5?

It is interesting to compare two different correlation values. Here, we compare 0 versus 0.2.

Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_0 <- 0 # correlation
corr_0 <- matrix(rep(rho_0, m * m), m, m) # correlation matrix
diag(corr_0) <- 1
set.seed(130575)
X0 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_0)
X0 <- data.frame(X0)
colnames(X0) <- c("f1", "f2")

Just as before, we create a function, evaluate it, and store results in X0.

Code
fun.X0 <- function(x) {
  both <- X0[X0$f2 < x & X0$f1 < x, ]
  any <- X0[X0$f2 < x | X0$f1 < x, ]
  onlyf1 <- X0[X0$f2 > x & X0$f1 < x, ]
  onlyf2 <- X0[X0$f1 > x & X0$f2 < x, ]
  one <- X0[(X0$f2 < x & X0$f1 > x |
                   X0$f2 > x & X0$f1 < x),]
  none <- X0[X0$f2 > x & X0$f1 > x, ]

ans <- list(both = both, any = any, onlyf1 = onlyf1,
  onlyf2 = onlyf2, one = one, none = none,
  both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
  onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2) /n),
  one.pd = (nrow(one) / n), none.pd = (nrow(none) / n)) }

X0 <- mapply(fun.X0, x.y)

none.pd0 <- data.frame(X0["none.pd",]) * n
both.pd0 <- data.frame(X0["both.pd",]) * n

onlyf1.pd0 <- data.frame(X0["onlyf1.pd",]) * n
onlyf2.pd0 <- data.frame(X0["onlyf2.pd",]) * n

A graphical analysis shows that with a correlation of 0.2 it is more likely that both firms default at the same time, and less likely that at least one firm defaults.
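
Before the plot, a simple benchmark helps. Under independence, the 5-year joint default probability is just the product of the two marginal probabilities, \(0.15 \times 0.15 = 0.0225\), or about 225 of the 10,000 simulations; the 0.2 correlation pushes the simulated figure up to 0.0343 (343 cases).

Code
0.15^2     # joint default probability under independence: 0.0225.
both.pd[5] # 0.0343 with a correlation of 0.2.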

Code
par(mfrow=c(1, 2), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], pch = ".", xlim = L, ylim = L, 
     cex = 1, main = "Correlation = 0.2") 
points(X[["both", 5]], pch = ".", col = "red")
points(X[["onlyf1", 5]], pch = ".", col = "purple")
points(X[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd[5]*n)), bty = "n")
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
par(pty = "s")
plot(X0[["none", 5]], pch = ".", xlim = L, ylim = L,
     cex = 1, main = "Correlation = 0.") 
points(X0[["both", 5]], pch = ".", col = "red")
points(X0[["onlyf1", 5]], pch = ".", col = "purple")
points(X0[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd0[5])), bty = "n")
legend("topright", legend = c(paste(none.pd0[5])), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd0[5])), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd0[5])), bty = "n")
par(pty = "s")
Figure 3.21: Cases per quadrant. Dotted lines corresponds to year 5.

Very interesting indeed.

The Gaussian copula model is not the only copula model. See how other distributions work.
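
Before the videos, here is a minimal sketch of a t-copula with the same 0.2 dependence, assuming the mvtnorm package is installed (it is not used elsewhere in this document). We draw from a bivariate Student-t with 3 degrees of freedom, map each margin to a uniform with pt(), and map back to standard normal marginals with qnorm(). Joint extreme outcomes are typically more frequent than under the Gaussian copula with the same correlation.

Code
library(mvtnorm) # rmvt(); assumed to be installed.
set.seed(130575)
S <- matrix(c(1, 0.2, 0.2, 1), 2, 2)
t.sim <- rmvt(10000, sigma = S, df = 3) # heavy-tailed joint draws.
u.t <- pt(t.sim, df = 3)                # the t-copula: correlated uniforms.
x.t <- qnorm(u.t)                       # back to standard normal marginals.
# Both firms below the 5-year threshold; compare with the 343 cases
# obtained earlier under the Gaussian copula.
sum(x.t[, 1] < x.y[5] & x.t[, 2] < x.y[5])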

t-copula:

Code
embed_url("https://youtu.be/XZPVnanmEtw")

Gumbel copula:

Code
embed_url("https://youtu.be/Oq27fu86ahU")

Frank copula:

Code
embed_url("https://youtu.be/4OOF_FTrZ3w")

Clayton copula:

Code
embed_url("https://youtu.be/vtzkgnWjFQ4")

3.5 Ten firms.

The original Hull (2015) example proposes a 10-firm case, and here we implement it following a simulation approach. First, we need a \(10\times10\) correlation matrix to produce the new \(x\) values using a random multivariate distribution algorithm. According to the Hull (2015) example, the copula default correlation between each pair of companies is 0.2. The code below also has the option to vary the default correlation given a uniform random distribution.

The new \(10\times10\) correlation matrix is then:

Code
# Create the correlation matrix.
m <- 10 # number of firms.
n <- 1000000 # number of simulations per firm.
x <- matrix(rep(0.2, m * m), m, m) 
ind <- lower.tri(x) 
x[ind] <- t(x)[ind] 
diag(x) = 1
kable(x, caption = "Correlation matrix 0.2.")
Correlation matrix 0.2.
1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0 0.2
0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 1.0

Now, we can simulate the multivariate normal distribution. The data frame X10 has 1,000,000 rows (simulations), one column per firm. We choose a larger number of simulations because otherwise the most restrictive conditions would not produce any observations.
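
To see why so many simulations are needed, note that under independence the probability that all 10 firms default within 5 years would be \(0.15^{10}\), far too small to show up even in 1,000,000 draws; the 0.2 copula correlation is what makes such joint events observable.

Code
0.15^10     # about 5.8e-09 under independence.
0.15^10 * n # an expected 0.006 cases in 1,000,000 simulations.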

Code
# Create the simulated cases.
set.seed(130575) # Reproducibility.
X10 <- mvrnorm(n, mu = rep(0, m), Sigma = x)
X10 <- data.frame(X10) # 1,000,000 rows x 10 firms.
colnames(X10) <- c("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10")
kable(head(X10), caption = "10 firms' performance, 1,000,000 simulations.",
      digits = 3)
10 firms’ performance, 1,000,000 simulations.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
0.449 2.362 1.716 -0.350 0.807 1.578 -0.085 1.295 0.634 1.086
0.795 -2.315 0.722 -0.607 -0.514 0.540 1.620 2.240 0.310 1.184
0.513 0.318 0.256 -0.146 0.668 1.071 0.929 -1.511 1.701 -1.908
0.643 -1.120 -0.818 0.398 -0.985 -2.094 0.412 -0.001 -0.699 -0.831
-1.404 -0.385 0.916 1.768 0.785 0.425 -0.873 -1.025 -0.164 -1.990
0.038 -0.818 0.402 -1.354 -0.153 -0.830 -1.523 -0.856 -0.433 -0.142

How do we extract those cases in which all 10 firms will default in 5 years (at the same time)? Here are the first 6 of those cases. Note that in all cases the X10 values are lower than \(-1.036433\).

Code
# Given that we have 10 firms, it is easier to use filter_all function.
# Although probably this could be simplified even further.
y5.all.2 <- filter_all(X10, all_vars(. < x.y[5]))
y4.all.2 <- filter_all(X10, all_vars(. < x.y[4])) 
y3.all.2 <- filter_all(X10, all_vars(. < x.y[3])) 
y2.all.2 <- filter_all(X10, all_vars(. < x.y[2])) 
y1.all.2 <- filter_all(X10, all_vars(. < x.y[1])) 

Let’s analyze the case of 5 years.

Code
kable(head(y5.all.2), 
      caption = "Cases in which all 10 firms default in five years.", 
      digits = 3)
Cases in which all 10 firms default in five years.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
-1.625 -1.198 -1.740 -2.203 -1.489 -3.152 -1.395 -1.409 -2.809 -2.027
-1.410 -3.567 -1.849 -2.273 -1.336 -1.638 -1.289 -2.135 -2.671 -1.675
-3.617 -1.495 -1.210 -1.367 -3.220 -1.481 -1.784 -1.891 -1.443 -1.375
-2.624 -1.162 -2.479 -2.613 -1.929 -1.734 -3.395 -2.635 -2.289 -1.593
-1.834 -1.063 -1.888 -1.390 -3.494 -1.280 -2.108 -1.664 -4.007 -2.058
-2.083 -1.800 -2.085 -2.775 -1.437 -2.610 -1.215 -1.739 -1.219 -2.841
Code
kable((head(y5.all.2) < x.y[5]), caption = "Check if all of them default.")
Check if all of them default.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

How many cases are there?

Code
nrow(y5.all.2)
[1] 72

Which are those 72 cases? We can recover their row indices; the first 54 are shown below.

Code
# Match on firm 1's simulated values to recover the row indices of the
# cases in which all firms default; show the first 54 indices in a 6 x 9 table.
kable(matrix(head(which(X10[, 1] %in% y5.all.2[, 1]), 54), 6, 9), 
      caption = "Which of the 1,000,000 cases represent a default of all 
      firms in five years? (first 54 of 72)")
Which of the 1,000,000 cases represent a default of all firms in five years? (first 54 of 72)
11119 92221 252520 350035 402689 496478 574643 633402 701255
18645 131986 266761 354133 413425 511731 576215 643935 708339
45042 165394 284189 363067 423852 522630 585654 644895 727092
57098 201832 284680 374237 437629 523010 615497 647511 747221
57696 227833 302542 384869 438202 527750 630487 650216 789221
81982 248381 328106 387617 477385 560115 630626 688847 797265

In total, we only have 72 cases. That is, the 10 firms default at the same time within 5 years in 72 out of 1,000,000 total cases. For shorter maturities such joint defaults are even less frequent; in fact, we have zero cases for years 1, 2 and 3.
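
A quick count per maturity, using the filtered objects created above:

Code
sapply(list(y1.all.2, y2.all.2, y3.all.2, y4.all.2, y5.all.2), nrow)
# 0 0 0 10 72 with this seed.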

How do we extract those cases in which at least one of the 10 firms will default?

Code
y5.any.2 <- filter_all(X10, any_vars(. < x.y[5]))
y4.any.2 <- filter_all(X10, any_vars(. < x.y[4]))
y3.any.2 <- filter_all(X10, any_vars(. < x.y[3]))
y2.any.2 <- filter_all(X10, any_vars(. < x.y[2]))
y1.any.2 <- filter_all(X10, any_vars(. < x.y[1]))

Here are the first 6 cases for the 5-year default.

Code
kable(head(y5.any.2), caption = "At least one firm default in five years.",
      digits = 3)
At least one firm default in five years.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
0.795 -2.315 0.722 -0.607 -0.514 0.540 1.620 2.240 0.310 1.184
0.513 0.318 0.256 -0.146 0.668 1.071 0.929 -1.511 1.701 -1.908
0.643 -1.120 -0.818 0.398 -0.985 -2.094 0.412 -0.001 -0.699 -0.831
-1.404 -0.385 0.916 1.768 0.785 0.425 -0.873 -1.025 -0.164 -1.990
0.038 -0.818 0.402 -1.354 -0.153 -0.830 -1.523 -0.856 -0.433 -0.142
0.353 -0.045 -0.393 -1.914 -0.168 -0.439 -0.379 -0.742 -1.612 -2.775
Code
kable((head(y5.any.2) < x.y[5]), caption = "Check which one(s) default.")
Check which one(s) default.
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE

How many cases are these?

Code
nrow(y5.any.2)
[1] 682440

In total, we have 682,440 cases. That is, at least one of the 10 firms will default within 5 years in 682,440 of the 1,000,000 cases.

And how do we convert these counts into probabilities?

Code
atleastone02 <- t(data.frame(nrow(y1.any.2), nrow(y2.any.2),
                nrow(y3.any.2), nrow(y4.any.2), nrow(y5.any.2)))
all02 <- t(data.frame(nrow(y1.all.2), nrow(y2.all.2),
                nrow(y3.all.2), nrow(y4.all.2), nrow(y5.all.2)))
res02 <- data.frame(all02 / n, atleastone02 / n)
rownames(res02) <- n.year
colnames(res02) <- c("All firms", "At least one")
kable(res02, caption = "Probabilities of default (10 firms, corr=0.2).")
Probabilities of default (10 firms, corr=0.2).
All firms At least one
year 1 0.0e+00 0.086835
year 2 0.0e+00 0.226086
year 3 0.0e+00 0.386264
year 4 1.0e-05 0.543684
year 5 7.2e-05 0.682440

Let's see the difference when we assume a different correlation matrix. In this case, the correlations vary randomly between 0.45 and 0.65.

Code
m <- 10 # number of firms
n <- 1000000 # number of simulations
set.seed(130575)
x <- matrix(runif(m * m, 0.45, 0.65), m, m) 
ind <- lower.tri(x) 
x[ind] <- t(x)[ind] 
diag(x) = 1
kable(x, caption = "Correlation between 0.45 and 0.65.", digits = 3)
Correlation between 0.45 and 0.65.
1.000 0.622 0.597 0.622 0.530 0.517 0.518 0.490 0.455 0.578
0.622 1.000 0.523 0.594 0.465 0.501 0.558 0.560 0.460 0.513
0.597 0.523 1.000 0.556 0.485 0.603 0.534 0.524 0.611 0.547
0.622 0.594 0.556 1.000 0.564 0.555 0.496 0.500 0.502 0.513
0.530 0.465 0.485 0.564 1.000 0.608 0.513 0.559 0.647 0.505
0.517 0.501 0.603 0.555 0.608 1.000 0.452 0.486 0.644 0.518
0.518 0.558 0.534 0.496 0.513 0.452 1.000 0.577 0.635 0.466
0.490 0.560 0.524 0.500 0.559 0.486 0.577 1.000 0.585 0.524
0.455 0.460 0.611 0.502 0.647 0.644 0.635 0.585 1.000 0.474
0.578 0.513 0.547 0.513 0.505 0.518 0.466 0.524 0.474 1.000

The resulting values of Xr are:

Code
set.seed(130575)
Xr <- mvrnorm(n, mu = rep(0, m), Sigma = x)
Xr <- data.frame(Xr)
kable(head(Xr), caption = "Firms' performance.", digits = 3)
Firms’ performance.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
0.673 1.507 0.776 1.038 1.997 0.557 1.759 1.578 2.529 1.326
-0.181 -0.887 -0.485 1.532 0.829 2.031 0.800 0.824 0.471 0.838
0.731 -0.826 1.679 -1.181 -0.034 -0.270 0.478 0.920 0.412 0.818
-0.852 -0.288 -0.362 -0.457 -1.596 0.005 -0.153 -1.029 -1.703 -0.914
0.692 -0.689 0.906 -1.140 -0.080 0.082 -0.968 -2.028 0.089 0.256
-1.123 -0.810 -1.242 -1.438 -0.850 -0.394 -0.601 -0.458 0.080 -1.393

Extract, for each maturity, the cases in which all ten firms default at the same time and the cases in which at least one firm defaults.

Code
Xr <- as_tibble(Xr)
# All firms.
Xr.y1.all <- filter_all(Xr, all_vars(. < x.y[1])) 
Xr.y2.all <- filter_all(Xr, all_vars(. < x.y[2])) 
Xr.y3.all <- filter_all(Xr, all_vars(. < x.y[3])) 
Xr.y4.all <- filter_all(Xr, all_vars(. < x.y[4]))
Xr.y5.all <- filter_all(Xr, all_vars(. < x.y[5])) 
# At least one firm.
Xr.y1.any <- filter_all(Xr, any_vars(. < x.y[1]))
Xr.y2.any <- filter_all(Xr, any_vars(. < x.y[2]))
Xr.y3.any <- filter_all(Xr, any_vars(. < x.y[3]))
Xr.y4.any <- filter_all(Xr, any_vars(. < x.y[4]))
Xr.y5.any <- filter_all(Xr, any_vars(. < x.y[5]))

All firms default in 6,144 cases out of 1,000,000.

Code
nrow(Xr.y5.all)
[1] 6144

Let’s see the first 6 cases:

Code
kable(head(Xr.y5.all), 
      caption = "Cases in which all 10 firms default in five years.", 
      digits = 3)
Cases in which all 10 firms default in five years.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
-1.357 -1.241 -2.165 -1.486 -1.820 -1.181 -1.816 -1.146 -1.823 -2.462
-1.874 -1.751 -2.281 -1.390 -2.020 -2.189 -1.752 -2.580 -2.062 -2.731
-1.701 -2.546 -1.671 -1.895 -1.852 -1.603 -1.292 -2.384 -1.184 -1.592
-1.443 -1.452 -2.499 -1.561 -1.166 -2.392 -1.658 -1.369 -3.522 -2.372
-1.326 -2.036 -1.252 -1.729 -2.569 -1.650 -1.795 -1.235 -2.822 -1.722
-2.314 -2.703 -2.063 -3.262 -1.551 -2.172 -2.205 -3.088 -2.130 -1.908

Verify that those cases default.

Code
kable((head(Xr.y5.all) < x.y[5]), caption = "Check if all of them default.")
Check if all of them default.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

At least one firm defaults in 497,513 cases out of 1,000,000.

Code
nrow(Xr.y5.any)
[1] 497513
Code
kable(head(Xr.y5.any), caption = "At least one firm default in five years.",
      digits = 3)
At least one firm default in five years.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
0.731 -0.826 1.679 -1.181 -0.034 -0.270 0.478 0.920 0.412 0.818
-0.852 -0.288 -0.362 -0.457 -1.596 0.005 -0.153 -1.029 -1.703 -0.914
0.692 -0.689 0.906 -1.140 -0.080 0.082 -0.968 -2.028 0.089 0.256
-1.123 -0.810 -1.242 -1.438 -0.850 -0.394 -0.601 -0.458 0.080 -1.393
-1.089 -1.470 -1.292 -2.389 -1.046 -1.960 0.653 -1.052 -1.027 -1.017
-1.280 -0.320 -2.717 -1.976 -1.970 -1.238 -0.891 -2.161 -0.682 -0.657
Code
kable((head(Xr.y5.any) < x.y[5]), caption = "Check which one(s) default.")
Check which one(s) default.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE

And the corresponding probabilities:

Code
allR <- t(data.frame(nrow(Xr.y1.all), nrow(Xr.y2.all),
        nrow(Xr.y3.all), nrow(Xr.y4.all), nrow(Xr.y5.all)))
atleastoneR <- t(data.frame(nrow(Xr.y1.any), nrow(Xr.y2.any),
        nrow(Xr.y3.any), nrow(Xr.y4.any), nrow(Xr.y5.any)))
resR <- data.frame(allR / n, atleastoneR / n)
ans <- data.frame(res02, resR)
rownames(ans) <- n.year
colnames(ans) <- c("All (corr=0.2)", "At least one (corr=0.2)",
                    "All (corr=rand)", "At least one (corr=rand)")
kable(ans, caption = "Probability of default.", 
      row.names = TRUE)
Probability of default.
All (corr=0.2) At least one (corr=0.2) All (corr=rand) At least one (corr=rand)
year 1 0.0e+00 0.086835 0.000021 0.062784
year 2 0.0e+00 0.226086 0.000184 0.156169
year 3 0.0e+00 0.386264 0.000783 0.266124
year 4 1.0e-05 0.543684 0.002438 0.382602
year 5 7.2e-05 0.682440 0.006144 0.497513
Code
library(ggplot2)

4 Credit VaR example 24.7.

This replicates Hull (2015) example 24.7.

This is the Vasicek model (equation 24.10) written in function form.
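
In the notation of the function below, with exposure \(L\), default probability \(PD\), recovery rate \(R\), copula correlation \(\rho\) and confidence level \(X\), the formula implemented is

\[
V(X)=N\left(\frac{N^{-1}(PD)+\sqrt{\rho}\,N^{-1}(X)}{\sqrt{1-\rho}}\right),
\qquad \text{Credit VaR}=L\,(1-R)\,V(X).
\]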

Code
CVaR <- function(exposure, pd, r, c, l) {
  # Worst-case default rate at confidence level l, then the credit VaR.
  v <- pnorm((qnorm(pd) + sqrt(c) * qnorm(l)) / sqrt(1 - c))
  exposure * v * (1 - r)
}

See if it works. Let's evaluate equation 24.10 at 99% and 99.9%. The 1-year 99% and 99.9% credit VaRs are:

Code
CVaR.999 <- CVaR(100, 0.02, 0.6, 0.1, 0.999)
CVaR.99 <- CVaR(100, 0.02, 0.6, 0.1, 0.99)
CVaR.999
[1] 5.129484
Code
CVaR.99
[1] 3.294271

Let’s evaluate the model at all confidence levels (from 0 to 1) simulating 10,000 values.

Code
set.seed(13)
l <- runif(10000, 0, 1)
Loss <- CVaR(100, 0.02, 0.6, 0.1, l)
sim.var999 <- sort(Loss)[10000 * 0.999]
sim.var99 <- sort(Loss)[10000 * 0.99]
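
A quick consistency check: the empirical 99% and 99.9% quantiles of the simulated losses should be close to the analytic values computed above.

Code
c(simulated = sim.var99, analytic = CVaR.99)
c(simulated = sim.var999, analytic = CVaR.999)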

Now, visually:

Code
hist(Loss, 100, xlim = c(0, 7), xlab = "Losses in millions", main = NULL)
abline(v = CVaR.999, lty = 2, col = "red", lwd = 2)
abline(v = CVaR.99, lty = 2, col = "blue", lwd = 2)
legend("topright", legend = c("1-year 99% credit VaR is 3.294271", 
         "1-year 99.9% credit VaR is 5.129484"),
col = c("blue", "red"), lwd = 2, lty = 2, bg = "white", cex = 0.8)
Figure 4.1: Distribution of losses.
Code
dat <- data.frame(Loss, l)
ggplot(dat, aes(x = Loss)) + 
  geom_density(color = "darkblue", fill = "lightblue") +
  geom_vline(aes(xintercept = CVaR.999), 
             color = "red", linetype = "dashed", linewidth = 1) +
  geom_vline(aes(xintercept = CVaR.99),   
             color = "blue", linetype = "dashed", linewidth = 1) +
  scale_x_continuous(labels = scales::dollar)
Figure 4.2: Distribution of losses with ggplot.

Nice.

This document took 40.58 seconds to compile in Quarto version 1.5.54, and R version 4.4.1 (2024-06-14 ucrt).

References.

Black, Fischer, and Myron Scholes. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81 (3): 637–54.
Brealey, Richard A, Stewart C Myers, Franklin Allen, and Pitabas Mohanty. 2020. Principles of Corporate Finance. 13th ed. McGraw-Hill Education.
Hull, John C. 2015. Options, Futures, and Other Derivatives. 9th ed. Prentice Hall.
———. 2020. Machine Learning in Business: An Introduction to the World of Data Science. Amazon Distribution.
Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.
Merton, Robert C. 1973. “Theory of Rational Option Pricing.” The Bell Journal of Economics and Management Science, 141–83.
———. 1974. “On the Pricing of Corporate Debt: The Risk Structure of Interest Rates.” The Journal of Finance 29 (2): 449–70.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.