Credit risk with R
Back to Quantitative Finance with
Introduction.
Credit risk models play a pivotal role in the financial landscape, providing institutions with a systematic framework to assess and manage the risks associated with lending and investment activities. One crucial component of these models is the estimation of the probability of default, which quantifies the likelihood that a borrower will fail to meet their debt obligations. The relevance of credit risk models lies in their ability to enhance decision-making processes by offering a comprehensive understanding of the potential creditworthiness of individuals, companies, or financial instruments. By utilizing statistical methods, historical data, and various financial indicators, these models enable financial institutions to make informed decisions about lending, pricing, and portfolio management. The estimation of the probability of default not only aids in risk assessment but also supports regulatory compliance, allowing institutions to maintain a healthy balance between risk and return in their credit portfolios. In an ever-evolving financial landscape, the continual refinement and application of credit risk models are essential for fostering stability and resilience within the financial system.
1 Loan analysis.
This section relies on the DataCamp course Credit Risk Modeling in R by Lore Dirick. However, we incorporate a slightly different database and an extended analysis.
1.1 Explore the database.
Let’s load the data called loan_data_ARF.rds
and then understand its structure before conducting any further analysis. This database is available here.
'data.frame': 29092 obs. of 10 variables:
$ loan_status : int 0 0 0 0 0 0 1 0 1 0 ...
$ loan_amnt : int 5000 2400 10000 5000 3000 12000 9000 3000 10000 1000 ...
$ int_rate : num 10.6 11 13.5 11 11 ...
$ grade : Factor w/ 7 levels "A","B","C","D",..: 2 3 3 1 5 2 3 2 2 4 ...
$ emp_length : int 10 25 13 3 9 11 0 3 3 0 ...
$ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 4 4 4 3 4 4 4 4 ...
$ annual_inc : num 24000 12252 49200 36000 48000 ...
$ age : int 33 31 24 39 24 28 22 22 28 22 ...
$ sex : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 2 ...
$ region : Factor w/ 4 levels "E","N","S","W": 1 1 3 3 2 2 2 2 4 1 ...
This could be a typical database taken from any given financial institution like a bank or a firm that uses credit channels to sell their products or services. Here, we have 29,092 observations of 10 variables. Each observation corresponds to the personal and loan characteristics of one loan. An important variable, our dependent variable, is loan_st
the value of 0 is no default and the value of 1 is default. A default occurs when a borrower is unable to make timely payments, misses payments, or avoids or stops making payments on interest or principal owed. Then, the definition of default depends on the interests and objectives of the analysis, here we simply classify between default or no default. The variable loan_st
is dichotomous or categorical. Here, we are interested to predict whether a new application will default or not.
Clearly, loan_data_ARF.rds
is past information as we know with certainty whether the individual defaulted (1) or not (0). Past information is helpful to better understand how likely is that one individual may default according to their personal and loan characteristics. Past information is useful to train our quantitative models to make predictions of new applicants, and even evaluate the performance of our classifications.
A variable name is too long when there exists a shorter name that equally conveys the purpose of the variable. Let’s rename some of them.
Code
old_names new_names
1 loan_status loan_st
2 loan_amnt l_amnt
3 int_rate int
4 grade grade
5 emp_length emp_len
6 home_ownership home
7 annual_inc income
8 age age
9 sex sex
10 region region
We can take a look at the information in different ways. For example, look at the first 10 rows out of 29,092.
loan_st l_amnt int grade emp_len home income age sex region
1 0 5000 10.65 B 10 RENT 24000 33 0 E
2 0 2400 10.99 C 25 RENT 12252 31 0 E
3 0 10000 13.49 C 13 RENT 49200 24 0 S
4 0 5000 10.99 A 3 RENT 36000 39 0 S
5 0 3000 10.99 E 9 RENT 48000 24 1 N
6 0 12000 12.69 B 11 OWN 75000 28 1 N
7 1 9000 13.49 C 0 RENT 30000 22 1 N
8 0 3000 9.91 B 3 RENT 15000 22 1 N
9 1 10000 10.65 B 3 RENT 100000 28 0 W
10 0 1000 16.29 D 0 RENT 28000 22 1 E
Note that sex
is 1 for female and 0 for male. Now, instead of looking the details of the first 10 rows, we can summarize with respect to home
with the CrossTable()
function.
Cell Contents
|-------------------------|
| N |
| N / Table Total |
|-------------------------|
Total Observations in Table: 29092
| MORTGAGE | OTHER | OWN | RENT |
|-----------|-----------|-----------|-----------|
| 12002 | 97 | 2301 | 14692 |
| 0.413 | 0.003 | 0.079 | 0.505 |
|-----------|-----------|-----------|-----------|
We can also use CrossTable()
to summarize two variables. In particular, instead of counting for home ownership we can add a second dimension like loan_st
. This allows us to create more informative tables.
Code
Cell Contents
|-------------------------|
| N |
| N / Row Total |
|-------------------------|
Total Observations in Table: 29092
| dat$loan_st
dat$home | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
MORTGAGE | 10821 | 1181 | 12002 |
| 0.902 | 0.098 | 0.413 |
-------------|-----------|-----------|-----------|
OTHER | 80 | 17 | 97 |
| 0.825 | 0.175 | 0.003 |
-------------|-----------|-----------|-----------|
OWN | 2049 | 252 | 2301 |
| 0.890 | 0.110 | 0.079 |
-------------|-----------|-----------|-----------|
RENT | 12915 | 1777 | 14692 |
| 0.879 | 0.121 | 0.505 |
-------------|-----------|-----------|-----------|
Column Total | 25865 | 3227 | 29092 |
-------------|-----------|-----------|-----------|
This table reveals defaults by home ownership. We can use histograms to see one variable distribution. In this case we have the interest rate distribution.
Code
The following is a boxplot figure for the annual income.
Code
The plot above looks suspicious. We have a very large value of the annual income in the horizontal axis (6,000,000). Also, there are few individuals with a very high income. We should explore further and investigate if these are valid observations or simply a mistake in the original database.
loan_st l_amnt int grade emp_len home income age sex region
4861 0 12025 14.27 C 13 RENT 1782000 63 0 E
13931 0 10000 6.54 A 16 OWN 1200000 36 0 W
15386 0 1500 10.99 A 5 MORTGAGE 1900000 60 1 N
16713 0 12000 7.51 A 1 MORTGAGE 1200000 32 0 E
19486 0 5000 12.73 C 12 MORTGAGE 6000000 144 1 E
22811 0 10000 10.99 A 1 MORTGAGE 1200000 40 1 E
23361 0 6400 7.40 A 7 MORTGAGE 1440000 44 1 E
23683 0 6600 7.74 A 9 MORTGAGE 1362000 47 0 E
28468 0 8450 12.29 C 0 RENT 2039784 42 0 E
One guy (19486) is not only rich, he is 144 years old. So, my decision is to drop these 9 observations. Cleaning data is a common task when dealing with large databases. This is fine as long as we do not alter the nature of the data.
Remember the database has originally 29,092 rows and now we are dropping 9 so we end up with 29,083. See the result.
Code
Somewhat better now.
1.2 Logistic models.
Logistic models allows us to make predictions about loan defaults. Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable like loan_st
. In this case, the binary dependent variable is default (1) or no default (0). A good reference for this section is Hull (2020).
First, load the data and split it into two sets: (1) training and (2) test. The training set is for building and estimate models, and the test set is used to evaluate our model predictions with new data. When estimating models, it is common practice to separate the available data into two parts, training and test data, where the training data is used to estimate parameters (in-sample) and the test data is used to evaluate its accuracy (out of sample). Because the test data is not used in determining the estimation, it should provide a reliable indication of how well the model is likely to estimate or forecast on new data. In sum, we train the model, we test the model, and once we are OK with the model performance on new data, we are ready to use it in real-life applications. If we ignore this split and use the whole database to estimate our models, we may succeed at explaining defaults in our database but we may fail to explain defaults for new loan applications.
Code
# It is convenient to set the loan status as factor.
dat$loan_st <- as.factor(dat$loan_st)
set.seed(567)
index_train <- cbind(runif(1 : nrow(dat), 0 , 1), c(1 : nrow(dat)))
index_train <- order(index_train[, 1])
index_train <- index_train[1: (2/3 * nrow(dat))]
# Create training set
train <- dat[index_train, ]
# Create test set
test <- dat[-index_train, ]
We have 29,083 observations in dat
. The code above randomly selects \(29083 \times (2/3)=19388\) rows to form the train
. The test
are the remaining \(29083-19388=9695\) rows. The random selection is highly recommended as the dat
may have some structure or sorting that could bias our model estimation and negatively impact our model test. For example, imagine that for some weird reason the database is sorted in such a way that the first observations are all no default. If that is the case, then the training and the test set would not have portions of default and no default cases and we may distort the whole analysis. The random selection allows us to replicate a real situation in which our database is unsorted, with different characteristics.
Let’s see if the recent created train
and test
have the same basic properties than dat
.
Code
no defaults defaults
dat_prop 0.8890417 0.1109583
train_prop 0.8882298 0.1117702
test_prop 0.8906653 0.1093347
Take a look of the training set.
loan_st l_amnt int grade emp_len home income age sex region
21547 0 4000 10.25 B 6 RENT 26000 27 1 N
22625 0 5000 14.26 C 3 RENT 280000 36 1 N
20340 0 18000 18.30 F 10 RENT 121000 24 1 N
1911 0 5600 7.90 A 3 RENT 32000 22 0 W
4021 0 2700 14.27 C 5 MORTGAGE 88500 32 1 N
16887 0 15000 10.38 B 10 MORTGAGE 75000 23 0 W
Variables as factors are useful for model estimation and data visualization. Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
Assume we think that the loan_st
depends on the age of the individual. We can estimate a simple logistic model to learn about the relationship between age and loan status.
Code
Call: glm(formula = loan_st ~ age, family = "binomial", data = train)
Coefficients:
(Intercept) age
-1.90097 -0.00623
Degrees of Freedom: 19387 Total (i.e. Null); 19386 Residual
Null Deviance: 13580
Residual Deviance: 13580 AIC: 13580
Apparently, there is a negative relationship between age and loan status. The AIC value (13,580) is useful when comparing models. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. At the moment we cannot interpret the AIC simply because we only have one model and we cannot compare it with another AIC.
Let’s estimate another simple model where the interest rate category is used as a predictor of the loan_st
. Remember we are not conducting any prediction at all at this moment, we are only estimating models using the training set.
Code
Call: glm(formula = loan_st ~ int, family = "binomial", data = train)
Coefficients:
(Intercept) int
-3.710 0.142
Degrees of Freedom: 19387 Total (i.e. Null); 19386 Residual
Null Deviance: 13580
Residual Deviance: 13210 AIC: 13220
The AIC is a lower (13,220 versus 13,580), so we have a better model now.
Using one single predictor as age or interest rate is clearly a limited approach. Let’s add some more predictors. Also, let’s introduce the summary()
function to extract more information about the model estimation results. The logi_multi
below assumes that the loan status depend on the age, interest rate, grade, loan amount, and annual income.
Code
Call:
glm(formula = loan_st ~ age + int + grade + log(l_amnt) + log(income),
family = "binomial", data = train)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.996240 0.477911 4.177 2.95e-05 ***
age -0.002302 0.003825 -0.602 0.5473
int 0.038767 0.017249 2.247 0.0246 *
gradeB 0.503409 0.087435 5.758 8.54e-09 ***
gradeC 0.748229 0.117765 6.354 2.10e-10 ***
gradeD 0.964343 0.147283 6.548 5.85e-11 ***
gradeE 1.033442 0.190817 5.416 6.10e-08 ***
gradeF 1.619470 0.257900 6.279 3.40e-10 ***
gradeG 1.867494 0.440232 4.242 2.21e-05 ***
log(l_amnt) 0.015718 0.036341 0.433 0.6654
log(income) -0.470748 0.046423 -10.140 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13579 on 19387 degrees of freedom
Residual deviance: 13028 on 19377 degrees of freedom
AIC: 13050
Number of Fisher Scoring iterations: 5
Our multi-factor model works well. In logi_multi
, the AIC value is the lowest so far (13,050 versus 13,220), so this should be considered as the best in-sample model at the moment. The summary()
function shows the significance levels of the estimators, but we are currently more interested in the goodness of fit of the models because we want to conduct predictions about the loan_st
. This is, we are interested to use a model to find out whether new applicants in the test set are expected to default or not, rather than in the applicants’ credit risk factors. This is why we are concentrated in AIC now.
When a customer fill out a credit application form, we collect information but we do not know for sure whether she or he will eventually default. A credit risk model can help us in this task.
1.3 Prediction and model evaluation.
Let’s take our three models: logi_age
, logi_int
and logi_multi
from the previous subsection to carry out a simple prediction exercise. We start by identifying one observation in the test set and ask the models to predict the loan_st
. This is, we take the first guy age, then we apply the logi_age
model, and compare the predicted loan_st
with respect to what really happened. Remember we know what really happened with this guy because we have the information in the test set. Every model is expected to produce different loan_st
predictions. If the model is good, then the predicted loan_st
will match what really happened.
loan_st l_amnt int grade emp_len home income age sex region
1 0 5000 10.65 B 10 RENT 24000 33 0 E
We know in advance that the loan_st
of this observation taken from the test set is 0. However, the models cannot know this simply because we did not use the test set to estimate the logistic models. Our models were estimated using the training set. A good credit risk model should predict a no default given this new applicant.
The values of loan_st
in the test set is either 0 or 1. However, the logistic models estimate the loan_st
as values in the range of 0 to 1. This mean that we would expect the estimated loan_st
to be close to 0. But, how close? We will deal with this issue later.
Code
# Predict the loan status.
logi_age_pred <- predict(logi_age, newdata = John_Doe, type = "response")
logi_int_pred <- predict(logi_int, newdata = John_Doe, type = "response")
logi_multi_pred <- predict(logi_multi, newdata = John_Doe, type = "response")
# Collect all.
pred_John <- rbind("logi_age" = logi_age_pred,
"logi_int" = logi_int_pred,
"logi_multi" = logi_multi_pred)
# Prepare a table.
colnames(pred_John) <- "Loan status predictions for John Doe."
pred_John
Loan status predictions for John Doe.
logi_age 0.10846236
logi_int 0.09996702
logi_multi 0.14461840
These values are low as they are close to 0. We could interpret this as a certain ability of the models to predict this single case from the test set. However, several questions remains unanswered and requires further analysis. For example: How can we determine if the prediction is low enough to consider it as a non-default? We may need a cut-off value to decide. We will explore this issue later.
Another aspect is: What about the rest of the cases in the test set? We have 9,695 observations in the test set and in the example above we only test for the first one. We are interested in the entire test set, not only for John Doe. Fortunately, this issue is easy to address as we only need to change the newdata
parameter in the predict()
function. In particular, instead of newdata = John_Doe
, which is one observation, and can change it to newdata = test
, which is the entire 9,695 test set.
Code
# Predict the loan status with the three models.
pred_logi_age <- predict(logi_age, newdata = test, type = "response")
pred_logi_int <- predict(logi_int, newdata = test, type = "response")
pred_logi_multi <- predict(logi_multi, newdata = test,
type = "response")
pred_range <- rbind("logi_age" = range(pred_logi_age),
"logi_int" = range(pred_logi_int),
"logi_multi" = range(pred_logi_multi))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
min(loan_st) max(loan_st) AIC
logi_age 0.08417892 0.1165453 13580.65
logi_int 0.05019756 0.3982977 13215.61
logi_multi 0.01739107 0.4668004 13050.30
Now, instead of the prediction of one single applicant we have conducted a prediction of all 9,695 applicants in the test set. The lower value column corresponds to the lower predicted loan_st
for each model. The logistic models produce values in the range of zero to one, and in this case the ranges are rather narrow.
Narrow ranges (the difference between higher and lower loan_st
predicted values) could be problematic because the model could not be able to discriminate between defaults (predictions closer to 1) and no-defaults (predictions closer to 0). The higher AIC corresponds to the worst in-sample model and the lower AIC to the best in-sample model. Here, we can see some in and out of sample consistency because the best model according to the AIC, corresponds to the model with the higher prediction range.
Let’s explore all the predicted loan_st
values for the logi_age
:
Code
The logi_age
fails to predict values ranging from 0 to 1. In fact, these values are quite concentrated in a very small range of values. As a consequence, this model fails to differentiate between default and no default predictions.
Let’s visualize the predictions of the logi_int
and logi_multi
. We first collect all predictions in a single data frame just for convenience.
Now we plot the logi_int
and logi_multi
.
Code
Let’s add the age
model as well.
Code
Presumably, a model which considers all available predictors could be better for predicting the loan_st
.
Code
# Logistic regression model using all available predictors in the data set.
logi_full <- glm(loan_st ~ age + int + grade + log(l_amnt) +
log(income) + emp_len + home + sex +
region, family = "binomial", data = train)
# Loan status predictions for all test set elements.
pred_logi_full <- predict(logi_full, newdata = test, type = "response")
# Look at the predictions range.
range(pred_logi_full)
[1] 1.422469e-09 8.544424e-01
Now, the pred_logi_full
prediction range is wider. A wider range means that the loan_st
predictions are now closer to 1. This is good because we need the model to be able to predict both no-defaults (0) and defaults (1). Let’s see a prediction comparison with respect to the rest of the models.
Code
pred_range <- rbind("logi_age" = range(pred_logi_age),
"logi_int" = range(pred_logi_int),
"logi_multi" = range(pred_logi_multi),
"logi_full" = range(pred_logi_full))
aic <- rbind(logi_age$aic, logi_int$aic, logi_multi$aic, logi_full$aic)
pred_range <- cbind(pred_range, aic)
colnames(pred_range) <- c("min(loan_st)", "max(loan_st)", "AIC")
pred_range
min(loan_st) max(loan_st) AIC
logi_age 8.417892e-02 0.1165453 13580.65
logi_int 5.019756e-02 0.3982977 13215.61
logi_multi 1.739107e-02 0.4668004 13050.30
logi_full 1.422469e-09 0.8544424 10763.67
A graphical comparison of the new pred_logi_full
and pred_logi_multi
.
Code
And all of them together.
Code
The logi_full
model predictions looks better than the other models.
Another question is: How can we know these model predictions corresponds to a default or no default? The loan status predictions go from 0 to 0.854 for the case of the logi_full
model. At the end, these loan status estimations need to be interpreted or classified as a default or no default because we are interested on that. Are they closer enough to 0 to consider a no default? This issue is addressed by setting up a cut-off rate so we can split all estimated loan status into 0 or 1.
Let’s arbitrarily consider a cutoff of 0.15 for now. This means that every estimated loan status below 0.15 will be considered as 0 (no-default), and every estimated loan status above 0.15 will be considered as 1 (default). This classification rule can be used by a financial firm to decide whether to accept or reject new loan applications by rejecting loan applications with an estimated loan status above 0.15, and accepting those with an estimated loan status below 0.15. The accept/reject loan decision now depends on a model estimation. However, this simple approach could be controversial as managers may be interested in allocating more loans, not less. An alternative that meets credit risk and managers interests is to create several cutoff ranges instead of a single cutoff of 0.15. For example, the financial firm may accept loan applications with an estimated loan status between 0.15 and 0.25 with a higher interest rate to compensate for the additional credit risk. On the other hand, we may consider a different view by incorporating a social criterion to help those people who are naturally excluded by the financial industry given their economic condition. If this is the case, then, we may be interested to identify those applicants with a lower estimated loan status.
For now, consider that every estimated loan status below 0.15 will be considered as 0 (no-default), and every estimated loan status above 0.15 will be considered as 1 (default). Graphically it looks like this:
Code
Here, predicted loan_st
values at the left of the dashed line represent no default predictions and values at the right of the dashed line represent default predictions. Let’s set up the rule to convert the estimated loan status into a binary (0 or 1) variable. See how this transformation takes place:
Code
pred_logi_full rounded pred_cutoff_15
1 1.357995e-08 0.0000 0
2 2.986278e-08 0.0000 0
18 2.299645e-02 0.0230 0
26 2.403879e-01 0.2404 1
27 1.778986e-01 0.1779 1
28 2.600172e-02 0.0260 0
These are only the first 6 rows in the test set. We can see that the rule works as expected because every estimated loan status below 0.15 is now considered as 0 (no-default), and every estimated loan status above 0.15 is considered as 1 (default). The table above shows how we can create this binary variable given the logistic model prediction.
Note that the rows numbers in the table above are 1, 2, 18, 26, 27 and 28. These are not 1, 2, 3, 4, 5 and 6 because the test
rows were selected randomly out of the dat
. So, the row numbers in the table above correspond to the original place in dat
.
Is logi_full
a good model after all? We can add a new column to the previous table. This new variable represents what really happened with the loan. Then, the first column is the logistic model prediction, the second column the transformed binary variable given a cutoff of 0.15, and the third column is what actually happened (default or no-default). Let’s take a sample of 10 observations to conduct the comparison easily. Note that the model correctly predicts a no default in most cases. Rows 308 and 329 predict a default incorrectly and row 323 predict a default correctly.
Code
pred_logi_full pred_cutoff_15 test$loan_st
308 1.537106e-01 1 0
309 4.137654e-02 0 0
310 3.113926e-02 0 0
312 5.186962e-09 0 0
318 5.337092e-02 0 0
319 8.402239e-02 0 0
323 2.795535e-01 1 1
328 5.384112e-09 0 0
329 1.893745e-01 1 0
330 9.012136e-03 0 0
There is an easy way to evaluate the rest of the cases in the test set using a simple table called confusion matrix.
pred_cutoff_15
0 1
0 6554 2081
1 308 752
The logi_full
model predicts 6,554 of no-defaults correctly and 752 defaults correctly. However, the model predicts 2,081 defaults that are in fact no-defaults and 308 no-defaults that are in fact defaults. How good are these results? Which of these four values is more important? These are questions we address later. For now, we can say that different models and different cut-off rates lead to different confusion matrix results. Please note that adding all these values leads to 9,695 which are the number of observations in the test set.
You may want to see relative and not absolute values. Let’s try the CrossTable()
function instead.
Code
Cell Contents
|-------------------------|
| N |
| N / Row Total |
|-------------------------|
Total Observations in Table: 9695
| pred_cutoff_15
test$loan_st | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 6554 | 2081 | 8635 |
| 0.759 | 0.241 | 0.891 |
-------------|-----------|-----------|-----------|
1 | 308 | 752 | 1060 |
| 0.291 | 0.709 | 0.109 |
-------------|-----------|-----------|-----------|
Column Total | 6862 | 2833 | 9695 |
-------------|-----------|-----------|-----------|
This table is more informative. The logi_full
model correctly predicts no-defaults in 75.9% of all the actual no-default cases. In other words, the model fails to predict no-default in 24.1% of the total no-default cases. Now the default. The model correctly predicts default in 70.9% of all the actual default cases (not bad), and fails to predict default in 29.1% of the total default cases.
Instead of arbitrarily consider a cutoff of 0.15, we can follow a different approach. Now consider that we are interested in taking the 20% highest estimates (closer to 1) of pred_logi_full
as default. Equivalently, this is to take the lowest 80% pred_logi_full
estimates (closer to 0) as no-default. Let’s calculate the new cut-off that meets this new criterion.
This new approach (taking the 20% highest estimates of pred_logi_full
as default) represents a cut-off of 0.1994621. Graphically:
Code
Now the cut-off is 0.1994621. This splits the loan status predictions into two parts: higher than the cut-off is a default, and lower than the cutoff is a no-default. Taking the lowest 80% estimates (closer to 0) as no-default is an arbitrary decision. Here are the cut-off values depending on this arbitrary decision.
Code
cut_off
10% 0.00000
20% 0.01469
30% 0.02410
40% 0.03751
50% 0.06184
60% 0.09617
70% 0.14673
80% 0.19946
90% 0.29227
100% 0.85444
We can show a similar summary table as we did with the cutoff of 0.15. Here, we show the predictive ability of the logi_full
model and new cut-off of 0.1994621.
Code
Cell Contents
|-------------------------|
| N |
| N / Row Total |
|-------------------------|
Total Observations in Table: 9695
| pred_full_20
test$loan_st | 0 | 1 | Row Total |
-------------|-----------|-----------|-----------|
0 | 7309 | 1326 | 8635 |
| 0.846 | 0.154 | 0.891 |
-------------|-----------|-----------|-----------|
1 | 447 | 613 | 1060 |
| 0.422 | 0.578 | 0.109 |
-------------|-----------|-----------|-----------|
Column Total | 7756 | 1939 | 9695 |
-------------|-----------|-----------|-----------|
With a cutoff of 0.1994621 we accept 7,309 + 447 = 7,756 applications as those are the ones that the model predicts a no default. We can compare both confusion matrix:
Code
classification cut_15 cut_1994621
[1,] "correct no-default" "0.759" "0.846"
[2,] "false default" "0.241" "0.154"
[3,] "false no-default" "0.291" "0.422"
[4,] "correct default" "0.709" "0.578"
The new cut-off of 0.1994621 improves the identification of no-defaults but worsen the identification of default. Also, the new cut-off fails less in the default and fails more in the no-default. Apparently, there is some sort of trade-off here.
We can also look the detail of 0.1994621 cut-off. Comparing two columns, the one in the left with the actual loan status, and the right column with the estimated loan status.
Code
test$loan_st pred_full_20 Did the model succeed?
398 0 0 TRUE
399 1 1 TRUE
404 0 1 FALSE
414 0 0 TRUE
417 1 1 TRUE
419 0 0 TRUE
420 0 0 TRUE
425 1 1 TRUE
431 0 0 TRUE
435 0 0 TRUE
In this sample the model fails in 1 out of 10 individuals. Not bad at all. Let’s imagine we are a bank. We have a total of 9,695 applications for a loan in our desk (or computer). Assume we use the predictions from pred_full_20
to decide whether we accept a loan application or not. Our model-based acceptance rule is the following: if pred_full_20 = 0
then the model estimates a no-default and we accept the loan application. According to the extract of table above, we fortunately reject application 399, 417 and 425 because that was indeed a default. However, we reject application 404 incorrectly because it did not default. In principle, having a model as a base for a decision rule can lead to better results that guessing or a random approval rule.
Let’s count how many applications are accepted and rejected according to our rule.
Code
total accept_20 reject_20
1 9695 7756 1939
Taking the 20% highest estimates of pred_logi_full
as default and the lowest 80% pred_logi_full
as no-default mean that by construction, we accept 7,756 loan applications (80% of the total) and reject 1,939 (20% of the total). Then, the criterion determines the number of accepted applications, and the model determines which applications to accept/reject.
We can evaluate our loan accept/reject rule as we have the corresponding real values in loan_st
. First, let’s illustrate the decision making process given the model estimates.
Code
test.loan_st pred_full_20 decision
1 0 0 accept
2 0 0 accept
18 1 0 accept
26 0 1 reject
27 0 0 accept
28 0 0 accept
30 0 0 accept
32 0 0 accept
34 0 1 reject
35 0 0 accept
36 0 0 accept
37 1 0 accept
We can add an evaluation column.
Code
# First 10 accept decisions.
head(data.frame(real_pred_20[,1:2],
decision = ifelse(real_pred_20$pred_full_20 == 0,
"accept", "reject"),
evaluation = ifelse(real_pred_20$pred_full_20 == 1,
"we rejected a good customer",
ifelse(real_pred_20$pred_full_20 == 0 & real_pred_20$`test$loan_st` == 0,
"good decision", "bad decision"))), 12)
test.loan_st pred_full_20 decision evaluation
1 0 0 accept good decision
2 0 0 accept good decision
18 1 0 accept bad decision
26 0 1 reject we rejected a good customer
27 0 0 accept good decision
28 0 0 accept good decision
30 0 0 accept good decision
32 0 0 accept good decision
34 0 1 reject we rejected a good customer
35 0 0 accept good decision
36 0 0 accept good decision
37 1 0 accept bad decision
In the table above, we have the first 12 loan applications. According to our rule, we accept 10 applications and we reject 2 (application #26 and #34). A good decision is because we accept the loan that did not default. A bad decision is because we accept the loan application and defaulted. We also have some cases in which we rejected a good customer and that is not good. This table above is interesting although a bit problematic as it incorporates a counterfactual approach. In particular, we are evaluating the cases in which we reject the application. A more pragmatic approach is to evaluate our rule according to the cases in which we actually accept the loan application. This is the basically a bad rate measure: how many accepted loan applications default?
Remember we accepted 10 and rejected 2 loans. We can identify who default.
Code
# We accept loans that the model predicts a no-default (0).
# In "accepted_loans" we know whether the accepted loans are in fact
# default or no-default.
accepted_loans <- real_pred_20[pred_full_20 == 0, 1]
# The code above says: if we accept the application, tell me what happened.
head(accepted_loans, 10)
[1] 0 0 1 0 0 0 0 0 0 1
Levels: 0 1
Note that the third and the tenth application default. These are applications #18 and #37 as expected. Now let’s evaluate not 12 applications but all of them, which are 7,756. The bad rate is now expressed as a percentage of the total:
Code
[1] 0.0576328
This is, by following the model-based rule, we accepted 7,756 loan applications that represent 80% of the total applications. However, 5.76% of those accepted applications were in fact a default. In particular, we accepted 447 loans that are defaults so: \(447/7756=0.0576328\).
Models are not perfect but we are always interested to find out a good model that leads to a lower bad rate because (in principle) we do not want to accept many defaults. If we keep the same model, we could reduce this 5.76% by being more strict in the loan application which in simple terms mean to reduce the acceptance rate. This alternative could be controversial as a lower acceptance rate represents lower income (less customers) for the bank or financial firm. In any case, consider we reduce the acceptance rate from 80% to 65% so we can evaluate the impact over the bad rate.
Code
# New cutoff value.
cutoff <- quantile(pred_logi_full, 0.65)
# Split the pred_logi_full into a binary variable.
pred_full_35 <- ifelse(pred_logi_full > cutoff, 1, 0)
# A data frame with real and predicted loan status.
real_pred_35 <- cbind.data.frame(test$loan_st, pred_full_35)
# Loans that we accept given these new rules.
accepted_loans <- real_pred_35[pred_full_35 == 0, 1]
# Bad rate (accepted loan applications that are defaults).
bad_rate <- sum(accepted_loans == 1)/length(accepted_loans)
# Show the bad rate.
bad_rate
[1] 0.03713107
As expected, the bad rate is lower (from 5.76%% to 3.71%). This is, the lower the acceptance rate the lower the bad rate. In the extreme, if we accept 0 loan applications then our bad rate would be zero, but doing so is equivalent as going out of business. We can create a function such that given a vector of prediction of loan status we can return the bad rate for different cutoff values. This could be useful to build the bank strategy. This function will reveal the trade-off between the acceptance rate and the bad rate. In particular, the lower the acceptance rate, the lower the income (bad thing) and the lower the bad rate (good thing). So, which combination is the optimal?
1.4 The bank strategy.
A bank could be interested to understand the relationship between the acceptance rate and the bad rate given a model that predicts the loan status.
Code
# Function.
bank <- function(prob_of_def){
cutoff <- rep(NA, 21)
bad_rate <- rep(NA, 21)
accept_rate <- seq(1, 0, by = -0.05)
for (i in 1:21){
cutoff[i] <- quantile(prob_of_def, accept_rate[i])
pred_i <- ifelse(prob_of_def > cutoff[i], 1, 0)
pred_as_good <- test$loan_st[pred_i == 0]
bad_rate[i] <- sum(pred_as_good == 1)/length(pred_as_good)}
table <- cbind(accept_rate, cutoff = round(cutoff, 4),
bad_rate = round(bad_rate, 4))
return(list(table = table, bad_rate = bad_rate,
accept_rate = accept_rate, cutoff = cutoff))
}
We can evaluate this function for the logi_full
, and a bad model like the logi_age
. In principle, we expect the logi_full
model to have a more attractive relationship between the acceptance rate and the bad rate. This is, lower bad rates for a given acceptance rate. Any financial institution could be interested in increasing the acceptance rate without increasing too much the bad rate.
Let’s apply the function to the pred_logi_full
and the logi_age
.
Code
accept_rate Bad_model_bad_rate Good_model_bad_rate.
1 1.00 0.10933471 0.109334709
2 0.95 0.10849715 0.090662324
3 0.90 0.10849715 0.076790831
4 0.85 0.10849715 0.066626214
5 0.80 0.10547397 0.057632800
6 0.75 0.10547397 0.050612020
7 0.70 0.10396251 0.043619216
8 0.65 0.10396251 0.037131070
9 0.60 0.10092961 0.029396596
10 0.55 0.10092961 0.023443361
11 0.50 0.09933775 0.021039604
12 0.45 0.10206865 0.018565207
13 0.40 0.10206865 0.015471893
14 0.35 0.09907039 0.011788977
15 0.30 0.09972041 0.009281540
16 0.25 0.10343554 0.005363036
17 0.20 0.09722922 0.003610108
18 0.15 0.09360190 0.001374570
19 0.10 0.10125362 0.000000000
20 0.05 0.10516605 0.000000000
21 0.00 0.00000000 0.000000000
The full model is superior because at any acceptance rate we can reach a lower bad rate. A plot can reveal the main differences of these two models: logi_full
and logi_age
.
Code
# Plot the strategy functions
par(mfrow = c(1, 2))
plot(bank_logi_full$accept_rate, bank_logi_full$bad_rate,
type = "l", xlab = "Acceptance rate", ylab = "Bad rate",
lwd = 2, main = "logi_full")
abline(v = bank_logi_full[["accept_rate"]][8], lty = 2)
abline(h = bank_logi_full[["bad_rate"]][8], lty = 2)
abline(v = bank_logi_full[["accept_rate"]][5], lty = 2, col = "red")
abline(h = bank_logi_full[["bad_rate"]][5], lty = 2, col = "red")
plot(bank_logi_age$accept_rate, bank_logi_age$bad_rate,
type = "l", xlab = "Acceptance rate",
ylab = "Bad rate", lwd = 2, main = "logi_age")
abline(v = bank_logi_age[["accept_rate"]][8], lty = 2)
abline(h = bank_logi_age[["bad_rate"]][8], lty = 2)
abline(v = bank_logi_age[["accept_rate"]][5], lty = 2, col = "red")
abline(h = bank_logi_age[["bad_rate"]][5], lty = 2, col = "red")
The logi_full
model is better because for any acceptance rate we can reach a lower bad rate compared with the logi_age
. This is because the logi_full
model can identify defaults and no defaults with higher precision compared with the logi_age
. The value of a good model is that it can help us to make better business decisions, in this case better credit evaluation decisions.
The model ability to predict defaults and no defaults can be measured by the AUC. The AUC can be defined as the probability that the fit model will score a randomly drawn positive sample higher than a randomly drawn negative sample. AUC stands for area under the curve in the following context:
Code
Sensitivity is the model ability to correctly identify defaults, these are known as true positive. Specificity is the model ability to correctly identify no-default loans, these are known as true negative.
As expected, the area under the curve (AUC) is higher for the red line which corresponds to the logi_full
model. We can calculate the exact values:
Area under the curve: 0.8213
Area under the curve: 0.5301
Note that the logi_age
has an AUC of 0.5301. This is close to a loan approval process in which we randomly accept and reject with no further analysis. In other words, the logi_age
is so bad that it is almost equivalent as using no model at all and accept and reject loan applications based on a random rule. A pure-random approval rule would look like this:
Code
Area under the curve: 0.5105
In theory, this random evaluation process would lead to an AUC of \(0.5 = (1 \times 1)/2\). In contrast, now imagine we have a perfect model:
Code
Area under the curve: 1
In a perfect model, the AUC is equal to 1 (\(1 \times 1\)). The model correctly identify defaults (100% sensitivity) and at the same time the model correctly identify no-defaults (100% specificity).
Edward Malthouse explains these concepts quite well:
2 The Merton model.
The Merton model is useful when we are interested in evaluating the credit risk of public firms. In short, it evaluates how likely it is that the value of the future firm’s assets fall below a certain threshold represented by the firm’s debt. By doing that, the model is able to estimate a probability of default among other results. The model can be used as a tool to propose changes (for example) in the balance sheet to reduce the credit risk exposure of a firm.
The philosophy of these models goes back to Black and Scholes (1973), Merton (1973) and Merton (1974), they consider corporate liabilities as contingent claims on the assets of the firm.
2.1 Minimize a function in R.
The estimation of the credit risk Merton’s model requires a minimization of a function. In this section, we show a simple minimization example. Here are some useful online resources for further references.
- FRM: How d2 in Black-Scholes becomes PD in Merton model.
- Normal Probability Distribution Graph. Interactive.
- How to use optim in R.
The objective function to be minimized is \(f(a,b)=\sum(a + bx_i-y_i)^2\). We have six values for \(x\) and \(y\), and we are expected to minimize the function \(f\) by finding the parameter values \(a\) and \(b\). The \(x_i\) and \(y_i\) data looks like this:
Code
x | y | |
---|---|---|
1 | 1 | 1 |
2 | 2 | 3 |
3 | 3 | 5 |
4 | 4 | 6 |
5 | 5 | 8 |
6 | 6 | 12 |
Graphically:
Let’s implement the \(f(a,b)=\sum(a + bx_i-y_i)^2\) minimization.
Code
# Function that calculates the residual sum of squares.
min.RSS <- function(data, par) {
with(data, sum((par[1] + par[2] * x - y)^2)) }
# Function minimization.
result <- optim(par = c(0, 1), min.RSS, data = dat)
a <- result$par[1]
b <- result$par[2]
f <- sum((a + b * dat$x - dat$y)^2)
# Minimization output.
ans <- data.frame(a, b, f)
kable(ans, caption = "Minimization results f(a,b).")
a | b | f |
---|---|---|
-1.266846 | 2.02862 | 2.819048 |
The minimization result is represented by a line, and the line is characterized by an intercept equal to \(-1.266846\) and a slope equal to \(2.02862\).
We can plot it and verify that this is indeed the right answer.
Code
The function that we minimize here is basically the OLS criterion. This is why the regression model leads to the same results.
Code
Call:
lm(formula = y ~ x, data = dat)
Residuals:
1 2 3 4 5 6
0.2381 0.2095 0.1810 -0.8476 -0.8762 1.0952
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.2667 0.7815 -1.621 0.180388
x 2.0286 0.2007 10.109 0.000539 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8395 on 4 degrees of freedom
Multiple R-squared: 0.9623, Adjusted R-squared: 0.9529
F-statistic: 102.2 on 1 and 4 DF, p-value: 0.000539
The function that we minimize in the estimation of the credit risk Merton model is more elaborated, but we follow the same principle as above.
2.2 The estimation.
For an introductory reference to understand the basics of credit see Brealey et al. (2020).
This is a nice introduction of the Merton Model for credit risk assessment:
Let’s follow the example 24.3 in (Hull 2015). We have five parameters: \(E_0=3\), \(\sigma_E=0.8\), \(rf=0.05\), \(T=1\), \(D=10\) that lead to an estimate of the value of the assets today \(V_0\) and the volatility of the assets \(\sigma_V\). With these seven parameters we can estimate the probability of default \(pd\) among other interesting results.
These are the known parameters.
We need equations 24.3 and 24.4 to estimate \(V_0\) and \(\sigma_V\).
Code
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
# Footnote 10 indicates that we should minimize F(x,y)^2+G(x,y)^2
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
# The minimization leads to the values of V0 and sv.
V0_sv <- optim(c(1, 1), min.footnote10)
# Define the values as parameters.
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
kable(data.frame("Market value" = V0,
"Volatility" = sv),
caption = "Assets.")
Market.value | Volatility |
---|---|
12.39578 | 0.2123041 |
Calculate the probability of default \(N(-d_2)\) as a function of \(V_0\) and \(\sigma_V\).
Code
[1] 0.1269396
The probability of default is 12.69396%.
Let’s analyze the role of the values of \(d_1\), \(d_2\), \(N(d_1)\) and \(N(d_2)\). First, we isolate these values.
Code
d1 Nd1 d2 Nd2
1 1.353282 0.9120172 1.140978 0.8730604
Now, let’s illustrate how \(N(-d_2)\) lead to the probability of default given the properties of a standard normal distribution.
Code
xseq <- seq(-4, 4, 0.01)
densities <- dnorm(xseq, 0, 1) # Standard normal distribution.
# Graphical evaluation of N(-d_2).
plot(xseq, densities, xlab = "d values (-d_2 is the dotted line)",
ylab = "Density", type = "l", lwd = 2, cex = 2)
abline(h = 0)
polygon(c(xseq[xseq >= -d2], -d2), c(densities[xseq >= -d2], 0), col = "green")
polygon(c(xseq[xseq < -d2], -d2), c(densities[xseq < -d2], 0), col = "red")
legend("topleft", legend = c("N(d_2)=0.8730604
in green and
N(-d_2)=0.1269396 in red.
N(d_2)+N(-d_2)=1."),
bg = "white", bty = "n", cex = 0.9)
abline(v = -d2, lty = 2, lwd = 3)
Here you can find an alternative explanation about \(N(-d_2)\):
Now, let’s analyze the minimization that leads to \(V_0\) and \(\sigma_V\). We do not have an analytic math expressions (closed form) to calculate \(V_0\) and \(\sigma_V\), instead we have approximate values or estimates given the minimization of equations 24.3 and 24.4 in (Hull 2015). In order to see how good this approximation is, we can retrieve the value of \([F(x,y)]^2+[G(x,y)]^2\) evaluated at \(x=V_0\) and \(y=\sigma_v\). This value is supposed to be positive as both \(F(x,y)\) and \(G(x,y)\) functions are squared, so we expect a positive number close to zero when evaluated at \(x=V_0\) and \(y=\sigma_v\).
[1] 1.425797e-07
[1] 0
The minimization conducted in the function min.footnote10()
above worked fairly well as the value is zero in practical terms.
There is another way to approach how the minimization work. We can propose a graphical analysis. In particular, we can verify whether \(V_0=12.39578\) and \(\sigma_V=0.2123041\) minimizes the objective function. To illustrate this in two axis, we need to fix either \(V_0\) or \(\sigma_V\), and evaluate \([F(x,y)]^2+[G(x,y)]^2\) with different values of \(V_0\) or \(\sigma_V\). Let’s do that.
Code
# Fix sv and evaluate different values of V0.
V0.seq <- seq(from = 4, to = 20, length.out = 100)
sv.rep <- rep(sv, 100)
# Function to be evaluated.
FG <- function(V0, sv) { (eq24.3(V0, sv))^2 + (eq24.4(V0, sv))^2 }
# Apply the function with fixed sv and different V0 values.
FG.V0 <- mapply(FG, V0.seq, sv.rep)
We are ready to plot.
Code
Clearly, the minimization worked well here. For the sake of completeness, now we fix the \(V_0\) value and evaluate the function again.
Code
# Now fix V0, and evaluate different sv values.
sv.seq <- seq(0, 0.6, length.out = 100)
V0.rep <- rep(V0, 100)
# Evaluate the function with these parameters.
FG.sv <- mapply(FG, V0.rep, sv.seq)
# Plot the results.
plot(sv.seq, FG.sv, type = "l", xlab = expression(paste(sigma[V])),
ylab = expression(paste(F(x, y)^2 + G(x, y)^2)),
cex.lab = 0.8, lwd = 3)
abline(v = sv, lty = 2)
abline(h = FG(V0, sv), lty = 2)
points(sv, FG(V0, sv), pch = 1, cex = 3, col = "red", lwd = 2)
We can show the same as before but in a 3D plot.
Code
Contour plots or level plots are a way to show a three-dimensional surface on a two-dimensional plane.
Code
The following plot cannot be seen in a PDF output, but it can be seen in a browser as an interactive plot. This green plot is nice but it is not interactive. We could plot interactive three-dimension plots with other R packages like plotly
and show them in a html context or java environment.
Code
The Merton model allows us to calculate more values in this credit risk assessment.
Code
# Other results in the Merton's model.
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
ans <- data.frame(market_value_debt, pv_promised_payment_debt,
Expected_Loss, recovery_rate, pd)
kable(t(ans), caption = "Other results in the Merton's model.")
market_value_debt | 9.3957765 |
pv_promised_payment_debt | 9.5122942 |
Expected_Loss | 0.0122492 |
recovery_rate | 0.9035039 |
pd | 0.1269396 |
Could we figure out the implied bond yield spread? It would be similar to the fair interest rate that this firm has to pay given its probability of default and recovery rate. Take Hull’s equation 24.2 as a reference: \(\lambda(T)=S(T)/(1-R)\) and solve for \(S(T)\): \(S(T)=\lambda(T)\times(1-R)\), so \(S(T)=(0.12693963)\times(1-0.90350393)\). This is 122.4918 basis points more than the risk-free rate. Then, with five parameters we can calculate (among other things) the firm’s probability of default and the correspondent yield spread of this firm’s bond. It would be interesting to compare this value with the market yield spread and evaluate whether the market yield is over or underestimated according to our theoretical value.
This is the spread function.
Code
spread <- function(E0, se, rf, TT, D) {
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
(sv * sqrt(TT))) - sv * sqrt(TT)))
market_value_debt <- V0 - E0
pv_promised_payment_debt <- D * exp(-rf * TT)
Expected_Loss <- (pv_promised_payment_debt - market_value_debt) /
pv_promised_payment_debt
recovery_rate <- 1 - (Expected_Loss / pd)
spread <- pd * (1-recovery_rate)
spread
}
Let’s evaluate the spread function.
2.3 Inside view of the Merton’s model.
The model assumes that the assets follow a geometric Brownian motion stochastic process. A regular Wiener stochastic process looks like this.
Let’s go back to our main example and assume the daily evolution of the market value of the firm assets over a year. Here are 10 simulated paths of \(V_t\) from \(t=0,...,T\). The reference of this section is (Hull 2015) Chapter 14 and 15.
Code
# Merton assumes V follows a geometric Brownian motion process.
V0.sim <- function(i) { # the argument i is not used here.
GBM(N = 365, T = 1, t0 = 0, x0 = V0, theta = 0.05, sigma = sv) }
set.seed(3) # Reproducibility
paths <- sapply(1:10, V0.sim) # Create 10 paths for V.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
points(rep(366, 8), paths[366, paths[366, ] > 10],
pch = 1, cex = 1.5, col = "green")
points(rep(366, 2), paths[366, paths[366, ] < 10],
pch = 1, cex = 1.5, col = "red")
These simulations can be interpreted as the evolution of the value of the firm’s assets in 10 parallel universes. In the past, some students have called this figure la gráfica de los pelitos, this is OK as long as you understand the figure implications. The right name is 10 simulated geometric Brownian motion processes. Let’s count the cases in which \(V_t<D\) note that here I use \(t\) and not \(T\), so we are counting the cases in which at least at some time \(t\) the value of the assets were lower than 10.
The value of the assets was lower than $10 at some time in the second, fourth and ninth alternate universes. Let’s plot these cases.
Code
# There might be an easier way to select these cases.
matplot(paths[,which(colSums(paths < 10) > 0)], type = "l",
col = c("green", "blue", "red"), lwd = 2, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
The blue path was lower than 10 at some time, but at \(t=T\) it is clearly higher than 10. For this reason, the blue path does not represent a firm’s default. See how the assets are not high enough to pay the debt at maturity only in two cases: green and red. Thus, according to this simulation the probability of default is 20%. Here is how we can calculate the cases in which \(V_T<D\).
The probability of default of 20% can be disappointing because it is significantly higher than 12.693963%. This is because the number of simulations is small. The following examples show how these values tend to converge as we increase the number of simulations from 10 to 100 and to 100,000.
Here are 100 simulated paths of \(V_0\) to \(V_T\).
Code
set.seed(3) # Reproducibility
paths <- sapply(1:100, V0.sim) # Create 100 paths.
# Plot results.
matplot(paths, type = "l", col = "black", lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Given the simulation above, the probability of default is 17%. This is closer to 12.693963%.
Let’s see all the 17 cases that end in default.
Code
Now, let’s see the 11 cases in which \(V_t\) eventually went lower than $10 but at the end did not default.
Code
# V_t < 10
almost_default <- which(colSums(paths < 10) > 0)
# V_T < 10
default <- which((paths[366,] < 10))
# There should be an easier way to do this:
matplot(paths[,almost_default[which(almost_default %in%
default == FALSE)]],
type = "l", col = c(1:11), lwd = 1, lty = 1,
ylab = expression(paste(V[t])),
xlab = "Time in days (from 0 to T)")
abline(h = D, col = "red", lwd = 2)
points(1, V0, pch = 19, cex = 1.5, col = "blue")
Here are 100,000 simulated \(V_T\) showed in a histogram. The reference for this section is 15.1 ((Hull 2015)). Note that instead of drawing the complete paths, we can directly get the distribution of \(V_T\). The resulting distribution is log-normal.
Code
set.seed(13)
# 100,000 values of VT at once.
VT <- exp(rnorm(100000, log(V0) + (rf - (sv^2) / 2)*TT, sv*sqrt(TT)))
# Plot results.
h <- hist(VT, 100, plot = FALSE)
ccat <- cut(h$breaks, c(-Inf, D, Inf))
# You may also need to change the xlim if a big red square appears as changes in maturity may imply larger values of VT. I recommend trying without xlim and see.
plot(h, main = NULL, col = c("red", "blue")[ccat],
xlab = "Value of the assets at maturity", xlim = c(4, 30))
legend("topright",
legend = c("The red area is 12.657%, and represents
the simulated probability of default.
The probability of default of the model
following the textbook example is
N(-d_2)=12.69396%. As we can see,
both are very close."),
col = c("red", "blue"), pch = c(19), bg = "white", bty = "n", cex = 0.8)
The probability of default can be calculated as:
Note that 0.12657 is now much closer than 0.12693963. Convergence achieved.
This firm can be represented as an European call option payoff. The payoff of a typical stock option is \(c=max(S_T-K, 0)\) or equivalently in terms of Merton’s model: \(E_T=max(V_T-D, 0)\).
Code
ET <- pmax(VT - D, 0) # payoff function of a typical call option.
plot(sort(VT), sort(ET), type = "l", lwd = 4, xlim = c(5, 20),
ylim = c(0, 10), xlab = "Simulated assets at maturity (V_T)",
ylab = "Simulated equity at maturity (E_T)")
abline(v = 10, lty = 2)
legend("left", legend = c("According to the model,
E_T=0 happens
in 12.657% of the
100,000 cases."),
bg = "white", bty = "n", cex = 0.9)
This diagram show the firm’s market values in the future at maturity. This is because the \(E_T\) is a function of \(V_T\) and \(D\). As long as \(V_T<D\), then the value of \(E_T=0\). Otherwise, the value of \(E_T= V_T-D\), just as the accounting equation suggest but in future terms. Note that the Merton’s model says nothing about the estimated value of \(E_T\), instead we have an estimate about how likely is that \(V_T<D\), or in other words how likely is that \(E_T<0\).
Then, we know that \(E_T= max(V_T-D, 0)\). The Black-Scholes-Merton formula can also estimate the theoretical value of \(E_0\). This is, if the observable market value given by the market capitalization available in any financial site is \(E_0=3\), we can retrieve the theoretical value of \(E_0\) so we can compare whether the firm observable market value is over or undervalued. In other words, the Black-Scholes-Merton model can give us an estimate of the firm value. In particular, the model gives the value of the equity today as:
\(E_0=V_0N(d_1)-De^{-rT}N(d_2)\)
This is equation 24.3 in (Hull 2015). To estimate \(E_0\), we need \(V_0\) and \(\sigma_V\) and these two values were already estimated before with the implementation of the min.footnote10 function above. It is true that this minimization requires \(E_0\), the point here is to use the observable market capitalization \(E_0=3\) to find out the unobservable \(V_0\) and \(\sigma_V\) and then estimate the theoretical value of \(E_0\).
The code is:
Code
# Black-Scholes call.
E0.theo <- function(V0, D, rf, TT, sv) {
d1 <- (log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))
d2 <- d1 - sv * sqrt(TT)
c <- V0 * pnorm(d1) - D * exp(-rf * TT) * pnorm(d2)
}
V0 <- 12.39578 # Assets.
D <- 10 # Debt.
rf <- 0.05 # Risk-free rate.
TT <- 1 # Maturity.
sv <- 0.2123041 # Volatility of assets.
(E0.theo(V0, D, rf, TT, sv))
[1] 3.000357
Then, we can argue that the market value of the firm is slightly undervalued since the theoretical value of the firm is 3.000357 compared with 3. This approach opens the possibility to value firms. It also opens the possibility to compare the book value in the Balance sheet of the total assets versus the estimate \(V_0\). This will allow us to understand the difference between book value and market value of the assets, not only of the equity. We can even implement a similar analysis about the market value of the debt just as we did before.
2.4 The probability of default as a function of some parameters.
We can expand our analysis by evaluating how changes in parameters change the probability of default in the context of the Merton model. This is interesting because we can move from estimating a probability of default to propose changes in the firm’s parameters like the capital structure to reduce a firm’s probability of default. First, we need the probability of default as a function.
Code
pd <- function(E0, se, rf, TT, D) {
eq24.3 <- function(V0, sv) {
((V0 * pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) -
D * exp(-rf * TT) * pnorm(((log(V0 / D) +
(rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) - sv * sqrt(TT)) - E0)) }
eq24.4 <- function(V0, sv) {
((pnorm((log(V0 / D) + (rf + sv^2 / 2) * TT) / (sv * sqrt(TT))) *
sv * V0 - se * E0)) }
min.footnote10 <- function(x)
(eq24.3(x[1], x[2]))^2 + (eq24.4(x[1], x[2]))^2
V0_sv <- optim(c(1, 1), min.footnote10)
V0 <- V0_sv$par[1]
sv <- V0_sv$par[2]
pd <- pnorm(-(((log(V0 / D) + (rf + sv^2 / 2) * TT) /
(sv * sqrt(TT))) - sv * sqrt(TT)))
pd }
Now we have a convenient function: \(pd = f(E_0, \sigma_E, rf, T, D)\). Remember these five parameters are required to estimate \(V_0\) and \(\sigma_V\) before calculating \(pd\).
Now, we create vectors of parameters and evaluate the pd
function.
Code
l = 50 # Vectors of length 50.
# Create vectors of the parameters.
x.E <- seq(from = 1, to = 20, length.out = l)
x.rf <- seq(0, 4, length.out = l)
x.D <- seq(1, 20, length.out = l)
x.T <- seq(0.5, 20, length.out = l)
x.se <- seq(0, 3, length.out = l)
# Evaluate the pd at different values.
pd.E <- mapply(pd, x.E, se, rf, TT, D)
pd.rf <- mapply(pd, E0, se, x.rf, TT, D)
pd.D <- mapply(pd, E0, se, rf, TT, x.D)
pd.T <- mapply(pd, E0, se, rf, x.T, D)
pd.se <- mapply(pd, E0, x.se, rf, TT, D)
Let’s plot the probability of default as a function of equity. What we are doing here is to evaluate the probability of default function 50 times, as the \(E_0\) has 50 values from 1 to 20. By doing this, we end up with 50 values of the probability of default.
Code
These relationships between the parameters and the probability of default are all non-linear. This is interesting because according to the model, the current levels of the parameters are important when deciding which parameter might lead to a high or low impact over the probability of default.
Code
Probability of default as a function of debt. Note that adding 1 or subtracting 1 unit of debt has different impact over the probability of default.
Code
Probability of default as a function of maturity. Note that asking for a higher debt maturity (as in a negotiation), do not lead to a lower probability of default if we keep the rest of the parameters unchanged in our firm. Imagine we arrange a meeting with the bank manager and we ask for more time to pay our debt. The bank manager should not (in principle) extend the debt maturity as long as we commit to do some changes in the firm. For example, we can negotiate a longer maturity and promise to increase the firm’s equity.
Code
Probability of default as a function of the volatility of the equity \(\sigma_E\).
Code
In the previous plots we evaluate the probability of default function by changing one of the following parameters: \(E_0\), \(\sigma_E\), \(rf\), \(T\) or \(D\). We were able to do that because we constructed a vector of 50 different values for these five parameters. Note that \(V_0\) and \(\sigma_V\) do not remain fixed because they are at the same time a function of \(E_0\), \(\sigma_E\), \(rf\), \(T\) and \(D\). Now, let’s start by plotting the probability of default as a function of \(E_0\) and \(\sigma_E\).
One alternative is to evaluate the probability of default function 50 times by taking the 50 pairs of values of \(E_0\) and \(\sigma_E\). By doing that, we will end with three vectors of size 50 that we can plot as a three dimension scatter plot.
Code
The red point represent the case of the original textbook example in which \(pd = f(E_0=3, \sigma_E=0.8,rf=0.05,T=1,D=10)=0.1269396\). Note that the red point is not part of the 50 plotted observations simply because there is no case in which the probability of default function is evaluated in \(E_0=3, \sigma_E=0.8\). Although the plot above is not incorrect, it might be incomplete as we are not showing all possible values of the probability of default. One alternative to show a more complete plot is to evaluate all possible combinations of \(E_0\) and \(\sigma_E\) to evaluate the probability of default function. If we do that, we will have 50 values for \(E_0\) and \(\sigma_E\) that represent the \(x\) and \(y\) axis, and a \(50 \times 50\) matrix containing the probability of default. This allows us to plot a surface plot and a contour plot.
Let’s plot the results.
Code
# Plot results.
p.ED1 <- persp(x.E, x.se, p_E_se, zlab = "pd",
xlab = "Equity at time zero",
ylab = "Volatility of equity", theta = 60, phi = 10,
expand = 0.5, col = "orange", shade = 0.2,
ticktype = "detailed")
# Add the original pd value as in the textbook example.
points(trans3d(E0, se, pd1, p.ED1), cex = 2, pch = 19, col = "blue")
Now the plot becomes a plane. A contour plot simplify the reading of a three dimension plot by reducing it into two dimensions: \(x\) and \(y\) coordinates. The value of the probability of default is represented by the contour lines.
Code
The blue point is the original case. Note that the value is slightly higher than 0.1, which is consistent with the \(pd=0.1269396\).
Code
l = 40 # Vectors of length 50.
# Create vectors of the parameters.
x.E <- seq(from = 2, to = 4, length.out = l)
x.se <- seq(0.2, 1.5, length.out = l)
# Create the empty matrix.
p_E_se <- matrix(0, nrow = l, ncol = l)
# Fill the empty matrix with probability of default values.
for(i in 1 : l){ # Is there an easier way to do this?
for(j in 1 : l){
p_E_se[i,j] <- mapply(pd, x.E[i], x.se[j], rf, TT, D) } }
Code
plot_ly(type = "surface" , x = x.se, y = x.E , z = p_E_se ) %>%
layout(#title = "Minimization of f at V0=12.395 an sv=0.212",
scene = list(xaxis = list(title = "Volatility of equity"),
yaxis = list(title = "Market value of equity"),
zaxis = list(title = "Probability of default"))) %>%
hide_colorbar()
2.5 GoT: capital structure.
The problem. Daenerys Targaryen owns a big manufacturing firm that produces fire extinguishers. Surprisingly, the firm data corresponds exactly as the data in the example 24.3 ((Hull 2015)). She is planning to ask the Iron Bank of Braavos for a loan using the peaceful civilized way (without dragons). However, she would like to reduce the probability of default of the firm first, in that way she might negotiate better credit conditions. Daenerys has some understanding about very basic finance because she knows that she could either reduce the debt, or increase the capital in order to reduce the probability of default. Doing both alternatives at the same time is clearly more expensive so she is not very keen about it.
What is the best strategy to reduce the probability of default? Reduce the debt by 2 or increase the capital by 2?
Code
D.seq <- seq(from = 0, to = 13, by = 0.1)
# Evaluate pd function: E0 changes from 1 to 5; debt goes from 0 to 13.
E1 <- mapply(pd, 1, se, rf, TT, D.seq)
E2 <- mapply(pd, 2, se, rf, TT, D.seq)
E3 <- mapply(pd, 3, se, rf, TT, D.seq)
E4 <- mapply(pd, 4, se, rf, TT, D.seq)
E5 <- mapply(pd, 5, se, rf, TT, D.seq)
Plot the results.
Code
# We use these vectors for colors and legends in the following plots.
colors <- c("green", "purple", "black", "blue", "red")
leg <- c("E0=1", "E0=2", "E0=3 (initial value)", "E0=4", "E0=5")
# Plot.
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=10",
ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(h = pd1, lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors,
bg = "white")
points(D, pd1, pch = 19, cex = 2)
Let’s illustrate the alternatives as clear as possible.
Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=10",
ylab = "Probability of default. Initially PD=12.69%")
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = D, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd1, lty = 2)
abline(h = pd(E0, se, rf, TT, 8), lty = 2)
abline(h = pd(5, se, rf, TT, D), lty = 2)
legend("bottomright", legend = leg, lwd = rep(3, 5), col = colors,
bg = "white")
points(D, pd1, pch = 19, cex = 2)
points(D, pd(5, se, rf, TT, D), pch = 19, cex = 2, col = "red")
points(8, pd(3, se, rf, TT, 8), pch = 19, cex = 2, col = "yellow")
Consider a different initial situation. Now, \(E_0=2\) and \(D=2\) and the rest remains the same. In this case, the probability of default is now 7.13%. What is the best strategy to reduce the probability of default? Reduce the debt by 1 or increase the capital by 1?
Code
leg2 <- c("E0=1", "E0=2 (initial value)", "E0=3", "E0=4", "E0=5")
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=2",
ylab = "Probability of default. Initially PD=7.13%",
xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors,
bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
The solution can be illustrated as follows:
Code
plot(D.seq, E1, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=2",
ylab = "Probability of default. Initially PD=7.13%",
xlim = c(0, 2.5), ylim = c(0, 0.08))
lines(D.seq, E2, col = "purple", lwd = 3)
lines(D.seq, E3, col = "black", lwd = 3)
lines(D.seq, E4, col = "blue", lwd = 3)
lines(D.seq, E5, col = "red", lwd = 3)
abline(v = 2, lty = 2)
abline(v = 1, lty = 2)
abline(h = pd(2, se, rf, TT, 2), lty = 2)
abline(h = pd(2, se, rf, TT, 1), lty = 2)
abline(h = pd(E0, se, rf, TT, 2), lty = 2)
legend("bottomright", legend = leg2, lwd = rep(3, 5), col = colors,
bg = "white", cex = 0.6)
points(2, pd(2, se, rf, TT, 2), pch = 19, cex = 2, col = "purple")
points(1, pd(2, se, rf, TT, 1), pch = 19, cex = 2, col = "yellow")
points(2, pd(E0, se, rf, TT, 2), pch = 19, cex = 2)
Interesting.
Consider a third problem. In a parallel universe, Daenerys Targaryen’s firm has some liquidity short-term troubles. She needs either more cash now or some more time to pay her current debt. She would like to know which alternative leads to the lowest increase of the probability of default. Add $2 more to her current debt, or ask for a half year more time to pay her current debt?
Code
leg3 <- c("T=2.5", "T=2", "T=1.5", "T=1", "T=0.5")
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=6",
ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors,
bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
Let’s visualize the result.
Code
plot(D.seq, T5, type = "l", col = "green", lwd = 3,
xlab = "Debt value. Initially, D=6",
ylab = "Probability of default. Initially PD=20.33%")
lines(D.seq, T4, col = "purple", lwd = 3)
lines(D.seq, T3, col = "black", lwd = 3)
lines(D.seq, T2, col = "blue", lwd = 3)
lines(D.seq, T1, col = "red", lwd = 3)
abline(v = 6, lty = 2)
abline(v = 8, lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 2, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 6), lty = 2)
abline(h = pd(E0, se, rf, 1.5, 8), lty = 2)
legend("bottomright", legend = leg3, lwd = rep(3, 5), col = colors,
bg = "white")
points(6, pd(E0, se, rf, 1.5, 6), pch = 19, cex = 2)
points(8, pd(E0, se, rf, 1.5, 8), pch = 19, cex = 2, col = "yellow")
points(6, pd(E0, se, rf, 2, 6), pch = 19, cex = 2, col = "purple")
Nice.
3 The Gaussian copula model.
Copulas allow us to decompose a joint probability distribution into their marginals (which by definition have no correlation) and a function which couples (hence the name) them together and thus allows us to specify the correlation separately. The copula is that coupling function. Here, we introduce the simplest type of copulas into a very common problem in credit risk which is the time to default.
3.1 The basics.
In a finance-context, the variable \(x\) represents a firm’s future (unknown) performance measure that goes from \(-4\) to \(4\) in the horizontal axis. Strictly speaking, more extreme values of \(x\) like \(-\infty\) and \(+\infty\) are theoretically possible but are very rare and happen quite infrequently at least in real-life situations. At the moment, we do not care too much about what kind of performance measure this is, it could be liquidity for example, solvency, or any other that it is normalized to have values between \(-4\) and \(4\). In a statistics-context, we can think that \(x\) is equivalent to the so-called \(z\)-score in the context of the standardized normal distribution function.
Code
# The standard normal distribution function is: y = f(x).
x <- seq(-4, 4, 0.001) # First define x.
y <- 1 / sqrt(2 * pi) * exp(-x ^ 2 / 2) # Define y as a function of x.
# Now plot.
plot(x, y, type = "l", lwd = 3, col = "blue" ,
ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
The copula model, including the Gaussian, considers that the future firm performance measure \(x\) in the horizontal axis is related with the firm’s probability of default (from 0 to 1, or 0% to 100%) represented by the value of the cumulative density function pnorm()
of the normal distribution.
Assume the firm will default in six months as long as the firm performance \(x\) in six months is lower than a given threshold: \(\bar{x}=-2.5\). The threshold \(\bar{x}=-2.5\) in the Gaussian copula model is equivalent to the \(-d_2\) in the Merton model. Then, the six months firm’s probability of default in math notation is \(N(-2.5)\), or pnorm(-2.5)=0.006209665
in R code, which is very low as it is close to 0. It is the red area below, which is 0.62% of the 100% possible area under the curve.
Code
Now assume the firm will default in one year as long as the firm performance \(x\) in one year is lower than a higher given threshold: \(\bar{x}=-2\). Then, the one year probability of default is \(N(-2)\) or pnorm(-2)=0.02275013
which is still low but higher than the six months probability of default. Note that the new red area is 2.27%.
Code
At the moment, we have not discussed how to estimate the threshold values \(\bar{x}\). However, we can say that the Gaussian copula model assumes that the probability of default increases as time passes. As if the firm will eventually default in the future. The six month probability of default 0.62% is low because it is not very likely that the future firm performance lies below -2.5 in six months. The one year probability of default 2.27% is a bit higher because it is more likely that the future firm performance lies below -2 in one year.
Then, the pnorm()
R function allows us to transform threshold values \(\bar{x}\) into a probabilities of default. Transform probabilities into \(\bar{x}\) is also possible as the function qnorm()
is the inverse of pnorm()
.
From \(\bar{x}\) to probabilities:
From probabilities to \(\bar{x}\):
Nice.
The function dnorm()
is relevant when we implement a graphical approach because it represents how frequent (or how likely) these values are given the standard normal distribution, so in both extreme values of \(x\) the value of dnorm()
is low. Here, extreme values of \(x\) can be represented by \(-4\) and \(4\). This function delivers the height of the standard normal distribution, dnorm(4)=0.0001338302
, and dnorm(-4)=0.0001338302
. Given that the standard normal function is symmetrical, we have that in general: dnorm(-x)=dnorm(x)
. The letter \(d\) in dnorm()
stands for density and it is maximum at \(x=0\). When plotting the density values we get the bell-shaped normal curve.
See how these pnorm()
, qnorm()
, dnorm()
R functions work and relate:
Code
# Here, x has 11 values only.
x.summary <- c(-Inf, -4:4, Inf) # vector of x values to evaluate functions.
ans <- data.frame(x.summary, dnorm(x.summary),
pnorm(x.summary), qnorm(pnorm(x.summary)))
colnames(ans) <- c("x", "dnorm(x)=height", "pnorm(x)=pd", "qnorm(pd)=x")
kable(ans, caption = "Review of standard normal distribution functions.",
digits = 5)
x | dnorm(x)=height | pnorm(x)=pd | qnorm(pd)=x |
---|---|---|---|
-Inf | 0.00000 | 0.00000 | -Inf |
-4 | 0.00013 | 0.00003 | -4 |
-3 | 0.00443 | 0.00135 | -3 |
-2 | 0.05399 | 0.02275 | -2 |
-1 | 0.24197 | 0.15866 | -1 |
0 | 0.39894 | 0.50000 | 0 |
1 | 0.24197 | 0.84134 | 1 |
2 | 0.05399 | 0.97725 | 2 |
3 | 0.00443 | 0.99865 | 3 |
4 | 0.00013 | 0.99997 | 4 |
Inf | 0.00000 | 1.00000 | Inf |
You can type for example ?dnorm
in the RStudio console to see more details about this (and other) functions.
Note that the only distribution function that we are currently analyzing is the standard normal (Gaussian) distribution function. A standard normal distribution function is the one that has mean 0 and variance 1. There are other copula models that assume other kinds of distributions. These other kinds of copulas are useful when we are interested in modeling cases in which extreme values are more likely to happen compared with the standard normal distribution. Understanding Gaussian copulas allows you to understand other more elaborated copulas.
Instead of a density function, we can plot the cumulative probability distribution. Now, we do not need the \(N(\cdot)\) function to compute the probability as the vertical axis already represents the probability of default.
Code
Both in the same plot.
3.2 Introduction to example 24.6.
Here, we start analyzing Hull (2015) example 24.6. We do not analyze the 10 firms yet as in the original example. Instead, we first make some sense about the data, the relevant analysis, the logic, and the model basics before dealing with the full features in example in Hull (2015) 24.6.
Recall that, pnorm()
leads to a probability, whereas qnorm()
leads to a value of \(x\). According to Hull (2015), the cumulative probabilities of default of 1%, 3%, 6%, 10% and 15% are taken as given for the maturities of 1, 2, 3, 4 and 5 years respectively. To implement the model, we take this information to derive the corresponding threshold values \(\bar{x}\):
Code
year | pd | x (values given in Hull) |
---|---|---|
1 | 0.01 | -2.326348 |
2 | 0.03 | -1.880794 |
3 | 0.06 | -1.554774 |
4 | 0.10 | -1.281552 |
5 | 0.15 | -1.036433 |
Note that the example assumes that the probability of default increases as we consider a longer maturity (from 1 year to 5 years). Probabilities above are cumulative, so they go from year 0 to year 1, from year 0 to year 2 and so on. To calculate the probability of default during a specific year we need to calculate the differences. In particular:
Code
PD | |
---|---|
x.y1 to x.y2 | 0.02 |
x.y2 to x.y3 | 0.03 |
x.y3 to x.y4 | 0.04 |
x.y4 to x.y5 | 0.05 |
This is how we can illustrate the case of a probability of default of 15% in 5 years. The green area represents the 15% of the whole area below the bell-shaped curve.
Code
plot(x, D, type = "l", col = 'black', lwd = 3, ylim = c(0, 0.4), xlab = "x",
ylab = "Standard normal density function: dnorm(x)")
abline(h = 0, lty = 2)
abline(v = x.y[5], lty = 2)
polygon(c(x[x < x.y[5]], x.y[5]),
c(D[x < x.y[5]], 0), col = "green")
legend("topright", legend=c(
"pd(5years)=15%
The left green area
represents 15% of the
whole bell-shape.
N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 0.8, bty = "n", col = "green")
A complementary view is the cumulative probability function. Let’s illustrate the same case: a probability of default of 15% in 5 years. In this case, we do not need to calculate the area since the y-axis already represents the probability.
Code
plot(x, P, type = "l", col = 'black', lwd = 3, ylim = c(0, 1),
xlab = "x",
ylab = "Cumulative probability function: pnorm(x)")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2),
col = "green", lwd = 3)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2),
col = "green", lwd = 3)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("right", legend=c(
"pd(5years)=15%
N^-1(0.15)=-1.036433
N(-1.036433)=0.15"),
bg = "white", pch = 19, cex = 1, bty = "n", col = "green")
A closer view to the figure above to see the 3-year and 5-year cases:
Code
plot(x, P, type = "l", col = 'black', lwd = 5, ylim = c(0, 0.22),
xlim = c(-4, 0), ylab = "Cumulative probability function: pnorm(x)",
xlab = "x")
abline(h = 0, lty = 2)
abline(h = 1, lty = 2)
lines(seq(-5, x.y[3], length.out = 2), rep(pnorm(x.y[3]), 2),
col = "purple", lwd = 3, lty = 2)
lines(rep(x.y[3], 2), seq(0, pnorm(x.y[3]), length.out = 2),
col = "purple", lwd = 3, lty = 2)
lines(seq(-5, x.y[5], length.out = 2), rep(pnorm(x.y[5]), 2),
col = "green", lwd = 3, lty = 2)
lines(rep(x.y[5], 2), seq(0, pnorm(x.y[5]), length.out = 2),
col = "green", lwd = 3, lty = 2)
points(x.y[3], 0.06, pch = 19, col = "purple", cex = 2)
points(x.y[5], 0.15, pch = 19, col = "green", cex = 2)
legend("topleft", legend=c("pd(5years)=15%: N(-1.036433)=0.15",
"pd(3years)=6%: N(-1.554774)=0.06"),
pch = 19, col = c("green", "purple"), bg = "white", cex = 1, bty = "n")
Nice.
3.3 One firm.
Now let’s introduce a simulation approach. This is, instead of generating continuous values of \(x\) from \(-4\) to \(4\), we simulate many \(x\) values (10,000 in this case) that follow a standard normal distribution function using the rnorm()
function. The simulation approach is useful especially when we are interested in replicating what happens in real-life situations because we can artificially replicate the observed behavior many times and understand what may happen in the future. In other words, x
was used before to characterize a perfect normal distribution. Now, we incorporate x.sim
that behaves as a normal distribution and allows some randomness just as in real life situations.
See the difference between both approaches to generate \(x\).
Code
x.sim | x |
---|---|
-1.7939058 | -4.000 |
-0.7515449 | -3.999 |
-0.3569900 | -3.998 |
0.9629299 | -3.997 |
0.3680735 | -3.996 |
1.0714250 | -3.995 |
Note that the probabilities of default per maturity in the simulated approach are close to the values of the previous section. In particular, \(0.01\) is equivalent to \(0.0102\), and \(0.03\) is equivalent to \(0.0307\). They do not match exactly simply because we are comparing theoretical versus simulated probabilities.
Code
# Function to calculate proportions that we understand as probabilities.
prop <- function(x) {
ans <- length(x.sim[x.sim <= x]) / N
}
pd.sim <- mapply(prop, x.y) # Apply the function.
ans <- data.frame(x.y, pd, pd.sim)
colnames(ans) <- c("x", "pd.theo", "pd.sim")
rownames(ans) <- c("y1", "y2", "y3", "y4", "y5")
kable(ans, caption = "Theoretic versus simulated probabilities of default.")
x | pd.theo | pd.sim | |
---|---|---|---|
y1 | -2.326348 | 0.01 | 0.0102 |
y2 | -1.880794 | 0.03 | 0.0307 |
y3 | -1.554774 | 0.06 | 0.0629 |
y4 | -1.281552 | 0.10 | 0.1008 |
y5 | -1.036433 | 0.15 | 0.1480 |
Let’s view the results of the simulated approach. First in a histogram.
Code
# Some parameters we need to plot.
L <- c(-4, 4) # axis limits.
colors2 <- c("blue", "red", "purple", "pink", "green")
legend2 = c("pd(1year)=1.02%: x<=-2.326348",
"pd(2years)=3.07%: x<=-1.880794", "pd(3years)=6.29%: x<=-1.554774",
"pd(4years)=10.08%: x<=-1.281552", "pd(5years)=14.8%: x<=-1.036433")
# The histogram.
hist(x.sim, 500, xlim = L, ylim = c(0, 100), main = NULL, xlab = "x.sim")
abline(h = 0, lty = 2)
abline(v = x.y[1], lwd = 3, col = "blue")
abline(v = x.y[2], lwd = 3, col = "red")
abline(v = x.y[3], lwd = 3, col = "purple")
abline(v = x.y[4], lwd = 3, col = "pink")
abline(v = x.y[5], lwd = 3, col = "green")
legend("topright", legend = legend2, bg = "transparent",
text.col = colors2, cex = 0.9)
The area at the left hand side of each colored line represents the cumulative probability of default just as we explained before. In the same way, the area between two colored lines represents the probability of default in a specific period of time.
Now let’s see all the simulated data at once.
Code
plot(x.sim, ylab = "x.sim", pch = ".", ylim = c(-4, 4),
xlab = "Number of simulaltions")
abline(h = x.y[1], lwd = 2, col = "blue")
abline(h = x.y[2], lwd = 2, col = "red")
abline(h = x.y[3], lwd = 2, col = "purple")
abline(h = x.y[4], lwd = 2, col = "pink")
abline(h = x.y[5], lwd = 2, col = "green")
abline(v = 0, lty = 2)
abline(v = 10000, lty = 2)
# legend("topright", legend = legend2, bg = "white",
# text.col = colors2, cex = 0.9)
We normally conduct a simulation approach because we might adapt the distribution function parameters to match what we see in the real life situations. This is how we can reproduce what may happen in the future and at the same time allow for some randomness.
In a different context, here we have an old video in which we drop 2,000 blue balls. Balls are dropped randomly given a normal distribution. Note how the bell-shape distribution is formed as we increase the number of balls.
3.4 Two firms.
Consider the case in which we have two firms instead of one. The main difference now is that instead of simulating one firm we need two. Moreover, each firm follows a standard normal distribution function and both of them are correlated by a given correlation value so the firms are not independent. If they are not independent, then what happens to one firm has some impact on what happens to the other.
In a different context, here we have an old video in which we drop many blue balls. Balls are dropped randomly given a multi-variate normal distribution. Note how the 3-D bell-shape distribution is formed as we increase the number of balls.
The case of two firms is not the one presented in Hull (2015) example but it can help us to visualize how the Gaussian copula approach works in a two-dimension plot where the default correlation is 0.2.
The simulation of both firms’ performance measures is x2
.
Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0.2 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
x2 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
x2 <- data.frame(x2)
colnames(x2) <- c("f1", "f2")
The matrix x2
length is 10,000 for each firm. In other words, we have 10,000 observations of the performance measure or \(z\)-score for two firms that are related. This matrix is big, but we can visualize the header (the first six observations).
f1 | f2 | |
---|---|---|
1 | 0.0880039 | -2.8671108 |
2 | 0.0092849 | -1.1735732 |
3 | 0.0323809 | -0.5854275 |
4 | 1.5547886 | -0.0630241 |
5 | 0.9679767 | -0.3977597 |
6 | 0.7751329 | 0.8847116 |
Remember the cumulative probabilities of default are 1%, 3%, 6%, 10% and 15% for the maturities of 1, 2, 3, 4 and 5 years respectively. How do we extract those cases in which both firms will default in 5 years? In rows 15, 53, 61 and so on both firms default at the same time. Note that in all cases the \(x\) values are indeed below -1.036433.
Code
# These names are going to be useful later.
n.year <- c("year 1", "year 2", "year 3", "year 4", "year 5")
n.pd <- c("pd.y1", "pd.y2", "pd.y3", "pd.y4", "pd.y5")
n.f <- c("f1", "f2", "f1 default?", "f2 default?")
# Function to calculate cases in which firms default and probabilities.
fun.X <- function(x) {
both <- x2[x2$f2 < x & x2$f1 < x, ] # both default.
any <- x2[x2$f2 < x | x2$f1 < x, ] # at least one firm default.
onlyf1 <- x2[x2$f2 > x & x2$f1 < x, ] # only firm 1 default.
onlyf2 <- x2[x2$f1 > x & x2$f2 < x, ] # only firm 2 default.
one <- x2[(x2$f2 < x & x2$f1 > x | # only one firm default.
x2$f2 > x & x2$f1 < x),]
none <- x2[x2$f2 > x & x2$f1 > x, ] # no firm default.
# Gather results and probabilities in a list.
ans <- list(both = both, any = any, onlyf1 = onlyf1,
onlyf2 = onlyf2, one = one, none = none,
both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2)/ n),
one.pd = (nrow(one) / n), none.pd = (nrow(none) / n))
}
# X has all the relevant results for x2.
X <- mapply(fun.X, x.y)
See the cases in which both firms default in 5 years.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
15 | -1.123330 | -1.783569 | TRUE | TRUE |
53 | -2.724912 | -1.235614 | TRUE | TRUE |
61 | -1.968789 | -1.516563 | TRUE | TRUE |
108 | -2.121617 | -1.807062 | TRUE | TRUE |
156 | -1.559930 | -1.132879 | TRUE | TRUE |
157 | -1.194204 | -2.120824 | TRUE | TRUE |
In total, we have 343 cases in which both firms default at the same time. The first case is number 15, the second 53, the third 61 and so on. It is easy to know the total cases if we count the number of rows.
How do we extract those cases in which any firm will default in 5 years? This is, only firm 1, only firm 2, or both at the same time. This is a less strict condition so we would expect to have more cases to match this new criteria compared with both
. Note that in all cases at least one one firm is indeed below -1.036433. In row 1, 2 and 10 firm 2 defaults. In row 15 both firms default. In row 18 and 20 firm 1 and firm 2 default respectively.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
1 | 0.0880039 | -2.8671108 | FALSE | TRUE |
2 | 0.0092849 | -1.1735732 | FALSE | TRUE |
10 | 1.4475216 | -1.1512274 | FALSE | TRUE |
15 | -1.1233296 | -1.7835692 | TRUE | TRUE |
18 | -1.5829965 | -0.7174298 | TRUE | FALSE |
20 | -0.9221666 | -1.9982869 | FALSE | TRUE |
Now we have 2,660 cases. Considerably more as the |
restriction is less strict than the &
. Let’s see the cases in which only firm 1 defaults.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
18 | -1.582996 | -0.7174298 | TRUE | FALSE |
34 | -2.842355 | 1.5400020 | TRUE | FALSE |
35 | -2.090320 | 0.7431275 | TRUE | FALSE |
36 | -1.600142 | 0.3082653 | TRUE | FALSE |
41 | -2.193496 | -0.8465008 | TRUE | FALSE |
49 | -1.764711 | -0.3914202 | TRUE | FALSE |
Only firm 2 defaults.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
1 | 0.0880039 | -2.867111 | FALSE | TRUE |
2 | 0.0092849 | -1.173573 | FALSE | TRUE |
10 | 1.4475216 | -1.151227 | FALSE | TRUE |
20 | -0.9221666 | -1.998287 | FALSE | TRUE |
22 | -0.1110555 | -1.348605 | FALSE | TRUE |
40 | 0.1058949 | -1.752620 | FALSE | TRUE |
Only one firm default at year 5.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
1 | 0.0880039 | -2.8671108 | FALSE | TRUE |
2 | 0.0092849 | -1.1735732 | FALSE | TRUE |
10 | 1.4475216 | -1.1512274 | FALSE | TRUE |
18 | -1.5829965 | -0.7174298 | TRUE | FALSE |
20 | -0.9221666 | -1.9982869 | FALSE | TRUE |
22 | -0.1110555 | -1.3486051 | FALSE | TRUE |
Lastly, when no firm defaults.
Code
f1 | f2 | f1 default? | f2 default? | |
---|---|---|---|---|
3 | 0.0323809 | -0.5854275 | FALSE | FALSE |
4 | 1.5547886 | -0.0630241 | FALSE | FALSE |
5 | 0.9679767 | -0.3977597 | FALSE | FALSE |
6 | 0.7751329 | 0.8847116 | FALSE | FALSE |
7 | 1.3361498 | 1.0392145 | FALSE | FALSE |
8 | 1.3039572 | 1.5191303 | FALSE | FALSE |
Finally, probabilities.
Code
both.pd <- t(data.frame(X["both.pd",]))
any.pd <- t(data.frame(X["any.pd",]))
onlyf1.pd <- t(data.frame(X["onlyf1.pd",]))
onlyf2.pd <- t(data.frame(X["onlyf2.pd",]))
one.pd <- t(data.frame(X["one.pd",]))
none.pd <- t(data.frame(X["none.pd",]))
ans <- data.frame(both.pd, any.pd, onlyf1.pd, onlyf2.pd,
one.pd, none.pd)
rownames(ans) <- n.year
kable(ans, caption = "Probabilities of default.")
both.pd | any.pd | onlyf1.pd | onlyf2.pd | one.pd | none.pd | |
---|---|---|---|---|---|---|
year 1 | 0.0003 | 0.0192 | 0.0103 | 0.0086 | 0.0189 | 0.9808 |
year 2 | 0.0019 | 0.0560 | 0.0282 | 0.0259 | 0.0541 | 0.9440 |
year 3 | 0.0066 | 0.1141 | 0.0561 | 0.0514 | 0.1075 | 0.8859 |
year 4 | 0.0176 | 0.1854 | 0.0864 | 0.0814 | 0.1678 | 0.8146 |
year 5 | 0.0343 | 0.2660 | 0.1210 | 0.1107 | 0.2317 | 0.7340 |
So interesting.
Note that 15% is the theoretical probability that one firm will default in 5 years. Here, this 15% is 12.1% for firm 1 and 11.07% for firm 2 when the data is simulated.
We can perform a simple test to verify that everything is alright. For example, this equation must hold: onlyf1+onlyf2=one
. Substituting for the year 5: \(0.121+0.1107=0.2317\). As you can see, everything is alright. This equation must hold as well: any-both=one
. Substituting for the year 5: \(0.266-0.0343=0.2317\).
Let’s visualize all 10,000 cases. Each dot represents a couple of Firm 1 and Firm 2 \(x\) values and the dotted lines the threshold that represents the probability of default in 5 years.
Code
These 10,000 observations are highly concentrated around the mean which is very close to zero. This can be also easily seen in the following density plot.
Code
This is the 3-D rayshader
video version of the plot above.
And a third interactive version.
Code
f <- function(x, y) dmnorm(cbind(x, y), c(0, 0), corr_pos02)
z <- outer(sort(x2[1:100,1]), sort(x2[1:100,2]), f)
#create contour plot
#contour(x2[1:100,1], x2[1:100,2], z)
plot_ly(type = "surface" , x = sort(x2[1:100,2]),
y = sort(x2[1:100,1]) , z = z ) %>%
layout(#title = "Surface",
scene = list(xaxis = list(title = "f1", range = c(-2,2)),
yaxis = list(title = "f2", range = c(-2,2)),
zaxis = list(title = "Density"))) %>%
hide_colorbar()
We can visualize the default cases. First, the case when both firms default at year 5.
Code
The values within the plot represent the number of cases. Here, we have 343 times (out of 10,000) in which both firms default at the same time in 5 years. Note that this is a cumulative probability of default.
Now, the case in which any firm defaults in 5 years.
Code
Only firm 1 defaults in 5 years.
Code
Only firm 2 defaults in 5 years.
Code
This is the case in which only one firm defaults.
Code
This is the case in which no one firm defaults.
Code
It is a good idea to summarize all previous plots in one.
Code
# none
par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "None.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomright", legend = c(paste(none.pd[5]*n)), bty = "n")
# both
par(pty = "s")
plot(X[["both", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Both.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(both.pd[5]*n)), bty = "n")
# any
par(pty = "s")
plot(X[["any", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Any.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(any.pd[5]*n)), bty = "n")
# onlyfirm1
par(pty = "s")
plot(X[["onlyf1", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Only f1.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
# onlyfirm2
par(pty = "s")
plot(X[["onlyf2", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "Only f2.")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("topright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
# one
par(pty = "s")
plot(X[["one", 5]], xlim = L, ylim = L, pch = ".", cex = 1,
main = "One.")
abline(h = x.y[5], lty = 2)
abline(v = x.y[5], lty = 2)
legend("topright", legend = c(paste(one.pd[5]*n)), bty = "n")
It is interesting to compare two different correlation values. Here, we compare 0 versus 0.2.
Code
m <- 2 # number of firms
n <- 10000 # number of simulations
rho_pos02 <- 0 # correlation
corr_pos02 <- matrix(rep(rho_pos02, m * m), m, m) # correlation matrix
diag(corr_pos02) <- 1
set.seed(130575)
X0 <- mvrnorm(n, mu = rep(0, m), Sigma = corr_pos02)
X0 <- data.frame(X0)
colnames(X0) <- c("f1", "f2")
Just as before, we create a function, evaluate it, and store results in X0
.
Code
fun.X0 <- function(x) {
both <- X0[X0$f2 < x & X0$f1 < x, ]
any <- X0[X0$f2 < x | X0$f1 < x, ]
onlyf1 <- X0[X0$f2 > x & X0$f1 < x, ]
onlyf2 <- X0[X0$f1 > x & X0$f2 < x, ]
one <- X0[(X0$f2 < x & X0$f1 > x |
X0$f2 > x & X0$f1 < x),]
none <- X0[X0$f2 > x & X0$f1 > x, ]
ans <- list(both = both, any = any, onlyf1 = onlyf1,
onlyf2 = onlyf2, one = one, none = none,
both.pd = (nrow(both) / n), any.pd = (nrow(any) / n),
onlyf1.pd = (nrow(onlyf1) / n), onlyf2.pd = (nrow(onlyf2) /n),
one.pd = (nrow(one) / n), none.pd = (nrow(none) / n)) }
X0 <- mapply(fun.X0, x.y)
none.pd0 <- data.frame(X0["none.pd",]) * n
both.pd0 <- data.frame(X0["both.pd",]) * n
onlyf1.pd0 <- data.frame(X0["onlyf1.pd",]) * n
onlyf2.pd0 <- data.frame(X0["onlyf2.pd",]) * n
A graphical analysis shows that in the case of 0.2 it is more likely that both firms default at the same time, and it is less likely that any firm default at the same time.
Code
par(mfrow=c(1, 2), oma = c(0, 0, 2, 0))
par(pty = "s")
plot(X[["none", 5]], pch = ".", xlim = L, ylim = L,
cex = 1, main = "Correlation = 0.2")
points(X[["both", 5]], pch = ".", col = "red")
points(X[["onlyf1", 5]], pch = ".", col = "purple")
points(X[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd[5]*n)), bty = "n")
legend("topright", legend = c(paste(none.pd[5]*n)), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd[5]*n)), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd[5]*n)), bty = "n")
par(pty = "s")
plot(X0[["none", 5]], pch = ".", xlim = L, ylim = L,
cex = 1, main = "Correlation = 0.")
points(X0[["both", 5]], pch = ".", col = "red")
points(X0[["onlyf1", 5]], pch = ".", col = "purple")
points(X0[["onlyf2", 5]], pch = ".", col = "blue")
abline(v = x.y[5], lty = 2)
abline(h = x.y[5], lty = 2)
legend("bottomleft", legend = c(paste(both.pd0[5])), bty = "n")
legend("topright", legend = c(paste(none.pd0[5])), bty = "n")
legend("topleft", legend = c(paste(onlyf1.pd0[5])), bty = "n")
legend("bottomright", legend = c(paste(onlyf2.pd0[5])), bty = "n")
par(pty = "s")
Very interesting indeed.
The Gaussian copula model is not the only copula model. See how other distributions work.
t-coupla:
Gumbel coupla:
Frank coupla:
Clayton coupla:
3.5 Ten firms.
The original Hull (2015) example proposes a 10 firm case and here we implement this example following a simulation approach. First, we need a \(10\times10\) correlation matrix to produce the new \(x\) values using a random multi-variate distribution algorithm. According to Hull (2015) example the copula default correlations between each pair of companies is 0.2. The code below has the option to vary the default correlation given a uniform random distribution.
The new \(10\times10\) correlation matrix is then:
Code
1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 |
0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 |
Now, we can simulate the multivariate normal distribution. The variable X10
length is 1,000,000 for each firm. We choose a higher number of simulations because otherwise the most restrictive constraints would not lead to any observations.
Code
# Create the simulated cases.
set.seed(130575) # Reproducibility.
X10 <- mvrnorm(n, mu = rep(0, m), Sigma = x)
X10 <- data.frame(X10) # 10,000,000 observations.
colnames(X10) <- c("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10")
kable(head(X10), caption = "10 firms' performance, 1,000,000 simulations.",
digits = 3)
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
0.449 | 2.362 | 1.716 | -0.350 | 0.807 | 1.578 | -0.085 | 1.295 | 0.634 | 1.086 |
0.795 | -2.315 | 0.722 | -0.607 | -0.514 | 0.540 | 1.620 | 2.240 | 0.310 | 1.184 |
0.513 | 0.318 | 0.256 | -0.146 | 0.668 | 1.071 | 0.929 | -1.511 | 1.701 | -1.908 |
0.643 | -1.120 | -0.818 | 0.398 | -0.985 | -2.094 | 0.412 | -0.001 | -0.699 | -0.831 |
-1.404 | -0.385 | 0.916 | 1.768 | 0.785 | 0.425 | -0.873 | -1.025 | -0.164 | -1.990 |
0.038 | -0.818 | 0.402 | -1.354 | -0.153 | -0.830 | -1.523 | -0.856 | -0.433 | -0.142 |
How do we extract those cases in which all 10 firms will default in 5 years (at the same time)? Here are the first 6 of those cases. Note that in all cases the X10
values are lower than \(-1.036433\).
Code
# Given that we have 10 firms, it is easier to use filter_all function.
# Although probably this could be simplified even further.
y5.all.2 <- filter_all(X10, all_vars(. < x.y[5]))
y4.all.2 <- filter_all(X10, all_vars(. < x.y[4]))
y3.all.2 <- filter_all(X10, all_vars(. < x.y[3]))
y2.all.2 <- filter_all(X10, all_vars(. < x.y[2]))
y1.all.2 <- filter_all(X10, all_vars(. < x.y[1]))
Let’s analyze the case of 5 years.
Code
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
-1.625 | -1.198 | -1.740 | -2.203 | -1.489 | -3.152 | -1.395 | -1.409 | -2.809 | -2.027 |
-1.410 | -3.567 | -1.849 | -2.273 | -1.336 | -1.638 | -1.289 | -2.135 | -2.671 | -1.675 |
-3.617 | -1.495 | -1.210 | -1.367 | -3.220 | -1.481 | -1.784 | -1.891 | -1.443 | -1.375 |
-2.624 | -1.162 | -2.479 | -2.613 | -1.929 | -1.734 | -3.395 | -2.635 | -2.289 | -1.593 |
-1.834 | -1.063 | -1.888 | -1.390 | -3.494 | -1.280 | -2.108 | -1.664 | -4.007 | -2.058 |
-2.083 | -1.800 | -2.085 | -2.775 | -1.437 | -2.610 | -1.215 | -1.739 | -1.219 | -2.841 |
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
How many cases are there?
Which are those 72 cases?
Code
11119 | 92221 | 252520 | 350035 | 402689 | 496478 | 574643 | 633402 | 701255 |
18645 | 131986 | 266761 | 354133 | 413425 | 511731 | 576215 | 643935 | 708339 |
45042 | 165394 | 284189 | 363067 | 423852 | 522630 | 585654 | 644895 | 727092 |
57098 | 201832 | 284680 | 374237 | 437629 | 523010 | 615497 | 647511 | 747221 |
57696 | 227833 | 302542 | 384869 | 438202 | 527750 | 630487 | 650216 | 789221 |
81982 | 248381 | 328106 | 387617 | 477385 | 560115 | 630626 | 688847 | 797265 |
In total, we only have 72 cases. This is, the 10 firms will default at the same time in 5 years in 72 out of 1,000,000 total cases. For the rest of the years, the cases are less frequent, in fact we have zero cases for year 1, 2 and 3.
How do we extract those cases in which at least one of the 10 firms will default?
Here are the first 6 cases for the 5-year default.
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
0.795 | -2.315 | 0.722 | -0.607 | -0.514 | 0.540 | 1.620 | 2.240 | 0.310 | 1.184 |
0.513 | 0.318 | 0.256 | -0.146 | 0.668 | 1.071 | 0.929 | -1.511 | 1.701 | -1.908 |
0.643 | -1.120 | -0.818 | 0.398 | -0.985 | -2.094 | 0.412 | -0.001 | -0.699 | -0.831 |
-1.404 | -0.385 | 0.916 | 1.768 | 0.785 | 0.425 | -0.873 | -1.025 | -0.164 | -1.990 |
0.038 | -0.818 | 0.402 | -1.354 | -0.153 | -0.830 | -1.523 | -0.856 | -0.433 | -0.142 |
0.353 | -0.045 | -0.393 | -1.914 | -0.168 | -0.439 | -0.379 | -0.742 | -1.612 | -2.775 |
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 |
---|---|---|---|---|---|---|---|---|---|
FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE |
FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
How many cases are these?
In total, we have 682,148 cases. This is, at least one of 10 firms will default in 5 years in 682,148 of 1,000,000 cases.
And how to convert them into probabilities?
Code
atleastone02 <- t(data.frame(nrow(y1.any.2), nrow(y2.any.2),
nrow(y3.any.2), nrow(y4.any.2), nrow(y5.any.2)))
all02 <- t(data.frame(nrow(y1.all.2), nrow(y2.all.2),
nrow(y3.all.2), nrow(y4.all.2), nrow(y5.all.2)))
res02 <- data.frame(all02 / n, atleastone02 / n)
rownames(res02) <- n.year
colnames(res02) <- c("All firms", "At least one")
kable(res02, caption = "Probabilities of default (10 firms, corr=0.2).")
All firms | At least one | |
---|---|---|
year 1 | 0.0e+00 | 0.086835 |
year 2 | 0.0e+00 | 0.226086 |
year 3 | 0.0e+00 | 0.386264 |
year 4 | 1.0e-05 | 0.543684 |
year 5 | 7.2e-05 | 0.682440 |
Let’s see the difference when we assume a different correlation matrix. This case, the correlation vary randomly between 0.45 and 0.65.
Code
1.000 | 0.622 | 0.597 | 0.622 | 0.530 | 0.517 | 0.518 | 0.490 | 0.455 | 0.578 |
0.622 | 1.000 | 0.523 | 0.594 | 0.465 | 0.501 | 0.558 | 0.560 | 0.460 | 0.513 |
0.597 | 0.523 | 1.000 | 0.556 | 0.485 | 0.603 | 0.534 | 0.524 | 0.611 | 0.547 |
0.622 | 0.594 | 0.556 | 1.000 | 0.564 | 0.555 | 0.496 | 0.500 | 0.502 | 0.513 |
0.530 | 0.465 | 0.485 | 0.564 | 1.000 | 0.608 | 0.513 | 0.559 | 0.647 | 0.505 |
0.517 | 0.501 | 0.603 | 0.555 | 0.608 | 1.000 | 0.452 | 0.486 | 0.644 | 0.518 |
0.518 | 0.558 | 0.534 | 0.496 | 0.513 | 0.452 | 1.000 | 0.577 | 0.635 | 0.466 |
0.490 | 0.560 | 0.524 | 0.500 | 0.559 | 0.486 | 0.577 | 1.000 | 0.585 | 0.524 |
0.455 | 0.460 | 0.611 | 0.502 | 0.647 | 0.644 | 0.635 | 0.585 | 1.000 | 0.474 |
0.578 | 0.513 | 0.547 | 0.513 | 0.505 | 0.518 | 0.466 | 0.524 | 0.474 | 1.000 |
The resulting values of Xr
are:
Code
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
0.673 | 1.507 | 0.776 | 1.038 | 1.997 | 0.557 | 1.759 | 1.578 | 2.529 | 1.326 |
-0.181 | -0.887 | -0.485 | 1.532 | 0.829 | 2.031 | 0.800 | 0.824 | 0.471 | 0.838 |
0.731 | -0.826 | 1.679 | -1.181 | -0.034 | -0.270 | 0.478 | 0.920 | 0.412 | 0.818 |
-0.852 | -0.288 | -0.362 | -0.457 | -1.596 | 0.005 | -0.153 | -1.029 | -1.703 | -0.914 |
0.692 | -0.689 | 0.906 | -1.140 | -0.080 | 0.082 | -0.968 | -2.028 | 0.089 | 0.256 |
-1.123 | -0.810 | -1.242 | -1.438 | -0.850 | -0.394 | -0.601 | -0.458 | 0.080 | -1.393 |
Extract the cases in which all ten firms default at the same time in 5 years and the cases in which any firm default at the same time in 5 years.
Code
Xr <- as_tibble(Xr)
# All firms.
Xr.y1.all <- filter_all(Xr, all_vars(. < x.y[1]))
Xr.y2.all <- filter_all(Xr, all_vars(. < x.y[2]))
Xr.y3.all <- filter_all(Xr, all_vars(. < x.y[3]))
Xr.y4.all <- filter_all(Xr, all_vars(. < x.y[4]))
Xr.y5.all <- filter_all(Xr, all_vars(. < x.y[5]))
# At least one firm.
Xr.y1.any <- filter_all(Xr, any_vars(. < x.y[1]))
Xr.y2.any <- filter_all(Xr, any_vars(. < x.y[2]))
Xr.y3.any <- filter_all(Xr, any_vars(. < x.y[3]))
Xr.y4.any <- filter_all(Xr, any_vars(. < x.y[4]))
Xr.y5.any <- filter_all(Xr, any_vars(. < x.y[5]))
All firms default in 5,935 cases out of 1,000,000.
Let’s see the first 6 cases:
Code
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
-1.357 | -1.241 | -2.165 | -1.486 | -1.820 | -1.181 | -1.816 | -1.146 | -1.823 | -2.462 |
-1.874 | -1.751 | -2.281 | -1.390 | -2.020 | -2.189 | -1.752 | -2.580 | -2.062 | -2.731 |
-1.701 | -2.546 | -1.671 | -1.895 | -1.852 | -1.603 | -1.292 | -2.384 | -1.184 | -1.592 |
-1.443 | -1.452 | -2.499 | -1.561 | -1.166 | -2.392 | -1.658 | -1.369 | -3.522 | -2.372 |
-1.326 | -2.036 | -1.252 | -1.729 | -2.569 | -1.650 | -1.795 | -1.235 | -2.822 | -1.722 |
-2.314 | -2.703 | -2.063 | -3.262 | -1.551 | -2.172 | -2.205 | -3.088 | -2.130 | -1.908 |
Verify that those cases default.
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
At least one firm default in 497,987 out of 1,000,000.
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
0.731 | -0.826 | 1.679 | -1.181 | -0.034 | -0.270 | 0.478 | 0.920 | 0.412 | 0.818 |
-0.852 | -0.288 | -0.362 | -0.457 | -1.596 | 0.005 | -0.153 | -1.029 | -1.703 | -0.914 |
0.692 | -0.689 | 0.906 | -1.140 | -0.080 | 0.082 | -0.968 | -2.028 | 0.089 | 0.256 |
-1.123 | -0.810 | -1.242 | -1.438 | -0.850 | -0.394 | -0.601 | -0.458 | 0.080 | -1.393 |
-1.089 | -1.470 | -1.292 | -2.389 | -1.046 | -1.960 | 0.653 | -1.052 | -1.027 | -1.017 |
-1.280 | -0.320 | -2.717 | -1.976 | -1.970 | -1.238 | -0.891 | -2.161 | -0.682 | -0.657 |
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
---|---|---|---|---|---|---|---|---|---|
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE |
FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE |
TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE |
TRUE | FALSE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE |
And the corresponding probabilities:
Code
allR <- t(data.frame(nrow(Xr.y1.all), nrow(Xr.y2.all),
nrow(Xr.y3.all), nrow(Xr.y4.all), nrow(Xr.y5.all)))
atleastoneR <- t(data.frame(nrow(Xr.y1.any), nrow(Xr.y2.any),
nrow(Xr.y3.any), nrow(Xr.y4.any), nrow(Xr.y5.any)))
resR <- data.frame(allR / n, atleastoneR / n)
ans <- data.frame(res02, resR)
rownames(ans) <- n.year
colnames(ans) <- c("All (corr=0.2)", "At least one (corr=0.2)",
"All (corr=rand)", "At least one (corr=rand)")
kable(ans, caption = "Probability of default.",
row.names = TRUE)
All (corr=0.2) | At least one (corr=0.2) | All (corr=rand) | At least one (corr=rand) | |
---|---|---|---|---|
year 1 | 0.0e+00 | 0.086835 | 0.000021 | 0.062784 |
year 2 | 0.0e+00 | 0.226086 | 0.000184 | 0.156169 |
year 3 | 0.0e+00 | 0.386264 | 0.000783 | 0.266124 |
year 4 | 1.0e-05 | 0.543684 | 0.002438 | 0.382602 |
year 5 | 7.2e-05 | 0.682440 | 0.006144 | 0.497513 |
4 Credit VaR example 24.7.
This replicates Hull (2015) example 24.7.
This is the Vasicek model (equation 24.10) in a function form.
See if it works. Let’s evaluate equation 24.10 at 99% and 99.9%. The 1-year 99% and 99.9% credit VaR is:
Code
[1] 5.129484
[1] 3.294271
Let’s evaluate the model at all confidence levels (from 0 to 1) simulating 10,000 values.
Now, visually:
Code
hist(Loss, 100, xlim = c(0, 7), xlab = "Losses in millions", main = NULL)
abline(v = CVaR.999, lty = 2, col = "red", lwd = 2)
abline(v = CVaR.99, lty = 2, col = "blue", lwd = 2)
legend("topright", legend = c("1-year 99% credit VaR is 3.294271",
"1-year 99.9% credit VaR is 5.129484"),
col = c("blue", "red"), lwd = 2, lty = 2, bg = "white", cex = 0.8)
Code
dat <- data.frame(Loss, l)
ggplot(dat, aes(x = Loss, fill = "Losses in millions")) +
geom_density(color = "darkblue", fill = "lightblue") +
geom_vline(aes(xintercept = CVaR.999),
color = "red", linetype = "dashed", size = 1) +
geom_vline(aes(xintercept = CVaR.99 ),
color = "blue", linetype = "dashed", size = 1) +
scale_x_continuous(labels = scales::dollar)
Nice.
This document took 44.41 seconds to compile in Quarto version 1.3.450, and R version 4.3.2 (2023-10-31 ucrt).