Introduction to Loss Distributions

Valentine Chisango
Feb 20, 2023

An illustration of distribution selection, parameter estimation and goodness-of-fit testing

In the ordinary course of business, a general insurer will receive claims that will most likely be of various amounts. The insurer will need to understand the distribution of these claims for functions including pricing and reserving. In practice, the true underlying claims distribution is unlikely to be known and instead the insurer will have to use available data to approximate this distribution. What follows is an end-to-end walk-through of that process, both by hand and with the aid of statistical computing in R.

Loss distributions are defined as statistical distributions used to model individual claim amounts. There are numerous statistical distributions that can be used for this purpose. In practice, long-tailed positively skewed distributions usually provide the best fit for general insurance claims. Suppose we have observed the motor claims experience of a general insurer, summarised by the histogram and table below, and wish to fit a loss distribution to this data.

Graph by author
Selected summary statistics for the motor insurance claims data

The first step is to choose an appropriate distribution, or a set of candidate distributions. For simplicity, we will use a loss distribution with only one parameter — the exponential distribution. Recall that for an exponential distribution, the probability density function (pdf), cumulative distribution function (cdf) and first moment are defined as follows:

Exponential distribution pdf, cdf and expectation
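Written out, for rate parameter λ > 0 and x > 0, these take the standard forms:

f(x) = \lambda e^{-\lambda x}, \qquad F(x) = 1 - e^{-\lambda x}, \qquad E[X] = \frac{1}{\lambda}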

The second step is to estimate the parameter(s) for the chosen distribution(s) — i.e. fitting the distribution(s). There are three approaches at our disposal for estimating the lambda parameter, all of which are easily extended to estimating parameters for more complex distributions. The first is the method of moments. In this approach, we equate the first theoretical moment of the exponential distribution to the first sample moment — that is, the mean observed in the sample data. For distributions with more than one parameter, higher order moments would be used in the same fashion.

Method of moments approach estimate for lambda
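Since the first moment of the exponential distribution is 1/λ, equating it to the sample mean gives the estimate:

\frac{1}{\hat{\lambda}} = \bar{x} \quad \Rightarrow \quad \hat{\lambda}_{\text{MME}} = \frac{1}{\bar{x}}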

The next approach is maximum likelihood estimation. Here, the estimate of lambda is the value that maximises the probability of observing the data that was actually observed. For a given value of lambda, this probability is represented by the likelihood function. In practice it is usually simpler (and yields the same estimate) to maximise the logarithm of the likelihood function. As usual, maximising the function requires differentiating the log-likelihood with respect to lambda, setting the resulting expression equal to zero and solving for lambda. For distributions with more parameters the method is extended by following the same steps (differentiate, equate to zero, solve) for each parameter.

Maximum likelihood estimate of lambda
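For a sample of n independent claims, the log-likelihood, its derivative and the resulting estimate are:

\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i, \qquad \frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \quad \Rightarrow \quad \hat{\lambda}_{\text{MLE}} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}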

The final approach is the method of percentiles. This approach is similar to the method of moments approach with the difference being that instead of equating moments we equate percentiles. More specifically, we equate the theoretical median of the distribution to the sample median. For distributions with more than one parameter, we would need to make use of other percentiles. For example, for a two-parameter distribution we may choose to instead use the upper and lower quartiles.

Method of percentiles estimate for lambda
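Setting the exponential cdf evaluated at the sample median m equal to 0.5 gives:

1 - e^{-\hat{\lambda} m} = 0.5 \quad \Rightarrow \quad \hat{\lambda}_{\text{MPE}} = \frac{\ln 2}{m} = \frac{-\ln 0.5}{m}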

In R, the three estimates of lambda can be easily computed in a few lines using the same logic as above:

# sum_xi, N and median_xi are the total claim amount, number of claims
# and sample median taken from the summary statistics above
x_bar = sum_xi/N
lambda_mme = 1/x_bar             # method of moments estimate
lambda_mle = 1/x_bar             # maximum likelihood estimate (same as MME here)
lambda_mpe = -log(0.5)/median_xi # method of percentiles estimate

Now that we have fitted a “candidate” distribution, the final step is testing the goodness-of-fit of this distribution. A distribution that provides a poor fit is not going to be useful for the insurer’s pricing and reserving tasks, so this step is essential. A chi-squared goodness-of-fit test can be applied to this end. For this test, we first group the data into buckets by claim amount. Then we count the observed number of claims falling into each bucket and calculate the expected number of claims in each bucket. The expected number for each bucket is calculated by multiplying the total number of observations by the probability that a claim would fall into that bucket. The test statistic is then computed as follows:
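With O_i denoting the observed count and E_i the expected count in bucket i, the test statistic is the usual chi-squared statistic:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

Under the null hypothesis this is approximately chi-squared distributed with k - 1 - p degrees of freedom, where k is the number of buckets and p is the number of parameters estimated from the data.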

In our specific case, we are testing the null hypothesis that claim amounts are exponentially distributed against the alternative hypothesis that claim amounts are not exponentially distributed. Since the parameter lambda is estimated from the data, we lose an additional degree of freedom — i.e. one degree of freedom is lost for each parameter estimated from the data.

Hypotheses for goodness-of-fit test
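Formally, with X denoting an individual claim amount:

H_0: X \sim \text{Exp}(\lambda) \qquad \text{vs.} \qquad H_1: X \not\sim \text{Exp}(\lambda)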

We begin by selecting 10 buckets each of size 1000. The buckets represent the ranges or intervals for the claim amounts, i.e. (0; 1000], (1000; 2000], …, (9000; 10000]. For the observed counts we simply count the number of claims that fell into each interval. In R, this is achieved by specifying our buckets (in the breaks variable), using the cut function to allocate each observed claim to the appropriate bucket and using the table function to count the number of claims in each bucket.

breaks = seq(from = 0, to = 10000, by = 1000)
intervals = cut(loss_data, breaks = breaks)
observed_counts = as.numeric(table(intervals))

For the expected counts, we compute the probability of an observation from an exponential random variable falling into each range and multiply this by the total number of observations. This needs to be done three times, once for each estimate of lambda, since each estimate of lambda corresponds to a different fitted loss distribution. The R code follows the same structure for each of the three.

library(stringr) # for str_replace_all

# strip the brackets from the interval labels so they can be parsed
intervals_text <- str_replace_all(levels(intervals), "[()\\[\\]]", "")

# split each label into its numeric lower and upper bounds
interval_ranges <- lapply(intervals_text, function(x) c(as.numeric(strsplit(x, ",")[[1]])))

# expected count = total observations x probability of falling in the bucket
expected_counts_mme = c()
for (i in 1:length(interval_ranges)){
  expected_counts_mme[i] = length(loss_data)*(pexp(interval_ranges[[i]][2], rate = lambda_mme) - pexp(interval_ranges[[i]][1], rate = lambda_mme))
}
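The same loop can be repeated with lambda_mle and lambda_mpe. Equivalently, since the breaks vector already holds the bucket boundaries, a compact sketch of the expected counts for the other two estimates is:

expected_counts_mle = length(loss_data)*diff(pexp(breaks, rate = lambda_mle))
expected_counts_mpe = length(loss_data)*diff(pexp(breaks, rate = lambda_mpe))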

The test statistics are then computed using the formula from above. For each test statistic we can calculate the exact p-value in R. Recall that the p-value represents the probability of observing a test statistic as extreme as or more extreme than the one observed if the null hypothesis is true.

test_stat_mle = sum((observed_counts-expected_counts_mle)^2/expected_counts_mle)
# degrees of freedom: 10 buckets - 1 - 1 estimated parameter = 8
p_value_mle = pchisq(test_stat_mle, df = length(observed_counts)-2, lower.tail = FALSE)
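The same two lines, with the mle suffix swapped for mme or mpe, give the test statistic and p-value for each of the other two fitted distributions, for example:

test_stat_mpe = sum((observed_counts-expected_counts_mpe)^2/expected_counts_mpe)
p_value_mpe = pchisq(test_stat_mpe, df = length(observed_counts)-2, lower.tail = FALSE)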

In this instance, the p-value is effectively zero for all of our estimates, indicating very strong evidence against the null hypothesis. Other distributions should therefore be explored and similarly assessed. The graph below adds the density of the exponential distributions fitted above, as well as the density of a better-fitting log-normal distribution, for illustrative purposes. Recall that the method of moments and maximum likelihood approaches produced the same estimate and thus have the same density.

Graph by author

The repository with the R code to reproduce the graphs and calculations can be found here: https://github.com/ValentineChisango/A212-CS2
