ACTUARIES & DATA SCIENCE
Jerome Tuttle, FCAS, CPCU
Retired Actuary
What is an actuary?
Jack Nicholson in About Schmidt (2002)
Ben Stiller in Along Came Polly (2004)
Insurance is a unique business
■ Rates may not be unfairly discriminatory.
■ A rate is unfairly discriminatory to a group of risks if the rate does not bear a reasonable relationship to the expected loss experience among the risks.
What is data science?
The intersection of math/statistics, computer science, and subject matter knowledge, used to extract meaningful insights from data that translate into tangible business value.
Examples of data science
Actuaries and data science
Generalized linear models, K-nearest neighbors, K-means clustering, Bayes classifier, decision trees, random forest, principal component analysis. Also a predictive analytics specialty.
Randomly split data into training versus testing data
RMSE on test data = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Predicted}_i\right)^2}$
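The split-then-score idea above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the data values, the 30% test fraction, the seed, and the "predict the training mean" model are all assumptions made up for the example, and the helper names `train_test_split` and `rmse` are hypothetical.

```python
import math
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def rmse(actual, predicted):
    """Root mean squared error: sqrt(sum((actual - predicted)^2) / n)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical records: (miles driven in thousands, annual claim cost).
data = [(5, 210.0), (8, 260.0), (10, 300.0), (12, 310.0), (15, 390.0),
        (18, 420.0), (20, 480.0), (22, 500.0), (25, 560.0), (30, 640.0)]

train, test = train_test_split(data)

# Deliberately naive model: predict the training-set mean for every test record.
mean_cost = sum(cost for _, cost in train) / len(train)
test_rmse = rmse([cost for _, cost in test], [mean_cost] * len(test))
print(f"RMSE on test data: {test_rmse:.1f}")
```

The model is fitted on the training records only, and the RMSE is computed on records the model never saw, which is what makes it a fair measure of predictive accuracy.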
Some actuarial examples of data science techniques
Decision trees: underwriting
Clustering: territories
Principal component analysis: detect fraud
For insurance rating, we group (hopefully) similar customers into classes and charge an average rate for the class. Classification is rarely perfect.
[Figure: customers before classification and after classification]
Insurance classes may include age, gender, urban / rural territory, marital status, miles driven, claims history, car type, car age, etc.
But within each n-dimensional slice there is still considerable variability. To make a profit, a company wants to select the better-than-average customers within each class.
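A small sketch may make the "average rate per class, variability within the class" point concrete. The policies, class variables, and loss amounts below are invented for illustration, and `summarize_classes` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical policies: (age band, territory, annual loss). Illustrative only.
policies = [
    ("young", "urban", 1200.0),
    ("young", "urban", 300.0),
    ("young", "urban", 900.0),
    ("adult", "urban", 500.0),
    ("adult", "urban", 100.0),
    ("adult", "rural", 200.0),
    ("adult", "rural", 400.0),
]

def summarize_classes(rows):
    """Group losses by rating class; return the class average and spread."""
    losses_by_class = defaultdict(list)
    for age_band, territory, loss in rows:
        losses_by_class[(age_band, territory)].append(loss)
    return {
        rating_class: {"average": mean(losses), "spread": pstdev(losses)}
        for rating_class, losses in losses_by_class.items()
    }

summary = summarize_classes(policies)
for rating_class, stats in summary.items():
    print(rating_class, f"average={stats['average']:.0f}", f"spread={stats['spread']:.0f}")
```

Every customer in a class is charged on the class average, yet the spread shows that individual losses differ widely around it. That gap is what an insurer tries to exploit when selecting risks.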
Generalized Linear Models: pricing
Times factor for Age i = 1.50
Times factor for Gender j = 1.20
Times factor for Territory k = 1.40, ... , etc.
Effect of telematics on claims
Underwriting score cards
Predict claims likely to settle far above their initial estimate
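The pricing factors on this slide multiply together, which is the structure a GLM with a log link produces: the log of the rate is a sum of terms, so the rate is a product of factors. The sketch below uses the three factors shown above. The base rate of 500 is an assumption (the slide does not state one), and `premium` is a hypothetical helper name.

```python
import math

BASE_RATE = 500.0  # hypothetical base rate; not given on the slide

# Factors taken from the slide for one illustrative customer.
FACTORS = {"age": 1.50, "gender": 1.20, "territory": 1.40}

def premium(base_rate, factors):
    """Multiplicative rating: base rate times the product of all factors."""
    return base_rate * math.prod(factors.values())

rate = premium(BASE_RATE, FACTORS)

# The same number on the log scale, where the model is additive (log link).
log_rate = math.log(BASE_RATE) + sum(math.log(f) for f in FACTORS.values())

print(f"Premium: {rate:.2f}")
print(f"exp(log-scale sum): {math.exp(log_rate):.2f}")
```

In a fitted GLM the factors would be the exponentiated coefficients for the customer's age, gender, and territory levels. Here they are simply read off the slide.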
Decision trees: underwriting
Clustering: territories
Yao, J. (2008). Clustering in ratemaking; applications in territories clustering. Casualty Actuarial Society Predictive Modeling Seminar
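As a rough illustration of clustering territories, the sketch below runs a plain one-dimensional k-means on a loss cost per territory. It is a simplification and not the method of the cited paper: the territory names and loss costs are invented, geography (for example contiguity) is ignored, and the evenly spaced starting centers are an arbitrary choice made to keep the example deterministic. `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means on a list of numbers; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly through the sorted data.
    centers = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)]) for i in range(k)]

    def nearest(value):
        return min(range(k), key=lambda j: abs(value - centers[j]))

    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for value in values:
            groups[nearest(value)].append(value)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers

    return centers, [nearest(value) for value in values]

# Hypothetical loss cost per territory.
loss_costs = {"T1": 100, "T2": 110, "T3": 105, "T4": 300, "T5": 310, "T6": 500, "T7": 520}

centers, labels = kmeans_1d(list(loss_costs.values()), k=3)
grouped = dict(zip(loss_costs, labels))
print(centers)
print(grouped)
```

Territories that land in the same cluster would then share a territorial rating factor.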
Principal component analysis: fraud detection
Transforms the original variables into a smaller set of mutually uncorrelated variables that preserves as much variability as possible.
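In symbols, the standard formulation can be sketched as follows. The notation is an assumption introduced here rather than taken from the slide: $x$ is the centered vector of the $p$ original variables, $\Sigma$ is its covariance matrix, and $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are the eigenvalues of $\Sigma$.

```latex
w_1 = \arg\max_{\|w\| = 1} w^\top \Sigma\, w,
\qquad
w_k = \arg\max_{\substack{\|w\| = 1 \\ w \perp w_1, \dots, w_{k-1}}} w^\top \Sigma\, w

z_k = w_k^\top x,
\qquad
\operatorname{Var}(z_k) = \lambda_k,
\qquad
\operatorname{Cov}(z_j, z_k) = 0 \quad (j \neq k)

\text{share of variability kept by the first } m \text{ components}
= \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}
```

The new variables $z_k$ are the principal components. Keeping only the first few gives the smaller, uncorrelated set described above.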
Credit scoring
■ Loss ratio = f (variables X1, …, Xn)
■ Probability of default = f (variables X1, …, Xn)
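The slide leaves the function f unspecified. One common choice for a probability is a logistic function of a weighted sum of the variables, sketched below purely as an assumption. The variables, coefficients, and intercept are invented for illustration, and `probability_of_default` is a hypothetical helper name.

```python
import math

def probability_of_default(features, coefficients, intercept):
    """Logistic model: 1 / (1 + exp(-(intercept + sum(coef_i * x_i))))."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical variables X1..X3: late payments, credit utilization, years of history.
coefficients = [0.8, 1.5, -0.1]
intercept = -3.0

low_risk = probability_of_default([0, 0.2, 15], coefficients, intercept)
high_risk = probability_of_default([3, 0.9, 2], coefficients, intercept)
print(f"low risk: {low_risk:.3f}, high risk: {high_risk:.3f}")
```

The logistic form keeps the output between 0 and 1, which is why it suits a probability. A loss ratio model would need a different form, since a loss ratio can exceed 1.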
Text Analysis
References