ABCDEFGHIJKLMNOPQRSTUVWXYZ
1
Introduction
2
This sheet aims to make the data about the DV program more accessible and provide data-driven insights.

The official data is not easily accessible, since it's locked up in various document formats (PDF, HTML) and also isn't uniform over the years. I cleaned it up and resolved many conflicts, for example due to changing country names (e.g. Swaziland to Eswatini). With the cleaned up data I created interactive pivot tables and charts that allow more insights into the DV program, for example how the chance to win varies by country and by year.
3
4
Usage
5
The "Entered", "Selected" and "Issued" sheets show the number of people that entered the DV program, were selected by the DV program (i.e. won), were successfully issued a visa through the DV program. You can interact with the pivot table, e.g. show value as percentage of the column.

The "Charts" sheet shows a few basic charts. More can be made easily using data obtained from the pivot tables. The chart "Chances Of Winning By Country" can be customized by changing the value in the orange cells.
6
7
Notes
8
The inner workings are in hidden to make using this sheet as easy as possible. You can make them visible again by going to View -> Hidden sheets.

We count "Sub-Countries" (countries that belong to another country, e.g. French islands) as part of that country. The DV program counts them as standalone countries. Since they don't add much the numbers for the "parent countries" shouldn't be off by a lot.
9
10
Sources of Data
11
The data is obtained from https://travel.state.gov/content/travel/en/us-visas/immigrate/diversity-visa-program-entry/diversity-visa-program-statistics.html.

I used Tabula to extract the tables from the .pdf and import them as .csv, and Google Sheet's ImportHTML() function to extract the tables from the .html pages. The harder part is to clean it up, unpivot it by year (as well as entry type for entries), and consolidate it with the existing data. You can see in the "* Data" sheets how the cleaned up data should look like.
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100