This project is deprecated.

Automated Air Quality Index and Allergen Database
By Damian M. Garcia, PharmD
https://damianmgarcia.com
Last Updated: 2020-09-13

PERMISSIONS
This database can be viewed by anyone with the link, but for those users it works in read-only mode: changes cannot be made, and many functions cannot be tested.

CODE
JS

FUNCTION
Takes a date (required) and a dyspnea level (required for manual entries) and attempts to extract air quality index (AQI) and allergen level data for that date for the San Antonio, Texas area from two data sources. The air quality data source is a "clean" JSON file from airnow.gov. The allergen data source (the only one I could find with numerical values) is a "dirty" HTML file from news4sanantonio.com. A dyspnea scale (0-4), which is not a validated scale, is included to study statistical relationships between dyspnea and environmental factors such as air pollution and airborne allergens.
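
The input rules above can be sketched as a small validation step. This is an illustration, not the project's code. The following are all assumptions: the name `validateEntry`, the `{ date, dyspnea, manual }` object shape, the yyyy-mm-dd date format, and the requirement that the rating be a whole number. The function checks that a real calendar date is always present and that a 0-4 dyspnea rating is present for manual entries.

```javascript
// Hypothetical input check; the field names and the yyyy-mm-dd format are assumptions.
function validateEntry({ date, dyspnea, manual }) {
  const errors = [];

  // A date is always required; here it is expected as yyyy-mm-dd.
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(date ?? "");
  if (!match) {
    errors.push("date must be in yyyy-mm-dd format");
  } else {
    const [year, month, day] = match.slice(1).map(Number);
    const d = new Date(Date.UTC(year, month - 1, day));
    // Reject dates such as 2020-02-30, which Date silently rolls over into March.
    if (d.getUTCFullYear() !== year || d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) {
      errors.push("date does not exist");
    }
  }

  // The dyspnea rating is only required for manual entries, on the 0-4 scale.
  if (manual) {
    if (!Number.isInteger(dyspnea) || dyspnea < 0 || dyspnea > 4) {
      errors.push("dyspnea must be a whole number from 0 to 4");
    }
  }

  return errors; // an empty array means the entry is acceptable
}

console.log(validateEntry({ date: "2020-09-12", dyspnea: 2, manual: true })); // []
console.log(validateEntry({ date: "2020-02-30", manual: true }));
```

Returning a list of messages, instead of throwing on the first problem, lets a form report every issue at once.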

FEATURES
1) Automated Database Updates
Once a day, the program automatically extracts the previous day's data. It also accepts dates entered by the user: enter a date and a dyspnea rating, then click the (+) button.
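
Finding "yesterday" for the daily run can be sketched as below. The function name `yesterdayIso` is hypothetical. The sketch assumes dates are handled as yyyy-mm-dd strings in the local time zone of the machine running the script. In Google Apps Script, a once-daily run is typically scheduled with a time-driven trigger (`ScriptApp.newTrigger(...).timeBased().everyDays(1)`), but this document does not say which mechanism the project used.

```javascript
// Hypothetical helper: returns the day before `now` as a yyyy-mm-dd string,
// using the local calendar date of the machine running the script.
function yesterdayIso(now = new Date()) {
  // Build from year/month/day - 1 so Date handles month and year rollover.
  const d = new Date(now.getFullYear(), now.getMonth(), now.getDate() - 1);
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

console.log(yesterdayIso(new Date(2020, 8, 13))); // 2020-09-12
console.log(yesterdayIso(new Date(2021, 0, 1)));  // 2020-12-31
```

Working from calendar components, not from subtracting a fixed 24 hours, avoids landing on the wrong date across a daylight-saving change.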

2) Automated Quality Control
The program attempts to validate data before adding it to the database. If it can get only partial data (i.e., only AQI or only allergens) because of validity issues, it formats the corresponding date in bold yellow to notify database users that data extraction for that date may be incomplete. If it cannot get any AQI or allergen data, it formats the corresponding date in bold red to notify users that data extraction for that date failed. In either case, partial or complete failure, the program records the date and retries the extraction once daily for up to 3 more attempts.
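
The status and retry rules above can be summarized as two small decision functions. This is a sketch under assumptions. The names `extractionStatus` and `shouldRetry`, the status labels, and the per-date retry counter are illustrative and not taken from the project. The bold yellow and bold red formatting and the limit of 3 further attempts come from the description above.

```javascript
// Hypothetical mapping from extraction results to the date cell's formatting.
function extractionStatus(hasAqi, hasAllergens) {
  if (hasAqi && hasAllergens) return { status: "complete", color: null, bold: false };
  if (hasAqi || hasAllergens) return { status: "partial", color: "yellow", bold: true };
  return { status: "failed", color: "red", bold: true };
}

// Up to 3 more once-daily attempts after a partial or failed extraction.
const MAX_RETRIES = 3;

// `retriesSoFar` counts retries already made for a date (an assumed bookkeeping field).
function shouldRetry(status, retriesSoFar) {
  return status !== "complete" && retriesSoFar < MAX_RETRIES;
}

console.log(extractionStatus(true, false).color); // yellow
console.log(shouldRetry("partial", 3));           // false
```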

3) Automated Data Auditing
If a user attempts to change a date in the database, the program warns that continuing will delete that date and its associated data from the database. If a user attempts to change a value in the database, the program requests a reason for the change. The reason is recorded in a data change log for that value along with a timestamp, the user's ID, the old value, and the new value. Any later changes to the same value are appended to the same log.
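
A change-log entry of the kind described above could be built and appended as follows. Three things are assumptions for illustration: the function name, the one-line-per-change text format, and storing the log as a single string (for example, in a cell note). The recorded fields (timestamp, user ID, old value, new value, reason) are the ones listed above.

```javascript
// Hypothetical: formats one audit entry and appends it to an existing log string.
function appendChangeLog(existingLog, { timestamp, userId, oldValue, newValue, reason }) {
  // A change is refused unless a non-blank reason is given.
  if (!reason || !String(reason).trim()) {
    throw new Error("A reason is required to change a value.");
  }
  const entry =
    `${timestamp.toISOString()} | user: ${userId} | ` +
    `old: ${oldValue} | new: ${newValue} | reason: ${reason}`;
  // Later changes are appended on new lines, so the full history is kept in order.
  return existingLog ? `${existingLog}\n${entry}` : entry;
}

let log = "";
log = appendChangeLog(log, {
  timestamp: new Date(Date.UTC(2020, 8, 13, 15, 0)),
  userId: "user@example.com",
  oldValue: 42,
  newValue: 24,
  reason: "Digits transposed during manual entry",
});
console.log(log);
// 2020-09-13T15:00:00.000Z | user: user@example.com | old: 42 | new: 24 | reason: Digits transposed during manual entry
```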

4) Smart Data Filter
This feature lets the user show only data from within a chosen number of days. Beyond hiding the date rows outside that cutoff, the program also scans every data column and hides any column that has no data points in the date range of interest, so that only relevant data are shown.
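
The two-step filter (rows first, then empty columns) can be sketched over a plain array of rows. This assumes each row is `{ date, values }`, with `values` holding one entry per data column and `null` or `""` for a missing data point. The function name and data layout are hypothetical. In a live sheet, the returned 0-based indices could be converted to the sheet's 1-based indices and passed to `Sheet.hideRows` and `Sheet.hideColumns` in Apps Script.

```javascript
// Hypothetical sketch: keep rows within `days` of `today`, then find columns with no data.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function smartFilter(rows, days, today) {
  const cutoff = today.getTime() - days * MS_PER_DAY;

  // Step 1: keep only rows whose date falls inside the window.
  const visibleRows = [];
  rows.forEach((row, i) => {
    const t = row.date.getTime();
    if (t >= cutoff && t <= today.getTime()) visibleRows.push(i);
  });

  // Step 2: a column stays visible only if at least one visible row has a value in it.
  const columnCount = rows.length ? rows[0].values.length : 0;
  const hiddenColumns = [];
  for (let c = 0; c < columnCount; c++) {
    const hasData = visibleRows.some((i) => {
      const v = rows[i].values[c];
      return v !== null && v !== undefined && v !== "";
    });
    if (!hasData) hiddenColumns.push(c);
  }

  return { visibleRows, hiddenColumns };
}

const rows = [
  { date: new Date(2020, 8, 12), values: [45, null, 3] },
  { date: new Date(2020, 8, 11), values: [52, "", null] },
  { date: new Date(2020, 5, 1), values: [30, 120, null] }, // outside a 7-day window
];
console.log(smartFilter(rows, 7, new Date(2020, 8, 13)));
// { visibleRows: [ 0, 1 ], hiddenColumns: [ 1 ] }
```

Column 1 has data only in the June row, so once that row is filtered out the column is hidden as well.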

5) Automated Creation of New Data Columns
The program automatically creates new data columns when necessary, for example when it encounters an allergen that is not already in the database.
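
Detecting allergens that need a new column amounts to comparing incoming names against the existing header row. Below is a minimal sketch with a hypothetical function name. It assumes the header row is an array of strings and that names have already been normalized, so that spellings such as "Mt Cedar" and "Mt. Cedar" are treated as one allergen.

```javascript
// Hypothetical: returns the header row with any unseen allergen names appended, plus the additions.
function addMissingColumns(headers, allergenNames) {
  const known = new Set(headers);
  const added = [];
  for (const name of allergenNames) {
    if (!known.has(name)) {
      known.add(name); // guard against the same new name appearing twice in one batch
      added.push(name);
    }
  }
  return { headers: [...headers, ...added], added };
}

const result = addMissingColumns(["Date", "AQI", "Mt. Cedar"], ["Mt. Cedar", "Elm", "Ragweed", "Elm"]);
console.log(result.added); // [ 'Elm', 'Ragweed' ]
```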

6) Data Categories at a Glance
AQI and allergen values are color-coded by severity category. The category ranges used to assign colors are those defined by the United States Environmental Protection Agency for the AQI and by the American Academy of Allergy, Asthma & Immunology National Allergen Bureau for allergen levels.
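
For the AQI, each EPA category covers a fixed value range, so coloring a value is a range lookup. The sketch below uses the standard EPA AQI breakpoints and category colors. The function name is hypothetical. The National Allergen Bureau ranges for allergens, which differ by pollen or mold type, are not reproduced here.

```javascript
// EPA AQI categories: each entry gives the upper bound of its range and its color.
const AQI_CATEGORIES = [
  { max: 50, name: "Good", color: "green" },
  { max: 100, name: "Moderate", color: "yellow" },
  { max: 150, name: "Unhealthy for Sensitive Groups", color: "orange" },
  { max: 200, name: "Unhealthy", color: "red" },
  { max: 300, name: "Very Unhealthy", color: "purple" },
  { max: Infinity, name: "Hazardous", color: "maroon" },
];

// Hypothetical helper: returns the category for a non-negative integer AQI value.
function aqiCategory(aqi) {
  if (!Number.isInteger(aqi) || aqi < 0) throw new RangeError(`Invalid AQI: ${aqi}`);
  return AQI_CATEGORIES.find((c) => aqi <= c.max);
}

console.log(aqiCategory(42).name);   // Good
console.log(aqiCategory(101).color); // orange
```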

CHALLENGES
1) Parsing "Dirty" Data
Parsing data from the news4sanantonio.com HTML file is challenging. HTML is rarely an ideal data source: various tags sit between data points (e.g., <p>, <br>). In addition, the dataset, although extensive, is inconsistently formatted: names for the same allergen sometimes vary (e.g., Mt Cedar vs Mt. Cedar), date formats vary (e.g., mm-dd-yyyy vs m-dd-yyyy vs m-d-yy), and delimiters are used inconsistently (e.g., "," vs ";" vs ":").
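
The cleanup these problems call for can be sketched as a few small helpers: strip tags, split on any of the mixed delimiters, canonicalize allergen names, and parse the loose date formats. This is an illustration under assumptions, not the project's parser. All function names are hypothetical, as are three rules: names are canonicalized by dropping periods and collapsing whitespace, "," ";" and ":" are treated as equivalent delimiters, and two-digit years are read as 20yy. A regex tag stripper is adequate for short, simple fragments like these but is not a general HTML parser.

```javascript
// Hypothetical cleanup helpers for loosely formatted HTML text.

// Replace tags such as <p> and <br> with spaces, then collapse runs of whitespace.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Treat ",", ";" and ":" as interchangeable delimiters.
function splitFields(text) {
  return text.split(/[,;:]/).map((s) => s.trim()).filter(Boolean);
}

// Canonical key for allergen names, so "Mt Cedar" and "Mt. Cedar" compare equal.
function allergenKey(name) {
  return name.replace(/\./g, "").replace(/\s+/g, " ").trim().toLowerCase();
}

// Accepts mm-dd-yyyy, m-dd-yyyy, m-d-yy (and "/" separators); returns yyyy-mm-dd or null.
function parseLooseDate(text) {
  const m = /^(\d{1,2})[-/](\d{1,2})[-/](\d{2}|\d{4})$/.exec(text.trim());
  if (!m) return null;
  const [month, day] = [Number(m[1]), Number(m[2])];
  let year = Number(m[3]);
  if (m[3].length === 2) year += 2000; // assumption: two-digit years are 20yy
  const d = new Date(Date.UTC(year, month - 1, day));
  // Reject impossible dates such as 2-30-20, which Date would roll over.
  if (d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) return null;
  return d.toISOString().slice(0, 10);
}

console.log(stripTags("<p>Mt. Cedar: 120<br>Elm; 15</p>")); // Mt. Cedar: 120 Elm; 15
console.log(allergenKey("Mt Cedar") === allergenKey("Mt. Cedar")); // true
console.log(parseLooseDate("9-3-20")); // 2020-09-03
```

Comparing and storing allergens by a canonical key lets differently spelled entries for the same allergen land in the same column.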