A | B | C | D | E | F | G | |
---|---|---|---|---|---|---|---|
1 | This project is deprecated. | ||||||
2 | |||||||
3 | Automated Air Quality Index and Allergen Database | ||||||
4 | By Damian M. Garcia, PharmD | ||||||
5 | https://damianmgarcia.com | ||||||
6 | Last Updated: 2020-09-13 | ||||||
7 | |||||||
8 | PERMISSIONS | ||||||
9 | This database may be viewed by anyone with the link, but it will function in a read-only mode for those users, | ||||||
10 | meaning changes cannot be made and many functions cannot be tested. | ||||||
11 | |||||||
12 | CODE | ||||||
13 | JS | ||||||
14 | |||||||
15 | FUNCTION | ||||||
16 | Takes a date (required) and dyspnea level (required if manual entry) and attempts to extract air quality | ||||||
17 | index (AQI) and allergen level data for that date for the San Antonio, Texas area from two data sources. The | ||||||
18 | air quality data source is a "clean" JSON file from airnow.gov. The allergen data source (the only one I | ||||||
19 | could find with numerical values) is a "dirty" HTML file from news4sanantonio.com. A dyspnea scale (0-4), | ||||||
20 | which is not a validated scale, is included to study statistical relationships between dyspnea and | ||||||
21 | environmental factors such as air pollution and airborne allergens. | ||||||
22 | |||||||
23 | FEATURES | ||||||
24 | 1) Automated Database Updates | ||||||
25 | The program automatically extracts the previous day's data once a day. The program can also take dates | ||||||
26 | entered by the user: enter a date and dyspnea rating, then click the (+) button. | ||||||
27 | |||||||
28 | 2) Automated Quality Control | ||||||
29 | The program attempts to validate data before adding it to the database. If the program is only able to get | ||||||
30 | partial data (i.e. only AQI or only allergens) due to validity issues, it will change the format of the | ||||||
31 | corresponding date to yellow and bold to notify database users that data extraction for that date may be | ||||||
32 | incomplete. If the program is not able to get any AQI or allergen data, it will change the format of the | ||||||
33 | corresponding date to red and bold to notify database users that data extraction for that date failed. In | ||||||
34 | either case, whether extraction failed partially or completely, the program automatically records the date | ||||||
35 | and retries extraction once daily for up to 3 more attempts. | ||||||
36 | |||||||
37 | 3) Automated Data Auditing | ||||||
38 | If a user attempts to change a date in the database, the program will warn the user that continuing will | ||||||
39 | delete the date and its associated data from the database. If a user | ||||||
40 | attempts to change a value in the database, the program will request a reason for the change, which will then | ||||||
41 | be included in a data change log for that value along with a timestamp, the user's ID, the old value, and the | ||||||
42 | new value. If additional changes are made to the same value in the future, those changes will be appended to | ||||||
43 | the data change log. | ||||||
44 | |||||||
45 | 4) Smart Data Filter | ||||||
46 | This feature lets the user filter the database to show only the data that falls within a user-specified | ||||||
47 | number of days. Beyond simply hiding the date rows past the cutoff, the program also scans all data | ||||||
48 | columns for columns with no data points in that date range and hides them automatically, so that only | ||||||
49 | relevant data is shown. | ||||||
50 | |||||||
51 | 5) Automated Creation of New Data Columns | ||||||
52 | The program will automatically create new data columns when necessary, such as when the program encounters a | ||||||
53 | new allergen not already in the database. | ||||||
54 | |||||||
55 | 6) Data Categories at a Glance | ||||||
56 | The data values for AQI and allergens are colored to indicate their severity category. The category | ||||||
57 | ranges that determine the colors are based on those defined by the United States Environmental Protection | ||||||
58 | Agency for the AQI and by the American Academy of Allergy, Asthma & Immunology National Allergy Bureau | ||||||
59 | for allergen levels. | ||||||
60 | |||||||
61 | CHALLENGES | ||||||
62 | 1) Parsing "Dirty" Data | ||||||
63 | Parsing data from the news4sanantonio.com HTML file is challenging. HTML is rarely an ideal data source: | ||||||
64 | HTML tags (e.g. <p>, <br>) are interspersed between the data points. Additionally, the dataset, although | ||||||
65 | extensive, is inconsistently formatted: names for the same allergen sometimes vary (e.g. Mt Cedar vs | ||||||
66 | Mt. Cedar), date formats are inconsistent (e.g. mm-dd-yyyy vs m-dd-yyyy vs m-d-yy), and delimiters are | ||||||
67 | used inconsistently (e.g. "," vs ";" vs ":"). | ||||||
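
The kinds of cleanup described above can be sketched in JS roughly as follows. This is a minimal illustration, not the project's actual code: the function names (stripTags, normalizeAllergenName, normalizeDate, parseAllergens) are hypothetical, the sample input format is invented, and two-digit years are assumed to fall in the 2000s.

```javascript
// Hypothetical helpers; none of these names come from the original project.

// Replace HTML tags with spaces and collapse runs of whitespace.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Canonicalize an allergen name so "Mt Cedar" and "Mt. Cedar" compare equal.
function normalizeAllergenName(name) {
  return name.replace(/\./g, "").replace(/\s+/g, " ").trim().toLowerCase();
}

// Parse m-d-yy, m-dd-yyyy, or mm-dd-yyyy into ISO "yyyy-mm-dd".
// Assumption: two-digit years belong to the 2000s.
// Returns null for unrecognized or impossible dates.
function normalizeDate(text) {
  const match = /^(\d{1,2})-(\d{1,2})-(\d{2}|\d{4})$/.exec(text.trim());
  if (!match) return null;
  const month = Number(match[1]);
  const day = Number(match[2]);
  const year = match[3].length === 2 ? 2000 + Number(match[3]) : Number(match[3]);
  // Round-trip through Date to reject values such as 2-30-2020 or 13-1-2020.
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
  return date.toISOString().slice(0, 10);
}

// Extract name/level pairs regardless of which delimiter (",", ";" or ":")
// separates the name from its numeric level.
// Assumption: each entry is a name, one delimiter, then an integer.
function parseAllergens(text) {
  const pattern = /([A-Za-z][A-Za-z. ]*?)\s*[,;:]\s*(\d+)/g;
  const results = [];
  for (const match of text.matchAll(pattern)) {
    results.push({
      name: normalizeAllergenName(match[1]),
      level: Number(match[2]),
    });
  }
  return results;
}

// Example with an invented HTML fragment:
const sample = "<p>Mt. Cedar: 1200<br>Mold; 300</p>";
console.log(parseAllergens(stripTags(sample)));
```

Normalizing names before they are used as column keys would also keep spelling variants of one allergen from creating duplicate columns, though whether the original program handled variants this way is not stated.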