Air Pollution Concentrations in Predicted Areas

111522056 黃怡庭

111522057 何佳馨

111522099 黃梓豪

Introduction

  • Air pollution has become an important social issue.
  • In recent years, governments and research institutions have set up large numbers of air monitoring stations to measure particulate matter (PM).

Motivation

  • As the harm caused by air pollution grows, accurate data is needed to help prevent it.
  • At present, PM2.5 is measured by sensors installed at monitoring stations.

Project Target

  • The goal of our project is to design a model that can promptly predict local air-quality indicators.
    • Train the model on the preprocessed datasets
    • Predict PM2.5 distribution images for the next 5 hours

Dataset

  • EPA (Taiwan's Environmental Protection Administration)
    • Higher-accuracy air monitoring stations
    • 88 stations in total
    • Provides hourly data
  • EPA attributes
    • PM2.5
    • Longitude (Lon)
    • Latitude (Lat)
    • Time (Month/Day/Hour)

Data Preprocessing

  • Value preprocessing is split into three parts:
    • Filling missing data
    • Detecting abnormal monitoring stations
    • Detecting abnormal values
  • Image preprocessing

Value Preprocessing

  • Filling missing data
    • The dataset may contain gaps, so its timestamps are not always consecutive. We first insert the missing time points, then fill each missing PM2.5 value with the most recent earlier reading.
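
The gap-filling step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each station's records are a list of (timestamp, PM2.5) pairs aligned to whole hours (the EPA data is hourly), with `None` marking a missing reading; the helper name `fill_missing_hours` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

def fill_missing_hours(records):
    """Hypothetical helper: make one station's hourly series consecutive.

    records: list of (datetime, pm25) pairs; pm25 may be None.
    Returns (datetime, pm25) pairs with no missing hours or values.
    """
    if not records:
        return []
    # Index readings by timestamp so gaps in the time axis can be found.
    by_time = {t: v for t, v in records}
    start, end = min(by_time), max(by_time)

    filled = []
    last_value: Optional[float] = None
    t = start
    while t <= end:
        value = by_time.get(t)     # None if the hour is absent or has no reading
        if value is None:
            value = last_value     # reuse the most recent earlier reading
        if value is not None:      # hours before the first reading are dropped
            last_value = value
            filled.append((t, value))
        t += timedelta(hours=1)
    return filled

h = datetime(2023, 1, 1, 0)
series = [(h, 12.0), (h + timedelta(hours=1), None), (h + timedelta(hours=3), 20.0)]
print([v for _, v in fill_missing_hours(series)])  # [12.0, 12.0, 12.0, 20.0]
```

With pandas, the same idea is usually expressed as reindexing each station onto an hourly `pd.date_range` and calling `ffill()`.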

Value Preprocessing

  • Detecting abnormal monitoring stations
    • First, use DBSCAN to cluster the monitoring stations by latitude and longitude only.
    • Then cluster all stations again using latitude, longitude, and PM2.5 data.
    • A normal station should be grouped with the same stations in both clusterings, so a station whose cluster membership differs between the two is flagged as an abnormal monitoring station.
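
The two-clustering check can be sketched as below. scikit-learn's `sklearn.cluster.DBSCAN` would be the usual choice; a small NumPy DBSCAN is written out here only to keep the sketch self-contained. Several details are assumptions, not taken from the slides: the function names, the `eps` values, z-scoring the features for the second clustering, using one PM2.5 value (or series) per station, and the matching rule (within each geographic cluster, the most common second-clustering label is treated as "expected"). The matching rule is needed because cluster IDs from two separate runs are not directly comparable.

```python
import numpy as np

def dbscan(X: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Minimal DBSCAN: returns one cluster label per row of X, -1 for noise."""
    n = len(X)
    # Pairwise Euclidean distances; cheap for ~88 stations.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each neighborhood includes the point itself, as in scikit-learn.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from the unlabeled core point i.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

def flag_abnormal_stations(lonlat, pm25, eps_geo, eps_full, min_samples=2):
    """Hypothetical helper: indices of stations whose grouping changes once PM2.5 is added."""
    # Clustering 1: location only (degrees).
    geo = dbscan(lonlat, eps_geo, min_samples)
    # Clustering 2: location + PM2.5, z-scored so no feature dominates.
    feats = np.column_stack([lonlat, pm25]).astype(float)
    std = feats.std(axis=0)
    std[std == 0] = 1.0
    full = dbscan((feats - feats.mean(axis=0)) / std, eps_full, min_samples)

    flagged = []
    for g in np.unique(geo[geo != -1]):
        members = np.flatnonzero(geo == g)
        # Assumed rule: the majority clustering-2 label of this geographic
        # group is "expected"; members with any other label are abnormal.
        values, counts = np.unique(full[members], return_counts=True)
        expected = values[np.argmax(counts)]
        flagged.extend(int(m) for m in members if full[m] != expected)
    return sorted(flagged)
```

For example, with two tight geographic groups where one station in the first group reports PM2.5 far above its neighbors, that station falls out of its group in the second clustering and is the only one flagged.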

Value Preprocessing

  • Detecting abnormal values
    • For each monitoring station, apply a Hampel filter across all time points to detect outliers in its PM2.5 values, and replace each outlier with the median of its window.
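
A minimal Hampel filter for one station's PM2.5 series is sketched below. The window half-width of 3 and the threshold of 3 scaled MADs are common defaults assumed here, not values given in the slides; the function name is illustrative.

```python
import numpy as np

def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Replace outliers in a 1-D series with the median of their local window.

    A point is an outlier when it lies more than n_sigmas * (1.4826 * MAD)
    from its window median; 1.4826 scales the MAD to a Gaussian std estimate.
    Returns (cleaned_series, outlier_mask).
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    outliers = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]                       # truncated at the series edges
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            y[i] = med                          # replace the outlier with the window median
            outliers[i] = True
    return y, outliers

cleaned, mask = hampel_filter([10, 11, 10, 12, 95, 11, 10, 12, 11])
print(cleaned.tolist())  # [10.0, 11.0, 10.0, 12.0, 11.0, 11.0, 10.0, 12.0, 11.0]
```

Windows are always taken from the original series, so one replaced spike does not influence the test for its neighbors.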

Image Preprocessing

  • Use nearest-neighbor interpolation to estimate the PM2.5 distribution across Taiwan from the station readings.
  • Use Basemap and Polygon to generate the training images.
    • Color scale boundaries (PM2.5, μg/m³): [0, 1, 6, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 65, 75, 90, 100, 125, 150]
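
The interpolation and color binning can be sketched with NumPy as below. This is an illustration under assumptions: the grid coordinates, function names, and plain Euclidean distance in degrees are choices made for the sketch, and the Taiwan-shaped masking done with Basemap and Polygon is omitted. `scipy.interpolate.griddata(..., method="nearest")` is a library alternative for the first step.

```python
import numpy as np

# Color-scale boundaries from the slide (PM2.5, μg/m³); 19 edges -> 18 bands.
LEVELS = np.array([0, 1, 6, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 65, 75, 90, 100, 125, 150])

def nearest_neighbor_grid(lon, lat, pm25, grid_lon, grid_lat):
    """Give every grid cell the PM2.5 value of its closest station.

    Distance is plain Euclidean distance in degrees, an approximation.
    Returns an array of shape (len(grid_lat), len(grid_lon)).
    """
    glon, glat = np.meshgrid(grid_lon, grid_lat)
    # Squared distance from every cell to every station: (n_lat, n_lon, n_stations).
    d2 = (glon[..., None] - lon) ** 2 + (glat[..., None] - lat) ** 2
    return pm25[np.argmin(d2, axis=-1)]

def to_color_band(values):
    """Index of the color band [LEVELS[i], LEVELS[i+1]) each value falls in,
    clipped so values below 0 or at/above 150 use the first/last band."""
    return np.clip(np.digitize(values, LEVELS) - 1, 0, len(LEVELS) - 2)
```

Each band index would then be drawn with its own color when rendering the training image.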

Model

Result

  • Extract the predicted PM2.5 value at each EPA station's location in the output image.
  • The average MAE (mean absolute error) on the test data is 7.95.
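
The station-level evaluation can be sketched as follows. It assumes the predicted image has already been decoded from colors back into a 2-D array of PM2.5 values, and that the image is a plain equirectangular grid whose top row is the northern edge; the helper names and the `extent` layout are hypothetical.

```python
import numpy as np

def lonlat_to_pixel(lon, lat, lon_min, lon_max, lat_min, lat_max, width, height):
    """Assumed mapping for an equirectangular image: row 0 is lat_max, column 0 is lon_min."""
    col = np.round((lon - lon_min) / (lon_max - lon_min) * (width - 1)).astype(int)
    row = np.round((lat_max - lat) / (lat_max - lat_min) * (height - 1)).astype(int)
    return row, col

def station_mae(pred_pm25, true_pm25, lon, lat, extent):
    """MAE between a predicted PM2.5 grid sampled at station pixels and the station readings.

    pred_pm25: 2-D array of PM2.5 values (already decoded from colors).
    extent: (lon_min, lon_max, lat_min, lat_max) covered by the grid.
    """
    height, width = pred_pm25.shape
    row, col = lonlat_to_pixel(lon, lat, *extent, width, height)
    return float(np.mean(np.abs(pred_pm25[row, col] - true_pm25)))
```

An average over the test set would then be the mean of `station_mae` across all test images.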

Result

  • Ground truth
  • Prediction

Thanks for listening