Golden Stacks

By Anh Nguyen, Anh Hoang, Kaleb Dickerson, and Phu Nguyen

Problem Statement:

Discover and document how the environment and the markets are tied together
Extract further insights into how the environment and the markets affect one another

Approaches:

Briefly visualize the data to select the appropriate environmental metric for our model
Pre-process the data
Create and train the machine learning prediction model
Stress-test the model with stocks from all sectors
Draw insights and conclusions

Visualize Metrics and Stock Trends

Possible Correlation?

Total CO2 emissions vs. U.S. energy sector stocks
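
One way to put a number on the "possible correlation" question is a Pearson correlation coefficient between the metric and a sector price series. The sketch below is illustrative only: the function name and the yearly values are hypothetical and are not the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly values, for illustration only.
co2_emissions = [5.0, 5.2, 5.1, 4.9, 4.8]
energy_index = [100.0, 103.0, 102.5, 97.0, 96.5]
r = pearson(co2_emissions, energy_index)
print(f"Pearson r = {r:.3f}")
```

A coefficient near +1 or -1 only indicates a linear association in the sample; it does not by itself show that one series drives the other.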

Pre-process the Data

Challenges

Mismatched time granularity (stock data collected daily vs. metrics collected monthly/annually)
Environmental datasets have vast amounts of information to filter

Data Usage & Integration

Manually identified a database related to the United States
Performed SQL queries to organize and extract relevant data
Result? A relatively clean dataset with stock information and the specified environmental metric
For metrics that were collected monthly or annually, the metric is assigned uniformly across the daily stock data to study the effects
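
Assigning a monthly metric uniformly across daily stock rows can be sketched as a lookup keyed by (year, month). This is a minimal illustration, not the project's code: the function name, the row layout, and all values are hypothetical.

```python
from datetime import date


def attach_monthly_metric(daily_rows, monthly_metric):
    """Copy each month's metric value onto every daily row in that month.

    daily_rows: list of (date, close_price) tuples.
    monthly_metric: dict mapping (year, month) -> metric value.
    Rows whose month has no metric are dropped rather than guessed.
    """
    merged = []
    for day, close in sorted(daily_rows):
        key = (day.year, day.month)
        if key in monthly_metric:
            merged.append({"date": day, "close": close, "metric": monthly_metric[key]})
    return merged


# Hypothetical values for illustration only.
daily = [
    (date(2020, 1, 2), 50.0),
    (date(2020, 1, 3), 51.5),
    (date(2020, 2, 3), 49.0),
    (date(2020, 3, 2), 47.5),  # no March metric below, so this row is dropped
]
monthly = {(2020, 1): 4.8, (2020, 2): 4.6}
rows = attach_monthly_metric(daily, monthly)
```

An annual metric would work the same way with the year alone as the key.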

Pre-process the Data (continued)

Managing complexity

Maintaining a consistent key to organize the data (ordering by date and enforcing strict date equality)

Maintaining clean data

Filtering out irrelevant information by removing outliers and null values
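
The slides do not say which rule was used to remove outliers. As one common option, the sketch below flags values outside 1.5 times the interquartile range; the rule, the function name, the tickers, and the values are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values_by_ticker, k=1.5):
    """Return tickers whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = list(values_by_ticker.values())
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return sorted(t for t, v in values_by_ticker.items() if v < low or v > high)


# Hypothetical tickers and daily-return volatilities, for illustration only.
volatility = {
    "AAA": 0.010, "BBB": 0.012, "CCC": 0.015, "DDD": 0.020,
    "EEE": 0.022, "FFF": 0.025, "GGG": 0.030, "HHH": 0.400,
}
flagged = iqr_outliers(volatility)
```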

Coherent transformation

Stock data is time-sensitive, so we use the date/month to maintain consistency
Pair-programmed table join operations to ensure correct logic implementation
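
A join on strict date equality, with null filtering and date ordering, might look like the following. It uses an in-memory SQLite database purely for illustration; the table names, column names, and values are hypothetical, and the slides do not say which database engine was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_prices (ticker TEXT, trade_date TEXT, close REAL);
    CREATE TABLE env_metric (metric_date TEXT, co2 REAL);
""")
conn.executemany("INSERT INTO stock_prices VALUES (?, ?, ?)", [
    ("XOM", "2020-01-03", 70.0),
    ("XOM", "2020-01-02", 69.5),
    ("XOM", "2020-01-06", None),   # null close: filtered out
    ("XOM", "2020-01-07", 71.0),   # no matching metric row: dropped by the join
])
conn.executemany("INSERT INTO env_metric VALUES (?, ?)", [
    ("2020-01-02", 4.8),
    ("2020-01-03", 4.8),
    ("2020-01-06", 4.8),
])

# Inner join keeps only rows whose dates match exactly in both tables.
joined = conn.execute("""
    SELECT s.ticker, s.trade_date, s.close, e.co2
    FROM stock_prices AS s
    JOIN env_metric AS e ON s.trade_date = e.metric_date
    WHERE s.close IS NOT NULL AND e.co2 IS NOT NULL
    ORDER BY s.trade_date
""").fetchall()
conn.close()
```

Storing dates as ISO 8601 text keeps string ordering and date ordering identical, which is why `ORDER BY s.trade_date` works here.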

Data Normalization & Serialization
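
The slides name normalization and serialization without giving details. The sketch below assumes z-score scaling per feature column and JSON as the storage format; both choices, and the column names and values, are illustrative assumptions.

```python
import json
from statistics import mean, pstdev


def zscore(values):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mu) / sigma for v in values]


# Hypothetical feature columns, for illustration only.
features = {"close": [10.0, 12.0, 14.0], "co2": [4.0, 5.0, 6.0]}
normalized = {name: zscore(column) for name, column in features.items()}

# Serialize to JSON and read it back to confirm nothing is lost.
payload = json.dumps(normalized)
restored = json.loads(payload)
```

Scaling matters for margin-based models such as SVMs, because features with large numeric ranges would otherwise dominate the distance computation.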

Training the models

Due to time constraints, we framed the challenge as a binary classification problem.

Classify whether a given sector will show an uptrend or a downtrend in a given year
Support vector machine
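
The binary target could be built along these lines. The slides do not define "uptrend", so the rule used here (last close of the year strictly above the first close) is an assumption, as are the function name and the prices.

```python
def yearly_trend_labels(yearly_closes):
    """Map each year to +1 (uptrend) or -1 (downtrend).

    yearly_closes: dict mapping year -> (first_close, last_close).
    Assumption: a year is an uptrend when the last close is strictly
    above the first close; flat years count as downtrends.
    """
    return {
        year: 1 if last > first else -1
        for year, (first, last) in yearly_closes.items()
    }


# Hypothetical sector index values, for illustration only.
closes = {2018: (80.0, 68.0), 2019: (68.0, 70.0), 2020: (70.0, 41.0)}
labels = yearly_trend_labels(closes)
```

The +1/-1 encoding is the conventional label format for a support vector machine.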

Training

Trained the model on the energy sector, based on our own assumption

Three models created

Base model: Trained without environmental features
Irrelevant model: Trained with irrelevant environmental metrics

U.S. birth rates, mortality rates, population growth

Relevant model: Trained with our identified environmental metric
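
The three-model comparison can be sketched as training the same classifier on three different column subsets. Everything below is illustrative: the slides do not state the SVM kernel or library, so this uses a small hand-written linear SVM (subgradient descent on the hinge loss), and the rows, column meanings, and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM by subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: update so this sample's margin grows.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical toy rows: [stock feature, irrelevant metric, relevant metric].
rows = [
    [0.2, 1.1, -0.8],
    [-0.1, 1.0, 0.9],
    [0.3, 1.2, -0.5],
    [-0.2, 1.1, 0.7],
    [0.1, 1.0, -0.6],
    [0.0, 1.2, 0.8],
]
labels = [1, -1, 1, -1, 1, -1]  # +1 = uptrend, -1 = downtrend

# Each model sees the stock feature plus, at most, one extra metric column.
feature_sets = {"base": [0], "irrelevant": [0, 1], "relevant": [0, 2]}

results = {}
for name, cols in feature_sets.items():
    X = [[row[c] for c in cols] for row in rows]
    model = train_linear_svm(X, labels)
    results[name] = accuracy(model, X, labels)  # training accuracy only

print(results)
```

Numbers from toy rows like these say nothing about the project's results; a real comparison would score each model on held-out data.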

Result

Insights and Conclusion

Tentative Conclusion: Accounting for the environmental factor, we can predict stock trends more accurately

Assumption: Stocks in a specific sector are correlated with each other
There might be some correlation between the environment and the markets.

Interestingly, predictions for other sectors such as tech and healthcare improved as well

How to Improve:

A transformer architecture would likely perform better with this time-sensitive data
Perform a feature importance test to choose the appropriate feature to add to our model.
Graph the data distribution to identify outliers such as XOM, PNRL, etc.
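
One possible form of the feature importance test mentioned above is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch is model-agnostic and illustrative only; the function name, the stand-in model, and the data are hypothetical.

```python
import random


def permutation_importance(predict_fn, X, y, column, seed=0):
    """Accuracy drop after shuffling one feature column (larger = more important)."""
    def accuracy(rows):
        return sum(predict_fn(r) == t for r, t in zip(rows, y)) / len(y)

    rng = random.Random(seed)
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    # Rebuild each row with only the chosen column replaced by a shuffled value.
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    return accuracy(X) - accuracy(X_perm)


# Hypothetical stand-in for a trained model: it only looks at column 1.
def toy_model(row):
    return 1 if row[1] < 0 else -1


X = [[0.2, -0.8], [-0.1, 0.9], [0.3, -0.5], [-0.2, 0.7], [0.1, -0.6], [0.0, 0.8]]
y = [toy_model(row) for row in X]  # labels the toy model predicts perfectly

importance_col0 = permutation_importance(toy_model, X, y, column=0)
importance_col1 = permutation_importance(toy_model, X, y, column=1)
```

Because the stand-in model ignores column 0, shuffling it leaves accuracy unchanged; a candidate environmental metric that scored similarly against the real model would be a weak feature to add.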