1 of 8

Golden Stacks

By Anh Nguyen, Anh Hoang, Kaleb Dickerson, and Phu Nguyen

2 of 8

Problem Statement:

  • Discover and document how the environment and the markets are tied together
  • Extract further insights into how the environment and the markets impact one another
  • Briefly visualize the data to select the appropriate environmental metric for our model
  • Pre-process the data
  • Create and train the machine learning prediction model
  • Stress-test the model with stocks from all sectors
  • Draw insights and conclusions

Approaches:

3 of 8

Visualize Metrics and Stocks Trends

Possible Correlation?

Total CO2 emission vs US energy sector stocks

4 of 8

Challenges

  • Manually identified a database related to the United States
  • Performed SQL queries to organize and extract relevant data
  • Result? Relatively clean dataset with stock information and the specified environmental metric
  • For metrics collected monthly or annually, the value is assigned uniformly to every daily stock record in that period to study the effects
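
The uniform assignment described above can be sketched as follows. This is a minimal Python illustration, not the SQL the team ran; the function name `assign_monthly_metric`, the row layout, and the numeric values are all hypothetical.

```python
from datetime import date

def assign_monthly_metric(daily_rows, monthly_metric):
    """Attach a month's metric value to every daily row in that month.

    daily_rows: list of (date, closing_price) tuples (hypothetical layout).
    monthly_metric: dict mapping (year, month) -> metric value.
    Rows whose month has no metric are dropped (strict key equality).
    """
    merged = []
    for day, close in sorted(daily_rows):  # order by date
        key = (day.year, day.month)
        if key in monthly_metric:
            merged.append((day, close, monthly_metric[key]))
    return merged

# Made-up numbers, for illustration only.
daily = [
    (date(2020, 1, 2), 10.0),
    (date(2020, 1, 3), 10.5),
    (date(2020, 2, 3), 9.8),
    (date(2020, 3, 2), 9.9),  # no metric recorded for March, so this row is dropped
]
metric = {(2020, 1): 450.0, (2020, 2): 430.0}

rows = assign_monthly_metric(daily, metric)
```

Every January row carries the same January value, which is what "assigned uniformly" means here. An annual metric would work the same way with the year alone as the key.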

Data Usage & Integration

Pre-process the Data

  • Mismatched time metrics (Data collected daily vs monthly/annually)
  • Environmental datasets have vast amounts of information to filter

5 of 8

Pre-process the Data (continued)

  • Managing complexity
    • Maintaining a consistent metric to organize the data (order by date and enforcing strict date equality)
  • Maintaining clean data
    • Filtering out irrelevant information by removing outliers and null values
  • Coherent transformation
    • Stock data is time-sensitive, so we use the date/month as the key to maintain consistency
    • Pair-programmed table join operations to ensure correct logic implementation
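
One possible way to express the "maintaining clean data" step is shown below. The slides do not say which outlier rule was used, so the interquartile-range (IQR) rule, the function name `clean_rows`, and the sample values are assumptions made for illustration.

```python
import statistics

def clean_rows(rows, k=1.5):
    """Drop rows with a missing value, then drop IQR outliers.

    rows: list of (label, value) pairs; value may be None.
    k: IQR multiplier (1.5 is a common convention, assumed here).
    """
    # Step 1: remove null values.
    present = [(label, v) for label, v in rows if v is not None]
    # Step 2: compute quartiles on the remaining values.
    q1, _, q3 = statistics.quantiles([v for _, v in present], n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    # Step 3: keep only values inside the accepted band.
    return [(label, v) for label, v in present if low <= v <= high]

# Made-up sample: one null and one extreme value.
sample = [
    ("d1", 10), ("d2", 11), ("d3", 12), ("d4", 11), ("d5", 10),
    ("d6", 12), ("d7", 11), ("d8", 500), ("d9", None),
]
cleaned = clean_rows(sample)
```

The null row and the extreme row are removed, and the remaining rows keep their original order, so a later date-keyed join is unaffected.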

Data Normalization & Serialization

6 of 8

Training the models

  • Due to time constraints, we decided to frame the challenge as a binary classification problem.
    • Classify whether a given sector will show an uptrend or a downtrend in a given year
    • Support vector machine
  • Training
    • Trained the model on the energy sector, based on our own assumption
  • Three models created
    • Base model: Trained without environmental features
    • Irrelevant model: Trained with irrelevant environmental metrics
      • U.S. birth rates, mortality rates, population growth
    • Relevant model: Trained with our identified environmental metric
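
The three-model comparison above could be set up roughly as follows with scikit-learn's `SVC`. All data here is synthetic and the label is constructed to depend on the "relevant" column, so the scores say nothing about the real results; the feature names, array shapes, kernel, and split ratio are assumptions, not the team's actual configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
price_features = rng.normal(size=(n, 3))  # stand-in for stock-derived features
relevant = rng.normal(size=(n, 1))        # stand-in for the chosen environmental metric
irrelevant = rng.normal(size=(n, 3))      # stand-in for birth rate, mortality rate, etc.

# Synthetic binary label: 1 = uptrend, 0 = downtrend.
noise = rng.normal(scale=0.5, size=n)
y = (price_features[:, 0] + relevant[:, 0] + noise > 0).astype(int)

feature_sets = {
    "base": price_features,
    "irrelevant": np.hstack([price_features, irrelevant]),
    "relevant": np.hstack([price_features, relevant]),
}

def fit_and_score(X, y):
    # shuffle=False keeps rows in order, so the test set is the latest 25%.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, shuffle=False
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)  # accuracy on the held-out rows

scores = {name: fit_and_score(X, y) for name, X in feature_sets.items()}
```

Keeping the split and the pipeline identical across the three feature sets means any difference in accuracy comes from the added columns alone.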

7 of 8

Result

8 of 8

Insights and Conclusion

Tentative Conclusion: Accounting for the environmental factor, we can predict stock trends more accurately

    • Assumption: Stocks in a specific sector are correlated with each other
    • There might be some correlation between the environment and the markets.
  • Interestingly, predictions for other sectors such as tech and healthcare improved as well
  • How to Improve:
    • A transformer architecture would likely perform better with this time-sensitive data
    • Perform a feature importance test to choose the appropriate feature to add to our model.
    • Graph the data distribution to identify outliers such as XOM, PNRL, etc.
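
As a sketch of the proposed feature importance test, permutation importance from scikit-learn is one option that works with an SVM. The slides do not name a method, so this choice, the synthetic data, and the feature names `metric_a`, `metric_b`, and `metric_c` are assumptions.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.svm import SVC

rng = np.random.default_rng(0)
names = ["metric_a", "metric_b", "metric_c"]  # hypothetical candidate features
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)  # synthetic label driven by the first column only

# Fit on the first 200 rows, evaluate importance on the remaining 100.
model = SVC(kernel="rbf").fit(X[:200], y[:200])
result = permutation_importance(
    model, X[200:], y[200:], n_repeats=10, random_state=0
)

# Rank features by the mean drop in accuracy when each column is shuffled.
ranking = sorted(
    zip(names, result.importances_mean), key=lambda pair: pair[1], reverse=True
)
```

A feature whose shuffling barely changes the score is a weak candidate for inclusion; one whose shuffling causes a large drop is worth keeping.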