| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | If you have any datasets you'd like to add, please add a comment to this sheet and tag me. | |||||||||||||||||||||||||
2 | ||||||||||||||||||||||||||
3 | Free and publicly available datasets | |||||||||||||||||||||||||
4 | Dataset | Unit | Frequency | Website | Description | |||||||||||||||||||||
5 | Discern 2.0 | Patent | 1980 to 2021 | https://zenodo.org/records/13153196 | Links patents and scientific publications to U.S. publicly listed firms, taking subsidiaries into account | |||||||||||||||||||||
6 | American Stories | Article | Daily | https://huggingface.co/datasets/dell-research-harvard/AmericanStories | Two centuries of newspaper articles; easy Python import | |||||||||||||||||||||
7 | ACS | Household | Yearly | https://usa.ipums.org/usa/ | Large US household survey, 2006 onward | |||||||||||||||||||||
8 | IPUMS | Various | Various | https://www.ipums.org/ | Census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts. | |||||||||||||||||||||
9 | FRED | National | Daily | https://fred.stlouisfed.org/ | Macro time series. Pandas can import. | |||||||||||||||||||||
10 | Zillow research data | Zipcode | Monthly | https://www.zillow.com/research/data/ | Housing prices, sales, etc | |||||||||||||||||||||
11 | Realtor.com data | Zipcode | Monthly | https://www.realtor.com/research/data/ | Housing prices, sales, etc | |||||||||||||||||||||
12 | American Housing Survey | Household | Annual | https://www.census.gov/programs-surveys/ahs/data.html | Housing characteristics (costs, physical condition, with household characteristics) | |||||||||||||||||||||
13 | Home Mortgage Disclosure Act | Mortgage | Annual | https://www.consumerfinance.gov/data-research/hmda/historic-data/ | Mortgage rates, LTV, borrower characteristics | |||||||||||||||||||||
14 | National Survey of Mortgage Originations | Household | Quarterly | https://www.fhfa.gov/DataTools/Downloads/Pages/National-Survey-of-Mortgage-Originations-Public-Use-File.aspx | Mortgages | |||||||||||||||||||||
15 | Consumer expenditure survey | Household | Yearly | https://www.bls.gov/cex/ | Consumer expenditures on very fine categories | |||||||||||||||||||||
16 | Survey of consumer finances | Household | Every 3 years | https://www.federalreserve.gov/econres/scfindex.htm | Consumer expenditures and finance data | |||||||||||||||||||||
17 | American time use survey | Household | Yearly | https://www.bls.gov/tus/ AND https://timeuse.ipums.org/ | Consumer time use | |||||||||||||||||||||
18 | Panel Study of Income Dynamics | Household | Yearly (panel) | https://psidonline.isr.umich.edu/ | Tracks consumers over time, so can be used to look at career progression, etc. | |||||||||||||||||||||
19 | Call reports | Bank | Quarterly | https://cdr.ffiec.gov/public/ | Panel data on banks' balance sheets | |||||||||||||||||||||
20 | American Fact Finder | Geography | Yearly | https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml | Various firm and household characteristics (site retired in 2020; superseded by data.census.gov) | |||||||||||||||||||||
21 | NHGIS | Geography | Yearly | https://data2.nhgis.org/main | Various geographic characteristics (demographics, etc.) | |||||||||||||||||||||
22 | General Social Survey | Household | Yearly | https://gss.norc.org/ | Demographics, social characteristics, attitudes | |||||||||||||||||||||
23 | CFPB data | Various | Various | https://www.consumerfinance.gov/data-research/ | Various | |||||||||||||||||||||
24 | Data.gov | Various | Various | Data.gov | Various | |||||||||||||||||||||
25 | data.census.gov | Various | Various | data.census.gov | Various census datasets. | |||||||||||||||||||||
26 | data.world | Various | Various | data.world | Various | |||||||||||||||||||||
27 | Our World in Data | Various | Various | https://ourworldindata.org | Many data series, and a Python package for easy data loading: https://github.com/owid/owid-importer . See their other repos for hacks to make analysis easier. | |||||||||||||||||||||
28 | CongressData | Member | Yearly | https://cspp.ippsr.msu.edu/congress/ | Data from 1789-2021 on characteristics of congressional districts, the members of congress themselves, and the behavior of those members in policymaking. | |||||||||||||||||||||
29 | Household Pulse Survey | Household | Weekly | https://www.census.gov/programs-surveys/household-pulse-survey/datasets.html | Weekly COVID survey | |||||||||||||||||||||
30 | I³ Open Innovation Dataset Index | Various | Various | https://iiindex.org/datasets | A collection of innovation datasets, and related tools, platforms and resources | |||||||||||||||||||||
31 | Harvard Dataverse | Various | Various | https://dataverse.harvard.edu/dataverse/harvard | Includes replication kits for publications in some journals. Many finance and econ journals now require replication code for recent publications, and it can be found on the journal sites. Also, google for the authors' personal sites (like mine!); code and data are often there even if not on the journal site. | |||||||||||||||||||||
32 | Management Science | Various | Various | https://pubsonline.informs.org/toc/mnsc/current | MS is an A+ journal and has required replication data and code for years. Go to the page of an article and look for a link to "Supplementary Materials" to get code. | |||||||||||||||||||||
33 | Pandas-datareader | N/A | N/A | https://pydata.github.io/pandas-datareader/remote_data.html | Data importer. | |||||||||||||||||||||
34 | Ken French data library | Portfolio | Various | https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html | Portfolio returns. Pandas can import. | |||||||||||||||||||||
35 | Open Asset Pricing | Portfolio | Various | https://www.openassetpricing.com/ | Portfolio returns, asset pricing code for tests. | |||||||||||||||||||||
36 | Assaying Anomalies | N/A | N/A | https://sites.psu.edu/assayinganomalies/ | Immediately test if your asset pricing signal produces alpha. | |||||||||||||||||||||
37 | OECD | Country | Yearly | https://stats.oecd.org/ | Country statistics. Pandas can import. | |||||||||||||||||||||
38 | World Bank | Various | Various | https://data.worldbank.org/ | Panel data around the world on many subjects. | |||||||||||||||||||||
39 | Quandl | Firm | Daily | https://www.quandl.com/ | Stock prices. Pandas can import. | |||||||||||||||||||||
40 | Yahoo Finance | Firm | Daily | https://finance.yahoo.com/ | Stock prices. Pandas can import. | |||||||||||||||||||||
41 | Notre Dame Software Repository for Accounting and Finance | N/A | N/A | https://sraf.nd.edu/ | Textual analysis resources, designed for business text. Business-domain sentiment word lists, code and resources. | |||||||||||||||||||||
42 | Parsing messy names (people, address, firms) | N/A | N/A | https://github.com/datamade?q=&type=all&language=&sort=stargazers | Repos: usaddress and probablepeople | |||||||||||||||||||||
43 | Subnational ideology and presidential vote estimates | Various | Yearly | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BQKU4M | 2006-2021 estimates of the ideology and presidential voting behavior for ~1 million survey respondents, ZCTAs, cities, counties, sub-county units, school districts, state legislative districts, and congressional districts | |||||||||||||||||||||
44 | US Panel Study of Income Dynamics (PSID) | Household | Yearly | https://psidonline.isr.umich.edu/default.aspx | Longitudinal household survey | |||||||||||||||||||||
45 | ||||||||||||||||||||||||||
46 | Highly useful but not free | |||||||||||||||||||||||||
47 | Dataset | Unit | Frequency | Website | Description | |||||||||||||||||||||
48 | CRSP | | | | Stock and asset price/return/trading data | |||||||||||||||||||||
49 | Compustat | | | | Accounting data on public firms | |||||||||||||||||||||
50 | I/B/E/S | |||||||||||||||||||||||||