The new ecosystem of health data keeps getting bigger

Charlie Ornstein, ProPublica, @charlesornstein

Updated January 2019

Each year, the release of new datasets makes it more exciting to cover health care. No longer are we limited to comparing states to one another to look for differences. No longer are doctors’ practice patterns protected by outdated privacy rules. We have entered an era in which we can compare one doctor to another. And what we’re learning is that there are huge, seemingly unexplainable, differences among them.

This tip sheet offers both very broad data sources and more granular ones. None of the data sets covers individual claims-level data, which requires special permissions and often costs a lot of money.

Sign up for data updates from CMS:

CMS is the Centers for Medicare and Medicaid Services (yes, the acronym doesn’t quite work). Click on “receive email updates” in the lower right. CMS’s Data Navigator is also a resource to look up data sources, but I find it overly complex to use. Socrata powers its own tool that allows you to look up all datasets available through its network:

CMS Fast Facts:

This stat sheet is at the highest level, but it is the most current one the agency produces. (Updated August 2018)

Medicare and Medicaid Statistics (top level, super useful): Last released: March 2017. Released annually.

Additional summary data and charts can be found: and In all cases, there is a data lag of at least a year, sometimes longer.

CMS Program Statistics: Includes summary statistics on national health care, Medicare populations, utilization, and expenditures, as well as counts for Medicare-certified institutional and non-institutional providers. CMS Program Statistics is organized into sections which can be downloaded and viewed separately. The Medicare Enrollment Dashboard is an interactive online tool presenting monthly enrollment figures and yearly trends at several geographic levels, including national, state/territory, and county.

Medicaid enrollment:

Geographic variation: How Medicare utilization differs from state to state and from county to county. Data is available from 2007 to 2017. This will help put other data sets in context. Last released: February 2019. Released annually.

Mapping geographic variation: Choose your conditions, choose your geography, choose your unit of measurement. Neat tool. Last updated September 2018.

National health expenditures: Released at least annually. Last update December 2018. 

How much doctors are paid by Medicare and what they do for that money: ProPublica has created a nice visualization for each provider: and so has The Wall Street Journal: Last released: May 2018; covers the year 2016. The site includes historical data for 2012-2015 as well. CMS has pledged to release this data annually. (Important caveat with this data: It contains only claims for services rendered in Medicare fee-for-service, not for those in Medicare Advantage. As a result, it will leave out lots of services delivered in regions of the country where Medicare HMO enrollment is high, and thus may not be representative.)

Another wonderful thing about this dataset is that it now includes demographic data on each doctor’s patients, including breakdowns by gender; age ranges; race; dual eligible status; prevalence of common chronic conditions; and an aggregate risk score that allows you to compare how “sick” a doctor’s patients are relative to peers.
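One way to put the risk score to work is to compare each doctor with the typical peer in the same specialty. What follows is a minimal sketch, not CMS’s or ProPublica’s method: the record layout and field names (`npi`, `specialty`, `risk_score`) are hypothetical, and the sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import median

def risk_relative_to_peers(doctors):
    """Map each doctor's ID to (own risk score / median risk score in the specialty).

    `doctors` is a list of dicts with hypothetical keys:
    "npi", "specialty", "risk_score".
    """
    by_specialty = defaultdict(list)
    for doc in doctors:
        by_specialty[doc["specialty"]].append(doc["risk_score"])
    # Median is less sensitive than the mean to a few extreme practices.
    peer_median = {spec: median(scores) for spec, scores in by_specialty.items()}
    return {
        doc["npi"]: doc["risk_score"] / peer_median[doc["specialty"]]
        for doc in doctors
    }

# Invented sample records, for illustration only.
sample = [
    {"npi": "A", "specialty": "Cardiology", "risk_score": 1.0},
    {"npi": "B", "specialty": "Cardiology", "risk_score": 1.5},
    {"npi": "C", "specialty": "Cardiology", "risk_score": 2.0},
    {"npi": "D", "specialty": "Family Practice", "risk_score": 0.8},
]
relative = risk_relative_to_peers(sample)
```

A value above 1 means the doctor’s patients score as sicker than the specialty median; a value below 1 means healthier. A doctor who is alone in a specialty will always come out at exactly 1, so small peer groups deserve caution.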

If you want an earlier glimpse at national numbers, you may be able to find them a few months earlier at these two links: or

Payments to hospitals for inpatient and outpatient visits:

What drugs doctors prescribe most: In 2015, Medicare began publicly releasing prescribing data in its Part D program. The data now covers 2013-2016 and is sliced and diced in a variety of ways. ProPublica filed Freedom of Information Act requests for earlier data and has put the data online for free: You can see our app at This accounts for all prescriptions covered in Medicare Part D, but it’s important to note that not every Medicare recipient is enrolled in Part D. Also, data is redacted when a doctor wrote fewer than 11 prescriptions for a particular drug. Last released: May 2018. Updated annually.
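Because counts under 11 are redacted, a total built from drug-level rows is really a range, not a single number. The sketch below shows one way to carry that uncertainty through a sum. It assumes you have loaded a doctor’s drug-level counts into a list where a redacted count is represented as `None`; that representation, and the choice to treat a redacted cell as anywhere from 0 to 10, are assumptions for illustration and do not describe how CMS formats the file.

```python
from typing import Iterable, Optional, Tuple

# Counts below this value are withheld, per the redaction rule described above.
SUPPRESSION_THRESHOLD = 11

def claim_total_bounds(counts: Iterable[Optional[int]]) -> Tuple[int, int]:
    """Return (low, high) bounds on the total when some counts are redacted.

    A redacted count is passed as None (an assumption about how the data
    was loaded) and is treated as anywhere from 0 to 10.
    """
    low = 0
    high = 0
    for count in counts:
        if count is None:
            # Unknown value: contributes nothing to the floor,
            # and at most threshold - 1 to the ceiling.
            high += SUPPRESSION_THRESHOLD - 1
        else:
            low += count
            high += count
    return low, high

# Invented example: two published counts and two redacted ones.
bounds = claim_total_bounds([25, None, 40, None])
```

Reporting the range, or using only the published rows and saying so, avoids implying more precision than the data supports.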

You can look up Medicaid utilization data as well (but this is summarized by state, not by doctor): 

Medicare drug spending dashboard: CMS released an online dashboard to look at drug spending in Medicare and Medicaid. And it released five years’ worth of data (total scripts, cost, cost for patients) for every drug used in Part B and Part D. (Updated December 2016)

Medicare opioid drug mapping tool: The tool shows geographic comparisons, at the state, county, and ZIP code levels, of opioid prescription claims in Medicare Part D. It allows you to see both the number and percentage of opioid claims at the local level. Obvious drawback: The data is from 2013-2016. For an evolving epidemic, this data sadly lags far behind. This map shows the changes over that time: Data for 2016 is included in the broader Part D file, mentioned earlier in this tip sheet.

More opioid data: See my comprehensive tipsheet at

Who doctors refer patients to: Health care data guru Fred Trotter of DocGraph and CareSet Systems has been working for years to track patient movement from one doctor to another to sketch out the social graph of medicine. You can find CareSet’s data here:

What durable medical equipment doctors order: In October 2015, Medicare began releasing data on durable medical equipment and supplies ordered and dispensed in Medicare Part B, organized by the ordering physician. There is an aggregate file (high level) and a detail file (aggregating by type of DME ordered by each doctor). Historical data is available for 2013-2015, in addition to new data for 2016. (Last updated May 2018 with 2016 data)

Utilization of home health services: In December 2015, Medicare began releasing data on services provided to Medicare beneficiaries by home health agencies. The Home Health Agency Utilization and Payment Public Use File (Home Health Agency PUF) contains information on utilization, payments, and submitted charges organized by provider, state and home health resource group. It is organized by home health provider, not by doctor who ordered the services. (Last updated August 2018 with 2016 data)

Therapy services in nursing homes: The Skilled Nursing Facility Utilization and Payment Public Use File (SNF PUF) contains information on utilization, payments, and submitted charges organized by provider, state, and resource utilization group (RUG). The file, first released in 2016, now covers 2013-2016. (Updated October 2018)

Utilization of hospice services: In October 2016, Medicare released for the first time data on hospice services provided to Medicare beneficiaries and information on hospice agencies. The Hospice Utilization and Payment Public Use File (Hospice PUF) contains information on utilization, payment (Medicare payment and standard payment), submitted charges, primary diagnoses, sites of service, and hospice beneficiary demographics organized by provider and state. Data is now available for 2014-2016. (Released June 2018)

Additional hospice quality data can be found here:

Market saturation and utilization data tool: This is an interactive map and dataset that “shows national-, state-, and county-level provider services and utilization data for selected health service areas. Market saturation, in the present context, refers to the density of providers of a particular service within a defined geographic area relative to the number of the beneficiaries receiving that service in the area.” The services include Home Health, Ambulance (Emergency, Non-Emergency, Emergency & Non-Emergency), Independent Diagnostic Testing Facilities (Part A and Part B), Skilled Nursing Facilities, Hospice, Physical and Occupational Therapy, Clinical Laboratory (Billing Independently), Long-Term Care Hospitals, Chiropractic Services, Cardiac Rehabilitation Programs, Psychotherapy, Federally Qualified Health Centers, and Ophthalmology. And here is a CMS fact sheet about the data: (Last updated April 2018)
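Read literally, the definition quoted above is a ratio of providers to the beneficiaries using a service in an area. The sketch below computes providers per 1,000 users and ranks areas by it. This is an illustrative ratio and an assumption on my part, not necessarily the metric CMS publishes in the tool; the county names and counts are invented.

```python
def providers_per_1000_users(providers: int, users: int) -> float:
    """Provider density: providers per 1,000 beneficiaries using the service."""
    if users <= 0:
        raise ValueError("users must be positive")
    return 1000 * providers / users

# Invented figures: county -> (number of providers, beneficiaries using the service)
counties = {
    "County A": (12, 3000),
    "County B": (40, 2500),
    "County C": (5, 5000),
}

density = {
    name: providers_per_1000_users(providers, users)
    for name, (providers, users) in counties.items()
}

# Highest density first: these are the areas that look most saturated.
ranked = sorted(density, key=density.get, reverse=True)
```

A county that ranks far above its neighbors is a lead to check, not a finding; small user counts can make the ratio swing widely.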

Chronic conditions among Medicare beneficiaries: Tools to help you gain a better understanding of the burden of chronic conditions among Medicare fee-for-service beneficiaries, including prevalence, utilization of health services, and Medicare spending for specific chronic conditions and multiple chronic conditions. (Updated March 2017)

What incentive payments doctors have received for using electronic medical records:  Updated monthly.

Doctor addresses and specialties: These are part of the National Provider Identifier system, updated weekly/monthly, based on health professionals’ self-reported information. Download the file here: (the download link is at the bottom of the page). Physician Compare has additional information on doctors who participate in Medicare: NPI data is updated monthly. Physician Compare data was last updated in October 2017.

Hospital characteristics: The CMS Provider of Services file, updated annually, contains data on characteristics of hospitals and other types of healthcare facilities, including the name and address of the facility and the type of Medicare services the facility provides, among other information (number of beds, ownership information).

Pharma/device company payments to doctors: Data for the last five months of 2013 and all of 2014-2017 is available at Every June, CMS must release data from the prior calendar year. ProPublica has an easy-to-use website to search the data at ProPublica’s site also includes data from companies that were required to report this data earlier (because of corporate integrity agreements).

Quality measures: includes information about hospitals, nursing homes, doctors, dialysis facilities and home health care companies. Data includes demographic information (names, addresses, phone numbers), as well as quality and satisfaction measures. ProPublica used the emergency room quality measures to power its ER Wait Watcher app ( 

Hospital, ambulatory surgery center and nursing home inspection reports: The federal government has also started making available online its deficiency reports for health facilities. It currently does so for nursing homes (, scroll down under “related links”), for hospitals (, scroll down to downloads), for ambulatory surgery centers (, scroll down to downloads) and for home health agencies (, scroll down to downloads). The Association of Health Care Journalists makes hospital inspection reports available in an easy-to-use interface: and ProPublica does the same for nursing home reports: Nursing home inspection reports are updated monthly; hospitals and surgery centers quarterly.

Speaking of which, here’s a data-rich report from CMS on how the “great recession” led to a serious decline in nursing home oversight. (Released June 2016)

Finally, in 2017, CMS began to provide public access to its dashboard of health-care inspections, called Quality, Certification and Oversight Reports. This is invaluable: CMS also posts online when it sends a termination letter to a health facility--and if it rescinds the threat: And as part of the tool, CMS posts information on which accrediting organizations oversee hospitals with serious deficiencies (

Custom calculations: Use this link to estimate Medicare cohorts for diseases, etc.:

Medicare Cost Reports: Financial data that Medicare-enrolled health facilities are required to report.  Per CMS, “The cost report contains provider information such as facility characteristics, utilization data, cost and charges by cost center (in total and for Medicare), Medicare settlement data, and financial statement data.”

Obamacare health insurance exchange data

Which insurance networks do doctors participate in? Under the Affordable Care Act, insurance companies participating in the Obamacare marketplaces are required to provide up-to-date information about which doctors and facilities are in their networks. Links to insurers’ directories can be found here:

Health Insurance Marketplace Public Use Files (Marketplace PUFs): These files are available for plan years 2014 through 2019 for states on the federal exchange--those that use They include information on:
Benefits and Cost Sharing PUF (BenCS-PUF) – Plan-level data on essential health benefits, coverage limits, and cost sharing.
Rate PUF (Rate-PUF) – Plan-level data on individual rates based on an eligible subscriber’s age, tobacco use, and geographic location.
Plan Attributes PUF (Plan-PUF) – Plan-level data on maximum out of pocket payments, deductibles, cost sharing, HSA eligibility, formulary ID, and other plan attributes.
Business Rules PUF (BR-PUF) – Plan-level data on the application of rates, such as allowed relationships (e.g., spouse, dependents) and tobacco use.
Service Area PUF (SA-PUF) – Issuer-level data on the geographic coverage or service area (i.e., where the plan is offered) including state, county, and ZIP code.
Network (Ntwrk-PUF) – Issuer-level data identifying provider network URLs.
Plan ID Crosswalk PUF (CW-PUF) – Plan-level data mapping plans offered in 2014 to plans offered in 2015.
Machine-readable URL PUF (MR-PUF) – Issuer-level URL locations for machine-readable plan network provider and formulary information.
Transparency in Coverage PUF (TC-PUF) – Issuer-level claims, appeals, and active URL data. This PUF contains data from PY2015 for issuers participating in the Marketplace in PY2017.
Quality PUF (Qual-PUF) – Quality ratings data for plans in PY2017 pilot states (VA and WI).

Health Insurance State-based Marketplace Public Use Files: Health plan information including benefits, copayments, premiums, and geographic coverage is publicly available on the various State-based Marketplace (SBM) websites, but CMS has also released downloadable public use files (PUF) so that researchers and other stakeholders can more easily access SBM data. Currently, the Health Insurance State-based Marketplace Public Use Files (SBM PUF) are available for plan year 2018.

Enrollment in Obamacare plans:

Your best source for plan data: HIX Compare, run by the Robert Wood Johnson Foundation, takes plan data for the entire fully insured marketplace and makes it available in one place. You can also seek network and formulary data at no cost, by request, for non-commercial use. “HIX Compare is a set of plan-level public use files of the individual and small group fully insured market in all 50 states plus D.C., available for non-commercial use. Sponsored by the Robert Wood Johnson Foundation, HIX Compare is the only dataset with information on nearly every individual (2014-2019) and small group (2014-2018) marketplace plan, and most off marketplace plans as well. HIX Compare contains information on plan characteristics, such as premiums and benefit design. HIX Compare is a machine readable file available in CSV and Stata formats.”

Other health agencies

Drug side effects: Now available on the OpenFDA website, along with other public use files from the agency. Check out to look at how this data can be visualized. also has a data visualization tool:

Doctor discipline: The Health Resources and Services Administration has summary data by state, profession and year on doctor discipline, as well as malpractice payouts. and

Centers for Disease Control and Prevention: and and

Agency for Healthcare Research and Quality maintains a free, online query system based on data from the Healthcare Cost and Utilization Project (HCUP). It provides access to health statistics and information on hospital inpatient and emergency department utilization.

Health Resources and Services Administration releases county-level health care data on an array of topics, including the number of health care professionals by specialty, hospital usage, air quality, and demographics.

Health Care Cost Institute has data from private insurers. While the vast majority is not online and carries a high cost and restrictions on use, some summary data is available at and

City Health Dashboard has updated health metrics on the 500 largest cities in the country, down to census tract.

State Health Facts, run by the Kaiser Family Foundation, has many health indicators broken down by state.