1 of 23

Open Government Data: Understanding Open Access vs. Public Domain

Documents Association of New Jersey

October 25, 2019: Trenton, NJ, Thomas Edison State University

Presenter: Jennifer C. Boettcher, Georgetown University

2 of 23

Jennifer C. Boettcher

Jennifer C. Boettcher and Leonard M. Gaines. Industry Research Using the Economic Census. Greenwood Press: Phoenix, AZ. 2004

M.B.A., Georgetown University, Washington, D.C., 2005

M.L.S., State University of New York, Albany, N.Y., 1992

B.A., University of New Hampshire, Durham, N.H., 1987

Georgetown University 1997-present

ALA RUSA BRASS Member since 1991

Founder of Business Information Finders (BIF) and Capital Area Business Academic Librarians (CABAL) in DC

2013 Emerald Research Grant: Zombie List (reanimated business sources)

2010 Gale Cengage Learning Award for Excellence in Business Librarianship

3 of 23

Librarian & Information Scientist

  • As a Librarian, I
    • Understand the source
    • Know how to find the source
    • Know the related subjects
    • Know how it’s connected to other sources
    • Know how to read it
    • Make connections between publisher and researcher
  • As a Librarian, I don’t
    • Publish the primary source
    • Have your context or expertise
    • Do statistical analysis
    • Interpret the data
    • Do data entry
    • Have legal expertise

These are my views and do not reflect those of Georgetown.

Boettcher, J. C., & Dames, K. M. (2018). Government data as intellectual property: Is public domain the same as open access? Online Searcher, 42(4), 42-48.

4 of 23

Adaptations of the DIKW pyramid by US Army Knowledge Managers, from https://en.wikipedia.org/wiki/DIKW_pyramid

  • Data are not:
    • Information
    • Technology
    • Digital
    • Analytics
    • Evidence
    • Research
    • Visualizations
    • Ideas
  • Data are:
    • Collected facts
    • “Raw material”

5 of 23

Public Domain: No Copyright Restrictions

Works in the public domain are not protected by intellectual property laws. Anyone can use a public domain work without obtaining permission, but no one can ever own it.

Example: a creative work that is no longer protected because of its age.

Works produced for the U.S. Government by its officers and employees should not be subject to copyright. The provision applies the principle equally to unpublished and published works (17 U.S.C. § 105).

REMEMBER: Public domain data must be attributed.

6 of 23

Administrative Data and the Freedom of Information Act (FOIA), 5 U.S.C. § 552 (1966)

  • What to ask for
    • Anything unpublished by US government
    • Controlled Unclassified Information (CUI)

Read this from Archives

File here FOIAonline

Oversight: Office of Government Information Services

7 of 23

Why Open Data Exists

  • Open Access is not law. It’s a license agreement from the copyright owner and a set of principles: CC0
    • Do you have the right to convey copyright?
    • States and local governments
    • NGOs and nonprofits
  • Principles
    • Reuse and redistribution of the data
    • Derivative works are allowed only if they also remain open
    • No restrictions on who can access and use
    • Electronically transferable
    • Machine-readable

8 of 23

Public Domain vs. Open Access

  • Public Domain
    • Free data flow
    • Law
    • Federal government products
    • Data at any stage can be retrieved by FOIA
    • Does not apply to some subnational governments
  • Open Access
    • Free data flow
    • Because of ownership of copyright
    • Principles and license

9 of 23

Major Sources of Social Science Data in the US Government

10 of 23

Major Sources of Natural Science Data from the US Government

https://www.flickr.com/photos/notbrucelee/6897137283/in/photostream

11 of 23

Problems that come with government data

  • Beggars can’t be choosers
    • Too old
    • Not to the geographic level needed
    • Too detailed
    • Have to file a FOIA request
  • Compatibility
    • Standardization
    • Combining two datasets, even from the same source, might not be possible
    • Before combining two different sources, compare their methodologies
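
As an illustration of the compatibility problem above, here is a minimal Python sketch of one common obstacle: the same geographic identifier stored in different formats by two sources. The rows, field names, and values are made up for the example, and the helper functions `normalize_fips` and `merge_on_fips` are hypothetical names, not part of any agency's tooling.

```python
# Made-up rows: county FIPS codes stored differently in two sources.
population = [
    {"fips": "01001", "population": 1000},
    {"fips": "34021", "population": 2000},
]
employment = [
    {"fips": 1001, "employed": 400},   # leading zero lost, stored as an integer
    {"fips": 34021, "employed": 900},
]


def normalize_fips(value):
    """Return a 5-character, zero-padded county FIPS string."""
    return str(value).strip().zfill(5)


def merge_on_fips(left, right):
    """Inner-join two lists of dicts on a normalized FIPS key."""
    right_by_key = {normalize_fips(row["fips"]): row for row in right}
    merged = []
    for row in left:
        key = normalize_fips(row["fips"])
        if key in right_by_key:
            # Combine both rows and keep the normalized key.
            merged.append({**row, **right_by_key[key], "fips": key})
    return merged


merged_rows = merge_on_fips(population, employment)
for row in merged_rows:
    print(row)
```

Matching keys is only the first step. Whether the combined numbers mean anything still depends on each source's methodology, such as reference period, universe, and definitions.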

12 of 23

P.E.S.T. Analysis for Industry

  • Political
    • Legislative
      • Congress.gov
    • Executive
      • Regulations.gov
    • Judicial
      • United States Courts

  • Economic
    • Sector Inflation
      • BLS’s Producer Price Index
    • Microeconomic trends
  • Socio-cultural
    • Norms & Ratios
      • IRS’s Statistics of Income
    • Peers and partners

  • Technology
    • Patents
      • Citation Analysis
    • Tech Transfer
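
For the sector inflation item under Economic, a price index is usually read as a percent change between two periods. The sketch below shows that arithmetic only. The index levels are hypothetical, not actual BLS figures, and `percent_change` is an illustrative helper name.

```python
def percent_change(index_start, index_end):
    """Percent change between two index levels."""
    if index_start == 0:
        raise ValueError("starting index level must be non-zero")
    return (index_end - index_start) / index_start * 100


# Hypothetical index levels for one industry, 12 months apart.
ppi_last_year = 200.0
ppi_this_year = 210.0

sector_inflation = percent_change(ppi_last_year, ppi_this_year)
print(f"Sector inflation: {sector_inflation:.1f}%")
```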

13 of 23

Let’s discuss

boettcher@georgetown.edu

202 687-7495

Twitter: @jenny.wombat

These slides are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© Bill Waterhouse, with permission

AMSTAT images from http://magazine.amstat.org/blog/2018/05/01/fy18fedbudget

14 of 23

Vocabulary: Tools, Process, and Products

Datasets or compilation: Raw or statistical numbers; can be a flat file such as Comma-Separated Values (CSV) or a proprietary format like Excel

Metadata: Includes field descriptions for the dataset, found in codebooks

Schema: How data is organized or structured using standards, like classification

Application Programming Interface (API): Read-only machine-to-machine querying, generally returning JSON or XML

Big data: Raw, unstructured data; normally transactional (example: each checkout)

Natural Language Processing (NLP): Use for text analysis, not numeric data

Artificial Intelligence (AI): Includes predictive analytics and machine learning

Reports: Usually aggregated statistics based on big data (example: how many checkouts)

Data Visualization: Using software to visually communicate relationships and context of data

Open Data: Freely accessible data, created for a specific purpose; by-product of decision making or research
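
To make a few of these terms concrete, the following Python sketch ties together a dataset (CSV), its metadata (a codebook), and a report (aggregated counts), using the library checkout example from the definitions above. All rows, codes, and names are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented transactional dataset: one row per checkout.
raw_csv = """checkout_id,branch,item_type
1,MAIN,BK
2,MAIN,DVD
3,EAST,BK
4,MAIN,BK
"""

# Invented codebook (metadata): what each coded value means.
codebook = {"BK": "Book", "DVD": "DVD"}

# Read the flat file into a list of dicts keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Report: aggregate the transactions into counts per item type.
checkouts_by_type = Counter(codebook[row["item_type"]] for row in rows)

for label, count in sorted(checkouts_by_type.items()):
    print(f"{label}: {count}")
```

Without the codebook, the `item_type` column is only a set of opaque codes, which is why metadata travels with a dataset.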

15 of 23

Funding for Federal Data Collection

NIH- National Institutes of Health (HHS)

NSF- National Science Foundation

AHRQ- Agency for Healthcare Research & Quality (HHS)

FDA- Food & Drug Administration (HHS)

BEA- Bureau of Economic Analysis (DoC)

BJS- Bureau of Justice Statistics (DoJ)

BTS- Bureau of Transportation Stat. (DoT)

Census- DoC

EIA- Energy Information Admin. (DoE)

ERS- Economic Research Service (DoA)

NASS- Nat. Agricultural Stat. Service (DoA)

NCES- Nat. Center for Education Stat. (ED)

NCHS- Nat. Center for Health Stat. (HHS)

NCSES- Nat. Center for Science and Engineering Stat. (NSF)

ORES- Off. of Research, Evaluation, and Statistics (SSA)

SOI- Statistics of Income (IRS)

Image from AmStat (permission pending)

16 of 23

One Statistical Office in US: Why Not?

1. Privacy: The Privacy Act of 1974, the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA), and Statistical Policy Directive No. 1 (2014) require agencies to ensure that the collection and maintenance of citizens' data is accurate, confidential, and within legal restrictions. Because records are spread across different offices, there is less chance of everything being leaked at once.

2. Security: Related to privacy: the fewer offices that have access to a given set of records, the better. The more servers the data is spread across, the safer it is. When an exchange of information is necessary, laws and regulations among departments protect access to the data.

3. Integrity: The income you report to the IRS might be different from what you report to the Census Bureau.

4. Methodology: Some data require questioning a larger number of people to be accurate enough; different methods of collection or sampling may be required.

5. Popularity: Anything done by the government has a political dimension, especially funding for employees and for modernizing and updating technology, the attractiveness of the research, and the repetition of statistical programs by agencies.
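
The methodology point (item 4) can be illustrated with the textbook margin-of-error formula for a proportion. This is a simplified sketch that assumes a simple random sample from a large population. Federal surveys generally use more complex designs, so treat the function below (a hypothetical helper) as an approximation of the idea, not any agency's actual method.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    Assumes a simple random sample from a large population.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare the margin of error at three sample sizes.
for n in (100, 400, 1600):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, quadrupling the number of people questioned halves the margin of error. This is one reason surveys with different accuracy targets cannot simply share a single sample.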

17 of 23

Future of the Bureau of Labor Statistics

In danger: Nat. Longitudinal Sur., JOLTS, Am. Time Use Sur., Employee Benefits Sur., Census of Fatal Occupational Injuries, Evaluation $27M>$2M

Protected

Principal Federal Economic Indicators (PFEI) and programs written into or referenced by law for allocation or other purpose. 85% of budget

18 of 23

Open Government

US Federal

International

19 of 23

States and Cities

https://data.sonomacounty.ca.gov/dataset/SoCo-Data-PNG/3m9t-bc35

20 of 23

Major International Data Sources

By topic

By Country

National Statistical Offices

More data available in national language

Some charge for access

Citizens of that country might have free access

National Repositories/Archives

Historical

Datasets

21 of 23

Associations: Blogs and Conferences

For Librarians

For Federal Data Policy

22 of 23

Learning more

Government Sources

FDLP Academy

Accidental Government Librarian

DigitalGov from Digital Government Division of GSA

Standards for Born Digital images

Numerical Data

Public Knowledge: Access and Benefits (Information Today, 2016)

Innovations in Federal Statistics (National Academies, 2017)

23 of 23

Legal issues

Data and IP

Licensing Data