Free and Open Data: Where to Find It and Can You Use It
Business Reference and Services Section (BRASS)
of the Reference and User Services Association (RUSA)
in the American Library Association (ALA)
March 18, 2020: Data in Libraries Webinar Series
Presenter: Jennifer C. Boettcher, Georgetown University
Jennifer C. Boettcher
Jennifer C. Boettcher and Leonard M. Gains. Industry Research Using the Economic Census. Greenwood Press: Phoenix, AZ. 2004
M.B.A., Georgetown University, Washington, D.C., 2005
M.L.S., State University of New York, Albany, N.Y., 1992
B.A., University of New Hampshire, Durham, N.H., 1987
ALA RUSA BRASS Member since 1991
Georgetown University 1997-present
Founder of Business Information Finders (BIF) and Capital Area Business Academic Librarians (CABAL) in DC
2013 Emerald Research Grant: Zombie List (reanimated business sources)
Seeking contributors: https://boettcher.georgetown.domains/HisBusColl
2010 Gale Cengage Learning Award for Excellence in Business Librarianship
Librarian & Information Scientist
These are my views and do not reflect those of Georgetown or RUSA.
Boettcher, J. C., & Dames, K. M. (2018). Government data as intellectual property:
Is public domain the same as open access? Online Searcher, 42(4), 42-48.
Data Vocabulary
Adaptations of DIKW pyramid by US Army Knowledge Managers,
from https://en.wikipedia.org/wiki/DIKW_pyramid
Data are not:
Information
Technology
Digital
Analytics
Evidence
Research
Visualizations
Ideas
Data are
collected facts
“raw material”
Copyright provides the owner of copyright with the exclusive right to
Copyright and Numeric Data
Facts are not copyrightable: “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” 17 U.S.C. § 102(b)
In the US, collections of facts or data that fail to meet the minimum threshold of creativity are also ineligible for copyright protection, even if assembling such a collection takes significant time, effort, or resources (“sweat of the brow”).
The original selection and arrangement of data in a compilation is protected (Feist 1991)
Public Domain: No Copyright Restrictions
Public domain works are not protected by intellectual property laws such as copyright. Anyone can use a public domain work without obtaining permission, but no one can ever own it.
Example: a creative work that is no longer protected because of its age (its copyright term has expired).
Works produced by U.S. Government officers and employees as part of their official duties are not subject to copyright. The provision applies equally to unpublished and published works. 17 U.S.C. § 105
REMEMBER: Public domain data should still be attributed.
Data policy in the Federal Government
Caveats of Open Government Data
Why not?
Why Open Data exists
Public Domain Vs. Open Access
Data as input and output
Input
Output
Questions?
CC0, https://pixabay.com/en/hedgehog-child-young-hedgehog-1759027
Major Sources of Social Science Data in the US Government
Major Sources of Natural Science Data from the US Government
https://www.flickr.com/photos/notbrucelee/6897137283/in/photostream
P.E.S.T. Analysis for Industry
Problems that come with government data
Major International Data Sources
By topic
Financial & Economic- International Monetary Fund
Labor- International Labour Organization (ILO)
Telecommunications- International Telecommunication Union (ITU)
Governance- Transparency International
Developed Countries- Organisation for Economic Co-operation and Development (OECD)
By Country
More data available in national language
Some charge for access
Citizens of that country might have free access
National Repositories/Archives
Historical
Datasets
Where to start
Where to learn MORE
For Librarians
For Federal data
Let’s discuss
202 687-7495
Twitter: @jenny.wombat
These slides are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© Bill Waterhouse, with permission
AMSTAT images from
http://magazine.amstat.org/blog/2018/05/01/fy18fedbudget
Numeric Data: who is responsible
Vocabulary: Tools, Process, and Products
Datasets or compilations: Raw or statistical numbers, stored as flat files such as Comma-Separated Values (CSV) or in proprietary formats like Excel
Metadata: Includes field descriptions for the dataset, found in codebooks
Schema: How data is organized or structured using standards, such as classification systems
Application Programming Interface (API): Machine-to-machine querying (for data portals, usually read-only), generally returning results as JSON or XML
Big data: Raw, often unstructured data; frequently transactional (example: a record of each checkout)
Natural Language Processing (NLP): Used for text analysis rather than numeric data
Artificial Intelligence (AI): Includes predictive analytics and machine learning
Reports: Usually aggregated statistics based on big data (example: how many checkouts)
Data Visualization: Using software to visually communicate relationships and context of data
Open Data: Freely accessible data, created for a specific purpose, often as a by-product of decision making or research
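The vocabulary above distinguishes big data (one record per transaction), metadata kept in a codebook, and reports (aggregated statistics). A few lines of Python can sketch that distinction. This is a minimal illustration only: the CSV contents, column names, and codebook entries are hypothetical, and the standard-library `csv` module stands in for whatever tool you would actually use.

```python
import csv
import io
from collections import Counter

# Hypothetical "big data": one raw row per checkout transaction (invented values).
RAW_CSV = """checkout_id,branch,item_type
1,Main,book
2,Main,dvd
3,East,book
4,Main,book
5,East,book
"""

# Hypothetical codebook (metadata): a description of each field in the dataset.
CODEBOOK = {
    "checkout_id": "Unique identifier for one checkout transaction",
    "branch": "Library branch where the item was checked out",
    "item_type": "Format of the item (book, dvd, ...)",
}

def load_rows(text):
    """Parse a flat CSV file into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def checkouts_by(rows, field):
    """Aggregate transaction rows into a report: number of checkouts per value of `field`."""
    if field not in CODEBOOK:
        # Refuse to report on a field the codebook does not document.
        raise KeyError(f"{field!r} is not documented in the codebook")
    return dict(Counter(row[field] for row in rows))

rows = load_rows(RAW_CSV)
print(checkouts_by(rows, "branch"))  # {'Main': 3, 'East': 2}
```

The raw rows answer "what happened in each checkout"; the report only answers "how many checkouts," which is why reports are easier to publish openly than the transactional data behind them.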
F.A.I.R. data (Findable, Accessible, Interoperable, Reusable)
Mainly applied to scientific research data, especially in Europe
Works produced for the U.S. Government: Lifecycle of Data
Policy makers, who ask what needs to be found or measured.
Researchers, who design the methods or experiments used to collect the data and who create the datasets and codebooks.
Statisticians, who manipulate datasets using models and algorithms to see trends in longitudinal data and to interpret data at a single point in time in cross-sectional studies.
Analysts, who use predictive analytics to find patterns and emerging relationships among the numbers, turning data into information by giving it context.
Other data scientists, who link graphics, statistical downloads, and application programming interfaces (APIs) to the researchers' raw data.
Writers and data visualization designers, who use their imagination and knowledge to make data understandable in reports, press releases, and other resources.
The federal agency, which acts as publisher by posting the synthesized resources on its website for all to read (primarily decision makers, but also citizens).
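The statisticians' two views of a dataset can be illustrated with a tiny panel: longitudinal (the same unit followed over time) and cross-sectional (every unit at one moment). The respondents, years, and values below are invented. The sketch only shows how the same records can be sliced two ways; it is not how any particular agency stores its data.

```python
# Hypothetical panel data: (respondent, year, weekly_hours). All values are invented.
PANEL = [
    ("A", 2018, 40), ("A", 2019, 38), ("A", 2020, 35),
    ("B", 2018, 20), ("B", 2019, 25), ("B", 2020, 30),
]

def longitudinal(panel, respondent):
    """Follow one unit across time: the basis for trend analysis."""
    return [(year, value) for r, year, value in panel if r == respondent]

def cross_section(panel, year):
    """Take a snapshot of all units at a single point in time."""
    return {r: value for r, y, value in panel if y == year}

print(longitudinal(PANEL, "A"))    # [(2018, 40), (2019, 38), (2020, 35)]
print(cross_section(PANEL, 2019))  # {'A': 38, 'B': 25}
```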
Funding for Federal Data Collection
NIH- National Institutes of Health (HHS)
NSF- National Science Foundation
AHRQ- Agency for Healthcare Research & Quality (HHS)
FDA- Food & Drug Administration (HHS)
BEA- Bureau of Economic Analysis (DoC)
BJS- Bureau of Justice Statistics (DoJ)
BTS- Bureau of Transportation Stat. (DoT)
Census- DoC
EIA- Energy Information Admin. (DoE)
ERS- Economic Research Service (DoA)
NASS- Nat. Agricultural Stat. Service (DoA)
NCES- Nat. Center for Education Stat. (ED)
NCHS- Nat. Center for Health Stat. (HHS)
NCSES- Nat. Center for Science and Engineering Stat. (NSF)
ORES- Off. of Research, Evaluation, and Statistics (SSA)
SOI- Statistics of Income (IRS)
Image from AmStat (permission pending)
One Statistical Office in US: Why Not?
1. Privacy: The Privacy Act of 1974, the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA), and Statistical Policy Directive No. 1 (2014) require agencies to ensure that the collection and maintenance of citizens' data is accurate, confidential, and within legal restrictions. Because different offices hold different records, a single leak is less likely to expose everything.
2. Security: Related to privacy, fewer offices have access to any given set of data records, and the data are spread across many servers rather than held in one place. When an exchange of information is necessary, laws and regulations among departments protect access to the data.
3. Integrity: The income you report to the IRS might be different from what you report to the Census Bureau.
4. Methodology: Some programs must survey more people to achieve the required accuracy, and different programs may require different methods of collection or sampling.
5. Popularity: Anything the government does has a political dimension. This applies especially to funding for employees and for modernizing technology, to the attractiveness of the research, and to the duplication of statistical programs across agencies.
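The methodology point that questioning more people improves accuracy can be made concrete with the textbook margin-of-error formula for a proportion, z·sqrt(p(1−p)/n). This sketch assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst case p = 0.5. Many federal surveys use more complex sample designs, so their published error estimates are computed differently.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling; z=1.96 corresponds to 95% confidence,
    and p=0.5 gives the largest (most conservative) margin.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the margin of error.
for n in (400, 1600, 6400):
    print(f"n={n:>5}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Under these assumptions, 400 respondents give a margin of about ±4.9 percentage points. Halving that margin requires four times as many respondents, which is one reason some programs cost far more to run than others.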
Future of the Bureau of Labor Statistics
In danger: Nat. Longitudinal Sur., JOLTS, Am. Time Use Sur., Employee Benefits Sur., Census of Fatal Occupational Injuries; Evaluation $27M > $2M
Protected
Principal Federal Economic Indicators (PFEI) and programs written into or referenced by law for allocation or other purposes (85% of budget)
Administrative Data and the Freedom of Information Act (FOIA) 5 U.S.C. § 552, 1966
File at FOIA Online
Oversight: Office of Government Information Services
OMB’s Statistical Policy Directive No. 1
Executive agencies must:
strategy.data.gov: Guidance from OMB
Open Government
US Federal
International
States and Cities
https://data.sonomacounty.ca.gov/dataset/SoCo-Data-PNG/3m9t-bc35
Legal issues
Data and IP
Learning more
Government Sources
Accidental Government Librarian
DigitalGov from the Digital Government Division of GSA
Standards for Born Digital images
Numerical Data
Public Knowledge: Access and Benefits (Information Today, 2016)
Innovation in Federal Statistics (National Academies, 2017)