1 of 26

Data in Libraries: Open Data

ALA RUSA Business Reference And Services Section (BRASS) Webinar

October 26, 2017

Presenter: Jennifer C. Boettcher

Slides will be sent to you after the webinar

2 of 26

Jennifer C. Boettcher

Jennifer C. Boettcher and Leonard M. Gaines. Industry Research Using the Economic Census. Greenwood Press: Phoenix, AZ. 2004

M.B.A., Georgetown University, Washington, D.C., 2005

M.L.S., State University of New York, Albany, N.Y., 1992

B.A., University of New Hampshire, Durham, N.H., 1987

Georgetown Univ 1997-present

Catholic Univ of America, Adjunct Faculty 03-07

Texas A&M Univ 94-97

ALA and RUSA Member since 1991

SLA Member since 1992

Founder of Business Information Finders (BIF) in DC

2013 Emerald Research Grant: Zombie List (reanimated business sources)

2010 Gale Cengage Learning Award for Excellence in Business Librarianship

3 of 26

Librarian & Information Scientist

  • As a Librarian, I
    • Understand the source
    • Know how to find the source
    • Know the subjects covered
    • Know how it’s connected to other sources
    • Know how to read it
    • Know how to organize the source to reveal patterns
    • Make connections between publisher and researcher
  • As a Librarian, I don’t
    • Publish the primary source
    • Have your context or expertise
    • Do statistical analysis
    • Interpret the data
    • Do data entry
    • Have legal expertise

These are my views and do not reflect Georgetown or RUSA

4 of 26

Copyright and Numeric Data

  • Facts are not copyrighted: “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” 17 U.S.C. § 102(b)

  • Creative expression of data in compilation is protected (Feist 1991)

IN US ONLY

5 of 26

Adaptations of DIKW pyramid by US Army Knowledge Managers,

from https://en.wikipedia.org/wiki/DIKW_pyramid

6 of 26

Numeric Data: who is responsible

  • Analytics
    • Analysts
      • Knowledge
      • Context of numbers
  • Tacit Knowledge
    • Expert
      • Wisdom
      • “I just know it”
  • Curation
    • Repository
      • Preserving access
  • Datum (singular of data)
    • Researchers
      • Data
      • Creation of numbers

  • Statistics
    • Publishers
      • Information
      • Relations among numbers

8 of 26

Vocabulary

  • Datasets: Raw or statistical numbers; can be a flat file such as Comma-Separated Values (CSV) or a proprietary format such as Excel (see Bobray Bordelon’s RUSA Presentation)
  • Metadata: Variables or fields in the record (example, Author)
  • Application program interface (API): Software that lets a program request data from a source and load it onto your computer
  • Big data: Transactional (example, each checkout)
  • Reports: How many checkouts, usually as aggregated statistics

  • Open Data: Freely accessible data, created for a specific purpose, by-product of decision making
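
To make the vocabulary concrete, here is a minimal Python sketch of how the terms relate. The checkout records, branch names, and field names are invented for illustration; they are not from any real library system. The CSV text is the dataset, its header row is the metadata, each row is one transaction, and the per-branch counts are the report.

```python
import csv
import io
from collections import Counter

# Hypothetical transactional data: one row per checkout (invented for illustration).
RAW_CSV = """checkout_id,branch,item_type
1,Main,Book
2,Main,DVD
3,East,Book
4,Main,Book
"""

def read_dataset(text):
    """Parse CSV text into (field names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows

def checkout_report(rows, field):
    """Aggregate individual transactions into counts per value of `field`."""
    return Counter(row[field] for row in rows)

fields, rows = read_dataset(RAW_CSV)
report = checkout_report(rows, "branch")
print("Fields (metadata):", fields)
print("Transactions:", len(rows))
print("Report by branch:", dict(report))
```

The same transactions could be aggregated by `item_type` instead, which is why the transactional file is more flexible than any single published report.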

9 of 26

Freely available Data: Public Domain

  • US Federal Government produced or funded data
  • Some States have data in public domain: California, Florida*, Indiana, Louisiana, Massachusetts*, Minnesota*, New Jersey*, North Carolina, District of Columbia, Puerto Rico, and the "organized territories" (incorporated or unincorporated)

*Check with issuing agency

10 of 26

Articulate why Open Data exists

  • Funded research created for a specific purpose
    • US National Data
    • Some Other Countries
    • Non-Government Organizations (NGO)
    • Grants (mainly scientific, e.g. PubMed Central)
    • Publisher required (mainly scientific, e.g. Science)
  • By-product of research used in decision making
  • Fair Use and licensing restrictions may apply

11 of 26

It “should be” Open Data, But I can’t get it

  • Privacy Concerns
    • Personally Identifiable Information (PII)
    • Health Insurance Portability and Accountability Act (HIPAA)
    • Family Educational Rights and Privacy Act (FERPA)
  • Security Concerns
    • Generally military and intelligence related
  • Financial Concerns
    • Contains proprietary data
    • Requires cost recovery
    • Contracts with creator don’t allow it
  • Licensing Issues

12 of 26

Understand past, present, and future of open data policy issues in the Federal Government

  • No single statistical office in US
    • Different data is collected for different reasons
      • Honesty in reporting (e.g. telling the IRS and Census what is income)
      • Methods and sampling may be different
      • Quality and quantity vary depending on funding sources and congressional champions
  • Policy on data collection (priorities)
    • Mandated by law (U.S. Code)
    • Implemented by regulations (Federal Register, codified in the CFR)
    • Directed by memorandum (Presidential)
    • Standard of practice (Agencies)

13 of 26

Understand past, present, and future of open data policy issues in the Federal Government

  • Office of Management and Budget
    • First responsibility is to create the President’s Budget
    • OMB evaluates the effectiveness of agency programs, policies, and procedures, assesses competing funding demands among agencies, and sets funding priorities.
  • Oversight of paperwork and statistical gathering
  • Previous Administrations
    • Creation of Data.gov (2009)
    • OMB Revised Circular A-130: Managing Information as a Strategic Resource

“enables the data to be fully discoverable and usable by end users” (p33)

14 of 26

Understand past, present, and future of open data policy issues in the Federal Government

  • New Head of OMB
    • Mick Mulvaney (as of February 16, 2017)
      • Was the Congressman from the 5th District of South Carolina, first elected in 2010; he was the first Republican to hold the seat in 128 years.
  • New Chief Statistician of the U.S. advises Head of OMB
    • Nancy Potok (as of January 19, 2017)
      • Was Deputy Director and Chief Operating Officer of the U.S. Census Bureau. She previously served at the U.S. Department of Commerce as Deputy Under Secretary for Economic Affairs, the Census Bureau's Associate Director for Demographic Programs, and the Principal Associate Director and Chief Financial Officer in charge of Field Operations, Information Technology, and Administration.
  • Sunlight Foundation concerns remain the same: budget cuts that could reduce quality, frequency of release or even collection, and purposeful alteration or miscommunication of research or statistical information.

15 of 26

Questions?

16 of 26

P.E.S.T. Analysis for Industry

  • Political
    • Legislative
      • Congress.gov
    • Executive
      • NARA’s Federal Register
    • Judicial
  • Economic
    • Sector Inflation
      • BLS’s Producer Price Index
    • Microeconomic trends
  • Socio-cultural
    • Norms & Ratios
      • IRS’s Statistics of Income
    • Peers and partners

  • Technology
    • Patents
      • Citation Analysis
    • Tech Transfer

17 of 26

Know who are the major publishers of the data sources: US Government

18 of 26

Know who are the major publishers of the data sources: International sources

  • United Nations Data page: As Guide
    • http://data.un.org
    • Use Advanced Search
    • If you don’t find what you need, use the links on the results page to go deeper
    • UN Data is about 5 years old

19 of 26

Know who are the major publishers of the data sources: International sources

National Statistical Offices

National language will have more

Some charge for access

Citizens of that country might have free access

National Repositories/Archives

Historical

Datasets

BY COUNTRY

BY TOPIC

20 of 26

Demonstrate methods in seeking data

  • United States https://www.data.gov

  • International http://data.un.org
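
A search like the one demonstrated on Data.gov can also be scripted. The Python sketch below assumes the Data.gov catalog exposes a CKAN-style search action at `https://catalog.data.gov/api/3/action/package_search` with `q` and `rows` parameters; treat the endpoint, parameters, and response shape as assumptions to verify against current Data.gov documentation. The demonstration runs offline against a hand-made response, and the `search` function that would contact the network is defined but not called.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Data.gov catalog's CKAN search action. Verify before use.
CATALOG_SEARCH = "https://catalog.data.gov/api/3/action/package_search"

def build_search_url(query, rows=5):
    """Return a catalog search URL for `query`, limited to `rows` results."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{CATALOG_SEARCH}?{params}"

def dataset_titles(response_text):
    """Extract dataset titles from a CKAN-style JSON search response."""
    payload = json.loads(response_text)
    return [d["title"] for d in payload["result"]["results"]]

def search(query, rows=5):
    """Fetch titles over the network (not called here; needs internet access)."""
    with urllib.request.urlopen(build_search_url(query, rows)) as resp:
        return dataset_titles(resp.read().decode("utf-8"))

# Offline demonstration with a hand-made response in the assumed shape.
sample = '{"success": true, "result": {"results": [{"title": "Example dataset"}]}}'
url = build_search_url("producer price index")
titles = dataset_titles(sample)
print(url)
print(titles)
```

Calling `search("producer price index")` would run the same query live, subject to the assumptions above and to any access limits the site imposes.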

21 of 26

Identify problems that might come with open data

  • Beggars can’t be choosers
    • Too old
    • Not to the geographic level needed
  • Licensing
    • Nationality
    • Embargos
    • Payment for some formats
  • Compatibility
    • Standardization
    • Combining two datasets even from same source might not be possible
    • Combining two different sources requires comparing their methodologies
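
The compatibility checks above can be partly automated before two datasets are combined. This is a minimal Python sketch with invented figures and a hypothetical layout (one record per geography, each carrying a reference year). It flags geographies missing from either side and mismatched years; it cannot judge whether the two sources used comparable methodology, which still requires reading their documentation.

```python
# Hypothetical figures, invented for illustration only.
population = {
    "DC": {"year": 2016, "value": 100},
    "MD": {"year": 2016, "value": 200},
    "VA": {"year": 2015, "value": 300},
}
employment = {
    "DC": {"year": 2016, "value": 50},
    "MD": {"year": 2015, "value": 120},
    "PR": {"year": 2016, "value": 30},
}

def compatibility_report(a, b):
    """Compare two datasets keyed by geography before combining them."""
    shared = sorted(set(a) & set(b))
    return {
        "only_in_first": sorted(set(a) - set(b)),
        "only_in_second": sorted(set(b) - set(a)),
        "year_mismatch": [k for k in shared if a[k]["year"] != b[k]["year"]],
        "combinable": [k for k in shared if a[k]["year"] == b[k]["year"]],
    }

report = compatibility_report(population, employment)
for name, keys in report.items():
    print(name, keys)
```

Only the geographies listed under `combinable` share both a key and a reference year; everything else needs a decision by the researcher, not the librarian.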

22 of 26

Associations: Blogs and Conferences

FOR FEDERAL DATA POLICY

FOR LIBRARIANS

23 of 26

Learning more

NUMERICAL DATA

GOVERNMENT SOURCES

24 of 26

Legal issues

Licensing Data

Data and IP

25 of 26

General Sources

26 of 26

Let’s discuss

boettcher@georgetown.edu

202 687-7495

@jenny.wombat

Send me your favorite open data website