1 of 12

6/7/18 FOLIO Product Council Meeting

Sharon Beltaine, Reporting SIG Convener

Data Warehouse Proposal from the Reporting Special Interest Group

| www.folio.org


2 of 12

Several Critical Needs

  • Proof: establishes proof that data warehouse reporting will work prior to implementation
  • Build: designs, builds, and tests the infrastructure components required to move and transform data into a format for report creation
  • Integrity: ensures referential integrity and consistency both within and across microservices
  • Sandbox: enables data analysts to begin rebuilding their reports and tackling the learning curve for the move to FOLIO


3 of 12

Working Example to Establish Confidence

  • template for a working data warehouse
  • each participating institution needs to set up a data warehouse for its own use
  • Data Lake is important, but the end deliverable is a Data Warehouse
  • requires data structure that can be used to generate reports with existing reporting applications
  • a relational schema queried with SQL is a common data structure for a data warehouse


4 of 12

Build: Reporting Infrastructure Components

  • January 2018: “external” reporting established as the first priority, and “in-app” reporting as the second
  • Rationale: data warehouse can provide any report needed starting day one of a FOLIO implementation, while in-app reporting cannot
  • March 2018: Data Lake proof-of-concept project showed that reporting from a data lake is possible, but much more work remains


5 of 12

Build: Reporting Infrastructure Components

JIRA Tickets for Reporting:

  • OKAPI functionality will be extended to allow the creation of a “tap” to export transactional data from a FOLIO LMS to an external data store, such as a data “lake” (see JIRA ticket OKAPI-570).
  • Pre-Handler and Post-Handler filters for reporting will be added to OKAPI to move requests and responses into a data lake (see JIRA ticket OKAPI-591).
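A minimal sketch of the "tap" idea described above, assuming a plain Python function stands in for a FOLIO module and an in-memory buffer stands in for the data lake. The names `make_tap`, `checkout_handler`, and `lake` are hypothetical and are not part of the OKAPI API; the sketch only illustrates capturing each request and response as one JSON document per line.

```python
import io
import json
from datetime import datetime, timezone


def make_tap(handler, sink):
    """Wrap `handler` so each request/response pair is also written to `sink`."""
    def tapped(request):
        response = handler(request)  # normal processing is unchanged
        record = {
            "captured_at": datetime.now(timezone.utc).isoformat(),
            "request": request,
            "response": response,
        }
        sink.write(json.dumps(record) + "\n")  # one JSON document per line
        return response
    return tapped


# Hypothetical handler standing in for a FOLIO microservice endpoint.
def checkout_handler(request):
    return {"status": 201, "loanId": "loan-1", "itemId": request["itemId"]}


lake = io.StringIO()  # stands in for a file or object store in the data lake
tapped_checkout = make_tap(checkout_handler, lake)
result = tapped_checkout({"path": "/loans", "itemId": "item-42"})

# Read the captured records back, as a downstream loader might.
records = [json.loads(line) for line in lake.getvalue().splitlines()]
```

The caller still receives the unmodified response; the tap only adds a side effect, which is the property a post-handler filter would need in order not to disturb normal traffic.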


6 of 12

Outstanding Questions

  • Once data is moved into a data lake, how will the data be transformed into a format for reporting applications, as would be done in a data warehouse?
  • Will data dereferencing of identifiers (e.g., UUIDs) be done at the point of the initial transaction or when the report is generated?
  • What structures will be in place to prevent the impact of schema mutation on reporting?
  • Is a Message Queue needed to provide a holding mechanism that external systems could use to retrieve data for transfer into a data lake?
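The dereferencing question can be illustrated with a small sketch. It assumes a hypothetical in-memory `lookup` table mapping UUIDs to labels and a hypothetical loan record with `userId` and `itemId` fields; neither is taken from an actual FOLIO schema. It shows how the two timing options can give different report output once a label changes.

```python
import uuid

# Hypothetical lookup table: identifier -> human-readable label.
patron_id = str(uuid.uuid4())
item_id = str(uuid.uuid4())
lookup = {patron_id: "Patron: J. Doe", item_id: "Item: Moby-Dick"}

# Hypothetical transaction record that stores only identifiers.
loan = {"id": str(uuid.uuid4()), "userId": patron_id, "itemId": item_id}


def dereference(record, lookup, fields):
    """Return a copy of `record` with the named ID fields replaced by labels."""
    resolved = dict(record)
    for field in fields:
        # Keep the raw identifier if no label is known for it.
        resolved[field] = lookup.get(record[field], record[field])
    return resolved


# Option A: dereference at transaction time (a snapshot of the labels).
stored_at_transaction = dereference(loan, lookup, ["userId", "itemId"])

# The label changes after the transaction was recorded.
lookup[item_id] = "Item: Moby-Dick (2nd copy)"

# Option B: dereference at report time (reflects the current labels).
resolved_at_report = dereference(loan, lookup, ["userId", "itemId"])
```

Option A preserves what was true when the transaction happened; Option B always shows current values but needs the lookup data to be available to the report.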


7 of 12

Integrity: Ensuring Data Confidence

  • need to ensure referential integrity
  • references to related and linked data must work reliably
  • need to standardize microservice data structures for accurate reporting
  • certification can ensure that microservices conform to data structures to support integrity in the data warehouse
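As a rough illustration of a referential-integrity check across two microservices, the sketch below assumes hypothetical `items` and `loans` record sets and a hypothetical `itemId` reference field. It reports loan records whose reference does not resolve to a known item.

```python
def find_dangling_references(records, field, valid_ids):
    """Return the records whose `field` does not point at a known identifier."""
    return [record for record in records if record.get(field) not in valid_ids]


# Hypothetical records from two separate microservices.
items = [{"id": "item-1"}, {"id": "item-2"}]
loans = [
    {"id": "loan-1", "itemId": "item-1"},
    {"id": "loan-2", "itemId": "item-9"},  # points at an item that does not exist
]

item_ids = {item["id"] for item in items}
dangling = find_dangling_references(loans, "itemId", item_ids)
```

A check of this kind could run when data is loaded into the warehouse, so that broken links are caught before they reach a report.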


8 of 12

Sandbox: Steep Learning Curve Ahead

  • library reporting today is done against relational database systems
  • data analysts have hundreds of reports developed
  • shift to FOLIO will require significant time and effort in learning and redeveloping reports
  • Data Warehouse is urgently needed for data analysts to begin learning how to rebuild their reports
  • new reports needed to support library services, statistics, operations, and LMS system diagnostics
  • reports must be in place at Go Live


9 of 12

Timeline for Reference Data Warehouse

  • July-August 2018 (2 months): FOLIO developers complete FOLIO data warehouse setup.
  • September-December 2018 (4 months): Data analysts build and test reports against the test data warehouse, creating JIRA tickets for developers as problems arise. FOLIO developers address JIRA tickets for the reference data warehouse and document steps for setup and maintenance.
  • January-June 2019 (6 months): First Implementers (and other interested institutions) use the reference data warehouse as a template and complete final versions of reports at their own institutions.
  • July 2019: Data Warehouse Reporting is ready for Go Live for First Implementers.


10 of 12

Next Steps: Data Warehouse Roadmap

  • Phase I: Data Lake
    • OKAPI changes to flow data into Data Lake
    • Reference implementation of flow into Data Lake
    • Functioning project test environment for Data Lake
  • Phase II: Data Warehouse
    • Reference implementation: load from Data Lake to Data Warehouse
    • Decision - data structure for warehouse:
      • Does the JSON support in PostgreSQL allow us to manipulate data as we need for reports?
      • Or must we convert to normalized relational structures?
      • Trade-off involves large amount of implementation time.
    • Confirm reporting tools working with Data Warehouse (or Data Lake if we are really lucky)
      • Which reporting tools can make use of the JSON extensions in PostgreSQL?
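The "normalized relational structures" option can be sketched as follows. This is an assumption-laden illustration: SQLite (via Python's standard `sqlite3` module) stands in for PostgreSQL, and the loan documents, table names, and columns are hypothetical. It shows JSON documents from a data lake being flattened into two tables that an ordinary SQL reporting tool could query.

```python
import json
import sqlite3

# Hypothetical JSON documents as they might sit in a data lake.
raw_documents = [
    '{"id": "loan-1", "userId": "u1", "item": {"id": "i1", "title": "Moby-Dick"}}',
    '{"id": "loan-2", "userId": "u1", "item": {"id": "i2", "title": "Walden"}}',
    '{"id": "loan-3", "userId": "u2", "item": {"id": "i1", "title": "Moby-Dick"}}',
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for the warehouse database
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE loans (id TEXT PRIMARY KEY, user_id TEXT, "
    "item_id TEXT REFERENCES items(id))"
)

# Flatten each nested document into one row per table.
for raw in raw_documents:
    doc = json.loads(raw)
    item = doc["item"]
    conn.execute(
        "INSERT OR IGNORE INTO items VALUES (?, ?)", (item["id"], item["title"])
    )
    conn.execute(
        "INSERT INTO loans VALUES (?, ?, ?)", (doc["id"], doc["userId"], item["id"])
    )

# A report any SQL-based tool could run: number of loans per title.
report = conn.execute(
    "SELECT items.title, COUNT(*) FROM loans "
    "JOIN items ON items.id = loans.item_id "
    "GROUP BY items.title ORDER BY items.title"
).fetchall()
```

The alternative raised on the slide, querying the JSON directly with PostgreSQL's JSON support, would skip the flattening step at the cost of depending on reporting tools that understand those extensions.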


11 of 12

Summary

The Reporting SIG recommends that the FOLIO Product Council:

  • approve the plan for the FOLIO project to build a Reference Data Warehouse,
  • establish the Reference Data Warehouse as a version 1 priority on the FOLIO project roadmap, and
  • set a deadline of September 1, 2018 to create the Reference Data Warehouse.


12 of 12

END
