1 of 11

Data Warehousing in the Cloud�An Iterative Approach��Vanessa Brown & Rob Snyder�Ithaca College

2 of 11

About Ithaca College

4-year private residential college, started as a conservatory of music in 1892

6,000 undergrads, 500 grads, and 700 faculty

4 professional schools and 1 liberal arts school

$43,978 tuition (2018-19)

3 of 11

Background

The college recognizes the need for a comprehensive, consistent, and consumable repository of institutional data to support an evolving culture of data-driven decision making and data-informed strategy

There have been several attempts to build an enterprise data warehouse, involving various consultants and vendor / partner products and technologies

4 of 11

Background

  • What we’ve learned
    • The ultimate value of this work is not just the data, it’s connecting, understanding, modeling, and publishing that data in a useful and meaningful form
    • The process of doing this – extracting data from operational systems, building models with the help of BI and campus wide SMEs, etc. – is where fundamental knowledge and value are built
    • This fundamental, critical knowledge is largely expressed in the processes that build the warehouse
    • Existing solutions “lock up” this knowledge
      • No meaningful, comprehensive way to holistically understand what has been built
      • No viable exit strategy
      • In some cases, all of this knowledge becomes a “black box”

5 of 11

Ground Rules

  • Favor open source and open standards
  • Avoid Vendor Lock In
  • Built in an Iterative Fashion
  • Modular Infrastructure with Replaceable Parts
  • Complete openness and transparency in the modelling process, with self-documenting dictionary and full traceability
  • Creation and Preservation of Institutional Knowledge

6 of 11

Data Engineering Activities

    • Collect data from multiple operational and other systems
    • Resolve discrepancies and align data entities

Normalize across Disparate Systems

    • Overcome limitations in source systems historical tracking
    • Ensure cross systems data is available at consistent time intervals

Improve Data Granularity to Support Trend and Comparative Analysis

    • Leverage SMEs to embed intelligence in the models
    • Demystify the data through the application of a semantic layer

Build Consistent, Accurate, and Consumable Strategic Data Models

7 of 11

Toolset

Data Engineering / Data Science

    • R, R Studio
    • Python, PyCharm, Spider
    • Tableau
    • Power BI

Code Management / DevOps

    • All artifacts under Source Code Management (GIT)
    • All processes deployed as Docker containers
    • Environment Managed by Rancher

Data Lake

    • Azure BLOB Storage
    • Combination of Off-the-Shelf (Azure Data Factory) and internally developed (Java) ingestion
    • Historical data Capture done with Python / Pandas

Warehouse

    • PostgreSQL
    • Azure SQL Warehouse

Data Warehousing

    • Generic, Reusable Python / SQL Process for Data capture, history tracking, and ETL

8 of 11

Sharing

  • The overall architecture is inspired by a simplified service bus approach
    • Azure Queues used to coordinate work from ingestion through capture, modelling, and extracts
  • All of the components can be used stand-alone and swapped out as needed
  • The college is open to sharing any and all of the fundamental code, and is certainly open to collaborating with others who have similar needs and want to contribute to open solutions

9 of 11

Competing Demands and Priorities

  • Maintaining current BI environment
    • Focused on the operational reporting for largely administrative audience
    • Oracle
      • 15 servers
      • OID and Access Manager for group management and SSO
      • OBIEE – for reporting portal for self publishing of administrative offices
    • Constant upgrade cycle to keep products up to date
    • Little time (or skillsets) left for starting something “new” with 2-3 FTEs
  • Projects from “out of the blue”
  • Large ERP implementations with shifting timelines and requirements

10 of 11

Culture Shift

  • BI and Analytics
    • On premise systems to cloud services
      • Thinking differently
      • Allows for prototyping and quick spinups and tests but what about costs?
      • And what about all those services and how they work?
    • Focus shift on providing business consumable, integrated data and less on tool and infrastructure support at different levels
    • Variety of user needs – data science, administrative and senior leadership
    • More nimble data for answering questions
  • Ithaca College
    • Asking questions vs “business as usual”
    • Emphasis on collaborating to maximize resources – Data Modeling Core team

11 of 11

The Path Forward

  • College Strategic plan
    • 9 Goals with many potential objectives and measures(requiring data)
  • BI/Analytics Roadmap
  • Current/Upcoming Projects (in support of Strat Plan and Roadmap)
    • Dining Services project data
    • Slate implementation – getting the data
  • Azure – Microsoft shift
  • More agile approach to deal with shifting priorities
  • Show value faster
  • Experiment and learn as we go…ADJUST