2 of 11

About Ithaca College

4-year private residential college, started as a conservatory of music in 1892

6,000 undergrads, 500 grads, and 700 faculty

4 professional schools and 1 liberal arts school

$43,978 tuition (2018-19)

3 of 11

Background

The college recognizes the need for a comprehensive, consistent, and consumable repository of institutional data to support an evolving culture of data-driven decision making and data-informed strategy

There have been several attempts to build an enterprise data warehouse, involving various consultants and vendor / partner products and technologies

4 of 11

Background

What we’ve learned

The ultimate value of this work is not just the data, it’s connecting, understanding, modeling, and publishing that data in a useful and meaningful form
The process of doing this – extracting data from operational systems, building models with the help of BI and campus wide SMEs, etc. – is where fundamental knowledge and value are built
This fundamental, critical knowledge is largely expressed in the processes that build the warehouse
Existing solutions “lock up” this knowledge

No meaningful, comprehensive way to holistically understand what has been built
No viable exit strategy
In some cases, all of this knowledge becomes a “black box”

5 of 11

Ground Rules

Favor open source and open standards
Avoid Vendor Lock In
Built in an Iterative Fashion
Modular Infrastructure with Replaceable Parts
Complete openness and transparency in the modelling process, with self-documenting dictionary and full traceability
Creation and Preservation of Institutional Knowledge

6 of 11

Data Engineering Activities

Collect data from multiple operational and other systems
Resolve discrepancies and align data entities

Normalize across Disparate Systems

Overcome limitations in source systems historical tracking
Ensure cross systems data is available at consistent time intervals

Improve Data Granularity to Support Trend and Comparative Analysis

Leverage SMEs to embed intelligence in the models
Demystify the data through the application of a semantic layer

Build Consistent, Accurate, and Consumable Strategic Data Models

7 of 11

Toolset

Data Engineering / Data Science

R, R Studio
Python, PyCharm, Spider
Tableau
Power BI

Code Management / DevOps

All artifacts under Source Code Management (GIT)
All processes deployed as Docker containers
Environment Managed by Rancher

Data Lake

Azure BLOB Storage
Combination of Off-the-Shelf (Azure Data Factory) and internally developed (Java) ingestion
Historical data Capture done with Python / Pandas

Warehouse

PostgreSQL
Azure SQL Warehouse

Data Warehousing

Generic, Reusable Python / SQL Process for Data capture, history tracking, and ETL

8 of 11

Sharing

The overall architecture is inspired by a simplified service bus approach

Azure Queues used to coordinate work from ingestion through capture, modelling, and extracts

All of the components can be used stand-alone and swapped out as needed
The college is open to sharing any and all of the fundamental code, and is certainly open to collaborating with others who have similar needs and want to contribute to open solutions

9 of 11

Competing Demands and Priorities

Maintaining current BI environment

Focused on the operational reporting for largely administrative audience
Oracle

15 servers
OID and Access Manager for group management and SSO
OBIEE – for reporting portal for self publishing of administrative offices

Constant upgrade cycle to keep products up to date
Little time (or skillsets) left for starting something “new” with 2-3 FTEs

Projects from “out of the blue”
Large ERP implementations with shifting timelines and requirements

10 of 11

Culture Shift

BI and Analytics

On premise systems to cloud services

Thinking differently
Allows for prototyping and quick spinups and tests but what about costs?
And what about all those services and how they work?

Focus shift on providing business consumable, integrated data and less on tool and infrastructure support at different levels
Variety of user needs – data science, administrative and senior leadership
More nimble data for answering questions

Ithaca College

Asking questions vs “business as usual”
Emphasis on collaborating to maximize resources – Data Modeling Core team

11 of 11

The Path Forward

College Strategic plan

9 Goals with many potential objectives and measures(requiring data)

BI/Analytics Roadmap
Current/Upcoming Projects (in support of Strat Plan and Roadmap)

Dining Services project data
Slate implementation – getting the data

Azure – Microsoft shift
More agile approach to deal with shifting priorities
Show value faster
Experiment and learn as we go…ADJUST