1 of 28

Compiling Mappings to Bridge Applications and DatabasesSIGMOD 2007, Beijing

Sergey Melnik, Atul Adya, Phil Bernstein

{melnik | adya | philbe}@microsoft.com

Microsoft Corporation, USA

2 of 28

Outline

  • Motivation
  • Our key insights
    • Execution
    • Specification
    • Mapping compilation
  • Implementation

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

2

3 of 28

Motivation

    • Microsoft's Data Platform Vision
      • Uniform data access to all types of data in all tiers
      • Rich data services (BI, sync, security, reporting, etc.)
      • D. Campbell & A. Nori, Thursday, industrial track, 201-B

  • Impedance mismatch
    • Write object-oriented code
    • But: data in relational DBMS
    • Need to translate data and�data access operations
    • 40% of code in enterprise�systems

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

3

Objects

Relationaldatabase

Business logic

4 of 28

O/R Mismatch: State of the Art

  • 1980s: Persistent programming languages
    • One or two commercial products
  • 1990s: OODBMS
    • No widespread acceptance, zero-billion-dollar market
  • "Objects & Databases: A Decade in Turmoil"
    • Carey & DeWitt (VLDB'96), bet on ORDBMS
  • 2000: ORDBMS go mainstream
    • DB2 & Oracle implement hardwired O/R mapping
    • O/R features rarely used for business data
  • Today: client-side data mapping layers

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

4

5 of 28

Data Mapping Layers

    • Goal: Updatable view aligned with application's data model, driven by mapping, backed by RDBMS
  • Packaged business applications
    • E.g., BAPI in SAP R/3, CRM, ERP
  • O/R mappers
    • 60+ commercial products
    • Open source (e.g., Hibernate, Rails)
  • Enterprise frameworks
    • E.g., JPA in EJB 3.0
    • Benefits: DBMS portability, legacy data,�data independence, scale out, etc.

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

5

  • Varying�capability, robustness & TCO
  • Fundamental technical challenges

6 of 28

Sample Scenario

  • Define mapping declaratively
  • Query and update via application schema
  • Mapping layer translates data access operations

Client

Id Name Score ...

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

Person

Id

Name

HR

Id Name

Empl

Id Dept

6

6

Employee

Dept

Customer

BillingAddr

CreditScore

People = { }

var result =� from p in People

where p.Name == "Alice"

select p;

E.g., LINQ�in .NET:

Application schema�≠ DB schema

7 of 28

Assembling Entities from Tables

  • Problem 1:� Updates
  • Problem 2:� Higher-level� specification
  • Problem 3:� Tie specification� to execution

People = � SELECT VALUECASEWHEN (T5._from2 AND NOT T5._from1) THEN Person(T5.Id, T5.Name) � WHEN (T5._from1 AND T5._from2) � THEN Employee(T5.Id, T5.Name, T5.Dept)� ELSE Customer(T5.Id, T5.Name, T5.CreditScore, T5.BillingAddr)� ENDFROM ( (SELECT T1.Id, T1.Name, T2.Dept, � CAST (NULL AS int) AS CreditScore, � CAST (NULL AS nvarchar) AS BillingAddr, False AS _from0, � (T2._from1 AND T2._from1 IS NOT NULL) AS _from1, T1._from2� FROM ( SELECT T.Id, T.Name, True AS _from2� FROM HR AS T) AS T1� LEFT OUTER JOIN (� SELECT T.Id, T.Dept, True AS _from1� FROM Empl AS T) AS T2� ON T1.Id = T2.Id )� UNION ALL (� SELECT T.Id, T.Name, � CAST (NULL AS nvarchar) AS Dept, � T.Score AS CreditScore, T.Addr AS BillingAddr, � True AS _from0, False AS _from1, False AS _from2� FROM Client AS T)� ) AS T5

7

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

8 of 28

Problem 1: Update Execution

  • View update problem (Dayal & Bernstein, 1978)
    • Underspecified update behavior
    • Hard: in commercial DBMSes views with joins/unions usually not updatable
    • Not specific to O/R mapping
  • Current approaches
    • Object-at-a-time updates, case-by-case logic
    • INSTEAD OF triggers, stored procedures
    • Custom callbacks for updates

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

8

9 of 28

Insight: Update Mechanism

  1. Materialized view maintenance
    • ∆Tables = ∆U ( Entities, ∆Entities )
  2. View unfolding
    • ∆Tables = ∆U ( Q(Tables), ∆Entities )

∆Entities

Entities

Entities + ∆Entities

Query views

Update views

virtual state

persistent�data

∆Tables ?

Tables

Tables + ∆Tables

9

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

  • Uniform algorithm
  • Object-at-a-time and set updates

10 of 28

Example: Update Translation

10

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

Client

Id Name Score ...

HR

Id Name

Empl

Id Dept

U

HR = SELECT p.Id, p.Name FROM People p� WHERE p IS OF (ONLY Person) OR� p IS OF (ONLY Employee)

Empl = SELECT e.Id, e.Dept FROM People e

WHERE e IS OF Employee

Client = SELECT c.Id, c.Name, ... FROM People c

WHERE c IS OF Customer

Person

Id

Name

Employee

Dept

Customer

BillingAddr

CreditScore

People = { }

11 of 28

Example: Update Translation

11

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

People =�{� Person(1, "Alice"),� Employee(2, "Bob", "R&D")�}

∆Tables

∆Entities

= { (1, "Alice"),� (2, "Bob") }

= { (2, "R&D") }

Client

Id Name Score ...

HR

Id Name

Empl

Id Dept

U

HR = SELECT p.Id, p.Name FROM People p� WHERE p IS OF (ONLY Person) OR� p IS OF (ONLY Employee)

Empl = SELECT e.Id, e.Dept FROM People e

WHERE e IS OF Employee

Client = SELECT c.Id, c.Name, ... FROM People c

WHERE c IS OF Customer

= { }

∆U

HR = SELECT p.Id, p.Name FROMPeople p� WHERE p IS OF (ONLY Person) OR� p IS OF (ONLY Employee)

Empl = SELECT e.Id, e.Dept FROMPeople e

WHERE e IS OF Employee

Client = SELECT c.Id, c.Name, ... FROMPeople c

WHERE c IS OF Customer

12 of 28

Problem 2: Specifying Mappings

  • EJB 3.0

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

12

@Entity

@Table(name="HR")

@SecondaryTables({

@SecondaryTable(name="Empl",

pkJoinColumns=@PrimaryKeyJoinColumn(name="Id")) })

public class Employee : Person { ... }

@Entity ...

public class Customer : Person { ... }

@Entity ...

public class Person { ... }

"Specifying one or more secondary tables indicates that the data for the entity class is stored across multiple tables."

  • Mapping = custom data structure
  • Semantics defined by examples
  • Capabilities? Correctness?

"If the mapping is incompletely specified, the JDO implementation might silently use mapping defaults or throw an exception."

  • JDO 2.0

UNSUPPORTED

13 of 28

Insight: �Declarative Mapping Language

  • Well-defined semantics
  • Allows non-expert users to specify complex O/R mappings
  • Uniform specification (vs. scenario-driven)

13

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

14 of 28

Example: Mapping Fragments

Client

Id Name Score ...

SELECT p.Id, p.Name FROM People p�WHERE p IS OF (ONLY Person) OR� p IS OF (ONLY Employee)

SELECT Id, Name

FROM HR

=

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

Person

Id

Name

HR

Id Name

Empl

Id Dept

14

14

SELECT e.Id, e.Dept FROM People e

WHERE e IS OF Employee

SELECT Id, Dept

FROM Empl

=

Employee

Dept

Customer

BillingAddr

CreditScore

SELECT c.Id, c.Name, ... FROM People c

WHERE c IS OF Customer

SELECT Id, Name, ...

FROM Client

=

People = { }

QE1

QE2

QE3

QT1

QT2

QT3

Fragment views

    • Qi = π(σ(…))
    • Expressed in Entity SQL
    • Conditions on types & scalars, ".", keys

15 of 28

Problem 3:�Tie Specification to Execution

  • Got higher-level mappings
  • Need explicit bidirectional�data transformations
    • That "respect" the mapping
    • And are correct
  • Correctness criterion
    • Entities = Q(U(Entities))
    • Undecidable for relationally�complete languages

Query views

Update views

Mapping

?

15

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

16 of 28

Outline

  • Motivation
  • Our key insights
    • Execution
    • Specification
    • Mapping compilation
  • Implementation

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

16

17 of 28

Insight: Mapping Compilation

  • Formal problem statement
  • Bipartite mappings
    • Support update scenarios
  • Steps of the algorithm
    • Partitioning
    • Exploiting outer joins
    • Producing CASE statements
    • Eliminating self-joins and�self-unions
  • Mapping validation, merge views, etc.

17

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

Query views

Update views

Mapping

compile

18 of 28

Problem Statement

  • Mapping: binary relation on app & DB states

  • Mapping roundtrips if map map-1 = Id(Entities)
  • Data roundtripping problem

For a given map, construct views Q and U expressed in some language L, such that U ⊆ map ⊆ Q-1, or show that such views do not exist

    • Exist iff map roundtrips
    • Implies Entities = Q(U(Entities))

18

map

Entities

Tables

Query views

Update views

19 of 28

Bipartite Mappings

19

f

g

map

f′

g′

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

People = Rewriting(QE1, QE2, QE3)

= Rewriting(QT1, QT2, QT3)

Entities

Tables

f′:

f′ g:

  • map: { QE1 = QT1� QE2 = QT2� QE3 = QT3 }
  • Mapping roundtrips only if f is injective, i.e., there exists an exact rewriting f′ of Id query on Entities

= f g-1

f

g

20 of 28

Rewriting Algorithm

  • Partition People
    • Divide & conquer
    • Exploits mapping
  • Rewrite each partition (so f is injective)
    • P1 = QE1 QE2, P2 = QE1QE2, P3 = QE3
  • Outer join
    • Rewriting(QE1, QE2, QE3) = CASE … (QE1 QE2 QE3)� = CASE … ((QE1 QE2) ∪a QE3)
    • Minimal # of operators, better optimization potential
  • CASE statements
    • Use partitioning: Person=P1, so (fromE1 and not fromE2)

20

P1 : σe IS OF (ONLY Person) a

P2 : σe IS OF (ONLY Customer) a

P3 : σe IS OF (ONLY Employee)

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

21 of 28

Outline

  • Motivation
  • Our key insights
    • Execution
    • Specification
    • Mapping compilation
  • Implementation

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

21

22 of 28

ADO.NET Entity Framework

Entities�(conceptual schema based on EER)

Relationaldatabase

Mapping

  • Entity SQL

var recentHires =� from p in db.SalesPeople

where p.HireDate.Year > 2000

select p;

"SELECT p

FROM SalesPeople AS p

WHERE p.HireDate.Year > 2000"

  •  Language Integrated Query� (LINQ)

Objects

(C#, VB)

Users

Applications

Tools

  •  Create/Read/Update/Delete� on Objects

22

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

23 of 28

Architecture

Merge views

23

Mapping

Query pipeline

Update pipeline

Mapping compiler

Assemble �entities

Object services

Apply view maintenance to bidirectional views

Language integration

Runtime

Design time

Data providers

  • Unfold query views
  • Push relational ops
  • Optimize

∆Entities

∆Tables

Entity SQL

Algebraic trees

Entities

Tuples

Database system

API

generator

UI Tools

Query

views

Update

views

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

Adya, Blakeley, Melnik, Muralidhar & ADO.NET Team (SIGMOD'07)

24 of 28

Evaluation

  • Correctness
    • Enforced by mapping compiler
    • Comprehensive model-based testing
  • Efficiency
    • Design-time compilation, not on critical path
  • Performance of runtime engine
    • Exploiting mapping constraints helps DBMS optimizer
    • Server load approaches that of handcrafted solutions
  • Inheritance mapping exceeds industry standard

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

24

25 of 28

Related Work

  • Objects & databases
    • Atkinson & Buneman (ACM Comp. S. '87), Carey & DeWitt (VLDB'96), Cook & Ibrahim (ODBMS.ORG'06)
    • OO extensions in DB2 and Oracle: fixed built-in mapping
    • O/R mappers (Hibernate, Oracle TopLink)
    • BEA AquaLogic (XML/SOA, Carey et al SIGMOD'06)
  • Bidirectional views & view update
    • Foster et al (ACM TOPLAS'07), Barbosa et al (VLDB'05), Bohannon et al (VLDB'05), Hull (PODS'86), Miller (IS'93)
    • Dayal & Bernstein (VLDB'78), Bancilhon & Spyratos (TODS'81), Keller et al (SIGMOD'93), Braganholo et al (VLDB'04), Kotidis et al (ICDE'06)
  • Mapping compilation
    • IBM's Clio: XQuery and XSLT
  • Answering queries using views & view matching
    • Halevy (VLDBJ'01), Gou et al (SIGMOD'06), Larson & Zhou (VLDB'05)

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

25

Surveys

26 of 28

Contributions: Principled Approach

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

26

  • Declarative mapping language
    • Allows non-expert users to specify�complex O/R mappings
    • Formal semantics
  • Mechanism for updatable views
    • Large class of updates, not O/R specific
    • Leverages view maintenance technology

Mapping

compile

Bidirectional�views

  • Mapping compilation
    • Guarantees correctness
    • Exploits outer joins &�case statements
  • Implemented in ADO.NET Entity Framework
  • Part of Microsoft's .NET platform

27 of 28

Research Challenges (see Bernstein's keynote)

  • Runtime
    • Updates, notifications, concurrency, client-side constraints, debugging, access control, distributed QP, bulk loading, business logic, etc.� ⎯ via mappings
  • Tools
    • Mapping compilation, DB validation, schema translation, reverse engineering, schema tuning, schema evolution, data migration, etc.

27

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

28 of 28

More Information (SIGMOD'07)

  • Industrial papers
    • Anatomy of the ADO.NET Entity Framework �A. Adya, J. Blakeley, S. Melnik, S. Muralidhar,�the ADO.NET Team
    • The Microsoft Data Platform�D. Campbell, A. Nori, Thursday, 201-B, 5pm
  • Demo
    • ADO.NET Entity Framework: Raising the Level of Abstraction in Data ProgrammingP. Castro, S. Melnik, A. Adya
  • Download at msdn.microsoft.com/data

S. Melnik, A. Adya, P. A. Bernstein: Compiling Mappings to Bridge Applications and Databases, SIGMOD 2007

28