1 of 23

Tensors are the most important data structure in modern Machine Learning applications

2 of 23

But the tensor API seems to be broken...

“[Tensor] forces bad habits, such as exposing private dimensions, broadcasting based on absolute position, and keeping type information in documentation…”

3 of 23

Simplified code for self-attention (in PyTorch), with callouts on the problems:

  • Transpositions needed for later operations
  • The semantics of every dimension lives only in the documentation
  • Each operation is done according to the semantics of every dimension
  • bmm (Batch Matrix Multiplication) is a specific function that assumes that the first dimension is the batch index
  • Yet another transposition
  • Another transposition to make it work
  • What is dimension 2 at this point?

And if we now want to pass just one example, with no batch dimension, the code just doesn't work anymore :-(
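The slide's PyTorch snippet is not reproduced in this extraction; the following is only a minimal sketch of the kind of code the callouts above refer to (the weight names w_q, w_k, w_v and the dimension sizes are placeholders):

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, time, features) -- the meaning of each dimension lives
    # only in this comment, not in the tensor itself.
    q = x @ w_q                                # (batch, time, d_k)
    k = x @ w_k                                # (batch, time, d_k)
    v = x @ w_v                                # (batch, time, d_v)

    # bmm assumes that dimension 0 is the batch index, so k has to be
    # transposed to (batch, d_k, time) first.
    scores = torch.bmm(q, k.transpose(1, 2))   # (batch, time, time)
    weights = torch.softmax(scores / q.shape[-1] ** 0.5, dim=2)

    # What is dimension 2 at this point? Only the comments know.
    return torch.bmm(weights, v)               # (batch, time, d_v)

# A single example of shape (time, features), with no batch dimension,
# breaks the code: bmm requires 3-D inputs.
```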

4 of 23

Possible solution: high-level query languages to manipulate Tensors

We focus on MATLANG and LARA, mostly from a theoretical point of view

We compare them in terms of their ability to express certain relevant Machine Learning operations

5 of 23

Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators

Jorge Pérez
@perez

joint work with Pablo Barceló, Nelson Higuera, and Bernardo Subercaseaux

DCC UChile and IMFD Chile

6 of 23

Why is expressiveness relevant?

7 of 23

MATLANG and LARA

8 of 23

MATLANG

[BGVW’17@ICDT]

9 of 23

LARA

Associative tables: Table [ Keys, Values ]

Example:
Imgs [ {batch, width, height}, {red, green, blue} ]
TImg [ {batch, width, height, channel}, {val} ]
TText [ {batch, time, features}, {val} ]

[HHS’17@SIGMOD]
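To make the Table [ Keys, Values ] split concrete, here is a toy Python rendering of the first two schemas above (an illustration only, with made-up pixel values; it is not LARA syntax):

```python
# Imgs [ {batch, width, height}, {red, green, blue} ]:
# a mapping from key tuples to value tuples.
Imgs = {
    # (batch, width, height) -> (red, green, blue)
    (0, 0, 0): (255, 0, 0),
    (0, 0, 1): (0, 128, 0),
    (0, 1, 0): (0, 0, 255),
}

# TImg [ {batch, width, height, channel}, {val} ]: the same data with the
# colour channel promoted from a value attribute to a key attribute.
TImg = {
    (b, w, h, c): (rgb[c],)
    for (b, w, h), rgb in Imgs.items()
    for c in range(3)
}
```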

10 of 23

LARA

Three operators:

Map

[HHS’17@SIGMOD]

[Diagram: the Map operator applies a function f to the values of each key k]

11 of 23

LARA

Three operators:

Map

[HHS’17@SIGMOD]

[Diagram: the Map operator applying a function f over an associative table with keys i, j]
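The extraction only preserves the name of the Map operator; a rough sketch, assuming Map applies a function to the value tuple of every key while leaving the keys untouched (which is how the diagrams read), could be:

```python
def lara_map(table, f):
    """Apply f to the value tuple of every key; the keys stay fixed.

    A sketch of a Map-style operator over associative tables,
    not the formal LARA definition.
    """
    return {key: f(values) for key, values in table.items()}

# Tiny example table: (batch, width, height) -> (red, green, blue)
Imgs = {(0, 0, 0): (255, 0, 0), (0, 0, 1): (0, 128, 0)}

# Turn each RGB pixel into a single grayscale value.
Gray = lara_map(Imgs, lambda rgb: (sum(rgb) / 3.0,))
```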

12 of 23

Self attention in LARA

Schema:

[{batch,time,features}]

[{batch,idx,features}]

[{batch,idx,features}]

13 of 23

Convolution - Einstein Summation - Inversion

14 of 23

Convolution

[Figure: input A, kernel K, and their convolution A ⊛ K]
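As a reminder of the operation the figure depicts, a minimal NumPy sketch of a 2-D convolution (valid mode, no padding; as in most ML libraries, the kernel is not flipped):

```python
import numpy as np

def conv2d(A, K):
    """Slide the kernel K over the image A and sum elementwise products."""
    h = A.shape[0] - K.shape[0] + 1
    w = A.shape[1] - K.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = A[i:i + K.shape[0], j:j + K.shape[1]]
            out[i, j] = np.sum(window * K)
    return out
```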

15 of 23

Theorem
Convolution is not expressible in MATLANG

Proof idea
MATLANG satisfies a very strong genericity property over keys (indexes)

Theorem
Convolution is expressible in LARA

Proof idea
LARA allows comparing keys, and we can take advantage of that
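To give a flavour of this proof idea (a toy sketch, not the actual LARA expression): once keys can be compared and combined arithmetically, convolution becomes a join over index offsets followed by an aggregation. Using dict-based tables as before:

```python
from collections import defaultdict

# Image and kernel as associative tables: (row, col) -> value.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
K = {(0, 0): 1.0, (0, 1): -1.0}

# Pair every image key with every kernel key, use key arithmetic to
# compute the output position, and aggregate. Boundary handling is
# ignored in this sketch.
out = defaultdict(float)
for (i, j), a in A.items():
    for (di, dj), k in K.items():
        out[(i - di, j - dj)] += a * k
```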

16 of 23

Einstein Summation

17 of 23

Einstein summation notation

Repeated indexes in an indexed expression imply an implicit summation:

T_{ik} = A_{ij} B_{jk} = \sum_{j} A_{ij} B_{jk}

\langle u, v \rangle = u_i v_i

T_{ils} = A_{ijk} B_{kjls} = \sum_{j,k} A_{ijk} B_{kjls}

einsum("ijk,kjls->ils", A, B)
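The same examples in NumPy, whose einsum uses exactly this index notation:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
T = np.einsum("ij,jk->ik", A, B)          # T_ik = sum_j A_ij B_jk
assert np.allclose(T, A @ B)

u, v = np.random.rand(5), np.random.rand(5)
dot = np.einsum("i,i->", u, v)            # <u,v> = sum_i u_i v_i

A2 = np.random.rand(2, 3, 4)
B2 = np.random.rand(4, 3, 5, 6)
T2 = np.einsum("ijk,kjls->ils", A2, B2)   # sum over j and k
```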

18 of 23

Theorem
LARA can easily express a named version of general Einstein Summation

MATLANG can also express Einstein Summation when input and output are matrices (or vectors)

19 of 23

Matrix Inversion

20 of 23

Theorem [BGVW’17]
Matrix inversion is not expressible in MATLANG

Proof idea
  • Inversion allows one to express Transitive Closure
  • Transitive Closure is a non-local query
  • MATLANG is contained in FO+Aggregation
  • FO+Aggregation is a local query language

Thus, MATLANG cannot express inversion
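For the first bullet: over the reals, (I - cA)^{-1} equals the series \sum_k c^k A^k, so for a small enough c > 0 its nonzero pattern encodes reachability in the graph with adjacency matrix A. A small NumPy illustration (the graph and the constant c are chosen only for the example):

```python
import numpy as np

# Directed graph 0 -> 1 -> 2, with node 3 isolated.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)

n = A.shape[0]
c = 1.0 / (n + 1)                      # small enough for the series to converge
M = np.linalg.inv(np.eye(n) - c * A)   # equals sum_k (c * A)^k

# Entry (i, j) is positive exactly when j is reachable from i
# (every node counts as reachable from itself).
reachable = M > 1e-12
print(reachable.astype(int))
```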

21 of 23

Can we use the same approach to prove that matrix inversion is not expressible in LARA?

Theorem
LARA can express non-local queries

Observations
  • Matrix inversion is complete for DET
  • LARA can be evaluated in LogSpace

Theorem
If LARA can express inversion, then LogSpace = DET

It is believed that LogSpace ≠ DET

22 of 23

Concluding remarks

Tensors seem to need an abstract manipulation language

We study the expressiveness of two languages, but they were not designed with ML operators in mind

Studying these languages is not trivial, and �interesting theoretical/practical problems arise

23 of 23

Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators

Jorge Pérez
@perez

joint work with Pablo Barceló, Nelson Higuera, and Bernardo Subercaseaux

DCC UChile and IMFD Chile