Tensors are the most important data structure in modern Machine Learning applications
But the tensor API seems to be broken...
“[Tensor] forces bad habits, such as exposing private dimensions, broadcasting based on absolute position, and keeping type information in documentation…”
Simplified code for self attention (in PyTorch)
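The slide's code is not reproduced in this transcript; below is a minimal PyTorch sketch of the kind of self-attention code being criticized (function and variable names are illustrative, not taken from the slide):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, time, features); w_q, w_k, w_v: (features, features)
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    # transpose so that bmm contracts the feature dimension
    scores = torch.bmm(q, k.transpose(1, 2))   # (batch, time, time)
    weights = F.softmax(scores, dim=2)         # dim=2: but what is dimension 2?
    return torch.bmm(weights, v)               # (batch, time, features)

x = torch.randn(2, 5, 8)                       # works: (batch, time, features)
w = torch.randn(8, 8)
out = self_attention(x, w, w, w)
# self_attention(torch.randn(5, 8), w, w, w) fails: bmm needs a batch dimension
```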
Transpositions needed for later operations...
The semantics of every dimension lives only in the documentation
Operations must be applied according to those (implicit) semantics
bmm (Batch Matrix Multiplication) is a specific function that assumes that the first dimension is the batch index
Yet another transposition
Another transposition to make it work
What is dimension 2 at this point?
And if we now want to pass just one example, with no batch dimension, then the code just doesn’t work anymore :-(
Possible solution: high-level query languages to manipulate tensors
We focus on MATLANG and LARA, mostly from a theoretical point of view
We compare them in terms of their ability to express certain relevant Machine Learning operations
Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators
Jorge Pérez (@perez)
joint work with Pablo Barceló, Nelson Higuera, and Bernardo Subercaseaux
DCC UChile and IMFD Chile
Why is expressiveness relevant?
MATLANG and LARA
MATLANG
[BGVW’17@ICDT]
LARA
Associative tables:
Table [ Keys, Values ]
Example:
Imgs [ {batch, width, height}, {red, green, blue} ]
TImg [ {batch, width, height, channel}, {val} ]
TText [ {batch, time, features}, {val} ]
[HHS’17@SIGMOD]
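A toy Python sketch of how such associative tables can be pictured, assuming a plain dict keyed by the key attributes (an illustration only, not LARA's actual data model):

```python
# Imgs[{batch, width, height}, {red, green, blue}]: key tuple -> named values
Imgs = {
    (0, 0, 0): {"red": 255, "green": 10, "blue": 0},
    (0, 0, 1): {"red": 12, "green": 80, "blue": 33},
}

# TImg[{batch, width, height, channel}, {val}]: the channel is now part of the key
TImg = {
    (0, 0, 0, "red"): {"val": 255},
    (0, 0, 0, "green"): {"val": 10},
}
```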
LARA
Three operators: Join, Union, and Ext (Map)
[HHS’17@SIGMOD]
[Figure: diagram illustrating the Map/Ext operator f^(k) applied to an associative table]
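As a rough illustration of the Map/Ext operator, here is a toy Python emulation under the simplifying assumption that Ext only rewrites the value attributes (LARA's actual operator is more general):

```python
def ext(f, table):
    # Toy Map/Ext: apply f to every (keys, values) pair, keeping the keys
    return {keys: f(keys, values) for keys, values in table.items()}

Imgs = {(0, 0, 0): {"red": 255, "green": 10, "blue": 0},
        (0, 0, 1): {"red": 12, "green": 80, "blue": 33}}

# Collapse the three colour attributes into a single 'val' attribute per key
grey = ext(lambda k, v: {"val": (v["red"] + v["green"] + v["blue"]) / 3}, Imgs)
```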
Self attention in LARA
Schema:
[{batch,time,features}]
[{batch,idx,features}]
[{batch,idx,features}]
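The LARA expressions themselves are not reproduced here; as a loose analogue of computing with named dimensions, the following hedged sketch writes the same self-attention with torch.einsum, where every axis is referred to by an explicit index letter (b, t, s, f, g) rather than a position:

```python
import torch

def self_attention_named(x, w_q, w_k, w_v):
    # x: (batch b, time t, features f); weights: (features f, features g)
    q = torch.einsum("btf,fg->btg", x, w_q)
    k = torch.einsum("bsf,fg->bsg", x, w_k)
    v = torch.einsum("bsf,fg->bsg", x, w_v)
    scores = torch.einsum("btg,bsg->bts", q, k)   # contract the feature axis
    weights = torch.softmax(scores, dim=-1)       # softmax over the s (time) axis
    return torch.einsum("bts,bsg->btg", weights, v)
```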
Convolution - Einstein Summation - Inversion
Convolution
[Figure: a matrix A convolved with a kernel K, written A ﹡ K]
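As a reminder of what the discrete convolution A ﹡ K computes, here is a short PyTorch sketch (shapes and names are illustrative; conv2d follows the usual ML convention of not flipping the kernel):

```python
import torch
import torch.nn.functional as F

A = torch.randn(1, 1, 5, 5)   # (batch, channels, height, width)
K = torch.randn(1, 1, 3, 3)   # (out_channels, in_channels, kh, kw)

out = F.conv2d(A, K)          # out[i, j] = sum over (di, dj) of A[i+di, j+dj] * K[di, dj]
print(out.shape)              # torch.Size([1, 1, 3, 3])
```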
Theorem
Convolution is not expressible in MATLANG
Proof idea
MATLANG satisfies a very strong genericity property over keys (indexes)
Theorem
Convolution is expressible in LARA
Proof idea
LARA allows comparing keys, and we can take advantage of that (see the sketch below)
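To give a feel for this, here is a toy Python sketch (not LARA syntax) that computes a convolution by doing arithmetic directly on the keys of associative tables; all names are illustrative:

```python
def conv(A, K, height, width):
    # A, K: associative tables as dicts {(i, j): value}
    # out[(i, j)] = sum over (di, dj) of A[(i + di, j + dj)] * K[(di, dj)]
    # The shifted lookups are exactly the kind of key manipulation LARA permits.
    out = {}
    for i in range(height):
        for j in range(width):
            out[(i, j)] = sum(A.get((i + di, j + dj), 0) * k
                              for (di, dj), k in K.items())
    return out

A = {(i, j): 1.0 for i in range(5) for j in range(5)}
K = {(0, 0): 1.0, (0, 1): -1.0}
print(conv(A, K, 5, 5)[(2, 2)])   # 0.0
```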
Einstein Summation
Einstein summation notation
In an indexed expression, repeated indexes imply an implicit summation over those indexes (non-repeated indexes remain free)
T_ik = A_ij B_jk   ≡   T_ik = ∑_j A_ij B_jk
⟨u, v⟩ = u_i v_i   ≡   ⟨u, v⟩ = ∑_i u_i v_i
T_ils = A_ijk B_kjls   ≡   T_ils = ∑_j ∑_k A_ijk B_kjls
einsum((ijk, kjls → ils), A, B)
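The same examples written with numpy.einsum, which makes the implicit summations explicit (array shapes are illustrative):

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
T = np.einsum("ij,jk->ik", A, B)          # T_ik = sum_j A_ij B_jk (matrix product)
assert np.allclose(T, A @ B)

u, v = np.random.rand(5), np.random.rand(5)
assert np.isclose(np.einsum("i,i->", u, v), u @ v)   # <u, v> = sum_i u_i v_i

A3 = np.random.rand(2, 3, 4)
B4 = np.random.rand(4, 3, 5, 6)
T2 = np.einsum("ijk,kjls->ils", A3, B4)   # T_ils = sum_j sum_k A_ijk B_kjls
print(T2.shape)                           # (2, 5, 6)
```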
LARA can easily express a named version of general Einstein Summation
Theorem
MATLANG can also express Einstein Summation when the input and output are matrices (or vectors)
Matrix Inversion
Theorem [BGVW’17]
Matrix inversion is not expressible in MATLANG
Proof idea
Inversion can express non-local properties (such as reachability), which are beyond MATLANG; thus, MATLANG cannot express inversion
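One way to see why inversion is a "non-local" operation (a sketch of the intuition, not necessarily the argument from [BGVW'17]): from the inverse of I - cA, for a small constant c, one can read off reachability in the graph with adjacency matrix A, a property that depends on entries arbitrarily far away:

```python
import numpy as np

# Directed path 0 -> 1 -> 2 -> 3, plus an isolated vertex 4
n = 5
A = np.zeros((n, n))
for i in range(3):
    A[i, i + 1] = 1.0

c = 1.0 / (n + 1)                      # small enough for the Neumann series sum_k (cA)^k
M = np.linalg.inv(np.eye(n) - c * A)   # M[i, j] > 0 iff j is reachable from i
print(M[0, 3] > 1e-12)                 # True: 3 is reachable from 0
print(M[0, 4] > 1e-12)                 # False: 4 is not reachable from 0
```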
Can we use the same approach to prove that matrix inversion is not expressible in LARA?
Theorem
LARA can express non-local queries
So the same approach does not work for LARA
Theorem
LARA can be evaluated in LogSpace
Observations
Matrix inversion is complete for DET
It is believed that LogSpace ⊊ DET
Hence, if LARA could express inversion, then LogSpace = DET
Concluding remarks
Tensors seem to need an abstract manipulation language
We study the expressiveness of two languages, but they were not designed with ML operators in mind
Studying these languages is not trivial, and interesting theoretical/practical problems arise