Jan Motl (jan@motl.us)

Need

Before we can mine a relational database that contains several tables, we first have to transform those tables into a single table that conventional classifiers (k-NN, SVM, ...) can process.

This preprocessing is a manual, tedious, boring and time-consuming task.

But precisely because it’s tedious and repetitive, it can also be automated!

Approach

  • Automatic “propositionalization” of the database with a set of ~30 proven patterns.
  • The recipe for transforming several tables into a single table will be delivered as SQL, which makes the solution easy to deploy.
  • The SQL itself will be generated by a program written in a language such as C# or Java.
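
To make the idea concrete, here is a minimal sketch of what such a generator might look like in Java. Everything in it is an assumption for illustration: the class and method names, the table and column names, and the five aggregate functions, which only stand in for the ~30 patterns (those are not listed here). It flattens one child table onto the target table by aggregating each numeric column per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; not the actual tool.
class PropositionalizationSketch {

    // Hypothetical stand-in for the pattern set: plain SQL aggregates.
    static final String[] AGGREGATES = {"COUNT", "MIN", "MAX", "AVG", "SUM"};

    // Builds one SELECT that returns a single row per key of the target table,
    // with one feature column per (numeric column, aggregate) pair.
    // Identifiers are concatenated unquoted, so they must be trusted names.
    static String generateSql(String targetTable, String key,
                              String childTable, List<String> numericColumns) {
        List<String> features = new ArrayList<>();
        for (String column : numericColumns) {
            for (String agg : AGGREGATES) {
                features.add(String.format("%s(c.%s) AS %s_%s_%s",
                        agg, column, childTable, column,
                        agg.toLowerCase(Locale.ROOT)));
            }
        }
        return "SELECT t." + key + ", " + String.join(", ", features)
                + " FROM " + targetTable + " t"
                + " LEFT JOIN " + childTable + " c ON c." + key + " = t." + key
                + " GROUP BY t." + key;
    }

    public static void main(String[] args) {
        // Hypothetical schema: customer(customer_id), orders(customer_id, amount).
        System.out.println(
                generateSql("customer", "customer_id", "orders", List.of("amount")));
    }
}
```

The LEFT JOIN keeps target rows that have no matching child rows, so every example still appears exactly once in the resulting single table. A real generator would also need to read the schema and foreign keys from the database instead of taking them as arguments.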

Benefits

  • Fast, painless, correct, accurate & documented data preprocessing
    • Fast: what now takes 1-4 weeks will take 2 days
    • Painless: automation of boring and tedious work
    • Correct: machines are not prone to fatigue
    • Accurate: the search space is explored much more exhaustively
    • Documented: documentation is the Achilles' heel of all ad-hoc work