Topics of the Seminar “Big Data Analysis: Frameworks and Algorithms”
Artur Andrzejak, Lutz Büch
Summer semester 2013, Institute of Computer Science, Heidelberg University
Link to the seminar page: http://pvs.ifi.uni-heidelberg.de/teaching/summer-2013/s-big-data-analysis/
Participants
First name | Last name | Topic | Block | Date | Preliminary meeting | Guest attendance | Written report |
Felix | Eichler | B4 | 1 | 26.05.13 | 17.5.2013, 10:30 | | |
Sebastian | Butterweck | B5 | 1 | 26.05.13 | 16.5.2013, 14:00 | | |
Max | Löhlein | B6 | 1 | 26.05.13 | 21.5.2013, 14:00 | | |
Julia | Kreutzer | A2 | 1 | 26.05.13 | 22.5.2013, 14:00 | | |
Matthias | Hauck | A3 | 2 | 23.06.13 | 13.06.2013, 11:00 | | |
Daniel | Barthel | A4 | 2 | 23.06.13 | 12.06.2013, 11:00 | | |
Roman | Hable | C1 | 2 | 23.06.13 | 13.06.2013, 14:00 | | |
Dominik | Riedinger | C2 | 2 | 23.06.13 | | | |
A. Query processing on MapReduce
- Apache Pig
  - Project page (link)
  - Pig Latin: A Not-So-Foreign Language for Data Processing (link)
  - Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience (link)
- Apache Hive
  - Project page (link)
  - Hive - A Petabyte Scale Data Warehouse Using Hadoop (link)
  - Hive - A Warehousing Solution Over a MapReduce Framework (link)
- Shark - Hive on Spark
  - Project page (link)
  - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (link)
- DryadLINQ
  - Distributed Data-Parallel Computing Using a High-Level Programming Language (link)
  - Some sample programs written in DryadLINQ (link)
- Jaql
  - Jaql: A Scripting Language for Large Scale Semistructured Data Analysis (link)
  - Comparing High Level MapReduce Query Languages (link)
B. MapReduce paradigm
- Programming model
  - Google's MapReduce Programming Model Revisited (by R. Lämmel) (link)
- MapReduce algorithm design
  - Chapter 3 of the book “Data-Intensive Text Processing with MapReduce” (by Jimmy Lin and Chris Dyer) (link)
- Generalized model: Parallelization Contracts (PACT)
  - MapReduce and PACT - Comparing Data Parallel Programming Models (link)
  - Massively Parallel Data Analysis with PACTs on Nephele (link)
- Spark framework
  - Project page (link)
  - Spark: Cluster Computing with Working Sets (link)
- Twister framework
  - Project page (link)
  - Twister: A Runtime for Iterative MapReduce (link)
- Apache Mahout machine learning library for Hadoop
  - Project page (link)
  - Introducing Mahout (link)
  - Implementation of algorithms, e.g. k-means clustering (link), Random Forest classification (link)
C. Declarative Systems for Machine Learning
- ScalOps / Hyracks
  - Declarative Systems for Large-Scale Machine Learning (link)
  - Machine learning in ScalOps, a higher order cloud computing language (link)
- SystemML
  - SystemML: Declarative Machine Learning on MapReduce (link)
  - Presentation: SystemML: Declarative Machine Learning on MapReduce (link)
- MLbase
  - Project page (link)
  - MLbase: A Distributed Machine-learning System (link)