Master in Information and Computer Sciences

Big Data Analytics
Module: Module 2.5, Semester 2
Objective: The lecture provides an entry point to large-scale data management and distributed computing principles in recent NoSQL architectures. We start with an overview of distributed file systems and MapReduce in Apache Hadoop and then move on to more advanced analytical tasks based on the machine-learning libraries in Apache Spark. The lecture serves as an ideal basis for further topics in this area (such as Master seminars, projects and theses).
Course learning outcomes:
- Students become familiar with the usage of recent Big Data platforms such as Apache Hadoop and Spark
- Students obtain an overview of both the theoretical foundations and practical applications of various Big Data and Machine Learning algorithms
- Students learn how to approach and solve different data-analysis tasks through a number of programming exercises with real-world datasets

Description: The course combines theory-oriented lectures with practical exercises, in which students are guided through a series of real-world use cases and hands-on examples. Specifically, we focus on the following topics:

-    Distributed File Systems (DFS) and MapReduce in Apache Hadoop
-    Resilient Distributed Dataset (RDD) objects and DataFrames in Apache Spark
-    Implementation of complex DataFlow programs in Spark using Scala
-    Performing advanced analytical tasks in Spark's MLlib:
  o    Distributed clustering and classification of objects
  o    Decision trees and random forests
  o    Recommender systems via matrix factorization
  o    Text analysis via latent semantic indexing
  o    Geospatial data analysis
  o    Social-network analysis

Organization: The course offers both theory lectures and practical exercise sessions. The lectures serve as the theoretical basis for the algorithmic concepts, which we then apply during the practical sessions. The solutions to the exercises are developed and demonstrated interactively with the tutors.
Language: English
Lecturer: THEOBALD Martin
Rating: Practical exercises: 50%
Final written exam: 50%