PhD Course on Data Analytics

The aim of the course is to introduce students to two important elements of data engineering technology that make it possible to extract valuable knowledge from “Big Data”: distributed data processing using the MapReduce framework, and data analytics on large graphs. In each case, the course will first introduce the fundamental theoretical and architectural concepts, and then present the technology for using MapReduce and for querying large graphs, respectively. For MapReduce, we present examples of algorithms that can be successfully parallelised and can therefore take advantage of distributed data architectures, and we will suggest practical exercises using the Spark and Hadoop technology stack. For graph analytics, we focus on community detection algorithms as an example, and we will use the Neo4j graph DBMS along with the Graph Data Science library for practical exercises.

The course will be held online, in the ICT class on Microsoft Teams, by Professor Paolo Missier and Luca Gagliardelli, Ph.D.

Here is the detailed schedule: