Related Projects

Vehicle Data Collection and Analysis

This project covered the design and prototype implementation of a Big Data pipeline for vehicle data. The incoming geo-coordinates were clustered in a Spark batch job, and cluster assignment was handled in a separate Spark streaming job. The pre-processed data were then analyzed with machine learning methods.

Spark

Scala
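
The split between a batch clustering step and a per-record assignment step in the vehicle data pipeline can be illustrated with a minimal sketch. It is not the project's code: it uses plain Python rather than Spark so that it runs without a cluster, and it assumes k-means as the clustering method, which the description does not specify. The function names, the haversine distance, and the sample coordinates are likewise hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def assign_cluster(point, centroids):
    """Per-record step: index of the centroid nearest to one coordinate."""
    return min(range(len(centroids)),
               key=lambda i: haversine_km(point, centroids[i]))


def fit_centroids(points, k, iterations=20, seed=0):
    """Batch step: k-means (assumed method) over all collected coordinates.

    Centroids are plain averages of latitude and longitude, which is only
    an approximation and is reasonable for regions far from the poles and
    the antimeridian.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign_cluster(p, centroids)].append(p)
        # Keep the old centroid if a cluster received no points.
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


if __name__ == "__main__":
    # Hypothetical sample data: a few points near two different cities.
    history = [(52.50, 13.40), (52.52, 13.42), (52.54, 13.38),
               (48.12, 11.56), (48.14, 11.58), (48.16, 11.60)]
    centroids = fit_centroids(history, k=2)
    # A newly arriving coordinate is assigned to an existing cluster.
    print(assign_cluster((52.51, 13.41), centroids))
```

In a Spark setting, the role of fit_centroids would fall to the batch job and the role of assign_cluster to the streaming job, with the centroids being the state shared between the two. The cluster indices themselves are arbitrary labels.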

Setup of a Big Data Development Toolchain

The toolchain reduces the effort of testing Big Data applications. It simulates all components of the production environment in Docker containers, so that Spark jobs, for example, can be tested locally. Starting the environment requires running only a single Bash script.