This project covered the design of a Big Data pipeline and a prototype implementation of it. The incoming vehicle data (geo-coordinates) were clustered in a Spark batch job, and cluster assignment was handled in a separate Spark streaming job. The pre-processed data were then analyzed using machine learning.
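
To make the split between the two jobs concrete, the following is a minimal sketch of the cluster-assignment step in plain Python, without Spark. It assumes the batch job produces a list of cluster centroids and that each incoming coordinate is assigned to the nearest centroid by great-circle (haversine) distance; the project description states neither the clustering algorithm nor the distance measure, so both are assumptions. The names `haversine_km` and `assign_cluster` and all sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def assign_cluster(point, centroids):
    """Return the index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: haversine_km(point, centroids[i]))


# Hypothetical centroids, standing in for the output of the batch clustering step.
centroids = [(52.52, 13.405), (48.137, 11.575)]

# Hypothetical vehicle positions, standing in for records arriving on the stream.
incoming = [(52.50, 13.40), (48.20, 11.60)]

assignments = [assign_cluster(p, centroids) for p in incoming]
print(assignments)
```

In a streaming setting, a function like `assign_cluster` would be applied to each arriving record, with the centroids loaded once from the batch output and refreshed whenever the batch job is re-run.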