Apache Spark

Intermediate

Apache Spark is a large-scale data analytics engine for demanding data engineering, data science and machine learning workloads. It provides a set of abstractions for operating on datasets distributed across the nodes of a cluster.

I have used Apache Spark through the PySpark Python API to process tabular data from 11 million customers of the electricity distribution branch of Endesa, part of the Enel Group.
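Below is a minimal, illustrative PySpark sketch of this kind of DataFrame workflow; the paths, column names and aggregation are hypothetical placeholders, not the actual pipeline.

```python
# Illustrative sketch only: paths, schema and aggregation are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-consumption").getOrCreate()

# Read a partitioned tabular dataset; Spark distributes it across the cluster.
readings = spark.read.parquet("path/to/meter_readings")

# Transformations are lazy: they build a query plan, nothing executes yet.
monthly = (
    readings
    .withColumn("month", F.date_trunc("month", F.col("reading_ts")))
    .groupBy("customer_id", "month")
    .agg(F.sum("kwh").alias("total_kwh"))
)

# The write action triggers distributed execution on the cluster's executors.
monthly.write.mode("overwrite").parquet("path/to/monthly_consumption")

spark.stop()
```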