Final Evaluation

Guillaume Eynard-Bontemps, CNES (Centre National d’Etudes Spatiales - French Space Agency)

2020-11-18

Context

Prerequisite: a Dask Hub platform has already been deployed.

Dataset: a simplified version of the New York taxi cab statistics.

What has to be done

  • What’s wrong with pandas?
  • Clean large amounts of data using Dask in the cloud (interactive distributed processing)
  • Train a machine learning model on a large input dataset
  • Train machine learning models in parallel (hyperparameter search) on a smaller input dataset
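
To make the last item concrete, here is a minimal sketch of a parallel hyperparameter search. It is not the evaluation's solution. It assumes a tiny synthetic dataset instead of the taxi data, a hand-written linear model, and Python's standard concurrent.futures thread pool as a stand-in for the cluster. The names train_and_score and grid_search are hypothetical. On the Dask Hub, a dask.distributed Client exposes a similar futures-style interface (for instance Client.map), so the same pattern should carry over with the pool swapped for a client.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy dataset (NOT the taxi data): y = 3*x + 2, no noise.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [3.0 * x + 2.0 for x in XS]


def train_and_score(params):
    """Fit y = w*x + b by gradient descent and return (final MSE, params)."""
    learning_rate, n_steps = params
    w, b = 0.0, 0.0
    n = len(XS)
    for _ in range(n_steps):
        errors = [w * x + b - y for x, y in zip(XS, YS)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, XS)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    mse = sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / n
    return mse, params


def grid_search(learning_rates, step_counts, max_workers=4):
    """Train one model per hyperparameter combination, in parallel."""
    grid = list(product(learning_rates, step_counts))
    # Each combination is independent, so the trainings can run concurrently.
    # Assumption: a thread pool stands in for the distributed workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_and_score, grid))
    best = min(results, key=lambda result: result[0])
    return best, results


best, all_results = grid_search([0.001, 0.01, 0.05], [50, 500])
print("best (mse, (learning_rate, n_steps)):", best)
```

Each training run depends only on its own hyperparameters, which is why this step parallelizes well on a smaller dataset: the data fits on every worker and only the parameter grid is distributed.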