Deploy Data processing platform on the Cloud
Guillaume Eynard-Bontemps, CNES (Centre National
d’Etudes Spatiales - French Space Agency)
2020-11-16
Context
Dask
- #1 Distributed processing tool in Python
- Distributed Dataframe and Arrays
- We’ll see more of it tomorrow
Pangeo
- Community of (geo)scientists and developers
- Tackling big data problems: analysing simulations or sensor
data
- Providing recipes to build a Pangeo platform:
- Jupyterhub
- Dask
- Xarray
- on Kubernetes or HPC
Exercise
We will focus on getting a data processing platform on the Cloud:
- Dask cluster running in Kubernetes,
- With Pangeo libraries available,
- Which we will then use in the final evaluation.
Documentation
notebook
This is the first part of you evaluation!