Guillaume Eynard-Bontemps, CNES (Centre National d’Etudes Spatiales - French Space Agency)
2020-11-17
Matlab (and equivalent Scilab)
C/C++, Java
What is the most used language (in Data Science)?
Nearly every scientist working in Python draws on the power of NumPy.
# The standard way to import NumPy:
import numpy as np
# Create a 2-D array, set every second element in
# some rows and find max per row:
x = np.arange(15, dtype=np.int64).reshape(3, 5)
x[1:, ::2] = -99
x
# array([[  0,   1,   2,   3,   4],
#        [-99,   6, -99,   8, -99],
#        [-99,  11, -99,  13, -99]])
x.max(axis=1)
# array([ 4,  8, 13])
# Generate normally distributed random numbers:
rng = np.random.default_rng()
samples = rng.normal(size=2500)
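As a sketch of what can be done with these samples using only vectorized NumPy operations (no explicit Python loop), the snippet below summarizes them. It assumes a seeded generator for reproducibility (the seed value 42 is arbitrary) and an arbitrary histogram layout of 10 bins over [-4, 4]. For a standard normal, the mean should be close to 0, the standard deviation close to 1, and roughly 68% of values should fall within one standard deviation.

```python
import numpy as np

# Seeded generator so the results are reproducible (the seed value is arbitrary).
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=2500)

# Vectorized summary statistics.
mean = samples.mean()
std = samples.std()

# Boolean mask averaged to a fraction: share of samples with |value| < 1.
within_one_sigma = np.mean(np.abs(samples) < 1.0)

# Histogram counts over 10 equal-width bins spanning [-4, 4];
# values outside this range are ignored by np.histogram.
counts, edges = np.histogram(samples, bins=10, range=(-4.0, 4.0))

print(f"mean={mean:.3f} std={std:.3f} within 1 sigma={within_one_sigma:.1%}")
print(counts)
```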
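import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from matplotlib.ticker import LinearLocator
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Make data: Z = sin(sqrt(X**2 + Y**2)) on a regular grid.
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
# Plot the surface.
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
# Customize the z axis.
ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
# A StrMethodFormatter is used automatically.
ax.zaxis.set_major_formatter('{x:.02f}')
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()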
Which tools allow you to manipulate tabular data?
What does Dask do better than Spark? (multiple choices)
Matplotlib is the only visualization library for Python.
Which is the best Deep Learning library in Python?
| | conda | pip |
|---|---|---|
| manages | binaries | wheel or source |
| can require compilers | no | yes |
| package types | any | Python-only |
| create environment | yes, built-in | no, requires virtualenv or venv |
| dependency checks | yes | no |
Difference between Conda and Pip according to Anaconda.
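The "create environment" row is the practical difference most users hit first: conda has the built-in `conda create`, while pip relies on a separate tool such as the standard-library `venv` module. The sketch below creates a throwaway environment from Python with `venv.create`, assuming a temporary directory and a hypothetical environment name `demo-env`. It passes `with_pip=False` only to keep the example fast and offline; in everyday use you would run `python -m venv` from a shell and then `pip install` inside the activated environment.

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway environment in a temporary directory
# ("demo-env" is a hypothetical name used for illustration).
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    # with_pip=False skips bootstrapping pip, so no network access is needed.
    venv.create(env_dir, with_pip=False)
    # Every venv contains a pyvenv.cfg file pointing at its base interpreter.
    cfg = (env_dir / "pyvenv.cfg").read_text()

print(cfg)
```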
Numba makes Python code fast
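A minimal sketch of how Numba is typically used: the `@njit` decorator compiles a numeric Python function to machine code the first time it is called, which mostly pays off for explicit loops over NumPy arrays. The function name `sum_of_squares` is hypothetical, and the `try`/`except` fallback is an assumption added so the snippet still runs (without any speedup) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, use an identity decorator (plain Python, no speedup).
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # Explicit loop: slow in pure Python, compiled to machine code by Numba.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


values = np.arange(1000, dtype=np.float64)
result = sum_of_squares(values)  # the first call triggers compilation
print(result)  # → 332833500.0
```

For a reduction this simple, `np.dot(values, values)` is the idiomatic NumPy answer; Numba is most useful when the loop body does not map cleanly onto existing vectorized functions.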
Turn a Git repo into a collection of interactive notebooks
Follow this first tutorial at least up to chapter 6, using the Binder button!
If you have time, also work through the part “The predictive modeling pipeline” (notebooks 01 to 03), again with Binder.