How to Analyse Data with Python


Download Anaconda. It comes with Jupyter, pandas, dask, matplotlib, and most of the other things you want out of the box, so there is no fiddling with separate installations, downloading packages, and trying to figure out how to get them to work together.

Launch a Jupyter notebook and follow along with some of the conference workshops that are on YouTube.

Pandas lets you compute various statistics and plot the results through a plotting library such as matplotlib. Dask lets you do the same things as pandas, but you can feed it a list of data sources and it will load just enough from each to let you plan your calculation without loading everything into memory.
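To give a feel for the pandas side, here is a minimal sketch of computing a few statistics and drawing a plot. The column names and numbers are made up purely for illustration, and it assumes pandas and matplotlib are installed (they are if you went the Anaconda route).

```python
import pandas as pd

# Made-up sample data, purely for illustration
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "temperature": [20.0, 22.0, 21.0, 23.0, 24.0],
})

# Summary statistics: count, mean, std, min, quartiles, max
summary = df["temperature"].describe()
mean_temperature = df["temperature"].mean()

# pandas hands the actual drawing to matplotlib and returns its Axes object
ax = df.plot(x="day", y="temperature", title="Temperature by day")
```

In a Jupyter notebook the plot shows up right under the cell, and putting `summary` on the last line of a cell prints the statistics as a small table.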

Practice with pandas on one dataset or a subset. Work out what you need to do, then scale it up later with dask. You can stick to pure pandas if your data is small enough to fit into memory all at once, or if you don’t mind using only one CPU core for a while.