Pandas in a Nutshell

February 28, 2020

Hi,

Hope you enjoyed my previous article and introduction to NumPy.

The purpose of today’s blog post is to follow the same format and go through some pandas topics. Pandas is an open-source library built on top of NumPy. It provides easy-to-use data structures for data analysis, time series and statistics. Think of pandas as Python’s version of Excel or R. Let’s get started.

A Series is similar to a NumPy array, but a pandas Series can be indexed by label. Another main difference between the two is that a pandas Series can hold objects of different types.
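
To make both differences concrete, here is a minimal sketch. The values and the labels x, y and z are my own placeholders, not taken from the original example.

```python
import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

print(arr[1])     # NumPy array: access by position
print(s["y"])     # pandas Series: access by label
print(s.iloc[1])  # positional access is still available via .iloc

# A Series can mix types; pandas then stores the values with dtype object
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```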

You can use a Python list, a NumPy array or even a dictionary to create a Series. The example below creates a Series from a NumPy array.
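
A short sketch of the three ways to build a Series, assuming some made-up labels and values (a, b, c and 10, 20, 30 are placeholders of mine):

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]
values = np.array([10, 20, 30])

# From a NumPy array, passing the labels as the index
s_arr = pd.Series(values, index=labels)

# From a plain Python list; the index defaults to 0, 1, 2
s_list = pd.Series([10, 20, 30])

# From a dictionary: keys become the index, values become the data
s_dict = pd.Series({"a": 10, "b": 20, "c": 30})

print(s_arr)
print(s_list)
print(s_dict)
```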

 

The index is the key feature of a Series: it allows quick lookup of the data by label. As an example, I will use the top headline at the moment, the coronavirus. In this case, each label is a country, with a data point against it.
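
Here is roughly what such a country-labelled Series could look like. The country names and the counts are placeholders for illustration only, not real figures.

```python
import pandas as pd

# Placeholder labels and counts, purely illustrative
cases = pd.Series([100, 50, 25], index=["Country A", "Country B", "Country C"])

print(cases["Country B"])                 # lookup by label
print(cases[["Country A", "Country C"]])  # several labels at once
print("Country D" in cases)               # membership is checked against the index

# When two Series are combined, values are aligned on the labels
more = pd.Series([5, 10], index=["Country B", "Country D"])
total = cases.add(more, fill_value=0)
print(total)
```

The last step is where the index pays off: pandas matches the values by label, not by position.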

 

Moving on to the next topic: DataFrames. Think of a DataFrame as a bunch of Series sharing the same index. In the example below, df uses the default labels 0, 1 and 2, with some values for columns X, Y and Z. df_rand has been created from random values and has both row labels and column headers named explicitly. Each of the columns X, Y and Z is itself a Series. You can also read a DataFrame from a CSV file.
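
A sketch of how df and df_rand could be built. The cell values, the row labels A, B, C and the inline CSV text are assumptions of mine; the CSV is read from an in-memory string so the snippet runs without a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Default row labels 0, 1, 2 with columns X, Y and Z
df = pd.DataFrame({"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]})

# Random values with explicit row labels and column headers
rng = np.random.default_rng(0)
df_rand = pd.DataFrame(
    rng.standard_normal((3, 3)),
    index=["A", "B", "C"],
    columns=["X", "Y", "Z"],
)

# A single column is a Series
print(type(df["X"]))

# Reading CSV data; with a real file you would pass its path instead
csv_text = "X,Y,Z\n1,2,3\n4,5,6\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```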

 

Now let’s select a single row, which is a Series too. If more than one row is selected, the result is a DataFrame 🙂
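
A minimal sketch of row selection, assuming a small DataFrame with row labels A, B and C (my own placeholder data):

```python
import pandas as pd

df = pd.DataFrame(
    {"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]},
    index=["A", "B", "C"],
)

row = df.loc["A"]          # one row by label: a Series
same_row = df.iloc[0]      # one row by position: also a Series
rows = df.loc[["A", "C"]]  # several rows: a DataFrame
cell = df.loc["B", "Y"]    # a single value by row and column label

print(type(row))
print(type(rows))
print(cell)
```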

 

As with NumPy, pandas supports conditional selection.
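
Conditional selection works with boolean masks, much like in NumPy. A short sketch on placeholder data of mine:

```python
import pandas as pd

df = pd.DataFrame(
    {"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]},
    index=["A", "B", "C"],
)

# Keep the rows where column X is greater than 1
filtered = df[df["X"] > 1]

# Combine conditions with & (and) or | (or); the parentheses are required
both = df[(df["X"] > 1) & (df["Z"] < 9)]

# A condition on the whole DataFrame keeps the shape and puts NaN where it fails
masked = df[df > 4]

print(filtered)
print(both)
print(masked)
```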

 

Finally, let’s demonstrate how to join DataFrames together or simply stack them. If you have a SQL background, this should be easy peasy.
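
The sketch below shows the usual options: pd.merge for SQL-style joins on a column, DataFrame.join for joins on the index, and pd.concat for stacking. The frames left and right and their key column are hypothetical names I picked for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k1", "k2", "k3"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["k1", "k2", "k4"], "B": [10, 20, 40]})

# SQL-style joins on a shared column
inner = pd.merge(left, right, on="key", how="inner")  # keys present in both
outer = pd.merge(left, right, on="key", how="outer")  # keys from either side

# Join on the index instead of a column
joined = left.set_index("key").join(right.set_index("key"), how="left")

# Stack rows on top of each other, renumbering the index
stacked = pd.concat([left, left], ignore_index=True)

print(inner)
print(outer)
print(joined)
print(stacked)
```

The how argument plays the role of INNER, LEFT, RIGHT or FULL OUTER JOIN in SQL.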


Enjoy! 🙂

 

Categories: Data Science