## Pandas in a Nutshell

Hi,

I hope you enjoyed my previous article, an introduction to NumPy.

The purpose of today’s blog post is to follow the same format but go through some pandas topics. Pandas is an open source library built on top of NumPy. It provides easy-to-use data structures for data analysis, time series and statistics. Think of pandas as Python’s answer to Excel spreadsheets or R data frames. Let’s get started.

A Series is similar to a NumPy array, but a pandas Series can be indexed by label. Another main difference is that a Series can hold objects of different types, in which case its dtype is object.

You can create a Series from a Python list, a NumPy array or even a dictionary.
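Here is a minimal sketch of the three ways to build a Series mentioned above. The variable names and values are my own illustration, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# From a Python list: pandas assigns the default integer index 0, 1, 2
s_list = pd.Series([10, 20, 30])

# From a NumPy array, with explicit labels passed through the index argument
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From a dictionary: the keys become the index labels
s_dict = pd.Series({"x": 100, "y": 200})

print(s_list)
print(s_array)
print(s_dict)

# A Series can hold mixed types; its dtype then falls back to object
s_mixed = pd.Series([1, "two", 3.0])
print(s_mixed.dtype)  # object
```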

The index is the key feature of a Series, allowing quick lookup of the data by label. As an example I will use the top headline at the moment, the Coronavirus. In this case, the label is the country, with data points against it.

Moving on to the next topic, which is DataFrames. Think of a DataFrame as a bunch of Series sharing the same index. In this example, df uses the default row labels of 0, 1 and 2 with some values for columns X, Y and Z, while df_rand is created from random values and has both row labels and column headers named explicitly. Each of the X, Y and Z columns is itself a Series. You can also read a DataFrame from a CSV file.
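The two DataFrames described above could look roughly like this. The names df and df_rand come from the text, while the values, the row labels A, B, C and the CSV path are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Default row labels 0, 1, 2 with columns X, Y and Z
df = pd.DataFrame({"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]})
print(df)

# Random values with explicit row labels and column headers
rng = np.random.default_rng(42)  # seeded so the run is repeatable
df_rand = pd.DataFrame(
    rng.random((3, 3)),
    index=["A", "B", "C"],
    columns=["X", "Y", "Z"],
)
print(df_rand)

# Each column is itself a Series
print(type(df["X"]))

# Reading from a CSV file; "data.csv" is a hypothetical path
# df_csv = pd.read_csv("data.csv")
```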

Now let’s select just one row, which is a Series too. If more than one row is selected, the result is a DataFrame 🙂
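A short sketch of row selection with loc (by label) and iloc (by position). The sample DataFrame is my own, assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]},
    index=["A", "B", "C"],
)

row_by_label = df.loc["A"]     # one row by label -> Series
row_by_position = df.iloc[0]   # one row by position -> Series
two_rows = df.loc[["A", "C"]]  # several rows -> DataFrame
cell = df.loc["B", "Y"]        # single value at row B, column Y

print(type(row_by_label))
print(type(two_rows))
print(cell)
```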

Similarly to NumPy, pandas supports conditional selection.
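Conditional selection works by building a boolean mask and using it to filter rows. A minimal sketch, with a sample DataFrame assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]})

# A comparison on a column returns a boolean Series
mask = df["X"] > 1
print(mask)

# Use the mask to keep only the rows where the condition is True
filtered = df[mask]
print(filtered)

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df["X"] > 1) & (df["Z"] < 9)]
print(both)
```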

Finally, let’s demonstrate how to join DataFrames together or stack them. If you have a SQL background, this should be easy peasy.
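As a rough sketch, pd.merge behaves like a SQL join on a key column and pd.concat stacks DataFrames on top of each other. The left and right tables and their column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Like SQL INNER JOIN: only keys present in both tables
inner = pd.merge(left, right, on="key", how="inner")
print(inner)

# Like SQL FULL OUTER JOIN: all keys, with NaN where a side has no match
outer = pd.merge(left, right, on="key", how="outer")
print(outer)

# Stack rows on top of each other and renumber the index from 0
stacked = pd.concat([left, left], ignore_index=True)
print(stacked)
```

The how argument also accepts "left" and "right", which map to LEFT JOIN and RIGHT JOIN.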

Enjoy! 🙂

## NumPy in a Nutshell

Hello and welcome back. I have started a new category in my blog about Python. The purpose of this post is to go through the NumPy library. I will be using Jupyter for the demo but will provide the py file if you prefer to run it in PyCharm, for example. NumPy is a core Python library for numerical computing and linear algebra, widely used in data science. Its arrays are processed faster than native Python lists and come with a bunch of handy methods. Let’s make a start!

You can cast a normal list to a one-dimensional array using the array function.

Or take a list of lists and convert it to a two-dimensional array. In this case it is effectively a matrix with 2 rows and 4 columns, which is what the shape attribute reports. The size attribute gives the total number of elements in the array.

The next section shows different ways to create NumPy arrays.

The ones and zeros functions are a handy way to create arrays of 1s and 0s. linspace is similar to arange, but instead of a step size you give the number of evenly spaced values you want. Also check out reshape and ravel().
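To make the arange versus linspace difference and the reshape and ravel round trip concrete, here is a small sketch. The variable names are my own.

```python
import numpy as np

# arange takes a step size; the stop value is excluded
stepped = np.arange(5, 10, 2)
print(stepped)  # [5 7 9]

# linspace takes the number of values; the stop value is included by default
spaced = np.linspace(5, 10, 3)
print(spaced)  # [ 5.   7.5 10. ]

# reshape turns 15 elements into 3 rows and 5 columns
grid = np.arange(15).reshape(3, 5)
print(grid.shape)  # (3, 5)

# ravel flattens the array back to one dimension
flat = grid.ravel()
print(flat.shape)  # (15,)
```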

See examples of other useful methods below.

Next, let’s have a look at selections and indexing.
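One thing worth knowing before slicing: a basic slice of a NumPy array is a view on the same data, so writing to the slice also changes the original. The sketch below, with names of my own choosing, shows the difference between a view and a copy().

```python
import numpy as np

v_array = np.arange(10)

# A basic slice is a view: it shares memory with v_array
view = v_array[0:3]
view[:] = -1  # the scalar is broadcast to every element of the slice
print(v_array)  # [-1 -1 -1  3  4  5  6  7  8  9]

# copy() gives an independent array, so the original stays untouched
independent = v_array[3:6].copy()
independent[:] = 99
print(independent)  # [99 99 99]
print(v_array)      # [-1 -1 -1  3  4  5  6  7  8  9]
```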

Great stuff. To illustrate the indexing, let’s create a new two-dimensional array.

Let’s see what other operations you can do apart from copy().

Let’s have a look at some basic operations like += or *= to change an existing array instead of creating a new one. Check out how to calculate the sum of all elements of an array or find the min or max value below.

As promised, see below the py file with all the examples.

```python
import numpy as np

# Normal list
v_even_list = [20, 40, 60]
print(v_even_list)

# Cast to 1-dimensional array
print(np.array(v_even_list))

# 2-dimensional array
v_matrix = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])
print(v_matrix)
print(v_matrix.shape)
print(v_matrix.size)

# Create Array:
# Use arange
v_array = np.arange(20)
print(v_array)

# Use array to create one-dimensional array
v_array1 = np.array([1, 3, 5])
print(v_array1)

# Use array to create n-dimensional array
v_array2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(v_array2)
v_array3 = np.array([(2.25, 3.25, 4.25), (5, 6, 7)])
print(v_array3)

# Create array of 1s. Note default type is float64
v_one = np.ones((2, 3))
print(v_one)

# Create array of 0s, specify the type needed
v_zero = np.zeros((2, 4), dtype=np.int16)
print(v_zero)

# Array of 3 numbers between 5 and 10 in equal steps
v_eq_steps = np.linspace(5, 10, 3)
print(v_eq_steps)

# 2-dimensional array with 3 rows and 5 columns; reshape modifies the shape the way you need.
# ravel() is the opposite and will flatten the array
r = np.arange(15).reshape(3, 5)
print(r)

# Array of random values, in this case a matrix with 2 rows and 3 columns
v_array = np.random.rand(2, 3)
print(v_array)

# 20 random integers from 10 (inclusive) to 100 (exclusive)
v_arr_int = np.random.randint(10, 100, 20)
print(v_arr_int)

# The index of the min value in the array
print(v_arr_int.argmin())
# The index of the max value in the array
print(v_arr_int.argmax())

# Return elements of the array where value is > 30
print(v_arr_int[v_arr_int > 30])

# Create a sample array
v_array = np.arange(20)
print(v_array)

# Slice from index 5 up to, but not including, index 10
print(v_array[5:10])
# Everything up to index 10
print(v_array[:10])
# All elements from index 10 onwards
print(v_array[10:])

# Assigning a single value to a slice broadcasts it to every element of the slice
v_array[15:20] = -5
print(v_array)
v_slice_array = v_array[15:20]
print(v_slice_array)

# The assignment changes the original array, and a slice is a view of it.
# You can use v_array.copy() to keep the original values

# Create a sample matrix
v_matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(v_matrix)

# Get the row specified by the index
print(v_matrix[0])

# Get just one value - the element from the last row and last column
print(v_matrix[1, 3])

# Return a submatrix: all rows from row 0, and columns from column 2 onwards
print(v_matrix[0:, 2:])

# Nice one!

# Create a sample with 0s
m = np.zeros((2, 4), dtype=int)
print(m)

# Modify existing m in place to add 5
m += 5
print(m)

# Modify m in place to multiply by 4
m *= 4
print(m)

print(m.sum())
print(m.min())
print(m.max())

# Sum of each column
print(m.sum(axis=0))

# Cumulative sum along each row
b = np.arange(6).reshape(2, 3)
print(b)
print(b.cumsum(axis=1))
```

That’s all for now. Stay tuned.

Cheers,

Maria