NumPy Tutorial : Step by Step Guide

NumPy (acronym for 'Numerical Python' or 'Numeric Python') is one of the most essential package for speedy mathematical computation on arrays and matrices in Python. It is also quite useful while dealing with multi-dimensional data. It is a blessing for integrating C, C++ and FORTRAN tools. It also provides numerous functions for Fourier transform (FT) and linear algebra.

Python : Numpy Tutorial

Why NumPy instead of lists?

One might think of why one should prefer arrays in NumPy instead we can create lists having the same data type. If this statement also rings a bell then the following reasons may convince you:

Numpy arrays have contiguous memory allocation. Thus if a same array stored as list will require more space as compared to arrays.
They are more speedy to work with and hence are more efficient than the lists.
They are more convenient to deal with.

NumPy vs. Pandas

Pandas is built on top of NumPy. In other words, Numpy is required by pandas to make it work. So Pandas is not an alternative to Numpy. Instead pandas offers additional method or provides more streamlined way of working with numerical and tabular data in Python.

Importing numpy

Firstly you need to import the numpy library. Importing numpy can be done by running the following command:

import numpy as np

It is a general approach to import numpy with alias as 'np'. If alias is not provided then to access the functions from numpy we shall write numpy.function. To make it easier an alias 'np' is introduced so we can write np.function. Some of the common functions of numpy are listed below -

Functions	Tasks
array	Create numpy array
ndim	Dimension of the array
shape	Size of the array (Number of rows and Columns)
size	Total number of elements in the array
dtype	Type of elements in the array, i.e., int64, character
reshape	Reshapes the array without changing the original shape
resize	Reshapes the array. Also change the original shape
arrange	Create sequence of numbers in array
Itemsize	Size in bytes of each item
diag	Create a diagonal matrix
vstack	Stacking vertically
hstack	Stacking horizontally

1D array

Using numpy an array is created by using np.array:

a = np.array([15,25,14,78,96])
a
print(a)

a
Output: array([15, 25, 14, 78, 96])

print(a)
Output: [15 25 14 78 96]

Notice that in np.array square brackets are present. Absence of square bracket introduces an error. To print the array we can use print(a).

Changing the datatype

np.array( ) has an additional parameter of dtype through which one can define whether the elements are integers or floating points or complex numbers.

a.dtype
a = np.array([15,25,14,78,96],dtype = "float")
a
a.dtype

Initially datatype of 'a' was 'int32' which on modifying becomes 'float64'.

Creating the sequence of numbers

If you want to create a sequence of numbers then using np.arange we can get our sequence. To get the sequence of numbers from 20 to 29 we run the following command.

b = np.arange(start = 20,stop = 30)
b

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In np.arange the end point is always excluded.

Create an Arithmetic Progression

np.arange provides an option of step which defines the difference between 2 consecutive numbers. If step is not provided then it takes the value 1 by default.

Suppose we want to create an arithmetic progression with initial term 20 and common difference 2, upto 30; 30 being excluded.

c = np.arange(20,30,2) #30 is excluded.
c

array([20, 22, 24, 26, 28])

It is to be taken care that in np.arange( ) the stop argument is always excluded.

Reshaping the arrays

To reshape the array we can use reshape( ).

f = np.arange(101,113)
f.reshape(3,4)
f

 array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112])

Note that reshape() does not alter the shape of the original array. Thus to modify the original array we can use resize( )

f.resize(3,4)
f

array([[101, 102, 103, 104],
       [105, 106, 107, 108],
       [109, 110, 111, 112]])

If a dimension is given as -1 in a reshaping, the other dimensions are automatically calculated provided that the given dimension is a multiple of total number of elements in the array.

f.reshape(3,-1)

array([[101, 102, 103, 104],
       [105, 106, 107, 108],
       [109, 110, 111, 112]])

In the above code we only directed that we will have 3 rows. Python automatically calculates the number of elements in other dimension i.e. 4 columns.

2D arrays

A 2D array in numpy can be created in the following manner:

g = np.array([(10,20,30),(40,50,60)])
#Alternatively
g = np.array([[10,20,30],[40,50,60]])
g

The dimension, total number of elements and shape can be ascertained by ndim, size and shape respectively:

g.ndim
g.size
g.shape

g.ndim
Output: 2

g.size
Output: 6

g.shape
Output: (2, 3)

Creating some usual matrices.

numpy provides the utility to create some usual matrices which are commonly used for linear algebra.

To create a matrix of all zeros of 2 rows and 4 columns we can use np.zeros( ):

np.zeros( (2,4) )

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

Here the dtype can also be specified. For a zero matrix the default dtype is 'float'. To change it to integer we write 'dtype = np.int16'

np.zeros([2,4],dtype=np.int16)

array([[0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int16)

To get a matrix of all random numbers from 0 to 1 we write np.empty.

np.empty( (2,3) )

array([[  2.16443571e-312,   2.20687562e-312,   2.24931554e-312],
       [  2.29175545e-312,   2.33419537e-312,   2.37663529e-312]])

Note: The results may vary everytime you run np.empty.
To create a matrix of unity we write np.ones( ). We can create a 3 * 3 matrix of all ones by:

np.ones([3,3])

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

To create a diagonal matrix we can write np.diag( ). To create a diagonal matrix where the diagonal elements are 14,15,16 and 17 we write:

np.diag([14,15,16,17])

array([[14,  0,  0,  0],
       [ 0, 15,  0,  0],
       [ 0,  0, 16,  0],
       [ 0,  0,  0, 17]])

To create an identity matrix we can use np.eye( ) .

np.eye(5,dtype = "int")

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])

By default the datatype in np.eye( ) is 'float' thus we write dtype = "int" to convert it to integers.

Reshaping 2D arrays:

To get a flattened 1D array we can use ravel( )

g = np.array([(10,20,30),(40,50,60)])
g.ravel()

 array([10, 20, 30, 40, 50, 60])

To change the shape of 2D array we can use reshape. Writing -1 will calculate the other dimension automatically and does not modify the original array.

g.reshape(3,-1) # returns the array with a modified shape
#It does not modify the original array
g.shape

 (2, 3)

Similar to 1D arrays, using resize( ) will modify the shape in the original array.

g.resize((3,2))
g #resize modifies the original array

array([[10, 20],
       [30, 40],
       [50, 60]])

Time for some matrix algebra.

Let us create some arrays A,b and B and they will be used for this section:

A = np.array([[2,0,1],[4,3,8],[7,6,9]])
b = np.array([1,101,14])
B = np.array([[10,20,30],[40,50,60],[70,80,90]])

In order to get the transpose, trace and inverse we use A.transpose( ) , np.trace( ) and np.linalg.inv( ) respectively.

A.T #transpose
A.transpose() #transpose
np.trace(A) # trace
np.linalg.inv(A) #Inverse

A.transpose()  #transpose
Output: 
array([[2, 4, 7],
       [0, 3, 6],
       [1, 8, 9]])

np.trace(A)  # trace
Output: 14

np.linalg.inv(A)  #Inverse
Output: 
array([[ 0.53846154, -0.15384615,  0.07692308],
       [-0.51282051, -0.28205128,  0.30769231],
       [-0.07692308,  0.30769231, -0.15384615]])

Note that transpose does not modify the original array.

Matrix addition and subtraction can be done in the usual way:

A+B
A-B

A+B
Output: 
array([[12, 20, 31],
       [44, 53, 68],
       [77, 86, 99]])

A-B
Output: 
array([[ -8, -20, -29],
       [-36, -47, -52],
       [-63, -74, -81]])

Matrix multiplication of A and B can be accomplished by A.dot(B). Where A will be the 1st matrix on the left hand side and B will be the second matrix on the right side.

A.dot(B)

array([[  90,  120,  150],
       [ 720,  870, 1020],
       [ 940, 1160, 1380]])

To solve the system of linear equations: Ax = b we use np.linalg.solve( )

np.linalg.solve(A,b)

array([-13.92307692, -24.69230769,  28.84615385])

The eigen values and eigen vectors can be calculated using np.linalg.eig( )

np.linalg.eig(A)

(array([ 14.0874236 ,   1.62072127,  -1.70814487]),
 array([[-0.06599631, -0.78226966, -0.14996331],
        [-0.59939873,  0.54774477, -0.81748379],
        [-0.7977253 ,  0.29669824,  0.55608566]]))

The first row are the various eigen values and the second matrix denotes the matrix of eigen vectors where each column is the eigen vector to the corresponding eigen value.

Some Mathematics functions

We can have various trigonometric functions like sin, cosine etc. using numpy:

B = np.array([[0,-20,36],[40,50,1]])
np.sin(B)

array([[ 0.        , -0.91294525, -0.99177885],
       [ 0.74511316, -0.26237485,  0.84147098]])

The resultant is the matrix of all sin( ) elements.
In order to get the exponents we use **

B**2

array([[   0,  400, 1296],
       [1600, 2500,    1]], dtype=int32)

We get the matrix of the square of all elements of B.
In order to obtain if a condition is satisfied by the elements of a matrix we need to write the criteria. For instance, to check if the elements of B are more than 25 we write:

B>25

array([[False, False,  True],
       [ True,  True, False]], dtype=bool)

We get a matrix of Booleans where True indicates that the corresponding element is greater than 25 and False indicates that the condition is not satisfied.
In a similar manner np.absolute, np.sqrt and np.exp return the matrices of absolute numbers, square roots and exponentials respectively.

np.absolute(B)
np.sqrt(B)
np.exp(B)

Now we consider a matrix A of shape 3*3:

A = np.arange(1,10).reshape(3,3)
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

To find the sum, minimum, maximum, mean, standard deviation and variance respectively we use the following commands:

A.sum()
A.min()
A.max()
A.mean()
A.std() #Standard deviation
A.var() #Variance

A.sum()
Output: 45

A.min()
Output: 1

A.max()
Output: 9

A.mean()
Output: 5.0

A.std()   #Standard deviation
Output: 2.5819888974716112

A.var()
Output: 6.666666666666667

In order to obtain the index of the minimum and maximum elements we use argmin( ) and argmax( ) respectively.

A.argmin()
A.argmax()

A.argmin()
Output: 0

A.argmax()
Output: 8

If we wish to find the above statistics for each row or column then we need to specify the axis:

A.sum(axis=0)
A.mean(axis = 0)
A.std(axis = 0)
A.argmin(axis = 0)

A.sum(axis=0)                 # sum of each column, it will move in downward direction
Output: array([12, 15, 18])

A.mean(axis = 0)
Output: array([ 4.,  5.,  6.])

A.std(axis = 0)
Output: array([ 2.44948974,  2.44948974,  2.44948974])

A.argmin(axis = 0)
Output: array([0, 0, 0], dtype=int64)

By defining axis = 0, calculations will move in downward direction i.e. it will give the statistics for each column.
To find the min and index of maximum element fow each row, we need to move in rightwise direction so we write axis = 1:

A.min(axis=1)
A.argmax(axis = 1)

A.min(axis=1)                  # min of each row, it will move in rightwise direction
Output: array([1, 4, 7])

A.argmax(axis = 1)
Output: array([2, 2, 2], dtype=int64)

To find the cumulative sum along each row we use cumsum( )

A.cumsum(axis=1)

array([[ 1,  3,  6],
       [ 4,  9, 15],
       [ 7, 15, 24]], dtype=int32)

Creating 3D arrays

Numpy also provides the facility to create 3D arrays. A 3D array can be created as:

X = np.array( [[[ 1, 2,3],
[ 4, 5, 6]],
[[7,8,9],
[10,11,12]]])
X.shape
X.ndim
X.size

X contains two 2D arrays Thus the shape is 2,2,3. Totol number of elements is 12.
To calculate the sum along a particular axis we use the axis parameter as follows:

X.sum(axis = 0)
X.sum(axis = 1)
X.sum(axis = 2)

X.sum(axis = 0)
Output: 
array([[ 8, 10, 12],
       [14, 16, 18]])

X.sum(axis = 1)
Output: 
array([[ 5,  7,  9],
       [17, 19, 21]])

X.sum(axis = 2)
Output: 
array([[ 6, 15],
       [24, 33]])

axis = 0 returns the sum of the corresponding elements of each 2D array. axis = 1 returns the sum of elements in each column in each matrix while axis = 2 returns the sum of each row in each matrix.

X.ravel()

 array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

ravel( ) writes all the elements in a single array.

Indexing in arrays

It is important to note that Python indexing starts from 0. The syntax of indexing is as follows -

x[start:end] : Elements in array x start through the end (but the end is excluded)
x[start:] : Elements start through the end
x[:end] : Elements from the beginning through the end (but the end is excluded)

If we want to extract 3rd element we write the index as 2 as it starts from 0.

x = np.arange(10)
x[2]
x[2:5]

 x[2]
Output: 2

x[2:5]
Output: array([2, 3, 4])

Note that in x[2:5] elements starting from 2nd index upto 5th index(exclusive) are selected.
If we want to change the value of all the elements from starting upto index 7,excluding 7, with a step of 3 as 123 we write:

x[:7:3] = 123
x

 array([123,   1,   2, 123,   4,   5, 123,   7,   8,   9])

To reverse a given array we write:

x = np.arange(10)
x[ : :-1] # reversed x

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

Note that the above command does not modify the original array.
Consider a 3D array:

X = np.array( [[[ 1, 2,3],
[ 4, 5, 6]],
[[7,8,9],
[10,11,12]]])

To extract the 2nd matrix we write:

X[1,...] # same as X[1,:,:] or X[1]

array([[ 7,  8,  9],
       [10, 11, 12]])

Remember python indexing starts from 0 that is why we wrote 1 to extract the2nd 2D array.
To extract the first element from all the rows we write:

X[...,0] # same as X[:,:,0]

array([[ 1,  4],
       [ 7, 10]])

Indexing with Arrays of Indices

Consider a 1D array.

x = np.arange(11,35,2)
x

array([11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33])

We form a 1D array i which subsets the elements of x as follows:

i = np.array( [0,1,5,3,7,9 ] )
x[i]

array([11, 13, 21, 17, 25, 29])

In a similar manner we create a 2D array j of indices to subset x.

j = np.array( [ [ 0, 1], [ 6, 2 ] ] )
x[j]

array([[11, 13],
       [23, 15]])

Similarly we can create both i and j as 2D arrays of indices for x

x = np.arange(15).reshape(3,5)
x
i = np.array( [ [0,1], # indices for the first dim
[2,0] ] )
j = np.array( [ [1,1], # indices for the second dim
[2,0] ] )

To get the ith index in row and jth index for columns we write:

x[i,j] # i and j must have equal shape

array([[ 1,  6],
       [12,  0]])

To extract ith index from 3rd column we write:

x[i,2]

array([[ 2,  7],
       [12,  2]])

For each row if we want to find the jth index we write:

x[:,j]

array([[[ 1,  1],
        [ 2,  0]],

       [[ 6,  6],
        [ 7,  5]],

       [[11, 11],
        [12, 10]]])

Fixing 1st row and jth index,fixing 2nd row jth index, fixing 3rd row and jth index.

You can also use indexing with arrays to assign the values:

x = np.arange(10)
x
x[[4,5,8,1,2]] = 0
x

array([0, 0, 0, 3, 0, 0, 6, 7, 0, 9])

0 is assigned to 4th, 5th, 8th, 1st and 2nd indices of x.
When the list of indices contains repetitions then it assigns the last value to that index:

x = np.arange(10)
x
x[[4,4,2,3]] = [100,200,300,400]
x

array([  0,   1, 300, 400, 200,   5,   6,   7,   8,   9])

Notice that for the 5th element(i.e. 4th index) the value assigned is 200, not 100.
Caution: If one is using += operator on repeated indices then it carries out the operator only once on repeated indices.

x = np.arange(10)
x[[1,1,1,7,7]]+=1
x

 array([0, 2, 2, 3, 4, 5, 6, 8, 8, 9])

Although index 1 and 7 are repeated but they are incremented only once.

Indexing with Boolean arrays

We create a 2D array and store our condition in b. If we the condition is true it results in True otherwise False.

a = np.arange(12).reshape(3,4)
b = a > 4
b

array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

Note that 'b' is a Boolean with same shape as that of 'a'.
To select the elements from 'a' which adhere to condition 'b' we write:

a[b]

array([ 5,  6,  7,  8,  9, 10, 11])

Now 'a' becomes a 1D array with the selected elements
This property can be very useful in assignments:

a[b] = 0
a

array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

All elements of 'a' higher than 4 become 0
As done in integer indexing we can use indexing via Booleans:
Let x be the original matrix and 'y' and 'z' be the arrays of Booleans to select the rows and columns.

x = np.arange(15).reshape(3,5)
y = np.array([True,True,False]) # first dim selection
z = np.array([True,True,False,True,False]) # second dim selection

We write the x[y,:] which will select only those rows where y is True.

x[y,:] # selecting rows
x[y] # same thing

Writing x[:,z] will select only those columns where z is True.

x[:,z] # selecting columns

x[y,:]                                   # selecting rows
Output: 
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

x[y]                                     # same thing
Output: 
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

x[:,z]                                   # selecting columns
Output: 
array([[ 0,  1,  3],
       [ 5,  6,  8],
       [10, 11, 13]])

Stacking various arrays

Let us consider 2 arrays A and B:

A = np.array([[10,20,30],[40,50,60]])
B = np.array([[100,200,300],[400,500,600]])

To join them vertically we use np.vstack( ).

np.vstack((A,B)) #Stacking vertically

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [100, 200, 300],
       [400, 500, 600]])

To join them horizontally we use np.hstack( ).

np.hstack((A,B)) #Stacking horizontally

array([[ 10,  20,  30, 100, 200, 300],
       [ 40,  50,  60, 400, 500, 600]])

newaxis helps in transforming a 1D row vector to a 1D column vector.

from numpy import newaxis
a = np.array([4.,1.])
b = np.array([2.,8.])
a[:,newaxis]

array([[ 4.],
       [ 1.]])

#The function np.column_stack( ) stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 1D arrays:

np.column_stack((a[:,newaxis],b[:,newaxis]))
np.hstack((a[:,newaxis],b[:,newaxis])) # same as column_stack

np.column_stack((a[:,newaxis],b[:,newaxis]))
Output: 
array([[ 4.,  2.],
       [ 1.,  8.]])

np.hstack((a[:,newaxis],b[:,newaxis]))
Output: 
array([[ 4.,  2.],
       [ 1.,  8.]])

Splitting the arrays.

Consider an array 'z' of 15 elements:

z = np.arange(1,16)

Using np.hsplit( ) one can split the arrays

np.hsplit(z,5) # Split a into 5 arrays

[array([1, 2, 3]),
 array([4, 5, 6]),
 array([7, 8, 9]),
 array([10, 11, 12]),
 array([13, 14, 15])]

It splits 'z' into 5 arrays of eqaual length.
On passing 2 elements we get:

np.hsplit(z,(3,5))

[array([1, 2, 3]),
 array([4, 5]),
 array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])]

It splits 'z' after the third and the fifth element.
For 2D arrays np.hsplit( ) works as follows:

A = np.arange(1,31).reshape(3,10)
A
np.hsplit(A,5) # Split a into 5 arrays

[array([[ 1,  2],
        [11, 12],
        [21, 22]]), array([[ 3,  4],
        [13, 14],
        [23, 24]]), array([[ 5,  6],
        [15, 16],
        [25, 26]]), array([[ 7,  8],
        [17, 18],
        [27, 28]]), array([[ 9, 10],
        [19, 20],
        [29, 30]])]

In the above command A gets split into 5 arrays of same shape.
To split after the third and the fifth column we write:

np.hsplit(A,(3,5))

[array([[ 1,  2,  3],
        [11, 12, 13],
        [21, 22, 23]]), array([[ 4,  5],
        [14, 15],
        [24, 25]]), array([[ 6,  7,  8,  9, 10],
        [16, 17, 18, 19, 20],
        [26, 27, 28, 29, 30]])]

Copying

Consider an array x

x = np.arange(1,16)

We assign y as x and then say 'y is x'

y = x
y is x

Let us change the shape of y

y.shape = 3,5

Note that it alters the shape of x

x.shape

(3, 5)

Creating a view of the data:

Let us store z as a view of x by:

z = x.view()
z is x

False

Thus z is not x.
Changing the shape of z

z.shape = 5,3

Creating a view does not alter the shape of x

x.shape

(3, 5)

Changing an element in z

z[0,0] = 1234

Note that the value in x also get alters:

x

array([[1234,    2,    3,    4,    5],
       [   6,    7,    8,    9,   10],
       [  11,   12,   13,   14,   15]])

Thus changes in the display does not hamper the original data but changes in values of view will affect the original data.

Creating a copy of the data:
Now let us create z as a copy of x:

z = x.copy()

Note that z is not x

z is x

Changing the value in z

z[0,0] = 9999

No alterations are made in x.

x

array([[1234,    2,    3,    4,    5],
       [   6,    7,    8,    9,   10],
       [  11,   12,   13,   14,   15]])

Python sometimes may give 'setting with copy' warning because it is unable to recognize whether the new dataframe or array (created as a subset of another dataframe or array) is a view or a copy. Thus in such situations user needs to specify whether it is a copy or a view otherwise Python may hamper the results.