Lately, I’ve been studying statistics and data analysis. I already know the Python programming language, so when looking at the two most widely used programming tools in this domain, Pandas and R, I chose Pandas, a software library for Python. Pandas builds its data structures (Series and DataFrames) on top of another library called NumPy. This post goes over some of what can be done with NumPy.

## The ndarray

The fundamental data structure in NumPy is an N-dimensional array named `ndarray`. The `zeros`, `empty`, and `array` functions are used to create an ndarray.

```python
# Import the NumPy library.
import numpy as np

arr = np.zeros((2,3), dtype=np.int64)
print(arr)
```

```
[[0 0 0]
 [0 0 0]]
```

*It’s conventional to import numpy as np*

Notice how the `zeros` function creates an ndarray with all zeros. Its first (required) parameter is `shape`, meaning the shape of the ndarray. In this case the shape `(2,3)` tells NumPy to create an ndarray with 2 rows and 3 columns.

The `dtype` keyword argument (not required, but it brings up an important concept) tells NumPy what we want the data type of the elements inside of the array to be.

**Note**: the elements inside of an ndarray are all of the same type, unlike lists which can be of mixed type (e.g. a list containing strings and integers).
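To see the single-type rule in action, here is a minimal sketch (the variable names are my own, not from the examples in this post). When a list mixes integers and floats, NumPy picks one type that can hold all of them.

```python
import numpy as np

# A Python list keeps each element's own type.
py_list = [1, 2.5, 3]
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'int']

# The same values in an ndarray share one dtype: the ints are upcast to float64.
mixed = np.array(py_list)
print(mixed.dtype)     # float64
print(mixed.tolist())  # [1.0, 2.5, 3.0]
```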

We can change the data type of the elements with ndarray’s astype method. I’ll demonstrate…

```python
arr = np.empty((2,3), dtype=np.float64)
print(arr)
```

```
[[ 2.12202817e-314 2.12199579e-314 1.61897901e-318]
 [ 1.56307538e-311 0.00000000e+000 0.00000000e+000]]
```

The `empty` function creates an ndarray (of the given shape) with uninitialized values, i.e. garbage values, so the output will differ from run to run. Notice how infinitesimal the non-zero numbers are here. Let’s cast them to integers.
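Because the contents of an `empty` array are arbitrary, the usual pattern is to overwrite every element before reading any of them. A small sketch of that idea, with a hypothetical `buf` array and fill value:

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point.
buf = np.empty((2, 3), dtype=np.float64)

# Overwrite every element before using the array.
buf[:] = 1.5
print(buf.tolist())  # [[1.5, 1.5, 1.5], [1.5, 1.5, 1.5]]
```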

```python
arr2 = arr.astype(np.int64)
print(arr2)
```

```
[[0 0 0]
 [0 0 0]]
```

The `astype` method created a new ndarray by copying the values of the source ndarray and casting them from floating point (double precision) to integers. The cast discards the fractional part, so these tiny values all become zero.
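The garbage values above make the cast hard to follow, so here is a sketch with known inputs (the names `floats` and `ints` are made up for illustration). It shows that the float-to-integer cast drops the fractional part rather than rounding, and that the source array is left untouched.

```python
import numpy as np

floats = np.array([1.9, -1.9, 0.4])
ints = floats.astype(np.int64)

# The fractional part is discarded (truncation toward zero), not rounded.
print(ints.tolist())    # [1, -1, 0]

# astype returned a new array; the original still holds the floats.
print(floats.tolist())  # [1.9, -1.9, 0.4]
```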

The `astype` method is useful when converting string values to numerical values to perform calculations.

```python
arr = np.array(['1','2','3'], dtype=np.str_)
arr2 = arr.astype(np.int64)
print(arr2)
```

```
[1 2 3]
```

Here we used the `array` function to first create an ndarray of the strings '1', '2', and '3'. Then we were able to create a new array with those strings converted to integers.

We passed a list as the first argument to `np.array`, but it can be any array-like object, such as a list or tuple. To create an ndarray of shape `(2,3)` like we did with `zeros` and `empty`, simply pass in an array-like object containing nested array-like objects.

```python
arr = np.array([(1,2,3), (4,5,6)])
print(arr)
```

```
[[1 2 3]
 [4 5 6]]
```
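To confirm that the nesting maps onto dimensions, we can inspect the `shape` and `ndim` attributes. This is a quick sketch; `grid` is just an illustrative name.

```python
import numpy as np

# Two inner tuples of three items each become 2 rows and 3 columns.
grid = np.array([(1, 2, 3), (4, 5, 6)])

print(grid.shape)  # (2, 3)
print(grid.ndim)   # 2
```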

## Array Arithmetic

```python
import random as rand

# Create a list of 10 integers between 1 and 100.
nums = rand.sample(range(1, 101), 10)

# Convert the list to a NumPy array.
nums = np.array(nums)
print(nums)
```

```
[42 40 67 65 92 84 22 73 49 98]
```

We can easily perform arithmetic operations on the elements in a NumPy array. For example, let’s increase each element (integer) in the array by a factor of 10.

```python
nums = nums * 10
print(nums)
```

```
[420 400 670 650 920 840 220 730 490 980]
```

Pretty neat, eh!? We can also perform arithmetic with two ndarrays as operands.

```python
diff = nums - nums
print(diff)
```

```
[0 0 0 0 0 0 0 0 0 0]
```
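Subtracting an array from itself hides what is going on, so here is a sketch with two different arrays (`a` and `b` are hypothetical). Operations between same-shaped ndarrays are applied element by element, which is not how plain Python lists behave.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Each element of a is paired with the element of b at the same position.
print((a + b).tolist())  # [11, 22, 33]
print((a * b).tolist())  # [10, 40, 90]

# For comparison, multiplying a plain list repeats it instead.
print([1, 2, 3] * 2)     # [1, 2, 3, 1, 2, 3]
```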

## Selecting ndarray Data

We can select data in an ndarray with **indexing** and **slicing**. Using indexing, we can retrieve a specific value; with slicing, we can fetch multiple values from the source ndarray.

Still working with our `nums` ndarray, let’s grab the first element.

```python
print(nums[0])
```

```
420
```

ndarray indexing, like regular list indexing, is zero based, i.e. the first element is at index 0, the second at 1, etc. Let’s say we wanted to select the 5th, 6th, and 7th elements. We can use a slice for that.

```python
print(nums[4:7])
```

```
[920 840 220]
```

The slice starts at the index before the colon and returns that value and those after it up to, but not including, the value at the index after the colon. Now let’s grab data from an ndarray of a different shape.
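The start-inclusive, stop-exclusive rule is easier to check on predictable data. The sketch below uses a hypothetical `values` array in place of the random `nums`; note that the number of elements returned equals stop minus start.

```python
import numpy as np

# Stand-in data: [0, 10, 20, ..., 90]
values = np.arange(10) * 10

# Indexes 4, 5, and 6 are returned; index 7 is excluded.
window = values[4:7]
print(window.tolist())       # [40, 50, 60]
print(len(window))           # 3, i.e. 7 - 4

# Omitting a bound means "from the start" or "to the end".
print(values[:3].tolist())   # [0, 10, 20]
print(values[7:].tolist())   # [70, 80, 90]
```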

```python
nums2 = np.array([nums[:5], nums[5:]])
print(nums2)
print(nums2.shape)
```

```
[[420 400 670 650 920]
 [840 220 730 490 980]]
(2, 5)
```

Notice the shape of the ndarray, `(2, 5)`: 2 rows and 5 columns. We can select the first row.

```python
print(nums2[0])
```

```
[420 400 670 650 920]
```

We could get the second row with `nums2[1]`, and if there were a third, with `nums2[2]`, and so on. Let’s fetch the value in the third column of the second row.

```python
print(nums2[1,2])
```

```
730
```

With the comma syntax `[x,y]` on a two-dimensional ndarray, the first position is the row and the second the column. We can combine this syntax with slicing.

```python
print(nums2[1,2:])
```

```
[730 490 980]
```
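A slice can go in the row position too, which is how you pull out a whole column. Here is a sketch on a small predictable array; `table` is a made-up name, and `reshape` is a NumPy method this post doesn't otherwise cover (it is used here only to build the 2×5 array).

```python
import numpy as np

# Build [[0 1 2 3 4], [5 6 7 8 9]]
table = np.arange(10).reshape(2, 5)

# Row 1, column 2: a single value.
print(table[1, 2])            # 7

# All rows, column 2: the third column.
print(table[:, 2].tolist())   # [2, 7]

# Row 0, first three columns.
print(table[0, :3].tolist())  # [0, 1, 2]
```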

To demonstrate how indexing and slicing work on 3-dimensional ndarrays, we’ll use slices of the `nums` array and the `np.arange` function to create 2 more ndarrays.

```python
nums3 = np.array([
    [nums[:5], nums[5:]],
    [np.arange(0,5), np.arange(5,10)]
])
print(nums3)
print(nums3.shape)
```

```
[[[420 400 670 650 920]
  [840 220 730 490 980]]

 [[  0   1   2   3   4]
  [  5   6   7   8   9]]]
(2, 2, 5)
```

To select the row with the numbers 0 to 4, we index the outermost dimension first and then work inward: `1` picks the second 2×5 block and `0` picks its first row.

```python
print(nums3[1,0])
```

```
[0 1 2 3 4]
```

So if we wanted the last two elements from the second row in the first ndarray of shape `(2, 5)`, we’d do the following.

```python
print(nums3[0,1,-2:])
```

```
[490 980]
```

This is equivalent to `nums3[0,1,3:]`, but with the negative index slice (i.e. `[-2:]`) we can say: start 2 before the end and go until the end.
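The same 3-dimensional selections can be checked on data where every value is predictable. This sketch assumes a hypothetical `cube` array built with `reshape` (not used elsewhere in this post) and shows that the negative and positive slices return the same elements.

```python
import numpy as np

# Shape (2, 2, 5): cube[0] holds 0-9 and cube[1] holds 10-19.
cube = np.arange(20).reshape(2, 2, 5)

# Second block, first row.
print(cube[1, 0].tolist())       # [10, 11, 12, 13, 14]

# First block, second row, last two elements, written two ways.
print(cube[0, 1, -2:].tolist())  # [8, 9]
print(cube[0, 1, 3:].tolist())   # [8, 9]
```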

### Boolean Indexing

Not only can we select data from an ndarray using index values, we also have the power to select based on comparisons. In other words, if we want only the data in an ndarray that matches a certain condition, we can put that condition in the square brackets to retrieve just that data.

First, I’ll create an ndarray using NumPy’s random module, populating the array with random data.

```python
rand_data = np.random.randn(4,4)
print(rand_data)
```

```
[[ 1.41605716  1.18998145 -0.79101034  1.67810952]
 [-0.66823165  0.35587036 -0.24547048  0.11480275]
 [-0.03796473 -0.09380624  0.071222   -1.73191764]
 [ 0.15127885 -1.15335985  1.97041638  1.7231374 ]]
```

Like we did with arithmetic, we can use an ndarray as an operand in a comparison operation.

```python
print(rand_data > 0)
```

```
[[ True  True False  True]
 [False  True False  True]
 [False False  True False]
 [ True False  True  True]]
```

The resulting ndarray, in this case, tells us where the positive values are located in the `rand_data` ndarray. This can be used to extract just those values.

```python
pos_data = rand_data[rand_data > 0]
print(pos_data)
```

```
[1.41605716 1.18998145 1.67810952 0.35587036 0.11480275 0.071222
 0.15127885 1.97041638 1.7231374 ]
```
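Since random data changes on every run, here is a sketch with fixed values (`data`, `mask`, and `clipped` are illustrative names). It shows that the selection comes back as a flat, 1-dimensional array, and, as an assumption going one step beyond what this post covers, that the same kind of mask can be used on the left side of an assignment.

```python
import numpy as np

data = np.array([[1.5, -0.5],
                 [-2.0, 3.0]])

# The comparison yields a boolean array of the same shape.
mask = data > 0
print(mask.tolist())        # [[True, False], [False, True]]

# Indexing with the mask returns the matching values as a 1-D array.
print(data[mask].tolist())  # [1.5, 3.0]

# True counts as 1, so summing the mask counts the matches.
print(int(mask.sum()))      # 2

# Beyond the post: assign through a mask to replace the negatives with zero.
clipped = data.copy()
clipped[clipped < 0] = 0.0
print(clipped.tolist())     # [[1.5, 0.0], [0.0, 3.0]]
```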

There’s a lot more to NumPy and ndarrays, but this is enough of an introduction to get started.