## The performance impact of vectorized operations

Scripting languages are often associated with being slow. As computers get faster this matters less and less, but as we get into the realm of big(ger) data, performance efficiency is always welcome. In this blog post we take a look at the impact of vectorized operations. As an example, we will create a Python script that checks whether a postal code lies within a certain distance from a reference point, and measure the performance impact of vectorization.

Checking whether two points are within a certain distance, given their latitude and longitude, is usually very fast. Vectorization wouldn't help much when doing the operation only once. But what if you repeat that operation hundreds of thousands of times? In a C world, that would also be very fast. In Python? Not so much!

### First things first: distance between two points

As it would not make sense to improve the performance of a script without knowing what the script should do, let me introduce you to the Haversine formula to compute the distance between two points:

\[
d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
\]

where \(\phi_{1,2}\) is the latitude of the two points and \(\lambda_{1,2}\) is their longitude (both in radians). If you fill in for \(r\) the Earth's (average) radius, you will get the distance between the two points in the same unit (so in \(km\) if you used \(km\), \(m\) if you used \(m\), and so forth).

### A pythonic haversine formula

If you're scared by math but in love with Python, fear not: we also present you with ready-to-use Python code (WARNING: you need NumPy to get it working):

```python
from numpy import sin, cos, pi, arcsin, sqrt

def get_distance(lat, lon, pcode_lat, pcode_lon):
    """
    Find the distance between `(lat, lon)` and the reference point
    `(pcode_lat, pcode_lon)`.
    """
    RAD_FACTOR = pi / 180.0  # degrees to radians for trig functions
    lat_in_rad = lat * RAD_FACTOR
    lon_in_rad = lon * RAD_FACTOR
    pcode_lat_in_rad = pcode_lat * RAD_FACTOR
    pcode_lon_in_rad = pcode_lon * RAD_FACTOR
    delta_lon = lon_in_rad - pcode_lon_in_rad
    delta_lat = lat_in_rad - pcode_lat_in_rad
    # The next two lines are the Haversine formula
    inverse_angle = (sin(delta_lat / 2) ** 2 +
                     cos(pcode_lat_in_rad) * cos(lat_in_rad) *
                     sin(delta_lon / 2) ** 2)
    haversine_angle = 2 * arcsin(sqrt(inverse_angle))
    EARTH_RADIUS = 6367  # kilometers
    return haversine_angle * EARTH_RADIUS
```
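As a quick sanity check, here is the same function restated compactly so the snippet runs on its own, called on the same point twice (which should give zero) and on rough, illustrative coordinates for Amsterdam and Utrecht (which should give roughly 35 km; these coordinates are our own example, not from the post):

```python
from numpy import sin, cos, pi, arcsin, sqrt

def get_distance(lat, lon, pcode_lat, pcode_lon):
    # Compact restatement of the function above so this snippet is standalone.
    to_rad = pi / 180.0
    dlat = (lat - pcode_lat) * to_rad
    dlon = (lon - pcode_lon) * to_rad
    a = (sin(dlat / 2) ** 2 +
         cos(pcode_lat * to_rad) * cos(lat * to_rad) * sin(dlon / 2) ** 2)
    return 2 * 6367 * arcsin(sqrt(a))

# Same point twice: the distance is zero.
print(get_distance(52.39, 4.84, 52.39, 4.84))

# Amsterdam (~52.37, 4.90) to Utrecht (~52.09, 5.12): on the order of 35 km.
print(get_distance(52.37, 4.90, 52.09, 5.12))
```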

We could ask ourselves how fast this code is when we want to check 400000 postal codes (which is, approximately, the number of postal codes in the Netherlands). This is useful if we want to get a list of postal codes that fall within a certain radius of a reference postal code.

### Some performance measurements

To measure the performance of the above code we create an array containing some random latitude and longitude pairs centered around `(52.3905927, 4.8412508)` (the coordinate pair of our office):

```python
from numpy import random

godatadriven = (52.3905927, 4.8412508)
points = random.randn(400000, 2) * 0.01
points[:, 0] = points[:, 0] + godatadriven[0]
points[:, 1] = points[:, 1] + godatadriven[1]
```

The `points[:, 0]` syntax is the NumPy way of saying: select all fields from the first column, which in our case would be the latitude of `points`. To measure how much time is needed to get the distances between `points` and `godatadriven`, we can use the IPython `%timeit` magic function, which measures how much time, averaged over multiple runs, a script takes to execute:

```python
def iterate_distance():
    d = []
    for p in points:
        d.append(get_distance(p[0], p[1], godatadriven[0], godatadriven[1]))

%timeit iterate_distance()  # runs in 3.53 seconds
```
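Outside IPython, the standard-library `timeit` module gives comparable measurements; a minimal sketch with a stand-in statement (in the post's setting you would time `iterate_distance()` instead, passing the surrounding names via `globals=globals()`):

```python
import timeit

# Stand-in statement; substitute "iterate_distance()" and add
# globals=globals() to time the real loop from the post.
seconds = timeit.timeit("sum(x * x for x in range(1000))", number=100)
print(f"{seconds / 100:.6f} s per run")
```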

If you're familiar with Python, the code above will give you the shivers, as it does not use a list comprehension. Changing the code, alas, has only a marginal impact on performance:

```python
%timeit [get_distance(p[0], p[1], godatadriven[0], godatadriven[1]) for p in points]
# runs in 3.46 seconds
```

If you're building a real-time application, where a postal code check could be an intermediate computation, \(3.46s\) is an eternity. Luckily, we can do some magic with NumPy.

### Vectorization to the rescue

In our code, `points` is a NumPy array. NumPy arrays can be easily manipulated because, under the hood, they have been coded to be a really convenient way to describe blocks of computer memory. The particular structure chosen by NumPy allows its arrays to be straightforwardly modified by C or Fortran code. This is fundamental here because, as we saw above, computing a similar operation 400000 times in Python is not exactly the most clever idea.

Vectorization is a process by which these element-wise operations are grouped together, allowing NumPy to perform them much more rapidly. In the code above, you already saw a couple of examples of vectorized operations:

```python
points = random.randn(400000, 2) * 0.01
points[:, 0] = points[:, 0] + godatadriven[0]
```

Scaling the output of `randn` by 0.01 didn't require a for-loop over each of its elements: we simply told NumPy to multiply the whole array by 0.01 and it was done. The same when we added `godatadriven[0]` to `points[:, 0]`: we didn't have to write a for-loop over each element because once again NumPy took care of it.
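The same elementwise semantics in miniature (the array values here are arbitrary):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Multiplying by a scalar broadcasts over every element -- no Python loop.
scaled = a * 0.01

# Adding a scalar to a single column works the same way.
a[:, 0] = a[:, 0] + 10.0

print(scaled)  # each element scaled by 0.01
print(a)       # first column shifted by 10
```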

That said, here's the vectorized operation of getting the distance in NumPy:

```python
%timeit get_distance(points[:, 0], points[:, 1], godatadriven[0], godatadriven[1])
# runs in 21.5 ms!
```

which is some 165 times faster!
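The speed-up comes without changing the answers: passing columns instead of scalars yields the same distances as the loop. A standalone sketch with a small sample of random points (the function is restated compactly so the snippet runs on its own):

```python
import numpy as np
from numpy import sin, cos, pi, arcsin, sqrt

def get_distance(lat, lon, pcode_lat, pcode_lon):
    # Compact restatement of the earlier function for a standalone check.
    to_rad = pi / 180.0
    dlat = (lat - pcode_lat) * to_rad
    dlon = (lon - pcode_lon) * to_rad
    a = (sin(dlat / 2) ** 2 +
         cos(pcode_lat * to_rad) * cos(lat * to_rad) * sin(dlon / 2) ** 2)
    return 2 * 6367 * arcsin(sqrt(a))

godatadriven = (52.3905927, 4.8412508)
points = np.random.randn(1000, 2) * 0.01 + godatadriven  # broadcast row-wise

looped = [get_distance(p[0], p[1], *godatadriven) for p in points]
vectorized = get_distance(points[:, 0], points[:, 1], *godatadriven)

print(np.allclose(looped, vectorized))  # True
```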

### Putting it all together with Pandas

Now suppose you have a pandas dataframe with the complete list of postal codes:

```python
import pandas as pd

# note: `zip` shadows the built-in of the same name
zip = pd.read_csv("zip.csv", names=["zip", "lat", "lon"], index_col="zip")
```

and that, given a postal code and a given radius, you want a list of postal codes that fall within the given radius. That can be accomplished with the following code:

```python
def get_zips(postal_code, radius):
    """
    Return the postal codes that are within `radius` of `postal_code`.
    """
    lat, lon = zip.loc[postal_code]  # `.loc` replaces the long-deprecated `.ix`
    zip_distance = get_distance(zip.lat, zip.lon, lat, lon) < radius
    return zip[zip_distance].index.values
```
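To see it in action without the real `zip.csv`, here is a sketch on a tiny made-up dataframe: the postal codes and coordinates below are invented, the dataframe is named `zips` to avoid shadowing the built-in `zip`, and the distance function is restated compactly so the snippet is self-contained. Two codes sit within a kilometer of the reference, one sits tens of kilometers away:

```python
import pandas as pd
from numpy import sin, cos, pi, arcsin, sqrt

def get_distance(lat, lon, pcode_lat, pcode_lon):
    # Compact restatement of the earlier Haversine function.
    to_rad = pi / 180.0
    dlat = (lat - pcode_lat) * to_rad
    dlon = (lon - pcode_lon) * to_rad
    a = (sin(dlat / 2) ** 2 +
         cos(pcode_lat * to_rad) * cos(lat * to_rad) * sin(dlon / 2) ** 2)
    return 2 * 6367 * arcsin(sqrt(a))

# Three made-up postal codes: two near the reference point, one far away.
zips = pd.DataFrame(
    {"lat": [52.390, 52.392, 51.900], "lon": [4.841, 4.845, 4.500]},
    index=pd.Index(["1013BB", "1014AA", "3011ZZ"], name="zip"),
)

def get_zips(postal_code, radius):
    lat, lon = zips.loc[postal_code]
    within = get_distance(zips.lat, zips.lon, lat, lon) < radius
    return zips[within].index.values

print(get_zips("1013BB", 5))  # only the two nearby codes
```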

Thanks to the vectorized nature of NumPy operations (including the comparison with `radius` in the previous chunk of code), the check is done in a matter of milliseconds, making it a good fit for real-time web applications. Not only that: the resulting code is very clean and pythonic, requiring even less typing than a list comprehension would.

Note: if you are so inclined, an IPython notebook with the performance test is online at nbviewer. The times reported there may vary slightly compared with the ones in this post.