Writing C++ Extensions for Numpy Code
Introduction
There are a lot of times when I've started coding a project in Python,
only to hit a runtime bottleneck for some fundamental operation I need
in an algorithm. Python is fantastic for fast prototyping and
even production-ready machine learning, but performance really suffers if
your project needs primitives that aren't exposed by libraries like
Numpy, Scipy, or Pytorch. This is a tutorial on extending Python
by writing C++ extensions using Pybind11. No complicated build files
required! We'll be using Ben Thompson's awesome cppimport package, which you can install via pip.
There are many other tutorials on Pybind11 online. This tutorial focuses specifically on interacting with Numpy / Scipy code. Does any of the following apply to you?
- You're writing an app mostly in Scipy / Numpy, but there's one part of your algorithm that doesn't map neatly onto the functions that these libraries provide.
- You have a large existing C++ codebase or library, and you want a set of Python bindings that enable users to call into these libraries.
- Your application needs true shared-memory parallelism, which CPython's global interpreter lock rules out (at the time of writing).
If so, you're in the right place. I should mention that this tutorial is based on the following resources:
- The official Pybind11 documentation, and in particular the page on Numpy arrays.

The Pybind11 documentation has a comprehensive set of example code with better coding practices than this tutorial demonstrates. That said, I hope that this tutorial can save you some googling.
Credit for the Python in the thumbnail for this post goes to Diego Delso, delso.photo, License CC BY-SA.
Getting Started
You can find the completed code for this tutorial at https://github.com/vbharadwaj-bk/python_cpp_tutorial. The instructions below assume that you start from scratch on either a Linux or Mac platform with Python3 and Numpy installed.
Fire up a terminal / text editor and create a directory Pybind11_tutorial. Inside, create a pair of files, my_extension.cpp and calling_code.py. Your directory structure should look like this:
Pybind11_tutorial (directory)
├── calling_code.py
└── my_extension.cpp
We will do all of our work inside the Pybind11_tutorial
directory.
The cppimport package is
by far the simplest (and coolest) way I've seen to start writing
C++ extensions quickly. Install the package via
pip install cppimport
Then enter the following into my_extension.cpp:
//cppimport
/* my_extension.cpp */
#include <pybind11/pybind11.h>
#include <iostream>
void hello_world(int x) {
    std::cout << "Hello world! Your number is " << x << std::endl;
}

void print_double(double x) {
    std::cout << "The double you entered is " << x << std::endl;
}

PYBIND11_MODULE(my_extension, m) {
    m.def("hello_world", &hello_world);
    m.def("print_double", &print_double);
}
/*
<%
setup_pybind11(cfg)
%>
*/
That's all you need to create a working extension! Let's walk through the key lines:
- //cppimport: This first-line comment is necessary if you're using cppimport to compile your extensions.
- #include <pybind11/pybind11.h>: This header exposes classes and functions necessary to interface with Python.
- PYBIND11_MODULE...: After declaring two test functions, hello_world and print_double, we define a Python module named my_extension and give it a local name m. We expose two functions to the Python layer with the m.def statements on the next two lines. Note: the function names at the Python and C++ layers do not have to be the same. For example, you can rename the hello_world function at the C++ layer by providing a different string argument in the m.def line, which can be very useful (see the snippet just after this list).
- setup_pybind11(cfg): More boilerplate related to cppimport. This is a Mako block; we'll cover what you can do with this in a little bit, but you can ignore this set of comments for now.
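For instance, here's a minimal sketch of a renamed binding. The Python-side name say_hello is purely for illustration and is not part of the tutorial code:

PYBIND11_MODULE(my_extension, m) {
    // The first string passed to m.def is the name visible from Python;
    // it does not have to match the C++ function name.
    m.def("say_hello", &hello_world);      // called as my_extension.say_hello(5)
    m.def("print_double", &print_double);
}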
Tip: Using an Alternate Build System
You don't have to use cppimport to compile your extension, and it's
probably not the best choice for large, complex builds.
The extension .cpp
file can be compiled via CMake or any other
build system of your choice, provided that you supply the Pybind11
header paths and libraries.
Now let's write the corresponding code to call the extension at the Python layer.
Enter the following into calling_code.py:
# calling_code.py
import cppimport
import cppimport.import_hook
import my_extension
my_extension.hello_world(5)
my_extension.print_double(3.14)
Let's test it out. Open a terminal window and try the following:
[shell]% python calling_code.py
Hello world! Your number is 5
The double you entered is 3.14
You should see the output listed above on your console. Congratulations! You now have a working extension.
Tip
cppimport
will attempt to locate a C / C++ compiler for you.
If you need a specific compiler, you can set the CC and CXX environment variables from the command line, e.g. export CC=gcc-13 CXX=g++-13.
Behind the Scenes
Here's a little more information about what's going on.
When you use import cppimport.import_hook
, you're enabling
cppimport to compile extensions from .cpp
files as needed.
When the line import my_extension
executes, cppimport
notices that you have a .cpp
file in the current directory
with the same name as the module and the comment //cppimport
as the first line. The cppimport package then compiles the
Python module, and you can subsequently call any of the
methods defined.
One smart feature of cppimport is that the extension is only recompiled when a C++ dependency file changes. Later in this tutorial, we'll mention how to specify the list of dependencies explicitly for multi-file builds. For now, try executing the Python code a second time. You'll notice that the second execution is marginally faster, since cppimport doesn't bother recompiling the unchanged extension file.
Passing Arguments to C++
Variables of many common datatypes can be seamlessly passed between the C++ and Python layers. Here's an example of a function defined at the C++ layer that works exactly as you expect when called from Python.
...
uint64_t greet_and_return5(std::string name) {
    std::cout << "Howdy, " << name << std::endl;
    return 5;
}
...
This magic is brought to you by Pybind11. That said, passing scalar arguments back and forth isn't that interesting. Usually, we want to make arrays of data from the Python layer accessible to C++, or vice-versa. Let's say we want to write a C++ extension to compute the Frobenius norm of a matrix \(X \in \RR^{n \times m}\), defined as
\[\norm{X}_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij}^2}\]
Here's what the calling code looks like at the Python layer:
import numpy as np
import cppimport, cppimport.import_hook
import my_extension
rng = np.random.default_rng()
X = rng.standard_normal(size=(500, 5), dtype=np.float64)
f_norm_extension = my_extension.f_norm(X)
print(f"Extension Fro. norm is {f_norm_extension:.2f}")
f_norm_numpy = np.linalg.norm(X, "fro")
print(f"Numpy Fro. norm is {f_norm_numpy:.2f}")
Here's what that extension looks like at the C++ layer:
//cppimport
/* my_extension.cpp */
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>
using namespace std;
namespace py=pybind11;
double f_norm(py::array_t<double> mat_py) {
    py::buffer_info info = mat_py.request();
    double* ptr = static_cast<double*>(info.ptr);
    uint64_t num_elements = info.shape[0] * info.shape[1];

    double sq_fnorm = 0.0;
    for(uint64_t i = 0; i < num_elements; i++) {
        sq_fnorm += ptr[i] * ptr[i];
    }
    return sqrt(sq_fnorm);
}

PYBIND11_MODULE(my_extension, m) {
    m.def("f_norm", &f_norm);
}
/*
<%
setup_pybind11(cfg)
%>
*/
Neat! Let's step through the code.
- We define an argument to the function f_norm of type py::array_t<double>, corresponding to the Numpy datatype np.float64.
- The class py::buffer_info captures all relevant information about the Python buffer passed in. We extract the pointer to the data (which is stored in row-major format) as well as the shape.
- We iterate through the input matrix (accessing it through the raw data pointer) to compute the squared Frobenius norm, and return it as a scalar value.
Two additional points: first, none of the data in the input
array X
is copied. The C++ layer just gains access to a
pointer for the head of the memory chunk allocated for the
data. Second, the class py::array_t
offers a convenient
[]
operator to access the contents of the array, which the
example above does not use. I personally find it comfortable
to work with the raw data pointer, since I'm not sure what
kind of overhead the []
operator incurs. Libraries like
BLAS / LAPACK implementations also typically require raw
pointer inputs, so it's useful to know how to extract them
from the Python buffers.
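If you'd rather avoid raw pointers, Pybind11 also provides unchecked proxy objects for direct element access. Here's a sketch of the same Frobenius norm computed that way; the function name f_norm_unchecked is my own, and I haven't benchmarked it against the pointer version:

// Drop this into my_extension.cpp next to f_norm; it uses the same headers.
double f_norm_unchecked(py::array_t<double> mat_py) {
    // unchecked<2>() returns a lightweight 2-D accessor with no bounds
    // checking; mat(i, j) indexes the array directly.
    auto mat = mat_py.unchecked<2>();
    double sq_fnorm = 0.0;
    for(py::ssize_t i = 0; i < mat.shape(0); i++) {
        for(py::ssize_t j = 0; j < mat.shape(1); j++) {
            sq_fnorm += mat(i, j) * mat(i, j);
        }
    }
    return sqrt(sq_fnorm);
}

The bounds-checked alternative is mat_py.at(i, j), which throws an exception on an out-of-range index.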
Tip: Move Extension Files to a Separate Folder
If you want to keep your codebase clean, try moving
the .cpp
extension files to a subfolder of your main
directory. If my_extension.cpp
is located in folder
cpp_ext
relative to the current directory, the line
import cpp_ext.my_extension
will allow you to use
the extension correctly.
Mutating and Returning Data from C++
We can also mutate data in Python buffers directly.
Suppose we want to add 3 to all elements of a matrix X:
import numpy as np
import cppimport, cppimport.import_hook
import my_extension
X = np.ones(shape=(2, 2), dtype=np.float64)
print(X)
my_extension.add3(X)
print(X)
Here's the extension to do that:
//cppimport
/* my_extension.cpp */
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>
using namespace std;
namespace py=pybind11;
void add3(py::array_t<double> mat_py) {
    py::buffer_info info = mat_py.request();
    double* ptr = static_cast<double*>(info.ptr);
    uint64_t num_elements = info.shape[0] * info.shape[1];

    for(uint64_t i = 0; i < num_elements; i++) {
        ptr[i] += 3.0;
    }
}

PYBIND11_MODULE(my_extension, m) {
    m.def("add3", &add3);
}
/*
<%
setup_pybind11(cfg)
%>
*/
Danger: Auto-conversions create unexpected behavior
Suppose your Python extension takes an argument of type
py::array_t<double>
, but you pass a Numpy array of
type np.float32
. This actually works fine: Pybind11
will automatically convert the single-precision float
array to a double precision array (incurring some overhead
in the process). But if you attempt to mutate values at the C++ layer, those changes will not be reflected in the original array, since they apply only to the temporary converted copy. Auto-conversion can be disabled in Pybind11.
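Here's a sketch of one way to do that, using Pybind11's noconvert() argument annotation. With a binding like the one below, passing an np.float32 array raises a TypeError instead of silently operating on a converted copy:

PYBIND11_MODULE(my_extension, m) {
    // py::arg(...).noconvert() tells Pybind11 to reject arguments that
    // would require an implicit conversion (e.g. float32 -> float64).
    m.def("add3", &add3, py::arg("mat_py").noconvert());
}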
Finally, you can also initialize and return Numpy arrays from the C++
layer; here's a revision of the add3
function that returns a new buffer
instead of mutating in place:
//cppimport
/* my_extension.cpp */
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>
using namespace std;
namespace py=pybind11;
py::array_t<double> add3_returncopy(py::array_t<double> mat_py) {
    py::buffer_info info_in = mat_py.request();
    double* ptr_in = static_cast<double*>(info_in.ptr);
    uint64_t num_elements = info_in.shape[0] * info_in.shape[1];

    auto result = py::array_t<double>(info_in.shape);
    py::buffer_info info_out = result.request();
    double* ptr_out = static_cast<double*>(info_out.ptr);

    for(uint64_t i = 0; i < num_elements; i++) {
        ptr_out[i] = ptr_in[i] + 3.0;
    }
    return result;
}

PYBIND11_MODULE(my_extension, m) {
    m.def("add3_returncopy", &add3_returncopy);
}
/*
<%
setup_pybind11(cfg)
%>
*/
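One caveat about both versions of add3: they assume a 2-D input and never check. If you'd rather fail loudly, a small guard like the hypothetical helper below (check_2d is my name, not part of the tutorial code) can be called at the top of each function; Pybind11 translates the C++ exception into a Python RuntimeError at the call site.

#include <stdexcept>

// Hypothetical helper for my_extension.cpp: reject anything that isn't a 2-D buffer.
void check_2d(const py::buffer_info& info) {
    if(info.ndim != 2) {
        throw std::runtime_error("expected a 2-D array");
    }
}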
Exercise: CSR Matrix-Vector Multiplication
Let's say we want to write a CSR-based matrix-vector multiplication function.
For this tutorial, we're going to use the Matrix Market (.mtx) format of bcspwr02, a small power system graph that you can download from the SuiteSparse Matrix Collection.
Scipy contains a blazingly-fast multithreaded routine to load the adjacency matrix \(A\) of the graph:
import numpy as np
import scipy as sp
import scipy.io   # make sure the io submodule is loaded

path = '/Users/Vivek/Desktop/bcspwr02.mtx'
adjacency = None
with open(path, 'rb') as f:
    adjacency = sp.io.mmread(f).tocsr()
After executing this code, adjacency
is a compressed-sparse-row (CSR) Scipy sparse matrix
containing the graph adjacency matrix (Wikipedia has a nice explanation of the format). There are three fields
of the adjacency
class that we care about:
- adjacency.data: A 1D Numpy array of type float64 with length \(\textrm{nnz}(A)\). The value adjacency.data[i] gives the weight of nonzero \(i\) from \(A\), where the nonzeros are ordered first by row index, then by column index.
- adjacency.indices: A 1D Numpy array of type int32 with length \(\textrm{nnz}(A)\). The value adjacency.indices[i] gives the column index of nonzero \(i\) from \(A\).
- adjacency.indptr: A 1D Numpy array of type int32 with length \(\abs{V}+1\). The nonzeros in the range adjacency.indptr[i] (inclusive) to adjacency.indptr[i+1] (exclusive) all belong to row \(A_{i:}\).
Your task: write multithreaded, CSR-based matrix-vector multiplication code in C++ and expose a Python binding for it.
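If you want a starting point, here's a rough serial sketch of the C++ side. Everything in it (the file name csr_matvec.cpp, the function name, the argument order) is my own choice rather than part of the tutorial, and parallelizing the row loop is left as the actual exercise:

//cppimport
/* csr_matvec.cpp -- a sketch, not a finished solution */
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cstdint>
namespace py = pybind11;

// Computes y = A x for a CSR matrix A given by (data, indices, indptr).
py::array_t<double> csr_matvec(py::array_t<double> data,
                               py::array_t<int32_t> indices,
                               py::array_t<int32_t> indptr,
                               py::array_t<double> x) {
    auto d = data.unchecked<1>();
    auto col = indices.unchecked<1>();
    auto rowptr = indptr.unchecked<1>();
    auto xv = x.unchecked<1>();

    py::ssize_t n_rows = rowptr.shape(0) - 1;
    auto y = py::array_t<double>(n_rows);
    double* y_ptr = static_cast<double*>(y.request().ptr);

    // TODO: parallelize this row loop (e.g. with OpenMP or std::thread);
    // each output row is independent, so no synchronization is needed.
    for(py::ssize_t i = 0; i < n_rows; i++) {
        double acc = 0.0;
        for(int32_t k = rowptr(i); k < rowptr(i + 1); k++) {
            acc += d(k) * xv(col(k));
        }
        y_ptr[i] = acc;
    }
    return y;
}

PYBIND11_MODULE(csr_matvec, m) {
    m.def("csr_matvec", &csr_matvec);
}
/*
<%
setup_pybind11(cfg)
%>
*/

On the Python side, the call would look like csr_matvec.csr_matvec(adjacency.data, adjacency.indices, adjacency.indptr, x) for a dense vector x of matching length, which you can then compare against adjacency @ x.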