Cython notes and tips

Monday. October 07, 2019

Cython Python

This post is a collection of notes and tips about Cython, that I learned or discovered while developping for scikit-learn. I will describe my (basic) workflow, and I will describe what I learned about avoiding Python interactions, memory views, the GIL and other fun stuff.

This is neither a tutorial, nor an introduction to Cython. This is rather a list of things I wish I knew before writing a few thousands lines of Cython code. I hope it can be useful to others!

Workflow

My Cython workflow is pretty basic. Write code, compile, fix compilation errors, then run the tests… Nothing new here. There are a few Cython-specific steps though. Since Cython is pretty magical, a small change in your code can induce significant drops (or gains) in performance. I strongly suggest to:

Always benchmark your code, extensively.
Always track down Python interactions. Yellow is your enemy (see below).

About debugging: there is, AFAIK, no debugger for Cython. You can of course still use a debugger for the generated C-code, but that’s not really convenient. Personally, I use the good old print() statements with extensive unit tests. Tip from Jérémie in the comments: inside a nogil section (where print() statements are forbidden), you can temporarily re-acquire the gil back with with gil: and use print().

Python interactions and how to avoid them

Cython generates C code that conceptually operates in 2 different modes: either in “Python mode” or in “pure C mode”.

Python mode is when the code manipulates Python objects, through the Python/C API: for example when you are using a dict, or a numpy array. A call to this Python/C API is called a Python interaction.

Pure C mode is when the code only manipulates pure C types (things that are cdef‘ed) and does not make any use of the Python/C API. For example if you want to manipulate a numpy array in pure C mode, use a memory view instead (see below).

My simplistic proxy is that Python interaction = slow = bad, while pure C mode = fast = good. When writing Cython, you want to avoid Python interactions as much as possible, especially deep inside for loops. The command cython -a file.pyx will output a html file of the generated C code, where each line has a different shade of yellow: more yellow means more Python interactions.

When I write Cython code, I always use cython -a on every file. My goal is that all the file is white (i.e. no interaction), although interactions are unavoidable in some places: typically when arguments are passed in, or when returning Python objects.

Some tips to avoid Python interactions:

Use cdef as much as you can
Use memory views to manipulate numpy arrays. If your function accepts a numpy array, declare a cdef‘ed memory view to manipulate it.
A good way to make sure that you’re not risking any interaction is to write code within a with nogil context manager (see below).
For some reason, declaring a variable locally in a function may make some interactions disappear. Typically, I noticed that using a local variable as an alias to the attribute of a cdef‘ed class will remove Python interactions (cdef int local_var = self.var). Don’t follow these tips blindly though. In some cases it will make your code faster, but sometimes it won’t. In the end, always let your benchmarks decide.

Multi-threading and the GIL

The GIL is this annoying thing that prevents programs run with CPython to do multi-threaded parallelism. You can do multi-processing (e.g. with joblib), but unless the GIL is “released”, multi-threading isn’t possible.

Cython allows you to release the GIL. That means that you can do multi-threading in at least 2 ways:

Directly in Cython, using OpenMP with prange.
Using e.g. joblib with a multi-threading backend (the parts of your code that will be parallelized are the parts that release the GIL)

We use both in scikit-learn.

To release the GIL in Cython, you just need to use the with nogil: context manager (you can also do that directly when using prange).

Your code inside of a nogil statement cannot have any Python interaction. Any variable you use will have to be cdef‘ed, and you won’t be allowed to use numpy arrays since these are objects: use views instead! If you call a function, it needs to be labeled as a nogil function, like so:

cdef void my_func() nogil:
    # ...

This is carefully explained in the docs, but I didn’t really get it at first: labelling a function as nogil does not release the GIL. It simply tells Cython that the function may be called without the GIL (Cython uses this information to do some static compilation-time checks). As a result this function must not have any kind of Python interaction. You are still allowed to call the function with the GIL, though.

Another thing that wasn’t clear for me at first: it is perfectly OK to release the GIL inside of a def function that has some Python interactions, as long as the Python interactions happen outside of the nogil block. In other words, no need to overthink your functions interfaces. For example, this is perfectly possible:

def f(array):  # Take Python object as input

    cdef:
        int [:] my_view = array

    # Python interactions possible here
    # ...

    with nogil:
        # No Python interactions here
        # ...
        # do stuff to my_view, maybe in parallel with prange...
        # ...

    # Python interactions possible here
    # ...

    return array  # return a Python object

# Then from Python, in a .py file:
out = f(np.arange(10))

Finally: releasing and aquiring the GIL takes time. You don’t want to do that deep inside nested for loops. It is instead better to write big chunks of nogil code.

Using numpy arrays and memory views

In scikit-learn we rely on numpy arrays for almost everything. Cython supports numpy arrays but since these are Python objects, we can’t manipulate them without the GIL.

In the past, the workaround was to use pointers on the data, but that can get ugly very quickly, especially when you need to care about the memory alignment of 2D arrays (C vs Fortran).

Cython now supports memory views, which can be used without the GIL. A view is a light struct that basically contains a pointer to the raw data, and info about the type, memory alignment, etc.

The interaction between numpy arrays and views is pretty flexible. In particular, the following patterns are perfectly acceptable:

# Declare an argument as a view, and pass in an array
def f(int [:] my_view):
    # ...

f(np.arange(10))

# Take an array as input, and locally map it to a view
def f(array):

    cdef int [:] my_view = array  # bind array to a view

    # You can now manipulate the view, possibly without the GIL

# Allocate an array inside of a function, and manipulate it with a view.
def f():

    cdef int [:] my_view = np.arange(10)

    # You can now manipulate the view, possibly without the GIL

    return np.asarray(my_view)

That might seem obvious, but views do not support the numpy array API: You cannot do things like my_view.mean(). You can do np.mean(my_view), but not without the GIL.

I find myself not using pointers at all, unless I really need to malloc() and then free() something in a nogil section.

Other tips about memory views:

You can manipulate views over numpy structured array (i.e. arrays with a complex dtype): see this PR, which isn’t merged at the time of writing.

Since you’re writing C-like code, in general you’ll want to disable the bounds checks and the wraparound to make your code faster (see the docs).

If you know whether your array is C-aligned or Fortran-aligned, definitely let Cython know. Else, Cython will generate general code that can work with arbitrary alignment, which is less optimized:

cdef int [:, ::1] my_view  # C aligned (contiguous on the last dim)
cdef int [::1, :] my_view  # Fortran aligned (contiguous on the 1st dim)

Last remarks

Cython is an amazing tool, and the whole Python data-science ecosystem owes Cython a lot. Scikit-learn, Scipy and pandas heavily rely on it.

Much like Numba, it can however be a bit (too) magical sometimes, and even the smallest change can have a huge impact on the performance of your code, sometimes for obscure reasons. I can’t stress this enough: always benchmark your code.

The Cython documentation is full of great tips, though with time the organization is becoming a bit confusing (with e.g. some redundancy between the User Guide and the tutorials). This Scipy paper also has some useful info that are not all covered in the docs.

I also found the Cython book by Kurt W. Smith to be very clear and useful. It covers most of the topics from the docs in a very accessible way.

Nicolas Hug

ML Engineer