Efficiently Using 3D LUTs in Python: Methods and Benchmarks

Using LUTs in Python

3D LUTs can encapsulate any global colour transformation algorithm, applying it to images with exceptional efficiency and high precision. You will inevitably come across LUTs during development, so how can we make the best use of them in Python?

TL;DR: If you have no specific requirements, go with OpenColorIO; if you prefer not to introduce new dependencies, colour-science or a manual NumPy implementation is recommended.

In Python, there are numerous ways to use 3D LUTs, each with its own characteristics and suited for different scenarios. These include, but are not limited to:

Approach	Interpolation Method	Implementation
colour-science	Trilinear/Tetrahedral	Python (NumPy)
Pillow	Trilinear	C
OpenColorIO	Trilinear/Tetrahedral	C++
PyTorch	Trilinear	C++/GPU

The Python files for these approaches can be found here: GitHub

Manual Implementation

A manual implementation takes the crown as the most flexible approach. With the help of various AI tools, you can implement a complete LUT application algorithm from scratch in just a few minutes.

The most basic dependency is NumPy, used for vectorised image processing and gridded data. NumPy is more than enough to handle trilinear and tetrahedral interpolation, and you can also bring in SciPy to implement more complex interpolation algorithms.
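As a rough illustration of what such a NumPy implementation might look like, the sketch below applies a LUT with both trilinear and tetrahedral interpolation. The function names are hypothetical, and it assumes the LUT is an array of shape (N, N, N, 3) indexed as lut[r, g, b] with a [0, 1] input domain; it is a minimal sketch, not the code from the repository.

```python
import numpy as np


def _grid_position(image, size):
    """Split LUT-space coordinates into a base cell index and a fraction."""
    pos = np.clip(image, 0.0, 1.0) * (size - 1)
    # Cap the base index so that base + 1 always stays inside the table.
    base = np.minimum(np.floor(pos).astype(np.intp), size - 2)
    return base, pos - base


def _lookup(lut, index):
    """Fetch lut[r, g, b] for an integer index array of shape (..., 3)."""
    return lut[index[..., 0], index[..., 1], index[..., 2]]


def apply_lut_trilinear(image, lut):
    """Blend the 8 corners of the enclosing cell, one axis at a time."""
    base, frac = _grid_position(image, lut.shape[0])
    fr, fg, fb = (frac[..., i, None] for i in range(3))

    def corner(dr, dg, db):
        return _lookup(lut, base + np.array([dr, dg, db]))

    # Interpolate along blue, then green, then red.
    c00 = corner(0, 0, 0) * (1 - fb) + corner(0, 0, 1) * fb
    c01 = corner(0, 1, 0) * (1 - fb) + corner(0, 1, 1) * fb
    c10 = corner(1, 0, 0) * (1 - fb) + corner(1, 0, 1) * fb
    c11 = corner(1, 1, 0) * (1 - fb) + corner(1, 1, 1) * fb
    c0 = c00 * (1 - fg) + c01 * fg
    c1 = c10 * (1 - fg) + c11 * fg
    return c0 * (1 - fr) + c1 * fr


def apply_lut_tetrahedral(image, lut):
    """Blend the 4 vertices of the tetrahedron that contains each sample."""
    base, frac = _grid_position(image, lut.shape[0])
    # Visit the axes from the largest fraction to the smallest; this
    # ordering selects one of the six tetrahedra that tile the cell.
    order = np.argsort(-frac, axis=-1)
    f = np.take_along_axis(frac, order, axis=-1)
    weights = (1 - f[..., 0], f[..., 0] - f[..., 1],
               f[..., 1] - f[..., 2], f[..., 2])
    one_hot = np.eye(3, dtype=np.intp)
    vertex = base
    out = weights[0][..., None] * _lookup(lut, vertex)
    for k in range(3):
        vertex = vertex + one_hot[order[..., k]]  # step along the next axis
        out = out + weights[k + 1][..., None] * _lookup(lut, vertex)
    return out
```

A quick sanity check is to build an identity LUT, for example with np.meshgrid(..., indexing="ij"), and confirm that both functions return the input image unchanged.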

When implementing manually, it is best to pick one of the methods below as a reference and verify your implementation against its output.

Colour-Science

colour-science is a comprehensive colour science library that covers every aspect of the field, including reading and using LUTs. It supports multiple interpolation methods, but its performance falls short of libraries specifically optimised for image processing; under the hood, it is essentially the same as a manual NumPy implementation.

import colour
import numpy as np

# Read the LUT
lut = colour.io.read_LUT("your_lut.cube")
# The LUT needs to be applied to an image in the [0, 1] range
image_float = np.random.rand(800, 800, 3)
# The default interpolation method is trilinear
output_trilinear = lut.apply(image_float)
# Apply the LUT using tetrahedral interpolation
output_tetrahedral = lut.apply(
    image_float,
    interpolator=colour.algebra.table_interpolation_tetrahedral,
)

Pillow

Pillow is the most popular image processing library in Python. Under the hood, Pillow is implemented in C, offering decent performance, but it only supports trilinear interpolation. Additionally, you need to convert the image to 8-bit first, meaning it cannot process floating-point images with high precision.

from PIL import Image, ImageFilter

# Load Color3DLUT (requires manually parsing the CUBE file)
size, table = read_cube_file("your_lut.cube")
# size is the grid resolution of the LUT
# table is a list of length size^3 * 3, containing the RGB output values of the LUT
lut_filter = ImageFilter.Color3DLUT(size, table)
# Load the image and apply the LUT
img = Image.open("your_image.jpg")
output_img = img.filter(lut_filter)
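The read_cube_file helper used above is not part of Pillow. One possible minimal version is sketched below, assuming a plain 3D .cube file with the default [0, 1] domain; DOMAIN_MIN, DOMAIN_MAX and other keywords are simply ignored here, which a production parser should not do. The .cube format lists entries with red changing fastest, which is the order Color3DLUT expects for its flat table.

```python
def read_cube_file(path):
    """Parse a 3D .cube file into (size, flat_table) for Color3DLUT."""
    size = None
    table = []
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split()
            keyword = fields[0].upper()
            if keyword == "LUT_3D_SIZE":
                size = int(fields[1])
            elif keyword[0].isdigit() or keyword[0] in "+-.":
                # A data row: three floats (R, G, B output values).
                table.extend(float(value) for value in fields[:3])
            # Any other keyword (TITLE, DOMAIN_MIN, ...) is ignored.
    if size is None:
        raise ValueError("LUT_3D_SIZE not found; is this a 3D .cube file?")
    if len(table) != size ** 3 * 3:
        raise ValueError(
            f"expected {size ** 3 * 3} values, found {len(table)}"
        )
    return size, table
```

The length check catches truncated files early, before Pillow has a chance to reject the table with a less specific error.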

OpenColorIO

OpenColorIO (OCIO) is an industry-standard colour management library, widely used in film and television post-production. OCIO supports various interpolation methods, including trilinear and tetrahedral. It is implemented in C++ and bound to Python via Pybind11, delivering outstanding performance. Although the API is relatively complex, it makes a great benchmark implementation.

uv add opencolorio

import numpy as np
import PyOpenColorIO as OCIO

config = OCIO.Config.CreateRaw()
lut_transform = OCIO.FileTransform("your_lut.cube")
# Pick one interpolation method; a later call overrides an earlier one
# Tetrahedral (uncomment to use instead):
# lut_transform.setInterpolation(OCIO.INTERP_TETRAHEDRAL)
# Trilinear:
lut_transform.setInterpolation(OCIO.INTERP_LINEAR)
# Get the processor
processor = config.getProcessor(lut_transform)
cpu_processor = processor.getDefaultCPUProcessor()
# Apply the transform in-place (acts on floating-point images).
# The image is cast to float32 on the assumption that applyRGB
# does not accept float64 buffers.
image_float = np.random.rand(800, 800, 3).astype(np.float32)
output_image = image_float.copy()
cpu_processor.applyRGB(output_image)

PyTorch

PyTorch is a popular deep learning framework that provides powerful GPU acceleration capabilities. By using the grid_sample function, you can achieve highly efficient trilinear interpolation. However, both the image and the LUT must first be converted into PyTorch tensors. This conversion is a bit fiddly; see the full code in the repository for details.

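import torch
import torch.nn.functional as F

# lut_tensor and grid_tensor must be prepared from the LUT and the image first
out_tensor = F.grid_sample(
    lut_tensor,      # Shape (1, 3, N, N, N)
    grid_tensor,     # Shape (1, 1, H, W, 3) with values in the [-1, 1] range
    mode="bilinear",  # on 5-D input this performs trilinear interpolation
    padding_mode="border",
    align_corners=True,
)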
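One way the conversion might be done is sketched below. The helper names are hypothetical, and the LUT is assumed to be an (N, N, N, 3) array indexed as [b, g, r], which is what you get by reshaping the flat table of a .cube file; for a table indexed as [r, g, b], the transpose would be (3, 2, 1, 0) instead. grid_sample reads the last grid dimension as (x, y, z), which map to the last, second-to-last and third-to-last LUT axes, so the grid can hold the pixel values in (r, g, b) order.

```python
import numpy as np


def lut_to_grid_sample_input(lut_bgr):
    """(N, N, N, 3) array indexed [b, g, r] -> (1, 3, N, N, N) array."""
    channels_first = lut_bgr.transpose(3, 0, 1, 2)[None]
    return np.ascontiguousarray(channels_first, dtype=np.float32)


def image_to_grid(image):
    """(H, W, 3) RGB image in [0, 1] -> (1, 1, H, W, 3) grid in [-1, 1]."""
    grid = np.clip(image, 0.0, 1.0) * 2.0 - 1.0
    return np.ascontiguousarray(grid[None, None], dtype=np.float32)


def apply_lut_torch(image, lut_bgr, device="cpu"):
    """Apply the LUT with grid_sample and return an (H, W, 3) array."""
    # Imported here so the two helpers above also run without PyTorch.
    import torch
    import torch.nn.functional as F

    lut_tensor = torch.from_numpy(lut_to_grid_sample_input(lut_bgr)).to(device)
    grid_tensor = torch.from_numpy(image_to_grid(image)).to(device)
    out = F.grid_sample(
        lut_tensor,
        grid_tensor,
        mode="bilinear",
        padding_mode="border",
        align_corners=True,
    )
    # (1, 3, 1, H, W) -> (H, W, 3)
    return out[0, :, 0].permute(1, 2, 0).cpu().numpy()
```

Setting align_corners=True makes -1 and 1 land exactly on the first and last LUT nodes, which matches how a LUT with a [0, 1] domain is defined.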

Performance and Accuracy

To test the processing speed of different approaches in actual production scenarios, a floating-point image with a resolution of 3840x2160 was used. The time taken to apply the LUT was measured across the different implementations.
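A timing harness for this kind of measurement could look like the sketch below. The function name, the warm-up run and the use of the median are assumptions made for illustration; the exact procedure behind the published numbers is not described here.

```python
import time

import numpy as np


def benchmark_ms(apply_fn, image, repeats=5, warmup=1):
    """Return the median wall-clock time of apply_fn(image) in milliseconds."""
    for _ in range(warmup):
        apply_fn(image)  # warm caches and lazy initialisation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        apply_fn(image)
        timings.append((time.perf_counter() - start) * 1000.0)
    return float(np.median(timings))
```

Each approach is wrapped in a callable that takes the image, for example benchmark_ms(lut.apply, image_float) for colour-science. Asynchronous backends such as PyTorch on a GPU also need an explicit synchronisation inside the callable (for example torch.cuda.synchronize() on CUDA), otherwise the timer stops before the work has finished.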

Execution Time

The test results are as follows, with time measured in milliseconds, tested on an M1 Pro processor.

Implementation	Interpolation	17 Steps	33 Steps	65 Steps
colour-science	Trilinear	1493.3	1423.7	1381.1
colour-science	Tetrahedral	2106.3	2091.9	2088.7
Pillow	Trilinear	153.9	157.0	166.3
OpenColorIO	Trilinear	87.2	85.3	86.9
OpenColorIO	Tetrahedral	49.7	50.5	49.5
PyTorch (CPU)	Trilinear	262.2	262.5	264.4
PyTorch (GPU/MPS)	Trilinear	31.1	20.9	22.3

Increasing the LUT grid size does not change the number of lookup and interpolation operations per pixel, so the timings stay roughly flat across 17, 33 and 65 steps. OCIO’s implementation is highly efficient, and PyTorch’s GPU performance is particularly impressive, already sitting very close to native performance.

Accuracy

Apart from Pillow, all other methods process floating-point images directly. Using the colour-science implementation as the baseline, the differences between the libraries under the same interpolation algorithm are on the scale of 1e-8 to 1e-9, so the results can be considered effectively identical.

Pillow requires converting the image to 8-bit first. Compared to colour-science, on a benchmark test image, 2% of the pixels showed a discrepancy of one code value, whilst the rest were exactly the same. Due to the limitations of 8-bit input and output precision, it is not recommended for LUT workflows.
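The two comparisons described in this section can be reproduced with a few lines of NumPy. The sketch below uses hypothetical function names and assumes the floating-point reference is quantised to 8 bits by rounding before it is compared with Pillow's output; the quantisation actually used for the figures above is not stated.

```python
import numpy as np


def max_abs_difference(reference, candidate):
    """Largest absolute per-channel difference between two float images."""
    delta = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(delta)))


def code_value_mismatch(reference_float, candidate_u8):
    """Compare an 8-bit result with a float reference quantised to 8 bits.

    Returns (fraction of pixels that differ, largest code-value difference).
    """
    quantised = np.round(np.clip(reference_float, 0.0, 1.0) * 255.0)
    diff = np.abs(quantised.astype(np.int16) - candidate_u8.astype(np.int16))
    differing_pixels = np.any(diff > 0, axis=-1)
    return float(differing_pixels.mean()), int(diff.max())
```

The first function suits the floating-point libraries, while the second reports both how many pixels disagree and by how many code values, which are the two quantities quoted for Pillow.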
