Trying OpenCL on Guix: An Experience Report

Recently, I wanted to run Leela Zero with q5go to help me get better at playing Go. I run GNU Guix on my machine, so I ran: guix package -i leela-zero gnugo q5go. This got q5go going, and I was able to point it at Leela Zero (~/.guix-profile/bin/leelaz) as an analysis engine; however, analysis did not work. Here is my experience determining why.

Like many machine learning programs, Leela Zero can optionally use your GPU via OpenCL to drastically speed up its computation. Unfortunately, invoking it as leelaz --gtp wasn't working for me. Invoking it as leelaz --cpu-only worked, but it took a long time to analyze moves. This suggested to me that the issue was with my OpenCL setup. We can use the clinfo program to troubleshoot this.

guix environment --ad-hoc clinfo -- clinfo
Number of platforms                               0
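When scripting around this check, it can be handy to pull just the platform count out of clinfo's output. The following is a minimal sketch; platform_count is a hypothetical helper name of my own, and it assumes the "Number of platforms" line has the shape shown above.

```shell
# platform_count: read clinfo output on stdin and print the number of
# OpenCL platforms it reports. (Hypothetical helper, not part of clinfo.)
platform_count() {
    # The count is the last whitespace-separated field on the line.
    awk '/^Number of platforms/ { print $NF; exit }'
}
```

It would be used as: guix environment --ad-hoc clinfo -- clinfo | platform_count. Fed the output above, it prints 0.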

This strongly suggests there is something wrong with Guix's OpenCL setup. We can use strace to learn more.

guix environment --ad-hoc strace clinfo -- strace -o/dev/stdout -eopenat clinfo
openat(AT_FDCWD, "/gnu/store/2ax9z25142khhqx61ks767jr758pzq5r-clinfo-3.0.21.02.21/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/i70jq190cpc45crbnrw8g8lgb4djyi9r-opencl-icd-loader-2021.06.30/lib/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/OpenCL/vendors", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
Number of platforms                               0
+++ exited with 0 +++
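In a longer trace, the one failing call is easy to miss among the successful ones. A small filter can reduce strace output to just the paths that could not be opened. This is only a sketch: failed_opens is a hypothetical helper, and it assumes openat lines formatted like the ones above.

```shell
# failed_opens: read strace output on stdin and print each path whose
# openat() call failed with ENOENT. (Hypothetical helper, not part of strace.)
failed_opens() {
    # Capture the quoted path of every openat line that ends in "-1 ENOENT".
    sed -n 's/^openat([^,]*, "\([^"]*\)".*= -1 ENOENT.*$/\1/p'
}
```

Piping the trace above through failed_opens leaves a single line: /etc/OpenCL/vendors.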

Despite mesa-opencl being installed in the system profile, Guix had not populated /etc/OpenCL from the package. The directory is, however, present in the profile:

find -L /run/current-system/profile -name OpenCL
/run/current-system/profile/etc/OpenCL

This is mystery number one: why isn't this being populated into Guix's root /etc?

clinfo can be pointed at a vendors directory with the OPENCL_VENDOR_PATH environment variable. The directory in the system profile contains mesa.icd, whose contents are:

cat /run/current-system/profile/etc/OpenCL/vendors/mesa.icd
/gnu/store/48qh6x7ky8r1cxbfalwzngch4hgnrrr9-mesa-opencl-icd-21.3.8/lib/libMesaOpenCL.so.1
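An ICD file like this one is just a text file naming a library, so it is easy to check whether the library it names exists. Here is a rough sketch; check_icds is a hypothetical helper, and it assumes each .icd file holds an absolute path on its first line, as mesa.icd does here (a bare library name would be reported as missing).

```shell
# check_icds DIR: for every *.icd file in DIR, print the library it names
# and whether that path exists on disk. (Hypothetical helper.)
check_icds() {
    dir=$1
    for icd in "$dir"/*.icd; do
        [ -e "$icd" ] || continue        # the glob matched nothing
        lib=$(head -n 1 "$icd")          # first line holds the library path
        if [ -e "$lib" ]; then
            echo "ok      $icd -> $lib"
        else
            echo "missing $icd -> $lib"
        fi
    done
}
```

It would be invoked as: check_icds /run/current-system/profile/etc/OpenCL/vendors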

Running strace again, we can see that although mesa-opencl-icd is installed in the system's profile, clinfo still cannot find the library:

OPENCL_VENDOR_PATH=/run/current-system/profile/etc/OpenCL/vendors guix environment --ad-hoc strace clinfo -- strace -o/dev/stdout -eopenat clinfo
openat(AT_FDCWD, "/gnu/store/2ax9z25142khhqx61ks767jr758pzq5r-clinfo-3.0.21.02.21/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/i70jq190cpc45crbnrw8g8lgb4djyi9r-opencl-icd-loader-2021.06.30/lib/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/OpenCL/vendors", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
Number of platforms                               0
+++ exited with 0 +++

The library is, however, present:

find -L /run/current-system/profile -name libMesaOpenCL.so.1
/run/current-system/profile/lib/libMesaOpenCL.so.1

This is mystery number two: why can't Guix locate the library?

If we create our own vendors directory, add an ICD file containing the profile's path to libMesaOpenCL.so.1, and point clinfo at it, things begin to look better.

cat ${HOME}/.local/etc/OpenCL/vendors/mesa.icd
/run/current-system/profile/lib/libMesaOpenCL.so.1
OPENCL_VENDOR_PATH=${HOME}/.local/etc/OpenCL/vendors clinfo
Number of platforms                               1
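The mesa.icd file shown above has to be created by hand first. One way to do that is sketched below; write_icd is a hypothetical helper of my own, and the file name mesa.icd simply mirrors the one in the system profile.

```shell
# write_icd VENDOR_DIR LIBRARY: create VENDOR_DIR if needed and write a
# one-line mesa.icd in it that names LIBRARY. (Hypothetical helper.)
write_icd() {
    vendor_dir=$1
    library=$2
    mkdir -p "$vendor_dir" &&
        printf '%s\n' "$library" > "$vendor_dir/mesa.icd"
}
```

For the setup in this post, the call would be: write_icd "${HOME}/.local/etc/OpenCL/vendors" /run/current-system/profile/lib/libMesaOpenCL.so.1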

However, Leela Zero is still not working:

OPENCL_VENDOR_PATH=${HOME}/.local/etc/OpenCL/vendors leelaz --tune-only 2>&1 || true
A network weights file is required to use the program.
By default, Leela Zero looks for it in /home/katco/.local/share/leela-zero/best-network.

There is a curious error in the output of clinfo:

Preferred work group size multiple <getWGsizes:1200: create kernel : error -46>

If we set the LD_DEBUG environment variable to libs, we can shed some light on what is wrong:

OPENCL_VENDOR_PATH=${HOME}/.local/etc/OpenCL/vendors LD_DEBUG=libs clinfo 2>&1 |grep error

The errors point at a libclc bitcode file for my card's architecture. Indeed, this file is not present.

[ -f /gnu/store/h86b3253bc3mnp3p57n1vls2vkfv2h6z-libclc-9.0.1/share/clc/gfx1010-amdgcn-mesa-mesa3d.bc ]
echo $?
1
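The same test can be wrapped up so it is easy to repeat against other libclc builds or other architectures. This is a sketch under one assumption: that the bitcode files follow the ARCH-amdgcn-mesa-mesa3d.bc naming seen in the path above. clc_has_target is a hypothetical helper name.

```shell
# clc_has_target LIBCLC_DIR ARCH: succeed if LIBCLC_DIR ships a bitcode
# file for ARCH, assuming the ARCH-amdgcn-mesa-mesa3d.bc naming seen above.
# (Hypothetical helper.)
clc_has_target() {
    [ -f "$1/share/clc/$2-amdgcn-mesa-mesa3d.bc" ]
}
```

Run against the libclc-9.0.1 store directory with gfx1010 as the architecture, it fails just like the test above.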

Further research turned up a bug (44841) against libclc. It suggests that while support for my card was added to LLVM in v10 (at the time of this writing, LLVM has released v12), libclc does not support my card's architecture, gfx1010.

I attempted to build libclc v12.0.0 locally, but it segfaulted. Building v11.0.0 worked, but as suggested by the open bug, support for my card's architecture still has not been implemented.

I briefly entertained creating a Guix package from AMD's amdgpu-pro packages, but it appears as though my card is not supported, and according to a bug (819) against ROCm, likely won't be.

So it would seem I'm out of luck and stuck running Leela Zero on the CPU for now. Analyzing one of my games on 60 compute cores took somewhere around ten minutes, so it is not intractable.

Still, perhaps this helps others running Guix with GPUs that are supported by libclc.

As an aside, this research is perhaps an indication of why, despite my years of interest in Go, I remain a kyu player.