Microarchitecture rpm macros

During Hackweek 20 at SUSE I created some rpm macros that make it easy to build packages using the glibc-hwcaps feature. There’s a separate post with my journal from the Hackweek in case you want to read it; here I’ll just explain how to use the new macros.

The package definition

First you have to add a BuildRequires to use the macros:

BuildRequires:  microarch-rpm-macros

Then before the %description section, you have to add a line like:

%microarch_subpackage -n %{libname}

The %microarch_subpackage macro generates the subpackage sections. It’s important that the parameter passed to it is the same as the one passed to the %package section that defines the library package. It’ll also generate the %files sections, with the same contents as the %files section of the library package but with the directory adapted to the microarchitecture of each subpackage.
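To make the %files rewrite more concrete, here is a small shell sketch of the idea. It’s only an illustration under assumptions, not the real macro (which is an rpm macro, not a shell function): the library package’s %files section is assumed to be the single line %{_libdir}/libbz2.so.1*, and the subpackage name is assumed to be the library package name plus the flavor, which matches the libbz2-1-x86-64-v3 package in the zypper output shown in the Hackweek post. The function name flavor_files is hypothetical.

```shell
# Hypothetical illustration of how a flavor subpackage's %files section relates
# to the library package's one; not the actual %microarch_subpackage code.
# $1: flavor name; stdin: the library package's %files section.
flavor_files() {
  sed -e "s|^%files -n \(.*\)\$|%files -n \1-$1|" \
      -e "s|%{_libdir}/|%{_libdir}/glibc-hwcaps/$1/|g"
}

flavor_files x86-64-v3 <<'EOF'
%files -n libbz2-1
%{_libdir}/libbz2.so.1*
EOF
# %files -n libbz2-1-x86-64-v3
# %{_libdir}/glibc-hwcaps/x86-64-v3/libbz2.so.1*
```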

The %build section

Let’s consider the following code in the build section:

autoreconf -fiv
%configure \
   --with-pic \
   --disable-static
make %{?_smp_mflags} CFLAGS="%{optflags}"

We will replace that with:

autoreconf -fiv
%{microarch_build %$configure \
  --with-pic \
  --disable-static
  make %{?_smp_mflags} CFLAGS="%{$optflags}"
}

The %microarch_build macro executes the code inside it four times: once to build the baseline version of the package, and then once each for the x86-64-v2, x86-64-v3 and x86-64-v4 versions. Each build happens in a different directory, with different %optflags values (which include the respective -march and -mtune parameters) and different %_libdir values, so that the library is later installed to the right place in %install.

Note that %configure was replaced with %$configure and %{optflags} with %{$optflags}. This prevents them from being expanded before they’re passed to %microarch_build, so that they’re expanded later with each flavor’s values.
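One way to picture the escape is as a marker that’s stripped just before each flavor’s build. The bash sketch below only illustrates that idea with sed; it isn’t how the real macros are implemented (they’re rpm macros, not shell), and unescape_body is a hypothetical name.

```shell
# Hypothetical illustration of the "%$" escape; not taken from the real macros.
# The body is first kept as literal text, with the "$" markers in place...
body='%$configure --with-pic --disable-static
make %{?_smp_mflags} CFLAGS="%{$optflags}"'

# ...and the markers are stripped only when a flavor is built, so rpm would
# expand %configure and %{optflags} at that point, with that flavor's values.
unescape_body() {
  printf '%s\n' "$1" | sed -e 's/%\$/%/g' -e 's/%{\$/%{/g'
}

unescape_body "$body"
# %configure --with-pic --disable-static
# make %{?_smp_mflags} CFLAGS="%{optflags}"
```

Macros written without the marker, like %{?_smp_mflags}, pass through untouched.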

Also note that the autoreconf call was left outside the macro. This way the configure script is generated once in the root source directory, and the %microarch_build macro can then create build.x86-64-vN directories and put the build files there.
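The following bash sketch shows the kind of loop %microarch_build performs, as described above. It’s an illustration under assumptions, not the real macro: the baseline directory name build.x86-64, the -O2 -g stand-in for %{optflags}, the /usr/lib64 stand-in for %_libdir and the -mtune=generic value are placeholders, and microarch_build_sketch is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the loop behind %microarch_build; NOT the real macro.
set -euo pipefail

BASE_OPTFLAGS="-O2 -g"     # assumed stand-in for the distribution's %{optflags}
BASE_LIBDIR="/usr/lib64"   # assumed stand-in for %{_libdir}
FLAVORS="x86-64 x86-64-v2 x86-64-v3 x86-64-v4"   # "x86-64" is the baseline

# $1: the build commands, run once per flavor inside build.<flavor>/
microarch_build_sketch() {
  local srcdir=$PWD flavor
  for flavor in $FLAVORS; do
    mkdir -p "build.$flavor"
    (
      cd "build.$flavor"
      export srcdir microarch_current_flavor="$flavor"
      if [ "$flavor" = "x86-64" ]; then
        export optflags="$BASE_OPTFLAGS" libdir="$BASE_LIBDIR"
      else
        # -mtune=generic is a placeholder; the real macros pick their own -mtune.
        export optflags="$BASE_OPTFLAGS -march=$flavor -mtune=generic"
        export libdir="$BASE_LIBDIR/glibc-hwcaps/$flavor"
      fi
      eval "$1"
    )
  done
}
```

In a real package the body would be something like `"$srcdir/configure" --libdir="$libdir" --with-pic --disable-static && make CFLAGS="$optflags"`, which is why running autoreconf once in the source root is enough: every build directory reuses the same configure script.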

In my test with the bzip2 package I had a special case: if the %do_profiling boolean is set, the code is built, the tests are executed, and then the code is built again using the generated profiling information. The %build section was:

%configure \
  --with-pic \
  --disable-static
%if 0%{?do_profiling}
  make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}"
  make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}" test
  make %{?_smp_mflags} clean
  make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_feedback}"
%else
  make %{?_smp_mflags} CFLAGS="%{optflags}"
%endif

And I replaced that with:

%{microarch_build %$configure \
  --with-pic \
  --disable-static
%if 0%{?do_profiling} && "%{$microarch_current_flavor}" == "x86-64"
  make %{?_smp_mflags} CFLAGS="%{$optflags} %{cflags_profile_generate}"
  make %{?_smp_mflags} CFLAGS="%{$optflags} %{cflags_profile_generate}" test
  make %{?_smp_mflags} clean
  make %{?_smp_mflags} CFLAGS="%{$optflags} %{cflags_profile_feedback}"
%else
  make %{?_smp_mflags} CFLAGS="%{$optflags}"
%endif
}

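Note that here again I used a $ within %{$microarch_current_flavor} so that it’s replaced with the right value in each flavor. The extra condition means the profiling build is only done for the baseline (x86-64) flavor; the other flavors are built normally.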

The %install section

%install sections usually consist of running something like %make_install and then maybe installing some files manually. In this case, we would replace this:

%make_install pkgconfigdir=%{_libdir}/pkgconfig
install -Dpm 0755 foo %{buildroot}%{_bindir}/foo
install -m 0644 %{SOURCE2} %{buildroot}%{_mandir}/man1

with:

%microarch_install %make_install pkgconfigdir=%{_libdir}/pkgconfig
install -Dpm 0755 foo %{buildroot}%{_bindir}/foo
install -m 0644 %{SOURCE2} %{buildroot}%{_mandir}/man1

%microarch_install runs the command passed to it four times: first for each of the microarch flavors, and last for the baseline flavor. Only the %make_install line is wrapped; the manual install commands that follow still run once, as before.

Note that after the flavors are installed and before the baseline installation is done, it’ll remove all *.so files within the glibc-hwcaps directories since we don’t want development files in there.
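A rough bash sketch of that install order, assuming each flavor was built in a build.<flavor> directory and that %_libdir is /usr/lib64 (both assumptions; microarch_install_sketch is a hypothetical name, not the real macro):

```shell
# Hypothetical sketch of %microarch_install's ordering; NOT the real macro.
# $1: buildroot, $2: install command (run in each build directory)
microarch_install_sketch() {
  local buildroot=$1 flavor hwcaps
  # 1. Install the optimized flavors first.
  for flavor in x86-64-v2 x86-64-v3 x86-64-v4; do
    ( cd "build.$flavor" && microarch_current_flavor=$flavor && eval "$2" )
  done
  # 2. Remove development symlinks (*.so) from the glibc-hwcaps directories;
  #    runtime files such as libfoo.so.1 are kept.
  hwcaps="$buildroot/usr/lib64/glibc-hwcaps"
  if [ -d "$hwcaps" ]; then
    find "$hwcaps" -name '*.so' -delete
  fi
  # 3. Install the baseline flavor last.
  ( cd build.x86-64 && microarch_current_flavor=x86-64 && eval "$2" )
}
```

Because the cleanup runs before the baseline install, the libfoo.so development symlink in /usr/lib64 is left alone while the copies under glibc-hwcaps disappear.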

The %check section

In the %check section, packages usually run the generated binaries to test that they work as expected. We can’t do that for all flavors, since the build host’s CPU may not be recent enough to run the optimized ones. Because of this, I opted to check only the baseline flavor.

Just replace

make %{?_smp_mflags} test

or anything you have to run from the build directory in %check with:

pushd %microarch_baseline_builddir
make %{?_smp_mflags} test
popd
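Since glibc 2.33, running the dynamic loader with --help lists the glibc-hwcaps subdirectories in priority order and marks the ones the current CPU supports with "(supported, searched)". The sketch below parses that output. The exact wording and the loader path /lib64/ld-linux-x86-64.so.2 are what we expect on x86-64 and should be checked on your system, and supported_hwcaps is a hypothetical helper. Something like it could be used to also test whichever optimized flavors the build host happens to support.

```shell
# Print the glibc-hwcaps subdirectories the loader marks as usable on this
# CPU, reading `ld.so --help` output (glibc >= 2.33) on stdin.
supported_hwcaps() {
  sed -n 's/^ *\([^ ]*\) (supported, searched)$/\1/p'
}

# Hypothetical use in %check (build.<flavor> directory names are assumptions):
# for flavor in $(/lib64/ld-linux-x86-64.so.2 --help | supported_hwcaps); do
#   make -C "build.$flavor" test
# done
```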

And that should be enough for the simple case of bzip2 and similar packages.

Please note that this is a work in progress: using the macros with %cmake or %meson currently fails and isn’t supported yet. Check the conclusions in the previous post for information about what’s still missing.

Hackweek 20: glibc-hwcaps in openSUSE

This week we held Hackweek 20 at SUSE, so I’ll try to explain here what I’ve worked on. I recently noticed that glibc 2.33 introduced glibc-hwcaps support, which means it’s now possible to install libraries built for the extended instruction sets of recent CPUs alongside the regularly compiled libraries, and glibc will automatically choose the version optimized for the CPU in use. This sounded very nice, so I thought I’d work on it for my Hackweek project.

My plan was to work at the package building level: add or modify rpm macros to make it easy to build packages so that subpackages optimized for the different microarchitectures are (semi-)automatically generated, and SUSE/openSUSE users can easily install the packages optimized for their CPU.

The preliminary tests

I began by creating a home:alarrosa:branches:hackweek:glibc-hwcaps project in OBS that makes gcc-11 the default compiler for every package I wanted to test. I then added a home:alarrosa:branches:hackweek:glibc-hwcaps:baseline subproject to build baseline versions of the packages, and a home:alarrosa:branches:hackweek:glibc-hwcaps:x86-64-v3 project to build them with `-march=x86-64-v3 -mtune=skylake`, so they’re optimized for my CPU and I could measure the speed improvement.

I first thought I’d benchmark transcoding an H.264 (x264) video to H.265 (x265) with ffmpeg, so I built fdk-aac, libx264, x265 and ffmpeg-4 in both projects (baseline and x86-64-v3). The results were practically the same with both versions, but that was partly expected, since ffmpeg and most video libraries already check the CPU at runtime and run assembly code specifically optimized for it.

So I thought I should try a C/C++ library that’s not video-related, which brought me to building baseline and x86-64-v3 versions of libpng16, poppler, cairo and freetype2 libraries.

I then executed the following command to render png files for each page of a large pdf file using both sets of libraries:

time pdftocairo asio.pdf -png

The results were:

  • 325.618 seconds (mean over 3 runs with 1.235 seconds of difference between the min and max results) for the baseline version.
  • 336.672 seconds (mean over 4 runs with 0.664 seconds of difference between the min and max results) for the x86-64-v3 version.

Yes, you read that right: unexpectedly, the optimized version was noticeably slower. I got a bit frustrated with that result, but I still thought it might be caused by problems in the current compiler version that could be fixed in the future, so it seemed worth continuing with the project.
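Putting "noticeably slower" into numbers with the means above:

```latex
\frac{336.672\,\mathrm{s} - 325.618\,\mathrm{s}}{325.618\,\mathrm{s}}
  = \frac{11.054\,\mathrm{s}}{325.618\,\mathrm{s}} \approx 3.4\,\%
```

The 11.054 s gap is roughly nine times the largest min/max spread measured (1.235 s), so it’s unlikely to be explained by run-to-run noise alone.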

A quick test for glibc-hwcaps

I created a really small libbar dynamic library with a function that prints a message on the screen, built it three times with three different messages and put the copies into /usr/lib64, /usr/lib64/glibc-hwcaps/x86-64-v2 and /usr/lib64/glibc-hwcaps/x86-64-v3. I then wrote a small foo binary that links to libbar and calls that function. With only some of the libraries installed, foo printed the message of the expected variant each time, which confirmed that glibc-hwcaps support worked.

The microarch rpm macros

At this point (it was already Wednesday afternoon), I could start working on the rpm macros. To test them, I created yet another project, home:alarrosa:branches:hackweek:glibc-hwcaps:test. There I created a new package, microarch-rpm-macros, that installs … well… the rpm macros 🙂, and another package called microarch. The latter generates a microarch-filesystem package that owns the new directories /usr/lib64/glibc-hwcaps and /usr/lib64/glibc-hwcaps/x86-64-v[234], plus three other packages (microarch-x86-64-v2, microarch-x86-64-v3 and microarch-x86-64-v4) whose purpose you’ll see in a moment.

I worked on the rpm macros and these packages on Thursday and Friday, and by 19:00 on Friday I had everything working.

I’ll explain the rpm macros I created in my next post, so that it can be used as a reference without all the backstory of how they were developed.

The rpm macros built the package four times with different optimization flags, generated all three subpackages with the optimized versions and put the library files in place. After adding the repository from the test OBS project, I could do:

sudo zypper in microarch-x86-64-v3
Loading repository data…
Reading installed packages…
Resolving package dependencies…

The following 2 NEW packages are going to be installed:
 libbz2-1-x86-64-v3 microarch-x86-64-v3

2 new packages to install.
Overall download size: 52.6 KiB. Already cached: 0 B. After the operation, additional 74.4 KiB will be used.
Continue? [y/n/v/…? shows all options] (y):

So just installing the microarch-x86-64-v3 package pulls in all optimized packages for that microarchitecture automatically.

Conclusions

I consider the Hackweek project a partial success. I did what I set out to do in the original plan and it works well. There’s still work to do, of course:

  • The rpm macros need to be polished (a lot) before submitting them to Factory.
  • More packages apart from bzip2 should be adapted to use them.
  • The macros will need to be adapted to more use cases. For example, building a package with cmake or meson using the %microarch* macros is untested and I have no doubt it’ll fail. Fortunately, now that the main work is done, I think this will be easy to implement.
  • I need to provide NOOP versions of the macros for other architectures, since currently they just fail to build packages on anything other than x86-64 (does glibc-hwcaps support microarchitectures for other architectures?)

And even if I work on all the points above, there’s still the main issue of the optimized libraries being slower than the baseline ones. In any case, once this issue is solved, all this should bring some benefits to our distributions. The project also confirmed that using optimization flags doesn’t always mean the generated code will be faster.

Before ending I’d like to thank Florian Weimer, Michael Matz, Dario Faggioli, Albert Astals and Dan Čermák for their valuable input on this project as well as Matěj Cepl, Ben Greiner and the rest of the authors of the great openSUSE python singlespec macros which are the inspiration of this project.