Deep down the FOSS: an unexpected journey

08/04/2023 │ Blog

Once upon a VegaFEM

It all began with a spark…

In my spare time, I decided to sharpen my skills and clear my mind by toying with VegaFEM, an open-source (BSD-3-Clause) middleware providing facilities for soft-body simulation in the realm of computer graphics. It has been developed by Jernej Barbič and his group over many years, yet it has no official release on any of the major Git hosting websites. It is distributed through the author's personal webpage, and copies found elsewhere are often the ephemeral work of isolated contributors. I decided to give it a go on my own with the latest version (4.0.0) and let my repository join the ranks of the numerous forks. Doing so with this library offers the opportunity to work on many aspects of DevOps, maintenance, and development.

The trigger: ARPACK

VegaFEM is a library that generously mixes C and C++ idioms, developed mostly on Linux systems, for which it offers a Makefile. Solution files are provided for Visual Studio; however, setting up the dependencies is up to the user, and not all of them are easily available. Among them is ARPACK.

ARPACK, short for Arnoldi Package, is a Fortran 77 library for solving eigenproblems on large sparse linear systems in a matrix-free fashion.

ARPACK is used as a solver in a utility application, the large modal deformation factory. This technique improves simulation time (= more FPS) by significantly reducing the number of unknowns (= fewer calculations)1. The gist of it is to approximate a complex motion by a linear combination of a few vectors2 (deformation modes, or eigenvectors). More on those technical aspects perhaps one day…
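In rough equation form, and with my own notation rather than VegaFEM's, the idea of this model reduction is:

$$ u(t) \approx U\,q(t), \qquad U \in \mathbb{R}^{3n \times r}, \quad q(t) \in \mathbb{R}^{r}, \quad r \ll 3n $$

where u gathers the displacements of the n mesh vertices, the columns of U are the precomputed deformation modes (eigenvectors), and q holds the handful of reduced coordinates that are actually simulated.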

Problem: ARPACK is not easy to come by on Windows, which clashed with my multi-OS CI pipeline plans.

Twists and turns

Replacing ARPACK?

I was aware of the Spectra library, which offers facilities similar to ARPACK's: a modern, C++-first take on the problem. However, adopting it would mean diving into the code to replace some components, besides adding a dependency on Eigen. Quite a big change, and a line I was not willing to cross at that point, since I aim to remain as close as possible to the source material.

ARPACK-NG to the rescue

The main website for ARPACK is (long?) dead; however, out of its ashes rose a common effort to maintain it: ARPACK-NG. Several major applications rely on it for computing eigenvalues and eigenvectors, such as Octave, Scilab, or Mathematica. The latest release of the repository was not so recent, dating from December 2020, but changes are still actively committed. To my surprise, the repository seems to support CMake as a build system generator.

VCPKG my old friend

Package managers are not standardized in the Fortran and C++ ecosystems. Unlike Rust or Node.js, one has many (too many?) choices for acquiring and integrating dependencies into a project. Bonus points for flexibility, yet a failing grade for uniformity.

I have enjoyed using vcpkg for a long time. It is particularly great on Windows because it tremendously simplifies consuming third-party libraries in C++ development. Besides, it is well integrated with Visual Studio and, more importantly, multiplatform.

I was already using vcpkg for other dependencies needed by VegaFEM, such as OpenGL, FreeGLUT, TBB, etc. Therefore, I decided to bring ARPACK to vcpkg instead of relying on other mechanisms, such as git submodules or CMake (ExternalProject/FetchContent), for this single dependency.
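For context, consuming a vcpkg-provided dependency from CMake roughly looks like the sketch below. The arpackng package and target names are lifted from the config file excerpt shown further down and may differ between versions; the project name and main.cpp are mine.

# Configure with the vcpkg toolchain file so that packages installed by vcpkg
# are visible to find_package(), e.g.:
#   cmake -B build -S . -DCMAKE_TOOLCHAIN_FILE=<vcpkg-root>/scripts/buildsystems/vcpkg.cmake
cmake_minimum_required(VERSION 3.16)
project(arpack_consumer CXX)

find_package(arpackng CONFIG REQUIRED)  # package name assumed from the config excerpt below

add_executable(main main.cpp)
target_link_libraries(main PRIVATE arpackng)  # imported target declared by the installed config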

A prior search told me I was not the first to look into it. The effort had been halted because upstream changes were needed before the contribution could be merged. Its author contributed significantly to ARPACK-NG to improve the CMake files, and thanks to their effort, my version of the port ran successfully on Linux (via WSL), and even on Windows!… For all (common) triplets? No, one triplet held stubbornly against the developer assault.

Completionism, to some extent

The port was not playing nice as a static library when tested on Windows (with the triplet x64-windows-static). The reason was found quickly, but mostly by luck. Being quite zealous, I had added the BLAS and LAPACK dependencies of ARPACK to my test CMakeLists myself, with the line target_link_libraries(main LAPACK::LAPACK BLAS::BLAS), because I did not trust the installed CMake targets for ARPACK to do so.

However, once I removed that line, the build failed at link time with a nasty undefined reference to some obscure Fortran function. My initial assumption was therefore that the installed configuration did not declare those dependencies. Digging further, I found out I was wrong:

# Summarized content
find_dependency(BLAS REQUIRED)
find_dependency(LAPACK REQUIRED)

add_library(arpackng SHARED IMPORTED)

set_target_properties(arpackng PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include"
  INTERFACE_LINK_LIBRARIES "BLAS::BLAS LAPACK::LAPACK"
)

Armed with this new insight, analyzing the (thinned out) compilation logs further allowed for a simple diff:

❌ /usr/bin/c++ -O2 -g -DNDEBUG CMakeFiles/main.dir/main.cpp.o -o main libarpack.a libopenblas.a liblapack.a -lm /usr/lib/x86_64-linux-gnu/libgfortran.so.5 -lgfortran -lquadmath
✔️ /usr/bin/c++ -O2 -g -DNDEBUG CMakeFiles/main.dir/main.cpp.o -o main libarpack.a liblapack.a -lm /usr/lib/x86_64-linux-gnu/libgfortran.so.5 libopenblas.a -lgfortran -lquadmath

The only difference was the position of OpenBLAS on the link line. Changing the declared order to LAPACK then BLAS fixed the issue. Indeed, link order matters, especially with static libraries. Little did I know that controlling the link order of static libraries with CMake is a complex topic, with a general solution only available from CMake 3.24 onwards.
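To make this concrete, here is a minimal consumer-side sketch of my own (not the upstream patch), assuming BLAS and LAPACK resolve to static libraries:

cmake_minimum_required(VERSION 3.24)
project(link_order_demo CXX)

find_package(LAPACK REQUIRED)
find_package(BLAS REQUIRED)

add_executable(main main.cpp)

# With static archives and a single-pass linker, LAPACK must come before BLAS,
# because liblapack.a references BLAS symbols that still need resolving.
target_link_libraries(main PRIVATE LAPACK::LAPACK BLAS::BLAS)

# Alternatively, from CMake 3.24 the libraries can be wrapped in a RESCAN link
# group (--start-group/--end-group with GNU ld), making the order irrelevant:
# target_link_libraries(main PRIVATE "$<LINK_GROUP:RESCAN,LAPACK::LAPACK,BLAS::BLAS>")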

The second problem was that the library built its tests by default, with no way to prevent it. Thankfully, the solution is straightforward: guard the tests behind a variable. To respect existing usage, it is on by default.
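In CMake terms, the guard boils down to something along these lines (a sketch of mine; the actual upstream variable and directory names may differ):

# On by default to preserve existing behaviour; consumers such as vcpkg can
# pass -DBUILD_TESTS=OFF to skip the test suite. Names are illustrative.
option(BUILD_TESTS "Build the arpack-ng tests" ON)

if(BUILD_TESTS)
  enable_testing()
  add_subdirectory(TESTS)  # illustrative test directory
endif()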

The macOS plateau

Happy with my findings, I want to bring them to the world by contributing back. Thus, I create one PR for each of those changes and wait, worry-free, for the CI to pass…

WHAAAT?!

The pull requests pass all jobs… all but the macOS ones.

All those jobs run until timeout, or do not even start. In an unfortunate turn of events, a search (actions/runner-images#6384) showed that the macos-latest image for GitHub Actions runners was switching from macOS 11 to macOS 12 during that period, with scheduled downtime. Alas, one shall not merge unless the holy CI turns a uniform green. After the outage, the CI runs again and… fails again for macOS. ARGHHH! The switch to macOS 12 introduced some newly found incompatibilities.

After a month with no sign of progress on a resolution, I decide to put on my get-it-done belt so as not to let my PRs join the shadow realm of stalled contributions. Sadly, I do not own such a machine at home, so I had to rely on the CI for feedback on my changes. Yes, this would be undesirable in a professional context, but it is on my personal time and I am not (that much) in a rush. At the same time, I discovered GitHub Codespaces, a nice tool which, from a single button press, starts a virtual machine in the cloud within seconds, with a VS Code interface and a readily checked-out repository. This makes simple multi-file edits much easier than setting up the repository and environment locally, a blessing for a contributing 🐝 like me.

When diving into the details of the workflow file, I noticed that the macOS variants, already using macos-latest, were upgrading the OS. Now it made sense why one of the steps was taking so long: 40 minutes, while all the other jobs finish in a matter of minutes. It sure kills momentum!… And IMHO, it somewhat defeats the purpose of a CI pipeline meant to test a fixed set of supported platforms.

OK, let's summarize: we have a CI job that upgrades the OS, and a failing step. Since the OS had already been upgraded by the time the problem appeared, the job was effectively running on macOS 12 or later either way, ruling out an OS version problem de facto (after checking that both ended up upgraded to the same version, of course). Looking at the new image release announcement for breaking changes or differences in the environment setup, it turns out I am not the only one affected, yay!

The macOS image changed the installation of Python 3.12 from Homebrew (as for all versions ≤ 3.11) to the system package manager. Effectively, this broke the Homebrew installation, which then required a repair.

The C++ standard chasm

A very strange issue occurred on Windows with Visual Studio, where switching to C++17 or above caused a previously working example build to fail. An investigation surfaced the following information.

Starting with the C++17 standard, the C header <complex.h> is mandated to include, and effectively be replaced by, the content of <complex>, its C++ standard counterpart. In substance, it becomes the C header wrapped up inside the std namespace. However, Microsoft does not fully implement the C99 standard, in which the complex number facilities are defined. The C complex types are not implemented as such in the Microsoft C Runtime library, and Microsoft recommends using platform-specific, non-standard, but binary-compatible alternatives instead.

Standard type                                   Microsoft type
float complex or float _Complex                 _Fcomplex
double complex or double _Complex               _Dcomplex
long double complex or long double _Complex     _Lcomplex

Microsoft is developing an open-source implementation of the C++ standard library on GitHub: microsoft/STL, which has picked up a lot of momentum over time. It is the most complete implementation of the C++23 standard library at the time of writing this document (2023/03/24)!

The issue is that the UCRT and the C++ standard library implementation are handled by different teams. So the Microsoft C++ standard library does not support the very C types recommended by Microsoft! It means that any library relying on those C types in its public API will fail when consumed from a C++ application or library! Unless… unless the C developers took the time to consider that their code would be used from C++17 onwards and tested their library accordingly. They would then, like me, discover the need to define an undocumented escape-hatch preprocessor definition to override the C++ standard clause above.

Understandably, instead of introducing non-standard types into the C++ standard library, the issue has been redirected to the documentation repository behind learn.microsoft.com/cpp, to document the undocumented escape hatch _CRT_USE_C_COMPLEX_H. Unfortunately, in order to use the recommended types above, one necessarily spills the complex functions into the global namespace for every downstream developer.
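On the build side, the escape hatch is just a preprocessor definition. In a CMake-based project it could be applied along these lines (a sketch of mine; the project and target names are illustrative):

cmake_minimum_required(VERSION 3.16)
project(complex_escape_hatch CXX)

add_executable(main main.cpp)
target_compile_features(main PRIVATE cxx_std_17)

# Under MSVC, keep the C-style complex types from <complex.h> available even in
# C++17 mode, instead of letting the header be replaced by <complex>.
if(MSVC)
  target_compile_definitions(main PRIVATE _CRT_USE_C_COMPLEX_H)
endif()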

Your help would be greatly appreciated

While on my streak, I was asked to see whether I could do something about enabling the tests in the CI on Windows. One quick logging session and several CI runs later, TADA 🎉 The CI is fixed on Windows, for both the regular tests and the MPI ones.

Back to VCPKG

On the last stretch of my stroll turned marathon, the arpack-ng port for vcpkg was published and merged without difficulty, after a new upstream release had been cut so that the right versioning scheme could be used for the vcpkg database.

Epilogue

Through this adventure I was led to interact with several major repositories, implementing several fixes and hopefully improving (on a small scale) the general state of the C/C++ ecosystem, by making ARPACK simple to build across platforms and by pushing for a clarification of the C/C++ complex types situation on Windows. Now that this side quest is complete, time to get back to the main storyline 😄