UPDATE 2023-05-11: Finally we have builtin inline subword atomic support in GCC!

So we won't be bothered by this issue anymore.


The original blog post:

TL;DR: see "Wrap Up" and "Solution".

Problem

Recently, when we're working on a series of package rebuild pipelines triggered by a glibc update (from 2.33 to 2.34), we discovered that some previously compiling packages, mostly configured to use CMake, refuse to compile now. Their error logs look like this:

1
2
3
/usr/bin/ld: libbson-1.0.so.0.0.0: undefined reference to `__atomic_exchange_1'
/usr/bin/ld: libbson-1.0.so.0.0.0: undefined reference to `__atomic_compare_exchange_1'
collect2: error: ld returned 1 exit status

This is quite strange to us, because we used to think we have got rid of this for all by manually appending set(THREADS_PREFER_PTHREAD_FLAG ON) to these packages' CMakeLists.txt, despite replacing all -lpthread with -pthread. This was done in a per-package manner, modifying the patch to meet the need of the tech stack used by the specific package, and we are sure that they used to work before the glibc upgrade.

Before proceeding, you need to know the difference between -pthread and -lpthread, and that -pthread is the most preferable way if you want to link to the pthread library, at least for glibc<=2.33.

So, what's happening? Is there anything vital got changed in this glibc upgrade?

Cause

After reading the release note of glibc 2.34, we did notice something related:

...all functionality formerly implemented in the libraries libpthread, libdl, libutil, libanl has been integrated into libc. New applications do not need to link with -lpthread, -ldl, -lutil, -lanl anymore. For backwards compatibility, empty static archives libpthread.a, libdl.a, libutil.a, libanl.a are provided, so that the linker options keep working.

Hmm, good, sounds like we don't need -pthread anymore. Actually, it appeared to us that CMake thinks the same. After running cmake for the failing packages, we can confirm that nothing looks like -pthread is appended to either CXXFLAGS or LDFLAGS. However, if forced to compile with -pthread, the previous-mentioned link error disappears. In other words, applying this patch fixes the error.

1
2
- make
+ CFLAGS="-pthread $CFLAGS" CXXFLAGS="-pthread $CXXFLAGS" make

At this point, every clue we have gathered so far seems to indicate a bug inside cmake:

  • Other tools work fine
  • set(THREADS_PREFER_PTHREAD_FLAG ON) appears to be "broken"
  • Manually appending -pthread fixes the issue, so cmake failed to recognize that -pthread is necessary

Root Cause

Blame CMake?

So we dig into CMake's source code to see what happened. To our surprise, we didn't notice any change to the detection code it used for determining whether -pthread is necessary. CMake's detection code, at the time of writing, looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#include <pthread.h>
static void* test_func(void* data) {
return data;
}
int main(void) {
pthread_t thread;
pthread_create(&thread, NULL, test_func, NULL);
pthread_detach(thread);
pthread_cancel(thread);
pthread_join(thread, NULL);
pthread_atfork(NULL, NULL, NULL);
pthread_exit(NULL);
return 0;
}

It turned out that according to the glibc upgrade, this detection code now compiles without additional arguments:

1
2
3
$ gcc test_pthread.c
$ echo $?
0

As far as I can tell, this does not look like a CMake bug. The expected behavior, which is exactly what we have seen, is that if the test code can be compiled without -pthread, then the argument should not be appended to the command like (or {C,CXX}FLAGS, correspondingly). This lead to another question: why those packages are failing now, provided that the detection code is unchanged, and it used to work properly?

Dive into Source Code

Let's take the package mongo-c-driver as an example. As of version 1.21.1, we can see the code listed below in its src/libbson/src/bson/bson-atomic.h (Feel familiar with this path? Remember the libbson-1.0.so.0.0.0: undefined reference error log mentioned before?):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#define DECL_ATOMIC_STDINT(Name, VCSuffix) \
DECL_ATOMIC_INTEGRAL (Name, Name##_t, VCSuffix)

#define DECL_ATOMIC_INTEGRAL(NamePart, Type, VCIntrinSuffix) \
static BSON_INLINE Type bson_atomic_##NamePart##_fetch_add ( \
Type volatile *a, Type addend, enum bson_memory_order ord) \
{ \
DEF_ATOMIC_OP (BSON_CONCAT (_InterlockedExchangeAdd, VCIntrinSuffix), \
__atomic_fetch_add, \
__sync_fetch_and_add, \
ord, \
a, \
addend); \
} // Omitted many lines...

As we can tell from libbson's source code, it attempts to use the __atomic_* builtins of GCC, like __atomic_fetch_add and __atomic_compare_exchange. We can construct some test cases to check whether we're right:

1
2
3
4
5
// test1.c
int main() {
char u = 0, v = 1;
__atomic_exchange_n(&u, &v, __ATOMIC_RELAXED);
}
1
2
3
4
5
// test2.c
int main() {
short u = 0, v = 1;
__atomic_exchange_n(&u, &v, __ATOMIC_RELAXED);
}
1
2
3
4
5
// test4.c
int main() {
int u = 0, v = 1;
__atomic_exchange_n(&u, &v, __ATOMIC_RELAXED);
}
1
2
3
4
5
// test8.c
int main() {
long long u = 0, v = 1;
__atomic_exchange_n(&u, &v, __ATOMIC_RELAXED);
}

Here, we use the gcc built-in __atomic_* because libbson used it. Replacing this with std::atomic<T> (and compile it with g++) yields identical results correspondingly. You may also use uint*_t provided by the linux/types.h header, as their sizes are promised to be the same across all architectures.

Compile results:

1
2
3
4
5
6
7
8
9
10
$ gcc test1.c
/usr/bin/ld: /tmp/ccNst0x1.o: in function `.L0 ':
test1.c:(.text+0x34): undefined reference to `__atomic_exchange_1'
collect2: error: ld returned 1 exit status
$ gcc test2.c
/usr/bin/ld: /tmp/ccvK2EbX.o: in function `.L0 ':
test2.c:(.text+0x36): undefined reference to `__atomic_exchange_2'
collect2: error: ld returned 1 exit status
$ gcc test4.c
$ gcc test8.c

If you are kind of familiar with C++, you may have heard that codes using atomic operations might need to link libatomic (I remembered once reading this, but I can't find it now). It turned out that if we provide -latomic to gcc, the code compiles:

1
2
$ gcc test1.c -latomic
$ gcc test2.c -latomic

And this also works for -pthread:

1
2
$ gcc test1.c -pthread
$ gcc test2.c -pthread

Smart readers might have felt something unusual:

First, as we can see, gcc compiles test4.c and test8.c successfully without -latomic or -pthread, but it couldn't handle the atomic operations in test1.c and test2.c without making a call to libatomic:

1
2
3
4
5
$ gcc test1.c -latomic -S
$ gcc test4.c -S
$ cat test1.s | grep atomic
call __atomic_exchange_1@plt
$ cat test4.s | grep atomic

This is partially explained in a GitHub issue (riscv-collab/riscv-gcc#12). Till April 2022, i.e. when this blog is written, gcc does not support inlining subword atomics, and this is the reason why -latomic must be presented when invoking gcc.

But what about -pthread? Why -pthread works either? I mean, libpthread itself does not provide those atomic symbols, right? Actually, if you use ldd to check a.out generated by gcc test1.c -pthread, you will notice that libatomic is linked, instead of libpthread as we are expecting from -pthread. Changing -pthread with -lpthread won't work, so gcc must have done something internally for -pthread despite linking libpthread. In order to figure out the difference, we have to check the gcc spec:

1
2
$ gcc -dumpspecs | grep pthread
%{pthread:-lpthread} %{shared:-lc} %{!shared:%{profile:-lc_p}%{!profile:-lc}} %{pthread:--push-state --as-needed -latomic --pop-state}

Oh, the answer finally reveals: gcc silently append --as-needed -latomic if -pthread is provided. This also explains why changing -lpthread to -pthread works.

The reason why gcc decided to link libatomic when -pthread is provided had been lost in a squashed commit. My guess is that they tried to cover up the subword atomic pitfall using this strategy. (UPDATE: aswaterman's initial fix did not limit the scope to -pthread. The limitation was added one week later.) Generally, atomics are used with multi-threading, and you use pthread in such cases. However, one may use atomics without pthread, like when writing SIGNAL handlers. So IMO this workaround is not good enough, and should be replaced by subword atomic inline support.

UPDATE: We discussed this with some gcc RISC-V devs, and it turned out that when compiling gcc, at stage 1 it may not recognize libatomic (i.e. -latomic may not work) if you are compiling it with a gcc that uses the old spec So you'll have to set --with-spec at stage 1 to workaround this. Hopefully there would be an --always-link-libatomic configuration flag in the future.

Fun facts: gcc once had added --as-needed -latomic to its LIB_SPEC unconditionally, in this commit (2017-02-01):

1
2
3
+ #undef LIB_SPEC
+ #define LIB_SPEC GNU_USER_TARGET_LIB_SPEC \
+ " " LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION \

But later in the final port commit submitted to gcc (2017-02-07; svn r245224), they limited the change to -pthread:

1
2
-   " " LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION \
+ " %{pthread:" LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION "}"

The reason of making such limitation remains unknown.

Wrap Up

glibc moved libpthread into libc, and CMake's pthread sample code compiles directly. Hence, CMake thinks there's no need to provide -pthread when invoking gcc / g++. Since we used to patch such packages with -pthread instead of -latomic, and are actually relying on -pthread's side effect (--as-needed -latomic), this led to the consequence that neither -pthread nor -latomic is provided, so subword atomics refuse to compile as libatomic is not linked. For ways of fixing this problem, please refer to the Solution chapter.

Also note that the dependence on libatomic causes another problem. Some packages / frameworks, e.g. JUCE insists that its atomic wrap only accepts lock-free values. However, when a program is compiled, the compiler cannot know whether a call to libatomic will be lock-free or not (actually, the emitted sequence will be lock-free, but the compiler doesn't know). Hence, the compiler will return false for std::atomic<bool>::always_lock_free, and this also happens for the macro ATOMIC_BOOL_LOCK_FREE defined in atomic.h. As a result, if you use juce::Atomic<bool>, some static asserts will fail, and the package refuses to compile.

Solution

Solutions vary according to the scale of projects.

  • As a developer / user on behalf of yourself, export CFLAGS="$CFLAGS --as-needed -latomic" and likewise for CXXFLAGS solve the problem. As an alternative, you can also patch the Makefile file, and edit the flags inside it.
  • As a packager maintaining multiple packages with the same toolchain, you may want to write a module / universal patch. For projects with CMake, you can use LLVM's CheckAtomic.cmake, which (if you are using a newer version) has taken subword atomics into consideration.
  • As a developer maintaining a relatively large project that has gained some users, you may need to consider the compatibility with clang and compiler-rt. Clang should be able to handle -latomic, but better perform a double-check. You can temporarily rely on the side-effect of gcc -pthread (which is not recommended), or edit your configure script manually to disable -latomic when clang and compiler-rt are detected.
  • As a developer of a widely-used toolchain project, despite considering clang and compiler-rt, compatibility with older compilers / linkers that do not support --as-needed and even -pthread might need to be considered. CMake reverted their -pthread fix on this because a compiler named XL does not support -pthread, and interprets it as -p -t hread, returning zero (oops! false negative) as its exit code.
  • The final approach of solving this is to add subword atomic inlining support into gcc. There's a pending patch regarding this, but this patch only adds support for __atomic_fetch_add, so the problem would not be mitigated (but this patch indicates a good start!).

Complicated solutions may increase the burden on project developers / maintainers, and their willingness to port their projects to RISC-V might be reduced. Aiming to tackle the problem, we have proposed to change LIB_SPEC back to its originally patched form to mitigate this issue partially (still not lock-free), and hope that the subword atomic pitfall can be solved completely in the future.

UPDATE: After discussing this with gcc RISC-V devs, we reached the consensus that maybe a gcc configuration argument --always-link-libatomic can be added.

UPDATE2: This breaks the bootstrap process of GCC. GCC compiles itself twice to get rid of e.g. dirty compiling environments stuffs. Modifying the spec string too early causes glitches, like libatomic calls itself infinitely. So we have to use sed to replace the spec string inside the first-stage GCC binary carefully, filling blanks (U+0020) to make sure old and new spec strings have the same length.

来源:https://blog.jiejiss.com/