Rust is incompatible with LLVM, at least partially
Lately we've been compiling rust 1.59.1 on
riscv64gc (will be referred to as
rv64 in this article). We are packaging for PLCT's Arch Linux RISC-V project, and thanks to Arch wisdom, the tools in our toolchain are always the latest. Still, we ran into a strange compile error:
error: unknown directive
After running some simple code searches (cs.github.com is awesome!), we managed to locate the following code (ref):
//! Shared RISC-V intrinsics
pause, fence and .insn
fence w,0, which is
.insn i 0x0F, 0, x0, x0, 0x010.
pause is provided by the
Zihintpause extension. Though the widely adopted
riscv64gc does not contain this extension,
fence w,0 won't trigger a
SIGILL, because it is treated like a
nop instruction, so we can use it safely without worrying about compatibility.
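To make the discussion concrete, here is a minimal sketch of how such an intrinsic could be written. It is not the verbatim upstream source: the names pause_hint and spin_for are made up for illustration, and the non-RISC-V fallback is our own assumption so that the snippet builds on any host.

```rust
/// Hypothetical wrapper: hint that the current hart is busy-waiting.
/// On RISC-V the raw encoding is emitted through `.insn`, so the assembler
/// does not need to know the `pause` mnemonic.
#[cfg(any(target_arch = "riscv32", target_arch = "riscv64"))]
#[inline]
pub fn pause_hint() {
    unsafe {
        core::arch::asm!(".insn i 0x0F, 0, x0, x0, 0x010", options(nomem, nostack));
    }
}

/// Assumed fallback for every other architecture: the portable spin-loop hint.
#[cfg(not(any(target_arch = "riscv32", target_arch = "riscv64")))]
#[inline]
pub fn pause_hint() {
    core::hint::spin_loop();
}

/// Hypothetical helper: issue the hint `iterations` times and report the count.
pub fn spin_for(iterations: u32) -> u32 {
    let mut done = 0;
    for _ in 0..iterations {
        pause_hint();
        done += 1;
    }
    done
}
```

Whether the RISC-V branch assembles depends on the LLVM that rustc was built against, which is exactly the issue this article is about.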
First, by reading the comments, we can infer that this
.insn assembly acts as the
pause instruction. This surprised me a bit because, AFAIK, the HINT feature in the RISC-V ISA had always been in the reserved state (at least until December 2021):
No standard hints are presently defined. We anticipate standard hints to eventually include memory-system spatial and temporal locality hints, branch prediction hints, thread-scheduling hints, security tags, and instrumentation flags for simulation/emulation.
It's not hard to see, though, that the
pause instruction is introduced by the
Zihintpause extension, as an alias for
fence w,0. Previously, we had to use
nop to polyfill the
pause function implemented for other architectures:
From 4e559dabe28e57ee27cb45c8297e1e387beed1d3 Mon Sep 17 00:00:00 2001
However, in RISC-V, the
nop instruction simply stands for "no operation" and gives the CPU no hint that it may relax, so it saves no energy (and wastes some instead). So it's definitely better to replace the stand-in
nop with the real
pause, as provided by the Zihintpause extension.
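For context, this is roughly where such a hint ends up in practice: inside a busy-wait loop. The sketch below uses Rust's portable std::hint::spin_loop() rather than any architecture-specific instruction. Which instruction it lowers to (if any) depends on the target and toolchain, and the names wait_until_set and demo are hypothetical.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Busy-wait until `flag` becomes true and return how many times we spun.
pub fn wait_until_set(flag: &AtomicBool) -> u64 {
    let mut spins = 0u64;
    while !flag.load(Ordering::Acquire) {
        // Tell the CPU we are only polling, so it may slow down or yield
        // resources instead of burning cycles on plain no-ops.
        hint::spin_loop();
        spins += 1;
    }
    spins
}

/// Hypothetical demo: a worker thread sets the flag while the caller spins.
pub fn demo() -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let worker = {
        let flag = Arc::clone(&flag);
        thread::spawn(move || flag.store(true, Ordering::Release))
    };
    wait_until_set(&flag);
    worker.join().unwrap();
    flag.load(Ordering::Acquire)
}
```

The loop is correct with or without the hint; the hint only changes how politely the core waits.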
One may say that, hey, this extension is not part of the
riscv64gc extension set! This argument is valid, as
riscv64gc stands for
riscv64imafdc_Zicsr_Zifencei (used to be
riscv64imafdc, before Zicsr and Zifencei were split out of the
I baseline). Let's look at the
pause instruction in detail:
Before doing this, you need a RISC-V run-time environment, either through QEMU or by buying a board from SiFive.
Uh-oh, seems like the assembler does not know this instruction! That's because 1) the
pause instruction is too new; 2) I'm using
riscv64gc, which does not include
Zihintpause. Never mind, we can still use
fence w,0 as noted in the spec:
Not good. But at least the Rust version should work, since RISC-V is a tier-2 target there and the release managed to pass the CI tests, right?
Hmm, seems like the
.insn i is working (compiling, at least), but what does this
fence w,unknown stand for? Let's have a look at the spec:
Shouldn't it be
fence w,0? Actually, it is
fence w,0, as denoted by the hex
IMO this is a subtle bug (used to be a feature) of the disassembler, and as we can see, this is already fixed in the llvm toolchain:
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
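The printing issue is easy to model. The following is a rough Rust sketch, not LLVM's actual C++ code: fence_arg and disassemble_fence are hypothetical names, and the empty parameter stands in for whatever text the printer emits when no bit of the set is present.

```rust
/// Render a 4-bit FENCE predecessor/successor set as text.
/// `empty` is the placeholder used when no bit is set (hypothetical parameter).
pub fn fence_arg(bits: u32, empty: &str) -> String {
    // Bit values within the 4-bit set: I = 8, O = 4, R = 2, W = 1.
    let mut out = String::new();
    for (mask, ch) in [(8, 'i'), (4, 'o'), (2, 'r'), (1, 'w')] {
        if bits & mask != 0 {
            out.push(ch);
        }
    }
    if out.is_empty() {
        out.push_str(empty);
    }
    out
}

/// Extract pred/succ from a FENCE instruction word and print `fence pred,succ`.
pub fn disassemble_fence(word: u32, empty: &str) -> String {
    let pred = (word >> 24) & 0xF; // bits 27:24
    let succ = (word >> 20) & 0xF; // bits 23:20
    format!("fence {},{}", fence_arg(pred, empty), fence_arg(succ, empty))
}
```

With empty set to "unknown", the word 0x0100000f renders as fence w,unknown. With "0" it renders as fence w,0. The instruction bits are identical in both cases; only the text differs.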
After digging into the underlying mechanism of the
pause instruction, we can easily conclude that it will not trigger a
SIGILL, as the
fence instruction is part of the
RV32I baseline instruction set, hence available in all valid RISC-V instruction sets.
In conclusion, it's 100% safe to replace
pause with
.insn i 0x0F, 0, x0, x0, 0x010 to make the code compile, regardless of what RISC-V extensions you are using.
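We can also double-check the encoding by hand. Assuming the standard I-type layout (imm[11:0], rs1, funct3, rd, opcode, from high bits to low bits), the hypothetical helper below packs the operands of the .insn i directive into a 32-bit instruction word.

```rust
/// Pack the operands of `.insn i opcode, funct3, rd, rs1, imm` into an
/// I-type instruction word. Hypothetical helper for illustration only.
pub fn encode_i_type(opcode: u32, funct3: u32, rd: u32, rs1: u32, imm: u32) -> u32 {
    ((imm & 0xFFF) << 20)          // imm[11:0] -> bits 31:20
        | ((rs1 & 0x1F) << 15)     // rs1       -> bits 19:15
        | ((funct3 & 0x7) << 12)   // funct3    -> bits 14:12
        | ((rd & 0x1F) << 7)       // rd        -> bits 11:7
        | (opcode & 0x7F)          // opcode    -> bits 6:0
}
```

encode_i_type(0x0F, 0, 0, 0, 0x010) evaluates to 0x0100000f. Opcode 0x0F is MISC-MEM, which is where fence lives, and the immediate 0x010 sets only the W bit of the predecessor set while leaving the successor set empty, i.e. fence w,0.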
Tracing the Problem
Ah, I know someone must already be complaining: you've written so much analysis, but how is it related to the error that occurred when compiling Rust? Actually, the direct answer to this question is simple and naive. Let's take
pause.c, compile it with clang 13, and see what happens:
Obviously, clang 13.0.1 still doesn't support the
.insn directive. Support is only added in clang 14.0, as we can see from the target branch of llvm commit
28387979: [RISCV] Initial support .insn directive for the assembler. This was imported into Rust in this pull request (#91528).
Still, one small question remains: the Rust PR (
#91528) was merged months before the initial release of llvm
14.0.0-rc1. Sure, Rust folks are always keen on trying nightly, bleeding-edge stuff, but how could they grab the 14.x llvm toolchain before upstream had even released it?
After investigating PR
#91528, things become clear. The Rust team is maintaining a fork of llvm at
rust-lang/llvm-project, and they cherry-picked commit
28387979 to make the
.insn stuff compile when using llvm 13.
By maintaining a fork and constantly modifying / cherry-picking on demand, the Rust team is able to benefit from unreleased changes, or add support for older OS/platforms that are not supported by the upstream. However, this is definitely not good news for downstream packagers:
- From time to time you can't compile Rust with the original toolchain: whenever there are cherry-picked commits, or whenever the Rust team adds new features to their fork and fails to submit them upstream quickly enough.
- If you compile
rust-lang/llvm-project first, and use the compiled llvm toolchain (let's call it
rust-llvm) to compile Rust itself, then the
rust package would conflict with the llvm package (
rustc may need to link to
.so files provided by llvm or
rust-llvm, depending on which one you used to compile
rustc), and the
.so files may have different ABIs (Application Binary Interfaces), causing incompatibility. Also,
- Arch Linux only provides shared build, so static linking is not a preferable way to solve this.
- Splitting the
rust package won't help. That's because the linked
.so files need to be present at run time, so
rust-llvm will be put into the
depends array, which fails to resolve the conflict.
Currently, we can still hide the problem by letting the build fail for some time and waiting for newer llvm releases that contain the features required to build
rustc. Sure, this solution is not elegant, and things might get worse if the difference between rust-llvm and upstream
llvm becomes so large that it's impossible to compile
rustc with the upstream
llvm. But we can't take on the burden of making
rust incompatible with
llvm -- there are way too many packages that depend on both rust and
llvm now. Fortunately, considering the Rust team's claim about
rust-llvm, that they will always attempt to submit their new features upstream, maybe we don't need to worry too much.