Rust is incompatible with LLVM, at least partially
Lately we've been compiling Rust 1.59.1 on riscv64gc (referred to as rv64 in this article). We are packaging for PLCT's Arch Linux RISC-V project, and, in keeping with the Arch way, the tools in our toolchain are always the latest releases. Still, we hit a strange compile error:
1 | error: unknown directive |
After performing some simple queries (cs.github.com is awesome!), we managed to locate the following code (ref):
1 | //! Shared RISC-V intrinsics |
pause, fence and .insn
TL;DR:
pause is fence w,0, which is .insn i 0x0F, 0, x0, x0, 0x010. pause is provided by the Zihintpause extension. Although the widely adopted riscv64gc does not contain Zihintpause, fence w,0 won't trigger a SIGILL, because it is treated much like a nop instruction, so we can use it safely without worrying about compatibility.
Firstly, reading the comments, we can infer that this .insn assembly acts as the pause instruction. This surprised me a bit because, AFAIK, the HINT feature in the RISC-V ISA had always been in a reserved state (at least until December 2021):
No standard hints are presently defined. We anticipate standard hints to eventually include memory-system spatial and temporal locality hints, branch prediction hints, thread-scheduling hints, security tags, and instrumentation flags for simulation/emulation.
It's not hard to find out, though, that the pause instruction was introduced by the Zihintpause extension as an alias for fence w,0. Previously, we had to use nop to polyfill the pause function implemented for other architectures:
1 | From 4e559dabe28e57ee27cb45c8297e1e387beed1d3 Mon Sep 17 00:00:00 2001 |
However, in RISC-V, the nop instruction simply stands for "no operation". It gives the CPU no hint that it may relax, so it saves no energy (and in fact wastes it). So, it's definitely better to replace the fake pause (i.e. nop) with the real pause, as provided by the Zihintpause extension.
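The difference matters mostly in spin-wait loops, which is where a pause-style hint is meant to be used. Below is a minimal sketch of such a loop using the standard library's std::hint::spin_loop. The helper name spin_until_set is hypothetical, introduced only for illustration, and the sketch makes no assumption about which machine instruction the hint lowers to on a given RISC-V toolchain.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Busy-wait until `flag` becomes true and return how many times we spun.
/// (Hypothetical helper, not taken from the Rust sources discussed here.)
fn spin_until_set(flag: &AtomicBool) -> u64 {
    let mut spins = 0u64;
    while !flag.load(Ordering::Acquire) {
        // Tell the CPU we are in a spin-wait loop; on architectures with a
        // pause-style instruction this is where it would be emitted.
        hint::spin_loop();
        spins += 1;
    }
    spins
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let setter = {
        let flag = Arc::clone(&flag);
        // Another thread sets the flag that the main thread is waiting on.
        thread::spawn(move || flag.store(true, Ordering::Release))
    };
    let spins = spin_until_set(&flag);
    setter.join().unwrap();
    println!("flag observed after {spins} spins");
}
```

The number of spins varies from run to run; the point is only that the loop body carries a hint rather than a plain no-op.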
One may object: hey, this extension is not part of the riscv64gc extension set! This argument is valid, as riscv64gc stands for riscv64imafdc_Zicsr_Zifencei (it used to be riscv64imafdc, before the I baseline was split into I + Zicsr + Zifencei). Let's look at the pause instruction in detail:
Before doing this, you need a RISC-V runtime environment, either through QEMU or by buying a board from SiFive.
1 | cat pause.asm |
Uh-oh, seems like the assembler does not know this instruction! That's because 1) the pause instruction is too new; 2) I'm using riscv64gc, which does not include Zihintpause. Never mind, we can still use fence w,0 as noted in the spec:
1 | cat pause.asm |
Not good. But at least the Rust version should work, since Rust has RISC-V as a tier-2 target and managed to get the release through its CI tests, right?
1 | cat pause.asm |
Hmm, it seems the .insn i directive works (it assembles, at least), but what does this fence w,unknown stand for? Let's have a look at the spec:

Shouldn't it be fence w,0? Actually, it is fence w,0, as denoted by the hex 0100000f:

IMO this is a subtle bug (used to be a feature) of the disassembler, and as we can see, this is already fixed in the llvm toolchain:
1 | diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp |
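To make the unknown operand less mysterious, here is a small sketch that decodes the predecessor and successor sets of a fence word and prints them. It is not LLVM's printer code: the function names and the zero_as parameter are hypothetical, and exist only to contrast printing an empty set as "unknown" with printing it as "0". The field positions (pred in bits 27:24, succ in bits 23:20, each ordered i, o, r, w) follow the FENCE encoding in the base ISA.

```rust
/// Extract (pred, succ) from a FENCE instruction word, or None if the word
/// is not a FENCE (opcode 0x0f, funct3 0).
fn fence_fields(word: u32) -> Option<(u32, u32)> {
    let opcode = word & 0x7f;
    let funct3 = (word >> 12) & 0x7;
    if opcode != 0x0f || funct3 != 0 {
        return None;
    }
    Some(((word >> 24) & 0xf, (word >> 20) & 0xf))
}

/// Render a 4-bit fence set as a subset of "iorw"; an empty set is rendered
/// as `zero_as` (hypothetical knob to mimic the old and new output).
fn fence_arg(bits: u32, zero_as: &str) -> String {
    if bits == 0 {
        return zero_as.to_string();
    }
    let mut out = String::new();
    for (mask, ch) in [(8, 'i'), (4, 'o'), (2, 'r'), (1, 'w')] {
        if bits & mask != 0 {
            out.push(ch);
        }
    }
    out
}

/// Produce a textual form such as "fence w,0" for a FENCE word.
fn disassemble_fence(word: u32, zero_as: &str) -> Option<String> {
    let (pred, succ) = fence_fields(word)?;
    Some(format!(
        "fence {},{}",
        fence_arg(pred, zero_as),
        fence_arg(succ, zero_as)
    ))
}

fn main() {
    let word = 0x0100_000f;
    // Empty successor set shown as "unknown": prints "fence w,unknown"
    println!("{}", disassemble_fence(word, "unknown").unwrap());
    // Empty successor set shown as "0": prints "fence w,0"
    println!("{}", disassemble_fence(word, "0").unwrap());
}
```

Either way the word is the same 0100000f; only the textual rendering of the empty successor set differs.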
After digging into the underlying mechanism of the pause instruction, we can easily conclude that it will not trigger a SIGILL, as the fence instruction is part of the RV32I baseline instruction set, hence available in all valid RISC-V instruction sets.
1 | cat pause.c |
In conclusion, it's 100% safe to replace pause with .insn i 0x0F, 0, x0, x0, 0x010 to make the code compile, regardless of what RISC-V extensions you are using.
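The equivalence can also be checked with a little arithmetic. The following is a minimal sketch, assuming the standard I-type field layout (imm[11:0] in bits 31:20, rs1 in 19:15, funct3 in 14:12, rd in 11:7, opcode in 6:0) and the operand order opcode, funct3, rd, rs1, simm12 used by the .insn i line above. The helper encode_i_type is a hypothetical name, not part of any toolchain.

```rust
/// Pack the operands of `.insn i opcode, funct3, rd, rs1, simm12` into a
/// 32-bit I-type instruction word. (Hypothetical helper for illustration.)
fn encode_i_type(opcode: u32, funct3: u32, rd: u32, rs1: u32, imm12: u32) -> u32 {
    ((imm12 & 0xfff) << 20)
        | ((rs1 & 0x1f) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1f) << 7)
        | (opcode & 0x7f)
}

fn main() {
    // .insn i 0x0F, 0, x0, x0, 0x010
    let pause = encode_i_type(0x0f, 0, 0, 0, 0x010);
    // prints 0x0100000f
    println!("{pause:#010x}");
}
```

The immediate 0x010 lands in bits 31:20, which sets only bit 24, the W bit of the predecessor set, and that is exactly fence w,0.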
Tracing the Problem
Ah, I know someone must already be complaining: you've written so much analysis, but how is it related to the error that occurred when compiling Rust? Actually, the direct answer to this question is simple and naive. Let's take pause.c, compile it with clang 13, and see what happens:
1 | clang -v |
Obviously, clang 13.0.1 still doesn't support the .insn directive. Support was added in clang 14.0, as we can see from the target branch of llvm commit 28387979: [RISCV] Initial support .insn directive for the assembler. This commit was imported into Rust in pull request #91528.
Still, one small question remains: the Rust PR (#91528) was merged months before the initial release of llvm 14.0.0-rc1. Sure, the Rust folks are always keen on trying nightly, bleeding-edge stuff, but how could they get hold of the 14.x llvm toolchain before upstream had even released it?
After investigating PR #91528, things became clear. The Rust team maintains a fork of llvm at rust-lang/llvm-project, and they cherry-picked commit 28387979 so that the .insn directive compiles with llvm 13.
By maintaining a fork and constantly modifying it or cherry-picking on demand, the Rust team can benefit from unreleased changes, or add support for older OSes/platforms that upstream does not support. However, this is definitely not good news for downstream packagers:
- From time to time you can't compile Rust with the upstream toolchain: whenever there are cherry-picked commits, or whenever the Rust team has added new features to their fork and not yet submitted them upstream.
- If you compile rust-lang/llvm-project first, and use the resulting llvm toolchain (let's call it rust-llvm) to compile Rust itself, then the rust package will conflict with the llvm package: rustc may need to link to .so files provided by llvm (or rust-llvm, depending on which one you use to compile rustc), and those .so files may have different ABIs (Application Binary Interfaces), causing incompatibility. Also,
  - Arch Linux only provides shared builds, so static linking is not a preferable way to solve this.
- Splitting the rust package into rust-llvm + rust won't help. The linked .so files need to be present at run-time, so rust-llvm would be put into rust's depends array, and the conflict would remain unresolved.
Currently, we can still sidestep the problem by letting the build fail for a while and waiting for newer llvm releases that contain the features required to build rustc. Admittedly, this solution is not elegant, and things might get worse if the difference between rust-llvm and llvm becomes so large that rustc can no longer be compiled with upstream llvm. But we can't afford to make rust incompatible with llvm: there are far too many packages that depend on both rust and llvm now. Fortunately, considering the Rust team's claim about rust-llvm, that they will always attempt to submit their new features upstream, maybe we don't need to worry too much.