This article has moved to my new substack: https://fprox.substack.com/p/risc-v-vector-extension-in-a-nutshell-part-3.
Personal blog on computer architecture, RISC-V, technical coding challenges, problems and sometimes solutions. Mathematics, Arithmetic, Cryptography, Compilers, Floating-point, Compression or whatever interests me.
RISC-V Vector Extension in a Nutshell (Part 2): arithmetic operations
This article has moved to my new substack: https://fprox.substack.com/p/risc-v-vector-extension-in-nutshell
RISC-V Vector Extension in a Nutshell (Part 1)
New option for RISC-V Vector performance simulation
There has been some news since I published "Performance simulation of RISC-V vector extension using GEM5".
Rivos Inc (a recent contender in the RISC-V race) has been working on extending GEM5 to support RVV.
More information is available in this post on the gem5-dev mailing list, and the source code is accessible on Rivos's GitHub page: https://github.com/rivosinc/gem5/commits/rivos/dev/joy/initial_RVV_support.
Their port does not seem to be directly related to the Cristobal Ramirez / PLCT effort, but you should still be able to follow the directions given in my initial article.
Performance simulation of RISC-V vector extension using GEM5
We will be using a fork of Cristobal Ramirez Lazo's gem5 fork, namely https://github.com/plctlab/plct-gem5. This fork is quite active and has been updated to RVV 1.0. The initial fork extends GEM5 with RVV support and a vector processing unit extension (and its GEM5 configuration).
Building gem5 for RISC-V
# on Ubuntu 20.04
sudo apt install build-essential git m4 scons zlib1g zlib1g-dev \
    libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \
    python3-dev python3-six python-is-python3 libboost-all-dev pkg-config
git clone https://github.com/plctlab/plct-gem5.git
cd plct-gem5
# the -j option sets the number of parallel jobs for the build;
# it can be tuned to your machine
scons build/RISCV/gem5.opt -j3 --gold-linker
Execution
${GEM5_DIR}build/RISCV/gem5.opt ${GEM5_DIR}configs/example/riscv_vector_engine.py \
    --cmd="$program $program_args"
Note
References:
- gem5 fork by PLCT Lab (still active) with RVV support: https://github.com/plctlab/plct-gem5
- Cristobal Ramirez Lazo's gem5 fork with initial RISC-V vector support: https://github.com/RALC88/gem5/tree/develop
- gem5 source code repository mirror on GitHub: https://github.com/gem5/gem5
- gem5 project task to track RISC-V vector upstream support: https://gem5.atlassian.net/browse/GEM5-618
- Tutorial on gem5: https://www.cs.sfu.ca/~ashriram/Courses/CS7ARCH/tutorials/gem5/index.html
Programming with RISC-V Vector extension: how to build and execute a basic RVV test program (emulation/simulation)
Update(s):
- Jan 16th 2022: added a section on objdump
RISC-V Vector Extension (RVV) has recently been ratified in its version 1.0 (announcement and specification pdf). The 1.0 milestone is key: it means RVV has reached a stable state. Numerous commercial and free implementations of the standard are appearing, and software developers can now dedicate significant effort to porting and developing libraries on top of RVV without fear of seeing the specification rug pulled from under their feet. In this article we will review how to build a version of the clang compiler compatible with RVV (v0.10), and how to develop, build and execute our first RVV program.
Building the compiler
# update to your intended install directory
export RISCV=~/RISCV-TOOLS/
# download the basic RISC-V GNU toolchain, providing:
# - the runtime environment for riscv64-unknown-elf (libc, ...)
# - the spike simulator
git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=$RISCV
make -j3
# download the llvm-project source from GitHub
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
# configure the build:
# - build llvm, clang and lld in Release mode, using ninja
# - use gold as the linker (less RAM required)
# - limit targets to RISCV, and use riscv-gnu-toolchain as the basis for the sysroot
cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_USE_LINKER=gold \
    -DDEFAULT_SYSROOT="$RISCV/riscv64-unknown-elf/" \
    -DGCC_INSTALL_PREFIX="$RISCV" \
    -DLLVM_TARGETS_TO_BUILD="RISCV" \
    -S llvm -B build/
# build clang/llvm using 4 jobs (can be tuned to your machine)
cmake --build build/ -j4
This produces the clang binary at llvm-project/build/bin/clang.
More information on how to download and build clang/llvm can be found on the project github page.
Development
As a first exercise, we will use the SAXPY example from the rvv-intrinsic-doc repository: rvv_saxpy.c.
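An RVV SAXPY loop typically follows the strip-mining pattern: each iteration asks vsetvli how many elements it may process (vl), handles that many, then advances the pointers. To make that control flow concrete without vector hardware, here is a portable scalar sketch. It is not the code from rvv-intrinsic-doc: emulated_vsetvl, HYPOTHETICAL_VLMAX and saxpy_stripmined are hypothetical names, and the fixed VLMAX of 8 is an arbitrary assumption (real hardware derives it from VLEN, SEW and LMUL). Returning min(AVL, VLMAX) is one of the vl values the RVV specification allows.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed VLMAX: 8 single-precision elements per iteration
 * (e.g. VLEN = 256 bits, SEW = 32, LMUL = 1). Purely illustrative. */
#define HYPOTHETICAL_VLMAX 8u

/* Scalar stand-in for vsetvli: grants min(avl, VLMAX) elements. */
static size_t emulated_vsetvl(size_t avl) {
    return avl < HYPOTHETICAL_VLMAX ? avl : HYPOTHETICAL_VLMAX;
}

/* y[i] = a * x[i] + y[i], processed in vl-sized chunks like an RVV loop.
 * Returns the number of strip-mining iterations (for illustration). */
size_t saxpy_stripmined(size_t n, float a, const float *x, float *y) {
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = emulated_vsetvl(n);   /* vsetvli: request n, get vl */
        assert(vl > 0 && vl <= n);
        /* roughly what a vector load, vfmacc.vf and vector store do */
        for (size_t i = 0; i < vl; ++i)
            y[i] = a * x[i] + y[i];
        x += vl;                          /* advance pointers by vl */
        y += vl;
        n -= vl;                          /* remaining element count */
        ++iterations;
    }
    return iterations;
}
```

For example, with n = 20 and the assumed VLMAX of 8, the loop runs three times (8, 8, then 4 elements): the tail is handled by a shorter vl rather than by a separate scalar epilogue.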
Building the simulator
# download, build and install the proxy kernel (pk) for riscv64-unknown-elf
git clone https://github.com/riscv-software-src/riscv-pk.git
cd riscv-pk
mkdir build && cd build
../configure --prefix=$RISCV --host=riscv64-unknown-elf
make -j4 && make install
Let's now build the simulator (directly from the top of the master branch, why not!).
git clone https://github.com/riscv-software-src/riscv-isa-sim
cd riscv-isa-sim
mkdir build && cd build
../configure --prefix=$RISCV
make -j4 && make install
Building the program
RVV is supported as part of clang's experimental extensions. It must therefore be enabled explicitly when invoking clang and be associated with a version number; the current master of clang only supports v0.10 of the RVV specification.
clang -L $RISCV/riscv64-unknown-elf/lib/ --gcc-toolchain=$RISCV/ \
rvv_saxpy.c -menable-experimental-extensions -march=rv64gcv0p10 \
-target riscv64 -O3 -mllvm --riscv-v-vector-bits-min=256 \
-o test-riscv-clang
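The -mllvm --riscv-v-vector-bits-min=256 option lets LLVM assume that VLEN, the width of each vector register, is at least 256 bits. The RVV specification defines the maximum number of elements a vector instruction can process as VLMAX = LMUL * VLEN / SEW, where SEW is the element width and LMUL the register-grouping factor. The small helper below (vlmax is a hypothetical name) simply evaluates that formula, with LMUL passed as a fraction so that the fractional settings 1/2, 1/4 and 1/8 can be expressed too.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, per the RVV specification.
 * LMUL is given as lmul_num / lmul_den so that fractional LMUL
 * (1/2, 1/4, 1/8) can be expressed as well as 1, 2, 4, 8. */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits,
               unsigned lmul_num, unsigned lmul_den) {
    assert(sew_bits > 0 && lmul_den > 0);
    return (vlen_bits * lmul_num) / (sew_bits * lmul_den);
}
```

With VLEN = 256 and 32-bit floats, vlmax(256, 32, 1, 1) is 8 and vlmax(256, 32, 8, 1) (LMUL = 8) is 64. A binary compiled under this assumption presumably needs a simulated target whose VLEN is at least 256 bits.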
Executing
To execute the program we are going to use the spike simulator and the riscv-pk proxy kernel.
Spike is part of the riscv-gnu-toolchain available at https://github.com/riscv-collab/riscv-gnu-toolchain, and riscv-pk is also available on GitHub: https://github.com/riscv-software-src/riscv-pk
The pk binary must be the first positional argument to spike, before the main ELF.
$RISCV/bin/spike --isa rv64gcv $RISCV/riscv64-unknown-elf/bin/pk \
test-riscv-clang
NOTE: I tried to use riscv-tools (https://github.com/riscv-software-src/riscv-tools), but it does not seem to be actively maintained and several issues popped up when I tried building it.
Objdump
Not all objdump versions support the RISC-V vector extension. If you have built LLVM as indicated above, you should be able to use the llvm-objdump program built alongside clang to disassemble a program containing vector instructions.
llvm-objdump -d --mattr=+experimental-v <binary_file>
References
- IREE (MLIR-based compiler) page on RISC-V cross-compilation: https://google.github.io/iree/building-from-source/riscv/
- Official LLVM GitHub project: https://github.com/llvm/llvm-project
- RVV intrinsic documentation on GitHub: https://github.com/riscv-non-isa/rvv-intrinsic-doc
- RISC-V sw-dev mailing-list thread: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/GiTkPw-9r8A?pli=1
- Properly configuring the clang/llvm build: https://stackoverflow.com/questions/68580399/using-clang-to-compile-for-risc-v
Assisted assembly development for RISC-V RV32
In this post we will present how the assembly development environment tool (asmde) can ease assembly program development for the RISC-V ISA.
We will develop a basic floating-point vector add routine.
Introducing ASMDE
The ASseMbly Development Environment (asmde, https://github.com/nibrunie/asmde) is an open-source set of Python utilities to help assembly developers. The main, eponymous utility, asmde, is a register-assignment script: it consumes a templatized assembly source file and fills in variable names with legal registers, removing the burden of register allocation from the developer.
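asmde's own allocation algorithm is not detailed in this post, so as a rough illustration of what "filling in variable names with legal registers" means, here is a deliberately naive C sketch. It rewrites every F(NAME) placeholder (the syntax used in the vector-add template of this post) to a register taken from an assumed pool of floating-point temporaries, ft0 to ft7, giving each distinct name its own register. The function name assign_fp_registers and the register pool are hypothetical, and this is not how asmde works internally: it performs no liveness analysis, so it never reuses a register and simply fails once the pool is exhausted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8
#define MAX_NAME 32

/* Hypothetical register pool: RISC-V floating-point temporaries. */
static const char *const FP_POOL[MAX_VARS] = {
    "ft0", "ft1", "ft2", "ft3", "ft4", "ft5", "ft6", "ft7"};

/* Copies tmpl into out, replacing each F(NAME) with a register.
 * Returns the number of distinct variables assigned, or -1 on error. */
int assign_fp_registers(const char *tmpl, char *out, size_t out_size) {
    char names[MAX_VARS][MAX_NAME];
    int n_vars = 0;
    size_t len = 0;

    assert(tmpl != NULL && out != NULL && out_size > 0);
    for (const char *p = tmpl; *p != '\0';) {
        const char *close = NULL;
        int at_token_start = (p == tmpl) || !isalnum((unsigned char)p[-1]);
        if (at_token_start && p[0] == 'F' && p[1] == '(')
            close = strchr(p + 2, ')');
        if (close != NULL) {
            size_t name_len = (size_t)(close - (p + 2));
            int idx = -1;
            if (name_len == 0 || name_len >= MAX_NAME)
                return -1;                      /* malformed placeholder */
            for (int i = 0; i < n_vars; ++i)    /* name seen before? */
                if (strlen(names[i]) == name_len &&
                    strncmp(names[i], p + 2, name_len) == 0) {
                    idx = i;
                    break;
                }
            if (idx < 0) {                      /* first use: next free register */
                if (n_vars == MAX_VARS)
                    return -1;                  /* pool exhausted */
                memcpy(names[n_vars], p + 2, name_len);
                names[n_vars][name_len] = '\0';
                idx = n_vars++;
            }
            size_t reg_len = strlen(FP_POOL[idx]);
            if (len + reg_len >= out_size)
                return -1;                      /* output buffer too small */
            memcpy(out + len, FP_POOL[idx], reg_len);
            len += reg_len;
            p = close + 1;                      /* resume after ')' */
        } else {
            if (len + 1 >= out_size)
                return -1;
            out[len++] = *p++;                  /* copy other characters verbatim */
        }
    }
    out[len] = '\0';
    return n_vars;
}
```

Applied to the loop body of the vector-add template, this maps LHS, RHS and ACC to ft0, ft1 and ft2 in order of first appearance.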
Recently, alpha support for RV32 (32-bit version of RISC-V) was added to asmde. We are going to demonstrate how to use it in this post.
Vector-Add testbench
/** Basic single-precision vector add
 * @param dst destination array
 * @param lhs left-hand side operand array
 * @param rhs right-hand side operand array
 * @param n vector size
 */
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n);
The program is split in two files:
- a test bench main.c
- an asmde template file vec_add.template.S
Review of the assembly template
// testing for basic RISC-V RV32I program
// void my_vadd(float* dst, float* src0, float* src1, unsigned n)
//#PREDEFINED(a0, a1, a2, a3)
    .option nopic
    .attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_d2p0"
    .attribute unaligned_access, 0
    .attribute stack_align, 16
    .text
    .align 1
    .globl my_vadd
    .type my_vadd, @function
my_vadd:
    // check for early exit condition n == 0
    beq a3, x0, end
loop:
    // load inputs
    flw F(LHS), 0(a1)
    flw F(RHS), 0(a2)
    // operation
    fadd.s F(ACC), F(LHS), F(RHS)
    // store result
    fsw F(ACC), 0(a0)
    // update addresses
    addi a1, a1, 4
    addi a2, a2, 4
    addi a0, a0, 4
    // update loop count
    addi a3, a3, -1
    // branch if not finished
    bne x0, a3, loop
end:
    ret
    .size my_vadd, .-my_vadd
    .section .rodata.str1.8,"aMS",@progbits,1
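To read the template's control flow more easily, here is one way to transcribe it into C, instruction by instruction (the matching assembly is in the comments). This is a sketch written by hand, not output of asmde, and my_vadd_like_asm is a hypothetical name chosen to avoid clashing with the real my_vadd. It also makes explicit why the separate n == 0 early exit is needed: the loop body always runs once before bne tests the counter, so without it an unsigned count of 0 would be decremented and wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Hand transcription of the my_vadd template: early exit when n == 0,
 * then a count-down loop that bumps each pointer by one float. */
void my_vadd_like_asm(float *dst, const float *lhs, const float *rhs,
                      unsigned n) {
    if (n == 0)                  /* beq a3, x0, end */
        return;
    assert(dst != NULL && lhs != NULL && rhs != NULL);
    do {
        float l = *lhs;          /* flw F(LHS), 0(a1) */
        float r = *rhs;          /* flw F(RHS), 0(a2) */
        *dst = l + r;            /* fadd.s F(ACC), ... then fsw F(ACC), 0(a0) */
        lhs += 1;                /* addi a1, a1, 4 (4 bytes = one float) */
        rhs += 1;                /* addi a2, a2, 4 */
        dst += 1;                /* addi a0, a0, 4 */
        n -= 1;                  /* addi a3, a3, -1 */
    } while (n != 0);            /* bne x0, a3, loop */
}
```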
ASMDE Macro
ASMDE Variable
flw F(LHS), 0(a1)
flw F(RHS), 0(a2)
// operation
fadd.s F(ACC), F(LHS), F(RHS)
// store result
fsw F(ACC), 0(a0)
Assembly template translation
python3 asmde.py -S --arch rv32 \
examples/riscv/test_rv32_vadd.S \
--output vadd.S
Building and executing the test program
#include <stdio.h>

#ifdef LOCAL_IMPLEMENTATION
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n){
    unsigned i;
    for (i = 0; i < n; ++i) dst[i] = lhs[i] + rhs[i];
}
#else
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n);
#endif

int main() {
    float dst[4];
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {4.0f, 3.0f, 2.0f, 1.0f};
    my_vadd(dst, a, b, 4);
    int i;
    for (i = 0; i < 4; ++i) {
        if (dst[i] != 5.0f) {
            printf("failure\n");
            return -1;
        }
    }
    printf("success\n");
    return 0;
}
(requires an RV32 GNU toolchain and a 32-bit build of the pk proxy kernel)
# building test program
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -o test_vadd vadd.S test_vadd.c
# executing binary
$ spike --isa=RV32gc riscv32-unknown-elf/bin/pk ./test_vadd
Conclusion
References:
- asmde github page: https://github.com/nibrunie/asmde