New option for RISC-V Vector performance simulation

There has been some news since I published Performance simulation of RISC-V Vector.

Rivos Inc (a recent contender in the RISC-V race) has been working on extending GEM5 to support RVV.

More information is available in this post on the gem5-dev mailing list, and the source code is accessible on Rivos's GitHub page: https://github.com/rivosinc/gem5/commits/rivos/dev/joy/initial_RVV_support.

Their port does not seem to be directly related to the Cristobal Ramirez / PLCT effort, but you should still be able to follow the directions given in my initial article.



Performance simulation of RISC-V vector extension using GEM5

We will be using a fork of Cristobal Ramirez Lazo's gem5 fork, namely https://github.com/plctlab/plct-gem5. This fork is quite active and has been updated to RVV 1.0. The initial fork extends GEM5 with RVV support and a vector processing unit extension (along with its GEM5 configuration).

Building gem5 for RISC-V

gem5 has some external dependencies that must be installed before the build:

# on ubuntu 20.04
sudo apt install build-essential git m4 scons zlib1g zlib1g-dev \
    libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \
    python3-dev python3-six python-is-python3 libboost-all-dev pkg-config

More information on building GEM5 can be found here: https://www.gem5.org/documentation/general_docs/building .
The actual build can then be run:

git clone https://github.com/plctlab/plct-gem5.git
cd plct-gem5
# -j sets the number of parallel build jobs;
# tune it to your machine.
scons build/RISCV/gem5.opt -j3  --gold-linker

Execution

As described in the project README.md, executing a program is straightforward:
${GEM5_DIR}build/RISCV/gem5.opt ${GEM5_DIR}configs/example/riscv_vector_engine.py \
                                --cmd="$program $program_args"

Note

This fork is still under development and some instructions are not supported yet.

References:

Programming with RISC-V Vector extension: how to build and execute a basic RVV test program (emulation/simulation)

Update(s):

- Jan 16th 2022: added a section on objdump


The RISC-V Vector Extension (RVV) has recently been ratified in its version 1.0 (announcement and specification pdf). The 1.0 milestone is key because it means RVV has reached a stable state. Numerous commercial and free implementations of the standard are appearing, and software developers can now dedicate significant effort to porting and developing libraries on top of RVV without fear of having the specification rug pulled out from under their feet. In this article we will review how to build a version of the clang compiler compatible with RVV (v0.10), and how to develop, build and execute our first RVV program.

Building the compiler

Before building a compiler for RVV, we need a basic RISC-V toolchain. This toolchain provides the standard library and some basic tools required to build a working binary. The toolchain will be installed in the directory stored in $RISCV (~/RISCV-TOOLS/ below; feel free to adapt this directory to your setup).

# update to your intended install directory
export RISCV=~/RISCV-TOOLS/

# downloading basic riscv gnu toolchain, providing:
# - runtime environment for riscv64-unknown-elf (libc, ...)
# - spike simulator
git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=$RISCV
make -j3

Compiling for RVV requires a recent version of clang (this was tested with clang 14).

# downloading llvm-project source from github
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
# configuring build to build llvm and clang in Release mode
# using ninja
# and to use gold as the linker (less RAM required)
# limiting targets to RISCV, and using riscv-gnu-toolchain
# as basis for sysroot
cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld" \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_USE_LINKER=gold \
      -DDEFAULT_SYSROOT="$RISCV/riscv64-unknown-elf/" \
      -DGCC_INSTALL_PREFIX="$RISCV" \
      -S llvm -B build-riscv/ -DLLVM_TARGETS_TO_BUILD="RISCV"
# building clang/llvm using 4 jobs (can be tuned to your machine)
cmake --build build-riscv/ -j4


Building clang/llvm requires a large amount of RAM (8GB seems to be the bare minimum, 16GB is best) and consumes a lot of disk space. Those requirements can be reduced by selecting the Release build type (rather than the default Debug) and by using the gold linker.

This process generates the clang binary in llvm-project/build-riscv/bin/clang .

More information on how to download and build clang/llvm can be found on the project GitHub page.

Development

The easiest way to develop software directly for RVV is to rely on the RVV intrinsics. This project provides intrinsics for most of the instructions of the extension. The documentation is available on GitHub, and support is appearing in standard compilers (most notably clang/llvm).

As a first exercise, we will use the SAXPY example from rvv-intrinsic-doc rvv_saxpy.c.
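Before looking at the intrinsics, it may help to see the loop shape such RVV code usually follows, known as strip-mining. The sketch below is plain scalar C, not RVV code. `fake_vsetvl` is a hypothetical stand-in for the `vsetvl` instruction, using the simplest policy the specification allows, `vl = min(avl, VLMAX)`. `SKETCH_VLMAX` is an arbitrary assumed value. Each outer iteration plays the role of one vector load / multiply-accumulate / store sequence over `vl` elements.

```c
#include <stddef.h>

/* Hypothetical maximum number of elements per "vector" iteration.
 * On real hardware VLMAX = LMUL * VLEN / SEW and is only known at run time. */
#define SKETCH_VLMAX 8

/* Hypothetical stand-in for vsetvl: grants vl = min(avl, VLMAX),
 * the simplest policy the RVV specification allows. */
static size_t fake_vsetvl(size_t avl) {
    return avl < SKETCH_VLMAX ? avl : SKETCH_VLMAX;
}

/* y[i] = a * x[i] + y[i], processed in strips of at most VLMAX elements. */
void saxpy_stripmined(size_t n, float a, const float *x, float *y) {
    size_t vl;
    for (; n > 0; n -= vl) {
        vl = fake_vsetvl(n);              /* elements handled in this strip */
        for (size_t i = 0; i < vl; ++i)   /* models vle32 / vfmacc / vse32 */
            y[i] = a * x[i] + y[i];
        x += vl;                          /* advance both pointers by vl */
        y += vl;
    }
}
```

Because `vl` is recomputed on every iteration, the last strip simply shrinks to the remaining element count. No separate scalar tail loop is needed, which is the main appeal of this vector-length-agnostic style.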

Building the simulator

Let's first build an up-to-date proxy kernel pk:

# download, build and install the proxy kernel for riscv64-unknown-elf
# (riscv64-unknown-elf-gcc from $RISCV/bin must be in the PATH)
export PATH=$RISCV/bin:$PATH
git clone https://github.com/riscv-software-src/riscv-pk.git
cd riscv-pk
mkdir build && cd build
../configure --prefix=$RISCV --host=riscv64-unknown-elf
make -j4 && make install

Let's now build the simulator (directly from the top of the master branch, why not!).

git clone https://github.com/riscv-software-src/riscv-isa-sim
cd riscv-isa-sim
mkdir build && cd build
../configure --prefix=$RISCV
make -j4 && make install

Building the program

RVV is supported as one of clang's experimental extensions. It must therefore be enabled explicitly when invoking clang, and it must be given a version number: the current master of clang only supports v0.10 of the RVV specification.

clang -L $RISCV/riscv64-unknown-elf/lib/ --gcc-toolchain=$RISCV/ \
       rvv_saxpy.c -menable-experimental-extensions -march=rv64gcv0p10 \
      -target riscv64 -O3 -mllvm --riscv-v-vector-bits-min=256 \
       -o test-riscv-clang
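The `--riscv-v-vector-bits-min=256` option lets the compiler assume a vector register length (VLEN) of at least 256 bits. The RVV specification defines the number of elements a single vector instruction can process as VLMAX = LMUL * VLEN / SEW. The helper below, a hypothetical `rvv_vlmax` of our own and not part of any library, simply evaluates that formula. Fractional LMUL values (such as 1/2) are passed as a numerator/denominator pair.

```c
/* VLMAX = LMUL * VLEN / SEW, as defined by the RVV specification.
 * vlen_bits: vector register length in bits (e.g. 256)
 * sew_bits:  selected element width in bits (e.g. 32 for float)
 * LMUL is given as lmul_num / lmul_den so fractional groupings work too. */
unsigned rvv_vlmax(unsigned vlen_bits, unsigned sew_bits,
                   unsigned lmul_num, unsigned lmul_den) {
    return (vlen_bits * lmul_num) / (sew_bits * lmul_den);
}
```

For example, with VLEN = 256, 32-bit floats and LMUL = 8 (the grouping behind the `vfloat32m8_t` intrinsic types), `rvv_vlmax(256, 32, 8, 1)` gives 64 elements per instruction. With LMUL = 1, it gives 8.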

Executing

To execute the program we are going to use the spike simulator and the riscv-pk proxy kernel.

Spike is the riscv-isa-sim simulator built above. It is also part of the riscv-gnu-toolchain, available at https://github.com/riscv-collab/riscv-gnu-toolchain , and riscv-pk is available on GitHub: https://github.com/riscv-software-src/riscv-pk

The pk binary image must be the first positional argument to spike, placed before the main ELF.

$RISCV/bin/spike --isa rv64gcv $RISCV/riscv64-unknown-elf/bin/pk \
                  test-riscv-clang

NOTE: I tried to use riscv-tools (https://github.com/riscv-software-src/riscv-tools), but it does not seem to be actively maintained, and several issues popped up when I tried building it.

Objdump

Not every objdump supports the RISC-V vector extension. If you have built llvm as indicated above, you should be able to use the llvm-objdump program built alongside clang to disassemble a program with vector instructions.

llvm-objdump -d --mattr=+experimental-v <binary_file>
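One reason an objdump without vector support cannot decode this code is that RVV arithmetic and configuration instructions live under their own major opcode, OP-V (`1010111`). The sketch below uses a hypothetical helper name and is not part of any tool. It classifies a raw 32-bit instruction word according to the field layout in the RVV specification:

- the low 7 bits select OP-V;
- funct3 = `111` marks the vsetvli / vsetivli / vsetvl configuration instructions;
- the other funct3 values are the arithmetic categories.

Vector loads and stores reuse the LOAD-FP/STORE-FP opcodes and are deliberately not covered here.

```c
#include <stdint.h>

/* Major opcode of OP-V, 0b1010111 (RVV specification). */
#define OPCODE_OPV 0x57u

typedef enum {
    INSN_NOT_OPV,     /* another major opcode (vector loads/stores included) */
    INSN_OPV_ARITH,   /* funct3 000..110: OPIVV, OPFVV, OPMVV, OPIVI, OPIVX, OPFVF, OPMVX */
    INSN_VSETVLI,     /* funct3 111, bit 31 = 0 */
    INSN_VSETIVLI,    /* funct3 111, bits 31:30 = 11 */
    INSN_VSETVL,      /* funct3 111, bits 31:25 = 1000000 */
    INSN_RESERVED     /* funct3 111, any other encoding */
} rvv_class_t;

rvv_class_t classify_rvv(uint32_t insn) {
    if ((insn & 0x7fu) != OPCODE_OPV)       /* bits 6:0 = major opcode */
        return INSN_NOT_OPV;
    uint32_t funct3 = (insn >> 12) & 0x7u;  /* bits 14:12 */
    if (funct3 != 0x7u)
        return INSN_OPV_ARITH;
    if ((insn >> 31) == 0u)                 /* bit 31 */
        return INSN_VSETVLI;
    if ((insn >> 30) == 0x3u)               /* bits 31:30 */
        return INSN_VSETIVLI;
    if ((insn >> 25) == 0x40u)              /* bits 31:25 */
        return INSN_VSETVL;
    return INSN_RESERVED;
}
```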



Assisted assembly development for RISC-V RV32

In this post we will show how the assembly development environment tool (asmde) can ease assembly program development for the RISC-V ISA.

We will develop a basic floating-point vector add routine.

Introducing ASMDE

The ASseMbly Development Environment (asmde, https://github.com/nibrunie/asmde) is an open-source set of Python utilities to help the assembly developer. The main eponymous utility, asmde, is a register assignation script. It consumes a templatized assembly source file and fills in variable names with legal registers, removing the burden of register allocation from the developer.

Recently, alpha support for RV32 (the 32-bit version of RISC-V) was added to asmde. We are going to demonstrate how to use it in this post.

Vector-Add testbench

The example we chose to implement is a basic vector add.

/** Basic single-precision vector add
 *  @param dst destination array
 *  @param lhs left-hand side operand array
 *  @param rhs right-hand side operand array
 *  @param n number of elements in each array
 */
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n);

The program is split into two files:

- a testbench main.c

- an asmde template file vec_add.template.S

Review of the assembly template

The listing below presents the input template. It consists of a basic assembly source file extended with some asmde-specific constructs.

// testing for basic RISC-V RV32I program
// void my_vadd(float* dst, float* src0, float* src1, unsigned n)
//#PREDEFINED(a0, a1, a2, a3)
        .option nopic
        .attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_d2p0"
        .attribute unaligned_access, 0
        .attribute stack_align, 16
        .text
        .align  1
        .globl  my_vadd
        .type   my_vadd, @function
my_vadd:
        // check for early exit condition n == 0
        beq a3, x0, end
loop:
        // load inputs
        flw F(LHS), 0(a1)
        flw F(RHS), 0(a2)
        // operation
        fadd.s F(ACC), F(LHS), F(RHS)
        // store result
        fsw F(ACC), 0(a0)
        // update addresses
        addi a1, a1, 4
        addi a2, a2, 4
        addi a0, a0, 4
        // update loop count
        addi a3, a3, -1
        // branch if not finished
        bne x0, a3, loop
end:
        ret
        .size   my_vadd, .-my_vadd
        .section        .rodata.str1.8,"aMS",@progbits,1


ASMDE Macro

The mandatory comments are followed by an asmde macro, PREDEFINED.
This macro tells the asmde assigner that the listed registers should be considered live when entering the function. It is often used to list function arguments.


ASMDE Variable

The second construct provided by asmde is assembly variables:

        flw F(LHS), 0(a1)
        flw F(RHS), 0(a2)
        // operation
        fadd.s F(ACC), F(LHS), F(RHS)
        // store result
        fsw F(ACC), 0(a0)

Those variables are of the form <specifier>(<varname>). In this example we use the specifier F for floating-point register variables; the specifiers X or I can be used for integer registers. These variables are used to manipulate (write to / read from) virtual registers. asmde performs the register assignation, taking into account the instruction semantics and the program structure.
Here, for example, we use the F(LHS) variable to load an element of the left-hand side vector, F(RHS) to load an element of the right-hand side vector, and F(ACC) to hold the sum of those two values, which is later stored back into the destination array.
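To give an idea of what register assignation involves, here is a deliberately naive sketch in C. It is not asmde's algorithm: asmde is written in Python and accounts for instruction semantics and program structure. The hypothetical `assign_registers` just gives each distinct variable its own register from a pool, skipping the registers declared live on entry (the role a PREDEFINED list plays). A real assigner would also reuse registers whose variables are no longer live.

```c
#include <string.h>

/* One virtual-variable -> physical-register binding. */
typedef struct {
    const char *var;
    const char *reg;
} binding_t;

/* Returns 1 if reg appears in the reserved (live-on-entry) list. */
static int is_reserved(const char *reg, const char **reserved, int n_reserved) {
    for (int i = 0; i < n_reserved; ++i)
        if (strcmp(reg, reserved[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical, simplified assigner: first-fit, one register per distinct
 * variable, no liveness analysis. Writes bindings to out and returns their
 * count, or -1 if the pool runs out of free registers. */
int assign_registers(const char **vars, int n_vars,
                     const char **pool, int n_pool,
                     const char **reserved, int n_reserved,
                     binding_t *out) {
    int n_out = 0, next = 0;
    for (int v = 0; v < n_vars; ++v) {
        int known = 0;
        for (int b = 0; b < n_out; ++b)        /* variable already bound? */
            if (strcmp(out[b].var, vars[v]) == 0) { known = 1; break; }
        if (known)
            continue;
        while (next < n_pool && is_reserved(pool[next], reserved, n_reserved))
            ++next;                            /* skip live-on-entry registers */
        if (next == n_pool)
            return -1;
        out[n_out].var = vars[v];
        out[n_out].reg = pool[next++];
        ++n_out;
    }
    return n_out;
}
```

With a pool `a0`..`a5` and `a0`..`a3` reserved (as in `//#PREDEFINED(a0, a1, a2, a3)`), two fresh variables end up in `a4` and `a5`, and a third one makes the function report that it ran out of registers.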


Assembly template translation

asmde can be invoked as follows to generate an assembly file with assigned registers:

python3 asmde.py -S --arch rv32 \
                 examples/riscv/test_rv32_vadd.S \
                --output vadd.S

Building and executing the test program

We can build our toy example alongside a small testbench:
#include <stdio.h>

#ifdef LOCAL_IMPLEMENTATION
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n){
    unsigned i;
    for (i = 0; i < n; ++i)
        dst[i] = lhs[i] + rhs[i];
}
#else
void my_vadd(float* dst, float* lhs, float* rhs, unsigned n);
#endif


int main() {
    float dst[4];
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {4.0f, 3.0f, 2.0f, 1.0f};
    my_vadd(dst, a, b, 4);

    int i;
    for (i = 0; i < 4; ++i) {
        if (dst[i] != 5.0f) {
            printf("failure\n");
            return -1;
        }
    }

    printf("success\n");
    return 0;
}
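The testbench above only exercises n = 4, while the assembly routine also has an early-exit path for n == 0. A broader check can therefore be useful. The sketch below uses a hypothetical helper, `check_vadd`, which compares any `my_vadd`-compatible function against expected sums for a given n. It also verifies that the element just past the end of `dst` is left untouched. `vadd_reference` is simply the LOCAL_IMPLEMENTATION loop under another name, included so the helper can be tried on its own.

```c
#include <stddef.h>

/* Signature shared by the C and assembly versions of my_vadd. */
typedef void (*vadd_fn)(float *dst, float *lhs, float *rhs, unsigned n);

#define CHECK_MAX_N 64

/* Same loop as the LOCAL_IMPLEMENTATION variant of the testbench. */
static void vadd_reference(float *dst, float *lhs, float *rhs, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = lhs[i] + rhs[i];
}

/* Hypothetical helper: returns 1 if fn computes dst[i] = lhs[i] + rhs[i]
 * for i < n and does not write dst[n], 0 otherwise (or if n is too large). */
int check_vadd(vadd_fn fn, unsigned n) {
    float lhs[CHECK_MAX_N], rhs[CHECK_MAX_N], dst[CHECK_MAX_N + 1];
    const float guard = -12345.0f;
    if (n > CHECK_MAX_N)
        return 0;
    for (unsigned i = 0; i < n; ++i) {
        lhs[i] = (float)i;
        rhs[i] = 0.5f * (float)i;
        dst[i] = 0.0f;
    }
    dst[n] = guard;                 /* sentinel right after the last element */
    fn(dst, lhs, rhs, n);
    for (unsigned i = 0; i < n; ++i)
        if (dst[i] != lhs[i] + rhs[i])
            return 0;
    return dst[n] == guard;
}
```

In the testbench's `main`, calls such as `check_vadd(my_vadd, 0)` and `check_vadd(my_vadd, 17)` would exercise the assembly version on the early-exit path and on a size that is not a multiple of 4.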

Finally, build and execute it (this requires an rv32 GNU toolchain and a 32-bit proxy kernel pk):

# building test program
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -o test_vadd vadd.S test_vadd.c
# executing binary
$ spike --isa=RV32gc riscv32-unknown-elf/bin/pk  ./test_vadd


Conclusion

I hope this small example was useful to you and that you will be able to use asmde in your own project.
If you find issues (there are many), you can report them on github https://github.com/nibrunie/asmde/issues/new/choose . If you have some feedback do not hesitate to write a comment here.

Happy hacking with RISC-V.


References:

- asmde github page: https://github.com/nibrunie/asmde

- RISC-V unprivileged ISA specification

- GNU Toolchain for RISC-V

- Programming with RISC-V vector instructions