RISC-V Register Files

The RISC-V ISA defines several register files. There are at least three in the main set of extensions: the general purpose register file (XRF), introduced in the base integer ISAs; the floating-point register file (FRF), introduced in the floating-point extensions; and the vector register file (VRF), introduced (you guessed it) in the vector extension, a.k.a. RVV. We will not consider the control and status registers (CSR file), which have their own specificities (it is common to treat system registers separately from general purpose registers, although how much they have in common could be debated).

Diagram of RISC-V register files and the operations between them


Register files characteristics

Each register file contains 32 architectural registers. The first register of the general purpose register file (x0) is special: its value is hardwired to the constant 0, and writes to it are discarded. The first register of the vector register file (v0) is the only one which can be used as the mask operand in RVV 1.0 (more information on RVV masked operations can be found in RISC-V Vector Extension in a Nutshell: part 3).
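The behaviour of x0 can be sketched with a toy model. This is only an illustration: the class and method names (IntRegisterFile, read, write) are made up for this post and do not come from any RISC-V tool.

```python
class IntRegisterFile:
    """Toy model of the 32-entry general purpose register file (XRF)."""

    def __init__(self, xlen=32):
        self.mask = (1 << xlen) - 1   # keep values within XLEN bits
        self.regs = [0] * 32

    def read(self, index):
        return self.regs[index]

    def write(self, index, value):
        # x0 is hardwired to 0: a write to it is silently dropped.
        if index != 0:
            self.regs[index] = value & self.mask


xrf = IntRegisterFile(xlen=32)
xrf.write(0, 123)   # discarded, x0 stays 0
xrf.write(5, 123)   # regular register, value is stored
print(xrf.read(0), xrf.read(5))
```

A real implementation usually does not store x0 at all; the model keeps a slot for it only to keep the indexing simple.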

The size of the registers in each file is an architecture parameter: XLEN for the general purpose register file, FLEN for the floating-point register file and VLEN for the vector register file.

The general purpose register file is sized to fit a virtual address value. For example, for the base 32-bit RV32I ISA, the general purpose registers are 32 bits wide, while they are 64 bits wide for RV64I (and 128 bits wide for RV128I, although this architecture is seldom used). For the vector registers, VLEN must be a power of 2, greater than or equal to ELEN (the maximum element width supported by the implementation), and must not exceed 65536 (2^16).
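The three VLEN constraints above can be written as a small check. This is a minimal sketch; the function name is_valid_vlen is hypothetical and all sizes are expressed in bits.

```python
def is_valid_vlen(vlen, elen):
    """Return True if vlen satisfies the constraints listed above."""
    is_power_of_two = vlen > 0 and (vlen & (vlen - 1)) == 0
    return is_power_of_two and elen <= vlen <= 65536


# Try a few candidate VLEN values for an implementation with ELEN=64.
for vlen in (32, 128, 192, 65536, 131072):
    print(vlen, is_valid_vlen(vlen, elen=64))
```

With ELEN=64, the values 128 and 65536 are accepted, while 32 (smaller than ELEN), 192 (not a power of 2) and 131072 (larger than 2^16) are rejected.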

Moving data between register files

The diagram at the beginning of this post illustrates the base characteristics of each register file and the basic move operations between them and from/to memory. For the XRF and FRF, only operations on 32-bit values are drawn. Similar operations exist for double precision: for example RV64D, where XLEN=FLEN=64 bits, adds fld, fsd, fmv.d.x, fmv.x.d and numerous conversions from integer formats to double precision and back (e.g. in RV64IFD, fcvt.d.wu f3, x2 converts the unsigned 32-bit integer held in the bottom 32 bits of the x2 general purpose register to a double precision number stored in the f3 floating-point register).
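The difference between a move (fmv.d.x copies the bits unchanged) and a conversion (fcvt.d.wu changes the representation to preserve the numerical value) can be sketched as follows. The two Python functions are simplified models written for this post, not a reference implementation: they ignore rounding modes and exception flags (neither matters for fcvt.d.wu, since every 32-bit unsigned integer is exactly representable in double precision).

```python
import struct


def fcvt_d_wu(x_value):
    """Model of fcvt.d.wu: read the low 32 bits of the source register as
    an unsigned integer and convert that value to double precision."""
    return float(x_value & 0xFFFFFFFF)


def fmv_d_x(x_value):
    """Model of fmv.d.x (RV64D): reinterpret the 64-bit pattern as a
    double, without any numerical conversion."""
    raw = struct.pack("<Q", x_value & 0xFFFFFFFFFFFFFFFF)
    return struct.unpack("<d", raw)[0]


x2 = 0xFFFFFFFFFFFFFFFF               # all ones in a 64-bit register
print(fcvt_d_wu(x2))                  # only the low 32 bits are used
print(fmv_d_x(0x3FF0000000000000))    # bit pattern of 1.0 in binary64
```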

For the vector register file, the scalar data size and the vector element size are not encoded as part of the opcode but are configured in the vsew field of the vtype configuration register, so there is no need for type-specific vector moves. The diagram only represents explicit data moves between FRF/XRF and VRF, but most vector instructions admit a vector-scalar variant which reads one of its operands directly from the XRF or FRF (e.g. vfmadd.vf splats a scalar floating-point register as the multiplier). For more details on the vector registers and the vector extension in general, you can refer to the series RISC-V Vector extension in a Nutshell published on this blog.
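As an illustration of a vector-scalar variant, here is a simplified model of vfmadd.vf, which computes vd[i] = f[rs1] * vd[i] + vs2[i]. It is only a sketch: it ignores vl, masking, element width and the single rounding of a true fused multiply-add, and represents vector registers as plain Python lists.

```python
def vfmadd_vf(vd, f_rs1, vs2):
    """Simplified vfmadd.vf: the scalar f_rs1 is used as the multiplier
    for every element of vd, then vs2 is added element-wise."""
    return [f_rs1 * d + s for d, s in zip(vd, vs2)]


vd = [1.0, 2.0, 3.0, 4.0]
vs2 = [0.5, 0.5, 0.5, 0.5]
# The scalar 2.0 is read once from the FRF and applied to all elements.
print(vfmadd_vf(vd, 2.0, vs2))
```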

Why multiple register files?

This discussion focuses on the general purpose and floating-point register files. Having a separate vector register file is much easier to understand, since vector registers tend to be larger than the other types of registers. Some ISAs (e.g. the x86 SSE and AVX extensions) reuse small registers as the low parts of larger registers. This is not the case in RISC-V, where general purpose, floating-point and vector registers do not overlap.

NOTE: The option of overlapping FRF and VRF was considered and dropped during the specification process of RVV. See https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#51-scalar-operands

Having multiple register files has some advantages and some drawbacks. Let's start with a benefit.

The first benefit is that the architecture can expose more architectural registers without extending the size of a register index in the instruction encoding. The type of instruction (e.g. floating-point addition) is encoded in the opcode, and all (most) floating-point instructions implicitly operate on floating-point registers, so there is no need to distinguish floating-point and general purpose registers in the encoding of the register index itself. The number of available architectural registers impacts the register allocation pressure when writing (in assembly) or compiling a program: more registers often means a more flexible ABI and less spilling.

In general, the RISC-V ISA uses 5 bits to encode a register index (for operands and results), which provides 32 registers. Since the type of register is part of the opcode specification rather than the register index, a RISC-V architecture can in fact have 96 registers: 32 general purpose registers, 32 floating-point registers and 32 vector registers (you are authorized to say only 95 and exclude the special x0, although having a hardwired 0-value operand is quite useful and certainly outweighs the benefit of an extra general purpose register in most cases).
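To make this concrete, the sketch below decodes the register fields of two R-type instructions, add x3, x1, x2 and fadd.s f3, f1, f2 (the latter with the dynamic rounding mode). The helper names and the two-entry opcode table are written for this post only. The table is a deliberate simplification: some instructions under the floating-point major opcode (moves, conversions, comparisons) do read or write general purpose registers.

```python
def decode_r_type(insn):
    """Extract the major opcode and the three 5-bit register indices."""
    return {
        "opcode": insn & 0x7F,
        "rd": (insn >> 7) & 0x1F,
        "rs1": (insn >> 15) & 0x1F,
        "rs2": (insn >> 20) & 0x1F,
    }


# Simplified mapping from major opcode to register file prefix:
# 0x33 is OP (integer register-register), 0x53 is OP-FP.
REGISTER_FILE_PREFIX = {0x33: "x", 0x53: "f"}


def operands(insn):
    """Name the operands using the register file implied by the opcode."""
    fields = decode_r_type(insn)
    prefix = REGISTER_FILE_PREFIX[fields["opcode"]]
    return [prefix + str(fields[name]) for name in ("rd", "rs1", "rs2")]


print(operands(0x002081B3))  # add    x3, x1, x2
print(operands(0x0020F1D3))  # fadd.s f3, f1, f2
```

Both encodings carry the same three indices (3, 1 and 2) in the same bit positions; only the opcode bits decide which register file they refer to.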

This benefit can also be considered a drawback, since more registers also means a higher hardware cost for implementations and more registers to save on a context switch. This was addressed for RISC-V by introducing the Zfinx/Zdinx/Zhinx extensions, which define floating-point operations working on general purpose registers, thus limiting the cost of floating-point support for the more constrained implementations. High performance out-of-order implementations with very large physical register files are generally less concerned with limiting the size of the architectural register files, although it can impact mapping tables, context sizes, etc.
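A rough estimate of the register state to save on a context switch shows the effect. This is a back-of-the-envelope sketch with a hypothetical helper name: it only counts the architectural registers discussed in this post (x0 is skipped since it never needs saving) and ignores the pc and CSRs such as fcsr or vtype.

```python
def context_bytes(xlen, flen=0, vlen=0):
    """Bytes of register state, with all widths given in bits.
    flen=0 models a core without a separate FRF (no floating-point
    support, or floating-point in the XRF as with Zfinx/Zdinx/Zhinx);
    vlen=0 models a core without the vector extension."""
    bits = 31 * xlen + 32 * flen + 32 * vlen
    return bits // 8


print(context_bytes(xlen=32))                      # RV32 without FRF
print(context_bytes(xlen=32, flen=32))             # RV32 with F
print(context_bytes(xlen=64, flen=64))             # RV64 with D
print(context_bytes(xlen=64, flen=64, vlen=128))   # RV64 with D and V
```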

Another benefit of separate register files is that registers can use an encoding optimized for the data they store: for example, the RISC-V open-source processor rocket-chip uses hardfloat's specific encoding of floating-point numbers (see Recoded Format section here) to make floating-point operations more efficient (simplifying the detection of special values and reducing the encoding difference between normal and subnormal numbers). The use of a format-specific encoding is facilitated by the fact that data moves are explicit (you need to execute an fmv.w.x to move the content of a general purpose register to a floating-point register before performing any operation on it): the recoding of a value can be performed during the explicit data move (including from/to memory), and the recoding can exploit the fact that the value type is determined by the operation acting on it. Such a recoding can only live within an internal register file, and values must be converted back to canonical formats when moved to another register file or to memory.

Once again, this can also be considered a drawback, since you have to explicitly move data from one register file to the next (which may have a non-zero latency and uses encoding space) and you need to define separate memory operations for each register file: loading a 32-bit single precision number is not the same operation as loading a 32-bit integer value, since the destination registers differ. This was surveyed by the diagram and the section Moving data between register files. This drawback is more easily alleviated by wide out-of-order implementations, which can extract more ILP and cover the cost of those explicit moves (although this cost still impacts latency-dominated chains of instructions). It is also removed in the Z(f/d/h)inx extensions.

Another advantage of multiple register files is that you can tune each file's architectural characteristics to the domain you want to support. RISC-V does not expose a configurable number of registers, but XLEN, FLEN and VLEN are defined separately (the first two depending on which extensions are enabled). You can have a 32-bit ISA with 64-bit double precision registers: the architecture does not have to extend its integer registers to 64 bits, while still keeping 64 registers with adapted sizes. General purpose/integer instructions can be efficient and low power, while the core only activates the FRF for workloads which require the extra activity.

Finally, an implementation advantage, which was pointed out by a colleague (Alex S.): having several register files reduces the number of read/write ports per register file while still serving as many execution pipelines. High performance cores often have a lot of ports on each register file (more than ten read ports is not uncommon) to serve many different execution pipelines in parallel. For the same total number of execution pipelines, multiplying the number of architectural register files helps to split the physical register files accordingly, decreasing the number of ports per file. This is a real benefit, as the complexity of a register file increases rapidly with the number of ports (in particular read ports). Even for low performance implementations, limiting the number of ports per file provides more efficient register files.

Conclusion

In this post we reviewed the main register files specified by the RISC-V ISA, their basic characteristics and how they interact. We listed some of the reasons for this choice alongside some of the drawbacks of specializing register files.

Initially published Oct 17th 2022, updated Oct 19th 2022.

Thanks

Thank you to Alex S. for pointing out a key advantage (decreasing the number of ports per register file) I missed in the first version of this blog post.
