Untitled :: RISC-V ISA Manual

SHA2 hash function instructions.

A core which implements Zkn must implement all of the above extensions.

`Zks` - ShangMi Algorithm Suite

This extension is shorthand for the following set of other extensions:

Included Extension	Description
Zbkb	Bit-manipulation instructions for cryptography.
Zbkc	Carry-less multiply instructions.
Zbkx	Cross-bar Permutation instructions.
Zksed	SM4 block cipher instructions.
Zksh	SM3 hash function instructions.

A core which implements Zks must implement all of the above extensions.

`Zk` - Standard scalar cryptography extension

This extension is shorthand for the following set of other extensions:

Included Extension	Description
Zkn	NIST Algorithm suite extension.
Zkr	Entropy Source extension.
Zkt	Data independent execution latency extension.

A core which implements Zk must implement all of the above extensions.

`Zkt` - Data Independent Execution Latency

This extension allows CPU implementers to indicate to cryptographic software developers that a subset of RISC-V instructions are guaranteed to be implemented such that their execution latency is independent of the data values they operate on. A complete description of this extension is found in Data Independent Execution Latency Subset: Zkt.

Instructions

aes32dsi

Synopsis: AES final round decryption instruction for RV32.
Mnemonic: aes32dsi rd, rs1, rs2, bs
Encoding

Description: This instruction sources a single byte from rs2 according to bs. To this it applies the inverse AES SBox operation, and XORs the result with rs1. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES32DSI (bs,rs2,rs1,rd)) = {
  let shamt   : bits( 5) = bs @ 0b000; /* shamt = bs*8 */
  let si      : bits( 8) = (X(rs2)[31..0] >> shamt)[7..0]; /* SBox Input */
  let so      : bits(32) = 0x000000 @ aes_sbox_inv(si);
  let result  : bits(32) = X(rs1)[31..0] ^ rol32(so, unsigned(shamt));
  X(rd) = EXTS(result); RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknd (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

aes32dsmi

Synopsis: AES middle round decryption instruction for RV32.
Mnemonic: aes32dsmi rd, rs1, rs2, bs
Encoding

Description: This instruction sources a single byte from rs2 according to bs. To this it applies the inverse AES SBox operation, and a partial inverse MixColumn, before XOR’ing the result with rs1. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES32DSMI (bs,rs2,rs1,rd)) = {
  let shamt   : bits( 5) = bs @ 0b000; /* shamt = bs*8 */
  let si      : bits( 8) = (X(rs2)[31..0] >> shamt)[7..0]; /* SBox Input */
  let so      : bits( 8) = aes_sbox_inv(si);
  let mixed   : bits(32) = aes_mixcolumn_byte_inv(so);
  let result  : bits(32) = X(rs1)[31..0] ^ rol32(mixed, unsigned(shamt));
  X(rd) = EXTS(result); RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknd (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

aes32esi

Synopsis: AES final round encryption instruction for RV32.
Mnemonic: aes32esi rd, rs1, rs2, bs
Encoding

Description: This instruction sources a single byte from rs2 according to bs. To this it applies the forward AES SBox operation, before XOR’ing the result with rs1. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES32ESI (bs,rs2,rs1,rd)) = {
  let shamt   : bits( 5) = bs @ 0b000; /* shamt = bs*8 */
  let si      : bits( 8) = (X(rs2)[31..0] >> shamt)[7..0]; /* SBox Input */
  let so      : bits(32) = 0x000000 @ aes_sbox_fwd(si);
  let result  : bits(32) = X(rs1)[31..0] ^ rol32(so, unsigned(shamt));
  X(rd) = EXTS(result); RETIRE_SUCCESS
}
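
The Sail above can be mirrored in a small software model. The Python sketch below is illustrative only, for RV32: the helper names (gf_mul, rotl8, aes_sbox_fwd, rol32, aes32esi) are chosen for this example, and the S-box is derived from its FIPS 197 definition (multiplicative inverse in GF(2^8) followed by the affine transform) rather than taken from a table in this specification. It is not constant-time and says nothing about how hardware should implement the instruction.

```python
MASK32 = 0xFFFFFFFF

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_sbox_fwd(x):
    """Forward AES S-box: GF(2^8) inverse, then the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x
            inv = gf_mul(inv, x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

def rol32(x, n):
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def aes32esi(rs1, rs2, bs):
    """Model of aes32esi: select byte bs of rs2, S-box it, rotate back, XOR with rs1."""
    shamt = (bs & 3) * 8
    si = (rs2 >> shamt) & 0xFF
    so = aes_sbox_fwd(si)
    return (rs1 ^ rol32(so, shamt)) & MASK32
```

Calling the model four times with bs = 0..3 and chaining rd into rs1 accumulates one full 32-bit column, which is how the byte-select operand is typically used.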

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

aes32esmi

Synopsis: AES middle round encryption instruction for RV32.
Mnemonic: aes32esmi rd, rs1, rs2, bs
Encoding

Description: This instruction sources a single byte from rs2 according to bs. To this it applies the forward AES SBox operation, and a partial forward MixColumn, before XOR’ing the result with rs1. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES32ESMI (bs,rs2,rs1,rd)) = {
  let shamt   : bits( 5) = bs @ 0b000; /* shamt = bs*8 */
  let si      : bits( 8) = (X(rs2)[31..0] >> shamt)[7..0]; /* SBox Input */
  let so      : bits( 8) = aes_sbox_fwd(si);
  let mixed   : bits(32) = aes_mixcolumn_byte_fwd(so);
  let result  : bits(32) = X(rs1)[31..0] ^ rol32(mixed, unsigned(shamt));
  X(rd) = EXTS(result); RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

aes64ds

Synopsis: AES final round decryption instruction for RV64.
Mnemonic: aes64ds rd, rs1, rs2
Encoding

Description: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the Inverse ShiftRows and SubBytes steps. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note To Software Developers

The following code snippet shows the final round of the AES block decryption. t0 and t1 hold the current round state. t2 and t3 hold the next round state.

aes64ds t2, t0, t1
aes64ds t3, t1, t0

Note the reversed register order of the second instruction.

Operation

function clause execute (AES64DS(rs2, rs1, rd)) = {
  let sr : bits(64) = aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0]);
  let wd : bits(64) = sr[63..0];
  X(rd) = aes_apply_inv_sbox_to_each_byte(wd);
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknd (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64dsm

Synopsis: AES middle round decryption instruction for RV64.
Mnemonic: aes64dsm rd, rs1, rs2
Encoding

Description: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the Inverse ShiftRows, SubBytes and MixColumns steps. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note To Software Developers

The following code snippet shows one middle round of the AES block decryption. t0 and t1 hold the current round state. t2 and t3 hold the next round state.

aes64dsm t2, t0, t1
aes64dsm t3, t1, t0

Note the reversed register order of the second instruction.

Operation

function clause execute (AES64DSM(rs2, rs1, rd)) = {
  let sr : bits(64) = aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0]);
  let wd : bits(64) = sr[63..0];
  let sb : bits(64) = aes_apply_inv_sbox_to_each_byte(wd);
  X(rd)  = aes_mixcolumn_inv(sb[63..32]) @ aes_mixcolumn_inv(sb[31..0]);
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknd (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64es

Synopsis: AES final round encryption instruction for RV64.
Mnemonic: aes64es rd, rs1, rs2
Encoding

Description: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the ShiftRows and SubBytes steps. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note To Software Developers

The following code snippet shows the final round of the AES block encryption. t0 and t1 hold the current round state. t2 and t3 hold the next round state.

aes64es t2, t0, t1
aes64es t3, t1, t0

Note the reversed register order of the second instruction.

Operation

function clause execute (AES64ES(rs2, rs1, rd)) = {
  let sr : bits(64) = aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]);
  let wd : bits(64) = sr[63..0];
  X(rd) = aes_apply_fwd_sbox_to_each_byte(wd);
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64esm

Synopsis: AES middle round encryption instruction for RV64.
Mnemonic: aes64esm rd, rs1, rs2
Encoding

Description: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the ShiftRows, SubBytes and MixColumns steps. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note To Software Developers

The following code snippet shows one middle round of the AES block encryption. t0 and t1 hold the current round state. t2 and t3 hold the next round state.

aes64esm t2, t0, t1
aes64esm t3, t1, t0

Note the reversed register order of the second instruction.

Operation

function clause execute (AES64ESM(rs2, rs1, rd)) = {
  let sr : bits(64) = aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]);
  let wd : bits(64) = sr[63..0];
  let sb : bits(64) = aes_apply_fwd_sbox_to_each_byte(wd);
  X(rd)  =  aes_mixcolumn_fwd(sb[63..32]) @ aes_mixcolumn_fwd(sb[31..0]);
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64im

Synopsis: This instruction accelerates the inverse MixColumns step of the AES Block Cipher, and is used to aid creation of the decryption KeySchedule.
Mnemonic: aes64im rd, rs1
Encoding

Description: The instruction applies the inverse MixColumns transformation to two columns of the state array, packed into a single 64-bit register. It is used to create the inverse cipher KeySchedule, according to the equivalent inverse cipher construction in cite:[nist:fips:197] (Page 23, Section 5.3.5). This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES64IM(rs1, rd)) = {
  let w0 : bits(32) = aes_mixcolumn_inv(X(rs1)[31.. 0]);
  let w1 : bits(32) = aes_mixcolumn_inv(X(rs1)[63..32]);
  X(rd)  = w1 @ w0;
  RETIRE_SUCCESS
}
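
As an illustration of the operation above, the following Python sketch models aes64im in software. It assumes that each 32-bit column is packed with state byte 0 in the least-significant byte, and it uses the standard inverse MixColumns coefficients (0x0E, 0x0B, 0x0D, 0x09) from FIPS 197. The function names are chosen for this example only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_mixcolumn_inv(col):
    """Inverse MixColumns on one 32-bit column (byte 0 in bits 7..0, an assumption)."""
    b = [(col >> (8 * i)) & 0xFF for i in range(4)]
    coeff = (0x0E, 0x0B, 0x0D, 0x09)
    out = 0
    for i in range(4):
        acc = 0
        for j in range(4):
            # Row i of the circulant inverse MixColumns matrix
            acc ^= gf_mul(coeff[(j - i) % 4], b[j])
        out |= acc << (8 * i)
    return out

def aes64im(rs1):
    """Model of aes64im: inverse MixColumns on both 32-bit halves of rs1."""
    w0 = aes_mixcolumn_inv(rs1 & 0xFFFFFFFF)
    w1 = aes_mixcolumn_inv((rs1 >> 32) & 0xFFFFFFFF)
    return (w1 << 32) | w0
```

The column 8e 4d a1 bc is the well-known MixColumns image of db 13 53 45, so applying the model to it should recover the original bytes.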

Included in

Extension	Minimum version	Lifecycle state
Zknd (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64ks1i

Synopsis: This instruction implements part of the KeySchedule operation for the AES Block cipher involving the SBox operation.
Mnemonic: aes64ks1i rd, rs1, rnum
Encoding

Description: This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES block cipher Key Schedule. This instruction must always be implemented such that its execution latency does not depend on the data being operated on. Note that rnum must be in the range 0x0..0xA. The values 0xB..0xF are reserved.
Operation

function clause execute (AES64KS1I(rnum, rs1, rd)) = {
  if(unsigned(rnum) > 10) then {
    handle_illegal();  RETIRE_SUCCESS
  } else {
    let tmp1 : bits(32) = X(rs1)[63..32];
    let rc   : bits(32) = aes_decode_rcon(rnum); /* round number -> round constant */
    let tmp2 : bits(32) = if (rnum ==0xA) then tmp1 else ror32(tmp1, 8);
    let tmp3 : bits(32) = aes_subword_fwd(tmp2);
    let result : bits(64) = (tmp3 ^ rc) @ (tmp3 ^ rc);
    X(rd) = EXTZ(result);
    RETIRE_SUCCESS
  }
}

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV64)	v1.0.0	Frozen
Zknd (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

aes64ks2

Synopsis: This instruction implements part of the KeySchedule operation for the AES Block cipher.
Mnemonic: aes64ks2 rd, rs1, rs2
Encoding

Description: This instruction implements the additional XOR’ing of key words as part of the AES block cipher Key Schedule. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (AES64KS2(rs2, rs1, rd)) = {
  let w0 : bits(32) = X(rs1)[63..32] ^ X(rs2)[31..0];
  let w1 : bits(32) = X(rs1)[63..32] ^ X(rs2)[31..0] ^ X(rs2)[63..32];
  X(rd)  = w1 @ w0;
  RETIRE_SUCCESS
}
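
The word selection in the Sail above is easy to misread, so a direct Python transcription may help. This is a sketch for illustration; the function name and the use of plain integers for 64-bit registers are assumptions of the example.

```python
M32 = 0xFFFFFFFF

def aes64ks2(rs1, rs2):
    """Model of aes64ks2 on 64-bit register values held as Python ints."""
    rs1_hi = (rs1 >> 32) & M32
    rs2_lo = rs2 & M32
    rs2_hi = (rs2 >> 32) & M32
    w0 = rs1_hi ^ rs2_lo            # low result word
    w1 = rs1_hi ^ rs2_lo ^ rs2_hi   # high result word
    return (w1 << 32) | w0
```

Note that only the upper word of rs1 is consumed; the lower word of rs1 does not affect the result.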

Included in

Extension	Minimum version	Lifecycle state
Zkne (RV64)	v1.0.0	Frozen
Zknd (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

andn

Synopsis: AND with inverted operand
Mnemonic: andn rd, rs1, rs2
Encoding

Description: This instruction performs the bitwise logical AND operation between rs1 and the bitwise inversion of rs2.
Operation

X(rd) = X(rs1) & ~X(rs2);
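
For readers who prefer an executable form, here is a minimal Python sketch of the same operation. Python integers are unbounded, so the result is masked to XLEN bits; the xlen parameter and the function name are assumptions of this example.

```python
def andn(rs1, rs2, xlen=64):
    """Model of andn: rs1 AND (NOT rs2), truncated to xlen bits."""
    mask = (1 << xlen) - 1
    return rs1 & ~rs2 & mask
```

A typical use is clearing the bits selected by a mask in one instruction, without first materialising the inverted mask in a register.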

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

brev8

Synopsis: Reverse the bits in each byte of a source register.
Mnemonic: brev8 rd, rs
Encoding

Description: This instruction reverses the order of the bits in every byte of a register.

This instruction is a specific encoding of a more generic instruction which was originally proposed as part of the RISC-V Bitmanip extension (grevi). Eventually, the more generic instruction may be standardised. Until then, only the most common instances of it, such as this, are being included in specifications.

Operation

result : xlenbits = EXTZ(0b0);
foreach (i from 0 to sizeof(xlen) - 8 by 8) {
  result[i+7..i] = reverse_bits_in_byte(X(rs1)[i+7..i]);
};
X(rd) = result;
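
A short Python sketch of the same per-byte reversal follows. It is illustrative only; the function name and the xlen parameter are assumptions of the example, and register values are modelled as non-negative integers.

```python
def brev8(rs, xlen=64):
    """Model of brev8: reverse the bit order inside each byte; bytes stay in place."""
    result = 0
    for i in range(0, xlen, 8):
        byte = (rs >> i) & 0xFF
        rev = int(f"{byte:08b}"[::-1], 2)   # reverse the 8-bit binary string
        result |= rev << i
    return result
```

Because each byte is handled independently, applying the operation twice returns the original value.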

Included in

Extension	Minimum version	Lifecycle state
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

clmul

Synopsis: Carry-less multiply (low-part)
Mnemonic: clmul rd, rs1, rs2
Encoding

Description: clmul produces the lower half of the 2·XLEN carry-less product.
Operation

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;

foreach (i from 0 to (xlen - 1) by 1) {
   output = if   ((rs2_val >> i) & 1)
            then output ^ (rs1_val << i)
            else output;
}

X(rd) = output;
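
The loop above can be sketched in Python as follows. This is an illustrative model, not a constant-time implementation; the function name and the xlen parameter are assumptions of the example.

```python
def clmul(rs1, rs2, xlen=64):
    """Model of clmul: low xlen bits of the carry-less product of rs1 and rs2."""
    mask = (1 << xlen) - 1
    out = 0
    for i in range(xlen):
        if (rs2 >> i) & 1:
            out ^= rs1 << i      # XOR instead of add: no carries propagate
    return out & mask
```

Carry-less multiplication is polynomial multiplication over GF(2), so for example (x + 1)·(x + 1) = x^2 + 1, that is clmul(3, 3) = 5.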

Included in

Extension	Minimum version	Lifecycle state
Zbc ([zbc])	1.0.0	Frozen
Zbkc (Zbkc)	v1.0.0-rc4	Frozen

clmulh

Synopsis: Carry-less multiply (high-part)
Mnemonic: clmulh rd, rs1, rs2
Encoding

Description: clmulh produces the upper half of the 2·XLEN carry-less product.
Operation

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;

foreach (i from 1 to (xlen - 1) by 1) {
   output = if   ((rs2_val >> i) & 1)
            then output ^ (rs1_val >> (xlen - i))
            else output;
}

X(rd) = output;
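
The following Python sketch models the loop above and pairs it with a wide reference product so the two can be compared. Both function names and the xlen parameter are assumptions of this example, and operands are assumed to already fit in xlen bits.

```python
def clmulh(rs1, rs2, xlen=64):
    """Model of clmulh: upper xlen bits of the 2*xlen-bit carry-less product."""
    out = 0
    for i in range(1, xlen):
        if (rs2 >> i) & 1:
            out ^= rs1 >> (xlen - i)   # bits of (rs1 << i) that spill past xlen
    return out

def clmul_wide(a, b, xlen=64):
    """Reference: the full 2*xlen-bit carry-less product."""
    out = 0
    for i in range(xlen):
        if (b >> i) & 1:
            out ^= a << i
    return out
```

Bit 0 of rs2 never contributes to the upper half, which is why the loop can start at i = 1.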

Included in

Extension	Minimum version	Lifecycle state
Zbc ([zbc])	1.0.0	Frozen
Zbkc (Zbkc)	v1.0.0-rc4	Frozen

orn

Synopsis: OR with inverted operand
Mnemonic: orn rd, rs1, rs2
Encoding

Description: This instruction performs the bitwise logical OR operation between rs1 and the bitwise inversion of rs2.
Operation

X(rd) = X(rs1) | ~X(rs2);

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

pack

Synopsis: Pack the low halves of rs1 and rs2 into rd.
Mnemonic: pack rd, rs1, rs2
Encoding

Description: The pack instruction packs the XLEN/2-bit lower halves of rs1 and rs2 into rd, with rs1 in the lower half and rs2 in the upper half.
Operation

let lo_half : bits(xlen/2) = X(rs1)[xlen/2-1..0];
let hi_half : bits(xlen/2) = X(rs2)[xlen/2-1..0];
X(rd) = EXTZ(hi_half @ lo_half);
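
A minimal Python sketch of the packing described above, for illustration. The function name and the xlen parameter are assumptions of the example.

```python
def pack(rs1, rs2, xlen=64):
    """Model of pack: low half of rs1 in the low half of rd, low half of rs2 above it."""
    half = xlen // 2
    hmask = (1 << half) - 1
    return ((rs2 & hmask) << half) | (rs1 & hmask)
```

The upper halves of both source registers are ignored, so the instruction can join two independently computed half-width values without masking them first.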

Included in

Extension	Minimum version	Lifecycle state
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

packh

Synopsis: Pack the low bytes of rs1 and rs2 into rd.
Mnemonic: packh rd, rs1, rs2
Encoding

Description: The packh instruction packs the least-significant bytes of rs1 and rs2 into the 16 least-significant bits of rd, zero extending the rest of rd.
Operation

let lo_half : bits(8) = X(rs1)[7..0];
let hi_half : bits(8) = X(rs2)[7..0];
X(rd) = EXTZ(hi_half @ lo_half);

Included in

Extension	Minimum version	Lifecycle state
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

packw

Synopsis: Pack the low 16 bits of rs1 and rs2 into rd on RV64.
Mnemonic: packw rd, rs1, rs2
Encoding

Description: This instruction packs the low 16 bits of rs1 and rs2 into the 32 least-significant bits of rd, sign extending the 32-bit result to the rest of rd. This instruction only exists on RV64 based systems.
Operation

let lo_half : bits(16) = X(rs1)[15..0];
let hi_half : bits(16) = X(rs2)[15..0];
X(rd) = EXTS(hi_half @ lo_half);
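
The sign extension is the part of packw that differs from pack, so a small Python sketch may be useful. It models a 64-bit register as a non-negative integer; the function name is an assumption of the example.

```python
def packw(rs1, rs2):
    """Model of packw on RV64: pack two 16-bit values, then sign-extend from bit 31."""
    word = ((rs2 & 0xFFFF) << 16) | (rs1 & 0xFFFF)
    if word & 0x80000000:
        word |= 0xFFFFFFFF00000000   # replicate bit 31 into bits 63..32
    return word
```

Bit 15 of rs2 becomes bit 31 of the packed word, so it alone decides whether the upper 32 bits of rd are all ones or all zeros.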

Included in

Extension	Minimum version	Lifecycle state
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

rev8

Synopsis: Byte-reverse register
Mnemonic: rev8 rd, rs
Encoding (RV32)

Encoding (RV64)

Description: This instruction reverses the order of the bytes in rs.
Operation

let input = X(rs);
let output : xlenbits = 0;
let j = xlen - 1;

foreach (i from 0 to (xlen - 8) by 8) {
   output[i..(i + 7)] = input[(j - 7)..j];
   j = j - 8;
}

X[rd] = output

Note

The rev8 mnemonic corresponds to different instruction encodings in RV32 and RV64.

Software Hint

The byte-reverse operation is only available for the full register width. To emulate word-sized and halfword-sized byte-reversal, perform a rev8 rd,rs followed by a srai rd,rd,K, where K is XLEN-32 and XLEN-16, respectively.
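
The hint above can be checked with a small Python model. This is a sketch under the assumption that registers are modelled as non-negative integers of xlen bits; rev8 and srai mirror the instructions, while bswap16 and bswap32 are hypothetical helper names for the two-instruction sequences.

```python
def rev8(rs, xlen=64):
    """Model of rev8: reverse the byte order of an xlen-bit value."""
    mask = (1 << xlen) - 1
    return int.from_bytes((rs & mask).to_bytes(xlen // 8, "little"), "big")

def srai(rs, shamt, xlen=64):
    """Model of srai: arithmetic right shift of an xlen-bit value."""
    mask = (1 << xlen) - 1
    rs &= mask
    signed = rs - (1 << xlen) if rs >> (xlen - 1) else rs
    return (signed >> shamt) & mask

def bswap16(rs, xlen=64):
    """Halfword byte reversal: rev8 followed by srai by xlen - 16."""
    return srai(rev8(rs, xlen), xlen - 16, xlen)

def bswap32(rs, xlen=64):
    """Word byte reversal: rev8 followed by srai by xlen - 32."""
    return srai(rev8(rs, xlen), xlen - 32, xlen)
```

Because srai is an arithmetic shift, the narrow result comes out sign-extended; a logical shift (srli) would give a zero-extended result instead.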

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

rol

Synopsis: Rotate Left (Register)
Mnemonic: rol rd, rs1, rs2
Encoding

Description: This instruction performs a rotate left of rs1 by the amount in the least-significant log2(XLEN) bits of rs2.
Operation

let shamt = if   xlen == 32
            then X(rs2)[4..0]
            else X(rs2)[5..0];
let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt));

X(rd) = result;
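
A Python sketch of the rotate follows, for illustration. The function name and the xlen parameter are assumptions of the example; the final mask stands in for the fixed register width that the Sail types provide implicitly.

```python
def rol(rs1, rs2, xlen=64):
    """Model of rol: rotate rs1 left by the low log2(xlen) bits of rs2."""
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)          # only the low log2(xlen) bits are used
    x = rs1 & mask
    return ((x << shamt) | (x >> (xlen - shamt))) & mask
```

Since only the low bits of rs2 are used, a rotate amount equal to XLEN behaves like a rotate by zero.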

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

rolw

Synopsis: Rotate Left Word (Register)
Mnemonic: rolw rd, rs1, rs2
Encoding

Description: This instruction performs a rotate left on the least-significant word of rs1 by the amount in the least-significant 5 bits of rs2. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
Operation

let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 << shamt) | (rs1 >> (32 - shamt));
X(rd) = EXTS(result[31..0]);
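
The word variant differs from rol in that it ignores the upper half of rs1 and sign-extends the result. A Python sketch for RV64, with the function name being an assumption of the example:

```python
def rolw(rs1, rs2):
    """Model of rolw on RV64: rotate the low word left, then sign-extend from bit 31."""
    word = rs1 & 0xFFFFFFFF
    shamt = rs2 & 31
    rot = ((word << shamt) | (word >> (32 - shamt))) & 0xFFFFFFFF
    if rot & 0x80000000:
        rot |= 0xFFFFFFFF00000000   # copy bit 31 into bits 63..32
    return rot
```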

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

ror

Synopsis: Rotate Right
Mnemonic: ror rd, rs1, rs2
Encoding

Description: This instruction performs a rotate right of rs1 by the amount in the least-significant log2(XLEN) bits of rs2.
Operation

let shamt = if   xlen == 32
            then X(rs2)[4..0]
            else X(rs2)[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));

X(rd) = result;

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

rori

Synopsis: Rotate Right (Immediate)
Mnemonic: rori rd, rs1, shamt
Encoding (RV32)

Encoding (RV64)

Description: This instruction performs a rotate right of rs1 by the amount in the least-significant log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation

let shamt = if   xlen == 32
            then shamt[4..0]
            else shamt[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));

X(rd) = result;

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

roriw

Synopsis: Rotate Right Word by Immediate
Mnemonic: roriw rd, rs1, shamt
Encoding

Description: This instruction performs a rotate right on the least-significant word of rs1 by the amount in the least-significant 5 bits of shamt. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
Operation

let rs1_data = EXTZ(X(rs1)[31..0]);
let result = (rs1_data >> shamt) | (rs1_data << (32 - shamt));
X(rd) = EXTS(result[31..0]);

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

rorw

Synopsis: Rotate Right Word (Register)
Mnemonic: rorw rd, rs1, rs2
Encoding

Description: This instruction performs a rotate right on the least-significant word of rs1 by the amount in the least-significant 5 bits of rs2. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
Operation

let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 >> shamt) | (rs1 << (32 - shamt));
X(rd) = EXTS(result[31..0]);

Included in

Extension	Minimum version	Lifecycle state
Zbb ([zbb])	v1.0.0	Frozen
Zbkb (Zbkb)	v1.0.0-rc4	Frozen

sha256sig0

Synopsis: Implements the Sigma0 transformation function as used in the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2).
Mnemonic: sha256sig0 rd, rs1
Encoding

Description: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits. Though named for SHA2-256, the instruction works for both the SHA2-224 and SHA2-256 parameterisations as described in cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA256SIG0(rs1,rd)) = {
  let inb    : bits(32) = X(rs1)[31..0];
  let result : bits(32) = ror32(inb,  7) ^ ror32(inb, 18) ^ (inb >>  3);
  X(rd)      = EXTS(result);
  RETIRE_SUCCESS
}
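
The operation above translates directly into a few lines of Python. The sketch below is illustrative; the xlen parameter (to show the RV64 sign extension) and the function names are assumptions of the example.

```python
M32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & M32

def sha256sig0(rs1, xlen=32):
    """Model of sha256sig0; on RV64 the 32-bit result is sign-extended."""
    inb = rs1 & M32
    result = ror32(inb, 7) ^ ror32(inb, 18) ^ (inb >> 3)
    if xlen == 64 and result & 0x80000000:
        result |= 0xFFFFFFFF00000000
    return result
```

On RV64 the upper 32 bits of the source are ignored, so a message-schedule word can be passed in without clearing them first.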

Included in

Extension

Minimum version

Lifecycle state

v1.0.0

Frozen

v1.0.0

Frozen

v1.0.0

Frozen

sha256sig1

Synopsis: Implements the Sigma1 transformation function as used in the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2).
Mnemonic: sha256sig1 rd, rs1
Encoding

Description: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits. Though named for SHA2-256, the instruction works for both the SHA2-224 and SHA2-256 parameterisations as described in cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA256SIG1(rs1,rd)) = {
  let inb    : bits(32) = X(rs1)[31..0];
  let result : bits(32) = ror32(inb, 17) ^ ror32(inb, 19) ^ (inb >> 10);
  X(rd)      = EXTS(result);
  RETIRE_SUCCESS
}

Included in

Extension

Minimum version

Lifecycle state

v1.0.0

Frozen

v1.0.0

Frozen

v1.0.0

Frozen

sha256sum0

Synopsis: Implements the Sum0 transformation function as used in the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2).
Mnemonic: sha256sum0 rd, rs1
Encoding

Description: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits. Though named for SHA2-256, the instruction works for both the SHA2-224 and SHA2-256 parameterisations as described in cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA256SUM0(rs1,rd)) = {
  let inb    : bits(32) = X(rs1)[31..0];
  let result : bits(32) = ror32(inb,  2) ^ ror32(inb, 13) ^ ror32(inb, 22);
  X(rd)      = EXTS(result);
  RETIRE_SUCCESS
}

Included in

Extension

Minimum version

Lifecycle state

v1.0.0

Frozen

v1.0.0

Frozen

v1.0.0

Frozen

sha256sum1

Synopsis: Implements the Sum1 transformation function as used in the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2).
Mnemonic: sha256sum1 rd, rs1
Encoding

Description: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits. Though named for SHA2-256, the instruction works for both the SHA2-224 and SHA2-256 parameterisations as described in cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA256SUM1(rs1,rd)) = {
  let inb    : bits(32) = X(rs1)[31..0];
  let result : bits(32) = ror32(inb,  6) ^ ror32(inb, 11) ^ ror32(inb, 25);
  X(rd)      = EXTS(result);
  RETIRE_SUCCESS
}

Included in

Extension

Minimum version

Lifecycle state

v1.0.0

Frozen

v1.0.0

Frozen

v1.0.0

Frozen

sha512sig0h

Synopsis: Implements the high half of the Sigma0 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig0h rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the sha512sig0l instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sigma0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sig0l    t0, a0, a1
sha512sig0h    t1, a1, a0

Operation

function clause execute (SHA512SIG0H(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) >>  1) ^ (X(rs1) >>  7) ^ (X(rs1) >>  8) ^
               (X(rs2) << 31)                  ^ (X(rs2) << 24) );
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sig0l

Synopsis: Implements the low half of the Sigma0 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig0l rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the sha512sig0h instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sigma0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sig0l    t0, a0, a1
sha512sig0h    t1, a1, a0

Operation

function clause execute (SHA512SIG0L(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) >>  1) ^ (X(rs1) >>  7) ^ (X(rs1) >>  8) ^
               (X(rs2) << 31) ^ (X(rs2) << 25) ^ (X(rs2) << 24) );
  RETIRE_SUCCESS
}
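
To see how the low and high instructions combine, the Python sketch below models both and compares the two-instruction sequence against the 64-bit Sigma0 function from FIPS 180-4 (ROTR 1, ROTR 8, SHR 7). It assumes a0 holds the low word and a1 the high word of the 64-bit input, matching the sequence in the note; the function names are chosen for this example.

```python
M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF

def sha512sig0l(rs1, rs2):
    """Model of sha512sig0l on 32-bit register values."""
    return ((rs1 >> 1) ^ (rs1 >> 7) ^ (rs1 >> 8) ^
            (rs2 << 31) ^ (rs2 << 25) ^ (rs2 << 24)) & M32

def sha512sig0h(rs1, rs2):
    """Model of sha512sig0h on 32-bit register values."""
    return ((rs1 >> 1) ^ (rs1 >> 7) ^ (rs1 >> 8) ^
            (rs2 << 31) ^ (rs2 << 24)) & M32

def ror64(x, n):
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & M64

def sigma0_ref(x):
    """64-bit reference: ROTR1 ^ ROTR8 ^ SHR7."""
    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7)

def sigma0_rv32(x):
    """Sigma0 computed with the RV32 instruction pair."""
    a0, a1 = x & M32, (x >> 32) & M32
    t0 = sha512sig0l(a0, a1)   # low word of the result
    t1 = sha512sig0h(a1, a0)   # high word of the result
    return (t1 << 32) | t0
```

The high variant omits the (rs2 << 25) term because the 7-bit logical shift does not wrap bits from the low word into the top of the high word.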

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sig1h

Synopsis: Implements the high half of the Sigma1 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig1h rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the sha512sig1l instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sigma1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sig1l    t0, a0, a1
sha512sig1h    t1, a1, a0

Operation

function clause execute (SHA512SIG1H(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) <<  3) ^ (X(rs1) >>  6) ^ (X(rs1) >> 19) ^
               (X(rs2) >> 29)                  ^ (X(rs2) << 13) );
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sig1l

Synopsis: Implements the low half of the Sigma1 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig1l rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the sha512sig1h instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sigma1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sig1l    t0, a0, a1
sha512sig1h    t1, a1, a0

Operation

function clause execute (SHA512SIG1L(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) <<  3) ^ (X(rs1) >>  6) ^ (X(rs1) >> 19) ^
               (X(rs2) >> 29) ^ (X(rs2) << 26) ^ (X(rs2) << 13) );
  RETIRE_SUCCESS
}
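
The two-instruction sequence above can be checked against a direct 64-bit formulation of Sigma1. The following Python sketch models the sha512sig1l and sha512sig1h Operation clauses on plain integers. It assumes that a0 holds the low 32 bits of the input and a1 the high 32 bits (the text does not say which is which, but this is the assignment under which the halves line up), and the helper names `sigma1_reference` and `sigma1_rv32` are illustrative only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sha512sig1l(rs1: int, rs2: int) -> int:
    """Model of the sha512sig1l Operation clause on 32-bit register values."""
    return ((rs1 << 3) ^ (rs1 >> 6) ^ (rs1 >> 19) ^
            (rs2 >> 29) ^ (rs2 << 26) ^ (rs2 << 13)) & MASK32


def sha512sig1h(rs1: int, rs2: int) -> int:
    """Model of the sha512sig1h Operation clause (it has no rs2 << 26 term)."""
    return ((rs1 << 3) ^ (rs1 >> 6) ^ (rs1 >> 19) ^
            (rs2 >> 29) ^ (rs2 << 13)) & MASK32


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sigma1_reference(x: int) -> int:
    """Sigma1 written directly on a 64-bit value: ROTR 19, ROTR 61, SHR 6."""
    return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6)


def sigma1_rv32(a0: int, a1: int) -> int:
    """Mirror the RV32 sequence; a0 = low word, a1 = high word (assumed)."""
    t0 = sha512sig1l(a0, a1)  # sha512sig1l t0, a0, a1
    t1 = sha512sig1h(a1, a0)  # sha512sig1h t1, a1, a0
    return (t1 << 32) | t0
```

Under that assumption, t0 receives the low word of the result and t1 the high word. The only difference between the two clauses is the `rs2 << 26` term, which carries the bits that the 64-bit right shift by 6 moves from the high word into the low word.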

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sum0r

Synopsis: Implements the Sum0 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sum0r rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. It is used to compute the Sum0 transform of the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sum0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sum0r    t0, a0, a1
sha512sum0r    t1, a1, a0

Note the reversed source register ordering.

Operation

function clause execute (SHA512SUM0R(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) << 25) ^ (X(rs1) << 30) ^ (X(rs1) >> 28) ^
               (X(rs2) >>  7) ^ (X(rs2) >>  2) ^ (X(rs2) <<  4) );
  RETIRE_SUCCESS
}
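
As an illustration of the sequence in the note above, the Python sketch below models the sha512sum0r Operation clause and calls it twice with swapped operands. It assumes a0 holds the low 32 bits of the input and a1 the high 32 bits; the helper names `sum0_reference` and `sum0_rv32` are hypothetical and exist only for this check.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sha512sum0r(rs1: int, rs2: int) -> int:
    """Model of the sha512sum0r Operation clause on 32-bit register values."""
    return ((rs1 << 25) ^ (rs1 << 30) ^ (rs1 >> 28) ^
            (rs2 >> 7) ^ (rs2 >> 2) ^ (rs2 << 4)) & MASK32


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sum0_reference(x: int) -> int:
    """Sum0 written directly on a 64-bit value: ROTR 28, ROTR 34, ROTR 39."""
    return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39)


def sum0_rv32(a0: int, a1: int) -> int:
    """Mirror the RV32 sequence; a0 = low word, a1 = high word (assumed)."""
    t0 = sha512sum0r(a0, a1)  # sha512sum0r t0, a0, a1 -> low word
    t1 = sha512sum0r(a1, a0)  # sha512sum0r t1, a1, a0 -> high word
    return (t1 << 32) | t0
```

For example, `sum0_rv32(1, 0)` yields `0x0000001042000000`, the same value as `sum0_reference(1)`.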

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sum1r

Synopsis: Implements the Sum1 transformation, as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sum1r rd, rs1, rs2
Encoding

Description: This instruction is implemented on RV32 only. It is used to compute the Sum1 transform of the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Note to software developers

The entire Sum1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence:

sha512sum1r    t0, a0, a1
sha512sum1r    t1, a1, a0

Note the reversed source register ordering.

Operation

function clause execute (SHA512SUM1R(rs2, rs1, rd)) = {
  X(rd) = EXTS((X(rs1) << 23) ^ (X(rs1) >> 14) ^ (X(rs1) >> 18) ^
               (X(rs2) >>  9) ^ (X(rs2) << 18) ^ (X(rs2) << 14) );
  RETIRE_SUCCESS
}
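
One way to see why a single instruction with reversed source registers can produce both halves is that Sum1 consists only of rotations, so swapping the two 32-bit words of the input swaps the two words of the output. The Python sketch below checks this, under the assumption that the first operand of the first call is the low word; `swap_halves` and `sum1_reference` are illustrative helpers, not names from the specification.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sha512sum1r(rs1: int, rs2: int) -> int:
    """Model of the sha512sum1r Operation clause on 32-bit register values."""
    return ((rs1 << 23) ^ (rs1 >> 14) ^ (rs1 >> 18) ^
            (rs2 >> 9) ^ (rs2 << 18) ^ (rs2 << 14)) & MASK32


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sum1_reference(x: int) -> int:
    """Sum1 written directly on a 64-bit value: ROTR 14, ROTR 18, ROTR 41."""
    return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41)


def swap_halves(x: int) -> int:
    """Exchange the high and low 32-bit words of a 64-bit value."""
    return ((x << 32) | (x >> 32)) & MASK64


x = 0x0123456789ABCDEF
lo, hi = x & MASK32, x >> 32
low_word = sha512sum1r(lo, hi)   # sha512sum1r t0, a0, a1
high_word = sha512sum1r(hi, lo)  # sha512sum1r t1, a1, a0
```

The Sigma transforms include a plain shift, which is not symmetric under this swap; that is consistent with those transforms having separate "l" and "h" instructions.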

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV32)	v1.0.0	Frozen
Zkn (RV32)	v1.0.0	Frozen
Zk (RV32)	v1.0.0	Frozen

sha512sig0

Synopsis: Implements the Sigma0 transformation function as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig0 rd, rs1
Encoding

Description: This instruction is supported for the RV64 base architecture. It implements the Sigma0 transform of the SHA2-512 hash function cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA512SIG0(rs1, rd)) = {
  X(rd) = ror64(X(rs1),  1) ^ ror64(X(rs1),  8) ^ (X(rs1) >> 7);
  RETIRE_SUCCESS
}
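
A minimal Python model of the Operation clause above may help when testing software against it. `ror64` here is an assumed helper implementing a 64-bit rotate right, and the function operates on plain integers rather than registers.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sha512sig0(rs1: int) -> int:
    """Model of sha512sig0: two rotations and one logical right shift."""
    rs1 &= MASK64
    return ror64(rs1, 1) ^ ror64(rs1, 8) ^ (rs1 >> 7)
```

The third term is a shift, not a rotation, so bits shifted out at the bottom are discarded: `sha512sig0(1)` is `0x8100000000000000`, with no contribution from the `>> 7` term.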

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

sha512sig1

Synopsis: Implements the Sigma1 transformation function as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sig1 rd, rs1
Encoding

Description: This instruction is supported for the RV64 base architecture. It implements the Sigma1 transform of the SHA2-512 hash function cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA512SIG1(rs1, rd)) = {
  X(rd) = ror64(X(rs1), 19) ^ ror64(X(rs1), 61) ^ (X(rs1) >> 6);
  RETIRE_SUCCESS
}
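
To show where the Sigma transforms are typically used, the Python sketch below models sha512sig0 and sha512sig1 and combines them in one step of the SHA2-512 message schedule, W[t] = Sigma1(W[t-2]) + W[t-7] + Sigma0(W[t-15]) + W[t-16] (mod 2^64), as defined in the cited standard. The function `next_schedule_word` and its sliding-window argument are illustrative assumptions, not part of this specification.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sha512sig0(rs1: int) -> int:
    """Model of the sha512sig0 Operation clause."""
    return ror64(rs1, 1) ^ ror64(rs1, 8) ^ (rs1 >> 7)


def sha512sig1(rs1: int) -> int:
    """Model of the sha512sig1 Operation clause."""
    return ror64(rs1, 19) ^ ror64(rs1, 61) ^ (rs1 >> 6)


def next_schedule_word(w: list) -> int:
    """Compute W[t] from the previous 16 words, with w[i] = W[t-16+i]."""
    return (sha512sig1(w[14]) + w[9] + sha512sig0(w[1]) + w[0]) & MASK64
```

On RV64 each call to `sha512sig0` or `sha512sig1` in this sketch corresponds to a single instruction, with the additions done by ordinary `add` instructions.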

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

sha512sum0

Synopsis: Implements the Sum0 transformation function as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sum0 rd, rs1
Encoding

Description: This instruction is supported for the RV64 base architecture. It implements the Sum0 transform of the SHA2-512 hash function cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA512SUM0(rs1, rd)) = {
  X(rd) = ror64(X(rs1), 28) ^ ror64(X(rs1), 34) ^ ror64(X(rs1) ,39);
  RETIRE_SUCCESS
}

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

sha512sum1

Synopsis: Implements the Sum1 transformation function as used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3).
Mnemonic: sha512sum1 rd, rs1
Encoding

Description: This instruction is supported for the RV64 base architecture. It implements the Sum1 transform of the SHA2-512 hash function cite:[nist:fips:180:4]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SHA512SUM1(rs1, rd)) = {
  X(rd) = ror64(X(rs1), 14) ^ ror64(X(rs1), 18) ^ ror64(X(rs1) ,41);
  RETIRE_SUCCESS
}
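
As a rough sketch of how sha512sum0 and sha512sum1 fit into a SHA2-512 round, the Python below models both instructions and computes the two round temporaries T1 and T2 from the cited standard. The `ch`, `maj` and `round_temporaries` helpers are written out here for illustration only; they are not instructions in this extension and their names are assumptions.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sha512sum0(rs1: int) -> int:
    """Model of the sha512sum0 Operation clause."""
    return ror64(rs1, 28) ^ ror64(rs1, 34) ^ ror64(rs1, 39)


def sha512sum1(rs1: int) -> int:
    """Model of the sha512sum1 Operation clause."""
    return ror64(rs1, 14) ^ ror64(rs1, 18) ^ ror64(rs1, 41)


def ch(x: int, y: int, z: int) -> int:
    """Choose: for each bit, select y where x is 1 and z where x is 0."""
    return (x & y) ^ ((x ^ MASK64) & z)


def maj(x: int, y: int, z: int) -> int:
    """Majority: each output bit is the majority of the three input bits."""
    return (x & y) ^ (x & z) ^ (y & z)


def round_temporaries(a, b, c, e, f, g, h, k, w):
    """Return (T1, T2) for one round; k is the round constant, w the schedule word."""
    t1 = (h + sha512sum1(e) + ch(e, f, g) + k + w) & MASK64
    t2 = (sha512sum0(a) + maj(a, b, c)) & MASK64
    return t1, t2
```

Both Sum transforms use rotations only, so unlike the Sigma transforms no input bits are discarded.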

Included in

Extension	Minimum version	Lifecycle state
Zknh (RV64)	v1.0.0	Frozen
Zkn (RV64)	v1.0.0	Frozen
Zk (RV64)	v1.0.0	Frozen

sm3p0

Synopsis: Implements the P0 transformation function as used in the SM3 hash function cite:[gbt:sm3,iso:sm3].
Mnemonic: sm3p0 rd, rs1
Encoding

Description: This instruction is supported for the RV32 and RV64 base architectures. It implements the P0 transform of the SM3 hash function cite:[gbt:sm3,iso:sm3]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Supporting Material

This instruction is based on work done in cite:[MJS:LWSHA:20].

Operation

function clause execute (SM3P0(rs1, rd)) = {
  let r1     : bits(32) = X(rs1)[31..0];
  let result : bits(32) =  r1 ^ rol32(r1,  9) ^ rol32(r1, 17);
  X(rd) = EXTS(result);
  RETIRE_SUCCESS
}
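
The Operation clause above reads only the low 32 bits of rs1 and sign-extends the 32-bit result to XLEN. A small Python model makes both points concrete. The `xlen` parameter and the `exts` helper are assumptions of this sketch: the parameter stands in for the base architecture (32 or 64), and results are returned as unsigned XLEN-bit integers.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def exts(value: int, xlen: int) -> int:
    """Sign-extend a 32-bit value to xlen bits, as an unsigned integer."""
    if value & 0x80000000:
        value |= ((1 << xlen) - 1) ^ MASK32
    return value


def sm3p0(rs1: int, xlen: int = 64) -> int:
    """Model of sm3p0: P0(x) = x ^ rol32(x, 9) ^ rol32(x, 17)."""
    r1 = rs1 & MASK32  # bits above 31 of rs1 are ignored
    result = r1 ^ rol32(r1, 9) ^ rol32(r1, 17)
    return exts(result, xlen)
```

For instance, `sm3p0(0x80000000)` gives `0xFFFFFFFF80010100` with `xlen=64` and `0x80010100` with `xlen=32`.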

Included in

Extension	Minimum version	Lifecycle state
Zksh	v1.0.0	Frozen
	v1.0.0	Frozen

sm3p1

Synopsis: Implements the P1 transformation function as used in the SM3 hash function cite:[gbt:sm3,iso:sm3].
Mnemonic: sm3p1 rd, rs1
Encoding

Description: This instruction is supported for the RV32 and RV64 base architectures. It implements the P1 transform of the SM3 hash function cite:[gbt:sm3,iso:sm3]. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.

Supporting Material

This instruction is based on work done in cite:[MJS:LWSHA:20].

Operation

function clause execute (SM3P1(rs1, rd)) = {
  let r1     : bits(32) = X(rs1)[31..0];
  let result : bits(32) =  r1 ^ rol32(r1, 15) ^ rol32(r1, 23);
  X(rd) = EXTS(result);
  RETIRE_SUCCESS
}
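
For context, P1 appears in the SM3 message expansion, where each new word is W[j] = P1(W[j-16] ^ W[j-9] ^ rol32(W[j-3], 15)) ^ rol32(W[j-13], 7) ^ W[j-6]. The Python sketch below models sm3p1 as a 32-bit function (the XLEN = 32 view, where sign extension has no effect) and uses it for one expansion step. The `expand_word` helper and its 16-word window are illustrative assumptions rather than anything defined by this extension.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm3p1(rs1: int) -> int:
    """Model of sm3p1 for XLEN = 32: P1(x) = x ^ rol32(x, 15) ^ rol32(x, 23)."""
    r1 = rs1 & MASK32
    return r1 ^ rol32(r1, 15) ^ rol32(r1, 23)


def expand_word(w: list) -> int:
    """Compute W[j] from the previous 16 words, with w[i] = W[j-16+i]."""
    t = w[0] ^ w[7] ^ rol32(w[13], 15)     # W[j-16] ^ W[j-9] ^ (W[j-3] <<< 15)
    return sm3p1(t) ^ rol32(w[3], 7) ^ w[10]  # ^ (W[j-13] <<< 7) ^ W[j-6]
```

In this arrangement the sm3p1 instruction replaces two rotations and two XORs per expanded word.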

Included in

Extension	Minimum version	Lifecycle state
Zksh	v1.0.0	Frozen
	v1.0.0	Frozen

sm4ed

Synopsis: Accelerates the block encrypt/decrypt operation of the SM4 block cipher cite:[gbt:sm4, iso:sm4].
Mnemonic: sm4ed rd, rs1, rs2, bs
Encoding

Description: Implements a "T-tables in hardware" style approach to accelerating the SM4 round function. A byte is extracted from rs2 based on bs, the SBox and linear layer transforms are applied to it, and the result is XOR’d with rs1 and written back to rd. This instruction is available on both the RV32 and RV64 base architectures. On RV64, the 32-bit result is sign extended to XLEN bits. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SM4ED (bs,rs2,rs1,rd)) = {
  let shamt : bits(5)  = bs @ 0b000; /* shamt = bs*8 */
  let sb_in : bits(8)  = (X(rs2)[31..0] >> shamt)[7..0];
  let x     : bits(32) = 0x000000 @ sm4_sbox(sb_in);
  let y     : bits(32) = x ^ (x               <<  8) ^ ( x               <<  2) ^
                             (x               << 18) ^ ((x & 0x0000003F) << 26) ^
                             ((x & 0x000000C0) << 10);
  let z     : bits(32) = rol32(y, unsigned(shamt));
  let result: bits(32) = z ^ X(rs1)[31..0];
  X(rd)                = EXTS(result);
  RETIRE_SUCCESS
}
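
The steps in the Operation clause (byte select, SBox, linear layer, rotate, XOR, sign extend) can be followed in a short Python model. Because the SM4 SBox table is not reproduced in this section, the sketch takes it as a caller-supplied function; the `sbox` and `xlen` parameters and the `exts` helper are assumptions of this sketch, and an identity SBox is used in the example purely to keep the arithmetic easy to follow.

```python
from typing import Callable

MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def exts(value: int, xlen: int) -> int:
    """Sign-extend a 32-bit value to xlen bits, as an unsigned integer."""
    if value & 0x80000000:
        value |= ((1 << xlen) - 1) ^ MASK32
    return value


def sm4ed(rs1: int, rs2: int, bs: int,
          sbox: Callable[[int], int], xlen: int = 64) -> int:
    """Model of the sm4ed Operation clause with a pluggable SBox."""
    shamt = (bs & 0x3) * 8                       # shamt = bs * 8
    sb_in = ((rs2 & MASK32) >> shamt) & 0xFF     # select one byte of rs2
    x = sbox(sb_in) & 0xFF                       # SBox output, zero-extended
    y = (x ^ (x << 8) ^ (x << 2) ^ (x << 18) ^
         ((x & 0x3F) << 26) ^ ((x & 0xC0) << 10)) & MASK32
    z = rol32(y, shamt)                          # move result to byte lane bs
    return exts(z ^ (rs1 & MASK32), xlen)


def identity(b: int) -> int:
    """Placeholder SBox (NOT the SM4 SBox) used only for demonstration."""
    return b
```

With the placeholder SBox, `sm4ed(0, 0x00000001, 0, identity)` returns `0x04040105`, and selecting the same byte value from lane 1 with `sm4ed(0, 0x00000100, 1, identity)` returns that word rotated left by 8 bits, `0x04010504`.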

Included in

Extension	Minimum version	Lifecycle state
Zksed	v1.0.0	Frozen
	v1.0.0	Frozen

sm4ks

Synopsis: Accelerates the Key Schedule operation of the SM4 block cipher cite:[gbt:sm4, iso:sm4].
Mnemonic: sm4ks rd, rs1, rs2, bs
Encoding

Description: Implements a "T-tables in hardware" style approach to accelerating the SM4 Key Schedule. A byte is extracted from rs2 based on bs, the SBox and linear layer transforms are applied to it, and the result is XOR’d with rs1 and written back to rd. This instruction is available on both the RV32 and RV64 base architectures. On RV64, the 32-bit result is sign extended to XLEN bits. This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
Operation

function clause execute (SM4KS (bs,rs2,rs1,rd)) = {
  let shamt : bits(5)  = (bs @ 0b000); /* shamt = bs*8 */
  let sb_in : bits(8)  = (X(rs2)[31..0] >> shamt)[7..0];
  let x     : bits(32) = 0x000000 @ sm4_sbox(sb_in);
  let y     : bits(32) = x ^ ((x & 0x00000007) << 29) ^ ((x & 0x000000FE) <<  7) ^
                             ((x & 0x00000001) << 23) ^ ((x & 0x000000F8) << 13) ;
  let z     : bits(32) = rol32(y, unsigned(shamt));
  let result: bits(32) = z ^ X(rs1)[31..0];
  X(rd) = EXTS(result);
  RETIRE_SUCCESS
}
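
The masked shifts in the `y` expression can look arbitrary at first. One observation, which the Python sketch below checks exhaustively rather than taking from the text, is that for every byte value they coincide with the SM4 key-schedule linear transform L'(B) = B ^ rol32(B, 13) ^ rol32(B, 23) applied to a byte-swapped word and then byte-swapped back. The helper names (`sm4ks_linear`, `sm4_key_linear`, `bswap32`) are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def sm4ks_linear(x: int) -> int:
    """The `y` expression from the sm4ks Operation clause, for a byte x."""
    return (x ^ ((x & 0x07) << 29) ^ ((x & 0xFE) << 7) ^
            ((x & 0x01) << 23) ^ ((x & 0xF8) << 13)) & MASK32


def sm4_key_linear(b: int) -> int:
    """SM4 key-schedule linear transform L' on a 32-bit word."""
    return b ^ rol32(b, 13) ^ rol32(b, 23)


# Compare the two formulations for all 256 possible SBox outputs.
matches = all(
    sm4ks_linear(x) == bswap32(sm4_key_linear(bswap32(x)))
    for x in range(256)
)
```

If this reading is right, it suggests the instruction is meant to operate on key words held in the opposite byte order to the one the SM4 standard is written in, though that motivation is an inference and is not stated in this section.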

Included in

Extension

Minimum version

Lifecycle state

Zksed

v1.0.0

Frozen