sfence.vma
Supervisor memory-management fence
This instruction is defined by:
-
S, version >= 0
This instruction is included in the following profiles:
-
MockProfile 64-bit S-mode (Optional)
-
RVA20S64 (Mandatory)
-
RVA22S64 (Mandatory)
Synopsis
The supervisor memory-management fence instruction SFENCE.VMA
is used to
synchronize updates to in-memory memory-management data structures with
current execution. Instruction execution causes implicit reads and
writes to these data structures; however, these implicit references are
ordinarily not ordered with respect to explicit loads and stores.
Executing an SFENCE.VMA instruction guarantees that any previous stores
already visible to the current RISC-V hart are ordered before certain
implicit references by subsequent instructions in that hart to the
memory-management data structures. The specific set of operations
ordered by SFENCE.VMA is determined by rs1 and rs2, as described
below. SFENCE.VMA is also used to invalidate entries in the
address-translation cache associated with a hart (see [sv32algorithm]). Further details on the behavior of this instruction are described in [virt-control] and [pmp-vmem].
The SFENCE.VMA is used to flush any local hardware caches related to address translation. It is specified as a fence rather than a TLB flush to provide cleaner semantics with respect to which instructions are affected by the flush operation and to support a wider variety of dynamic caching structures and memory-management schemes. SFENCE.VMA is also used by higher privilege levels to synchronize page table writes and the address translation hardware. |
SFENCE.VMA orders only the local hart’s implicit references to the memory-management data structures.
Consequently, other harts must be notified separately when the memory-management data structures have been modified. One approach is to use 1) a local data fence to ensure local writes are visible globally, then 2) an interprocessor interrupt to the other thread, then 3) a local SFENCE.VMA in the interrupt handler of the remote thread, and finally 4) signal back to originating thread that operation is complete. This is, of course, the RISC-V analog to a TLB shootdown. |
For the common case that the translation data structures have only been modified for a single address mapping (i.e., one page or superpage), rs1 can specify a virtual address within that mapping to effect a translation fence for that mapping only. Furthermore, for the common case that the translation data structures have only been modified for a single address-space identifier, rs2 can specify the address space. The behavior of SFENCE.VMA depends on rs1 and rs2 as follows:
-
If rs1=
x0
and rs2=x0
, the fence orders all reads and writes made to any level of the page tables, for all address spaces. The fence also invalidates all address-translation cache entries, for all address spaces. -
If rs1=
x0
and rs2≠x0
, the fence orders all reads and writes made to any level of the page tables, but only for the address space identified by integer register rs2. Accesses to global mappings (see [translation]) are not ordered. The fence also invalidates all address-translation cache entries matching the address space identified by integer register rs2, except for entries containing global mappings. -
If rs1≠
x0
and rs2=x0
, the fence orders only reads and writes made to leaf page table entries corresponding to the virtual address in rs1, for all address spaces. The fence also invalidates all address-translation cache entries that contain leaf page table entries corresponding to the virtual address in rs1, for all address spaces. -
If rs1≠
x0
and rs2≠x0
, the fence orders only reads and writes made to leaf page table entries corresponding to the virtual address in rs1, for the address space identified by integer register rs2. Accesses to global mappings are not ordered. The fence also invalidates all address-translation cache entries that contain leaf page table entries corresponding to the virtual address in rs1 and that match the address space identified by integer register rs2, except for entries containing global mappings.
If the value held in rs1 is not a valid virtual address, then the SFENCE.VMA instruction has no effect. No exception is raised in this case.
When rs2≠x0
, bits SXLEN-1:ASIDMAX of the value held
in rs2 are reserved for future standard use. Until their use is
defined by a standard extension, they should be zeroed by software and
ignored by current implementations. Furthermore, if
ASIDLEN<ASIDMAX, the implementation shall ignore bits
ASIDMAX-1:ASIDLEN of the value held in rs2.
It is always legal to over-fence, e.g., by fencing only based on a
subset of the bits in rs1 and/or rs2, and/or by simply treating all
SFENCE.VMA instructions as having rs1= |
An implicit read of the memory-management data structures may return any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address. The ordering implied by SFENCE.VMA does not place implicit reads and writes to the memory-management data structures into the global memory order in a way that interacts cleanly with the standard RVWMO ordering rules. In particular, even though an SFENCE.VMA orders prior explicit accesses before subsequent implicit accesses, and those implicit accesses are ordered before their associated explicit accesses, SFENCE.VMA does not necessarily place prior explicit accesses before subsequent explicit accesses in the global memory order. These implicit loads also need not otherwise obey normal program order semantics with respect to prior loads or stores to the same address.
A consequence of this specification is that an implementation may use any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address. In particular, if a leaf PTE is modified but a subsuming SFENCE.VMA is not executed, either the old translation or the new translation will be used, but the choice is unpredictable. The behavior is otherwise well-defined. In a conventional TLB design, it is possible for multiple entries to
match a single address if, for example, a page is upgraded to a
superpage without first clearing the original non-leaf PTE’s valid bit
and executing an SFENCE.VMA with rs1= Another consequence of this specification is that it is generally unsafe to update a PTE using a set of stores of a width less than the width of the PTE, as it is legal for the implementation to read the PTE at any time, including when only some of the partial stores have taken effect. This specification permits the caching of PTEs whose V (Valid) bit is clear. Operating systems must be written to cope with this possibility, but implementers are reminded that eagerly caching invalid PTEs will reduce performance by causing additional page faults. |
Implementations must only perform implicit reads of the translation data structures pointed to by the current contents of the satp register or a subsequent valid (V=1) translation data structure entry, and must only raise exceptions for implicit accesses that are generated as a result of instruction execution, not those that are performed speculatively.
Changes to the sstatus fields SUM and MXR take effect immediately, without the need to execute an SFENCE.VMA instruction. Changing satp.MODE from Bare to other modes and vice versa also takes effect immediately, without the need to execute an SFENCE.VMA instruction. Likewise, changes to satp.ASID take effect immediately.
The following common situations typically require executing an SFENCE.VMA instruction:
|
If a hart employs an address-translation cache, that cache must appear to be private to that hart. In particular, the meaning of an ASID is local to a hart; software may choose to use the same ASID to refer to different address spaces on different harts.
A future extension could redefine ASIDs to be global across the SEE, enabling such options as shared translation caches and hardware support for broadcast TLB shootdown. However, as OSes have evolved to significantly reduce the scope of TLB shootdowns using novel ASID-management techniques, we expect the local-ASID scheme to remain attractive for its simplicity and possibly better scalability. |
For implementations that make satp.MODE read-only zero (always Bare), attempts to execute an SFENCE.VMA instruction might raise an illegal-instruction exception.
Execution
-
IDL
-
Sail
XReg vaddr = X[rs1];
Bits<16> asid = X[rs2][ASID_WIDTH - 1:0];
if (mode() == PrivilegeMode::U) {
raise(ExceptionCode::IllegalInstruction, mode(), $encoding);
}
if (CSR[misa].H == 1 && mode() == PrivilegeMode::VU) {
raise(ExceptionCode::IllegalInstruction, mode(), $encoding);
}
if (CSR[mstatus].TVM == 1 && mode() == PrivilegeMode::S) || (mode() == PrivilegeMode::VS) {
raise(ExceptionCode::IllegalInstruction, mode(), $encoding);
}
if (CSR[misa].H == 1 && CSR[hstatus].VTVM == 1 && mode() == PrivilegeMode::VS) {
raise(ExceptionCode::VirtualInstruction, mode(), $encoding);
}
if (!implemented?(ExtensionName::Sv32) && !implemented?(ExtensionName::Sv39) && !implemented?(ExtensionName::Sv48) && !implemented?(ExtensionName::Sv57)) {
if (TRAP_ON_SFENCE_VMA_WHEN_SATP_MODE_IS_READ_ONLY) {
raise(ExceptionCode::IllegalInstruction, mode(), $encoding);
}
}
VmaOrderType vma_type;
if (CSR[misa].H == 1 && mode() == PrivilegeMode::VS) {
vma_type.vsmode = true;
vma_type.single_vmid = true;
vma_type.vmid = CSR[hgatp].VMID;
} else {
vma_type.smode = true;
}
if ((rs1 == 0) && (rs2 == 0)) {
vma_type.global = true;
order_pgtbl_writes_before_vmafence(vma_type);
invalidate_translations(vma_type);
order_pgtbl_reads_after_vmafence(vma_type);
} else if ((rs1 == 0) && (rs2 != 0)) {
vma_type.single_asid = true;
vma_type.asid = asid;
order_pgtbl_writes_before_vmafence(vma_type);
invalidate_translations(vma_type);
order_pgtbl_reads_after_vmafence(vma_type);
} else if ((rs1 != 0) && (rs2 == 0)) {
if (canonical_vaddr?(vaddr)) {
vma_type.single_vaddr = true;
vma_type.vaddr = vaddr;
order_pgtbl_writes_before_vmafence(vma_type);
invalidate_translations(vma_type);
order_pgtbl_reads_after_vmafence(vma_type);
}
} else {
if (canonical_vaddr?(vaddr)) {
vma_type.single_asid = true;
vma_type.asid = asid;
vma_type.single_vaddr = true;
vma_type.vaddr = vaddr;
order_pgtbl_writes_before_vmafence(vma_type);
invalidate_translations(vma_type);
order_pgtbl_reads_after_vmafence(vma_type);
}
}
{
let addr : option(xlenbits) = if rs1 == 0b00000 then None() else Some(X(rs1));
let asid : option(xlenbits) = if rs2 == 0b00000 then None() else Some(X(rs2));
match cur_privilege {
User => { handle_illegal(); RETIRE_FAIL },
Supervisor => match (architecture(get_mstatus_SXL(mstatus)), mstatus.TVM()) {
(Some(_), 0b1) => { handle_illegal(); RETIRE_FAIL },
(Some(_), 0b0) => { flush_TLB(asid, addr); RETIRE_SUCCESS },
(_, _) => internal_error(__FILE__, __LINE__, "unimplemented sfence architecture")
},
Machine => { flush_TLB(asid, addr); RETIRE_SUCCESS }
}
}