Version 1 (modified by trac, 8 years ago) (diff) |
---|
GCN ISA FLAT instructions
These instructions allow to access to main memory, LDS and VGPRS registers. FLAT instructions fetch address from 2 vector registers that hold 64-bit address. FLAT instruction presents only in GCN 1.1 architecture.
List of fields for the MUBUF encoding (GCN 1.1):
Bits | Name | Description |
---|---|---|
16 | GLC | Operation globally coherent |
17 | SLC | System level coherent |
18-24 | OPCODE | Operation code |
25-31 | ENCODING | Encoding type. Must be 0b110111 |
32-39 | VADDR | Vector address registers |
40-47 | VDATA | Vector data register |
55 | TFE | Texture Fail Enable ??? |
56-63 | VDST | Vector destination register |
Instruction syntax: INSTRUCTION VDST, VADDR(2) [MODIFIERS]
Instruction syntax: INSTRUCTION VADDR(2), VDATA [MODIFIERS]
Modifiers can be supplied in any order. Modifiers list: SLC, GLC, TFE. The TFE flag requires additional the VDATA register.
FLAT instruction can complete out of order with each other. This can be caused by different resources from/to that instruction can load/store. FLAT instruction increase VMCNT if access to main memory, or LKGMCNT if accesses to LDS.
Instructions by opcode
List of the MUBUF instructions by opcode (GCN 1.1/1.2):
Opcode | Mnemonic (GCN1.1) | Mnemonic (GCN1.2) |
---|---|---|
8 (0x8) | FLAT_LOAD_UBYTE | -- |
9 (0x9) | FLAT_LOAD_SBYTE | -- |
10 (0xa) | FLAT_LOAD_USHORT | -- |
11 (0xb) | FLAT_LOAD_SSHORT | -- |
12 (0xc) | FLAT_LOAD_DWORD | -- |
13 (0xd) | FLAT_LOAD_DWORDX2 | -- |
14 (0xe) | FLAT_LOAD_DWORDX4 | -- |
15 (0xf) | FLAT_LOAD_DWORDX3 | -- |
16 (0x10) | -- | FLAT_LOAD_UBYTE |
17 (0x11) | -- | FLAT_LOAD_SBYTE |
18 (0x12) | -- | FLAT_LOAD_USHORT |
19 (0x13) | -- | FLAT_LOAD_SSHORT |
20 (0x14) | -- | FLAT_LOAD_DWORD |
21 (0x15) | -- | FLAT_LOAD_DWORDX2 |
22 (0x16) | -- | FLAT_LOAD_DWORDX3 |
23 (0x17) | -- | FLAT_LOAD_DWORDX4 |
24 (0x18) | FLAT_STORE_BYTE | FLAT_STORE_BYTE |
26 (0x1a) | FLAT_STORE_SHORT | FLAT_STORE_SHORT |
28 (0x1c) | FLAT_STORE_DWORD | FLAT_STORE_DWORD |
29 (0x1d) | FLAT_STORE_DWORDX2 | FLAT_STORE_DWORDX2 |
30 (0x1e) | FLAT_STORE_DWORDX4 | FLAT_STORE_DWORDX3 |
31 (0x1f) | FLAT_STORE_DWORDX3 | FLAT_STORE_DWORDX4 |
48 (0x30) | FLAT_ATOMIC_SWAP | -- |
49 (0x31) | FLAT_ATOMIC_CMPSWAP | -- |
50 (0x32) | FLAT_ATOMIC_ADD | -- |
52 (0x34) | FLAT_ATOMIC_SUB | -- |
53 (0x35) | FLAT_ATOMIC_SMIN | -- |
54 (0x36) | FLAT_ATOMIC_UMIN | -- |
55 (0x37) | FLAT_ATOMIC_SMAX | -- |
56 (0x38) | FLAT_ATOMIC_UMAX | -- |
57 (0x39) | FLAT_ATOMIC_AND | -- |
58 (0x3a) | FLAT_ATOMIC_OR | -- |
59 (0x3b) | FLAT_ATOMIC_XOR | -- |
60 (0x3c) | FLAT_ATOMIC_INC | -- |
61 (0x3d) | FLAT_ATOMIC_DEC | -- |
62 (0x3e) | FLAT_ATOMIC_FCMPSWAP | -- |
63 (0x3f) | FLAT_ATOMIC_FMIN | -- |
64 (0x40) | FLAT_ATOMIC_FMAX | FLAT_ATOMIC_SWAP |
65 (0x41) | -- | FLAT_ATOMIC_CMPSWAP |
66 (0x42) | -- | FLAT_ATOMIC_ADD |
67 (0x43) | -- | FLAT_ATOMIC_SUB |
68 (0x44) | -- | FLAT_ATOMIC_SMIN |
69 (0x45) | -- | FLAT_ATOMIC_UMIN |
70 (0x46) | -- | FLAT_ATOMIC_SMAX |
71 (0x47) | -- | FLAT_ATOMIC_UMAX |
72 (0x48) | -- | FLAT_ATOMIC_AND |
73 (0x49) | -- | FLAT_ATOMIC_OR |
74 (0x4a) | -- | FLAT_ATOMIC_XOR |
75 (0x4b) | -- | FLAT_ATOMIC_INC |
76 (0x4c) | -- | FLAT_ATOMIC_DEC |
80 (0x50) | FLAT_ATOMIC_SWAP_X2 | -- |
81 (0x51) | FLAT_ATOMIC_CMPSWAP_X2 | -- |
82 (0x52) | FLAT_ATOMIC_ADD_X2 | -- |
84 (0x54) | FLAT_ATOMIC_SUB_X2 | -- |
85 (0x55) | FLAT_ATOMIC_SMIN_X2 | -- |
86 (0x56) | FLAT_ATOMIC_UMIN_X2 | -- |
87 (0x57) | FLAT_ATOMIC_SMAX_X2 | -- |
88 (0x58) | FLAT_ATOMIC_UMAX_X2 | -- |
89 (0x59) | FLAT_ATOMIC_AND_X2 | -- |
90 (0x5a) | FLAT_ATOMIC_OR_X2 | -- |
91 (0x5b) | FLAT_ATOMIC_XOR_X2 | -- |
92 (0x5c) | FLAT_ATOMIC_INC_X2 | -- |
93 (0x5d) | FLAT_ATOMIC_DEC_X2 | -- |
94 (0x5e) | FLAT_ATOMIC_FCMPSWAP_X2 | -- |
95 (0x5f) | FLAT_ATOMIC_FMIN_X2 | -- |
96 (0x60) | FLAT_ATOMIC_FMAX_X2 | FLAT_ATOMIC_SWAP_X2 |
81 (0x61) | -- | FLAT_ATOMIC_CMPSWAP_X2 |
82 (0x62) | -- | FLAT_ATOMIC_ADD_X2 |
83 (0x63) | -- | FLAT_ATOMIC_SUB_X2 |
84 (0x64) | -- | FLAT_ATOMIC_SMIN_X2 |
85 (0x65) | -- | FLAT_ATOMIC_UMIN_X2 |
86 (0x66) | -- | FLAT_ATOMIC_SMAX_X2 |
87 (0x67) | -- | FLAT_ATOMIC_UMAX_X2 |
88 (0x68) | -- | FLAT_ATOMIC_AND_X2 |
89 (0x69) | -- | FLAT_ATOMIC_OR_X2 |
90 (0x6a) | -- | FLAT_ATOMIC_XOR_X2 |
91 (0x6b) | -- | FLAT_ATOMIC_INC_X2 |
92 (0x6c) | -- | FLAT_ATOMIC_DEC_X2 |
Instruction set
Alphabetically sorted instruction list:
FLAT_ATOMIC_SWAP
Opcode: 48 (0x30) for GCN 1.0/1.1; 64 (0x40) for GCN 1.2
Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA
Description: Store VDATA dword into VADDR address. If GLC flag is set then
return previous value from resource to VDST, otherwise store old value from VDATA to VDST.
Operation is atomic.
Operation:
UINT32* VM = (UINT32*)VADDR
UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDATA // atomic
FLAT_LOAD_DWORD
Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2
Syntax: FLAT_LOAD_DWORD VDST, VADDR(2)
Description Load dword to VDST from VADDR address.
Operation:
VDST = *(UINT32*)VADDR
FLAT_LOAD_DWORDX2
Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX2 VDST(, VADDR(2)
Description Load two dwords to VDST from VADDR address.
Operation:
VDST = *(UINT64*)VADDR
FLAT_LOAD_DWORDX3
Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX3 VDST(3), VADDR(2)
Description Load three dwords to VDST from VADDR address.
Operation:
VDST[0] = *(UINT32*)VADDR
VDST[1] = *(UINT32*)(VADDR+4)
VDST[2] = *(UINT32*)(VADDR+8)
FLAT_LOAD_DWORDX4
Opcode: 13 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX4 VDST(4), VADDR(2)
Description Load four dwords to VDST from VADDR address.
Operation:
VDST[0] = *(UINT32*)VADDR
VDST[1] = *(UINT32*)(VADDR+4)
VDST[2] = *(UINT32*)(VADDR+8)
VDST[3] = *(UINT32*)(VADDR+12)
FLAT_LOAD_SBYTE
Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2
Syntax: FLAT_LOAD_SBYTE VDST, VADDR(2)
Description: Load byte to VDST from VADDR address with sign extending.
Operation:
VDST = *(INT8*)VADDR
FLAT_LOAD_SSHORT
Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2
Syntax: FLAT_LOAD_SSHORT VDST, VADDR(2)
Description: Load 16-bit word to VDST from VADDR address with sign extending.
Operation:
VDST = *(INT16*)VADDR
FLAT_LOAD_UBYTE
Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2
Syntax: FLAT_LOAD_UBYTE VDST, VADDR(2)
Description: Load byte to VDST from VADDR address with zero extending.
Operation:
VDST = *(UINT8*)VADDR
FLAT_LOAD_USHORT
Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2
Syntax: FLAT_LOAD_USHORT VDST, VADDR(1:2)
Description: Load 16-bit word to VDST from VADDR address with zero extending.
Operation:
VDST = *(UINT16*)VADDR
FLAT_STORE_BYTE
Opcode: 24 (0x18)
Syntax: FLAT_STORE_BYTE VADDR(2), VDATA
Description: Store byte from VDATA to VADDR address.
Operation:
*(UINT8*)VADDR = VDATA&0xff
FLAT_STORE_DWORD
Opcode: 28 (0x1c)
Syntax: FLAT_STORE_DWORD VADDR(2), VDATA
Description: Store dword from VDATA to VADDR address.
Operation:
*(UINT32*)VADDR = VDATA
FLAT_STORE_DWORDX2
Opcode: 29 (0x1d)
Syntax: FLAT_STORE_DWORDX2 VADDR(2), VDATA(2)
Description: Store two dwords from VDATA to VADDR address.
Operation:
*(UINT64*)VADDR = VDATA
FLAT_STORE_DWORDX3
Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2
Syntax: FLAT_STORE_DWORDX3 VADDR(2), VDATA(3)
Description: Store three dwords from VDATA to VADDR address.
Operation:
*(UINT32*)(VADDR) = VDATA[0]
*(UINT32*)(VADDR+4) = VDATA[1]
*(UINT32*)(VADDR+8) = VDATA[2]
FLAT_STORE_DWORDX4
Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1d) for GCN 1.2
Syntax: FLAT_STORE_DWORDX4 VADDR(2), VDATA(4)
Description: Store three dwords from VDATA to VADDR address.
Operation:
*(UINT32*)(VADDR) = VDATA[0]
*(UINT32*)(VADDR+4) = VDATA[1]
*(UINT32*)(VADDR+8) = VDATA[2]
*(UINT32*)(VADDR+12) = VDATA[3]
FLAT_STORE_SHORT
Opcode: 26 (0x1a)
Syntax: FLAT_STORE_SHORT VADDR(2), VDATA
Description: Store 16-bit word from VDATA to VADDR address.
Operation:
*(UINT16*)VADDR = VDATA&0xffff