Version 1 (modified by trac, 8 years ago)

GCN ISA FLAT instructions

These instructions access main (global) memory, LDS and scratch (private) memory through a single flat address space. Each FLAT instruction takes its address from 2 consecutive vector registers that hold a 64-bit address. FLAT instructions are present only in GCN 1.1 and later architectures (GCN 1.1 and GCN 1.2).
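A minimal C sketch of how one lane's 64-bit address is formed from the VADDR register pair. It assumes the first register of the pair holds the low 32 bits and the next register holds the high 32 bits; the function name `flat_address` is hypothetical and not part of any assembler API.

```c
#include <stdint.h>

/* Hypothetical helper: combine the two 32-bit VGPR values of a VADDR
   pair into the 64-bit flat address used by one lane. Assumes the first
   register of the pair holds the low half and the second the high half. */
uint64_t flat_address(uint32_t vaddr_lo, uint32_t vaddr_hi)
{
    return ((uint64_t)vaddr_hi << 32) | (uint64_t)vaddr_lo;
}
```

For example, a lane whose VADDR pair holds 0x00001000 and 0x00000001 accesses the flat address 0x100001000.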

List of fields for the FLAT encoding (GCN 1.1):

Bits Name Description
16 GLC Operation globally coherent
17 SLC System level coherent
18-24 OPCODE Operation code
26-31 ENCODING Encoding type. Must be 0b110111
32-39 VADDR Vector address registers
40-47 VDATA Vector data register
55 TFE Texture Fail Enable
56-63 VDST Vector destination register
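The field layout above can be illustrated with a small packing function. This is a sketch, not CLRX code: `FlatFields` and `flat_encode` are hypothetical names, and the 6-bit ENCODING value 0b110111 is assumed to occupy bits 26-31 with bit 25 left zero. Fields at bit 32 and above go into the second dword, so VADDR starts at bit 0 of that dword.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one FLAT instruction. */
typedef struct {
    unsigned opcode;             /* 7 bits, bits 18-24 */
    unsigned glc, slc, tfe;      /* 1 bit each */
    unsigned vaddr, vdata, vdst; /* VGPR numbers, 8 bits each */
} FlatFields;

/* Pack the fields into the two dwords of a FLAT instruction.
   Assumes ENCODING = 0b110111 in bits 26-31 (bit 25 left zero). */
void flat_encode(const FlatFields *f, uint32_t out[2])
{
    out[0] = (0x37u << 26)                /* ENCODING, bits 26-31 */
           | ((f->opcode & 0x7fu) << 18)  /* OPCODE, bits 18-24 */
           | ((f->slc & 1u) << 17)        /* SLC, bit 17 */
           | ((f->glc & 1u) << 16);       /* GLC, bit 16 */
    out[1] = (f->vaddr & 0xffu)           /* VADDR, bits 32-39 */
           | ((f->vdata & 0xffu) << 8)    /* VDATA, bits 40-47 */
           | ((f->tfe & 1u) << 23)        /* TFE, bit 55 */
           | ((f->vdst & 0xffu) << 24);   /* VDST, bits 56-63 */
}
```

For example, opcode 12 (FLAT_LOAD_DWORD on GCN 1.1) with VDST = v1 and VADDR starting at v2 packs to 0xDC300000 0x01000002; setting GLC changes the first dword to 0xDC310000.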

Instruction syntax (loads): INSTRUCTION VDST, VADDR(2) [MODIFIERS]
Instruction syntax (stores): INSTRUCTION VADDR(2), VDATA [MODIFIERS]
Instruction syntax (atomics): INSTRUCTION VDST, VADDR(2), VDATA [MODIFIERS]

Modifiers can be supplied in any order. The modifiers are SLC, GLC and TFE. The TFE flag requires an additional VDATA register.

FLAT instructions can complete out of order with respect to each other, because they may load from or store to different resources. A FLAT instruction increments VMCNT when it accesses main memory and LGKMCNT when it accesses LDS.

Instructions by opcode

List of the FLAT instructions by opcode (GCN 1.1/1.2):

Opcode Mnemonic (GCN 1.1) Mnemonic (GCN 1.2)
8 (0x8) FLAT_LOAD_UBYTE --
9 (0x9) FLAT_LOAD_SBYTE --
10 (0xa) FLAT_LOAD_USHORT --
11 (0xb) FLAT_LOAD_SSHORT --
12 (0xc) FLAT_LOAD_DWORD --
13 (0xd) FLAT_LOAD_DWORDX2 --
14 (0xe) FLAT_LOAD_DWORDX4 --
15 (0xf) FLAT_LOAD_DWORDX3 --
16 (0x10) -- FLAT_LOAD_UBYTE
17 (0x11) -- FLAT_LOAD_SBYTE
18 (0x12) -- FLAT_LOAD_USHORT
19 (0x13) -- FLAT_LOAD_SSHORT
20 (0x14) -- FLAT_LOAD_DWORD
21 (0x15) -- FLAT_LOAD_DWORDX2
22 (0x16) -- FLAT_LOAD_DWORDX3
23 (0x17) -- FLAT_LOAD_DWORDX4
24 (0x18) FLAT_STORE_BYTE FLAT_STORE_BYTE
26 (0x1a) FLAT_STORE_SHORT FLAT_STORE_SHORT
28 (0x1c) FLAT_STORE_DWORD FLAT_STORE_DWORD
29 (0x1d) FLAT_STORE_DWORDX2 FLAT_STORE_DWORDX2
30 (0x1e) FLAT_STORE_DWORDX4 FLAT_STORE_DWORDX3
31 (0x1f) FLAT_STORE_DWORDX3 FLAT_STORE_DWORDX4
48 (0x30) FLAT_ATOMIC_SWAP --
49 (0x31) FLAT_ATOMIC_CMPSWAP --
50 (0x32) FLAT_ATOMIC_ADD --
52 (0x34) FLAT_ATOMIC_SUB --
53 (0x35) FLAT_ATOMIC_SMIN --
54 (0x36) FLAT_ATOMIC_UMIN --
55 (0x37) FLAT_ATOMIC_SMAX --
56 (0x38) FLAT_ATOMIC_UMAX --
57 (0x39) FLAT_ATOMIC_AND --
58 (0x3a) FLAT_ATOMIC_OR --
59 (0x3b) FLAT_ATOMIC_XOR --
60 (0x3c) FLAT_ATOMIC_INC --
61 (0x3d) FLAT_ATOMIC_DEC --
62 (0x3e) FLAT_ATOMIC_FCMPSWAP --
63 (0x3f) FLAT_ATOMIC_FMIN --
64 (0x40) FLAT_ATOMIC_FMAX FLAT_ATOMIC_SWAP
65 (0x41) -- FLAT_ATOMIC_CMPSWAP
66 (0x42) -- FLAT_ATOMIC_ADD
67 (0x43) -- FLAT_ATOMIC_SUB
68 (0x44) -- FLAT_ATOMIC_SMIN
69 (0x45) -- FLAT_ATOMIC_UMIN
70 (0x46) -- FLAT_ATOMIC_SMAX
71 (0x47) -- FLAT_ATOMIC_UMAX
72 (0x48) -- FLAT_ATOMIC_AND
73 (0x49) -- FLAT_ATOMIC_OR
74 (0x4a) -- FLAT_ATOMIC_XOR
75 (0x4b) -- FLAT_ATOMIC_INC
76 (0x4c) -- FLAT_ATOMIC_DEC
80 (0x50) FLAT_ATOMIC_SWAP_X2 --
81 (0x51) FLAT_ATOMIC_CMPSWAP_X2 --
82 (0x52) FLAT_ATOMIC_ADD_X2 --
84 (0x54) FLAT_ATOMIC_SUB_X2 --
85 (0x55) FLAT_ATOMIC_SMIN_X2 --
86 (0x56) FLAT_ATOMIC_UMIN_X2 --
87 (0x57) FLAT_ATOMIC_SMAX_X2 --
88 (0x58) FLAT_ATOMIC_UMAX_X2 --
89 (0x59) FLAT_ATOMIC_AND_X2 --
90 (0x5a) FLAT_ATOMIC_OR_X2 --
91 (0x5b) FLAT_ATOMIC_XOR_X2 --
92 (0x5c) FLAT_ATOMIC_INC_X2 --
93 (0x5d) FLAT_ATOMIC_DEC_X2 --
94 (0x5e) FLAT_ATOMIC_FCMPSWAP_X2 --
95 (0x5f) FLAT_ATOMIC_FMIN_X2 --
96 (0x60) FLAT_ATOMIC_FMAX_X2 FLAT_ATOMIC_SWAP_X2
97 (0x61) -- FLAT_ATOMIC_CMPSWAP_X2
98 (0x62) -- FLAT_ATOMIC_ADD_X2
99 (0x63) -- FLAT_ATOMIC_SUB_X2
100 (0x64) -- FLAT_ATOMIC_SMIN_X2
101 (0x65) -- FLAT_ATOMIC_UMIN_X2
102 (0x66) -- FLAT_ATOMIC_SMAX_X2
103 (0x67) -- FLAT_ATOMIC_UMAX_X2
104 (0x68) -- FLAT_ATOMIC_AND_X2
105 (0x69) -- FLAT_ATOMIC_OR_X2
106 (0x6a) -- FLAT_ATOMIC_XOR_X2
107 (0x6b) -- FLAT_ATOMIC_INC_X2
108 (0x6c) -- FLAT_ATOMIC_DEC_X2
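The opcode table can be summarized as a translation from GCN 1.1 to GCN 1.2 opcodes. The sketch below is derived only from the table (using its hexadecimal column); `flat_opcode_gcn11_to_gcn12` is a hypothetical name, and it returns -1 where GCN 1.2 has no instruction with the same mnemonic (the FCMPSWAP, FMIN and FMAX variants).

```c
/* Translate a GCN 1.1 FLAT opcode to the GCN 1.2 opcode with the same
   mnemonic, following the table above. Returns -1 if there is none. */
int flat_opcode_gcn11_to_gcn12(int op)
{
    switch (op) {
    case 14: return 23; /* FLAT_LOAD_DWORDX4: order of X3/X4 swapped */
    case 15: return 22; /* FLAT_LOAD_DWORDX3 */
    case 30: return 31; /* FLAT_STORE_DWORDX4 */
    case 31: return 30; /* FLAT_STORE_DWORDX3 */
    }
    if (op >= 8 && op <= 13)   /* UBYTE..DWORDX2 loads move up by 8 */
        return op + 8;
    if (op == 24 || op == 26 || op == 28 || op == 29)
        return op;             /* BYTE, SHORT, DWORD, DWORDX2 stores */
    if (op >= 48 && op <= 50)  /* SWAP, CMPSWAP, ADD */
        return op + 16;
    if (op >= 52 && op <= 61)  /* SUB..DEC (opcode 51 unused on GCN 1.1) */
        return op + 15;
    if (op >= 80 && op <= 82)  /* SWAP_X2, CMPSWAP_X2, ADD_X2 */
        return op + 16;
    if (op >= 84 && op <= 93)  /* SUB_X2..DEC_X2 */
        return op + 15;
    return -1;                 /* FCMPSWAP/FMIN/FMAX variants, unknown */
}
```

The uneven offsets (+16 then +15) come from the gap at opcodes 51 and 83 on GCN 1.1, which GCN 1.2 closes up.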

Instruction set

Alphabetically sorted instruction list:

FLAT_ATOMIC_SWAP

Opcode: 48 (0x30) for GCN 1.1; 64 (0x40) for GCN 1.2
Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA
Description: Store the VDATA dword at the VADDR address. If the GLC flag is set, return the previous value from memory to VDST; otherwise copy VDATA to VDST. The operation is atomic.
Operation:
UINT32* VM = (UINT32*)VADDR
UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDATA // atomic
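A C11 model of the operation above for a single lane, using `atomic_exchange` from `<stdatomic.h>` so that the read-and-replace step is atomic. It models the pseudocode as written on this page, not measured hardware behaviour; `flat_atomic_swap` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of FLAT_ATOMIC_SWAP for one lane. `vm` points to the dword at
   VADDR; the return value is what the pseudocode writes to VDST. */
uint32_t flat_atomic_swap(_Atomic uint32_t *vm, uint32_t vdata, bool glc)
{
    uint32_t p = atomic_exchange(vm, vdata); /* P = *VM; *VM = VDATA */
    return glc ? p : vdata;                  /* VDST = GLC ? P : VDATA */
}
```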

FLAT_LOAD_DWORD

Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2
Syntax: FLAT_LOAD_DWORD VDST, VADDR(2)
Description: Load a dword into VDST from the VADDR address.
Operation:
VDST = *(UINT32*)VADDR

FLAT_LOAD_DWORDX2

Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX2 VDST(2), VADDR(2)
Description: Load two dwords into VDST from the VADDR address.
Operation:
VDST = *(UINT64*)VADDR

FLAT_LOAD_DWORDX3

Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX3 VDST(3), VADDR(2)
Description: Load three dwords into VDST from the VADDR address.
Operation:
VDST[0] = *(UINT32*)VADDR
VDST[1] = *(UINT32*)(VADDR+4)
VDST[2] = *(UINT32*)(VADDR+8)

FLAT_LOAD_DWORDX4

Opcode: 14 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2
Syntax: FLAT_LOAD_DWORDX4 VDST(4), VADDR(2)
Description: Load four dwords into VDST from the VADDR address.
Operation:
VDST[0] = *(UINT32*)VADDR
VDST[1] = *(UINT32*)(VADDR+4)
VDST[2] = *(UINT32*)(VADDR+8)
VDST[3] = *(UINT32*)(VADDR+12)
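The dword loads (FLAT_LOAD_DWORD through FLAT_LOAD_DWORDX4) differ only in how many consecutive dwords they read. A per-lane sketch, assuming a hypothetical byte buffer `mem` stands in for memory and `addr` is a byte offset into it; `flat_load_dwords` is not a real API. `memcpy` uses host byte order, which matches GCN on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Model of FLAT_LOAD_DWORD/X2/X3/X4 for one lane:
   VDST[i] = *(UINT32*)(VADDR + 4*i) for i < count. */
void flat_load_dwords(const uint8_t *mem, uint64_t addr,
                      uint32_t *vdst, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        memcpy(&vdst[i], mem + addr + 4u * i, sizeof(uint32_t));
}
```

Calling it with `count` set to 3 models FLAT_LOAD_DWORDX3, which fills VDST[0..2] from VADDR, VADDR+4 and VADDR+8.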

FLAT_LOAD_SBYTE

Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2
Syntax: FLAT_LOAD_SBYTE VDST, VADDR(2)
Description: Load a byte into VDST from the VADDR address, with sign extension.
Operation:
VDST = *(INT8*)VADDR

FLAT_LOAD_SSHORT

Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2
Syntax: FLAT_LOAD_SSHORT VDST, VADDR(2)
Description: Load a 16-bit word into VDST from the VADDR address, with sign extension.
Operation:
VDST = *(INT16*)VADDR

FLAT_LOAD_UBYTE

Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2
Syntax: FLAT_LOAD_UBYTE VDST, VADDR(2)
Description: Load a byte into VDST from the VADDR address, with zero extension.
Operation:
VDST = *(UINT8*)VADDR

FLAT_LOAD_USHORT

Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2
Syntax: FLAT_LOAD_USHORT VDST, VADDR(2)
Description: Load a 16-bit word into VDST from the VADDR address, with zero extension.
Operation:
VDST = *(UINT16*)VADDR
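The four sub-dword loads differ only in width and in how the value is widened to 32 bits. A per-lane sketch under the same assumptions as before: `mem` is a hypothetical byte buffer standing in for memory and `addr` is a byte offset; the function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* FLAT_LOAD_UBYTE: zero-extend an 8-bit value to VDST. */
uint32_t flat_load_ubyte(const uint8_t *mem, uint64_t addr)
{
    return (uint32_t)mem[addr];
}

/* FLAT_LOAD_SBYTE: sign-extend an 8-bit value to VDST. */
uint32_t flat_load_sbyte(const uint8_t *mem, uint64_t addr)
{
    int8_t v;
    memcpy(&v, mem + addr, sizeof v);
    return (uint32_t)(int32_t)v;
}

/* FLAT_LOAD_USHORT: zero-extend a 16-bit value to VDST. */
uint32_t flat_load_ushort(const uint8_t *mem, uint64_t addr)
{
    uint16_t v;
    memcpy(&v, mem + addr, sizeof v);
    return (uint32_t)v;
}

/* FLAT_LOAD_SSHORT: sign-extend a 16-bit value to VDST. */
uint32_t flat_load_sshort(const uint8_t *mem, uint64_t addr)
{
    int16_t v;
    memcpy(&v, mem + addr, sizeof v);
    return (uint32_t)(int32_t)v;
}
```

For the byte 0xFF, FLAT_LOAD_UBYTE yields 0x000000FF while FLAT_LOAD_SBYTE yields 0xFFFFFFFF.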

FLAT_STORE_BYTE

Opcode: 24 (0x18)
Syntax: FLAT_STORE_BYTE VADDR(2), VDATA
Description: Store the low byte of VDATA at the VADDR address.
Operation:
*(UINT8*)VADDR = VDATA&0xff

FLAT_STORE_DWORD

Opcode: 28 (0x1c)
Syntax: FLAT_STORE_DWORD VADDR(2), VDATA
Description: Store the VDATA dword at the VADDR address.
Operation:
*(UINT32*)VADDR = VDATA

FLAT_STORE_DWORDX2

Opcode: 29 (0x1d)
Syntax: FLAT_STORE_DWORDX2 VADDR(2), VDATA(2)
Description: Store two dwords from VDATA at the VADDR address.
Operation:
*(UINT64*)VADDR = VDATA

FLAT_STORE_DWORDX3

Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2
Syntax: FLAT_STORE_DWORDX3 VADDR(2), VDATA(3)
Description: Store three dwords from VDATA at the VADDR address.
Operation:
*(UINT32*)(VADDR) = VDATA[0]
*(UINT32*)(VADDR+4) = VDATA[1]
*(UINT32*)(VADDR+8) = VDATA[2]

FLAT_STORE_DWORDX4

Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1f) for GCN 1.2
Syntax: FLAT_STORE_DWORDX4 VADDR(2), VDATA(4)
Description: Store four dwords from VDATA at the VADDR address.
Operation:
*(UINT32*)(VADDR) = VDATA[0]
*(UINT32*)(VADDR+4) = VDATA[1]
*(UINT32*)(VADDR+8) = VDATA[2]
*(UINT32*)(VADDR+12) = VDATA[3]

FLAT_STORE_SHORT

Opcode: 26 (0x1a)
Syntax: FLAT_STORE_SHORT VADDR(2), VDATA
Description: Store the low 16 bits of VDATA at the VADDR address.
Operation:
*(UINT16*)VADDR = VDATA&0xffff
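The store operations above can be modelled per lane in the same way as the loads: the byte and short stores keep only the low bits of VDATA, and the dword stores write consecutive dwords. This is a sketch assuming a hypothetical byte buffer `mem` and byte offset `addr`; the function names are illustrative, not a real API.

```c
#include <stdint.h>
#include <string.h>

/* FLAT_STORE_BYTE: *(UINT8*)VADDR = VDATA & 0xff */
void flat_store_byte(uint8_t *mem, uint64_t addr, uint32_t vdata)
{
    mem[addr] = (uint8_t)(vdata & 0xffu);
}

/* FLAT_STORE_SHORT: *(UINT16*)VADDR = VDATA & 0xffff */
void flat_store_short(uint8_t *mem, uint64_t addr, uint32_t vdata)
{
    uint16_t v = (uint16_t)(vdata & 0xffffu);
    memcpy(mem + addr, &v, sizeof v);
}

/* FLAT_STORE_DWORD/X2/X3/X4: *(UINT32*)(VADDR + 4*i) = VDATA[i] */
void flat_store_dwords(uint8_t *mem, uint64_t addr,
                       const uint32_t *vdata, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        memcpy(mem + addr + 4u * i, &vdata[i], sizeof(uint32_t));
}
```

Storing 0x12345678 with FLAT_STORE_BYTE writes only 0x78 and leaves the neighbouring bytes untouched.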