wiki:GcnInstrsMubuf

Version 1 (modified by trac, 8 years ago) (diff)

--

Back to Table of content

GCN ISA MUBUF instructions

These instructions allow to access to main video memory. MUBUF instructions operates on the buffer resources. The buffer resources are 4 dwords which holds the base address, buffer size, their structure and format of their data.

List of fields for the MUBUF encoding (GCN 1.0/1.1):

Bits Name Description
0-11 OFFSET Unsigned byte offset
12 OFFEN If set, send additional offset from VADDR
13 IDXEN If set, send index from VADDR
14 GLC Operation globally coherent
15 ADDR64 If set, address is 64-bit
16 LDS Data is read from or written to LDS, otherwise from VGPR
18-24 OPCODE Operation code
26-31 ENCODING Encoding type. Must be 0b111000
32-39 VADDR Vector address registers
40-47 VDATA Vector data register
48-52 SRSRC Scalar registers with buffer resource (SGPR# is 4*value)
54 SLC System level coherent
55 TFE Texture Fail Enable ???
56-63 SOFFSET Scalar base offset operand

List of fields for the MUBUF encoding (GCN 1.2):

Bits Name Description
0-11 OFFSET Unsigned byte offset
12 OFFEN If set, send additional offset from VADDR
13 IDXEN If set, send index from VADDR
14 GLC Operation globally coherent
16 LDS Data is read from or written to LDS, otherwise from VGPR
17 SLC System level coherent
18-24 OPCODE Operation code
26-31 ENCODING Encoding type. Must be 0b111000
32-39 VADDR Vector address registers
40-47 VDATA Vector data register
48-52 SRSRC Scalar registers with buffer resource (SGPR# is 4*value)
55 TFE Texture Fail Enable ???
56-63 SOFFSET Scalar base offset operand

Instruction syntax: INSTRUCTION VDATA, VADDR, SRSRC, SOFFSET [MODIFIERS]

Modifiers can be supplied in any order. Modifiers list: OFFEN, IDXEN, SLC, GLC, TFE, ADDR64, LDS, OFFSET:OFFSET.

The buffer resource format:

Bits Name Description
0-47 BASE Base address
48-61 STRIDE Stride in bytes. Size of records
62 Cache swizzle Buffer access. Optionally, swizzle texture cache TC L1 cache banks
63 Swizzle enable If set, enable swizzle addressing mode
64-95 NUMRECORDS Number of records (size of the buffer)
96-98 DST_SEL_X Select destination component for X
99-101 DST_SEL_Y Select destination component for Y
102-104 DST_SEL_Z Select destination component for Z
105-107 DST_SEL_W Select destination component for W
108-110 NUMFORMAT Number format
111-114 DATAFORMAT Data format
115-116 ELEMSIZE Element size (used only by swizzle mode)
117-118 INDEXSTRIDE Index stride (used only by swizzle mode)
119 TID_ENABLE Add thread id to index
121 Hash enable Enable address hashing
122 HEAP Buffer is heap
126-127 TYPE Resource type. 0 - for buffer

Instructions by opcode

List of the MUBUF instructions by opcode (GCN 1.0/1.1):

Opcode GCN 1.0 GCN 1.1 Mnemonic
0 (0x0) BUFFER_LOAD_FORMAT_X
1 (0x1) BUFFER_LOAD_FORMAT_XY
2 (0x2) BUFFER_LOAD_FORMAT_XYZ
3 (0x3) BUFFER_LOAD_FORMAT_XYZW
4 (0x4) BUFFER_STORE_FORMAT_X
5 (0x5) BUFFER_STORE_FORMAT_XY
6 (0x6) BUFFER_STORE_FORMAT_XYZ
7 (0x7) BUFFER_STORE_FORMAT_XYZW
8 (0x8) BUFFER_LOAD_UBYTE
9 (0x9) BUFFER_LOAD_SBYTE
10 (0xa) BUFFER_LOAD_USHORT
11 (0xb) BUFFER_LOAD_SSHORT
12 (0xc) BUFFER_LOAD_DWORD
13 (0xd) BUFFER_LOAD_DWORDX2
14 (0xe) BUFFER_LOAD_DWORDX4
15 (0xf) BUFFER_LOAD_DWORDX3
24 (0x18) BUFFER_STORE_BYTE
26 (0x1a) BUFFER_STORE_SHORT
28 (0x1c) BUFFER_STORE_DWORD
29 (0x1d) BUFFER_STORE_DWORDX2
30 (0x1e) BUFFER_STORE_DWORDX4
31 (0x1f) BUFFER_STORE_DWORDX3
48 (0x30) BUFFER_ATOMIC_SWAP
49 (0x31) BUFFER_ATOMIC_CMPSWAP
50 (0x32) BUFFER_ATOMIC_ADD
51 (0x33) BUFFER_ATOMIC_SUB
52 (0x34) BUFFER_ATOMIC_RSUB
53 (0x35) BUFFER_ATOMIC_SMIN
54 (0x36) BUFFER_ATOMIC_UMIN
55 (0x37) BUFFER_ATOMIC_SMAX
56 (0x38) BUFFER_ATOMIC_UMAX
57 (0x39) BUFFER_ATOMIC_AND
58 (0x3a) BUFFER_ATOMIC_OR
59 (0x3b) BUFFER_ATOMIC_XOR
60 (0x3c) BUFFER_ATOMIC_INC
61 (0x3d) BUFFER_ATOMIC_DEC
62 (0x3e) BUFFER_ATOMIC_FCMPSWAP
63 (0x3f) BUFFER_ATOMIC_FMIN
64 (0x40) BUFFER_ATOMIC_FMAX
80 (0x50) BUFFER_ATOMIC_SWAP_X2
81 (0x51) BUFFER_ATOMIC_CMPSWAP_X2
82 (0x52) BUFFER_ATOMIC_ADD_X2
83 (0x53) BUFFER_ATOMIC_SUB_X2
84 (0x54) BUFFER_ATOMIC_RSUB_X2
85 (0x55) BUFFER_ATOMIC_SMIN_X2
86 (0x56) BUFFER_ATOMIC_UMIN_X2
87 (0x57) BUFFER_ATOMIC_SMAX_X2
88 (0x58) BUFFER_ATOMIC_UMAX_X2
89 (0x59) BUFFER_ATOMIC_AND_X2
90 (0x5a) BUFFER_ATOMIC_OR_X2
91 (0x5b) BUFFER_ATOMIC_XOR_X2
92 (0x5c) BUFFER_ATOMIC_INC_X2
93 (0x5d) BUFFER_ATOMIC_DEC_X2
94 (0x5e) BUFFER_ATOMIC_FCMPSWAP_X2
95 (0x5f) BUFFER_ATOMIC_FMIN_X2
96 (0x60) BUFFER_ATOMIC_FMAX_X2
112 (0x70) BUFFER_WBINVL1_SC
113 (0x71) BUFFER_WBINVL1

List of the MUBUF instructions by opcode (GCN 1.0/1.1):

Opcode Mnemonic
0 (0x0) BUFFER_LOAD_FORMAT_X
1 (0x1) BUFFER_LOAD_FORMAT_XY
2 (0x2) BUFFER_LOAD_FORMAT_XYZ
3 (0x3) BUFFER_LOAD_FORMAT_XYZW
4 (0x4) BUFFER_STORE_FORMAT_X
5 (0x5) BUFFER_STORE_FORMAT_XY
6 (0x6) BUFFER_STORE_FORMAT_XYZ
7 (0x7) BUFFER_STORE_FORMAT_XYZW
8 (0x8) BUFFER_LOAD_FORMAT_D16_X
9 (0x9) BUFFER_LOAD_FORMAT_D16_XY
10 (0xa) BUFFER_LOAD_FORMAT_D16_XYZ
11 (0xb) BUFFER_LOAD_FORMAT_D16_XYZW
12 (0xc) BUFFER_STORE_FORMAT_D16_X
13 (0xd) BUFFER_STORE_FORMAT_D16_XY
14 (0xe) BUFFER_STORE_FORMAT_D16_XYZ
15 (0xf) BUFFER_STORE_FORMAT_D16_XYZW
16 (0x10) BUFFER_LOAD_UBYTE
17 (0x11) BUFFER_LOAD_SBYTE
18 (0x12) BUFFER_LOAD_USHORT
19 (0x13) BUFFER_LOAD_SSHORT
20 (0x14) BUFFER_LOAD_DWORD
21 (0x15) BUFFER_LOAD_DWORDX2
22 (0x16) BUFFER_LOAD_DWORDX3
23 (0x17) BUFFER_LOAD_DWORDX4
24 (0x18) BUFFER_STORE_BYTE
26 (0x1a) BUFFER_STORE_SHORT
28 (0x1c) BUFFER_STORE_DWORD
29 (0x1d) BUFFER_STORE_DWORDX2
30 (0x1e) BUFFER_STORE_DWORDX3
31 (0x1f) BUFFER_STORE_DWORDX4
61 (0x3d) BUFFER_STORE_LDS_DWORD
62 (0x3e) BUFFER_WBINVL1
63 (0x3f) BUFFER_WBINVL1_VOL
64 (0x40) BUFFER_ATOMIC_SWAP
65 (0x41) BUFFER_ATOMIC_CMPSWAP
66 (0x42) BUFFER_ATOMIC_ADD
67 (0x43) BUFFER_ATOMIC_SUB
68 (0x44) BUFFER_ATOMIC_SMIN
69 (0x45) BUFFER_ATOMIC_UMIN
70 (0x46) BUFFER_ATOMIC_SMAX
71 (0x47) BUFFER_ATOMIC_UMAX
72 (0x48) BUFFER_ATOMIC_AND
73 (0x49) BUFFER_ATOMIC_OR
74 (0x4a) BUFFER_ATOMIC_XOR
75 (0x4b) BUFFER_ATOMIC_INC
76 (0x4c) BUFFER_ATOMIC_DEC
96 (0x60) BUFFER_ATOMIC_SWAP_X2
97 (0x61) BUFFER_ATOMIC_CMPSWAP_X2
98 (0x62) BUFFER_ATOMIC_ADD_X2
99 (0x63) BUFFER_ATOMIC_SUB_X2
100 (0x64) BUFFER_ATOMIC_SMIN_X2
101 (0x65) BUFFER_ATOMIC_UMIN_X2
102 (0x66) BUFFER_ATOMIC_SMAX_X2
103 (0x67) BUFFER_ATOMIC_UMAX_X2
104 (0x68) BUFFER_ATOMIC_AND_X2
105 (0x69) BUFFER_ATOMIC_OR_X2
106 (0x6a) BUFFER_ATOMIC_XOR_X2
107 (0x6b) BUFFER_ATOMIC_INC_X2
108 (0x6c) BUFFER_ATOMIC_DEC_X2

Buffer addressing

The address depends on couple factors which are from instruction and the buffer resource. The buffer is organized in simple structure that contains records. Buffer address starts from the base offset given in buffer resource. The index indicates number of the record, and the offset indicates position in record. Expression that describes address:

BUFOFFSET = (UINT32)(AINDEX*STRIDE) + AOFFSET

Optionally, buffer can be addressed in the swizzle mode that is very effective manner to addressing the scratch buffers that holds spilled registers. Expression that describes swizzle mode addressing:

AINDEX_MSB = AINDEX / INDEXSTRIDE AINDEX_LSB = AINDEX % INDEXSTRIDE AOFFSET_MSB = AOFFSET / ELEMSIZE AOFFSET_LSB = AOFFSET % ELEMSIZE BUFOFFSET = AOFFSET_LSB + ELEMSIZE*AINDEX_LSB + INDEXSTRIDE * (AINDEX_MSB*STRIDE + \ AOFFSET_MSB * ELEMSIZE)

The 64-bit addressing can be enabled by set ADDR64 flag in instruction. In this case, two VADDR registers contains an address. Expression to calculate address in this case:

ADDRESS = BASE + VGPR_ADDRESS + OFFSET + SGPR_OFFSET // 64-bit addressing

No range checking for 64-bit mode addressing.

The base address are calculated as sum of BASE and SGPR offset. The instruction format supply OFFEN and IDXEN that includes index from VPGR registers. These flags are permitted only if 64-bit addressing is not enabled. Table describes combination of these flags:

IDXEN OFFEN Indexreg Offset reg
0 0 N/A N/A
1 0 VGPR[V] N/A
0 1 N/A VGPR[V]
1 1 VGPR[V] VGPR[V+1]

Expressions that describes offsets and indices:

UINT32 AOFFSET = OFFSET + (OFFEN ? VGPR_OFFSET : 0) UINT32 AINDEX = (IDXEN ? VGPR_INDEX : 0) + (TID_ENABLE ? LANEID : 0)

The hardware checks range for buffer resources with STRIDE=0 in following way: if BUFOFFSET >= NUMRECORDS-SGPR_OFFSET then an address is out of range. For STRIDE!=0 if AINDEX >= NUMRECORDS or OFFSET >= STRIDE when IDXEN or TID_ENABLE is set, then an address is out of range. Reads are zero and writes are ignored for an addresses out of range.

For 32-bit operations, an address are aligned to 4 bytes.

The coalescing works for STRIDE==0 on offset (hardware looks at offset), otherwise works if stride<=1 or swizzle mode enabled and all offsets are equal and ELEMSIZE have same value as size of element that can be operated by instruction. Then hardware coalesce across any set of contiguous indices for raw buffers. For swizzled buffers, it cannot coalesce across INDEXSTRIDE boundaries.

Instruction set

Alphabetically sorted instruction list: