## GCN ISA SMEM instructions (GCN 1.2)

The basic encoding of the SMEM instructions needs 8 bytes (two dwords). List of fields:

Bits  | Name     | Description
------|----------|------------------------------
0-5   | SBASE    | Number of aligned SGPR pair.
6-12  | SDATA    | Scalar destination/data operand
16    | GLC      | Operation globally coherent
17    | IMM      | IMM indicator
18-25 | OPCODE   | Operation code
26-31 | ENCODING | Encoding type. Must be 0b110000
32-51 | OFFSET   | Unsigned 20-bit byte offset or SGPR number that holds byte offset

The value of IMM determines the meaning of the OFFSET field:

* IMM=1 - OFFSET holds a byte offset to SBASE.
* IMM=0 - OFFSET holds the number of the SGPR that holds a byte offset to SBASE.
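The field layout above can be sketched as a small pack/unpack pair. This is only an illustration: the function names `encode_smem` and `decode_smem` are hypothetical, the two dwords are assumed to be stored little-endian, and bits not listed in the table (13-15 and 52-63) are assumed to be zero.

```python
import struct

SMEM_ENCODING = 0b110000  # ENCODING field value from the table (bits 26-31)


def encode_smem(opcode, sdata, sbase, offset, imm=True, glc=False):
    """Pack the SMEM fields into 8 bytes (two little-endian dwords)."""
    for name, value, bits in (("sbase", sbase, 6), ("sdata", sdata, 7),
                              ("opcode", opcode, 8), ("offset", offset, 20)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    dword0 = (sbase                     # bits 0-5
              | (sdata << 6)            # bits 6-12
              | (int(glc) << 16)        # bit 16
              | (int(imm) << 17)        # bit 17
              | (opcode << 18)          # bits 18-25
              | (SMEM_ENCODING << 26))  # bits 26-31
    dword1 = offset                     # bits 32-51
    return struct.pack("<II", dword0, dword1)


def decode_smem(raw):
    """Unpack 8 bytes into a dict of SMEM fields."""
    dword0, dword1 = struct.unpack("<II", raw[:8])
    if dword0 >> 26 != SMEM_ENCODING:
        raise ValueError("not an SMEM instruction")
    return {
        "sbase": dword0 & 0x3F,
        "sdata": (dword0 >> 6) & 0x7F,
        "glc": bool((dword0 >> 16) & 1),
        "imm": bool((dword0 >> 17) & 1),
        "opcode": (dword0 >> 18) & 0xFF,
        "offset": dword1 & 0xFFFFF,
    }
```

For instance, `encode_smem(0, 5, 1, 0x10)` packs opcode 0 with SDATA=5, SBASE=1 (the second aligned SGPR pair) and an immediate offset of 0x10; `decode_smem` recovers the same fields from the resulting bytes.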

For S_LOAD_DWORD\* instructions, 2 SBASE SGPRs hold a 48-bit base address and a
16-bit size.
For S_BUFFER_LOAD_DWORD\* instructions, 4 SBASE SGPRs hold a buffer descriptor.
In this case, SBASE must be a multiple of 2.

The SMEM instructions can return the result data out of order. Any SMEM operation
(including S_MEMTIME) increments the LGKM_CNT counter. The best way to wait for results
is `S_WAITCNT LGKMCNT(0)`.

* LGKM_CNT is incremented by one for every fetch of a single dword
* LGKM_CNT is incremented by two for every fetch of two or more dwords
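The counting rule in the two bullets above can be written down as a short sketch. The helper names `lgkm_increment` and `lgkm_total` are hypothetical and model only the increments, not the hardware decrementing the counter as results return.

```python
def lgkm_increment(dwords):
    """LGKM_CNT increment caused by one fetch of `dwords` dwords."""
    if dwords < 1:
        raise ValueError("a fetch reads at least one dword")
    # One dword counts once; two or more dwords count twice.
    return 1 if dwords == 1 else 2


def lgkm_total(fetches):
    """Total LGKM_CNT increment for a sequence of fetch sizes (in dwords)."""
    return sum(lgkm_increment(n) for n in fetches)
```

A sequence of a one-dword, a two-dword and a sixteen-dword fetch is passed as `lgkm_total([1, 2, 16])`.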

NOTE: At least one instruction (vector or scalar) is required between setting the third
dword of the buffer resource and an S_BUFFER_\* instruction, due to delay.

List of the instructions by opcode:

 Opcode     | Mnemonic (GCN1.2)
------------|--------------------------
 0 (0x0)    | S_LOAD_DWORD
 1 (0x1)    | S_LOAD_DWORDX2
 2 (0x2)    | S_LOAD_DWORDX4
 3 (0x3)    | S_LOAD_DWORDX8
 4 (0x4)    | S_LOAD_DWORDX16
 8 (0x8)    | S_BUFFER_LOAD_DWORD
 9 (0x9)    | S_BUFFER_LOAD_DWORDX2
 10 (0xa)   | S_BUFFER_LOAD_DWORDX4
 11 (0xb)   | S_BUFFER_LOAD_DWORDX8
 12 (0xc)   | S_BUFFER_LOAD_DWORDX16
 16 (0x10)  | S_STORE_DWORD
 17 (0x11)  | S_STORE_DWORDX2
 18 (0x12)  | S_STORE_DWORDX4
 24 (0x18)  | S_BUFFER_STORE_DWORD
 25 (0x19)  | S_BUFFER_STORE_DWORDX2
 26 (0x1a)  | S_BUFFER_STORE_DWORDX4
 32 (0x20)  | S_DCACHE_INV
 33 (0x21)  | S_DCACHE_WB
 34 (0x22)  | S_DCACHE_INV_VOL
 35 (0x23)  | S_DCACHE_WB_VOL
 36 (0x24)  | S_MEMTIME
 37 (0x25)  | S_MEMREALTIME
 38 (0x26)  | S_ATC_PROBE
 39 (0x27)  | S_ATC_PROBE_BUFFER
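As a sketch of how the opcode list might be used, the lookup below extracts the OPCODE field (bits 18-25) from the first instruction dword and maps it to a mnemonic. The names `SMEM_OPCODES` and `smem_mnemonic` are hypothetical; opcodes 24-26 are taken to be the S_BUFFER_STORE_\* instructions, as given in the per-instruction sections of this chapter.

```python
# Opcode -> mnemonic, following the GCN 1.2 list in this chapter.
SMEM_OPCODES = {
    0: "S_LOAD_DWORD", 1: "S_LOAD_DWORDX2", 2: "S_LOAD_DWORDX4",
    3: "S_LOAD_DWORDX8", 4: "S_LOAD_DWORDX16",
    8: "S_BUFFER_LOAD_DWORD", 9: "S_BUFFER_LOAD_DWORDX2",
    10: "S_BUFFER_LOAD_DWORDX4", 11: "S_BUFFER_LOAD_DWORDX8",
    12: "S_BUFFER_LOAD_DWORDX16",
    16: "S_STORE_DWORD", 17: "S_STORE_DWORDX2", 18: "S_STORE_DWORDX4",
    24: "S_BUFFER_STORE_DWORD", 25: "S_BUFFER_STORE_DWORDX2",
    26: "S_BUFFER_STORE_DWORDX4",
    32: "S_DCACHE_INV", 33: "S_DCACHE_WB", 34: "S_DCACHE_INV_VOL",
    35: "S_DCACHE_WB_VOL", 36: "S_MEMTIME", 37: "S_MEMREALTIME",
    38: "S_ATC_PROBE", 39: "S_ATC_PROBE_BUFFER",
}


def smem_mnemonic(dword0):
    """Return the mnemonic for the first dword of an SMEM instruction."""
    opcode = (dword0 >> 18) & 0xFF  # OPCODE field, bits 18-25
    # Unlisted opcodes get a placeholder name rather than raising.
    return SMEM_OPCODES.get(opcode, f"SMEM_UNKNOWN_{opcode}")
```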

### Instruction set

Alphabetically sorted instruction list:

#### S_BUFFER_LOAD_DWORD

Opcode: 8 (0x8)  
Syntax: S_BUFFER_LOAD_DWORD SDATA, SBASE(4), OFFSET  
Description: Load single dword from read-only memory through constant cache (kcache).
SBASE is a buffer descriptor.  
Operation:  
```
SDATA = *(UINT32*)(SMEM + (OFFSET & ~3))
```

#### S_BUFFER_LOAD_DWORDX16

Opcode: 12 (0xc)  
Syntax: S_BUFFER_LOAD_DWORDX16 SDATA(16), SBASE(4), OFFSET  
Description: Load 16 dwords from read-only memory through constant cache (kcache).
SBASE is a buffer descriptor.  
Operation:  
```
for (BYTE i = 0; i < 16; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```

#### S_BUFFER_LOAD_DWORDX2

Opcode: 9 (0x9)  
Syntax: S_BUFFER_LOAD_DWORDX2 SDATA(2), SBASE(4), OFFSET  
Description: Load two dwords from read-only memory through constant cache (kcache).
SBASE is a buffer descriptor.  
Operation:  
```
SDATA = *(UINT64*)(SMEM + (OFFSET & ~3))
```

#### S_BUFFER_LOAD_DWORDX4

Opcode: 10 (0xa)  
Syntax: S_BUFFER_LOAD_DWORDX4 SDATA(4), SBASE(4), OFFSET  
Description: Load four dwords from read-only memory through constant cache (kcache).
SBASE is a buffer descriptor.  
Operation:  
```
for (BYTE i = 0; i < 4; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```

#### S_BUFFER_LOAD_DWORDX8

Opcode: 11 (0xb)  
Syntax: S_BUFFER_LOAD_DWORDX8 SDATA(8), SBASE(4), OFFSET  
Description: Load eight dwords from read-only memory through constant cache (kcache).
SBASE is a buffer descriptor.  
Operation:  
```
for (BYTE i = 0; i < 8; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```
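The load operations above all follow one pattern, which can be modelled on a flat byte array. This is a minimal sketch, not a model of the buffer descriptor: `smem_load` is a hypothetical helper, `base` stands in for the address that SMEM denotes in the pseudocode, memory is assumed little-endian, and the offset is assumed to be aligned down to a dword boundary (its two low bits cleared).

```python
import struct


def smem_load(memory, base, offset, ndwords):
    """Read `ndwords` consecutive dwords starting at a dword-aligned offset."""
    addr = base + (offset & ~3)  # clear the two low bits of the offset
    # unpack_from raises struct.error if the read runs past the buffer.
    return list(struct.unpack_from(f"<{ndwords}I", memory, addr))
```

With `ndwords` set to 1, 2, 4, 8 or 16 this corresponds to the DWORD, DWORDX2, DWORDX4, DWORDX8 and DWORDX16 variants respectively.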

#### S_BUFFER_STORE_DWORD

Opcode: 24 (0x18)  
Syntax: S_BUFFER_STORE_DWORD SDATA, SBASE(4), OFFSET  
Description: Store single dword to memory. The offset can only be M0 or an immediate.
SBASE is a buffer descriptor.  
Operation:  
```
*(UINT32*)(SMEM + (OFFSET & ~3)) = SDATA
```

#### S_BUFFER_STORE_DWORDX2

Opcode: 25 (0x19)  
Syntax: S_BUFFER_STORE_DWORDX2 SDATA(2), SBASE(4), OFFSET  
Description: Store two dwords to memory. The offset can only be M0 or an immediate.
SBASE is a buffer descriptor.  
Operation:  
```
*(UINT64*)(SMEM + (OFFSET & ~3)) = SDATA
```

#### S_BUFFER_STORE_DWORDX4

Opcode: 26 (0x1a)  
Syntax: S_BUFFER_STORE_DWORDX4 SDATA(4), SBASE(4), OFFSET  
Description: Store four dwords to memory. The offset can only be M0 or an immediate.
SBASE is a buffer descriptor.  
Operation:  
```
for (BYTE i = 0; i < 4; i++)
    *(UINT32*)(SMEM + i*4 + (OFFSET & ~3)) = SDATA[i]
```

#### S_DCACHE_INV

Opcode: 32 (0x20)  
Syntax: S_DCACHE_INV  
Description: Invalidate entire L1 K cache.

#### S_DCACHE_INV_VOL

Opcode: 34 (0x22)  
Syntax: S_DCACHE_INV_VOL  
Description: Invalidate all volatile lines in L1 K cache.

#### S_LOAD_DWORD

Opcode: 0 (0x0)  
Syntax: S_LOAD_DWORD SDATA, SBASE(2), OFFSET  
Description: Load single dword from read-only memory through constant cache (kcache).  
Operation:  
```
SDATA = *(UINT32*)(SMEM + (OFFSET & ~3))
```

#### S_LOAD_DWORDX16

Opcode: 4 (0x4)  
Syntax: S_LOAD_DWORDX16 SDATA(16), SBASE(2), OFFSET  
Description: Load 16 dwords from read-only memory through constant cache (kcache).  
Operation:  
```
for (BYTE i = 0; i < 16; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```

#### S_LOAD_DWORDX2

Opcode: 1 (0x1)  
Syntax: S_LOAD_DWORDX2 SDATA(2), SBASE(2), OFFSET  
Description: Load two dwords from read-only memory through constant cache (kcache).  
Operation:  
```
SDATA = *(UINT64*)(SMEM + (OFFSET & ~3))
```

#### S_LOAD_DWORDX4

Opcode: 2 (0x2)  
Syntax: S_LOAD_DWORDX4 SDATA(4), SBASE(2), OFFSET  
Description: Load four dwords from read-only memory through constant cache (kcache).  
Operation:  
```
for (BYTE i = 0; i < 4; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```

#### S_LOAD_DWORDX8

Opcode: 3 (0x3)  
Syntax: S_LOAD_DWORDX8 SDATA(8), SBASE(2), OFFSET  
Description: Load eight dwords from read-only memory through constant cache (kcache).  
Operation:  
```
for (BYTE i = 0; i < 8; i++)
    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET & ~3))
```

#### S_MEMREALTIME

Opcode: 37 (0x25)  
Syntax: S_MEMREALTIME SDATA(2)  
Description: Store value of 64-bit RTC counter to SDATA.
Before reading the result, S_WAITCNT LGKMCNT(0) is required.  
Operation:  
```
SDATA = CLOCKCNT
```

#### S_MEMTIME

Opcode: 36 (0x24)  
Syntax: S_MEMTIME SDATA(2)  
Description: Store value of 64-bit clock counter to SDATA.
This "time" is a free-running clock counter based on the shader core clock.
Before reading the result, S_WAITCNT LGKMCNT(0) is required.  
Operation:  
```
SDATA = CLOCKCNT
```
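Because the counter is returned in two SGPRs, measuring an interval means joining each pair into one 64-bit value and subtracting. The sketch below shows that arithmetic only; `counter_value` and `elapsed_ticks` are hypothetical helpers, and it is assumed that the first register of SDATA(2) holds the low dword and that the counter wraps modulo 2^64.

```python
MASK64 = (1 << 64) - 1


def counter_value(pair):
    """Join a (low dword, high dword) register pair into a 64-bit value."""
    low, high = pair
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def elapsed_ticks(start_pair, end_pair):
    """Ticks between two counter reads, tolerating 64-bit wraparound."""
    return (counter_value(end_pair) - counter_value(start_pair)) & MASK64
```

The result is in counter ticks; converting it to seconds would need the clock frequency, which this chapter does not specify.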

#### S_STORE_DWORD

Opcode: 16 (0x10)  
Syntax: S_STORE_DWORD SDATA, SBASE(2), OFFSET  
Description: Store single dword to memory. The offset can only be M0 or an immediate.  
Operation:  
```
*(UINT32*)(SMEM + (OFFSET & ~3)) = SDATA
```

#### S_STORE_DWORDX2

Opcode: 17 (0x11)  
Syntax: S_STORE_DWORDX2 SDATA(2), SBASE(2), OFFSET  
Description: Store two dwords to memory. The offset can only be M0 or an immediate.  
Operation:  
```
*(UINT64*)(SMEM + (OFFSET & ~3)) = SDATA
```

#### S_STORE_DWORDX4

Opcode: 18 (0x12)  
Syntax: S_STORE_DWORDX4 SDATA(4), SBASE(2), OFFSET  
Description: Store four dwords to memory. The offset can only be M0 or an immediate.  
Operation:  
```
for (BYTE i = 0; i < 4; i++)
    *(UINT32*)(SMEM + i*4 + (OFFSET & ~3)) = SDATA[i]
```
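The store operations can be modelled in the same spirit on a mutable byte array. Again this is only a sketch under assumptions: `smem_store` is a hypothetical helper, `base` stands in for SMEM in the pseudocode, memory is little-endian, the offset is aligned down to a dword boundary, and each element of `values` plays the role of one SDATA register.

```python
import struct


def smem_store(memory, base, offset, values):
    """Write each dword in `values` to consecutive dword-aligned addresses."""
    addr = base + (offset & ~3)  # clear the two low bits of the offset
    for i, value in enumerate(values):
        # Each register is truncated to 32 bits before being written.
        struct.pack_into("<I", memory, addr + i * 4, value & 0xFFFFFFFF)
```

Passing one, two or four values corresponds to the DWORD, DWORDX2 and DWORDX4 variants.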