Changes between Version 5 and Version 6 of GcnInstrsSmem
- Timestamp: 11/23/17 23:00:34
GcnInstrsSmem
 v5   v6
 24   24  </tr>
 25   25  <tr>
      26  <td>14</td>
      27  <td>SOE</td>
      28  <td>Scalar offset enable (GCN 1.4)</td>
      29  </tr>
      30  <tr>
      31  <td>15</td>
      32  <td>NV</td>
      33  <td>Non-volatile (GCN 1.4)</td>
      34  </tr>
      35  <tr>
 26   36  <td>16</td>
 27   37  <td>GLC</td>
  …    …
 47   57  <td>OFFSET</td>
 48   58  <td>Unsigned 20-bit byte offset or SGPR number that holds byte offset</td>
      59  </tr>
      60  <tr>
      61  <td>32-52</td>
      62  <td>OFFSET</td>
      63  <td>Unsigned 21-bit byte offset or SGPR number (byte offset) (GCN 1.4)</td>
      64  </tr>
      65  <tr>
      66  <td>57-63</td>
      67  <td>SOFFSET</td>
      68  <td>SGPR offset (only if SOE=1)</td>
 49   69  </tr>
 50   70  </tbody>
  …    …
 58   78  16-bit size. For S_BUFFER_LOAD_DWORD* instructions, 4 SBASE SGPRs hold a
 59   79  buffer descriptor. In this case, SBASE must be a multiple of 2.
 60       S_STORE_* and S_BUFFER_STORE_* accept only M0 as offset register.</p>
      80  S_STORE_* and S_BUFFER_STORE_* accept only M0 as offset register for GCN 1.2.
      81  In GCN 1.4, S_STORE_* and S_BUFFER_STORE_* also accept an SGPR as offset register.</p>
 61   82  <p>The SMEM instructions can return the result data out of order. Any SMEM operation
 62   83  (including S_MEMTIME) increments LGKM_CNT counter.
          The best way to wait for results
  …    …
 73   94  <tr>
 74   95  <th>Opcode</th>
 75       <th>Mnemonic (GCN1.2)</th>
      96  <th>GCN 1.2</th>
 76   97  </tr>
 77   98  </thead>
  …    …
 79  100  <tr>
 80  101  <td>0 (0x0)</td>
 81       <td>S_LOAD_DWORD</td>
     102  <td>✓</td>
 82  103  </tr>
 83  104  <tr>
 84  105  <td>1 (0x1)</td>
 85       <td>S_LOAD_DWORDX2</td>
     106  <td>✓</td>
 86  107  </tr>
 87  108  <tr>
 88  109  <td>2 (0x2)</td>
 89       <td>S_LOAD_DWORDX4</td>
     110  <td>✓</td>
 90  111  </tr>
 91  112  <tr>
 92  113  <td>3 (0x3)</td>
 93       <td>S_LOAD_DWORDX8</td>
     114  <td>✓</td>
 94  115  </tr>
 95  116  <tr>
 96  117  <td>4 (0x4)</td>
 97       <td>S_LOAD_DWORDX16</td>
     118  <td>✓</td>
 98  119  </tr>
 99  120  <tr>
100  121  <td>8 (0x8)</td>
101       <td>S_BUFFER_LOAD_DWORD</td>
     122  <td>✓</td>
102  123  </tr>
103  124  <tr>
104  125  <td>9 (0x9)</td>
105       <td>S_BUFFER_LOAD_DWORDX2</td>
     126  <td>✓</td>
106  127  </tr>
107  128  <tr>
108  129  <td>10 (0xa)</td>
109       <td>S_BUFFER_LOAD_DWORDX4</td>
     130  <td>✓</td>
110  131  </tr>
111  132  <tr>
112  133  <td>11 (0xb)</td>
113       <td>S_BUFFER_LOAD_DWORDX8</td>
     134  <td>✓</td>
114  135  </tr>
115  136  <tr>
116  137  <td>12 (0xc)</td>
117       <td>S_BUFFER_LOAD_DWORDX16</td>
     138  <td>✓</td>
118  139  </tr>
119  140  <tr>
120  141  <td>16 (0x10)</td>
121       <td>S_STORE_DWORD</td>
     142  <td>✓</td>
122  143  </tr>
123  144  <tr>
124  145  <td>17 (0x11)</td>
125       <td>S_STORE_DWORDX2</td>
     146  <td>✓</td>
126  147  </tr>
127  148  <tr>
128  149  <td>18 (0x12)</td>
129       <td>S_STORE_DWORDX4</td>
     150  <td>✓</td>
130  151  </tr>
131  152  <tr>
132  153  <td>24 (0x18)</td>
133       <td>S_BUFFER_STORE_DWORD</td>
     154  <td>✓</td>
134  155  </tr>
135  156  <tr>
136  157  <td>25 (0x19)</td>
137       <td>S_BUFFER_STORE_DWORDX2</td>
     158  <td>✓</td>
138  159  </tr>
139  160  <tr>
140  161  <td>26 (0x1a)</td>
141       <td>S_BUFFER_STORE_DWORDX4</td>
     162  <td>✓</td>
142  163  </tr>
143  164  <tr>
144  165  <td>32 (0x20)</td>
145       <td>S_DCACHE_INV</td>
     166  <td>✓</td>
146  167  </tr>
147  168  <tr>
148  169  <td>33 (0x21)</td>
149       <td>S_DCACHE_WB</td>
     170  <td>✓</td>
150  171  </tr>
151  172  <tr>
152  173  <td>34 (0x22)</td>
153       <td>S_DCACHE_INV_VOL</td>
     174  <td>✓</td>
154  175  </tr>
155  176  <tr>
156  177  <td>35 (0x23)</td>
157       <td>S_DCACHE_WB_VOL</td>
     178  <td>✓</td>
158  179  </tr>
159  180  <tr>
160  181  <td>36 (0x24)</td>
161       <td>S_MEMTIME</td>
     182  <td>✓</td>
162  183  </tr>
163  184  <tr>
164  185  <td>37 (0x25)</td>
165       <td>S_MEMREALTIME</td>
     186  <td>✓</td>
166  187  </tr>
167  188  <tr>
168  189  <td>38 (0x26)</td>
169       <td>S_ATC_PROBE</td>
     190  <td>✓</td>
170  191  </tr>
171  192  <tr>
172  193  <td>39 (0x27)</td>
173       <td>S_ATC_PROBE_BUFFER</td>
     194  <td>✓</td>
     195  </tr>
     196  <tr>
     197  <td>40 (0x28)</td>
     198  <td></td>
     199  </tr>
     200  <tr>
     201  <td>41 (0x29)</td>
     202  <td></td>
     203  </tr>
     204  <tr>
     205  <td>128 (0x80)</td>
     206  <td></td>
     207  </tr>
     208  <tr>
     209  <td>129 (0x81)</td>
     210  <td></td>
     211  </tr>
     212  <tr>
     213  <td>130 (0x82)</td>
     214  <td></td>
     215  </tr>
     216  <tr>
     217  <td>131 (0x83)</td>
     218  <td></td>
     219  </tr>
     220  <tr>
     221  <td>132 (0x84)</td>
     222  <td></td>
     223  </tr>
     224  <tr>
     225  <td>133 (0x85)</td>
     226  <td></td>
     227  </tr>
     228  <tr>
     229  <td>134 (0x86)</td>
     230  <td></td>
     231  </tr>
     232  <tr>
     233  <td>135 (0x87)</td>
     234  <td></td>
     235  </tr>
     236  <tr>
     237  <td>136 (0x88)</td>
     238  <td></td>
     239  </tr>
     240  <tr>
     241  <td>137 (0x89)</td>
     242  <td></td>
     243  </tr>
     244  <tr>
     245  <td>138 (0x8a)</td>
     246  <td></td>
     247  </tr>
     248  <tr>
     249  <td>139 (0x8b)</td>
     250  <td></td>
     251  </tr>
     252  <tr>
     253  <td>140 (0x8c)</td>
     254  <td></td>
     255  </tr>
     256  <tr>
     257  <td>160 (0xa0)</td>
     258  <td></td>
     259  </tr>
     260  <tr>
     261  <td>161 (0xa1)</td>
     262  <td></td>
     263  </tr>
     264  <tr>
     265  <td>162 (0xa2)</td>
     266  <td></td>
     267  </tr>
     268  <tr>
     269  <td>163 (0xa3)</td>
     270  <td></td>
     271  </tr>
     272  <tr>
     273  <td>164 (0xa4)</td>
     274  <td></td>
     275  </tr>
     276  <tr>
     277  <td>165 (0xa5)</td>
     278  <td></td>
     279  </tr>
     280  <tr>
     281  <td>166 (0xa6)</td>
     282  <td></td>
     283  </tr>
     284  <tr>
     285  <td>167 (0xa7)</td>
     286  <td></td>
     287  </tr>
     288  <tr>
     289  <td>168 (0xa8)</td>
     290  <td></td>
     291  </tr>
     292  <tr>
     293  <td>169 (0xa9)</td>
     294  <td></td>
     295  </tr>
     296  <tr>
     297  <td>170 (0xaa)</td>
     298  <td></td>
     299  </tr>
     300  <tr>
     301  <td>171 (0xab)</td>
     302  <td></td>
     303  </tr>
     304  <tr>
     305  <td>172 (0xac)</td>
     306  <td></td>
174  307  </tr>
175  308  </tbody>
  …    …
237  370  <code>for (BYTE i = 0; i < 4; i++)
238  371  *(UINT32*)(SMEM + i*4 + (OFFSET & ~3)) = SDATA[i]</code></p>
     372  <h4>S_DCACHE_DISCARD</h4>
     373  <p>Opcode: 40 (0x28) only for GCN 1.4<br />
     374  Syntax: S_DCACHE_DISCARD SBASE(2), SOFFSET1<br />
     375  Description: Discard one dirty scalar data cache line. A cache line is 64
     376  bytes. The address is calculated as for S_STORE_DWORD, aligned to a 64-byte boundary.
     377  LGKM count is incremented by 1 for this opcode.</p>
     378  <h4>S_DCACHE_DISCARD_X2</h4>
     379  <p>Opcode: 41 (0x29) only for GCN 1.4<br />
     380  Syntax: S_DCACHE_DISCARD_X2 SBASE(2), SOFFSET1<br />
     381  Description: Discard two dirty scalar data cache lines. A cache line is 64
     382  bytes. The address is calculated as for S_STORE_DWORD, aligned to a 64-byte boundary.
     383  LGKM count is incremented by 1 for this opcode.</p>
239  384  <h4>S_DCACHE_INV</h4>
240  385  <p>Opcode: 32 (0x20)<br />
  …    …
295  440  <p>Opcode: 16 (0x10)<br />
296  441  Syntax: S_STORE_DWORD SDATA, SBASE(2), OFFSET<br />
297       Description: Store a single dword to memory. It accepts only M0 or an immediate as the offset.<br />
298       SBASE is a buffer descriptor.<br />
     442  Description: Store a single dword to memory.
     443  It accepts only M0 or an immediate as the offset (GCN 1.2 only).<br />
299  444  Operation:<br />
300  445  <code>*(UINT32*)(SMEM + (OFFSET & ~3)) = SDATA</code></p>
  …    …
302  447  <p>Opcode: 17 (0x11)<br />
303  448  Syntax: S_STORE_DWORDX2 SDATA(2), SBASE(2), OFFSET<br />
304       Description: Store two dwords to memory. It accepts only M0 or an immediate as the offset.<br />
     449  Description: Store two dwords to memory.
     450  It accepts only M0 or an immediate as the offset (GCN 1.2 only).<br />
305  451  Operation:<br />
306  452  <code>*(UINT64*)(SMEM + (OFFSET & ~3)) = SDATA</code></p>
  …    …
308  454  <p>Opcode: 18 (0x12)<br />
309  455  Syntax: S_STORE_DWORDX4 SDATA(4), SBASE(2), OFFSET<br />
310       Description: Store four dwords to memory. It accepts only M0 or an immediate as the offset.<br />
     456  Description: Store four dwords to memory.
     457  It accepts only M0 or an immediate as the offset (GCN 1.2 only).<br />
311  458  Operation:<br />
312  459  <code>for (BYTE i = 0; i < 4; i++)