Changes between Initial Version and Version 1 of GcnInstrsSmem


Timestamp:
06/08/17 22:00:27 (7 years ago)
Author:
trac
Comment:

--

  • GcnInstrsSmem

    v1 v1  
     1[wiki:ClrxToc Back to Table of content]
     2{{{
     3#!html
     4<h2>GCN ISA SMEM instructions (GCN 1.2)</h2>
     5<p>The encoding of the SMEM instructions needs 8 bytes (2 dwords). List of fields:</p>
     6<table>
     7<thead>
     8<tr>
     9<th>Bits</th>
     10<th>Name</th>
     11<th>Description</th>
     12</tr>
     13</thead>
     14<tbody>
     15<tr>
     16<td>0-5</td>
     17<td>SBASE</td>
     18<td>Number of the aligned SGPR pair (the first SGPR number divided by 2).</td>
     19</tr>
     20<tr>
     21<td>6-12</td>
     22<td>SDATA</td>
     23<td>Scalar destination/data operand</td>
     24</tr>
     25<tr>
     26<td>16</td>
     27<td>GLC</td>
     28<td>Operation globally coherent</td>
     29</tr>
     30<tr>
     31<td>17</td>
     32<td>IMM</td>
     33<td>IMM indicator</td>
     34</tr>
     35<tr>
     36<td>18-25</td>
     37<td>OPCODE</td>
     38<td>Operation code</td>
     39</tr>
     40<tr>
     41<td>26-31</td>
     42<td>ENCODING</td>
     43<td>Encoding type. Must be 0b110000</td>
     44</tr>
     45<tr>
     46<td>32-51</td>
     47<td>OFFSET</td>
     48<td>Unsigned 20-bit byte offset or SGPR number that holds byte offset</td>
     49</tr>
     50</tbody>
     51</table>
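<p>The field table above can be turned into a small packing and unpacking routine. The following is only a sketch: the names <code>SMEM_FIELDS</code>, <code>encode_smem</code> and <code>decode_smem</code> are hypothetical helpers made up for illustration, and the code assumes that the two dwords are treated as one 64-bit word with bit 0 as the least significant bit.</p>

```python
# Field layout copied from the table above: name -> (lowest bit, width in bits).
SMEM_FIELDS = {
    "SBASE": (0, 6),
    "SDATA": (6, 7),
    "GLC": (16, 1),
    "IMM": (17, 1),
    "OPCODE": (18, 8),
    "ENCODING": (26, 6),
    "OFFSET": (32, 20),
}
SMEM_ENCODING = 0b110000  # required value of the ENCODING field


def encode_smem(opcode, sdata, sbase, offset, imm=1, glc=0):
    """Pack field values into one 64-bit SMEM instruction word (hypothetical helper)."""
    values = {"SBASE": sbase, "SDATA": sdata, "GLC": glc, "IMM": imm,
              "OPCODE": opcode, "ENCODING": SMEM_ENCODING, "OFFSET": offset}
    word = 0
    for name, (shift, width) in SMEM_FIELDS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_smem(word):
    """Split a 64-bit SMEM instruction word back into its fields (hypothetical helper)."""
    fields = {name: (word >> shift) & ((1 << width) - 1)
              for name, (shift, width) in SMEM_FIELDS.items()}
    if fields["ENCODING"] != SMEM_ENCODING:
        raise ValueError("not an SMEM instruction word")
    return fields
```

<p>For example, <code>encode_smem(opcode=0, sdata=0, sbase=1, offset=0x10)</code> describes an S_LOAD_DWORD with an immediate offset of 16 bytes; the low 32 bits of the result form the first dword and the high 32 bits the second.</p>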
     52<p>The value of IMM determines the meaning of the OFFSET field:</p>
     53<ul>
     54<li>IMM=1 - OFFSET holds a byte offset to SBASE.</li>
     55<li>IMM=0 - OFFSET holds the number of the SGPR that holds a byte offset to SBASE.</li>
     56</ul>
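<p>The two cases can be expressed as a single lookup. This is a minimal sketch: <code>smem_byte_offset</code> is a hypothetical function, and <code>sgprs</code> is an assumed mapping from SGPR number to its current 32-bit value, not part of any real tool.</p>

```python
def smem_byte_offset(imm, offset_field, sgprs):
    """Return the byte offset selected by the IMM and OFFSET fields.

    sgprs is an assumed mapping of SGPR number -> register value.
    """
    if imm == 1:
        # OFFSET is itself the unsigned 20-bit byte offset.
        return offset_field & 0xFFFFF
    # OFFSET names the SGPR that holds the byte offset.
    return sgprs[offset_field]
```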
     57<p>For S_LOAD_DWORD* instructions, the 2 SBASE SGPRs hold a 48-bit base address and a
     58size of 16 bits.
     59For S_BUFFER_LOAD_DWORD* instructions, the 4 SBASE SGPRs hold a buffer descriptor.
     60In this case, SBASE must be a multiple of 2.</p>
     61<p>The SMEM instructions can return their results out of order. Any SMEM operation
     62(including S_MEMTIME) increments the LGKM_CNT counter. The best way to wait for results
     63is <code>S_WAITCNT LGKMCNT(0)</code>.</p>
     64<ul>
     65<li>LGKM_CNT is incremented by one for every fetch of a single dword</li>
     66<li>LGKM_CNT is incremented by two for every fetch of two or more dwords</li>
     67</ul>
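<p>The counting rule above can be sketched as follows. Both function names are hypothetical, and the model ignores the limited width of the real hardware counter and the decrements that happen when data returns. It only shows how many increments a sequence of fetches would issue.</p>

```python
def lgkm_increment(dwords):
    """LGKM_CNT increment caused by one SMEM fetch of `dwords` dwords."""
    if dwords < 1:
        raise ValueError("a fetch reads at least one dword")
    return 1 if dwords == 1 else 2


def issued_lgkm(fetch_sizes):
    """Total LGKM_CNT increments issued by a sequence of fetches."""
    return sum(lgkm_increment(n) for n in fetch_sizes)
```

<p>Because results can return out of order, an intermediate counter value does not identify which fetch has finished, which is why waiting for zero is the safe choice.</p>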
     68<p>NOTE: At least one instruction (vector or scalar) is required between setting the third
     69dword of the buffer resource and an S_BUFFER_* instruction, due to a delay.</p>
     70<p>List of the instructions by opcode:</p>
     71<table>
     72<thead>
     73<tr>
     74<th>Opcode</th>
     75<th>Mnemonic (GCN1.2)</th>
     76</tr>
     77</thead>
     78<tbody>
     79<tr>
     80<td>0 (0x0)</td>
     81<td>S_LOAD_DWORD</td>
     82</tr>
     83<tr>
     84<td>1 (0x1)</td>
     85<td>S_LOAD_DWORDX2</td>
     86</tr>
     87<tr>
     88<td>2 (0x2)</td>
     89<td>S_LOAD_DWORDX4</td>
     90</tr>
     91<tr>
     92<td>3 (0x3)</td>
     93<td>S_LOAD_DWORDX8</td>
     94</tr>
     95<tr>
     96<td>4 (0x4)</td>
     97<td>S_LOAD_DWORDX16</td>
     98</tr>
     99<tr>
     100<td>8 (0x8)</td>
     101<td>S_BUFFER_LOAD_DWORD</td>
     102</tr>
     103<tr>
     104<td>9 (0x9)</td>
     105<td>S_BUFFER_LOAD_DWORDX2</td>
     106</tr>
     107<tr>
     108<td>10 (0xa)</td>
     109<td>S_BUFFER_LOAD_DWORDX4</td>
     110</tr>
     111<tr>
     112<td>11 (0xb)</td>
     113<td>S_BUFFER_LOAD_DWORDX8</td>
     114</tr>
     115<tr>
     116<td>12 (0xc)</td>
     117<td>S_BUFFER_LOAD_DWORDX16</td>
     118</tr>
     119<tr>
     120<td>16 (0x10)</td>
     121<td>S_STORE_DWORD</td>
     122</tr>
     123<tr>
     124<td>17 (0x11)</td>
     125<td>S_STORE_DWORDX2</td>
     126</tr>
     127<tr>
     128<td>18 (0x12)</td>
     129<td>S_STORE_DWORDX4</td>
     130</tr>
     131<tr>
     132<td>24 (0x18)</td>
     133<td>S_BUFFER_STORE_DWORD</td>
     134</tr>
     135<tr>
     136<td>25 (0x19)</td>
     137<td>S_BUFFER_STORE_DWORDX2</td>
     138</tr>
     139<tr>
     140<td>26 (0x1a)</td>
     141<td>S_BUFFER_STORE_DWORDX4</td>
     142</tr>
     143<tr>
     144<td>32 (0x20)</td>
     145<td>S_DCACHE_INV</td>
     146</tr>
     147<tr>
     148<td>33 (0x21)</td>
     149<td>S_DCACHE_WB</td>
     150</tr>
     151<tr>
     152<td>34 (0x22)</td>
     153<td>S_DCACHE_INV_VOL</td>
     154</tr>
     155<tr>
     156<td>35 (0x23)</td>
     157<td>S_DCACHE_WB_VOL</td>
     158</tr>
     159<tr>
     160<td>36 (0x24)</td>
     161<td>S_MEMTIME</td>
     162</tr>
     163<tr>
     164<td>37 (0x25)</td>
     165<td>S_MEMREALTIME</td>
     166</tr>
     167<tr>
     168<td>38 (0x26)</td>
     169<td>S_ATC_PROBE</td>
     170</tr>
     171<tr>
     172<td>39 (0x27)</td>
     173<td>S_ATC_PROBE_BUFFER</td>
     174</tr>
     175</tbody>
     176</table>
     177<h3>Instruction set</h3>
     178<p>Alphabetically sorted instruction list:</p>
     179<h4>S_BUFFER_LOAD_DWORD</h4>
     180<p>Opcode: 8 (0x8)<br />
     181Syntax: S_BUFFER_LOAD_DWORD SDATA, SBASE(4), OFFSET<br />
     182Description: Load single dword from read-only memory through constant cache (kcache).
     183SBASE is buffer descriptor.<br />
     184Operation:<br />
     185<code>SDATA = *(UINT32*)(SMEM + (OFFSET &amp; ~3))</code></p>
     186<h4>S_BUFFER_LOAD_DWORDX16</h4>
     187<p>Opcode: 12 (0xc)<br />
     188Syntax: S_BUFFER_LOAD_DWORDX16 SDATA(16), SBASE(4), OFFSET<br />
     189Description: Load 16 dwords from read-only memory through constant cache (kcache).
     190SBASE is buffer descriptor.<br />
     191Operation:<br />
     192<code>for (BYTE i = 0; i &lt; 16; i++)
     193    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
     194<h4>S_BUFFER_LOAD_DWORDX2</h4>
     195<p>Opcode: 9 (0x9)<br />
     196Syntax: S_BUFFER_LOAD_DWORDX2 SDATA(2), SBASE(4), OFFSET<br />
     197Description: Load two dwords from read-only memory through constant cache (kcache).
     198SBASE is buffer descriptor.<br />
     199Operation:<br />
     200<code>SDATA = *(UINT64*)(SMEM + (OFFSET &amp; ~3))</code></p>
     201<h4>S_BUFFER_LOAD_DWORDX4</h4>
     202<p>Opcode: 10 (0xa)<br />
     203Syntax: S_BUFFER_LOAD_DWORDX4 SDATA(4), SBASE(4), OFFSET<br />
     204Description: Load four dwords from read-only memory through constant cache (kcache).
     205SBASE is buffer descriptor.<br />
     206Operation:<br />
     207<code>for (BYTE i = 0; i &lt; 4; i++)
     208    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
     209<h4>S_BUFFER_LOAD_DWORDX8</h4>
     210<p>Opcode: 11 (0xb)<br />
     211Syntax: S_BUFFER_LOAD_DWORDX8 SDATA(8), SBASE(4), OFFSET<br />
     212Description: Load eight dwords from read-only memory through constant cache (kcache).
     213SBASE is buffer descriptor.<br />
     214Operation:<br />
     215<code>for (BYTE i = 0; i &lt; 8; i++)
     216    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
     217<h4>S_BUFFER_STORE_DWORD</h4>
     218<p>Opcode: 24 (0x18)<br />
     219Syntax: S_BUFFER_STORE_DWORD SDATA, SBASE(4), OFFSET<br />
     220Description: Store a single dword to memory. The offset operand must be M0 or an immediate.
     221SBASE is a buffer descriptor.<br />
     222Operation:<br />
     223<code>*(UINT32*)(SMEM + (OFFSET &amp; ~3)) = SDATA</code></p>
     224<h4>S_BUFFER_STORE_DWORDX2</h4>
     225<p>Opcode: 25 (0x19)<br />
     226Syntax: S_BUFFER_STORE_DWORDX2 SDATA(2), SBASE(4), OFFSET<br />
     227Description: Store two dwords to memory. The offset operand must be M0 or an immediate.
     228SBASE is a buffer descriptor.<br />
     229Operation:<br />
     230<code>*(UINT64*)(SMEM + (OFFSET &amp; ~3)) = SDATA</code></p>
     231<h4>S_BUFFER_STORE_DWORDX4</h4>
     232<p>Opcode: 26 (0x1a)<br />
     233Syntax: S_BUFFER_STORE_DWORDX4 SDATA(4), SBASE(4), OFFSET<br />
     234Description: Store four dwords to memory. The offset operand must be M0 or an immediate.
     235SBASE is a buffer descriptor.<br />
     236Operation:<br />
     237<code>for (BYTE i = 0; i &lt; 4; i++)
     238    *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3)) = SDATA[i]</code></p>
     239<h4>S_DCACHE_INV</h4>
     240<p>Opcode: 32 (0x20)<br />
     241Syntax: S_DCACHE_INV<br />
     242Description: Invalidate entire L1 K cache.</p>
     243<h4>S_DCACHE_INV_VOL</h4>
     244<p>Opcode: 34 (0x22)<br />
     245Syntax: S_DCACHE_INV_VOL<br />
     246Description: Invalidate all volatile lines in L1 K cache.</p>
     247<h4>S_LOAD_DWORD</h4>
     248<p>Opcode: 0 (0x0)<br />
     249Syntax: S_LOAD_DWORD SDATA, SBASE(2), OFFSET<br />
     250Description: Load single dword from read-only memory through constant cache (kcache).<br />
     251Operation:<br />
     252<code>SDATA = *(UINT32*)(SMEM + (OFFSET &amp; ~3))</code></p>
     253<h4>S_LOAD_DWORDX16</h4>
     254<p>Opcode: 4 (0x4)<br />
     255Syntax: S_LOAD_DWORDX16 SDATA(16), SBASE(2), OFFSET<br />
     256Description: Load 16 dwords from read-only memory through constant cache (kcache).<br />
     257Operation:<br />
     258<code>for (BYTE i = 0; i &lt; 16; i++)
     259    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
     260<h4>S_LOAD_DWORDX2</h4>
     261<p>Opcode: 1 (0x1)<br />
     262Syntax: S_LOAD_DWORDX2 SDATA(2), SBASE(2), OFFSET<br />
     263Description: Load two dwords from read-only memory through constant cache (kcache).<br />Operation:<br />
     264<code>SDATA = *(UINT64*)(SMEM + (OFFSET &amp; ~3))</code></p>
     265<h4>S_LOAD_DWORDX4</h4>
     266<p>Opcode: 2 (0x2)<br />
     267Syntax: S_LOAD_DWORDX4 SDATA(4), SBASE(2), OFFSET<br />
     268Description: Load four dwords from read-only memory through constant cache (kcache).<br />
     269Operation:<br />
     270<code>for (BYTE i = 0; i &lt; 4; i++)
     271    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
     272<h4>S_LOAD_DWORDX8</h4>
     273<p>Opcode: 3 (0x3)<br />
     274Syntax: S_LOAD_DWORDX8 SDATA(8), SBASE(2), OFFSET<br />
     275Description: Load eight dwords from read-only memory through constant cache (kcache).<br />
     276Operation:<br />
     277<code>for (BYTE i = 0; i &lt; 8; i++)
     278    SDATA[i] = *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3))</code></p>
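<p>The load operations listed in this section all follow one pattern, which the sketch below models on a plain byte buffer. It rests on several assumptions: <code>s_load_dwords</code> is a hypothetical function, memory is little-endian, the effective address is the base plus the byte offset, and the low two bits of the offset are dropped so that the access is dword-aligned.</p>

```python
import struct


def s_load_dwords(memory, base, offset, count):
    """Model of S_LOAD_DWORDX<count>: read `count` dwords from a bytes-like buffer.

    Assumes little-endian memory and that the low two offset bits are ignored.
    """
    addr = base + (offset & ~3)  # force dword alignment of the byte offset
    return list(struct.unpack_from(f"<{count}I", memory, addr))
```

<p>With <code>count</code> set to 1, 2, 4, 8 or 16 this stands in for the five load variants; element <code>i</code> of the returned list corresponds to <code>SDATA[i]</code>.</p>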
     279<h4>S_MEMREALTIME</h4>
     280<p>Opcode: 37 (0x25)<br />
     281Syntax: S_MEMREALTIME SDATA(2)<br />
     282Description: Store value of 64-bit RTC counter to SDATA.
     283Before reading result, S_WAITCNT LGKMCNT(0) is required.<br />
     284Operation:<br />
     285<code>SDATA = CLOCKCNT</code></p>
     286<h4>S_MEMTIME</h4>
     287<p>Opcode: 36 (0x24)<br />
     288Syntax: S_MEMTIME SDATA(2)<br />
     289Description: Store value of 64-bit clock counter to SDATA.
     290This "time" is a free-running clock counter based on the shader core clock.
     291Before reading result, S_WAITCNT LGKMCNT(0) is required.<br />
     292Operation:<br />
     293<code>SDATA = CLOCKCNT</code></p>
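<p>A typical use of the 64-bit counter is measuring elapsed clocks between two samples. The sketch below is only an illustration: both function names are hypothetical, and it assumes that the first SGPR of the SDATA(2) pair holds the low dword and the second holds the high dword.</p>

```python
MASK64 = (1 << 64) - 1


def counter_from_sgprs(lo, hi):
    """Combine two 32-bit SGPR values into one 64-bit counter value.

    Assumes the first SGPR of the pair holds the low dword.
    """
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def elapsed_ticks(start, end):
    """Ticks between two counter samples, tolerating 64-bit wraparound."""
    return (end - start) & MASK64
```

<p>Both samples must be read only after <code>S_WAITCNT LGKMCNT(0)</code>, as required above.</p>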
     294<h4>S_STORE_DWORD</h4>
     295<p>Opcode: 16 (0x10)<br />
     296Syntax: S_STORE_DWORD SDATA, SBASE(2), OFFSET<br />
     297Description: Store a single dword to memory. The offset operand must be M0 or an
     298immediate.<br />
     299Operation:<br />
     300<code>*(UINT32*)(SMEM + (OFFSET &amp; ~3)) = SDATA</code></p>
     301<h4>S_STORE_DWORDX2</h4>
     302<p>Opcode: 17 (0x11)<br />
     303Syntax: S_STORE_DWORDX2 SDATA(2), SBASE(2), OFFSET<br />
     304Description: Store two dwords to memory. The offset operand must be M0 or an immediate.<br />
     305Operation:<br />
     306<code>*(UINT64*)(SMEM + (OFFSET &amp; ~3)) = SDATA</code></p>
     307<h4>S_STORE_DWORDX4</h4>
     308<p>Opcode: 18 (0x12)<br />
     309Syntax: S_STORE_DWORDX4 SDATA(4), SBASE(2), OFFSET<br />
     310Description: Store four dwords to memory. The offset operand must be M0 or an immediate.<br />
     311Operation:<br />
     312<code>for (BYTE i = 0; i &lt; 4; i++)
     313    *(UINT32*)(SMEM + i*4 + (OFFSET &amp; ~3)) = SDATA[i]</code></p>
     314}}}