Changes between Version 3 and Version 4 of GcnSdwaDpp


Timestamp:
06/05/17 09:00:26 (7 years ago)
Author:
trac
Comment:

--

  • GcnSdwaDpp

    v3 v4  
    9191<th>Value</th>
    9292<th>Name</th>
    93 <th>CLRX names</th>
     93<th>Assembler names</th>
    9494<th>Description</th>
    9595</tr>
     
    147147<th>Value</th>
    148148<th>Name</th>
    149 <th>CLRX name</th>
     149<th>Assembler name</th>
    150150<th>Description</th>
    151151</tr>
     
    175175(fill bits after part) to the source operand when the whole dword was not
    176176selected for that source operand (SDWA_DWORD not chosen).</p>
     177<p>Examples:
     178<code>v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1
     179v_xor_b32 v1,v2,v3 dst_sel:b1 src0_sel:b1 src1_sel:w1
     180v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1 dst_unused:preserve
     181v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1 dst_unused:sext
     182v_xor_b32 v1,sext(v2),v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1</code></p>
    177183<p>Operation code:<br />
    178184<code>// SRC0_SRC = source SRC0, SRC1_SRC = source SRC1, DST_SRC = VDST source
     
    299305        break;
    300306}</code></p>
     307<h3>VOP_DPP</h3>
     308<p>The VOP_DPP encoding is enabled by setting 0xfa in the VSRC0 field of the VOP1/VOP2/VOPC encoding.
     309List of fields:</p>
     310<table>
     311<thead>
     312<tr>
     313<th>Bits</th>
     314<th>Name</th>
     315<th>Description</th>
     316</tr>
     317</thead>
     318<tbody>
     319<tr>
     320<td>0-7</td>
     321<td>SRC0</td>
     322<td>First source vector operand</td>
     323</tr>
     324<tr>
     325<td>8-16</td>
     326<td>DPP_CTRL</td>
     327<td>Data parallel primitive control</td>
     328</tr>
     329<tr>
     330<td>19</td>
     331<td>BOUND_CTRL</td>
     332<td>Specifies behaviour when shared data is invalid</td>
     333</tr>
     334<tr>
     335<td>20</td>
     336<td>SRC0_NEG</td>
     337<td>Negation modifier for SRC0</td>
     338</tr>
     339<tr>
     340<td>21</td>
     341<td>SRC0_ABS</td>
     342<td>Absolute value for SRC0</td>
     343</tr>
     344<tr>
     345<td>22</td>
     346<td>SRC1_NEG</td>
     347<td>Negation modifier for SRC1</td>
     348</tr>
     349<tr>
     350<td>23</td>
     351<td>SRC1_ABS</td>
     352<td>Absolute value for SRC1</td>
     353</tr>
     354<tr>
     355<td>24-27</td>
     356<td>BANK_MASK</td>
     357<td>Bank enable mask</td>
     358</tr>
     359<tr>
     360<td>28-31</td>
     361<td>ROW_MASK</td>
     362<td>Row enable mask</td>
     363</tr>
     364</tbody>
     365</table>
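<p>The field layout above can be sketched as a small decoder. This is only an illustration:
the helper name <code>decode_dpp_word</code> and the example dword are hypothetical, and the
bit ranges are copied directly from the table.</p>

```python
# Hypothetical helper (not part of any assembler): split a 32-bit VOP_DPP
# dword into the fields listed in the table above.
DPP_FIELDS = {
    "SRC0": (0, 7),
    "DPP_CTRL": (8, 16),
    "BOUND_CTRL": (19, 19),
    "SRC0_NEG": (20, 20),
    "SRC0_ABS": (21, 21),
    "SRC1_NEG": (22, 22),
    "SRC1_ABS": (23, 23),
    "BANK_MASK": (24, 27),
    "ROW_MASK": (28, 31),
}


def decode_dpp_word(word):
    """Return a dict mapping each field name to its value in `word`."""
    fields = {}
    for name, (lo, hi) in DPP_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (word >> lo) & ((1 << width) - 1)
    return fields


# Example dword assembled by hand for illustration: SRC0=2, DPP_CTRL=0x111,
# BOUND_CTRL=1, BANK_MASK=0xf, ROW_MASK=0xf.
example = decode_dpp_word(0xFF091102)
```

<p>For the example dword, the decoder yields DPP_CTRL equal to 0x111 with both masks fully enabled.</p>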
     366<p>The data parallel operation is applied across the wavefront to the VSRC0 operand of the VOP instruction.
     367The wavefront contains 4 rows (16 threads each), and each row contains 4 banks (4 threads each).
     368The DPP_CTRL field chooses which operation is applied to VSRC0.
     369List of data parallel operations:</p>
     370<table>
     371<thead>
     372<tr>
     373<th>Value</th>
     374<th>Name</th>
     375<th>Modifier</th>
     376<th>Description</th>
     377</tr>
     378</thead>
     379<tbody>
     380<tr>
     381<td>0x00-0xff</td>
     382<td>DPP_QUAD_PERM{00:ff}</td>
     383<td>quad_perm:[A,B,C,D]</td>
     384<td>Full permute of 4 threads</td>
     385</tr>
     386<tr>
     387<td>0x101-0x10f</td>
     388<td>DPP_ROW_SL{1:15}</td>
     389<td>row_shl:N</td>
     390<td>Row shift left by N threads</td>
     391</tr>
     392<tr>
     393<td>0x111-0x11f</td>
     394<td>DPP_ROW_SR{1:15}</td>
     395<td>row_shr:N</td>
     396<td>Row shift right by N threads</td>
     397</tr>
     398<tr>
     399<td>0x121-0x12f</td>
     400<td>DPP_ROW_RR{1:15}</td>
     401<td>row_ror:N</td>
     402<td>Row rotate right by N threads</td>
     403</tr>
     404<tr>
     405<td>0x130</td>
     406<td>DPP_WF_SL1</td>
     407<td>wave_shl:1</td>
     408<td>Wave shift left by 1 thread</td>
     409</tr>
     410<tr>
     411<td>0x134</td>
     412<td>DPP_WF_RL1</td>
     413<td>wave_rol:1</td>
     414<td>Wave rotate left by 1 thread</td>
     415</tr>
     416<tr>
     417<td>0x138</td>
     418<td>DPP_WF_SR1</td>
     419<td>wave_shr:1</td>
     420<td>Wave shift right by 1 thread</td>
     421</tr>
     422<tr>
     423<td>0x13c</td>
     424<td>DPP_WF_RR1</td>
     425<td>wave_ror:1</td>
     426<td>Wave rotate right by 1 thread</td>
     427</tr>
     428<tr>
     429<td>0x140</td>
     430<td>DPP_ROW_MIRROR</td>
     431<td>row_mirror</td>
     432<td>Mirror threads within row</td>
     433</tr>
     434<tr>
     435<td>0x141</td>
     436<td>DPP_ROW_HALF_MIRROR</td>
     437<td>row_half_mirror</td>
     438<td>Mirror threads within half row</td>
     439</tr>
     440<tr>
     441<td>0x142</td>
     442<td>DPP_ROW_BCAST15</td>
     443<td>row_bcast:15</td>
     444<td>Broadcast thread 15 of each row to the next row</td>
     445</tr>
     446<tr>
     447<td>0x143</td>
     448<td>DPP_ROW_BCAST31</td>
     449<td>row_bcast:31</td>
     450<td>Broadcast thread 31 to row 2 and row 3</td>
     451</tr>
     452</tbody>
     453</table>
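<p>The mapping from a DPP_CTRL value to its modifier can be sketched as below. The function
name <code>dpp_ctrl_modifier</code> is hypothetical. Two assumptions are made that the table does
not state: the quad_perm value holds four 2-bit selectors with A in the lowest bits, and
value 0x143 corresponds to <code>row_bcast:31</code>, matching its description. Values not listed in
the table return <code>None</code>.</p>

```python
# Hypothetical helper: map a DPP_CTRL value to its modifier text.
# Assumption: quad_perm packs four 2-bit selectors, A in bits 0-1, D in bits 6-7.
SINGLE_VALUE_MODIFIERS = {
    0x130: "wave_shl:1",
    0x134: "wave_rol:1",
    0x138: "wave_shr:1",
    0x13C: "wave_ror:1",
    0x140: "row_mirror",
    0x141: "row_half_mirror",
    0x142: "row_bcast:15",
    0x143: "row_bcast:31",  # assumption: taken from the row's description
}


def dpp_ctrl_modifier(ctrl):
    """Return the modifier string for `ctrl`, or None if it is not in the table."""
    if 0x00 <= ctrl <= 0xFF:
        sel = [(ctrl >> (2 * i)) & 3 for i in range(4)]
        return "quad_perm:[%d,%d,%d,%d]" % tuple(sel)
    if 0x101 <= ctrl <= 0x10F:
        return "row_shl:%d" % (ctrl & 0xF)
    if 0x111 <= ctrl <= 0x11F:
        return "row_shr:%d" % (ctrl & 0xF)
    if 0x121 <= ctrl <= 0x12F:
        return "row_ror:%d" % (ctrl & 0xF)
    return SINGLE_VALUE_MODIFIERS.get(ctrl)
```

<p>Under these assumptions, 0x1b decodes to a quad permutation that reverses the four threads.</p>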
     454<p>The BOUND_CTRL flag (modifier <code>bound_ctrl</code> or <code>bound_ctrl:0</code>) controls how invalid
     455threads are filled (for example, the last threads after a left shift). A zero value (no modifier)
     456sets invalid threads to the original VSRC0 value of the particular thread. A value of one (with the modifier)
     457fills invalid threads with the thread 0 VSRC0 value.</p>
     458<p>The BANK_MASK field (modifier <code>bank_mask:value</code>) chooses which banks are enabled during
     459the data parallel operation in each enabled row. The Nth bit represents the Nth bank in each row.
     460A disabled bank is filled with the original VSRC0 value of the particular thread.</p>
     461<p>The ROW_MASK field (modifier <code>row_mask:value</code>) chooses which rows are enabled during
     462the data parallel operation. The Nth bit represents the Nth row.
     463A disabled row is filled with the original VSRC0 value of the particular thread.</p>
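<p>How ROW_MASK and BANK_MASK combine can be sketched as follows. The function name
<code>dpp_enabled_lanes</code> is hypothetical. The sketch assumes a 64-thread wavefront (4 rows
of 16 threads) and that rows and banks are numbered from the lowest thread upwards.</p>

```python
# Hypothetical helper: list the threads that take part in the data parallel
# operation for the given masks. Threads outside the list keep their
# original VSRC0 value.
WAVE_SIZE = 64  # assumption: 4 rows of 16 threads
ROW_SIZE = 16
BANK_SIZE = 4


def dpp_enabled_lanes(row_mask, bank_mask):
    """Return the thread indices enabled by `row_mask` and `bank_mask`."""
    enabled = []
    for lane in range(WAVE_SIZE):
        row = lane // ROW_SIZE                  # bit index in row_mask
        bank = (lane % ROW_SIZE) // BANK_SIZE   # bit index in bank_mask
        if (row_mask >> row) & 1 and (bank_mask >> bank) & 1:
            enabled.append(lane)
    return enabled
```

<p>For example, <code>row_mask:0x1 bank_mask:0x2</code> would enable only threads 4 to 7 under these assumptions.</p>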
    301464}}}