Changes between Version 4 and Version 5 of GcnSdwaDpp


Timestamp:
Jun 5, 2017, 3:00:43 PM (2 years ago)
Author:
trac
Comment:

--

  • GcnSdwaDpp

    v4 v5  
    175175(fill bits after part) to source operand while for source operand was not
    176176selected whole dword (SDWA_DWORD not chosen).</p>
    177 <p>Examples:
     177<p>Examples:<br />
    178178<code>v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1
    179179v_xor_b32 v1,v2,v3 dst_sel:b1 src0_sel:b1 src1_sel:w1
     
    185185// SRC0_DST = dest. SRC0, SRC1_DST = dest. SRC1, DST_DST = VDST dest.
    186186// OPERATION(SRC0, SRC1) - instruction operation, VDST - VDST register before instruction
    187 if (HAVE_SRC0)
    188 {
    189     switch(SRC0_SEL)
    190     {
    191         case SDWA_BYTE_0:
    192             SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC &amp; 0xff)) : SRC0_SRC &amp; 0xff
    193             break;
    194         case SDWA_BYTE_1:
    195             SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC&gt;&gt;8) &amp; 0xff)) :
    196                         (SRC0_SRC&gt;&gt;8) &amp; 0xff
    197             break;
    198         case SDWA_BYTE_2:
    199             SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC&gt;&gt;16) &amp; 0xff)) :
    200                         (SRC0_SRC&gt;&gt;16) &amp; 0xff
    201             break;
    202         case SDWA_BYTE_1:
    203             SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC&gt;&gt;24)) : SRC0_SRC&gt;&gt;24
    204             break;
    205         case SDWA_WORD_0:
    206             SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC &amp; 0xffff)) : SRC0_SRC &amp; 0xffff
    207             break;
    208         case SDWA_WORD_1:
    209             SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC &gt;&gt; 16)) : SRC0_SRC &gt;&gt; 16
    210             break;
    211         case SDWA_DWORD:
    212             SRC0_DST = SRC0_SRC
    213             break;
    214     }
     187switch(SRC0_SEL)
     188{
     189    case SDWA_BYTE_0:
     190        SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC &amp; 0xff)) : SRC0_SRC &amp; 0xff
     191        break;
     192    case SDWA_BYTE_1:
     193        SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC&gt;&gt;8) &amp; 0xff)) :
     194                    (SRC0_SRC&gt;&gt;8) &amp; 0xff
     195        break;
     196    case SDWA_BYTE_2:
     197        SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC&gt;&gt;16) &amp; 0xff)) :
     198                    (SRC0_SRC&gt;&gt;16) &amp; 0xff
     199        break;
     200    case SDWA_BYTE_3:
     201        SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC&gt;&gt;24)) : SRC0_SRC&gt;&gt;24
     202        break;
     203    case SDWA_WORD_0:
     204        SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC &amp; 0xffff)) : SRC0_SRC &amp; 0xffff
     205        break;
     206    case SDWA_WORD_1:
     207        SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC &gt;&gt; 16)) : SRC0_SRC &gt;&gt; 16
     208        break;
     209    case SDWA_DWORD:
     210        SRC0_DST = SRC0_SRC
     211        break;
    215212}
    216213if (HAVE_SRC1)
     
    243240    }
    244241}
    245 DST_SRC = OPERATION(SRC0,SRC1)
     242DST_SRC = OPERATION(SRC0_DST,SRC1_DST)
    246243UINT32 tmp
    247244switch(DST_SEL)
     
    306303}</code></p>
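The source-operand selection above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sdwa_select` and the string selector names are hypothetical, and the selector that shifts by 24 bits is assumed to be the fourth byte (called `BYTE_3` here).

```python
def sdwa_select(value, sel, sext=False):
    """Extract the selected byte or word of a 32-bit value (hypothetical helper).

    sel names mirror the SDWA_* selectors; sext mirrors the SRC0_SEXT flag.
    """
    # (shift, width in bits) for each selector
    fields = {
        "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
        "WORD_0": (0, 16), "WORD_1": (16, 16), "DWORD": (0, 32),
    }
    shift, width = fields[sel]
    part = (value >> shift) & ((1 << width) - 1)
    # Sign-extend to 32 bits when requested and the top bit of the part is set
    if sext and part & (1 << (width - 1)):
        part |= 0xFFFFFFFF & ~((1 << width) - 1)
    return part & 0xFFFFFFFF


print(hex(sdwa_select(0x12345678, "BYTE_1")))
print(hex(sdwa_select(0x0000FF00, "BYTE_1", sext=True)))
```

With `DWORD` the value passes through unchanged, which matches the pseudocode ignoring the sign-extension flag for a full dword.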
    307304<h3>VOP_DPP</h3>
    308 <p>The VOP_DPP encoding is enabled by setting 0xfa in VSRC0 field in VOP1/VOP2/VOPC encoding.
     305<p>The VOP_DPP encoding is enabled by setting 0xfa in the SRC0 field of the VOP1/VOP2/VOPC encoding.
    309306List of fields:</p>
    310307<table>
     
    364361</tbody>
    365362</table>
    366 <p>The operation on wavefronts applied to VSRC0 operand in VOP instruction.
     363<p>The wavefront operation is applied to the SRC0 operand of the VOP instruction.
    367364The wavefront contains 4 rows (16 threads each), and each row contains 4 banks (4 threads each).
    368 The DPP_CTRL choose which operation will be applied to VSRC0.
     365The DPP_CTRL field chooses which operation is applied to SRC0.
    369366List of data parallel operations:</p>
    370367<table>
     
    446443<tr>
    447444<td>0x143</td>
    448 <td>DPP_ROW_BCAST15</td>
    449 <td>row_bcast:15</td>
     445<td>DPP_ROW_BCAST31</td>
     446<td>row_bcast:31</td>
    450447<td>Broadcast thread 31 to row 2 and row 3</td>
    451448</tr>
     
    454451<p>The BOUND_CTRL flag (modifier <code>bound_ctrl</code> or <code>bound_ctrl:0</code>) controls how invalid
    455452threads are filled (for example, the last threads after a left shift). A zero value (no modifier)
    456 sets invalid threads by original VSRC0 value for particular thread. One value (with modifier)
    457 fills invalid threads by 0 thread VSRC0 value.</p>
     453does not perform the operation in threads whose source thread is invalid.
     454A value of one (with the modifier) fills invalid threads with 0.</p>
    458455<p>The field BANK_MASK (modifier <code>bank_mask:value</code>) chooses which banks are enabled during
    459456the data parallel operation in each enabled row. The Nth bit represents the Nth bank in each row.
    460 Disabled bank will be filled by original VSRC0 value for particular thread</p>
     457Threads in disabled banks do not perform the operation.</p>
    461458<p>The field ROW_MASK (modifier <code>row_mask:value</code>) chooses which rows are enabled during
    462459the data parallel operation. The Nth bit represents the Nth row.
    463 Disabled row will be filled by original VSRC0 value for particular thread.</p>
     460Threads in disabled rows do not perform the operation.</p>
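As a rough illustration of how ROW_MASK and BANK_MASK gate individual threads, the check can be sketched in Python. The function name `lane_enabled` is hypothetical; the layout (16 lanes per row, 4 lanes per bank) is taken from the description above.

```python
def lane_enabled(lane_id, row_mask, bank_mask):
    """Return True if the lane's row and bank are both enabled (hypothetical helper)."""
    row = lane_id >> 4          # 16 lanes per row, 4 rows in a 64-lane wavefront
    bank = (lane_id >> 2) & 3   # 4 lanes per bank, 4 banks per row
    return bool(row_mask & (1 << row)) and bool(bank_mask & (1 << bank))


# row_mask:0b1100 enables only rows 2 and 3 (lanes 32..63)
print([lane for lane in range(64) if lane_enabled(lane, 0b1100, 0b1111)][:4])
# bank_mask:0b0101 enables banks 0 and 2 in every row
print([lane for lane in range(16) if lane_enabled(lane, 0b1111, 0b0101)])
```

A lane that fails either check keeps its previous destination value, because the operation is simply not performed for it.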
     461<p>Examples:<br />
     462<code>v_xor_b32 v1,v2,v3 quad_perm:[2,3,0,1]
     463v_xor_b32 v1,v2,v3 row_shl:5
     464v_xor_b32 v1,v2,v3 row_shr:7
     465v_xor_b32 v1,v2,v3 row_ror:8
     466v_xor_b32 v1,v2,v3 wave_shl:1
     467v_xor_b32 v1,v2,v3 wave_shl
     468v_xor_b32 v1,v2,v3 wave_shr:1
     469v_xor_b32 v1,v2,v3 wave_shr
     470v_xor_b32 v1,v2,v3 wave_rol:1
     471v_xor_b32 v1,v2,v3 wave_rol
     472v_xor_b32 v1,v2,v3 wave_ror:1
     473v_xor_b32 v1,v2,v3 wave_ror
     474v_xor_b32 v1,v2,v3 row_mirror
     475v_xor_b32 v1,v2,v3 row_half_mirror
     476v_xor_b32 v1,v2,v3 row_bcast:15
     477v_xor_b32 v1,v2,v3 row_bcast:31
     478v_xor_b32 v1,v2,v3 row_shr:7 bound_ctrl
     479v_xor_b32 v1,v2,v3 row_shr:7 bound_ctrl:0
     480v_xor_b32 v1,v2,v3 row_shl:5 row_mask:0b1100
     481v_xor_b32 v1,v2,v3 row_shl:5 bank_mask:0b0101</code></p>
     482<p>Operation code:<br />
     483<code>// SRC0_SRC[X] - original VSRC0 value from thread X
     484// SRC0_DST[X] - destination VSRC0 value from thread X
     485// OPERATION(SRC0, SRC1) - instruction operation, VDST - VDST register before instruction
     486BYTE invalid = 0
     487BYTE srcLane
     488if (DPP_CTRL&gt;=DPP_QUAD_PERM00 &amp;&amp; DPP_CTRL&lt;=DPP_QUAD_PERMFF)
     489{
     490    BYTE p0 = DPP_CTRL&amp;3
     491    BYTE p1 = (DPP_CTRL&gt;&gt;2)&amp;3
     492    BYTE p2 = (DPP_CTRL&gt;&gt;4)&amp;3
     493    BYTE p3 = (DPP_CTRL&gt;&gt;6)&amp;3
     494    BYTE curL4 = LANEID&amp;~3
     495    if ((LANEID&amp;3)==0)
     496        srcLane = curL4 + p0
     497    else if ((LANEID&amp;3)==1)
     498        srcLane = curL4 + p1
     499    else if ((LANEID&amp;3)==2)
     500        srcLane = curL4 + p2
     501    else if ((LANEID&amp;3)==3)
     502        srcLane = curL4 + p3
     503}
     504else if (DPP_CTRL&gt;=DPP_ROW_SL1 &amp;&amp; DPP_CTRL&lt;=DPP_ROW_SL15)
     505{
     506    BYTE shift = DPP_CTRL&amp;15
     507    BYTE slid = LANEID&amp;15
     508    BYTE curR = LANEID&amp;~15
     509    if (slid+shift&lt;=15)
     510        srcLane = curR + slid + shift
     511    else
     512        srcLane = LANESNUM
     513}
     514else if (DPP_CTRL&gt;=DPP_ROW_SR1 &amp;&amp; DPP_CTRL&lt;=DPP_ROW_SR15)
     515{
     516    BYTE shift = DPP_CTRL&amp;15
     517    BYTE slid = LANEID&amp;15
     518    BYTE curR = LANEID&amp;~15
     519    if (slid&gt;=shift)
     520        srcLane = curR + slid - shift
     521    else
     522        srcLane = LANESNUM
     523}
     524else if (DPP_CTRL&gt;=DPP_ROW_RR1 &amp;&amp; DPP_CTRL&lt;=DPP_ROW_RR15)
     525{
     526    BYTE shift = DPP_CTRL&amp;15
     527    BYTE slid = LANEID&amp;15
     528    BYTE curR = LANEID&amp;~15
     529    srcLane = curR + ((16+slid - shift)&amp;15)
     530}
     531else if (DPP_CTRL==DPP_WF_SL1)
     532    srcLane = LANEID+1
     533else if (DPP_CTRL==DPP_WF_SR1)
     534    srcLane = LANEID-1
     535else if (DPP_CTRL==DPP_WF_RL1)
     536    srcLane = (LANEID+1)&amp;63
     537else if (DPP_CTRL==DPP_WF_RR1)
     538    srcLane = (LANEID-1)&amp;63
     539else if (DPP_CTRL==DPP_ROW_MIRROR)
     540{
     541    BYTE curR = LANEID&amp;~15
     542    srcLane = curR + ((LANEID&amp;15)^15)
     543}
     544else if (DPP_CTRL==DPP_ROW_HALF_MIRROR)
     545{
     546    BYTE curR = LANEID&amp;~7
     547    srcLane = curR + ((LANEID&amp;7)^7)
     548}
     549else if (DPP_CTRL==DPP_ROW_BCAST15)
     550{
     551    BYTE curR = LANEID&amp;~15
     552    if (LANEID&lt;16)
     553        srcLane = LANEID
     554    else
     555        srcLane = ((LANEID-16)&amp;~15)+15
     556}
     557else if (DPP_CTRL==DPP_ROW_BCAST31)
     558{
     559    BYTE curR = LANEID&amp;~31
     560    if (LANEID&lt;32)
     561        srcLane = LANEID
     562    else
     563        srcLane = ((LANEID-32)&amp;~31)+31
     564}
     565if (srcLane &lt; LANESNUM)
     566    SRC0_DST[LANEID] = SRC0_SRC[srcLane]
     567else if (BOUND_CTRL==1)
     568    SRC0_DST[LANEID] = 0
     569else
     570    invalid = 1
     571if ((ROW_MASK &amp; (1U&lt;&lt;(LANEID&gt;&gt;4)))==0)
     572    invalid = 1
     573if ((BANK_MASK &amp; (1U&lt;&lt;((LANEID&gt;&gt;2)&amp;3)))==0)
     574    invalid = 1
     575if (!invalid)
     576    VDST = OPERATION(SRC0_DST,SRC1)</code></p>
    464577}}}
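The source-lane computation for a few of the row-level DPP modes can be sketched in Python as below. This is a minimal sketch that follows the operation code above, not a definitive model of the hardware: the function name `dpp_src_lane` and the string mode names are hypothetical, and `None` stands in for an invalid source lane.

```python
def dpp_src_lane(lane_id, mode, arg=None):
    """Return the lane that lane_id reads SRC0 from, or None if that lane is invalid."""
    row_base = lane_id & ~15   # first lane of the current 16-lane row
    in_row = lane_id & 15      # position of the lane inside its row
    if mode == "quad_perm":    # arg is a list of four indices, e.g. [2, 3, 0, 1]
        return (lane_id & ~3) + arg[lane_id & 3]
    if mode == "row_shl":      # arg is the shift amount, 1..15
        return row_base + in_row + arg if in_row + arg <= 15 else None
    if mode == "row_shr":
        return row_base + in_row - arg if in_row >= arg else None
    if mode == "row_ror":      # rotation wraps around inside the row
        return row_base + ((in_row - arg) & 15)
    if mode == "row_mirror":
        return row_base + (in_row ^ 15)
    if mode == "row_half_mirror":
        return (lane_id & ~7) + ((lane_id & 7) ^ 7)
    raise ValueError("unsupported mode: " + mode)


# Lane 5 with quad_perm:[2,3,0,1] reads from another lane of its own quad
print(dpp_src_lane(5, "quad_perm", [2, 3, 0, 1]))
# Lane 20 with row_shr:7 has no valid source inside its row
print(dpp_src_lane(20, "row_shr", 7))
```

Whether an invalid source lane produces zero or suppresses the operation is then decided by BOUND_CTRL, as described above.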