Changes between Version 2 and Version 3 of GcnInstrsVop3


Timestamp:
12/05/15 23:00:20 (8 years ago)
Author:
trac

  • GcnInstrsVop3

    v2 v3  
    795795<h3>Instruction set</h3>
    796796<p>Alphabetically sorted instruction list:</p>
     797<h4>V_ALIGNBIT_B32</h4>
     798<p>Opcode: 334 (0x14e) for GCN 1.0/1.1; 462 (0x1ce) for GCN 1.2<br />
     799Syntax: V_ALIGNBIT_B32 VDST, SRC0, SRC1, SRC2<br />
     800Description: Align bit. Shift the 64-bit value formed by SRC1 (low part) and
     801SRC0 (high part) right by SRC2&amp;31 bits, and store the low 32 bits of the result in VDST.<br />
     802Operation:<br />
     803<code>VDST = ((((UINT64)SRC0)&lt;&lt;32) | SRC1) &gt;&gt; (SRC2&amp;31)</code></p>
     804<h4>V_ALIGNBYTE_B32</h4>
     805<p>Opcode: 335 (0x14f) for GCN 1.0/1.1; 463 (0x1cf) for GCN 1.2<br />
     806Syntax: V_ALIGNBYTE_B32 VDST, SRC0, SRC1, SRC2<br />
     807Description: Align byte. Shift the 64-bit value formed by SRC1 (low part) and
     808SRC0 (high part) right by (SRC2&amp;3)*8 bits, and store the low 32 bits of the result in VDST.<br />
     809Operation:<br />
     810<code>VDST = ((((UINT64)SRC0)&lt;&lt;32) | SRC1) &gt;&gt; ((SRC2&amp;3)*8)</code></p>
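The two align operations can be sketched in C as below. This is only an illustrative emulation under the semantics described above; the function names `alignbit_b32` and `alignbyte_b32` are hypothetical and not part of any existing library.

```c
#include <stdint.h>

/* Hypothetical emulation of V_ALIGNBIT_B32.
 * src0 is the high dword and src1 the low dword of the 64-bit value. */
uint32_t alignbit_b32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | src1;
    return (uint32_t)(wide >> (src2 & 31));      /* keep the low 32 bits */
}

/* Hypothetical emulation of V_ALIGNBYTE_B32: same idea, but the shift
 * amount is a byte count (0..3) converted to bits. */
uint32_t alignbyte_b32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | src1;
    return (uint32_t)(wide >> ((src2 & 3) * 8)); /* keep the low 32 bits */
}
```

For example, `alignbit_b32(0x12345678, 0x9abcdef0, 4)` shifts the combined value `0x123456789abcdef0` right by four bits and keeps `0x89abcdef`.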
     811<h4>V_BFE_I32</h4>
     812<p>Opcode: 329 (0x149) for GCN 1.0/1.1; 457 (0x1c9) for GCN 1.2<br />
     813Syntax: V_BFE_I32 VDST, SRC0, SRC1, SRC2<br />
     814Description: Extract a bit field from SRC0, starting at bit (SRC1&amp;31) with length (SRC2&amp;31),
     815sign-extend it from the highest extracted bit, and store the result in VDST.<br />
     816Operation:<br />
     817<code>UINT8 shift = SRC1 &amp; 31
     818UINT8 length = SRC2 &amp; 31
     819if (length==0)
     820    VDST = 0
     821else if (shift+length &lt; 32)
     822    VDST = (INT32)(SRC0 &lt;&lt; (32 - shift - length)) &gt;&gt; (32 - length)
     823else
     824    VDST = (INT32)SRC0 &gt;&gt; shift</code></p>
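A possible C emulation of the signed extract is sketched below. Instead of the arithmetic right shift used in the operation listing, it sign-extends with an XOR/subtract step, because right-shifting negative values is implementation-defined in C. The names `sign_extend` and `bfe_i32` are hypothetical, and the final conversion to `int32_t` assumes the usual two's complement wrap-around.

```c
#include <stdint.h>

/* Sign-extend the low `width` bits of value (width must be 1..32). */
static int32_t sign_extend(uint32_t value, unsigned width)
{
    uint32_t sign = UINT32_C(1) << (width - 1);
    value &= (sign << 1) - 1;               /* mask to `width` bits */
    return (int32_t)((value ^ sign) - sign);
}

/* Hypothetical emulation of V_BFE_I32. */
int32_t bfe_i32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    unsigned shift = src1 & 31;
    unsigned length = src2 & 31;
    if (length == 0)
        return 0;
    if (shift + length > 32)
        length = 32 - shift;                /* field is cut off at bit 31 */
    return sign_extend(src0 >> shift, length);
}
```

With this sketch, `bfe_i32(0xf0, 4, 4)` yields -1, while `bfe_i32(0x70, 4, 4)` yields 7.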
     825<h4>V_BFE_U32</h4>
     826<p>Opcode: 328 (0x148) for GCN 1.0/1.1; 456 (0x1c8) for GCN 1.2<br />
     827Syntax: V_BFE_U32 VDST, SRC0, SRC1, SRC2<br />
     828Description: Extract a bit field from SRC0, starting at bit SRC1&amp;31 with length SRC2&amp;31, and
     829store the result in VDST.<br />
     830Operation:<br />
     831<code>UINT8 shift = SRC1 &amp; 31
     832UINT8 length = SRC2 &amp; 31
     833if (length==0)
     834    VDST = 0
     835else if (shift+length &lt; 32)
     836    VDST = SRC0 &lt;&lt; (32 - shift - length) &gt;&gt; (32 - length)
     837else
     838    VDST = SRC0 &gt;&gt; shift</code></p>
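The unsigned extract maps almost directly to C. The following is a minimal sketch that follows the operation listing; the function name `bfe_u32` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical emulation of V_BFE_U32. */
uint32_t bfe_u32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    unsigned shift = src1 & 31;
    unsigned length = src2 & 31;
    if (length == 0)
        return 0;
    if (shift + length < 32)
        /* push the field to the top, then bring it down to bit 0 */
        return (src0 << (32 - shift - length)) >> (32 - length);
    return src0 >> shift;                   /* field reaches bit 31 */
}
```

For example, `bfe_u32(0xdeadbeef, 8, 8)` extracts bits 8..15 and gives `0xbe`.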
     839<h4>V_BFI_B32</h4>
     840<p>Opcode: 330 (0x14a) for GCN 1.0/1.1; 458 (0x1ca) for GCN 1.2<br />
     841Syntax: V_BFI_B32 VDST, SRC0, SRC1, SRC2<br />
     842Description: Replace bits in SRC2 with bits from SRC1 where the corresponding bits in SRC0 are set, and store the result
     843in VDST.<br />
     844Operation:<br />
     845<code>VDST = (SRC0 &amp; SRC1) | (~SRC0 &amp; SRC2)</code></p>
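As a small illustration, the bit-field insert can be written in C as follows. The function and parameter names are hypothetical and chosen only to show the role of each source operand.

```c
#include <stdint.h>

/* Hypothetical emulation of V_BFI_B32:
 * mask = SRC0, insert = SRC1, base = SRC2. */
uint32_t bfi_b32(uint32_t mask, uint32_t insert, uint32_t base)
{
    /* take bits from insert where mask is 1, from base where mask is 0 */
    return (mask & insert) | (~mask & base);
}
```

For example, `bfi_b32(0x0000ff00, 0x12345678, 0xaaaaaaaa)` replaces only the second byte of the base value and gives `0xaaaa56aa`.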
    797846<h4>V_CUBEID_F32</h4>
    798847<p>Opcode: 324 (0x144) for GCN 1.0/1.1; 452 (0x1c4) for GCN 1.2<br />
     
    815864    OUT = (SF0 &gt;= 0.0) ? 0 : 1
    816865VDST = OUT</code></p>
     866<h4>V_CUBEMA_F32</h4>
     867<p>Opcode: 327 (0x147) for GCN 1.0/1.1; 455 (0x1c7) for GCN 1.2<br />
     868Syntax: V_CUBEMA_F32 VDST, SRC0, SRC1, SRC2<br />
     869Description: Cubemap Major Axis. Choose the FP value with the highest absolute value among
     870SRC0, SRC1 and SRC2, and multiply the chosen value by two. The result is stored in VDST.<br />
     871Operation:<br />
     872<code>FLOAT SF0 = ASFLOAT(SRC0)
     873FLOAT SF1 = ASFLOAT(SRC1)
     874FLOAT SF2 = ASFLOAT(SRC2)
     875if (ABS(SF2) &gt;= ABS(SF1) &amp;&amp; ABS(SF2) &gt;= ABS(SF0))
     876    OUT = 2*SF2
     877else if (ABS(SF1) &gt;= ABS(SF0))
     878    OUT = 2*SF1
     879else
     880    OUT = 2*SF0
     881VDST = OUT</code></p>
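The major-axis selection can be sketched in C as below. This version takes the operands as `float` values directly (the ASFLOAT reinterpretation is assumed to have happened already), and the function name `cubema_f32` is hypothetical.

```c
#include <math.h>

/* Hypothetical emulation of V_CUBEMA_F32 on already-converted floats. */
float cubema_f32(float sf0, float sf1, float sf2)
{
    /* ties are resolved in favour of SF2, then SF1, as in the listing */
    if (fabsf(sf2) >= fabsf(sf1) && fabsf(sf2) >= fabsf(sf0))
        return 2.0f * sf2;
    else if (fabsf(sf1) >= fabsf(sf0))
        return 2.0f * sf1;
    else
        return 2.0f * sf0;
}
```

For example, `cubema_f32(1.0f, -3.0f, 2.0f)` selects SF1 and returns -6.0.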
     882<h4>V_CUBESC_F32</h4>
     883<p>Opcode: 325 (0x145) for GCN 1.0/1.1; 453 (0x1c5) for GCN 1.2<br />
     884Syntax: V_CUBESC_F32 VDST, SRC0, SRC1, SRC2<br />
     885Description: Cubemap S coordinate. The algorithm is given below.<br />
     886Operation:<br />
     887<code>FLOAT SF0 = ASFLOAT(SRC0)
     888FLOAT SF1 = ASFLOAT(SRC1)
     889FLOAT SF2 = ASFLOAT(SRC2)
     890if (ABS(SF2) &gt;= ABS(SF1) &amp;&amp; ABS(SF2) &gt;= ABS(SF0))
     891    OUT = SIGN(SF2) * SF0
     892else if (ABS(SF1) &gt;= ABS(SF0))
     893    OUT = SF0
     894else
     895    OUT = -SIGN(SF0) * SF2
     896VDST = OUT</code></p>
     897<h4>V_CUBETC_F32</h4>
     898<p>Opcode: 326 (0x146) for GCN 1.0/1.1; 454 (0x1c6) for GCN 1.2<br />
     899Syntax: V_CUBETC_F32 VDST, SRC0, SRC1, SRC2<br />
     900Description: Cubemap T coordinate. The algorithm is given below.<br />
     901Operation:<br />
     902<code>FLOAT SF0 = ASFLOAT(SRC0)
     903FLOAT SF1 = ASFLOAT(SRC1)
     904FLOAT SF2 = ASFLOAT(SRC2)
     905if (ABS(SF2) &gt;= ABS(SF1) &amp;&amp; ABS(SF2) &gt;= ABS(SF0))
     906    OUT = -SF1
     907else if (ABS(SF1) &gt;= ABS(SF0))
     908    OUT = SIGN(SF1) * SF2
     909else
     910    OUT = -SF1
     911VDST = OUT</code></p>
     912<h4>V_FMA_F32</h4>
     913<p>Opcode: 331 (0x14b) for GCN 1.0/1.1; 459 (0x1cb) for GCN 1.2<br />
     914Syntax: V_FMA_F32 VDST, SRC0, SRC1, SRC2<br />
     915Description: Fused multiply-add on single-precision floating point values from
     916SRC0, SRC1 and SRC2. The result is stored in VDST.<br />
     917Operation:<br />
     918<code>// SRC0*SRC1+SRC2
     919VDST = FMA(ASFLOAT(SRC0), ASFLOAT(SRC1), ASFLOAT(SRC2))</code></p>
     920<h4>V_FMA_F64</h4>
     921<p>Opcode: 332 (0x14c) for GCN 1.0/1.1; 460 (0x1cc) for GCN 1.2<br />
     922Syntax: V_FMA_F64 VDST(2), SRC0(2), SRC1(2), SRC2(2)<br />
     923Description: Fused multiply-add on double-precision floating point values from
     924SRC0, SRC1 and SRC2. The result is stored in VDST.<br />
     925Operation:<br />
     926<code>// SRC0*SRC1+SRC2
     927VDST = FMA(ASDOUBLE(SRC0), ASDOUBLE(SRC1), ASDOUBLE(SRC2))</code></p>
     928<h4>V_LERP_U8</h4>
     929<p>Opcode: 333 (0x14d) for GCN 1.0/1.1; 461 (0x1cd) for GCN 1.2<br />
     930Syntax: V_LERP_U8 VDST, SRC0, SRC1, SRC2<br />
     931Description: For each byte of the dword, calculate the average of the SRC0 byte and the SRC1 byte, with
     932the rounding mode taken from the lowest bit of the corresponding SRC2 byte. If the rounding bit is set, the result for
     933that byte is rounded up, otherwise it is truncated. All bytes are stored in VDST.<br />
     934Operation:<br />
     935<code>for (UINT8 i = 0; i &lt; 4; i++)
     936{
     937    UINT8 S0 = (SRC0 &gt;&gt; (i*8)) &amp; 0xff
     938    UINT8 S1 = (SRC1 &gt;&gt; (i*8)) &amp; 0xff
     939    UINT8 S2 = (SRC2 &gt;&gt; (i*8)) &amp; 1
     940    VDST = (VDST &amp; ~(255U&lt;&lt;(i*8))) | (((S0+S1+S2) &gt;&gt; 1) &lt;&lt; (i*8))
     941}</code></p>
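The per-byte averaging can be sketched in C as follows. The function name `lerp_u8` is hypothetical, and the sketch builds the result in a local variable rather than updating VDST in place.

```c
#include <stdint.h>

/* Hypothetical emulation of V_LERP_U8. */
uint32_t lerp_u8(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t s0 = (src0 >> (i * 8)) & 0xff;
        uint32_t s1 = (src1 >> (i * 8)) & 0xff;
        uint32_t round = (src2 >> (i * 8)) & 1;   /* rounding bit of this byte */
        /* the sum is at most 255+255+1, so the average fits in one byte */
        result |= ((s0 + s1 + round) >> 1) << (i * 8);
    }
    return result;
}
```

For example, averaging `0x01020304` with `0x02020305` gives `0x01020304` when all rounding bits are clear and `0x02020305` when SRC2 is `0x01010101`.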
    817942<h4>V_MAD_F32</h4>
    818943<p>Opcode: 321 (0x141) for GCN 1.0/1.1; 449 (0x1c1) for GCN 1.2<br />
     
    847972Operation:<br />
    848973<code>VDST = (UINT32)(SRC0&amp;0xffffff) * (UINT32)(SRC1&amp;0xffffff) + SRC2</code></p>
     974<h4>V_MIN3_F32</h4>
     975<p>Opcode: 337 (0x151) for GCN 1.0/1.1; 464 (0x1d0) for GCN 1.2<br />
     976Syntax: V_MIN3_F32 VDST, SRC0, SRC1, SRC2  </p>
    849977}}}