Changes between Version 1 and Version 2 of GcnInstrsVop3p


Timestamp:
11/26/17 22:00:26 (6 years ago)
Author:
trac
Comment:

--

  • GcnInstrsVop3p

    v1 v2  
    224224<h3>Instruction set</h3>
    225225<p>Alphabetically sorted instruction list:</p>
     226<h4>V_PK_ADD_F16</h4>
     227<p>Opcode: 15 (0xf)<br />
     228Syntax: V_PK_ADD_F16 VDST, SRC0, SRC1<br />
     229Description: Add each of the two 16-bit FP values in SRC0 to the corresponding
     230FP value in SRC1, and store the packed results in VDST.<br />
     231Operation:<br />
     232<code>HALF S0_0 = ASHALF(SRC0&amp;0xffff), S0_1 = ASHALF(SRC0&gt;&gt;16)
     233HALF S1_0 = ASHALF(SRC1&amp;0xffff), S1_1 = ASHALF(SRC1&gt;&gt;16)
     234HALF temp0 = S0_0 + S1_0
     235HALF temp1 = S0_1 + S1_1
     236VDST = ASINT16(temp0) | (ASINT16(temp1)&lt;&lt;16)</code></p>
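<p>The V_PK_ADD_F16 operation above can be sketched in Python as a minimal emulation. The helper names as_half and as_int16 mirror the pseudocode, but their implementation here is an assumption: they use the struct module's half-precision format, round to nearest, and map results beyond the half range to infinity. Actual hardware rounding and denormal modes may differ.</p>

```python
import struct

def as_half(bits):
    """Reinterpret the low 16 bits as an IEEE 754 half; returns a Python float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xffff))[0]

def as_int16(value):
    """Round a Python float to half precision and return its 16-bit pattern."""
    try:
        return struct.unpack('<H', struct.pack('<e', value))[0]
    except OverflowError:
        # struct rejects finite values beyond the half range;
        # assumption: treat them as overflow to +/-infinity.
        return 0xfc00 if value < 0 else 0x7c00

def v_pk_add_f16(src0, src1):
    """Add the low halves and the high halves of two packed 32-bit operands."""
    s0_0, s0_1 = as_half(src0), as_half(src0 >> 16)
    s1_0, s1_1 = as_half(src1), as_half(src1 >> 16)
    temp0 = s0_0 + s1_0
    temp1 = s0_1 + s1_1
    return as_int16(temp0) | (as_int16(temp1) << 16)

# Low lane: 1.0 + 0.5 = 1.5 (0x3e00); high lane: 2.0 + 3.0 = 5.0 (0x4500).
print(hex(v_pk_add_f16(0x40003c00, 0x42003800)))  # 0x45003e00
```

<p>Each lane is computed independently, so a carry or overflow in the low half never affects the high half.</p>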
    226237<h4>V_PK_ADD_I16</h4>
    227238<p>Opcode: 2 (0x2)<br />
     
    258269}
    259270VDST = (temp0&amp;0xffff) | (temp1&lt;&lt;16)</code></p>
     271<h4>V_PK_FMA_F16</h4>
     272<p>Opcode: 14 (0xe)<br />
     273Syntax: V_PK_FMA_F16 VDST, SRC0, SRC1, SRC2<br />
     274Description: Perform two fused multiply-adds on the pairs of 16-bit FP values from SRC0, SRC1 and SRC2,
     275and store the packed results in VDST.<br />
     276Operation:<br />
     277<code>HALF S0_0 = ASHALF(SRC0&amp;0xffff), S0_1 = ASHALF(SRC0&gt;&gt;16)
     278HALF S1_0 = ASHALF(SRC1&amp;0xffff), S1_1 = ASHALF(SRC1&gt;&gt;16)
     279HALF S2_0 = ASHALF(SRC2&amp;0xffff), S2_1 = ASHALF(SRC2&gt;&gt;16)
     280HALF temp0 = FMA(S0_0, S1_0, S2_0)
     281HALF temp1 = FMA(S0_1, S1_1, S2_1)
     282VDST = ASINT16(temp0) | (ASINT16(temp1)&lt;&lt;16)</code></p>
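<p>To illustrate what "fused" means in the V_PK_FMA_F16 operation above, the following sketch compares a single-rounding multiply-add with a multiply that is rounded to half before the add. The helpers and the function names (pk_lanes, fma_lane, mul_add_lane) are hypothetical and only model the pseudocode. The product of two halves is exact in a Python float, and for the inputs shown the sum is exact as well, so only the final conversion rounds.</p>

```python
import struct

def as_half(bits):
    """Reinterpret the low 16 bits as an IEEE 754 half; returns a Python float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xffff))[0]

def as_int16(value):
    """Round a Python float to half precision and return its 16-bit pattern."""
    try:
        return struct.unpack('<H', struct.pack('<e', value))[0]
    except OverflowError:
        # Assumption: finite values beyond the half range overflow to infinity.
        return 0xfc00 if value < 0 else 0x7c00

def fma_lane(a, b, c):
    """Multiply-add with one rounding to half at the end."""
    return as_int16(a * b + c)

def mul_add_lane(a, b, c):
    """Unfused variant: the product is rounded to half before the add."""
    product = as_half(as_int16(a * b))
    return as_int16(product + c)

def pk_lanes(src0, src1, src2, lane_op):
    """Apply lane_op to the low halves and to the high halves."""
    lo = lane_op(as_half(src0), as_half(src1), as_half(src2))
    hi = lane_op(as_half(src0 >> 16), as_half(src1 >> 16), as_half(src2 >> 16))
    return lo | (hi << 16)

# Low lane: (1 + 2**-10)**2 - (1 + 2**-9); high lane: 2.0 * 3.0 + 1.0.
src0, src1, src2 = 0x40003c01, 0x42003c01, 0x3c00bc02
print(hex(pk_lanes(src0, src1, src2, fma_lane)))      # 0x47000010
print(hex(pk_lanes(src0, src1, src2, mul_add_lane)))  # 0x47000000
```

<p>In the low lane the fused result keeps the 2**-20 term of the exact product, while the unfused variant loses it when the product is rounded and returns zero.</p>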
    260283<h4>V_PK_ASHRREV_I16</h4>
    261284<p>Opcode: 6 (0x6)<br />
     
    326349}
    327350VDST = (temp0&amp;0xffff) | (temp1&lt;&lt;16)</code></p>
     351<h4>V_PK_MAX_F16</h4>
     352<p>Opcode: 18 (0x12)<br />
     353Syntax: V_PK_MAX_F16 VDST, SRC0, SRC1<br />
     354Description: Choose the greater 16-bit FP value in each pair of values from SRC0 and SRC1,
     355and store the packed results in VDST.<br />
     356Operation:<br />
     357<code>HALF S0_0 = ASHALF(SRC0&amp;0xffff), S0_1 = ASHALF(SRC0&gt;&gt;16)
     358HALF S1_0 = ASHALF(SRC1&amp;0xffff), S1_1 = ASHALF(SRC1&gt;&gt;16)
     359HALF temp0 = MAX(S0_0, S1_0)
     360HALF temp1 = MAX(S0_1, S1_1)
     361VDST = ASINT16(temp0) | (ASINT16(temp1)&lt;&lt;16)</code></p>
    328362<h4>V_PK_MAX_I16</h4>
    329363<p>Opcode: 7 (0x7)<br />
     
    348382UINT16 temp1 = MAX(S0_1, S1_1)
    349383VDST = temp0 | (temp1&lt;&lt;16)</code></p>
     384<h4>V_PK_MIN_F16</h4>
     385<p>Opcode: 17 (0x11)<br />
     386Syntax: V_PK_MIN_F16 VDST, SRC0, SRC1<br />
     387Description: Choose the smaller 16-bit FP value in each pair of values from SRC0 and SRC1,
     388and store the packed results in VDST.<br />
     389Operation:<br />
     390<code>HALF S0_0 = ASHALF(SRC0&amp;0xffff), S0_1 = ASHALF(SRC0&gt;&gt;16)
     391HALF S1_0 = ASHALF(SRC1&amp;0xffff), S1_1 = ASHALF(SRC1&gt;&gt;16)
     392HALF temp0 = MIN(S0_0, S1_0)
     393HALF temp1 = MIN(S0_1, S1_1)
     394VDST = ASINT16(temp0) | (ASINT16(temp1)&lt;&lt;16)</code></p>
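<p>The lane-wise selection done by V_PK_MAX_F16 and V_PK_MIN_F16 can be sketched as follows. The function names are hypothetical, and Python's built-in max and min stand in for MAX and MIN. This is an assumption that holds for ordinary numbers only: the descriptions above do not say how NaN or signed zero inputs are treated, and the built-ins may not match the hardware there.</p>

```python
import struct

def as_half(bits):
    """Reinterpret the low 16 bits as an IEEE 754 half; returns a Python float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xffff))[0]

def as_int16(value):
    """Return the 16-bit pattern of a value that is already a valid half."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def pk_select(src0, src1, choose):
    """Apply choose (max or min) to the low halves and to the high halves."""
    temp0 = choose(as_half(src0), as_half(src1))
    temp1 = choose(as_half(src0 >> 16), as_half(src1 >> 16))
    return as_int16(temp0) | (as_int16(temp1) << 16)

def v_pk_max_f16(src0, src1):
    return pk_select(src0, src1, max)

def v_pk_min_f16(src0, src1):
    return pk_select(src0, src1, min)

# SRC0 holds (high 2.0, low -1.0); SRC1 holds (high 1.0, low 0.5).
print(hex(v_pk_max_f16(0x4000bc00, 0x3c003800)))  # 0x40003800
print(hex(v_pk_min_f16(0x4000bc00, 0x3c003800)))  # 0x3c00bc00
```

<p>Because the result of each lane is always one of the inputs, no rounding takes place and the selected bit pattern is passed through unchanged.</p>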
    350395<h4>V_PK_MIN_I16</h4>
    351396<p>Opcode: 8 (0x8)<br />
     
    370415UINT16 temp1 = MIN(S0_1, S1_1)
    371416VDST = temp0 | (temp1&lt;&lt;16)</code></p>
     417<h4>V_PK_MUL_F16</h4>
     418<p>Opcode: 16 (0x10)<br />
     419Syntax: V_PK_MUL_F16 VDST, SRC0, SRC1<br />
     420Description: Multiply each of the two 16-bit FP values in SRC0 by the corresponding
     421FP value in SRC1, and store the packed results in VDST.<br />
     422Operation:<br />
     423<code>HALF S0_0 = ASHALF(SRC0&amp;0xffff), S0_1 = ASHALF(SRC0&gt;&gt;16)
     424HALF S1_0 = ASHALF(SRC1&amp;0xffff), S1_1 = ASHALF(SRC1&gt;&gt;16)
     425HALF temp0 = S0_0 * S1_0
     426HALF temp1 = S0_1 * S1_1
     427VDST = ASINT16(temp0) | (ASINT16(temp1)&lt;&lt;16)</code></p>
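<p>A short sketch of the V_PK_MUL_F16 operation above shows how each product is rounded back to 16 bits. The helpers are illustrative assumptions, not part of the original pseudocode: they round to nearest and turn out-of-range products into infinity, which may not match every hardware rounding mode.</p>

```python
import struct

def as_half(bits):
    """Reinterpret the low 16 bits as an IEEE 754 half; returns a Python float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xffff))[0]

def as_int16(value):
    """Round a Python float to half precision and return its 16-bit pattern."""
    try:
        return struct.unpack('<H', struct.pack('<e', value))[0]
    except OverflowError:
        # Assumption: finite values beyond the half range overflow to infinity.
        return 0xfc00 if value < 0 else 0x7c00

def v_pk_mul_f16(src0, src1):
    """Multiply the low halves and the high halves of two packed operands."""
    temp0 = as_half(src0) * as_half(src1)
    temp1 = as_half(src0 >> 16) * as_half(src1 >> 16)
    return as_int16(temp0) | (as_int16(temp1) << 16)

# Low lane: (1 + 2**-10)**2 rounds to 1 + 2**-9 (0x3c02).
# High lane: 65504.0 * 2.0 exceeds the half range and becomes +inf (0x7c00).
print(hex(v_pk_mul_f16(0x7bff3c01, 0x40003c01)))  # 0x7c003c02
```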
    372428<h4>V_PK_MUL_LO_U16</h4>
    373429<p>Opcode: 1 (0x1)<br />