[wiki:ClrxToc Back to Table of contents]
{{{
#!html
<h2>GCN ISA VOP2/VOP3 instructions</h2>
<p>VOP2 instructions can be encoded in either the VOP2 encoding or the VOP3a/VOP3b encoding.
List of fields for the VOP2 encoding:</p>
<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0-8</td>
<td>SRC0</td>
<td>First (scalar or vector) source operand</td>
</tr>
<tr>
<td>9-16</td>
<td>VSRC1</td>
<td>Second vector source operand</td>
</tr>
<tr>
<td>17-24</td>
<td>VDST</td>
<td>Destination vector operand</td>
</tr>
<tr>
<td>25-30</td>
<td>OPCODE</td>
<td>Operation code</td>
</tr>
<tr>
<td>31</td>
<td>ENCODING</td>
<td>Encoding type. Must be 0</td>
</tr>
</tbody>
</table>
<p>Syntax: INSTRUCTION VDST, SRC0, VSRC1</p>
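<p>As an illustration of the VOP2 field layout in the table above, the following Python
sketch splits a 32-bit instruction word into its fields and packs them back. The function
names <code>decode_vop2</code> and <code>encode_vop2</code> are hypothetical helpers for
this page, not part of any assembler API; the field values are treated as raw numbers.</p>

```python
def decode_vop2(word):
    """Split a 32-bit VOP2 instruction word into the fields listed in the table."""
    if ((word >> 31) & 1) != 0:
        raise ValueError("bit 31 (ENCODING) must be 0 for the VOP2 encoding")
    return {
        "src0": word & 0x1FF,           # bits 0-8
        "vsrc1": (word >> 9) & 0xFF,    # bits 9-16
        "vdst": (word >> 17) & 0xFF,    # bits 17-24
        "opcode": (word >> 25) & 0x3F,  # bits 25-30
    }


def encode_vop2(opcode, vdst, src0, vsrc1):
    """Pack the fields into a 32-bit word; the ENCODING bit stays 0."""
    return (
        ((opcode & 0x3F) << 25)
        | ((vdst & 0xFF) << 17)
        | ((vsrc1 & 0xFF) << 9)
        | (src0 & 0x1FF)
    )
```

<p>Encoding and then decoding the same field values should return them unchanged, which
is a quick way to check the bit positions against the table.</p>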
<p>List of fields for VOP3A/VOP3B encoding (GCN 1.0/1.1):</p>
<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0-7</td>
<td>VDST</td>
<td>Vector destination operand</td>
</tr>
<tr>
<td>8-10</td>
<td>ABS</td>
<td>Absolute modifiers for source operands (VOP3A)</td>
</tr>
<tr>
<td>8-14</td>
<td>SDST</td>
<td>Scalar destination operand (VOP3B)</td>
</tr>
<tr>
<td>11</td>
<td>CLAMP</td>
<td>CLAMP modifier (VOP3A)</td>
</tr>
<tr>
<td>15</td>
<td>CLAMP</td>
<td>CLAMP modifier (VOP3B)</td>
</tr>
<tr>
<td>17-25</td>
<td>OPCODE</td>
<td>Operation code</td>
</tr>
<tr>
<td>26-31</td>
<td>ENCODING</td>
<td>Encoding type. Must be 0b110100</td>
</tr>
<tr>
<td>32-40</td>
<td>SRC0</td>
<td>First (scalar or vector) source operand</td>
</tr>
<tr>
<td>41-49</td>
<td>SRC1</td>
<td>Second (scalar or vector) source operand</td>
</tr>
<tr>
<td>50-58</td>
<td>SRC2</td>
<td>Third (scalar or vector) source operand</td>
</tr>
<tr>
<td>59-60</td>
<td>OMOD</td>
<td>OMOD modifier. Multiplication modifier</td>
</tr>
<tr>
<td>61-63</td>
<td>NEG</td>
<td>Negation modifier for source operands</td>
</tr>
</tbody>
</table>
<p>List of fields for VOP3A/VOP3B encoding (GCN 1.2):</p>
<table>
<thead>
<tr>
<th>Bits</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0-7</td>
<td>VDST</td>
<td>Destination vector operand</td>
</tr>
<tr>
<td>8-10</td>
<td>ABS</td>
<td>Absolute modifiers for source operands (VOP3A)</td>
</tr>
<tr>
<td>8-14</td>
<td>SDST</td>
<td>Scalar destination operand (VOP3B)</td>
</tr>
<tr>
<td>15</td>
<td>CLAMP</td>
<td>CLAMP modifier</td>
</tr>
<tr>
<td>16-25</td>
<td>OPCODE</td>
<td>Operation code</td>
</tr>
<tr>
<td>26-31</td>
<td>ENCODING</td>
<td>Encoding type. Must be 0b110100</td>
</tr>
<tr>
<td>32-40</td>
<td>SRC0</td>
<td>First (scalar or vector) source operand</td>
</tr>
<tr>
<td>41-49</td>
<td>SRC1</td>
<td>Second (scalar or vector) source operand</td>
</tr>
<tr>
<td>50-58</td>
<td>SRC2</td>
<td>Third (scalar or vector) source operand</td>
</tr>
<tr>
<td>59-60</td>
<td>OMOD</td>
<td>OMOD modifier. Multiplication modifier</td>
</tr>
<tr>
<td>61-63</td>
<td>NEG</td>
<td>Negation modifier for source operands</td>
</tr>
</tbody>
</table>
<p>Syntax: INSTRUCTION VDST, SRC0, SRC1 [MODIFIERS]</p>
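<p>The two VOP3 tables differ only in the position of CLAMP and the width of OPCODE.
The Python sketch below decodes the VOP3A form under one assumption that the tables imply
but do not state: the 64-bit instruction is held as a single integer whose low 32 bits
are the first dword. The name <code>decode_vop3a</code> and the <code>gcn12</code> flag
are hypothetical; the VOP3B form (SDST in bits 8-14) is not handled here.</p>

```python
VOP3_ENCODING = 0b110100  # required value of bits 26-31


def decode_vop3a(word, gcn12=False):
    """Decode a 64-bit VOP3A word; gcn12 selects the GCN 1.2 field layout."""
    if ((word >> 26) & 0x3F) != VOP3_ENCODING:
        raise ValueError("bits 26-31 must be 0b110100 for VOP3")
    fields = {
        "vdst": word & 0xFF,            # bits 0-7
        "abs": (word >> 8) & 0x7,       # bits 8-10, one bit per source
        "src0": (word >> 32) & 0x1FF,   # bits 32-40
        "src1": (word >> 41) & 0x1FF,   # bits 41-49
        "src2": (word >> 50) & 0x1FF,   # bits 50-58
        "omod": (word >> 59) & 0x3,     # bits 59-60
        "neg": (word >> 61) & 0x7,      # bits 61-63, one bit per source
    }
    if gcn12:
        fields["clamp"] = (word >> 15) & 1        # bit 15
        fields["opcode"] = (word >> 16) & 0x3FF   # bits 16-25
    else:
        fields["clamp"] = (word >> 11) & 1        # bit 11 (VOP3A)
        fields["opcode"] = (word >> 17) & 0x1FF   # bits 17-25
    return fields
```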
<p>Modifiers:</p>
<ul>
<li>CLAMP - clamps the destination floating point value to the range 0.0-1.0</li>
<li>MUL:2, MUL:4, DIV:2 - OMOD modifiers. Multiply the destination floating point value by
2.0, 4.0 or 0.5 respectively</li>
<li>-SRC - negate the floating point value from the source operand</li>
<li>ABS(SRC) - apply absolute value to the source operand</li>
</ul>
<p>Negation and absolute value can be combined: <code>-ABS(V0)</code>. Modifiers CLAMP and
OMOD (MUL:2, MUL:4 and DIV:2) can be given in any order.</p>
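<p>The effect of the modifiers on values can be sketched in Python as follows. The helper
names are hypothetical, Python floats (double precision) stand in for 32-bit floats, and
applying OMOD before CLAMP is an assumption of this sketch, since the list above does not
state the order.</p>

```python
OMOD_FACTORS = {None: 1.0, "MUL:2": 2.0, "MUL:4": 4.0, "DIV:2": 0.5}


def apply_source_modifiers(value, neg=False, use_abs=False):
    """Apply ABS first and then negation, so that -ABS(x) is never positive."""
    if use_abs:
        value = abs(value)
    if neg:
        value = -value
    return value


def apply_output_modifiers(value, omod=None, clamp=False):
    """Scale the result by the OMOD factor, then optionally clamp to 0.0-1.0."""
    value = value * OMOD_FACTORS[omod]
    if clamp:
        value = min(max(value, 0.0), 1.0)
    return value
```

<p>For example, a result of 0.4 with MUL:4 and CLAMP becomes 1.6 and is then clamped
to 1.0.</p>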
<p>Limitations for operands:</p>
<ul>
<li>only one SGPR can be read by an instruction. Multiple occurrences of the same
SGPR are allowed</li>
<li>only one literal constant can be used, and only when no SGPR or M0 is used in the
source operands</li>
<li>only SRC0 can hold LDS_DIRECT</li>
</ul>
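<p>The three operand limitations above can be expressed as a small checker. This is only
a sketch: the operand representation, a list of <code>(kind, ident)</code> tuples with
kinds such as <code>"sgpr"</code>, <code>"vgpr"</code>, <code>"m0"</code>,
<code>"literal"</code> and <code>"lds_direct"</code>, is invented for this example, and
the rules are implemented exactly as worded above, with literals counted per operand.</p>

```python
def check_sources(sources):
    """Return the list of violated rules for source operands SRC0, SRC1, ..."""
    errors = []
    # Rule 1: only one distinct SGPR; repeating the same SGPR is allowed.
    sgprs = {ident for kind, ident in sources if kind == "sgpr"}
    if len(sgprs) > 1:
        errors.append("more than one distinct SGPR")
    # Rule 2: one literal at most, and not together with an SGPR or M0.
    literals = [ident for kind, ident in sources if kind == "literal"]
    uses_scalar = bool(sgprs) or any(kind == "m0" for kind, _ in sources)
    if len(literals) > 1:
        errors.append("more than one literal constant")
    if literals and uses_scalar:
        errors.append("literal constant combined with SGPR or M0")
    # Rule 3: LDS_DIRECT is accepted only in the first source position.
    if any(kind == "lds_direct" for kind, _ in sources[1:]):
        errors.append("LDS_DIRECT outside SRC0")
    return errors
```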
<p>VOP2 opcodes (0-63) map to VOP3 opcodes 256-319 (the VOP2 opcode plus 256).
List of the instructions by opcode:</p>
<table>
<thead>
<tr>
<th>Opcode</th>
<th>Mnemonic (GCN 1.0/1.1)</th>
<th>Mnemonic (GCN 1.2)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 (0x0)</td>
<td>V_CNDMASK_B32</td>
<td>V_CNDMASK_B32</td>
</tr>
<tr>
<td>1 (0x1)</td>
<td>V_READLANE_B32</td>
<td>V_ADD_F32</td>
</tr>
<tr>
<td>2 (0x2)</td>
<td>V_WRITELANE_B32</td>
<td>V_SUB_F32</td>
</tr>
<tr>
<td>3 (0x3)</td>
<td>V_ADD_F32</td>
<td>V_SUBREV_F32</td>
</tr>
<tr>
<td>4 (0x4)</td>
<td>V_SUB_F32</td>
<td>V_MUL_LEGACY_F32</td>
</tr>
<tr>
<td>5 (0x5)</td>
<td>V_SUBREV_F32</td>
<td>V_MUL_F32</td>
</tr>
<tr>
<td>6 (0x6)</td>
<td>V_MAC_LEGACY_F32</td>
<td>V_MUL_I32_I24</td>
</tr>
<tr>
<td>7 (0x7)</td>
<td>V_MUL_LEGACY_F32</td>
<td>V_MUL_HI_I32_I24</td>
</tr>
<tr>
<td>8 (0x8)</td>
<td>V_MUL_F32</td>
<td>V_MUL_U32_U24</td>
</tr>
<tr>
<td>9 (0x9)</td>
<td>V_MUL_I32_I24</td>
<td>V_MUL_HI_U32_U24</td>
</tr>
<tr>
<td>10 (0xa)</td>
<td>V_MUL_HI_I32_I24</td>
<td>V_MIN_F32</td>
</tr>
<tr>
<td>11 (0xb)</td>
<td>V_MUL_U32_U24</td>
<td>V_MAX_F32</td>
</tr>
<tr>
<td>12 (0xc)</td>
<td>V_MUL_HI_U32_U24</td>
<td>V_MIN_I32</td>
</tr>
<tr>
<td>13 (0xd)</td>
<td>V_MIN_LEGACY_F32</td>
<td>V_MAX_I32</td>
</tr>
<tr>
<td>14 (0xe)</td>
<td>V_MAX_LEGACY_F32</td>
<td>V_MIN_U32</td>
</tr>
<tr>
<td>15 (0xf)</td>
<td>V_MIN_F32</td>
<td>V_MAX_U32</td>
</tr>
</tbody>
</table>
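<p>The relation between VOP2 opcodes (0-63) and their VOP3 counterparts (256-319) is a
fixed offset. A minimal Python sketch, with a hypothetical function name:</p>

```python
VOP3_VOP2_BASE = 256  # VOP2 opcode 0 appears as VOP3 opcode 256


def vop2_to_vop3_opcode(vop2_opcode):
    """Return the VOP3 opcode that corresponds to a VOP2 opcode (0-63)."""
    if not 0 <= vop2_opcode <= 63:
        raise ValueError("VOP2 opcode must be in the range 0-63")
    return VOP3_VOP2_BASE + vop2_opcode
```

<p>For instance, V_ADD_F32 is VOP2 opcode 3 on GCN 1.0/1.1 and therefore VOP3 opcode 259;
on GCN 1.2 it is VOP2 opcode 1 and VOP3 opcode 257.</p>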
<h3>Instruction set</h3>
<p>Alphabetically sorted instruction list:</p>
<h4>V_ADD_F32</h4>
<p>Opcode VOP2: 3 (0x3) for GCN 1.0/1.1; 1 (0x1) for GCN 1.2<br />
Opcode VOP3a: 259 (0x103) for GCN 1.0/1.1; 257 (0x101) for GCN 1.2<br />
Syntax: V_ADD_F32 VDST, SRC0, SRC1<br />
Description: Add the FP values from SRC0 and SRC1 and store the result to VDST.<br />
Operation:<br />
<code>VDST = (FLOAT)SRC0 + (FLOAT)SRC1</code></p>
<h4>V_CNDMASK_B32</h4>
<p>Opcode VOP2: 0 (0x0) for GCN 1.0/1.1; 0 (0x0) for GCN 1.2<br />
Opcode VOP3a: 256 (0x100) for GCN 1.0/1.1; 256 (0x100) for GCN 1.2<br />
Syntax VOP2: V_CNDMASK_B32 VDST, SRC0, SRC1, VCC<br />
Syntax VOP3a: V_CNDMASK_B32 VDST, SRC0, SRC1, SSRC2(2)<br />
Description: If the bit for the current thread in VCC (VOP2) or SSRC2 (VOP3a) is set, store
SRC1 to VDST, otherwise store SRC0 to VDST. The CLAMP and OMOD modifiers do not affect the result.<br />
Operation:<br />
<code>VDST = SSRC2&amp;(1ULL&lt;&lt;THREADID) ? SRC1 : SRC0</code></p>
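<p>The per-thread selection can be modelled in Python by treating each vector operand as
a list with one value per lane and the mask (VCC or SSRC2) as a 64-bit integer. The
function name and the list representation are assumptions of this sketch.</p>

```python
def v_cndmask_b32(src0, src1, mask):
    """Per lane: take src1 where the mask bit for that lane is set, else src0."""
    return [
        s1 if (mask >> lane) & 1 else s0
        for lane, (s0, s1) in enumerate(zip(src0, src1))
    ]
```

<p>With a mask of <code>0b101</code>, lanes 0 and 2 receive the SRC1 value and every
other lane receives the SRC0 value.</p>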
<h4>V_MAC_LEGACY_F32</h4>
<p>Opcode VOP2: 6 (0x6) for GCN 1.0/1.1<br />
Opcode VOP3a: 262 (0x106) for GCN 1.0/1.1<br />
Syntax: V_MAC_LEGACY_F32 VDST, SRC0, SRC1<br />
Description: Multiply FP value from SRC0 by FP value from SRC1 and add the result to VDST.
If either value is 0.0, VDST is left unchanged (IEEE rules for 0.0*x are not applied).<br />
Operation:<br />
<code>if ((FLOAT)SRC0!=0.0 &amp;&amp; (FLOAT)SRC1!=0.0)
    VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)VDST</code></p>
<h4>V_MUL_LEGACY_F32</h4>
<p>Opcode VOP2: 7 (0x7) for GCN 1.0/1.1; 4 (0x4) for GCN 1.2<br />
Opcode VOP3a: 263 (0x107) for GCN 1.0/1.1; 260 (0x104) for GCN 1.2<br />
Syntax: V_MUL_LEGACY_F32 VDST, SRC0, SRC1<br />
Description: Multiply FP value from SRC0 by FP value from SRC1 and store the result to VDST.
If either value is 0.0, always store 0.0 to VDST (IEEE rules for 0.0*x are not applied).<br />
Operation:<br />
<code>if ((FLOAT)SRC0!=0.0 &amp;&amp; (FLOAT)SRC1!=0.0)
    VDST = (FLOAT)SRC0 * (FLOAT)SRC1
else
    VDST = 0.0</code></p>
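<p>The difference between the legacy multiply and an IEEE multiply shows up when one
operand is zero and the other is infinite. The Python sketch below mirrors the operation
given for V_MUL_LEGACY_F32; it uses Python floats (double precision) in place of 32-bit
floats, and the function name is hypothetical.</p>

```python
import math


def v_mul_legacy_f32(src0, src1):
    """Legacy multiply: any zero operand forces the result to 0.0."""
    if src0 != 0.0 and src1 != 0.0:
        return src0 * src1
    return 0.0


# IEEE multiplication gives NaN for 0.0 * infinity ...
ieee_result = 0.0 * math.inf
# ... while the legacy rule gives 0.0 for the same operands.
legacy_result = v_mul_legacy_f32(0.0, math.inf)
```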
<h4>V_MUL_F32</h4>
<p>Opcode VOP2: 8 (0x8) for GCN 1.0/1.1; 5 (0x5) for GCN 1.2<br />
Opcode VOP3a: 264 (0x108) for GCN 1.0/1.1; 261 (0x105) for GCN 1.2<br />
Syntax: V_MUL_F32 VDST, SRC0, SRC1<br />
Description: Multiply FP value from SRC0 by FP value from SRC1 and store result to VDST.<br />
Operation:<br />
<code>VDST = (FLOAT)SRC0 * (FLOAT)SRC1</code></p>
<h4>V_MUL_HI_I32_I24</h4>
<p>Opcode VOP2: 10 (0xa) for GCN 1.0/1.1; 7 (0x7) for GCN 1.2<br />
Opcode VOP3a: 266 (0x10a) for GCN 1.0/1.1; 263 (0x107) for GCN 1.2<br />
Syntax: V_MUL_HI_I32_I24 VDST, SRC0, SRC1<br />
Description: Multiply 24-bit signed integer value from SRC0 by 24-bit signed value from SRC1
and store the upper 16 bits of the 48-bit result to VDST with sign extension.
Modifiers do not affect the result.<br />
Operation:<br />
<code>INT32 V0 = (INT32)((SRC0&amp;0x7fffff) | (SRC0&amp;0x800000 ? 0xff800000 : 0))
INT32 V1 = (INT32)((SRC1&amp;0x7fffff) | (SRC1&amp;0x800000 ? 0xff800000 : 0))
VDST = ((INT64)V0 * V1)&gt;&gt;32</code></p>
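<p>The sign extension and the high part of the product can be checked with a short Python
sketch. The helper names are hypothetical; the result is returned as the unsigned 32-bit
image of the register, and Python's arithmetic right shift on negative integers provides
the sign extension.</p>

```python
def sext24(value):
    """Interpret the low 24 bits of value as a signed 24-bit integer."""
    value &= 0xFFFFFF
    return value - 0x1000000 if value & 0x800000 else value


def v_mul_hi_i32_i24(src0, src1):
    """Bits 32 and up of the signed 24x24-bit product, as a 32-bit image."""
    product = sext24(src0) * sext24(src1)
    return (product >> 32) & 0xFFFFFFFF
```

<p>Multiplying the largest positive 24-bit value <code>0x7fffff</code> by itself gives
<code>0x3fffff000001</code>, so the stored value is <code>0x3fff</code>.</p>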
<h4>V_MUL_HI_U32_U24</h4>
<p>Opcode VOP2: 12 (0xc) for GCN 1.0/1.1; 9 (0x9) for GCN 1.2<br />
Opcode VOP3a: 268 (0x10c) for GCN 1.0/1.1; 265 (0x109) for GCN 1.2<br />
Syntax: V_MUL_HI_U32_U24 VDST, SRC0, SRC1<br />
Description: Multiply 24-bit unsigned integer value from SRC0 by 24-bit unsigned value
from SRC1 and store the upper 16 bits of the 48-bit result to VDST.
Modifiers do not affect the result.<br />
Operation:<br />
<code>VDST = ((UINT64)(SRC0&amp;0xffffff) * (UINT32)(SRC1&amp;0xffffff)) &gt;&gt; 32</code></p>
<h4>V_MUL_I32_I24</h4>
<p>Opcode VOP2: 9 (0x9) for GCN 1.0/1.1; 6 (0x6) for GCN 1.2<br />
Opcode VOP3a: 265 (0x109) for GCN 1.0/1.1; 262 (0x106) for GCN 1.2<br />
Syntax: V_MUL_I32_I24 VDST, SRC0, SRC1<br />
Description: Multiply 24-bit signed integer value from SRC0 by 24-bit signed value from SRC1
and store the result to VDST. Modifiers do not affect the result.<br />
Operation:<br />
<code>INT32 V0 = (INT32)((SRC0&amp;0x7fffff) | (SRC0&amp;0x800000 ? 0xff800000 : 0))
INT32 V1 = (INT32)((SRC1&amp;0x7fffff) | (SRC1&amp;0x800000 ? 0xff800000 : 0))
VDST = V0 * V1</code></p>
<h4>V_MUL_U32_U24</h4>
<p>Opcode VOP2: 11 (0xb) for GCN 1.0/1.1; 8 (0x8) for GCN 1.2<br />
Opcode VOP3a: 267 (0x10b) for GCN 1.0/1.1; 264 (0x108) for GCN 1.2<br />
Syntax: V_MUL_U32_U24 VDST, SRC0, SRC1<br />
Description: Multiply 24-bit unsigned integer value from SRC0 by 24-bit unsigned value
from SRC1 and store the result to VDST. Modifiers do not affect the result.<br />
Operation:<br />
<code>VDST = (UINT32)(SRC0&amp;0xffffff) * (UINT32)(SRC1&amp;0xffffff)</code></p>
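<p>The unsigned 24-bit multiplies, V_MUL_U32_U24 and V_MUL_HI_U32_U24, split one 48-bit
product into its low 32 bits and its remaining high bits. A Python sketch with
hypothetical function names:</p>

```python
def v_mul_u32_u24(src0, src1):
    """Low 32 bits of the unsigned 24x24-bit product."""
    return ((src0 & 0xFFFFFF) * (src1 & 0xFFFFFF)) & 0xFFFFFFFF


def v_mul_hi_u32_u24(src0, src1):
    """Bits 32-47 of the unsigned 24x24-bit product."""
    return ((src0 & 0xFFFFFF) * (src1 & 0xFFFFFF)) >> 32
```

<p>For <code>0xffffff * 0xffffff</code> the full product is <code>0xfffffe000001</code>:
the low form stores <code>0xfe000001</code> and the high form stores <code>0xffff</code>.
Bits 24-31 of either source are ignored.</p>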
<h4>V_READLANE_B32</h4>
<p>Opcode VOP2: 1 (0x1) for GCN 1.0/1.1<br />
Opcode VOP3a: 257 (0x101) for GCN 1.0/1.1<br />
Syntax: V_READLANE_B32 SDST, VSRC0, SSRC1<br />
Description: Copy the value of one VSRC0 lane to SDST. The lane (thread id) is chosen by SSRC1&amp;63.
SSRC1 can be SGPR or M0.<br />
Operation:<br />
<code>SDST = VSRC0[SSRC1 &amp; 63]</code></p>
<h4>V_WRITELANE_B32</h4>
<p>Opcode VOP2: 2 (0x2) for GCN 1.0/1.1<br />
Opcode VOP3a: 258 (0x102) for GCN 1.0/1.1<br />
Syntax: V_WRITELANE_B32 VDST, SSRC0, SSRC1<br />
Description: Copy SGPR to one lane of VDST. The lane (thread id) is chosen by SSRC1&amp;63.
SSRC1 can be SGPR or M0.<br />
Operation:<br />
<code>VDST[SSRC1 &amp; 63] = SSRC0</code></p>
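<p>The lane selection used by V_READLANE_B32 and V_WRITELANE_B32 can be modelled in Python
by representing a vector register as a list of 64 lane values. This representation and
the function names are assumptions of the sketch.</p>

```python
def v_readlane_b32(vsrc0, ssrc1):
    """Return the value held by lane (ssrc1 & 63) of a 64-lane vector register."""
    return vsrc0[ssrc1 & 63]


def v_writelane_b32(vdst, ssrc0, ssrc1):
    """Write a 32-bit scalar value into lane (ssrc1 & 63) of vdst, in place."""
    vdst[ssrc1 & 63] = ssrc0 & 0xFFFFFFFF
```

<p>Because only the low 6 bits of SSRC1 are used, a lane selector of 69 addresses the same
lane as 5.</p>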
<h4>V_SUB_F32</h4>
<p>Opcode VOP2: 4 (0x4) for GCN 1.0/1.1; 2 (0x2) for GCN 1.2<br />
Opcode VOP3a: 260 (0x104) for GCN 1.0/1.1; 258 (0x102) for GCN 1.2<br />
Syntax: V_SUB_F32 VDST, SRC0, SRC1<br />
Description: Subtract the FP value in SRC1 from the FP value in SRC0 and store the result to VDST.<br />
Operation:<br />
<code>VDST = (FLOAT)SRC0 - (FLOAT)SRC1</code></p>
<h4>V_SUBREV_F32</h4>
<p>Opcode VOP2: 5 (0x5) for GCN 1.0/1.1; 3 (0x3) for GCN 1.2<br />
Opcode VOP3a: 261 (0x105) for GCN 1.0/1.1; 259 (0x103) for GCN 1.2<br />
Syntax: V_SUBREV_F32 VDST, SRC0, SRC1<br />
Description: Subtract the FP value in SRC0 from the FP value in SRC1 and store the result to VDST.<br />
Operation:<br />
<code>VDST = (FLOAT)SRC1 - (FLOAT)SRC0</code></p>
}}}