Changes between Version 9 and Version 10 of GcnInstrsFlat


Timestamp:
11/28/17 22:00:30 (6 years ago)
Author:
trac
Comment:

--

  • GcnInstrsFlat

    v9 v10  
    55<p>These instructions allow access to main memory, LDS and the scratch buffer.
    66FLAT instructions read a 64-bit address from a pair of vector registers.
    7 FLAT instruction presents only in GCN 1.1 or later architecture.</p>
    8 <p>List of fields for the FLAT encoding (GCN 1.1 - 1.4):</p>
     7FLAT instructions are present only in GCN 1.1 and later architectures.</p>
     8<p>List of fields for the FLAT encoding (GCN 1.1/1.2):</p>
    99<table>
    1010<thead>
     
    7575<td>NV</td>
    7676<td>Non-Volatile (GCN 1.4)</td>
     77</tr>
     78<tr>
     79<td>56-63</td>
     80<td>VDST</td>
     81<td>Vector destination register</td>
     82</tr>
     83</tbody>
     84</table>
     85<p>List of fields for the FLAT encoding (GCN 1.4):</p>
     86<table>
     87<thead>
     88<tr>
     89<th>Bits</th>
     90<th>Name</th>
     91<th>Description</th>
     92</tr>
     93</thead>
     94<tbody>
     95<tr>
     96<td>0-12</td>
     97<td>OFFSET</td>
     98<td>Byte offset</td>
     99</tr>
     100<tr>
     101<td>13</td>
     102<td>LDS</td>
     103<td>Transfer data to LDS and memory</td>
     104</tr>
     105<tr>
     106<td>14-15</td>
     107<td>SEG</td>
     108<td>Memory segment (instruction type)</td>
     109</tr>
     110<tr>
     111<td>16</td>
     112<td>GLC</td>
     113<td>Operation globally coherent</td>
     114</tr>
     115<tr>
     116<td>17</td>
     117<td>SLC</td>
     118<td>System level coherent</td>
     119</tr>
     120<tr>
     121<td>18-24</td>
     122<td>OPCODE</td>
     123<td>Operation code</td>
     124</tr>
     125<tr>
     126<td>26-31</td>
     127<td>ENCODING</td>
     128<td>Encoding type. Must be 0b110111</td>
     129</tr>
     130<tr>
     131<td>32-39</td>
     132<td>VADDR</td>
     133<td>Vector address registers</td>
     134</tr>
     135<tr>
     136<td>40-47</td>
     137<td>VDATA</td>
     138<td>Vector data register</td>
     139</tr>
     140<tr>
     141<td>48-54</td>
     142<td>SADDR</td>
     143<td>Scalar SGPR offset for GLOBAL/SCRATCH (value 0x7f disables it)</td>
     144</tr>
     145<tr>
     146<td>55</td>
     147<td>NV</td>
     148<td>Non-Volatile</td>
    77149</tr>
    78150<tr>
     
    117189SCRATCH instruction syntax: INSTRUCTION VADDR(2), VDATA, SADDR|OFF [MODIFIERS]</p>
    118190<p>Modifiers can be supplied in any order. Modifiers list: SLC, GLC, TFE,
    119 LDS, NV, OFFSET:OFFSET. The TFE flag requires additional the VDATA register.
     191LDS, NV, INST_OFFSET:OFFSET. The TFE flag requires an additional VDATA register.
    120192LDS, NV and INST_OFFSET are available only in the GCN 1.4 architecture.</p>
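<p>As an illustration of the GCN 1.4 field table, the C sketch below extracts each field from a 64-bit instruction word. It assumes the word is formed as (second dword shifted left by 32) OR first dword, and it reads ENCODING from bits 26-31 because the value 0b110111 is six bits wide (bit 25 is treated as reserved). The type and function names are hypothetical.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the GCN 1.4 FLAT/GLOBAL/SCRATCH fields. */
typedef struct {
    uint32_t offset;   /* bits 0-12 (raw; sign handling depends on the prefix) */
    uint32_t lds;      /* bit 13 */
    uint32_t seg;      /* bits 14-15 */
    uint32_t glc;      /* bit 16 */
    uint32_t slc;      /* bit 17 */
    uint32_t opcode;   /* bits 18-24 */
    uint32_t encoding; /* bits 26-31, expected to be 0b110111 */
    uint32_t vaddr;    /* bits 32-39 */
    uint32_t vdata;    /* bits 40-47 */
    uint32_t saddr;    /* bits 48-54, 0x7f means SADDR is disabled */
    uint32_t nv;       /* bit 55 */
    uint32_t vdst;     /* bits 56-63 */
} FlatFieldsGcn14;

/* Return 'width' bits of 'word' starting at bit 'lo' (width < 64). */
static uint32_t field(uint64_t word, unsigned lo, unsigned width)
{
    return (uint32_t)((word >> lo) & ((1ULL << width) - 1));
}

static FlatFieldsGcn14 decode_flat_gcn14(uint64_t word)
{
    FlatFieldsGcn14 f;
    f.offset   = field(word, 0, 13);
    f.lds      = field(word, 13, 1);
    f.seg      = field(word, 14, 2);
    f.glc      = field(word, 16, 1);
    f.slc      = field(word, 17, 1);
    f.opcode   = field(word, 18, 7);
    f.encoding = field(word, 26, 6);
    f.vaddr    = field(word, 32, 8);
    f.vdata    = field(word, 40, 8);
    f.saddr    = field(word, 48, 7);
    f.nv       = field(word, 55, 1);
    f.vdst     = field(word, 56, 8);
    return f;
}

/* True if the encoding field holds the FLAT value 0b110111 (0x37). */
static bool is_flat_encoding(const FlatFieldsGcn14 *f)
{
    return f->encoding == 0x37;
}

/* True if an SGPR offset is used (0x7f disables it). */
static bool saddr_enabled(const FlatFieldsGcn14 *f)
{
    return f->saddr != 0x7f;
}
```

The table does not say which SEG value selects FLAT, SCRATCH or GLOBAL, so this sketch leaves SEG as a raw number.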
    121193<p>FLAT instructions can complete out of order with respect to each other, because each one may
    122194load from or store to a different resource. A FLAT instruction increments VMCNT when it accesses
    123195main memory, and LGKMCNT when it accesses LDS.</p>
    124 <p>OFFSET can be 13-bit signed for GLOBAL_* and SCRATCH_* instructions or
    125 12-bit unsigned for FLAT_* instructions.</p>
     196<p>OFFSET (the INST_OFFSET modifier) is 13-bit signed for GLOBAL_* and SCRATCH_*
     197instructions and 12-bit unsigned for FLAT_* instructions.</p>
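<p>The difference between the two OFFSET interpretations can be shown with a small C helper. It is a sketch: the function name is hypothetical, and for FLAT_* it assumes that only the low 12 bits are meaningful, so bit 12 is ignored here rather than rejected.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Turn the raw OFFSET field (bits 0-12) into a signed byte offset.
 * GLOBAL_* / SCRATCH_*: 13-bit signed, range -4096..4095.
 * FLAT_*: 12-bit unsigned, range 0..4095 (bit 12 ignored: an assumption). */
static int32_t inst_offset_value(uint32_t raw, bool is_flat_prefix)
{
    raw &= 0x1fffu;                         /* keep only the 13-bit field */
    if (is_flat_prefix)
        return (int32_t)(raw & 0xfffu);     /* 12-bit unsigned */
    /* sign-extend from bit 12 */
    return (raw & 0x1000u) ? (int32_t)raw - 0x2000 : (int32_t)raw;
}
```

For example, a raw field of 0x1fff means -1 for a GLOBAL_* or SCRATCH_* instruction; for a FLAT_* instruction the same bits would give 4095 under this sketch's assumption.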
    126198<h3>Instructions by opcode</h3>
    127199<p>List of the FLAT instructions by opcode (GCN 1.1/1.2):</p>
     
    527599</tbody>
    528600</table>
     601<p>List of the FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):</p>
     602<table>
     603<thead>
     604<tr>
     605<th>Opcode</th>
     606<th>FLAT</th>
     607<th>GLOBAL</th>
     608<th>SCRATCH</th>
     609<th>Mnemonic</th>
     610</tr>
     611</thead>
     612<tbody>
     613<tr>
     614<td>16 (0x10)</td>
     615<td>✓</td>
     616<td>✓</td>
     617<td>✓</td>
     618<td>*_LOAD_UBYTE</td>
     619</tr>
     620<tr>
     621<td>17 (0x11)</td>
     622<td>✓</td>
     623<td>✓</td>
     624<td>✓</td>
     625<td>*_LOAD_SBYTE</td>
     626</tr>
     627<tr>
     628<td>18 (0x12)</td>
     629<td>✓</td>
     630<td>✓</td>
     631<td>✓</td>
     632<td>*_LOAD_USHORT</td>
     633</tr>
     634<tr>
     635<td>19 (0x13)</td>
     636<td>✓</td>
     637<td>✓</td>
     638<td>✓</td>
     639<td>*_LOAD_SSHORT</td>
     640</tr>
     641<tr>
     642<td>20 (0x14)</td>
     643<td>✓</td>
     644<td>✓</td>
     645<td>✓</td>
     646<td>*_LOAD_DWORD</td>
     647</tr>
     648<tr>
     649<td>21 (0x15)</td>
     650<td>✓</td>
     651<td>✓</td>
     652<td>✓</td>
     653<td>*_LOAD_DWORDX2</td>
     654</tr>
     655<tr>
     656<td>22 (0x16)</td>
     657<td>✓</td>
     658<td>✓</td>
     659<td>✓</td>
     660<td>*_LOAD_DWORDX3</td>
     661</tr>
     662<tr>
     663<td>23 (0x17)</td>
     664<td>✓</td>
     665<td>✓</td>
     666<td>✓</td>
     667<td>*_LOAD_DWORDX4</td>
     668</tr>
     669<tr>
     670<td>24 (0x18)</td>
     671<td>✓</td>
     672<td>✓</td>
     673<td>✓</td>
     674<td>*_STORE_BYTE</td>
     675</tr>
     676<tr>
     677<td>25 (0x19)</td>
     678<td>✓</td>
     679<td>✓</td>
     680<td>✓</td>
     681<td>*_STORE_BYTE_D16_HI</td>
     682</tr>
     683<tr>
     684<td>26 (0x1a)</td>
     685<td>✓</td>
     686<td>✓</td>
     687<td>✓</td>
     688<td>*_STORE_SHORT</td>
     689</tr>
     690<tr>
     691<td>27 (0x1b)</td>
     692<td>✓</td>
     693<td>✓</td>
     694<td>✓</td>
     695<td>*_STORE_SHORT_D16_HI</td>
     696</tr>
     697<tr>
     698<td>28 (0x1c)</td>
     699<td>✓</td>
     700<td>✓</td>
     701<td>✓</td>
     702<td>*_STORE_DWORD</td>
     703</tr>
     704<tr>
     705<td>29 (0x1d)</td>
     706<td>✓</td>
     707<td>✓</td>
     708<td>✓</td>
     709<td>*_STORE_DWORDX2</td>
     710</tr>
     711<tr>
     712<td>30 (0x1e)</td>
     713<td>✓</td>
     714<td>✓</td>
     715<td>✓</td>
     716<td>*_STORE_DWORDX3</td>
     717</tr>
     718<tr>
     719<td>31 (0x1f)</td>
     720<td>✓</td>
     721<td>✓</td>
     722<td>✓</td>
     723<td>*_STORE_DWORDX4</td>
     724</tr>
     725<tr>
     726<td>32 (0x20)</td>
     727<td>✓</td>
     728<td>✓</td>
     729<td>✓</td>
     730<td>*_LOAD_UBYTE_D16</td>
     731</tr>
     732<tr>
     733<td>33 (0x21)</td>
     734<td>✓</td>
     735<td>✓</td>
     736<td>✓</td>
     737<td>*_LOAD_UBYTE_D16_HI</td>
     738</tr>
     739<tr>
     740<td>34 (0x22)</td>
     741<td>✓</td>
     742<td>✓</td>
     743<td>✓</td>
     744<td>*_LOAD_SBYTE_D16</td>
     745</tr>
     746<tr>
     747<td>35 (0x23)</td>
     748<td>✓</td>
     749<td>✓</td>
     750<td>✓</td>
     751<td>*_LOAD_SBYTE_D16_HI</td>
     752</tr>
     753<tr>
     754<td>36 (0x24)</td>
     755<td>✓</td>
     756<td>✓</td>
     757<td>✓</td>
     758<td>*_LOAD_SHORT_D16</td>
     759</tr>
     760<tr>
     761<td>37 (0x25)</td>
     762<td>✓</td>
     763<td>✓</td>
     764<td>✓</td>
     765<td>*_LOAD_SHORT_D16_HI</td>
     766</tr>
     767<tr>
     768<td>64 (0x40)</td>
     769<td>✓</td>
     770<td>✓</td>
     771<td></td>
     772<td>*_ATOMIC_SWAP</td>
     773</tr>
     774<tr>
     775<td>65 (0x41)</td>
     776<td>✓</td>
     777<td>✓</td>
     778<td></td>
     779<td>*_ATOMIC_CMPSWAP</td>
     780</tr>
     781<tr>
     782<td>66 (0x42)</td>
     783<td>✓</td>
     784<td>✓</td>
     785<td></td>
     786<td>*_ATOMIC_ADD</td>
     787</tr>
     788<tr>
     789<td>67 (0x43)</td>
     790<td>✓</td>
     791<td>✓</td>
     792<td></td>
     793<td>*_ATOMIC_SUB</td>
     794</tr>
     795<tr>
     796<td>68 (0x44)</td>
     797<td>✓</td>
     798<td>✓</td>
     799<td></td>
     800<td>*_ATOMIC_SMIN</td>
     801</tr>
     802<tr>
     803<td>69 (0x45)</td>
     804<td>✓</td>
     805<td>✓</td>
     806<td></td>
     807<td>*_ATOMIC_UMIN</td>
     808</tr>
     809<tr>
     810<td>70 (0x46)</td>
     811<td>✓</td>
     812<td>✓</td>
     813<td></td>
     814<td>*_ATOMIC_SMAX</td>
     815</tr>
     816<tr>
     817<td>71 (0x47)</td>
     818<td>✓</td>
     819<td>✓</td>
     820<td></td>
     821<td>*_ATOMIC_UMAX</td>
     822</tr>
     823<tr>
     824<td>72 (0x48)</td>
     825<td>✓</td>
     826<td>✓</td>
     827<td></td>
     828<td>*_ATOMIC_AND</td>
     829</tr>
     830<tr>
     831<td>73 (0x49)</td>
     832<td>✓</td>
     833<td>✓</td>
     834<td></td>
     835<td>*_ATOMIC_OR</td>
     836</tr>
     837<tr>
     838<td>74 (0x4a)</td>
     839<td>✓</td>
     840<td>✓</td>
     841<td></td>
     842<td>*_ATOMIC_XOR</td>
     843</tr>
     844<tr>
     845<td>75 (0x4b)</td>
     846<td>✓</td>
     847<td>✓</td>
     848<td></td>
     849<td>*_ATOMIC_INC</td>
     850</tr>
     851<tr>
     852<td>76 (0x4c)</td>
     853<td>✓</td>
     854<td>✓</td>
     855<td></td>
     856<td>*_ATOMIC_DEC</td>
     857</tr>
     858<tr>
     859<td>96 (0x60)</td>
     860<td>✓</td>
     861<td>✓</td>
     862<td></td>
     863<td>*_ATOMIC_SWAP_X2</td>
     864</tr>
     865<tr>
     866<td>97 (0x61)</td>
     867<td>✓</td>
     868<td>✓</td>
     869<td></td>
     870<td>*_ATOMIC_CMPSWAP_X2</td>
     871</tr>
     872<tr>
     873<td>98 (0x62)</td>
     874<td>✓</td>
     875<td>✓</td>
     876<td></td>
     877<td>*_ATOMIC_ADD_X2</td>
     878</tr>
     879<tr>
     880<td>99 (0x63)</td>
     881<td>✓</td>
     882<td>✓</td>
     883<td></td>
     884<td>*_ATOMIC_SUB_X2</td>
     885</tr>
     886<tr>
     887<td>100 (0x64)</td>
     888<td>✓</td>
     889<td>✓</td>
     890<td></td>
     891<td>*_ATOMIC_SMIN_X2</td>
     892</tr>
     893<tr>
     894<td>101 (0x65)</td>
     895<td>✓</td>
     896<td>✓</td>
     897<td></td>
     898<td>*_ATOMIC_UMIN_X2</td>
     899</tr>
     900<tr>
     901<td>102 (0x66)</td>
     902<td>✓</td>
     903<td>✓</td>
     904<td></td>
     905<td>*_ATOMIC_SMAX_X2</td>
     906</tr>
     907<tr>
     908<td>103 (0x67)</td>
     909<td>✓</td>
     910<td>✓</td>
     911<td></td>
     912<td>*_ATOMIC_UMAX_X2</td>
     913</tr>
     914<tr>
     915<td>104 (0x68)</td>
     916<td>✓</td>
     917<td>✓</td>
     918<td></td>
     919<td>*_ATOMIC_AND_X2</td>
     920</tr>
     921<tr>
     922<td>105 (0x69)</td>
     923<td>✓</td>
     924<td>✓</td>
     925<td></td>
     926<td>*_ATOMIC_OR_X2</td>
     927</tr>
     928<tr>
     929<td>106 (0x6a)</td>
     930<td>✓</td>
     931<td>✓</td>
     932<td></td>
     933<td>*_ATOMIC_XOR_X2</td>
     934</tr>
     935<tr>
     936<td>107 (0x6b)</td>
     937<td>✓</td>
     938<td>✓</td>
     939<td></td>
     940<td>*_ATOMIC_INC_X2</td>
     941</tr>
     942<tr>
     943<td>108 (0x6c)</td>
     944<td>✓</td>
     945<td>✓</td>
     946<td></td>
     947<td>*_ATOMIC_DEC_X2</td>
     948</tr>
     949</tbody>
     950</table>
     951<p>The '*' stands for the instruction prefix (FLAT, GLOBAL or SCRATCH).</p>
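<p>Because GLOBAL and SCRATCH reuse the FLAT opcode numbers, a disassembler can rebuild the full mnemonic from the opcode and the prefix. The C sketch below does this from the GCN 1.4 table above and rejects SCRATCH atomics, which the table does not list. The SEG numbering it uses (0 = FLAT, 1 = SCRATCH, 2 = GLOBAL) is taken from AMD's Vega ISA documentation rather than from this page, so treat it as an assumption; the function and array names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Opcodes 16..37 (loads and stores), in table order. */
static const char *const flat_ls_names[] = {
    "LOAD_UBYTE", "LOAD_SBYTE", "LOAD_USHORT", "LOAD_SSHORT",
    "LOAD_DWORD", "LOAD_DWORDX2", "LOAD_DWORDX3", "LOAD_DWORDX4",
    "STORE_BYTE", "STORE_BYTE_D16_HI", "STORE_SHORT", "STORE_SHORT_D16_HI",
    "STORE_DWORD", "STORE_DWORDX2", "STORE_DWORDX3", "STORE_DWORDX4",
    "LOAD_UBYTE_D16", "LOAD_UBYTE_D16_HI", "LOAD_SBYTE_D16",
    "LOAD_SBYTE_D16_HI", "LOAD_SHORT_D16", "LOAD_SHORT_D16_HI"
};

/* Opcodes 64..76; the _X2 variants use the same order at 96..108. */
static const char *const flat_atomic_names[] = {
    "ATOMIC_SWAP", "ATOMIC_CMPSWAP", "ATOMIC_ADD", "ATOMIC_SUB",
    "ATOMIC_SMIN", "ATOMIC_UMIN", "ATOMIC_SMAX", "ATOMIC_UMAX",
    "ATOMIC_AND", "ATOMIC_OR", "ATOMIC_XOR", "ATOMIC_INC", "ATOMIC_DEC"
};

/* Write the GCN 1.4 mnemonic for (seg, opcode) into buf.
 * Assumed SEG values: 0 = FLAT, 1 = SCRATCH, 2 = GLOBAL.
 * Returns 0 on success, -1 if the pair is not in the table or buf is too small. */
static int flat_gcn14_mnemonic(unsigned seg, unsigned opcode, char *buf, size_t size)
{
    static const char *const prefixes[] = { "FLAT", "SCRATCH", "GLOBAL" };
    const char *name = NULL;
    const char *suffix = "";

    if (seg > 2)
        return -1;
    if (opcode >= 16 && opcode <= 37) {
        name = flat_ls_names[opcode - 16];
    } else if (opcode >= 64 && opcode <= 76) {
        name = flat_atomic_names[opcode - 64];
    } else if (opcode >= 96 && opcode <= 108) {
        name = flat_atomic_names[opcode - 96];
        suffix = "_X2";
    }
    if (name == NULL)
        return -1;
    if (seg == 1 && opcode >= 64)   /* the table lists no SCRATCH atomics */
        return -1;

    int n = snprintf(buf, size, "%s_%s%s", prefixes[seg], name, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For instance, SEG = 2 with opcode 98 (0x62) yields GLOBAL_ATOMIC_ADD_X2, while SEG = 1 with any atomic opcode is rejected.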
    529952<h3>Instruction set</h3>
    530953<p>Alphabetically sorted instruction list:</p>
    531954<h4>FLAT_ATOMIC_ADD</h4>
    532 <p>Opcode: 50 (0x32) for GCN 1.1; 66 (0x42) for GCN 1.2<br />
     955<p>Opcode: 50 (0x32) for GCN 1.1; 66 (0x42) for GCN 1.2/1.4<br />
    533956Syntax: FLAT_ATOMIC_ADD VDST, VADDR(2), VDATA<br />
    534 Description: Add VDATA to value of VADDR address, and store result to this address.
     957Description: Add VDATA to value of memory address, and store result to this address.
    535958If GLC flag is set then return previous value from this address to VDST,
    536959otherwise keep VDST value. Operation is atomic.<br />
    537960Operation:<br />
    538 <code>UINT32* VM = (UINT32*)VADDR
     961<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    539962UINT32 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
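<p>The effect of INST_OFFSET and of the GLC flag on FLAT_ATOMIC_ADD can be traced with a small C model. It is only a sketch under simplifying assumptions: memory is a hypothetical 16-dword array addressed by byte offset, only one lane is modelled, and nothing is actually atomic.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy memory: 16 dwords addressed by byte offset. */
#define TOY_MEM_DWORDS 16
static uint32_t toy_mem[TOY_MEM_DWORDS];

/* Effective address = VADDR + INST_OFFSET (the offset may be negative). */
static uint32_t *toy_dword(uint64_t vaddr, int32_t inst_offset)
{
    uint64_t addr = vaddr + (uint64_t)(int64_t)inst_offset;
    assert(addr % 4 == 0 && addr / 4 < TOY_MEM_DWORDS);
    return &toy_mem[addr / 4];
}

/* FLAT_ATOMIC_ADD for one lane, following the pseudocode above. */
static void flat_atomic_add(uint32_t *vdst, uint64_t vaddr, int32_t inst_offset,
                            uint32_t vdata, bool glc)
{
    uint32_t *vm = toy_dword(vaddr, inst_offset);
    uint32_t p = *vm;
    *vm = p + vdata;   /* unsigned add, wraps modulo 2^32 */
    if (glc)
        *vdst = p;     /* return the previous value; otherwise VDST is untouched */
}
```

Because the add wraps modulo 2^32, a VDATA of 0xffffffff acts as a decrement by one.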
    540963<h4>FLAT_ATOMIC_ADD_X2</h4>
    541 <p>Opcode: 82 (0x52) for GCN 1.1; 98 (0x62) for GCN 1.2<br />
     964<p>Opcode: 82 (0x52) for GCN 1.1; 98 (0x62) for GCN 1.2/1.4<br />
    542965Syntax: FLAT_ATOMIC_ADD_X2 VDST(2), VADDR(2), VDATA(2)<br />
    543 Description: Add 64-bit VDATA to 64-bit value of VADDR address, and store result
     966Description: Add 64-bit VDATA to 64-bit value of memory address, and store result
    544967to this address. If GLC flag is set then return previous value from address to VDST,
    545968otherwise keep VDST value. Operation is atomic.<br />
    546969Operation:<br />
    547 <code>UINT64* VM = (UINT64*)VADDR
     970<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    548971UINT64 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    549972<h4>FLAT_ATOMIC_AND</h4>
    550 <p>Opcode: 57 (0x39) for GCN 1.1; 72 (0x48) for GCN 1.2<br />
     973<p>Opcode: 57 (0x39) for GCN 1.1; 72 (0x48) for GCN 1.2/1.4<br />
    551974Syntax: FLAT_ATOMIC_AND VDST, VADDR(2), VDATA<br />
    552 Description: Do bitwise AND on VDATA and value of VADDR address,
     975Description: Do bitwise AND on VDATA and value of memory address,
    553976and store result to this address. If GLC flag is set then return previous value
    554977from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    555978Operation:<br />
    556 <code>UINT32* VM = (UINT32*)VADDR
     979<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    557980UINT32 P = *VM; *VM = *VM &amp; VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    558981<h4>FLAT_ATOMIC_AND_X2</h4>
    559 <p>Opcode: 89 (0x59) for GCN 1.1; 104 (0x68) for GCN 1.2<br />
     982<p>Opcode: 89 (0x59) for GCN 1.1; 104 (0x68) for GCN 1.2/1.4<br />
    560983Syntax: FLAT_ATOMIC_AND_X2 VDST(2), VADDR(2), VDATA(2)<br />
    561 Description: Do 64-bit bitwise AND on VDATA and value of VADDR address,
     984Description: Do 64-bit bitwise AND on VDATA and value of memory address,
    562985and store result to this address. If GLC flag is set then return previous value
    563986from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    564987Operation:<br />
    565 <code>UINT64* VM = (UINT64*)VADDR
     988<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    566989UINT64 P = *VM; *VM = *VM &amp; VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    567990<h4>FLAT_ATOMIC_CMPSWAP</h4>
    568 <p>Opcode: 49 (0x31) for GCN 1.1; 65 (0x41) for GCN 1.2<br />
     991<p>Opcode: 49 (0x31) for GCN 1.1; 65 (0x41) for GCN 1.2/1.4<br />
    569992Syntax: FLAT_ATOMIC_CMPSWAP VDST, VADDR(2), VDATA(2)<br />
    570 Description: Store lower VDATA dword into VADDR address  if previous value
     993Description: Store the lower VDATA dword into the memory address if the previous value
    571994from that address is equal to VDATA&gt;&gt;32, otherwise keep the old value at the address.
    572995If GLC flag is set then return previous value from address to VDST,
    573996otherwise keep VDST value. Operation is atomic.<br />
    574997Operation:<br />
    575 <code>UINT32* VM = (UINT32*)VADDR
     998<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    576999UINT32 P = *VM; *VM = *VM==(VDATA&gt;&gt;32) ? VDATA&amp;0xffffffff : *VM // part of atomic
    5771000VDST = (GLC) ? P : VDST // last part of atomic</code></p>
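<p>The register layout of FLAT_ATOMIC_CMPSWAP is easy to get backwards: in the pseudocode above, the new value is the lower VDATA dword and the compare value is the upper one (VDATA&gt;&gt;32). A minimal single-lane C model, assuming <code>vdata[0]</code> is the lower register of the pair and with <code>vm</code> standing in for the addressed dword (hypothetical names, no real atomicity):</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* FLAT_ATOMIC_CMPSWAP: vdata[0] = new value, vdata[1] = compare value. */
static void flat_atomic_cmpswap(uint32_t *vdst, uint32_t *vm,
                                const uint32_t vdata[2], bool glc)
{
    uint32_t p = *vm;
    if (p == vdata[1])   /* compare with VDATA>>32 */
        *vm = vdata[0];  /* store VDATA & 0xffffffff */
    if (glc)
        *vdst = p;       /* previous value, whether or not the swap happened */
}
```

Comparing the returned previous value with the compare value tells the shader whether its swap succeeded.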
    5781001<h4>FLAT_ATOMIC_CMPSWAP_X2</h4>
    579 <p>Opcode: 81 (0x51) for GCN 1.1; 97 (0x61) for GCN 1.2<br />
     1002<p>Opcode: 81 (0x51) for GCN 1.1; 97 (0x61) for GCN 1.2/1.4<br />
    5801003Syntax: FLAT_ATOMIC_CMPSWAP_X2 VDST(2), VADDR(2), VDATA(4)<br />
    581 Description: Store lower VDATA 64-bit word into VADDR address if previous value
     1004Description: Store the lower VDATA 64-bit word into the memory address if the previous value
    5821005at that address is equal to VDATA&gt;&gt;64 (VDATA[2:3]), otherwise keep the old value at the address.
    5831006If GLC flag is set then return the previous value from the memory address to VDST,
    5841007otherwise keep VDST value. Operation is atomic.<br />
    5851008Operation:<br />
    586 <code>UINT64* VM = (UINT64*)VADDR
     1009<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    5871010UINT64 P = *VM; *VM = *VM==(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic
    5881011VDST = (GLC) ? P : VDST // last part of atomic</code></p>
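<p>The 64-bit variant follows the same pattern with four VDATA registers: VDATA[0:1] holds the new value and VDATA[2:3] the compare value. The C sketch below assumes the lower-numbered register of each pair holds the low dword; the names are hypothetical and the model is single-lane and not atomic.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Join two consecutive 32-bit registers into a 64-bit value (low dword first). */
static uint64_t pair64(const uint32_t *r)
{
    return ((uint64_t)r[1] << 32) | r[0];
}

/* FLAT_ATOMIC_CMPSWAP_X2: VDATA[0:1] = new value, VDATA[2:3] = compare value. */
static void flat_atomic_cmpswap_x2(uint32_t vdst[2], uint64_t *vm,
                                   const uint32_t vdata[4], bool glc)
{
    uint64_t p = *vm;
    if (p == pair64(&vdata[2]))
        *vm = pair64(&vdata[0]);
    if (glc) {                       /* VDST(2) receives the old 64-bit value */
        vdst[0] = (uint32_t)p;
        vdst[1] = (uint32_t)(p >> 32);
    }
}
```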
    5891012<h4>FLAT_ATOMIC_DEC</h4>
    590 <p>Opcode: 61 (0x3d) for GCN 1.1; 76 (0x4c) for GCN 1.2<br />
     1013<p>Opcode: 61 (0x3d) for GCN 1.1; 76 (0x4c) for GCN 1.2/1.4<br />
    5911014Syntax: FLAT_ATOMIC_DEC VDST, VADDR(2), VDATA<br />
    592 Description: Compare value from VADDR address and if less or equal than VDATA
    593 and this value is not zero, then decrement value from VADDR address,
     1015Description: Compare the value at the memory address with VDATA; if it is less than or equal to VDATA
     1016and not zero, then decrement the value at the memory address,
    5941017otherwise store VDATA to this address. If GLC flag is set then return previous value
    5951018from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    5961019Operation:<br />
    597 <code>UINT32* VM = (UINT32*)VADDR
     1020<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    5981021UINT32 P = *VM; *VM = (*VM &lt;= VDATA &amp;&amp; *VM!=0) ? *VM-1 : VDATA // atomic
    5991022VDST = (GLC) ? P : VDST // atomic</code></p>
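<p>Following the pseudocode above, FLAT_ATOMIC_DEC behaves like a down-counter that wraps from 0 (or from any value above VDATA) back to VDATA. A single-lane C sketch (hypothetical names, not atomic):</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* FLAT_ATOMIC_DEC: decrement within 1..VDATA, otherwise reload VDATA. */
static void flat_atomic_dec(uint32_t *vdst, uint32_t *vm, uint32_t vdata, bool glc)
{
    uint32_t p = *vm;
    *vm = (p <= vdata && p != 0) ? p - 1 : vdata;
    if (glc)
        *vdst = p;   /* previous value */
}
```

With VDATA = 3 and a starting value of 2, successive calls store 1, 0, 3, 2 and so on.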
    6001023<h4>FLAT_ATOMIC_DEC_X2</h4>
    601 <p>Opcode: 93 (0x5d) for GCN 1.1; 108 (0x6c) for GCN 1.2<br />
     1024<p>Opcode: 93 (0x5d) for GCN 1.1; 108 (0x6c) for GCN 1.2/1.4<br />
    6021025Syntax: FLAT_ATOMIC_DEC_X2 VDST(2), VADDR(2), VDATA(2)<br />
    603 Description: Compare 64-bit value from VADDR address and if less or equal than VDATA
    604 and this value is not zero, then decrement value from VADDR address,
     1026Description: Compare the 64-bit value at the memory address with VDATA; if it is less than or equal to VDATA
     1027and not zero, then decrement the value at the memory address,
    6051028otherwise store VDATA to this address. If GLC flag is set then return previous value
    6061029from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    6071030Operation:<br />
    608 <code>UINT64* VM = (UINT64*)VADDR
     1031<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    6091032UINT64 P = *VM; *VM = (*VM &lt;= VDATA &amp;&amp; *VM!=0) ? *VM-1 : VDATA // atomic
    6101033VDST = (GLC) ? P : VDST // atomic</code></p>
     
    6121035<p>Opcode: 62 (0x3e) for GCN 1.1<br />
    6131036Syntax: FLAT_ATOMIC_FCMPSWAP VDST, VADDR(2), VDATA(2)<br />
    614 Description: Store lower VDATA dword into VADDR address if previous single floating point
     1037Description: Store lower VDATA dword into memory address if previous single floating point
    6151038value at the address is equal to the single floating point value VDATA&gt;&gt;32,
    616 otherwise keep old value from VADDR address.
     1039otherwise keep old value from memory address.
    6171040If GLC flag is set then return previous value from this address to VDST,
    6181041otherwise keep VDST value. Operation is atomic.<br />
    6191042Operation:<br />
    620 <code>FLOAT* VM = (FLOAT*)VADDR
     1043<code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET)
    6211044FLOAT P = *VM; *VM = *VM==ASFLOAT(VDATA&gt;&gt;32) ? VDATA&amp;0xffffffff : *VM // part of atomic
    6221045VDST[0] = (GLC) ? P : VDST // last part of atomic</code></p>
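<p>Unlike FLAT_ATOMIC_CMPSWAP, this instruction compares floating point values rather than bit patterns. The C sketch below takes the pseudocode literally and uses C's <code>==</code> on floats; whether the hardware treats signed zeros and NaNs exactly this way is an assumption. Under that reading, -0.0 in memory matches a +0.0 compare value and a NaN never matches, whereas a bitwise compare would behave the other way in both cases. Names are hypothetical and the model is single-lane and not atomic.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit register value as an IEEE single (like ASFLOAT). */
static float as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* FLAT_ATOMIC_FCMPSWAP: vdata[0] = new value, vdata[1] = compare value. */
static void flat_atomic_fcmpswap(uint32_t *vdst, uint32_t *vm,
                                 const uint32_t vdata[2], bool glc)
{
    uint32_t p = *vm;
    if (as_float(p) == as_float(vdata[1]))   /* floating point equality */
        *vm = vdata[0];
    if (glc)
        *vdst = p;
}
```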
     
    6241047<p>Opcode: 94 (0x5e) for GCN 1.1<br />
    6251048Syntax: FLAT_ATOMIC_FCMPSWAP_X2 VDST(2), VADDR(2), VDATA(4)<br />
    626 Description: Store lower VDATA 64-bit word into VADDR address if previous double
     1049Description: Store lower VDATA 64-bit word into memory address if previous double
    6271050floating point value at the address is equal to the double floating point value VDATA[2:3],
    628 otherwise keep old value from VADDR address.
     1051otherwise keep old value from memory address.
    6291052If GLC flag is set then return previous value from address to VDST, otherwise keep
    6301053VDST value. Operation is atomic.<br />
    6311054Operation:<br />
    632 <code>DOUBLE* VM = (DOUBLE*)VMADDR
     1055<code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET)
    6331056DOUBLE P = *VM; *VM = *VM==ASDOUBLE(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic
    6341057VDST = (GLC) ? P : VDST // last part of atomic</code></p>
     
    6371060Syntax: FLAT_ATOMIC_FMAX VDST, VADDR(2), VDATA<br />
    6381061Description: Choose greatest single floating point value from VDATA and from
    639 VADDR address, and store result to this address.
     1062memory address, and store result to this address.
    6401063If GLC flag is set then return previous value from address to VDST, otherwise keep
    6411064VDST value. Operation is atomic.<br />
    6421065Operation:<br />
    643 <code>FLOAT* VM = (FLOAT*)VADDR
     1066<code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET)
    6441067UINT32 P = *VM; *VM = MAX(*VM, ASFLOAT(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p>
    6451068<h4>FLAT_ATOMIC_FMAX_X2</h4>
     
    6471070Syntax: FLAT_ATOMIC_FMAX_X2 VDST(2), VADDR(2), VDATA(2)<br />
    6481071Description: Choose greatest double floating point value from VDATA and from
    649 VADDR address, and store result to this address.
     1072memory address, and store result to this address.
    6501073If GLC flag is set then return previous value from address to VDST,
    6511074otherwise keep VDST value. Operation is atomic.<br />
    6521075Operation:<br />
    653 <code>DOUBLE* VM = (DOUBLE*)VADDR
     1076<code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET)
    6541077UINT64 P = *VM; *VM = MAX(*VM, ASDOUBLE(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p>
    6551078<h4>FLAT_ATOMIC_FMIN</h4>
     
    6571080Syntax: FLAT_ATOMIC_FMIN VDST, VADDR(2), VDATA<br />
    6581081Description: Choose smallest single floating point value from VDATA and from
    659 VADDR address, and store result to this address.
     1082memory address, and store result to this address.
    6601083If GLC flag is set then return previous value from address to VDST, otherwise keep
    6611084VDST value. Operation is atomic.<br />
    6621085Operation:<br />
    663 <code>FLOAT* VM = (FLOAT*)VADDR
     1086<code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET)
    6641087UINT32 P = *VM; *VM = MIN(*VM, ASFLOAT(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p>
    6651088<h4>FLAT_ATOMIC_FMIN_X2</h4>
     
    6671090Syntax: FLAT_ATOMIC_FMIN_X2 VDST(2), VADDR(2), VDATA(2)<br />
    6681091Description: Choose smallest double floating point value from VDATA and from
    669 VADDR address, and store result to this address.
     1092memory address, and store result to this address.
    6701093If GLC flag is set then return previous value from address to VDST,
    6711094otherwise keep VDST value. Operation is atomic.<br />
    6721095Operation:<br />
    673 <code>DOUBLE* VM = (DOUBLE*)VADDR
     1096<code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET)
    6741097UINT64 P = *VM; *VM = MIN(*VM, ASDOUBLE(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p>
    6751098<h4>FLAT_ATOMIC_INC</h4>
    676 <p>Opcode: 60 (0x3c) for GCN 1.1; 75 (0x4b) for GCN 1.2<br />
     1099<p>Opcode: 60 (0x3c) for GCN 1.1; 75 (0x4b) for GCN 1.2/1.4<br />
    6771100Syntax: FLAT_ATOMIC_INC VDST, VADDR(2), VDATA<br />
    678 Description: Compare value from VADDR address and if less than VDATA,
     1101Description: Compare value from memory address and if less than VDATA,
    6791102then increment value from address, otherwise store zero to address.
    6801103If GLC flag is set then return previous value from this address to VDST,
    6811104otherwise keep VDST value. Operation is atomic.<br />
    6821105Operation:<br />
    683 <code>UINT32* VM = (UINT32*)VADDR
     1106<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    6841107UINT32 P = *VM; *VM = (*VM &lt; VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p>
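<p>Per the pseudocode above, FLAT_ATOMIC_INC counts up until the stored value reaches VDATA and then wraps to 0, so the memory value cycles through 0..VDATA. A single-lane C sketch (hypothetical names, not atomic):</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* FLAT_ATOMIC_INC: count up to VDATA, then wrap to 0. */
static void flat_atomic_inc(uint32_t *vdst, uint32_t *vm, uint32_t vdata, bool glc)
{
    uint32_t p = *vm;
    *vm = (p < vdata) ? p + 1 : 0;
    if (glc)
        *vdst = p;   /* previous value */
}
```

With VDATA = 3 and a starting value of 2, successive calls store 3, 0, 1 and so on.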
    6851108<h4>FLAT_ATOMIC_INC_X2</h4>
    686 <p>Opcode: 92 (0x5c) for GCN 1.1; 107 (0x9b) for GCN 1.2<br />
     1109<p>Opcode: 92 (0x5c) for GCN 1.1; 107 (0x6b) for GCN 1.2/1.4<br />
    6871110Syntax: FLAT_ATOMIC_INC_X2 VDST(2), VADDR(2), VDATA(2)<br />
    688 Description: Compare 64-bit value from VADDR address and if less than VDATA,
     1111Description: Compare 64-bit value from memory address and if less than VDATA,
    6891112then increment value from address, otherwise store zero to address.
    6901113If GLC flag is set then return previous value from this address to VDST,
    6911114otherwise keep VDST value. Operation is atomic.<br />
    6921115Operation:<br />
    693 <code>UINT64* VM = (UINT64*)VADDR
     1116<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    6941117UINT64 P = *VM; *VM = (*VM &lt; VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p>
    6951118<h4>FLAT_ATOMIC_OR</h4>
    696 <p>Opcode: 58 (0x3a) for GCN 1.1; 73 (0x49) for GCN 1.2<br />
     1119<p>Opcode: 58 (0x3a) for GCN 1.1; 73 (0x49) for GCN 1.2/1.4<br />
    6971120Syntax: FLAT_ATOMIC_OR VDST, VADDR(2), VDATA<br />
    698 Description: Do bitwise OR on VDATA and value of VADDR address,
     1121Description: Do bitwise OR on VDATA and value of memory address,
    6991122and store result to this address. If GLC flag is set then return previous value
    7001123from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    7011124Operation:<br />
    702 <code>UINT32* VM = (UINT32*)VADDR
     1125<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    7031126UINT32 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7041127<h4>FLAT_ATOMIC_OR_X2</h4>
    705 <p>Opcode: 90 (0x5a) for GCN 1.1; 105 (0x69) for GCN 1.2<br />
     1128<p>Opcode: 90 (0x5a) for GCN 1.1; 105 (0x69) for GCN 1.2/1.4<br />
    7061129Syntax: FLAT_ATOMIC_OR_X2 VDST(2), VADDR(2), VDATA(2)<br />
    707 Description: Do 64-bit bitwise OR on VDATA and value of VADDR address,
     1130Description: Do 64-bit bitwise OR on VDATA and value of memory address,
    7081131and store result to this address. If GLC flag is set then return previous value
    7091132from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    7101133Operation:<br />
    711 <code>UINT64* VM = (UINT64*)VADDR
     1134<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    7121135UINT64 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7131136<h4>FLAT_ATOMIC_SMAX</h4>
    714 <p>Opcode: 55 (0x37) for GCN 1.1; 70 (0x46) for GCN 1.2<br />
     1137<p>Opcode: 55 (0x37) for GCN 1.1; 70 (0x46) for GCN 1.2/1.4<br />
    7151138Syntax: FLAT_ATOMIC_SMAX VDST, VADDR(2), VDATA<br />
    716 Description: Choose greatest signed 32-bit value from VDATA and from VADDR address,
     1139Description: Choose greatest signed 32-bit value from VDATA and from memory address,
    7171140and store result to this address.
    7181141If GLC flag is set then return previous value from this address to VDST, otherwise keep
    7191142VDST value. Operation is atomic.<br />
    7201143Operation:<br />
    721 <code>INT32* VM = (INT32*)VADDR
     1144<code>INT32* VM = (INT32*)(VADDR + INST_OFFSET)
    7221145UINT32 P = *VM; *VM = MAX(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
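<p>SMAX and UMAX differ only in how the 32-bit pattern is interpreted. The C sketch below (hypothetical names, single lane, not atomic) shows that for a memory value of 0xffffffff and VDATA = 5, FLAT_ATOMIC_SMAX stores 5 because the old value is -1 when read as signed, while FLAT_ATOMIC_UMAX keeps 0xffffffff.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a register value as signed without an implementation-defined cast. */
static int32_t as_int32(uint32_t bits)
{
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* FLAT_ATOMIC_SMAX: signed comparison. */
static void flat_atomic_smax(uint32_t *vdst, uint32_t *vm, uint32_t vdata, bool glc)
{
    uint32_t p = *vm;
    if (as_int32(vdata) > as_int32(p))
        *vm = vdata;
    if (glc)
        *vdst = p;
}

/* FLAT_ATOMIC_UMAX: unsigned comparison of the same bits. */
static void flat_atomic_umax(uint32_t *vdst, uint32_t *vm, uint32_t vdata, bool glc)
{
    uint32_t p = *vm;
    if (vdata > p)
        *vm = vdata;
    if (glc)
        *vdst = p;
}
```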
    7231146<h4>FLAT_ATOMIC_SMAX_X2</h4>
    724 <p>Opcode: 87 (0x57) for GCN 1.1; 102 (0x66) for GCN 1.2<br />
     1147<p>Opcode: 87 (0x57) for GCN 1.1; 102 (0x66) for GCN 1.2/1.4<br />
    7251148Syntax: FLAT_ATOMIC_SMAX_X2 VDST(2), VADDR(2), VDATA(2)<br />
    726 Description: Choose greatest signed 64-bit value from VDATA and from VADDR address,
     1149Description: Choose greatest signed 64-bit value from VDATA and from memory address,
    7271150and store result to this address.
    7281151If GLC flag is set then return previous value from this address to VDST, otherwise keep
    7291152VDST value. Operation is atomic.<br />
    7301153Operation:<br />
    731 <code>INT64* VM = (INT64*)VADDR
     1154<code>INT64* VM = (INT64*)(VADDR + INST_OFFSET)
    7321155UINT64 P = *VM; *VM = MAX(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    7331156<h4>FLAT_ATOMIC_SMIN</h4>
    734 <p>Opcode: 53 (0x35) for GCN 1.1; 68 (0x44) for GCN 1.2<br />
     1157<p>Opcode: 53 (0x35) for GCN 1.1; 68 (0x44) for GCN 1.2/1.4<br />
    7351158Syntax: FLAT_ATOMIC_SMIN VDST, VADDR(2), VDATA<br />
    736 Description: Choose smallest signed 32-bit value from VDATA and from VADDR address,
     1159Description: Choose smallest signed 32-bit value from VDATA and from memory address,
    7371160and store result to this address.
    7381161If GLC flag is set then return previous value from this address to VDST, otherwise keep
    7391162VDST value. Operation is atomic.<br />
    7401163Operation:<br />
    741 <code>INT32* VM = (INT32*)VADDR
     1164<code>INT32* VM = (INT32*)(VADDR + INST_OFFSET)
    7421165UINT32 P = *VM; *VM = MIN(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    7431166<h4>FLAT_ATOMIC_SMIN_X2</h4>
    744 <p>Opcode: 85 (0x55) for GCN 1.1; 100 (0x64) for GCN 1.2<br />
     1167<p>Opcode: 85 (0x55) for GCN 1.1; 100 (0x64) for GCN 1.2/1.4<br />
    7451168Syntax: FLAT_ATOMIC_SMIN_X2 VDST(2), VADDR(2), VDATA(2)<br />
    746 Description: Choose smallest signed 64-bit value from VDATA and from VADDR address,
     1169Description: Choose smallest signed 64-bit value from VDATA and from memory address,
    7471170and store result to this address.
    7481171If GLC flag is set then return previous value from this address to VDST, otherwise keep
    7491172VDST value. Operation is atomic.<br />
    7501173Operation:<br />
    751 <code>INT64* VM = (INT64*)VADDR
     1174<code>INT64* VM = (INT64*)(VADDR + INST_OFFSET)
    7521175UINT64 P = *VM; *VM = MIN(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    7531176<h4>FLAT_ATOMIC_SUB</h4>
    754 <p>Opcode: 51 (0x33) for GCN 1.1; 67 (0x43) for GCN 1.2<br />
     1177<p>Opcode: 51 (0x33) for GCN 1.1; 67 (0x43) for GCN 1.2/1.4<br />
    7551178Syntax: FLAT_ATOMIC_SUB VDST, VADDR(2), VDATA<br />
    756 Description: Subtract VDATA from value of VADDR address, and store result to this address.
     1179Description: Subtract VDATA from value of memory address, and store result to this address.
    7571180If GLC flag is set then return previous value from this address to VDST,
    7581181otherwise keep VDST value. Operation is atomic.<br />
    7591182Operation:<br />
    760 <code>UINT32* VM = (UINT32*)VADDR
     1183<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    7611184UINT32 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7621185<h4>FLAT_ATOMIC_SUB_X2</h4>
    763 <p>Opcode: 83 (0x53) for GCN 1.1; 99 (0x63) for GCN 1.2<br />
     1186<p>Opcode: 83 (0x53) for GCN 1.1; 99 (0x63) for GCN 1.2/1.4<br />
    7641187Syntax: FLAT_ATOMIC_SUB_X2 VDST(2), VADDR(2), VDATA(2)<br />
    765 Description: Subtract 64-bit VDATA from 64-bit value of VADDR address, and store result
     1188Description: Subtract 64-bit VDATA from 64-bit value of memory address, and store result
    7661189to this address. If GLC flag is set then return previous value from address to VDST,
    7671190otherwise keep VDST value. Operation is atomic.<br />
    7681191Operation:<br />
    769 <code>UINT64* VM = (UINT64*)VADDR
     1192<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    7701193UINT64 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7711194<h4>FLAT_ATOMIC_SWAP</h4>
    772 <p>Opcode: 48 (0x30) for GCN 1.1; 64 (0x40) for GCN 1.2<br />
    773 Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA
    774 Description: Store VDATA dword into VADDR address. If GLC flag is set then
    775 return previous value from VADDR address to VDST, otherwise keep old value from VDST.
     1195<p>Opcode: 48 (0x30) for GCN 1.1; 64 (0x40) for GCN 1.2/1.4<br />
     1196Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA<br />
     1197Description: Store VDATA dword into memory address. If GLC flag is set then
     1198return previous value from memory address to VDST, otherwise keep old value from VDST.
    7761199Operation is atomic.<br />
    7771200Operation:<br />
    778 <code>UINT32* VM = (UINT32*)VADDR
     1201<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    7791202UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7801203<h4>FLAT_ATOMIC_SWAP_X2</h4>
    781 <p>Opcode: 80 (0x50) for GCN 1.1; 96 (0x60) for GCN 1.2<br />
    782 Syntax: FLAT_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2)
    783 Description: Store VDATA 64-bit word into VADDR address. If GLC flag is set then
    784 return previous value from VADDR address to VDST, otherwise keep old value from VDST.
     1204<p>Opcode: 80 (0x50) for GCN 1.1; 96 (0x60) for GCN 1.2/1.4<br />
     1205Syntax: FLAT_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2)<br />
     1206Description: Store VDATA 64-bit word into memory address. If GLC flag is set then
     1207return previous value from memory address to VDST, otherwise keep old value from VDST.
    7851208Operation is atomic.<br />
    7861209Operation:<br />
    787 <code>UINT64* VM = (UINT64*)VADDR
     1210<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    7881211UINT64 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    7891212<h4>FLAT_ATOMIC_UMAX</h4>
    790 <p>Opcode: 56 (0x38) for GCN 1.1; 71 (0x47) for GCN 1.2<br />
     1213<p>Opcode: 56 (0x38) for GCN 1.1; 71 (0x47) for GCN 1.2/1.4<br />
    7911214Syntax: FLAT_ATOMIC_UMAX VDST, VADDR(2), VDATA<br />
    792 Description: Choose greatest unsigned 32-bit value from VDATA and from VADDR address,
     1215Description: Choose greatest unsigned 32-bit value from VDATA and from memory address,
    7931216and store result to this address.
    7941217If GLC flag is set then return previous value from this address to VDST, otherwise keep
    7951218VDST value. Operation is atomic.<br />
    7961219Operation:<br />
    797 <code>UINT32* VM = (UINT32*)VADDR
     1220<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    7981221UINT32 P = *VM; *VM = MAX(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
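C11 has no atomic maximum, so a host-side sketch of FLAT_ATOMIC_UMAX needs a compare-and-swap loop. The name `flat_atomic_umax` and its parameters are hypothetical. `mem` again stands for the dword at VADDR + INST_OFFSET.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-lane model of FLAT_ATOMIC_UMAX using a CAS loop. */
static uint32_t flat_atomic_umax(_Atomic uint32_t *mem, uint32_t vdata,
                                 bool glc, uint32_t vdst)
{
    uint32_t prev = atomic_load(mem);
    /* Retry until no other writer changed *mem between load and CAS.
       On failure, atomic_compare_exchange_weak stores the current *mem
       into prev, so the new maximum is recomputed on each iteration. */
    while (!atomic_compare_exchange_weak(mem, &prev,
                                         prev > vdata ? prev : vdata)) {
        /* retry */
    }
    return glc ? prev : vdst;
}
```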
    7991222<h4>FLAT_ATOMIC_UMAX_X2</h4>
    800 <p>Opcode: 88 (0x58) for GCN 1.1; 103 (0x67) for GCN 1.2<br />
     1223<p>Opcode: 88 (0x58) for GCN 1.1; 103 (0x67) for GCN 1.2/1.4<br />
    8011224Syntax: FLAT_ATOMIC_UMAX_X2 VDST(2), VADDR(2), VDATA(2)<br />
    802 Description: Choose greatest unsigned 64-bit value from VDATA and from VADDR address,
     1225Description: Choose greatest unsigned 64-bit value from VDATA and from memory address,
    8031226and store result to this address.
    8041227If GLC flag is set then return previous value from this address to VDST, otherwise keep
    8051228VDST value. Operation is atomic.<br />
    8061229Operation:<br />
    807 <code>UINT64* VM = (UINT64*)VADDR
     1230<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    8081231UINT64 P = *VM; *VM = MAX(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    8091232<h4>FLAT_ATOMIC_UMIN</h4>
    810 <p>Opcode: 54 (0x36) for GCN 1.1; 69 (0x45) for GCN 1.2<br />
     1233<p>Opcode: 54 (0x36) for GCN 1.1; 69 (0x45) for GCN 1.2/1.4<br />
    8111234Syntax: FLAT_ATOMIC_UMIN VDST, VADDR(2), VDATA<br />
    812 Description: Choose smallest unsigned 32-bit value from VDATA and from VADDR address,
     1235Description: Choose smallest unsigned 32-bit value from VDATA and from memory address,
    8131236and store result to this address.
    8141237If GLC flag is set then return previous value from this address to VDST, otherwise keep
    8151238VDST value. Operation is atomic.<br />
    8161239Operation:<br />
    817 <code>UINT32* VM = (UINT32*)VADDR
     1240<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    8181241UINT32 P = *VM; *VM = MIN(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    8191242<h4>FLAT_ATOMIC_UMIN_X2</h4>
    820 <p>Opcode: 86 (0x56) for GCN 1.1; 101 (0x65) for GCN 1.2<br />
     1243<p>Opcode: 86 (0x56) for GCN 1.1; 101 (0x65) for GCN 1.2/1.4<br />
    8211244Syntax: FLAT_ATOMIC_UMIN_X2 VDST(2), VADDR(2), VDATA(2)<br />
    822 Description: Choose smallest unsigned 64-bit value from VDATA and from VADDR address,
     1245Description: Choose smallest unsigned 64-bit value from VDATA and from memory address,
    8231246and store result to this address.
    8241247If GLC flag is set then return previous value from this address to VDST, otherwise keep
    8251248VDST value. Operation is atomic.<br />
    8261249Operation:<br />
    827 <code>UINT64* VM = (UINT64*)VADDR
     1250<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    8281251UINT64 P = *VM; *VM = MIN(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
    8291252<h4>FLAT_ATOMIC_XOR</h4>
    830 <p>Opcode: 59 (0x3b) for GCN 1.1; 74 (0x4a) for GCN 1.2<br />
     1253<p>Opcode: 59 (0x3b) for GCN 1.1; 74 (0x4a) for GCN 1.2/1.4<br />
    8311254Syntax: FLAT_ATOMIC_XOR VDST, VADDR(2), VDATA<br />
    832 Description: Do bitwise XOR on VDATA and value of VADDR address,
     1255Description: Do bitwise XOR on VDATA and value of memory address,
    8331256and store result to this address. If GLC flag is set then return previous value
    8341257from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    8351258Operation:<br />
    836 <code>UINT32* VM = (UINT32*)VADDR
     1259<code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET)
    8371260UINT32 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    8381261<h4>FLAT_ATOMIC_XOR_X2</h4>
    839 <p>Opcode: 91 (0x5b) for GCN 1.1; 106 (0x6a) for GCN 1.2<br />
     1262<p>Opcode: 91 (0x5b) for GCN 1.1; 106 (0x6a) for GCN 1.2/1.4<br />
    8401263Syntax: FLAT_ATOMIC_XOR_X2 VDST(2), VADDR(2), VDATA(2)<br />
    841 Description: Do 64-bit bitwise XOR on VDATA and value of VADDR address,
     1264Description: Do 64-bit bitwise XOR on VDATA and value of memory address,
    8421265and store result to this address. If GLC flag is set then return previous value
    8431266from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
    8441267Operation:<br />
    845 <code>UINT64* VM = (UINT64*)VADDR
     1268<code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET)
    8461269UINT64 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
    8471270<h4>FLAT_LOAD_DWORD</h4>
    848 <p>Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2<br />
     1271<p>Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2/1.4<br />
    8491272Syntax: FLAT_LOAD_DWORD VDST, VADDR(2)<br />
    850 Description Load dword to VDST from VADDR address.<br />
    851 Operation:<br />
    852 <code>VDST = *(UINT32*)VADDR</code></p>
     1273Description: Load dword to VDST from memory address.<br />
     1274Operation:<br />
     1275<code>VDST = *(UINT32*)(VADDR + INST_OFFSET)</code></p>
    8531276<h4>FLAT_LOAD_DWORDX2</h4>
    854 <p>Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2<br />
     1277<p>Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2/1.4<br />
    8551278Syntax: FLAT_LOAD_DWORDX2 VDST(2), VADDR(2)<br />
    856 Description Load two dwords to VDST from VADDR address.<br />
    857 Operation:<br />
    858 <code>VDST = *(UINT64*)VADDR</code></p>
     1279Description: Load two dwords to VDST from memory address.<br />
     1280Operation:<br />
     1281<code>VDST = *(UINT64*)(VADDR + INST_OFFSET)</code></p>
    8591282<h4>FLAT_LOAD_DWORDX3</h4>
    860 <p>Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2<br />
     1283<p>Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2/1.4<br />
    8611284Syntax: FLAT_LOAD_DWORDX3 VDST(3), VADDR(2)<br />
    862 Description Load three dwords to VDST from VADDR address.<br />
    863 Operation:<br />
    864 <code>VDST[0] = *(UINT32*)VADDR
    865 VDST[1] = *(UINT32*)(VADDR+4)
    866 VDST[2] = *(UINT32*)(VADDR+8)</code></p>
     1285Description: Load three dwords to VDST from memory address.<br />
     1286Operation:<br />
     1287<code>BYTE* VM = (VADDR + INST_OFFSET)
     1288VDST[0] = *(UINT32*)VM
     1289VDST[1] = *(UINT32*)(VM+4)
     1290VDST[2] = *(UINT32*)(VM+8)</code></p>
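The addressing shared by the multi-dword loads can be sketched as follows. The sketch assumes, although this page does not say so, that VADDR[0] holds the low and VADDR[1] the high 32 bits of the flat address. Memory is simulated by a byte array `mem` whose first byte sits at address `base`. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: VADDR[0] = low dword, VADDR[1] = high dword. */
static uint64_t flat_address(const uint32_t vaddr[2], uint32_t inst_offset)
{
    return (((uint64_t)vaddr[1] << 32) | vaddr[0]) + inst_offset;
}

/* FLAT_LOAD_DWORDX3: three consecutive dwords go to VDST[0..2]. */
static void flat_load_dwordx3(const uint8_t *mem, uint64_t base,
                              const uint32_t vaddr[2], uint32_t inst_offset,
                              uint32_t vdst[3])
{
    const uint8_t *vm = mem + (flat_address(vaddr, inst_offset) - base);
    for (int i = 0; i < 3; i++)
        memcpy(&vdst[i], vm + 4 * i, sizeof(uint32_t)); /* host byte order */
}
```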
    8671291<h4>FLAT_LOAD_DWORDX4</h4>
    868 <p>Opcode: 13 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2<br />
     1292<p>Opcode: 14 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2/1.4<br />
    8691293Syntax: FLAT_LOAD_DWORDX4 VDST(4), VADDR(2)<br />
    870 Description Load four dwords to VDST from VADDR address.<br />
    871 Operation:<br />
    872 <code>VDST[0] = *(UINT32*)VADDR
    873 VDST[1] = *(UINT32*)(VADDR+4)
    874 VDST[2] = *(UINT32*)(VADDR+8)
    875 VDST[3] = *(UINT32*)(VADDR+12)</code></p>
     1294Description: Load four dwords to VDST from memory address.<br />
     1295Operation:<br />
     1296<code>BYTE* VM = (VADDR + INST_OFFSET)
     1297VDST[0] = *(UINT32*)VM
     1298VDST[1] = *(UINT32*)(VM+4)
     1299VDST[2] = *(UINT32*)(VM+8)
     1300VDST[3] = *(UINT32*)(VM+12)</code></p>
    8761301<h4>FLAT_LOAD_SBYTE</h4>
    877 <p>Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2<br />
     1302<p>Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2/1.4<br />
    8781303Syntax: FLAT_LOAD_SBYTE VDST, VADDR(2)<br />
    879 Description: Load byte to VDST from VADDR address with sign extending.<br />
    880 Operation:<br />
    881 <code>VDST = *(INT8*)VADDR</code></p>
     1304Description: Load byte to VDST from memory address with sign extension.<br />
     1305Operation:<br />
     1306<code>VDST = *(INT8*)(VADDR + INST_OFFSET)</code></p>
     1307<h4>FLAT_LOAD_SBYTE_D16</h4>
     1308<p>Opcode: 34 (0x22) for GCN 1.4<br />
     1309Syntax: FLAT_LOAD_SBYTE_D16 VDST, VADDR(2)<br />
     1310Description: Load byte to lower 16-bit part of VDST from
     1311memory address with sign extension.<br />
     1312Operation:<br />
     1313<code>BYTE* VM = (VADDR + INST_OFFSET)
     1314VDST = ((UINT16)*(INT8*)VM) | (VDST&amp;0xffff0000)</code></p>
     1315<h4>FLAT_LOAD_SBYTE_D16_HI</h4>
     1316<p>Opcode: 35 (0x23) for GCN 1.4<br />
     1317Syntax: FLAT_LOAD_SBYTE_D16_HI VDST, VADDR(2)<br />
     1318Description: Load byte to higher 16-bit part of VDST from
     1319memory address with sign extension.<br />
     1320Operation:<br />
     1321<code>BYTE* VM = (VADDR + INST_OFFSET)
     1322VDST = (((UINT32)*(INT8*)VM)&lt;&lt;16) | (VDST&amp;0xffff)</code></p>
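The two SBYTE_D16 variants differ only in which half of VDST receives the sign-extended byte, while the other half is preserved. The sketch below models only the register update and leaves out the memory read. The helper names are hypothetical.

```c
#include <stdint.h>

/* FLAT_LOAD_SBYTE_D16: sign-extended byte into VDST[15:0], VDST[31:16] kept. */
static uint32_t load_sbyte_d16(uint32_t vdst, uint8_t mem_byte)
{
    uint16_t lo = (uint16_t)(int16_t)(int8_t)mem_byte; /* sign extension */
    return (vdst & 0xffff0000u) | lo;
}

/* FLAT_LOAD_SBYTE_D16_HI: same value into VDST[31:16], VDST[15:0] kept. */
static uint32_t load_sbyte_d16_hi(uint32_t vdst, uint8_t mem_byte)
{
    uint16_t hi = (uint16_t)(int16_t)(int8_t)mem_byte;
    return ((uint32_t)hi << 16) | (vdst & 0xffffu);
}
```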
     1323<h4>FLAT_LOAD_SHORT_D16</h4>
     1324<p>Opcode: 36 (0x24) for GCN 1.4<br />
     1325Syntax: FLAT_LOAD_SHORT_D16 VDST, VADDR(2)<br />
     1326Description: Load 16-bit word to lower 16-bit part of VDST from memory address.<br />
     1327Operation:<br />
     1328<code>BYTE* VM = (VADDR + INST_OFFSET)
     1329VDST = *(UINT16*)VM | (VDST &amp; 0xffff0000)</code></p>
     1330<h4>FLAT_LOAD_SHORT_D16_HI</h4>
     1331<p>Opcode: 37 (0x25) for GCN 1.4<br />
     1332Syntax: FLAT_LOAD_SHORT_D16_HI VDST, VADDR(2)<br />
     1333Description: Load 16-bit word to higher 16-bit part of VDST from memory address.<br />
     1334Operation:<br />
     1335<code>BYTE* VM = (VADDR + INST_OFFSET)
     1336VDST = (((UINT32)*(UINT16*)VM)&lt;&lt;16) | (VDST &amp; 0xffff)</code></p>
    8821337<h4>FLAT_LOAD_SSHORT</h4>
    883 <p>Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2<br />
     1338<p>Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2/1.4<br />
    8841339Syntax: FLAT_LOAD_SSHORT VDST, VADDR(2)<br />
    885 Description: Load 16-bit word to VDST from VADDR address with sign extending.<br />
    886 Operation:<br />
    887 <code>VDST = *(INT16*)VADDR</code></p>
     1340Description: Load 16-bit word to VDST from memory address with sign extension.<br />
     1341Operation:<br />
     1342<code>VDST = *(INT16*)(VADDR + INST_OFFSET)</code></p>
    8881343<h4>FLAT_LOAD_UBYTE</h4>
    889 <p>Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2<br />
     1344<p>Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2/1.4<br />
    8901345Syntax: FLAT_LOAD_UBYTE VDST, VADDR(2)<br />
    891 Description: Load byte to VDST from VADDR address with zero extending.<br />
    892 Operation:<br />
    893 <code>VDST = *(UINT8*)VADDR</code></p>
     1346Description: Load byte to VDST from memory address with zero extension.<br />
     1347Operation:<br />
     1348<code>VDST = *(UINT8*)(VADDR + INST_OFFSET)</code></p>
     1349<h4>FLAT_LOAD_UBYTE_D16</h4>
     1350<p>Opcode: 32 (0x20) for GCN 1.4<br />
     1351Syntax: FLAT_LOAD_UBYTE_D16 VDST, VADDR(2)<br />
     1352Description: Load byte to lower 16-bit part of VDST from
     1353memory address with zero extension.<br />
     1354Operation:<br />
     1355<code>BYTE* VM = (VADDR + INST_OFFSET)
     1356VDST = ((UINT16)*(UINT8*)VM) | (VDST&amp;0xffff0000)</code></p>
     1357<h4>FLAT_LOAD_UBYTE_D16_HI</h4>
     1358<p>Opcode: 33 (0x21) for GCN 1.4<br />
     1359Syntax: FLAT_LOAD_UBYTE_D16_HI VDST, VADDR(2)<br />
     1360Description: Load byte to higher 16-bit part of VDST from
     1361memory address with zero extension.<br />
     1362Operation:<br />
     1363<code>BYTE* VM = (VADDR + INST_OFFSET)
     1364VDST = (((UINT32)*(UINT8*)VM)&lt;&lt;16) | (VDST&amp;0xffff)</code></p>
    8941365<h4>FLAT_LOAD_USHORT</h4>
    895 <p>Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2<br />
     1366<p>Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2/1.4<br />
    8961367Syntax: FLAT_LOAD_USHORT VDST, VADDR(2)<br />
    897 Description: Load 16-bit word to VDST from VADDR address with zero extending.<br />
    898 Operation:<br />
    899 <code>VDST = *(UINT16*)VADDR</code></p>
     1368Description: Load 16-bit word to VDST from memory address with zero extension.<br />
     1369Operation:<br />
     1370<code>VDST = *(UINT16*)(VADDR + INST_OFFSET)</code></p>
    9001371<h4>FLAT_STORE_BYTE</h4>
    9011372<p>Opcode: 24 (0x18)<br />
    9021373Syntax: FLAT_STORE_BYTE VADDR(2), VDATA<br />
    903 Description: Store byte from VDATA to VADDR address.<br />
    904 Operation:<br />
    905 <code>*(UINT8*)VADDR = VDATA&amp;0xff</code></p>
     1374Description: Store byte from VDATA to memory address.<br />
     1375Operation:<br />
     1376<code>*(UINT8*)(VADDR + INST_OFFSET) = VDATA&amp;0xff</code></p>
     1377<h4>FLAT_STORE_BYTE_D16_HI</h4>
     1378<p>Opcode: 25 (0x19) for GCN 1.4<br />
     1379Syntax: FLAT_STORE_BYTE_D16_HI VADDR(2), VDATA<br />
     1380Description: Store byte from bits 16-23 of VDATA to memory address.<br />
     1381Operation:<br />
     1382<code>*(UINT8*)(VADDR + INST_OFFSET) = (VDATA&gt;&gt;16)&amp;0xff</code></p>
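The sketch below contrasts the byte written by FLAT_STORE_BYTE with the byte written by FLAT_STORE_BYTE_D16_HI. The helper names are hypothetical, and only the selection of VDATA bits is modelled.

```c
#include <stdint.h>

/* FLAT_STORE_BYTE writes bits 0-7 of VDATA. */
static uint8_t store_byte_value(uint32_t vdata)
{
    return (uint8_t)(vdata & 0xffu);
}

/* FLAT_STORE_BYTE_D16_HI writes bits 16-23 of VDATA. */
static uint8_t store_byte_d16_hi_value(uint32_t vdata)
{
    return (uint8_t)((vdata >> 16) & 0xffu);
}
```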
    9061383<h4>FLAT_STORE_DWORD</h4>
    9071384<p>Opcode: 28 (0x1c)<br />
    9081385Syntax: FLAT_STORE_DWORD VADDR(2), VDATA<br />
    909 Description: Store dword from VDATA to VADDR address.<br />
    910 Operation:<br />
    911 <code>*(UINT32*)VADDR = VDATA</code></p>
     1386Description: Store dword from VDATA to memory address.<br />
     1387Operation:<br />
     1388<code>*(UINT32*)(VADDR + INST_OFFSET) = VDATA</code></p>
    9121389<h4>FLAT_STORE_DWORDX2</h4>
    9131390<p>Opcode: 29 (0x1d)<br />
    9141391Syntax: FLAT_STORE_DWORDX2 VADDR(2), VDATA(2)<br />
    915 Description: Store two dwords from VDATA to VADDR address.<br />
    916 Operation:<br />
    917 <code>*(UINT64*)VADDR = VDATA</code></p>
     1392Description: Store two dwords from VDATA to memory address.<br />
     1393Operation:<br />
     1394<code>*(UINT64*)(VADDR + INST_OFFSET) = VDATA</code></p>
    9181395<h4>FLAT_STORE_DWORDX3</h4>
    919 <p>Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2<br />
     1396<p>Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2/1.4<br />
    9201397Syntax: FLAT_STORE_DWORDX3 VADDR(2), VDATA(3)<br />
    921 Description: Store three dwords from VDATA to VADDR address.<br />
    922 Operation:<br />
    923 <code>*(UINT32*)(VADDR) = VDATA[0]
    924 *(UINT32*)(VADDR+4) = VDATA[1]
    925 *(UINT32*)(VADDR+8) = VDATA[2]</code></p>
     1398Description: Store three dwords from VDATA to memory address.<br />
     1399Operation:<br />
     1400<code>BYTE* VM = (VADDR + INST_OFFSET)
     1401*(UINT32*)(VM) = VDATA[0]
     1402*(UINT32*)(VM+4) = VDATA[1]
     1403*(UINT32*)(VM+8) = VDATA[2]</code></p>
    9261404<h4>FLAT_STORE_DWORDX4</h4>
    927 <p>Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1d) for GCN 1.2<br />
     1405<p>Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1f) for GCN 1.2/1.4<br />
    9281406Syntax: FLAT_STORE_DWORDX4 VADDR(2), VDATA(4)<br />
    929 Description: Store four dwords from VDATA to VADDR address.<br />
    930 Operation:<br />
    931 <code>*(UINT32*)(VADDR) = VDATA[0]
    932 *(UINT32*)(VADDR+4) = VDATA[1]
    933 *(UINT32*)(VADDR+8) = VDATA[2]
    934 *(UINT32*)(VADDR+12) = VDATA[3]</code></p>
     1407Description: Store four dwords from VDATA to memory address.<br />
     1408Operation:<br />
     1409<code>BYTE* VM = (VADDR + INST_OFFSET); *(UINT32*)(VM) = VDATA[0]
     1410*(UINT32*)(VM+4) = VDATA[1]
     1411*(UINT32*)(VM+8) = VDATA[2]
     1412*(UINT32*)(VM+12) = VDATA[3]</code></p>
    9351413<h4>FLAT_STORE_SHORT</h4>
    9361414<p>Opcode: 26 (0x1a)<br />
    9371415Syntax: FLAT_STORE_SHORT VADDR(2), VDATA<br />
    938 Description: Store 16-bit word from VDATA to VADDR address.<br />
    939 Operation:<br />
    940 <code>*(UINT16*)VADDR = VDATA&amp;0xffff</code></p>
     1416Description: Store 16-bit word from VDATA to memory address.<br />
     1417Operation:<br />
     1418<code>*(UINT16*)(VADDR + INST_OFFSET) = VDATA&amp;0xffff</code></p>
     1419<h4>FLAT_STORE_SHORT_D16_HI</h4>
     1420<p>Opcode: 27 (0x1b) for GCN 1.4<br />
     1421Syntax: FLAT_STORE_SHORT_D16_HI VADDR(2), VDATA<br />
     1422Description: Store 16-bit word from higher 16-bit part of VDATA to memory address.<br />
     1423Operation:<br />
     1424<code>*(UINT16*)(VADDR + INST_OFFSET) = VDATA&gt;&gt;16</code></p>
     1425<h4>GLOBAL_ATOMIC_ADD</h4>
     1426<p>Opcode: 66 (0x42) for GCN 1.4<br />
     1427Syntax: GLOBAL_ATOMIC_ADD VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1428Description: Add VDATA to value of global address, and store result to this address.
     1429If GLC flag is set then return previous value from this address to VDST,
     1430otherwise keep VDST value. Operation is atomic.<br />
     1431Operation:<br />
     1432<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1433UINT32 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
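The GLOBAL_* address formula on this page, VADDR + SADDR + INST_OFFSET, can be sketched as a small helper. The `saddr_off` flag is a hypothetical way to model the OFF operand, under the assumption that OFF means no scalar base is added. The helper follows this page's formula literally and is not taken from AMD documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: VADDR + SADDR + INST_OFFSET, SADDR omitted when OFF. */
static uint64_t global_address(uint64_t vaddr, uint64_t saddr, bool saddr_off,
                               uint32_t inst_offset)
{
    uint64_t base = vaddr + (saddr_off ? 0 : saddr);
    return base + inst_offset;
}
```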
     1434<h4>GLOBAL_ATOMIC_ADD_X2</h4>
     1435<p>Opcode: 98 (0x62) for GCN 1.4<br />
     1436Syntax: GLOBAL_ATOMIC_ADD_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1437Description: Add 64-bit VDATA to 64-bit value of global address, and store result
     1438to this address. If GLC flag is set then return previous value from address to VDST,
     1439otherwise keep VDST value. Operation is atomic.<br />
     1440Operation:<br />
     1441<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1442UINT64 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1443<h4>GLOBAL_ATOMIC_AND</h4>
     1444<p>Opcode: 72 (0x48) for GCN 1.4<br />
     1445Syntax: GLOBAL_ATOMIC_AND VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1446Description: Do bitwise AND on VDATA and value of global address,
     1447and store result to this address. If GLC flag is set then return previous value
     1448from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1449Operation:<br />
     1450<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1451UINT32 P = *VM; *VM = *VM &amp; VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1452<h4>GLOBAL_ATOMIC_AND_X2</h4>
     1453<p>Opcode: 104 (0x68) for GCN 1.4<br />
     1454Syntax: GLOBAL_ATOMIC_AND_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1455Description: Do 64-bit bitwise AND on VDATA and value of global address,
     1456and store result to this address. If GLC flag is set then return previous value
     1457from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1458Operation:<br />
     1459<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1460UINT64 P = *VM; *VM = *VM &amp; VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1461<h4>GLOBAL_ATOMIC_CMPSWAP</h4>
     1462<p>Opcode: 65 (0x41) for GCN 1.4<br />
     1463Syntax: GLOBAL_ATOMIC_CMPSWAP VDST, VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1464Description: Store lower VDATA dword into global address if previous value
     1465from that address is equal to VDATA&gt;&gt;32, otherwise keep old value from address.
     1466If GLC flag is set then return previous value from address to VDST,
     1467otherwise keep VDST value. Operation is atomic.<br />
     1468Operation:<br />
     1469<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1470UINT32 P = *VM; *VM = *VM==(VDATA&gt;&gt;32) ? VDATA&amp;0xffffffff : *VM // part of atomic
     1471VDST = (GLC) ? P : VDST // last part of atomic</code></p>
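The VDATA packing of GLOBAL_ATOMIC_CMPSWAP maps onto C11 `atomic_compare_exchange_strong`. As in the Operation above, the low dword is the value to store and the high dword is the value to compare against. The one-lane sketch below uses hypothetical names, and `vdata` holds the register pair as a single 64-bit value with the low dword first.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-lane model of GLOBAL_ATOMIC_CMPSWAP. */
static uint32_t global_atomic_cmpswap(_Atomic uint32_t *mem, uint64_t vdata,
                                      bool glc, uint32_t vdst)
{
    uint32_t src = (uint32_t)(vdata & 0xffffffffu); /* value to store   */
    uint32_t cmp = (uint32_t)(vdata >> 32);         /* value to compare */
    uint32_t prev = cmp;
    /* On success prev still equals cmp, which was the old value.
       On failure the current *mem is written into prev. Either way,
       prev ends up holding the previous memory value. */
    atomic_compare_exchange_strong(mem, &prev, src);
    return glc ? prev : vdst;
}
```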
     1472<h4>GLOBAL_ATOMIC_CMPSWAP_X2</h4>
     1473<p>Opcode: 97 (0x61) for GCN 1.4<br />
     1474Syntax: GLOBAL_ATOMIC_CMPSWAP_X2 VDST(2), VADDR(2), VDATA(4), SADDR(2)|OFF<br />
     1475Description: Store lower VDATA 64-bit word into global address if previous value
     1476from address is equal to higher VDATA 64-bit word (VDATA[2:3]), otherwise keep old value from address.
     1477If GLC flag is set then return previous value from global address to VDST,
     1478otherwise keep VDST value. Operation is atomic.<br />
     1479Operation:<br />
     1480<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1481UINT64 P = *VM; *VM = *VM==(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic
     1482VDST = (GLC) ? P : VDST // last part of atomic</code></p>
     1483<h4>GLOBAL_ATOMIC_DEC</h4>
     1484<p>Opcode: 76 (0x4c) for GCN 1.4<br />
     1485Syntax: GLOBAL_ATOMIC_DEC VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1486Description: Compare value from global address and if it is less than or equal to VDATA
     1487and this value is not zero, then decrement value from global address,
     1488otherwise store VDATA to this address. If GLC flag is set then return previous value
     1489from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1490Operation:<br />
     1491<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1492UINT32 P = *VM; *VM = (*VM &lt;= VDATA &amp;&amp; *VM!=0) ? *VM-1 : VDATA // atomic
     1493VDST = (GLC) ? P : VDST // atomic</code></p>
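The update rule of GLOBAL_ATOMIC_DEC turns the memory word into a countdown that restarts at VDATA. The sketch below models only the arithmetic of the Operation line, using the hypothetical name `atomic_dec_next`. In hardware, the read, update and write happen atomically.

```c
#include <stdint.h>

/* New memory value: decrement while 0 < old <= vdata, otherwise reload vdata. */
static uint32_t atomic_dec_next(uint32_t old, uint32_t vdata)
{
    return (old <= vdata && old != 0) ? old - 1 : vdata;
}
```

With VDATA = 2 the memory word steps through 2, 1, 0, 2, 1, 0, and so on.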
     1494<h4>GLOBAL_ATOMIC_DEC_X2</h4>
     1495<p>Opcode: 108 (0x6c) for GCN 1.4<br />
     1496Syntax: GLOBAL_ATOMIC_DEC_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1497Description: Compare 64-bit value from global address and if it is less than or equal to VDATA
     1498and this value is not zero, then decrement value from global address,
     1499otherwise store VDATA to this address. If GLC flag is set then return previous value
     1500from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1501Operation:<br />
     1502<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1503UINT64 P = *VM; *VM = (*VM &lt;= VDATA &amp;&amp; *VM!=0) ? *VM-1 : VDATA // atomic
     1504VDST = (GLC) ? P : VDST // atomic</code></p>
     1505<h4>GLOBAL_ATOMIC_INC</h4>
     1506<p>Opcode: 75 (0x4b) for GCN 1.4<br />
     1507Syntax: GLOBAL_ATOMIC_INC VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1508Description: Compare value from global address and if less than VDATA,
     1509then increment value from address, otherwise store zero to address.
     1510If GLC flag is set then return previous value from this address to VDST,
     1511otherwise keep VDST value. Operation is atomic.<br />
     1512Operation:<br />
     1513<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1514UINT32 P = *VM; *VM = (*VM &lt; VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p>
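GLOBAL_ATOMIC_INC likewise produces a counter that cycles through 0..VDATA, which can be handy for ring-buffer indices. The sketch models the arithmetic only, under the hypothetical name `atomic_inc_next`.

```c
#include <stdint.h>

/* New memory value: increment while old < vdata, otherwise wrap to zero. */
static uint32_t atomic_inc_next(uint32_t old, uint32_t vdata)
{
    return (old < vdata) ? old + 1 : 0;
}
```

With VDATA = 3 the sequence is 0, 1, 2, 3, 0, so the period is VDATA + 1.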
     1515<h4>GLOBAL_ATOMIC_INC_X2</h4>
     1516<p>Opcode: 107 (0x6b) for GCN 1.4<br />
     1517Syntax: GLOBAL_ATOMIC_INC_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1518Description: Compare 64-bit value from global address and if less than VDATA,
     1519then increment value from address, otherwise store zero to address.
     1520If GLC flag is set then return previous value from this address to VDST,
     1521otherwise keep VDST value. Operation is atomic.<br />
     1522Operation:<br />
     1523<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1524UINT64 P = *VM; *VM = (*VM &lt; VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p>
     1525<h4>GLOBAL_ATOMIC_OR</h4>
     1526<p>Opcode: 73 (0x49) for GCN 1.4<br />
     1527Syntax: GLOBAL_ATOMIC_OR VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1528Description: Do bitwise OR on VDATA and value of global address,
     1529and store result to this address. If GLC flag is set then return previous value
     1530from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1531Operation:<br />
     1532<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1533UINT32 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1534<h4>GLOBAL_ATOMIC_OR_X2</h4>
     1535<p>Opcode: 105 (0x69) for GCN 1.4<br />
     1536Syntax: GLOBAL_ATOMIC_OR_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1537Description: Do 64-bit bitwise OR on VDATA and value of global address,
     1538and store result to this address. If GLC flag is set then return previous value
     1539from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1540Operation:<br />
     1541<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1542UINT64 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1543<h4>GLOBAL_ATOMIC_SMAX</h4>
     1544<p>Opcode: 70 (0x46) for GCN 1.4<br />
     1545Syntax: GLOBAL_ATOMIC_SMAX VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1546Description: Choose greatest signed 32-bit value from VDATA and from global address,
     1547and store result to this address.
     1548If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1549VDST value. Operation is atomic.<br />
     1550Operation:<br />
     1551<code>INT32* VM = (INT32*)(VADDR + SADDR + INST_OFFSET)
     1552UINT32 P = *VM; *VM = MAX(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
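SMAX and UMAX differ only in how the 32-bit patterns are compared. The hypothetical helpers below show a case where the two disagree: the bit pattern 0xffffffff is -1 when read as INT32.

```c
#include <stdint.h>

/* Signed comparison, as in GLOBAL_ATOMIC_SMAX (two's complement assumed). */
static uint32_t smax32(uint32_t mem, uint32_t vdata)
{
    int32_t a = (int32_t)mem, b = (int32_t)vdata;
    return (uint32_t)(a > b ? a : b);
}

/* Unsigned comparison, as in GLOBAL_ATOMIC_UMAX. */
static uint32_t umax32(uint32_t mem, uint32_t vdata)
{
    return mem > vdata ? mem : vdata;
}
```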
     1553<h4>GLOBAL_ATOMIC_SMAX_X2</h4>
     1554<p>Opcode: 102 (0x66) for GCN 1.4<br />
     1555Syntax: GLOBAL_ATOMIC_SMAX_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1556Description: Choose greatest signed 64-bit value from VDATA and from global address,
     1557and store result to this address.
     1558If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1559VDST value. Operation is atomic.<br />
     1560Operation:<br />
     1561<code>INT64* VM = (INT64*)(VADDR + SADDR + INST_OFFSET)
     1562UINT64 P = *VM; *VM = MAX(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1563<h4>GLOBAL_ATOMIC_SMIN</h4>
     1564<p>Opcode: 68 (0x44) for GCN 1.4<br />
     1565Syntax: GLOBAL_ATOMIC_SMIN VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1566Description: Choose smallest signed 32-bit value from VDATA and from global address,
     1567and store result to this address.
     1568If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1569VDST value. Operation is atomic.<br />
     1570Operation:<br />
     1571<code>INT32* VM = (INT32*)(VADDR + SADDR + INST_OFFSET)
     1572UINT32 P = *VM; *VM = MIN(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1573<h4>GLOBAL_ATOMIC_SMIN_X2</h4>
     1574<p>Opcode: 100 (0x64) for GCN 1.4<br />
     1575Syntax: GLOBAL_ATOMIC_SMIN_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1576Description: Choose smallest signed 64-bit value from VDATA and from global address,
     1577and store result to this address.
     1578If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1579VDST value. Operation is atomic.<br />
     1580Operation:<br />
     1581<code>INT64* VM = (INT64*)(VADDR + SADDR + INST_OFFSET)
     1582UINT64 P = *VM; *VM = MIN(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1583<h4>GLOBAL_ATOMIC_SUB</h4>
     1584<p>Opcode: 67 (0x43) for GCN 1.4<br />
     1585Syntax: GLOBAL_ATOMIC_SUB VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1586Description: Subtract VDATA from value of global address, and store result to this address.
     1587If GLC flag is set then return previous value from this address to VDST,
     1588otherwise keep VDST value. Operation is atomic.<br />
     1589Operation:<br />
     1590<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1591UINT32 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1592<h4>GLOBAL_ATOMIC_SUB_X2</h4>
     1593<p>Opcode: 99 (0x63) for GCN 1.4<br />
     1594Syntax: GLOBAL_ATOMIC_SUB_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1595Description: Subtract 64-bit VDATA from 64-bit value of global address, and store result
     1596to this address. If GLC flag is set then return previous value from address to VDST,
     1597otherwise keep VDST value. Operation is atomic.<br />
     1598Operation:<br />
     1599<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1600UINT64 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1601<h4>GLOBAL_ATOMIC_SWAP</h4>
     1602<p>Opcode: 64 (0x40) for GCN 1.4<br />
     1603Syntax: GLOBAL_ATOMIC_SWAP VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1604Description: Store VDATA dword into global address. If GLC flag is set then
     1605return previous value from global address to VDST, otherwise keep old value from VDST.
     1606Operation is atomic.<br />
     1607Operation:<br />
     1608<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1609UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1610<h4>GLOBAL_ATOMIC_SWAP_X2</h4>
     1611<p>Opcode: 96 (0x60) for GCN 1.4<br />
     1612Syntax: GLOBAL_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1613Description: Store VDATA 64-bit word into global address. If GLC flag is set then
     1614return previous value from global address to VDST, otherwise keep old value from VDST.
     1615Operation is atomic.<br />
     1616Operation:<br />
     1617<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1618UINT64 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1619<h4>GLOBAL_ATOMIC_UMAX</h4>
     1620<p>Opcode: 71 (0x47) for GCN 1.4<br />
     1621Syntax: GLOBAL_ATOMIC_UMAX VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1622Description: Choose greatest unsigned 32-bit value from VDATA and from global address,
     1623and store result to this address.
     1624If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1625VDST value. Operation is atomic.<br />
     1626Operation:<br />
     1627<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1628UINT32 P = *VM; *VM = MAX(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1629<h4>GLOBAL_ATOMIC_UMAX_X2</h4>
     1630<p>Opcode: 103 (0x67) for GCN 1.4<br />
     1631Syntax: GLOBAL_ATOMIC_UMAX_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1632Description: Choose greatest unsigned 64-bit value from VDATA and from global address,
     1633and store result to this address.
     1634If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1635VDST value. Operation is atomic.<br />
     1636Operation:<br />
     1637<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1638UINT64 P = *VM; *VM = MAX(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1639<h4>GLOBAL_ATOMIC_UMIN</h4>
     1640<p>Opcode: 69 (0x45) for GCN 1.4<br />
     1641Syntax: GLOBAL_ATOMIC_UMIN VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1642Description: Choose smallest unsigned 32-bit value from VDATA and from global address,
     1643and store result to this address.
     1644If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1645VDST value. Operation is atomic.<br />
     1646Operation:<br />
     1647<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1648UINT32 P = *VM; *VM = MIN(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1649<h4>GLOBAL_ATOMIC_UMIN_X2</h4>
     1650<p>Opcode: 101 (0x65) for GCN 1.4<br />
     1651Syntax: GLOBAL_ATOMIC_UMIN_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1652Description: Choose smallest unsigned 64-bit value from VDATA and from global address,
     1653and store result to this address.
     1654If GLC flag is set then return previous value from this address to VDST, otherwise keep
     1655VDST value. Operation is atomic.<br />
     1656Operation:<br />
     1657<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1658UINT64 P = *VM; *VM = MIN(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p>
     1659<h4>GLOBAL_ATOMIC_XOR</h4>
     1660<p>Opcode: 74 (0x4a) for GCN 1.4<br />
     1661Syntax: GLOBAL_ATOMIC_XOR VDST, VADDR(2), VDATA, SADDR(2)|OFF<br />
     1662Description: Do bitwise XOR on VDATA and value of global address,
     1663and store result to this address. If GLC flag is set then return previous value
     1664from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1665Operation:<br />
     1666<code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET)
     1667UINT32 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
     1668<h4>GLOBAL_ATOMIC_XOR_X2</h4>
     1669<p>Opcode: 106 (0x6a) for GCN 1.4<br />
     1670Syntax: GLOBAL_ATOMIC_XOR_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1671Description: Do 64-bit bitwise XOR on VDATA and value of global address,
     1672and store result to this address. If GLC flag is set then return previous value
     1673from this address to VDST, otherwise keep VDST value. Operation is atomic.<br />
     1674Operation:<br />
     1675<code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET)
     1676UINT64 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p>
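C11 provides a fetch-and-XOR that returns the old value, which corresponds to P in the pseudocode. The sketch below is a hypothetical single-lane model; `model_atomic_xor_x2` is an illustrative name, not part of any real API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-lane model of GLOBAL_ATOMIC_XOR_X2.
 * atomic_fetch_xor performs *vm ^= vdata atomically and returns the
 * value *vm held before the update. */
static uint64_t model_atomic_xor_x2(_Atomic uint64_t *vm, uint64_t vdata,
                                    uint64_t vdst, bool glc)
{
    uint64_t prev = atomic_fetch_xor(vm, vdata);
    return glc ? prev : vdst;   /* GLC=0 leaves VDST unchanged */
}
```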
     1677<h4>GLOBAL_LOAD_DWORD</h4>
     1678<p>Opcode: 20 (0x14) for GCN 1.4<br />
     1679Syntax: GLOBAL_LOAD_DWORD VDST, VADDR(2), SADDR(2)|OFF<br />
     1680Description: Load dword to VDST from global address.<br />
     1681Operation:<br />
     1682<code>VDST = *(UINT32*)(VADDR + SADDR + INST_OFFSET)</code></p>
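All GLOBAL_* operations here share the same address expression. The helper below is a hypothetical sketch of that sum. `saddr_off` models writing `OFF` instead of an SGPR pair, and `inst_offset` is assumed to be the already-decoded OFFSET field (its width and signedness are not modelled).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the VADDR + SADDR + INST_OFFSET sum used by the
 * GLOBAL_* pseudocode. With saddr_off set (SADDR written as OFF), no
 * scalar base is added. Arithmetic wraps modulo 2^64. */
static uint64_t global_address(uint64_t vaddr, uint64_t saddr,
                               bool saddr_off, uint64_t inst_offset)
{
    uint64_t base = saddr_off ? vaddr : vaddr + saddr;
    return base + inst_offset;
}
```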
     1683<h4>GLOBAL_LOAD_DWORDX2</h4>
     1684<p>Opcode: 21 (0x15) for GCN 1.4<br />
     1685Syntax: GLOBAL_LOAD_DWORDX2 VDST(2), VADDR(2), SADDR(2)|OFF<br />
     1686Description: Load two dwords to VDST from global address.<br />
     1687Operation:<br />
     1688<code>VDST = *(UINT64*)(VADDR + SADDR + INST_OFFSET)</code></p>
     1689<h4>GLOBAL_LOAD_DWORDX3</h4>
     1690<p>Opcode: 22 (0x16) for GCN 1.4<br />
     1691Syntax: GLOBAL_LOAD_DWORDX3 VDST(3), VADDR(2), SADDR(2)|OFF<br />
     1692Description: Load three dwords to VDST from global address.<br />
     1693Operation:<br />
     1694<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1695VDST[0] = *(UINT32*)VM
     1696VDST[1] = *(UINT32*)(VM+4)
     1697VDST[2] = *(UINT32*)(VM+8)</code></p>
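The multi-dword loads fill consecutive VGPRs from consecutive dwords in memory. The sketch below is a hypothetical single-lane model. It assumes a byte array standing in for global memory, with `vm` a byte index into it, and reads bytes little-endian as GCN does; `model_load_dwords` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of GLOBAL_LOAD_DWORDX2/X3/X4 for one lane.
 * Dword i is read from vm + 4*i and written to vdst[i], i.e. the i-th
 * register of the VDST range. Bytes are assembled little-endian. */
static void model_load_dwords(const uint8_t *mem, uint64_t vm,
                              uint32_t *vdst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const uint8_t *p = mem + vm + 4 * i;
        vdst[i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                  ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
}
```

With `count = 2` this also describes GLOBAL_LOAD_DWORDX2: the lower dword of the 64-bit value lands in VDST[0].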
     1698<h4>GLOBAL_LOAD_DWORDX4</h4>
     1699<p>Opcode: 23 (0x17) for GCN 1.4<br />
     1700Syntax: GLOBAL_LOAD_DWORDX4 VDST(4), VADDR(2), SADDR(2)|OFF<br />
     1701Description: Load four dwords to VDST from global address.<br />
     1702Operation:<br />
     1703<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1704VDST[0] = *(UINT32*)VM
     1705VDST[1] = *(UINT32*)(VM+4)
     1706VDST[2] = *(UINT32*)(VM+8)
     1707VDST[3] = *(UINT32*)(VM+12)</code></p>
     1708<h4>GLOBAL_LOAD_SBYTE</h4>
     1709<p>Opcode: 17 (0x11) for GCN 1.4<br />
     1710Syntax: GLOBAL_LOAD_SBYTE VDST, VADDR(2), SADDR(2)|OFF<br />
     1711Description: Load byte to VDST from global address with sign extending.<br />
     1712Operation:<br />
     1713<code>VDST = *(INT8*)(VADDR + SADDR + INST_OFFSET)</code></p>
     1714<h4>GLOBAL_LOAD_SBYTE_D16</h4>
     1715<p>Opcode: 34 (0x22) for GCN 1.4<br />
     1716Syntax: GLOBAL_LOAD_SBYTE_D16 VDST, VADDR(2), SADDR(2)|OFF<br />
     1717Description: Load byte to lower 16-bit part of VDST from
     1718global address with sign extending.<br />
     1719Operation:<br />
     1720<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1721VDST = ((UINT16)*(INT8*)VM) | (VDST&amp;0xffff0000)</code></p>
     1722<h4>GLOBAL_LOAD_SBYTE_D16_HI</h4>
     1723<p>Opcode: 35 (0x23) for GCN 1.4<br />
     1724Syntax: GLOBAL_LOAD_SBYTE_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br />
     1725Description: Load byte to higher 16-bit part of VDST from
     1726global address with sign extending.<br />
     1727Operation:<br />
     1728<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1729VDST = (((UINT32)*(INT8*)VM)&lt;&lt;16) | (VDST&amp;0xffff)</code></p>
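The two signed D16 byte loads differ only in which half of VDST receives the sign-extended byte. The sketch below is a hypothetical single-lane model with illustrative function names. Sign extension is done with masks so the code does not depend on implementation-defined signed conversions.

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 16 bits by copying bit 7 upward. */
static uint32_t sext8_to16(uint8_t byte)
{
    return (byte & 0x80u) ? (0xff00u | byte) : byte;
}

/* Hypothetical model of GLOBAL_LOAD_SBYTE_D16: write VDST[15:0]. */
static uint32_t model_load_sbyte_d16(uint8_t byte, uint32_t vdst)
{
    return sext8_to16(byte) | (vdst & 0xffff0000u);  /* keep VDST[31:16] */
}

/* Hypothetical model of GLOBAL_LOAD_SBYTE_D16_HI: write VDST[31:16]. */
static uint32_t model_load_sbyte_d16_hi(uint8_t byte, uint32_t vdst)
{
    return (sext8_to16(byte) << 16) | (vdst & 0xffffu);  /* keep VDST[15:0] */
}
```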
     1730<h4>GLOBAL_LOAD_SHORT_D16</h4>
     1731<p>Opcode: 36 (0x24) for GCN 1.4<br />
     1732Syntax: GLOBAL_LOAD_SHORT_D16 VDST, VADDR(2), SADDR(2)|OFF<br />
     1733Description: Load 16-bit word to lower 16-bit part of VDST from global address.<br />
     1734Operation:<br />
     1735<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1736VDST = *(UINT16*)VM | (VDST &amp; 0xffff0000)</code></p>
     1737<h4>GLOBAL_LOAD_SHORT_D16_HI</h4>
     1738<p>Opcode: 37 (0x25) for GCN 1.4<br />
     1739Syntax: GLOBAL_LOAD_SHORT_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br />
     1740Description: Load 16-bit word to higher 16-bit part of VDST from global address.<br />
     1741Operation:<br />
     1742<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1743VDST = (((UINT32)*(UINT16*)VM)&lt;&lt;16) | (VDST &amp; 0xffff)</code></p>
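Because each D16 load writes one half of VDST and preserves the other, issuing GLOBAL_LOAD_SHORT_D16 and GLOBAL_LOAD_SHORT_D16_HI into the same register packs two 16-bit values into one VGPR. The hypothetical single-lane sketch below shows that merge; the function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical model of GLOBAL_LOAD_SHORT_D16: write VDST[15:0]. */
static uint32_t model_load_short_d16(uint16_t value, uint32_t vdst)
{
    return (uint32_t)value | (vdst & 0xffff0000u);
}

/* Hypothetical model of GLOBAL_LOAD_SHORT_D16_HI: write VDST[31:16]. */
static uint32_t model_load_short_d16_hi(uint16_t value, uint32_t vdst)
{
    return ((uint32_t)value << 16) | (vdst & 0x0000ffffu);
}
```

Applying the low variant and then the high variant to an arbitrary starting value leaves both halves filled from memory.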
     1744<h4>GLOBAL_LOAD_SSHORT</h4>
     1745<p>Opcode: 19 (0x13) for GCN 1.4<br />
     1746Syntax: GLOBAL_LOAD_SSHORT VDST, VADDR(2), SADDR(2)|OFF<br />
     1747Description: Load 16-bit word to VDST from global address with sign extending.<br />
     1748Operation:<br />
     1749<code>VDST = *(INT16*)(VADDR + SADDR + INST_OFFSET)</code></p>
     1750<h4>GLOBAL_LOAD_UBYTE</h4>
     1751<p>Opcode: 16 (0x10) for GCN 1.4<br />
     1752Syntax: GLOBAL_LOAD_UBYTE VDST, VADDR(2), SADDR(2)|OFF<br />
     1753Description: Load byte to VDST from global address with zero extending.<br />
     1754Operation:<br />
     1755<code>VDST = *(UINT8*)(VADDR + SADDR + INST_OFFSET)</code></p>
     1756<h4>GLOBAL_LOAD_UBYTE_D16</h4>
     1757<p>Opcode: 32 (0x20) for GCN 1.4<br />
     1758Syntax: GLOBAL_LOAD_UBYTE_D16 VDST, VADDR(2), SADDR(2)|OFF<br />
     1759Description: Load byte to lower 16-bit part of VDST from
     1760global address with zero extending.<br />
     1761Operation:<br />
     1762<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1763VDST = ((UINT16)*(UINT8*)VM) | (VDST&amp;0xffff0000)</code></p>
     1764<h4>GLOBAL_LOAD_UBYTE_D16_HI</h4>
     1765<p>Opcode: 33 (0x21) for GCN 1.4<br />
     1766Syntax: GLOBAL_LOAD_UBYTE_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br />
     1767Description: Load byte to higher 16-bit part of VDST from
     1768global address with zero extending.<br />
     1769Operation:<br />
     1770<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1771VDST = (((UINT32)*(UINT8*)VM)&lt;&lt;16) | (VDST&amp;0xffff)</code></p>
     1772<h4>GLOBAL_LOAD_USHORT</h4>
     1773<p>Opcode: 18 (0x12) for GCN 1.4<br />
     1774Syntax: GLOBAL_LOAD_USHORT VDST, VADDR(2), SADDR(2)|OFF<br />
     1775Description: Load 16-bit word to VDST from global address with zero extending.<br />
     1776Operation:<br />
     1777<code>VDST = *(UINT16*)(VADDR + SADDR + INST_OFFSET)</code></p>
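The non-D16 narrow loads (SBYTE, UBYTE, SSHORT, USHORT) all fill the whole 32-bit VDST and differ only in how the value is widened. The hypothetical sketch below models each widening rule with masks; the function names are illustrative only.

```c
#include <stdint.h>

/* Zero extension: upper bits become 0. */
static uint32_t model_load_ubyte(uint8_t v)
{
    return v;
}

/* Sign extension: bit 7 is copied into bits 8-31. */
static uint32_t model_load_sbyte(uint8_t v)
{
    return (v & 0x80u) ? (0xffffff00u | v) : v;
}

/* Zero extension: upper bits become 0. */
static uint32_t model_load_ushort(uint16_t v)
{
    return v;
}

/* Sign extension: bit 15 is copied into bits 16-31. */
static uint32_t model_load_sshort(uint16_t v)
{
    return (v & 0x8000u) ? (0xffff0000u | v) : v;
}
```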
     1778<h4>GLOBAL_STORE_BYTE</h4>
     1779<p>Opcode: 24 (0x18) for GCN 1.4<br />
     1780Syntax: GLOBAL_STORE_BYTE VADDR(2), VDATA, SADDR(2)|OFF<br />
     1781Description: Store byte from VDATA to global address.<br />
     1782Operation:<br />
     1783<code>*(UINT8*)(VADDR + SADDR + INST_OFFSET) = VDATA&amp;0xff</code></p>
     1784<h4>GLOBAL_STORE_BYTE_D16_HI</h4>
     1785<p>Opcode: 25 (0x19) for GCN 1.4<br />
     1786Syntax: GLOBAL_STORE_BYTE_D16_HI VADDR(2), VDATA, SADDR(2)|OFF<br />
     1787Description: Store byte from bits 16-23 of VDATA to global address.<br />
     1788Operation:<br />
     1789<code>*(UINT8*)(VADDR + SADDR + INST_OFFSET) = (VDATA&gt;&gt;16)&amp;0xff</code></p>
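The two byte stores pick different bits of VDATA but write exactly one byte of memory. The hypothetical single-lane sketch below assumes a byte array standing in for global memory, with `vm` a byte index into it; the function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical model of GLOBAL_STORE_BYTE: store bits 0-7 of VDATA. */
static void model_store_byte(uint8_t *mem, uint64_t vm, uint32_t vdata)
{
    mem[vm] = (uint8_t)(vdata & 0xffu);
}

/* Hypothetical model of GLOBAL_STORE_BYTE_D16_HI: store bits 16-23. */
static void model_store_byte_d16_hi(uint8_t *mem, uint64_t vm, uint32_t vdata)
{
    mem[vm] = (uint8_t)((vdata >> 16) & 0xffu);
}
```

Neighbouring bytes are left untouched in both cases.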
     1790<h4>GLOBAL_STORE_DWORD</h4>
     1791<p>Opcode: 28 (0x1c) for GCN 1.4<br />
     1792Syntax: GLOBAL_STORE_DWORD VADDR(2), VDATA, SADDR(2)|OFF<br />
     1793Description: Store dword from VDATA to global address.<br />
     1794Operation:<br />
     1795<code>*(UINT32*)(VADDR + SADDR + INST_OFFSET) = VDATA</code></p>
     1796<h4>GLOBAL_STORE_DWORDX2</h4>
     1797<p>Opcode: 29 (0x1d) for GCN 1.4<br />
     1798Syntax: GLOBAL_STORE_DWORDX2 VADDR(2), VDATA(2), SADDR(2)|OFF<br />
     1799Description: Store two dwords from VDATA to global address.<br />
     1800Operation:<br />
     1801<code>*(UINT64*)(VADDR + SADDR + INST_OFFSET) = VDATA</code></p>
     1802<h4>GLOBAL_STORE_DWORDX3</h4>
     1803<p>Opcode: 30 (0x1e) for GCN 1.4<br />
     1804Syntax: GLOBAL_STORE_DWORDX3 VADDR(2), VDATA(3), SADDR(2)|OFF<br />
     1805Description: Store three dwords from VDATA to global address.<br />
     1806Operation:<br />
     1807<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1808*(UINT32*)(VM) = VDATA[0]
     1809*(UINT32*)(VM+4) = VDATA[1]
     1810*(UINT32*)(VM+8) = VDATA[2]</code></p>
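The multi-dword stores are the mirror image of the multi-dword loads: register i of the VDATA range goes to VM+4*i. The hypothetical single-lane sketch below assumes a byte array standing in for global memory and writes bytes little-endian as GCN does; `model_store_dwords` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of GLOBAL_STORE_DWORDX2/X3/X4 for one lane:
 * vdata[i] is written little-endian to bytes vm + 4*i .. vm + 4*i + 3. */
static void model_store_dwords(uint8_t *mem, uint64_t vm,
                               const uint32_t *vdata, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t *p = mem + vm + 4 * i;
        p[0] = (uint8_t)(vdata[i] & 0xffu);
        p[1] = (uint8_t)((vdata[i] >> 8) & 0xffu);
        p[2] = (uint8_t)((vdata[i] >> 16) & 0xffu);
        p[3] = (uint8_t)(vdata[i] >> 24);
    }
}
```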
     1811<h4>GLOBAL_STORE_DWORDX4</h4>
     1812<p>Opcode: 31 (0x1f) for GCN 1.4<br />
     1813Syntax: GLOBAL_STORE_DWORDX4 VADDR(2), VDATA(4), SADDR(2)|OFF<br />
     1814Description: Store four dwords from VDATA to global address.<br />
     1815Operation:<br />
     1816<code>BYTE* VM = (VADDR + SADDR + INST_OFFSET)
     1817*(UINT32*)(VM) = VDATA[0]
     1818*(UINT32*)(VM+4) = VDATA[1]
     1819*(UINT32*)(VM+8) = VDATA[2]
     1820*(UINT32*)(VM+12) = VDATA[3]</code></p>
     1821<h4>GLOBAL_STORE_SHORT</h4>
     1822<p>Opcode: 26 (0x1a) for GCN 1.4<br />
     1823Syntax: GLOBAL_STORE_SHORT VADDR(2), VDATA, SADDR(2)|OFF<br />
     1824Description: Store 16-bit word from VDATA to global address.<br />
     1825Operation:<br />
     1826<code>*(UINT16*)(VADDR + SADDR + INST_OFFSET) = VDATA&amp;0xffff</code></p>
     1827<h4>GLOBAL_STORE_SHORT_D16_HI</h4>
     1828<p>Opcode: 27 (0x1b) for GCN 1.4<br />
     1829Syntax: GLOBAL_STORE_SHORT_D16_HI VADDR(2), VDATA, SADDR(2)|OFF<br />
     1830Description: Store 16-bit word from higher 16-bit part of VDATA to global address.<br />
     1831Operation:<br />
     1832<code>*(UINT16*)(VADDR + SADDR + INST_OFFSET) = VDATA&gt;&gt;16</code></p>
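The two 16-bit stores select the lower or upper half of VDATA and write it as two little-endian bytes. The hypothetical single-lane sketch below assumes a byte array standing in for global memory; `put16` and the model functions are illustrative names.

```c
#include <stdint.h>

/* Write the low 16 bits of half to mem[vm], mem[vm+1] (little-endian). */
static void put16(uint8_t *mem, uint64_t vm, uint32_t half)
{
    mem[vm] = (uint8_t)(half & 0xffu);
    mem[vm + 1] = (uint8_t)((half >> 8) & 0xffu);
}

/* Hypothetical model of GLOBAL_STORE_SHORT: store VDATA[15:0]. */
static void model_store_short(uint8_t *mem, uint64_t vm, uint32_t vdata)
{
    put16(mem, vm, vdata & 0xffffu);
}

/* Hypothetical model of GLOBAL_STORE_SHORT_D16_HI: store VDATA[31:16]. */
static void model_store_short_d16_hi(uint8_t *mem, uint64_t vm, uint32_t vdata)
{
    put16(mem, vm, vdata >> 16);
}
```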
    9411833}}}