source: CLRX/CLRadeonExtender/trunk/doc/GcnInstrsVop2.md @ 3149

Last change on this file since 3149 was 3149, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Describe all 16-bit VOP2 instructions.

File size: 34.0 KB
## GCN ISA VOP2/VOP3 instructions

VOP2 instructions can be encoded in the VOP2 encoding and in the VOP3A/VOP3B encoding.
List of fields for the VOP2 encoding:

Bits  | Name     | Description
------|----------|------------------------------
0-8   | SRC0     | First (scalar or vector) source operand
9-16  | VSRC1    | Second vector source operand
17-24 | VDST     | Destination vector operand
25-30 | OPCODE   | Operation code
31    | ENCODING | Encoding type. Must be 0

Syntax: INSTRUCTION VDST, SRC0, VSRC1
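The following minimal Python sketch packs the fields from the table above into a single 32-bit VOP2 word. The function name `encode_vop2` is hypothetical. The code assumes that a VGPR in the 9-bit SRC0 field is encoded as 256 plus the register index, while VSRC1 and VDST hold plain VGPR indices.

```python
VGPR_BASE = 256  # assumption: SRC0 operand code of VGPR n is 256 + n


def encode_vop2(opcode: int, vdst: int, src0: int, vsrc1: int) -> int:
    """Pack the VOP2 fields into one 32-bit word (bit 31, ENCODING, stays 0)."""
    if not 0 <= opcode < 64:
        raise ValueError("OPCODE is a 6-bit field (bits 25-30)")
    if not 0 <= src0 < 512:
        raise ValueError("SRC0 is a 9-bit field (bits 0-8)")
    if not (0 <= vsrc1 < 256 and 0 <= vdst < 256):
        raise ValueError("VSRC1 and VDST are 8-bit VGPR indices")
    return src0 | (vsrc1 << 9) | (vdst << 17) | (opcode << 25)


# V_ADD_F32 v1, v2, v3 with the GCN 1.0/1.1 opcode 3 (GCN 1.2 uses opcode 1)
word = encode_vop2(3, vdst=1, src0=VGPR_BASE + 2, vsrc1=3)
print(hex(word))  # 0x6020702
```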

List of fields for the VOP3A/VOP3B encoding (GCN 1.0/1.1):

Bits  | Name     | Description
------|----------|------------------------------
0-7   | VDST     | Destination vector operand
8-10  | ABS      | Absolute modifiers for source operands (VOP3A)
8-14  | SDST     | Scalar destination operand (VOP3B)
11    | CLAMP    | CLAMP modifier (VOP3A)
15    | CLAMP    | CLAMP modifier (VOP3B)
17-25 | OPCODE   | Operation code
26-31 | ENCODING | Encoding type. Must be 0b110100
32-40 | SRC0     | First (scalar or vector) source operand
41-49 | SRC1     | Second (scalar or vector) source operand
50-58 | SRC2     | Third (scalar or vector) source operand
59-60 | OMOD     | OMOD modifier. Multiplication modifier
61-63 | NEG      | Negation modifier for source operands

List of fields for the VOP3A/VOP3B encoding (GCN 1.2):

Bits  | Name     | Description
------|----------|------------------------------
0-7   | VDST     | Destination vector operand
8-10  | ABS      | Absolute modifiers for source operands (VOP3A)
8-14  | SDST     | Scalar destination operand (VOP3B)
15    | CLAMP    | CLAMP modifier
16-25 | OPCODE   | Operation code
26-31 | ENCODING | Encoding type. Must be 0b110100
32-40 | SRC0     | First (scalar or vector) source operand
41-49 | SRC1     | Second (scalar or vector) source operand
50-58 | SRC2     | Third (scalar or vector) source operand
59-60 | OMOD     | OMOD modifier. Multiplication modifier
61-63 | NEG      | Negation modifier for source operands

Syntax: INSTRUCTION VDST, SRC0, SRC1 [MODIFIERS]
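The two VOP3A layouts differ only in the position of CLAMP and in the width and position of OPCODE. The hypothetical helper below packs the fields exactly as listed in the tables above. It makes three simplifying choices:

* OMOD is passed as its raw 2-bit field value.
* The SRC operands are passed as 9-bit operand codes, with a VGPR assumed to be 256 plus its index.
* The VOP3B variant, where bits 8-14 hold SDST, is not modeled.

```python
def _field(value: int, width: int, shift: int) -> int:
    """Check that value fits in `width` bits and move it to bit position `shift`."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << shift


def encode_vop3a(gcn12: bool, opcode: int, vdst: int, src0: int, src1: int,
                 src2: int = 0, abs_mask: int = 0, neg_mask: int = 0,
                 clamp: bool = False, omod: int = 0) -> int:
    """Return the 64-bit VOP3A encoding (bits 0-31 form the first dword)."""
    word = _field(vdst, 8, 0) | _field(abs_mask, 3, 8)
    if gcn12:
        word |= _field(int(clamp), 1, 15) | _field(opcode, 10, 16)
    else:
        word |= _field(int(clamp), 1, 11) | _field(opcode, 9, 17)
    word |= _field(0b110100, 6, 26)                      # ENCODING
    word |= _field(src0, 9, 32) | _field(src1, 9, 41) | _field(src2, 9, 50)
    word |= _field(omod, 2, 59) | _field(neg_mask, 3, 61)
    return word


# V_ADD_F32 v1, v2, v3 in VOP3A form, once per GCN generation
print(hex(encode_vop3a(False, 259, 1, 258, 259)))
print(hex(encode_vop3a(True, 257, 1, 258, 259)))
```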

Modifiers:

* CLAMP - clamps the destination floating-point value to the range 0.0-1.0
* MUL:2, MUL:4, DIV:2 - OMOD modifiers. Multiply the destination floating-point value by
2.0, 4.0 or 0.5 respectively. Clamping is applied after the OMOD modifier.
* -SRC - negates the floating-point value of the source operand. Applied after the ABS modifier.
* ABS(SRC), |SRC| - applies the absolute value to the source operand

NOTE: The OMOD modifier doesn't work if output denormals are allowed
(bit 5 of the MODE register for single precision or bit 7 for double precision).  
NOTE: The OMOD and CLAMP modifiers affect only instructions whose output is
a floating-point value.  
NOTE: ABS and negation are applied to source operands for every instruction.  
NOTE: The OMOD modifier doesn't work for half-precision (FP16) instructions (except V_MAC_F16).

Negation and absolute value can be combined: `-ABS(V0)`. The CLAMP and
OMOD (MUL:2, MUL:4 and DIV:2) modifiers can be given in any order.
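The ordering rules above can be sketched in Python as plain floating-point helpers: ABS comes before negation on sources, and OMOD comes before CLAMP on the result. Some simplifications apply:

* The function names are hypothetical.
* The arithmetic uses Python's double precision rather than the instruction's own precision.
* The denormal and FP16 restrictions from the notes are not modeled.

```python
from typing import Optional

OMOD_FACTORS = {None: 1.0, "MUL:2": 2.0, "MUL:4": 4.0, "DIV:2": 0.5}


def apply_src_modifiers(x: float, abs_: bool = False, neg: bool = False) -> float:
    """Source modifiers: ABS first, then negation, so both together give -ABS(SRC)."""
    if abs_:
        x = abs(x)
    return -x if neg else x


def apply_dst_modifiers(result: float, omod: Optional[str] = None,
                        clamp: bool = False) -> float:
    """Destination modifiers: OMOD scales the result, then CLAMP limits it to 0.0-1.0."""
    result *= OMOD_FACTORS[omod]
    if clamp:
        # A NaN passes through unchanged here (every comparison with NaN is false,
        # so min/max return it); the hardware behavior for NaN is not described above.
        result = min(max(result, 0.0), 1.0)
    return result


# V_ADD_F32 with -|SRC0|, MUL:2 and CLAMP, for SRC0 = 0.75 and SRC1 = 1.0
src0 = apply_src_modifiers(0.75, abs_=True, neg=True)             # -0.75
print(apply_dst_modifiers(src0 + 1.0, omod="MUL:2", clamp=True))  # 0.5
```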

Limitations for operands:

* Only one SGPR can be read by an instruction. Multiple occurrences of the same
SGPR are allowed.
* Only one literal constant can be used, and only when no SGPR or M0 is used in the
source operands.
* Only SRC0 can hold LDS_DIRECT.

Unaligned pairs of SGPRs are allowed in source and destination operands.
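The following hedged sketch checks source operands against the limitations listed above. The operand spelling is a hypothetical convention for this example, not the CLRX assembler's internal representation:

* `'v5'` is a VGPR and `'s3'` is an SGPR.
* `'m0'` is M0 and `'lit'` is a literal constant.
* `'lds_direct'` is LDS_DIRECT.

SGPR pairs and other operand kinds are not handled.

```python
def check_vop_operands(srcs: list[str]) -> None:
    """Raise ValueError if the source operands (SRC0 first) break the rules above."""
    sgprs = {op for op in srcs if op.startswith("s")}
    if len(sgprs) > 1:
        raise ValueError("only one SGPR can be read by an instruction")
    literal_count = srcs.count("lit")
    if literal_count > 1:
        raise ValueError("only one literal constant can be used")
    if literal_count and (sgprs or "m0" in srcs):
        raise ValueError("a literal cannot be combined with an SGPR or M0")
    if "lds_direct" in srcs[1:]:
        raise ValueError("only SRC0 can hold LDS_DIRECT")


check_vop_operands(["s3", "v1", "s3"])  # the same SGPR twice is allowed
```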

VOP2 opcodes (0-63) map to VOP3 opcodes 256-319 (VOP3 opcode = VOP2 opcode + 256).
List of the instructions by opcode:

 Opcode     | Opcode (VOP3) | Mnemonic (GCN 1.0/1.1) | Mnemonic (GCN 1.2)
------------|---------------|------------------------|------------------------
 0 (0x0)    | 256 (0x100) | V_CNDMASK_B32        | V_CNDMASK_B32
 1 (0x1)    | 257 (0x101) | V_READLANE_B32       | V_ADD_F32
 2 (0x2)    | 258 (0x102) | V_WRITELANE_B32      | V_SUB_F32
 3 (0x3)    | 259 (0x103) | V_ADD_F32            | V_SUBREV_F32
 4 (0x4)    | 260 (0x104) | V_SUB_F32            | V_MUL_LEGACY_F32
 5 (0x5)    | 261 (0x105) | V_SUBREV_F32         | V_MUL_F32
 6 (0x6)    | 262 (0x106) | V_MAC_LEGACY_F32     | V_MUL_I32_I24
 7 (0x7)    | 263 (0x107) | V_MUL_LEGACY_F32     | V_MUL_HI_I32_I24
 8 (0x8)    | 264 (0x108) | V_MUL_F32            | V_MUL_U32_U24
 9 (0x9)    | 265 (0x109) | V_MUL_I32_I24        | V_MUL_HI_U32_U24
 10 (0xa)   | 266 (0x10a) | V_MUL_HI_I32_I24     | V_MIN_F32
 11 (0xb)   | 267 (0x10b) | V_MUL_U32_U24        | V_MAX_F32
 12 (0xc)   | 268 (0x10c) | V_MUL_HI_U32_U24     | V_MIN_I32
 13 (0xd)   | 269 (0x10d) | V_MIN_LEGACY_F32     | V_MAX_I32
 14 (0xe)   | 270 (0x10e) | V_MAX_LEGACY_F32     | V_MIN_U32
 15 (0xf)   | 271 (0x10f) | V_MIN_F32            | V_MAX_U32
 16 (0x10)  | 272 (0x110) | V_MAX_F32            | V_LSHRREV_B32
 17 (0x11)  | 273 (0x111) | V_MIN_I32            | V_ASHRREV_I32
 18 (0x12)  | 274 (0x112) | V_MAX_I32            | V_LSHLREV_B32
 19 (0x13)  | 275 (0x113) | V_MIN_U32            | V_AND_B32
 20 (0x14)  | 276 (0x114) | V_MAX_U32            | V_OR_B32
 21 (0x15)  | 277 (0x115) | V_LSHR_B32           | V_XOR_B32
 22 (0x16)  | 278 (0x116) | V_LSHRREV_B32        | V_MAC_F32
 23 (0x17)  | 279 (0x117) | V_ASHR_I32           | V_MADMK_F32
 24 (0x18)  | 280 (0x118) | V_ASHRREV_I32        | V_MADAK_F32
 25 (0x19)  | 281 (0x119) | V_LSHL_B32           | V_ADD_U32 (VOP3B)
 26 (0x1a)  | 282 (0x11a) | V_LSHLREV_B32        | V_SUB_U32 (VOP3B)
 27 (0x1b)  | 283 (0x11b) | V_AND_B32            | V_SUBREV_U32 (VOP3B)
 28 (0x1c)  | 284 (0x11c) | V_OR_B32             | V_ADDC_U32 (VOP3B)
 29 (0x1d)  | 285 (0x11d) | V_XOR_B32            | V_SUBB_U32 (VOP3B)
 30 (0x1e)  | 286 (0x11e) | V_BFM_B32            | V_SUBBREV_U32 (VOP3B)
 31 (0x1f)  | 287 (0x11f) | V_MAC_F32            | V_ADD_F16
 32 (0x20)  | 288 (0x120) | V_MADMK_F32          | V_SUB_F16
 33 (0x21)  | 289 (0x121) | V_MADAK_F32          | V_SUBREV_F16
 34 (0x22)  | 290 (0x122) | V_BCNT_U32_B32       | V_MUL_F16
 35 (0x23)  | 291 (0x123) | V_MBCNT_LO_U32_B32   | V_MAC_F16
 36 (0x24)  | 292 (0x124) | V_MBCNT_HI_U32_B32   | V_MADMK_F16
 37 (0x25)  | 293 (0x125) | V_ADD_I32 (VOP3B)    | V_MADAK_F16
 38 (0x26)  | 294 (0x126) | V_SUB_I32 (VOP3B)    | V_ADD_U16
 39 (0x27)  | 295 (0x127) | V_SUBREV_I32 (VOP3B) | V_SUB_U16
 40 (0x28)  | 296 (0x128) | V_ADDC_U32 (VOP3B)   | V_SUBREV_U16
 41 (0x29)  | 297 (0x129) | V_SUBB_U32 (VOP3B)   | V_MUL_LO_U16
 42 (0x2a)  | 298 (0x12a) | V_SUBBREV_U32 (VOP3B)| V_LSHLREV_B16
 43 (0x2b)  | 299 (0x12b) | V_LDEXP_F32          | V_LSHRREV_B16
 44 (0x2c)  | 300 (0x12c) | V_CVT_PKACCUM_U8_F32 | V_ASHRREV_I16
 45 (0x2d)  | 301 (0x12d) | V_CVT_PKNORM_I16_F32 | V_MAX_F16
 46 (0x2e)  | 302 (0x12e) | V_CVT_PKNORM_U16_F32 | V_MIN_F16
 47 (0x2f)  | 303 (0x12f) | V_CVT_PKRTZ_F16_F32  | V_MAX_U16
 48 (0x30)  | 304 (0x130) | V_CVT_PK_U16_U32     | V_MAX_I16
 49 (0x31)  | 305 (0x131) | V_CVT_PK_I16_I32     | V_MIN_U16
 50 (0x32)  | 306 (0x132) | --                   | V_MIN_I16
 51 (0x33)  | 307 (0x133) | --                   | V_LDEXP_F16

### Instruction set

Alphabetically sorted instruction list:

#### V_ADD_F16

Opcode VOP2: 31 (0x1f) for GCN 1.2  
Opcode VOP3A: 287 (0x11f) for GCN 1.2  
Syntax: V_ADD_F16 VDST, SRC0, SRC1  
Description: Add two FP16 values from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = ASHALF(SRC0) + ASHALF(SRC1)
```

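The FP16 instructions can be modeled in Python with the standard `struct` module, whose `'e'` format converts to and from IEEE binary16 with round-to-nearest-even. This sketch rests on assumptions: the hardware rounding mode and FP16 denormal handling (both controlled by the MODE register) are not modeled, and the helper names are hypothetical. The sum of two binary16 values is exact in a Python float (binary64), so a single rounding at the end gives the correctly rounded FP16 sum.

```python
import struct


def as_half(bits: int) -> float:
    """ASHALF: reinterpret the low 16 bits as an IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xffff))[0]


def half_bits(x: float) -> int:
    """Round x to binary16 (nearest even); values too large become +/-infinity."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses finite values beyond the binary16 range
        return 0xfc00 if x < 0 else 0x7c00


def v_add_f16(src0: int, src1: int) -> int:
    return half_bits(as_half(src0) + as_half(src1))


print(hex(v_add_f16(0x3c00, 0x3800)))  # 1.0 + 0.5 -> 0x3e00
```

The same pattern covers V_SUB_F16 and V_MUL_F16: replace `+` with `-` or `*`. The product of two binary16 values is also exact in binary64.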
#### V_ADD_F32

Opcode VOP2: 3 (0x3) for GCN 1.0/1.1; 1 (0x1) for GCN 1.2  
Opcode VOP3A: 259 (0x103) for GCN 1.0/1.1; 257 (0x101) for GCN 1.2  
Syntax: V_ADD_F32 VDST, SRC0, SRC1  
Description: Add two FP values from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = ASFLOAT(SRC0) + ASFLOAT(SRC1)
```

#### V_ADD_I32, V_ADD_U32

Opcode VOP2: 37 (0x25) for GCN 1.0/1.1; 25 (0x19) for GCN 1.2  
Opcode VOP3B: 293 (0x125) for GCN 1.0/1.1; 281 (0x119) for GCN 1.2  
Syntax VOP2 GCN 1.0/1.1: V_ADD_I32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.0/1.1: V_ADD_I32 VDST, SDST(2), SRC0, SRC1  
Syntax VOP2 GCN 1.2: V_ADD_U32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.2: V_ADD_U32 VDST, SDST(2), SRC0, SRC1  
Description: Add SRC0 to SRC1, store the result in VDST, and store the carry flag in
the bit of SDST (or VCC) whose number equals the lane id. SDST is 64-bit.
Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 temp = (UINT64)SRC0 + (UINT64)SRC1
VDST = temp
SDST = 0
UINT64 mask = (1ULL<<LANEID)
SDST = (SDST&~mask) | ((temp >> 32) ? mask : 0)
```

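The following Python sketch makes the per-lane carry behavior concrete. It executes V_ADD_I32 (V_ADD_U32 on GCN 1.2) for a whole 64-lane wavefront. It rests on these assumptions:

* The function name and the list-of-lanes representation are hypothetical.
* Operands are already-resolved 32-bit unsigned values per lane.
* Lanes disabled in the EXEC mask keep their old VDST value.

```python
MASK32 = 0xffffffff


def v_add_i32_wave(vdst: list[int], src0: list[int], src1: list[int],
                   exec_mask: int) -> tuple[list[int], int]:
    """Return (new VDST lanes, 64-bit SDST/VCC carry mask) for one wavefront."""
    out = list(vdst)
    sdst = 0                                # bits of inactive lanes stay zero
    for lane in range(64):
        if (exec_mask >> lane) & 1:
            temp = src0[lane] + src1[lane]
            out[lane] = temp & MASK32
            sdst |= (temp >> 32) << lane    # carry-out of this lane
    return out, sdst


# Lanes 0-3 are active; odd lanes overflow because 0xffffffff + 1 wraps to 0.
vd, vcc = v_add_i32_wave([7] * 64, [MASK32] * 64, [lane % 2 for lane in range(64)], 0xf)
print(hex(vcc))  # 0xa
```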
#### V_ADD_U16

Opcode VOP2: 38 (0x26) for GCN 1.2  
Opcode VOP3A: 294 (0x126) for GCN 1.2  
Syntax: V_ADD_U16 VDST, SRC0, SRC1  
Description: Add two 16-bit unsigned values from SRC0 and SRC1 and
store the 16-bit unsigned result in VDST.  
Operation:  
```
VDST = (SRC0 + SRC1) & 0xffff
```

#### V_ADDC_U32

Opcode VOP2: 40 (0x28) for GCN 1.0/1.1; 28 (0x1c) for GCN 1.2  
Opcode VOP3B: 296 (0x128) for GCN 1.0/1.1; 284 (0x11c) for GCN 1.2  
Syntax VOP2: V_ADDC_U32 VDST, VCC, SRC0, SRC1, VCC  
Syntax VOP3B: V_ADDC_U32 VDST, SDST(2), SRC0, SRC1, SSRC2(2)  
Description: Add SRC0, SRC1 and the carry taken from the bit of SSRC2 (or VCC) whose number
equals the lane id. Store the result in VDST, and store the carry flag in the bit of
SDST (or VCC) whose number equals the lane id. SDST and SSRC2 are 64-bit.
Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 mask = (1ULL<<LANEID)
UINT8 CC = ((SSRC2&mask) ? 1 : 0)
UINT64 temp = (UINT64)SRC0 + (UINT64)SRC1 + CC
SDST = 0
VDST = temp
SDST = (SDST&~mask) | ((temp >> 32) ? mask : 0)
```

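A typical use of V_ADDC_U32 is the high half of a 64-bit addition, chained after V_ADD_I32 through VCC. The Python sketch below models a single lane. The function names are hypothetical, and the register assignment in the comments (v[0:1] = v[2:3] + v[4:5], GCN 1.0/1.1 mnemonics) is illustrative only.

```python
MASK32 = 0xffffffff


def v_add_i32(src0: int, src1: int) -> tuple[int, int]:
    temp = src0 + src1
    return temp & MASK32, temp >> 32             # (VDST, carry bit of this lane)


def v_addc_u32(src0: int, src1: int, carry_in: int) -> tuple[int, int]:
    temp = src0 + src1 + carry_in
    return temp & MASK32, temp >> 32             # (VDST, carry bit of this lane)


def add_u64(a: int, b: int) -> int:
    # v_add_i32  v0, vcc, v2, v4       ; low halves, carry -> VCC
    lo, carry = v_add_i32(a & MASK32, b & MASK32)
    # v_addc_u32 v1, vcc, v3, v5, vcc  ; high halves plus the carry from VCC
    hi, _ = v_addc_u32(a >> 32, b >> 32, carry)
    return (hi << 32) | lo                       # the sum modulo 2**64


print(hex(add_u64(0x1ffffffff, 1)))  # 0x200000000
```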
#### V_AND_B32

Opcode VOP2: 27 (0x1b) for GCN 1.0/1.1; 19 (0x13) for GCN 1.2  
Opcode VOP3A: 283 (0x11b) for GCN 1.0/1.1; 275 (0x113) for GCN 1.2  
Syntax: V_AND_B32 VDST, SRC0, SRC1  
Description: Do a bitwise AND on SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = SRC0 & SRC1
```

#### V_ASHR_I32

Opcode VOP2: 23 (0x17) for GCN 1.0/1.1  
Opcode VOP3A: 279 (0x117) for GCN 1.0/1.1  
Syntax: V_ASHR_I32 VDST, SRC0, SRC1  
Description: Arithmetic shift right SRC0 by (SRC1&31) bits and store the result in VDST.  
Operation:  
```
VDST = (INT32)SRC0 >> (SRC1&31)
```

#### V_ASHRREV_I16

Opcode VOP2: 44 (0x2c) for GCN 1.2  
Opcode VOP3A: 300 (0x12c) for GCN 1.2  
Syntax: V_ASHRREV_I16 VDST, SRC0, SRC1  
Description: Arithmetic shift right the signed 16-bit value from SRC1 by (SRC0&15) bits and
store the 16-bit signed result in VDST.  
Operation:  
```
VDST = ((INT16)SRC1 >> (SRC0&15)) & 0xffff
```

#### V_ASHRREV_I32

Opcode VOP2: 24 (0x18) for GCN 1.0/1.1; 17 (0x11) for GCN 1.2  
Opcode VOP3A: 280 (0x118) for GCN 1.0/1.1; 273 (0x111) for GCN 1.2  
Syntax: V_ASHRREV_I32 VDST, SRC0, SRC1  
Description: Arithmetic shift right SRC1 by (SRC0&31) bits and store the result in VDST.  
Operation:  
```
VDST = (INT32)SRC1 >> (SRC0&31)
```

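The REV variants only swap operand roles: the shift amount comes from SRC0 and the shifted value from SRC1. This presumably exists because, in the VOP2 encoding, only SRC0 may be a scalar operand or constant (VSRC1 must be a VGPR). The REV form therefore lets a constant shift amount stay in the compact VOP2 encoding. Below is a Python sketch with hypothetical helper names:

```python
def to_i32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed (two's complement) integer."""
    x &= 0xffffffff
    return x - (1 << 32) if x & 0x80000000 else x


def v_ashr_i32(src0: int, src1: int) -> int:
    # Python's >> on negative ints is arithmetic, i.e. it replicates the sign bit.
    return (to_i32(src0) >> (src1 & 31)) & 0xffffffff


def v_ashrrev_i32(src0: int, src1: int) -> int:
    return v_ashr_i32(src1, src0)              # same operation, operands swapped


print(hex(v_ashrrev_i32(36, 0x80000000)))  # 0xf8000000 (shift amount 36 & 31 = 4)
```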
#### V_BCNT_U32_B32

Opcode VOP2: 34 (0x22) for GCN 1.0/1.1  
Opcode VOP3A: 290 (0x122) for GCN 1.0/1.1  
Syntax: V_BCNT_U32_B32 VDST, SRC0, SRC1  
Description: Count the set bits in SRC0, add SRC1, and store the result in VDST.  
Operation:  
```
VDST = SRC1 + BITCOUNT(SRC0)
```

#### V_BFM_B32

Opcode VOP2: 30 (0x1e) for GCN 1.0/1.1  
Opcode VOP3A: 286 (0x11e) for GCN 1.0/1.1  
Syntax: V_BFM_B32 VDST, SRC0, SRC1  
Description: Make a 32-bit bitmask of (SRC0&31) set bits starting at bit (SRC1&31) and
store it in VDST.  
Operation:  
```
VDST = ((1U << (SRC0&31))-1) << (SRC1&31)
```

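Here is a short Python check of the V_BFM_B32 formula. The function name is hypothetical, and the final mask models the 32-bit truncation implied by the `1U` arithmetic in the pseudocode.

```python
def v_bfm_b32(src0: int, src1: int) -> int:
    width = src0 & 31                   # number of set bits
    offset = src1 & 31                  # position of the lowest set bit
    return (((1 << width) - 1) << offset) & 0xffffffff


print(hex(v_bfm_b32(4, 8)))  # 0xf00
```

Bits shifted past bit 31 are dropped, so `v_bfm_b32(8, 28)` keeps only bits 28-31. A width of 32 cannot be requested, because SRC0 is masked with 31.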
#### V_CNDMASK_B32

Opcode VOP2: 0 (0x0) for GCN 1.0/1.1; 0 (0x0) for GCN 1.2  
Opcode VOP3A: 256 (0x100) for GCN 1.0/1.1; 256 (0x100) for GCN 1.2  
Syntax VOP2: V_CNDMASK_B32 VDST, SRC0, SRC1, VCC  
Syntax VOP3A: V_CNDMASK_B32 VDST, SRC0, SRC1, SSRC2(2)  
Description: If the bit for the current lane in VCC (or SSRC2) is set, store SRC1 in VDST;
otherwise store SRC0 in VDST.  
Operation:  
```
VDST = SSRC2&(1ULL<<LANEID) ? SRC1 : SRC0
```

#### V_CVT_PK_I16_I32

Opcode VOP2: 49 (0x31) for GCN 1.0/1.1  
Opcode VOP3A: 305 (0x131) for GCN 1.0/1.1  
Syntax: V_CVT_PK_I16_I32 VDST, SRC0, SRC1  
Description: Convert the signed values from SRC0 and SRC1 to signed 16-bit values with
clamping. Store the first value in the low 16 bits and the second in the high 16 bits of VDST.  
Operation:  
```
INT16 D0 = MAX(MIN((INT32)SRC0, 0x7fff), -0x8000)
INT16 D1 = MAX(MIN((INT32)SRC1, 0x7fff), -0x8000)
VDST = (UINT16)D0 | (((UINT32)(UINT16)D1) << 16)
```

#### V_CVT_PK_U16_U32

Opcode VOP2: 48 (0x30) for GCN 1.0/1.1  
Opcode VOP3A: 304 (0x130) for GCN 1.0/1.1  
Syntax: V_CVT_PK_U16_U32 VDST, SRC0, SRC1  
Description: Convert the unsigned values from SRC0 and SRC1 to unsigned 16-bit values with
clamping. Store the first value in the low 16 bits and the second in the high 16 bits of VDST.  
Operation:  
```
UINT16 D0 = MIN(SRC0, 0xffff)
UINT16 D1 = MIN(SRC1, 0xffff)
VDST = D0 | (((UINT32)D1) << 16)
```

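Here is a Python sketch of both packing conversions, with hypothetical function names. Each signed 16-bit result is masked to 16 bits before packing, so a negative D0 does not sign-extend into the high half.

```python
def to_i32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed (two's complement) integer."""
    x &= 0xffffffff
    return x - (1 << 32) if x & 0x80000000 else x


def v_cvt_pk_i16_i32(src0: int, src1: int) -> int:
    def sat(v: int) -> int:
        # clamp to [-0x8000, 0x7fff], then keep the 16-bit two's complement pattern
        return max(min(to_i32(v), 0x7fff), -0x8000) & 0xffff
    return sat(src0) | (sat(src1) << 16)


def v_cvt_pk_u16_u32(src0: int, src1: int) -> int:
    def sat(v: int) -> int:
        return min(v & 0xffffffff, 0xffff)     # clamp to [0, 0xffff]
    return sat(src0) | (sat(src1) << 16)


print(hex(v_cvt_pk_i16_i32(0xffffffff, 40000)))  # -1 and 40000 -> 0x7fffffff
```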
#### V_CVT_PKACCUM_U8_F32

Opcode VOP2: 44 (0x2c) for GCN 1.0/1.1  
Opcode VOP3A: 300 (0x12c) for GCN 1.0/1.1  
Syntax: V_CVT_PKACCUM_U8_F32 VDST, SRC0, SRC1  
Description: Convert the floating-point value from SRC0 to an unsigned byte, using the
rounding mode from the MODE register, and store this byte in the (SRC1&3)'th byte of VDST.
The other bytes of VDST are left unchanged.  
Operation:  
```
UINT8 shift = ((SRC1&3) * 8)
UINT32 mask = 0xff << shift
FLOAT f = RNDINT(ASFLOAT(SRC0))
UINT8 VAL8 = 0
if (!ISNAN(f))
    VAL8 = (UINT8)MAX(MIN(f, 255.0), 0.0)
VDST = (VDST&~mask) | (((UINT32)VAL8) << shift)
```

#### V_CVT_PKNORM_I16_F32

Opcode VOP2: 45 (0x2d) for GCN 1.0/1.1  
Opcode VOP3A: 301 (0x12d) for GCN 1.0/1.1  
Syntax: V_CVT_PKNORM_I16_F32 VDST, SRC0, SRC1  
Description: Convert the normalized FP values from SRC0 and SRC1 to signed 16-bit integers with
rounding to nearest even (unverified). Store the first value in the low 16 bits and
the second in the high 16 bits of VDST.  
Operation:  
```
INT16 roundNorm(FLOAT S)
{
    FLOAT f = RNDNEINT(S*32767.0)
    if (ISNAN(f))
        return 0
    return (INT16)MAX(MIN(f, 32767.0), -32767.0)
}
VDST = (UINT16)roundNorm(ASFLOAT(SRC0)) | ((UINT32)(UINT16)roundNorm(ASFLOAT(SRC1)) << 16)
```

#### V_CVT_PKNORM_U16_F32

Opcode VOP2: 46 (0x2e) for GCN 1.0/1.1  
Opcode VOP3A: 302 (0x12e) for GCN 1.0/1.1  
Syntax: V_CVT_PKNORM_U16_F32 VDST, SRC0, SRC1  
Description: Convert the normalized FP values from SRC0 and SRC1 to unsigned 16-bit integers with
rounding to nearest even (unverified). Store the first value in the low 16 bits and
the second in the high 16 bits of VDST.  
Operation:  
```
UINT16 roundNorm(FLOAT S)
{
    FLOAT f = RNDNEINT(S*65535.0)
    if (ISNAN(f))
        return 0
    return (UINT16)MAX(MIN(f, 65535.0), 0.0)
}
VDST = roundNorm(ASFLOAT(SRC0)) | ((UINT32)roundNorm(ASFLOAT(SRC1)) << 16)
```

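Here is a Python sketch of the two normalized packing conversions. It rests on these assumptions:

* Rounding is round to nearest even (Python's built-in `round` rounds halfway cases to even).
* `S*scale` is computed in double rather than single precision.
* The function names are hypothetical.

Clamping before rounding gives the same result as rounding before clamping here, because the limits are integers. It also avoids calling `round` on infinity.

```python
import math
import struct


def as_float(bits: int) -> float:
    """ASFLOAT: reinterpret 32 bits as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xffffffff))[0]


def round_norm(s: float, scale: float, lo: float, hi: float) -> int:
    f = s * scale
    if math.isnan(f):
        return 0
    return round(max(min(f, hi), lo))


def v_cvt_pknorm_i16_f32(src0: int, src1: int) -> int:
    d0 = round_norm(as_float(src0), 32767.0, -32767.0, 32767.0) & 0xffff
    d1 = round_norm(as_float(src1), 32767.0, -32767.0, 32767.0) & 0xffff
    return d0 | (d1 << 16)


def v_cvt_pknorm_u16_f32(src0: int, src1: int) -> int:
    d0 = round_norm(as_float(src0), 65535.0, 0.0, 65535.0)
    d1 = round_norm(as_float(src1), 65535.0, 0.0, 65535.0)
    return d0 | (d1 << 16)


print(hex(v_cvt_pknorm_i16_f32(0x3f800000, 0xbf800000)))  # 1.0, -1.0 -> 0x80017fff
```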
#### V_CVT_PKRTZ_F16_F32

Opcode VOP2: 47 (0x2f) for GCN 1.0/1.1  
Opcode VOP3A: 303 (0x12f) for GCN 1.0/1.1  
Syntax: V_CVT_PKRTZ_F16_F32 VDST, SRC0, SRC1  
Description: Convert the FP values from SRC0 and SRC1 to half-precision floating-point values with
rounding toward zero. Store the first value in the low 16 bits and
the second in the high 16 bits of VDST.  
Operation:  
```
UINT16 D0 = ASINT16(CVT_HALF_RTZ(ASFLOAT(SRC0)))
UINT16 D1 = ASINT16(CVT_HALF_RTZ(ASFLOAT(SRC1)))
VDST = D0 | (((UINT32)D1) << 16)
```

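Python's `struct` module only rounds to nearest even when producing binary16, so this sketch derives round-toward-zero from it. If the nearest-even result has a larger magnitude than the input, the sketch steps the bit pattern down by one unit in the last place. Overflow of a finite value gives the largest finite half (0x7bff), as round-toward-zero requires. Several points are assumptions:

* The function names are hypothetical.
* NaN is returned as the quiet NaN 0x7e00.
* Denormal handling via the MODE register is not modeled.

```python
import math
import struct


def as_float(bits: int) -> float:
    """ASFLOAT: reinterpret 32 bits as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xffffffff))[0]


def half_rtz_bits(x: float) -> int:
    """Convert x to binary16 bits, rounding toward zero."""
    if math.isnan(x):
        return 0x7e00                      # assumption: canonical quiet NaN
    sign = 0x8000 if math.copysign(1.0, x) < 0 else 0
    try:
        bits = struct.unpack("<H", struct.pack("<e", x))[0]   # nearest even
    except OverflowError:                  # finite value beyond the binary16 range
        return sign | 0x7bff               # largest finite half with x's sign
    nearest = struct.unpack("<e", struct.pack("<H", bits))[0]
    if abs(nearest) > abs(x):
        bits -= 1                          # one ULP toward zero (sign bit unchanged)
    return bits


def v_cvt_pkrtz_f16_f32(src0: int, src1: int) -> int:
    return half_rtz_bits(as_float(src0)) | (half_rtz_bits(as_float(src1)) << 16)


print(hex(v_cvt_pkrtz_f16_f32(0x3f800000, 0x40000000)))  # 1.0, 2.0 -> 0x40003c00
```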
#### V_LDEXP_F16

Opcode VOP2: 51 (0x33) for GCN 1.2  
Opcode VOP3A: 307 (0x133) for GCN 1.2  
Syntax: V_LDEXP_F16 VDST, SRC0, SRC1  
Description: Do the ldexp operation on SRC0 and SRC1 (multiply SRC0 by 2**SRC1).
SRC1 is a signed integer; SRC0 is a half-precision floating-point value.  
Operation:  
```
VDST = ASHALF(SRC0) * POW(2.0, (INT32)SRC1)
```

#### V_LDEXP_F32

Opcode VOP2: 43 (0x2b) for GCN 1.0/1.1  
Opcode VOP3A: 299 (0x12b) for GCN 1.0/1.1  
Syntax: V_LDEXP_F32 VDST, SRC0, SRC1  
Description: Do the ldexp operation on SRC0 and SRC1 (multiply SRC0 by 2**SRC1).
SRC1 is a signed integer; SRC0 is a floating-point value.  
Operation:  
```
VDST = ASFLOAT(SRC0) * POW(2.0, (INT32)SRC1)
```

#### V_LSHL_B32

Opcode VOP2: 25 (0x19) for GCN 1.0/1.1  
Opcode VOP3A: 281 (0x119) for GCN 1.0/1.1  
Syntax: V_LSHL_B32 VDST, SRC0, SRC1  
Description: Shift SRC0 left by (SRC1&31) bits and store the result in VDST.  
Operation:  
```
VDST = SRC0 << (SRC1&31)
```

#### V_LSHLREV_B16

Opcode VOP2: 42 (0x2a) for GCN 1.2  
Opcode VOP3A: 298 (0x12a) for GCN 1.2  
Syntax: V_LSHLREV_B16 VDST, SRC0, SRC1  
Description: Shift the unsigned 16-bit value from SRC1 left by (SRC0&15) bits and
store the 16-bit unsigned result in VDST.  
Operation:  
```
VDST = (SRC1 << (SRC0&15)) & 0xffff
```

#### V_LSHLREV_B32

Opcode VOP2: 26 (0x1a) for GCN 1.0/1.1; 18 (0x12) for GCN 1.2  
Opcode VOP3A: 282 (0x11a) for GCN 1.0/1.1; 274 (0x112) for GCN 1.2  
Syntax: V_LSHLREV_B32 VDST, SRC0, SRC1  
Description: Shift SRC1 left by (SRC0&31) bits and store the result in VDST.  
Operation:  
```
VDST = SRC1 << (SRC0&31)
```

#### V_LSHR_B32

Opcode VOP2: 21 (0x15) for GCN 1.0/1.1  
Opcode VOP3A: 277 (0x115) for GCN 1.0/1.1  
Syntax: V_LSHR_B32 VDST, SRC0, SRC1  
Description: Logical shift right SRC0 by (SRC1&31) bits and store the result in VDST.  
Operation:  
```
VDST = SRC0 >> (SRC1&31)
```

#### V_LSHRREV_B16

Opcode VOP2: 43 (0x2b) for GCN 1.2  
Opcode VOP3A: 299 (0x12b) for GCN 1.2  
Syntax: V_LSHRREV_B16 VDST, SRC0, SRC1  
Description: Logical shift right the unsigned 16-bit value from SRC1 by (SRC0&15) bits and
store the 16-bit unsigned result in VDST.  
Operation:  
```
VDST = (SRC1 >> (SRC0&15)) & 0xffff
```

#### V_LSHRREV_B32

Opcode VOP2: 22 (0x16) for GCN 1.0/1.1; 16 (0x10) for GCN 1.2  
Opcode VOP3A: 278 (0x116) for GCN 1.0/1.1; 272 (0x110) for GCN 1.2  
Syntax: V_LSHRREV_B32 VDST, SRC0, SRC1  
Description: Logical shift right SRC1 by (SRC0&31) bits and store the result in VDST.  
Operation:  
```
VDST = SRC1 >> (SRC0&31)
```

#### V_MAC_F16

Opcode VOP2: 35 (0x23) for GCN 1.2  
Opcode VOP3A: 291 (0x123) for GCN 1.2  
Syntax: V_MAC_F16 VDST, SRC0, SRC1  
Description: Multiply the FP16 value from SRC0 by the FP16 value from SRC1 and
add the result to VDST. The OMOD modifier is applied to the result.  
Operation:  
```
VDST = ASHALF(SRC0) * ASHALF(SRC1) + ASHALF(VDST)
```

#### V_MAC_F32

Opcode VOP2: 31 (0x1f) for GCN 1.0/1.1; 22 (0x16) for GCN 1.2  
Opcode VOP3A: 287 (0x11f) for GCN 1.0/1.1; 278 (0x116) for GCN 1.2  
Syntax: V_MAC_F32 VDST, SRC0, SRC1  
Description: Multiply the FP value from SRC0 by the FP value from SRC1 and add the result to VDST.  
Operation:  
```
VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(VDST)
```

#### V_MAC_LEGACY_F32

Opcode VOP2: 6 (0x6) for GCN 1.0/1.1  
Opcode VOP3A: 262 (0x106) for GCN 1.0/1.1  
Syntax: V_MAC_LEGACY_F32 VDST, SRC0, SRC1  
Description: Multiply the FP value from SRC0 by the FP value from SRC1 and add the result to VDST.
If either value is 0.0, VDST is left unchanged. The IEEE rules for 0.0*x, such as
0.0*inf = NaN, are not applied.  
Operation:  
```
if (ASFLOAT(SRC0)!=0.0 && ASFLOAT(SRC1)!=0.0)
    VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(VDST)
```

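The legacy zero rule differs from IEEE arithmetic, where 0.0*inf and 0.0*NaN both give NaN. Here is a Python sketch of V_MUL_LEGACY_F32 and V_MAC_LEGACY_F32 that follows the pseudocode of these two instructions. The names are hypothetical, and the arithmetic is done in double rather than single precision.

```python
import math


def v_mul_legacy_f32(a: float, b: float) -> float:
    # -0.0 != 0.0 is False, so negative zero also counts as zero here.
    if a != 0.0 and b != 0.0:
        return a * b
    return 0.0


def v_mac_legacy_f32(a: float, b: float, acc: float) -> float:
    if a != 0.0 and b != 0.0:
        return a * b + acc
    return acc                               # VDST is left unchanged


print(v_mul_legacy_f32(0.0, math.inf), 0.0 * math.inf)  # 0.0 nan
```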
#### V_MADMK_F16

Opcode VOP2: 36 (0x24) for GCN 1.2  
Opcode VOP3A: 292 (0x124) for GCN 1.2  
Syntax: V_MADMK_F16 VDST, SRC0, FLOAT16LIT, SRC1  
Description: Multiply the FP16 value from SRC0 by the constant literal FLOAT16LIT, add
the FP16 value from SRC1, and store the result in VDST. The constant literal follows
the instruction word. Uses round-to-nearest-even rounding.  
Operation:  
```
VDST = ASHALF(SRC0) * ASHALF(FLOAT16LIT) + ASHALF(SRC1)
```

#### V_MADMK_F32

Opcode VOP2: 32 (0x20) for GCN 1.0/1.1; 23 (0x17) for GCN 1.2  
Opcode VOP3A: 288 (0x120) for GCN 1.0/1.1; 279 (0x117) for GCN 1.2  
Syntax: V_MADMK_F32 VDST, SRC0, FLOATLIT, SRC1  
Description: Multiply the FP value from SRC0 by the constant literal FLOATLIT, add
the FP value from SRC1, and store the result in VDST. The constant literal follows
the instruction word.  
Operation:  
```
VDST = ASFLOAT(SRC0) * ASFLOAT(FLOATLIT) + ASFLOAT(SRC1)
```

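FLOATLIT is not encoded in any operand field; it follows the instruction word, so a V_MADMK_F32 instruction occupies two dwords. The Python sketch below rests on these assumptions:

* SRC1 (the addend) goes into the VSRC1 field of the VOP2 layout.
* A VGPR in SRC0 is encoded as 256 plus its index.
* The dwords are emitted little-endian.
* The function name is hypothetical.

```python
import struct


def encode_v_madmk_f32(opcode: int, vdst: int, src0: int, vsrc1: int, k: float) -> bytes:
    """Return the VOP2 word followed by the 32-bit FLOATLIT constant."""
    for value, width in ((opcode, 6), (vdst, 8), (src0, 9), (vsrc1, 8)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    word = src0 | (vsrc1 << 9) | (vdst << 17) | (opcode << 25)
    return struct.pack("<If", word, k)


# V_MADMK_F32 v1, v2, 2.0, v3 with the GCN 1.0/1.1 opcode 32 (GCN 1.2 uses 23)
code = encode_v_madmk_f32(32, 1, 256 + 2, 3, 2.0)
print(code.hex())
```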
#### V_MADAK_F16

Opcode VOP2: 37 (0x25) for GCN 1.2  
Opcode VOP3A: 293 (0x125) for GCN 1.2  
Syntax: V_MADAK_F16 VDST, SRC0, SRC1, FLOAT16LIT  
Description: Multiply the FP16 value from SRC0 by the FP16 value from SRC1, add
the constant literal FLOAT16LIT, and store the result in VDST. The constant literal follows
the instruction word.  
Operation:  
```
VDST = ASHALF(SRC0) * ASHALF(SRC1) + ASHALF(FLOAT16LIT)
```

#### V_MADAK_F32

Opcode VOP2: 33 (0x21) for GCN 1.0/1.1; 24 (0x18) for GCN 1.2  
Opcode VOP3A: 289 (0x121) for GCN 1.0/1.1; 280 (0x118) for GCN 1.2  
Syntax: V_MADAK_F32 VDST, SRC0, SRC1, FLOATLIT  
Description: Multiply the FP value from SRC0 by the FP value from SRC1, add
the constant literal FLOATLIT, and store the result in VDST. The constant literal follows
the instruction word.  
Operation:  
```
VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(FLOATLIT)
```

#### V_MAX_F16

Opcode VOP2: 45 (0x2d) for GCN 1.2  
Opcode VOP3A: 301 (0x12d) for GCN 1.2  
Syntax: V_MAX_F16 VDST, SRC0, SRC1  
Description: Choose the larger half-precision floating-point value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MAX(ASHALF(SRC0), ASHALF(SRC1))
```

#### V_MAX_F32

Opcode VOP2: 16 (0x10) for GCN 1.0/1.1; 11 (0xb) for GCN 1.2  
Opcode VOP3A: 272 (0x110) for GCN 1.0/1.1; 267 (0x10b) for GCN 1.2  
Syntax: V_MAX_F32 VDST, SRC0, SRC1  
Description: Choose the larger floating-point value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MAX(ASFLOAT(SRC0), ASFLOAT(SRC1))
```

#### V_MAX_I32

Opcode VOP2: 18 (0x12) for GCN 1.0/1.1; 13 (0xd) for GCN 1.2  
Opcode VOP3A: 274 (0x112) for GCN 1.0/1.1; 269 (0x10d) for GCN 1.2  
Syntax: V_MAX_I32 VDST, SRC0, SRC1  
Description: Choose the larger signed value from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = MAX((INT32)SRC0, (INT32)SRC1)
```

#### V_MAX_LEGACY_F32

Opcode VOP2: 14 (0xe) for GCN 1.0/1.1  
Opcode VOP3A: 270 (0x10e) for GCN 1.0/1.1  
Syntax: V_MAX_LEGACY_F32 VDST, SRC0, SRC1  
Description: Choose the larger floating-point value from SRC0 and SRC1,
and store the result in VDST. If SRC1 is NaN, store NaN in VDST
(legacy rules for handling NaNs).  
Operation:  
```
if (!ISNAN(ASFLOAT(SRC1)))
    VDST = MAX(ASFLOAT(SRC0), ASFLOAT(SRC1))
else
    VDST = NaN
```

#### V_MAX_U32

Opcode VOP2: 20 (0x14) for GCN 1.0/1.1; 15 (0xf) for GCN 1.2  
Opcode VOP3A: 276 (0x114) for GCN 1.0/1.1; 271 (0x10f) for GCN 1.2  
Syntax: V_MAX_U32 VDST, SRC0, SRC1  
Description: Choose the larger unsigned value from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = MAX(SRC0, SRC1)
```

#### V_MBCNT_HI_U32_B32

Opcode VOP2: 36 (0x24) for GCN 1.0/1.1  
Opcode VOP3A: 292 (0x124) for GCN 1.0/1.1  
Syntax: V_MBCNT_HI_U32_B32 VDST, SRC0, SRC1  
Description: Make a mask of all lanes below the current lane,
take the higher 32 bits of that mask, and use them to mask SRC0.
Count the set bits in that value, add SRC1, and store the result in VDST.  
Operation:  
```
UINT32 MASK = (((1ULL << LANEID) - 1ULL) >> 32) & SRC0
VDST = SRC1 + BITCOUNT(MASK)
```

#### V_MBCNT_LO_U32_B32

Opcode VOP2: 35 (0x23) for GCN 1.0/1.1  
Opcode VOP3A: 291 (0x123) for GCN 1.0/1.1  
Syntax: V_MBCNT_LO_U32_B32 VDST, SRC0, SRC1  
Description: Make a mask of all lanes below the current lane,
take the lower 32 bits of that mask, and use them to mask SRC0.
Count the set bits in that value, add SRC1, and store the result in VDST.  
Operation:  
```
UINT32 MASK = ((1ULL << LANEID) - 1ULL) & SRC0
VDST = SRC1 + BITCOUNT(MASK)
```

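A common idiom, used for example to obtain the lane id, runs V_MBCNT_LO_U32_B32 and then V_MBCNT_HI_U32_B32 with SRC0 = 0xffffffff. The LO result is fed as SRC1 into HI, and the two counts together give the number of lanes below the current one. Below is a Python model of a single lane, with hypothetical function names:

```python
def v_mbcnt_lo_u32_b32(src0: int, src1: int, lane: int) -> int:
    mask = ((1 << lane) - 1) & 0xffffffff & src0     # lower half of "lanes below"
    return (src1 + bin(mask).count("1")) & 0xffffffff


def v_mbcnt_hi_u32_b32(src0: int, src1: int, lane: int) -> int:
    mask = (((1 << lane) - 1) >> 32) & src0           # upper half of "lanes below"
    return (src1 + bin(mask).count("1")) & 0xffffffff


# mbcnt_lo(0xffffffff, 0) followed by mbcnt_hi(0xffffffff, previous result)
lane_ids = [v_mbcnt_hi_u32_b32(0xffffffff, v_mbcnt_lo_u32_b32(0xffffffff, 0, lane), lane)
            for lane in range(64)]
print(lane_ids == list(range(64)))  # True
```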
#### V_MIN_F16

Opcode VOP2: 46 (0x2e) for GCN 1.2  
Opcode VOP3A: 302 (0x12e) for GCN 1.2  
Syntax: V_MIN_F16 VDST, SRC0, SRC1  
Description: Choose the smaller half-precision floating-point value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MIN(ASHALF(SRC0), ASHALF(SRC1))
```

#### V_MIN_F32

Opcode VOP2: 15 (0xf) for GCN 1.0/1.1; 10 (0xa) for GCN 1.2  
Opcode VOP3A: 271 (0x10f) for GCN 1.0/1.1; 266 (0x10a) for GCN 1.2  
Syntax: V_MIN_F32 VDST, SRC0, SRC1  
Description: Choose the smaller floating-point value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MIN(ASFLOAT(SRC0), ASFLOAT(SRC1))
```

#### V_MIN_I16

Opcode VOP2: 50 (0x32) for GCN 1.2  
Opcode VOP3A: 306 (0x132) for GCN 1.2  
Syntax: V_MIN_I16 VDST, SRC0, SRC1  
Description: Choose the smaller signed 16-bit value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MIN((INT16)SRC0, (INT16)SRC1)
```

#### V_MIN_I32

Opcode VOP2: 17 (0x11) for GCN 1.0/1.1; 12 (0xc) for GCN 1.2  
Opcode VOP3A: 273 (0x111) for GCN 1.0/1.1; 268 (0x10c) for GCN 1.2  
Syntax: V_MIN_I32 VDST, SRC0, SRC1  
Description: Choose the smaller signed value from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = MIN((INT32)SRC0, (INT32)SRC1)
```

#### V_MIN_LEGACY_F32

Opcode VOP2: 13 (0xd) for GCN 1.0/1.1  
Opcode VOP3A: 269 (0x10d) for GCN 1.0/1.1  
Syntax: V_MIN_LEGACY_F32 VDST, SRC0, SRC1  
Description: Choose the smaller floating-point value from SRC0 and SRC1,
and store the result in VDST. If SRC1 is NaN, store NaN in VDST
(legacy rules for handling NaNs).  
Operation:  
```
if (!ISNAN(ASFLOAT(SRC1)))
    VDST = MIN(ASFLOAT(SRC0), ASFLOAT(SRC1))
else
    VDST = NaN
```

#### V_MIN_U16

Opcode VOP2: 49 (0x31) for GCN 1.2  
Opcode VOP3A: 305 (0x131) for GCN 1.2  
Syntax: V_MIN_U16 VDST, SRC0, SRC1  
Description: Choose the smaller unsigned 16-bit value from SRC0 and SRC1,
and store the result in VDST.  
Operation:  
```
VDST = MIN(SRC0&0xffff, SRC1&0xffff)
```

#### V_MIN_U32

Opcode VOP2: 19 (0x13) for GCN 1.0/1.1; 14 (0xe) for GCN 1.2  
Opcode VOP3A: 275 (0x113) for GCN 1.0/1.1; 270 (0x10e) for GCN 1.2  
Syntax: V_MIN_U32 VDST, SRC0, SRC1  
Description: Choose the smaller unsigned value from SRC0 and SRC1 and store the result in VDST.  
Operation:  
```
VDST = MIN(SRC0, SRC1)
```

#### V_MUL_LEGACY_F32

Opcode VOP2: 7 (0x7) for GCN 1.0/1.1; 4 (0x4) for GCN 1.2  
Opcode VOP3A: 263 (0x107) for GCN 1.0/1.1; 260 (0x104) for GCN 1.2  
Syntax: V_MUL_LEGACY_F32 VDST, SRC0, SRC1  
Description: Multiply the FP value from SRC0 by the FP value from SRC1 and store the result in VDST.
If either value is 0.0, always store 0.0 in VDST (the IEEE rules for 0.0*x are not applied).  
Operation:  
```
if (ASFLOAT(SRC0)!=0.0 && ASFLOAT(SRC1)!=0.0)
    VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1)
else
    VDST = 0.0
```

#### V_MUL_F16

Opcode VOP2: 34 (0x22) for GCN 1.2  
Opcode VOP3A: 290 (0x122) for GCN 1.2  
Syntax: V_MUL_F16 VDST, SRC0, SRC1  
Description: Multiply the FP16 value from SRC0 by the FP16 value from SRC1
and store the result in VDST.  
Operation:  
```
VDST = ASHALF(SRC0) * ASHALF(SRC1)
```

#### V_MUL_F32

Opcode VOP2: 8 (0x8) for GCN 1.0/1.1; 5 (0x5) for GCN 1.2  
Opcode VOP3A: 264 (0x108) for GCN 1.0/1.1; 261 (0x105) for GCN 1.2  
Syntax: V_MUL_F32 VDST, SRC0, SRC1  
Description: Multiply the FP value from SRC0 by the FP value from SRC1 and store the result in VDST.  
Operation:  
```
VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1)
```

#### V_MUL_HI_I32_I24

Opcode VOP2: 10 (0xa) for GCN 1.0/1.1; 7 (0x7) for GCN 1.2  
Opcode VOP3A: 266 (0x10a) for GCN 1.0/1.1; 263 (0x107) for GCN 1.2  
Syntax: V_MUL_HI_I32_I24 VDST, SRC0, SRC1  
Description: Multiply the 24-bit signed integer value from SRC0 by the 24-bit signed value from SRC1
and store the higher 16 bits of the 48-bit result in VDST, with sign extension.
Modifiers do not affect the result.  
Operation:  
```
INT32 V0 = (INT32)((SRC0&0x7fffff) | (SRC0&0x800000 ? 0xff800000 : 0))
INT32 V1 = (INT32)((SRC1&0x7fffff) | (SRC1&0x800000 ? 0xff800000 : 0))
VDST = ((INT64)V0 * V1)>>32
```

#### V_MUL_HI_U32_U24

Opcode VOP2: 12 (0xc) for GCN 1.0/1.1; 9 (0x9) for GCN 1.2  
Opcode VOP3A: 268 (0x10c) for GCN 1.0/1.1; 265 (0x109) for GCN 1.2  
Syntax: V_MUL_HI_U32_U24 VDST, SRC0, SRC1  
Description: Multiply the 24-bit unsigned integer value from SRC0 by the 24-bit unsigned value
from SRC1 and store the higher 16 bits of the 48-bit result in VDST.
Modifiers do not affect the result.  
Operation:  
```
VDST = ((UINT64)(SRC0&0xffffff) * (UINT32)(SRC1&0xffffff)) >> 32
```

#### V_MUL_I32_I24

Opcode VOP2: 9 (0x9) for GCN 1.0/1.1; 6 (0x6) for GCN 1.2  
Opcode VOP3A: 265 (0x109) for GCN 1.0/1.1; 262 (0x106) for GCN 1.2  
Syntax: V_MUL_I32_I24 VDST, SRC0, SRC1  
Description: Multiply the 24-bit signed integer value from SRC0 by the 24-bit signed value from SRC1
and store the lower 32 bits of the result in VDST. Modifiers do not affect the result.  
Operation:  
```
INT32 V0 = (INT32)((SRC0&0x7fffff) | (SRC0&0x800000 ? 0xff800000 : 0))
INT32 V1 = (INT32)((SRC1&0x7fffff) | (SRC1&0x800000 ? 0xff800000 : 0))
VDST = V0 * V1
```

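This Python sketch of the 24-bit multiplies shows that bits 24-31 of the sources are ignored. It also shows how the low and high results split the 48-bit product. The function names are hypothetical; `sext24` implements the sign extension written out in the pseudocode.

```python
def sext24(x: int) -> int:
    """Sign-extend the low 24 bits of x (bits 24-31 are ignored)."""
    x &= 0xffffff
    return x - (1 << 24) if x & 0x800000 else x


def v_mul_i32_i24(src0: int, src1: int) -> int:
    return (sext24(src0) * sext24(src1)) & 0xffffffff            # low 32 bits


def v_mul_hi_i32_i24(src0: int, src1: int) -> int:
    return ((sext24(src0) * sext24(src1)) >> 32) & 0xffffffff    # bits 32-47, sign-extended


def v_mul_u32_u24(src0: int, src1: int) -> int:
    return ((src0 & 0xffffff) * (src1 & 0xffffff)) & 0xffffffff


def v_mul_hi_u32_u24(src0: int, src1: int) -> int:
    return ((src0 & 0xffffff) * (src1 & 0xffffff)) >> 32         # bits 32-47


print(hex(v_mul_hi_u32_u24(0xffffff, 0xffffff)))  # 0xffff
```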
#### V_MUL_LO_U16

Opcode VOP2: 41 (0x29) for GCN 1.2  
Opcode VOP3A: 297 (0x129) for GCN 1.2  
Syntax: V_MUL_LO_U16 VDST, SRC0, SRC1  
Description: Multiply the 16-bit unsigned value from SRC0 by the 16-bit unsigned value from SRC1
and store the low 16 bits of the result in VDST.  
Operation:  
```
VDST = ((SRC0&0xffff) * (SRC1&0xffff)) & 0xffff
```

#### V_MUL_U32_U24

Opcode VOP2: 11 (0xb) for GCN 1.0/1.1; 8 (0x8) for GCN 1.2  
Opcode VOP3A: 267 (0x10b) for GCN 1.0/1.1; 264 (0x108) for GCN 1.2  
Syntax: V_MUL_U32_U24 VDST, SRC0, SRC1  
Description: Multiply the 24-bit unsigned integer value from SRC0 by the 24-bit unsigned value
from SRC1 and store the lower 32 bits of the result in VDST. Modifiers do not affect the result.  
Operation:  
```
VDST = (UINT32)(SRC0&0xffffff) * (UINT32)(SRC1&0xffffff)
```

#### V_OR_B32

Opcode VOP2: 28 (0x1c) for GCN 1.0/1.1; 20 (0x14) for GCN 1.2  
Opcode VOP3A: 284 (0x11c) for GCN 1.0/1.1; 276 (0x114) for GCN 1.2  
Syntax: V_OR_B32 VDST, SRC0, SRC1  
Description: Do a bitwise OR on SRC0 and SRC1 and store the result in VDST.
The CLAMP and OMOD modifiers do not affect the result.  
Operation:  
```
VDST = SRC0 | SRC1
```

#### V_READLANE_B32

Opcode VOP2: 1 (0x1) for GCN 1.0/1.1  
Opcode VOP3A: 257 (0x101) for GCN 1.0/1.1  
Syntax: V_READLANE_B32 SDST, VSRC0, SSRC1  
Description: Copy the value of one lane of VSRC0 to SDST. The lane (thread id) is chosen by SSRC1&63.
SSRC1 can be an SGPR or M0. The EXEC mask is ignored.  
Operation:  
```
SDST = VSRC0[SSRC1 & 63]
```

881#### V_SUB_F16
882
883Opcode VOP2: 32 (0x20) for GCN 1.2 
884Opcode VOP3A: 288 (0x120) for GCN 1.2 
885Syntax: V_SUB_F16 VDST, SRC0, SRC1 
886Description: Subtract FP16 value of SRC1 from FP16 value of SRC0 and store result to VDST. 
887Operation: 
888```
889VDST = ASHALF(SRC0) - ASHALF(SRC1)
890```

#### V_SUB_F32

Opcode VOP2: 4 (0x4) for GCN 1.0/1.1; 2 (0x2) for GCN 1.2  
Opcode VOP3A: 260 (0x104) for GCN 1.0/1.1; 258 (0x102) for GCN 1.2  
Syntax: V_SUB_F32 VDST, SRC0, SRC1  
Description: Subtract the FP32 value of SRC1 from the FP32 value of SRC0 and store the result in VDST.  
Operation:  
```
VDST = ASFLOAT(SRC0) - ASFLOAT(SRC1)
```
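
`ASFLOAT` reinterprets the 32 raw bits of an operand as an IEEE 754 single-precision value; it is not a numeric conversion. A minimal C sketch (hypothetical names `asfloat`, `asuint`, `v_sub_f32`) uses `memcpy` for the reinterpretation, the well-defined way to type-pun in C. It relies on the host's default round-to-nearest-even arithmetic and ignores the hardware modifiers and denormal modes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret register bits as float and back. */
float asfloat(uint32_t x) { float f; memcpy(&f, &x, sizeof f); return f; }
uint32_t asuint(float f) { uint32_t x; memcpy(&x, &f, sizeof x); return x; }

/* Hypothetical per-lane model of V_SUB_F32: VDST = ASFLOAT(SRC0) - ASFLOAT(SRC1). */
uint32_t v_sub_f32(uint32_t src0, uint32_t src1)
{
    return asuint(asfloat(src0) - asfloat(src1));
}
```

With 0x40400000 (3.0) and 0x3f800000 (1.0), `v_sub_f32` returns 0x40000000 (2.0); swapping the operands, as V_SUBREV_F32 effectively does, gives 0xc0000000 (-2.0).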

#### V_SUB_U16

Opcode VOP2: 39 (0x27) for GCN 1.2  
Opcode VOP3A: 295 (0x127) for GCN 1.2  
Syntax: V_SUB_U16 VDST, SRC0, SRC1  
Description: Subtract the unsigned 16-bit value of SRC1 from that of SRC0 and store
the 16-bit unsigned result in VDST.  
Operation:  
```
VDST = (SRC0 - SRC1) & 0xffff
```
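
The pseudocode writes only VDST, so the result simply wraps modulo 2^16 and no borrow is reported. A minimal C sketch of one lane (hypothetical helper `v_sub_u16`) follows the pseudocode directly: bits 16 and above of the sources can only influence bits 16 and above of the 32-bit difference, so masking the result alone is enough.

```c
#include <stdint.h>

/* Hypothetical per-lane model of V_SUB_U16: the difference wraps modulo 2^16. */
uint32_t v_sub_u16(uint32_t src0, uint32_t src1)
{
    return (src0 - src1) & 0xffffu;   /* low 16 bits depend only on low source bits */
}
```

For example, `v_sub_u16(0, 1)` yields 0xffff.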

#### V_SUB_I32, V_SUB_U32

Opcode VOP2: 38 (0x26) for GCN 1.0/1.1; 26 (0x1a) for GCN 1.2  
Opcode VOP3B: 294 (0x126) for GCN 1.0/1.1; 282 (0x11a) for GCN 1.2  
Syntax VOP2 GCN 1.0/1.1: V_SUB_I32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.0/1.1: V_SUB_I32 VDST, SDST(2), SRC0, SRC1  
Syntax VOP2 GCN 1.2: V_SUB_U32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.2: V_SUB_U32 VDST, SDST(2), SRC0, SRC1  
Description: Subtract SRC1 from SRC0, store the result in VDST, and store the borrow flag
in the SDST (or VCC) bit whose index equals the lane id. SDST is 64-bit.
Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 temp = (UINT64)SRC0 - (UINT64)SRC1
VDST = temp
SDST = 0
UINT64 mask = (1ULL<<LANEID)
SDST = (SDST&~mask) | ((temp>>32) ? mask : 0)
```
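
The pseudocode above describes a single lane. A C sketch over a whole 64-lane wavefront (hypothetical function `v_sub_u32_wave`, with EXEC passed in explicitly as an assumption of this model) shows how the per-lane borrow bits assemble into the 64-bit SDST/VCC mask, with the bits of inactive lanes left at zero as the description states. It further assumes, as for other vector instructions, that VDST is not written for inactive lanes.

```c
#include <stdint.h>

#define WAVE_SIZE 64

/* Hypothetical wavefront-level model of V_SUB_I32/V_SUB_U32.
 * Returns the 64-bit borrow mask written to SDST (or VCC). */
uint64_t v_sub_u32_wave(uint32_t vdst[WAVE_SIZE], const uint32_t src0[WAVE_SIZE],
                        const uint32_t src1[WAVE_SIZE], uint64_t exec)
{
    uint64_t sdst = 0;                    /* bits of inactive lanes stay zero */
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        uint64_t mask = 1ULL << lane;
        if (!(exec & mask))
            continue;                     /* assumed: inactive lanes keep VDST */
        uint64_t temp = (uint64_t)src0[lane] - (uint64_t)src1[lane];
        vdst[lane] = (uint32_t)temp;      /* low 32 bits of the difference */
        if (temp >> 32)                   /* wrapped below zero: borrow */
            sdst |= mask;
    }
    return sdst;
}
```

With EXEC = 0x3, lane 0 computing 5 - 3 produces no borrow, lane 1 computing 3 - 5 stores 0xfffffffe and sets bit 1, and an inactive lane 2 contributes nothing, so the returned mask is 0x2.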

#### V_SUBB_U32

Opcode VOP2: 41 (0x29) for GCN 1.0/1.1; 29 (0x1d) for GCN 1.2  
Opcode VOP3B: 297 (0x129) for GCN 1.0/1.1; 285 (0x11d) for GCN 1.2  
Syntax VOP2: V_SUBB_U32 VDST, VCC, SRC0, SRC1, VCC  
Syntax VOP3B: V_SUBB_U32 VDST, SDST(2), SRC0, SRC1, SSRC2(2)  
Description: Subtract SRC1 and the incoming borrow from SRC0, store the result in VDST,
and store the outgoing borrow flag in the SDST (or VCC) bit whose index equals the lane id.
The incoming borrow is read from the SSRC2 (or VCC) bit whose index equals the lane id.
SDST and SSRC2 are 64-bit. Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 mask = (1ULL<<LANEID)
UINT8 CC = ((SSRC2&mask) ? 1 : 0)
UINT64 temp = (UINT64)SRC0 - (UINT64)SRC1 - CC
SDST = 0
VDST = temp
SDST = (SDST&~mask) | ((temp >> 32) ? mask : 0)
```
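
V_SUBB_U32 is typically paired with V_SUB_U32 (V_SUB_I32 on GCN 1.0/1.1) to subtract integers wider than 32 bits: the first instruction subtracts the low dwords and produces the borrow mask, and the second consumes it for the high dwords. A single-lane C sketch of that pairing follows; the helper names are hypothetical and `*borrow` stands for this lane's bit in VCC/SDST.

```c
#include <stdint.h>

/* Hypothetical single-lane model of V_SUB_U32: returns VDST, sets the borrow bit. */
uint32_t sub_u32(uint32_t src0, uint32_t src1, int *borrow)
{
    uint64_t temp = (uint64_t)src0 - (uint64_t)src1;
    *borrow = (temp >> 32) != 0;
    return (uint32_t)temp;
}

/* Hypothetical single-lane model of V_SUBB_U32 (borrow in, borrow out). */
uint32_t subb_u32(uint32_t src0, uint32_t src1, int borrow_in, int *borrow_out)
{
    uint64_t temp = (uint64_t)src0 - (uint64_t)src1 - (uint64_t)borrow_in;
    *borrow_out = (temp >> 32) != 0;
    return (uint32_t)temp;
}

/* 64-bit a - b, roughly what this pair computes (registers arbitrary, GCN 1.2 names):
 *   v_sub_u32  v0, vcc, v2, v4
 *   v_subb_u32 v1, vcc, v3, v5, vcc */
uint64_t sub_u64(uint64_t a, uint64_t b)
{
    int borrow;
    uint32_t lo = sub_u32((uint32_t)a, (uint32_t)b, &borrow);
    uint32_t hi = subb_u32((uint32_t)(a >> 32), (uint32_t)(b >> 32), borrow, &borrow);
    return ((uint64_t)hi << 32) | lo;
}
```

For example, `sub_u64(0x100000000, 1)` borrows from the high dword and returns 0xffffffff.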

#### V_SUBBREV_U32

Opcode VOP2: 42 (0x2a) for GCN 1.0/1.1; 30 (0x1e) for GCN 1.2  
Opcode VOP3B: 298 (0x12a) for GCN 1.0/1.1; 286 (0x11e) for GCN 1.2  
Syntax VOP2: V_SUBBREV_U32 VDST, VCC, SRC0, SRC1, VCC  
Syntax VOP3B: V_SUBBREV_U32 VDST, SDST(2), SRC0, SRC1, SSRC2(2)  
Description: Subtract SRC0 and the incoming borrow from SRC1, store the result in VDST,
and store the outgoing borrow flag in the SDST (or VCC) bit whose index equals the lane id.
The incoming borrow is read from the SSRC2 (or VCC) bit whose index equals the lane id.
SDST and SSRC2 are 64-bit. Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 mask = (1ULL<<LANEID)
UINT8 CC = ((SSRC2&mask) ? 1 : 0)
UINT64 temp = (UINT64)SRC1 - (UINT64)SRC0 - CC
SDST = 0
VDST = temp
SDST = (SDST&~mask) | ((temp >> 32) ? mask : 0)
```

#### V_SUBREV_F16

Opcode VOP2: 33 (0x21) for GCN 1.2  
Opcode VOP3A: 289 (0x121) for GCN 1.2  
Syntax: V_SUBREV_F16 VDST, SRC0, SRC1  
Description: Subtract the FP16 value of SRC0 from the FP16 value of SRC1 and store the result in VDST.  
Operation:  
```
VDST = ASHALF(SRC1) - ASHALF(SRC0)
```

#### V_SUBREV_F32

Opcode VOP2: 5 (0x5) for GCN 1.0/1.1; 3 (0x3) for GCN 1.2  
Opcode VOP3A: 261 (0x105) for GCN 1.0/1.1; 259 (0x103) for GCN 1.2  
Syntax: V_SUBREV_F32 VDST, SRC0, SRC1  
Description: Subtract the FP32 value of SRC0 from the FP32 value of SRC1 and store the result in VDST.  
Operation:  
```
VDST = ASFLOAT(SRC1) - ASFLOAT(SRC0)
```

#### V_SUBREV_I32, V_SUBREV_U32

Opcode VOP2: 39 (0x27) for GCN 1.0/1.1; 27 (0x1b) for GCN 1.2  
Opcode VOP3B: 295 (0x127) for GCN 1.0/1.1; 283 (0x11b) for GCN 1.2  
Syntax VOP2 GCN 1.0/1.1: V_SUBREV_I32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.0/1.1: V_SUBREV_I32 VDST, SDST(2), SRC0, SRC1  
Syntax VOP2 GCN 1.2: V_SUBREV_U32 VDST, VCC, SRC0, SRC1  
Syntax VOP3B GCN 1.2: V_SUBREV_U32 VDST, SDST(2), SRC0, SRC1  
Description: Subtract SRC0 from SRC1, store the result in VDST, and store the borrow flag
in the SDST (or VCC) bit whose index equals the lane id. SDST is 64-bit.
Bits for inactive threads in SDST are always zeroed.  
Operation:  
```
UINT64 temp = (UINT64)SRC1 - (UINT64)SRC0
VDST = temp
SDST = 0
UINT64 mask = (1ULL<<LANEID)
SDST = (SDST&~mask) | ((temp>>32) ? mask : 0)
```

#### V_SUBREV_U16

Opcode VOP2: 40 (0x28) for GCN 1.2  
Opcode VOP3A: 296 (0x128) for GCN 1.2  
Syntax: V_SUBREV_U16 VDST, SRC0, SRC1  
Description: Subtract the unsigned 16-bit value of SRC0 from that of SRC1 and store
the 16-bit unsigned result in VDST.  
Operation:  
```
VDST = (SRC1 - SRC0) & 0xffff
```

#### V_XOR_B32

Opcode VOP2: 29 (0x1d) for GCN 1.0/1.1; 21 (0x15) for GCN 1.2  
Opcode VOP3A: 285 (0x11d) for GCN 1.0/1.1; 277 (0x115) for GCN 1.2  
Syntax: V_XOR_B32 VDST, SRC0, SRC1  
Description: Perform a bitwise XOR on SRC0 and SRC1 and store the result in VDST.
The CLAMP and OMOD modifiers do not affect the result.  
Operation:  
```
VDST = SRC0 ^ SRC1
```

#### V_WRITELANE_B32

Opcode VOP2: 2 (0x2) for GCN 1.0/1.1  
Opcode VOP3A: 258 (0x102) for GCN 1.0/1.1  
Syntax: V_WRITELANE_B32 VDST, SSRC0, SSRC1  
Description: Copy the SGPR value SSRC0 to a single lane of VDST. The lane (thread id)
is chosen by SSRC1&63. SSRC1 can be an SGPR or M0. Ignores the EXEC mask.  
Operation:  
```
VDST[SSRC1 & 63] = SSRC0
```
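
V_WRITELANE_B32 is the counterpart of V_READLANE_B32: it moves a scalar value into one lane of a VGPR. A minimal C model, assuming a 64-lane wavefront stored as an array (hypothetical names `v_writelane_b32` and `WAVE_SIZE`), shows that only the selected lane changes and that EXEC is not consulted.

```c
#include <stdint.h>

#define WAVE_SIZE 64

/* Hypothetical model of V_WRITELANE_B32: write the scalar ssrc0 into one lane
 * of vdst. EXEC is not consulted; all other lanes keep their values. */
void v_writelane_b32(uint32_t vdst[WAVE_SIZE], uint32_t ssrc0, uint32_t ssrc1)
{
    vdst[ssrc1 & 63] = ssrc0;   /* only the low 6 bits select the lane */
}
```

For example, a selector value of 71 writes lane 7 (71 & 63 = 7) and leaves lanes 6 and 8 untouched.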