## GCN ISA VOP1/VOP3 instructions

VOP1 instructions can be encoded in the VOP1 encoding and the VOP3A/VOP3B encoding.
List of fields for VOP1 encoding:

Bits  | Name     | Description
------|----------|------------------------------
0-8   | SRC0     | First (scalar or vector) source operand
9-16  | OPCODE   | Operation code
17-24 | VDST     | Destination vector operand
25-31 | ENCODING | Encoding type. Must be 0b0111111

Syntax: INSTRUCTION VDST, SRC0

List of fields for VOP3A/VOP3B encoding (GCN 1.0/1.1):

Bits  | Name     | Description
------|----------|------------------------------
0-7   | VDST     | Vector destination operand
8-10  | ABS      | Absolute modifiers for source operands (VOP3A)
8-14  | SDST     | Scalar destination operand (VOP3B)
11    | CLAMP    | CLAMP modifier (VOP3A)
17-25 | OPCODE   | Operation code
26-31 | ENCODING | Encoding type. Must be 0b110100
32-40 | SRC0     | First (scalar or vector) source operand
41-49 | SRC1     | Second (scalar or vector) source operand
50-58 | SRC2     | Third (scalar or vector) source operand
59-60 | OMOD     | OMOD modifier. Multiplication modifier
61-63 | NEG      | Negation modifier for source operands

List of fields for VOP3A/VOP3B encoding (GCN 1.2/1.4):

Bits  | Name     | Description
------|----------|------------------------------
0-7   | VDST     | Destination vector operand
8-10  | ABS      | Absolute modifiers for source operands (VOP3A)
8-14  | SDST     | Scalar destination operand (VOP3B)
11-14 | OP_SEL   | Operand selection (VOP3A) (GCN 1.4)
15    | CLAMP    | CLAMP modifier
16-25 | OPCODE   | Operation code
26-31 | ENCODING | Encoding type. Must be 0b110100
32-40 | SRC0     | First (scalar or vector) source operand
41-49 | SRC1     | Second (scalar or vector) source operand
50-58 | SRC2     | Third (scalar or vector) source operand
59-60 | OMOD     | OMOD modifier. Multiplication modifier
61-63 | NEG      | Negation modifier for source operands

Syntax: INSTRUCTION VDST, SRC0 [MODIFIERS]

Modifiers:

* CLAMP - clamps the destination floating point value to the range 0.0-1.0
* MUL:2, MUL:4, DIV:2 - OMOD modifiers. Multiply the destination floating point value by
2.0, 4.0 or 0.5 respectively. Clamping is applied after the OMOD modifier.
* -SRC - negate the floating point value from the source operand. Applied after the ABS modifier.
* ABS(SRC), |SRC| - apply absolute value to the source operand
* OP_SEL:VALUE|[B0,...] - operand half selection (0 - lower 16 bits, 1 - higher 16 bits)

NOTE: The OMOD modifier doesn't work if output denormals are allowed
(bit 5 of the MODE register for single precision or bit 7 for double precision).
NOTE: The OMOD and CLAMP modifiers affect only instructions whose output is a
floating point value.
NOTE: ABS and negation are applied to source operands for any instruction.

Negation and absolute value can be combined: `-ABS(V0)`. The CLAMP and
OMOD (MUL:2, MUL:4 and DIV:2) modifiers can be given in any order.

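The order in which the modifiers are applied can be modelled in a few lines. This is a minimal Python sketch assuming plain IEEE double arithmetic; it ignores the denormal restriction on OMOD noted above, and the function names and the string keys for OMOD are hypothetical, not an assembler API.

```python
# Hypothetical model of VOP3 source and output modifiers (not a CLRX API).
OMOD_FACTORS = {None: 1.0, "MUL:2": 2.0, "MUL:4": 4.0, "DIV:2": 0.5}


def apply_source_modifiers(value: float, use_abs: bool, negate: bool) -> float:
    """ABS is applied first, then negation, so -ABS(x) is never positive."""
    if use_abs:
        value = abs(value)
    if negate:
        value = -value
    return value


def apply_output_modifiers(value: float, omod=None, clamp: bool = False) -> float:
    """OMOD scaling is applied first, then CLAMP to the range 0.0-1.0."""
    value *= OMOD_FACTORS[omod]
    if clamp:
        value = min(max(value, 0.0), 1.0)
    return value


# -ABS(V0) with V0 = 3.0
print(apply_source_modifiers(3.0, use_abs=True, negate=True))
# MUL:2 then CLAMP: 0.75 * 2.0 = 1.5, which is clamped to 1.0
print(apply_output_modifiers(0.75, "MUL:2", clamp=True))
```

Applying CLAMP before OMOD would give 1.5 in the second call, which is why the order matters.
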
Operand half selection (OP_SEL) takes a value whose number of bits depends on the number
of operands. The last bit controls the destination operand. A zero bit selects the lower
16 bits of the dword, a one bit selects the higher 16 bits. Example: `op_sel:[0,1,1]` selects
the higher 16 bits of the second source operand and of the destination. List of bits of
the OP_SEL field:

Bit | Operand | Description
----|---------|----------------------
11  | SRC0    | Choose part of SRC0 (first source operand)
12  | SRC1    | Choose part of SRC1 (second source operand)
13  | SRC2    | Choose part of SRC2 (third source operand)
14  | VDST    | Choose part of VDST (destination)

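A sketch of how an `op_sel:[...]` list maps onto bits 11-14 of the first VOP3A dword, following the table above. The helper name is hypothetical; it only assembles the field value and does not check whether a given instruction accepts OP_SEL.

```python
# Hypothetical helper (not a CLRX API): pack an op_sel:[B0,...] list into
# the OP_SEL field (bits 11-14) of a GCN 1.4 VOP3A dword.
def op_sel_field(bits: list) -> int:
    """All elements except the last select halves of SRC0, SRC1, SRC2
    (bits 11-13); the last element selects the half of VDST (bit 14)."""
    *sources, dest = bits
    if len(sources) > 3:
        raise ValueError("at most three source operands")
    value = 0
    for index, sel in enumerate(sources):
        value |= (sel & 1) << (11 + index)
    value |= (dest & 1) << 14
    return value


# op_sel:[0,1,1] sets bit 12 (SRC1) and bit 14 (VDST)
print(hex(op_sel_field([0, 1, 1])))
```
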
Limitations for operands:

* only one SGPR can be read by an instruction. Multiple occurrences of the same
SGPR are allowed
* only one literal constant can be used, and only when an SGPR or M0 is not used in
source operands
* only SRC0 can hold LDS_DIRECT

Unaligned pairs of SGPRs are allowed in source operands.

VOP1 opcodes (0-127) are reflected in VOP3 in range: 384-511 for GCN 1.0/1.1 or
320-447 for GCN 1.2.

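To make the two layouts concrete, here is a minimal Python sketch that packs a VOP1 word and a GCN 1.2 VOP3A instruction from the field tables above, and applies the VOP1-to-VOP3 opcode offset. The helper names are hypothetical, OP_SEL is left at zero, and the example assumes the usual GCN operand numbering in which VGPR n appears as 256 + n in a 9-bit source field, while the 8-bit VDST field holds the plain VGPR number.

```python
# Hypothetical encoder helpers (not a CLRX API), following the field tables above.
VOP1_ENCODING = 0b0111111  # bits 25-31 of a VOP1 word
VOP3_ENCODING = 0b110100   # bits 26-31 of the first VOP3 dword


def vop3_opcode_for_vop1(vop1_opcode: int, gcn12: bool) -> int:
    """VOP1 opcodes 0-127 map to 384-511 (GCN 1.0/1.1) or 320-447 (GCN 1.2)."""
    if not 0 <= vop1_opcode <= 127:
        raise ValueError("VOP1 opcode out of range")
    return (320 if gcn12 else 384) + vop1_opcode


def encode_vop1(opcode: int, vdst: int, src0: int) -> int:
    """SRC0 in bits 0-8, OPCODE in 9-16, VDST in 17-24, ENCODING in 25-31."""
    return ((src0 & 0x1FF) | (opcode & 0xFF) << 9 | (vdst & 0xFF) << 17
            | VOP1_ENCODING << 25)


def encode_vop3a_gcn12(opcode: int, vdst: int, src0: int, src1: int = 0,
                       src2: int = 0, abs_bits: int = 0, neg_bits: int = 0,
                       clamp: bool = False, omod: int = 0) -> tuple:
    """Return the two dwords (low, high) of a GCN 1.2 VOP3A instruction."""
    low = ((vdst & 0xFF) | (abs_bits & 0x7) << 8 | int(clamp) << 15
           | (opcode & 0x3FF) << 16 | VOP3_ENCODING << 26)
    high = ((src0 & 0x1FF) | (src1 & 0x1FF) << 9 | (src2 & 0x1FF) << 18
            | (omod & 0x3) << 27 | (neg_bits & 0x7) << 29)
    return low, high


# V_MOV_B32 V1, V0 (V0 assumed to be source code 256)
print(hex(encode_vop1(1, 1, 256)))
low, high = encode_vop3a_gcn12(vop3_opcode_for_vop1(1, gcn12=True), 1, 256)
print(hex(low), hex(high))
```
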
List of the instructions by opcode (GCN 1.0/1.1):

Opcode | Opcode (VOP3) | GCN 1.0 | GCN 1.1 | Mnemonic
------------|-------------|-------|-------|-----------------------------
0 (0x0) | 384 (0x180) | ✓ | ✓ | V_NOP
1 (0x1) | 385 (0x181) | ✓ | ✓ | V_MOV_B32
2 (0x2) | 386 (0x182) | ✓ | ✓ | V_READFIRSTLANE_B32
3 (0x3) | 387 (0x183) | ✓ | ✓ | V_CVT_I32_F64
4 (0x4) | 388 (0x184) | ✓ | ✓ | V_CVT_F64_I32
5 (0x5) | 389 (0x185) | ✓ | ✓ | V_CVT_F32_I32
6 (0x6) | 390 (0x186) | ✓ | ✓ | V_CVT_F32_U32
7 (0x7) | 391 (0x187) | ✓ | ✓ | V_CVT_U32_F32
8 (0x8) | 392 (0x188) | ✓ | ✓ | V_CVT_I32_F32
9 (0x9) | 393 (0x189) | ✓ | ✓ | V_MOV_FED_B32
10 (0xa) | 394 (0x18a) | ✓ | ✓ | V_CVT_F16_F32
11 (0xb) | 395 (0x18b) | ✓ | ✓ | V_CVT_F32_F16
12 (0xc) | 396 (0x18c) | ✓ | ✓ | V_CVT_RPI_I32_F32
13 (0xd) | 397 (0x18d) | ✓ | ✓ | V_CVT_FLR_I32_F32
14 (0xe) | 398 (0x18e) | ✓ | ✓ | V_CVT_OFF_F32_I4
15 (0xf) | 399 (0x18f) | ✓ | ✓ | V_CVT_F32_F64
16 (0x10) | 400 (0x190) | ✓ | ✓ | V_CVT_F64_F32
17 (0x11) | 401 (0x191) | ✓ | ✓ | V_CVT_F32_UBYTE0
18 (0x12) | 402 (0x192) | ✓ | ✓ | V_CVT_F32_UBYTE1
19 (0x13) | 403 (0x193) | ✓ | ✓ | V_CVT_F32_UBYTE2
20 (0x14) | 404 (0x194) | ✓ | ✓ | V_CVT_F32_UBYTE3
21 (0x15) | 405 (0x195) | ✓ | ✓ | V_CVT_U32_F64
22 (0x16) | 406 (0x196) | ✓ | ✓ | V_CVT_F64_U32
23 (0x17) | 407 (0x197) |   | ✓ | V_TRUNC_F64
24 (0x18) | 408 (0x198) |   | ✓ | V_CEIL_F64
25 (0x19) | 409 (0x199) |   | ✓ | V_RNDNE_F64
26 (0x1a) | 410 (0x19a) |   | ✓ | V_FLOOR_F64
32 (0x20) | 416 (0x1a0) | ✓ | ✓ | V_FRACT_F32
33 (0x21) | 417 (0x1a1) | ✓ | ✓ | V_TRUNC_F32
34 (0x22) | 418 (0x1a2) | ✓ | ✓ | V_CEIL_F32
35 (0x23) | 419 (0x1a3) | ✓ | ✓ | V_RNDNE_F32
36 (0x24) | 420 (0x1a4) | ✓ | ✓ | V_FLOOR_F32
37 (0x25) | 421 (0x1a5) | ✓ | ✓ | V_EXP_F32
38 (0x26) | 422 (0x1a6) | ✓ | ✓ | V_LOG_CLAMP_F32
39 (0x27) | 423 (0x1a7) | ✓ | ✓ | V_LOG_F32
40 (0x28) | 424 (0x1a8) | ✓ | ✓ | V_RCP_CLAMP_F32
41 (0x29) | 425 (0x1a9) | ✓ | ✓ | V_RCP_LEGACY_F32
42 (0x2a) | 426 (0x1aa) | ✓ | ✓ | V_RCP_F32
43 (0x2b) | 427 (0x1ab) | ✓ | ✓ | V_RCP_IFLAG_F32
44 (0x2c) | 428 (0x1ac) | ✓ | ✓ | V_RSQ_CLAMP_F32
45 (0x2d) | 429 (0x1ad) | ✓ | ✓ | V_RSQ_LEGACY_F32
46 (0x2e) | 430 (0x1ae) | ✓ | ✓ | V_RSQ_F32
47 (0x2f) | 431 (0x1af) | ✓ | ✓ | V_RCP_F64
48 (0x30) | 432 (0x1b0) | ✓ | ✓ | V_RCP_CLAMP_F64
49 (0x31) | 433 (0x1b1) | ✓ | ✓ | V_RSQ_F64
50 (0x32) | 434 (0x1b2) | ✓ | ✓ | V_RSQ_CLAMP_F64
51 (0x33) | 435 (0x1b3) | ✓ | ✓ | V_SQRT_F32
52 (0x34) | 436 (0x1b4) | ✓ | ✓ | V_SQRT_F64
53 (0x35) | 437 (0x1b5) | ✓ | ✓ | V_SIN_F32
54 (0x36) | 438 (0x1b6) | ✓ | ✓ | V_COS_F32
55 (0x37) | 439 (0x1b7) | ✓ | ✓ | V_NOT_B32
56 (0x38) | 440 (0x1b8) | ✓ | ✓ | V_BFREV_B32
57 (0x39) | 441 (0x1b9) | ✓ | ✓ | V_FFBH_U32
58 (0x3a) | 442 (0x1ba) | ✓ | ✓ | V_FFBL_B32
59 (0x3b) | 443 (0x1bb) | ✓ | ✓ | V_FFBH_I32
60 (0x3c) | 444 (0x1bc) | ✓ | ✓ | V_FREXP_EXP_I32_F64
61 (0x3d) | 445 (0x1bd) | ✓ | ✓ | V_FREXP_MANT_F64
62 (0x3e) | 446 (0x1be) | ✓ | ✓ | V_FRACT_F64
63 (0x3f) | 447 (0x1bf) | ✓ | ✓ | V_FREXP_EXP_I32_F32
64 (0x40) | 448 (0x1c0) | ✓ | ✓ | V_FREXP_MANT_F32
65 (0x41) | 449 (0x1c1) | ✓ | ✓ | V_CLREXCP
66 (0x42) | 450 (0x1c2) | ✓ | ✓ | V_MOVRELD_B32
67 (0x43) | 451 (0x1c3) | ✓ | ✓ | V_MOVRELS_B32
68 (0x44) | 452 (0x1c4) | ✓ | ✓ | V_MOVRELSD_B32
69 (0x45) | 453 (0x1c5) |   | ✓ | V_LOG_LEGACY_F32
70 (0x46) | 454 (0x1c6) |   | ✓ | V_EXP_LEGACY_F32

List of the instructions by opcode (GCN 1.2/1.4):

Opcode | Opcode (VOP3) | Mnemonic (GCN 1.2) | Mnemonic (GCN 1.4)
------------|-------------|---------------------|------------------------
0 (0x0) | 320 (0x140) | V_NOP | V_NOP
1 (0x1) | 321 (0x141) | V_MOV_B32 | V_MOV_B32
2 (0x2) | 322 (0x142) | V_READFIRSTLANE_B32 | V_READFIRSTLANE_B32
3 (0x3) | 323 (0x143) | V_CVT_I32_F64 | V_CVT_I32_F64
4 (0x4) | 324 (0x144) | V_CVT_F64_I32 | V_CVT_F64_I32
5 (0x5) | 325 (0x145) | V_CVT_F32_I32 | V_CVT_F32_I32
6 (0x6) | 326 (0x146) | V_CVT_F32_U32 | V_CVT_F32_U32
7 (0x7) | 327 (0x147) | V_CVT_U32_F32 | V_CVT_U32_F32
8 (0x8) | 328 (0x148) | V_CVT_I32_F32 | V_CVT_I32_F32
9 (0x9) | 329 (0x149) | V_MOV_FED_B32 | V_MOV_FED_B32
10 (0xa) | 330 (0x14a) | V_CVT_F16_F32 | V_CVT_F16_F32
11 (0xb) | 331 (0x14b) | V_CVT_F32_F16 | V_CVT_F32_F16
12 (0xc) | 332 (0x14c) | V_CVT_RPI_I32_F32 | V_CVT_RPI_I32_F32
13 (0xd) | 333 (0x14d) | V_CVT_FLR_I32_F32 | V_CVT_FLR_I32_F32
14 (0xe) | 334 (0x14e) | V_CVT_OFF_F32_I4 | V_CVT_OFF_F32_I4
15 (0xf) | 335 (0x14f) | V_CVT_F32_F64 | V_CVT_F32_F64
16 (0x10) | 336 (0x150) | V_CVT_F64_F32 | V_CVT_F64_F32
17 (0x11) | 337 (0x151) | V_CVT_F32_UBYTE0 | V_CVT_F32_UBYTE0
18 (0x12) | 338 (0x152) | V_CVT_F32_UBYTE1 | V_CVT_F32_UBYTE1
19 (0x13) | 339 (0x153) | V_CVT_F32_UBYTE2 | V_CVT_F32_UBYTE2
20 (0x14) | 340 (0x154) | V_CVT_F32_UBYTE3 | V_CVT_F32_UBYTE3
21 (0x15) | 341 (0x155) | V_CVT_U32_F64 | V_CVT_U32_F64
22 (0x16) | 342 (0x156) | V_CVT_F64_U32 | V_CVT_F64_U32
23 (0x17) | 343 (0x157) | V_TRUNC_F64 | V_TRUNC_F64
24 (0x18) | 344 (0x158) | V_CEIL_F64 | V_CEIL_F64
25 (0x19) | 345 (0x159) | V_RNDNE_F64 | V_RNDNE_F64
26 (0x1a) | 346 (0x15a) | V_FLOOR_F64 | V_FLOOR_F64
27 (0x1b) | 347 (0x15b) | V_FRACT_F32 | V_FRACT_F32
28 (0x1c) | 348 (0x15c) | V_TRUNC_F32 | V_TRUNC_F32
29 (0x1d) | 349 (0x15d) | V_CEIL_F32 | V_CEIL_F32
30 (0x1e) | 350 (0x15e) | V_RNDNE_F32 | V_RNDNE_F32
31 (0x1f) | 351 (0x15f) | V_FLOOR_F32 | V_FLOOR_F32
32 (0x20) | 352 (0x160) | V_EXP_F32 | V_EXP_F32
33 (0x21) | 353 (0x161) | V_LOG_F32 | V_LOG_F32
34 (0x22) | 354 (0x162) | V_RCP_F32 | V_RCP_F32
35 (0x23) | 355 (0x163) | V_RCP_IFLAG_F32 | V_RCP_IFLAG_F32
36 (0x24) | 356 (0x164) | V_RSQ_F32 | V_RSQ_F32
37 (0x25) | 357 (0x165) | V_RCP_F64 | V_RCP_F64
38 (0x26) | 358 (0x166) | V_RSQ_F64 | V_RSQ_F64
39 (0x27) | 359 (0x167) | V_SQRT_F32 | V_SQRT_F32
40 (0x28) | 360 (0x168) | V_SQRT_F64 | V_SQRT_F64
41 (0x29) | 361 (0x169) | V_SIN_F32 | V_SIN_F32
42 (0x2a) | 362 (0x16a) | V_COS_F32 | V_COS_F32
43 (0x2b) | 363 (0x16b) | V_NOT_B32 | V_NOT_B32
44 (0x2c) | 364 (0x16c) | V_BFREV_B32 | V_BFREV_B32
45 (0x2d) | 365 (0x16d) | V_FFBH_U32 | V_FFBH_U32
46 (0x2e) | 366 (0x16e) | V_FFBL_B32 | V_FFBL_B32
47 (0x2f) | 367 (0x16f) | V_FFBH_I32 | V_FFBH_I32
48 (0x30) | 368 (0x170) | V_FREXP_EXP_I32_F64 | V_FREXP_EXP_I32_F64
49 (0x31) | 369 (0x171) | V_FREXP_MANT_F64 | V_FREXP_MANT_F64
50 (0x32) | 370 (0x172) | V_FRACT_F64 | V_FRACT_F64
51 (0x33) | 371 (0x173) | V_FREXP_EXP_I32_F32 | V_FREXP_EXP_I32_F32
52 (0x34) | 372 (0x174) | V_FREXP_MANT_F32 | V_FREXP_MANT_F32
53 (0x35) | 373 (0x175) | V_CLREXCP | V_CLREXCP
54 (0x36) | 374 (0x176) | V_MOVRELD_B32 | V_MOV_PRSV_B32
55 (0x37) | 375 (0x177) | V_MOVRELS_B32 | V_SCREEN_PARTITION_4SE_B32
56 (0x38) | 376 (0x178) | V_MOVRELSD_B32 | --
57 (0x39) | 377 (0x179) | V_CVT_F16_U16 | V_CVT_F16_U16
58 (0x3a) | 378 (0x17a) | V_CVT_F16_I16 | V_CVT_F16_I16
59 (0x3b) | 379 (0x17b) | V_CVT_U16_F16 | V_CVT_U16_F16
60 (0x3c) | 380 (0x17c) | V_CVT_I16_F16 | V_CVT_I16_F16
61 (0x3d) | 381 (0x17d) | V_RCP_F16 | V_RCP_F16
62 (0x3e) | 382 (0x17e) | V_SQRT_F16 | V_SQRT_F16
63 (0x3f) | 383 (0x17f) | V_RSQ_F16 | V_RSQ_F16
64 (0x40) | 384 (0x180) | V_LOG_F16 | V_LOG_F16
65 (0x41) | 385 (0x181) | V_EXP_F16 | V_EXP_F16
66 (0x42) | 386 (0x182) | V_FREXP_MANT_F16 | V_FREXP_MANT_F16
67 (0x43) | 387 (0x183) | V_FREXP_EXP_I16_F16 | V_FREXP_EXP_I16_F16
68 (0x44) | 388 (0x184) | V_FLOOR_F16 | V_FLOOR_F16
69 (0x45) | 389 (0x185) | V_CEIL_F16 | V_CEIL_F16
70 (0x46) | 390 (0x186) | V_TRUNC_F16 | V_TRUNC_F16
71 (0x47) | 391 (0x187) | V_RNDNE_F16 | V_RNDNE_F16
72 (0x48) | 392 (0x188) | V_FRACT_F16 | V_FRACT_F16
73 (0x49) | 393 (0x189) | V_SIN_F16 | V_SIN_F16
74 (0x4a) | 394 (0x18a) | V_COS_F16 | V_COS_F16
75 (0x4b) | 395 (0x18b) | V_EXP_LEGACY_F32 | V_EXP_LEGACY_F32
76 (0x4c) | 396 (0x18c) | V_LOG_LEGACY_F32 | V_LOG_LEGACY_F32
77 (0x4d) | 397 (0x18d) | -- | V_CVT_NORM_I16_F16
78 (0x4e) | 398 (0x18e) | -- | V_CVT_NORM_U16_F16
79 (0x4f) | 399 (0x18f) | -- | V_SAT_PK_U8_I16
80 (0x50) | 400 (0x190) | -- | V_WRITELANE_REGWR_B32
81 (0x51) | 401 (0x191) | -- | V_SWAP_B32

### Instruction set

Alphabetically sorted instruction list:

#### V_BFREV_B32

Opcode VOP1: 56 (0x38) for GCN 1.0/1.1; 44 (0x2c) for GCN 1.2
Opcode VOP3A: 440 (0x1b8) for GCN 1.0/1.1; 364 (0x16c) for GCN 1.2
Syntax: V_BFREV_B32 VDST, SRC0
Description: Reverse bits in SRC0 and store result to VDST.
Operation:
```
VDST = REVBIT(SRC0)
```

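REVBIT swaps bit i with bit 31 - i. A minimal Python model of it, using a string reversal of the 32-bit binary form; the function name is hypothetical.

```python
def bfrev_b32(src0: int) -> int:
    """Model of REVBIT: bit i of SRC0 becomes bit 31 - i of the result."""
    bits = format(src0 & 0xFFFFFFFF, "032b")  # 32 characters, most significant bit first
    return int(bits[::-1], 2)


print(hex(bfrev_b32(0x12345678)))  # 0x1e6a2c48
```

Applying the function twice returns the original value, which is a quick sanity check for any bit-reversal implementation.
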
#### V_CEIL_F16

Opcode VOP1: 69 (0x45) for GCN 1.2
Opcode VOP3A: 389 (0x185) for GCN 1.2
Syntax: V_CEIL_F16 VDST, SRC0
Description: Round the half floating point value from SRC0 toward positive infinity
(ceiling), and store result to VDST. Implemented by truncation (rounding toward zero).
If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
HALF F = RNDTZ(ASHALF(SRC0))
if (ASHALF(SRC0) > 0.0 && ASHALF(SRC0) != F)
    F += 1.0
VDST = F
```

#### V_CEIL_F32

Opcode VOP1: 34 (0x22) for GCN 1.0/1.1; 29 (0x1d) for GCN 1.2
Opcode VOP3A: 418 (0x1a2) for GCN 1.0/1.1; 349 (0x15d) for GCN 1.2
Syntax: V_CEIL_F32 VDST, SRC0
Description: Round the floating point value from SRC0 toward positive infinity
(ceiling), and store result to VDST. Implemented by truncation (rounding toward zero).
If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
FLOAT F = RNDTZ(ASFLOAT(SRC0))
if (ASFLOAT(SRC0) > 0.0 && ASFLOAT(SRC0) != F)
    F += 1.0
VDST = F
```

#### V_CEIL_F64

Opcode VOP1: 24 (0x18) for GCN 1.1/1.2
Opcode VOP3A: 408 (0x198) for GCN 1.1; 344 (0x158) for GCN 1.2
Syntax: V_CEIL_F64 VDST(2), SRC0(2)
Description: Round the double floating point value from SRC0 toward positive infinity
(ceiling), and store result to VDST. Implemented by truncation (rounding toward zero).
If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
DOUBLE F = RNDTZ(ASDOUBLE(SRC0))
if (ASDOUBLE(SRC0) > 0.0 && ASDOUBLE(SRC0) != F)
    F += 1.0
VDST = F
```

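Building the ceiling from truncation, and adding 1.0 only for positive non-integral values, gives the correct result for negative inputs too (the truncation of -1.5 is already -1.0). The following Python sketch mirrors the V_CEIL_F16/F32/F64 operation using Python floats (double precision) and checks it against `math.ceil`; it is a behavioural model only, not a bit-exact description of the hardware at every precision.

```python
import math


def ceil_via_trunc(x: float) -> float:
    """Ceiling built from truncation, as in the V_CEIL_* operation."""
    if math.isinf(x) or math.isnan(x):
        return x  # infinity and NaN are copied to VDST unchanged
    # Truncate toward zero; copysign keeps the sign of zero (trunc(-0.5) -> -0.0).
    f = math.copysign(float(math.trunc(x)), x)
    if x > 0.0 and x != f:
        f += 1.0
    return f


for value in (1.5, -1.5, 2.0, -0.5):
    print(value, ceil_via_trunc(value))
```
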
#### V_CLREXCP

Opcode VOP1: 65 (0x41) for GCN 1.0/1.1; 53 (0x35) for GCN 1.2
Opcode VOP3A: 449 (0x1c1) for GCN 1.0/1.1; 373 (0x175) for GCN 1.2
Syntax: V_CLREXCP
Description: Clear the wave's exception state in the SIMD.

#### V_COS_F16

Opcode VOP1: 74 (0x4a) for GCN 1.2
Opcode VOP3A: 394 (0x18a) for GCN 1.2
Syntax: V_COS_F16 VDST, SRC0
Description: Compute cosine of half FP value from SRC0.
Input value must be normalized to the range -1.0 to 1.0 (-360 to 360 degrees).
If SRC0 is a finite value out of range then store 1.0 to VDST.
If SRC0 is infinity, store -NAN to VDST. If SRC0 is NaN, copy it to VDST.
Operation:
```
FLOAT SF = ASHALF(SRC0)
VDST = 1.0
if (SF >= -1.0 && SF <= 1.0)
    VDST = APPROX_COS(SF)
else if (ABS(SF)==INF_H)
    VDST = -NAN_H
else if (ISNAN(SF))
    VDST = SRC0
```

#### V_COS_F32

Opcode VOP1: 54 (0x36) for GCN 1.0/1.1; 42 (0x2a) for GCN 1.2
Opcode VOP3A: 438 (0x1b6) for GCN 1.0/1.1; 362 (0x16a) for GCN 1.2
Syntax: V_COS_F32 VDST, SRC0
Description: Compute cosine of FP value from SRC0. Input value must be normalized to the range
-1.0 to 1.0 (-360 to 360 degrees). If SRC0 is a finite value out of range then store 1.0 to VDST.
If SRC0 is infinity, store -NAN to VDST. If SRC0 is NaN, copy it to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
VDST = 1.0
if (SF >= -1.0 && SF <= 1.0)
    VDST = APPROX_COS(SF)
else if (ABS(SF)==INF)
    VDST = -NAN
else if (ISNAN(SF))
    VDST = SRC0
```

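A Python model of the range handling described for V_COS_F16 and V_COS_F32. It assumes, from the -360 to 360 degree mapping, that APPROX_COS(x) approximates cos(2πx); `math.cos` stands in for the hardware approximation, so only the special-case handling, not the exact result bits, should be relied on. The function name is hypothetical.

```python
import math


def cos_model(x: float) -> float:
    """Behavioural model of V_COS_F32: input in revolutions (1.0 == 360 degrees)."""
    result = 1.0  # finite values outside -1.0..1.0 yield 1.0
    if -1.0 <= x <= 1.0:
        # Assumption: APPROX_COS(x) approximates cos(2 * pi * x);
        # the hardware only approximates this value.
        result = math.cos(2.0 * math.pi * x)
    elif math.isinf(x):
        result = -math.nan
    elif math.isnan(x):
        result = x  # a NaN input is passed through
    return result


print(cos_model(0.5))  # 180 degrees
print(cos_model(1.5))  # out of range
```
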
#### V_CVT_F16_F32

Opcode VOP1: 10 (0xa)
Opcode VOP3A: 394 (0x18a) for GCN 1.0/1.1; 330 (0x14a) for GCN 1.2
Syntax: V_CVT_F16_F32 VDST, SRC0
Description: Convert single FP value to half floating point value with rounding from the
MODE register (single FP rounding mode for GCN 1.0, double FP rounding mode for GCN 1.2),
and store result to VDST. If absolute value is too high, then store -/+infinity to VDST.
In GCN 1.2, flushing of denormals is controlled by MODE. In GCN 1.0/1.1, denormals are enabled.
Operation:
```
VDST = CVTHALF(ASFLOAT(SRC0))
```

#### V_CVT_F16_I16

Opcode VOP1: 58 (0x3a) for GCN 1.2
Opcode VOP3A: 378 (0x17a) for GCN 1.2
Syntax: V_CVT_F16_I16 VDST, SRC0
Description: Convert 16-bit signed value to half floating point value, and store result to VDST.
Operation:
```
VDST = (HALF)(INT16)SRC0
```

#### V_CVT_F16_U16

Opcode VOP1: 57 (0x39) for GCN 1.2
Opcode VOP3A: 377 (0x179) for GCN 1.2
Syntax: V_CVT_F16_U16 VDST, SRC0
Description: Convert 16-bit unsigned value to half floating point value, and store result to VDST.
Operation:
```
VDST = (HALF)(SRC0&0xffff)
```

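The three conversions to half precision (V_CVT_F16_F32, V_CVT_F16_I16, V_CVT_F16_U16) can be sketched in Python with the `struct` module's `e` (IEEE half) format, working on raw register bit patterns. This assumes round-to-nearest-even (the default rounding mode), ignores denormal flushing, and maps overflow to a signed infinity as described for V_CVT_F16_F32; CPython's `struct.pack` raises `OverflowError` in that case. The helper names are hypothetical.

```python
import math
import struct


def to_half_bits(value: float) -> int:
    """Round to IEEE half precision (nearest even); return the 16-bit pattern.
    Values too large for FP16 become +/-infinity."""
    try:
        packed = struct.pack("<e", value)
    except OverflowError:
        packed = struct.pack("<e", math.copysign(math.inf, value))
    return struct.unpack("<H", packed)[0]


def cvt_f16_f32(src0: int) -> int:
    single = struct.unpack("<f", struct.pack("<I", src0 & 0xFFFFFFFF))[0]
    return to_half_bits(single)


def cvt_f16_i16(src0: int) -> int:
    value = src0 & 0xFFFF
    if value & 0x8000:  # sign-extend the low 16 bits
        value -= 0x10000
    return to_half_bits(float(value))


def cvt_f16_u16(src0: int) -> int:
    return to_half_bits(float(src0 & 0xFFFF))


print(hex(cvt_f16_f32(0x3F800000)))  # 1.0 -> 0x3c00
print(hex(cvt_f16_i16(0xFFFF)))      # -1 -> 0xbc00
```
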
#### V_CVT_F32_F16

Opcode VOP1: 11 (0xb)
Opcode VOP3A: 395 (0x18b) for GCN 1.0/1.1; 331 (0x14b) for GCN 1.2
Syntax: V_CVT_F32_F16 VDST, SRC0
Description: Convert half FP value to single FP value, and store result to VDST.
**By default, immediate is in FP32 format!**
In GCN 1.2, flushing of denormals is controlled by MODE. In GCN 1.0/1.1, denormals are enabled.
Operation:
```
VDST = (FLOAT)(ASHALF(SRC0))
```

#### V_CVT_F32_F64

Opcode VOP1: 15 (0xf)
Opcode VOP3A: 399 (0x18f) for GCN 1.0/1.1; 335 (0x14f) for GCN 1.2
Syntax: V_CVT_F32_F64 VDST, SRC0(2)
Description: Convert double FP value to single floating point value with rounding from the
MODE register (single FP rounding mode), and store result to VDST.
If absolute value is too high, then store -/+infinity to VDST.
Operation:
```
VDST = (FLOAT)(ASDOUBLE(SRC0))
```

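A Python sketch of V_CVT_F32_F64 on raw register bits, assuming round-to-nearest-even and that overflow produces a signed infinity as described above. CPython's `struct.pack` with the `f` format raises `OverflowError` when a finite double does not fit in FP32, which the sketch maps to infinity. The function name is hypothetical.

```python
import math
import struct


def cvt_f32_f64(src0: int) -> int:
    """SRC0 is the 64-bit register-pair value; returns the 32-bit result pattern."""
    double = struct.unpack("<d", struct.pack("<Q", src0 & 0xFFFFFFFFFFFFFFFF))[0]
    try:
        packed = struct.pack("<f", double)  # rounds to nearest even
    except OverflowError:
        packed = struct.pack("<f", math.copysign(math.inf, double))
    return struct.unpack("<I", packed)[0]


print(hex(cvt_f32_f64(0x3FB999999999999A)))  # 0.1 -> 0x3dcccccd
```
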
#### V_CVT_F32_I32

Opcode VOP1: 5 (0x5)
Opcode VOP3A: 389 (0x185) for GCN 1.0/1.1; 325 (0x145) for GCN 1.2
Syntax: V_CVT_F32_I32 VDST, SRC0
Description: Convert signed 32-bit integer to single FP value, and store it to VDST.
Operation:
```
VDST = (FLOAT)(INT32)SRC0
```

#### V_CVT_F32_U32

Opcode VOP1: 6 (0x6)
Opcode VOP3A: 390 (0x186) for GCN 1.0/1.1; 326 (0x146) for GCN 1.2
Syntax: V_CVT_F32_U32 VDST, SRC0
Description: Convert unsigned 32-bit integer to single FP value, and store it to VDST.
Operation:
```
VDST = (FLOAT)SRC0
```

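Integers with magnitude above 2^24 do not always fit exactly in FP32, so both integer-to-float conversions can round. A Python sketch on raw bit patterns, assuming round-to-nearest-even (the MODE rounding setting is not modelled); the helper names are hypothetical.

```python
import struct


def float_bits(value: float) -> int:
    """Round a Python float to FP32 (nearest even) and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def cvt_f32_i32(src0: int) -> int:
    value = src0 & 0xFFFFFFFF
    if value & 0x80000000:  # reinterpret as a signed 32-bit integer
        value -= 1 << 32
    return float_bits(float(value))


def cvt_f32_u32(src0: int) -> int:
    return float_bits(float(src0 & 0xFFFFFFFF))


print(hex(cvt_f32_i32(0xFFFFFFFF)))  # -1 -> 0xbf800000
print(hex(cvt_f32_u32(0xFFFFFFFF)))  # 4294967295 rounds to 2**32 -> 0x4f800000
```

Every 32-bit integer is exact as a Python float (a double), so the only rounding step is the final FP32 pack.
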
#### V_CVT_F32_UBYTE0

Opcode VOP1: 17 (0x11)
Opcode VOP3A: 401 (0x191) for GCN 1.0/1.1; 337 (0x151) for GCN 1.2
Syntax: V_CVT_F32_UBYTE0 VDST, SRC0
Description: Convert the first unsigned 8-bit byte from SRC0 to single FP value,
and store it to VDST.
Operation:
```
VDST = (FLOAT)(SRC0 & 0xff)
```

#### V_CVT_F32_UBYTE1

Opcode VOP1: 18 (0x12)
Opcode VOP3A: 402 (0x192) for GCN 1.0/1.1; 338 (0x152) for GCN 1.2
Syntax: V_CVT_F32_UBYTE1 VDST, SRC0
Description: Convert the second unsigned 8-bit byte from SRC0 to single FP value,
and store it to VDST.
Operation:
```
VDST = (FLOAT)((SRC0>>8) & 0xff)
```

#### V_CVT_F32_UBYTE2

Opcode VOP1: 19 (0x13)
Opcode VOP3A: 403 (0x193) for GCN 1.0/1.1; 339 (0x153) for GCN 1.2
Syntax: V_CVT_F32_UBYTE2 VDST, SRC0
Description: Convert the third unsigned 8-bit byte from SRC0 to single FP value,
and store it to VDST.
Operation:
```
VDST = (FLOAT)((SRC0>>16) & 0xff)
```

#### V_CVT_F32_UBYTE3

Opcode VOP1: 20 (0x14)
Opcode VOP3A: 404 (0x194) for GCN 1.0/1.1; 340 (0x154) for GCN 1.2
Syntax: V_CVT_F32_UBYTE3 VDST, SRC0
Description: Convert the fourth unsigned 8-bit byte from SRC0 to single FP value,
and store it to VDST.
Operation:
```
VDST = (FLOAT)(SRC0>>24)
```

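The four UBYTE variants differ only in the shift amount, so a single helper parameterised by the byte index shows the pattern. The function is hypothetical; every byte value 0-255 is exactly representable in FP32, so no rounding is involved.

```python
def cvt_f32_ubyte(src0: int, n: int) -> float:
    """V_CVT_F32_UBYTEn model: byte n (0 = bits 0-7, ..., 3 = bits 24-31) as a float."""
    if n not in (0, 1, 2, 3):
        raise ValueError("byte index must be 0-3")
    return float((src0 >> (8 * n)) & 0xFF)


word = 0x11223344
print([cvt_f32_ubyte(word, n) for n in range(4)])  # [68.0, 51.0, 34.0, 17.0]
```
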
#### V_CVT_F64_F32

Opcode VOP1: 16 (0x10)
Opcode VOP3A: 400 (0x190) for GCN 1.0/1.1; 336 (0x150) for GCN 1.2
Syntax: V_CVT_F64_F32 VDST(2), SRC0
Description: Convert single FP value to double FP value, and store result to VDST.
Operation:
```
VDST = (DOUBLE)(ASFLOAT(SRC0))
```

#### V_CVT_F64_I32

Opcode VOP1: 4 (0x4)
Opcode VOP3A: 388 (0x184) for GCN 1.0/1.1; 324 (0x144) for GCN 1.2
Syntax: V_CVT_F64_I32 VDST(2), SRC0
Description: Convert signed 32-bit integer to double FP value, and store it to VDST.
Operation:
```
VDST = (DOUBLE)(INT32)SRC0
```

#### V_CVT_F64_U32

Opcode VOP1: 22 (0x16)
Opcode VOP3A: 406 (0x196) for GCN 1.0/1.1; 342 (0x156) for GCN 1.2
Syntax: V_CVT_F64_U32 VDST(2), SRC0
Description: Convert unsigned 32-bit integer to double FP value, and store it to VDST.
Operation:
```
VDST = (DOUBLE)SRC0
```

#### V_CVT_FLR_I32_F32

Opcode VOP1: 13 (0xd)
Opcode VOP3A: 397 (0x18d) for GCN 1.0/1.1; 333 (0x14d) for GCN 1.2
Syntax: V_CVT_FLR_I32_F32 VDST, SRC0
Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and
store result to VDST. Conversion uses rounding toward negative infinity (floor).
If value is higher/lower than maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST.
If input value is NaN/-NaN then store MAX_INT32/MIN_INT32 to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (!ISNAN(SF))
    VDST = (INT32)MAX(MIN(FLOOR(SF), 2147483647.0), -2147483648.0)
else
    VDST = (SRC0 & 0x80000000) ? -2147483648 : 2147483647
```

#### V_CVT_I16_F16

Opcode VOP1: 60 (0x3c) for GCN 1.2
Opcode VOP3A: 380 (0x17c) for GCN 1.2
Syntax: V_CVT_I16_F16 VDST, SRC0
Description: Convert 16-bit floating point value from SRC0 to signed 16-bit integer, and
store result to VDST. Conversion uses rounding to zero. If value is higher/lower than
maximal/minimal integer then store MAX_INT16/MIN_INT16 to VDST.
If input value is NaN then store 0 to VDST.
Operation:
```
VDST = 0
if (!ISNAN(ASHALF(SRC0)))
    VDST = (INT16)MAX(MIN(RNDTZINT(ASHALF(SRC0)), 32767.0), -32768.0)
```

#### V_CVT_I32_F32

Opcode VOP1: 8 (0x8)
Opcode VOP3A: 392 (0x188) for GCN 1.0/1.1; 328 (0x148) for GCN 1.2
Syntax: V_CVT_I32_F32 VDST, SRC0
Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and
store result to VDST. Conversion uses rounding to zero. If value is higher/lower than
maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST.
If input value is NaN then store 0 to VDST.
Operation:
```
VDST = 0
if (!ISNAN(ASFLOAT(SRC0)))
    VDST = (INT32)MAX(MIN(RNDTZINT(ASFLOAT(SRC0)), 2147483647.0), -2147483648.0)
```

#### V_CVT_I32_F64

Opcode VOP1: 3 (0x3)
Opcode VOP3A: 387 (0x183) for GCN 1.0/1.1; 323 (0x143) for GCN 1.2
Syntax: V_CVT_I32_F64 VDST, SRC0(2)
Description: Convert 64-bit floating point value from SRC0 to signed 32-bit integer, and
store result to VDST. Conversion uses rounding to zero. If value is higher/lower than
maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST.
If input value is NaN then store 0 to VDST.
Operation:
```
VDST = 0
if (!ISNAN(ASDOUBLE(SRC0)))
    VDST = (INT32)MAX(MIN(RNDTZINT(ASDOUBLE(SRC0)), 2147483647.0), -2147483648.0)
```

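V_CVT_I32_F32, V_CVT_I32_F64 and V_CVT_I16_F16 share one shape: NaN gives 0, everything else is truncated toward zero and saturated. The sketch below takes an already-decoded Python float plus a hypothetical `bits` parameter for the target width. It clamps before truncating (the operation above truncates first); the result is the same because both bounds are integers, and clamping first also keeps infinities away from `math.trunc`.

```python
import math


def cvt_int_rtz(x: float, bits: int) -> int:
    """Model of V_CVT_I32_F32/V_CVT_I32_F64 (bits=32) and V_CVT_I16_F16 (bits=16)."""
    if math.isnan(x):
        return 0
    max_int = float((1 << (bits - 1)) - 1)
    min_int = float(-(1 << (bits - 1)))
    # Saturate to the signed range, then round toward zero.
    return math.trunc(max(min(x, max_int), min_int))


print(cvt_int_rtz(-1.9, 32), cvt_int_rtz(1e10, 32), cvt_int_rtz(40000.0, 16))
```
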
#### V_CVT_NORM_I16_F16

Opcode VOP1: 77 (0x4d) for GCN 1.4
Opcode VOP3A: 397 (0x18d) for GCN 1.4
Syntax: V_CVT_NORM_I16_F16 VDST, SRC0
Description: Convert 16-bit floating point value from SRC0 to a signed normalized 16-bit value
by multiplying the value by 32767.0 and converting it to a 16-bit signed integer, and
store result to VDST. Conversion depends on rounding mode.
Operation:
```
VDST = 0
if (!ISNAN(ASHALF(SRC0)))
    VDST = (INT16)MAX(MIN(RNDINT(ASHALF(SRC0)*32767.0), 32767.0), -32767.0)
```

#### V_CVT_NORM_U16_F16

Opcode VOP1: 78 (0x4e) for GCN 1.4
Opcode VOP3A: 398 (0x18e) for GCN 1.4
Syntax: V_CVT_NORM_U16_F16 VDST, SRC0
Description: Convert 16-bit floating point value from SRC0 to an unsigned normalized
16-bit value by multiplying the value by 65535.0 and converting it to a
16-bit unsigned integer, and store result to VDST. Probably rounds to +Infinity.
Operation:
```
VDST = 0
if (!ISNAN(ASHALF(SRC0)))
    VDST = (UINT16)MAX(MIN(RNDINT(ASHALF(SRC0)*65535.0), 65535.0), 0.0)
```

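A Python sketch of the two normalized conversions, taking the already-decoded half value as a Python float and assuming the scale factors 32767.0 and 65535.0 and the clamping bounds used in the operations above. Python's `round` (round half to even) stands in for RNDINT under the default rounding mode; as noted for V_CVT_NORM_U16_F16 the hardware may round differently, so halfway cases are not guaranteed to match. The function names are hypothetical.

```python
import math


def cvt_norm_i16(x: float) -> int:
    """Signed normalized: scale by 32767, clamp to -32767..32767, round; NaN -> 0."""
    if math.isnan(x):
        return 0
    # Clamp before rounding so that infinities never reach round().
    return round(max(min(x * 32767.0, 32767.0), -32767.0))


def cvt_norm_u16(x: float) -> int:
    """Unsigned normalized: scale by 65535, clamp to 0..65535, round; NaN -> 0."""
    if math.isnan(x):
        return 0
    return round(max(min(x * 65535.0, 65535.0), 0.0))


print(cvt_norm_i16(-1.0), cvt_norm_u16(1.0))
```
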
#### V_CVT_OFF_F32_I4

Opcode VOP1: 14 (0xe)
Opcode VOP3A: 398 (0x18e) for GCN 1.0/1.1; 334 (0x14e) for GCN 1.2
Syntax: V_CVT_OFF_F32_I4 VDST, SRC0
Description: Convert 4-bit signed value from SRC0 to floating point value, normalize that
value to range -0.5:0.4375 and store result to VDST.
Operation:
```
VDST = (FLOAT)((SRC0 & 0xf) ^ 8) / 16.0 - 0.5
```

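The XOR-and-offset formula is equivalent to dividing the signed 4-bit value by 16: XOR with 8 maps -8..7 onto 0..15, and subtracting 0.5 recentres the result. A short Python model (the function name is hypothetical) makes this easy to check for all 16 inputs.

```python
def cvt_off_f32_i4(src0: int) -> float:
    """Model of V_CVT_OFF_F32_I4 using the operation's formula."""
    return float((src0 & 0xF) ^ 8) / 16.0 - 0.5


print([cvt_off_f32_i4(v) for v in (0x8, 0xF, 0x0, 0x7)])  # [-0.5, -0.0625, 0.0, 0.4375]
```
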
#### V_CVT_RPI_I32_F32

Opcode VOP1: 12 (0xc)
Opcode VOP3A: 396 (0x18c) for GCN 1.0/1.1; 332 (0x14c) for GCN 1.2
Syntax: V_CVT_RPI_I32_F32 VDST, SRC0
Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and
store result to VDST. Conversion adds 0.5 to the value and rounds toward negative infinity (floor).
If value is higher/lower than maximal/minimal integer then store MAX_INT32/MIN_INT32 to
VDST. If input value is NaN/-NaN then store MAX_INT32/MIN_INT32 to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (!ISNAN(SF))
    VDST = (INT32)MAX(MIN(FLOOR(SF + 0.5), 2147483647.0), -2147483648.0)
else
    VDST = (SRC0 & 0x80000000) ? -2147483648 : 2147483647
```

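A Python sketch of V_CVT_FLR_I32_F32 and V_CVT_RPI_I32_F32 on raw FP32 bit patterns, including the rule that the sign of a NaN selects MAX_INT32 or MIN_INT32. The helper names are hypothetical, and the sketch adds 0.5 in double precision rather than FP32, so inputs whose FP32 sum would round differently may not match the hardware exactly.

```python
import math
import struct

INT32_MAX = 2147483647
INT32_MIN = -2147483648


def as_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an FP32 value (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def _floor_convert(src0: int, offset: float) -> int:
    sf = as_float(src0)
    if math.isnan(sf):
        # The sign bit of the NaN selects the saturated result.
        return INT32_MIN if src0 & 0x80000000 else INT32_MAX
    # Saturate (this also handles infinities), then round toward negative infinity.
    return math.floor(max(min(sf + offset, 2147483647.0), -2147483648.0))


def cvt_flr_i32_f32(src0: int) -> int:
    return _floor_convert(src0, 0.0)


def cvt_rpi_i32_f32(src0: int) -> int:
    return _floor_convert(src0, 0.5)


# -1.5 (0xBFC00000): floor gives -2, round-plus-infinity gives -1
print(cvt_flr_i32_f32(0xBFC00000), cvt_rpi_i32_f32(0xBFC00000))
```
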
651 | #### V_CVT_U16_F16 |
---|
652 | |
---|
653 | Opcode VOP1: 59 (0x3b) for GCN 1.2 |
---|
654 | Opcode VOP3A: 379 (0x17b) for GCN 1.2 |
---|
655 | Syntax: V_CVT_U16_F16 VDST, SRC0 |
---|
656 | Description: Convert 32-bit half floating point value from SRC0 to unsigned 16-bit integer, |
---|
657 | and store result to VDST. Conversion uses rounding to zero. If value is higher than |
---|
658 | maximal integer then store MAX_UINT16 to VDST. If input value is NaN then store 0 to VDST. |
---|
659 | Operation: |
---|
660 | ``` |
---|
661 | VDST = 0 |
---|
662 | if (!ISNAN(ASHALF(SRC0))) |
---|
663 | VDST = (UINT16)MIN(RNDTZINT(ASHALF(SRC0)), 65535.0) |
---|
664 | ``` |

#### V_CVT_U32_F32

Opcode VOP1: 7 (0x7)
Opcode VOP3A: 391 (0x187) for GCN 1.0/1.1; 327 (0x147) for GCN 1.2
Syntax: V_CVT_U32_F32 VDST, SRC0
Description: Convert 32-bit floating point value from SRC0 to unsigned 32-bit integer, and
store result to VDST. Conversion uses rounding to zero. If value is higher than
maximal integer then store MAX_UINT32 to VDST.
If input value is NaN then store 0 to VDST.
Operation:
```
VDST = 0
if (!ISNAN(ASFLOAT(SRC0)))
    VDST = (UINT32)MIN(RNDTZINT(ASFLOAT(SRC0)), 4294967295.0)
```
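
A Python sketch of round-toward-zero with saturation, taking SRC0 as a 32-bit pattern (helper names hypothetical). Clamping negative inputs to 0 is an assumption of this sketch; the pseudocode above does not say what happens for negative values.

```python
import math
import struct

UINT32_MAX = 4294967295


def as_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single (like ASFLOAT)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def cvt_u32_f32(src0: int) -> int:
    sf = as_float(src0)
    if math.isnan(sf):
        return 0                  # NaN -> 0
    if sf >= UINT32_MAX:
        return UINT32_MAX         # too large (including +inf) -> MAX_UINT32
    if sf <= 0.0:
        return 0                  # ASSUMPTION: negative inputs clamp to 0
    return math.trunc(sf)         # round toward zero
```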

#### V_CVT_U32_F64

Opcode VOP1: 21 (0x15)
Opcode VOP3A: 405 (0x195) for GCN 1.0/1.1; 341 (0x155) for GCN 1.2
Syntax: V_CVT_U32_F64 VDST, SRC0(2)
Description: Convert 64-bit floating point value from SRC0 to unsigned 32-bit integer, and
store result to VDST. Conversion uses rounding to zero. If value is higher than
maximal integer then store MAX_UINT32 to VDST.
If input value is NaN then store 0 to VDST.
Operation:
```
VDST = 0
if (!ISNAN(ASDOUBLE(SRC0)))
    VDST = (UINT32)MIN(RNDTZINT(ASDOUBLE(SRC0)), 4294967295.0)
```

#### V_EXP_F16

Opcode VOP1: 65 (0x41) for GCN 1.2
Opcode VOP3A: 385 (0x181) for GCN 1.2
Syntax: V_EXP_F16 VDST, SRC0
Description: Approximate power of two from half FP value SRC0 and store it to VDST.
Operation:
```
VDST = APPROX_POW2(ASHALF(SRC0))
```

#### V_EXP_F32

Opcode VOP1: 37 (0x25) for GCN 1.0/1.1; 32 (0x20) for GCN 1.2
Opcode VOP3A: 421 (0x1a5) for GCN 1.0/1.1; 352 (0x160) for GCN 1.2
Syntax: V_EXP_F32 VDST, SRC0
Description: Approximate power of two from FP value SRC0 and store it to VDST. For values
smaller than -126.0 the instruction always returns 0, regardless of the float mode in the MODE register.
Operation:
```
if (ASFLOAT(SRC0)>=-126.0)
    VDST = APPROX_POW2(ASFLOAT(SRC0))
else
    VDST = 0.0
```
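
A Python sketch of the cutoff rule, where an exact `2.0 ** x` stands in for APPROX_POW2 (the hardware result is only an approximation). The function name and the single-precision overflow threshold `FLT_MAX` are assumptions of this sketch.

```python
import math

FLT_MAX = 3.4028234663852886e38   # largest finite IEEE single


def exp_f32(x: float) -> float:
    """Sketch of V_EXP_F32 with exact 2**x in place of APPROX_POW2."""
    if x < -126.0:
        return 0.0                # below -126: always 0, even with denormals enabled
    if x >= 128.0:
        return math.inf           # 2**x overflows single precision
    r = 2.0 ** x
    return math.inf if r > FLT_MAX else r
```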

#### V_EXP_LEGACY_F32

Opcode VOP1: 70 (0x46) for GCN 1.1; 75 (0x4b) for GCN 1.2
Opcode VOP3A: 454 (0x1c6) for GCN 1.1; 395 (0x18b) for GCN 1.2
Syntax: V_EXP_LEGACY_F32 VDST, SRC0
Description: Approximate power of two from FP value SRC0 and store it to VDST. For values
smaller than -126.0 the instruction always returns 0, regardless of the float mode in the MODE register.
In some cases this instruction returns a slightly less accurate result than V_EXP_F32.
Operation:
```
if (ASFLOAT(SRC0)>=-126.0)
    VDST = APPROX_POW2(ASFLOAT(SRC0))
else
    VDST = 0.0
```

#### V_FFBH_U32

Opcode VOP1: 57 (0x39) for GCN 1.0/1.1; 45 (0x2d) for GCN 1.2
Opcode VOP3A: 441 (0x1b9) for GCN 1.0/1.1; 365 (0x16d) for GCN 1.2
Syntax: V_FFBH_U32 VDST, SRC0
Description: Find last (most significant) one bit in SRC0. If found, store number of skipped
bits (counted from bit 31) to VDST, otherwise set VDST to -1.
Operation:
```
VDST = -1
for (INT8 i = 31; i >= 0; i--)
    if (((1U<<i) & SRC0) != 0)
        { VDST = 31-i; break; }
```
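
The loop above is a count of leading zeros with a special case for zero. A Python sketch using `int.bit_length()` (function name hypothetical):

```python
def ffbh_u32(src0: int) -> int:
    """Sketch of V_FFBH_U32: leading zero count, or -1 when SRC0 is 0."""
    src0 &= 0xFFFFFFFF
    if src0 == 0:
        return -1                     # no one bit found
    return 32 - src0.bit_length()     # equals 31 - index of the highest set bit
```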

#### V_FFBH_I32

Opcode VOP1: 59 (0x3b) for GCN 1.0/1.1; 47 (0x2f) for GCN 1.2
Opcode VOP3A: 443 (0x1bb) for GCN 1.0/1.1; 367 (0x16f) for GCN 1.2
Syntax: V_FFBH_I32 VDST, SRC0
Description: Find last (most significant) bit in SRC0 that is opposite to the sign bit. If found,
store number of skipped bits (counted from bit 31) to VDST, otherwise set VDST to -1.
Operation:
```
VDST = -1
UINT32 bitval = (INT32)SRC0>=0 ? 1 : 0
for (INT8 i = 31; i >= 0; i--)
    if (((1U<<i) & SRC0) == (bitval<<i))
        { VDST = 31-i; break; }
```
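
For a negative SRC0, searching for the first zero bit is the same as searching for the first one bit in ~SRC0, so V_FFBH_I32 can be sketched on top of the same leading-zero logic (hypothetical Python helper):

```python
def ffbh_i32(src0: int) -> int:
    """Sketch of V_FFBH_I32: skipped bits before the first bit that differs from the sign."""
    src0 &= 0xFFFFFFFF
    if src0 & 0x80000000:
        src0 = ~src0 & 0xFFFFFFFF     # negative: look for zeros by inverting
    if src0 == 0:
        return -1                     # 0 or 0xFFFFFFFF: no such bit
    return 32 - src0.bit_length()
```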

#### V_FFBL_B32

Opcode VOP1: 58 (0x3a) for GCN 1.0/1.1; 46 (0x2e) for GCN 1.2
Opcode VOP3A: 442 (0x1ba) for GCN 1.0/1.1; 366 (0x16e) for GCN 1.2
Syntax: V_FFBL_B32 VDST, SRC0
Description: Find first (least significant) one bit in SRC0. If found, store its bit index to VDST,
otherwise set VDST to -1.
Operation:
```
VDST = -1
for (UINT8 i = 0; i < 32; i++)
    if (((1U<<i) & SRC0) != 0)
        { VDST = i; break; }
```
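
`x & -x` isolates the lowest set bit, which gives a loop-free Python sketch of the same search (function name hypothetical):

```python
def ffbl_b32(src0: int) -> int:
    """Sketch of V_FFBL_B32: index of the lowest set bit, or -1 when SRC0 is 0."""
    src0 &= 0xFFFFFFFF
    if src0 == 0:
        return -1
    return (src0 & -src0).bit_length() - 1   # src0 & -src0 keeps only the lowest set bit
```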

#### V_FLOOR_F16

Opcode VOP1: 68 (0x44) for GCN 1.2
Opcode VOP3A: 388 (0x184) for GCN 1.2
Syntax: V_FLOOR_F16 VDST, SRC0
Description: Truncate half floating point value SRC0 with rounding to negative infinity
(flooring), and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = FLOOR(ASHALF(SRC0))
```

#### V_FLOOR_F32

Opcode VOP1: 36 (0x24) for GCN 1.0/1.1; 31 (0x1f) for GCN 1.2
Opcode VOP3A: 420 (0x1a4) for GCN 1.0/1.1; 351 (0x15f) for GCN 1.2
Syntax: V_FLOOR_F32 VDST, SRC0
Description: Truncate floating point value SRC0 with rounding to negative infinity
(flooring), and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = FLOOR(ASFLOAT(SRC0))
```

#### V_FLOOR_F64

Opcode VOP1: 26 (0x1a) for GCN 1.1/1.2
Opcode VOP3A: 410 (0x19a) for GCN 1.1; 346 (0x15a) for GCN 1.2
Syntax: V_FLOOR_F64 VDST(2), SRC0(2)
Description: Truncate double floating point value SRC0 with rounding to negative infinity
(flooring), and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = FLOOR(ASDOUBLE(SRC0))
```

#### V_FRACT_F32

Opcode VOP1: 32 (0x20) for GCN 1.0/1.1; 27 (0x1b) for GCN 1.2
Opcode VOP3A: 416 (0x1a0) for GCN 1.0/1.1; 347 (0x15b) for GCN 1.2
Syntax: V_FRACT_F32 VDST, SRC0
Description: Get fractional part from floating point value SRC0 and store it to VDST.
The fractional part is computed by subtracting floor(SRC0) from SRC0.
If SRC0 is infinity or NaN then NaN with the sign of SRC0 is stored to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (!ISNAN(SF) && SF!=-INF && SF!=INF)
    VDST = SF - FLOOR(SF)
else
    VDST = NAN * SIGN(SF)
```
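
A Python sketch of the fractional-part rule shared by V_FRACT_F32 and V_FRACT_F64 (function name hypothetical). It works in double precision and does not model single-precision rounding of the subtraction.

```python
import math


def fract(x: float) -> float:
    """Sketch of V_FRACT_*: x - floor(x), or a signed NaN for inf/NaN inputs."""
    if math.isnan(x) or math.isinf(x):
        return math.copysign(math.nan, x)   # NaN carrying the sign of SRC0
    return x - math.floor(x)                # always in [0, 1) for finite x
```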

#### V_FRACT_F64

Opcode VOP1: 62 (0x3e) for GCN 1.0/1.1; 50 (0x32) for GCN 1.2
Opcode VOP3A: 446 (0x1be) for GCN 1.0/1.1; 370 (0x172) for GCN 1.2
Syntax: V_FRACT_F64 VDST(2), SRC0(2)
Description: Get fractional part from double floating point value SRC0 and store it to VDST.
The fractional part is computed by subtracting floor(SRC0) from SRC0.
If SRC0 is infinity or NaN then NaN with the sign of SRC0 is stored to VDST.
Operation:
```
DOUBLE SD = ASDOUBLE(SRC0)
if (!ISNAN(SD) && SD!=-INF && SD!=INF)
    VDST = SD - FLOOR(SD)
else
    VDST = NAN * SIGN(SD)
```

#### V_FREXP_EXP_I16_F16

Opcode VOP1: 67 (0x43) for GCN 1.2
Opcode VOP3A: 387 (0x183) for GCN 1.2
Syntax: V_FREXP_EXP_I16_F16 VDST, SRC0
Description: Get exponent plus 1 from half FP value SRC0, and store that exponent to VDST
as 16-bit signed integer. This instruction implements the exponent part of the frexp function.
If SRC0 is infinity or NaN then store 0 to VDST.
Operation:
```
HALF SF = ASHALF(SRC0)
if (ABS(SF) != INF_H && !ISNAN(SF))
    VDST = (INT16)FREXP_EXP(SF)
else
    VDST = 0
```

#### V_FREXP_EXP_I32_F32

Opcode VOP1: 63 (0x3f) for GCN 1.0/1.1; 51 (0x33) for GCN 1.2
Opcode VOP3A: 447 (0x1bf) for GCN 1.0/1.1; 371 (0x173) for GCN 1.2
Syntax: V_FREXP_EXP_I32_F32 VDST, SRC0
Description: Get exponent plus 1 from single FP value SRC0, and store that exponent to VDST.
This instruction implements the exponent part of the frexp function.
If SRC0 is infinity or NaN then store -1 (GCN 1.0) or 0 (GCN 1.1 and later) to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (ABS(SF) != INF && !ISNAN(SF))
    VDST = FREXP_EXP(SF)
else
    VDST = 0    // GCN 1.1 and later; GCN 1.0 stores -1
```
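
The result matches the exponent returned by C or Python `frexp`, which is one more than the unbiased IEEE exponent of a normal number. A Python sketch built on `math.frexp`; the function name and the `gcn10` flag (a switch for the GCN 1.0 behavior) are hypothetical.

```python
import math


def frexp_exp(x: float, gcn10: bool = False) -> int:
    """Sketch of V_FREXP_EXP_I32_F32/F64 built on math.frexp."""
    if math.isinf(x) or math.isnan(x):
        return -1 if gcn10 else 0     # GCN 1.0 stores -1, later GCN versions store 0
    return math.frexp(x)[1]           # x = m * 2**e with 0.5 <= |m| < 1
```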

#### V_FREXP_EXP_I32_F64

Opcode VOP1: 60 (0x3c) for GCN 1.0/1.1; 48 (0x30) for GCN 1.2
Opcode VOP3A: 444 (0x1bc) for GCN 1.0/1.1; 368 (0x170) for GCN 1.2
Syntax: V_FREXP_EXP_I32_F64 VDST, SRC0(2)
Description: Get exponent plus 1 from double FP value SRC0, and store that exponent to VDST.
This instruction implements the exponent part of the frexp function.
If SRC0 is infinity or NaN then store -1 (GCN 1.0) or 0 (GCN 1.1 and later) to VDST.
Operation:
```
DOUBLE SD = ASDOUBLE(SRC0)
if (ABS(SD) != INF && !ISNAN(SD))
    VDST = FREXP_EXP(SD)
else
    VDST = 0    // GCN 1.1 and later; GCN 1.0 stores -1
```

#### V_FREXP_MANT_F16

Opcode VOP1: 66 (0x42) for GCN 1.2
Opcode VOP3A: 386 (0x182) for GCN 1.2
Syntax: V_FREXP_MANT_F16 VDST, SRC0
Description: Get mantissa from half FP value SRC0, and store it to VDST. The mantissa includes
the sign of the input. If SRC0 is infinity then copy SRC0 to VDST.
Operation:
```
HALF SF = ASHALF(SRC0)
if (ABS(SF) == INF_H)
    VDST = SF
else if (!ISNAN(SF))
    VDST = FREXP_MANT(SF) * SIGN(SF)
else
    VDST = NAN_H * SIGN(SF)
```

#### V_FREXP_MANT_F32

Opcode VOP1: 64 (0x40) for GCN 1.0/1.1; 52 (0x34) for GCN 1.2
Opcode VOP3A: 448 (0x1c0) for GCN 1.0/1.1; 372 (0x174) for GCN 1.2
Syntax: V_FREXP_MANT_F32 VDST, SRC0
Description: Get mantissa from single FP value SRC0, and store it to VDST. The mantissa includes
the sign of the input. If SRC0 is infinity then copy SRC0 to VDST; GCN 1.0 stores -NAN instead.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (ABS(SF) == INF)
    VDST = SF    // GCN 1.1 and later; GCN 1.0 stores -NAN
else if (!ISNAN(SF))
    VDST = FREXP_MANT(SF) * SIGN(SF)
else
    VDST = NAN * SIGN(SF)
```
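
A Python sketch of the mantissa rule, again via `math.frexp`, which already returns a signed mantissa with 0.5 <= |m| < 1. The function name and the `gcn10` flag are hypothetical.

```python
import math


def frexp_mant(x: float, gcn10: bool = False) -> float:
    """Sketch of V_FREXP_MANT_F32/F64 built on math.frexp."""
    if math.isinf(x):
        # GCN 1.0 stores -NaN; later GCN versions copy the infinity
        return math.copysign(math.nan, -1.0) if gcn10 else x
    if math.isnan(x):
        return x                      # NaN keeps the sign of SRC0
    return math.frexp(x)[0]           # signed mantissa
```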

#### V_FREXP_MANT_F64

Opcode VOP1: 61 (0x3d) for GCN 1.0/1.1; 49 (0x31) for GCN 1.2
Opcode VOP3A: 445 (0x1bd) for GCN 1.0/1.1; 369 (0x171) for GCN 1.2
Syntax: V_FREXP_MANT_F64 VDST(2), SRC0(2)
Description: Get mantissa from double FP value SRC0, and store it to VDST. The mantissa includes
the sign of the input. If SRC0 is infinity then copy SRC0 to VDST; GCN 1.0 stores -NAN instead.
Operation:
```
DOUBLE SD = ASDOUBLE(SRC0)
if (ABS(SD) == INF)
    VDST = SD    // GCN 1.1 and later; GCN 1.0 stores -NAN
else if (!ISNAN(SD))
    VDST = FREXP_MANT(SD) * SIGN(SD)
else
    VDST = NAN * SIGN(SD)
```

#### V_LOG_CLAMP_F32

Opcode VOP1: 38 (0x26) for GCN 1.0/1.1
Opcode VOP3A: 422 (0x1a6) for GCN 1.0/1.1
Syntax: V_LOG_CLAMP_F32 VDST, SRC0
Description: Approximate logarithm of the base 2 from floating point value SRC0, clamping
a -infinity result to -MAX_FLOAT. Result is stored in VDST.
If SRC0 is negative then store -NaN to VDST. This instruction doesn't handle denormalized
values regardless of the FLOAT MODE register setup.
Operation:
```
FLOAT F = ASFLOAT(SRC0)
if (F==1.0)
    VDST = 0.0f
else if (F<0.0)
    VDST = -NaN
else
{
    VDST = APPROX_LOG2(F)
    if (ASFLOAT(VDST)==-INF)
        VDST = -MAX_FLOAT
}
```
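
A Python sketch with an exact `math.log2` standing in for APPROX_LOG2. `math.log2(0.0)` raises `ValueError` in Python, so zero (the case that yields -infinity) is handled before the call; denormal flushing is not modeled. The function name and `FLT_MAX` constant are assumptions of the sketch.

```python
import math

FLT_MAX = 3.4028234663852886e38   # largest finite IEEE single


def log_clamp_f32(x: float) -> float:
    """Sketch of V_LOG_CLAMP_F32 with exact log2 in place of APPROX_LOG2."""
    if x == 1.0:
        return 0.0
    if x < 0.0:
        return math.copysign(math.nan, -1.0)  # negative input -> -NaN
    if x == 0.0:
        return -FLT_MAX                       # log2(+-0) = -inf, clamped to -MAX_FLOAT
    return math.log2(x)                       # +inf stays +inf, NaN stays NaN
```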

#### V_LOG_F16

Opcode VOP1: 64 (0x40) for GCN 1.2
Opcode VOP3A: 384 (0x180) for GCN 1.2
Syntax: V_LOG_F16 VDST, SRC0
Description: Approximate logarithm of the base 2 from half floating point value SRC0,
and store result to VDST. If SRC0 is negative then store -NaN to VDST.
Operation:
```
HALF F = ASHALF(SRC0)
if (F==1.0)
    VDST = 0.0h
else if (F<0.0)
    VDST = -NAN_H
else
    VDST = APPROX_LOG2(F)
```

#### V_LOG_F32

Opcode VOP1: 39 (0x27) for GCN 1.0/1.1; 33 (0x21) for GCN 1.2
Opcode VOP3A: 423 (0x1a7) for GCN 1.0/1.1; 353 (0x161) for GCN 1.2
Syntax: V_LOG_F32 VDST, SRC0
Description: Approximate logarithm of the base 2 from floating point value SRC0, and store
result to VDST. If SRC0 is negative then store -NaN to VDST.
This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup.
Operation:
```
FLOAT F = ASFLOAT(SRC0)
if (F==1.0)
    VDST = 0.0f
else if (F<0.0)
    VDST = -NaN
else
    VDST = APPROX_LOG2(F)
```

#### V_LOG_LEGACY_F32

Opcode VOP1: 69 (0x45) for GCN 1.1; 76 (0x4c) for GCN 1.2
Opcode VOP3A: 453 (0x1c5) for GCN 1.1; 396 (0x18c) for GCN 1.2
Syntax: V_LOG_LEGACY_F32 VDST, SRC0
Description: Approximate logarithm of the base 2 from floating point value SRC0, and store
result to VDST. If SRC0 is negative then store -NaN to VDST.
This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup.
This instruction returns slightly different results than V_LOG_F32.
Operation:
```
FLOAT F = ASFLOAT(SRC0)
if (F==1.0)
    VDST = 0.0f
else if (F<0.0)
    VDST = -NaN
else
    VDST = APPROX_LOG2(F)
```

#### V_MOV_B32

Opcode VOP1: 1 (0x1)
Opcode VOP3A: 385 (0x181) for GCN 1.0/1.1; 321 (0x141) for GCN 1.2
Syntax: V_MOV_B32 VDST, SRC0
Description: Move SRC0 into VDST.
Operation:
```
VDST = SRC0
```

#### V_MOV_FED_B32

Opcode VOP1: 9 (0x9)
Opcode VOP3A: 393 (0x189) for GCN 1.0/1.1; 329 (0x149) for GCN 1.2
Syntax: V_MOV_FED_B32 VDST, SRC0
Description: Introduce an EDC double error upon write to the destination VGPR without causing
an exception. The exact behavior is unverified.

#### V_MOVRELD_B32

Opcode VOP1: 66 (0x42) for GCN 1.0/1.1; 54 (0x36) for GCN 1.2
Opcode VOP3A: 450 (0x1c2) for GCN 1.0/1.1; 374 (0x176) for GCN 1.2
Syntax: V_MOVRELD_B32 VDST, VSRC0
Description: Move SRC0 to VGPR[VDST_NUMBER+M0].
Operation:
```
VGPR[VDST_NUMBER+M0] = SRC0
```

#### V_MOVRELS_B32

Opcode VOP1: 67 (0x43) for GCN 1.0/1.1; 55 (0x37) for GCN 1.2
Opcode VOP3A: 451 (0x1c3) for GCN 1.0/1.1; 375 (0x177) for GCN 1.2
Syntax: V_MOVRELS_B32 VDST, VSRC0
Description: Move VGPR[SRC0_NUMBER+M0] to VDST.
Operation:
```
VDST = VGPR[SRC0_NUMBER+M0]
```

#### V_MOVRELSD_B32

Opcode VOP1: 68 (0x44) for GCN 1.0/1.1; 56 (0x38) for GCN 1.2
Opcode VOP3A: 452 (0x1c4) for GCN 1.0/1.1; 376 (0x178) for GCN 1.2
Syntax: V_MOVRELSD_B32 VDST, VSRC0
Description: Move VGPR[SRC0_NUMBER+M0] to VGPR[VDST_NUMBER+M0].
Operation:
```
VGPR[VDST_NUMBER+M0] = VGPR[SRC0_NUMBER+M0]
```
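
The three relative moves differ only in which register number gets M0 added. A Python sketch that models one lane's VGPR file as a list indexed by register number (all names hypothetical; out-of-range indices are not modeled):

```python
def movreld(vgpr: list, vdst: int, value: int, m0: int) -> None:
    """V_MOVRELD_B32: VGPR[VDST_NUMBER + M0] = SRC0."""
    vgpr[vdst + m0] = value


def movrels(vgpr: list, vdst: int, src0: int, m0: int) -> None:
    """V_MOVRELS_B32: VDST = VGPR[SRC0_NUMBER + M0]."""
    vgpr[vdst] = vgpr[src0 + m0]


def movrelsd(vgpr: list, vdst: int, src0: int, m0: int) -> None:
    """V_MOVRELSD_B32: VGPR[VDST_NUMBER + M0] = VGPR[SRC0_NUMBER + M0]."""
    vgpr[vdst + m0] = vgpr[src0 + m0]
```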

#### V_NOP

Opcode VOP1: 0 (0x0)
Opcode VOP3A: 384 (0x180) for GCN 1.0/1.1; 320 (0x140) for GCN 1.2
Syntax: V_NOP
Description: Do nothing.

#### V_NOT_B32

Opcode VOP1: 55 (0x37) for GCN 1.0/1.1; 43 (0x2b) for GCN 1.2
Opcode VOP3A: 439 (0x1b7) for GCN 1.0/1.1; 363 (0x16b) for GCN 1.2
Syntax: V_NOT_B32 VDST, SRC0
Description: Do bitwise negation on 32-bit SRC0, and store result to VDST.
Operation:
```
VDST = ~SRC0
```

#### V_RCP_CLAMP_F32

Opcode VOP1: 40 (0x28) for GCN 1.0/1.1
Opcode VOP3A: 424 (0x1a8) for GCN 1.0/1.1
Syntax: V_RCP_CLAMP_F32 VDST, SRC0
Description: Approximate reciprocal from floating point value SRC0 and store it to VDST.
Guaranteed error below 1ulp. Result is clamped to MAX_FLOAT including sign of a result.
Operation:
```
VDST = APPROX_RCP(ASFLOAT(SRC0))
if (ABS(ASFLOAT(VDST))==INF)
    VDST = SIGN(ASFLOAT(VDST)) * MAX_FLOAT
```

#### V_RCP_CLAMP_F64

Opcode VOP1: 48 (0x30) for GCN 1.0/1.1
Opcode VOP3A: 432 (0x1b0) for GCN 1.0/1.1
Syntax: V_RCP_CLAMP_F64 VDST(2), SRC0(2)
Description: Approximate reciprocal from double FP value SRC0 and store it to VDST.
Relative error of approximation is ~1e-8.
Result is clamped to MAX_DOUBLE value including sign of a result.
Operation:
```
VDST = APPROX_RCP(ASDOUBLE(SRC0))
if (ABS(ASDOUBLE(VDST))==INF)
    VDST = SIGN(ASDOUBLE(VDST)) * MAX_DOUBLE
```

#### V_RCP_F16

Opcode VOP1: 61 (0x3d) for GCN 1.2
Opcode VOP3A: 381 (0x17d) for GCN 1.2
Syntax: V_RCP_F16 VDST, SRC0
Description: Approximate reciprocal from half floating point value SRC0 and
store it to VDST. Guaranteed error below 1ulp.
Operation:
```
VDST = APPROX_RCP(ASHALF(SRC0))
```

#### V_RCP_F32

Opcode VOP1: 42 (0x2a) for GCN 1.0/1.1; 34 (0x22) for GCN 1.2
Opcode VOP3A: 426 (0x1aa) for GCN 1.0/1.1; 354 (0x162) for GCN 1.2
Syntax: V_RCP_F32 VDST, SRC0
Description: Approximate reciprocal from floating point value SRC0 and store it to VDST.
Guaranteed error below 1ulp.
Operation:
```
VDST = APPROX_RCP(ASFLOAT(SRC0))
```

#### V_RCP_F64

Opcode VOP1: 47 (0x2f) for GCN 1.0/1.1; 37 (0x25) for GCN 1.2
Opcode VOP3A: 431 (0x1af) for GCN 1.0/1.1; 357 (0x165) for GCN 1.2
Syntax: V_RCP_F64 VDST(2), SRC0(2)
Description: Approximate reciprocal from double FP value SRC0 and store it to VDST.
Relative error of approximation is ~1e-8.
Operation:
```
VDST = APPROX_RCP(ASDOUBLE(SRC0))
```

#### V_RCP_IFLAG_F32

Opcode VOP1: 43 (0x2b) for GCN 1.0/1.1; 35 (0x23) for GCN 1.2
Opcode VOP3A: 427 (0x1ab) for GCN 1.0/1.1; 355 (0x163) for GCN 1.2
Syntax: V_RCP_IFLAG_F32 VDST, SRC0
Description: Approximate reciprocal from floating point value SRC0 and store it to VDST.
Guaranteed error below 1ulp. When an error occurs, this instruction signals integer division
by zero instead of any floating point exception.
Operation:
```
VDST = APPROX_RCP_IFLAG(ASFLOAT(SRC0))
```

#### V_RCP_LEGACY_F32

Opcode VOP1: 41 (0x29) for GCN 1.0/1.1
Opcode VOP3A: 425 (0x1a9) for GCN 1.0/1.1
Syntax: V_RCP_LEGACY_F32 VDST, SRC0
Description: Approximate reciprocal from floating point value SRC0 and store it to VDST.
Guaranteed error below 1ulp. If SRC0 is zero, or the result is infinity, then store 0 with
the sign of SRC0 to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
if (ABS(SF)==0.0)
    VDST = SIGN(SF)*0.0
else
{
    VDST = APPROX_RCP(SF)
    if (ABS(ASFLOAT(VDST)) == INF)
        VDST = SIGN(SF)*0.0
}
```
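
A Python sketch with an exact `1.0 / x` in place of APPROX_RCP (function name hypothetical). The quotient is computed in double precision and treated as infinite when its magnitude exceeds the largest single value, which only approximates single-precision overflow.

```python
import math

FLT_MAX = 3.4028234663852886e38   # largest finite IEEE single


def rcp_legacy_f32(x: float) -> float:
    """Sketch of V_RCP_LEGACY_F32 with exact 1/x in place of APPROX_RCP."""
    if x == 0.0:
        return math.copysign(0.0, x)   # zero input -> zero with the sign of SRC0
    r = 1.0 / x                        # infinite input gives a signed zero here
    if abs(r) > FLT_MAX:               # result would be infinite in single precision
        return math.copysign(0.0, x)
    return r
```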

#### V_READFIRSTLANE_B32

Opcode VOP1: 2 (0x2)
Opcode VOP3A: 386 (0x182) for GCN 1.0/1.1; 322 (0x142) for GCN 1.2
Syntax: V_READFIRSTLANE_B32 SDST, VSRC0
Description: Copy one VSRC0 lane value to one SDST. The lane (thread id) is the first active lane,
or lane 0 if all lanes are inactive. SDST can be SGPR or M0. Ignores EXEC mask.
Operation:
```
UINT8 firstlane = 0
for (UINT8 i = 0; i < 64; i++)
    if (((1ULL<<i) & EXEC) != 0)
        { firstlane = i; break; }
SDST = VSRC0[firstlane]
```
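
A Python sketch of the lane selection for a 64-lane wavefront, with VSRC0 given as a list of per-lane values and EXEC as an integer mask (names hypothetical):

```python
def readfirstlane(vsrc0: list, exec_mask: int) -> int:
    """Sketch of V_READFIRSTLANE_B32 for a 64-lane wavefront."""
    for lane in range(64):
        if (exec_mask >> lane) & 1:   # first lane whose EXEC bit is set
            return vsrc0[lane]
    return vsrc0[0]                   # EXEC == 0: fall back to lane 0
```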

#### V_RNDNE_F16

Opcode VOP1: 71 (0x47) for GCN 1.2
Opcode VOP3A: 391 (0x187) for GCN 1.2
Syntax: V_RNDNE_F16 VDST, SRC0
Description: Round half floating point value SRC0 to nearest even integer,
and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = RNDNE(ASHALF(SRC0))
```

#### V_RNDNE_F32

Opcode VOP1: 35 (0x23) for GCN 1.0/1.1; 30 (0x1e) for GCN 1.2
Opcode VOP3A: 419 (0x1a3) for GCN 1.0/1.1; 350 (0x15e) for GCN 1.2
Syntax: V_RNDNE_F32 VDST, SRC0
Description: Round floating point value SRC0 to nearest even integer, and store result to
VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = RNDNE(ASFLOAT(SRC0))
```
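
Python's built-in `round()` already rounds ties to even, so the rounding rule can be sketched in a few lines (function name hypothetical). `copysign` restores the sign of a zero result, which `round()` loses because it returns an `int`.

```python
import math


def rndne(x: float) -> float:
    """Sketch of V_RNDNE_F32/F64: round to nearest integer, ties to even."""
    if not math.isfinite(x):
        return x                      # infinity and NaN are copied unchanged
    r = float(round(x))               # round() breaks ties toward the even integer
    return math.copysign(r, x)        # keep the sign of zero, e.g. -0.4 -> -0.0
```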

#### V_RNDNE_F64

Opcode VOP1: 25 (0x19) for GCN 1.1/1.2
Opcode VOP3A: 409 (0x199) for GCN 1.1; 345 (0x159) for GCN 1.2
Syntax: V_RNDNE_F64 VDST(2), SRC0(2)
Description: Round double floating point value SRC0 to nearest even integer,
and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST.
Operation:
```
VDST = RNDNE(ASDOUBLE(SRC0))
```

#### V_RSQ_CLAMP_F32

Opcode VOP1: 44 (0x2c) for GCN 1.0/1.1
Opcode VOP3A: 428 (0x1ac) for GCN 1.0/1.1
Syntax: V_RSQ_CLAMP_F32 VDST, SRC0
Description: Approximate reciprocal square root from floating point value SRC0 with
clamping to MAX_FLOAT, and store result to VDST.
If SRC0 is negative value, store -NAN to VDST.
This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup.
Operation:
```
VDST = APPROX_RSQRT(ASFLOAT(SRC0))
if (ASFLOAT(VDST)==INF)
    VDST = MAX_FLOAT
```

#### V_RSQ_CLAMP_F64

Opcode VOP1: 50 (0x32) for GCN 1.0/1.1
Opcode VOP3A: 434 (0x1b2) for GCN 1.0/1.1
Syntax: V_RSQ_CLAMP_F64 VDST(2), SRC0(2)
Description: Approximate reciprocal square root from double floating point value SRC0
with clamping to MAX_DOUBLE, and store it to VDST. If SRC0 is negative value,
store -NAN to VDST.
Operation:
```
VDST = APPROX_RSQRT(ASDOUBLE(SRC0))
if (ASDOUBLE(VDST)==INF)
    VDST = MAX_DOUBLE
```

#### V_RSQ_F16

Opcode VOP1: 63 (0x3f) for GCN 1.2
Opcode VOP3A: 383 (0x17f) for GCN 1.2
Syntax: V_RSQ_F16 VDST, SRC0
Description: Approximate reciprocal square root from half floating point value SRC0 and
store it to VDST. If SRC0 is negative value, store -NAN to VDST.
Operation:
```
VDST = APPROX_RSQRT(ASHALF(SRC0))
```

#### V_RSQ_F32

Opcode VOP1: 46 (0x2e) for GCN 1.0/1.1; 36 (0x24) for GCN 1.2
Opcode VOP3A: 430 (0x1ae) for GCN 1.0/1.1; 356 (0x164) for GCN 1.2
Syntax: V_RSQ_F32 VDST, SRC0
Description: Approximate reciprocal square root from floating point value SRC0 and
store it to VDST. If SRC0 is negative value, store -NAN to VDST.
This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup.
Operation:
```
VDST = APPROX_RSQRT(ASFLOAT(SRC0))
```

#### V_RSQ_F64

Opcode VOP1: 49 (0x31) for GCN 1.0/1.1; 38 (0x26) for GCN 1.2
Opcode VOP3A: 433 (0x1b1) for GCN 1.0/1.1; 358 (0x166) for GCN 1.2
Syntax: V_RSQ_F64 VDST(2), SRC0(2)
Description: Approximate reciprocal square root from double floating point value SRC0 and
store it to VDST. If SRC0 is negative value, store -NAN to VDST.
Operation:
```
VDST = APPROX_RSQRT(ASDOUBLE(SRC0))
```

#### V_RSQ_LEGACY_F32

Opcode VOP1: 45 (0x2d) for GCN 1.0/1.1
Opcode VOP3A: 429 (0x1ad) for GCN 1.0/1.1
Syntax: V_RSQ_LEGACY_F32 VDST, SRC0
Description: Approximate reciprocal square root from floating point value SRC0,
and store result to VDST. If SRC0 is negative value, store -NAN to VDST.
If result is infinity then store 0.0 to VDST.
This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup.
Operation:
```
VDST = APPROX_RSQRT(ASFLOAT(SRC0))
if (ASFLOAT(VDST)==INF)
    VDST = 0.0
```
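
A Python sketch with an exact `1 / sqrt(x)` in place of APPROX_RSQRT (function name hypothetical). Following the pseudocode literally, only a +infinity result is replaced by 0.0.

```python
import math


def rsq_legacy_f32(x: float) -> float:
    """Sketch of V_RSQ_LEGACY_F32 with exact 1/sqrt(x) in place of APPROX_RSQRT."""
    if x < 0.0:
        return math.copysign(math.nan, -1.0)   # negative input -> -NaN
    if x == 0.0:
        r = math.copysign(math.inf, x)         # 1/sqrt(+-0) = +-infinity
    else:
        r = 1.0 / math.sqrt(x)                 # NaN propagates, +inf gives 0.0
    return 0.0 if r == math.inf else r         # a +infinity result becomes 0.0
```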

#### V_SAT_PK_U8_I16

Opcode VOP1: 79 (0x4f) for GCN 1.4
Opcode VOP3A: 399 (0x18f) for GCN 1.4
Syntax: V_SAT_PK_U8_I16 VDST, SRC0
Description: Saturate two packed signed 16-bit values in SRC0 to unsigned 8-bit values
and store them to the lower 16 bits of VDST.
Operation:
```
VDST = MAX(MIN((INT16)(SRC0&0xffff), 255), 0)
VDST |= MAX(MIN((INT16)(SRC0>>16), 255), 0) << 8
```
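
A Python sketch of the per-half saturation, treating SRC0 as a 32-bit integer (function names hypothetical):

```python
def sat_pk_u8_i16(src0: int) -> int:
    """Sketch of V_SAT_PK_U8_I16: clamp two signed 16-bit halves to 0..255."""
    def to_i16(v: int) -> int:
        v &= 0xFFFF
        return v - 0x10000 if v & 0x8000 else v   # reinterpret as signed 16-bit

    def sat_u8(v: int) -> int:
        return max(min(v, 255), 0)

    lo = sat_u8(to_i16(src0))         # bits 0-15 -> result bits 0-7
    hi = sat_u8(to_i16(src0 >> 16))   # bits 16-31 -> result bits 8-15
    return lo | (hi << 8)
```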

#### V_SCREEN_PARTITION_4SE_B32

Opcode VOP1: 55 (0x37) for GCN 1.4
Opcode VOP3A: 375 (0x177) for GCN 1.4
Syntax: V_SCREEN_PARTITION_4SE_B32 VDST, SRC0
Description: 4SE version of the LUT instruction for screen partitioning/filtering (see the ISA manual
for details). Take the lower 8 bits of SRC0, translate them by the table and store the result to VDST.
Operation:
```
BYTE TABLE[256] = {
    0x1, 0x3, 0x7, 0xf, 0x5, 0xf, 0xf, 0xf, 0x7, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
    0xf, 0x2, 0x6, 0xe, 0xf, 0xa, 0xf, 0xf, 0xf, 0xb, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
    0xd, 0xf, 0x4, 0xc, 0xf, 0xf, 0x5, 0xf, 0xf, 0xf, 0xd, 0xf, 0xf, 0xf, 0xf, 0xf,
    0x9, 0xb, 0xf, 0x8, 0xf, 0xf, 0xf, 0xa, 0xf, 0xf, 0xf, 0xe, 0xf, 0xf, 0xf, 0xf,
    0xf, 0xf, 0xf, 0xf, 0x4, 0xc, 0xd, 0xf, 0x6, 0xf, 0xf, 0xf, 0xe, 0xf, 0xf, 0xf,
    0xf, 0xf, 0xf, 0xf, 0xf, 0x8, 0x9, 0xb, 0xf, 0x9, 0x9, 0xf, 0xf, 0xd, 0xf, 0xf,
    0xf, 0xf, 0xf, 0xf, 0x7, 0xf, 0x1, 0x3, 0xf, 0xf, 0x9, 0xf, 0xf, 0xf, 0xb, 0xf,
    0xf, 0xf, 0xf, 0xf, 0x6, 0xe, 0xf, 0x2, 0x6, 0xf, 0xf, 0x6, 0xf, 0xf, 0xf, 0x7,
    0xb, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x2, 0x3, 0xb, 0xf, 0xa, 0xf, 0xf, 0xf,
    0xf, 0x7, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x1, 0x9, 0xd, 0xf, 0x5, 0xf, 0xf,
    0xf, 0xf, 0xe, 0xf, 0xf, 0xf, 0xf, 0xf, 0xe, 0xf, 0x8, 0xc, 0xf, 0xf, 0xa, 0xf,
    0xf, 0xf, 0xf, 0xd, 0xf, 0xf, 0xf, 0xf, 0x6, 0x7, 0xf, 0x4, 0xf, 0xf, 0xf, 0x5,
    0x9, 0xf, 0xf, 0xf, 0xd, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x8, 0xc, 0xe, 0xf,
    0xf, 0x6, 0x6, 0xf, 0xf, 0xe, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x4, 0x6, 0x7,
    0xf, 0xf, 0x6, 0xf, 0xf, 0xf, 0x7, 0xf, 0xf, 0xf, 0xf, 0xf, 0xb, 0xf, 0x2, 0x3,
    0x9, 0xf, 0xf, 0x9, 0xf, 0xf, 0xf, 0xb, 0xf, 0xf, 0xf, 0xf, 0x9, 0xd, 0xf, 0x1 }
VDST = TABLE[SRC0&0xff]
```

#### V_SIN_F16

Opcode VOP1: 73 (0x49) for GCN 1.2
Opcode VOP3A: 393 (0x189) for GCN 1.2
Syntax: V_SIN_F16 VDST, SRC0
Description: Compute the sine of the half floating-point value from SRC0. The input value must be
normalized to the range -1.0 to 1.0 (-360 degrees to 360 degrees), so 1.0 is one full turn.
If the SRC0 value is out of range, store 0.0 to VDST.
If the SRC0 value is infinity, store -NaN to VDST; if it is NaN, copy SRC0 to VDST.
Operation:
```
HALF SF = ASHALF(SRC0)
VDST = 0.0
if (SF >= -1.0 && SF <= 1.0)
    VDST = APPROX_SIN(SF)
else if (ABS(SF)==INF_H)
    VDST = -NAN_H
else if (ISNAN(SF))
    VDST = SRC0
```
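The rules above can be modeled per lane as a Python sketch. Two choices here are this sketch's own, not the document's: `APPROX_SIN(x)` is stood in for by the exact `math.sin(2*pi*x)` (following the 1.0 = 360 degrees normalization), and -NAN_H is encoded as the quiet half-precision NaN `0xFE00`. Function names are hypothetical; values are passed as raw 16-bit patterns so NaN handling stays explicit.

```python
import math
import struct

NEG_NAN_H = 0xFE00  # assumed encoding of -NAN_H: sign bit + quiet bit


def half_from_bits(bits: int) -> float:
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def half_to_bits(value: float) -> int:
    return int.from_bytes(struct.pack("<e", value), "little")


def v_sin_f16(src0: int) -> int:
    """Per-lane model of V_SIN_F16 on the low 16 bits of SRC0; returns VDST bits."""
    bits = src0 & 0xFFFF
    sf = half_from_bits(bits)
    if -1.0 <= sf <= 1.0:
        # Stand-in for APPROX_SIN: input is in turns, so scale by 2*pi
        return half_to_bits(math.sin(2.0 * math.pi * sf))
    if math.isinf(sf):
        return NEG_NAN_H
    if math.isnan(sf):
        return bits  # NaN is passed through unchanged
    return half_to_bits(0.0)  # finite but outside [-1.0, 1.0]


print(hex(v_sin_f16(0x3400)))  # 0.25 turn -> 1.0, prints 0x3c00
```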

#### V_SIN_F32

Opcode VOP1: 53 (0x35) for GCN 1.0/1.1; 41 (0x29) for GCN 1.2
Opcode VOP3A: 437 (0x1b5) for GCN 1.0/1.1; 361 (0x169) for GCN 1.2
Syntax: V_SIN_F32 VDST, SRC0
Description: Compute the sine of the FP value from SRC0. The input value must be normalized to the range
-1.0 to 1.0 (-360 degrees to 360 degrees), so 1.0 is one full turn. If the SRC0 value is out of range, store 0.0 to VDST.
If the SRC0 value is infinity, store -NaN to VDST; if it is NaN, copy SRC0 to VDST.
Operation:
```
FLOAT SF = ASFLOAT(SRC0)
VDST = 0.0
if (SF >= -1.0 && SF <= 1.0)
    VDST = APPROX_SIN(SF)
else if (ABS(SF)==INF)
    VDST = -NAN
else if (ISNAN(SF))
    VDST = SRC0
```
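Because the input is expressed in turns rather than radians, an angle in radians has to be divided by 2*pi before it can be used. The Python sketch below models V_SIN_F32 per lane and adds a hypothetical `radians_to_src0` helper that also keeps only the fractional part, so any angle lands inside the accepted range. That range reduction, the use of exact `math.sin` for `APPROX_SIN`, and the `0xFFC00000` encoding of -NAN are assumptions of this sketch.

```python
import math
import struct

NEG_NAN_F32 = 0xFFC00000  # assumed encoding of -NAN: sign bit + quiet bit


def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def f32_to_bits(value: float) -> int:
    return struct.unpack("<I", struct.pack("<f", value))[0]


def v_sin_f32(src0: int) -> int:
    """Per-lane model of V_SIN_F32; returns the VDST bit pattern."""
    sf = f32_from_bits(src0)
    if -1.0 <= sf <= 1.0:
        # Stand-in for APPROX_SIN: input is in turns, so scale by 2*pi
        return f32_to_bits(math.sin(2.0 * math.pi * sf))
    if math.isinf(sf):
        return NEG_NAN_F32
    if math.isnan(sf):
        return src0 & 0xFFFFFFFF  # NaN is passed through unchanged
    return f32_to_bits(0.0)  # finite but outside [-1.0, 1.0]


def radians_to_src0(angle: float) -> int:
    """Hypothetical helper: radians -> turns, keeping only the fractional part."""
    turns = angle / (2.0 * math.pi)
    return f32_to_bits(turns - math.floor(turns))  # lands in [0.0, 1.0]


value = f32_from_bits(v_sin_f32(radians_to_src0(math.pi / 6)))
print(round(value, 6))  # prints 0.5
```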

#### V_SQRT_F16

Opcode VOP1: 62 (0x3e) for GCN 1.2
Opcode VOP3A: 382 (0x17e) for GCN 1.2
Syntax: V_SQRT_F16 VDST, SRC0
Description: Compute the square root of the half floating-point value SRC0 and
store the result to VDST. If SRC0 is negative, store -NaN to VDST.
Operation:
```
if (ASHALF(SRC0)>=0.0)
    VDST = APPROX_SQRT(ASHALF(SRC0))
else
    VDST = -NAN_H
```
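A per-lane Python sketch of the pseudocode above, working on raw half-precision bit patterns. It assumes exact `math.sqrt` in place of `APPROX_SQRT` and encodes -NAN_H as `0xFE00`; the function name is hypothetical. Note that, taken literally, the pseudocode also sends a NaN input to the `else` branch, because a NaN never compares `>= 0.0`; the sketch follows that.

```python
import math
import struct

NEG_NAN_H = 0xFE00  # assumed encoding of -NAN_H: sign bit + quiet bit


def v_sqrt_f16(src0: int) -> int:
    """Per-lane model of V_SQRT_F16 on the low 16 bits of SRC0."""
    x = struct.unpack("<e", (src0 & 0xFFFF).to_bytes(2, "little"))[0]
    if x >= 0.0:
        # math.sqrt stands in for APPROX_SQRT; the store rounds to half precision
        return int.from_bytes(struct.pack("<e", math.sqrt(x)), "little")
    # Negative input; as written in the pseudocode, NaN also fails the test
    return NEG_NAN_H


print(hex(v_sqrt_f16(0x4400)))  # 4.0 -> 2.0, prints 0x4000
```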

#### V_SQRT_F32

Opcode VOP1: 51 (0x33) for GCN 1.0/1.1; 39 (0x27) for GCN 1.2
Opcode VOP3A: 435 (0x1b3) for GCN 1.0/1.1; 359 (0x167) for GCN 1.2
Syntax: V_SQRT_F32 VDST, SRC0
Description: Compute the square root of the floating-point value SRC0 and store the result to VDST.
If SRC0 is negative, store -NaN to VDST.
Operation:
```
if (ASFLOAT(SRC0)>=0.0)
    VDST = APPROX_SQRT(ASFLOAT(SRC0))
else
    VDST = -NAN
```
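Since VDST is a vector operand, the operation above runs independently on each lane's SRC0 value. The Python sketch below shows that by mapping a hypothetical per-lane model over a short list of lane values. The exact `math.sqrt` in place of `APPROX_SQRT` and the `0xFFC00000` encoding of -NAN are assumptions.

```python
import math
import struct

NEG_NAN_F32 = 0xFFC00000  # assumed encoding of -NAN: sign bit + quiet bit


def f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bits32(value: float) -> int:
    return struct.unpack("<I", struct.pack("<f", value))[0]


def v_sqrt_f32(src0: int) -> int:
    """Per-lane model of V_SQRT_F32; returns the VDST bit pattern."""
    x = f32(src0)
    if x >= 0.0:
        return bits32(math.sqrt(x))  # exact sqrt stands in for APPROX_SQRT
    return NEG_NAN_F32


# The same operation is applied to every lane's SRC0 value.
lanes = [bits32(v) for v in (0.0, 2.0, 9.0, -4.0)]
results = [v_sqrt_f32(b) for b in lanes]
print([hex(r) for r in results])
```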

#### V_SQRT_F64

Opcode VOP1: 52 (0x34) for GCN 1.0/1.1; 40 (0x28) for GCN 1.2
Opcode VOP3A: 436 (0x1b4) for GCN 1.0/1.1; 360 (0x168) for GCN 1.2
Syntax: V_SQRT_F64 VDST(2), SRC0(2)
Description: Compute the square root of the double floating-point value SRC0 and store the result
to VDST. The relative error of the approximation is about 1e-8.
If SRC0 is negative, store -NaN to VDST.
Operation:
```
if (ASDOUBLE(SRC0)>=0.0)
    VDST = APPROX_SQRT(ASDOUBLE(SRC0))
else
    VDST = -NAN
```

#### V_SWAP_B32

Opcode VOP1: 81 (0x51) for GCN 1.4
Opcode VOP3A: 401 (0x191) for GCN 1.4
Syntax: V_SWAP_B32 VDST, SRC0
Description: Swap the contents of SRC0 and VDST.
Operation:
```
UINT32 TMP = VDST
VDST = SRC0
SRC0 = TMP
```
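Unlike the other instructions in this section, V_SWAP_B32 writes both of its operands. The Python sketch below mirrors the three pseudocode steps on a hypothetical per-lane register file modeled as a list; it assumes SRC0 names a register (an index into that list), since the instruction writes a value back to it.

```python
def v_swap_b32(vgprs: list, vdst: int, src0: int) -> None:
    """Swap two entries of a hypothetical per-lane register file in place."""
    tmp = vgprs[vdst]          # UINT32 TMP = VDST
    vgprs[vdst] = vgprs[src0]  # VDST = SRC0
    vgprs[src0] = tmp          # SRC0 = TMP


regs = [0x11111111, 0x22222222, 0x33333333]
v_swap_b32(regs, 0, 2)
print([hex(r) for r in regs])  # prints ['0x33333333', '0x22222222', '0x11111111']
```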

#### V_TRUNC_F16

Opcode VOP1: 70 (0x46) for GCN 1.2
Opcode VOP3A: 390 (0x186) for GCN 1.2
Syntax: V_TRUNC_F16 VDST, SRC0
Description: Get the integer value of the half floating-point value SRC0 by rounding toward zero,
and store it (as half) to VDST. If SRC0 is infinity or NaN, copy SRC0 to VDST.
Operation:
```
VDST = RNDTZ(ASHALF(SRC0))
```
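`RNDTZ` rounds toward zero, i.e. it drops the fractional part. A per-lane Python sketch on raw half-precision bit patterns follows; the function name is hypothetical, and `math.copysign` is this sketch's way of keeping the sign of results such as -0.0 for an input of -0.5.

```python
import math
import struct


def v_trunc_f16(src0: int) -> int:
    """Per-lane model of V_TRUNC_F16 on the low 16 bits of SRC0."""
    bits = src0 & 0xFFFF
    x = struct.unpack("<e", bits.to_bytes(2, "little"))[0]
    if math.isinf(x) or math.isnan(x):
        return bits  # infinity and NaN are copied unchanged
    # RNDTZ: drop the fractional part; copysign keeps -0.0 for inputs like -0.5
    result = math.copysign(float(math.trunc(x)), x)
    return int.from_bytes(struct.pack("<e", result), "little")


print(hex(v_trunc_f16(0x4100)))  # 2.5 -> 2.0, prints 0x4000
```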

#### V_TRUNC_F32

Opcode VOP1: 33 (0x21) for GCN 1.0/1.1; 28 (0x1c) for GCN 1.2
Opcode VOP3A: 417 (0x1a1) for GCN 1.0/1.1; 348 (0x15c) for GCN 1.2
Syntax: V_TRUNC_F32 VDST, SRC0
Description: Get the integer value of the floating-point value SRC0 by rounding toward zero,
and store it (as float) to VDST. If SRC0 is infinity or NaN, copy SRC0 to VDST.
Operation:
```
VDST = RNDTZ(ASFLOAT(SRC0))
```
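Rounding toward zero differs from rounding down (floor) only for negative non-integers: -1.5 becomes -1.0, not -2.0. The Python sketch below models V_TRUNC_F32 per lane on raw bit patterns (function names hypothetical) and prints that contrast.

```python
import math
import struct


def f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bits32(value: float) -> int:
    return struct.unpack("<I", struct.pack("<f", value))[0]


def v_trunc_f32(src0: int) -> int:
    """Per-lane model of V_TRUNC_F32; returns the VDST bit pattern."""
    x = f32(src0)
    if math.isinf(x) or math.isnan(x):
        return src0 & 0xFFFFFFFF  # copied unchanged
    # RNDTZ; copysign keeps the sign of zero results such as -0.25 -> -0.0
    return bits32(math.copysign(float(math.trunc(x)), x))


# Rounding toward zero differs from floor for negative inputs.
print(f32(v_trunc_f32(bits32(-1.5))), math.floor(-1.5))  # prints -1.0 -2
```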

#### V_TRUNC_F64

Opcode VOP1: 23 (0x17) for GCN 1.1/1.2
Opcode VOP3A: 407 (0x197) for GCN 1.1; 343 (0x157) for GCN 1.2
Syntax: V_TRUNC_F64 VDST(2), SRC0(2)
Description: Get the integer value of the double floating-point value SRC0 by rounding toward zero,
and store it (as double) to VDST. If SRC0 is infinity or NaN, copy SRC0 to VDST.
Operation:
```
VDST = RNDTZ(ASDOUBLE(SRC0))
```
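A per-lane Python sketch of V_TRUNC_F64, operating directly on a Python `float` (an IEEE double). The early return for large magnitudes is this sketch's own design choice, not something stated above: a double has 52 fraction bits, so any value with magnitude of at least 2^52 is already an integer and truncation leaves it unchanged. The function name is hypothetical.

```python
import math


def v_trunc_f64(x: float) -> float:
    """Per-lane model of V_TRUNC_F64 on a Python float (an IEEE double)."""
    if math.isinf(x) or math.isnan(x):
        return x  # copied unchanged
    if abs(x) >= 2.0 ** 52:
        return x  # 52 fraction bits: doubles this large are already integers
    # RNDTZ; copysign keeps -0.0 for inputs such as -0.5
    return math.copysign(float(math.trunc(x)), x)


print(v_trunc_f64(-3.7), v_trunc_f64(1e300))  # prints -3.0 1e+300
```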