1 | ## GCN ISA VOP1/VOP3 instructions |
---|
2 | |
---|
3 | VOP1 instructions can be encoded in the VOP1 encoding and the VOP3A/VOP3B encoding. |
---|
4 | List of fields for VOP1 encoding: |
---|
5 | |
---|
6 | Bits | Name | Description |
---|
7 | ------|----------|------------------------------ |
---|
8 | 0-8 | SRC0 | First (scalar or vector) source operand |
---|
9 | 9-16 | OPCODE | Operation code |
---|
10 | 17-24 | VDST | Destination vector operand |
---|
11 | 25-31 | ENCODING | Encoding type. Must be 0b0111111 |
---|
12 | |
---|
13 | Syntax: INSTRUCTION VDST, SRC0 |
---|
14 | |
---|
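The VOP1 field layout above can be illustrated with a small decoder. This is only a sketch: the function name `decode_vop1` and the returned dictionary are hypothetical, and the fields are returned as raw numbers without interpreting register ranges.

```python
def decode_vop1(word: int) -> dict:
    """Split a 32-bit VOP1 instruction word into the fields from the table."""
    # Bits 25-31 must hold the VOP1 encoding marker 0b0111111.
    if (word >> 25) & 0x7F != 0b0111111:
        raise ValueError("not a VOP1 encoding")
    return {
        "src0": word & 0x1FF,          # bits 0-8
        "opcode": (word >> 9) & 0xFF,  # bits 9-16
        "vdst": (word >> 17) & 0xFF,   # bits 17-24
    }


# Example word built by hand: opcode 1, VDST field 1, SRC0 field 0x100.
fields = decode_vop1(0x7E020300)
```

Here `fields["opcode"]` is 1, which the opcode tables list as V_MOV_B32.
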
15 | List of fields for VOP3A/VOP3B encoding (GCN 1.0/1.1): |
---|
16 | |
---|
17 | Bits | Name | Description |
---|
18 | ------|----------|------------------------------ |
---|
19 | 0-7 | VDST | Vector destination operand |
---|
20 | 8-10 | ABS | Absolute modifiers for source operands (VOP3A) |
---|
21 | 8-14 | SDST | Scalar destination operand (VOP3B) |
---|
22 | 11 | CLAMP | CLAMP modifier (VOP3A) |
---|
23 | 15 | CLAMP | CLAMP modifier (VOP3B) |
---|
24 | 17-25 | OPCODE | Operation code |
---|
25 | 26-31 | ENCODING | Encoding type. Must be 0b110100 |
---|
26 | 32-40 | SRC0 | First (scalar or vector) source operand |
---|
27 | 41-49 | SRC1 | Second (scalar or vector) source operand |
---|
28 | 50-58 | SRC2 | Third (scalar or vector) source operand |
---|
29 | 59-60 | OMOD | OMOD modifier. Multiplication modifier |
---|
30 | 61-63 | NEG | Negation modifier for source operands |
---|
31 | |
---|
32 | List of fields for VOP3A/VOP3B encoding (GCN 1.2): |
---|
33 | |
---|
34 | Bits | Name | Description |
---|
35 | ------|----------|------------------------------ |
---|
36 | 0-7 | VDST | Destination vector operand |
---|
37 | 8-10 | ABS | Absolute modifiers for source operands (VOP3A) |
---|
38 | 8-14 | SDST | Scalar destination operand (VOP3B) |
---|
39 | 15 | CLAMP | CLAMP modifier |
---|
40 | 16-25 | OPCODE | Operation code |
---|
41 | 26-31 | ENCODING | Encoding type. Must be 0b110100 |
---|
42 | 32-40 | SRC0 | First (scalar or vector) source operand |
---|
43 | 41-49 | SRC1 | Second (scalar or vector) source operand |
---|
44 | 50-58 | SRC2 | Third (scalar or vector) source operand |
---|
45 | 59-60 | OMOD | OMOD modifier. Multiplication modifier |
---|
46 | 61-63 | NEG | Negation modifier for source operands |
---|
47 | |
---|
48 | Syntax: INSTRUCTION VDST, SRC0 [MODIFIERS] |
---|
49 | |
---|
50 | Modifiers: |
---|
51 | |
---|
52 | * CLAMP - clamps destination floating point value in range 0.0-1.0 |
---|
53 | * MUL:2, MUL:4, DIV:2 - OMOD modifiers. Multiply destination floating point value by |
---|
54 | 2.0, 4.0 or 0.5 respectively. Clamping is applied after the OMOD modifier. |
---|
55 | * -SRC - negate floating point value from source operand. Applied after ABS modifier. |
---|
56 | * ABS(SRC), |SRC| - apply absolute value to source operand |
---|
57 | |
---|
58 | NOTE: OMOD modifier doesn't work if output denormals are allowed |
---|
59 | (bit 5 of MODE register for single precision or bit 7 for double precision). |
---|
60 | NOTE: OMOD and CLAMP modifiers affect only instructions whose output is a |
---|
61 | floating point value. |
---|
62 | NOTE: ABS and negation are applied to the source operand for any instruction. |
---|
63 | NOTE: OMOD modifier doesn't work for half precision (FP16) instructions. |
---|
64 | |
---|
65 | Negation and absolute value can be combined: `-ABS(V0)`. Modifiers CLAMP and |
---|
66 | OMOD (MUL:2, MUL:4 and DIV:2) can be given in any order. |
---|
67 | |
---|
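The ordering rules for modifiers (ABS before negation on sources, OMOD before CLAMP on the destination) can be sketched as follows. The helper names and the string keys for OMOD are hypothetical, and the sketch works on Python floats rather than on hardware register values.

```python
# Multipliers for the OMOD modifiers described above; None means "no OMOD".
OMOD_FACTORS = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}


def apply_source_modifiers(value: float, use_abs: bool = False,
                           use_neg: bool = False) -> float:
    """Apply ABS first, then negation, to a source operand."""
    if use_abs:
        value = abs(value)
    if use_neg:
        value = -value
    return value


def apply_destination_modifiers(value: float, omod=None,
                                clamp: bool = False) -> float:
    """Apply OMOD first, then CLAMP to the range 0.0-1.0."""
    value *= OMOD_FACTORS[omod]
    if clamp:
        value = min(max(value, 0.0), 1.0)
    return value
```

For example, `-ABS(V0)` with V0 holding -3.0 yields -3.0, and 0.4 with `MUL:4 CLAMP` yields 1.0 because the clamp sees the already multiplied value.
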
68 | Limitations for operands: |
---|
69 | |
---|
70 | * only one SGPR can be read by an instruction. Multiple occurrences of the same |
---|
71 | SGPR are allowed |
---|
72 | * only one literal constant can be used, and only when no SGPR or M0 is used in |
---|
73 | source operands |
---|
74 | * only SRC0 can hold LDS_DIRECT |
---|
75 | |
---|
76 | Unaligned pairs of SGPRs are allowed in source operands. |
---|
77 | |
---|
78 | VOP1 opcodes (0-127) are reflected in VOP3 in range: 384-511 for GCN 1.0/1.1 or |
---|
79 | 320-447 for GCN 1.2. |
---|
80 | |
---|
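The opcode ranges above amount to adding a fixed base to the VOP1 opcode. A minimal sketch, where the function name and the architecture keys are hypothetical:

```python
# Base added to a VOP1 opcode to obtain its VOP3 opcode.
VOP3_BASE = {"gcn1.0": 384, "gcn1.1": 384, "gcn1.2": 320}


def vop1_to_vop3_opcode(vop1_opcode: int, arch: str) -> int:
    """Map a VOP1 opcode (0-127) to the matching VOP3 opcode."""
    if not 0 <= vop1_opcode <= 127:
        raise ValueError("VOP1 opcode must be in range 0-127")
    return VOP3_BASE[arch] + vop1_opcode
```

For instance, V_BFREV_B32 is 56 in VOP1 on GCN 1.0/1.1, which gives 440 in VOP3, and 44 on GCN 1.2, which gives 364.
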
81 | List of the instructions by opcode (GCN 1.0/1.1): |
---|
82 | |
---|
83 | Opcode | Opcode(VOP3)|GCN 1.0|GCN 1.1| Mnemonic |
---|
84 | ------------|-------------|-------|-------|----------------------------- |
---|
85 | 0 (0x0) | 384 (0x180) | ✓ | ✓ | V_NOP |
---|
86 | 1 (0x1) | 385 (0x181) | ✓ | ✓ | V_MOV_B32 |
---|
87 | 2 (0x2) | 386 (0x182) | ✓ | ✓ | V_READFIRSTLANE_B32 |
---|
88 | 3 (0x3) | 387 (0x183) | ✓ | ✓ | V_CVT_I32_F64 |
---|
89 | 4 (0x4) | 388 (0x184) | ✓ | ✓ | V_CVT_F64_I32 |
---|
90 | 5 (0x5) | 389 (0x185) | ✓ | ✓ | V_CVT_F32_I32 |
---|
91 | 6 (0x6) | 390 (0x186) | ✓ | ✓ | V_CVT_F32_U32 |
---|
92 | 7 (0x7) | 391 (0x187) | ✓ | ✓ | V_CVT_U32_F32 |
---|
93 | 8 (0x8) | 392 (0x188) | ✓ | ✓ | V_CVT_I32_F32 |
---|
94 | 9 (0x9) | 393 (0x189) | ✓ | ✓ | V_MOV_FED_B32 |
---|
95 | 10 (0xa) | 394 (0x18a) | ✓ | ✓ | V_CVT_F16_F32 |
---|
96 | 11 (0xb) | 395 (0x18b) | ✓ | ✓ | V_CVT_F32_F16 |
---|
97 | 12 (0xc) | 396 (0x18c) | ✓ | ✓ | V_CVT_RPI_I32_F32 |
---|
98 | 13 (0xd) | 397 (0x18d) | ✓ | ✓ | V_CVT_FLR_I32_F32 |
---|
99 | 14 (0xe) | 398 (0x18e) | ✓ | ✓ | V_CVT_OFF_F32_I4 |
---|
100 | 15 (0xf) | 399 (0x18f) | ✓ | ✓ | V_CVT_F32_F64 |
---|
101 | 16 (0x10) | 400 (0x190) | ✓ | ✓ | V_CVT_F64_F32 |
---|
102 | 17 (0x11) | 401 (0x191) | ✓ | ✓ | V_CVT_F32_UBYTE0 |
---|
103 | 18 (0x12) | 402 (0x192) | ✓ | ✓ | V_CVT_F32_UBYTE1 |
---|
104 | 19 (0x13) | 403 (0x193) | ✓ | ✓ | V_CVT_F32_UBYTE2 |
---|
105 | 20 (0x14) | 404 (0x194) | ✓ | ✓ | V_CVT_F32_UBYTE3 |
---|
106 | 21 (0x15) | 405 (0x195) | ✓ | ✓ | V_CVT_U32_F64 |
---|
107 | 22 (0x16) | 406 (0x196) | ✓ | ✓ | V_CVT_F64_U32 |
---|
108 | 23 (0x17) | 407 (0x197) | | ✓ | V_TRUNC_F64 |
---|
109 | 24 (0x18) | 408 (0x198) | | ✓ | V_CEIL_F64 |
---|
110 | 25 (0x19) | 409 (0x199) | | ✓ | V_RNDNE_F64 |
---|
111 | 26 (0x1a) | 410 (0x19a) | | ✓ | V_FLOOR_F64 |
---|
112 | 32 (0x20) | 416 (0x1a0) | ✓ | ✓ | V_FRACT_F32 |
---|
113 | 33 (0x21) | 417 (0x1a1) | ✓ | ✓ | V_TRUNC_F32 |
---|
114 | 34 (0x22) | 418 (0x1a2) | ✓ | ✓ | V_CEIL_F32 |
---|
115 | 35 (0x23) | 419 (0x1a3) | ✓ | ✓ | V_RNDNE_F32 |
---|
116 | 36 (0x24) | 420 (0x1a4) | ✓ | ✓ | V_FLOOR_F32 |
---|
117 | 37 (0x25) | 421 (0x1a5) | ✓ | ✓ | V_EXP_F32 |
---|
118 | 38 (0x26) | 422 (0x1a6) | ✓ | ✓ | V_LOG_CLAMP_F32 |
---|
119 | 39 (0x27) | 423 (0x1a7) | ✓ | ✓ | V_LOG_F32 |
---|
120 | 40 (0x28) | 424 (0x1a8) | ✓ | ✓ | V_RCP_CLAMP_F32 |
---|
121 | 41 (0x29) | 425 (0x1a9) | ✓ | ✓ | V_RCP_LEGACY_F32 |
---|
122 | 42 (0x2a) | 426 (0x1aa) | ✓ | ✓ | V_RCP_F32 |
---|
123 | 43 (0x2b) | 427 (0x1ab) | ✓ | ✓ | V_RCP_IFLAG_F32 |
---|
124 | 44 (0x2c) | 428 (0x1ac) | ✓ | ✓ | V_RSQ_CLAMP_F32 |
---|
125 | 45 (0x2d) | 429 (0x1ad) | ✓ | ✓ | V_RSQ_LEGACY_F32 |
---|
126 | 46 (0x2e) | 430 (0x1ae) | ✓ | ✓ | V_RSQ_F32 |
---|
127 | 47 (0x2f) | 431 (0x1af) | ✓ | ✓ | V_RCP_F64 |
---|
128 | 48 (0x30) | 432 (0x1b0) | ✓ | ✓ | V_RCP_CLAMP_F64 |
---|
129 | 49 (0x31) | 433 (0x1b1) | ✓ | ✓ | V_RSQ_F64 |
---|
130 | 50 (0x32) | 434 (0x1b2) | ✓ | ✓ | V_RSQ_CLAMP_F64 |
---|
131 | 51 (0x33) | 435 (0x1b3) | ✓ | ✓ | V_SQRT_F32 |
---|
132 | 52 (0x34) | 436 (0x1b4) | ✓ | ✓ | V_SQRT_F64 |
---|
133 | 53 (0x35) | 437 (0x1b5) | ✓ | ✓ | V_SIN_F32 |
---|
134 | 54 (0x36) | 438 (0x1b6) | ✓ | ✓ | V_COS_F32 |
---|
135 | 55 (0x37) | 439 (0x1b7) | ✓ | ✓ | V_NOT_B32 |
---|
136 | 56 (0x38) | 440 (0x1b8) | ✓ | ✓ | V_BFREV_B32 |
---|
137 | 57 (0x39) | 441 (0x1b9) | ✓ | ✓ | V_FFBH_U32 |
---|
138 | 58 (0x3a) | 442 (0x1ba) | ✓ | ✓ | V_FFBL_B32 |
---|
139 | 59 (0x3b) | 443 (0x1bb) | ✓ | ✓ | V_FFBH_I32 |
---|
140 | 60 (0x3c) | 444 (0x1bc) | ✓ | ✓ | V_FREXP_EXP_I32_F64 |
---|
141 | 61 (0x3d) | 445 (0x1bd) | ✓ | ✓ | V_FREXP_MANT_F64 |
---|
142 | 62 (0x3e) | 446 (0x1be) | ✓ | ✓ | V_FRACT_F64 |
---|
143 | 63 (0x3f) | 447 (0x1bf) | ✓ | ✓ | V_FREXP_EXP_I32_F32 |
---|
144 | 64 (0x40) | 448 (0x1c0) | ✓ | ✓ | V_FREXP_MANT_F32 |
---|
145 | 65 (0x41) | 449 (0x1c1) | ✓ | ✓ | V_CLREXCP |
---|
146 | 66 (0x42) | 450 (0x1c2) | ✓ | ✓ | V_MOVRELD_B32 |
---|
147 | 67 (0x43) | 451 (0x1c3) | ✓ | ✓ | V_MOVRELS_B32 |
---|
148 | 68 (0x44) | 452 (0x1c4) | ✓ | ✓ | V_MOVRELSD_B32 |
---|
149 | 69 (0x45) | 453 (0x1c5) | | ✓ | V_LOG_LEGACY_F32 |
---|
150 | 70 (0x46) | 454 (0x1c6) | | ✓ | V_EXP_LEGACY_F32 |
---|
151 | |
---|
152 | List of the instructions by opcode (GCN 1.2): |
---|
153 | |
---|
154 | Opcode | Opcode(VOP3)| Mnemonic |
---|
155 | ------------|-------------|----------------------------- |
---|
156 | 0 (0x0) | 320 (0x140) | V_NOP |
---|
157 | 1 (0x1) | 321 (0x141) | V_MOV_B32 |
---|
158 | 2 (0x2) | 322 (0x142) | V_READFIRSTLANE_B32 |
---|
159 | 3 (0x3) | 323 (0x143) | V_CVT_I32_F64 |
---|
160 | 4 (0x4) | 324 (0x144) | V_CVT_F64_I32 |
---|
161 | 5 (0x5) | 325 (0x145) | V_CVT_F32_I32 |
---|
162 | 6 (0x6) | 326 (0x146) | V_CVT_F32_U32 |
---|
163 | 7 (0x7) | 327 (0x147) | V_CVT_U32_F32 |
---|
164 | 8 (0x8) | 328 (0x148) | V_CVT_I32_F32 |
---|
165 | 9 (0x9) | 329 (0x149) | V_MOV_FED_B32 |
---|
166 | 10 (0xa) | 330 (0x14a) | V_CVT_F16_F32 |
---|
167 | 11 (0xb) | 331 (0x14b) | V_CVT_F32_F16 |
---|
168 | 12 (0xc) | 332 (0x14c) | V_CVT_RPI_I32_F32 |
---|
169 | 13 (0xd) | 333 (0x14d) | V_CVT_FLR_I32_F32 |
---|
170 | 14 (0xe) | 334 (0x14e) | V_CVT_OFF_F32_I4 |
---|
171 | 15 (0xf) | 335 (0x14f) | V_CVT_F32_F64 |
---|
172 | 16 (0x10) | 336 (0x150) | V_CVT_F64_F32 |
---|
173 | 17 (0x11) | 337 (0x151) | V_CVT_F32_UBYTE0 |
---|
174 | 18 (0x12) | 338 (0x152) | V_CVT_F32_UBYTE1 |
---|
175 | 19 (0x13) | 339 (0x153) | V_CVT_F32_UBYTE2 |
---|
176 | 20 (0x14) | 340 (0x154) | V_CVT_F32_UBYTE3 |
---|
177 | 21 (0x15) | 341 (0x155) | V_CVT_U32_F64 |
---|
178 | 22 (0x16) | 342 (0x156) | V_CVT_F64_U32 |
---|
179 | 23 (0x17) | 343 (0x157) | V_TRUNC_F64 |
---|
180 | 24 (0x18) | 344 (0x158) | V_CEIL_F64 |
---|
181 | 25 (0x19) | 345 (0x159) | V_RNDNE_F64 |
---|
182 | 26 (0x1a) | 346 (0x15a) | V_FLOOR_F64 |
---|
183 | 27 (0x1b) | 347 (0x15b) | V_FRACT_F32 |
---|
184 | 28 (0x1c) | 348 (0x15c) | V_TRUNC_F32 |
---|
185 | 29 (0x1d) | 349 (0x15d) | V_CEIL_F32 |
---|
186 | 30 (0x1e) | 350 (0x15e) | V_RNDNE_F32 |
---|
187 | 31 (0x1f) | 351 (0x15f) | V_FLOOR_F32 |
---|
188 | 32 (0x20) | 352 (0x160) | V_EXP_F32 |
---|
189 | 33 (0x21) | 353 (0x161) | V_LOG_F32 |
---|
190 | 34 (0x22) | 354 (0x162) | V_RCP_F32 |
---|
191 | 35 (0x23) | 355 (0x163) | V_RCP_IFLAG_F32 |
---|
192 | 36 (0x24) | 356 (0x164) | V_RSQ_F32 |
---|
193 | 37 (0x25) | 357 (0x165) | V_RCP_F64 |
---|
194 | 38 (0x26) | 358 (0x166) | V_RSQ_F64 |
---|
195 | 39 (0x27) | 359 (0x167) | V_SQRT_F32 |
---|
196 | 40 (0x28) | 360 (0x168) | V_SQRT_F64 |
---|
197 | 41 (0x29) | 361 (0x169) | V_SIN_F32 |
---|
198 | 42 (0x2a) | 362 (0x16a) | V_COS_F32 |
---|
199 | 43 (0x2b) | 363 (0x16b) | V_NOT_B32 |
---|
200 | 44 (0x2c) | 364 (0x16c) | V_BFREV_B32 |
---|
201 | 45 (0x2d) | 365 (0x16d) | V_FFBH_U32 |
---|
202 | 46 (0x2e) | 366 (0x16e) | V_FFBL_B32 |
---|
203 | 47 (0x2f) | 367 (0x16f) | V_FFBH_I32 |
---|
204 | 48 (0x30) | 368 (0x170) | V_FREXP_EXP_I32_F64 |
---|
205 | 49 (0x31) | 369 (0x171) | V_FREXP_MANT_F64 |
---|
206 | 50 (0x32) | 370 (0x172) | V_FRACT_F64 |
---|
207 | 51 (0x33) | 371 (0x173) | V_FREXP_EXP_I32_F32 |
---|
208 | 52 (0x34) | 372 (0x174) | V_FREXP_MANT_F32 |
---|
209 | 53 (0x35) | 373 (0x175) | V_CLREXCP |
---|
210 | 54 (0x36) | 374 (0x176) | V_MOVRELD_B32 |
---|
211 | 55 (0x37) | 375 (0x177) | V_MOVRELS_B32 |
---|
212 | 56 (0x38) | 376 (0x178) | V_MOVRELSD_B32 |
---|
213 | 57 (0x39) | 377 (0x179) | V_CVT_F16_U16 |
---|
214 | 58 (0x3a) | 378 (0x17a) | V_CVT_F16_I16 |
---|
215 | 59 (0x3b) | 379 (0x17b) | V_CVT_U16_F16 |
---|
216 | 60 (0x3c) | 380 (0x17c) | V_CVT_I16_F16 |
---|
217 | 61 (0x3d) | 381 (0x17d) | V_RCP_F16 |
---|
218 | 62 (0x3e) | 382 (0x17e) | V_SQRT_F16 |
---|
219 | 63 (0x3f) | 383 (0x17f) | V_RSQ_F16 |
---|
220 | 64 (0x40) | 384 (0x180) | V_LOG_F16 |
---|
221 | 65 (0x41) | 385 (0x181) | V_EXP_F16 |
---|
222 | 66 (0x42) | 386 (0x182) | V_FREXP_MANT_F16 |
---|
223 | 67 (0x43) | 387 (0x183) | V_FREXP_EXP_I16_F16 |
---|
224 | 68 (0x44) | 388 (0x184) | V_FLOOR_F16 |
---|
225 | 69 (0x45) | 389 (0x185) | V_CEIL_F16 |
---|
226 | 70 (0x46) | 390 (0x186) | V_TRUNC_F16 |
---|
227 | 71 (0x47) | 391 (0x187) | V_RNDNE_F16 |
---|
228 | 72 (0x48) | 392 (0x188) | V_FRACT_F16 |
---|
229 | 73 (0x49) | 393 (0x189) | V_SIN_F16 |
---|
230 | 74 (0x4a) | 394 (0x18a) | V_COS_F16 |
---|
231 | 75 (0x4b) | 395 (0x18b) | V_EXP_LEGACY_F32 |
---|
232 | 76 (0x4c) | 396 (0x18c) | V_LOG_LEGACY_F32 |
---|
233 | |
---|
234 | ### Instruction set |
---|
235 | |
---|
236 | Alphabetically sorted instruction list: |
---|
237 | |
---|
238 | #### V_BFREV_B32 |
---|
239 | |
---|
240 | Opcode VOP1: 56 (0x38) for GCN 1.0/1.1; 44 (0x2c) for GCN 1.2 |
---|
241 | Opcode VOP3A: 440 (0x1b8) for GCN 1.0/1.1; 364 (0x16c) for GCN 1.2 |
---|
242 | Syntax: V_BFREV_B32 VDST, SRC0 |
---|
243 | Description: Reverse bits in SRC0 and store result to VDST. |
---|
244 | Operation: |
---|
245 | ``` |
---|
246 | VDST = REVBIT(SRC0) |
---|
247 | ``` |
---|
248 | |
---|
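As an illustration of REVBIT, the following sketch reverses a 32-bit value on the host. The function name `bfrev_b32` is hypothetical.

```python
def bfrev_b32(src0: int) -> int:
    """Reverse the order of the 32 bits in src0."""
    # Format as a 32-character binary string, reverse it, parse it back.
    return int(f"{src0 & 0xFFFFFFFF:032b}"[::-1], 2)
```

Bit 0 moves to bit 31, so `bfrev_b32(1)` gives `0x80000000`.
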
249 | #### V_CEIL_F32 |
---|
250 | |
---|
251 | Opcode VOP1: 34 (0x22) for GCN 1.0/1.1; 29 (0x1d) for GCN 1.2 |
---|
252 | Opcode VOP3A: 418 (0x1a2) for GCN 1.0/1.1; 349 (0x15d) for GCN 1.2 |
---|
253 | Syntax: V_CEIL_F32 VDST, SRC0 |
---|
254 | Description: Truncate floating point value from SRC0 with rounding to positive infinity |
---|
255 | (ceiling), and store result to VDST. Implemented by flooring. |
---|
256 | If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
257 | Operation: |
---|
258 | ``` |
---|
259 | FLOAT F = FLOOR(ASFLOAT(SRC0)) |
---|
260 | if (ASFLOAT(SRC0) != F) |
---|
261 | F += 1.0 |
---|
262 | VDST = F |
---|
263 | ``` |
---|
264 | |
---|
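The remark that the ceiling is implemented by flooring can be illustrated with a host-side sketch: floor the value and add 1.0 whenever flooring changed it. The function name is hypothetical and the sketch uses Python floats, so the sign of a zero result is not modeled.

```python
import math


def ceil_via_floor(x: float) -> float:
    """Round x toward positive infinity using only a floor operation."""
    # Infinity and NaN are passed through unchanged.
    if math.isnan(x) or math.isinf(x):
        return x
    f = float(math.floor(x))
    if x != f:
        f += 1.0
    return f
```

For example, 1.5 becomes 2.0 and -1.5 becomes -1.0, while integral inputs are returned unchanged.
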
265 | #### V_CEIL_F64 |
---|
266 | |
---|
267 | Opcode VOP1: 24 (0x18) for GCN 1.1/1.2 |
---|
268 | Opcode VOP3A: 408 (0x198) for GCN 1.1; 344 (0x158) for GCN 1.2 |
---|
269 | Syntax: V_CEIL_F64 VDST(2), SRC0(2) |
---|
270 | Description: Truncate double floating point value from SRC0 with rounding to |
---|
271 | positive infinity (ceiling), and store result to VDST. Implemented by flooring. |
---|
272 | If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
273 | Operation: |
---|
274 | ``` |
---|
275 | DOUBLE F = FLOOR(ASDOUBLE(SRC0)) |
---|
276 | if (ASDOUBLE(SRC0) != F) |
---|
277 | F += 1.0 |
---|
278 | VDST = F |
---|
279 | ``` |
---|
280 | |
---|
281 | #### V_CLREXCP |
---|
282 | |
---|
283 | Opcode VOP1: 65 (0x41) for GCN 1.0/1.1; 53 (0x35) for GCN 1.2 |
---|
284 | Opcode VOP3A: 449 (0x1c1) for GCN 1.0/1.1; 373 (0x175) for GCN 1.2 |
---|
285 | Syntax: V_CLREXCP |
---|
286 | Description: Clear wave's exception state in SIMD. |
---|
287 | |
---|
288 | #### V_COS_F32 |
---|
289 | |
---|
290 | Opcode VOP1: 54 (0x36) for GCN 1.0/1.1; 42 (0x2a) for GCN 1.2 |
---|
291 | Opcode VOP3A: 438 (0x1b6) for GCN 1.0/1.1; 362 (0x16a) for GCN 1.2 |
---|
292 | Syntax: V_COS_F32 VDST, SRC0 |
---|
293 | Description: Compute cosine of FP value from SRC0. Input value must be normalized to range |
---|
294 | -1.0 to 1.0 (-360 degrees to 360 degrees). If SRC0 value is out of range then store 1.0 to VDST. |
---|
295 | If SRC0 value is infinity, store -NAN to VDST. |
---|
296 | Operation: |
---|
297 | ``` |
---|
298 | FLOAT SF = ASFLOAT(SRC0) |
---|
299 | VDST = 1.0 |
---|
300 | if (SF >= -1.0 && SF <= 1.0) |
---|
301 | VDST = APPROX_COS(SF) |
---|
302 | else if (ABS(SF)==INF) |
---|
303 | VDST = -NAN |
---|
304 | else if (ISNAN(SF)) |
---|
305 | VDST = SRC0 |
---|
306 | ``` |
---|
307 | |
---|
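The branch structure of V_COS_F32 can be sketched on the host as below. This assumes that APPROX_COS treats 1.0 as one full turn (360 degrees), as the description suggests. The function name is hypothetical and the result is computed in double precision, not with the hardware approximation.

```python
import math


def v_cos_f32(sf: float) -> float:
    """Cosine of a normalized angle, following the pseudocode branches."""
    if math.isnan(sf):
        return sf                  # NaN input is copied to the result
    if math.isinf(sf):
        return -math.nan           # infinity gives -NaN
    if -1.0 <= sf <= 1.0:
        # Assumption: an input of 1.0 corresponds to 360 degrees.
        return math.cos(2.0 * math.pi * sf)
    return 1.0                     # finite but out of range
```

With this convention an input of 0.5 (180 degrees) gives approximately -1.0.
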
308 | #### V_CVT_F16_F32 |
---|
309 | |
---|
310 | Opcode VOP1: 10 (0xa) |
---|
311 | Opcode VOP3A: 394 (0x18a) for GCN 1.0/1.1; 330 (0x14a) for GCN 1.2 |
---|
312 | Syntax: V_CVT_F16_F32 VDST, SRC0 |
---|
313 | Description: Convert single FP value to half floating point value with rounding from |
---|
314 | MODE register (single FP rounding mode), and store result to VDST. |
---|
315 | If absolute value is too high, then store -/+infinity to VDST. |
---|
316 | Operation: |
---|
317 | ``` |
---|
318 | VDST = CVTHALF(ASFLOAT(SRC0)) |
---|
319 | ``` |
---|
320 | |
---|
321 | #### V_CVT_F32_F16 |
---|
322 | |
---|
323 | Opcode VOP1: 11 (0xb) |
---|
324 | Opcode VOP3A: 395 (0x18b) for GCN 1.0/1.1; 331 (0x14b) for GCN 1.2 |
---|
325 | Syntax: V_CVT_F32_F16 VDST, SRC0 |
---|
326 | Description: Convert half FP value to single FP value, and store result to VDST. |
---|
327 | **By default, immediate is in FP32 format!** |
---|
328 | Operation: |
---|
329 | ``` |
---|
330 | VDST = (FLOAT)(ASHALF(SRC0)) |
---|
331 | ``` |
---|
332 | |
---|
333 | #### V_CVT_F32_F64 |
---|
334 | |
---|
335 | Opcode VOP1: 15 (0xf) |
---|
336 | Opcode VOP3A: 399 (0x18f) for GCN 1.0/1.1; 335 (0x14f) for GCN 1.2 |
---|
337 | Syntax: V_CVT_F32_F64 VDST, SRC0(2) |
---|
338 | Description: Convert double FP value to single floating point value with rounding from |
---|
339 | MODE register (single FP rounding mode), and store result to VDST. |
---|
340 | If absolute value is too high, then store -/+infinity to VDST. |
---|
341 | Operation: |
---|
342 | ``` |
---|
343 | VDST = (FLOAT)(ASDOUBLE(SRC0)) |
---|
344 | ``` |
---|
345 | |
---|
346 | #### V_CVT_F32_I32 |
---|
347 | |
---|
348 | Opcode VOP1: 5 (0x5) |
---|
349 | Opcode VOP3A: 389 (0x185) for GCN 1.0/1.1; 325 (0x145) for GCN 1.2 |
---|
350 | Syntax: V_CVT_F32_I32 VDST, SRC0 |
---|
351 | Description: Convert signed 32-bit integer to single FP value, and store it to VDST. |
---|
352 | Operation: |
---|
353 | ``` |
---|
354 | VDST = (FLOAT)(INT32)SRC0 |
---|
355 | ``` |
---|
356 | |
---|
357 | #### V_CVT_F32_U32 |
---|
358 | |
---|
359 | Opcode VOP1: 6 (0x6) |
---|
360 | Opcode VOP3A: 390 (0x186) for GCN 1.0/1.1; 326 (0x146) for GCN 1.2 |
---|
361 | Syntax: V_CVT_F32_U32 VDST, SRC0 |
---|
362 | Description: Convert unsigned 32-bit integer to single FP value, and store it to VDST. |
---|
363 | Operation: |
---|
364 | ``` |
---|
365 | VDST = (FLOAT)SRC0 |
---|
366 | ``` |
---|
367 | |
---|
368 | #### V_CVT_F32_UBYTE0 |
---|
369 | |
---|
370 | Opcode VOP1: 17 (0x11) |
---|
371 | Opcode VOP3A: 401 (0x191) for GCN 1.0/1.1; 337 (0x151) for GCN 1.2 |
---|
372 | Syntax: V_CVT_F32_UBYTE0 VDST, SRC0 |
---|
373 | Description: Convert the first unsigned 8-bit byte from SRC0 to single FP value, |
---|
374 | and store it to VDST. |
---|
375 | Operation: |
---|
376 | ``` |
---|
377 | VDST = (FLOAT)(SRC0 & 0xff) |
---|
378 | ``` |
---|
379 | |
---|
380 | #### V_CVT_F32_UBYTE1 |
---|
381 | |
---|
382 | Opcode VOP1: 18 (0x12) |
---|
383 | Opcode VOP3A: 402 (0x192) for GCN 1.0/1.1; 338 (0x152) for GCN 1.2 |
---|
384 | Syntax: V_CVT_F32_UBYTE1 VDST, SRC0 |
---|
385 | Description: Convert the second unsigned 8-bit byte from SRC0 to single FP value, |
---|
386 | and store it to VDST. |
---|
387 | Operation: |
---|
388 | ``` |
---|
389 | VDST = (FLOAT)((SRC0>>8) & 0xff) |
---|
390 | ``` |
---|
391 | |
---|
392 | #### V_CVT_F32_UBYTE2 |
---|
393 | |
---|
394 | Opcode VOP1: 19 (0x13) |
---|
395 | Opcode VOP3A: 403 (0x193) for GCN 1.0/1.1; 339 (0x153) for GCN 1.2 |
---|
396 | Syntax: V_CVT_F32_UBYTE2 VDST, SRC0 |
---|
397 | Description: Convert the third unsigned 8-bit byte from SRC0 to single FP value, |
---|
398 | and store it to VDST. |
---|
399 | Operation: |
---|
400 | ``` |
---|
401 | VDST = (FLOAT)((SRC0>>16) & 0xff) |
---|
402 | ``` |
---|
403 | |
---|
404 | #### V_CVT_F32_UBYTE3 |
---|
405 | |
---|
406 | Opcode VOP1: 20 (0x14) |
---|
407 | Opcode VOP3A: 404 (0x194) for GCN 1.0/1.1; 340 (0x154) for GCN 1.2 |
---|
408 | Syntax: V_CVT_F32_UBYTE3 VDST, SRC0 |
---|
409 | Description: Convert the fourth unsigned 8-bit byte from SRC0 to single FP value, |
---|
410 | and store it to VDST. |
---|
411 | Operation: |
---|
412 | ``` |
---|
413 | VDST = (FLOAT)(SRC0>>24) |
---|
414 | ``` |
---|
415 | |
---|
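The four V_CVT_F32_UBYTEn instructions differ only in which byte they select. A minimal sketch with a hypothetical helper that takes the byte index as a parameter:

```python
def cvt_f32_ubyte(src0: int, byte_index: int) -> float:
    """Convert byte 0-3 of a 32-bit value to a float."""
    if not 0 <= byte_index <= 3:
        raise ValueError("byte_index must be in range 0-3")
    return float((src0 >> (8 * byte_index)) & 0xFF)
```

For `src0 = 0x11223344`, byte 0 is 0x44 (68.0) and byte 3 is 0x11 (17.0).
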
416 | #### V_CVT_F64_F32 |
---|
417 | |
---|
418 | Opcode VOP1: 16 (0x10) |
---|
419 | Opcode VOP3A: 400 (0x190) for GCN 1.0/1.1; 336 (0x150) for GCN 1.2 |
---|
420 | Syntax: V_CVT_F64_F32 VDST(2), SRC0 |
---|
421 | Description: Convert single FP value to double FP value, and store result to VDST. |
---|
422 | Operation: |
---|
423 | ``` |
---|
424 | VDST = (DOUBLE)(ASFLOAT(SRC0)) |
---|
425 | ``` |
---|
426 | |
---|
427 | #### V_CVT_F64_I32 |
---|
428 | |
---|
429 | Opcode VOP1: 4 (0x4) |
---|
430 | Opcode VOP3A: 388 (0x184) for GCN 1.0/1.1; 324 (0x144) for GCN 1.2 |
---|
431 | Syntax: V_CVT_F64_I32 VDST(2), SRC0 |
---|
432 | Description: Convert signed 32-bit integer to double FP value, and store it to VDST. |
---|
433 | Operation: |
---|
434 | ``` |
---|
435 | VDST = (DOUBLE)(INT32)SRC0 |
---|
436 | ``` |
---|
437 | |
---|
438 | #### V_CVT_F64_U32 |
---|
439 | |
---|
440 | Opcode VOP1: 22 (0x16) |
---|
441 | Opcode VOP3A: 406 (0x196) for GCN 1.0/1.1; 342 (0x156) for GCN 1.2 |
---|
442 | Syntax: V_CVT_F64_U32 VDST(2), SRC0 |
---|
443 | Description: Convert unsigned 32-bit integer to double FP value, and store it to VDST. |
---|
444 | Operation: |
---|
445 | ``` |
---|
446 | VDST = (DOUBLE)SRC0 |
---|
447 | ``` |
---|
448 | |
---|
449 | #### V_CVT_FLR_I32_F32 |
---|
450 | |
---|
451 | Opcode VOP1: 13 (0xd) |
---|
452 | Opcode VOP3A: 397 (0x18d) for GCN 1.0/1.1; 333 (0x14d) for GCN 1.2 |
---|
453 | Syntax: V_CVT_FLR_I32_F32 VDST, SRC0 |
---|
454 | Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and |
---|
455 | store result to VDST. Conversion uses rounding to negative infinity (floor). |
---|
456 | If value is higher/lower than maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST. |
---|
457 | If input value is NaN/-NaN then store MAX_INT32/MIN_INT32 to VDST. |
---|
458 | Operation: |
---|
459 | ``` |
---|
460 | FLOAT SF = ASFLOAT(SRC0) |
---|
461 | if (!ISNAN(SF)) |
---|
462 | VDST = (INT32)MAX(MIN(FLOOR(SF), 2147483647.0), -2147483648.0) |
---|
463 | else |
---|
464 | VDST = (INT32)SRC0>=0 ? 2147483647 : -2147483648 |
---|
465 | ``` |
---|
466 | |
---|
467 | #### V_CVT_I32_F32 |
---|
468 | |
---|
469 | Opcode VOP1: 8 (0x8) |
---|
470 | Opcode VOP3A: 392 (0x188) for GCN 1.0/1.1; 328 (0x148) for GCN 1.2 |
---|
471 | Syntax: V_CVT_I32_F32 VDST, SRC0 |
---|
472 | Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and |
---|
473 | store result to VDST. Conversion uses rounding to zero. If value is higher/lower than |
---|
474 | maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST. |
---|
475 | If input value is NaN then store 0 to VDST. |
---|
476 | Operation: |
---|
477 | ``` |
---|
478 | VDST = 0 |
---|
479 | if (!ISNAN(ASFLOAT(SRC0))) |
---|
480 | VDST = (INT32)MAX(MIN(RNDTZINT(ASFLOAT(SRC0)), 2147483647.0), -2147483648.0) |
---|
481 | ``` |
---|
482 | |
---|
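The saturating behavior of V_CVT_I32_F32 can be sketched as follows. The function name is hypothetical and the input is a Python float. The clamp is applied before truncation so that infinities are handled, which gives the same result as truncating first.

```python
import math

INT32_MAX = 2147483647
INT32_MIN = -2147483648


def cvt_i32_f32(x: float) -> int:
    """Convert a float to a signed 32-bit integer, rounding toward zero."""
    if math.isnan(x):
        return 0
    clamped = max(min(x, float(INT32_MAX)), float(INT32_MIN))
    return int(clamped)  # int() truncates toward zero
```

Values such as 1e20 or negative infinity therefore map to the largest and smallest 32-bit integers.
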
483 | #### V_CVT_I32_F64 |
---|
484 | |
---|
485 | Opcode VOP1: 3 (0x3) |
---|
486 | Opcode VOP3A: 387 (0x183) for GCN 1.0/1.1; 323 (0x143) for GCN 1.2 |
---|
487 | Syntax: V_CVT_I32_F64 VDST, SRC0(2) |
---|
488 | Description: Convert 64-bit floating point value from SRC0 to signed 32-bit integer, and |
---|
489 | store result to VDST. Conversion uses rounding to zero. If value is higher/lower than |
---|
490 | maximal/minimal integer then store MAX_INT32/MIN_INT32 to VDST. |
---|
491 | If input value is NaN then store 0 to VDST. |
---|
492 | Operation: |
---|
493 | ``` |
---|
494 | VDST = 0 |
---|
495 | if (!ISNAN(ASDOUBLE(SRC0))) |
---|
496 | VDST = (INT32)MAX(MIN(RNDTZINT(ASDOUBLE(SRC0)), 2147483647.0), -2147483648.0) |
---|
497 | ``` |
---|
498 | |
---|
499 | #### V_CVT_OFF_F32_I4 |
---|
500 | |
---|
501 | Opcode VOP1: 14 (0xe) |
---|
502 | Opcode VOP3A: 398 (0x18e) for GCN 1.0/1.1; 334 (0x14e) for GCN 1.2 |
---|
503 | Syntax: V_CVT_OFF_F32_I4 VDST, SRC0 |
---|
504 | Description: Convert 4-bit signed value from SRC0 to floating point value, normalize that |
---|
505 | value to range -0.5:0.4375 and store result to VDST. |
---|
506 | Operation: |
---|
507 | ``` |
---|
508 | VDST = (FLOAT)((SRC0 & 0xf) ^ 8) / 16.0 - 0.5 |
---|
509 | ``` |
---|
510 | |
---|
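The V_CVT_OFF_F32_I4 formula can be checked with a direct transcription. The function name is hypothetical.

```python
def cvt_off_f32_i4(src0: int) -> float:
    """Map the low 4 bits of src0 (a signed 4-bit value) to -0.5..0.4375."""
    # XOR with 8 turns the two's complement nibble into an offset value 0..15.
    return float((src0 & 0xF) ^ 8) / 16.0 - 0.5
```

The nibble 8 (which is -8 as a signed 4-bit value) gives -0.5 and the nibble 7 gives 0.4375, matching the stated range.
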
511 | #### V_CVT_RPI_I32_F32 |
---|
512 | |
---|
513 | Opcode VOP1: 12 (0xc) |
---|
514 | Opcode VOP3A: 396 (0x18c) for GCN 1.0/1.1; 332 (0x14c) for GCN 1.2 |
---|
515 | Syntax: V_CVT_RPI_I32_F32 VDST, SRC0 |
---|
516 | Description: Convert 32-bit floating point value from SRC0 to signed 32-bit integer, and |
---|
517 | store result to VDST. Conversion adds 0.5 to value and rounds to negative infinity (floor). |
---|
518 | If value is higher/lower than maximal/minimal integer then store MAX_INT32/MIN_INT32 to |
---|
519 | VDST. If input value is NaN/-NaN then store MAX_INT32/MIN_INT32 to VDST. |
---|
520 | Operation: |
---|
521 | ``` |
---|
522 | FLOAT SF = ASFLOAT(SRC0) |
---|
523 | if (!ISNAN(SF)) |
---|
524 | VDST = (INT32)MAX(MIN(FLOOR(SF + 0.5), 2147483647.0), -2147483648.0) |
---|
525 | else |
---|
526 | VDST = (INT32)SRC0>=0 ? 2147483647 : -2147483648 |
---|
527 | ``` |
---|
528 | |
---|
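V_CVT_RPI_I32_F32 and V_CVT_FLR_I32_F32 share the same clamping and NaN rules and differ only in the 0.5 bias. The sketch below takes the raw 32-bit operand so that the sign of a NaN can be read from its sign bit. All names are hypothetical, and the addition is done in double precision rather than in single precision as on hardware.

```python
import math
import struct

INT32_MAX = 2147483647
INT32_MIN = -2147483648


def f32_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _floor_to_int32(src0_bits: int, bias: float) -> int:
    sf = struct.unpack("<f", struct.pack("<I", src0_bits & 0xFFFFFFFF))[0]
    if math.isnan(sf):
        # NaN: choose the bound from the sign bit of the raw operand.
        return INT32_MIN if src0_bits & 0x80000000 else INT32_MAX
    # Clamp first so that infinities can be floored safely.
    clamped = max(min(sf + bias, float(INT32_MAX)), float(INT32_MIN))
    return math.floor(clamped)


def cvt_flr_i32_f32(src0_bits: int) -> int:
    return _floor_to_int32(src0_bits, 0.0)


def cvt_rpi_i32_f32(src0_bits: int) -> int:
    return _floor_to_int32(src0_bits, 0.5)
```

For example, `cvt_rpi_i32_f32(f32_bits(2.5))` gives 3, while `cvt_flr_i32_f32(f32_bits(-1.5))` gives -2.
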
529 | #### V_CVT_U32_F32 |
---|
530 | |
---|
531 | Opcode VOP1: 7 (0x7) |
---|
532 | Opcode VOP3A: 391 (0x187) for GCN 1.0/1.1; 327 (0x147) for GCN 1.2 |
---|
533 | Syntax: V_CVT_U32_F32 VDST, SRC0 |
---|
534 | Description: Convert 32-bit floating point value from SRC0 to unsigned 32-bit integer, and |
---|
535 | store result to VDST. Conversion uses rounding to zero. If value is higher than |
---|
536 | maximal integer then store MAX_UINT32 to VDST. |
---|
537 | If input value is NaN then store 0 to VDST. |
---|
538 | Operation: |
---|
539 | ``` |
---|
540 | VDST = 0 |
---|
541 | if (!ISNAN(ASFLOAT(SRC0))) |
---|
542 | VDST = (UINT32)MIN(RNDTZINT(ASFLOAT(SRC0)), 4294967295.0) |
---|
543 | ``` |
---|
544 | |
---|
545 | #### V_CVT_U32_F64 |
---|
546 | |
---|
547 | Opcode VOP1: 21 (0x15) |
---|
548 | Opcode VOP3A: 405 (0x195) for GCN 1.0/1.1; 341 (0x155) for GCN 1.2 |
---|
549 | Syntax: V_CVT_U32_F64 VDST, SRC0(2) |
---|
550 | Description: Convert 64-bit floating point value from SRC0 to unsigned 32-bit integer, and |
---|
551 | store result to VDST. Conversion uses rounding to zero. If value is higher than |
---|
552 | maximal integer then store MAX_UINT32 to VDST. |
---|
553 | If input value is NaN then store 0 to VDST. |
---|
554 | Operation: |
---|
555 | ``` |
---|
556 | VDST = 0 |
---|
557 | if (!ISNAN(ASDOUBLE(SRC0))) |
---|
558 | VDST = (UINT32)MIN(RNDTZINT(ASDOUBLE(SRC0)), 4294967295.0) |
---|
559 | ``` |
---|
560 | |
---|
561 | #### V_EXP_F32 |
---|
562 | |
---|
563 | Opcode VOP1: 37 (0x25) for GCN 1.0/1.1; 32 (0x20) for GCN 1.2 |
---|
564 | Opcode VOP3A: 421 (0x1a5) for GCN 1.0/1.1; 352 (0x160) for GCN 1.2 |
---|
565 | Syntax: V_EXP_F32 VDST, SRC0 |
---|
566 | Description: Approximate power of two from FP value SRC0 and store it to VDST. Instruction |
---|
567 | for values smaller than -126.0 always returns 0 regardless of the floatmode in MODE register. |
---|
568 | Operation: |
---|
569 | ``` |
---|
570 | if (ASFLOAT(SRC0)>=-126.0) |
---|
571 | VDST = APPROX_POW2(ASFLOAT(SRC0)) |
---|
572 | else |
---|
573 | VDST = 0.0 |
---|
574 | ``` |
---|
575 | |
---|
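The cutoff at -126.0 can be sketched as below. The function name is hypothetical, the result is an exact double-precision power rather than the hardware approximation, and the overflow guard at 128.0 is an assumption added only to keep the Python code from raising on large inputs.

```python
import math


def v_exp_f32(x: float) -> float:
    """Power of two with the flush-to-zero cutoff from the pseudocode."""
    if x >= -126.0:
        if x >= 128.0:
            # Assumption: results beyond single-precision range become +inf.
            return math.inf
        return 2.0 ** x
    return 0.0
```

So an input of -126.0 still yields 2^-126, while -127.0 yields 0.0.
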
576 | #### V_EXP_LEGACY_F32 |
---|
577 | |
---|
578 | Opcode VOP1: 70 (0x46) for GCN 1.1; 75 (0x4b) for GCN 1.2 |
---|
579 | Opcode VOP3A: 454 (0x1c6) for GCN 1.1; 395 (0x18b) for GCN 1.2 |
---|
580 | Syntax: V_EXP_LEGACY_F32 VDST, SRC0 |
---|
581 | Description: Approximate power of two from FP value SRC0 and store it to VDST. Instruction |
---|
582 | for values smaller than -126.0 always returns 0 regardless of the floatmode in MODE register. |
---|
583 | In some cases this instruction returns a slightly less accurate result than V_EXP_F32. |
---|
584 | Operation: |
---|
585 | ``` |
---|
586 | if (ASFLOAT(SRC0)>=-126.0) |
---|
587 | VDST = APPROX_POW2(ASFLOAT(SRC0)) |
---|
588 | else |
---|
589 | VDST = 0.0 |
---|
590 | ``` |
---|
591 | |
---|
592 | #### V_FFBH_U32 |
---|
593 | |
---|
594 | Opcode VOP1: 57 (0x39) for GCN 1.0/1.1; 45 (0x2d) for GCN 1.2 |
---|
595 | Opcode VOP3A: 441 (0x1b9) for GCN 1.0/1.1; 365 (0x16d) for GCN 1.2 |
---|
596 | Syntax: V_FFBH_U32 VDST, SRC0 |
---|
597 | Description: Find last one bit in SRC0. If found, store number of skipped bits to VDST, |
---|
598 | otherwise set VDST to -1. |
---|
599 | Operation: |
---|
600 | ``` |
---|
601 | VDST = -1 |
---|
602 | for (INT8 i = 31; i >= 0; i--) |
---|
603 | if (((1U<<i) & SRC0) != 0) |
---|
604 | { VDST = 31-i; break; } |
---|
605 | ``` |
---|
606 | |
---|
607 | #### V_FFBH_I32 |
---|
608 | |
---|
609 | Opcode VOP1: 59 (0x3b) for GCN 1.0/1.1; 47 (0x2f) for GCN 1.2 |
---|
610 | Opcode VOP3A: 443 (0x1bb) for GCN 1.0/1.1; 367 (0x16f) for GCN 1.2 |
---|
611 | Syntax: V_FFBH_I32 VDST, SRC0 |
---|
612 | Description: Find last opposite bit to sign in SRC0. If found, store number of skipped bits |
---|
613 | to VDST, otherwise set VDST to -1. |
---|
614 | Operation: |
---|
615 | ``` |
---|
616 | VDST = -1 |
---|
617 | UINT32 bitval = (INT32)SRC0>=0 ? 1 : 0 |
---|
618 | for (INT8 i = 31; i >= 0; i--) |
---|
619 | if (((1U<<i) & SRC0) == (bitval<<i)) |
---|
620 | { VDST = 31-i; break; } |
---|
621 | ``` |
---|
622 | |
---|
623 | #### V_FFBL_B32 |
---|
624 | |
---|
625 | Opcode VOP1: 58 (0x3a) for GCN 1.0/1.1; 46 (0x2e) for GCN 1.2 |
---|
626 | Opcode VOP3A: 442 (0x1ba) for GCN 1.0/1.1; 366 (0x16e) for GCN 1.2 |
---|
627 | Syntax: V_FFBL_B32 VDST, SRC0 |
---|
628 | Description: Find first one bit in SRC0. If found, store number of bit to VDST, |
---|
629 | otherwise set VDST to -1. |
---|
630 | Operation: |
---|
631 | ``` |
---|
632 | VDST = -1 |
---|
633 | for (UINT8 i = 0; i < 32; i++) |
---|
634 | if (((1U<<i) & SRC0) != 0) |
---|
635 | { VDST = i; break; } |
---|
636 | ``` |
---|
637 | |
---|
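The three find-bit instructions (V_FFBH_U32, V_FFBH_I32, V_FFBL_B32) can be expressed compactly with Python's `int.bit_length`. The function names are hypothetical, and -1 is returned as a Python integer rather than as the 32-bit pattern 0xFFFFFFFF.

```python
def ffbh_u32(src0: int) -> int:
    """Count bits skipped from bit 31 down to the highest set bit."""
    src0 &= 0xFFFFFFFF
    return 32 - src0.bit_length() if src0 else -1


def ffbh_i32(src0: int) -> int:
    """Count bits skipped from bit 31 down to the first bit unlike the sign."""
    src0 &= 0xFFFFFFFF
    if src0 & 0x80000000:
        src0 ^= 0xFFFFFFFF  # for negative values search for a zero bit
    return ffbh_u32(src0)


def ffbl_b32(src0: int) -> int:
    """Return the index of the lowest set bit."""
    src0 &= 0xFFFFFFFF
    return (src0 & -src0).bit_length() - 1 if src0 else -1
```

For example, `ffbh_u32(1)` is 31, `ffbl_b32(8)` is 3, and both 0 and 0xFFFFFFFF give -1 for `ffbh_i32`.
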
638 | #### V_FLOOR_F32 |
---|
639 | |
---|
640 | Opcode VOP1: 36 (0x24) for GCN 1.0/1.1; 31 (0x1f) for GCN 1.2 |
---|
641 | Opcode VOP3A: 420 (0x1a4) for GCN 1.0/1.1; 351 (0x15f) for GCN 1.2 |
---|
642 | Syntax: V_FLOOR_F32 VDST, SRC0 |
---|
643 | Description: Truncate floating point value SRC0 with rounding to negative infinity |
---|
644 | (flooring), and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
645 | Operation: |
---|
646 | ``` |
---|
647 | VDST = FLOOR(ASFLOAT(SRC0)) |
---|
648 | ``` |
---|
649 | |
---|
650 | #### V_FLOOR_F64 |
---|
651 | |
---|
652 | Opcode VOP1: 26 (0x1a) for GCN 1.1/1.2 |
---|
653 | Opcode VOP3A: 410 (0x19a) for GCN 1.1; 346 (0x15a) for GCN 1.2 |
---|
654 | Syntax: V_FLOOR_F64 VDST(2), SRC0(2) |
---|
655 | Description: Truncate double floating point value SRC0 with rounding to negative infinity |
---|
656 | (flooring), and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
657 | Operation: |
---|
658 | ``` |
---|
659 | VDST = FLOOR(ASDOUBLE(SRC0)) |
---|
660 | ``` |
---|
661 | |
---|
662 | #### V_FRACT_F32 |
---|
663 | |
---|
664 | Opcode VOP1: 32 (0x20) for GCN 1.0/1.1; 27 (0x1b) for GCN 1.2 |
---|
665 | Opcode VOP3A: 416 (0x1a0) for GCN 1.0/1.1; 347 (0x15b) for GCN 1.2 |
---|
666 | Syntax: V_FRACT_F32 VDST, SRC0 |
---|
667 | Description: Get fractional from floating point value SRC0 and store it to VDST. |
---|
668 | Fractional will be computed by subtracting floor(SRC0) from SRC0. |
---|
669 | If SRC0 is infinity or NaN then NaN with proper sign is stored to VDST. |
---|
670 | Operation: |
---|
671 | ``` |
---|
672 | FLOAT SF = ASFLOAT(SRC0) |
---|
673 | if (!ISNAN(SF) && SF!=-INF && SF!=INF) |
---|
674 | VDST = SF - FLOOR(SF) |
---|
675 | else |
---|
676 | VDST = NAN * SIGN(SF) |
---|
677 | ``` |
---|
678 | |
---|
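The fractional part as defined here is always taken relative to the floor, so it is non-negative for negative inputs as well. A host-side sketch with a hypothetical function name:

```python
import math


def v_fract(x: float) -> float:
    """Return x - floor(x); infinity and NaN give NaN with the input's sign."""
    if math.isnan(x) or math.isinf(x):
        return math.copysign(math.nan, x)
    return x - math.floor(x)
```

For example, 1.25 gives 0.25 and -1.25 gives 0.75.
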
679 | #### V_FRACT_F64 |
---|
680 | |
---|
681 | Opcode VOP1: 62 (0x3e) for GCN 1.0/1.1; 50 (0x32) for GCN 1.2 |
---|
682 | Opcode VOP3A: 446 (0x1be) for GCN 1.0/1.1; 370 (0x172) for GCN 1.2 |
---|
683 | Syntax: V_FRACT_F64 VDST(2), SRC0(2) |
---|
684 | Description: Get fractional from double floating point value SRC0 and store it to VDST. |
---|
685 | Fractional will be computed by subtracting floor(SRC0) from SRC0. |
---|
686 | If SRC0 is infinity or NaN then NaN with proper sign is stored to VDST. |
---|
687 | Operation: |
---|
688 | ``` |
---|
689 | FLOAT SD = ASDOUBLE(SRC0) |
---|
690 | if (!ISNAN(SD) && SD!=-INF && SD!=INF) |
---|
691 | VDST = SD - FLOOR(SD) |
---|
692 | else |
---|
693 | VDST = NAN * SIGN(SD) |
---|
694 | ``` |
---|
695 | |
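
The fractional-part rule above can be sketched in Python. This is an illustration only: `v_fract` is a hypothetical helper name, it works on Python floats (doubles), and it does not model hardware rounding or denormal handling.

```python
import math

def v_fract(x: float) -> float:
    """Fractional part as described above: x - floor(x), NaN for infinity/NaN."""
    if math.isnan(x) or math.isinf(x):
        # NaN carrying the sign of the input (the "NAN * SIGN" branch)
        return math.copysign(math.nan, x)
    return x - math.floor(x)
```

For example, `v_fract(-0.25)` gives `0.75`, because floor(-0.25) is -1.
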
---|
696 | #### V_FREXP_EXP_I32_F32 |
---|
697 | |
---|
698 | Opcode VOP1: 63 (0x3f) for GCN 1.0/1.1; 51 (0x33) for GCN 1.2 |
---|
699 | Opcode VOP3A: 447 (0x1bf) for GCN 1.0/1.1; 371 (0x173) for GCN 1.2 |
---|
700 | Syntax: V_FREXP_EXP_I32_F32 VDST, SRC0 |
---|
701 | Description: Get exponent plus 1 from single FP value SRC0, and store that exponent to VDST. |
---|
702 | This instruction implements the exponent part of the frexp function. |
---|
703 | If SRC0 is infinity or NAN then store -1 to VDST. |
---|
704 | Operation: |
---|
705 | ``` |
---|
706 | FLOAT SF = ASFLOAT(SRC0) |
---|
707 | if (ABS(SF) != INF && !ISNAN(SF)) |
---|
708 | VDST = FREXP_EXP(SF) |
---|
709 | else |
---|
710 | VDST = -1 |
---|
711 | ``` |
---|
712 | |
---|
713 | #### V_FREXP_EXP_I32_F64 |
---|
714 | |
---|
715 | Opcode VOP1: 60 (0x3c) for GCN 1.0/1.1; 48 (0x30) for GCN 1.2 |
---|
716 | Opcode VOP3A: 444 (0x1bc) for GCN 1.0/1.1; 368 (0x170) for GCN 1.2 |
---|
717 | Syntax: V_FREXP_EXP_I32_F64 VDST, SRC0(2) |
---|
718 | Description: Get exponent plus 1 from double FP value SRC0, and store that exponent to VDST. |
---|
719 | This instruction implements the exponent part of the frexp function. |
---|
720 | If SRC0 is infinity or NAN then store -1 to VDST. |
---|
721 | Operation: |
---|
722 | ``` |
---|
723 | DOUBLE SD = ASDOUBLE(SRC0) |
---|
724 | if (ABS(SD) != INF && !ISNAN(SD)) |
---|
725 | VDST = FREXP_EXP(SD) |
---|
726 | else |
---|
727 | VDST = -1 |
---|
728 | ``` |
---|
729 | |
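
A rough Python model of the exponent extraction, assuming FREXP_EXP follows the convention of C `frexp` (mantissa magnitude in [0.5, 1.0)), which is what "exponent plus 1" suggests. `frexp_exp` is a hypothetical name; the result for zero follows Python's `math.frexp` and may not match hardware.

```python
import math

def frexp_exp(x: float) -> int:
    """Exponent e such that x = m * 2**e with 0.5 <= |m| < 1; -1 for infinity/NaN."""
    if math.isnan(x) or math.isinf(x):
        return -1
    return math.frexp(x)[1]
```

For 8.0 the result is 4, since 8.0 = 0.5 * 2**4.
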
---|
730 | #### V_FREXP_MANT_F32 |
---|
731 | |
---|
732 | Opcode VOP1: 64 (0x40) for GCN 1.0/1.1; 52 (0x34) for GCN 1.2 |
---|
733 | Opcode VOP3A: 448 (0x1c0) for GCN 1.0/1.1; 372 (0x174) for GCN 1.2 |
---|
734 | Syntax: V_FREXP_MANT_F32 VDST, SRC0 |
---|
735 | Description: Get mantissa from single FP value SRC0, and store it to VDST. Mantissa includes |
---|
736 | sign of input. If SRC0 is infinity then store -NAN to VDST. |
---|
737 | Operation: |
---|
738 | ``` |
---|
739 | FLOAT SF = ASFLOAT(SRC0) |
---|
740 | if (ABS(SF) == INF) |
---|
741 | VDST = -NAN |
---|
742 | else if (!ISNAN(SF)) |
---|
743 | VDST = FREXP_MANT(SF) * SIGN(SF) |
---|
744 | else |
---|
745 | VDST = NAN * SIGN(SF) |
---|
746 | ``` |
---|
747 | |
---|
748 | #### V_FREXP_MANT_F64 |
---|
749 | |
---|
750 | Opcode VOP1: 61 (0x3d) for GCN 1.0/1.1; 49 (0x31) for GCN 1.2 |
---|
751 | Opcode VOP3A: 445 (0x1bd) for GCN 1.0/1.1; 369 (0x171) for GCN 1.2 |
---|
752 | Syntax: V_FREXP_MANT_F64 VDST(2), SRC0(2) |
---|
753 | Description: Get mantissa from double FP value SRC0, and store it to VDST. Mantissa includes |
---|
754 | sign of input. If SRC0 is infinity then store -NAN to VDST. |
---|
755 | Operation: |
---|
756 | ``` |
---|
757 | DOUBLE SD = ASDOUBLE(SRC0) |
---|
758 | if (ABS(SD) == INF) |
---|
759 | VDST = -NAN |
---|
760 | else if (!ISNAN(SD)) |
---|
761 | VDST = FREXP_MANT(SD) * SIGN(SD) |
---|
762 | else |
---|
763 | VDST = NAN * SIGN(SD) |
---|
764 | ``` |
---|
765 | |
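
The mantissa extraction can be modelled the same way. A minimal sketch, assuming FREXP_MANT matches the mantissa returned by C `frexp`; `frexp_mant` is a hypothetical name, and Python's `math.frexp` already returns a signed mantissa, so no separate sign multiplication is needed here.

```python
import math

def frexp_mant(x: float) -> float:
    """Signed mantissa m with 0.5 <= |m| < 1; -NaN for infinity, NaN passed through."""
    if math.isinf(x):
        return math.copysign(math.nan, -1.0)  # the "-NAN" branch
    if math.isnan(x):
        return x
    return math.frexp(x)[0]
```

Mantissa and exponent together reconstruct the input: `math.ldexp(frexp_mant(-3.0), 2)` gives `-3.0`.
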
---|
766 | #### V_LOG_CLAMP_F32 |
---|
767 | |
---|
768 | Opcode VOP1: 38 (0x26) for GCN 1.0/1.1 |
---|
769 | Opcode VOP3A: 422 (0x1a6) for GCN 1.0/1.1 |
---|
770 | Syntax: V_LOG_CLAMP_F32 VDST, SRC0 |
---|
771 | Description: Approximate logarithm of base 2 from floating point value SRC0 with |
---|
772 | clamping infinities to -MAX_FLOAT. Result is stored in VDST. |
---|
773 | If SRC0 is negative then store -NaN to VDST. This instruction doesn't handle denormalized |
---|
774 | values regardless of the FLOAT MODE register setup. |
---|
775 | Operation: |
---|
776 | ``` |
---|
777 | FLOAT F = ASFLOAT(SRC0) |
---|
778 | if (F==1.0) |
---|
779 | VDST = 0.0f |
---|
780 | else if (F<0.0) |
---|
781 | VDST = -NaN |
---|
782 | else |
---|
783 | { |
---|
784 | VDST = APPROX_LOG2(F) |
---|
785 | if (ASFLOAT(VDST)==-INF) |
---|
786 | VDST = -MAX_FLOAT |
---|
787 | } |
---|
788 | ``` |
---|
789 | |
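
The clamping behaviour can be sketched in Python. Assumptions: `log_clamp_f32` is a hypothetical name, `math.log2` stands in for the hardware approximation, MAX_FLOAT is taken to be the largest finite single-precision value (bit pattern 0x7F7FFFFF), and denormal inputs are not treated specially.

```python
import math
import struct

# Largest finite float32, decoded from its bit pattern (assumed meaning of MAX_FLOAT)
MAX_FLOAT = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

def log_clamp_f32(x: float) -> float:
    """Base-2 logarithm with -infinity clamped to -MAX_FLOAT, -NaN for negative input."""
    if math.isnan(x):
        return x
    if x < 0.0:
        return math.copysign(math.nan, -1.0)
    # math.log2(0.0) raises in Python, so the -infinity result is produced by hand
    result = -math.inf if x == 0.0 else math.log2(x)
    if result == -math.inf:
        return -MAX_FLOAT
    return result
```
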
---|
790 | #### V_LOG_F32 |
---|
791 | |
---|
792 | Opcode VOP1: 39 (0x27) for GCN 1.0/1.1; 33 (0x21) for GCN 1.2 |
---|
793 | Opcode VOP3A: 423 (0x1a7) for GCN 1.0/1.1; 353 (0x161) for GCN 1.2 |
---|
794 | Syntax: V_LOG_F32 VDST, SRC0 |
---|
795 | Description: Approximate logarithm of base 2 from floating point value SRC0, and store |
---|
796 | result to VDST. If SRC0 is negative then store -NaN to VDST. |
---|
797 | This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup. |
---|
798 | Operation: |
---|
799 | ``` |
---|
800 | FLOAT F = ASFLOAT(SRC0) |
---|
801 | if (F==1.0) |
---|
802 | VDST = 0.0f |
---|
803 | else if (F<0.0) |
---|
804 | VDST = -NaN |
---|
805 | else |
---|
806 | VDST = APPROX_LOG2(F) |
---|
807 | ``` |
---|
808 | |
---|
809 | #### V_LOG_LEGACY_F32 |
---|
810 | |
---|
811 | Opcode VOP1: 69 (0x45) for GCN 1.1; 76 (0x4c) for GCN 1.2 |
---|
812 | Opcode VOP3A: 453 (0x1c5) for GCN 1.1; 396 (0x18c) for GCN 1.2 |
---|
813 | Syntax: V_LOG_LEGACY_F32 VDST, SRC0 |
---|
814 | Description: Approximate logarithm of base 2 from floating point value SRC0, and store |
---|
815 | result to VDST. If SRC0 is negative then store -NaN to VDST. |
---|
816 | This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup. |
---|
817 | This instruction returns slightly different results than V_LOG_F32. |
---|
818 | Operation: |
---|
819 | ``` |
---|
820 | FLOAT F = ASFLOAT(SRC0) |
---|
821 | if (F==1.0) |
---|
822 | VDST = 0.0f |
---|
823 | else if (F<0.0) |
---|
824 | VDST = -NaN |
---|
825 | else |
---|
826 | VDST = APPROX_LOG2(F) |
---|
827 | ``` |
---|
828 | |
---|
829 | #### V_MOV_B32 |
---|
830 | |
---|
831 | Opcode VOP1: 1 (0x1) |
---|
832 | Opcode VOP3A: 385 (0x181) for GCN 1.0/1.1; 321 (0x141) for GCN 1.2 |
---|
833 | Syntax: V_MOV_B32 VDST, SRC0 |
---|
834 | Description: Move SRC0 into VDST. |
---|
835 | Operation: |
---|
836 | ``` |
---|
837 | VDST = SRC0 |
---|
838 | ``` |
---|
839 | |
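
The opcode pairs listed for the entries in this section appear to differ by a fixed offset: the VOP3A opcode is the VOP1 opcode plus 384 on GCN 1.0/1.1 and plus 320 on GCN 1.2 (for V_MOV_B32: 1 + 384 = 385 and 1 + 320 = 321). A small sketch of that arithmetic; `vop3a_opcode` and the architecture strings are hypothetical names, and the offsets are inferred from the listed numbers rather than taken from an official table.

```python
# Offsets inferred from the opcode pairs listed in this document (an assumption)
VOP3A_OFFSET = {"GCN1.0": 384, "GCN1.1": 384, "GCN1.2": 320}

def vop3a_opcode(vop1_opcode: int, arch: str) -> int:
    """Return the VOP3A opcode that corresponds to a VOP1 opcode on the given architecture."""
    return vop1_opcode + VOP3A_OFFSET[arch]
```

This can serve as a quick consistency check when reading or editing the opcode lines.
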
---|
840 | #### V_MOV_FED_B32 |
---|
841 | |
---|
842 | Opcode VOP1: 9 (0x9) |
---|
843 | Opcode VOP3A: 393 (0x189) for GCN 1.0/1.1; 329 (0x149) for GCN 1.2 |
---|
844 | Syntax: V_MOV_FED_B32 VDST, SRC0 |
---|
845 | Description: Introduce an EDC double error upon write to the destination VGPR without causing an exception |
---|
846 | (???). |
---|
847 | |
---|
848 | #### V_MOVRELD_B32 |
---|
849 | |
---|
850 | Opcode VOP1: 66 (0x42) for GCN 1.0/1.1; 54 (0x36) for GCN 1.2 |
---|
851 | Opcode VOP3A: 450 (0x1c2) for GCN 1.0/1.1; 374 (0x176) for GCN 1.2 |
---|
852 | Syntax: V_MOVRELD_B32 VDST, VSRC0 |
---|
853 | Description: Move SRC0 to VGPR[VDST_NUMBER+M0]. |
---|
854 | Operation: |
---|
855 | ``` |
---|
856 | VGPR[VDST_NUMBER+M0] = SRC0 |
---|
857 | ``` |
---|
858 | |
---|
859 | #### V_MOVRELS_B32 |
---|
860 | |
---|
861 | Opcode VOP1: 67 (0x43) for GCN 1.0/1.1; 55 (0x37) for GCN 1.2 |
---|
862 | Opcode VOP3A: 451 (0x1c3) for GCN 1.0/1.1; 375 (0x177) for GCN 1.2 |
---|
863 | Syntax: V_MOVRELS_B32 VDST, VSRC0 |
---|
864 | Description: Move VGPR[SRC0_NUMBER+M0] to VDST. |
---|
865 | Operation: |
---|
866 | ``` |
---|
867 | VDST = VGPR[SRC0_NUMBER+M0] |
---|
868 | ``` |
---|
869 | |
---|
870 | #### V_MOVRELSD_B32 |
---|
871 | |
---|
872 | Opcode VOP1: 68 (0x44) for GCN 1.0/1.1; 56 (0x38) for GCN 1.2 |
---|
873 | Opcode VOP3A: 452 (0x1c4) for GCN 1.0/1.1; 376 (0x178) for GCN 1.2 |
---|
874 | Syntax: V_MOVRELSD_B32 VDST, VSRC0 |
---|
875 | Description: Move VGPR[SRC0_NUMBER+M0] to VGPR[VDST_NUMBER+M0]. |
---|
876 | Operation: |
---|
877 | ``` |
---|
878 | VGPR[VDST_NUMBER+M0] = VGPR[SRC0_NUMBER+M0] |
---|
879 | ``` |
---|
880 | |
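
The three relative moves differ only in which side is indexed by M0. A minimal sketch for a single lane, modelling the VGPR file as a Python list; the function names and the list-based register file are illustrative assumptions, and bounds checking is not modelled.

```python
def v_movreld(vgpr: list, vdst: int, src0_value: int, m0: int) -> None:
    """Destination is indexed by M0: VGPR[VDST_NUMBER+M0] = SRC0."""
    vgpr[vdst + m0] = src0_value

def v_movrels(vgpr: list, vdst: int, vsrc0: int, m0: int) -> None:
    """Source is indexed by M0: VDST = VGPR[SRC0_NUMBER+M0]."""
    vgpr[vdst] = vgpr[vsrc0 + m0]

def v_movrelsd(vgpr: list, vdst: int, vsrc0: int, m0: int) -> None:
    """Both sides are indexed by M0."""
    vgpr[vdst + m0] = vgpr[vsrc0 + m0]
```

With registers holding `[0, 1, 2, 3, 4, 5, 6, 7]`, `v_movrels(vgpr, 0, 2, 3)` copies register 5 into register 0.
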
---|
881 | #### V_NOP |
---|
882 | |
---|
883 | Opcode VOP1: 0 (0x0) |
---|
884 | Opcode VOP3A: 384 (0x180) for GCN 1.0/1.1; 320 (0x140) for GCN 1.2 |
---|
885 | Syntax: V_NOP |
---|
886 | Description: Do nothing. |
---|
887 | |
---|
888 | #### V_NOT_B32 |
---|
889 | |
---|
890 | Opcode VOP1: 55 (0x37) for GCN 1.0/1.1; 43 (0x2b) for GCN 1.2 |
---|
891 | Opcode VOP3A: 439 (0x1b7) for GCN 1.0/1.1; 363 (0x16b) for GCN 1.2 |
---|
892 | Syntax: V_NOT_B32 VDST, SRC0 |
---|
893 | Description: Do bitwise negation on 32-bit SRC0, and store result to VDST. |
---|
894 | Operation: |
---|
895 | ``` |
---|
896 | VDST = ~SRC0 |
---|
897 | ``` |
---|
898 | |
---|
899 | #### V_RCP_CLAMP_F32 |
---|
900 | |
---|
901 | Opcode VOP1: 40 (0x28) for GCN 1.0/1.1 |
---|
902 | Opcode VOP3A: 424 (0x1a8) for GCN 1.0/1.1 |
---|
903 | Syntax: V_RCP_CLAMP_F32 VDST, SRC0 |
---|
904 | Description: Approximate reciprocal from floating point value SRC0 and store it to VDST. |
---|
905 | Guaranteed error below 1ulp. Result is clamped to MAX_FLOAT, keeping the sign of the result. |
---|
906 | Operation: |
---|
907 | ``` |
---|
908 | VDST = APPROX_RCP(ASFLOAT(SRC0)) |
---|
909 | if (ABS(ASFLOAT(VDST))==INF) |
---|
910 | VDST = SIGN(ASFLOAT(VDST)) * MAX_FLOAT |
---|
911 | ``` |
---|
912 | |
---|
913 | #### V_RCP_CLAMP_F64 |
---|
914 | |
---|
915 | Opcode VOP1: 48 (0x30) for GCN 1.0/1.1 |
---|
916 | Opcode VOP3A: 432 (0x1b0) for GCN 1.0/1.1 |
---|
917 | Syntax: V_RCP_CLAMP_F64 VDST(2), SRC0(2) |
---|
918 | Description: Approximate reciprocal from double FP value SRC0 and store it to VDST. |
---|
919 | Relative error of approximation is ~1e-8. |
---|
920 | Result is clamped to MAX_DOUBLE value including sign of a result. |
---|
921 | Operation: |
---|
922 | ``` |
---|
923 | VDST = APPROX_RCP(ASDOUBLE(SRC0)) |
---|
924 | if (ABS(ASDOUBLE(VDST))==INF) |
---|
925 | VDST = SIGN(ASDOUBLE(VDST)) * MAX_DOUBLE |
---|
926 | ``` |
---|
927 | |
---|
928 | #### V_RCP_F32 |
---|
929 | |
---|
930 | Opcode VOP1: 42 (0x2a) for GCN 1.0/1.1; 34 (0x22) for GCN 1.2 |
---|
931 | Opcode VOP3A: 426 (0x1aa) for GCN 1.0/1.1; 354 (0x162) for GCN 1.2 |
---|
932 | Syntax: V_RCP_F32 VDST, SRC0 |
---|
933 | Description: Approximate reciprocal from floating point value SRC0 and store it to VDST. |
---|
934 | Guaranteed error below 1ulp. |
---|
935 | Operation: |
---|
936 | ``` |
---|
937 | VDST = APPROX_RCP(ASFLOAT(SRC0)) |
---|
938 | ``` |
---|
939 | |
---|
940 | #### V_RCP_F64 |
---|
941 | |
---|
942 | Opcode VOP1: 47 (0x2f) for GCN 1.0/1.1; 37 (0x25) for GCN 1.2 |
---|
943 | Opcode VOP3A: 431 (0x1af) for GCN 1.0/1.1; 357 (0x165) for GCN 1.2 |
---|
944 | Syntax: V_RCP_F64 VDST(2), SRC0(2) |
---|
945 | Description: Approximate reciprocal from double FP value SRC0 and store it to VDST. |
---|
946 | Relative error of approximation is ~1e-8. |
---|
947 | Operation: |
---|
948 | ``` |
---|
949 | VDST = APPROX_RCP(ASDOUBLE(SRC0)) |
---|
950 | ``` |
---|
951 | |
---|
952 | #### V_RCP_IFLAG_F32 |
---|
953 | |
---|
954 | Opcode VOP1: 43 (0x2b) for GCN 1.0/1.1; 35 (0x23) for GCN 1.2 |
---|
955 | Opcode VOP3A: 427 (0x1ab) for GCN 1.0/1.1; 355 (0x163) for GCN 1.2 |
---|
956 | Syntax: V_RCP_IFLAG_F32 VDST, SRC0 |
---|
957 | Description: Approximate reciprocal from floating point value SRC0 and store it to VDST. |
---|
958 | Guaranteed error below 1ulp. This instruction signals integer division by zero instead |
---|
959 | of any floating point exception when an error occurs. |
---|
960 | Operation: |
---|
961 | ``` |
---|
962 | VDST = APPROX_RCP_IFLAG(ASFLOAT(SRC0)) |
---|
963 | ``` |
---|
964 | |
---|
965 | #### V_RCP_LEGACY_F32 |
---|
966 | |
---|
967 | Opcode VOP1: 41 (0x29) for GCN 1.0/1.1 |
---|
968 | Opcode VOP3A: 425 (0x1a9) for GCN 1.0/1.1 |
---|
969 | Syntax: V_RCP_LEGACY_F32 VDST, SRC0 |
---|
970 | Description: Approximate reciprocal from floating point value SRC0 and store it to VDST. |
---|
971 | Guaranteed error below 1ulp. If SRC0 or VDST is zero or infinity then store 0 with proper |
---|
972 | sign to VDST. |
---|
973 | Operation: |
---|
974 | ``` |
---|
975 | FLOAT SF = ASFLOAT(SRC0) |
---|
976 | if (ABS(SF)==0.0) |
---|
977 | VDST = SIGN(SF)*0.0 |
---|
978 | else |
---|
979 | { |
---|
980 | VDST = APPROX_RCP(SF) |
---|
981 | if (ABS(ASFLOAT(VDST)) == INF) |
---|
982 | VDST = SIGN(SF)*0.0 |
---|
983 | } |
---|
984 | ``` |
---|
985 | |
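
The difference between the plain, clamped and legacy reciprocal variants can be sketched in Python. Assumptions: the function names are hypothetical, exact division stands in for the hardware approximation, MAX_FLOAT is the largest finite single-precision value, and float32 overflow is not modelled because Python floats are doubles.

```python
import math
import struct

# Largest finite float32, decoded from its bit pattern (assumed meaning of MAX_FLOAT)
MAX_FLOAT = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

def rcp(x: float) -> float:
    """Reciprocal that yields a signed infinity for a zero input instead of raising."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    return 1.0 / x

def rcp_clamp(x: float) -> float:
    """Like rcp, but an infinite result is replaced by MAX_FLOAT with the same sign."""
    r = rcp(x)
    return math.copysign(MAX_FLOAT, r) if math.isinf(r) else r

def rcp_legacy(x: float) -> float:
    """Like rcp, but a zero input or an infinite result gives a signed zero."""
    if x == 0.0:
        return math.copysign(0.0, x)
    r = rcp(x)
    return math.copysign(0.0, x) if math.isinf(r) else r
```
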
---|
986 | #### V_READFIRSTLANE_B32 |
---|
987 | |
---|
988 | Opcode VOP1: 2 (0x2) |
---|
989 | Opcode VOP3A: 386 (0x182) for GCN 1.0/1.1; 322 (0x142) for GCN 1.2 |
---|
990 | Syntax: V_READFIRSTLANE_B32 SDST, VSRC0 |
---|
991 | Description: Copy one VSRC0 lane value to one SDST. Lane (thread id) is first active lane id |
---|
992 | or the first lane id if all lanes are inactive. SSRC1 can be SGPR or M0. Ignores EXEC mask. |
---|
993 | Operation: |
---|
994 | ``` |
---|
995 | UINT8 firstlane = 0 |
---|
996 | for (UINT8 i = 0; i < 64; i++) |
---|
997 | if (((1ULL<<i) & EXEC) != 0) |
---|
998 | { firstlane = i; break; } |
---|
999 | SDST = VSRC0[firstlane] |
---|
1000 | ``` |
---|
1001 | |
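
The lane selection above can be sketched in Python, treating VSRC0 as a list of 64 per-lane values and EXEC as a 64-bit integer mask. `v_readfirstlane` is a hypothetical name used only for illustration.

```python
def v_readfirstlane(vsrc0_lanes: list, exec_mask: int) -> int:
    """Return the value of the first lane whose EXEC bit is set, or lane 0 if none is set."""
    firstlane = 0
    for i in range(64):
        if (exec_mask >> i) & 1:
            firstlane = i
            break
    return vsrc0_lanes[firstlane]
```

With `exec_mask = 0b1000` the value of lane 3 is returned; with `exec_mask = 0` the value of lane 0 is returned.
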
---|
1002 | #### V_RNDNE_F32 |
---|
1003 | |
---|
1004 | Opcode VOP1: 35 (0x23) for GCN 1.0/1.1; 30 (0x1e) for GCN 1.2 |
---|
1005 | Opcode VOP3A: 419 (0x1a3) for GCN 1.0/1.1; 350 (0x15e) for GCN 1.2 |
---|
1006 | Syntax: V_RNDNE_F32 VDST, SRC0 |
---|
1007 | Description: Round floating point value SRC0 to nearest even integer, and store result to |
---|
1008 | VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
1009 | Operation: |
---|
1010 | ``` |
---|
1011 | VDST = RNDNE(ASFLOAT(SRC0)) |
---|
1012 | ``` |
---|
1013 | |
---|
1014 | #### V_RNDNE_F64 |
---|
1015 | |
---|
1016 | Opcode VOP1: 25 (0x19) for GCN 1.1/1.2 |
---|
1017 | Opcode VOP3A: 409 (0x199) for GCN 1.1; 345 (0x159) for GCN 1.2 |
---|
1018 | Syntax: V_RNDNE_F64 VDST(2), SRC0(2) |
---|
1019 | Description: Round double floating point value SRC0 to nearest even integer, |
---|
1020 | and store result to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
1021 | Operation: |
---|
1022 | ``` |
---|
1023 | VDST = RNDNE(ASDOUBLE(SRC0)) |
---|
1024 | ``` |
---|
1025 | |
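
Round-to-nearest-even sends ties to the even neighbour, so 2.5 becomes 2.0 and 3.5 becomes 4.0. A minimal Python sketch; `rndne` is a hypothetical name, and it relies on the built-in `round`, which uses the same tie rule for floats.

```python
import math

def rndne(x: float) -> float:
    """Round to nearest integer, ties to even; infinity and NaN are passed through."""
    if math.isnan(x) or math.isinf(x):
        return x
    # copysign keeps the sign of the input when the result is zero
    return math.copysign(float(round(x)), x)
```
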
---|
1026 | #### V_RSQ_CLAMP_F32 |
---|
1027 | |
---|
1028 | Opcode VOP1: 44 (0x2c) for GCN 1.0/1.1 |
---|
1029 | Opcode VOP3A: 428 (0x1ac) for GCN 1.0/1.1 |
---|
1030 | Syntax: V_RSQ_CLAMP_F32 VDST, SRC0 |
---|
1031 | Description: Approximate reciprocal square root from floating point value SRC0 with |
---|
1032 | clamping to MAX_FLOAT, and store result to VDST. |
---|
1033 | If SRC0 is negative value, store -NAN to VDST. |
---|
1034 | This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup. |
---|
1035 | Operation: |
---|
1036 | ``` |
---|
1037 | VDST = APPROX_RSQRT(ASFLOAT(SRC0)) |
---|
1038 | if (ASFLOAT(VDST)==INF) |
---|
1039 | VDST = MAX_FLOAT |
---|
1040 | ``` |
---|
1041 | |
---|
1042 | #### V_RSQ_CLAMP_F64 |
---|
1043 | |
---|
1044 | Opcode VOP1: 50 (0x32) for GCN 1.0/1.1 |
---|
1045 | Opcode VOP3A: 434 (0x1b2) for GCN 1.0/1.1 |
---|
1046 | Syntax: V_RSQ_CLAMP_F64 VDST(2), SRC0(2) |
---|
1047 | Description: Approximate reciprocal square root from double floating point value SRC0 |
---|
1048 | with clamping to MAX_DOUBLE, and store it to VDST. If SRC0 is negative value, |
---|
1049 | store -NAN to VDST. |
---|
1050 | Operation: |
---|
1051 | ``` |
---|
1052 | VDST = APPROX_RSQRT(ASDOUBLE(SRC0)) |
---|
1053 | if (ASDOUBLE(VDST)==INF) |
---|
1054 | VDST = MAX_DOUBLE |
---|
1055 | ``` |
---|
1056 | |
---|
1057 | #### V_RSQ_F32 |
---|
1058 | |
---|
1059 | Opcode VOP1: 46 (0x2e) for GCN 1.0/1.1; 36 (0x24) for GCN 1.2 |
---|
1060 | Opcode VOP3A: 430 (0x1ae) for GCN 1.0/1.1; 356 (0x164) for GCN 1.2 |
---|
1061 | Syntax: V_RSQ_F32 VDST, SRC0 |
---|
1062 | Description: Approximate reciprocal square root from floating point value SRC0 and |
---|
1063 | store it to VDST. If SRC0 is negative value, store -NAN to VDST. |
---|
1064 | This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup. |
---|
1065 | Operation: |
---|
1066 | ``` |
---|
1067 | VDST = APPROX_RSQRT(ASFLOAT(SRC0)) |
---|
1068 | ``` |
---|
1069 | |
---|
1070 | #### V_RSQ_F64 |
---|
1071 | |
---|
1072 | Opcode VOP1: 49 (0x31) for GCN 1.0/1.1; 38 (0x26) for GCN 1.2 |
---|
1073 | Opcode VOP3A: 433 (0x1b1) for GCN 1.0/1.1; 358 (0x166) for GCN 1.2 |
---|
1074 | Syntax: V_RSQ_F64 VDST(2), SRC0(2) |
---|
1075 | Description: Approximate reciprocal square root from double floating point value SRC0 and |
---|
1076 | store it to VDST. If SRC0 is negative value, store -NAN to VDST. |
---|
1077 | Operation: |
---|
1078 | ``` |
---|
1079 | VDST = APPROX_RSQRT(ASDOUBLE(SRC0)) |
---|
1080 | ``` |
---|
1081 | |
---|
1082 | #### V_RSQ_LEGACY_F32 |
---|
1083 | |
---|
1084 | Opcode VOP1: 45 (0x2d) for GCN 1.0/1.1 |
---|
1085 | Opcode VOP3A: 429 (0x1ad) for GCN 1.0/1.1 |
---|
1086 | Syntax: V_RSQ_LEGACY_F32 VDST, SRC0 |
---|
1087 | Description: Approximate reciprocal square root from floating point value SRC0, |
---|
1088 | and store result to VDST. If SRC0 is negative value, store -NAN to VDST. |
---|
1089 | If result is infinity then store 0.0 to VDST. |
---|
1090 | This instruction doesn't handle denormalized values regardless of the FLOAT MODE register setup. |
---|
1091 | Operation: |
---|
1092 | ``` |
---|
1093 | VDST = APPROX_RSQRT(ASFLOAT(SRC0)) |
---|
1094 | if (ASFLOAT(VDST)==INF) |
---|
1095 | VDST = 0.0 |
---|
1096 | ``` |
---|
1097 | |
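
The clamped and legacy reciprocal square root variants differ only in what replaces an infinite result. A sketch in Python under several assumptions: the function names are hypothetical, `1.0 / math.sqrt(x)` stands in for the hardware approximation, MAX_FLOAT is the largest finite single-precision value, and negative zero is treated like positive zero, which may differ from hardware.

```python
import math
import struct

# Largest finite float32, decoded from its bit pattern (assumed meaning of MAX_FLOAT)
MAX_FLOAT = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

def rsq(x: float) -> float:
    """Reciprocal square root: -NaN for negative input, +infinity for zero."""
    if math.isnan(x):
        return x
    if x < 0.0:
        return math.copysign(math.nan, -1.0)
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)

def rsq_clamp(x: float) -> float:
    """An infinite result is clamped to MAX_FLOAT."""
    r = rsq(x)
    return MAX_FLOAT if r == math.inf else r

def rsq_legacy(x: float) -> float:
    """An infinite result is replaced by 0.0."""
    r = rsq(x)
    return 0.0 if r == math.inf else r
```
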
---|
1098 | #### V_SIN_F32 |
---|
1099 | |
---|
1100 | Opcode VOP1: 53 (0x35) for GCN 1.0/1.1; 41 (0x29) for GCN 1.2 |
---|
1101 | Opcode VOP3A: 437 (0x1b5) for GCN 1.0/1.1; 361 (0x169) for GCN 1.2 |
---|
1102 | Syntax: V_SIN_F32 VDST, SRC0 |
---|
1103 | Description: Compute sine of FP value from SRC0. Input value must be normalized to range |
---|
1104 | -1.0 to 1.0 (-360 degrees to 360 degrees). If SRC0 value is out of range then store 0.0 to VDST. |
---|
1105 | If SRC0 value is infinity, store -NAN to VDST. |
---|
1106 | Operation: |
---|
1107 | ``` |
---|
1108 | FLOAT SF = ASFLOAT(SRC0) |
---|
1109 | VDST = 0.0 |
---|
1110 | if (SF >= -1.0 && SF <= 1.0) |
---|
1111 | VDST = APPROX_SIN(SF) |
---|
1112 | else if (ABS(SF)==INF) |
---|
1113 | VDST = -NAN |
---|
1114 | else if (ISNAN(SF)) |
---|
1115 | VDST = SRC0 |
---|
1116 | ``` |
---|
1117 | |
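
Because the input is normalized so that 1.0 corresponds to 360 degrees, the instruction effectively computes sin(2π·SRC0) inside the accepted range. A Python sketch following the range check in the Operation block; `v_sin` is a hypothetical name and `math.sin` stands in for the hardware approximation.

```python
import math

def v_sin(x: float) -> float:
    """Sine of a normalized angle (1.0 = full turn); 0.0 outside [-1.0, 1.0]."""
    if math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(math.nan, -1.0)  # the "-NAN" branch
    if -1.0 <= x <= 1.0:
        return math.sin(2.0 * math.pi * x)
    return 0.0
```

For example, an input of 0.25 (90 degrees) gives a result of approximately 1.0.
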
---|
1118 | #### V_SQRT_F32 |
---|
1119 | |
---|
1120 | Opcode VOP1: 51 (0x33) for GCN 1.0/1.1; 39 (0x27) for GCN 1.2 |
---|
1121 | Opcode VOP3A: 435 (0x1b3) for GCN 1.0/1.1; 359 (0x167) for GCN 1.2 |
---|
1122 | Syntax: V_SQRT_F32 VDST, SRC0 |
---|
1123 | Description: Compute square root of floating point value SRC0, and store result to VDST. |
---|
1124 | If SRC0 is negative value then store -NaN to VDST. |
---|
1125 | Operation: |
---|
1126 | ``` |
---|
1127 | if (ASFLOAT(SRC0)>=0.0) |
---|
1128 | VDST = APPROX_SQRT(ASFLOAT(SRC0)) |
---|
1129 | else |
---|
1130 | VDST = -NAN |
---|
1131 | ``` |
---|
1132 | |
---|
1133 | #### V_SQRT_F64 |
---|
1134 | |
---|
1135 | Opcode VOP1: 52 (0x34) for GCN 1.0/1.1; 40 (0x28) for GCN 1.2 |
---|
1136 | Opcode VOP3A: 436 (0x1b4) for GCN 1.0/1.1; 360 (0x168) for GCN 1.2 |
---|
1137 | Syntax: V_SQRT_F64 VDST(2), SRC0(2) |
---|
1138 | Description: Compute square root of double floating point value SRC0, and store result |
---|
1139 | to VDST. Relative error of approximation is ~1e-8. |
---|
1140 | If SRC0 is negative value then store -NaN to VDST. |
---|
1141 | Operation: |
---|
1142 | ``` |
---|
1143 | if (ASDOUBLE(SRC0)>=0.0) |
---|
1144 | VDST = APPROX_SQRT(ASDOUBLE(SRC0)) |
---|
1145 | else |
---|
1146 | VDST = -NAN |
---|
1147 | ``` |
---|
1148 | |
---|
1149 | #### V_TRUNC_F32 |
---|
1150 | |
---|
1151 | Opcode VOP1: 33 (0x21) for GCN 1.0/1.1; 28 (0x1c) for GCN 1.2 |
---|
1152 | Opcode VOP3A: 417 (0x1a1) for GCN 1.0/1.1; 348 (0x15c) for GCN 1.2 |
---|
1153 | Syntax: V_TRUNC_F32 VDST, SRC0 |
---|
1154 | Description: Get integer value from floating point value SRC0, and store (as float) |
---|
1155 | it to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
1156 | Operation: |
---|
1157 | ``` |
---|
1158 | VDST = RNDTZ(ASFLOAT(SRC0)) |
---|
1159 | ``` |
---|
1160 | |
---|
1161 | #### V_TRUNC_F64 |
---|
1162 | |
---|
1163 | Opcode VOP1: 23 (0x17) for GCN 1.1/1.2 |
---|
1164 | Opcode VOP3A: 407 (0x197) for GCN 1.1; 343 (0x157) for GCN 1.2 |
---|
1165 | Syntax: V_TRUNC_F64 VDST(2), SRC0(2) |
---|
1166 | Description: Get integer value from double floating point value SRC0, and store (as double) |
---|
1167 | it to VDST. If SRC0 is infinity or NaN then copy SRC0 to VDST. |
---|
1168 | Operation: |
---|
1169 | ``` |
---|
1170 | VDST = RNDTZ(ASDOUBLE(SRC0)) |
---|
1171 | ``` |
---|