Changes between Version 9 and Version 10 of GcnInstrsVop2
- Timestamp: 11/22/15 21:00:19
GcnInstrsVop2
v9 v10 206 206 <tr> 207 207 <th>Opcode</th> 208 <th>Opcode(VOP3)</th> 208 209 <th>Mnemonic (GCN1.0/1.1)</th> 209 210 <th>Mnemonic (GCN 1.2)</th> … … 213 214 <tr> 214 215 <td>0 (0x0)</td> 216 <td>256 (0x100)</td> 215 217 <td>V_CNDMASK_B32</td> 216 218 <td>V_CNDMASK_B32</td> … … 218 220 <tr> 219 221 <td>1 (0x1)</td> 222 <td>257 (0x101)</td> 220 223 <td>V_READLANE_B32</td> 221 224 <td>V_ADD_F32</td> … … 223 226 <tr> 224 227 <td>2 (0x2)</td> 228 <td>258 (0x102)</td> 225 229 <td>V_WRITELANE_B32</td> 226 230 <td>V_SUB_F32</td> … … 228 232 <tr> 229 233 <td>3 (0x3)</td> 234 <td>259 (0x103)</td> 230 235 <td>V_ADD_F32</td> 231 236 <td>V_SUBREV_F32</td> … … 233 238 <tr> 234 239 <td>4 (0x4)</td> 240 <td>260 (0x104)</td> 235 241 <td>V_SUB_F32</td> 236 242 <td>V_MUL_LEGACY_F32</td> … … 238 244 <tr> 239 245 <td>5 (0x5)</td> 246 <td>261 (0x105)</td> 240 247 <td>V_SUBREV_F32</td> 241 248 <td>V_MUL_F32</td> … … 243 250 <tr> 244 251 <td>6 (0x6)</td> 252 <td>262 (0x106)</td> 245 253 <td>V_MAC_LEGACY_F32</td> 246 254 <td>V_MUL_I32_I24</td> … … 248 256 <tr> 249 257 <td>7 (0x7)</td> 258 <td>263 (0x107)</td> 250 259 <td>V_MUL_LEGACY_F32</td> 251 260 <td>V_MUL_HI_I32_I24</td> … … 253 262 <tr> 254 263 <td>8 (0x8)</td> 264 <td>264 (0x108)</td> 255 265 <td>V_MUL_F32</td> 256 266 <td>V_MUL_U32_U24</td> … … 258 268 <tr> 259 269 <td>9 (0x9)</td> 270 <td>265 (0x109)</td> 260 271 <td>V_MUL_I32_I24</td> 261 272 <td>V_MUL_HI_U32_U24</td> … … 263 274 <tr> 264 275 <td>10 (0xa)</td> 276 <td>266 (0x10a)</td> 265 277 <td>V_MUL_HI_I32_I24</td> 266 278 <td>V_MIN_F32</td> … … 268 280 <tr> 269 281 <td>11 (0xb)</td> 282 <td>267 (0x10b)</td> 270 283 <td>V_MUL_U32_U24</td> 271 284 <td>V_MAX_F32</td> … … 273 286 <tr> 274 287 <td>12 (0xc)</td> 288 <td>268 (0x10c)</td> 275 289 <td>V_MUL_HI_U32_U24</td> 276 290 <td>V_MIN_I32</td> … … 278 292 <tr> 279 293 <td>13 (0xd)</td> 294 <td>269 (0x10d)</td> 280 295 <td>V_MIN_LEGACY_F32</td> 281 296 <td>V_MAX_I32</td> … … 283 298 <tr> 284 299 <td>14 (0xe)</td> 300 <td>270 
(0x10e)</td> 285 301 <td>V_MAX_LEGACY_F32</td> 286 302 <td>V_MIN_U32</td> … … 288 304 <tr> 289 305 <td>15 (0xf)</td> 306 <td>271 (0x10f)</td> 290 307 <td>V_MIN_F32</td> 291 308 <td>V_MAX_U32</td> … … 293 310 <tr> 294 311 <td>16 (0x10)</td> 312 <td>272 (0x110)</td> 295 313 <td>V_MAX_F32</td> 296 314 <td>V_LSHRREV_B32</td> … … 298 316 <tr> 299 317 <td>17 (0x11)</td> 318 <td>273 (0x111)</td> 300 319 <td>V_MIN_I32</td> 301 320 <td>V_ASHRREV_I32</td> … … 303 322 <tr> 304 323 <td>18 (0x12)</td> 324 <td>274 (0x112)</td> 305 325 <td>V_MAX_I32</td> 306 326 <td>V_LSHLREV_B32</td> … … 308 328 <tr> 309 329 <td>19 (0x13)</td> 330 <td>275 (0x113)</td> 310 331 <td>V_MIN_U32</td> 311 332 <td>V_AND_B32</td> … … 313 334 <tr> 314 335 <td>20 (0x14)</td> 336 <td>276 (0x114)</td> 315 337 <td>V_MAX_U32</td> 316 338 <td>V_OR_B32</td> … … 318 340 <tr> 319 341 <td>21 (0x15)</td> 342 <td>277 (0x115)</td> 320 343 <td>V_LSHR_B32</td> 321 344 <td>V_XOR_B32</td> … … 323 346 <tr> 324 347 <td>22 (0x16)</td> 348 <td>278 (0x116)</td> 325 349 <td>V_LSHRREV_B32</td> 326 350 <td>V_MAC_F32</td> … … 328 352 <tr> 329 353 <td>23 (0x17)</td> 354 <td>279 (0x117)</td> 330 355 <td>V_ASHR_I32</td> 331 356 <td>V_MADMK_F32</td> … … 333 358 <tr> 334 359 <td>24 (0x18)</td> 360 <td>280 (0x118)</td> 335 361 <td>V_ASHRREV_I32</td> 336 362 <td>V_MADAK_F32</td> … … 338 364 <tr> 339 365 <td>25 (0x19)</td> 366 <td>281 (0x119)</td> 340 367 <td>V_LSHL_B32</td> 341 368 <td>V_ADD_U32</td> … … 343 370 <tr> 344 371 <td>26 (0x1a)</td> 372 <td>282 (0x11a)</td> 345 373 <td>V_LSHLREV_B32</td> 346 374 <td>V_SUB_U32</td> … … 348 376 <tr> 349 377 <td>27 (0x1b)</td> 378 <td>283 (0x11b)</td> 350 379 <td>V_AND_B32</td> 351 380 <td>V_SUBREV_U32</td> … … 353 382 <tr> 354 383 <td>28 (0x1c)</td> 384 <td>284 (0x11c)</td> 355 385 <td>V_OR_B32</td> 356 386 <td>V_ADDC_U32</td> … … 358 388 <tr> 359 389 <td>29 (0x1d)</td> 390 <td>285 (0x11d)</td> 360 391 <td>V_XOR_B32</td> 361 392 <td>V_SUBB_U32</td> … … 363 394 <tr> 364 395 <td>30 (0x1e)</td> 396 
<td>286 (0x11e)</td> 365 397 <td>V_BFM_B32</td> 366 398 <td>V_SUBBREV_U32</td> … … 368 400 <tr> 369 401 <td>31 (0x1f)</td> 402 <td>287 (0x11f)</td> 370 403 <td>V_MAC_F32</td> 371 404 <td>V_ADD_F16</td> … … 373 406 <tr> 374 407 <td>32 (0x20)</td> 408 <td>288 (0x120)</td> 375 409 <td>V_MADMK_F32</td> 376 410 <td>V_SUB_F16</td> … … 378 412 <tr> 379 413 <td>33 (0x21)</td> 414 <td>289 (0x121)</td> 380 415 <td>V_MADAK_F32</td> 381 416 <td>V_SUBREV_F16</td> … … 383 418 <tr> 384 419 <td>34 (0x22)</td> 420 <td>290 (0x122)</td> 385 421 <td>V_BCNT_U32_B32</td> 386 422 <td>V_MUL_F16</td> … … 388 424 <tr> 389 425 <td>35 (0x23)</td> 426 <td>291 (0x123)</td> 390 427 <td>V_MBCNT_LO_U32_B32</td> 391 428 <td>V_MAC_F16</td> … … 393 430 <tr> 394 431 <td>36 (0x24)</td> 432 <td>292 (0x124)</td> 395 433 <td>V_MBCNT_HI_U32_B32</td> 396 434 <td>V_MADMK_F16</td> … … 398 436 <tr> 399 437 <td>37 (0x25)</td> 438 <td>293 (0x125)</td> 400 439 <td>V_ADD_I32</td> 401 440 <td>V_MADAK_F16</td> … … 403 442 <tr> 404 443 <td>38 (0x26)</td> 444 <td>294 (0x126)</td> 405 445 <td>V_SUB_I32</td> 406 446 <td>V_ADD_U16</td> … … 408 448 <tr> 409 449 <td>39 (0x27)</td> 450 <td>295 (0x127)</td> 410 451 <td>V_SUBREV_I32</td> 411 452 <td>V_SUB_U16</td> … … 413 454 <tr> 414 455 <td>40 (0x28)</td> 456 <td>296 (0x128)</td> 415 457 <td>V_ADDC_U32</td> 416 458 <td>V_SUBREV_U16</td> … … 418 460 <tr> 419 461 <td>41 (0x29)</td> 462 <td>297 (0x129)</td> 420 463 <td>V_SUBB_U32</td> 421 464 <td>V_MUL_LO_U16</td> … … 423 466 <tr> 424 467 <td>42 (0x2a)</td> 468 <td>298 (0x12a)</td> 425 469 <td>V_SUBBREV_U32</td> 426 470 <td>V_LSHLREV_B16</td> … … 428 472 <tr> 429 473 <td>43 (0x2b)</td> 474 <td>299 (0x12b)</td> 430 475 <td>V_LDEXP_F32</td> 431 476 <td>V_LSHRREV_B16</td> … … 433 478 <tr> 434 479 <td>44 (0x2c)</td> 480 <td>300 (0x12c)</td> 435 481 <td>V_CVT_PKACCUM_U8_F32</td> 436 482 <td>V_ASHRREV_I16</td> … … 438 484 <tr> 439 485 <td>45 (0x2d)</td> 486 <td>301 (0x12d)</td> 440 487 <td>V_CVT_PKNORM_I16_F32</td> 441 488 
<td>V_MAX_F16</td> … … 443 490 <tr> 444 491 <td>46 (0x2e)</td> 492 <td>302 (0x12e)</td> 445 493 <td>V_CVT_PKNORM_U16_F32</td> 446 494 <td>V_MIN_F16</td> … … 448 496 <tr> 449 497 <td>47 (0x2f)</td> 498 <td>303 (0x12f)</td> 450 499 <td>V_CVT_PKRTZ_F16_F32</td> 451 500 <td>V_MAX_U16</td> … … 453 502 <tr> 454 503 <td>48 (0x30)</td> 504 <td>304 (0x130)</td> 455 505 <td>V_CVT_PK_U16_U32</td> 456 506 <td>V_MAX_I16</td> … … 458 508 <tr> 459 509 <td>49 (0x31)</td> 510 <td>305 (0x131)</td> 460 511 <td>V_CVT_PK_I16_I32</td> 461 512 <td>V_MIN_U16</td> … … 463 514 <tr> 464 515 <td>50 (0x32)</td> 516 <td>306 (0x132)</td> 465 517 <td>--</td> 466 518 <td>V_MIN_I16</td> … … 468 520 <tr> 469 521 <td>51 (0x33)</td> 522 <td>307 (0x133)</td> 470 523 <td>--</td> 471 524 <td>V_LDEXP_F16</td> … … 481 534 Description: Add two FP values from SRC0 and SRC1 and store result to VDST.<br /> 482 535 Operation:<br /> 483 <code>VDST = (FLOAT)SRC0 + (FLOAT)SRC1</code></p> 536 <code>VDST = ASFLOAT(SRC0) + ASFLOAT(SRC1)</code></p> 484 537 <h4>V_ADD_I32, V_ADD_U32</h4> 485 538 <p>Opcode VOP2: 37 (0x25) for GCN 1.0/1.1; 25 (0x19) for GCN 1.2<br /> … … 538 591 Description: Count bits in SRC0, add SRC1, and store result to VDST.<br /> 539 592 Operation:<br /> 540 <code>VDST = SRC1 541 for (UINT8 i = 0; i < 32; i++) 542 VDST += ((1U<<i) & SRC0) != 0</code></p> 593 <code>VDST = SRC1 + BITCOUNT(SRC0)</code></p> 543 594 <h4>V_BFM_B32</h4> 544 595 <p>Opcode VOP2: 30 (0x1e) for GCN 1.0/1.1<br /> … … 551 602 <h4>V_CNDMASK_B32</h4> 552 603 <p>Opcode VOP2: 0 (0x0) for GCN 1.0/1.1; 0 (0x0) for GCN 1.2<br /> 553 Opcode VOP3a: 259 (0x100) for GCN 1.0/1.1; 256 (0x100) for GCN 1.2<br /> 604 Opcode VOP3a: 256 (0x100) for GCN 1.0/1.1; 256 (0x100) for GCN 1.2<br /> 554 605 Syntax VOP2: V_CNDMASK_B32 VDST, SRC0, SRC1, VCC<br /> 555 606 Syntax VOP3a: V_CNDMASK_B32 VDST, SRC0, SRC1, SSRC2(2)<br /> … … 558 609 Operation:<br /> 559 610 <code>VDST = SSRC2&(1ULL<<LANEID) ? 
SRC1 : SRC0</code></p> 611 <h4>V_CVT_PKACCUM_U8_F32</h4> 612 <p>Opcode VOP2: 44 (0x2c) for GCN 1.0/1.1<br /> 613 Opcode VOP3a: 300 (0x12c) for GCN 1.0/1.1<br /> 614 Syntax: V_CVT_PKACCUM_U8_F32 VDST, SRC0, SRC1<br /> 615 Description: Convert floating point value from SRC0 to unsigned byte value with 616 rounding mode from MODE register, and store this byte to (SRC1&3)'th byte of VDST.<br /> 617 Operation:<br /> 618 <code>UINT8 byte = ((SRC1&3) * 8) 619 UINT32 mask = 0xff << byte 620 UINT8 VAL8 = 0 621 FLOAT f = RNDINT(ASFLOAT(SRC0)) 622 if (f > 255.0) 623 VAL8 = 255 624 else if (f < 0.0 || f == NaN) 625 VAL8 = 0 626 else 627 VAL8 = f 628 VDST = (VDST&~mask) | (((UINT32)VAL8) << byte)</code></p> 629 <h4>V_CVT_PKNORM_I16_F32</h4> 630 <p>Opcode VOP2: 45 (0x2d) for GCN 1.0/1.1<br /> 631 Opcode VOP3a: 301 (0x12d) for GCN 1.0/1.1<br /> 632 Syntax: V_CVT_PKNORM_I16_F32 VDST, SRC0, SRC1<br /> 633 Description: Convert normalized FP value from SRC0 and SRC1 to signed 16-bit integers with 634 rounding to nearest to even (??), and store first value to low 16-bit and 635 second to high 16-bit of the VDST.<br /> 636 Operation:<br /> 637 <code>INT16 roundNorm(FLOAT S) 638 { 639 FLOAT f = RNDNEINT(S*32767) 640 if (f > 32767.0) 641 return 0x7fff 642 else if (f < -32767.0) 643 return -0x7fff 644 else if (f == NaN) 645 return 0 646 return (INT16)f 647 } 648 VDST = roundNorm(ASFLOAT(SRC0)) | ((UINT32)roundNorm(ASFLOAT(SRC1)) << 16)</code></p> 649 <h4>V_CVT_PKNORM_U16_F32</h4> 650 <p>Opcode VOP2: 46 (0x2e) for GCN 1.0/1.1<br /> 651 Opcode VOP3a: 302 (0x12e) for GCN 1.0/1.1<br /> 652 Syntax: V_CVT_PKNORM_U16_F32 VDST, SRC0, SRC1<br /> 653 Description: Convert normalized FP value from SRC0 and SRC1 to unsigned 16-bit integers with 654 rounding to nearest to even (??), and store first value to low 16-bit and 655 second to high 16-bit of the VDST.<br /> 656 Operation:<br /> 657 <code>UINT16 roundNorm(FLOAT S) 658 { 659 FLOAT f = RNDNEINT(S*65535.0) 660 INT16 VAL16 = 0 661 if (f > 65535.0) 
662 return 0xffff 663 else if (f < 0.0 || f == NaN) 664 return 0 665 return (UINT16)f 666 } 667 VDST = roundNorm(ASFLOAT(SRC0)) | ((UINT32)roundNorm(ASFLOAT(SRC1)) << 16)</code></p> 668 <h4>V_CVT_PKRTZ_F16_F32</h4> 669 <p>Opcode VOP2: 47 (0x2f) for GCN 1.0/1.1<br /> 670 Opcode VOP3a: 303 (0x12f) for GCN 1.0/1.1<br /> 671 Syntax: V_CVT_PKRTZ_F16_F32 VDST, SRC0, SRC1<br /> 672 Description: Convert FP values from SRC0 and SRC1 to half floating points with 673 rounding to zero, and store first value to low 16 bits and 674 second to high 16 bits of the VDST.<br /> 675 Operation:<br /> 676 <code>UINT16 D0 = ASINT16(CVT_HALF_RTZ(ASFLOAT(SRC0))) 677 UINT16 D1 = ASINT16(CVT_HALF_RTZ(ASFLOAT(SRC1))) 678 VDST = D0 | (((UINT32)D1) << 16)</code></p> 679 <h4>V_CVT_PK_U16_U32</h4> 680 <p>Opcode VOP2: 48 (0x30) for GCN 1.0/1.1<br /> 681 Opcode VOP3a: 304 (0x130) for GCN 1.0/1.1<br /> 682 Syntax: V_CVT_PK_U16_U32 VDST, SRC0, SRC1<br /> 683 Description: Convert unsigned values from SRC0 and SRC1 to unsigned 16-bit values with 684 clamping, and store first value to low 16 bits and second to high 16 bits of the VDST.<br /> 685 Operation:<br /> 686 <code>UINT16 D0 = MIN(SRC0, 0xffff) 687 UINT16 D1 = MIN(SRC1, 0xffff) 688 VDST = D0 | (((UINT32)D1) << 16)</code></p> 689 <h4>V_CVT_PK_I16_I32</h4> 690 <p>Opcode VOP2: 49 (0x31) for GCN 1.0/1.1<br /> 691 Opcode VOP3a: 305 (0x131) for GCN 1.0/1.1<br /> 692 Syntax: V_CVT_PK_I16_I32 VDST, SRC0, SRC1<br /> 693 Description: Convert signed values from SRC0 and SRC1 to signed 16-bit values with 694 clamping, and store first value to low 16 bits and second to high 16 bits of the VDST.<br /> 695 Operation:<br /> 696 <code>INT16 D0 = MAX(MIN((INT32)SRC0, 0x7fff), -0x8000) 697 INT16 D1 = MAX(MIN((INT32)SRC1, 0x7fff), -0x8000) 698 VDST = D0 | (((UINT32)D1) << 16)</code></p> 699 <h4>V_LDEXP_F32</h4> 700 <p>Opcode VOP2: 43 (0x2b) for GCN 1.0/1.1<br /> 701 Opcode VOP3a: 299 (0x12b) for GCN 1.0/1.1<br /> 702 Syntax: V_LDEXP_F32 VDST, SRC0, SRC1<br /> 703 
Description: Do ldexp operation on SRC0 and SRC1 (multiply SRC0 by 2**(SRC1)). 704 SRC1 is signed integer, SRC0 is floating point value.<br /> 705 Operation:<br /> 706 <code>VDST = ASFLOAT(SRC0) * POW(2.0,SRC1)</code></p> 560 707 <h4>V_LSHL_B32</h4> 561 708 <p>Opcode VOP2: 25 (0x19) for GCN 1.0/1.1<br /> … … 592 739 Description: Multiply FP value from SRC0 by FP value from SRC1 and add result to VDST.<br /> 593 740 Operation:<br /> 594 <code>VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)VDST</code></p>741 <code>VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(VDST)</code></p> 595 742 <h4>V_MAC_LEGACY_F32</h4> 596 743 <p>Opcode VOP2: 6 (0x6) for GCN 1.0/1.1<br /> … … 600 747 If one of value is 0.0 then always do not change VDST (do not apply IEEE rules for 0.0*x).<br /> 601 748 Operation:<br /> 602 <code>if ( (FLOAT)SRC0!=0.0 && (FLOAT)SRC1!=0.0)603 VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)VDST</code></p>749 <code>if (ASFLOAT(SRC0)!=0.0 && ASFLOAT(SRC1)!=0.0) 750 VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(VDST)</code></p> 604 751 <h4>V_MADMK_F32</h4> 605 752 <p>Opcode: VOP2: 32 (0x20) for GCN 1.0/1.1; 23 (0x17) for GCN 1.2<br /> … … 610 757 after instruction word.<br /> 611 758 Operation: 612 <code>VDST = (FLOAT)SRC0 * (FLOAT)FLOATLIT + (FLOAT)SRC1</code></p>759 <code>VDST = ASFLOAT(SRC0) * ASFLOAT(FLOATLIT) + ASFLOAT(SRC1)</code></p> 613 760 <h4>V_MADAK_F32</h4> 614 761 <p>Opcode: VOP2: 33 (0x21) for GCN 1.0/1.1; 24 (0x18) for GCN 1.2<br /> … … 619 766 after instruction word.<br /> 620 767 Operation: 621 <code>VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)FLOATLIT</code></p>768 <code>VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(FLOATLIT)</code></p> 622 769 <h4>V_MAX_F32</h4> 623 770 <p>Opcode VOP2: 16 (0x10) for GCN 1.0/1.1; 11 (0xb) for GCN 1.2<br /> … … 627 774 and store result to VDST.<br /> 628 775 Operation:<br /> 629 <code>VDST = (FLOAT)SRC0>(FLOAT)SRC1 ? 
(FLOAT)SRC0 : (FLOAT)SRC1</code></p> 776 <code>VDST = MAX(ASFLOAT(SRC0), ASFLOAT(SRC1))</code></p> 630 777 <h4>V_MAX_I32</h4> 631 <p>Opcode VOP2: 18 (0x12) for GCN 1.0/1.1; 11 (0xd) for GCN 1.2<br /> 632 Opcode VOP3a: 274 (0x112) for GCN 1.0/1.1; 267 (0x10d) for GCN 1.2<br /> 778 <p>Opcode VOP2: 18 (0x12) for GCN 1.0/1.1; 13 (0xd) for GCN 1.2<br /> 779 Opcode VOP3a: 274 (0x112) for GCN 1.0/1.1; 269 (0x10d) for GCN 1.2<br /> 633 780 Syntax: V_MAX_I32 VDST, SRC0, SRC1<br /> 634 781 Description: Choose largest signed value from SRC0 and SRC1, and store result to VDST.<br /> 635 782 Operation:<br /> 636 <code>VDST = (INT32)SRC0>(INT32)SRC1 ? SRC0 : SRC1</code></p> 783 <code>VDST = MAX((INT32)SRC0, (INT32)SRC1)</code></p> 637 784 <h4>V_MAX_LEGACY_F32</h4> 638 785 <p>Opcode VOP2: 14 (0xe) for GCN 1.0/1.1<br /> … … 643 790 (legacy rules for handling NaNs).<br /> 644 791 Operation:<br /> 645 <code>if ( (FLOAT)SRC1!=NaN) 646 VDST = (FLOAT)SRC0>(FLOAT)SRC1 ? (FLOAT)SRC0 : (FLOAT)SRC1 792 <code>if (ASFLOAT(SRC1)!=NaN) 793 VDST = MAX(ASFLOAT(SRC0), ASFLOAT(SRC1)) 647 794 else 648 795 VDST = NaN</code></p> 649 796 <h4>V_MAX_U32</h4> 650 <p>Opcode VOP2: 20 (0x14) for GCN 1.0/1.1; 13 (0xf) for GCN 1.2<br /> 651 Opcode VOP3a: 276 (0x114) for GCN 1.0/1.1; 269 (0x10f) for GCN 1.2<br /> 797 <p>Opcode VOP2: 20 (0x14) for GCN 1.0/1.1; 15 (0xf) for GCN 1.2<br /> 798 Opcode VOP3a: 276 (0x114) for GCN 1.0/1.1; 271 (0x10f) for GCN 1.2<br /> 652 799 Syntax: V_MAX_U32 VDST, SRC0, SRC1<br /> 653 800 Description: Choose largest unsigned value from SRC0 and SRC1, and store result to VDST.<br /> 654 801 Operation:<br /> 655 <code>VDST = SRC0>SRC1 ? 
SRC0 : SRC1</code></p>802 <code>VDST = MAX(SRC0, SRC1)</code></p> 656 803 <h4>V_MBCNT_HI_U32_B32</h4> 657 804 <p>Opcode VOP2: 36 (0x24) for GCN 1.0/1.1<br /> … … 663 810 Operation:<br /> 664 811 <code>UINT32 MASK = ((1ULL << (LANEID-32)) - 1ULL) & SRC0 665 VDST = SRC1 666 for (UINT8 i = 0; i < 32; i++) 667 VDST += ((1U<<i) & MASK) != 0</code></p> 812 VDST = SRC1 + BITCOUNT(MASK)</code></p> 668 813 <h4>V_MBCNT_LO_U32_B32</h4> 669 814 <p>Opcode VOP2: 35 (0x23) for GCN 1.0/1.1<br /> … … 675 820 Operation:<br /> 676 821 <code>UINT32 MASK = ((1ULL << LANEID) - 1ULL) & SRC0 677 VDST = SRC1 678 for (UINT8 i = 0; i < 32; i++) 679 VDST += ((1U<<i) & MASK) != 0</code></p> 822 VDST = SRC1 + BITCOUNT(MASK)</code></p> 680 823 <h4>V_MIN_F32</h4> 681 824 <p>Opcode VOP2: 15 (0xf) for GCN 1.0/1.1; 10 (0xa) for GCN 1.2<br /> … … 685 828 and store result to VDST.<br /> 686 829 Operation:<br /> 687 <code>VDST = (FLOAT)SRC0<(FLOAT)SRC1 ? (FLOAT)SRC0 : (FLOAT)SRC1</code></p>830 <code>VDST = MIN(ASFLOAT(SRC0), ASFLOAT(SRC1))</code></p> 688 831 <h4>V_MIN_I32</h4> 689 832 <p>Opcode VOP2: 17 (0x11) for GCN 1.0/1.1; 12 (0xc) for GCN 1.2<br /> … … 692 835 Description: Choose smallest signed value from SRC0 and SRC1, and store result to VDST.<br /> 693 836 Operation:<br /> 694 <code>VDST = (INT32)SRC0<(INT32)SRC1 ? SRC0 : SRC1</code></p>837 <code>VDST = MIN((INT32)SRC0, (INT32)SRC1)</code></p> 695 838 <h4>V_MIN_LEGACY_F32</h4> 696 839 <p>Opcode VOP2: 13 (0xd) for GCN 1.0/1.1<br /> … … 701 844 (legacy rules for handling NaNs).<br /> 702 845 Operation:<br /> 703 <code>if ( (FLOAT)SRC1!=NaN)704 VDST = (FLOAT)SRC0<(FLOAT)SRC1 ? (FLOAT)SRC0 : (FLOAT)SRC1846 <code>if (ASFLOAT(SRC1)!=NaN) 847 VDST = MIN(ASFLOAT(SRC0), ASFLOAT(SRC1)) 705 848 else 706 849 VDST = NaN</code></p> … … 711 854 Description: Choose smallest unsigned value from SRC0 and SRC1, and store result to VDST.<br /> 712 855 Operation:<br /> 713 <code>VDST = SRC0<SRC1 ? 
SRC0 : SRC1</code></p> 856 <code>VDST = MIN(SRC0, SRC1)</code></p> 714 857 <h4>V_MUL_LEGACY_F32</h4> 715 858 <p>Opcode VOP2: 7 (0x7) for GCN 1.0/1.1; 4 (0x4) for GCN 1.2<br /> … … 719 862 If either value is 0.0, always store 0.0 to VDST (do not apply IEEE rules for 0.0*x).<br /> 720 863 Operation:<br /> 721 <code>if ( (FLOAT)SRC0!=0.0 && (FLOAT)SRC1!=0.0) 722 VDST = (FLOAT)SRC0 * (FLOAT)SRC1 864 <code>if (ASFLOAT(SRC0)!=0.0 && ASFLOAT(SRC1)!=0.0) 865 VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) 723 866 else 724 867 VDST = 0.0</code></p> … … 729 872 Description: Multiply FP value from SRC0 by FP value from SRC1 and store result to VDST.<br /> 730 873 Operation:<br /> 731 <code>VDST = (FLOAT)SRC0 * (FLOAT)SRC1</code></p> 874 <code>VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1)</code></p> 732 875 <h4>V_MUL_HI_I32_I24</h4> 733 876 <p>Opcode VOP2: 10 (0xa) for GCN 1.0/1.1; 7 (0x7) for GCN 1.2<br /> … … 790 933 Description: Subtract FP value of SRC1 from FP value of SRC0 and store result to VDST.<br /> 791 934 Operation:<br /> 792 <code>VDST = (FLOAT)SRC0 - (FLOAT)SRC1</code></p> 935 <code>VDST = ASFLOAT(SRC0) - ASFLOAT(SRC1)</code></p> 793 936 <h4>V_SUB_I32, V_SUB_U32</h4> 794 937 <p>Opcode VOP2: 38 (0x26) for GCN 1.0/1.1; 26 (0x1a) for GCN 1.2<br /> … … 826 969 Description: Subtract FP value of SRC0 from FP value of SRC1 and store result to VDST.<br /> 827 970 Operation:<br /> 828 <code>VDST = (FLOAT)SRC1 - (FLOAT)SRC0</code></p> 971 <code>VDST = ASFLOAT(SRC1) - ASFLOAT(SRC0)</code></p> 829 972 <h4>V_SUBBREV_U32</h4> 830 973 <p>Opcode VOP2: 42 (0x2a) for GCN 1.0/1.1; 30 (0x1e) for GCN 1.2<br />