| 797 | <h4>V_ALIGNBIT_B32</h4> |
| 798 | <p>Opcode: 334 (0x14e) for GCN 1.0/1.1; 462 (0x1ce) for GCN 1.2<br /> |
| 799 | Syntax: V_ALIGNBIT_B32 VDST, SRC0, SRC1, SRC2<br /> |
| 800 | Description: Align bit. Shift right the 64-bit value formed by SRC1 (low part) and |
| 801 | SRC0 (high part) by SRC2&31 bits, and store the low 32 bits of the result in VDST.<br /> |
| 802 | Operation:<br /> |
| 803 | <code>VDST = ((((UINT64)SRC0)<<32) | SRC1) >> (SRC2&31)</code></p> |
| 804 | <h4>V_ALIGNBYTE_B32</h4> |
| 805 | <p>Opcode: 335 (0x14f) for GCN 1.0/1.1; 463 (0x1cf) for GCN 1.2<br /> |
| 806 | Syntax: V_ALIGNBYTE_B32 VDST, SRC0, SRC1, SRC2<br /> |
| 807 | Description: Align byte. Shift right the 64-bit value formed by SRC1 (low part) and |
| 808 | SRC0 (high part) by (SRC2&3)*8 bits, and store the low 32 bits of the result in VDST.<br /> |
| 809 | Operation:<br /> |
| 810 | <code>VDST = ((((UINT64)SRC0)<<32) | SRC1) >> ((SRC2&3)*8)</code></p> |
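<p>The two align operations can be modelled in plain C as a minimal sketch. The function names <code>alignbit_b32</code> and <code>alignbyte_b32</code> are hypothetical helpers chosen for this illustration; they only mirror the pseudocode above and are not part of any AMD or CLRX API.</p>

```c
#include <stdint.h>

/* Hypothetical model of V_ALIGNBIT_B32: SRC0 is the high dword and SRC1 the
 * low dword of a 64-bit value; only the low 5 bits of SRC2 are used. */
uint32_t alignbit_b32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | src1;
    return (uint32_t)(wide >> (src2 & 31));
}

/* Hypothetical model of V_ALIGNBYTE_B32: same idea, but the shift is a whole
 * number of bytes taken from the low 2 bits of SRC2. */
uint32_t alignbyte_b32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | src1;
    return (uint32_t)(wide >> ((src2 & 3) * 8));
}
```

<p>Under this model, passing the same value as both SRC0 and SRC1 behaves like a 32-bit rotate right by the masked shift amount.</p>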
| 811 | <h4>V_BFE_I32</h4> |
| 812 | <p>Opcode: 329 (0x149) for GCN 1.0/1.1; 457 (0x1c9) for GCN 1.2<br /> |
| 813 | Syntax: V_BFE_I32 VDST, SRC0, SRC1, SRC2<br /> |
| 814 | Description: Extract a bitfield from SRC0, starting at bit (SRC1&31) with length (SRC2&31), |
| 815 | sign-extend it from the highest bit of the extracted value, and store the result in VDST.<br /> |
| 816 | Operation:<br /> |
| 817 | <code>UINT8 shift = SRC1 & 31 |
| 818 | UINT8 length = SRC2 & 31 |
| 819 | if (length==0) |
| 820 | VDST = 0 |
| 821 | else if (shift+length < 32) |
| 822 | VDST = (INT32)(SRC0 << (32 - shift - length)) >> (32 - length) |
| 823 | else |
| 824 | VDST = (INT32)SRC0 >> shift</code></p> |
| 825 | <h4>V_BFE_U32</h4> |
| 826 | <p>Opcode: 328 (0x148) for GCN 1.0/1.1; 456 (0x1c8) for GCN 1.2<br /> |
| 827 | Syntax: V_BFE_U32 VDST, SRC0, SRC1, SRC2<br /> |
| 828 | Description: Extract a bitfield from SRC0, starting at bit SRC1&31 with length SRC2&31, and |
| 829 | store the zero-extended result in VDST.<br /> |
| 830 | Operation:<br /> |
| 831 | <code>UINT8 shift = SRC1 & 31 |
| 832 | UINT8 length = SRC2 & 31 |
| 833 | if (length==0) |
| 834 | VDST = 0 |
| 835 | else if (shift+length < 32) |
| 836 | VDST = SRC0 << (32 - shift - length) >> (32 - length) |
| 837 | else |
| 838 | VDST = SRC0 >> shift</code></p> |
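<p>A minimal C sketch of both bitfield extracts follows. The names <code>bfe_u32</code> and <code>bfe_i32</code> are hypothetical, and the zero-length case is handled as an early return so that no shift by 32 is ever evaluated. The signed variant assumes that the compiler implements right shift of a negative <code>int32_t</code> as an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.</p>

```c
#include <stdint.h>

/* Hypothetical model of V_BFE_U32: zero-extended bitfield extract. */
uint32_t bfe_u32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t shift = src1 & 31;
    uint32_t length = src2 & 31;
    if (length == 0)
        return 0;
    if (shift + length < 32)
        /* move the field to the top, then back down to bit 0 */
        return (src0 << (32 - shift - length)) >> (32 - length);
    /* field reaches bit 31: a plain shift is enough */
    return src0 >> shift;
}

/* Hypothetical model of V_BFE_I32: sign-extended bitfield extract.
 * Assumes arithmetic right shift for negative signed values. */
int32_t bfe_i32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t shift = src1 & 31;
    uint32_t length = src2 & 31;
    if (length == 0)
        return 0;
    if (shift + length < 32)
        return (int32_t)(src0 << (32 - shift - length)) >> (32 - length);
    return (int32_t)src0 >> shift;
}
```

<p>For example, extracting the 4-bit field at bit 12 of 0x0000F000 gives 15 with the unsigned variant and -1 with the signed one under these assumptions.</p>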
| 839 | <h4>V_BFI_B32</h4> |
| 840 | <p>Opcode: 330 (0x14a) for GCN 1.0/1.1; 458 (0x1ca) for GCN 1.2<br /> |
| 841 | Syntax: V_BFI_B32 VDST, SRC0, SRC1, SRC2<br /> |
| 842 | Description: Bitfield insert. For every bit set in SRC0, take the corresponding bit from SRC1; |
| 843 | for every cleared bit, take it from SRC2. Store the result in VDST.<br /> |
| 844 | Operation:<br /> |
| 845 | <code>VDST = (SRC0 & SRC1) | (~SRC0 & SRC2)</code></p> |
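<p>As a small illustration, the bit select can be written directly in C. The name <code>bfi_b32</code> is a hypothetical helper for this sketch only.</p>

```c
#include <stdint.h>

/* Hypothetical model of V_BFI_B32: src0 is a per-bit selector between
 * src1 (selector bit set) and src2 (selector bit clear). */
uint32_t bfi_b32(uint32_t src0, uint32_t src1, uint32_t src2)
{
    return (src0 & src1) | (~src0 & src2);
}
```

<p>With a mask of 0x0000FF00, only the second byte is taken from the SRC1 operand; all other bytes come from the SRC2 operand.</p>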
| 866 | <h4>V_CUBEMA_F32</h4> |
| 867 | <p>Opcode: 327 (0x147) for GCN 1.0/1.1; 455 (0x1c7) for GCN 1.2<br /> |
| 868 | Syntax: V_CUBEMA_F32 VDST, SRC0, SRC1, SRC2<br /> |
| 869 | Description: Cubemap Major Axis. Choose the value with the highest absolute value among the |
| 870 | three FP values (SRC0, SRC1, SRC2) and multiply the chosen value by two. The result is stored in VDST.<br /> |
| 871 | Operation:<br /> |
| 872 | <code>FLOAT SF0 = ASFLOAT(SRC0) |
| 873 | FLOAT SF1 = ASFLOAT(SRC1) |
| 874 | FLOAT SF2 = ASFLOAT(SRC2) |
| 875 | if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0)) |
| 876 | OUT = 2*SF2 |
| 877 | else if (ABS(SF1) >= ABS(SF0)) |
| 878 | OUT = 2*SF1 |
| 879 | else |
| 880 | OUT = 2*SF0 |
| 881 | VDST = OUT</code></p> |
| 882 | <h4>V_CUBESC_F32</h4> |
| 883 | <p>Opcode: 325 (0x145) for GCN 1.0/1.1; 453 (0x1c5) for GCN 1.2<br /> |
| 884 | Syntax: V_CUBESC_F32 VDST, SRC0, SRC1, SRC2<br /> |
| 885 | Description: Cubemap S coordinate. Algorithm below.<br /> |
| 886 | Operation:<br /> |
| 887 | <code>FLOAT SF0 = ASFLOAT(SRC0) |
| 888 | FLOAT SF1 = ASFLOAT(SRC1) |
| 889 | FLOAT SF2 = ASFLOAT(SRC2) |
| 890 | if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0)) |
| 891 | OUT = SIGN(SF2) * SF0 |
| 892 | else if (ABS(SF1) >= ABS(SF0)) |
| 893 | OUT = SF0 |
| 894 | else |
| 895 | OUT = -SIGN(SF0) * SF2 |
| 896 | VDST = OUT</code></p> |
| 897 | <h4>V_CUBETC_F32</h4> |
| 898 | <p>Opcode: 326 (0x146) for GCN 1.0/1.1; 454 (0x1c6) for GCN 1.2<br /> |
| 899 | Syntax: V_CUBETC_F32 VDST, SRC0, SRC1, SRC2<br /> |
| 900 | Description: Cubemap T coordinate. Algorithm below.<br /> |
| 901 | Operation:<br /> |
| 902 | <code>FLOAT SF0 = ASFLOAT(SRC0) |
| 903 | FLOAT SF1 = ASFLOAT(SRC1) |
| 904 | FLOAT SF2 = ASFLOAT(SRC2) |
| 905 | if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0)) |
| 906 | OUT = -SF1 |
| 907 | else if (ABS(SF1) >= ABS(SF0)) |
| 908 | OUT = SIGN(SF1) * SF2 |
| 909 | else |
| 910 | OUT = -SF1 |
| 911 | VDST = OUT</code></p> |
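<p>The three cubemap operations share the same major-axis selection, which the following C sketch makes explicit. All names (<code>abs_f</code>, <code>sign_of</code>, <code>cubema_f32</code>, <code>cubesc_f32</code>, <code>cubetc_f32</code>) are hypothetical. The sketch takes <code>float</code> arguments directly instead of reinterpreting raw register bits, and it assumes that SIGN returns -1.0 for negative inputs and 1.0 otherwise; the text above does not define SIGN for zero or NaN, so that choice is an assumption.</p>

```c
/* Assumed helper: absolute value without relying on libm. */
float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Assumed meaning of SIGN: -1 for negative values, +1 otherwise. */
float sign_of(float x) { return x < 0.0f ? -1.0f : 1.0f; }

/* Returns 2 if SF2 is the major axis, 1 if SF1 is, 0 if SF0 is,
 * using the same comparison order as the pseudocode. */
int major_axis(float sf0, float sf1, float sf2)
{
    if (abs_f(sf2) >= abs_f(sf1) && abs_f(sf2) >= abs_f(sf0))
        return 2;
    if (abs_f(sf1) >= abs_f(sf0))
        return 1;
    return 0;
}

/* Hypothetical model of V_CUBEMA_F32: twice the major-axis component. */
float cubema_f32(float sf0, float sf1, float sf2)
{
    switch (major_axis(sf0, sf1, sf2)) {
    case 2:  return 2.0f * sf2;
    case 1:  return 2.0f * sf1;
    default: return 2.0f * sf0;
    }
}

/* Hypothetical model of V_CUBESC_F32 (S coordinate). */
float cubesc_f32(float sf0, float sf1, float sf2)
{
    switch (major_axis(sf0, sf1, sf2)) {
    case 2:  return sign_of(sf2) * sf0;
    case 1:  return sf0;
    default: return -sign_of(sf0) * sf2;
    }
}

/* Hypothetical model of V_CUBETC_F32 (T coordinate). */
float cubetc_f32(float sf0, float sf1, float sf2)
{
    switch (major_axis(sf0, sf1, sf2)) {
    case 1:  return sign_of(sf1) * sf2;
    default: return -sf1;   /* both the SF2-major and SF0-major cases */
    }
}
```

<p>For the direction (1, 2, -5), for instance, SF2 is the major axis, so this model yields -10 for the major axis, -1 for S and -2 for T.</p>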
| 912 | <h4>V_FMA_F32</h4> |
| 913 | <p>Opcode: 331 (0x14b) for GCN 1.0/1.1; 459 (0x1cb) for GCN 1.2<br /> |
| 914 | Syntax: V_FMA_F32 VDST, SRC0, SRC1, SRC2<br /> |
| 915 | Description: Fused multiply-add on single-precision floating-point values from |
| 916 | SRC0, SRC1 and SRC2. The result is stored in VDST.<br /> |
| 917 | Operation:<br /> |
| 918 | <code>// SRC0*SRC1+SRC2 |
| 919 | VDST = FMA(ASFLOAT(SRC0), ASFLOAT(SRC1), ASFLOAT(SRC2))</code></p> |
| 920 | <h4>V_FMA_F64</h4> |
| 921 | <p>Opcode: 332 (0x14c) for GCN 1.0/1.1; 460 (0x1cc) for GCN 1.2<br /> |
| 922 | Syntax: V_FMA_F64 VDST(2), SRC0(2), SRC1(2), SRC2(2)<br /> |
| 923 | Description: Fused multiply-add on double-precision floating-point values from |
| 924 | SRC0, SRC1 and SRC2. The result is stored in VDST.<br /> |
| 925 | Operation:<br /> |
| 926 | <code>// SRC0*SRC1+SRC2 |
| 927 | VDST = FMA(ASDOUBLE(SRC0), ASDOUBLE(SRC1), ASDOUBLE(SRC2))</code></p> |
| 928 | <h4>V_LERP_U8</h4> |
| 929 | <p>Opcode: 333 (0x14d) for GCN 1.0/1.1; 461 (0x1cd) for GCN 1.2<br /> |
| 930 | Syntax: V_LERP_U8 VDST, SRC0, SRC1, SRC2<br /> |
| 931 | Description: For each byte of the dword, calculate the average of the SRC0 byte and the SRC1 byte, |
| 932 | with the rounding mode given by the lowest bit of the corresponding SRC2 byte. If that rounding bit is |
| 933 | set, the result for that byte is rounded up; otherwise it is truncated. All four bytes are stored in VDST.<br /> |
| 934 | Operation:<br /> |
| 935 | <code>for (UINT8 i = 0; i < 4; i++) |
| 936 | { |
| 937 | UINT8 S0 = (SRC0 >> (i*8)) & 0xff |
| 938 | UINT8 S1 = (SRC1 >> (i*8)) & 0xff |
| 939 | UINT8 S2 = (SRC2 >> (i*8)) & 1 |
| 940 | VDST = (VDST & ~(255U<<(i*8))) | (((S0+S1+S2) >> 1) << (i*8)) |
| 941 | }</code></p> |
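<p>The per-byte average can be sketched in C as shown below. The name <code>lerp_u8</code> is hypothetical, and the result is built up from zero rather than by masking the previous contents of VDST, which gives the same final value because all four bytes are written.</p>

```c
#include <stdint.h>

/* Hypothetical model of V_LERP_U8: per-byte average of src0 and src1,
 * rounded up when bit 0 of the matching src2 byte is set. */
uint32_t lerp_u8(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t vdst = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t s0 = (src0 >> (i * 8)) & 0xff;
        uint32_t s1 = (src1 >> (i * 8)) & 0xff;
        uint32_t s2 = (src2 >> (i * 8)) & 1;     /* rounding bit */
        /* the sum is at most 0x1ff, so the average always fits in a byte */
        vdst |= ((s0 + s1 + s2) >> 1) << (i * 8);
    }
    return vdst;
}
```

<p>Averaging the bytes 3 and 4 gives 3 when the rounding bit is clear and 4 when it is set.</p>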