Changeset 1783 in CLRX


Timestamp:
Dec 5, 2015, 10:19:54 PM
Author:
matszpk
Message:

CLRadeonExtender: Doc updates. Better descriptions for some SOP2 instructions. Added descriptions for new VOP3 instructions.
Asm/Disasm: V_MULLIT_F32 accepts three source operands.

Location:
CLRadeonExtender/trunk
Files:
5 edited

  • CLRadeonExtender/trunk/amdasm/GCNInstructions.cpp

    r1523 r1783  
    17031703    { "v_alignbit_b32",      GCNENC_VOP3A,  GCN_STDMODE,              334,  ARCH_GCN_1_0_1  },
    17041704    { "v_alignbyte_b32",     GCNENC_VOP3A,  GCN_STDMODE,              335,  ARCH_GCN_1_0_1  },
    1705     { "v_mullit_f32",        GCNENC_VOP3A,  GCN_SRC2_NONE, /* ??? */  336,  ARCH_GCN_1_0_1  },
     1705    { "v_mullit_f32",        GCNENC_VOP3A,  GCN_STDMODE,              336,  ARCH_GCN_1_0_1  },
    17061706    { "v_min3_f32",          GCNENC_VOP3A,  GCN_STDMODE,              337,  ARCH_GCN_1_0_1  },
    17071707    { "v_min3_i32",          GCNENC_VOP3A,  GCN_STDMODE,              338,  ARCH_GCN_1_0_1  },
  • CLRadeonExtender/trunk/doc/GcnInstrsSop2.md

    r1743 r1783  
    199199Syntax: S_BFE_I32 SDST, SSRC0, SSRC1 
    200200Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f)
    201 and extend sign from last bit of extracted value.
     201and extend sign from last bit of extracted value, and store result to SDST.
    202202If result is non-zero store 1 to SCC, otherwise store 0 to SCC. 
    203203Operation: 
     
    219219Syntax: S_BFE_I64 SDST, SSRC0, SSRC1 
    220220Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f)
    221 and extend sign from last bit of extracted value.
     221and extend sign from last bit of extracted value, and store result to SDST.
    222222If result is non-zero store 1 to SCC, otherwise store 0 to SCC. 
    223223Operation: 
     
    238238Opcode: 39 (0x27) for GCN 1.0/1.1; 37 (0x25) for GCN 1.2 
    239239Syntax: S_BFE_U32 SDST, SSRC0, SSRC1 
    240 Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f).
    241 If result is non-zero store 1 to SCC, otherwise store 0 to SCC. 
     240Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f),
     241and store result to SDST. If result is non-zero store 1 to SCC, otherwise store 0 to SCC. 
    242242Operation: 
    243243```
     
    257257Opcode: 41 (0x29) for GCN 1.0/1.1; 39 (0x27) for GCN 1.2 
    258258Syntax: S_BFE_U64 SDST(2), SSRC0(2), SSRC1 
    259 Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f).
    260 If result is non-zero store 1 to SCC, otherwise store 0 to SCC.
     259Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f),
     260and store result to SDST. If result is non-zero store 1 to SCC, otherwise store 0 to SCC.
    261261SDST, SSRC0 are 64-bit, SSRC1 is 32-bit. 
    262262Operation: 
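
The S_BFE_U32 description above can be sketched in C++ as follows. This is only an illustration: the names `SBfeResult` and `s_bfe_u32` are hypothetical, and the handling of a field that runs past bit 31 (it is simply truncated) is an assumption, since the Operation blocks are not part of this changeset.

```cpp
#include <cstdint>

// Hypothetical result type: the value written to SDST plus the SCC flag.
struct SBfeResult
{
    std::uint32_t sdst;
    bool scc;
};

// Sketch of S_BFE_U32: offset is SSRC1&31, length is (SSRC1>>16)&0x7f.
SBfeResult s_bfe_u32(std::uint32_t ssrc0, std::uint32_t ssrc1)
{
    const std::uint32_t shift = ssrc1 & 31;
    const std::uint32_t length = (ssrc1 >> 16) & 0x7f;
    std::uint32_t result = 0;
    if (length != 0)
    {
        result = ssrc0 >> shift;
        // Mask only when the field ends below bit 31 (assumed truncation otherwise).
        if (shift + length < 32)
            result &= (1u << length) - 1;
    }
    // SCC is set when the extracted value is non-zero.
    return { result, result != 0 };
}
```

For example, `s_bfe_u32(0x12345678, (8u << 16) | 8)` extracts bits 8..15 and yields `0x56` with SCC set.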
  • CLRadeonExtender/trunk/doc/GcnInstrsVop3.md

    r1782 r1783  
    217217Alphabetically sorted instruction list:
    218218
     219#### V_ALIGNBIT_B32
     220
     221Opcode: 334 (0x14e) for GCN 1.0/1.1; 462 (0x1ce) for GCN 1.2 
     222Syntax: V_ALIGNBIT_B32 VDST, SRC0, SRC1, SRC2 
     223Description: Align bit. Shift right the 64-bit value stored in SRC1 (low part) and
     224SRC0 (high part) by SRC2&31 bits, and store the low 32 bits of the result in VDST. 
     225Operation: 
     226```
     227VDST = ((((UINT64)SRC0)<<32) | SRC1) >> (SRC2&31)
     228```
     229
     230#### V_ALIGNBYTE_B32
     231
     232Opcode: 335 (0x14f) for GCN 1.0/1.1; 463 (0x1cf) for GCN 1.2 
     233Syntax: V_ALIGNBYTE_B32 VDST, SRC0, SRC1, SRC2 
     234Description: Align byte. Shift right the 64-bit value stored in SRC1 (low part) and
     235SRC0 (high part) by (SRC2&3)*8 bits, and store the low 32 bits of the result in VDST. 
     236Operation: 
     237```
     238VDST = ((((UINT64)SRC0)<<32) | SRC1) >> ((SRC2&3)*8)
     239```
     240
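
The two alignment operations documented above can be sketched in C++. The function names `alignbit_b32` and `alignbyte_b32` are hypothetical; the code only mirrors the pseudocode and makes no claim about how the hardware or CLRX implements it.

```cpp
#include <cstdint>

// Join SRC0 (high half) and SRC1 (low half), shift right by SRC2&31 bits,
// and keep the low 32 bits.
std::uint32_t alignbit_b32(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    const std::uint64_t wide = (static_cast<std::uint64_t>(src0) << 32) | src1;
    return static_cast<std::uint32_t>(wide >> (src2 & 31));
}

// Same idea, but the shift is counted in whole bytes: (SRC2&3)*8 bits.
std::uint32_t alignbyte_b32(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    const std::uint64_t wide = (static_cast<std::uint64_t>(src0) << 32) | src1;
    return static_cast<std::uint32_t>(wide >> ((src2 & 3) * 8));
}
```

With SRC0 = 0x12345678 and SRC1 = 0x9abcdef0, a bit shift of 4 gives 0x89abcdef and a byte shift of 1 gives 0x789abcde.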
     241#### V_BFE_I32
     242
     243Opcode: 329 (0x149) for GCN 1.0/1.1; 457 (0x1c9) for GCN 1.2 
     244Syntax: V_BFE_I32 VDST, SRC0, SRC1, SRC2 
     245Description: Extracts bits in SRC0 from range (SRC1&31) with length (SRC2&31)
     246and extend sign from last bit of extracted value, and store result to VDST. 
     247Operation: 
     248```
     249UINT8 shift = SRC1 & 31
     250UINT8 length = SRC2 & 31
     251if (length==0)
     252    VDST = 0
     253else if (shift+length < 32)
     254    VDST = (INT32)(SRC0 << (32 - shift - length)) >> (32 - length)
     255else
     256    VDST = (INT32)SRC0 >> shift
     257```
     258
     259#### V_BFE_U32
     260
     261Opcode: 328 (0x148) for GCN 1.0/1.1; 456 (0x1c8) for GCN 1.2 
     262Syntax: V_BFE_U32 VDST, SRC0, SRC1, SRC2 
     263Description: Extracts bits in SRC0 from range SRC1&31 with length SRC2&31, and
     264store result to VDST. 
     265Operation: 
     266```
     267UINT8 shift = SRC1 & 31
     268UINT8 length = SRC2 & 31
     269if (length==0)
     270    VDST = 0
     271else if (shift+length < 32)
     272    VDST = SRC0 << (32 - shift - length) >> (32 - length)
     273else
     274    VDST = SRC0 >> shift
     275```
     276
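
A minimal C++ sketch of the two bit-field extracts described above. The names `bfe_u32` and `bfe_i32` are hypothetical. The signed variant sign-extends by hand instead of relying on an arithmetic right shift of a negative value, and it assumes the usual two's-complement conversion from `uint32_t` to `int32_t`.

```cpp
#include <cstdint>

// Unsigned extract: 'length' bits of src0 starting at bit 'shift'.
std::uint32_t bfe_u32(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    const std::uint32_t shift = src1 & 31;
    const std::uint32_t length = src2 & 31;
    if (length == 0)
        return 0;
    if (shift + length < 32)
        return (src0 << (32 - shift - length)) >> (32 - length);
    return src0 >> shift;
}

// Signed extract: same field, sign-extended from its top bit.
std::int32_t bfe_i32(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    const std::uint32_t shift = src1 & 31;
    const std::uint32_t length = src2 & 31;
    if (length == 0)
        return 0;
    // Bits actually available between 'shift' and bit 31; always 1..31 here.
    const std::uint32_t width = (shift + length < 32) ? length : 32 - shift;
    std::uint32_t field = (src0 >> shift) & ((1u << width) - 1);
    if (field & (1u << (width - 1)))
        field |= ~0u << width; // replicate the sign bit upwards
    return static_cast<std::int32_t>(field);
}
```

For instance, extracting 4 bits at offset 12 from 0x0000f000 gives 15 unsigned and -1 signed.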
     277#### V_BFI_B32
     278
     279Opcode: 330 (0x14a) for GCN 1.0/1.1; 458 (0x1ca) for GCN 1.2 
     280Syntax: V_BFI_B32 VDST, SRC0, SRC1, SRC2 
     281Description: Replace bits in SRC2 by bits from SRC1 marked by bits in SRC0, and store result
     282to VDST. 
     283Operation: 
     284```
     285VDST = (SRC0 & SRC1) | (~SRC0 & SRC2)
     286```
     287
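
The bit-field insert above is a plain mask-and-merge. A short C++ sketch, with the hypothetical name `bfi_b32`:

```cpp
#include <cstdint>

// Where a bit of 'mask' (SRC0) is 1, take the bit from 'insert' (SRC1);
// where it is 0, keep the bit from 'base' (SRC2).
std::uint32_t bfi_b32(std::uint32_t mask, std::uint32_t insert, std::uint32_t base)
{
    return (mask & insert) | (~mask & base);
}
```

With mask 0x0000ff00, insert 0x12345678 and base 0xaaaaaaaa, only byte 1 is replaced and the result is 0xaaaa56aa.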
     288
    219289#### V_CUBEID_F32
    220290
     
    241311```
    242312
     313#### V_CUBEMA_F32
     314
     315Opcode: 327 (0x147) for GCN 1.0/1.1; 455 (0x1c7) for GCN 1.2 
     316Syntax: V_CUBEMA_F32 VDST, SRC0, SRC1, SRC2 
     317Description: Cubemap Major Axis. Choose highest absolute value from all three FP values
     318(SRC0, SRC1, SRC2) and multiply chosen FP value by two. Result is stored in VDST. 
     319Operation: 
     320```
     321FLOAT SF0 = ASFLOAT(SRC0)
     322FLOAT SF1 = ASFLOAT(SRC1)
     323FLOAT SF2 = ASFLOAT(SRC2)
     324if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
     325    OUT = 2*SF2
     326else if (ABS(SF1) >= ABS(SF0))
     327    OUT = 2*SF1
     328else
     329    OUT = 2*SF0
     330VDST = OUT
     331```
     332
     333#### V_CUBESC_F32
     334
     335Opcode: 325 (0x145) for GCN 1.0/1.1; 453 (0x1c5) for GCN 1.2 
     336Syntax: V_CUBESC_F32 VDST, SRC0, SRC1, SRC2 
     337Description: Cubemap S coordinate. Algorithm below. 
     338Operation: 
     339```
     340FLOAT SF0 = ASFLOAT(SRC0)
     341FLOAT SF1 = ASFLOAT(SRC1)
     342FLOAT SF2 = ASFLOAT(SRC2)
     343if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
     344    OUT = SIGN(SF2) * SF0
     345else if (ABS(SF1) >= ABS(SF0))
     346    OUT = SF0
     347else
     348    OUT = -SIGN(SF0) * SF2
     349VDST = OUT
     350```
     351
     352#### V_CUBETC_F32
     353
     354Opcode: 326 (0x146) for GCN 1.0/1.1; 454 (0x1c6) for GCN 1.2 
     355Syntax: V_CUBETC_F32 VDST, SRC0, SRC1, SRC2 
     356Description: Cubemap T coordinate. Algorithm below. 
     357Operation: 
     358```
     359FLOAT SF0 = ASFLOAT(SRC0)
     360FLOAT SF1 = ASFLOAT(SRC1)
     361FLOAT SF2 = ASFLOAT(SRC2)
     362if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
     363    OUT = -SF1
     364else if (ABS(SF1) >= ABS(SF0))
     365    OUT = SIGN(SF1) * SF2
     366else
     367    OUT = -SF1
     368VDST = OUT
     369```
     370
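
The major-axis selection shared by the cubemap instructions above can be sketched in C++ for V_CUBEMA_F32 and V_CUBESC_F32. The function names are hypothetical, and `sign_of` is an assumed reading of `SIGN()` as "+1 or -1 taken from the sign bit"; the changeset does not define it.

```cpp
#include <cmath>

// Assumed meaning of SIGN(): +1.0f or -1.0f according to the sign bit of x.
float sign_of(float x)
{
    return std::copysign(1.0f, x);
}

// Major axis: twice the component with the largest magnitude.
// Ties prefer SF2, then SF1, as in the pseudocode.
float cubema_f32(float sf0, float sf1, float sf2)
{
    const float a0 = std::fabs(sf0), a1 = std::fabs(sf1), a2 = std::fabs(sf2);
    if (a2 >= a1 && a2 >= a0)
        return 2.0f * sf2;
    if (a1 >= a0)
        return 2.0f * sf1;
    return 2.0f * sf0;
}

// S coordinate, following the V_CUBESC_F32 pseudocode branch by branch.
float cubesc_f32(float sf0, float sf1, float sf2)
{
    const float a0 = std::fabs(sf0), a1 = std::fabs(sf1), a2 = std::fabs(sf2);
    if (a2 >= a1 && a2 >= a0)
        return sign_of(sf2) * sf0;
    if (a1 >= a0)
        return sf0;
    return -sign_of(sf0) * sf2;
}
```

For the vector (1, 2, -3) the third component dominates, so `cubema_f32` returns -6 and `cubesc_f32` returns -1.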
     371#### V_FMA_F32
     372
     373Opcode: 331 (0x14b) for GCN 1.0/1.1; 459 (0x1cb) for GCN 1.2 
     374Syntax: V_FMA_F32 VDST, SRC0, SRC1, SRC2 
     375Description: Fused multiply-add on single-precision floating point values from
     376SRC0, SRC1 and SRC2. Result stored in VDST. 
     377Operation: 
     378```
     379// SRC0*SRC1+SRC2
     380VDST = FMA(ASFLOAT(SRC0), ASFLOAT(SRC1), ASFLOAT(SRC2))
     381```
     382
     383#### V_FMA_F64
     384
     385Opcode: 332 (0x14c) for GCN 1.0/1.1; 460 (0x1cc) for GCN 1.2 
     386Syntax: V_FMA_F64 VDST(2), SRC0(2), SRC1(2), SRC2(2) 
     387Description: Fused multiply-add on double-precision floating point values from
     388SRC0, SRC1 and SRC2. Result stored in VDST. 
     389Operation: 
     390```
     391// SRC0*SRC1+SRC2
     392VDST = FMA(ASDOUBLE(SRC0), ASDOUBLE(SRC1), ASDOUBLE(SRC2))
     393```
     394
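
As a rough host-side model of V_FMA_F32, the standard `std::fma` computes SRC0*SRC1+SRC2 with a single rounding. The helpers `as_float`, `as_uint` and `fma_f32` are hypothetical names standing in for the `ASFLOAT` notation; hardware details such as denormal handling are not modelled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit register value as an IEEE single (ASFLOAT in the docs).
float as_float(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reinterpret an IEEE single as a 32-bit register value.
std::uint32_t as_uint(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// SRC0*SRC1+SRC2 with one rounding at the end.
std::uint32_t fma_f32(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    return as_uint(std::fma(as_float(src0), as_float(src1), as_float(src2)));
}
```

The single rounding matters when the product is not exactly representable: with a = 1 + 2^-12, the fused a*a - (1 + 2^-11) keeps the low-order term 2^-24, which a separately rounded multiply would lose.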
     395#### V_LERP_U8
     396
     397Opcode: 333 (0x14d) for GCN 1.0/1.1; 461 (0x1cd) for GCN 1.2 
     398Syntax: V_LERP_U8 VDST, SRC0, SRC1, SRC2 
     399Description: For each byte of dword, calculate average from SRC0 byte and SRC1 byte with
     400rounding mode defined in the first bit of the corresponding SRC2 byte. If the rounding bit is set
     401then the result for that byte is rounded, otherwise truncated. All bytes will be stored in VDST. 
     402Operation: 
     403```
     404for (UINT8 i = 0; i < 4; i++)
     405{
     406    UINT8 S0 = (SRC0 >> (i*8)) & 0xff
     407    UINT8 S1 = (SRC1 >> (i*8)) & 0xff
     408    UINT8 S2 = (SRC2 >> (i*8)) & 1
     409    VDST = (VDST & ~(255U<<(i*8))) | (((S0+S1+S2) >> 1) << (i*8))
     410}
     411```
     412
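
The per-byte averaging of V_LERP_U8 can be sketched in C++ as below. The name `lerp_u8` is hypothetical; the loop follows the pseudocode, except that it builds the result from zero rather than updating VDST in place.

```cpp
#include <cstdint>

// Average each byte of src0 and src1; bit 0 of the matching src2 byte
// selects rounding up (1) or truncation (0).
std::uint32_t lerp_u8(std::uint32_t src0, std::uint32_t src1, std::uint32_t src2)
{
    std::uint32_t vdst = 0;
    for (unsigned i = 0; i < 4; i++)
    {
        const std::uint32_t s0 = (src0 >> (i * 8)) & 0xff;
        const std::uint32_t s1 = (src1 >> (i * 8)) & 0xff;
        const std::uint32_t round = (src2 >> (i * 8)) & 1;
        // (255 + 255 + 1) >> 1 == 255, so the sum always fits in one byte.
        vdst |= ((s0 + s1 + round) >> 1) << (i * 8);
    }
    return vdst;
}
```

For SRC0 = 0x01020304 and SRC1 = 0x02020306, the top byte averages 1 and 2: it is 1 when truncated and 2 when the rounding bit in the top byte of SRC2 is set.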
    243413#### V_MAD_F32
    244414
     
    288458VDST = (UINT32)(SRC0&0xffffff) * (UINT32)(SRC1&0xffffff) + SRC2
    289459```
     460
     461#### V_MIN3_F32
     462
     463Opcode: 337 (0x151) for GCN 1.0/1.1; 465 (0x1d1) for GCN 1.2 
     464Syntax: V_MIN3_F32 VDST, SRC0, SRC1, SRC2 
     465
  • CLRadeonExtender/trunk/tests/amdasm/GCNAsmOpc11.cpp

    r1776 r1783  
    16461646    { "   v_alignbyte_b32  v55, v79, v166, v229",
    16471647        0xd29e0037U, 0x07974d4fU, true, true, "" },
    1648     { "   v_mullit_f32    v55, v79, v166", 0xd2a00037U, 0x00034d4fU, true, true, "" },
    1649     { "   v_mullit_f32    v55, s79, v166", 0xd2a00037U, 0x00034c4fU, true, true, "" },
     1648    { "   v_mullit_f32    v55, v79, v166, s0", 0xd2a00037U, 0x00034d4fU, true, true, "" },
     1649    { "   v_mullit_f32    v55, v79, v166, s0", 0xd2a00037U, 0x00034d4fU, true, true, "" },
     1650    { "   v_mullit_f32    v55, s79, v166, v229", 0xd2a00037U, 0x07974c4fU, true, true, "" },
    16501651    { "   v_min3_f32  v55, v79, v166, v229", 0xd2a20037U, 0x07974d4fU, true, true, "" },
    16511652    { "   v_min3_i32  v55, v79, v166, v229", 0xd2a40037U, 0x07974d4fU, true, true, "" },
  • CLRadeonExtender/trunk/tests/amdasm/GCNDisasmOpc11.cpp

    r1453 r1783  
    18501850    { 0xd29e0037U, 0x07974d4fU, true, "        v_alignbyte_b32 v55, v79, v166, v229\n" },
    18511851    { 0xd2a00037U, 0x07974d4fU, true, "        v_mullit_f32    "
    1852                 "v55, v79, v166 vsrc2=0x1e5\n" },
     1852                "v55, v79, v166, v229\n" },
    18531853    { 0xd2a00037U, 0x00034d4fU, true, "        v_mullit_f32    "
    1854                 "v55, v79, v166\n" },
     1854                "v55, v79, v166, s0\n" },
    18551855    { 0xd2a20037U, 0x07974d4fU, true, "        v_min3_f32      v55, v79, v166, v229\n" },
    18561856    { 0xd2a40037U, 0x07974d4fU, true, "        v_min3_i32      v55, v79, v166, v229\n" },
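
To read the test vectors above, it helps to split the two VOP3A words into fields. The sketch below assumes the GCN 1.0/1.1 VOP3A layout from AMD's ISA documentation (VDST in bits 0-7, opcode in bits 17-25, encoding in bits 26-31 of the first word; SRC0, SRC1, SRC2 as 9-bit fields in the second word) and the operand codes 0-103 for SGPRs and 256-511 for VGPRs. The names `Vop3aFields`, `decode_vop3a` and `operand_name` are hypothetical and unrelated to the CLRX disassembler.

```cpp
#include <cstdint>
#include <string>

// Hypothetical holder for the VOP3A fields used in the tests above.
struct Vop3aFields
{
    std::uint32_t encoding, opcode, vdst, src0, src1, src2;
};

// Split the two instruction words according to the assumed GCN 1.0/1.1 layout.
Vop3aFields decode_vop3a(std::uint32_t word0, std::uint32_t word1)
{
    return { word0 >> 26, (word0 >> 17) & 0x1ff, word0 & 0xff,
             word1 & 0x1ff, (word1 >> 9) & 0x1ff, (word1 >> 18) & 0x1ff };
}

// Name a 9-bit source operand; only plain SGPRs and VGPRs are handled.
std::string operand_name(std::uint32_t code)
{
    if (code >= 256)
        return "v" + std::to_string(code - 256);
    if (code <= 103)
        return "s" + std::to_string(code);
    return "?"; // special registers and constants are left out of this sketch
}
```

Decoding `0xd2a00037, 0x07974d4f` gives opcode 336 (v_mullit_f32), VDST 55 and sources v79, v166, v229; the SRC2 field value 0x1e5 is what the old disassembler output printed as `vsrc2=0x1e5`. With `0x00034d4f` as the second word the SRC2 field is 0, which is why the disassembler now prints `s0`.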