Changeset 1740 in CLRX


Timestamp:
Nov 22, 2015, 1:45:10 PM (5 years ago)
Author:
matszpk
Message:

CLRadeonExtender: Doc updates: Rename THREADID to LANEID. Updates VOP2 instruction list.

Location:
CLRadeonExtender/trunk/doc
Files:
2 edited

  • CLRadeonExtender/trunk/doc/GcnInstrsVop2.md

    r1739 r1740  
    105105 30 (0x1e)  | V_BFM_B32            | V_SUBBREV_U32
    106106 31 (0x1f)  | V_MAC_F32            | V_ADD_F16
     107 32 (0x20)  | V_MADMK_F32          | V_SUB_F16
     108 33 (0x21)  | V_MADAK_F32          | V_SUBREV_F16
     109 34 (0x22)  | V_BCNT_U32_B32       | V_MUL_F16
     110 35 (0x23)  | V_MBCNT_LO_U32_B32   | V_MAC_F16
     111 36 (0x24)  | V_MBCNT_HI_U32_B32   | V_MADMK_F16
     112 37 (0x25)  | V_ADD_I32            | V_MADAK_F16
     113 38 (0x26)  | V_SUB_I32            | V_ADD_U16
     114 39 (0x27)  | V_SUBREV_I32         | V_SUB_U16
     115 40 (0x28)  | V_ADDC_U32           | V_SUBREV_U16
     116 41 (0x29)  | V_SUBB_U32           | V_MUL_LO_U16
     117 42 (0x2a)  | V_SUBBREV_U32        | V_LSHLREV_B16
     118 43 (0x2b)  | V_LDEXP_F32          | V_LSHRREV_B16
     119 44 (0x2c)  | V_CVT_PKACCUM_U8_F32 | V_ASHRREV_I16
     120 45 (0x2d)  | V_CVT_PKNORM_I16_F32 | V_MAX_F16
     121 46 (0x2e)  | V_CVT_PKNORM_U16_F32 | V_MIN_F16
     122 47 (0x2f)  | V_CVT_PKRTZ_F16_F32  | V_MAX_U16
     123 48 (0x30)  | V_CVT_PK_U16_U32     | V_MAX_I16
     124 49 (0x31)  | V_CVT_PK_I16_I32     | V_MIN_U16
     125 50 (0x32)  | --                   | V_MIN_I16
     126 51 (0x33)  | --                   | V_LDEXP_F16
    107127
    108128### Instruction set
     
    119139```
    120140VDST = (FLOAT)SRC0 + (FLOAT)SRC1
     141```
     142
     143#### V_ADD_I32, V_ADD_U32
     144
     145Opcode VOP2: 37 (0x25) for GCN 1.0/1.1; 25 (0x19) for GCN 1.2 
     146Opcode VOP3b: 293 (0x125) for GCN 1.0/1.1; 281 (0x119) for GCN 1.2 
     147Syntax VOP2 GCN 1.0/1.1: V_ADD_I32 VDST, VCC, SRC0, SRC1 
     148Syntax VOP3b GCN 1.0/1.1: V_ADD_I32 VDST, SDST(2), SRC0, SRC1 
     149Syntax VOP2 GCN 1.2: V_ADD_U32 VDST, VCC, SRC0, SRC1 
     150Syntax VOP3b GCN 1.2: V_ADD_U32 VDST, SDST(2), SRC0, SRC1 
     151Description: Add SRC0 to SRC1, store the result to VDST, and store the carry flag to
     152the SDST bit whose number equals the lane id. SDST is 64-bit. 
     153Operation: 
     154```
     155UINT64 temp = (UINT64)SRC0 + (UINT64)SRC1
     156VDST = temp
     157UINT64 mask = (1ULL<<LANEID)
     158SDST = (SDST&~mask) | ((temp >> 32) ? mask : 0)
    121159```
    122160
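The carry handling in the V_ADD_I32/V_ADD_U32 pseudocode can be tried out on the host with a small C sketch. The helper name `v_add_u32_lane` and its per-lane calling convention are illustrative assumptions, not part of CLRX or of the hardware interface.

```c
#include <stdint.h>

/* Hypothetical helper: emulates the V_ADD_I32/V_ADD_U32 pseudocode for one lane.
 * sdst points to the 64-bit carry mask; laneid is expected to be 0..63. */
static uint32_t v_add_u32_lane(uint32_t src0, uint32_t src1,
                               unsigned laneid, uint64_t *sdst)
{
    uint64_t temp = (uint64_t)src0 + (uint64_t)src1;
    uint64_t mask = 1ULL << laneid;
    /* Clear this lane's bit, then set it again if the sum overflowed 32 bits. */
    *sdst = (*sdst & ~mask) | ((temp >> 32) ? mask : 0);
    return (uint32_t)temp;
}
```

Calling the helper once per lane with the same `sdst` builds up the carry mask bit by bit, which is how the description above reads.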
     
    155193```
    156194
     195#### V_BCNT_U32_B32
     196
     197Opcode VOP2: 34 (0x22) for GCN 1.0/1.1 
     198Opcode VOP3a: 290 (0x122) for GCN 1.0/1.1 
     199Syntax: V_BCNT_U32_B32 VDST, SRC0, SRC1 
     200Description: Count the set bits in SRC0, add SRC1, and store the result to VDST. 
     201Operation: 
     202```
     203VDST = SRC1
     204for (UINT8 i = 0; i < 32; i++)
     205    VDST += ((1U<<i) & SRC0) != 0
     206```
     207
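As a rough host-side check of the V_BCNT_U32_B32 operation, the loop can be written directly in C. The function name `v_bcnt_u32_b32` is only an illustrative stand-in for the instruction.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the V_BCNT_U32_B32 pseudocode:
 * counts the set bits of src0 and adds src1. */
static uint32_t v_bcnt_u32_b32(uint32_t src0, uint32_t src1)
{
    uint32_t vdst = src1;
    for (unsigned i = 0; i < 32; i++)
        vdst += ((1u << i) & src0) != 0;
    return vdst;
}
```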
     208#### V_BFM_B32
     209
     210Opcode VOP2: 30 (0x1e) for GCN 1.0/1.1 
     211Opcode VOP3a: 286 (0x11e) for GCN 1.0/1.1 
     212Syntax: V_BFM_B32 VDST, SRC0, SRC1 
     213Description: Make a 32-bit bitmask of length (SRC0 & 31) that starts at bit (SRC1 & 31) and
     214store it to VDST. 
     215Operation: 
     216```
     217VDST = ((1U << (SRC0&31))-1) << (SRC1&31)
     218```
     219
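The V_BFM_B32 expression maps one-to-one onto C unsigned arithmetic. The sketch below uses the hypothetical name `v_bfm_b32`; it also shows that a length of 32 wraps to 0 because of the `& 31`, and that bits shifted past bit 31 are dropped.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the V_BFM_B32 pseudocode:
 * src0 gives the mask length, src1 the starting bit (both taken modulo 32). */
static uint32_t v_bfm_b32(uint32_t src0, uint32_t src1)
{
    return ((1u << (src0 & 31)) - 1u) << (src1 & 31);
}
```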
    157220#### V_CNDMASK_B32
    158221
     
    161224Syntax VOP2: V_CNDMASK_B32 VDST, SRC0, SRC1, VCC 
    162225Syntax VOP3a: V_CNDMASK_B32 VDST, SRC0, SRC1, SSRC2(2) 
    163 Description: If bit for current thread of VCC or SDST is set then store SRC1 to VDST,
     226Description: If bit for current lane of VCC or SDST is set then store SRC1 to VDST,
    164227otherwise store SRC0 to VDST. The CLAMP and OMOD modifiers don't affect the result. 
    165228Operation: 
    166229```
    167 VDST = SSRC2&(1ULL<<THREADID) ? SRC1 : SRC0
     230VDST = SSRC2&(1ULL<<LANEID) ? SRC1 : SRC0
    168231```
    169232
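The per-lane selection in V_CNDMASK_B32 can be sketched in C as follows. The helper name and the explicit `laneid` parameter are assumptions made for illustration; on hardware the lane id is implicit.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the V_CNDMASK_B32 pseudocode for one lane:
 * picks src1 when this lane's bit is set in the 64-bit mask, otherwise src0. */
static uint32_t v_cndmask_b32_lane(uint32_t src0, uint32_t src1,
                                   uint64_t ssrc2, unsigned laneid)
{
    return (ssrc2 & (1ULL << laneid)) ? src1 : src0;
}
```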
     
    234297if ((FLOAT)SRC0!=0.0 && (FLOAT)SRC1!=0.0)
    235298    VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)VDST
     299```
     300
     301#### V_MADMK_F32
     302
     303Opcode: VOP2: 32 (0x20) for GCN 1.0/1.1; 23 (0x17) for GCN 1.2 
     304Opcode: VOP3a: 288 (0x120) for GCN 1.0/1.1; 279 (0x117) for GCN 1.2 
     305Syntax: V_MADMK_F32 VDST, SRC0, FLOATLIT, SRC1 
     306Description: Multiply the FP value from SRC0 by the constant literal FLOATLIT, add the
     307FP value from SRC1, and store the result to VDST. The constant literal follows
     308the instruction word. 
     309Operation:
     310```
     311VDST = (FLOAT)SRC0 * (FLOAT)FLOATLIT + (FLOAT)SRC1
     312```
     313
     314#### V_MADAK_F32
     315
     316Opcode: VOP2: 33 (0x21) for GCN 1.0/1.1; 24 (0x18) for GCN 1.2 
     317Opcode: VOP3a: 289 (0x121) for GCN 1.0/1.1; 280 (0x118) for GCN 1.2 
     318Syntax: V_MADAK_F32 VDST, SRC0, SRC1, FLOATLIT 
     319Description: Multiply the FP value from SRC0 by the FP value from SRC1, add
     320the constant literal FLOATLIT, and store the result to VDST. The constant literal follows
     321the instruction word. 
     322Operation:
     323```
     324VDST = (FLOAT)SRC0 * (FLOAT)SRC1 + (FLOAT)FLOATLIT
    236325```
    237326
     
    286375```
    287376
     377#### V_MBCNT_HI_U32_B32
     378
     379Opcode VOP2: 36 (0x24) for GCN 1.0/1.1 
     380Opcode VOP3a: 292 (0x124) for GCN 1.0/1.1 
     381Syntax: V_MBCNT_HI_U32_B32 VDST, SRC0, SRC1 
     382Description: Make a mask for all lanes below the current lane,
     383take the higher 32 bits of that mask, use them to mask SRC0,
     384count the bits in that value, add SRC1, and store the result to VDST. 
     385Operation: 
     386```
     387UINT32 MASK = ((1ULL << (LANEID-32)) - 1ULL) & SRC0
     388VDST = SRC1
     389for (UINT8 i = 0; i < 32; i++)
     390    VDST += ((1U<<i) & MASK) != 0
     391```
     392
     393#### V_MBCNT_LO_U32_B32
     394
     395Opcode VOP2: 35 (0x23) for GCN 1.0/1.1 
     396Opcode VOP3a: 291 (0x123) for GCN 1.0/1.1 
     397Syntax: V_MBCNT_LO_U32_B32 VDST, SRC0, SRC1 
     398Description: Make a mask for all lanes below the current lane,
     399take the lower 32 bits of that mask, use them to mask SRC0,
     400count the bits in that value, add SRC1, and store the result to VDST. 
     401Operation: 
     402```
     403UINT32 MASK = ((1ULL << LANEID) - 1ULL) & SRC0
     404VDST = SRC1
     405for (UINT8 i = 0; i < 32; i++)
     406    VDST += ((1U<<i) & MASK) != 0
     407```
     408
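The two V_MBCNT instructions are typically chained to count how many lanes below the current one have their bit set in a 64-bit mask. The C sketch below emulates both for a single lane. All function names are hypothetical, and the handling of lanes 0..31 in the HI variant (an empty mask, which avoids the negative shift that `LANEID-32` would produce in plain C) is an assumption that the pseudocode above does not state.

```c
#include <stdint.h>

/* Counts set bits the same way the pseudocode loops do. */
static uint32_t count_bits32(uint32_t v)
{
    uint32_t n = 0;
    for (unsigned i = 0; i < 32; i++)
        n += ((1u << i) & v) != 0;
    return n;
}

/* Hypothetical emulation of V_MBCNT_LO_U32_B32 for one lane (laneid 0..63).
 * For lanes 32..63 the truncated lane mask is all ones. */
static uint32_t v_mbcnt_lo(uint32_t src0, uint32_t src1, unsigned laneid)
{
    uint32_t mask = (uint32_t)((1ULL << laneid) - 1ULL) & src0;
    return src1 + count_bits32(mask);
}

/* Hypothetical emulation of V_MBCNT_HI_U32_B32 for one lane (laneid 0..63).
 * Assumption: lanes 0..31 contribute nothing from the upper half. */
static uint32_t v_mbcnt_hi(uint32_t src0, uint32_t src1, unsigned laneid)
{
    uint32_t lanemask = (laneid >= 32)
        ? (uint32_t)((1ULL << (laneid - 32)) - 1ULL) : 0u;
    return src1 + count_bits32(lanemask & src0);
}
```

Feeding the LO result into the HI call as SRC1, with the lower and upper halves of the 64-bit mask as the respective SRC0 values, gives the number of set bits below the current lane.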
    288409#### V_MIN_F32
    289410
     
    440561```
    441562
     563#### V_SUB_F32
     564
     565Opcode VOP2: 4 (0x4) for GCN 1.0/1.1; 2 (0x2) for GCN 1.2 
     566Opcode VOP3a: 260 (0x104) for GCN 1.0/1.1; 258 (0x102) for GCN 1.2 
     567Syntax: V_SUB_F32 VDST, SRC0, SRC1 
     568Description: Subtract the FP value of SRC1 from the FP value of SRC0 and store the result to VDST. 
     569Operation: 
     570```
     571VDST = (FLOAT)SRC0 - (FLOAT)SRC1
     572```
     573
     574#### V_SUBREV_F32
     575
     576Opcode VOP2: 5 (0x5) for GCN 1.0/1.1; 3 (0x3) for GCN 1.2 
     577Opcode VOP3a: 261 (0x105) for GCN 1.0/1.1; 259 (0x103) for GCN 1.2 
     578Syntax: V_SUBREV_F32 VDST, SRC0, SRC1 
     579Description: Subtract the FP value of SRC0 from the FP value of SRC1 and store the result to VDST. 
     580Operation: 
     581```
     582VDST = (FLOAT)SRC1 - (FLOAT)SRC0
     583```
     584
     585#### V_XOR_B32
     586
     587Opcode: VOP2: 29 (0x1d) for GCN 1.0/1.1; 21 (0x15) for GCN 1.2 
     588Opcode: VOP3a: 285 (0x11d) for GCN 1.0/1.1; 277 (0x115) for GCN 1.2 
     589Syntax: V_XOR_B32 VDST, SRC0, SRC1 
     590Description: Do a bitwise XOR operation on SRC0 and SRC1 and store the result to VDST.
     591The CLAMP and OMOD modifiers don't affect the result. 
     592Operation: 
     593```
     594VDST = SRC0 ^ SRC1
     595```
     596
    442597#### V_WRITELANE_B32
    443598
     
    451606VDST[SSRC1 & 63] = SSRC0
    452607```
    453 
    454 #### V_SUB_F32
    455 
    456 Opcode VOP2: 4 (0x4) for GCN 1.0/1.1; 2 (0x2) for GCN 1.2 
    457 Opcode VOP3a: 260 (0x104) for GCN 1.0/1.1; 258 (0x102) for GCN 1.2 
    458 Syntax: V_SUB_F32 VDST, SRC0, SRC1 
    459 Description: Subtract the FP value of SRC1 from the FP value of SRC0 and store the result to VDST. 
    460 Operation: 
    461 ```
    462 VDST = (FLOAT)SRC0 - (FLOAT)SRC1
    463 ```
    464 
    465 #### V_SUBREV_F32
    466 
    467 Opcode VOP2: 5 (0x5) for GCN 1.0/1.1; 3 (0x3) for GCN 1.2 
    468 Opcode VOP3a: 261 (0x105) for GCN 1.0/1.1; 259 (0x103) for GCN 1.2 
    469 Syntax: V_SUBREV_F32 VDST, SRC0, SRC1 
    470 Description: Subtract the FP value of SRC0 from the FP value of SRC1 and store the result to VDST. 
    471 Operation: 
    472 ```
    473 VDST = (FLOAT)SRC1 - (FLOAT)SRC0
    474 ```
    475 
    476 #### V_XOR_B32
    477 
    478 Opcode: VOP2: 29 (0x1d) for GCN 1.0/1.1; 21 (0x15) for GCN 1.2 
    479 Opcode: VOP3a: 285 (0x11d) for GCN 1.0/1.1; 277 (0x115) for GCN 1.2 
    480 Syntax: V_XOR_B32 VDST, SRC0, SRC1 
    481 Description: Do a bitwise XOR operation on SRC0 and SRC1 and store the result to VDST.
    482 The CLAMP and OMOD modifiers don't affect the result. 
    483 Operation: 
    484 ```
    485 VDST = SRC0 ^ SRC1
    486 ```
  • CLRadeonExtender/trunk/doc/GcnIsa.md

    r1734 r1740  
    2525Special variables:
    2626
    27 * THREADID - identifier for current thread in wave
     27* LANEID - identifier for current thread in wave
    2828
    2929Special functions: