Changeset 3138 in CLRX


Timestamp:
Jun 5, 2017, 8:04:18 AM
Author:
matszpk
Message:

CLRadeonExtender: CLRXDocs: Describe VOP_DPP encoding. Add examples to VOP_SDWA encoding.

File:
1 edited

  • CLRadeonExtender/trunk/doc/GcnSdwaDpp.md

    r3136 r3138  
    3232Possible part selection for DST_SEL, SRC0_SEL and SRC1_SEL:
    3333
    34 Value | Name        | CLRX names         | Description
     34Value | Name        | Assembler names    | Description
    3535------|-------------|--------------------|----------------------
    3636 0    | SDWA_BYTE_0 | BYTE0, BYTE_0, B0  | Byte 0 of dword
     
    4545Following options:
    4646
    47 Value | Name                 | CLRX name      | Description
     47Value | Name                 | Assembler name | Description
    4848------|----------------------|----------------|--------------------------------------
    4949 0    | SDWA_UNUSED_PAD      | PAD            | Fill by zeroes
     
    5454(fill bits after part) to source operand while for source operand was not
    5555selected whole dword (SDWA_DWORD not chosen).
     56
     57Examples:
     58```
     59v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1
     60v_xor_b32 v1,v2,v3 dst_sel:b1 src0_sel:b1 src1_sel:w1
     61v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1 dst_unused:preserve
     62v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1 dst_unused:sext
     63v_xor_b32 v1,sext(v2),v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1
     64```
    5665
    5766Operation code: 
     
    181190}
    182191```
     192
     193### VOP_DPP
     194
     195The VOP_DPP encoding is enabled by setting the VSRC0 field to 0xfa in a VOP1/VOP2/VOPC encoding.
     196List of fields:
     197
     198Bits  | Name       | Description
     199------|------------|------------------------------
     2000-7   | SRC0       | First source vector operand
     2018-16  | DPP_CTRL   | Data parallel primitive control
     20219    | BOUND_CTRL | Specifies behaviour when shared data is invalid
     20320    | SRC0_NEG   | Negation modifier for SRC0
     20421    | SRC0_ABS   | Absolute value for SRC0
     20522    | SRC1_NEG   | Negation modifier for SRC1
     20623    | SRC1_ABS   | Absolute value for SRC1
     20724-27 | BANK_MASK  | Bank enable mask
     20828-31 | ROW_MASK   | Row enable mask
     209
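The field layout in the table above can be illustrated with a small packing sketch. This is only an illustration: `DPP_FIELDS`, `encode_dpp` and `decode_dpp` are hypothetical names, not part of CLRX, and fields that are not passed are simply left as zero (bits 17-18 are not listed in the table and stay zero).

```python
# Bit positions taken from the VOP_DPP field table: name -> (shift, width).
DPP_FIELDS = {
    "src0": (0, 8),
    "dpp_ctrl": (8, 9),
    "bound_ctrl": (19, 1),
    "src0_neg": (20, 1),
    "src0_abs": (21, 1),
    "src1_neg": (22, 1),
    "src1_abs": (23, 1),
    "bank_mask": (24, 4),
    "row_mask": (28, 4),
}


def encode_dpp(**fields):
    """Pack named field values into one 32-bit word (hypothetical helper)."""
    word = 0
    for name, value in fields.items():
        shift, width = DPP_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_dpp(word):
    """Split a 32-bit word back into the fields listed in the table."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in DPP_FIELDS.items()
    }
```

For example, `encode_dpp(src0=2, dpp_ctrl=0x101, bound_ctrl=1, bank_mask=0xF, row_mask=0xF)` packs a `row_shl:1` control with all banks and rows enabled, and `decode_dpp` recovers the same values.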
     210The data parallel operation is applied across the wavefront to the VSRC0 operand of the VOP instruction.
     211The wavefront contains 4 rows (16 threads each), and each row contains 4 banks (4 threads each).
     212The DPP_CTRL field chooses which operation is applied to VSRC0.
     213List of data parallel operations:
     214
     215Value        | Name                 | Modifier | Description
     216-------------|----------------------|----------|-------------------
     2170x00-0xff    | DPP_QUAD_PERM{00:ff} | quad_perm:[A,B,C,D]  | Full permute of 4 threads
     2180x101-0x10f  | DPP_ROW_SL{1:15}     | row_shl:N | Row shift left by N threads
     2190x111-0x11f  | DPP_ROW_SR{1:15}     | row_shr:N | Row shift right by N threads
     2200x121-0x12f  | DPP_ROW_RR{1:15}     | row_ror:N | Row rotate right by N threads
     2210x130        | DPP_WF_SL1           | wave_shl:1 | Wave shift left by 1 thread
     2220x134        | DPP_WF_RL1           | wave_rol:1 | Wave rotate left by 1 thread
     2230x138        | DPP_WF_SR1           | wave_shr:1 | Wave shift right by 1 thread
     2240x13c        | DPP_WF_RR1           | wave_ror:1 | Wave rotate right by 1 thread
     2250x140        | DPP_ROW_MIRROR       | row_mirror | Mirror threads within row
     2260x141        | DPP_ROW_HALF_MIRROR  | row_half_mirror | Mirror threads within half row
     2270x142        | DPP_ROW_BCAST15      | row_bcast:15 | Broadcast thread 15 of each row to the next row
     2280x143        | DPP_ROW_BCAST31      | row_bcast:31 | Broadcast thread 31 to row 2 and row 3
     229
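A rough sketch of how the modifiers map to DPP_CTRL values, following the value ranges in the table above. The function names are hypothetical. The table only gives the range 0x00-0xff for `quad_perm`; packing each selector into 2 bits with A in the lowest bits is an assumption that is consistent with that range.

```python
def quad_perm(a, b, c, d):
    """DPP_CTRL value for quad_perm:[A,B,C,D].

    Assumption: each selector takes 2 bits, with A in bits 0-1.
    """
    for sel in (a, b, c, d):
        if not 0 <= sel <= 3:
            raise ValueError("quad_perm selectors must be in range 0..3")
    return a | (b << 2) | (c << 4) | (d << 6)


# Base values read from the table: 0x101-0x10f, 0x111-0x11f, 0x121-0x12f.
_ROW_BASE = {"row_shl": 0x100, "row_shr": 0x110, "row_ror": 0x120}


def row_op(name, n):
    """DPP_CTRL value for row_shl:N, row_shr:N or row_ror:N (N in 1..15)."""
    if not 1 <= n <= 15:
        raise ValueError("N must be in range 1..15")
    return _ROW_BASE[name] + n


# Modifiers with a single fixed DPP_CTRL value.
FIXED_CTRL = {
    "wave_shl:1": 0x130,
    "wave_rol:1": 0x134,
    "wave_shr:1": 0x138,
    "wave_ror:1": 0x13C,
    "row_mirror": 0x140,
    "row_half_mirror": 0x141,
    "row_bcast:15": 0x142,
    "row_bcast:31": 0x143,  # broadcasts thread 31 to rows 2 and 3
}
```

Under the stated assumption, the identity permutation `quad_perm(0, 1, 2, 3)` gives 0xe4.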
     230The BOUND_CTRL flag (modifier `bound_ctrl` or `bound_ctrl:0`) controls how invalid threads
     231are filled (for example, the last threads after a left shift). A zero value (no modifier)
     232leaves each invalid thread with its original VSRC0 value. A value of one (with modifier)
     233fills invalid threads with the 0 thread VSRC0 value.
     234
     235The BANK_MASK field (modifier `bank_mask:value`) chooses which banks are enabled during
     236the data parallel operation in each enabled row. The Nth bit represents the Nth bank in each row.
     237Threads in a disabled bank keep their original VSRC0 value.
     238
     239The ROW_MASK field (modifier `row_mask:value`) chooses which rows are enabled during
     240the data parallel operation. The Nth bit represents the Nth row.
     241Threads in a disabled row keep their original VSRC0 value.
     242
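The effect of BANK_MASK and ROW_MASK can be sketched as a small simulation over a 64-thread wavefront. The names `lane_enabled` and `apply_masks` are hypothetical, and `permuted` stands for whatever per-thread values the chosen DPP_CTRL operation would produce; the sketch only models the mask behaviour described above.

```python
THREADS_PER_ROW = 16   # 4 rows of 16 threads in a wavefront
THREADS_PER_BANK = 4   # 4 banks of 4 threads in a row


def lane_enabled(lane, bank_mask, row_mask):
    """True if the thread's row and bank bits are both set in the masks."""
    row = lane // THREADS_PER_ROW
    bank = (lane % THREADS_PER_ROW) // THREADS_PER_BANK
    return bool((row_mask >> row) & 1) and bool((bank_mask >> bank) & 1)


def apply_masks(original, permuted, bank_mask, row_mask):
    """Take the permuted value for enabled threads, the original otherwise."""
    return [
        new if lane_enabled(lane, bank_mask, row_mask) else old
        for lane, (old, new) in enumerate(zip(original, permuted))
    ]
```

With `bank_mask=0x3` and `row_mask=0x2`, only threads 16 to 23 (row 1, banks 0 and 1) take the permuted value; every other thread keeps its original VSRC0 value.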