Changeset 3570 in CLRX


Timestamp: Dec 29, 2017, 8:44:46 AM (12 months ago)
Author: matszpk
Message: CLRadeonExtender: CLRXDocs: grammar&typo fixes.
Location: CLRadeonExtender/trunk
Files: 32 edited

  • CLRadeonExtender/trunk/doc/AmdCl2Abi.md

    r3569 r3570  
    44Kernel setup is AMD HSA configuration, hence we recommend to refer to ROCm-ABI documentation
    55to get information about kernel setup and kernel arguments passing. Now, an assembler have
    6 all the AMD HSA configuration's pseudo-ops to do it.
     6all AMD HSA configuration's pseudo-ops to do it.
    77
    88In this chapter, size is given in dwords. Dword is 4-byte value.
  • CLRadeonExtender/trunk/doc/ClrxAsmAmd.md

    r3452 r3570  
    1919A `.data` section inside kernel is usable section and holds same zeroes.
    2020
    21 ## Layout of the source code
     21## Layout of source code
    2222
    2323The CLRX assembler allow to use one of two ways to configure kernel setup:
     
    2828To used scalar registers, assembler add 2 additional registers for handling VCC.
    2929
    30 ## List of the specific pseudo-operations
     30## List of specific pseudo-operations
    3131
    3232### .arg
  • CLRadeonExtender/trunk/doc/ClrxAsmAmdCl2.md

    r3452 r3570  
    4646```
    4747
    48 ## Layout of the source code
     48## Layout of source code
    4949
    5050The CLRX assembler allow to use one of two ways to configure kernel setup:
     
    5858what extra SGPR extra has been added.
    5959
    60 ## List of the specific pseudo-operations
     60## List of specific pseudo-operations
    6161
    6262### .acl_version
  • CLRadeonExtender/trunk/doc/ClrxAsmGallium.md

    r3561 r3570  
    3333what extra SGPR extra has been added.
    3434
    35 ## List of the specific pseudo-operations
     35## List of specific pseudo-operations
    3636
    3737### .arch_minor
  • CLRadeonExtender/trunk/doc/ClrxAsmPseudoOps.md

    r3452 r3570  
    66A CLRX assembler stores values greater than byte in the little-endian ordering.
    77
    8 ## List of the pseudo-operations
     8## List of pseudo-operations
    99
    1010### .32bit
  • CLRadeonExtender/trunk/doc/ClrxAsmRocm.md

    r3390 r3570  
    3131what extra SGPR extra has been added. The VCC register is included by default.
    3232
    33 ## List of the specific pseudo-operations
     33## List of specific pseudo-operations
    3434
    3535### .arch_minor
  • CLRadeonExtender/trunk/doc/ClrxAsmSyntax.md

    r2992 r3570  
    8383### Scopes
    8484
    85 New feature is the visibility's scopes. The scopes concerns symbols, labels
    86 (except local labels), regvars. The macros, kernels and sections are still global.
     85New feature is the visibility's scopes. Scopes concerns symbols, labels
     86(except local labels), regvars. Macros, kernels and sections are still global.
    8787 At start, the assembler create the global scope, that
    8888is root of next defined scopes. The scope can be opened by using `.scope` pseudo-op and
    8989they can be closed by using `.ends` or `.endscope`. We distinguish scope to two types:
    9090normal and temporary scopes.
    91 The temporary scopes doesn't have name and they exists until first close.
     91Temporary scopes doesn't have name and they exists until first close.
    9292
    9393If scope will be opened, any object in this scope will directly available (by simple name).
     
    9595begins from last 'using' to 'first'.
    9696
    97 The scopes are organized in tree where global scope is root of tree.
     97Scopes are organized in tree where global scope is root of tree.
    9898This feature, allow to nest scopes (even named scopes inside temporary scopes).
    9999During searching object, an assembler begins from
     
    148148```
    149149
    150 The names of the object can have the scope path. Scope path is way to particular scope in
     150Names of the object can have the scope path. Scope path is way to particular scope in
    151151tree. If searching scope should start from global scope, an scope path should be begins
    152 from `::`. The `::` is separator (likes `/` in file system path) for path elements.
     152from `::`. `::` is separator (likes `/` in file system path) for path elements.
    153153
    154154```
     
    167167```
    168168
    169 The setting symbols, labels, if simple name is given (without scope path) always
     169Setting symbols, labels, if simple name is given (without scope path) always
    170170create object in the current scope. Any call of object (even if not defined) always
    171171start searching through scope tree. It is possible to call to symbols
     
    197197program's instructions. Section `.rodata` holds read-only data (mainly constant data)
    198198that can be used by program. Section can be divided by type of the access.
    199 The most sections are writeable (any data can be put into them) and
     199Most sections are writeable (any data can be put into them) and
    200200addressable (we can define symbols inside these sections or move forward).
    201201
     
    230230
    231231For character literals and string literals, escape can be used to put special characters
    232 likes newline, tab. List of the escapes:
     232likes newline, tab. List of escapes:
    233233
    234234Escape   |  Description    | Value
     
    247247 `\HHH..`|Hexadecimal code | Various
    248248
    249 The floating point literals in instruction operands can have the suffix ('l', 'h' or 's').
     249Floating point literals in instruction operands can have the suffix ('l', 'h' or 's').
    250250Suffix 's' indicates that given value is single floating point value.
    251251Suffix 'h' indicates that given value is half floating point value.
     
    254254### Expressions
    255255
    256 The CLRX assembler get this same the operator ordering as in GNU as.
     256CLRX assembler get this same the operator ordering as in GNU as.
    257257CLRX assembler treat any literal or symbol's value as 64-bit integer value.
    258 List of the operators:
     258List of operators:
    259259
    260260Type  | Operator | Order | Description
     
    302302final result of the expression can be represented as place of the code or absolute value
    303303(without refering to any place). An assembler performs this same operations
    304 on the sections during evaluating an expression. Division, modulo,
     304on sections during evaluating an expression. Division, modulo,
    305305binary operations (except negation), logical operations is not legal.
    306306
  • CLRadeonExtender/trunk/doc/ClrxDisasm.md

    r3312 r3570  
    22
    33The CLRadeonExtender provides a disassembler that can disassemble code
    4 for the Radeon GPU's based on the GCN 1.0/1.1/1.2/1.4 (AMD VEGA) architecture.
     4for Radeon GPU's based on the GCN 1.0/1.1/1.2/1.4 (AMD VEGA) architecture.
    55Program is called `clrxdisasm`.
    66
    77Disassembler can handle the AMD Catalyst(tm) OpenCL(tm) kernel binaries and the
    88GalliumCompute kernel binaries. It displays instructions of the code and optionally
    9 structure of the binaries (kernels and their configuration). Output of that program
     9structure of binaries (kernels and their configuration). Output of that program
    1010can be used as input to the CLRX assembler if option '--all' will be used.
    1111
     
    2525* **<-m>**, **--metadata>**
    2626
    27     Print metadata from AMD Catalyst binaries to output. For a AMD Catalyst binaries,
    28 disassembler prints internal metadata. For a GalliumCompute binaries disassembler
     27    Print metadata from AMD Catalyst binaries to output. For AMD Catalyst binaries,
     28disassembler prints internal metadata. For GalliumCompute binaries disassembler
    2929prints argument of the kernel and proginfo entries.
    3030
     
    3333    Print data section from binaries. For AMD Catalyst binaries disassembler prints
    3434global constant data, and '.data' section for particular kernel executables.
    35 For GalliumCompute binaries disassembler prints a global constant data.
     35For GalliumCompute binaries disassembler prints global constant data.
    3636
    3737* **-c**, **--calNotes**
    3838
    39     Print list of the ATI CAL notes and their content from AMD Catalyst binaries to output.
     39    Print list of ATI CAL notes and their content from AMD Catalyst binaries to output.
    4040
    4141* **-C**, **--config**
     
    4646
    4747    Print floating point literals in instructions if instructions accept float point values
    48 and their has a constant literal. Floating point values will be inside comment.
     48and their has constant literal. Floating point values will be inside comment.
    4949
    5050* **-h**, **--hexcode**
     
    6868
    6969    Treat input as raw code. By default, disassembler assumes that input code is for
    70 the GCN1.0 architecture.
     70GCN1.0 architecture.
    7171
    7272* **-g GPUDEVICE**, **--gpuType=GPUDEVICE**
     
    104104* **-?**, **--help**
    105105
    106     Print help and list of the options.
     106    Print help and list of options.
    107107
    108108* **--usage**
     
    116116### Output
    117117
    118 `clrxdisasm` prints a disassembled code to standard output and errors to
     118`clrxdisasm` prints disassembled code to standard output and errors to
    119119standard error output. `clrxdisasm` returns 0 if succeeded, otherwise it returns 1
    120120and prints an error messages to stderr
     
    122122### Sample usage
    123123
    124 Below is sample usage of the `clrxdisasm`:
     124Below is sample usage of `clrxdisasm`:
    125125
    126126```
  • CLRadeonExtender/trunk/doc/GcnInstrsDs.md

    r3515 r3570  
    33These instructions access to local or global data share (LDS/GDS) memory.
    44
    5 List of fields for the DS encoding:
     5List of fields for DS encoding:
    66
    77Bits  | Name     | Description
     
    3434Any operation increments LGKM by one, and decremented by one if it will be finished.
    3535
    36 List of the instructions by opcode (GCN 1.0/1.1):
     36List of instructions by opcode (GCN 1.0/1.1):
    3737
    3838 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    179179 255 (0xff) |       |   ✓   | DS_READ_B128
    180180
    181 List of the instructions by opcode (GCN 1.2/1.4):
     181List of instructions by opcode (GCN 1.2/1.4):
    182182
    183183 Opcode     |GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsFlat.md

    r3525 r3570  
    55FLAT instructions presents only in GCN 1.1 or later architecture.
    66
    7 List of fields for the FLAT encoding (GCN 1.1/1.2):
     7List of fields for FLAT encoding (GCN 1.1/1.2):
    88
    99Bits  | Name     | Description
     
    232356-63 | VDST     | Vector destination register
    2424
    25 List of fields for the FLAT encoding (GCN 1.4):
     25List of fields for FLAT encoding (GCN 1.4):
    2626
    2727Bits  | Name     | Description
     
    7272### Instructions by opcode
    7373
    74 List of the FLAT instructions by opcode (GCN 1.1/1.2):
     74List of FLAT instructions by opcode (GCN 1.1/1.2):
    7575
    7676 Opcode     | Mnemonic (GCN1.1)      | Mnemonic (GCN1.2)
     
    155155 108 (0x6c) | --                     | FLAT_ATOMIC_DEC_X2
    156156
    157 List of the FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):
     157List of FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):
    158158
    159159 Opcode     | FLAT | GLOBAL | SCRATCH | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMimg.md

    r3526 r3570  
    44operates on the image resources and on the sampler resources.
    55
    6 List of fields for the MIMG encoding:
     6List of fields for MIMG encoding:
    77
    88Bits  | Name     | Description
     
    3737### Instructions by opcode
    3838
    39 List of the MIMG instructions by opcode (GCN 1.0/1.1):
     39List of MIMG instructions by opcode (GCN 1.0/1.1):
    4040
    4141 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    135135 111 (0x6f) |   ✓   |   ✓   | IMAGE_SAMPLE_C_CD_CL_O
    136136
    137 List of the MIMG instructions by opcode (GCN 1.2/1.4):
     137List of MIMG instructions by opcode (GCN 1.2/1.4):
    138138
    139139 Opcode     | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMtbuf.md

    r3523 r3570  
    66The buffer data format are encoding in instructions (instructions are typed).
    77
    8 List of fields for the MTBUF encoding (GCN 1.0/1.1):
     8List of fields for MTBUF encoding (GCN 1.0/1.1):
    99
    1010Bits  | Name     | Description
     
    262656-63 | SOFFSET  | Scalar base offset operand
    2727
    28 List of fields for the MTBUF encoding (GCN 1.2/1.4):
     28List of fields for MTBUF encoding (GCN 1.2/1.4):
    2929
    3030Bits  | Name     | Description
     
    5858### Instructions by opcode
    5959
    60 List of the MTBUF instructions by opcode:
     60List of MTBUF instructions by opcode:
    6161
    6262 Opcode     |GCN 1.0|GCN 1.1|GCN 1.2| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMubuf.md

    r3523 r3570  
    88instruction's format).
    99
    10 List of fields for the MUBUF encoding (GCN 1.0/1.1):
     10List of fields for MUBUF encoding (GCN 1.0/1.1):
    1111
    1212Bits  | Name     | Description
     
    272756-63 | SOFFSET  | Scalar base offset operand
    2828
    29 List of fields for the MUBUF encoding (GCN 1.2/1.4):
     29List of fields for MUBUF encoding (GCN 1.2/1.4):
    3030
    3131Bits  | Name     | Description
     
    5858### Instructions by opcode
    5959
    60 List of the MUBUF instructions by opcode (GCN 1.0/1.1):
     60List of MUBUF instructions by opcode (GCN 1.0/1.1):
    6161
    6262 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    121121 113 (0x71) |   ✓   |   ✓   | BUFFER_WBINVL1
    122122
    123 List of the MUBUF instructions by opcode (GCN 1.2/1.4):
     123List of MUBUF instructions by opcode (GCN 1.2/1.4):
    124124
    125125 Opcode     |GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsSmem.md

    r3482 r3570  
    11## GCN ISA SMEM instructions (GCN 1.2/1.4)
    22
    3 The encoding of the SMEM instructions needs 8 bytes (2 dwords). List of fields:
     3The encoding of SMEM instructions needs 8 bytes (2 dwords). List of fields:
    44
    55Bits  | Name     | Description
     
    5252is required least one instruction (vector or scalar) due to delay.
    5353
    54 List of the instructions by opcode:
     54List of instructions by opcode:
    5555
    5656 Opcode     |GCN 1.2|GCN 1.4| Mnemonic (GCN1.2/1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSmrd.md

    r3479 r3570  
    11## GCN ISA SMRD instructions
    22
    3 The basic encoding of the SMRD instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of SMRD instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    3131is required least one instruction (vector or scalar) due to delay.
    3232
    33 List of the instructions by opcode:
     33List of instructions by opcode:
    3434
    3535 Opcode     | Mnemonic (GCN1.0)        | Mnemonic (GCN1.1)
  • CLRadeonExtender/trunk/doc/GcnInstrsSop1.md

    r3469 r3570  
    11## GCN ISA SOP1 instructions
    22
    3 The basic encoding of the SOP1 instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of SOP1 instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1414Example: s_mov_b32 s0, s1
    1515
    16 List of the instructions by opcode:
     16List of instructions by opcode:
    1717
    1818 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSop2.md

    r3469 r3570  
    33### Encoding
    44
    5 The basic encoding of the SOP2 instructions needs 4 bytes (dword). List of fields:
     5The basic encoding of SOP2 instructions needs 4 bytes (dword). List of fields:
    66
    77Bits  | Name     | Description
     
    1717Example: s_and_b32 s0, s1, s2
    1818
    19 List of the instructions by opcode:
     19List of instructions by opcode:
    2020
    2121 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopc.md

    r3540 r3570  
    11## GCN ISA SOPC instructions
    22
    3 The basic encoding of the SOPC instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of SOPC instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1414Example: s_cmp_eq_i32 s0, s1
    1515
    16 List of the instructions by opcode:
     16List of instructions by opcode:
    1717
    1818 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2/1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopk.md

    r3469 r3570  
    11## GCN ISA SOPK instructions
    22
    3 The basic encoding of the SOPK instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of SOPK instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1616RELADDR = NEXTPC + SIMM16, NEXTPC - PC for next instruction.
    1717
    18 List of the instructions by opcode:
     18List of instructions by opcode:
    1919
    2020 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2) | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopp.md

    r3540 r3570  
    11## GCN ISA SOPP instructions
    22
    3 The basic encoding of the SOPP instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of SOPP instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1515RELADDR = NEXTPC + SIMM16, NEXTPC - PC for next instruction.
    1616
    17 List of the instructions by opcode:
     17List of instructions by opcode:
    1818
    1919 Opcode     |GCN 1.0|GCN 1.1|GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsVintrp.md

    r3541 r3570  
    2525Attribute and attribute channel are case-insensitive.
    2626
    27 List of the instructions by opcode:
     27List of instructions by opcode:
    2828
    2929 Opcode      | Mnemonic
     
    4040LDS (local memory).
    4141
    42 Initial configuration is specified in the M0 registers in form:
     42Initial configuration is specified in M0 registers in form:
    4343
    4444* 0-15 bits - local memory offset
  • CLRadeonExtender/trunk/doc/GcnInstrsVop1.md

    r3569 r3570  
    9191320-447 for GCN 1.2.
    9292
    93 List of the instructions by opcode (GCN 1.0/1.1):
     93List of instructions by opcode (GCN 1.0/1.1):
    9494
    9595 Opcode     | Opcode(VOP3)|GCN 1.0|GCN 1.1| Mnemonic
     
    162162 70 (0x46)  | 454 (0x1c6) |       |   ✓   | V_EXP_LEGACY_F32
    163163
    164 List of the instructions by opcode (GCN 1.2/1.4):
     164List of instructions by opcode (GCN 1.2/1.4):
    165165
    166166 Opcode     | Opcode(VOP3)| Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsVop2.md

    r3501 r3570  
    11## GCN ISA VOP2/VOP3 instructions
    22
    3 VOP2 instructions can be encoded in the VOP2 encoding and the VOP3A/VOP3B encoding.
     3VOP2 instructions can be encoded in VOP2 encoding and the VOP3A/VOP3B encoding.
    44List of fields for VOP2 encoding:
    55
     
    9090
    9191VOP2 opcodes (0-63) are reflected in VOP3 in range: 256-319.
    92 List of the instructions by opcode:
     92List of instructions by opcode:
    9393
    9494 Opcode     | Opcode(VOP3)| Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)
     
    147147 51 (0x33)  | 307 (0x133) | --                   | V_LDEXP_F16
    148148
    149 List of the instructions by opcode (GCN 1.4):
     149List of instructions by opcode (GCN 1.4):
    150150
    151151 Opcode     | Opcode(VOP3)| Mnemonic (GCN 1.4)
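
The GcnInstrsVop2.md context above states that VOP2 opcodes (0-63) are reflected in VOP3 in the range 256-319, and its table pairs 51 (0x33) with 307 (0x133). A minimal sketch of that mapping, assuming it is a plain offset of 256; this is illustrative Python, not code from CLRX, and the names `VOP3_VOP2_BASE` and `vop2_to_vop3_opcode` are hypothetical:

```python
# Assumption: the VOP3 form of a VOP2 instruction is its VOP2 opcode plus 256,
# as implied by "VOP2 opcodes (0-63) are reflected in VOP3 in range: 256-319".
VOP3_VOP2_BASE = 256  # hypothetical constant name


def vop2_to_vop3_opcode(vop2_opcode: int) -> int:
    """Map a VOP2 opcode (0-63) to the opcode used in the VOP3 encoding."""
    if not 0 <= vop2_opcode <= 63:
        raise ValueError("VOP2 opcode must be in range 0-63")
    return VOP3_VOP2_BASE + vop2_opcode


if __name__ == "__main__":
    # Opcode 51 (0x33) is the table row shown in the diff context.
    print(hex(vop2_to_vop3_opcode(51)))
```

The range check mirrors the 0-63 bound quoted in the documentation; opcodes outside it belong to other encodings and are rejected here rather than mapped.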
  • CLRadeonExtender/trunk/doc/GcnInstrsVop3.md

    r3542 r3570  
    8282Unaligned pairs of SGPRs are allowed in source operands.
    8383
    84 List of the instructions by opcode (GCN 1.0/1.1):
     84List of instructions by opcode (GCN 1.0/1.1):
    8585
    8686 Opcode      | Mnemonic (GCN 1.0) | Mnemonic (GCN 1.1)
     
    143143 375 (0x177) | --                 | V_MAD_I64_I32 (VOP3B)
    144144
    145 List of the instructions by opcode (GCN 1.2/1.4):
     145List of instructions by opcode (GCN 1.2/1.4):
    146146
    147147 Opcode      | Mnemonic (GCN 1.2)      | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsVop3p.md

    r3509 r3570  
    5151* only SRC0 can holds LDS_DIRECT
    5252
    53 List of the instructions by opcode:
     53List of instructions by opcode:
    5454
    5555 Opcode    | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsVopc.md

    r3545 r3570  
    8080VOPC opcodes (0-255) and VOP3 opcodes are same.
    8181
    82 NOTE: Bits for inactive threads in the 64-bit scalar destination
     82NOTE: Bits for inactive threads in 64-bit scalar destination
    8383(VCC or pair of SGPRs) are always zeroed.
    8484
  • CLRadeonExtender/trunk/doc/GcnMemHandling.md

    r3533 r3570  
    405405### Image addressing
    406406
    407 The main addressing rules for the images are defined by the tiling registers.
     407The main addressing rules for images are defined by tiling registers.
    408408The TILINGINDEX choose what register control addressing of image. Index 8 (by default)
    409 choose the linear access. In the most cases images are splitted into the tiles which
     409choose the linear access. In the most cases images are splitted into tiles which
    410410organizes image's data in efficient manner for GPU memory subsystem. Unfortunatelly,
    411 the fields of a tiling registers and their meanigful are not known (for me).
     411fields of tiling registers and their meanigful are not known (for me).
    412412
    413413The address of image's pixel is stored in VADDR registers. Number of used registers and
     
    452452* lod - for IMAGE_*_L - LOD
    453453
    454 The LOD (Level of details) parameter choose MIPMAP: just a LOD reflects mipmap index.
     454The LOD (Level of details) parameter choose MIPMAP: just the LOD reflects mipmap index.
    455455By default, LOD are calculated as maximum value of image's MIN_LOD and sampler's MIN_LOD.
    456456The linear MIP filtering get value from two nearest mipmaps to choosen LOD.
     
    459459between pixels.
    460460
    461 The sampling of the mipmaps requires normalized coordinates.
     461The sampling of mipmaps requires normalized coordinates.
    462462
    463463### Flat addressing
  • CLRadeonExtender/trunk/doc/GcnOperands.md

    r3569 r3570  
    8484
    8585In instruction syntax, operands are listed by name of the encoding field. Optionally, in
    86 parentheses is given number of the registers. The ranges of number of a registers are in
     86parentheses is given number of the registers. The ranges of number of registers are in
    8787form 'START:LAST'. Example:
    8888
  • CLRadeonExtender/trunk/doc/GcnState.md

    r3165 r3570  
    3838
    3939The user data registers hold execution setup (global offset, pointers, arguments pointers,
    40 the same arguments). User data can allow to pass any constant data to kernel from host.
     40same arguments). User data can allow to pass any constant data to kernel from host.
    4141The register 1-5 bits of PGM_RSRC2 indicates how many first scalar registers hold user data.
    4242Further scalar registers store group id and it are different for every wavefront.
  • CLRadeonExtender/trunk/doc/GcnTimings.md

    r3546 r3570  
    5454* only the first 3 dwords in the 32-byte block incur no penalty. Any 2-dword
    5555instruction outside these first 3 dwords adds a single penalty.
    56 * if the instructions is longer (more than four cycles) then the last cycles/4 dwords are free
     56* if instructions is longer (more than four cycles) then last cycles/4 dwords are free
    5757* if 16 or more cycle 2-dword instruction and 2-dword instruction in 4 dword, then there is
    58 no penalty for the second 2-dword instruction.
    59 * best place to jump is the 5 first dwords in the 32-byte block. Jump to rest of the dwords causes
    60 1-3 penalties, depending on number of dwords (N-4, where N is the dword number). This rule
     58no penalty for second 2-dword instruction.
     59* best place to jump is 5 first dwords in 32-byte block. Jump to rest of dwords causes
     601-3 penalties, depending on number of dwords (N-4, where N is a dword number). This rule
    6161does not apply to backward jumps (???)
    62 * any conditional jump instruction should be in first half of the 32-byte block, otherwise
     62* any conditional jump instruction should be in first half of 32-byte block, otherwise
    63631-4 penalties are added if jump is not taken, depending on dword number (N-3, where N is dword number).
    6464
    65 IMPORTANT: If the occupancy is greater than 1 wave per compute unit, then the penalties,
     65IMPORTANT: If the occupancy is greater than 1 wave per compute unit, then penalties,
    6666branches, and scalar instructions will be masked while executing
    6767more waves than 4\*CUs. For best results is recommended to execute many waves
     
    402402
    403403About bank conflicts: The LDS memory is partitioned in 32 banks. The bank number is in
    404 bits 2-6 of the address. A bank conflict occurs when two addresses hit the same
    405 bank, but the addresses are different starting from the 7bit
     404bits 2-6 of the address. A bank conflict occurs when two addresses hit same
     405bank, but addresses are different starting from 7bit
    406406(the first 2 bits of the address doesn't matter).
    407407Any bank conflict adds penalty to timing and throughput. In the worst case, the throughput
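
The bank-conflict rule quoted in the GcnTimings.md context above (32 banks, bank number in bits 2-6 of the address, conflict when two addresses hit the same bank but differ from bit 7 upward) can be sketched as below. This is an illustrative Python sketch under those stated assumptions, not code from CLRX; the function names `lds_bank` and `is_bank_conflict` are hypothetical.

```python
def lds_bank(addr: int) -> int:
    """Return the LDS bank index (0-31) of a byte address: bits 2-6."""
    return (addr >> 2) & 0x1F


def is_bank_conflict(addr_a: int, addr_b: int) -> bool:
    """Return True if two LDS byte addresses conflict under the quoted rule.

    A conflict needs the same bank (bits 2-6 equal) and different address
    bits from bit 7 upward. Bits 0-1 do not matter.
    """
    same_bank = lds_bank(addr_a) == lds_bank(addr_b)
    differs_above_bank = (addr_a >> 7) != (addr_b >> 7)
    return same_bank and differs_above_bank


if __name__ == "__main__":
    # 0 and 128 share bank 0 but differ at bit 7.
    print(is_bank_conflict(0, 128))
    # 0 and 4 fall into different banks.
    print(is_bank_conflict(0, 4))
```

Two addresses inside the same dword (for example 0 and 2) share a bank and all higher bits, so the sketch does not report them as a conflict, which matches the remark that the first 2 bits of the address do not matter.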
  • CLRadeonExtender/trunk/programs/clrxasm.pod

    r3401 r3570  
    1717
    1818This is CLRadeonExtender assembler. This assembler can assemble code for all Radeon GPU's
    19 that based on the GCN1.0/1.1/1.2 architecture and it can generate AMD Catalyst
    20 OpenCL binaries and the GalliumCompute OpenCL binaries. It is compatible with GNU assembler
     19that based on GCN1.0/1.1/1.2 architecture and it can generate AMD Catalyst
     20OpenCL binaries and GalliumCompute OpenCL binaries. It is compatible with GNU assembler
    2121and support the almost GNU assembler's pseudo-operations (directives) including macros and
    2222repetitions.
     
    5353=item B<-6>, B<--64bit>
    5454
    55 Enable generating of the 64-bit binaries (only for AMD catalyst format).
     55Enable generating of 64-bit binaries (only for AMD catalyst format).
    5656
    5757=item B<-g GPUDEVICE>, B<--gpuType=GPUDEVICE>
     
    109109=item B<-?>, B<--help>
    110110
    111 Print help and list of the options.
     111Print help and list of options.
    112112
    113113=item B<--usage>
  • CLRadeonExtender/trunk/programs/clrxdisasm.pod

    r3312 r3570  
    1414=head1 DESCRIPTION
    1515
    16 This is CLRadeonExtender utility to disassemble the Radeon GPU code.
     16This is CLRadeonExtender utility to disassemble Radeon GPU code.
    1717This disassembler can disassemble code for GCN 1.0/1.1/1.2/1.4 (AMD VEGA)
    18 architectures, but not for the VLIW architecture.
     18architectures, but not for VLIW architecture.
    1919
    20 Disassembler can handle the AMD Catalyst(tm) OpenCL(tm) kernel binaries and the
     20Disassembler can handle AMD Catalyst(tm) OpenCL(tm) kernel binaries and the
    2121GalliumCompute kernel binaries. It displays instructions of the code and optionally
    22 structure of the binaries (kernels and their configuration). Output of that program
     22structure of binaries (kernels and their configuration). Output of that program
    2323can be used as input to the CLRX assembler if option '--all' will be used.
    2424
     
    3131=item B<-m>, B<--metadata>
    3232
    33 Print metadata from AMD Catalyst binaries to output. For a AMD Catalyst binaries,
    34 disassembler prints internal metadata. For a GalliumCompute binaries disassembler
     33Print metadata from AMD Catalyst binaries to output. For AMD Catalyst binaries,
     34disassembler prints internal metadata. For GalliumCompute binaries disassembler
    3535prints argument of the kernel and proginfo entries.
    3636
     
    3939Print data section from binaries. For AMD Catalyst binaries disassembler prints
    4040global constant data, and '.data' section for particular kernel executables.
    41 For GalliumCompute binaries disassembler prints a global constant data.
     41For GalliumCompute binaries disassembler prints global constant data.
    4242
    4343=item B<-c>, B<--calNotes>
     
    6060
    6161Print floating point literals in instructions if instructions accept float point values
    62 and their has a constant literal. Floating point values will be inside comment.
     62and their has constant literal. Floating point values will be inside comment.
    6363
    6464=item B<-h>, B<--hexcode>
     
    108108=item B<-?>, B<--help>
    109109
    110 Print help and list of the options.
     110Print help and list of options.
    111111
    112112=item B<--usage>