Changeset 3572 in CLRX


Timestamp: Dec 29, 2017, 1:16:44 PM
Author: matszpk
Message: CLRadeonExtender: Revert last changes in CLRXDocs.
Location: CLRadeonExtender/trunk/doc
Files: 31 edited

  • CLRadeonExtender/trunk/doc/AmdAbi.md

    r3569 r3572  
    6363Second const buffer (id=1) holds arguments aligned to 4 dwords.
    6464
    65 Global pointers holds vector offset (64-bit for 64-bit binary) to memory.
     65Global pointers holds vector offset (64-bit for 64-bit binary) to the memory.
    6666Local pointers holds its offset in bytes (1 dword).
    6767
  • CLRadeonExtender/trunk/doc/AmdCl2Abi.md

    r3570 r3572  
    33This chapter describes how kernel gets its argument, how access to constant data. Because
    44Kernel setup is AMD HSA configuration, hence we recommend to refer to ROCm-ABI documentation
    5 to get information about kernel setup and kernel arguments passing. Now, an assembler have
    6 all AMD HSA configuration's pseudo-ops to do it.
     5to get information about kernel setup and kernel arguments passing. Now an assembler have
     6all the AMD HSA configuration's pseudo-ops to do it.
    77
    88In this chapter, size is given in dwords. Dword is 4-byte value.
  • CLRadeonExtender/trunk/doc/ClrxAsmAmd.md

    r3570 r3572  
    1919A `.data` section inside kernel is usable section and holds same zeroes.
    2020
    21 ## Layout of source code
     21## Layout of the source code
    2222
    2323The CLRX assembler allow to use one of two ways to configure kernel setup:
     
    2828To used scalar registers, assembler add 2 additional registers for handling VCC.
    2929
    30 ## List of specific pseudo-operations
     30## List of the specific pseudo-operations
    3131
    3232### .arg
  • CLRadeonExtender/trunk/doc/ClrxAsmAmdCl2.md

    r3570 r3572  
    4646```
    4747
    48 ## Layout of source code
     48## Layout of the source code
    4949
    5050The CLRX assembler allow to use one of two ways to configure kernel setup:
     
    5858what extra SGPR extra has been added.
    5959
    60 ## List of specific pseudo-operations
     60## List of the specific pseudo-operations
    6161
    6262### .acl_version
  • CLRadeonExtender/trunk/doc/ClrxAsmGallium.md

    r3570 r3572  
    3333what extra SGPR extra has been added.
    3434
    35 ## List of specific pseudo-operations
     35## List of the specific pseudo-operations
    3636
    3737### .arch_minor
  • CLRadeonExtender/trunk/doc/ClrxAsmPseudoOps.md

    r3570 r3572  
    66A CLRX assembler stores values greater than byte in the little-endian ordering.
    77
    8 ## List of pseudo-operations
     8## List of the pseudo-operations
    99
    1010### .32bit
  • CLRadeonExtender/trunk/doc/ClrxAsmRocm.md

    r3570 r3572  
    3131what extra SGPR extra has been added. The VCC register is included by default.
    3232
    33 ## List of specific pseudo-operations
     33## List of the specific pseudo-operations
    3434
    3535### .arch_minor
  • CLRadeonExtender/trunk/doc/ClrxAsmSyntax.md

    r3570 r3572  
    8383### Scopes
    8484
    85 New feature is the visibility's scopes. Scopes concerns symbols, labels
    86 (except local labels), regvars. Macros, kernels and sections are still global.
     85New feature is the visibility's scopes. The scopes concerns symbols, labels
     86(except local labels), regvars. The macros, kernels and sections are still global.
    8787 At start, the assembler create the global scope, that
    8888is root of next defined scopes. The scope can be opened by using `.scope` pseudo-op and
    8989they can be closed by using `.ends` or `.endscope`. We distinguish scope to two types:
    9090normal and temporary scopes.
    91 Temporary scopes doesn't have name and they exists until first close.
     91The temporary scopes doesn't have name and they exists until first close.
    9292
    9393If scope will be opened, any object in this scope will directly available (by simple name).
     
    9595begins from last 'using' to 'first'.
    9696
    97 Scopes are organized in tree where global scope is root of tree.
     97The scopes are organized in tree where global scope is root of tree.
    9898This feature, allow to nest scopes (even named scopes inside temporary scopes).
    9999During searching object, an assembler begins from
     
    148148```
    149149
    150 Names of the object can have the scope path. Scope path is way to particular scope in
     150The names of the object can have the scope path. Scope path is way to particular scope in
    151151tree. If searching scope should start from global scope, an scope path should be begins
    152 from `::`. `::` is separator (likes `/` in file system path) for path elements.
     152from `::`. The `::` is separator (likes `/` in file system path) for path elements.
    153153
    154154```
     
    167167```
    168168
    169 Setting symbols, labels, if simple name is given (without scope path) always
     169The setting symbols, labels, if simple name is given (without scope path) always
    170170create object in the current scope. Any call of object (even if not defined) always
    171171start searching through scope tree. It is possible to call to symbols
     
    197197program's instructions. Section `.rodata` holds read-only data (mainly constant data)
    198198that can be used by program. Section can be divided by type of the access.
    199 Most sections are writeable (any data can be put into them) and
     199The most sections are writeable (any data can be put into them) and
    200200addressable (we can define symbols inside these sections or move forward).
    201201
     
    230230
    231231For character literals and string literals, escape can be used to put special characters
    232 likes newline, tab. List of escapes:
     232likes newline, tab. List of the escapes:
    233233
    234234Escape   |  Description    | Value
     
    247247 `\HHH..`|Hexadecimal code | Various
    248248
    249 Floating point literals in instruction operands can have the suffix ('l', 'h' or 's').
     249The floating point literals in instruction operands can have the suffix ('l', 'h' or 's').
    250250Suffix 's' indicates that given value is single floating point value.
    251251Suffix 'h' indicates that given value is half floating point value.
     
    254254### Expressions
    255255
    256 CLRX assembler get this same the operator ordering as in GNU as.
     256The CLRX assembler get this same the operator ordering as in GNU as.
    257257CLRX assembler treat any literal or symbol's value as 64-bit integer value.
    258 List of operators:
     258List of the operators:
    259259
    260260Type  | Operator | Order | Description
     
    302302final result of the expression can be represented as place of the code or absolute value
    303303(without refering to any place). An assembler performs this same operations
    304 on sections during evaluating an expression. Division, modulo,
     304on the sections during evaluating an expression. Division, modulo,
    305305binary operations (except negation), logical operations is not legal.
    306306
  • CLRadeonExtender/trunk/doc/ClrxDisasm.md

    r3570 r3572  
    22
    33The CLRadeonExtender provides a disassembler that can disassemble code
    4 for Radeon GPU's based on the GCN 1.0/1.1/1.2/1.4 (AMD VEGA) architecture.
     4for the Radeon GPU's based on the GCN 1.0/1.1/1.2/1.4 (AMD VEGA) architecture.
    55Program is called `clrxdisasm`.
    66
    77Disassembler can handle the AMD Catalyst(tm) OpenCL(tm) kernel binaries and the
    88GalliumCompute kernel binaries. It displays instructions of the code and optionally
    9 structure of binaries (kernels and their configuration). Output of that program
     9structure of the binaries (kernels and their configuration). Output of that program
    1010can be used as input to the CLRX assembler if option '--all' will be used.
    1111
     
    2525* **<-m>**, **--metadata>**
    2626
    27     Print metadata from AMD Catalyst binaries to output. For AMD Catalyst binaries,
    28 disassembler prints internal metadata. For GalliumCompute binaries disassembler
     27    Print metadata from AMD Catalyst binaries to output. For a AMD Catalyst binaries,
     28disassembler prints internal metadata. For a GalliumCompute binaries disassembler
    2929prints argument of the kernel and proginfo entries.
    3030
     
    3333    Print data section from binaries. For AMD Catalyst binaries disassembler prints
    3434global constant data, and '.data' section for particular kernel executables.
    35 For GalliumCompute binaries disassembler prints global constant data.
     35For GalliumCompute binaries disassembler prints a global constant data.
    3636
    3737* **-c**, **--calNotes**
    3838
    39     Print list of ATI CAL notes and their content from AMD Catalyst binaries to output.
     39    Print list of the ATI CAL notes and their content from AMD Catalyst binaries to output.
    4040
    4141* **-C**, **--config**
     
    4646
    4747    Print floating point literals in instructions if instructions accept float point values
    48 and their has constant literal. Floating point values will be inside comment.
     48and their has a constant literal. Floating point values will be inside comment.
    4949
    5050* **-h**, **--hexcode**
     
    6868
    6969    Treat input as raw code. By default, disassembler assumes that input code is for
    70 GCN1.0 architecture.
     70the GCN1.0 architecture.
    7171
    7272* **-g GPUDEVICE**, **--gpuType=GPUDEVICE**
     
    104104* **-?**, **--help**
    105105
    106     Print help and list of options.
     106    Print help and list of the options.
    107107
    108108* **--usage**
     
    116116### Output
    117117
    118 `clrxdisasm` prints disassembled code to standard output and errors to
     118`clrxdisasm` prints a disassembled code to standard output and errors to
    119119standard error output. `clrxdisasm` returns 0 if succeeded, otherwise it returns 1
    120120and prints an error messages to stderr
     
    122122### Sample usage
    123123
    124 Below is sample usage of `clrxdisasm`:
     124Below is sample usage of the `clrxdisasm`:
    125125
    126126```
  • CLRadeonExtender/trunk/doc/GcnInstrsDs.md

    r3570 r3572  
    33These instructions access to local or global data share (LDS/GDS) memory.
    44
    5 List of fields for DS encoding:
     5List of fields for the DS encoding:
    66
    77Bits  | Name     | Description
     
    3434Any operation increments LGKM by one, and decremented by one if it will be finished.
    3535
    36 List of instructions by opcode (GCN 1.0/1.1):
     36List of the instructions by opcode (GCN 1.0/1.1):
    3737
    3838 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    179179 255 (0xff) |       |   ✓   | DS_READ_B128
    180180
    181 List of instructions by opcode (GCN 1.2/1.4):
     181List of the instructions by opcode (GCN 1.2/1.4):
    182182
    183183 Opcode     |GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsFlat.md

    r3570 r3572  
    55FLAT instructions presents only in GCN 1.1 or later architecture.
    66
    7 List of fields for FLAT encoding (GCN 1.1/1.2):
     7List of fields for the FLAT encoding (GCN 1.1/1.2):
    88
    99Bits  | Name     | Description
     
    232356-63 | VDST     | Vector destination register
    2424
    25 List of fields for FLAT encoding (GCN 1.4):
     25List of fields for the FLAT encoding (GCN 1.4):
    2626
    2727Bits  | Name     | Description
     
    7272### Instructions by opcode
    7373
    74 List of FLAT instructions by opcode (GCN 1.1/1.2):
     74List of the FLAT instructions by opcode (GCN 1.1/1.2):
    7575
    7676 Opcode     | Mnemonic (GCN1.1)      | Mnemonic (GCN1.2)
     
    155155 108 (0x6c) | --                     | FLAT_ATOMIC_DEC_X2
    156156
    157 List of FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):
     157List of the FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):
    158158
    159159 Opcode     | FLAT | GLOBAL | SCRATCH | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMimg.md

    r3570 r3572  
    44operates on the image resources and on the sampler resources.
    55
    6 List of fields for MIMG encoding:
     6List of fields for the MIMG encoding:
    77
    88Bits  | Name     | Description
     
    3737### Instructions by opcode
    3838
    39 List of MIMG instructions by opcode (GCN 1.0/1.1):
     39List of the MIMG instructions by opcode (GCN 1.0/1.1):
    4040
    4141 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    135135 111 (0x6f) |   ✓   |   ✓   | IMAGE_SAMPLE_C_CD_CL_O
    136136
    137 List of MIMG instructions by opcode (GCN 1.2/1.4):
     137List of the MIMG instructions by opcode (GCN 1.2/1.4):
    138138
    139139 Opcode     | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMtbuf.md

    r3570 r3572  
    66The buffer data format are encoding in instructions (instructions are typed).
    77
    8 List of fields for MTBUF encoding (GCN 1.0/1.1):
     8List of fields for the MTBUF encoding (GCN 1.0/1.1):
    99
    1010Bits  | Name     | Description
     
    262656-63 | SOFFSET  | Scalar base offset operand
    2727
    28 List of fields for MTBUF encoding (GCN 1.2/1.4):
     28List of fields for the MTBUF encoding (GCN 1.2/1.4):
    2929
    3030Bits  | Name     | Description
     
    5858### Instructions by opcode
    5959
    60 List of MTBUF instructions by opcode:
     60List of the MTBUF instructions by opcode:
    6161
    6262 Opcode     |GCN 1.0|GCN 1.1|GCN 1.2| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsMubuf.md

    r3570 r3572  
    88instruction's format).
    99
    10 List of fields for MUBUF encoding (GCN 1.0/1.1):
     10List of fields for the MUBUF encoding (GCN 1.0/1.1):
    1111
    1212Bits  | Name     | Description
     
    272756-63 | SOFFSET  | Scalar base offset operand
    2828
    29 List of fields for MUBUF encoding (GCN 1.2/1.4):
     29List of fields for the MUBUF encoding (GCN 1.2/1.4):
    3030
    3131Bits  | Name     | Description
     
    5858### Instructions by opcode
    5959
    60 List of MUBUF instructions by opcode (GCN 1.0/1.1):
     60List of the MUBUF instructions by opcode (GCN 1.0/1.1):
    6161
    6262 Opcode     |GCN 1.0|GCN 1.1| Mnemonic
     
    121121 113 (0x71) |   ✓   |   ✓   | BUFFER_WBINVL1
    122122
    123 List of MUBUF instructions by opcode (GCN 1.2/1.4):
     123List of the MUBUF instructions by opcode (GCN 1.2/1.4):
    124124
    125125 Opcode     |GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsSmem.md

    r3570 r3572  
    11## GCN ISA SMEM instructions (GCN 1.2/1.4)
    22
    3 The encoding of SMEM instructions needs 8 bytes (2 dwords). List of fields:
     3The encoding of the SMEM instructions needs 8 bytes (2 dwords). List of fields:
    44
    55Bits  | Name     | Description
     
    5252is required least one instruction (vector or scalar) due to delay.
    5353
    54 List of instructions by opcode:
     54List of the instructions by opcode:
    5555
    5656 Opcode     |GCN 1.2|GCN 1.4| Mnemonic (GCN1.2/1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSmrd.md

    r3570 r3572  
    11## GCN ISA SMRD instructions
    22
    3 The basic encoding of SMRD instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of the SMRD instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    3131is required least one instruction (vector or scalar) due to delay.
    3232
    33 List of instructions by opcode:
     33List of the instructions by opcode:
    3434
    3535 Opcode     | Mnemonic (GCN1.0)        | Mnemonic (GCN1.1)
  • CLRadeonExtender/trunk/doc/GcnInstrsSop1.md

    r3570 r3572  
    11## GCN ISA SOP1 instructions
    22
    3 The basic encoding of SOP1 instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of the SOP1 instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1414Example: s_mov_b32 s0, s1
    1515
    16 List of instructions by opcode:
     16List of the instructions by opcode:
    1717
    1818 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSop2.md

    r3570 r3572  
    33### Encoding
    44
    5 The basic encoding of SOP2 instructions needs 4 bytes (dword). List of fields:
     5The basic encoding of the SOP2 instructions needs 4 bytes (dword). List of fields:
    66
    77Bits  | Name     | Description
     
    1717Example: s_and_b32 s0, s1, s2
    1818
    19 List of instructions by opcode:
     19List of the instructions by opcode:
    2020
    2121 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopc.md

    r3570 r3572  
    11## GCN ISA SOPC instructions
    22
    3 The basic encoding of SOPC instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of the SOPC instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1414Example: s_cmp_eq_i32 s0, s1
    1515
    16 List of instructions by opcode:
     16List of the instructions by opcode:
    1717
    1818 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2/1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopk.md

    r3570 r3572  
    11## GCN ISA SOPK instructions
    22
    3 The basic encoding of SOPK instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of the SOPK instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1616RELADDR = NEXTPC + SIMM16, NEXTPC - PC for next instruction.
    1717
    18 List of instructions by opcode:
     18List of the instructions by opcode:
    1919
    2020 Opcode     | Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2) | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsSopp.md

    r3570 r3572  
    11## GCN ISA SOPP instructions
    22
    3 The basic encoding of SOPP instructions needs 4 bytes (dword). List of fields:
     3The basic encoding of the SOPP instructions needs 4 bytes (dword). List of fields:
    44
    55Bits  | Name     | Description
     
    1515RELADDR = NEXTPC + SIMM16, NEXTPC - PC for next instruction.
    1616
    17 List of instructions by opcode:
     17List of the instructions by opcode:
    1818
    1919 Opcode     |GCN 1.0|GCN 1.1|GCN 1.2|GCN 1.4| Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsVintrp.md

    r3570 r3572  
    2525Attribute and attribute channel are case-insensitive.
    2626
    27 List of instructions by opcode:
     27List of the instructions by opcode:
    2828
    2929 Opcode      | Mnemonic
     
    4040LDS (local memory).
    4141
    42 Initial configuration is specified in M0 registers in form:
     42Initial configuration is specified in the M0 registers in form:
    4343
    4444* 0-15 bits - local memory offset
  • CLRadeonExtender/trunk/doc/GcnInstrsVop1.md

    r3570 r3572  
    9191320-447 for GCN 1.2.
    9292
    93 List of instructions by opcode (GCN 1.0/1.1):
     93List of the instructions by opcode (GCN 1.0/1.1):
    9494
    9595 Opcode     | Opcode(VOP3)|GCN 1.0|GCN 1.1| Mnemonic
     
    162162 70 (0x46)  | 454 (0x1c6) |       |   ✓   | V_EXP_LEGACY_F32
    163163
    164 List of instructions by opcode (GCN 1.2/1.4):
     164List of the instructions by opcode (GCN 1.2/1.4):
    165165
    166166 Opcode     | Opcode(VOP3)| Mnemonic (GCN 1.2)  | Mnemonic (GCN 1.4)
     
    969969Opcode VOP3A: 422 (0x1a6) for GCN 1.0/1.1 
    970970Syntax: V_LOG_CLAMP_F32 VDST, SRC0 
    971 Description: Approximate logarithm of base 2 from floating point value SRC0 with
     971Description: Approximate logarithm of the base 2 from floating point value SRC0 with
    972972clamping infinities to -MAX_FLOAT. Result is stored in VDST.
    973973If SRC0 is negative then store -NaN to VDST. This instruction doesn't handle denormalized
     
    993993Opcode VOP3A: 384 (0x180) for GCN 1.2 
    994994Syntax: V_LOG_F16 VDST, SRC0 
    995 Description: Approximate logarithm of base 2 from half floating point value SRC0,
     995Description: Approximate logarithm of the base 2 from half floating point value SRC0,
    996996and store result to VDST. If SRC0 is negative then store -NaN to VDST. 
    997997Operation: 
     
    10111011Opcode VOP3A: 423 (0x1a7) for GCN 1.0/1.1; 353 (0x161) for GCN 1.2 
    10121012Syntax: V_LOG_F32 VDST, SRC0 
    1013 Description: Approximate logarithm of base 2 from floating point value SRC0, and store
     1013Description: Approximate logarithm of base the 2 from floating point value SRC0, and store
    10141014result to VDST. If SRC0 is negative then store -NaN to VDST.
    10151015This instruction doesn't handle denormalized values regardless FLOAT MODE register setup. 
     
    10301030Opcode VOP3A: 453 (0x1c5) for GCN 1.1; 396 (0x18c) for GCN 1.2 
    10311031Syntax: V_LOG_LEGACY_F32 VDST, SRC0 
    1032 Description: Approximate logarithm of base 2 from floating point value SRC0, and store
     1032Description: Approximate logarithm of the base 2 from floating point value SRC0, and store
    10331033result to VDST. If SRC0 is negative then store -NaN to VDST.
    10341034This instruction doesn't handle denormalized values regardless FLOAT MODE register setup.
  • CLRadeonExtender/trunk/doc/GcnInstrsVop2.md

    r3570 r3572  
    11## GCN ISA VOP2/VOP3 instructions
    22
    3 VOP2 instructions can be encoded in VOP2 encoding and the VOP3A/VOP3B encoding.
     3VOP2 instructions can be encoded in the VOP2 encoding and the VOP3A/VOP3B encoding.
    44List of fields for VOP2 encoding:
    55
     
    9090
    9191VOP2 opcodes (0-63) are reflected in VOP3 in range: 256-319.
    92 List of instructions by opcode:
     92List of the instructions by opcode:
    9393
    9494 Opcode     | Opcode(VOP3)| Mnemonic (GCN1.0/1.1) | Mnemonic (GCN 1.2)
     
    147147 51 (0x33)  | 307 (0x133) | --                   | V_LDEXP_F16
    148148
    149 List of instructions by opcode (GCN 1.4):
     149List of the instructions by opcode (GCN 1.4):
    150150
    151151 Opcode     | Opcode(VOP3)| Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsVop3.md

    r3570 r3572  
    8282Unaligned pairs of SGPRs are allowed in source operands.
    8383
    84 List of instructions by opcode (GCN 1.0/1.1):
     84List of the instructions by opcode (GCN 1.0/1.1):
    8585
    8686 Opcode      | Mnemonic (GCN 1.0) | Mnemonic (GCN 1.1)
     
    143143 375 (0x177) | --                 | V_MAD_I64_I32 (VOP3B)
    144144
    145 List of instructions by opcode (GCN 1.2/1.4):
     145List of the instructions by opcode (GCN 1.2/1.4):
    146146
    147147 Opcode      | Mnemonic (GCN 1.2)      | Mnemonic (GCN 1.4)
  • CLRadeonExtender/trunk/doc/GcnInstrsVop3p.md

    r3570 r3572  
    5151* only SRC0 can holds LDS_DIRECT
    5252
    53 List of instructions by opcode:
     53List of the instructions by opcode:
    5454
    5555 Opcode    | Mnemonic
  • CLRadeonExtender/trunk/doc/GcnInstrsVopc.md

    r3570 r3572  
    8080VOPC opcodes (0-255) and VOP3 opcodes are same.
    8181
    82 NOTE: Bits for inactive threads in 64-bit scalar destination
     82NOTE: Bits for inactive threads in the 64-bit scalar destination
    8383(VCC or pair of SGPRs) are always zeroed.
    8484
  • CLRadeonExtender/trunk/doc/GcnMemHandling.md

    r3570 r3572  
    405405### Image addressing
    406406
    407 The main addressing rules for images are defined by tiling registers.
     407The main addressing rules for the images are defined by the tiling registers.
    408408The TILINGINDEX choose what register control addressing of image. Index 8 (by default)
    409 choose the linear access. In the most cases images are splitted into tiles which
     409choose the linear access. In the most cases images are splitted into the tiles which
    410410organizes image's data in efficient manner for GPU memory subsystem. Unfortunatelly,
    411 fields of tiling registers and their meanigful are not known (for me).
     411the fields of a tiling registers and their meanigful are not known (for me).
    412412
    413413The address of image's pixel is stored in VADDR registers. Number of used registers and
     
    452452* lod - for IMAGE_*_L - LOD
    453453
    454 The LOD (Level of details) parameter choose MIPMAP: just the LOD reflects mipmap index.
     454The LOD (Level of details) parameter choose MIPMAP: just a LOD reflects mipmap index.
    455455By default, LOD are calculated as maximum value of image's MIN_LOD and sampler's MIN_LOD.
    456456The linear MIP filtering get value from two nearest mipmaps to choosen LOD.
     
    459459between pixels.
    460460
    461 The sampling of mipmaps requires normalized coordinates.
     461The sampling of the mipmaps requires normalized coordinates.
    462462
    463463### Flat addressing
  • CLRadeonExtender/trunk/doc/GcnOperands.md

    r3570 r3572  
    7070### Operand syntax
    7171
    72 Single operands can be given by their name: `s0`, `v54`.
     72THe Single operands can be given by their name: `s0`, `v54`.
    7373CLRX assembler accepts the syntax with
    7474brackets: `s[0]`, `s[z]`, `v[66]`. In many instructions operands are
     
    7979The names of the registers are case-insensitive.
    8080
    81 Constant values are automatically resolved if an expression have already value.
     81The constant values are automatically resolved if an expression have already value.
    8282The 1/(2*PI), 1.0, -2.0 and other floating point constant values will be
    8383resolved if that accurate floating point value will be given.
    8484
    8585In instruction syntax, operands are listed by name of the encoding field. Optionally, in
    86 parentheses is given number of the registers. The ranges of number of registers are in
     86parentheses is given number of the registers. The ranges of number of a registers are in
    8787form 'START:LAST'. Example:
    8888
  • CLRadeonExtender/trunk/doc/GcnState.md

    r3570 r3572  
    3838
    3939The user data registers hold execution setup (global offset, pointers, arguments pointers,
    40 same arguments). User data can allow to pass any constant data to kernel from host.
     40the same arguments). User data can allow to pass any constant data to kernel from host.
    4141The register 1-5 bits of PGM_RSRC2 indicates how many first scalar registers hold user data.
    4242Further scalar registers store group id and it are different for every wavefront.
  • CLRadeonExtender/trunk/doc/GcnTimings.md

    r3570 r3572  
    5454* only the first 3 dwords in the 32-byte block incur no penalty. Any 2-dword
    5555instruction outside these first 3 dwords adds a single penalty.
    56 * if instructions is longer (more than four cycles) then last cycles/4 dwords are free
     56* if the instructions is longer (more than four cycles) then the last cycles/4 dwords are free
    5757* if 16 or more cycle 2-dword instruction and 2-dword instruction in 4 dword, then there is
    58 no penalty for second 2-dword instruction.
    59 * best place to jump is 5 first dwords in 32-byte block. Jump to rest of dwords causes
    60 1-3 penalties, depending on number of dwords (N-4, where N is a dword number). This rule
     58no penalty for the second 2-dword instruction.
     59* best place to jump is the 5 first dwords in the 32-byte block. Jump to rest of the dwords causes
     601-3 penalties, depending on number of dwords (N-4, where N is the dword number). This rule
    6161does not apply to backward jumps (???)
    62 * any conditional jump instruction should be in first half of 32-byte block, otherwise
     62* any conditional jump instruction should be in first half of the 32-byte block, otherwise
    63631-4 penalties are added if jump is not taken, depending on dword number (N-3, where N is dword number).
    6464
    65 IMPORTANT: If the occupancy is greater than 1 wave per compute unit, then penalties,
     65IMPORTANT: If the occupancy is greater than 1 wave per compute unit, then the penalties,
    6666branches, and scalar instructions will be masked while executing
    6767more waves than 4\*CUs. For best results is recommended to execute many waves
     
    402402
    403403About bank conflicts: The LDS memory is partitioned in 32 banks. The bank number is in
    404 bits 2-6 of the address. A bank conflict occurs when two addresses hit same
    405 bank, but addresses are different starting from 7bit
     404bits 2-6 of the address. A bank conflict occurs when two addresses hit the same
     405bank, but the addresses are different starting from the 7bit
    406406(the first 2 bits of the address doesn't matter).
    407407Any bank conflict adds penalty to timing and throughput. In the worst case, the throughput