Changes between Version 1 and Version 2 of ClrxAsmGallium


Timestamp:
10/27/15 20:41:43 (8 years ago)
Author:
trac
Comment:

--

  • ClrxAsmGallium

    v1 v2  
    11{{{
    2 #!Markdown
    3 ## CLRadeonExtender Assembler Gallium handling
    4 
    5 The GalliumCompute is an open-source OpenCL implementation for the Mesa3D
     2#!html
     3<h2>CLRadeonExtender Assembler Gallium handling</h2>
     4<p>The GalliumCompute is an open-source OpenCL implementation for the Mesa3D
    65drivers. It is divided into three components: CLover, libclc, LLVM AMDGPU. Since LLVM v3.6
    76and Mesa3D v10.5, GalliumCompute uses a binary format with native code. CLRadeonExtender
    8 supports only these binaries.
    9 
    10 ## Binary format
    11 
    12 The binary format contains kernel information and the main binary in the ELF format.
    13 The main `.text` section contains the code for all kernels. Optionally, the
    14 `.rodata` section contains constant global data for all kernels.
    15 The main binary holds the kernel configuration (ProgInfo) in the `.AMDGPU.config` section.
     7supports only these binaries.</p>
     8<h2>Binary format</h2>
     9<p>The binary format contains kernel information and the main binary in the ELF format.
     10The main <code>.text</code> section contains the code for all kernels. Optionally, the
     11<code>.rodata</code> section contains constant global data for all kernels.
     12The main binary holds the kernel configuration (ProgInfo) in the <code>.AMDGPU.config</code> section.
    1613ProgInfo holds three addresses and values that describe the runtime environment for the kernel:
    17 floating point setup, register usage, local data usage, and the rest.
    18 
    19 The assembler source code is divided into three parts:
    20 
    21 * kernel configuration
    22 * kernel constant data (in `.rodata` section)
    23 * kernel code (in `.text` section)
    24 
    25 Order of these parts doesn't matter.
    26 
    27 A kernel function should be aligned to a 256-byte boundary.
    28 
    29 ## List of the specific pseudo-operations
    30 
    31 ### .arg
    32 
    33 Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]
    34 
    35 Adds kernel argument definition. Must be inside argument configuration.
    36 First argument is type:
    37 
    38 * scalar - scalar value
    39 * constant - constant pointer (32-bit ???)
    40 * global - global pointer (64-bit)
    41 * local - local pointer
    42 * image2d_rdonly - ??
    43 * image2d_wronly - ??
    44 * image3d_rdonly - ??
    45 * image3d_wronly - ??
    46 * sampler - ??
    47 * griddim - shortcut for griddim argument definition
    48 * gridoffset - shortcut for gridoffset argument definition
    49 
    50 Second argument is size of argument. Third argument is targetSize which
     14floating point setup, register usage, local data usage, and the rest.</p>
     15<p>The assembler source code is divided into three parts:</p>
     16<ul>
     17<li>kernel configuration</li>
     18<li>kernel constant data (in <code>.rodata</code> section)</li>
     19<li>kernel code (in <code>.text</code> section)</li>
     20</ul>
     21<p>Order of these parts doesn't matter.</p>
     22<p>A kernel function should be aligned to a 256-byte boundary.</p>
     23<h2>List of the specific pseudo-operations</h2>
     24<h3>.arg</h3>
     25<p>Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]</p>
     26<p>Adds kernel argument definition. Must be inside argument configuration.
     27First argument is type:</p>
     28<ul>
     29<li>scalar - scalar value</li>
     30<li>constant - constant pointer (32-bit ???)</li>
     31<li>global - global pointer (64-bit)</li>
     32<li>local - local pointer</li>
     33<li>image2d_rdonly - ??</li>
     34<li>image2d_wronly - ??</li>
     35<li>image3d_rdonly - ??</li>
     36<li>image3d_wronly - ??</li>
     37<li>sampler - ??</li>
     38<li>griddim - shortcut for griddim argument definition</li>
     39<li>gridoffset - shortcut for gridoffset argument definition</li>
     40</ul>
     41<p>Second argument is size of argument. Third argument is targetSize which
    5142should be a multiple of 4. Fourth argument is target alignment.
    5243Fifth argument determines how to extend a numeric value to a larger target size:
    53 `sext` - sign extend, `zext` - zero extend. If the argument is smaller than 4 bytes,
    54 then `sext` can be used to define a signed integer and `zext` an unsigned integer.
    55 Sixth argument is semantic:
    56 
    57 * general - general argument
    58 * griddim - griddim argument
    59 * gridoffset - gridoffset argument
    60 * imgsize - image size
    61 * imgformat - image format
    62 
    63 Example argument definition:
    64 
    65 ```
    66 .arg scalar, 4, 4, 4, zext, general
     44<code>sext</code> - sign extend, <code>zext</code> - zero extend. If the argument is smaller than 4 bytes,
     45then <code>sext</code> can be used to define a signed integer and <code>zext</code> an unsigned integer.
     46Sixth argument is semantic:</p>
     47<ul>
     48<li>general - general argument</li>
     49<li>griddim - griddim argument</li>
     50<li>gridoffset - gridoffset argument</li>
     51<li>imgsize - image size</li>
     52<li>imgformat - image format</li>
     53</ul>
     54<p>Example argument definition:</p>
     55<p><code>.arg scalar, 4, 4, 4, zext, general
    6756.arg global, 8, 8, 8, zext, general
    6857.arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
    69 .arg scalar, 4, 4, 4, zext, gridoffset # shortcut: .arg gridoffset
    70 ```
    71 
    72 The last two arguments (griddim, gridoffset) must be defined in every kernel definition.
    73 
    74 ### .args
    75 
    76 Open kernel argument configuration. Must be inside kernel.
    77 
    78 ### .config
    79 
    80 Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be
    81 defined if a proginfo configuration was defined (by using `.proginfo`).
    82 The following pseudo-ops can appear inside the kernel config:
    83 
    84 * .dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.
    85 * .floatmode VALUE - choose float mode for kernel (byte value).
    86 Default value is 0xc0
    87 * .ieeemode - choose IEEE mode for kernel
    88 * .localsize SIZE - initial local data size for kernel in bytes
    89 * .pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)
    90 * .priority VALUE - set priority for kernel (0-3). Default value is 0.
    91 * .scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.
    92 * .sgprsnum NUMBER - number of SGPR registers used by kernel (excluding VCC,FLAT_SCRATCH).
    93 By default, automatically computed by assembler.
    94 * .vgprsnum NUMBER - number of VGPR registers used by kernel.
    95 By default, automatically computed by assembler.
    96 * .userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.
    97 * .tgsize - enables use of TG_SIZE_EN (we recommend always adding this)
    98 
    99 Example configuration:
    100 
    101 ```
    102 .config
     58.arg scalar, 4, 4, 4, zext, gridoffset # shortcut: .arg gridoffset</code></p>
     59<p>The last two arguments (griddim, gridoffset) must be defined in every kernel definition.</p>
     60<h3>.args</h3>
     61<p>Open kernel argument configuration. Must be inside kernel.</p>
     62<h3>.config</h3>
     63<p>Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be
     64defined if a proginfo configuration was defined (by using <code>.proginfo</code>).
     65The following pseudo-ops can appear inside the kernel config:</p>
     66<ul>
     67<li>.dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.</li>
     68<li>.floatmode VALUE - choose float mode for kernel (byte value).
     69Default value is 0xc0</li>
     70<li>.ieeemode - choose IEEE mode for kernel</li>
     71<li>.localsize SIZE - initial local data size for kernel in bytes</li>
     72<li>.pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)</li>
     73<li>.priority VALUE - set priority for kernel (0-3). Default value is 0.</li>
     74<li>.scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.</li>
     75<li>.sgprsnum NUMBER - number of SGPR registers used by kernel (excluding VCC,FLAT_SCRATCH).
     76By default, automatically computed by assembler.</li>
     77<li>.vgprsnum NUMBER - number of VGPR registers used by kernel.
     78By default, automatically computed by assembler.</li>
     79<li>.userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.</li>
     80<li>.tgsize - enables use of TG_SIZE_EN (we recommend always adding this)</li>
     81</ul>
     82<p>Example configuration:</p>
     83<p><code>.config
    10384    .dims xyz
    104     .tgsize
    105 ```
    106 
    107 ### .dims
    108 
    109 Syntax: .dims DIMENSIONS
    110 
    111 This pseudo-op must be inside kernel configuration (`.config`). Defines which dimensions
    112 (from the list: x, y, z) are used to determine the kernel execution space.
    113 
    114 ### .entry
    115 
    116 Syntax: .entry ADDRESS, VALUE
    117 
    118 Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:
    119 
    120 ```
    121 .entry 0x0000b848, 0x000c0080
     85    .tgsize</code></p>
     86<h3>.dims</h3>
     87<p>Syntax: .dims DIMENSIONS</p>
     88<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines which dimensions
     89(from the list: x, y, z) are used to determine the kernel execution space.</p>
     90<h3>.entry</h3>
     91<p>Syntax: .entry ADDRESS, VALUE</p>
     92<p>Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:</p>
     93<p><code>.entry 0x0000b848, 0x000c0080
    12294.entry 0x0000b84c, 0x00001788
    123 .entry 0x0000b860, 0x00000000
    124 ```
    125 
    126 ### .floatmode
    127 
    128 Syntax: .floatmode BYTE-VALUE
    129 
    130 This pseudo-op must be inside kernel configuration (`.config`). Defines float-mode.
    131 
    132 ### .globaldata
    133 
    134 Go to constant global data section (`.rodata`).
    135 
    136 ### .ieeemode
    137 
    138 Syntax: .ieeemode
    139 
    140 This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.
    141 
    142 ### .kcode
    143 
    144 Syntax: .kcode KERNEL1,.... 
    145 Syntax: .kcode +
    146 
    147 Opens a block of code that belongs to the specified kernels. By default, any code between
     95.entry 0x0000b860, 0x00000000</code></p>
     96<h3>.floatmode</h3>
     97<p>Syntax: .floatmode BYTE-VALUE</p>
     98<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines float-mode.</p>
     99<h3>.globaldata</h3>
     100<p>Go to constant global data section (<code>.rodata</code>).</p>
     101<h3>.ieeemode</h3>
     102<p>Syntax: .ieeemode</p>
     103<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set ieee-mode.</p>
     104<h3>.kcode</h3>
     105<p>Syntax: .kcode KERNEL1,....<br />
     106Syntax: .kcode +</p>
     107<p>Opens a block of code that belongs to the specified kernels. By default, any code between
    148108two consecutive kernel labels belongs to the kernel named by the first label.
    149109This pseudo-operation can change the membership of the code to the specified kernels.
    150 You can nest `.kcode` any number of times. Each nested .kcode adds or removes code membership
     110You can nest <code>.kcode</code> any number of times. Each nested .kcode adds or removes code membership
    151111for kernels. The most important reason why this feature was added is register usage
    152 calculation. Any kernel given in this pseudo-operation must already be defined.
    153 
    154 Sample usage:
    155 
    156 ```
    157 .kcode + # this code belongs to all kernels
     112calculation. Any kernel given in this pseudo-operation must already be defined.</p>
     113<p>Sample usage:</p>
     114<p><code>.kcode + # this code belongs to all kernels
    158115.kcodeend
    159116.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    160117    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    161118    .kcodeend
    162 .kcodeend
    163 ```
    164 
    165 ### .kcodeend
    166 
    167 Close `.kcode` clause. Refer to `.kcode`.
    168 
    169 ### .localsize
    170 
    171 Syntax: .localsize SIZE
    172 
    173 This pseudo-op must be inside kernel configuration (`.config`). Defines initial
    174 local memory size used by kernel.
    175 
    176 ### .pgmrsrc2
    177 
    178 Syntax: .pgmrsrc2 VALUE
    179 
    180 This pseudo-op must be inside kernel configuration (`.config`).
     119.kcodeend</code></p>
     120<h3>.kcodeend</h3>
     121<p>Close <code>.kcode</code> clause. Refer to <code>.kcode</code>.</p>
     122<h3>.localsize</h3>
     123<p>Syntax: .localsize SIZE</p>
     124<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines initial
     125local memory size used by kernel.</p>
     126<h3>.pgmrsrc2</h3>
     127<p>Syntax: .pgmrsrc2 VALUE</p>
     128<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
    181129Defines the value of PGMRSRC2, except for bits which can be set by other
    182 config pseudo-operations.
    183 
    184 ### .priority
    185 
    186 Syntax: .priority PRIORITY
    187 
    188 This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).
    189 
    190 ### .proginfo
    191 
    192 Open progInfo definition. Must be inside kernel.
     130config pseudo-operations.</p>
     131<h3>.priority</h3>
     132<p>Syntax: .priority PRIORITY</p>
     133<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines priority (0-3).</p>
     134<h3>.proginfo</h3>
     135<p>Open progInfo definition. Must be inside kernel.
    193136ProgInfo must contain 3 entries. ProgInfo cannot be defined if the kernel config
    194 was defined (by using `.config`).
    195 
    196 ### .scratchbuffer
    197 
    198 Syntax: .scratchbuffer SIZE
    199 
    200 This pseudo-op must be inside kernel configuration (`.config`). Defines scratchbuffer size.
    201 
    202 ### .sgprsnum
    203 
    204 Syntax: .sgprsnum REGNUM
    205 
    206 This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
    207 registers which can be used during kernel execution.
    208 
    209 ### .tgsize
    210 
    211 This pseudo-op must be inside kernel configuration (`.config`).
    212 Enable usage of the TG_SIZE_EN. Should be set.
    213 
    214 ### .userdatanum
    215 
    216 Syntax: .userdatanum NUMBER
    217 
    218 This pseudo-op must be inside kernel configuration (`.config`). Set number of
    219 registers for USERDATA.
    220 
    221 ### .vgprsnum
    222 
    223 Syntax: .vgprsnum REGNUM
    224 
    225 This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
    226 registers which can be used during kernel execution.
    227 
    228 ## Sample code
    229 
    230 This is a sample kernel setup:
    231 
    232 ```
    233 .kernel DCT
     137was defined (by using <code>.config</code>).</p>
     138<h3>.scratchbuffer</h3>
     139<p>Syntax: .scratchbuffer SIZE</p>
     140<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines scratchbuffer size.</p>
     141<h3>.sgprsnum</h3>
     142<p>Syntax: .sgprsnum REGNUM</p>
     143<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of scalar
     144registers which can be used during kernel execution.</p>
     145<h3>.tgsize</h3>
     146<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     147Enable usage of the TG_SIZE_EN. Should be set.</p>
     148<h3>.userdatanum</h3>
     149<p>Syntax: .userdatanum NUMBER</p>
     150<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of
     151registers for USERDATA.</p>
     152<h3>.vgprsnum</h3>
     153<p>Syntax: .vgprsnum REGNUM</p>
     154<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of vector
     155registers which can be used during kernel execution.</p>
     156<h2>Sample code</h2>
     157<p>This is a sample kernel setup:</p>
     158<p><code>.kernel DCT
    234159    .args
    235160        .arg global, 8, 8, 8, zext, general
     
    245170        .entry 0x0000b848, 0x000c0183
    246171        .entry 0x0000b84c, 0x00001788
    247         .entry 0x0000b860, 0x00000000
    248 ```
    249 
    250 with kernel configuration:
    251 
    252 ```
    253     .args
     172        .entry 0x0000b860, 0x00000000</code></p>
     173<p>with kernel configuration:</p>
     174<p><code>.args
    254175        .arg global, 8, 8, 8, zext, general
    255176        .arg global, 8, 8, 8, zext, general
     
    263184    .config
    264185        .dims xyz
    265         .tgsize
    266 ```
    267 
    268 All code:
    269 
    270 ```
    271 .gallium
     186        .tgsize</code></p>
     187<p>All code:</p>
     188<p><code>.gallium
    272189.gpu CapeVerde
    273190.kernel DCT
     
    291208/*c0038107         */ s_load_dword    s7, s[0:1], 0x7
    292209/* we skip rest of instruction to demonstrate how to write GalliumCompute program */
    293 /*bf810000         */ s_endpgm
    294 ```
    295 }}}
     210/*bf810000         */ s_endpgm</code></p>}}}
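
The nesting behaviour of `.kcode` / `.kcodeend` described in the page can be pictured as a stack of kernel sets. The Python sketch below is only an illustrative model, not the assembler's implementation: the function name `kcode_membership`, the treatment of a bare kernel name as "add this kernel", and the empty starting set are all assumptions made for the example.

```python
def kcode_membership(directives, all_kernels, initial=()):
    """Model .kcode/.kcodeend nesting (hypothetical helper, not part of CLRX).

    Returns the set of kernels that own the code following each directive.
    Assumptions: '+' selects every defined kernel, '-name' removes a kernel,
    and 'name' or '+name' adds one, always relative to the enclosing level.
    """
    stack = [frozenset(initial)]          # one membership set per open .kcode
    history = []
    for directive in directives:
        parts = directive.split(None, 1)
        op = parts[0]
        if op == ".kcodeend":
            if len(stack) == 1:
                raise ValueError(".kcodeend without matching .kcode")
            stack.pop()                   # restore the enclosing membership
        elif op == ".kcode":
            current = set(stack[-1])
            names = parts[1].split(",") if len(parts) > 1 else []
            for raw in names:
                name = raw.strip()
                if name == "+":
                    current |= set(all_kernels)
                elif name.startswith("-"):
                    current.discard(name[1:])
                else:
                    kernel = name.lstrip("+")
                    if kernel not in all_kernels:
                        # the page requires kernels to be defined already
                        raise ValueError("undefined kernel: " + kernel)
                    current.add(kernel)
            stack.append(frozenset(current))
        else:
            raise ValueError("unexpected directive: " + op)
        history.append(set(stack[-1]))
    return history


# The "Sample usage" sequence from the .kcode section, without comments.
sample = [
    ".kcode +",
    ".kcodeend",
    ".kcode kernel1, kernel2",
    ".kcode -kernel1",
    ".kcodeend",
    ".kcodeend",
]
steps = kcode_membership(sample, ["kernel1", "kernel2", "kernel3"])
```

Under these assumptions, the nested `.kcode -kernel1` leaves only `kernel2`, and the following `.kcodeend` restores the `kernel1, kernel2` membership of the enclosing block, which matches the comments in the sample usage.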