Changes between Version 1 and Version 2 of ClrxAsmGallium

Timestamp: 10/27/15 20:41:43 (8 years ago)
Author: trac
Comment: --

ClrxAsmGallium

-                      v1
+                      v2
 {{{
+#!Markdown
+## CLRadeonExtender Assembler Gallium handling
+GalliumCompute is an open-source OpenCL implementation for the Mesa3D
+#!html
+<h2>CLRadeonExtender Assembler Gallium handling</h2>
+<p>GalliumCompute is an open-source OpenCL implementation for the Mesa3D
 drivers. It is divided into three components: CLover, libclc, LLVM AMDGPU. Since LLVM v3.6
 and Mesa3D v10.5, GalliumCompute uses a binary format with native code. CLRadeonExtender
+supports only these binaries.
+## Binary format
+The binary format contains kernel information and the main binary in the ELF format.
+The main `.text` section contains the code for all kernels. Optionally,
+the `.rodata` section contains constant global data for all kernels.
+The main binary has the kernel configuration (ProgInfo) in the `.AMDGPU.config` section.
+supports only these binaries.</p>
+<h2>Binary format</h2>
+<p>The binary format contains kernel information and the main binary in the ELF format.
+The main <code>.text</code> section contains the code for all kernels. Optionally,
+the <code>.rodata</code> section contains constant global data for all kernels.
+The main binary has the kernel configuration (ProgInfo) in the <code>.AMDGPU.config</code> section.
 ProgInfo holds three address/value entries that describe the runtime environment for the kernel:
+floating point setup, register usage, local data usage and other settings.
+The assembler source code is divided into three parts:
+* kernel configuration
+* kernel constant data (in `.rodata` section)
+* kernel code (in `.text` section)
+The order of these parts doesn't matter.
+Kernel functions should be aligned to a 256-byte boundary.
+## List of the specific pseudo-operations
+### .arg
+Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]
+Adds kernel argument definition. Must be inside argument configuration.
+First argument is type:
+* scalar - scalar value
+* constant - constant pointer (32-bit ???)
+* global - global pointer (64-bit)
+* local - local pointer
+* image2d_rdonly - ??
+* image2d_wronly - ??
+* image3d_rdonly - ??
+* image3d_wronly - ??
+* sampler - ??
+* griddim - shortcut for griddim argument definition
+* gridoffset - shortcut for gridoffset argument definition
+Second argument is size of argument. Third argument is targetSize which
+floating point setup, register usage, local data usage and other settings.</p>
+<p>The assembler source code is divided into three parts:</p>
+<ul>
+<li>kernel configuration</li>
+<li>kernel constant data (in <code>.rodata</code> section)</li>
+<li>kernel code (in <code>.text</code> section)</li>
+</ul>
+<p>The order of these parts doesn't matter.</p>
+<p>Kernel functions should be aligned to a 256-byte boundary.</p>
+<h2>List of the specific pseudo-operations</h2>
+<h3>.arg</h3>
+<p>Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]</p>
+<p>Adds kernel argument definition. Must be inside argument configuration.
+First argument is type:</p>
+<ul>
+<li>scalar - scalar value</li>
+<li>constant - constant pointer (32-bit ???)</li>
+<li>global - global pointer (64-bit)</li>
+<li>local - local pointer</li>
+<li>image2d_rdonly - ??</li>
+<li>image2d_wronly - ??</li>
+<li>image3d_rdonly - ??</li>
+<li>image3d_wronly - ??</li>
+<li>sampler - ??</li>
+<li>griddim - shortcut for griddim argument definition</li>
+<li>gridoffset - shortcut for gridoffset argument definition</li>
+</ul>
+<p>Second argument is size of argument. Third argument is targetSize which
 should be a multiple of 4. Fourth argument is target alignment.
 Fifth argument determines how to extend a numeric value to a larger target size:
+`sext` - sign extend, `zext` - zero extend. If the argument is smaller than 4 bytes,
+`sext` can be used to define a signed integer and `zext` an unsigned integer.
+Sixth argument is semantic:
+* general - general argument
+* griddim - griddim argument
+* gridoffset - gridoffset argument
+* imgsize - image size
+* imgformat - image format
+Example argument definition:
+```
+.arg scalar, 4, 4, 4, zext, general
+<code>sext</code> - sign extend, <code>zext</code> - zero extend. If the argument is smaller than 4 bytes,
+<code>sext</code> can be used to define a signed integer and <code>zext</code> an unsigned integer.
+Sixth argument is semantic:</p>
+<ul>
+<li>general - general argument</li>
+<li>griddim - griddim argument</li>
+<li>gridoffset - gridoffset argument</li>
+<li>imgsize - image size</li>
+<li>imgformat - image format</li>
+</ul>
+<p>Example argument definition:</p>
+<p><code>.arg scalar, 4, 4, 4, zext, general
 .arg global, 8, 8, 8, zext, general
 .arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
+.arg scalar, 4, 4, 4, zext, gridoffset # shortcut .arg gridoffset
+```
+The last two arguments (griddim, gridoffset) must be defined in every kernel definition.
+### .args
+Open kernel argument configuration. Must be inside kernel.
+### .config
+Open kernel configuration. Must be inside kernel. Kernel configuration cannot be
+defined if proginfo configuration was defined (by using `.proginfo`).
+The following pseudo-ops can be inside kernel config:
+* .dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.
+* .floatmode VALUE - choose float mode for kernel (byte value).
+Default value is 0xc0
+* .ieeemode - choose IEEE mode for kernel
+* .localsize SIZE - initial local data size for kernel in bytes
+* .pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)
+* .priority VALUE - set priority for kernel (0-3). Default value is 0.
+* .scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.
+* .sgprsnum NUMBER - number of SGPR registers used by kernel (excluding VCC,FLAT_SCRATCH).
+By default, automatically computed by assembler.
+* .vgprsnum NUMBER - number of VGPR registers used by kernel.
+By default, automatically computed by assembler.
+* .userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.
+* .tgsize - enables use of TG_SIZE_EN (we recommend always adding this)
+Example configuration:
+```
+.config
+.arg scalar, 4, 4, 4, zext, gridoffset # shortcut .arg gridoffset</code></p>
+<p>The last two arguments (griddim, gridoffset) must be defined in every kernel definition.</p>
+<h3>.args</h3>
+<p>Open kernel argument configuration. Must be inside kernel.</p>
+<h3>.config</h3>
+<p>Open kernel configuration. Must be inside kernel. Kernel configuration cannot be
+defined if proginfo configuration was defined (by using <code>.proginfo</code>).
+The following pseudo-ops can be inside kernel config:</p>
+<ul>
+<li>.dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.</li>
+<li>.floatmode VALUE - choose float mode for kernel (byte value).
+Default value is 0xc0</li>
+<li>.ieeemode - choose IEEE mode for kernel</li>
+<li>.localsize SIZE - initial local data size for kernel in bytes</li>
+<li>.pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)</li>
+<li>.priority VALUE - set priority for kernel (0-3). Default value is 0.</li>
+<li>.scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.</li>
+<li>.sgprsnum NUMBER - number of SGPR registers used by kernel (excluding VCC,FLAT_SCRATCH).
+By default, automatically computed by assembler.</li>
+<li>.vgprsnum NUMBER - number of VGPR registers used by kernel.
+By default, automatically computed by assembler.</li>
+<li>.userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.</li>
+<li>.tgsize - enables use of TG_SIZE_EN (we recommend always adding this)</li>
+</ul>
+<p>Example configuration:</p>
+<p><code>.config
     .dims xyz
+    .tgsize
+```
+### .dims
+Syntax: .dims DIMENSIONS
+This pseudo-op must be inside kernel configuration (`.config`). Defines which dimensions
+(from the list: x, y, z) are used to determine the kernel execution space.
+### .entry
+Syntax: .entry ADDRESS, VALUE
+Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:
+```
+.entry 0x0000b848, 0x000c0080
+    .tgsize</code></p>
+<h3>.dims</h3>
+<p>Syntax: .dims DIMENSIONS</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines which dimensions
+(from the list: x, y, z) are used to determine the kernel execution space.</p>
+<h3>.entry</h3>
+<p>Syntax: .entry ADDRESS, VALUE</p>
+<p>Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:</p>
+<p><code>.entry 0x0000b848, 0x000c0080
 .entry 0x0000b84c, 0x00001788
+.entry 0x0000b860, 0x00000000
+```
+### .floatmode
+Syntax: .floatmode BYTE-VALUE
+This pseudo-op must be inside kernel configuration (`.config`). Defines float-mode.
+### .globaldata
+Go to constant global data section (`.rodata`).
+### .ieeemode
+Syntax: .ieeemode
+This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.
+### .kcode
+Syntax: .kcode KERNEL1,....
+Syntax: .kcode +
+Open code that will belong to the specified kernels. By default any code between
+.entry 0x0000b860, 0x00000000</code></p>
+<h3>.floatmode</h3>
+<p>Syntax: .floatmode BYTE-VALUE</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines float-mode.</p>
+<h3>.globaldata</h3>
+<p>Go to constant global data section (<code>.rodata</code>).</p>
+<h3>.ieeemode</h3>
+<p>Syntax: .ieeemode</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set ieee-mode.</p>
+<h3>.kcode</h3>
+<p>Syntax: .kcode KERNEL1,....<br />
+Syntax: .kcode +</p>
+<p>Open code that will belong to the specified kernels. By default any code between
 two consecutive kernel labels belongs to the kernel named by the first label.
 This pseudo-operation can change which kernels the code belongs to.
 You can nest `.kcode` any number of times. Each nested .kcode adds or removes code membership
+You can nest <code>.kcode</code> any number of times. Each nested .kcode adds or removes code membership
 for kernels. The main reason this feature was added is register usage
+calculation. Any kernel given in this pseudo-operation must be already defined.
+Sample usage:
+```
+.kcode + # this code belongs to all kernels
+calculation. Any kernel given in this pseudo-operation must be already defined.</p>
+<p>Sample usage:</p>
+<p><code>.kcode + # this code belongs to all kernels
 .kcodeend
 .kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
     .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
     .kcodeend
+.kcodeend
+```
+### .kcodeend
+Close `.kcode` clause. Refer to `.kcode`.
+### .localsize
+Syntax: .localsize SIZE
+This pseudo-op must be inside kernel configuration (`.config`). Defines initial
+local memory size used by kernel.
+### .pgmrsrc2
+Syntax: .pgmrsrc2 VALUE
+This pseudo-op must be inside kernel configuration (`.config`).
+.kcodeend</code></p>
+<h3>.kcodeend</h3>
+<p>Close <code>.kcode</code> clause. Refer to <code>.kcode</code>.</p>
+<h3>.localsize</h3>
+<p>Syntax: .localsize SIZE</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines initial
+local memory size used by kernel.</p>
+<h3>.pgmrsrc2</h3>
+<p>Syntax: .pgmrsrc2 VALUE</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
 Defines the value of PGMRSRC2, except for bits which can be set by other
+config pseudo-operations.
+### .priority
+Syntax: .priority PRIORITY
+This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).
+### .proginfo
+Open progInfo definition. Must be inside kernel.
+config pseudo-operations.</p>
+<h3>.priority</h3>
+<p>Syntax: .priority PRIORITY</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines priority (0-3).</p>
+<h3>.proginfo</h3>
+<p>Open progInfo definition. Must be inside kernel.
 ProgInfo must contain 3 entries. ProgInfo cannot be defined if kernel config
+was defined (by using `.config`).
+### .scratchbuffer
+Syntax: .scratchbuffer SIZE
+This pseudo-op must be inside kernel configuration (`.config`). Defines scratchbuffer size.
+### .sgprsnum
+Syntax: .sgprsnum REGNUM
+This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
+registers which can be used during kernel execution.
+### .tgsize
+This pseudo-op must be inside kernel configuration (`.config`).
+Enable usage of the TG_SIZE_EN. Should be set.
+### .userdatanum
+Syntax: .userdatanum NUMBER
+This pseudo-op must be inside kernel configuration (`.config`). Set number of
+registers for USERDATA.
+### .vgprsnum
+Syntax: .vgprsnum REGNUM
+This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
+registers which can be used during kernel execution.
+## Sample code
+This is a sample kernel setup:
+```
+.kernel DCT
+was defined (by using <code>.config</code>).</p>
+<h3>.scratchbuffer</h3>
+<p>Syntax: .scratchbuffer SIZE</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines scratchbuffer size.</p>
+<h3>.sgprsnum</h3>
+<p>Syntax: .sgprsnum REGNUM</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of scalar
+registers which can be used during kernel execution.</p>
+<h3>.tgsize</h3>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
+Enable usage of the TG_SIZE_EN. Should be set.</p>
+<h3>.userdatanum</h3>
+<p>Syntax: .userdatanum NUMBER</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of
+registers for USERDATA.</p>
+<h3>.vgprsnum</h3>
+<p>Syntax: .vgprsnum REGNUM</p>
+<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of vector
+registers which can be used during kernel execution.</p>
+<h2>Sample code</h2>
+<p>This is a sample kernel setup:</p>
+<p><code>.kernel DCT
     .args
         .arg global, 8, 8, 8, zext, general
 …
         .entry 0x0000b848, 0x000c0183
         .entry 0x0000b84c, 0x00001788
+        .entry 0x0000b860, 0x00000000
+```
+with kernel configuration:
+```
+    .args
+        .entry 0x0000b860, 0x00000000</code></p>
+<p>with kernel configuration:</p>
+<p><code>.args
         .arg global, 8, 8, 8, zext, general
         .arg global, 8, 8, 8, zext, general
 …
     .config
         .dims xyz
+        .tgsize
+```
+All code:
+```
+.gallium
+        .tgsize</code></p>
+<p>All code:</p>
+<p><code>.gallium
 .gpu CapeVerde
 .kernel DCT
 …
 /*c0038107         */ s_load_dword    s7, s[0:1], 0x7
 /* we skip the rest of the instructions to demonstrate how to write a GalliumCompute program */
+/*bf810000         */ s_endpgm
+```
+}}}
+/*bf810000         */ s_endpgm</code></p>}}}
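
The page above states two arithmetic rules: kernel functions should be aligned to a 256-byte boundary, and a `.arg` value is extended to a target size that is a multiple of 4 using `sext` or `zext`. The following is a minimal Python sketch of that arithmetic only. The helper names `align_up` and `extend_arg` are hypothetical and are not part of CLRadeonExtender; how the assembler actually performs these steps is an assumption here.

```python
def align_up(offset: int, alignment: int = 256) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def extend_arg(value: int, size: int, target_size: int, numext: str) -> int:
    """Extend a size-byte integer to target_size bytes.

    numext is 'sext' (sign extend) or 'zext' (zero extend).
    Returns the unsigned bit pattern of the extended value.
    This models the arithmetic only; it is not CLRadeonExtender code.
    """
    if numext not in ("sext", "zext"):
        raise ValueError("numext must be 'sext' or 'zext'")
    if target_size % 4 != 0:
        raise ValueError("target size should be a multiple of 4")
    if target_size < size:
        raise ValueError("target size must not be smaller than size")
    bits = size * 8
    raw = value & ((1 << bits) - 1)       # keep only the low `size` bytes
    if numext == "sext" and raw >> (bits - 1):
        raw -= 1 << bits                  # sign bit set: treat as negative
    return raw & ((1 << (target_size * 8)) - 1)
```

Under these assumptions, a kernel that would start at offset 257 is placed at `align_up(257)`, which is 512, and a 1-byte argument `0xFF` extended to 4 bytes becomes `0xFFFFFFFF` with `sext` but stays `0xFF` with `zext`.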