Changes between Version 17 and Version 18 of ClrxAsmRocm


Timestamp:
02/07/18 21:00:40 (6 years ago)
Author:
trac
Comment:

--

  • ClrxAsmRocm

    v17 v18  
    1717<ul>
    1818<li>kernel configuration</li>
    19 <li>kernel code and data (in <code>.text</code> section`)</li>
     19<li>kernel code and data (in <code>.text</code> section)</li>
    2020</ul>
    2121<p>Order of these parts doesn't matter.</p>
    2222<p>Kernel function should be aligned to a 256-byte boundary.</p>
     23<p>Additional kernel and binary information is stored in the metadata ELF note.
     24It holds information about <code>printf</code> calls, the kernel configuration and its arguments.</p>
    2325<h2>Register usage setup</h2>
    2426<p>The CLRX assembler automatically sets number of used VGPRs and number of used SGPRs.
     
    4446<h3>.config</h3>
    4547<p>Open kernel configuration. Must be inside kernel.</p>
     48<p>The kernel metadata info config pseudo-ops:</p>
     49<ul>
     50<li>.arg - add kernel argument</li>
     51<li>.md_language - kernel language</li>
     52<li>.cws, .reqd_work_group_size - reqd_work_group_size</li>
     53<li>.work_group_size_hint - work_group_size_hint</li>
     54<li>.fixed_work_group_size - fixed work group size</li>
     55<li>.max_flat_work_group_size - max flat work group size</li>
     56<li>.vectypehint - vector type hint</li>
     57<li>.runtime_handle - runtime handle symbol name</li>
     58<li>.md_kernarg_segment_align - kernel argument segment alignment</li>
     59<li>.md_kernarg_segment_size - kernel argument segment size</li>
     60<li>.md_group_segment_fixed_size - group segment fixed size</li>
     61<li>.md_private_segment_fixed_size - private segment fixed size</li>
     62<li>.md_symname - kernel symbol name</li>
     63<li>.md_sgprsnum - number of SGPRs</li>
     64<li>.md_vgprsnum - number of VGPRs</li>
     65<li>.spilledsgprs - number of spilled SGPRs</li>
     66<li>.spilledvgprs - number of spilled VGPRs</li>
     67<li>.md_wavefront_size - wavefront size</li>
     68</ul>
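<p>A minimal sketch of how several of the metadata pseudo-ops listed above might be combined inside one kernel configuration. The kernel name <code>myKernel</code> and all numeric values are hypothetical placeholders; the argument forms follow the Syntax lines given for each pseudo-op on this page.</p>

```asm
.kernel myKernel                         # hypothetical kernel name
    .config
        .md_language "OpenCL", 1, 2      # language name, then MAJOR, MINOR
        .reqd_work_group_size 64, 1, 1   # long spelling of .cws
        .max_flat_work_group_size 256    # placeholder size
        .md_kernarg_segment_size 64      # placeholder size
        .md_kernarg_segment_align 16     # placeholder alignment
        .md_wavefront_size 64            # if omitted, taken from HSA config
```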
    4669<h3>.control_directive</h3>
    4770<p>Open control directive section. This section must be 128 bytes. The content of this
    4871section will be stored in control_directive field in kernel configuration.
    4972Must be defined inside kernel.</p>
     73<h3>.cws, .reqd_work_group_size</h3>
     74<p>Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]
     75Syntax: .reqd_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]</p>
     76<p>This pseudo-operation must be inside any kernel configuration.
     77Set reqd_work_group_size hint for this kernel in metadata info.</p>
    5078<h3>.debug_private_segment_buffer_sgpr</h3>
    5179<p>Syntax: .debug_private_segment_buffer_sgpr SGPRREG</p>
     
    6189<h3>.dims</h3>
    6290<p>Syntax: .dims DIMENSIONS</p>
    63 <p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines what dimensions
     91<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Define what dimensions
    6492(from list: x, y, z) will be used to determine space of the kernel execution.</p>
    6593<h3>.dx10clamp</h3>
     
    73101<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
    74102Set exception mask in PGMRSRC2 register value. Value should be 7-bit.</p>
     103<h3>.fixed_work_group_size</h3>
     104<p>Syntax: .fixed_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]</p>
     105<p>This pseudo-operation must be inside any kernel configuration.
     106Set fixed_work_group_size for this kernel in metadata info.</p>
    75107<h3>.fkernel</h3>
    76108<p>Mark given kernel as function in ROCm. Must be inside kernel.</p>
    77109<h3>.floatmode</h3>
    78110<p>Syntax: .floatmode BYTE-VALUE</p>
    79 <p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines float-mode.
     111<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Define float-mode.
    80112Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.</p>
    81113<h3>.gds_segment_size</h3>
     
    138170<h3>.localsize</h3>
    139171<p>Syntax: .localsize SIZE</p>
    140 <p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines initial
     172<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Define initial
    141173local memory size used by kernel.</p>
    142174<h3>.machine</h3>
     
    144176<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set
    145177machine version fields in kernel configuration.</p>
     178<h3>.max_flat_work_group_size</h3>
     179<p>Syntax: .max_flat_work_group_size SIZE</p>
     180<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     181Set max flat work group size in metadata info.</p>
    146182<h3>.max_scratch_backing_memory</h3>
    147183<p>Syntax: .max_scratch_backing_memory SIZE</p>
    148184<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set
    149185<code>max_scratch_backing_memory_byte_size</code> field in kernel configuration.</p>
     186<h3>.md_group_segment_fixed_size</h3>
     187<p>Syntax: .md_group_segment_fixed_size SIZE</p>
     188<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     189Set group segment fixed size in metadata info.</p>
     190<h3>.md_kernarg_segment_align</h3>
     191<p>Syntax: .md_kernarg_segment_align ALIGNMENT</p>
     192<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     193Set kernel argument segment alignment in metadata info.</p>
     194<h3>.md_kernarg_segment_size</h3>
     195<p>Syntax: .md_kernarg_segment_size SIZE</p>
     196<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     197Set kernel argument segment size in metadata info.</p>
     198<h3>.md_private_segment_fixed_size</h3>
     199<p>Syntax: .md_private_segment_fixed_size SIZE</p>
     200<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     201Set private segment fixed size in metadata info.</p>
     202<h3>.md_symname</h3>
     203<p>Syntax: .md_symname "SYMBOLNAME"</p>
     204<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     205Set kernel symbol name in metadata info. It should be in format "NAME@kd".</p>
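<p>As an illustration of the "NAME@kd" format, a sketch for a kernel called <code>vectorAdd</code> (the name is borrowed from the sample code; treat the fragment as an assumption about typical use, not a required setting):</p>

```asm
.kernel vectorAdd
    .config
        .md_symname "vectorAdd@kd"   # NAME@kd with NAME = vectorAdd
```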
     206<h3>.md_language</h3>
     207<p>Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]</p>
     208<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     209Set kernel language and its version in metadata info. The language name is given as a string.</p>
     210<h3>.md_sgprsnum</h3>
     211<p>Syntax: .md_sgprsnum REGNUM</p>
     212<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     213Define number of scalar registers for kernel in metadata info.</p>
     214<h3>.md_version</h3>
     215<p>Syntax: .md_version MAJOR, MINOR</p>
     216<p>This pseudo-op defines the metadata format version.</p>
     217<h3>.md_wavefront_size</h3>
     218<p>Syntax: .md_wavefront_size SIZE</p>
     219<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     220Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.</p>
     221<h3>.md_vgprsnum</h3>
     222<p>Syntax: .md_vgprsnum REGNUM</p>
     223<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     224Define number of vector registers for kernel in metadata info.</p>
     225<h3>.metadata</h3>
     226<p>Go to metadata (metadata ELF note) section.</p>
    150227<h3>.newbinfmt</h3>
    151228<p>This pseudo-op sets the new binary format.</p>
     
    153230<p>Syntax: .pgmrsrc1 VALUE</p>
    154231<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
    155 Defines value of the PGMRSRC1.</p>
     232Define value of the PGMRSRC1.</p>
    156233<h3>.pgmrsrc2</h3>
    157234<p>Syntax: .pgmrsrc2 VALUE</p>
    158235<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
    159 Defines value of the PGMRSRC2. If dimensions is set then bits that controls dimension setup
     236Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
    160237will be ignored. SCRATCH_EN bit will be ignored.</p>
     238<h3>.printf</h3>
     239<p>Syntax: .printf [ID][,ARGSIZE,....],"FORMAT"</p>
     240<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     241Adds a new printf info entry to metadata info. The first argument is the ID; it is optional
     242and must be unique when given. The next arguments are the argument sizes for the printf call.
     243The last argument is the format string.</p>
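<p>A short sketch of two printf entries following the Syntax line above. The kernel name, IDs and format strings are hypothetical, and the sizes are assumed to be in bytes, as with the sizes in the <code>.arg</code> lines of the sample code.</p>

```asm
.kernel myKernel                          # hypothetical kernel name
    .config
        .printf 1, 4, "value: %d\n"       # ID 1, one argument of size 4
        .printf 2, 4, 8, "%d at %p\n"     # ID 2, two arguments of sizes 4 and 8
```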
    161244<h3>.priority</h3>
    162245<p>Syntax: .priority PRIORITY</p>
    163 <p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines priority (0-3).</p>
     246<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Define priority (0-3).</p>
    164247<h3>.private_elem_size</h3>
    165248<p>Syntax: .private_elem_size ELEMSIZE</p>
     
    183266<code>reserved_vgpr_first</code> and <code>reserved_vgpr_count</code> fields in kernel configuration.
    184267<code>reserved_vgpr_count</code> is filled with the number of registers (LASTREG-FIRSTREG+1).</p>
     268<h3>.runtime_handle</h3>
     269<p>Syntax: .runtime_handle "SYMBOLNAME"</p>
     270<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     271Set runtime handle in metadata info.</p>
    185272<h3>.runtime_loader_kernel_symbol</h3>
    186273<p>Syntax: .runtime_loader_kernel_symbol ADDRESS</p>
     
    189276<h3>.scratchbuffer</h3>
    190277<p>Syntax: .scratchbuffer SIZE</p>
    191 <p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Defines scratchbuffer size.</p>
     278<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Define scratchbuffer size.</p>
    192279<h3>.sgprsnum</h3>
    193280<p>Syntax: .sgprsnum REGNUM</p>
     
    195282registers which can be used during kernel execution.
    196283It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.</p>
     284<h3>.spilledsgprs</h3>
     285<p>Syntax: .spilledsgprs REGNUM</p>
     286<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of scalar
     287registers to spill in scratch buffer (in metadata info).</p>
     288<h3>.spilledvgprs</h3>
     289<p>Syntax: .spilledvgprs REGNUM</p>
     290<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of vector
     291registers to spill in scratch buffer (in metadata info).</p>
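<p>The register-related metadata pseudo-ops could be used together roughly as follows. This is only a sketch: the kernel name and every register count are made-up placeholders, not values the assembler requires.</p>

```asm
.kernel myKernel             # hypothetical kernel name
    .config
        .md_sgprsnum 14      # number of SGPRs recorded in metadata info
        .md_vgprsnum 6       # number of VGPRs recorded in metadata info
        .spilledsgprs 0      # scalar registers spilled to scratch buffer
        .spilledvgprs 0      # vector registers spilled to scratch buffer
```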
    197292<h3>.target</h3>
    198293<p>Syntax: .target "TARGET"</p>
     
    251346<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set number of
    252347registers for USERDATA.</p>
     348<h3>.vectypehint</h3>
     349<p>Syntax: .vectypehint "OPENCLTYPE"</p>
     350<p>This pseudo-op must be inside kernel configuration (<code>.config</code>).
     351Set vectypehint for kernel in metadata info. The argument is OpenCL type.</p>
    253352<h3>.vgprsnum</h3>
    254353<p>Syntax: .vgprsnum REGNUM</p>
     
    279378<p>This pseudo-op must be inside kernel configuration (<code>.config</code>). Set
    280379<code>workitem_vgpr_count</code> field in kernel configuration.</p>
     380<h3>.work_group_size_hint</h3>
     381<p>Syntax: .work_group_size_hint SIZEHINT[, SIZEHINT[, SIZEHINT]]</p>
     382<p>This pseudo-operation must be inside any kernel configuration.
     383Set work_group_size_hint for this kernel in metadata info.</p>
    281384<h2>Sample code</h2>
    282385<p>This is a sample of the kernel setup:</p>
     
    351454/*32060200         */ v_add_u32       v3, vcc, s0, v1
    352455...</code></p>
     456<p>The sample with metadata info:</p>
     457<p><code>.rocm
     458.gpu Fiji
     459.arch_minor 0
     460.arch_stepping 4
     461.eflags 2
     462.newbinfmt
     463.tripple "amdgcn-amd-amdhsa-amdgizcl"
     464.md_version 1, 0
     465.kernel vectorAdd
     466    .config
     467        .dims x
     468        .codeversion 1, 1
     469        .use_private_segment_buffer
     470        .use_dispatch_ptr
     471        .use_kernarg_segment_ptr
     472        .private_elem_size 4
     473        .use_ptr64
     474        .kernarg_segment_align 16
     475        .group_segment_align 16
     476        .private_segment_align 16
     477    .control_directive
     478        .fill 128, 1, 0x00
     479    .config
     480        .md_language "OpenCL", 1, 2
     481        .arg n, "uint", 4, , value, u32
     482        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
     483        .arg b, "float*", 8, , globalbuf, f32, global, default const
     484        .arg c, "float*", 8, , globalbuf, f32, global, default
     485        .arg , "", 8, , gox, i64
     486        .arg , "", 8, , goy, i64
     487        .arg , "", 8, , goz, i64
     488        .arg , "", 8, , printfbuf, i8
     489.text
     490vectorAdd:
     491.skip 256           # skip ROCm kernel configuration (required)
     492...</code></p>
    353493}}}