[wiki:ClrxToc Back to Table of content] {{{ #!html
GalliumCompute is an open-source OpenCL implementation for the Mesa3D drivers. It is divided into three components: CLover, libclc and LLVM AMDGPU. Since LLVM v3.6 and Mesa3D v10.5, the GalliumCompute binary format contains native code. CLRadeonExtender supports only these binaries.
The binary format contains kernel information and the main binary in the ELF format.
The main .text section contains the code for all kernels. Optionally, the .rodata section
contains constant global data for all kernels.
The main binary holds the kernel configuration (ProgInfo) in the .AMDGPU.config section.
ProgInfo holds three addresses and values that describe the runtime environment for a kernel:
floating point setup, register usage, local data usage and the rest.
The assembler source code is divided into three parts:
kernel configuration
constant global data (in the .rodata section)
code (in the .text section)
The order of these parts doesn't matter.
Kernel functions should be aligned to a 256-byte boundary.
The assembler for the GalliumCompute format counts all SGPR registers and adds extra registers (VCC, FLAT_SCRATCH, XNACK_MASK), if any are used, to the register pool. The VCC register is included by default.
Syntax: .arch_minor ARCH_MINOR
Set architecture minor number. Used only if LLVM version is 4.0.0 or later.
Syntax: .arch_stepping ARCH_STEPPING
Set architecture stepping number. Used only if LLVM version is 4.0.0 or later.
Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]
Adds a kernel argument definition. Must be inside argument configuration. The first argument
is the type, for example scalar, global or local.
The second argument is the size of the argument. The third argument is targetSize, which
should be a multiple of 4. The fourth argument is the target alignment. By default, the target
alignment is a power of 2 not less than the size.
The fifth argument determines how a numeric value is extended to a larger target size:
sext - sign extend, zext - zero extend. If the argument is smaller than 4 bytes,
then sext can be used to define a signed integer and zext an unsigned integer.
The sixth argument is the semantic, for example general, griddim or gridoffset.
Example argument definition:
.arg scalar, 4, 4, 4, zext, general
.arg global, 8, 8, 8, zext, general
.arg scalar, 2, 4, 4, sext, general # short
.arg scalar, 16, 16, 16, zext, general # uint4 or double2
.arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
.arg scalar, 4, 4, 4, zext, gridoffset # shortcut .arg gridoffset
The last two arguments (griddim, gridoffset) must be defined in every kernel definition.
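As an illustration, the argument list for a hypothetical OpenCL kernel `void scale(global float* data, short factor)` might look like the sketch below. The kernel name and signature are assumptions made for this example, and the 8-byte pointer size simply follows the sample definitions above.

```asm
.kernel scale                               # hypothetical kernel name
    .args
        .arg global, 8, 8, 8, zext, general # global float* data
        .arg scalar, 2, 4, 4, sext, general # short factor, sign-extended to 4 bytes
        .arg griddim                        # shortcut form of the griddim argument
        .arg gridoffset                     # shortcut form of the gridoffset argument
```

The two trailing arguments are present because griddim and gridoffset are expected in every kernel definition.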
Open kernel argument configuration. Must be inside kernel.
Syntax: .call_convention CALL_CONV
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set call convention for kernel.
Syntax: .codeversion MAJOR, MINOR
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set AMD code version.
Open kernel configuration. Must be inside a kernel. Kernel configuration cannot be
defined if proginfo configuration was defined (by using .proginfo).
The following pseudo-ops can be inside kernel config:
Example configuration:
.config
.dims xyz
.tgsize
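A somewhat fuller configuration, using only pseudo-ops described on this page, could look like the following sketch. The kernel name and all numeric values are illustrative assumptions, not recommended settings, and the argument definitions are left out for brevity.

```asm
.kernel sample              # hypothetical kernel name
    .config
        .dims xy            # kernel execution uses the X and Y dimensions
        .sgprsnum 16        # illustrative scalar register count
        .vgprsnum 8         # illustrative vector register count
        .floatmode 0xc0     # documented default float mode
        .priority 0         # allowed range is 0-3
        .localsize 256      # illustrative initial local memory size
        .scratchbuffer 0    # no scratch buffer
        .userdatanum 4      # illustrative number of USERDATA registers
        .ieeemode
        .tgsize
```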
Open control directive section. This section must be 128 bytes long. The content of this section is stored in the control_directive field in kernel configuration. Must be defined inside a kernel. Can be used only if LLVM version is 4.0.0 or later.
Syntax: .debug_private_segment_buffer_sgpr SGPRREG
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set debug_private_segment_buffer_sgpr field in
kernel configuration.
Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set debug_wavefront_private_segment_offset_sgpr field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config).
Enable usage of the DEBUG_MODE.
Syntax: .dims DIMENSIONS
This pseudo-op must be inside kernel configuration (.config). Defines which dimensions
(from the list: x, y, z) are used to determine the space of the kernel execution.
Syntax: .driver_version VERSION
Set driver (Mesa3D) version for this binary. Version in form: MajorVersion*100+MinorVersion. This pseudo-op replaces driver info.
This pseudo-op must be inside kernel configuration (.config).
Enable usage of the DX10_CLAMP.
Syntax: .entry ADDRESS, VALUE
Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:
.entry 0x0000b848, 0x000c0080
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
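The sample entries can be read as the annotated sketch below. The register names and bit-field layout are not stated on this page; they are taken from the AMD GCN compute register definitions, so treat the annotations as an assumption about how these addresses are interpreted.

```asm
.proginfo
    # 0xb848: COMPUTE_PGM_RSRC1 (register usage and floating point setup)
    #   0x000c0080: VGPRS field = 0, SGPRS field = 2, FLOAT_MODE = 0xc0
    .entry 0x0000b848, 0x000c0080
    # 0xb84c: COMPUTE_PGM_RSRC2 (user data, dimensions, local data usage)
    #   0x00001788: USER_SGPR = 4, TGID_X_EN, TGID_Y_EN, TGID_Z_EN and
    #   TG_SIZE_EN set, TIDIG_COMP_CNT = 2, LDS_SIZE = 0
    .entry 0x0000b84c, 0x00001788
    # 0xb860: COMPUTE_TMPRING_SIZE (scratch buffer setup), zero here
    .entry 0x0000b860, 0x00000000
```

Under this reading, the second value enables all three dimensions and TG_SIZE_EN, which is what `.dims xyz` and `.tgsize` express in a kernel configuration.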
Syntax: .exceptions EXCPMASK
This pseudo-op must be inside kernel configuration (.config).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.
Syntax: .floatmode BYTE-VALUE
This pseudo-op must be inside kernel configuration (.config). Defines float-mode.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.
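To make the default value concrete, the sketch below breaks 0xc0 into its bit fields. The field positions follow the GCN MODE register layout (FP_ROUND in the low four bits, FP_DENORM in the high four bits), which is an assumption taken from the GCN ISA documentation rather than from this page.

```asm
.config
    # 0xc0 = 0b11000000
    #   bits 0-1 (FP_ROUND, single precision)       = 0
    #   bits 2-3 (FP_ROUND, double/half precision)  = 0
    #   bits 4-5 (FP_DENORM, single precision)      = 0
    #   bits 6-7 (FP_DENORM, double/half precision) = 3
    .floatmode 0xc0
```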
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set gds_segment_size field in kernel configuration.
Syntax: .get_driver_version SYMBOL
Store current driver version to SYMBOL.
Syntax: .get_llvm_version SYMBOL
Store current LLVM compiler version to SYMBOL.
Go to constant global data section (.rodata).
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set group_segment_align field in kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DEBUG_MODE in kernel HSA configuration.
Syntax: .hsa_dims DIMENSIONS
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines which dimensions (from the list: x, y, z) are used
to determine the space of the kernel execution in kernel HSA configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DX10_CLAMP in kernel HSA configuration.
Syntax: .hsa_exceptions EXCPMASK
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set exception mask in PGMRSRC2 register value in
kernel HSA configuration. Value should be 7-bit.
Syntax: .hsa_floatmode BYTE-VALUE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines float-mode in kernel HSA configuration.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.
Syntax: .hsa_ieeemode
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set ieee-mode in kernel HSA configuration.
Syntax: .hsa_localsize SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines initial local memory size used by kernel in
kernel HSA configuration.
Syntax: .hsa_pgmrsrc1 VALUE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines value of the PGMRSRC1 in kernel HSA configuration.
Syntax: .hsa_pgmrsrc2 VALUE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines value of the PGMRSRC2 in kernel HSA configuration.
If dimensions are set, the bits that control dimension setup are ignored.
The SCRATCH_EN bit is ignored.
Syntax: .hsa_priority PRIORITY
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines priority (0-3) in kernel HSA configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the PRIV (privileged mode) in
kernel HSA configuration.
Syntax: .hsa_scratchbuffer SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Defines scratchbuffer size in kernel HSA configuration.
Syntax: .hsa_sgprsnum REGNUM
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set number of scalar registers which can be used during
kernel execution in kernel HSA configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the TG_SIZE_EN in kernel HSA configuration.
Syntax: .hsa_userdatanum NUMBER
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set number of registers for USERDATA in
kernel HSA configuration.
Syntax: .hsa_vgprsnum REGNUM
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set number of vector registers which can be used during
kernel execution in kernel HSA configuration.
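As a rough sketch, the `.hsa_` pseudo-ops that have a documented syntax on this page can be combined in one kernel configuration as shown below. This assumes the assembler is targeting LLVM 4.0.0 or later; all values are illustrative only.

```asm
.config
    .dims xyz               # dimensions for the regular configuration
    .hsa_dims xyz           # dimensions for the kernel HSA configuration
    .hsa_floatmode 0xc0     # documented default float mode
    .hsa_ieeemode
    .hsa_priority 0         # allowed range is 0-3
    .hsa_localsize 0        # illustrative value
    .hsa_scratchbuffer 0    # illustrative value
    .hsa_sgprsnum 24        # illustrative scalar register count
    .hsa_vgprsnum 16        # illustrative vector register count
```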
Syntax: .ieeemode
This pseudo-op must be inside kernel configuration (.config). Set ieee-mode.
Syntax: .kcode KERNEL1,....
Syntax: .kcode +
Open a block of code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
.kcode can be nested any number of times; each nested .kcode adds kernels to or removes
kernels from the membership of the code. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.
Sample usage:
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 # this code belongs to kernel1, kernel2
.kcode -kernel1 # this code belongs only to kernel2 (kernel1 removed)
.kcodeend
.kcodeend
Close .kcode clause. Refer to .kcode.
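A typical use is a routine shared by several kernels, so that its registers count toward each of them. The sketch below assumes that kernel1 and kernel2 were already defined with `.kernel` (definitions omitted); the `helper` label and the instructions are hypothetical.

```asm
.text
kernel1:
    s_endpgm
kernel2:
    s_endpgm
.kcode kernel1, kernel2     # code below belongs to both kernels
helper:                     # hypothetical shared routine
    v_add_f32 v0, v1, v2
    s_setpc_b64 s[2:3]
.kcodeend
```

Without the `.kcode` clause, the code after the `kernel2` label would be attributed to kernel2 only.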
Syntax: .kernarg_segment_align ALIGN
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set kernarg_segment_alignment field in
kernel configuration. Value must be a power of two.
Syntax: .kernarg_segment_size SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set kernarg_segment_byte_size field in
kernel configuration.
Syntax: .kernel_code_entry_offset OFFSET
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set kernel_code_entry_byte_offset field in
kernel configuration. This field stores the offset between the configuration and the kernel code.
The default value is 256.
Syntax: .kernel_code_prefetch_offset OFFSET
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set kernel_code_prefetch_byte_offset field in kernel
configuration.
Syntax: .kernel_code_prefetch_size OFFSET
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set kernel_code_prefetch_byte_size field in kernel configuration.
Syntax: .llvm_version VERSION
Set LLVM compiler version for this binary. Version in form: MajorVersion*100+MinorVersion. This pseudo-op replaces driver info.
Syntax: .localsize SIZE
This pseudo-op must be inside kernel configuration (.config). Defines initial
local memory size used by kernel.
Syntax: .machine KIND, MAJOR, MINOR, STEPPING
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set machine version fields in kernel configuration.
Syntax: .max_scratch_backing_memory SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set max_scratch_backing_memory_byte_size field
in kernel configuration.
Syntax: .pgmrsrc1 VALUE
This pseudo-op must be inside kernel configuration (.config).
Defines value of the PGMRSRC1.
Syntax: .pgmrsrc2 VALUE
This pseudo-op must be inside kernel configuration (.config).
Defines value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
are ignored. The SCRATCH_EN bit is ignored.
Syntax: .priority PRIORITY
This pseudo-op must be inside kernel configuration (.config). Defines priority (0-3).
Syntax: .private_elem_size ELEMSIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set private_element_size field in kernel configuration.
Must be a power of two between 2 and 16.
Syntax: .private_segment_align ALIGN
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set private_segment_alignment field in kernel
configuration. Value must be a power of two.
This pseudo-op must be inside kernel configuration (.config).
Enable usage of the PRIV (privileged mode).
Open ProgInfo definition. Must be inside a kernel.
ProgInfo must contain 3 entries. ProgInfo cannot be defined if kernel config
was defined (by using .config).
Syntax: .reserved_sgprs FIRSTREG, LASTREG
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set reserved_sgpr_first and reserved_sgpr_count
fields in kernel configuration. reserved_sgpr_count is filled with the number of registers
(LASTREG-FIRSTREG+1).
Syntax: .reserved_vgprs FIRSTREG, LASTREG
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set reserved_vgpr_first and reserved_vgpr_count
fields in kernel configuration. reserved_vgpr_count is filled with the number of registers
(LASTREG-FIRSTREG+1).
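The count arithmetic for both reserved-register pseudo-ops can be illustrated with the sketch below. The register ranges are arbitrary examples, and the comments assume that FIRSTREG and LASTREG are plain register indices.

```asm
.config
    # reserved_sgpr_first = 8, reserved_sgpr_count = 11-8+1 = 4
    .reserved_sgprs 8, 11
    # reserved_vgpr_first = 0, reserved_vgpr_count = 3-0+1 = 4
    .reserved_vgprs 0, 3
```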
Syntax: .runtime_loader_kernel_symbol ADDRESS
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set runtime_loader_kernel_symbol field in kernel
configuration.
Syntax: .scratchbuffer SIZE
This pseudo-op must be inside kernel configuration (.config). Defines scratchbuffer size.
Syntax: .sgprsnum REGNUM
This pseudo-op must be inside kernel configuration (.config). Set number of scalar
registers which can be used during kernel execution.
Syntax: .spilledsgprs REGNUM
This pseudo-op must be inside kernel configuration (.config). Set number of scalar
registers to spill to the scratch buffer. This is meaningful for LLVM 3.9 or later.
Syntax: .spilledvgprs REGNUM
This pseudo-op must be inside kernel configuration (.config). Set number of vector
registers to spill to the scratch buffer. This is meaningful for LLVM 3.9 or later.
This pseudo-op must be inside kernel configuration (.config).
Enable usage of the TG_SIZE_EN. Should be set.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable is_debug_enabled field in kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_dispatch_id field in kernel
configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_dispatch_ptr field in kernel
configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable is_dynamic_call_stack field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_flat_scratch_init field in
kernel configuration.
Syntax: .use_grid_workgroup_count DIMENSIONS
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_grid_workgroup_count_X,
enable_sgpr_grid_workgroup_count_Y and enable_sgpr_grid_workgroup_count_Z fields
in kernel configuration, according to the given dimensions.
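For instance, enabling only the X and Z workgroup counts could be written as in the sketch below. It assumes that DIMENSIONS uses the same letter-list notation as `.dims`, which this page does not state explicitly.

```asm
.config
    # enables enable_sgpr_grid_workgroup_count_X and
    # enable_sgpr_grid_workgroup_count_Z; the Y field stays disabled
    .use_grid_workgroup_count xz
```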
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_kernarg_segment_ptr field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_ordered_append_gds field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_private_segment_buffer field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_private_segment_size field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable is_ptr64 field in kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable enable_sgpr_queue_ptr field in
kernel configuration.
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Enable is_xnack_enabled field in kernel configuration.
Syntax: .userdatanum NUMBER
This pseudo-op must be inside kernel configuration (.config). Set number of
registers for USERDATA.
Syntax: .vgprsnum REGNUM
This pseudo-op must be inside kernel configuration (.config). Set number of vector
registers which can be used during kernel execution.
Syntax: .wavefront_sgpr_count REGNUM
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set wavefront_sgpr_count field in kernel configuration.
Syntax: .wavefront_size POWEROFTWO
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set wavefront_size field in kernel configuration.
Value must be a power of two.
Syntax: .workgroup_fbarrier_count COUNT
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set workgroup_fbarrier_count field in
kernel configuration.
Syntax: .workgroup_group_segment_size SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set workgroup_group_segment_byte_size field in
kernel configuration.
Syntax: .workitem_private_segment_size SIZE
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set workitem_private_segment_byte_size field in
kernel configuration.
Syntax: .workitem_vgpr_count REGNUM
This pseudo-op must be inside kernel configuration (.config) and can be used only if
LLVM version is 4.0.0 or later. Set workitem_vgpr_count field in kernel configuration.
This is a sample kernel setup:
.kernel DCT
.args
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg local, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, griddim
.arg scalar, 4, 4, 4, zext, gridoffset
.proginfo
.entry 0x0000b848, 0x000c0183
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
The same setup with kernel configuration instead of ProgInfo:
.args
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg local, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, griddim
.arg scalar, 4, 4, 4, zext, gridoffset
.config
.dims xyz
.tgsize
All code:
.gallium
.gpu CapeVerde
.kernel DCT
.args
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg global, 8, 8, 8, zext, general
.arg local, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, general
.arg scalar, 4, 4, 4, zext, griddim
.arg scalar, 4, 4, 4, zext, gridoffset
.proginfo
.entry 0x0000b848, 0x000c0183
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
.text
DCT:
/*c0030106 */ s_load_dword s6, s[0:1], 0x6
/*c0038107 */ s_load_dword s7, s[0:1], 0x7
/* the remaining instructions are omitted; this only demonstrates how to write a GalliumCompute program */
/*bf810000 */ s_endpgm