[wiki:ClrxToc Back to Table of content] {{{ #!html
The AMD Catalyst driver provides its own OpenCL implementation, which generates its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2 and OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.
AMD OpenCL binaries contain constant global data, device and compilation information, and embedded kernel binaries. Kernel binaries are inside the .text section.
The program code is separate for each kernel; no machine code is shared between kernels.
Each kernel binary has a metadata string, ATI CAL notes and program code.
The metadata string describes the kernel arguments and the settings of the input/output buffers, constant buffers, read-only and write-only images, and local data.
ATI CAL notes are small special data fragments that describe features of the kernel.
The most important ATI CAL note is PROGINFO, which holds data needed for runtime execution, such as register usage, UAV usage and floating-point setup.
A .data section inside a kernel holds only zeroes.
The CLRX assembler allows the kernel setup to be configured in one of two ways: a human-friendly way (.config) and a way suited to quick recompilation (ATI CAL notes and the metadata string).
The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs. These values can be overridden with the '.sgprsnum' and '.vgprsnum' pseudo-ops.
To the number of used scalar registers, the assembler adds 2 registers for handling VCC.
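To make the counting rule concrete, here is a minimal Python sketch under stated assumptions: it scans instruction text for single registers (s5, v3) and ranges (s[8:11]), takes the highest index plus one, and then adds the 2 SGPRs for VCC. The helper name `count_registers` and the regex-based scan are hypothetical illustrations, not CLRX's actual implementation.

```python
import re

# Hypothetical helper: illustrates the counting rule only, not CLRX's code.
SINGLE = re.compile(r"\b([sv])(\d+)\b")         # e.g. s5, v3
RANGE = re.compile(r"\b([sv])\[(\d+):(\d+)\]")  # e.g. s[8:11]

def count_registers(lines):
    """Return (sgprs, vgprs) needed by the given instruction lines."""
    highest = {"s": -1, "v": -1}
    for line in lines:
        for kind, idx in SINGLE.findall(line):
            highest[kind] = max(highest[kind], int(idx))
        for kind, _first, last in RANGE.findall(line):
            highest[kind] = max(highest[kind], int(last))
    sgprs = highest["s"] + 1 + 2  # +2 registers for handling VCC
    vgprs = highest["v"] + 1
    return sgprs, vgprs

code = [
    "s_load_dwordx4 s[8:11], s[2:3], 0x0",
    "v_add_f32 v1, v2, v3",
    "s_mov_b32 m0, 0x8000",
]
print(count_registers(code))  # (14, 4)
```

Here the highest SGPR is s11 (12 registers) and the highest VGPR is v3 (4 registers), so the sketch reports 12 + 2 = 14 SGPRs.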
Syntax for scalar: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, unused]
Syntax for structure: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]
Syntax for image: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]
Syntax for counter32: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]
Syntax for global pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]
Syntax for local pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]
Syntax for constant pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]]
Adds a kernel argument definition. Must be inside a kernel configuration. The first argument is the argument name from the OpenCL kernel definition. The next, optional argument is the argument type name from the OpenCL kernel definition. The next argument is the argument type, for example uint, double2, float*, structure*, image2d or counter32.
The remaining arguments depend on the type of the kernel argument. STRUCTSIZE determines the size of the structure. ACCESS for an image can be one of: read_only, rdonly, write_only or wronly.
PTRSPACE determines the address space that the pointer points to. It can be one of: local, constant or global.
ACCESS for pointers can be any of: const, restrict and volatile.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id.
The last argument, unused, indicates that the argument will not be used by the kernel.
Sample usage:
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16 *,global
.arg v42,ulong16 *,global, restrict
.arg v57,structure*,82,global
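To show how the fields of a pointer argument line up with the syntax above, the following Python sketch splits a `.arg` line into its parts. It is a hypothetical illustration, not CLRX's parser. It handles only pointer arguments, assumes that several ACCESS qualifiers are written separated by spaces, and does not interpret CONSTSIZE, RESID or `unused`. The `parse_pointer_arg` name and the returned dictionary layout are also invented for the example.

```python
PTRSPACES = {"local", "constant", "global"}
POINTER_ACCESS = {"const", "restrict", "volatile"}

def parse_pointer_arg(line):
    """Split a '.arg' line for a pointer argument into named fields.

    Hypothetical helper; fields after ACCESS (CONSTSIZE, RESID, unused)
    are ignored, and multiple ACCESS qualifiers are assumed to be
    separated by spaces.
    """
    if not line.startswith(".arg"):
        raise ValueError("not an .arg line")
    parts = [p.strip() for p in line[len(".arg"):].split(",")]
    result = {"name": parts[0], "typename": None, "structsize": None}
    i = 1
    if parts[i].startswith('"'):        # optional "ARGTYPENAME"
        result["typename"] = parts[i].strip('"')
        i += 1
    result["type"] = parts[i]           # ARGTYPE, e.g. float*
    i += 1
    if parts[i].isdigit():              # optional STRUCTSIZE
        result["structsize"] = int(parts[i])
        i += 1
    if parts[i] not in PTRSPACES:
        raise ValueError("expected PTRSPACE, got %r" % parts[i])
    result["ptrspace"] = parts[i]
    i += 1
    access = parts[i].split() if i < len(parts) else []  # optional ACCESS
    if not set(access) <= POINTER_ACCESS:
        raise ValueError("unknown ACCESS qualifier in %r" % parts[i])
    result["access"] = access
    return result

print(parse_pointer_arg(".arg v57,structure*,82,global"))
```

For `.arg v57,structure*,82,global`, the sketch reads 82 as STRUCTSIZE because it is a number placed before PTRSPACE.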
Syntax: .boolconsts
This pseudo-operation must be inside a kernel. Open the ATI_BOOL32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .calnote CALNOTEID
This pseudo-operation must be inside a kernel. Open an ATI CAL note with the given CALNOTEID.
Syntax: .cbid
Syntax: .cbid VALUE
If this pseudo-operation is inside an ATI_CONSTANT_BUFFERS CAL note, it adds an entry to that CAL note. If it is inside a kernel configuration, it sets the constant buffer id.
Syntax: .cbmask INDEX, SIZE
This pseudo-operation must be inside an ATI_CONSTANT_BUFFERS CAL note. Add an entry to the ATI_CONSTANT_BUFFERS CAL note.
Syntax: .compile_options "STRING"
Set compile options for this binary.
Syntax: .condout [VALUE]
Syntax: .condout VALUE
If this pseudo-operation is inside a kernel, it opens the ATI_CONDOUT CAL note; each subsequent occurrence in the same kernel adds a new CAL note. The optional argument appends a 4-byte value to the content of this CAL note. If this pseudo-operation is inside a kernel configuration, it sets the CONDOUT value.
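Syntax: .config
Open the kernel configuration. Must be inside a kernel. A kernel configuration cannot be defined if any CAL note, metadata or header has been defined. The pseudo-ops that can be used inside a kernel configuration include .arg, .dims, .userdata, .sgprsnum and .vgprsnum; see the individual descriptions.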
Syntax: .constantbuffers
This pseudo-operation must be inside a kernel. Open the ATI_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]
This pseudo-operation must be inside a kernel configuration. Set the reqd_work_group_size hint for this kernel. In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes instead of ones in the last two components. We recommend filling in all components.
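The following Python sketch is a hypothetical model of the defaulting rule behind that recommendation. Any component that is not given becomes 1, so a one-dimensional hint of 64 means a 64×1×1 work group. The pre-0.1.7 bug would have left 0 in the last two components instead. The function name is invented, and treating an omitted first component the same way is an assumption.

```python
def reqd_work_group_size(*hints):
    """Expand up to three optional SIZEHINT values into an (x, y, z) triple.

    Hypothetical model: every omitted component (missing or None) becomes 1.
    CLRX versions before 0.1.7 put zeroes into the last two components
    instead, which is why filling in all three is recommended.
    """
    if len(hints) > 3:
        raise ValueError("at most three SIZEHINT values")
    padded = list(hints) + [None] * (3 - len(hints))
    return tuple(1 if h is None else h for h in padded)

print(reqd_work_group_size(64))          # (64, 1, 1)
print(reqd_work_group_size(8, None, 2))  # (8, 1, 2)
```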
Syntax: .dims DIMENSIONS
This pseudo-operation must be inside a kernel configuration. Define which dimensions (from the list x, y, z) are used to determine the space of the kernel execution.
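The following sketch shows how a dimension list could map onto PGMRSRC2 bits. It assumes the GCN COMPUTE_PGM_RSRC2 layout, with TGID_X_EN, TGID_Y_EN and TGID_Z_EN at bits 7-9 and TIDIG_COMP_CNT at bits 11-12. It also assumes TIDIG_COMP_CNT follows the highest requested dimension. Both the function and the mapping are hypothetical rather than taken from CLRX. For a cross-check, 0x00400998 is the value stored under PROGINFO key 0x00002e13 in the sample at the end of this chapter (the dword address of COMPUTE_PGM_RSRC2, 0xB84C / 4). For `.dims xy`, the sketch reproduces the corresponding bits of that value.

```python
# Assumed GCN COMPUTE_PGM_RSRC2 bit positions (not taken from CLRX).
TGID_EN_SHIFT = {"x": 7, "y": 8, "z": 9}
TIDIG_COMP_CNT_SHIFT = 11

def dims_to_pgmrsrc2_bits(dims):
    """Return the PGMRSRC2 bits implied by a '.dims' string such as 'xy'."""
    if not dims or any(d not in TGID_EN_SHIFT for d in dims):
        raise ValueError("dims must be a non-empty combination of x, y, z")
    bits = 0
    for d in dims:
        bits |= 1 << TGID_EN_SHIFT[d]        # enable the work-group id for d
    highest = max("xyz".index(d) for d in dims)
    bits |= highest << TIDIG_COMP_CNT_SHIFT  # work-item id components minus 1
    return bits

dim_mask = (0b111 << 7) | (0b11 << 11)
print(hex(dims_to_pgmrsrc2_bits("xy")))  # 0x980
print(hex(0x00400998 & dim_mask))        # 0x980
```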
Syntax: .driver_info "INFO"
Set driver info for this binary.
Syntax: .driver_version VERSION
Set the driver version for this binary, in the form MajorVersion*100+MinorVersion. This pseudo-op replaces the driver info.
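For example, the sample at the end of this chapter has the driver info string "Driver version: 1702.3". Reading that as major 1702 and minor 3, it corresponds to 1702*100+3 = 170203. Here is a minimal Python sketch of the encoding and its inverse. The function names are hypothetical, and the two-digit limit on the minor version is the sketch's own check, added so that decoding is unambiguous.

```python
def encode_driver_version(major, minor):
    """Pack a driver version as MajorVersion*100 + MinorVersion."""
    if not 0 <= minor < 100:
        raise ValueError("minor version must fit in two decimal digits")
    return major * 100 + minor

def decode_driver_version(value):
    """Inverse of encode_driver_version: returns (major, minor)."""
    return divmod(value, 100)

print(encode_driver_version(1702, 3))  # 170203
print(decode_driver_version(170203))   # (1702, 3)
```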
Syntax: .earlyexit [VALUE]
Syntax: .earlyexit VALUE
If this pseudo-operation is inside a kernel, it opens the ATI_EARLY_EXIT CAL note; each subsequent occurrence in the same kernel adds a new CAL note. The optional argument appends a 4-byte value to the content of this CAL note. If this pseudo-operation is inside a kernel configuration, it sets the EARLY_EXIT value.
Syntax: .entry UAVID, F1, F2, TYPE
Syntax: .entry VALUE1, VALUE2
This pseudo-operation must be inside an ATI_UAV or ATI_PROGINFO CAL note. Add an entry to the CAL note. In ATI_UAV, the pseudo-operation accepts four 32-bit values; in ATI_PROGINFO, it accepts two 32-bit values.
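To picture what `.entry` contributes to a CAL note, the sketch below packs entries as consecutive 32-bit words. It assumes little-endian storage with no padding, so an entry's content is just its values in order; both points are assumptions, and `pack_entries` is a hypothetical name. The values are the first PROGINFO entry and the first UAV entry of the sample at the end of this chapter.

```python
import struct

def pack_entries(entries, width):
    """Pack CAL note entries, each a tuple of `width` 32-bit values.

    Assumes little-endian 32-bit words with no padding between entries.
    """
    out = bytearray()
    for entry in entries:
        if len(entry) != width:
            raise ValueError("expected %d values per entry" % width)
        out += struct.pack("<%dI" % width, *entry)
    return bytes(out)

proginfo = pack_entries([(0x80001000, 0x00000003)], width=2)
uav = pack_entries([(12, 4, 0, 5)], width=4)
print(proginfo.hex())  # 0010008003000000
print(uav.hex())       # 0c000000040000000000000005000000
```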
Syntax: .exceptions EXCPMASK
This pseudo-operation must be inside a kernel configuration. Set the exception mask in the PGMRSRC2 register value. The value should fit in 7 bits.
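This sketch assumes the GCN COMPUTE_PGM_RSRC2 layout, where the 7-bit EXCP_EN field occupies bits 24-30. Under that assumption, setting the exception mask amounts to the bit manipulation below. The function name is hypothetical, and the field position comes from AMD's register layout rather than from CLRX. The example starts from the PGMRSRC2 value 0x00400998 found in the sample at the end of this chapter.

```python
EXCP_EN_SHIFT = 24                   # assumed position of EXCP_EN in PGMRSRC2
EXCP_EN_MASK = 0x7f << EXCP_EN_SHIFT

def set_exceptions(pgmrsrc2, excpmask):
    """Return pgmrsrc2 with its 7-bit exception-enable field replaced."""
    if not 0 <= excpmask <= 0x7f:
        raise ValueError("EXCPMASK must be a 7-bit value")
    return (pgmrsrc2 & ~EXCP_EN_MASK) | (excpmask << EXCP_EN_SHIFT)

print(hex(set_exceptions(0x00400998, 0x7f)))  # 0x7f400998
```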
Syntax: .floatconsts
This pseudo-operation must be inside a kernel. Open the ATI_FLOAT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .floatmode VALUE
This pseudo-operation must be inside a kernel configuration. Set the float mode (the FP_ROUND and FP_DENORM fields of the MODE register). The value must be a byte value. The default value is 0xc0.
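This sketch assumes the GCN MODE register layout. FP_ROUND occupies bits 0-3 (single-precision rounding in bits 0-1, double-precision in bits 2-3). FP_DENORM occupies bits 4-7 (single-precision in bits 4-5, double-precision in bits 6-7). Under that layout, the default 0xc0 decodes as round-to-nearest-even for both precisions, with denormals flushed for single precision and kept for double precision. The helper and the mode labels are hypothetical.

```python
ROUND_MODES = ["nearest-even", "+inf", "-inf", "zero"]
DENORM_MODES = ["flush in/out", "allow in, flush out",
                "flush in, allow out", "allow in/out"]

def decode_floatmode(value):
    """Split a .floatmode byte into FP_ROUND and FP_DENORM sub-fields."""
    if not 0 <= value <= 0xff:
        raise ValueError("floatmode must be a byte value")
    return {
        "round_sp": ROUND_MODES[value & 3],
        "round_dp": ROUND_MODES[(value >> 2) & 3],
        "denorm_sp": DENORM_MODES[(value >> 4) & 3],
        "denorm_dp": DENORM_MODES[(value >> 6) & 3],
    }

print(decode_floatmode(0xc0))
```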
Syntax: .get_driver_version SYMBOL
Store the current driver version in SYMBOL, in the form version*100 + revision.
Syntax: .globalbuffers
This pseudo-operation must be inside a kernel. Open the ATI_GLOBAL_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .globaldata
Go to the constant global data section.
Syntax: .header
Go to the main header of the binary.
Syntax: .hwlocal SIZE
Syntax: .localsize SIZE
This pseudo-operation must be inside a kernel configuration. Set the HWLOCAL value, the initial local data size.
Syntax: .hwregion VALUE
This pseudo-operation must be inside a kernel configuration. Set the HWREGION value.
Syntax: .ieeemode
This pseudo-op must be inside a kernel configuration. Set the IEEE mode.
Syntax: .inputs
This pseudo-operation must be inside a kernel. Open the ATI_INPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .inputsamplers
This pseudo-operation must be inside a kernel. Open the ATI_INPUT_SAMPLERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .intconsts
This pseudo-operation must be inside a kernel. Open the ATI_INT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .metadata
This pseudo-operation must be inside a kernel. Go to the metadata content.
Syntax: .outputs
This pseudo-operation must be inside a kernel. Open the ATI_OUTPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .persistentbuffers
This pseudo-operation must be inside a kernel. Open the ATI_PERSISTENT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .pgmrsrc2 VALUE
This pseudo-operation must be inside a kernel configuration. Set the PGMRSRC2 value. If dimensions are set, the bits that control the dimension setup are ignored. The SCRATCH_EN bit is ignored.
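To see what a PGMRSRC2 value contains, the following sketch decodes it into fields. It assumes the GCN COMPUTE_PGM_RSRC2 layout; the layout and the helper name are assumptions, not CLRX code. The input is 0x00400998, the value stored under PROGINFO key 0x00002e13 in the sample at the end of this chapter. Decoded, it shows SCRATCH_EN clear, the work-group id bits for x and y set, TIDIG_COMP_CNT = 1 (two work-item id components) and 12 user SGPRs. The x and y settings match `.dims xy` in the configuration version of the same kernel.

```python
# Assumed GCN COMPUTE_PGM_RSRC2 fields: name -> (first bit, width in bits).
FIELDS = {
    "SCRATCH_EN": (0, 1),
    "USER_SGPR": (1, 5),
    "TRAP_PRESENT": (6, 1),
    "TGID_X_EN": (7, 1),
    "TGID_Y_EN": (8, 1),
    "TGID_Z_EN": (9, 1),
    "TG_SIZE_EN": (10, 1),
    "TIDIG_COMP_CNT": (11, 2),
    "EXCP_EN_MSB": (13, 2),
    "LDS_SIZE": (15, 9),
    "EXCP_EN": (24, 7),
}

def decode_pgmrsrc2(value):
    """Return a dict mapping each field name to its value in `value`."""
    return {name: (value >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}

for name, field in decode_pgmrsrc2(0x00400998).items():
    print(name, field)
```

The decoded LDS_SIZE is 128. If that field counts 256-byte granules, 128 corresponds to 32768 bytes (0x8000), the same value the sample loads into M0.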
Syntax: .printfid RESID
This pseudo-operation must be inside a kernel configuration. Set the printfid.
Syntax: .privateid RESID
This pseudo-operation must be inside a kernel configuration. Set the privateid.
Syntax: .proginfo
This pseudo-operation must be inside a kernel. Open the ATI_PROGINFO CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .sampler INPUT, SAMPLER
Syntax: .sampler RESID,....
If this pseudo-operation is inside an ATI_SAMPLER CAL note, it adds a sampler entry. If it is inside a kernel configuration, it adds samplers with the specified resource ids.
Syntax: .scratchbuffer SIZE
This pseudo-operation must be inside a kernel configuration. Set the scratch buffer size.
Syntax: .scratchbuffers
This pseudo-operation must be inside a kernel. Open the ATI_SCRATCH_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .segment OFFSET, SIZE
This pseudo-operation must be inside an ATI_BOOL32CONSTS, ATI_INT32CONSTS or ATI_FLOAT32CONSTS CAL note. Add an entry to the CAL note.
Syntax: .sgprsnum REGNUM
This pseudo-op must be inside a kernel configuration. Set the number of scalar registers that can be used during kernel execution. The count excludes VCC, FLAT_SCRATCH and XNACK_MASK.
Syntax: .subconstantbuffers
This pseudo-operation must be inside a kernel. Open the ATI_SUB_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .tgsize
This pseudo-op must be inside a kernel configuration. Enable TG_SIZE_EN.
Syntax: .uav
This pseudo-operation must be inside a kernel. Open the ATI_UAV CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.
Syntax: .uavid UAVID
This pseudo-op must be inside a kernel configuration. Set the UAVId value.
Syntax: .uavmailboxsize [VALUE]
This pseudo-operation must be inside a kernel. Open the ATI_UAV_MAILBOX_SIZE CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. If the argument is given, its 32-bit value is added to the content.
Syntax: .uavopmask [VALUE]
This pseudo-operation must be inside a kernel. Open the ATI_UAV_OP_MASK CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. If the argument is given, its 32-bit value is added to the content.
Syntax: .uavprivate VALUE
This pseudo-op must be inside a kernel configuration. Set the UAV private value.
Syntax: .useconstdata
Enable use of the const data.
Syntax: .useprintf
Enable use of the printf mechanism.
Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE
This pseudo-op must be inside a kernel configuration. Add a USERDATA entry. The first argument is the data class, for example PTR_UAV_TABLE or IMM_CONST_BUFFER.
The second argument is the API slot (apiSlot). The third argument is the first scalar register that holds the user data, and the fourth argument is the number of scalar registers needed to hold it.
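The register arguments determine how many user SGPRs a kernel needs. The sketch below uses the three `.userdata` lines from the kernel-configuration sample at the end of this chapter. It takes the highest register end, and it also rejects overlapping ranges; that overlap check is the sketch's own assumption. The tuple layout and helper name are hypothetical. Under the GCN register layout assumption, the result, 12, equals the USER_SGPR field (bits 1-5) of the PGMRSRC2 value 0x00400998 in the CAL-note version of the same kernel.

```python
def user_sgpr_count(entries):
    """Return the number of user SGPRs spanned by USERDATA entries.

    Each entry is (dataclass, apislot, regstart, regsize); registers
    regstart .. regstart+regsize-1 hold that piece of user data.
    """
    used = set()
    for dataclass, apislot, regstart, regsize in entries:
        regs = set(range(regstart, regstart + regsize))
        if used & regs:
            raise ValueError("%s slot %d overlaps another entry"
                             % (dataclass, apislot))
        used |= regs
    return max(used) + 1 if used else 0

userdata = [
    ("PTR_UAV_TABLE", 0, 2, 2),     # s[2:3]
    ("IMM_CONST_BUFFER", 0, 4, 4),  # s[4:7]
    ("IMM_CONST_BUFFER", 1, 8, 4),  # s[8:11]
]
print(user_sgpr_count(userdata))  # 12
```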
Syntax: .vgprsnum REGNUM
This pseudo-op must be inside a kernel configuration. Set the number of vector registers that can be used during kernel execution.
This is an example of a kernel setup using the metadata string and ATI CAL notes:
/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3). Driver version: 1702.3 (VM)"
.kernel DCT
.header
.fill 16, 1, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
.fill 8, 1, 0x00
.metadata
.ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
.ascii ";version:3:1:111\n"
.ascii ";device:pitcairn\n"
.ascii ";uniqueid:1024\n"
.ascii ";memory:uavprivate:0\n"
.ascii ";memory:hwlocal:0\n"
.ascii ";memory:hwregion:0\n"
.ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
.ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
.ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
.ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
.ascii ";value:width:u32:1:1:64\n"
.ascii ";value:blockWidth:u32:1:1:80\n"
.ascii ";value:inverse:u32:1:1:96\n"
.ascii ";function:1:1030\n"
.ascii ";uavid:11\n"
.ascii ";printfid:9\n"
.ascii ";cbid:10\n"
.ascii ";privateid:8\n"
.ascii ";reflection:0:float*\n"
.ascii ";reflection:1:float*\n"
.ascii ";reflection:2:float*\n"
.ascii ";reflection:3:float*\n"
.ascii ";reflection:4:uint\n"
.ascii ";reflection:5:uint\n"
.ascii ";reflection:6:uint\n"
.ascii ";ARGEND:__OpenCL_DCT_kernel\n"
.data
.fill 4736, 1, 0x00
.inputs
.outputs
.uav
.entry 12, 4, 0, 5
.entry 13, 4, 0, 5
.entry 14, 4, 0, 5
.entry 11, 4, 0, 5
.condout 0
.floatconsts
.intconsts
.boolconsts
.earlyexit 0
.globalbuffers
.constantbuffers
.cbmask 0, 32764
.cbmask 1, 0
.inputsamplers
.scratchbuffers
.int 0x00000000
.persistentbuffers
.proginfo
.entry 0x80001000, 0x00000003
.entry 0x80001001, 0x00000017
.entry 0x80001002, 0x00000000
.entry 0x80001003, 0x00000002
.entry 0x80001004, 0x00000002
.entry 0x80001005, 0x00000002
.entry 0x80001006, 0x00000000
.entry 0x80001007, 0x00000004
.entry 0x80001008, 0x00000004
.entry 0x80001009, 0x00000002
.entry 0x8000100a, 0x00000001
.entry 0x8000100b, 0x00000008
.entry 0x8000100c, 0x00000004
.entry 0x80001041, 0x0000000b
.entry 0x80001042, 0x00000018
.entry 0x80001863, 0x00000066
.entry 0x80001864, 0x00000100
.entry 0x80001043, 0x000000c0
.entry 0x80001044, 0x00000000
.entry 0x80001045, 0x00000000
.entry 0x00002e13, 0x00400998
.entry 0x8000001c, 0x00000100
.entry 0x8000001d, 0x00000000
.entry 0x8000001e, 0x00000000
.entry 0x80001841, 0x00000000
.entry 0x8000001f, 0x00007000
.entry 0x80001843, 0x00007000
.entry 0x80001844, 0x00000000
.entry 0x80001845, 0x00000000
.entry 0x80001846, 0x00000000
.entry 0x80001847, 0x00000000
.entry 0x80001848, 0x00000000
.entry 0x80001849, 0x00000000
.entry 0x8000184a, 0x00000000
.entry 0x8000184b, 0x00000000
.entry 0x8000184c, 0x00000000
.entry 0x8000184d, 0x00000000
.entry 0x8000184e, 0x00000000
.entry 0x8000184f, 0x00000000
.entry 0x80001850, 0x00000000
.entry 0x80001851, 0x00000000
.entry 0x80001852, 0x00000000
.entry 0x80001853, 0x00000000
.entry 0x80001854, 0x00000000
.entry 0x80001855, 0x00000000
.entry 0x80001856, 0x00000000
.entry 0x80001857, 0x00000000
.entry 0x80001858, 0x00000000
.entry 0x80001859, 0x00000000
.entry 0x8000185a, 0x00000000
.entry 0x8000185b, 0x00000000
.entry 0x8000185c, 0x00000000
.entry 0x8000185d, 0x00000000
.entry 0x8000185e, 0x00000000
.entry 0x8000185f, 0x00000000
.entry 0x80001860, 0x00000000
.entry 0x80001861, 0x00000000
.entry 0x80001862, 0x00000000
.entry 0x8000000a, 0x00000001
.entry 0x80000078, 0x00000040
.entry 0x80000081, 0x00008000
.entry 0x80000082, 0x00008000
.subconstantbuffers
.uavmailboxsize 0
.uavopmask
.byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.fill 120, 1, 0x00
.text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000 */ s_endpgm
The same kernel, set up with a kernel configuration:
.amd
.gpu Pitcairn
.32bit
.kernel DCT
.config
.dims xy
.arg output,float*,global
.arg input,float*,global,const
.arg dct8x8,float*,global,const
.arg inter,float*,local
.arg width,uint
.arg blockWidth,uint
.arg inverse,uint
.userdata PTR_UAV_TABLE,0,2,2
.userdata IMM_CONST_BUFFER,0,4,4
.userdata IMM_CONST_BUFFER,1,8,4
.text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000 */ s_endpgm