CLRadeonExtender Assembler AMD Catalyst handling

The AMD Catalyst driver provides its own OpenCL implementation, which can generate its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2 and the OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.

Binary format

AMD OpenCL binaries contain constant global data, device and compilation information, and embedded kernel binaries. Kernel binaries are inside the .text section. Program code is separate for each kernel, and no machine code is shared between kernels. Each kernel binary has a metadata string, ATI CAL notes and program code. The metadata string describes the kernel arguments, the settings of the input/output buffers, constant buffers, read-only and write-only images, and local data. ATI CAL notes are small special data fragments that describe features of the kernel. The most important ATI CAL note is PROGINFO, which holds important data for runtime execution, such as register usage, UAV usage and floating point setup.

The .data section inside a kernel is a usable section and holds only zeroes.

Layout of the source code

The CLRX assembler allows the kernel setup to be configured in one of two ways: a human-friendly form (.config) and a form intended for quick recompilation (ATI CAL notes and the metadata string).

Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs. This setup can be overridden with the pseudo-ops '.sgprsnum' and '.vgprsnum'.

Scalar register allocation

The assembler adds 2 registers to the number of used scalar registers to handle VCC. The .sgprsnum pseudo-op sets the number of all SGPRs except VCC.
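A minimal sketch of overriding the automatic register counts, assuming a kernel named sample (a hypothetical name) and register counts chosen only for illustration. Following the rule above, 14 SGPRs requested here would correspond to 16 allocated scalar registers once the 2 registers for VCC are added.

```
.amd
.gpu Pitcairn
.kernel sample                  /* hypothetical kernel name */
    .config
        .dims x
        .sgprsnum 14            /* SGPRs excluding VCC; 2 more are added for VCC */
        .vgprsnum 4             /* illustrative VGPR count */
    .text
        s_endpgm
```

Leaving both pseudo-ops out lets the assembler derive the counts from the registers the code actually uses.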

List of the specific pseudo-operations

.arg

Syntax for scalar: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, unused]
Syntax for structure: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]
Syntax for image: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]
Syntax for counter32: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]
Syntax for global pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]
Syntax for local pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]
Syntax for constant pointer: .arg ARGNAME[, "ARGTYPENAME"], ARGTYPE[[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]]

Adds a kernel argument definition. Must be inside kernel configuration. The first argument is the argument name from the OpenCL kernel definition. The next, optional argument is the argument type name from the OpenCL kernel definition. The next argument is the argument type:

  • char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
  • charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types (X indicates number of elements: 2, 3, 4, 8 or 16)
  • counter32 - 32-bit counter type
  • structure - structure
  • image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d - image types
  • sampler - sampler
  • type* - pointer to data

The remaining arguments depend on the type of the kernel argument. STRUCTSIZE determines the size of the structure. ACCESS for images can be one of: read_only (or rdonly), write_only (or wronly). PTRSPACE determines the space the pointer points to; it can be one of: local, constant or global. ACCESS for pointers can be: const, restrict and volatile. CONSTSIZE determines the maximum size in bytes of the constant buffer. RESID determines the resource id:

  • for global or constant pointers it is the UAVID; the range is 8-1023.
  • for constant pointers (driver older than 1348.X), the range is 1-159.
  • for read-only images, the range is 0-127.
  • for write-only images or counters, the range is 0-7.

The last argument, unused, indicates that the argument is not used by the kernel.

Sample usage:

.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16 *,global
.arg v42,ulong16 *,global, restrict
.arg v57,structure*,82,global

.boolconsts

This pseudo-operation must be inside a kernel. Opens the ATI_BOOL32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.calnote

Syntax: .calnote CALNOTEID

This pseudo-operation must be inside a kernel. Opens an ATI CAL note.

.cbid

Syntax: .cbid
Syntax: .cbid VALUE

If this pseudo-operation is inside an ATI_CONSTANT_BUFFERS CAL note, it adds an entry to that CAL note. If it is inside kernel configuration, it sets the constant buffer id.

.cbmask

Syntax: .cbmask INDEX, SIZE

This pseudo-operation must be in an ATI_CONSTANT_BUFFERS CAL note. Adds an entry to the ATI_CONSTANT_BUFFERS CAL note.

.compile_options

Syntax: .compile_options "STRING"

Set compile options for this binary.

.condout

Syntax: .condout [VALUE]
Syntax: .condout VALUE

If this pseudo-operation is inside a kernel, it opens the ATI_CONDOUT CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. The optional argument adds a 4-byte value to the content of this CAL note. If this pseudo-operation is inside kernel configuration, it sets the CONDOUT value.

.config

Opens kernel configuration. Must be inside a kernel. Kernel configuration cannot be defined if any CAL note, metadata or header has already been defined. The following pseudo-ops can be inside kernel configuration:

  • .arg
  • .cbid
  • .condout
  • .cws
  • .dims
  • .earlyexit
  • .hwlocal
  • .hwregion
  • .ieeemode
  • .localsize
  • .pgmrsrc2
  • .printfid
  • .privateid
  • .sampler
  • .scratchbuffer
  • .sgprsnum
  • .tgsize
  • .uavid
  • .uavprivate
  • .useconstdata
  • .useprintf
  • .userdata
  • .vgprsnum

.constantbuffers

This pseudo-operation must be inside a kernel. Opens the ATI_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside kernel configuration. Sets the reqd_work_group_size hint for this kernel. In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes in the last two components instead of ones. We recommend filling in all components.
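A short sketch of the recommendation above, with all three components written out so the result does not depend on how omitted components are filled in. The kernel name sample and the sizes are hypothetical.

```
.amd
.gpu Pitcairn
.kernel sample                  /* hypothetical kernel name */
    .config
        .dims x
        .cws 64, 1, 1           /* all three components given explicitly */
    .text
        s_endpgm
```

The same line can be written with the alias: .reqd_work_group_size 64, 1, 1.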

.dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside kernel configuration. Defines which dimensions (from the list: x, y, z) are used to determine the kernel execution space.

.driver_info

Syntax: .driver_info "INFO"

Set driver info for this binary.

.driver_version

Syntax: .driver_version VERSION

Set driver version for this binary. The version has the form MajorVersion*100+MinorVersion. This pseudo-op replaces driver info.
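As a worked instance of the formula, driver 1702.3 (the version that appears in the sample code of this chapter) would be encoded as 1702*100 + 3 = 170203. A minimal sketch, assuming the pseudo-op is placed at the top level of the source next to the other binary-wide settings:

```
.amd
.gpu Pitcairn
.driver_version 170203          /* 1702*100 + 3, i.e. driver 1702.3 */
```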

.earlyexit

Syntax: .earlyexit [VALUE]
Syntax: .earlyexit VALUE

If this pseudo-operation is inside a kernel, it opens the ATI_EARLY_EXIT CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. The optional argument adds a 4-byte value to the content of this CAL note. If this pseudo-operation is inside kernel configuration, it sets the EARLY_EXIT value.

.entry

Syntax: .entry UAVID, F1, F2, TYPE
Syntax: .entry VALUE1, VALUE2

This pseudo-operation must be in an ATI_UAV or ATI_PROGINFO CAL note. Adds an entry to the CAL note. For ATI_UAV, the pseudo-operation accepts four 32-bit values. For ATI_PROGINFO, it accepts two 32-bit values.

.exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside kernel configuration. Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

.floatconsts

This pseudo-operation must be inside a kernel. Opens the ATI_FLOAT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.floatmode

Syntax: .floatmode VALUE

This pseudo-operation must be inside kernel configuration. Sets floatmode (the FP_ROUND and FP_DENORM fields of the MODE register). The value must be a byte value. The default value is 0xc0.
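A minimal sketch that writes the documented default out explicitly, which can be a convenient starting point before changing the byte. The kernel name sample is hypothetical.

```
.amd
.gpu Pitcairn
.kernel sample                  /* hypothetical kernel name */
    .config
        .dims x
        .floatmode 0xc0         /* the documented default, stated explicitly */
    .text
        s_endpgm
```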

.get_driver_version

Syntax: .get_driver_version SYMBOL

Stores the current driver version in SYMBOL. The version has the form version*100 + revision.
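A sketch of how the stored value might be used to select code for a given driver. DRV_VER is a hypothetical symbol name, and the example assumes the general-purpose .if, .print and .endif pseudo-ops of the CLRX assembler, which are described outside this chapter.

```
.amd
.gpu Pitcairn
.get_driver_version DRV_VER     /* DRV_VER is a hypothetical symbol name */
.if DRV_VER >= 170203           /* 1702*100 + 3, i.e. driver 1702.3 */
    .print "driver 1702.3 or newer"
.endif
```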

.globalbuffers

This pseudo-operation must be inside a kernel. Opens the ATI_GLOBAL_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.globaldata

Go to constant global data section.

.header

Go to main header of the binary.

.hwlocal, .localsize

Syntax: .hwlocal SIZE
Syntax: .localsize SIZE

This pseudo-operation must be inside kernel configuration. Set HWLOCAL value, the initial local data size.

.hwregion

Syntax: .hwregion VALUE

This pseudo-operation must be inside kernel configuration. Set HWREGION value.

.ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration. Set ieee-mode.

.inputs

This pseudo-operation must be inside a kernel. Opens the ATI_INPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.inputsamplers

This pseudo-operation must be inside a kernel. Opens the ATI_INPUT_SAMPLERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.intconsts

This pseudo-operation must be inside a kernel. Opens the ATI_INT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.metadata

This pseudo-operation must be inside kernel. Go to metadata content.

.outputs

This pseudo-operation must be inside a kernel. Opens the ATI_OUTPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.persistentbuffers

This pseudo-operation must be inside a kernel. Opens the ATI_PERSISTENT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside kernel configuration. Sets the PGMRSRC2 value. If dimensions are set, the bits that control the dimension setup are ignored. The SCRATCH_EN bit is ignored.

.printfid

Syntax: .printfid RESID

This pseudo-operation must be inside kernel configuration. Set printfid.

.privateid

Syntax: .privateid RESID

This pseudo-operation must be inside kernel configuration. Set privateid.

.proginfo

This pseudo-operation must be inside a kernel. Opens the ATI_PROGINFO CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.sampler

Syntax: .sampler INPUT, SAMPLER
Syntax: .sampler RESID,....

If this pseudo-operation is in ATI_SAMPLER CAL note, then it adds sampler entry. If this pseudo-operation is in kernel configuration, then it adds samplers with specified resource ids.

.scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside kernel configuration. Set scratchbuffer size.

.scratchbuffers

This pseudo-operation must be inside a kernel. Opens the ATI_SCRATCH_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.segment

Syntax: .segment OFFSET, SIZE

This pseudo-operation must be in ATI_BOOL32CONSTS, ATI_INT32CONSTS or ATI_FLOAT32CONSTS CAL note. Add entry into CAL note.

.sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration. Set number of scalar registers which can be used during kernel execution. It counts SGPR registers excluding VCC, FLAT_SCRATCH and XNACK_MASK.

.subconstantbuffers

This pseudo-operation must be inside a kernel. Opens the ATI_SUB_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.tgsize

This pseudo-op must be inside kernel configuration. Enable usage of the TG_SIZE_EN.

.uav

This pseudo-operation must be inside a kernel. Opens the ATI_UAV CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

.uavid

Syntax: .uavid UAVID

This pseudo-op must be inside kernel configuration. Set UAVId value.

.uavmailboxsize

Syntax: .uavmailboxsize [VALUE]

This pseudo-operation must be inside a kernel. Opens the ATI_UAV_MAILBOX_SIZE CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. If the first argument is given, a 32-bit value is added to the content.

.uavopmask

Syntax: .uavopmask [VALUE]

This pseudo-operation must be inside a kernel. Opens the ATI_UAV_OP_MASK CAL note. Each subsequent occurrence in the same kernel adds a new CAL note. If the first argument is given, a 32-bit value is added to the content.

.uavprivate

Syntax: .uavprivate VALUE

This pseudo-op must be inside kernel configuration. Set uav private value.

.useconstdata

Enable use of the const data.

.useprintf

Enable use of the printf mechanism.

.userdata

Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE

This pseudo-op must be inside kernel configuration. Add USERDATA entry. First argument is data class. It can be one of the following:

  • IMM_RESOURCE
  • IMM_SAMPLER
  • IMM_CONST_BUFFER
  • IMM_VERTEX_BUFFER
  • IMM_UAV
  • IMM_ALU_FLOAT_CONST
  • IMM_ALU_BOOL32_CONST
  • IMM_GDS_COUNTER_RANGE
  • IMM_GDS_MEMORY_RANGE
  • IMM_GWS_BASE
  • IMM_WORK_ITEM_RANGE
  • IMM_WORK_GROUP_RANGE
  • IMM_DISPATCH_ID
  • IMM_SCRATCH_BUFFER
  • IMM_HEAP_BUFFER
  • IMM_KERNEL_ARG
  • SUB_PTR_FETCH_SHADER
  • PTR_RESOURCE_TABLE
  • PTR_INTERNAL_RESOURCE_TABLE
  • PTR_SAMPLER_TABLE
  • PTR_CONST_BUFFER_TABLE
  • PTR_VERTEX_BUFFER_TABLE
  • PTR_SO_BUFFER_TABLE
  • PTR_UAV_TABLE
  • PTR_INTERNAL_GLOBAL_TABLE
  • PTR_EXTENDED_USER_DATA
  • PTR_INDIRECT_RESOURCE
  • PTR_INDIRECT_INTERNAL_RESOURCE
  • PTR_INDIRECT_UAV
  • IMM_CONTEXT_BASE
  • IMM_LDS_ESGS_SIZE
  • IMM_GLOBAL_OFFSET
  • IMM_GENERIC_USER_DATA

The second argument is the apiSlot. The third argument determines the first scalar register that holds the userdata. The fourth argument determines how many scalar registers are needed to hold the userdata.
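The entries below are taken from the kernel configuration sample of this chapter and annotated as an illustration of the four arguments. The register ranges in the comments are simply REGSTART and REGSIZE read together; they are an interpretation, not assembler output.

```
.amd
.gpu Pitcairn
.kernel DCT
    .config
        .dims xy
        .userdata PTR_UAV_TABLE,0,2,2       /* apiSlot 0, 2 registers starting at s2: s[2:3] */
        .userdata IMM_CONST_BUFFER,0,4,4    /* apiSlot 0, 4 registers starting at s4: s[4:7] */
        .userdata IMM_CONST_BUFFER,1,8,4    /* apiSlot 1, 4 registers starting at s8: s[8:11] */
    .text
        s_endpgm
```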

.vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration. Set number of vector registers which can be used during kernel execution.

Sample code

This is a sample of the kernel setup:

/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3).  Driver version: 1702.3 (VM)"
.kernel DCT
    .header
        .fill 16, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
    .metadata
        .ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
        .ascii ";version:3:1:111\n"
        .ascii ";device:pitcairn\n"
        .ascii ";uniqueid:1024\n"
        .ascii ";memory:uavprivate:0\n"
        .ascii ";memory:hwlocal:0\n"
        .ascii ";memory:hwregion:0\n"
        .ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
        .ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
        .ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
        .ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
        .ascii ";value:width:u32:1:1:64\n"
        .ascii ";value:blockWidth:u32:1:1:80\n"
        .ascii ";value:inverse:u32:1:1:96\n"
        .ascii ";function:1:1030\n"
        .ascii ";uavid:11\n"
        .ascii ";printfid:9\n"
        .ascii ";cbid:10\n"
        .ascii ";privateid:8\n"
        .ascii ";reflection:0:float*\n"
        .ascii ";reflection:1:float*\n"
        .ascii ";reflection:2:float*\n"
        .ascii ";reflection:3:float*\n"
        .ascii ";reflection:4:uint\n"
        .ascii ";reflection:5:uint\n"
        .ascii ";reflection:6:uint\n"
        .ascii ";ARGEND:__OpenCL_DCT_kernel\n"
    .data
        .fill 4736, 1, 0x00
    .inputs
    .outputs
    .uav
        .entry 12, 4, 0, 5
        .entry 13, 4, 0, 5
        .entry 14, 4, 0, 5
        .entry 11, 4, 0, 5
    .condout 0
    .floatconsts
    .intconsts
    .boolconsts
    .earlyexit 0
    .globalbuffers
    .constantbuffers
        .cbmask 0, 32764
        .cbmask 1, 0
    .inputsamplers
    .scratchbuffers
        .int 0x00000000
    .persistentbuffers
    .proginfo
        .entry 0x80001000, 0x00000003
        .entry 0x80001001, 0x00000017
        .entry 0x80001002, 0x00000000
        .entry 0x80001003, 0x00000002
        .entry 0x80001004, 0x00000002
        .entry 0x80001005, 0x00000002
        .entry 0x80001006, 0x00000000
        .entry 0x80001007, 0x00000004
        .entry 0x80001008, 0x00000004
        .entry 0x80001009, 0x00000002
        .entry 0x8000100a, 0x00000001
        .entry 0x8000100b, 0x00000008
        .entry 0x8000100c, 0x00000004
        .entry 0x80001041, 0x0000000b
        .entry 0x80001042, 0x00000018
        .entry 0x80001863, 0x00000066
        .entry 0x80001864, 0x00000100
        .entry 0x80001043, 0x000000c0
        .entry 0x80001044, 0x00000000
        .entry 0x80001045, 0x00000000
        .entry 0x00002e13, 0x00400998
        .entry 0x8000001c, 0x00000100
        .entry 0x8000001d, 0x00000000
        .entry 0x8000001e, 0x00000000
        .entry 0x80001841, 0x00000000
        .entry 0x8000001f, 0x00007000
        .entry 0x80001843, 0x00007000
        .entry 0x80001844, 0x00000000
        .entry 0x80001845, 0x00000000
        .entry 0x80001846, 0x00000000
        .entry 0x80001847, 0x00000000
        .entry 0x80001848, 0x00000000
        .entry 0x80001849, 0x00000000
        .entry 0x8000184a, 0x00000000
        .entry 0x8000184b, 0x00000000
        .entry 0x8000184c, 0x00000000
        .entry 0x8000184d, 0x00000000
        .entry 0x8000184e, 0x00000000
        .entry 0x8000184f, 0x00000000
        .entry 0x80001850, 0x00000000
        .entry 0x80001851, 0x00000000
        .entry 0x80001852, 0x00000000
        .entry 0x80001853, 0x00000000
        .entry 0x80001854, 0x00000000
        .entry 0x80001855, 0x00000000
        .entry 0x80001856, 0x00000000
        .entry 0x80001857, 0x00000000
        .entry 0x80001858, 0x00000000
        .entry 0x80001859, 0x00000000
        .entry 0x8000185a, 0x00000000
        .entry 0x8000185b, 0x00000000
        .entry 0x8000185c, 0x00000000
        .entry 0x8000185d, 0x00000000
        .entry 0x8000185e, 0x00000000
        .entry 0x8000185f, 0x00000000
        .entry 0x80001860, 0x00000000
        .entry 0x80001861, 0x00000000
        .entry 0x80001862, 0x00000000
        .entry 0x8000000a, 0x00000001
        .entry 0x80000078, 0x00000040
        .entry 0x80000081, 0x00008000
        .entry 0x80000082, 0x00008000
    .subconstantbuffers
    .uavmailboxsize 0
    .uavopmask
        .byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 120, 1, 0x00
    .text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000         */ s_endpgm

The same kernel setup with kernel configuration:

.amd
.gpu Pitcairn
.32bit
.kernel DCT
    .config
        .dims xy
        .arg output,float*,global
        .arg input,float*,global,const
        .arg dct8x8,float*,global,const
        .arg inter,float*,local
        .arg width,uint
        .arg blockWidth,uint
        .arg inverse,uint
        .userdata PTR_UAV_TABLE,0,2,2
        .userdata IMM_CONST_BUFFER,0,4,4
        .userdata IMM_CONST_BUFFER,1,8,4
    .text
/*befc03ff 00008000*/ s_mov_b32 m0, 0x8000
...
/*bf810000         */ s_endpgm