source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmAmd.md @ 3737

Last change on this file since 3737 was 3737, checked in by matszpk, 3 years ago

CLRadeonExtender: CLRXDocs: Add info about broken '.cws' pseudo-op in earlier versions.

## CLRadeonExtender Assembler AMD Catalyst handling

The AMD Catalyst driver provides its own OpenCL implementation that can generate
its own binaries for OpenCL programs. The CLRX assembler supports both the OpenCL 1.2
and OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.

## Binary format

AMD OpenCL binaries contain constant global data, device and compilation
information, and embedded kernel binaries. Kernel binaries are inside the `.text` section.
Program code is separate for each kernel; no machine code is shared between kernels.
Each kernel binary has a metadata string, ATI CAL notes and program code.
The metadata string describes the kernel arguments and the settings of the
input/output buffers, constant buffers, read-only and write-only images, and local data.
ATI CAL notes are small special data fragments that describe features of the kernel.
The most important ATI CAL note is PROGINFO, which holds data needed for runtime execution,
such as register usage, UAV usage and floating point setup.

The `.data` section inside a kernel is a usable section and holds only zeroes.

## Layout of the source code

The CLRX assembler offers two ways to configure the kernel setup:
a human-readable one (`.config`) and one for quick recompilation (ATI CAL notes and the metadata string).

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.
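
A minimal sketch of overriding the automatic register counts is given below. The kernel name `DCT` is borrowed from the sample at the end of this chapter, and the values 24 and 16 are arbitrary placeholders chosen only for illustration, not recommended settings.

```
.kernel DCT
    .config
    .dims xy
    /* hypothetical values: override the automatically computed counts */
    .sgprsnum 24
    .vgprsnum 16
```

Both pseudo-ops are placed inside the kernel configuration, as required by their descriptions in the list of pseudo-operations.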

## Scalar register allocation

In addition to the scalar registers that are used, the assembler adds 2 registers for handling VCC.

## List of the specific pseudo-operations

### .arg

Syntax for scalar: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, unused]  
Syntax for structure: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]  
Syntax for image: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]  
Syntax for counter32: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]  
Syntax for global pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]  
Syntax for local pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]  
Syntax for constant pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]]

Adds a kernel argument definition. Must be inside the kernel configuration. The first argument is
the argument name from the OpenCL kernel definition. The next, optional argument is the argument type name
from the OpenCL kernel definition. The next argument is the argument type:

* char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
* charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types
(X indicates number of elements: 2, 3, 4, 8 or 16)
* counter32 - 32-bit counter type
* structure - structure
* image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d -
image types
* sampler - sampler
* type* - pointer to data

The remaining arguments depend on the type of the kernel argument. STRUCTSIZE determines the size of
the structure. ACCESS for an image can be one of: `read_only`, `rdonly`,
`write_only` or `wronly`.
PTRSPACE determines the address space that the pointer points to.
It can be one of: `local`, `constant` or `global`.
ACCESS for pointers can be: `const`, `restrict` and `volatile`.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id:

* for global or constant pointers it is the UAVID; the range is 8-1023.
* for constant pointers (driver older than 1348.X) the range is 1-159.
* for read-only images the range is 0-127.
* for write-only images or counters the range is 0-7.

The last argument `unused` indicates that the argument will not be used by the kernel.

Sample usage:

```
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16  *,global
.arg v42,ulong16  *,global, restrict
.arg v57,structure*,82,global
```

### .boolconsts

This pseudo-operation must be inside a kernel.
Opens the ATI_BOOL32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .calnote

Syntax: .calnote CALNOTEID

This pseudo-operation must be inside a kernel. Opens an ATI CAL note.

### .cbid

Syntax: .cbid  
Syntax: .cbid VALUE

If this pseudo-operation is inside an ATI_CONSTANT_BUFFERS CAL note, then
it adds an entry to that CAL note.
If this pseudo-operation is in the kernel configuration, then it sets the constant buffer id.

### .cbmask

Syntax: .cbmask INDEX, SIZE

This pseudo-operation must be in an ATI_CONSTANT_BUFFERS CAL note.
Adds an entry to the ATI_CONSTANT_BUFFERS CAL note.
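
As a short illustration, the fragment below is taken from the sample code at the end of this chapter. It opens the CAL note with `.constantbuffers` and adds two entries with `.cbmask`. The index and size values are simply those of the sample kernel and should not be read as defaults.

```
.constantbuffers
    /* INDEX 0, SIZE 32764 */
    .cbmask 0, 32764
    /* INDEX 1, SIZE 0 */
    .cbmask 1, 0
```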

### .compile_options

Syntax: .compile_options "STRING"

Sets the compile options for this binary.

### .condout

Syntax: .condout [VALUE]  
Syntax: .condout VALUE

If this pseudo-operation is inside a kernel, then it opens the ATI_CONDOUT CAL note.
Each further occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is in the kernel configuration, then it sets the CONDOUT value.
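
The two uses can be sketched as two alternative fragments (they are not meant to be combined in one kernel, since a kernel configuration cannot coexist with CAL notes). The kernel name `DCT` and the value 0 follow the sample at the end of this chapter; the reduced surrounding context is an assumption made for brevity.

CAL note form, inside a kernel but outside `.config`:

```
.kernel DCT
    /* opens the ATI_CONDOUT CAL note and adds the 4-byte value 0 */
    .condout 0
```

Configuration form, inside `.config`:

```
.kernel DCT
    .config
    /* sets the CONDOUT value to 0 */
    .condout 0
```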

### .config

Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be
defined if any CAL note, metadata or header has already been defined.
The following pseudo-ops can be used inside the kernel configuration:

* .arg
* .cbid
* .condout
* .cws
* .dims
* .earlyexit
* .exceptions
* .floatmode
* .hwlocal
* .hwregion
* .ieeemode
* .localsize
* .pgmrsrc2
* .printfid
* .privateid
* .sampler
* .scratchbuffer
* .sgprsnum
* .tgsize
* .uavid
* .uavprivate
* .useconstdata
* .useprintf
* .userdata
* .vgprsnum

### .constantbuffers

This pseudo-operation must be inside a kernel.
Opens the ATI_CONSTANT_BUFFERS CAL note. Each further occurrence in the same kernel
adds a new CAL note.

### .cws, .reqd_work_group_size

Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]  
Syntax: .reqd_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside the kernel configuration.
Sets the reqd_work_group_size hint for this kernel.
In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes instead of ones
in the last two components. We recommend filling in all components.
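
A minimal sketch that fills in all three components, so the result does not depend on how an older assembler version completes the missing ones. The kernel name `DCT` is borrowed from the sample at the end of this chapter, and the size 64 is an arbitrary placeholder.

```
.kernel DCT
    .config
    /* all three components given explicitly (hypothetical sizes) */
    .cws 64, 1, 1
```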

### .dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside the kernel configuration. Defines which dimensions
(from the list: x, y, z) will be used to determine the space of the kernel execution.

### .driver_info

Syntax: .driver_info "INFO"

Sets the driver info for this binary.

### .driver_version

Syntax: .driver_version VERSION

Sets the driver version for this binary. The version has the form MajorVersion*100+MinorVersion.
This pseudo-op replaces the driver info.
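
As a worked example of the encoding, take the driver 1702.3 named in the `.driver_info` string of the sample at the end of this chapter, and assume that 1702 is the major part and 3 the minor part. The encoded value is then 1702*100 + 3 = 170203:

```
/* driver 1702.3 -> 1702*100 + 3 = 170203 */
.driver_version 170203
```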

### .earlyexit

Syntax: .earlyexit [VALUE]  
Syntax: .earlyexit VALUE

If this pseudo-operation is inside a kernel, then it opens the ATI_EARLY_EXIT CAL note.
Each further occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is in the kernel configuration, then it sets the EARLY_EXIT value.

### .entry

Syntax: .entry UAVID, F1, F2, TYPE  
Syntax: .entry VALUE1, VALUE2

This pseudo-operation must be in an ATI_UAV or ATI_PROGINFO CAL note.
Adds an entry to the CAL note. For ATI_UAV, the pseudo-operation accepts four 32-bit values.
For ATI_PROGINFO, it accepts two 32-bit values.
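
The two forms can be seen side by side in this fragment, which reuses lines from the sample at the end of this chapter. The values are those of the sample kernel and are shown only to illustrate the argument counts.

```
.uav
    /* ATI_UAV entry: UAVID, F1, F2, TYPE */
    .entry 12, 4, 0, 5
.proginfo
    /* ATI_PROGINFO entry: VALUE1, VALUE2 */
    .entry 0x80001000, 0x00000003
```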

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside the kernel configuration.
Sets the exception mask in the PGMRSRC2 register value. The value should be 7-bit.

### .floatconsts

This pseudo-operation must be inside a kernel.
Opens the ATI_FLOAT32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .floatmode

Syntax: .floatmode VALUE

This pseudo-operation must be inside the kernel configuration.
Sets the floatmode (the FP_ROUND and FP_DENORM fields of the MODE register).
The value must be a byte value. The default value is 0xc0.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Stores the current driver version in SYMBOL. The version has the form `version*100 + revision`.
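
One possible use is conditional assembly on the stored value. The sketch below assumes that the general-purpose `.if`, `.endif` and `.print` pseudo-ops and the `>=` operator of the CLRX assembler are available here; the symbol name `DRVVER` and the threshold 170203 (driver 1702.3 in the encoding above) are hypothetical choices.

```
/* DRVVER is a hypothetical symbol name */
.get_driver_version DRVVER
.if DRVVER>=170203
    .print "driver version is 1702.3 or later"
.endif
```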

### .globalbuffers

This pseudo-operation must be inside a kernel.
Opens the ATI_GLOBAL_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .globaldata

Goes to the constant global data section.

### .header

Goes to the main header of the binary.

### .hwlocal, .localsize

Syntax: .hwlocal SIZE  
Syntax: .localsize SIZE

This pseudo-operation must be inside the kernel configuration. Sets the HWLOCAL value, the initial
local data size.

### .hwregion

Syntax: .hwregion VALUE

This pseudo-operation must be inside the kernel configuration. Sets the HWREGION value.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside the kernel configuration. Sets ieee-mode.

### .inputs

This pseudo-operation must be inside a kernel.
Opens the ATI_INPUTS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .inputsamplers

This pseudo-operation must be inside a kernel.
Opens the ATI_INPUT_SAMPLERS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .intconsts

This pseudo-operation must be inside a kernel.
Opens the ATI_INT32CONSTS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .metadata

This pseudo-operation must be inside a kernel.
Goes to the metadata content.

### .outputs

This pseudo-operation must be inside a kernel.
Opens the ATI_OUTPUTS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .persistentbuffers

This pseudo-operation must be inside a kernel.
Opens the ATI_PERSISTENT_BUFFERS CAL note. Each further occurrence in the same kernel
adds a new CAL note.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside the kernel configuration. Sets the PGMRSRC2 value.
If the dimensions are set, then the bits that control the dimension setup will be ignored.
The SCRATCH_EN bit will be ignored.

### .printfid

Syntax: .printfid RESID

This pseudo-operation must be inside the kernel configuration. Sets the printfid.

### .privateid

Syntax: .privateid RESID

This pseudo-operation must be inside the kernel configuration. Sets the privateid.

### .proginfo

This pseudo-operation must be inside a kernel.
Opens the ATI_PROGINFO CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .sampler

Syntax: .sampler INPUT, SAMPLER  
Syntax: .sampler RESID,....

If this pseudo-operation is in an ATI_SAMPLER CAL note, then it adds a sampler entry.
If this pseudo-operation is in the kernel configuration, then it adds samplers with the specified
resource ids.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside the kernel configuration.
Sets the scratchbuffer size.

### .scratchbuffers

This pseudo-operation must be inside a kernel.
Opens the ATI_SCRATCH_BUFFERS CAL note. Each further occurrence in the same kernel adds a new CAL note.

### .segment

Syntax: .segment OFFSET, SIZE

This pseudo-operation must be in an ATI_BOOL32CONSTS, ATI_INT32CONSTS or
ATI_FLOAT32CONSTS CAL note. Adds an entry to the CAL note.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside the kernel configuration. Sets the number of scalar
registers that can be used during kernel execution. It counts SGPR registers excluding
VCC, FLAT_SCRATCH and XNACK_MASK.

### .subconstantbuffers

This pseudo-operation must be inside a kernel.
Opens the ATI_SUB_CONSTANT_BUFFERS CAL note. Each further occurrence in the same kernel
adds a new CAL note.

### .tgsize

This pseudo-op must be inside the kernel configuration.
Enables usage of TG_SIZE_EN.

### .uav

This pseudo-operation must be inside a kernel.
Opens the ATI_UAV CAL note. Each further occurrence in the same kernel
adds a new CAL note.

### .uavid

Syntax: .uavid UAVID

This pseudo-op must be inside the kernel configuration. Sets the UAVId value.

### .uavmailboxsize

Syntax: .uavmailboxsize [VALUE]

This pseudo-operation must be inside a kernel.
Opens the ATI_UAV_MAILBOX_SIZE CAL note. Each further occurrence in the same kernel
adds a new CAL note. If the argument is given, then a 32-bit value is added to the content.

### .uavopmask

Syntax: .uavopmask [VALUE]

This pseudo-operation must be inside a kernel.
Opens the ATI_UAV_OP_MASK CAL note. Each further occurrence in the same kernel
adds a new CAL note. If the argument is given, then a 32-bit value is added to the content.
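
Both ways of supplying the note content appear in the sample at the end of this chapter, and the fragment below repeats those lines. `.uavmailboxsize` is given its optional argument, while `.uavopmask` is opened without one and its content is written with ordinary data pseudo-ops. The byte values are those of the sample kernel.

```
/* content given as the optional 32-bit argument */
.uavmailboxsize 0
/* no argument: content follows as raw data */
.uavopmask
    .byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    .fill 120, 1, 0x00
```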

### .uavprivate

Syntax: .uavprivate VALUE

This pseudo-op must be inside the kernel configuration. Sets the uav private value.

### .useconstdata

Enables use of the const data.

### .useprintf

Enables use of the printf mechanism.

### .userdata

Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE

This pseudo-op must be inside the kernel configuration. Adds a USERDATA entry. The first argument is
the data class. It can be one of the following:

* IMM_RESOURCE
* IMM_SAMPLER
* IMM_CONST_BUFFER
* IMM_VERTEX_BUFFER
* IMM_UAV
* IMM_ALU_FLOAT_CONST
* IMM_ALU_BOOL32_CONST
* IMM_GDS_COUNTER_RANGE
* IMM_GDS_MEMORY_RANGE
* IMM_GWS_BASE
* IMM_WORK_ITEM_RANGE
* IMM_WORK_GROUP_RANGE
* IMM_DISPATCH_ID
* IMM_SCRATCH_BUFFER
* IMM_HEAP_BUFFER
* IMM_KERNEL_ARG
* SUB_PTR_FETCH_SHADER
* PTR_RESOURCE_TABLE
* PTR_INTERNAL_RESOURCE_TABLE
* PTR_SAMPLER_TABLE
* PTR_CONST_BUFFER_TABLE
* PTR_VERTEX_BUFFER_TABLE
* PTR_SO_BUFFER_TABLE
* PTR_UAV_TABLE
* PTR_INTERNAL_GLOBAL_TABLE
* PTR_EXTENDED_USER_DATA
* PTR_INDIRECT_RESOURCE
* PTR_INDIRECT_INTERNAL_RESOURCE
* PTR_INDIRECT_UAV
* IMM_CONTEXT_BASE
* IMM_LDS_ESGS_SIZE
* IMM_GLOBAL_OFFSET
* IMM_GENERIC_USER_DATA

The second argument is the apiSlot.
The third argument determines the first scalar register that will hold the userdata.
The fourth argument determines how many scalar registers are needed to hold the userdata.
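
The entries from the sample at the end of this chapter can be read as follows. The comments are an interpretation based only on the argument descriptions above.

```
/* UAV table pointer: apiSlot 0, first register 2, 2 registers */
.userdata PTR_UAV_TABLE,0,2,2
/* constant buffer: apiSlot 0, first register 4, 4 registers */
.userdata IMM_CONST_BUFFER,0,4,4
/* constant buffer: apiSlot 1, first register 8, 4 registers */
.userdata IMM_CONST_BUFFER,1,8,4
```

In this sample the three register ranges do not overlap: the first entry covers scalar registers 2-3, the second 4-7 and the third 8-11.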

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside the kernel configuration. Sets the number of vector
registers that can be used during kernel execution.

## Sample code

This is a sample of the kernel setup:

```
/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3).  Driver version: 1702.3 (VM)"
.kernel DCT
    .header
        .fill 16, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
    .metadata
        .ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
        .ascii ";version:3:1:111\n"
        .ascii ";device:pitcairn\n"
        .ascii ";uniqueid:1024\n"
        .ascii ";memory:uavprivate:0\n"
        .ascii ";memory:hwlocal:0\n"
        .ascii ";memory:hwregion:0\n"
        .ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
        .ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
        .ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
        .ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
        .ascii ";value:width:u32:1:1:64\n"
        .ascii ";value:blockWidth:u32:1:1:80\n"
        .ascii ";value:inverse:u32:1:1:96\n"
        .ascii ";function:1:1030\n"
        .ascii ";uavid:11\n"
        .ascii ";printfid:9\n"
        .ascii ";cbid:10\n"
        .ascii ";privateid:8\n"
        .ascii ";reflection:0:float*\n"
        .ascii ";reflection:1:float*\n"
        .ascii ";reflection:2:float*\n"
        .ascii ";reflection:3:float*\n"
        .ascii ";reflection:4:uint\n"
        .ascii ";reflection:5:uint\n"
        .ascii ";reflection:6:uint\n"
        .ascii ";ARGEND:__OpenCL_DCT_kernel\n"
    .data
        .fill 4736, 1, 0x00
    .inputs
    .outputs
    .uav
        .entry 12, 4, 0, 5
        .entry 13, 4, 0, 5
        .entry 14, 4, 0, 5
        .entry 11, 4, 0, 5
    .condout 0
    .floatconsts
    .intconsts
    .boolconsts
    .earlyexit 0
    .globalbuffers
    .constantbuffers
        .cbmask 0, 32764
        .cbmask 1, 0
    .inputsamplers
    .scratchbuffers
        .int 0x00000000
    .persistentbuffers
    .proginfo
        .entry 0x80001000, 0x00000003
        .entry 0x80001001, 0x00000017
        .entry 0x80001002, 0x00000000
        .entry 0x80001003, 0x00000002
        .entry 0x80001004, 0x00000002
        .entry 0x80001005, 0x00000002
        .entry 0x80001006, 0x00000000
        .entry 0x80001007, 0x00000004
        .entry 0x80001008, 0x00000004
        .entry 0x80001009, 0x00000002
        .entry 0x8000100a, 0x00000001
        .entry 0x8000100b, 0x00000008
        .entry 0x8000100c, 0x00000004
        .entry 0x80001041, 0x0000000b
        .entry 0x80001042, 0x00000018
        .entry 0x80001863, 0x00000066
        .entry 0x80001864, 0x00000100
        .entry 0x80001043, 0x000000c0
        .entry 0x80001044, 0x00000000
        .entry 0x80001045, 0x00000000
        .entry 0x00002e13, 0x00400998
        .entry 0x8000001c, 0x00000100
        .entry 0x8000001d, 0x00000000
        .entry 0x8000001e, 0x00000000
        .entry 0x80001841, 0x00000000
        .entry 0x8000001f, 0x00007000
        .entry 0x80001843, 0x00007000
        .entry 0x80001844, 0x00000000
        .entry 0x80001845, 0x00000000
        .entry 0x80001846, 0x00000000
        .entry 0x80001847, 0x00000000
        .entry 0x80001848, 0x00000000
        .entry 0x80001849, 0x00000000
        .entry 0x8000184a, 0x00000000
        .entry 0x8000184b, 0x00000000
        .entry 0x8000184c, 0x00000000
        .entry 0x8000184d, 0x00000000
        .entry 0x8000184e, 0x00000000
        .entry 0x8000184f, 0x00000000
        .entry 0x80001850, 0x00000000
        .entry 0x80001851, 0x00000000
        .entry 0x80001852, 0x00000000
        .entry 0x80001853, 0x00000000
        .entry 0x80001854, 0x00000000
        .entry 0x80001855, 0x00000000
        .entry 0x80001856, 0x00000000
        .entry 0x80001857, 0x00000000
        .entry 0x80001858, 0x00000000
        .entry 0x80001859, 0x00000000
        .entry 0x8000185a, 0x00000000
        .entry 0x8000185b, 0x00000000
        .entry 0x8000185c, 0x00000000
        .entry 0x8000185d, 0x00000000
        .entry 0x8000185e, 0x00000000
        .entry 0x8000185f, 0x00000000
        .entry 0x80001860, 0x00000000
        .entry 0x80001861, 0x00000000
        .entry 0x80001862, 0x00000000
        .entry 0x8000000a, 0x00000001
        .entry 0x80000078, 0x00000040
        .entry 0x80000081, 0x00008000
        .entry 0x80000082, 0x00008000
    .subconstantbuffers
    .uavmailboxsize 0
    .uavopmask
        .byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 120, 1, 0x00
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```

The same kernel with kernel configuration:

```
.amd
.gpu Pitcairn
.32bit
.kernel DCT
    .config
    .dims xy
    .arg output,float*,global
    .arg input,float*,global,const
    .arg dct8x8,float*,global,const
    .arg inter,float*,local
    .arg width,uint
    .arg blockWidth,uint
    .arg inverse,uint
    .userdata PTR_UAV_TABLE,0,2,2
    .userdata IMM_CONST_BUFFER,0,4,4
    .userdata IMM_CONST_BUFFER,1,8,4
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```