source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmAmd.md @ 3702

Last change on this file since 3702 was 3702, checked in by matszpk, 3 years ago

CLRadeonExtender: CLRXDocs: Add info about registers kernel setup.

## CLRadeonExtender Assembler AMD Catalyst handling

The AMD Catalyst driver provides its own OpenCL implementation that can generate
its own binaries of the OpenCL programs. The CLRX assembler supports both the OpenCL 1.2
and OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.

## Binary format

An AMD OpenCL binary contains constant global data, device and compilation
information, and embedded kernel binaries. Kernel binaries are stored inside the `.text` section.
Program code is separate for each kernel; no machine code is shared between kernels.
Each kernel binary has a metadata string, ATI CAL notes and program code.
The metadata string describes the kernel arguments and the settings of the
input/output buffers, constant buffers, read-only and write-only images, and local data.
ATI CAL notes are small special data fragments that describe features of the kernel.
The most important ATI CAL note is PROGINFO, which holds data needed for runtime execution,
such as register usage, UAV usage and floating point setup.

A `.data` section inside a kernel is a usable section and holds just zeroes.

## Layout of the source code

The CLRX assembler offers two ways to configure the kernel setup: a human-readable
form (`.config`) and a form for quick recompilation (ATI CAL notes and the metadata string).

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

## Scalar register allocation

To the used scalar registers, the assembler adds 2 additional registers for handling VCC.

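As a minimal sketch of overriding the automatic register counts, the fragment below sets both values explicitly in a kernel configuration. The kernel name `sample` and the counts 16 and 8 are illustrative assumptions, not values taken from a real kernel. Since `.sgprsnum` is documented later in this chapter as excluding VCC, the 2 registers for VCC would presumably come on top of the 16 given here.

```
.amd
.gpu Pitcairn
.kernel sample          /* hypothetical kernel name */
    .config
        .dims x
        .sgprsnum 16    /* illustrative value; VCC is not counted here */
        .vgprsnum 8     /* illustrative value */
    .text
        s_endpgm
```
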
## List of the specific pseudo-operations

### .arg

Syntax for scalar: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, unused]  
Syntax for structure: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]  
Syntax for image: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]  
Syntax for counter32: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]  
Syntax for global pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]  
Syntax for local pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]  
Syntax for constant pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]]

Adds a kernel argument definition. Must be inside kernel configuration. The first argument is
the argument name from the OpenCL kernel definition. The next, optional, argument is the
argument type name from the OpenCL kernel definition. The next argument is the argument type:

* char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
* charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types
(X indicates number of elements: 2, 3, 4, 8 or 16)
* counter32 - 32-bit counter type
* structure - structure
* image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d -
image types
* sampler - sampler
* type* - pointer to data

The rest of the arguments depend on the type of the kernel argument. STRUCTSIZE determines
the size of the structure. ACCESS for an image can be one of: `read_only`, `rdonly`,
`write_only` or `wronly`.
PTRSPACE determines the address space the pointer points to.
It can be one of: `local`, `constant` or `global`.
ACCESS for pointers can be: `const`, `restrict` and `volatile`.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id:

* for global or constant pointers it is the UAVID; the range is 8-1023.
* for constant pointers (driver older than 1348.X) the range is 1-159.
* for read-only images the range is 0-127.
* for write-only images or counters the range is 0-7.

The last argument `unused` indicates that the argument will not be used by the kernel.

Sample usage:

```
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16  *,global
.arg v42,ulong16  *,global, restrict
.arg v57,structure*,82,global
```

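The forms not covered by the sample can be sketched in the same way. The fragment below follows the syntax lines above for images, counters and unused arguments; the argument names and resource ids are hypothetical and were only chosen to fall inside the documented ranges.

```
.arg src,image2d,read_only,3     /* read-only image, RESID in 0-127 */
.arg dst,image2d,write_only,1    /* write-only image, RESID in 0-7 */
.arg hits,counter32,2            /* 32-bit counter, RESID in 0-7 */
.arg pad,uint,unused             /* scalar not used by the kernel */
```
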
### .boolconsts

This pseudo-operation must be inside kernel.
Opens the ATI_BOOL32CONSTS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .calnote

Syntax: .calnote CALNOTEID

This pseudo-operation must be inside kernel. Opens an ATI CAL note.

### .cbid

Syntax: .cbid
Syntax: .cbid VALUE

If this pseudo-operation is inside an ATI_CONSTANT_BUFFERS CAL note, it adds an entry
to that CAL note.
If it is inside kernel configuration, it sets the constant buffer id.

### .cbmask

Syntax: .cbmask INDEX, SIZE

This pseudo-operation must be inside an ATI_CONSTANT_BUFFERS CAL note.
Adds an entry to the ATI_CONSTANT_BUFFERS CAL note.

### .compile_options

Syntax: .compile_options "STRING"

Sets the compile options for this binary.

### .condout

Syntax: .condout [VALUE]  
Syntax: .condout VALUE

If this pseudo-operation is inside a kernel, it opens the ATI_CONDOUT CAL note.
Each subsequent occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is inside kernel configuration, it sets the CONDOUT value.

### .config

Opens kernel configuration. Must be inside kernel. Kernel configuration cannot be
defined if any CAL note, metadata or header has already been defined.
The following pseudo-ops can be used inside kernel configuration:

* .arg
* .cbid
* .condout
* .cws
* .dims
* .earlyexit
* .exceptions
* .floatmode
* .hwlocal
* .hwregion
* .ieeemode
* .localsize
* .pgmrsrc2
* .printfid
* .privateid
* .sampler
* .scratchbuffer
* .sgprsnum
* .tgsize
* .uavid
* .uavprivate
* .useconstdata
* .useprintf
* .userdata
* .vgprsnum

### .constantbuffers

This pseudo-operation must be inside kernel.
Opens the ATI_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .cws, .reqd_work_group_size

Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]
Syntax: .reqd_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside kernel configuration.
Sets the reqd_work_group_size hint for this kernel.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside kernel configuration. Defines which dimensions
(from the list: x, y, z) will be used to determine the space of the kernel execution.

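A short sketch of both settings inside a kernel configuration: the kernel below would run over the x and y dimensions with a reqd_work_group_size hint of 8x8. The sizes are illustrative assumptions.

```
.config
    .dims xy
    .cws 8, 8       /* illustrative sizes */
```
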
### .driver_info

Syntax: .driver_info "INFO"

Sets the driver info for this binary.

### .driver_version

Syntax: .driver_version VERSION

Sets the driver version for this binary. The version has the form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

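As a worked example of the formula, the driver info string in the sample at the end of this chapter reports version 1702.3. Taking 1702 as MajorVersion and 3 as MinorVersion (an assumption about how that string splits) gives 1702*100+3 = 170203:

```
/* 1702.3 -> 1702*100 + 3 = 170203 */
.driver_version 170203
```
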
### .earlyexit

Syntax: .earlyexit [VALUE]  
Syntax: .earlyexit VALUE

If this pseudo-operation is inside a kernel, it opens the ATI_EARLY_EXIT CAL note.
Each subsequent occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is inside kernel configuration, it sets the EARLY_EXIT value.

### .entry

Syntax: .entry UAVID, F1, F2, TYPE  
Syntax: .entry VALUE1, VALUE2

This pseudo-operation must be inside an ATI_UAV or ATI_PROGINFO CAL note.
Adds an entry to the CAL note. For ATI_UAV, the pseudo-operation accepts four 32-bit values.
For ATI_PROGINFO, it accepts two 32-bit values.

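Both forms appear in the sample at the end of this chapter; a shortened fragment looks as follows. The values are copied from that sample, and the comments only repeat the argument names from the syntax lines above.

```
.uav
    .entry 12, 4, 0, 5                /* ATI_UAV: UAVID, F1, F2, TYPE */
.proginfo
    .entry 0x80001000, 0x00000003     /* ATI_PROGINFO: VALUE1, VALUE2 */
```
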
### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside kernel configuration.
Sets the exception mask in the PGMRSRC2 register value. The value should be 7-bit.

### .floatconsts

This pseudo-operation must be inside kernel.
Opens the ATI_FLOAT32CONSTS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .floatmode

Syntax: .floatmode VALUE

This pseudo-operation must be inside kernel configuration.
Sets the floatmode (FP_ROUND and FP_DENORM fields of the MODE register).
The value must be a byte value. The default value is 0xc0.

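To illustrate how the byte splits, the sketch below spells out the default value. It assumes the GCN MODE register layout in which FP_ROUND occupies bits 0-3 and FP_DENORM bits 4-7; that layout comes from the GCN ISA documentation, not from this chapter.

```
.config
    /* 0xc0 = (0xc << 4) | 0x0: FP_DENORM = 0xc, FP_ROUND = 0x0 */
    .floatmode 0xc0
```
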
### .get_driver_version

Syntax: .get_driver_version SYMBOL

Stores the current driver version in SYMBOL. The version has the form `version*100 + revision`.

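One possible use is selecting settings by driver version. The sketch below assumes the general CLRX conditional pseudo-ops `.if`, `.else` and `.endif`, which are not described in this chapter; the symbol names `DRV` and `OLD_CONST_IDS` are hypothetical. The threshold 134800 encodes 1348.0 under the `version*100 + revision` form.

```
.get_driver_version DRV
.if DRV < 134800
    OLD_CONST_IDS = 1       /* driver older than 1348.X */
.else
    OLD_CONST_IDS = 0
.endif
```
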
### .globalbuffers

This pseudo-operation must be inside kernel.
Opens the ATI_GLOBAL_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .globaldata

Goes to the constant global data section.

### .header

Goes to the main header of the binary.

### .hwlocal, .localsize

Syntax: .hwlocal SIZE  
Syntax: .localsize SIZE

This pseudo-operation must be inside kernel configuration. Sets the HWLOCAL value, the initial
local data size.

### .hwregion

Syntax: .hwregion VALUE

This pseudo-operation must be inside kernel configuration. Sets the HWREGION value.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration. Sets ieee-mode.

### .inputs

This pseudo-operation must be inside kernel.
Opens the ATI_INPUTS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .inputsamplers

This pseudo-operation must be inside kernel.
Opens the ATI_INPUT_SAMPLERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .intconsts

This pseudo-operation must be inside kernel.
Opens the ATI_INT32CONSTS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .metadata

This pseudo-operation must be inside kernel.
Goes to the metadata content.

### .outputs

This pseudo-operation must be inside kernel.
Opens the ATI_OUTPUTS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .persistentbuffers

This pseudo-operation must be inside kernel.
Opens the ATI_PERSISTENT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside kernel configuration. Sets the PGMRSRC2 value.
If dimensions are set, the bits that control the dimension setup will be ignored.
The SCRATCH_EN bit will be ignored.

### .printfid

Syntax: .printfid RESID

This pseudo-operation must be inside kernel configuration. Sets printfid.

### .privateid

Syntax: .privateid RESID

This pseudo-operation must be inside kernel configuration. Sets privateid.

### .proginfo

This pseudo-operation must be inside kernel.
Opens the ATI_PROGINFO CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .sampler

Syntax: .sampler INPUT, SAMPLER  
Syntax: .sampler RESID,....

If this pseudo-operation is inside an ATI_SAMPLER CAL note, it adds a sampler entry.
If it is inside kernel configuration, it adds samplers with the specified
resource ids.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside kernel configuration.
Sets the scratch buffer size.

### .scratchbuffers

This pseudo-operation must be inside kernel.
Opens the ATI_SCRATCH_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .segment

Syntax: .segment OFFSET, SIZE

This pseudo-operation must be inside an ATI_BOOL32CONSTS, ATI_INT32CONSTS or
ATI_FLOAT32CONSTS CAL note. Adds an entry to the CAL note.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration. Sets the number of scalar
registers that can be used during kernel execution. The count covers SGPRs excluding
VCC, FLAT_SCRATCH and XNACK_MASK.

### .subconstantbuffers

This pseudo-operation must be inside kernel.
Opens the ATI_SUB_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .tgsize

This pseudo-op must be inside kernel configuration.
Enables usage of TG_SIZE_EN.

### .uav

This pseudo-operation must be inside kernel.
Opens the ATI_UAV CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .uavid

Syntax: .uavid UAVID

This pseudo-op must be inside kernel configuration. Sets the UAVId value.

### .uavmailboxsize

Syntax: .uavmailboxsize [VALUE]

This pseudo-operation must be inside kernel.
Opens the ATI_UAV_MAILBOX_SIZE CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note. If the argument is given, a 32-bit value is added to the content.

### .uavopmask

Syntax: .uavopmask [VALUE]

This pseudo-operation must be inside kernel.
Opens the ATI_UAV_OP_MASK CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note. If the argument is given, a 32-bit value is added to the content.

### .uavprivate

Syntax: .uavprivate VALUE

This pseudo-op must be inside kernel configuration. Sets the uav private value.

### .useconstdata

Enables use of the const data.

### .useprintf

Enables use of the printf mechanism.

### .userdata

Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE

This pseudo-op must be inside kernel configuration. Adds a USERDATA entry. The first argument
is the data class. It can be one of the following:

* IMM_RESOURCE
* IMM_SAMPLER
* IMM_CONST_BUFFER
* IMM_VERTEX_BUFFER
* IMM_UAV
* IMM_ALU_FLOAT_CONST
* IMM_ALU_BOOL32_CONST
* IMM_GDS_COUNTER_RANGE
* IMM_GDS_MEMORY_RANGE
* IMM_GWS_BASE
* IMM_WORK_ITEM_RANGE
* IMM_WORK_GROUP_RANGE
* IMM_DISPATCH_ID
* IMM_SCRATCH_BUFFER
* IMM_HEAP_BUFFER
* IMM_KERNEL_ARG
* SUB_PTR_FETCH_SHADER
* PTR_RESOURCE_TABLE
* PTR_INTERNAL_RESOURCE_TABLE
* PTR_SAMPLER_TABLE
* PTR_CONST_BUFFER_TABLE
* PTR_VERTEX_BUFFER_TABLE
* PTR_SO_BUFFER_TABLE
* PTR_UAV_TABLE
* PTR_INTERNAL_GLOBAL_TABLE
* PTR_EXTENDED_USER_DATA
* PTR_INDIRECT_RESOURCE
* PTR_INDIRECT_INTERNAL_RESOURCE
* PTR_INDIRECT_UAV
* IMM_CONTEXT_BASE
* IMM_LDS_ESGS_SIZE
* IMM_GLOBAL_OFFSET
* IMM_GENERIC_USER_DAT

The second argument is the apiSlot.
The third argument determines the first scalar register that will hold the userdata.
The fourth argument determines how many scalar registers are needed to hold the userdata.

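The `.userdata` lines of the kernel configuration sample at the end of this chapter can be read as follows. The `s[a:b]` ranges in the comments are derived from REGSTART and REGSIZE and are an interpretation, not assembler output.

```
.userdata PTR_UAV_TABLE,0,2,2       /* apiSlot 0, 2 registers from s2: s[2:3] */
.userdata IMM_CONST_BUFFER,0,4,4    /* apiSlot 0, 4 registers from s4: s[4:7] */
.userdata IMM_CONST_BUFFER,1,8,4    /* apiSlot 1, 4 registers from s8: s[8:11] */
```
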
### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration. Sets the number of vector
registers that can be used during kernel execution.

## Sample code

This is a sample of the kernel setup:

```
/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3).  Driver version: 1702.3 (VM)"
.kernel DCT
    .header
        .fill 16, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
    .metadata
        .ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
        .ascii ";version:3:1:111\n"
        .ascii ";device:pitcairn\n"
        .ascii ";uniqueid:1024\n"
        .ascii ";memory:uavprivate:0\n"
        .ascii ";memory:hwlocal:0\n"
        .ascii ";memory:hwregion:0\n"
        .ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
        .ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
        .ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
        .ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
        .ascii ";value:width:u32:1:1:64\n"
        .ascii ";value:blockWidth:u32:1:1:80\n"
        .ascii ";value:inverse:u32:1:1:96\n"
        .ascii ";function:1:1030\n"
        .ascii ";uavid:11\n"
        .ascii ";printfid:9\n"
        .ascii ";cbid:10\n"
        .ascii ";privateid:8\n"
        .ascii ";reflection:0:float*\n"
        .ascii ";reflection:1:float*\n"
        .ascii ";reflection:2:float*\n"
        .ascii ";reflection:3:float*\n"
        .ascii ";reflection:4:uint\n"
        .ascii ";reflection:5:uint\n"
        .ascii ";reflection:6:uint\n"
        .ascii ";ARGEND:__OpenCL_DCT_kernel\n"
    .data
        .fill 4736, 1, 0x00
    .inputs
    .outputs
    .uav
        .entry 12, 4, 0, 5
        .entry 13, 4, 0, 5
        .entry 14, 4, 0, 5
        .entry 11, 4, 0, 5
    .condout 0
    .floatconsts
    .intconsts
    .boolconsts
    .earlyexit 0
    .globalbuffers
    .constantbuffers
        .cbmask 0, 32764
        .cbmask 1, 0
    .inputsamplers
    .scratchbuffers
        .int 0x00000000
    .persistentbuffers
    .proginfo
        .entry 0x80001000, 0x00000003
        .entry 0x80001001, 0x00000017
        .entry 0x80001002, 0x00000000
        .entry 0x80001003, 0x00000002
        .entry 0x80001004, 0x00000002
        .entry 0x80001005, 0x00000002
        .entry 0x80001006, 0x00000000
        .entry 0x80001007, 0x00000004
        .entry 0x80001008, 0x00000004
        .entry 0x80001009, 0x00000002
        .entry 0x8000100a, 0x00000001
        .entry 0x8000100b, 0x00000008
        .entry 0x8000100c, 0x00000004
        .entry 0x80001041, 0x0000000b
        .entry 0x80001042, 0x00000018
        .entry 0x80001863, 0x00000066
        .entry 0x80001864, 0x00000100
        .entry 0x80001043, 0x000000c0
        .entry 0x80001044, 0x00000000
        .entry 0x80001045, 0x00000000
        .entry 0x00002e13, 0x00400998
        .entry 0x8000001c, 0x00000100
        .entry 0x8000001d, 0x00000000
        .entry 0x8000001e, 0x00000000
        .entry 0x80001841, 0x00000000
        .entry 0x8000001f, 0x00007000
        .entry 0x80001843, 0x00007000
        .entry 0x80001844, 0x00000000
        .entry 0x80001845, 0x00000000
        .entry 0x80001846, 0x00000000
        .entry 0x80001847, 0x00000000
        .entry 0x80001848, 0x00000000
        .entry 0x80001849, 0x00000000
        .entry 0x8000184a, 0x00000000
        .entry 0x8000184b, 0x00000000
        .entry 0x8000184c, 0x00000000
        .entry 0x8000184d, 0x00000000
        .entry 0x8000184e, 0x00000000
        .entry 0x8000184f, 0x00000000
        .entry 0x80001850, 0x00000000
        .entry 0x80001851, 0x00000000
        .entry 0x80001852, 0x00000000
        .entry 0x80001853, 0x00000000
        .entry 0x80001854, 0x00000000
        .entry 0x80001855, 0x00000000
        .entry 0x80001856, 0x00000000
        .entry 0x80001857, 0x00000000
        .entry 0x80001858, 0x00000000
        .entry 0x80001859, 0x00000000
        .entry 0x8000185a, 0x00000000
        .entry 0x8000185b, 0x00000000
        .entry 0x8000185c, 0x00000000
        .entry 0x8000185d, 0x00000000
        .entry 0x8000185e, 0x00000000
        .entry 0x8000185f, 0x00000000
        .entry 0x80001860, 0x00000000
        .entry 0x80001861, 0x00000000
        .entry 0x80001862, 0x00000000
        .entry 0x8000000a, 0x00000001
        .entry 0x80000078, 0x00000040
        .entry 0x80000081, 0x00008000
        .entry 0x80000082, 0x00008000
    .subconstantbuffers
    .uavmailboxsize 0
    .uavopmask
        .byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 120, 1, 0x00
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```

The same kernel set up with kernel configuration:

```
.amd
.gpu Pitcairn
.32bit
.kernel DCT
    .config
    .dims xy
    .arg output,float*,global
    .arg input,float*,global,const
    .arg dct8x8,float*,global,const
    .arg inter,float*,local
    .arg width,uint
    .arg blockWidth,uint
    .arg inverse,uint
    .userdata PTR_UAV_TABLE,0,2,2
    .userdata IMM_CONST_BUFFER,0,4,4
    .userdata IMM_CONST_BUFFER,1,8,4
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```