source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmAmd.md @ 3996

Last change on this file since 3996 was 3996, checked in by matszpk, 15 months ago

CLRadeonExtender: CLRXDocs: add extra info about setting up number of the SGPRs registers.

## CLRadeonExtender Assembler AMD Catalyst handling

The AMD Catalyst driver provides its own OpenCL implementation that can generate
its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2
and the OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 1.2 binary format.

## Binary format

The AMD OpenCL binaries contain constant global data, device and compilation
information, and embedded kernel binaries. Kernel binaries are stored inside the `.text` section.
Program code is separate for each kernel; no machine code is shared between kernels.
Each kernel binary has a metadata string, ATI CAL notes and program code.
The metadata string describes the kernel arguments and the setup of the
input/output buffers, constant buffers, read-only and write-only images, and local data.
ATI CAL notes are special small data fragments that describe features of the kernel.
The most important ATI CAL note is PROGINFO, which holds data needed for runtime execution,
such as register usage, UAV usage and floating-point setup.

The `.data` section inside a kernel is a usable section and holds only zeroes.

## Layout of the source code

The CLRX assembler lets you choose one of two ways to configure the kernel setup:
a human-readable one (`.config`) and one for quick recompilation (ATI CAL notes and the metadata string).

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

## Scalar register allocation

For scalar registers, the assembler adds 2 extra registers to handle VCC.
The `.sgprsnum` pseudo-op sets the number of all SGPRs except VCC.
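A minimal Python sketch of this accounting follows. It assumes, as stated above, that VCC is the only extra allocation added on top of the `.sgprsnum` value; the helper names are hypothetical and not part of CLRX.

```python
# Hypothetical helpers illustrating the SGPR accounting described above.
VCC_SGPRS = 2  # VCC is a 64-bit register pair, so it takes 2 SGPRs


def sgprsnum_for(highest_sgpr: int) -> int:
    """Smallest `.sgprsnum` value that covers registers s0..s<highest_sgpr>."""
    if highest_sgpr < 0:
        raise ValueError("register index must be non-negative")
    return highest_sgpr + 1


def total_sgprs(sgprsnum: int) -> int:
    """SGPRs allocated for a kernel: the `.sgprsnum` value plus VCC.

    Assumption: only VCC is added here; FLAT_SCRATCH and XNACK_MASK,
    which `.sgprsnum` also excludes, are not modelled.
    """
    if sgprsnum < 0:
        raise ValueError("sgprsnum must be non-negative")
    return sgprsnum + VCC_SGPRS


if __name__ == "__main__":
    n = sgprsnum_for(11)       # a kernel that uses s0..s11
    print(n, total_sgprs(n))   # 12 14
```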

## List of the specific pseudo-operations

### .arg

Syntax for scalar: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, unused]  
Syntax for structure: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]  
Syntax for image: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]  
Syntax for counter32: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]  
Syntax for global pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, RESID[, unused]]]]  
Syntax for local pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]  
Syntax for constant pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, [ACCESS] [, [CONSTSIZE] [, RESID[, unused]]]]

Adds a kernel argument definition. Must be inside a kernel configuration. The first argument is
the argument name from the OpenCL kernel definition. The next, optional argument is the argument type name
from the OpenCL kernel definition. The next argument is the argument type:

* char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
* charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types
(X is the number of elements: 2, 3, 4, 8 or 16)
* counter32 - 32-bit counter type
* structure - structure
* image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d - image types
* sampler - sampler
* type* - pointer to data
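The type names above can be checked with a small Python sketch. The function and its category strings are hypothetical, and the rule that a pointer must point to a scalar, vector or structure type is an assumption drawn from the sample usage in this section, not a statement about the CLRX parser.

```python
# Hypothetical classifier for `.arg` ARGTYPE names, following the list above.
SCALAR_TYPES = {"char", "uchar", "short", "ushort", "int", "uint",
                "ulong", "long", "float", "double"}
VECTOR_SIZES = {2, 3, 4, 8, 16}
IMAGE_TYPES = {"image", "image1d", "image1d_array", "image1d_buffer",
               "image2d", "image2d_array", "image3d"}


def classify_arg_type(name: str) -> str:
    t = name.strip()
    if t.endswith("*"):
        # Pointer: spaces before '*' are allowed, as in "ulong16  *".
        # Assumption: the pointee is a scalar, vector or structure type.
        if classify_arg_type(t[:-1]) in ("scalar", "vector", "structure"):
            return "pointer"
        raise ValueError(f"unsupported pointer type: {name!r}")
    if t in SCALAR_TYPES:
        return "scalar"
    for base in SCALAR_TYPES:
        suffix = t[len(base):]
        # Vector type: a scalar name followed by the element count X.
        if t.startswith(base) and suffix.isdigit() and int(suffix) in VECTOR_SIZES:
            return "vector"
    if t in ("counter32", "structure", "sampler"):
        return t
    if t in IMAGE_TYPES:
        return "image"
    raise ValueError(f"unknown argument type: {name!r}")
```

For example, `classify_arg_type("ulong16  *")` returns `"pointer"`, while `classify_arg_type("float5")` raises `ValueError` because 5 is not an allowed element count.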

The remaining arguments depend on the type of the kernel argument. STRUCTSIZE determines the size of
the structure. ACCESS for images can be one of: `read_only`, `rdonly`,
`write_only` or `wronly`.
PTRSPACE determines the address space that the pointer points to.
It can be one of: `local`, `constant` or `global`.
ACCESS for pointers can be: `const`, `restrict` and `volatile`.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id:

* for global or constant pointers it is the UAVID, in the range 8-1023.
* for constant pointers (drivers older than 1348.X), the range is 1-159.
* for read-only images, the range is 0-127.
* for write-only images or counters, the range is 0-7.
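The ranges above can be expressed as a lookup, sketched here in Python. The helper and its `kind` strings are hypothetical; the driver version uses the `version*100 + revision` encoding described under `.get_driver_version`, so "older than 1348.X" becomes "below 134800".

```python
# Hypothetical helper: valid RESID values per the list above.
# driver_version uses the encoding version*100 + revision,
# e.g. 1348.5 -> 134805 (see `.get_driver_version`).
OLD_CONST_PTR_DRIVER = 134800  # drivers older than 1348.X


def resid_range(kind: str, driver_version: int) -> range:
    """Return the allowed RESID range for an argument kind.

    kind is one of: "global_ptr", "constant_ptr", "rdonly_image",
    "wronly_image", "counter" (names chosen for this sketch).
    """
    if kind == "constant_ptr" and driver_version < OLD_CONST_PTR_DRIVER:
        return range(1, 160)      # 1-159
    if kind in ("global_ptr", "constant_ptr"):
        return range(8, 1024)     # UAVID 8-1023
    if kind == "rdonly_image":
        return range(0, 128)      # 0-127
    if kind in ("wronly_image", "counter"):
        return range(0, 8)        # 0-7
    raise ValueError(f"no RESID for kind {kind!r}")
```

For instance, `9 in resid_range("global_ptr", 170203)` is true, while `9 in resid_range("counter", 170203)` is false.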

The last argument, `unused`, indicates that the argument is not used by the kernel.

Sample usage:

```
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16  *,global
.arg v42,ulong16  *,global, restrict
.arg v57,structure*,82,global
```

### .boolconsts

This pseudo-operation must be inside a kernel.
It opens an ATI_BOOL32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .calnote

Syntax: .calnote CALNOTEID

This pseudo-operation must be inside a kernel. It opens an ATI CAL note with the id CALNOTEID.

### .cbid

Syntax: .cbid  
Syntax: .cbid VALUE

If this pseudo-operation is inside an ATI_CONSTANT_BUFFERS CAL note,
it adds an entry to that CAL note.
If it is inside a kernel configuration, it sets the constant buffer id.

### .cbmask

Syntax: .cbmask INDEX, SIZE

This pseudo-operation must be inside an ATI_CONSTANT_BUFFERS CAL note.
It adds an entry to that CAL note.

### .compile_options

Syntax: .compile_options "STRING"

Sets the compile options for this binary.

### .condout

Syntax: .condout [VALUE]  
Syntax: .condout VALUE

If this pseudo-operation is inside a kernel, it opens an ATI_CONDOUT CAL note.
Each subsequent occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is inside a kernel configuration, it sets the CONDOUT value.

### .config

Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be
defined if any CAL note, metadata or header has already been defined.
The following pseudo-ops can be used inside the kernel configuration:

* .arg
* .cbid
* .condout
* .cws
* .dims
* .earlyexit
* .hwlocal
* .hwregion
* .ieeemode
* .localsize
* .pgmrsrc2
* .printfid
* .privateid
* .sampler
* .scratchbuffer
* .sgprsnum
* .tgsize
* .uavid
* .uavprivate
* .useconstdata
* .useprintf
* .userdata
* .vgprsnum

### .constantbuffers

This pseudo-operation must be inside a kernel.
It opens an ATI_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
It sets the reqd_work_group_size hint for this kernel.
In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes in the last two
components instead of ones. We recommend filling all components.
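The fixed behaviour for omitted trailing components can be sketched in Python as follows. The function name is hypothetical; it only models the rule stated above, that omitted trailing components become 1 (earlier versions wrongly stored 0).

```python
from typing import Optional, Tuple


def expand_cws(x: int, y: Optional[int] = None,
               z: Optional[int] = None) -> Tuple[int, int, int]:
    """Expand a `.cws` hint such as `.cws 64` to all three components.

    Omitted trailing components become 1, as in CLRX 0.1.7 and later
    (earlier versions stored 0 here instead).
    """
    return (x, 1 if y is None else y, 1 if z is None else z)
```

Writing all three components explicitly, as in `.cws 64, 1, 1`, gives the same result regardless of the assembler version.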

### .dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside a kernel configuration. It defines which dimensions
(from the list: x, y, z) are used to determine the space of the kernel execution.

### .driver_info

Syntax: .driver_info "INFO"

Sets the driver info for this binary.

### .driver_version

Syntax: .driver_version VERSION

Sets the driver version for this binary. The version has the form MajorVersion*100+MinorVersion.
This pseudo-op replaces the driver info.
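A short Python sketch of this encoding follows. The function names are hypothetical, and extracting the version from a `.driver_info` string assumes the `Driver version: MAJOR.MINOR` text seen in the sample at the end of this chapter.

```python
import re


def encode_driver_version(major: int, minor: int) -> int:
    """MajorVersion*100 + MinorVersion, as used by `.driver_version`."""
    return major * 100 + minor


def version_from_driver_info(info: str) -> int:
    """Extract the encoded version from a `.driver_info` string.

    Assumption: the string contains 'Driver version: MAJOR.MINOR'.
    """
    m = re.search(r"Driver version:\s*(\d+)\.(\d+)", info)
    if m is None:
        raise ValueError("no driver version found")
    return encode_driver_version(int(m.group(1)), int(m.group(2)))
```

With the sample's driver info, `"@(#) OpenCL 1.2 AMD-APP (1702.3).  Driver version: 1702.3 (VM)"`, this yields 170203, so `.driver_version 170203` would describe the same driver.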

### .earlyexit

Syntax: .earlyexit [VALUE]  
Syntax: .earlyexit VALUE

If this pseudo-operation is inside a kernel, it opens an ATI_EARLY_EXIT CAL note.
Each subsequent occurrence in the same kernel adds a new CAL note.
The optional argument adds a 4-byte value to the content of this CAL note.
If this pseudo-operation is inside a kernel configuration, it sets the EARLY_EXIT value.

### .entry

Syntax: .entry UAVID, F1, F2, TYPE  
Syntax: .entry VALUE1, VALUE2

This pseudo-operation must be inside an ATI_UAV or ATI_PROGINFO CAL note.
It adds an entry to the CAL note. For ATI_UAV, the pseudo-operation accepts four 32-bit values;
for ATI_PROGINFO, it accepts two 32-bit values.
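To illustrate what the entries carry, the Python sketch below packs ATI_UAV entries (four 32-bit values) and ATI_PROGINFO entries (two 32-bit values) into bytes. The little-endian layout and the absence of any extra header are assumptions of this sketch, not a description of how CLRX writes CAL notes; the function names are hypothetical.

```python
import struct


def pack_uav_entries(entries):
    """Pack ATI_UAV entries: (UAVID, F1, F2, TYPE), four uint32 each."""
    return b"".join(struct.pack("<4I", *e) for e in entries)


def pack_proginfo_entries(entries):
    """Pack ATI_PROGINFO entries: (VALUE1, VALUE2), two uint32 each."""
    return b"".join(struct.pack("<2I", *e) for e in entries)
```

For example, the sample entry `.entry 12, 4, 0, 5` becomes 16 bytes and `.entry 0x00002e13, 0x00400998` becomes 8 bytes; a tuple of the wrong length raises `struct.error`.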

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside a kernel configuration.
It sets the exception mask in the PGMRSRC2 register value. The value should be 7-bit.
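A Python sketch of placing the mask follows, assuming the GCN COMPUTE_PGM_RSRC2 layout in which the 7-bit exception-enable field occupies bits 24-30; the constant and function names are hypothetical.

```python
EXCP_SHIFT = 24   # assumed position of the exception-enable field
EXCP_MASK = 0x7f  # 7-bit field


def set_exceptions(pgmrsrc2: int, excpmask: int) -> int:
    """Return pgmrsrc2 with its exception-enable field set to excpmask."""
    if not 0 <= excpmask <= EXCP_MASK:
        raise ValueError("exception mask must fit in 7 bits")
    cleared = pgmrsrc2 & ~(EXCP_MASK << EXCP_SHIFT)
    return cleared | (excpmask << EXCP_SHIFT)
```

Under this assumption, `set_exceptions(0x00400998, 0x7f)` gives `0x7f400998`, and a mask such as 0x80 is rejected because it does not fit in 7 bits.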

### .floatconsts

This pseudo-operation must be inside a kernel.
It opens an ATI_FLOAT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .floatmode

Syntax: .floatmode VALUE

This pseudo-operation must be inside a kernel configuration.
It sets the float mode (the FP_ROUND and FP_DENORM fields of the MODE register).
The value must be a byte value. The default value is 0xc0.
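The byte can be split into its two fields with the Python sketch below. It assumes the GCN MODE register layout: FP_ROUND in bits 0-3 and FP_DENORM in bits 4-7, each holding a 2-bit single-precision setting in the low pair and a 2-bit double-precision setting in the high pair. The function name is hypothetical.

```python
ROUND_MODES = {0: "nearest-even", 1: "+inf", 2: "-inf", 3: "toward-zero"}


def decode_floatmode(value: int) -> dict:
    """Split a `.floatmode` byte into FP_ROUND and FP_DENORM settings."""
    if not 0 <= value <= 0xff:
        raise ValueError("floatmode must be a byte value")
    fp_round = value & 0xf
    fp_denorm = (value >> 4) & 0xf
    return {
        "round_sp": ROUND_MODES[fp_round & 3],
        "round_dp": ROUND_MODES[(fp_round >> 2) & 3],
        # Denormal modes: 0 = flush denormals, 3 = keep them;
        # 1 and 2 mix input and output handling.
        "denorm_sp": fp_denorm & 3,
        "denorm_dp": (fp_denorm >> 2) & 3,
    }
```

Under these assumptions, the default 0xc0 means round-to-nearest-even for both precisions, flushed single-precision denormals and preserved double-precision denormals.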

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Stores the current driver version in SYMBOL. The version has the form `version*100 + revision`.

### .globalbuffers

This pseudo-operation must be inside a kernel.
It opens an ATI_GLOBAL_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .globaldata

Switches to the constant global data section.

### .header

Switches to the main header of the binary.

### .hwlocal, .localsize

Syntax: .hwlocal SIZE  
Syntax: .localsize SIZE

This pseudo-operation must be inside a kernel configuration. It sets the HWLOCAL value, the initial
local data size.

### .hwregion

Syntax: .hwregion VALUE

This pseudo-operation must be inside a kernel configuration. It sets the HWREGION value.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside a kernel configuration. It enables IEEE mode.

### .inputs

This pseudo-operation must be inside a kernel.
It opens an ATI_INPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .inputsamplers

This pseudo-operation must be inside a kernel.
It opens an ATI_INPUT_SAMPLERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .intconsts

This pseudo-operation must be inside a kernel.
It opens an ATI_INT32CONSTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .metadata

This pseudo-operation must be inside a kernel.
It switches to the metadata content.

### .outputs

This pseudo-operation must be inside a kernel.
It opens an ATI_OUTPUTS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .persistentbuffers

This pseudo-operation must be inside a kernel.
It opens an ATI_PERSISTENT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside a kernel configuration. It sets the PGMRSRC2 value.
If the dimensions are set (with `.dims`), the bits that control the dimension setup are ignored.
The SCRATCH_EN bit is ignored.
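One way to read these rules is the Python sketch below, which merges a user-supplied PGMRSRC2 value with a `.dims` setting. The bit positions (SCRATCH_EN at bit 0, TGID_X/Y/Z_EN at bits 7-9, TIDIG_COMP_CNT at bits 11-12) follow the GCN COMPUTE_PGM_RSRC2 layout, and deriving TIDIG_COMP_CNT from the highest used dimension is an assumption; CLRX's own implementation may differ, and the helper names are hypothetical.

```python
SCRATCH_EN = 1 << 0
TGID_SHIFT = 7    # TGID_X_EN, TGID_Y_EN, TGID_Z_EN: bits 7-9
TIDIG_SHIFT = 11  # TIDIG_COMP_CNT: bits 11-12
DIM_BITS = (0x7 << TGID_SHIFT) | (0x3 << TIDIG_SHIFT)


def dims_to_mask(dims: str) -> int:
    """Convert a `.dims` string such as "xy" to a 3-bit mask (x=1, y=2, z=4)."""
    mask = 0
    for ch in dims:
        if ch not in "xyz":
            raise ValueError(f"bad dimension {ch!r}")
        mask |= 1 << "xyz".index(ch)
    return mask


def merge_pgmrsrc2(value: int, dims: str = "") -> int:
    """Apply `.pgmrsrc2 VALUE`: clear SCRATCH_EN (not modelled further
    here) and, if dims is set, replace the dimension-control bits."""
    value &= ~SCRATCH_EN
    if dims:
        mask = dims_to_mask(dims)
        # Assumed: TIDIG_COMP_CNT = index of the highest used dimension.
        tidig = 2 if mask & 4 else (1 if mask & 2 else 0)
        value = (value & ~DIM_BITS) | (mask << TGID_SHIFT) | (tidig << TIDIG_SHIFT)
    return value
```

With `.dims xy`, the PGMRSRC2 value from the sample's PROGINFO note, 0x00400998, is left unchanged, because its dimension bits already correspond to x and y.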

### .printfid

Syntax: .printfid RESID

This pseudo-operation must be inside a kernel configuration. It sets the printfid value.

### .privateid

Syntax: .privateid RESID

This pseudo-operation must be inside a kernel configuration. It sets the privateid value.

### .proginfo

This pseudo-operation must be inside a kernel.
It opens an ATI_PROGINFO CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .sampler

Syntax: .sampler INPUT, SAMPLER  
Syntax: .sampler RESID,....

If this pseudo-operation is inside an ATI_SAMPLER CAL note, it adds a sampler entry.
If this pseudo-operation is inside a kernel configuration, it adds samplers with the specified
resource ids.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside a kernel configuration.
It sets the scratch buffer size.

### .scratchbuffers

This pseudo-operation must be inside a kernel.
It opens an ATI_SCRATCH_BUFFERS CAL note. Each subsequent occurrence in the same kernel adds a new CAL note.

### .segment

Syntax: .segment OFFSET, SIZE

This pseudo-operation must be inside an ATI_BOOL32CONSTS, ATI_INT32CONSTS or
ATI_FLOAT32CONSTS CAL note. It adds an entry to the CAL note.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside a kernel configuration. It sets the number of scalar
registers that can be used during kernel execution. The count excludes the
VCC, FLAT_SCRATCH and XNACK_MASK registers.

### .subconstantbuffers

This pseudo-operation must be inside a kernel.
It opens an ATI_SUB_CONSTANT_BUFFERS CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .tgsize

This pseudo-op must be inside a kernel configuration.
It enables the use of TG_SIZE_EN.

### .uav

This pseudo-operation must be inside a kernel.
It opens an ATI_UAV CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note.

### .uavid

Syntax: .uavid UAVID

This pseudo-op must be inside a kernel configuration. It sets the UAVId value.

### .uavmailboxsize

Syntax: .uavmailboxsize [VALUE]

This pseudo-operation must be inside a kernel.
It opens an ATI_UAV_MAILBOX_SIZE CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note. If the argument is given, its 32-bit value is added to the content.

### .uavopmask

Syntax: .uavopmask [VALUE]

This pseudo-operation must be inside a kernel.
It opens an ATI_UAV_OP_MASK CAL note. Each subsequent occurrence in the same kernel
adds a new CAL note. If the argument is given, its 32-bit value is added to the content.

### .uavprivate

Syntax: .uavprivate VALUE

This pseudo-op must be inside a kernel configuration. It sets the UAV private value.

### .useconstdata

Enables the use of the const data.

### .useprintf

Enables the use of the printf mechanism.

### .userdata

Syntax: .userdata DATACLASS, APISLOT, REGSTART, REGSIZE

This pseudo-op must be inside a kernel configuration. It adds a USERDATA entry. The first argument is
the data class. It can be one of the following:

* IMM_RESOURCE
* IMM_SAMPLER
* IMM_CONST_BUFFER
* IMM_VERTEX_BUFFER
* IMM_UAV
* IMM_ALU_FLOAT_CONST
* IMM_ALU_BOOL32_CONST
* IMM_GDS_COUNTER_RANGE
* IMM_GDS_MEMORY_RANGE
* IMM_GWS_BASE
* IMM_WORK_ITEM_RANGE
* IMM_WORK_GROUP_RANGE
* IMM_DISPATCH_ID
* IMM_SCRATCH_BUFFER
* IMM_HEAP_BUFFER
* IMM_KERNEL_ARG
* SUB_PTR_FETCH_SHADER
* PTR_RESOURCE_TABLE
* PTR_INTERNAL_RESOURCE_TABLE
* PTR_SAMPLER_TABLE
* PTR_CONST_BUFFER_TABLE
* PTR_VERTEX_BUFFER_TABLE
* PTR_SO_BUFFER_TABLE
* PTR_UAV_TABLE
* PTR_INTERNAL_GLOBAL_TABLE
* PTR_EXTENDED_USER_DATA
* PTR_INDIRECT_RESOURCE
* PTR_INDIRECT_INTERNAL_RESOURCE
* PTR_INDIRECT_UAV
* IMM_CONTEXT_BASE
* IMM_LDS_ESGS_SIZE
* IMM_GLOBAL_OFFSET
* IMM_GENERIC_USER_DAT

The second argument is the API slot (apiSlot).
The third argument determines the first scalar register that will hold the user data.
The fourth argument determines how many scalar registers are needed to hold the user data.
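Since each entry claims REGSIZE scalar registers starting at REGSTART, overlapping ranges would make two pieces of user data share registers. The Python sketch below (a hypothetical helper, not part of CLRX) applies that sanity check and returns the index of the first SGPR after the last user-data register.

```python
def userdata_end(entries):
    """Check `.userdata` register ranges; return the first SGPR after them.

    entries: (DATACLASS, APISLOT, REGSTART, REGSIZE) tuples.
    """
    used = set()
    end = 0
    for dataclass, apislot, regstart, regsize in entries:
        if regsize <= 0:
            raise ValueError(f"{dataclass}: REGSIZE must be positive")
        regs = set(range(regstart, regstart + regsize))
        if used & regs:
            raise ValueError(f"{dataclass} slot {apislot} overlaps other user data")
        used |= regs
        end = max(end, regstart + regsize)
    return end
```

For the three `.userdata` lines in the sample configuration at the end of this chapter (s2-s3, s4-s7 and s8-s11) it returns 12.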

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside a kernel configuration. It sets the number of vector
registers that can be used during kernel execution.

## Sample code

This is a sample kernel setup:

```
/* Disassembling 'DCT_15_5.1' */
.amd
.gpu Pitcairn
.32bit
.compile_options ""
.driver_info "@(#) OpenCL 1.2 AMD-APP (1702.3).  Driver version: 1702.3 (VM)"
.kernel DCT
    .header
        .fill 16, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
    .metadata
        .ascii ";ARGSTART:__OpenCL_DCT_kernel\n"
        .ascii ";version:3:1:111\n"
        .ascii ";device:pitcairn\n"
        .ascii ";uniqueid:1024\n"
        .ascii ";memory:uavprivate:0\n"
        .ascii ";memory:hwlocal:0\n"
        .ascii ";memory:hwregion:0\n"
        .ascii ";pointer:output:float:1:1:0:uav:12:4:RW:0:0\n"
        .ascii ";pointer:input:float:1:1:16:uav:13:4:RO:0:0\n"
        .ascii ";pointer:dct8x8:float:1:1:32:uav:14:4:RO:0:0\n"
        .ascii ";pointer:inter:float:1:1:48:hl:1:4:RW:0:0\n"
        .ascii ";value:width:u32:1:1:64\n"
        .ascii ";value:blockWidth:u32:1:1:80\n"
        .ascii ";value:inverse:u32:1:1:96\n"
        .ascii ";function:1:1030\n"
        .ascii ";uavid:11\n"
        .ascii ";printfid:9\n"
        .ascii ";cbid:10\n"
        .ascii ";privateid:8\n"
        .ascii ";reflection:0:float*\n"
        .ascii ";reflection:1:float*\n"
        .ascii ";reflection:2:float*\n"
        .ascii ";reflection:3:float*\n"
        .ascii ";reflection:4:uint\n"
        .ascii ";reflection:5:uint\n"
        .ascii ";reflection:6:uint\n"
        .ascii ";ARGEND:__OpenCL_DCT_kernel\n"
    .data
        .fill 4736, 1, 0x00
    .inputs
    .outputs
    .uav
        .entry 12, 4, 0, 5
        .entry 13, 4, 0, 5
        .entry 14, 4, 0, 5
        .entry 11, 4, 0, 5
    .condout 0
    .floatconsts
    .intconsts
    .boolconsts
    .earlyexit 0
    .globalbuffers
    .constantbuffers
        .cbmask 0, 32764
        .cbmask 1, 0
    .inputsamplers
    .scratchbuffers
        .int 0x00000000
    .persistentbuffers
    .proginfo
        .entry 0x80001000, 0x00000003
        .entry 0x80001001, 0x00000017
        .entry 0x80001002, 0x00000000
        .entry 0x80001003, 0x00000002
        .entry 0x80001004, 0x00000002
        .entry 0x80001005, 0x00000002
        .entry 0x80001006, 0x00000000
        .entry 0x80001007, 0x00000004
        .entry 0x80001008, 0x00000004
        .entry 0x80001009, 0x00000002
        .entry 0x8000100a, 0x00000001
        .entry 0x8000100b, 0x00000008
        .entry 0x8000100c, 0x00000004
        .entry 0x80001041, 0x0000000b
        .entry 0x80001042, 0x00000018
        .entry 0x80001863, 0x00000066
        .entry 0x80001864, 0x00000100
        .entry 0x80001043, 0x000000c0
        .entry 0x80001044, 0x00000000
        .entry 0x80001045, 0x00000000
        .entry 0x00002e13, 0x00400998
        .entry 0x8000001c, 0x00000100
        .entry 0x8000001d, 0x00000000
        .entry 0x8000001e, 0x00000000
        .entry 0x80001841, 0x00000000
        .entry 0x8000001f, 0x00007000
        .entry 0x80001843, 0x00007000
        .entry 0x80001844, 0x00000000
        .entry 0x80001845, 0x00000000
        .entry 0x80001846, 0x00000000
        .entry 0x80001847, 0x00000000
        .entry 0x80001848, 0x00000000
        .entry 0x80001849, 0x00000000
        .entry 0x8000184a, 0x00000000
        .entry 0x8000184b, 0x00000000
        .entry 0x8000184c, 0x00000000
        .entry 0x8000184d, 0x00000000
        .entry 0x8000184e, 0x00000000
        .entry 0x8000184f, 0x00000000
        .entry 0x80001850, 0x00000000
        .entry 0x80001851, 0x00000000
        .entry 0x80001852, 0x00000000
        .entry 0x80001853, 0x00000000
        .entry 0x80001854, 0x00000000
        .entry 0x80001855, 0x00000000
        .entry 0x80001856, 0x00000000
        .entry 0x80001857, 0x00000000
        .entry 0x80001858, 0x00000000
        .entry 0x80001859, 0x00000000
        .entry 0x8000185a, 0x00000000
        .entry 0x8000185b, 0x00000000
        .entry 0x8000185c, 0x00000000
        .entry 0x8000185d, 0x00000000
        .entry 0x8000185e, 0x00000000
        .entry 0x8000185f, 0x00000000
        .entry 0x80001860, 0x00000000
        .entry 0x80001861, 0x00000000
        .entry 0x80001862, 0x00000000
        .entry 0x8000000a, 0x00000001
        .entry 0x80000078, 0x00000040
        .entry 0x80000081, 0x00008000
        .entry 0x80000082, 0x00008000
    .subconstantbuffers
    .uavmailboxsize 0
    .uavopmask
        .byte 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 120, 1, 0x00
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```

The same kernel, set up with a kernel configuration:

```
.amd
.gpu Pitcairn
.32bit
.kernel DCT
    .config
    .dims xy
    .arg output,float*,global
    .arg input,float*,global,const
    .arg dct8x8,float*,global,const
    .arg inter,float*,local
    .arg width,uint
    .arg blockWidth,uint
    .arg inverse,uint
    .userdata PTR_UAV_TABLE,0,2,2
    .userdata IMM_CONST_BUFFER,0,4,4
    .userdata IMM_CONST_BUFFER,1,8,4
    .text
/*befc03ff 00008000*/ s_mov_b32       m0, 0x8000
...
/*bf810000         */ s_endpgm
```
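In the first listing, the PROGINFO entry `0x00002e13, 0x00400998` is assumed to carry the PGMRSRC2 value (0x2e13 being, as far as we know, the COMPUTE_PGM_RSRC2 register index). The Python sketch below decodes it, assuming the GCN COMPUTE_PGM_RSRC2 bit layout; the field names follow AMD's register naming and the helper is hypothetical. The decoded fields agree with the kernel configuration: 12 user SGPRs (the `.userdata` entries end at s11), and TGID_X_EN and TGID_Y_EN set with TIDIG_COMP_CNT equal to 1 (`.dims xy`).

```python
# Assumed GCN COMPUTE_PGM_RSRC2 fields: (name, first bit, width).
FIELDS = [
    ("SCRATCH_EN", 0, 1),
    ("USER_SGPR", 1, 5),
    ("TRAP_PRESENT", 6, 1),
    ("TGID_X_EN", 7, 1),
    ("TGID_Y_EN", 8, 1),
    ("TGID_Z_EN", 9, 1),
    ("TG_SIZE_EN", 10, 1),
    ("TIDIG_COMP_CNT", 11, 2),
    ("EXCP_EN_MSB", 13, 2),
    ("LDS_SIZE", 15, 9),
    ("EXCP_EN", 24, 7),
]


def decode_pgmrsrc2(value: int) -> dict:
    """Split a COMPUTE_PGM_RSRC2 value into its bit fields."""
    return {name: (value >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}


if __name__ == "__main__":
    for name, field in decode_pgmrsrc2(0x00400998).items():
        print(f"{name:15} {field}")
```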