source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmAmdCl2.md @ 3321

Last change on this file since 3321 was 3321, checked in by matszpk, 11 months ago

CLRadeonExtender: CLRXDocs: Add info about HSA config SGPR allocation info.

## CLRadeonExtender Assembler AMD Catalyst OpenCL 2.0 handling

The AMD Catalyst driver provides its own OpenCL implementation that can generate
its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2
and OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 2.0 binary format.
The first Catalyst drivers used this format for OpenCL 2.0 programs.
Current AMD drivers use this format for OpenCL 1.2 and OpenCL 2.0 programs on
GCN 1.1 and later architectures.

## Binary format

The AMD Catalyst binary format for OpenCL 2.0 differs significantly from the
previous binary format for OpenCL 1.2. The kernel codes are stored in a single text
section of the inner binary. Instead of AMD CAL notes and ProgInfo entries, the kernel
setup is stored in a special structure. Metadata mainly holds the argument
definitions of kernels.

CLRadeonExtender supports two versions of the binary format for OpenCL 2.0: newer (since
AMD OpenCL 1912.05) and older (before driver version 1912.05).

Special sections define global data for all kernels:

* `.rodata`, `.globaldata` - read-only constant (global) data
* `.rwdata`, `.data` - read-write global data
* `.bss`, `.bssdata` - allocatable read-write data

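As a rough illustration of these sections, the fragment below places data in each of them. It is only a sketch: the labels `gdata`, `gcounter` and `gscratch`, the values and the sizes are made up for this example, and it assumes the generic `.int` and `.skip` data pseudo-ops of the CLRX assembler.

```
.amdcl2
.gpu Bonaire
.globaldata                 /* read-only constant (global) data */
gdata:                      /* hypothetical label */
    .int 1, 2, 3, 4
.rwdata                     /* read-write global data */
gcounter:                   /* hypothetical label */
    .int 0
.bssdata align=8            /* allocatable read-write data */
gscratch:                   /* hypothetical label */
    .skip 64                /* reserve 64 bytes, no content is stored */
```
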
## Relocations

The CLRX assembler handles relocations to symbols in global data, global rwdata and
global bss data in kernel code. These relocations can be applied to places that accept
32-bit literal immediates. Only two types of relocations are allowed:

* `place`, `place&0xffffffff`, `place%0x100000000`, `place%%0x100000000` -
low 32 bits of value
* `place>>32`, `place/0x100000000`, `place//0x100000000` - high 32 bits of value

`place` is an expression whose result points to some place in one of the
allowed sections.

Examples:

```
s_mov_b32       s13, (gdata+152)>>32
s_mov_b32       s12, (gdata+152)&0xffffffff
s_mov_b32       s15, (gdata+160)>>32
s_mov_b32       s14, (gdata+160)&0xffffffff
```

## Layout of the source code

The CLRX assembler allows one of two ways to configure the kernel setup:
a human-readable form (`.config`) and a form for quick recompilation
(kernel setup, stub and metadata content).

## Scalar register allocation

Depending on configuration options, the assembler adds VCC and FLAT_SCRATCH
(if `.useenqueue` or `.usegeneric` is enabled).
In HSA configuration mode, special fields determine
which extra SGPRs are added.

## List of the specific pseudo-operations

### .acl_version

Syntax: .acl_version "STRING"

Set ACL version string.

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

* Syntax for scalar: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE\[, unused]
* Syntax for structure: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE\[, STRUCTSIZE\[, unused]]
* Syntax for image: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE\[, \[ACCESS] \[, RESID\[, unused]]]
* Syntax for sampler: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE\[, RESID\[, unused]]
* Syntax for global pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, \[ACCESS] \[, unused]]]
* Syntax for local pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, \[ACCESS] \[, unused]]]
* Syntax for constant pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, \[ACCESS] \[, \[CONSTSIZE] \[, unused]]]]

Adds a kernel argument definition. Must be inside any kernel configuration. The first
argument is the argument name from the OpenCL kernel definition. The next, optional
argument is the argument type name from the OpenCL kernel definition. The next argument
is the argument type:

* char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
* charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types
(X indicates number of elements: 2, 3, 4, 8 or 16)
* structure - structure
* image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d -
image types
* sampler - sampler
* queue - command queue
* clkevent - clkevent
* type* - pointer to data

The rest of the arguments depend on the type of the kernel argument. STRUCTSIZE determines
the size of the structure. ACCESS for images can be one of: `read_only`, `rdonly` or
`write_only`, `wronly`.
PTRSPACE determines the space the pointer points to.
It can be one of: `local`, `constant` or `global`.
ACCESS for pointers can be: `const`, `restrict` and `volatile`.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id (only for samplers and images):

* for read-only images, the range is 0-127.
* for other images, the range is 0-63.
* for samplers, the range is 0-15.

The last argument, `unused`, indicates that the argument will not be used by the kernel.
In its place, `rdonly` (argument used only for reading) or `wronly`
(argument used only for writing) can be given.

Sample usage:

```
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16  *,global
.arg v42,ulong16  *,global, restrict
.arg v57,structure*,82,global
```

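The remaining argument kinds (image access, sampler, constant pointer with CONSTSIZE, `unused`) can be sketched in the same way. The kernel name, argument names, resource ids and the 256-byte constant size below are hypothetical values chosen only to follow the syntax lines above.

```
.kernel filter                                  /* hypothetical kernel name */
    .config
        .dims xy
        .setupargs                              /* placed before any other arguments */
        .arg src, image2d, read_only, 0         /* read-only image, RESID 0 (range 0-127) */
        .arg dst, image2d, write_only, 0        /* other image, RESID 0 (range 0-63) */
        .arg smp, sampler, 1                    /* sampler, RESID 1 (range 0-15) */
        .arg coeffs, float*, constant, const, 256   /* CONSTSIZE: at most 256 bytes */
        .arg tmp, float*, local                 /* pointer to local space */
        .arg pad, uint, unused                  /* scalar that the kernel does not use */
```
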
### .bssdata

Syntax: .bssdata [align=ALIGNMENT]

Go to global data bss section. Optional argument sets alignment of section.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`).
Set AMD code version.

### .compile_options

Syntax: .compile_options "STRING"

Set compile options for this binary.

### .config

Open kernel configuration. Must be inside kernel. Kernel configuration cannot be
defined if any isametadata, metadata or stub was defined.
The following pseudo-ops can be inside kernel config:

* .arg
* .cws
* .debugmode
* .dims
* .dx10clamp
* .exceptions
* .localsize
* .ieeemode
* .pgmrsrc1
* .pgmrsrc2
* .priority
* .privmode
* .sampler
* .scratchbuffer
* .setupargs
* .sgprsnum
* .tgsize
* .uavid
* .useargs
* .useenqueue
* .usegeneric
* .usesetup
* .vgprsnum

### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

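One way to satisfy the 128-byte requirement is to fill the section with zeros. This is a sketch only: the kernel name `k1` is hypothetical, the zero content is a placeholder rather than meaningful directive values, and it assumes the generic `.fill REPEAT, SIZE, VALUE` pseudo-op.

```
.kernel k1                  /* hypothetical kernel name */
    .control_directive
        .fill 128, 1, 0     /* 128 one-byte entries of value 0: exactly 128 bytes */
```
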
### .cws

Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside any kernel configuration.
Set reqd_work_group_size hint for this kernel.

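For instance, a kernel declared in OpenCL C with `__attribute__((reqd_work_group_size(64, 1, 1)))` could carry the hint below. The kernel name `k1` and the sizes are illustrative assumptions.

```
.kernel k1                  /* hypothetical kernel name */
    .config
        .dims x
        .cws 64, 1, 1       /* work group size hint: 64 x 1 x 1 */
```
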
### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-operation must be inside any kernel configuration.
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside any kernel configuration. Defines what dimensions
(from list: x, y, z) will be used to determine space of the kernel execution.

### .driver_version

Syntax: .driver_version VERSION

Set driver version for this binary. Version in form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

### .dx10clamp

This pseudo-operation must be inside any kernel configuration.
Enable usage of the DX10_CLAMP.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside any kernel configuration.
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`gds_segment_size` field in kernel configuration.

### .gdssize

Syntax: .gdssize SIZE

This pseudo-operation must be inside any kernel configuration. Set the GDS
(global data share) size.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Store current driver version to SYMBOL.

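The version encoding used by `.driver_version` and `.get_driver_version` can be illustrated with driver 1912.05, which encodes as 1912*100+5 = 191205. In the sketch below, the symbol name `drvver` is hypothetical, and the conditional assembly with `.if`, `.else`, `.endif` and `.print` is assumed to be available as generic assembler pseudo-ops.

```
.amdcl2
.gpu Bonaire
.driver_version 191205          /* 1912*100 + 5, i.e. driver version 1912.05 */
.get_driver_version drvver      /* drvver is a hypothetical symbol name */
.if drvver>=191205
    .print "newer binary format"
.else
    .print "older binary format"
.endif
```
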
### .globaldata

Go to constant global data section.

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`group_segment_align` field in kernel configuration.

### .hsaconfig

Open kernel HSA configuration. Must be inside kernel. Kernel configuration can not be
defined if any isametadata, metadata or stub was defined. Do not mix with `.config`.

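A kernel that uses the HSA configuration might start like the fragment below. It is a sketch under assumptions, not a complete or verified setup: the kernel name, the arguments and all values are hypothetical, and only pseudo-ops described in this chapter are used.

```
.kernel k1                          /* hypothetical kernel name */
    .hsaconfig
        .dims x
        .arg n, uint
        .arg out, float*, global
        .use_kernarg_segment_ptr    /* enable_sgpr_kernarg_segment_ptr */
        .use_ptr64                  /* is_ptr64 */
        .wavefront_size 64          /* must be a power of two */
        .kernarg_segment_align 16   /* must be a power of two */
```
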
### .ieeemode

This pseudo-op must be inside any kernel configuration. Set ieee-mode.

### .inner

Go to inner binary place. By default, the assembler is in the main binary.

### .isametadata

This pseudo-operation must be inside kernel. Go to ISA metadata content
(only older driver binaries).

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default value is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-operation must be inside any kernel configuration. Set the initial
local data size.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
machine version fields in kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata content.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-operation must be inside kernel.
Defines value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside any kernel configuration. Set PGMRSRC2 value.
If dimensions are set, then the bits that control dimension setup will be ignored.
SCRATCH_EN bit will be ignored.

### .priority

Syntax: .priority PRIORITY

This pseudo-operation must be inside kernel. Defines priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`).
Set `private_element_size` field in kernel configuration.
Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-operation must be inside kernel.
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

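As a worked example of the count formula, reserving registers 8 through 11 gives a count of 11-8+1 = 4. The fragment assumes that FIRSTREG and LASTREG are given as plain register indices. The kernel name and the register range are hypothetical.

```
.kernel k1                      /* hypothetical kernel name */
    .hsaconfig
        .reserved_sgprs 8, 11   /* reserved_sgpr_first = 8, reserved_sgpr_count = 11-8+1 = 4 */
```
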
### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .rwdata

Go to read-write global data section.

### .sampler

Syntax: .sampler VALUE,...

Inside main and inner binary: add sampler definitions.
Only legal when no samplerinit section is defined. Inside kernel configuration:
add samplers to kernel (values are sampler ids).

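The two uses of `.sampler` could look like the fragment below. This is a sketch: the sampler values `0x10` and `0x26` are placeholders, not documented encodings, and the kernel name is hypothetical.

```
.amdcl2
.gpu Bonaire
.sampler 0x10, 0x26         /* main binary: two sampler definitions (placeholder values) */
.kernel k1                  /* hypothetical kernel name */
    .config
        .dims x
        .sampler 0, 1       /* kernel configuration: ids of the samplers used by the kernel */
```
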
### .samplerinit

Go to samplerinit content section. Only legal if no sampler definitions.

### .samplerreloc

Syntax: .samplerreloc OFFSET, SAMPLERID

Add sampler relocation that points to constant global data (rodata).

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside any kernel configuration.
Set scratchbuffer size.

### .setup

Go to kernel setup content section.

### .setupargs

This pseudo-op must be inside any kernel configuration. Add first kernel setup arguments.
This pseudo-op must be before any other arguments.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside any kernel configuration. Set number of scalar
registers which can be used during kernel execution.

### .stub

Go to kernel stub content section. Only allowed for older driver version binaries.

### .tgsize

This pseudo-op must be inside any kernel configuration.
Enable usage of the TG_SIZE_EN.

### .use_debug_enabled

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`is_debug_enabled` field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
respectively by given dimensions.

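For example, a two-dimensional kernel might enable only the X and Y counts. The sketch assumes that DIMENSIONS is written as a combination of the letters x, y and z, as for `.dims`. The kernel name is hypothetical.

```
.kernel k1                              /* hypothetical kernel name */
    .hsaconfig
        .dims xy
        .use_grid_workgroup_count xy    /* enables the _X and _Y fields, not _Z */
```
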
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`).
Enable `is_ptr64` field in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Enable
`is_xnack_enabled` field in kernel configuration.

### .useargs

This pseudo-op must be inside any kernel (non-HSA) configuration.
Indicate that kernel uses arguments.

### .useenqueue

This pseudo-op must be inside any kernel (non-HSA) configuration.
Indicate that kernel uses enqueue mechanism.

### .usegeneric

This pseudo-op must be inside any kernel (non-HSA) configuration.
Indicate that kernel uses generic pointers mechanism (FLAT instructions).

### .usesetup

This pseudo-op must be inside any kernel (non-HSA) configuration.
Indicate that kernel uses setup data (global sizes, local sizes, work groups num).

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside any kernel configuration. Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`).
Set `wavefront_size` field in kernel configuration. Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel HSA configuration (`.hsaconfig`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
.compile_options "-I ./ -cl-std=CL2.0"
.acl_version "AMD-COMP-LIB-v0.8 (0.0.SC_BUILD_NUMBER)"
.kernel DCT
    .metadata
        .byte 0xe0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        ...,
    .setup
        .byte 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        ....
    .text
/*c0000501         */ s_load_dword    s0, s[4:5], 0x1
....
/*bf810000         */ s_endpgm
```

This is a sample of the kernel with configuration:

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
.compile_options "-I ./ -cl-std=CL2.0"
.acl_version "AMD-COMP-LIB-v0.8 (0.0.SC_BUILD_NUMBER)"
.kernel DCT
    .config
        .dims xy
        .useargs
        .usesetup
        .setupargs
        .arg output,float*
        .arg input,float*
        .arg dct8x8,float*
        .arg dct8x8_trans,float*
        .arg inter,float*,local
        .arg width,uint
        .arg blockWidth,uint
        .arg inverse,uint
        .......
    .text
/*c0000501         */ s_load_dword    s0, s[4:5], 0x1
....
/*bf810000         */ s_endpgm
```