source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3900

Last change on this file since 3900 was 3900, checked in by matszpk, 15 months ago

CLRadeonExtender: AsmROCm: Add '.nosectdiffs' for compatibility with the ROCm behaviour from older assembler's versions.

## CLRadeonExtender Assembler ROCm handling

ROCm is a new open-source platform created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code for all kernels. Data
(for example, global constant data) is also stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, kernel configuration and kernel arguments.

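As an illustration of the layout described in this section, here is a minimal sketch of a source with two kernels. The kernel names `first` and `second` are hypothetical. The `.p2align 8` line is an assumption: it relies on a generic alignment pseudo-op that is not described in this document, used here to place the second kernel on a 256-byte boundary. The `.skip 256` lines follow the samples at the end of this document.

```
.rocm
.gpu Fiji
.kernel first
    .config
        .dims x
.kernel second
    .config
        .dims x
.text
first:
.skip 256               # skip ROCm kernel configuration (required)
        s_endpgm
.p2align 8              # assumed generic pseudo-op: align to 2^8 = 256 bytes
second:
.skip 256               # skip ROCm kernel configuration (required)
        s_endpgm
```

The configuration parts and the `.text` part could also appear in a different order, since the order of the parts doesn't matter.
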
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and SGPRs.
This setup can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.

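A minimal sketch of overriding the automatic register counts, using the values from the sample at the end of this document (the kernel name `test1` is only illustrative):

```
.kernel test1
    .config
        .dims x
        .sgprsnum 16        # counted including VCC, FLAT_SCRATCH and XNACK_MASK
        .vgprsnum 8
```

If both lines are left out, the assembler is expected to compute the counts from the registers used in the kernel code.
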
## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers and adds the extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

## Expression with sections

The assembler can calculate the difference between symbols that are placed in any of three sections:
the globaldata (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the location of `globaldata1`.
The assembler automatically finds which section (code, globaldata or GOT) a symbol points to.
Because the layout of the sections is not known while assembling,
section differences are allowed only in places where the expression can be evaluated later:
in `.int` or similar pseudo-ops, in literal values in instructions,
in symbol assignments, etc.

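The following sketch shows where such a section difference might appear. It is an illustration only and has not been checked against a particular assembler version; the names `test1`, `globaldata1` and `dist` are hypothetical, and the kernel declaration (`.kernel test1` with its `.config`) is omitted for brevity.

```
.globaldata                     # constant global data section
globaldata1:
        .int 1, 2, 3, 4
.text
test1:
.skip 256                       # skip ROCm kernel configuration (required)
        s_mov_b32 s4, .-globaldata1     # section difference as instruction literal
        s_endpgm
dist:
        .int .-globaldata1      # section difference stored as data
```

In both places the value can only be resolved once the final layout of the sections is known, which is why the assembler defers the evaluation.
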
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set the architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set the architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space and is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes and is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier and is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier
and is present only if the value kind is `image`, `pipe` or `globalbuf`.
FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds value (integer, floats)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only, rdonly - read only
* read_write, rdwr - read and write
* write_only, wronly - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

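A short sketch of how the optional fields of `.arg` might combine. The first, second and last lines mirror the metadata sample at the end of this document. The `dynshptr` line is an assumption built only from the syntax line above (POINTEEALIGN followed by ADDRSPACE), and its size of 4 bytes is illustrative; the kernel and argument names are hypothetical.

```
.kernel example
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32                     # by-value argument, ALIGN omitted
        .arg data, "float*", 8, , globalbuf, f32, global, default const
        .arg scratch, "float*", 4, , dynshptr, f32, 4, local    # POINTEEALIGN=4, ADDRSPACE=local
        .arg , "", 8, , gox, i64                            # hidden argument without a name
```

Note that `globalbuf` takes ADDRSPACE and ACTACCQUAL but no ACCQUAL, which is why only one access qualifier (`default`) appears on that line.
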
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open the control directive section. This section must be 128 bytes long. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set reqd_work_group_size hint for this kernel in metadata info.

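For illustration, a sketch of a required work group size of 64x1x1 work-items (the kernel name and the sizes are hypothetical). Both spellings are documented above as the same pseudo-operation, so only one of them is needed:

```
.kernel test1
    .config
        .dims x
        .cws 64, 1, 1
        # equivalent long form:
        # .reqd_work_group_size 64, 1, 1
```
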
### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a private_elem_size of 4 bytes.

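Going by the description above, the two configurations in this sketch should be equivalent with respect to HSA features. The kernel names `test1` and `test2` are hypothetical.

```
.kernel test1
    .config
        .dims x
        .default_hsa_features
.kernel test2
    .config
        .dims x
        # explicit form of the same defaults
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```
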
### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) will be used to determine the space of the kernel execution.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). The default value is 0xc0.

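A small sketch of how the byte value might be read. It assumes the usual GCN MODE register layout, in which FP_ROUND occupies the low four bits and FP_DENORM the high four bits of this byte; that layout is not stated in this document.

```
.kernel test1
    .config
        .dims x
        .floatmode 0xc0     # FP_ROUND = 0x0, FP_DENORM = 0xc (the default)
```

The value agrees with the sample at the end of this document, where `.pgmrsrc1 0x002c0041` carries 0xc0 in bits 12-19.
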
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in global scope. Optionally, the pseudo-op
sets OUTSYMBOL to the position of the GOT entry if OUTSYMBOL was given. A GOT entry takes 8 bytes.

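A minimal sketch of adding a GOT entry for a global data symbol. The symbol names `gdata1` and `gdata1_got` are hypothetical, and the placement of `.gotsym` right after the symbol definition is only one plausible choice:

```
.globaldata
gdata1:
        .int 1, 2, 3, 4
.gotsym gdata1, gdata1_got      # 8-byte GOT entry for gdata1; gdata1_got marks its position
```

With OUTSYMBOL given, the position of the entry can then be used in section difference expressions such as `gdata1_got-.`.
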
### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....

Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label's name.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` can be nested any number of times. Each nested `.kcode` adds kernels to or removes
kernels from the membership of the code. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op sets the new binary format.

### .nosectdiffs

This pseudo-op disables section difference resolving. After disabling it, the global data
and GOT sections are absolutely addressable. This is the old ROCm mode, kept for compatibility
with older versions of the assembler.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
are ignored. The SCRATCH_EN bit is ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to metadata info. The first argument is the ID (must be unique)
and is optional. The next arguments are the argument sizes for the printf call. The last argument
is the format string.

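A sketch of two printf entries following the syntax line above. The IDs, argument sizes and format strings are illustrative only:

```
.kernel test1
    .config
        .dims x
        .printf 1, 4, "value=%d"            # ID 1, one 4-byte argument
        .printf 2, 4, 8, "i=%d ptr=%p"      # ID 2, a 4-byte and an 8-byte argument
```

Each entry uses a distinct ID, since IDs must be unique.
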
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example "amdgcn-amd-amdhsa-amdgizcl" with
Fiji device generates target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration
that correspond to the given dimensions.

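A sketch of enabling the workgroup count SGPRs for two dimensions. It assumes that DIMENSIONS is written as concatenated letters from x, y, z, the same way as the argument of `.dims`; this document only shows the single-letter form `.dims x`.

```
.kernel test1
    .config
        .dims xy
        .use_grid_workgroup_count xy    # enables the _X and _Y fields, leaves _Z disabled
```
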
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

With kernel configuration:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```