source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3829

Last change on this file since 3829 was 3829, checked in by matszpk, 18 months ago

CLRadeonExtender: AsmROCm: Add shortcuts to access qualifiers (rdonly, wronly, rdwr).

## CLRadeonExtender Assembler ROCm handling

ROCm is a new open-source platform created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example, global constant data) is also stored in the `.text` section.
Each kernel symbol points to the configuration of its kernel. A special offset field in the
configuration data indicates where the kernel code begins.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts doesn't matter.

Kernel functions should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and the kernel arguments.

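The layout described above can be sketched for two kernels as follows. This is only an illustration: the kernel names are hypothetical, the configurations are reduced to `.dims`, and it assumes that the general-purpose `.align` pseudo-op of the CLRX assembler is available and takes a byte alignment.

```
.rocm
.gpu Fiji
.kernel kern1
    .config
        .dims x
.kernel kern2
    .config
        .dims x
.text
kern1:
.skip 256           # room for the kernel configuration of kern1
        s_endpgm
.align 256          # assumed pseudo-op: pad so that kern2 starts at a 256-byte boundary
kern2:
.skip 256           # room for the kernel configuration of kern2
        s_endpgm
```
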
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
These values can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.

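A minimal sketch of such an override, modelled on the sample code at the end of this document (the kernel name and the register counts are illustrative only):

```
.kernel test1
    .config
        .dims x
        .sgprsnum 16    # replaces the automatically computed SGPR count
        .vgprsnum 8     # replaces the automatically computed VGPR count
```
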
## Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs are added. The VCC register is included by default.

## Expression with sections

The assembler can calculate the difference between symbols that reside in one of three sections:
the globaldata (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the location of `globaldata1`.
The assembler automatically finds the section (code, globaldata or GOT) to which a symbol points.
Because the layout of the sections is not known while assembling,
section differences are possible only in places where the expression can be evaluated later:
in `.int` or similar pseudo-ops, in literal values in instructions,
in symbol assignments, etc.

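As a rough illustration of a section difference, the sketch below stores the distance between a position in the code section and a symbol in the global data section. The names `globaldata1`, `test1` and `offset1` are hypothetical, and the placement of `.globaldata` before the kernel definition is an assumption.

```
.rocm
.gpu Fiji
.globaldata
globaldata1:
        .int 1, 2, 3, 4
.kernel test1
    .config
        .dims x
.text
test1:
.skip 256           # skip ROCm kernel configuration
        s_endpgm
offset1:
        .int .-globaldata1    # evaluated later, once the section layout is known
```
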
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space and is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes and is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier and is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier
and is present only if the value kind is `image`, `pipe` or `globalbuf`.
FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds value (integer, floats)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only, rdonly - read only
* read_write, rdwr - read and write
* write_only, wronly - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

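The following sketch shows how the optional fields line up for two value kinds, using the short access qualifier names. The kernel and argument names, the type names, the sizes and the `struct` value type chosen for the image argument are assumptions made for illustration; only the field order follows the syntax given above.

```
.kernel copyImage
    .config
        .dims x
        .md_language "OpenCL", 1, 2
        .arg src, "float*", 8, , globalbuf, f32, global, rdonly const   # ADDRSPACE, ACTACCQUAL, flag
        .arg dst, "float*", 8, , globalbuf, f32, global, wronly         # ADDRSPACE, ACTACCQUAL
        .arg img, "image2d_t", 8, , image, struct, rdonly, rdonly       # ACCQUAL, ACTACCQUAL
```

For `globalbuf` only the actual access qualifier is given, whereas `image` takes both the access qualifier and the actual access qualifier.
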
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

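A short sketch that combines several of the metadata pseudo-ops listed above in one configuration. All values (work group sizes, segment size and alignment, the vector type) are illustrative assumptions and do not come from a real kernel.

```
.kernel vectorAdd
    .config
        .dims x
        .md_language "OpenCL", 1, 2
        .cws 64, 1, 1                     # reqd_work_group_size
        .work_group_size_hint 64, 1, 1
        .max_flat_work_group_size 256
        .vectypehint "float4"
        .md_kernarg_segment_size 32
        .md_kernarg_segment_align 8
```
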
### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section is stored in the `control_directive` field of the kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and `.private_elem_size` set to 4 bytes.

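Based on the list above, `.default_hsa_features` should act roughly as a shortcut for the explicit form sketched here. The kernel name is hypothetical, and whether both forms produce byte-identical configurations has not been verified.

```
.kernel test1
    .config
        .dims x
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```
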
### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) will be used to determine the space of the kernel execution.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in the global scope. Optionally, if
OUTSYMBOL is given, the pseudo-op sets it to the position of the GOT entry. A GOT entry takes 8 bytes.

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....  
Syntax: .kcode +

Open a block of code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
`.kcode` can be nested to any depth; each nested `.kcode` adds kernels to or removes kernels
from the membership of the enclosed code. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must be already defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op selects the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control the dimension setup
are ignored. The SCRATCH_EN bit is ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to metadata info. The first argument is the ID (optional; it must
be unique). The following arguments are the argument sizes for the printf call. The last argument
is the format string.

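A small sketch of two printf entries following the syntax above. The kernel name, IDs, argument sizes and format strings are made up for illustration.

```
.kernel test1
    .config
        .dims x
        .printf 1, 4, 4, "x=%d y=%d"    # ID 1, two 4-byte arguments
        .printf 2, 8, "value: %lu"      # ID 2, one 8-byte argument
```
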
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

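The count arithmetic can be illustrated with the sketch below. It assumes that FIRSTREG and LASTREG are written as plain register indices; the kernel name and the chosen ranges are hypothetical.

```
.kernel test1
    .config
        .dims x
        .reserved_sgprs 4, 7    # reserved_sgpr_first=4, reserved_sgpr_count=7-4+1=4
        .reserved_vgprs 0, 1    # reserved_vgpr_first=0, reserved_vgpr_count=1-0+1=2
```
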
### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

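The relation between `.tripple` and `.target` described above can be sketched as follows. The header lines mirror the metadata sample at the end of this document; the equivalence noted in the comment restates the description and has not been checked against assembler output.

```
.rocm
.gpu Fiji
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
/* per the description above, this should give the same target as:
   .target "amdgcn-amd-amdhsa-amdgizcl-gfx803" */
```
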
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
respectively by given dimensions.

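A hedged sketch of enabling the workgroup count SGPRs for two dimensions. It assumes that DIMENSIONS is written as a string of dimension letters in the same way as for `.dims`; the kernel name is hypothetical.

```
.kernel test1
    .config
        .dims x
        .use_grid_workgroup_count xz    # assumed form: enables the _X and _Z fields only
```
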
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

With kernel configuration:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```