source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3799

Last change on this file since 3799 was 3799, checked in by matszpk, 19 months ago

CLRadeonExtender: CLRXDocs: Add info about .gotsym and section differences.

## CLRadeonExtender Assembler ROCm handling

ROCm is a new open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example global constant data) can also be stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and its arguments.

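The layout described above can be sketched as follows. This is a minimal skeleton built only from pseudo-ops documented here; the kernel name `mykernel` is a placeholder and the configuration is reduced to two entries, so a real kernel will likely need more.

```
.rocm
.gpu Fiji
.kernel mykernel            # hypothetical kernel name
    .config                 # kernel configuration part
        .dims x
        .default_hsa_features
.text                       # kernel code and data part
mykernel:                   # the kernel symbol points to the configuration
    .skip 256               # room for the 256-byte kernel configuration
    s_endpgm                # kernel code starts at offset 256
```

With a single kernel at the start of `.text` the 256-byte alignment holds trivially; with several kernels each kernel label would need to be aligned explicitly.
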
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

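As an illustration, the automatically computed counts could be overridden inside the kernel configuration like this. The kernel name and the register numbers are arbitrary placeholders, not recommended values.

```
.kernel mykernel        # hypothetical kernel name
    .config
        .dims x
        .sgprsnum 16    # override the computed SGPR count
        .vgprsnum 8     # override the computed VGPR count
```
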
## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

## Expression with sections

The assembler can calculate the difference between symbols that are placed in one of three sections:
the globaldata (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the place of `globaldata1`.
The assembler automatically finds the section that a symbol points to among code,
globaldata and GOT.

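A rough sketch of such an expression is shown here. The names `mykernel` and `globaldata1` are placeholders, and using the expression as the literal operand of `s_mov_b32` is an assumption made for illustration; where exactly the assembler accepts section differences is not stated in this document.

```
.rocm
.gpu Fiji
.kernel mykernel                    # hypothetical kernel name
    .config
        .dims x
        .default_hsa_features
.globaldata                         # constant global data section
globaldata1:                        # hypothetical data symbol
    .int 1, 2, 3, 4
.text
mykernel:
    .skip 256
    s_mov_b32 s2, .-globaldata1     # distance between this place and globaldata1
    s_endpgm
```
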
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space and is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes and is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier and is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier
and is present only if the value kind is `image`, `pipe` or `globalbuf`.
FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds value (integer, floats)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only - read only
* read_write - read and write
* write_only - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

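To make the field order concrete, here are a few `.arg` lines taken from the metadata sample at the end of this document, annotated with the field names from the syntax line. Only the comments are added; they are a reading of the syntax above rather than an authoritative description.

```
.kernel vectorAdd
    .config
        # NAME, "TYPENAME", SIZE, [ALIGN] (empty), VALUEKIND, VALUETYPE
        .arg n, "uint", 4, , value, u32
        # globalbuf adds ADDRSPACE and ACTACCQUAL, followed by space-delimited flags
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        # hidden arguments have no name and an empty type name
        .arg , "", 8, , gox, i64
        .arg , "", 8, , printfbuf, i8
```
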
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open control directive section. This section must be 128 bytes. The content of this
section will be stored in control_directive field in kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
Set reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and private_elem_size set to 4 bytes.

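Going by the description above, a configuration using `.default_hsa_features` should be equivalent to spelling out the individual pseudo-ops. The sketch lists them as comments; `mykernel` is a placeholder name.

```
.kernel mykernel                    # hypothetical kernel name
    .config
        .dims x
        .default_hsa_features
        # according to the description above, this should match:
        #   .use_private_segment_buffer
        #   .use_dispatch_ptr
        #   .use_kernarg_segment_ptr
        #   .use_ptr64
        #   .private_elem_size 4
```
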
### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from the list: x, y, z) are used to determine the space of the kernel execution.

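A small sketch of a two-dimensional setup. The samples in this document only use `.dims x`; writing several dimensions as concatenated letters (`xy`) is assumed here to follow the same pattern.

```
.kernel mykernel        # hypothetical kernel name
    .config
        .dims xy        # kernel is executed over the x and y dimensions
```
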
### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark the given kernel as a function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). The default value is 0xc0.

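A short illustration of how the byte value splits into the two fields. The bit layout used in the comment (FP_ROUND in the low 4 bits, FP_DENORM in the high 4 bits) is an assumption based on the usual GCN MODE register layout, not something stated in this document.

```
.kernel mykernel            # hypothetical kernel name
    .config
        .dims x
        # 0xc0: FP_ROUND = 0x0 (low 4 bits), FP_DENORM = 0xc (high 4 bits)
        .floatmode 0xc0     # same as the default value
```
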
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in global scope. Optionally, the pseudo-op
sets OUTSYMBOL to the position of the GOT entry if OUTSYMBOL was given.

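A hedged sketch of a GOT entry for a data symbol. The symbol names `gdata1` and `gdata1_got` are made up, and placing `.gotsym` after the symbol definition is only one plausible arrangement; this document does not say where the pseudo-op has to appear.

```
.rocm
.gpu Fiji
.globaldata
gdata1:                         # hypothetical symbol defined in global scope
    .int 1, 2, 3, 4
.text
.gotsym gdata1, gdata1_got      # add a GOT entry for gdata1;
                                # gdata1_got is set to the position of that entry
```
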
### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....  
Syntax: .kcode +

Open code that will belong to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label name.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` can be nested any number of times. Each nested `.kcode` just adds or removes
code membership for the given kernels. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op selects the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
will be ignored. The SCRATCH_EN bit will be ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to metadata info. The first argument is the ID (must be unique)
and is optional. The next arguments are the argument sizes for the printf call. The last argument
is the format string.

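A possible entry following the syntax above, with an explicit ID, two argument sizes and a format string. The kernel name, the ID and the format string are invented for illustration, and support for the `\n` escape in the string is assumed.

```
.kernel mykernel                        # hypothetical kernel name
    .config
        .dims x
        # ID 1, two arguments of 4 bytes each, then the format string
        .printf 1, 4, 4, "x=%d y=%d\n"
```
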
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

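A worked instance of the count formula. It is assumed here that FIRSTREG and LASTREG are given as plain register indices; the values 10 and 13 are arbitrary.

```
.kernel mykernel                # hypothetical kernel name
    .config
        .dims x
        # FIRSTREG = 10, LASTREG = 13
        # reserved_sgpr_first = 10, reserved_sgpr_count = 13-10+1 = 4
        .reserved_sgprs 10, 13
```
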
### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

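The relation between `.tripple` and `.target` can be sketched with the values from the example above. The header lines follow the metadata sample at the end of this document; the commented `.target` line shows what is expected to be the equivalent explicit form.

```
.rocm
.gpu Fiji
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
# with the Fiji device the resulting target should be
# "amdgcn-amd-amdhsa-amdgizcl-gfx803", which could also be given directly:
#   .target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```
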
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
respectively by given dimensions.

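A minimal sketch of selecting only some of the three fields. It assumes DIMENSIONS uses the same letter notation as `.dims`, which this document does not spell out.

```
.kernel mykernel                        # hypothetical kernel name
    .config
        .dims x
        # should enable enable_sgpr_grid_workgroup_count_X and ..._Z only
        .use_grid_workgroup_count xz
```
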
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The sample with kernel configuration:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```