source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3756

Last change on this file since 3756 was 3756, checked in by matszpk, 16 months ago

CLRadeonExtender: CLRXDocs: Update '.metadata' description.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and their professional products). This platform uses the HSACO
binary object file format to store compiled code for GPUs.

The ROCm binary format implementation and this documentation are based on:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored in an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, kernel configuration and kernel arguments.

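The layout described in this section could look roughly like the fragment below, which places two kernels in `.text`. This is only a sketch: the kernel names `first` and `second` are placeholders, the configurations are cut down to a single `.dims` line, and the `.p2align 8` line assumes that the assembler's generic power-of-two alignment pseudo-op is available. It is used here only to reach the 256-byte boundary.

```
.rocm
.gpu Fiji
.kernel first
    .config
        .dims x
.kernel second
    .config
        .dims x
.text
first:
    .skip 256           # room for the configuration of 'first'
    s_endpgm
.p2align 8              # assumption: pad to the next 256-byte boundary
second:
    .skip 256           # room for the configuration of 'second'
    s_endpgm
```
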
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

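A minimal sketch of such an override is shown below. The kernel name `scale` and both register counts are arbitrary values chosen for illustration.

```
.kernel scale
    .config
        .dims x
        .sgprsnum 24        # replaces the automatically computed SGPR count
        .vgprsnum 12        # replaces the automatically computed VGPR count
```
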
## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in metadata info. The argument name, type name and alignment are
optional. The ADDRSPACE is the address space and it is present only if the value kind is
`globalbuf` or `dynshptr`. The POINTEEALIGN is the pointee alignment in bytes and it is present
only if the value kind is `dynshptr`. The ACCQUAL defines the access qualifier and it is present
only if the value kind is `image` or `pipe`. The ACTACCQUAL defines the actual access qualifier
and it is present only if the value kind is `image`, `pipe` or `globalbuf`.
The FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds value (integer, floats)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only - read only
* read_write - read and write
* write_only - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

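To show how the optional fields line up for different value kinds, here is a hedged sketch. The kernel name `filter` and all argument names are hypothetical. The `dynshptr` and `image` lines are derived only from the syntax line above and have not been checked against the assembler, so treat their field order as an assumption.

```
.kernel filter
    .config
        .dims x
        # by-value argument; the alignment field is left empty
        .arg count, "uint", 4, , value, u32
        # global buffer: ADDRSPACE, ACTACCQUAL, then flags
        .arg src, "float*", 8, , globalbuf, f32, global, default const restrict
        # dynamic shared pointer: POINTEEALIGN, then ADDRSPACE (assumed order)
        .arg tmp, "float*", 4, , dynshptr, f32, 4, local
        # image: ACCQUAL, then ACTACCQUAL (assumed order)
        .arg img, "image2d_t", 8, , image, struct, read_only, read_only
        # hidden argument without a name
        .arg , "", 8, , gox, i64
```
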
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define what dimensions
(from list: x, y, z) will be used to determine space of the kernel execution.

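As a small illustration, the fragment below selects two dimensions. It assumes the dimension letters are written together without separators, following the `.dims x` form used in the sample code; the kernel name `blur` is a placeholder.

```
.kernel blur
    .config
        .dims xy            # kernel execution space uses the x and y dimensions
```
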
### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set float mode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

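The sketch below spells out the default value. The split of the byte into FP_ROUND (low four bits) and FP_DENORM (high four bits) follows the MODE register layout from the GCN ISA documentation; it is not stated in this document, so treat it as an assumption. The kernel name `fmath` is a placeholder.

```
.kernel fmath
    .config
        .dims x
        # 0xc0: FP_ROUND = 0x0, FP_DENORM = 0xc (assumed bit layout)
        .floatmode 0xc0
```
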
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and private_elem_size to 4 bytes.

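Going by the list of default features above, the two configurations in this sketch should request the same features. The kernel names `k1` and `k2` are placeholders, and the equivalence is inferred from the description rather than verified against the assembler output.

```
.kernel k1
    .config
        .dims x
        .default_hsa_features

.kernel k2
    .config
        .dims x
        # the same features spelled out one by one
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```
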
### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....

Syntax: .kcode +

Open code that will belong to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label name.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` can be nested any number of times. Each nested `.kcode` adds kernels to or removes
kernels from the code membership. The most important reason why this feature has been added
is register usage calculation. Any kernel given in this pseudo-operation must be already defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default value is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op sets the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
will be ignored. The SCRATCH_EN bit will be ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to metadata info. The first argument is the ID (must be unique)
and is optional. The next arguments are the argument sizes for the printf call. The last
argument is the format string.

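The following sketch shows two printf entries built from the syntax line above. The kernel name `report`, the IDs, the sizes and the format strings are all made up for illustration, and the form with the omitted ID (leading comma) is an assumption based on the bracket notation.

```
.kernel report
    .config
        .dims x
        # ID 1, two 4-byte arguments, then the format string
        .printf 1, 4, 4, "index=%d value=%d"
        # ID omitted (assumed form), one 8-byte argument
        .printf , 8, "total=%ld"
```
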
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

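A worked example of the count formula, assuming FIRSTREG and LASTREG are written as plain register indices (the document does not say whether register names are also accepted). The kernel name `keep` is a placeholder.

```
.kernel keep
    .config
        .dims x
        # reserved_sgpr_first = 8
        # reserved_sgpr_count = 11 - 8 + 1 = 4
        .reserved_sgprs 8, 11
```
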
### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

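Based on the descriptions of `.target` and `.tripple`, the two headers sketched below should yield the same target string for a Fiji device. This equivalence is inferred from the text and has not been checked against generated binaries.

```
# form 1: full target, device name included
.rocm
.gpu Fiji
.target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```

```
# form 2: target without device name; the suffix comes from the device
.rocm
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"
```
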
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
respectively by given dimensions.

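A short sketch of the mapping from dimensions to fields. It assumes DIMENSIONS uses the same letter form as `.dims`; the kernel name `grid2d` is a placeholder.

```
.kernel grid2d
    .config
        .dims xy
        # enables the _X and _Y fields; the _Z field is left disabled
        .use_grid_workgroup_count xy
```
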
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup with the configuration given as raw bytes:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same sample with the kernel configuration written using pseudo-ops:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```