source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md

Last change on this file was 4880, checked in by matszpk, 9 days ago

CLRadeonExtender: Add to ROCm HiddenMultiGridSyncArg value kind of an argument.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on the source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code for all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, kernel configuration and kernel arguments.

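
The layout described above can be sketched as follows. This is a minimal outline modelled on the samples at the end of this document; the kernel names `kernelA` and `kernelB` are hypothetical and the kernel bodies are reduced to a single instruction.

```
.rocm
.gpu Fiji
.kernel kernelA
    .config
        .dims x
.kernel kernelB
    .config
        .dims x
.text
kernelA:
.skip 256       # room for the kernelA configuration
        s_endpgm
.p2align 8      # align the next kernel to a 256-byte boundary
kernelB:
.skip 256       # room for the kernelB configuration
        s_endpgm
```
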
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and SGPRs.
This setup can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.

## Scalar register allocation

The assembler for the ROCm format counts all used SGPRs and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs (FLAT_SCRATCH, VCC and XNACK_MASK) have been added.
The VCC register is included by default.

The `.sgprsnum` pseudo-op sets the number of all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK.

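
A short sketch of an explicit register setup, using the values from the `test1` sample at the end of this document. The counts are illustrative, not recommendations.

```
.kernel test1
    .config
        .dims x
        .sgprsnum 16    # all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK
        .vgprsnum 8     # VGPRs available to the kernel
```

If both pseudo-ops are omitted, the assembler derives the counts from the kernel code.
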
## Expression with sections

The assembler can calculate the difference between symbols that are located in one of three sections:
the globaldata (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the location of `globaldata1`.
The assembler automatically finds the section (code, globaldata or GOT) that a symbol points to.
Because the layout of the sections is not known while assembling,
section differences are possible only in places where an expression can be evaluated later:
in `.int` or similar pseudo-ops, in literal values in instructions,
in symbol assignments, etc.

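
The following fragment sketches two of the places listed above where a section difference may appear. The names `globaldata1`, `vectorAdd` and `dist` are illustrative, and the fragment assumes a kernel `vectorAdd` has already been declared with `.kernel`.

```
.globaldata
globaldata1:                    # symbol in the global data section
        .int 1, 2, 3, 4
.text
vectorAdd:
.skip 256
        s_endpgm
dist = .-globaldata1            # symbol assignment, resolved once the layout is known
        .int .-globaldata1      # the same difference stored as a 32-bit value
```
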
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in the metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space and is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes and is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier and is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier
and is present only if the value kind is `image`, `pipe` or `globalbuf`.
FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* multigridsyncarg - global address to multigrid synchronization
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds a value (integer, float)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only, rdonly - read only
* read_write, rdwr - read and write
* write_only, wronly - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

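
The argument lines below are taken from the `vectorAdd` sample at the end of this document and annotated to show how the fields of the syntax map onto a concrete `.arg` line. The comments are an interpretation of the syntax above, not additional rules.

```
.kernel vectorAdd
    .config
        .md_language "OpenCL", 1, 2
        # by-value argument: NAME, "TYPENAME", SIZE, (ALIGN omitted), VALUEKIND, VALUETYPE
        .arg n, "uint", 4, , value, u32
        # global buffer: ADDRSPACE and ACTACCQUAL follow the value type, flags come last
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        # hidden argument: the name is omitted and the type name is empty
        .arg , "", 8, , gox, i64
```
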
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section is stored in the `control_directive` field of the kernel configuration.
Must be defined inside kernel.

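
A minimal sketch of a control directive section, following the samples at the end of this document. It fills the required 128 bytes with zeros.

```
.kernel test1
    .config
        .dims x
    .control_directive
        .fill 128, 1, 0x00      # 128 one-byte entries, all zero
```
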
### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets the default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a `private_elem_size` of 4 bytes.

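
Based on the description above, `.default_hsa_features` can be read as shorthand for the pseudo-ops listed in the comments below. This is a sketch of that equivalence as stated in this section, not a statement about the assembler's internals.

```
.kernel test1
    .config
        .dims x
        .default_hsa_features
        # per the description above, this stands for:
        #   .use_private_segment_buffer
        #   .use_dispatch_ptr
        #   .use_kernarg_segment_ptr
        #   .use_ptr64
        #   .private_elem_size 4
```
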
### .dims

Syntax: .dims DIMENSIONS  
Syntax: .dims GID_DIMS, LID_DIMS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) are used to determine the space of the kernel execution.
In the second syntax form, the dimensions are given separately for group_id (GID_DIMS)
and for local_id (LID_DIMS).

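
The two syntax forms might be used as in the sketch below. It assumes that several dimensions are written as adjacent letters (for example `xy`), which this document does not state explicitly; the kernel names are hypothetical.

```
.kernel kernel1
    .config
        .dims xy        # first form: X and Y for both group_id and local_id
.kernel kernel2
    .config
        .dims x, xyz    # second form: group_id uses X; local_id uses X, Y and Z
```
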
### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). The default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in global scope. If OUTSYMBOL is given,
the pseudo-op sets it to the position of the GOT entry. A GOT entry takes 8 bytes.

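
A minimal sketch of a GOT entry definition. The symbol names `globalvar` and `globalvar_got` are hypothetical, and placing `.gotsym` after the symbol definition is an assumption made here for readability.

```
.globaldata
globalvar:                          # hypothetical symbol defined in global scope
        .int 0
.gotsym globalvar, globalvar_got    # globalvar_got is set to the position of the 8-byte GOT entry
```
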
### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....  
Syntax: .kcode +

Open a block of code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
`.kcode` blocks can be nested to any depth; each nested `.kcode` adds kernels to, or removes
kernels from, the membership of the code. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

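
The sketch below, modelled on the `test1` sample at the end of this document, shows how the entry offset relates to the `.skip 256` that precedes the kernel code.

```
.kernel test1
    .config
        .dims x
        .kernel_code_entry_offset 0x100     # 256, the default
.text
test1:
.skip 256           # the kernel configuration occupies the first 256 bytes
        s_endpgm    # kernel code starts 0x100 bytes after the test1 symbol
```
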
### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

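
For a kernel named `vectorAdd` (the name used in the samples at the end of this document), the "NAME@kd" format gives a symbol name as in this short sketch:

```
.kernel vectorAdd
    .config
        .dims x
        .md_symname "vectorAdd@kd"      # kernel name followed by "@kd"
```
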
### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op sets the new binary format.

### .nosectdiffs

This pseudo-op disables section difference resolving. After disabling it, the global data
and GOT sections are absolutely addressable. This is the old ROCm mode, kept for compatibility
with older versions of the assembler.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
are ignored. The SCRATCH_EN bit is ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to the metadata info. The first argument is the ID (optional;
must be unique). The following arguments are the sizes of the arguments of the printf call.
The last argument is the format string.

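
A sketch of two printf entries written according to the syntax above. The IDs, argument sizes and format strings are made up for illustration.

```
.kernel test1
    .config
        .dims x
        .printf 1, 4, "value: %d"           # ID 1, one 4-byte argument
        .printf 2, 4, 8, "%d items at %p"   # ID 2, a 4-byte and an 8-byte argument
```
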
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

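
A worked example of the count formula above. It assumes that FIRSTREG and LASTREG are given as plain register numbers; the ranges are arbitrary.

```
.kernel test1
    .config
        .dims x
        .reserved_sgprs 10, 13      # reserved_sgpr_first = 10, reserved_sgpr_count = 13-10+1 = 4
        .reserved_vgprs 4, 5        # reserved_vgpr_first = 4, reserved_vgpr_count = 5-4+1 = 2
```
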
### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

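
The relation between `.tripple` and `.target` described above can be sketched with the Fiji example from this section:

```
.rocm
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"
# with the Fiji device, the description above yields the same target as:
# .target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```
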
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
according to the given dimensions.

### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample kernel setup with the configuration given as raw bytes:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same sample with the kernel configuration given by pseudo-ops:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```

The sample with metadata info for two kernels:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.kernel vectorAdd2
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
            s_mov_b32 s8, s1
...
...
            s_endpgm
.p2align 8      # important alignment to 256-byte boundary
vectorAdd2:
.skip 256           # skip ROCm kernel configuration (required)
            s_mov_b32 s8, s1
...
...
            s_endpgm
```