source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md

Last change on this file was 4453, checked in by matszpk, 11 months ago

CLRadeonExtender: CLRXDocs: Update samples for binary formats.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new, open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store code compiled for GPUs.

The ROCm binary format implementation and this documentation are based on the following source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored in an ELF file. The symbol table holds the kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example, global constant data) is also stored in the `.text` section.
Each kernel symbol points to the configuration of its kernel. A special offset field in the
configuration data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts does not matter.

Each kernel function should be aligned to a 256-byte boundary.
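Below is a minimal sketch of the resulting layout, modelled on the two-kernel sample at the end of this document. The kernel names `kernel1` and `kernel2` are hypothetical. Each kernel label starts with the 256-byte configuration area, and `.p2align 8` (2^8 = 256) aligns the following kernel:

```
.rocm
.gpu Fiji
.kernel kernel1
    .config
        .dims x
        .default_hsa_features
.kernel kernel2
    .config
        .dims x
        .default_hsa_features
.text
kernel1:
        .skip 256           # space for the ROCm kernel configuration
        s_endpgm
.p2align 8                  # align the next kernel to a 256-byte boundary
kernel2:
        .skip 256           # space for the ROCm kernel configuration
        s_endpgm
```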

Additional kernel information and binary information are stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and the kernel arguments.

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden by the pseudo-ops `.sgprsnum` and `.vgprsnum`.

## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers and adds the extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPR registers (FLAT_SCRATCH, VCC and XNACK_MASK) have been added.
The VCC register is included by default.

The `.sgprsnum` pseudo-op sets the number of all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK.
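For example, a kernel configuration that replaces the automatically computed register counts could look like the sketch below. The kernel name `test1` and the counts 16 and 8 come from the sample at the end of this document and are otherwise arbitrary. Because `.sgprsnum` counts all SGPRs, its value already includes VCC, FLAT_SCRATCH and XNACK_MASK:

```
.rocm
.gpu Carrizo
.kernel test1
    .config
        .dims x
        .default_hsa_features
        .sgprsnum 16        # all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK
        .vgprsnum 8         # all VGPRs
.text
test1:
        .skip 256
        s_endpgm
```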

## Expression with sections

The assembler can calculate the difference between symbols located in any of three sections:
the global data (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the location of `globaldata1`.
The assembler automatically finds which of these sections (code, global data or GOT)
a symbol points to. Because the layout of the sections is not known while assembling,
section differences are allowed only in places where an expression can be evaluated later:
in `.int` or similar pseudo-ops, in literal values in instructions,
in symbol assignments, etc.
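The sketch below shows section differences in the two places named above: an instruction literal and a data pseudo-op. It assumes a hypothetical symbol `globaldata1` in the global data section (selected with `.globaldata`) and a hypothetical kernel `kernel1`. Both differences are resolved only after the assembler has laid out the sections:

```
.rocm
.gpu Fiji
.kernel kernel1
    .config
        .dims x
        .default_hsa_features
.globaldata
globaldata1:
        .int 1, 2, 3, 4
.text
kernel1:
        .skip 256
        s_mov_b32 s2, globaldata1-.   # section difference in an instruction literal
        s_endpgm
        .int .-globaldata1            # section difference in a data pseudo-op
```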

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in the metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space; it is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes; it is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier; it is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier;
it is present only if the value kind is `image`, `pipe` or `globalbuf`.
The FLAGS are a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none, to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue: argument holds a value (integer, float)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half-precision floating point
* f32, float - 32-bit single-precision floating point
* f64, double - 64-bit double-precision floating point
* struct - structure

The list of address spaces:

* constant - constant (read-only global) memory
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - region memory (GDS)

The list of access qualifiers:

* default - default access qualifier
* read_only, rdonly - read only
* read_write, rdwr - read and write
* write_only, wronly - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind
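The sketch below shows `.arg` lines that use the optional fields described above. The argument names, type names and sizes are hypothetical, and using `struct` as the value type of an image is an assumption. The ALIGN field is left empty, as in the samples at the end of this document:

```
.kernel kern
    .config
        .md_language "OpenCL", 1, 2
        # global buffer: ADDRSPACE, ACTACCQUAL, then flags delimited by spaces
        .arg out, "float*", 8, , globalbuf, f32, global, default restrict
        # dynamic shared pointer: POINTEEALIGN (4 bytes), then ADDRSPACE
        .arg lbuf, "float*", 4, , dynshptr, f32, 4, local
        # image: ACCQUAL, then ACTACCQUAL (no ADDRSPACE)
        .arg img, "image2d_t", 8, , image, struct, read_only, read_only
        # hidden argument without a name and type name
        .arg , "", 8, , printfbuf, i8
```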

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside a kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size
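The sketch below combines several of these metadata pseudo-ops in one kernel configuration. It is modelled on the metadata samples at the end of this document. The kernel name, work group size and vector type hint are hypothetical values:

```
.rocm
.gpu Fiji
.newbinfmt
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .default_hsa_features
        .md_language "OpenCL", 1, 2
        .md_symname "vectorAdd@kd"       # NAME@kd format
        .cws 64, 1, 1                    # reqd_work_group_size
        .vectypehint "float4"
        .arg n, "uint", 4, , value, u32
.text
vectorAdd:
        .skip 256
        s_endpgm
```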

### .control_directive

Open the control directive section. This section must be 128 bytes long. The content of this
section is stored in the control_directive field in the kernel configuration.
Must be defined inside a kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
Set the reqd_work_group_size hint for this kernel in the metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets the default HSA kernel features and register features (extra SGPR register usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a private_elem_size of 4 bytes.
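Based on the list above, the two kernel configurations below should set the same HSA kernel features. Whether the explicit form also reproduces every extra-SGPR register feature that `.default_hsa_features` sets is an assumption, not something this document states. The kernel names are hypothetical:

```
.kernel kernA
    .config
        .default_hsa_features
.kernel kernB
    .config
        # explicit equivalents of the default HSA features listed above
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```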

### .dims

Syntax: .dims DIMENSIONS  
Syntax: .dims GID_DIMS, LID_DIMS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from the list: x, y, z) are used to determine the space of the kernel execution.
In the second syntax form, the dimensions are given separately for group_id (GID_DIMS)
and for local_id (LID_DIMS).
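The two syntax forms are shown below with hypothetical kernel names:

```
.kernel kern2d
    .config
        .dims xy          # group_id and local_id for x and y
.kernel kernsplit
    .config
        .dims xyz, x      # group_id for x, y, z; local_id only for x
```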

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set the value of the ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set the exception mask in the PGMRSRC2 register value. The value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
Set fixed_work_group_size for this kernel in the metadata info.

### .fkernel

Mark the given kernel as a function in ROCm. Must be inside a kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`). Define float-mode.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to the constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in the global scope. Optionally, the
pseudo-op sets OUTSYMBOL to the position of the GOT entry, if OUTSYMBOL is given.
A GOT entry takes 8 bytes.
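The sketch below shows `.gotsym` usage. It assumes a hypothetical global data symbol `gdata1` and a hypothetical output symbol `gdata1_got`. Placing `.gotsym` after the symbol definition is an assumption, not a documented requirement:

```
.rocm
.gpu Fiji
.kernel kern
    .config
        .dims x
        .default_hsa_features
.globaldata
gdata1:
        .int 1, 2, 3, 4
.gotsym gdata1, gdata1_got     # gdata1_got: position of the 8-byte GOT entry
.text
kern:
        .skip 256
        s_endpgm
```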

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,...  
Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
You can nest `.kcode` any number of times; each nested `.kcode` adds or removes the
membership of the code in kernels. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define the initial
local memory size used by the kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in the format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA configuration.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside a kernel. Go to the metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op selects the new binary format.

### .nosectdiffs

This pseudo-op disables section difference resolving. After disabling it, the global data
and GOT sections are absolutely addressable. This is the old ROCm mode, kept for compatibility
with older versions of the assembler.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, then the bits that control the dimension
setup are ignored. The SCRATCH_EN bit is ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,...],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to the metadata info. The first argument is the ID (must be unique)
and is optional. The next arguments are the sizes of the printf call arguments. The last
argument is the format string.
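The sketch below defines two printf entries with hypothetical IDs and format strings. Each ARGSIZE gives the size in bytes of one printf argument:

```
.kernel kern
    .config
        .printf 0, 4, 4, "x=%d y=%d\n"   # ID 0, two 4-byte arguments
        .printf 1, 8, "ptr=%p\n"         # ID 1, one 8-byte argument
```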

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).
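A worked example of the count formula, using hypothetical register ranges:

```
.kernel kern
    .config
        .reserved_sgprs 10, 13   # reserved_sgpr_first=10, reserved_sgpr_count=13-10+1=4
        .reserved_vgprs 4, 4     # reserved_vgpr_first=4, reserved_vgpr_count=4-4+1=1
```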

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define the scratch buffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of scalar
registers that can be used during kernel execution.
The count includes VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of scalar
registers to spill into the scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of vector
registers to spill into the scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set the LLVM target including the device name, for example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set the LLVM target without the device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".
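There are two ways of naming the target for a Fiji (gfx803) device, following the examples above. Use one form or the other, not both. The first form gives the full target:

```
# full target, including the device name
.target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```

The second form gives the target without the device name. The device part comes from the selected GPU:

```
# target without the device name
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"
```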

### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration
for the given dimensions, respectively.
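In the sketch below, a hypothetical two-dimensional kernel requests the grid work-group counts only for the dimensions it uses:

```
.kernel kern
    .config
        .dims xy
        .use_grid_workgroup_count xy   # enables the _X and _Y fields, not _Z
```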

### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set the number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is an OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of vector
registers that can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside a kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample kernel setup, with the 256-byte kernel configuration written as raw bytes:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
...
```

The sample with the kernel configuration defined by pseudo-ops:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```

The sample with metadata info and two kernels:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.kernel vectorAdd2
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
            s_mov_b32 s8, s1
...
...
            s_endpgm
.p2align 8      # important alignment to 256-byte boundary
vectorAdd2:
.skip 256
            s_mov_b32 s8, s1
...
...
            s_endpgm
```