source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3996

Last change on this file since 3996 was 3996, checked in by matszpk, 13 months ago

CLRadeonExtender: CLRXDocs: add extra info about setting up number of the SGPRs registers.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on the source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary file is stored as an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code for all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and its arguments.

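A minimal sketch of how two kernels might be laid out in `.text` so that each one starts on a 256-byte boundary. The kernel names `first` and `second` are hypothetical. Using the general `.p2align` pseudo-op for the alignment is an assumption, since that pseudo-op is not described in this document.

```
.rocm
.gpu Fiji
.kernel first
    .config
        .dims x
        .default_hsa_features
.kernel second
    .config
        .dims x
        .default_hsa_features
.text
first:
.skip 256           # space for ROCm kernel configuration of 'first'
        s_endpgm
.p2align 8          # assumed: align to 2^8 = 256 bytes before next kernel
second:
.skip 256           # space for ROCm kernel configuration of 'second'
        s_endpgm
```

Each kernel label is followed by `.skip 256`, so the code itself starts 256 bytes after the kernel symbol.
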
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

## Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds the extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPR registers (FLAT_SCRATCH, VCC and XNACK_MASK) have been added.
The VCC register is included by default.

The `.sgprsnum` pseudo-op sets the number of all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK.

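As an illustration, the automatic register counting could be overridden like this. It is only a sketch: the kernel name and the register counts are arbitrary values chosen for the example.

```
.kernel test1
    .config
        .dims x
        .sgprsnum 16    # all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK
        .vgprsnum 8     # all VGPRs
```

Without these two pseudo-ops the assembler is expected to derive both numbers from the kernel code.
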
## Expression with sections

The assembler can calculate the difference between symbols that are present in one of three sections:
the globaldata (rodata) section, the code section and the GOT (Global Offset Table) section.
For example, the expression `.-globaldata1` (if `globaldata1` is defined in the global data section)
calculates the distance between the current position and the place of `globaldata1`.
The assembler automatically finds the section to which a symbol points among code,
globaldata and GOT. Because the layout of the sections is not known while assembling,
section differences are possible only in places where an expression can be evaluated later:
in `.int` or similar pseudo-ops, in literal values in instructions,
in symbol assignments, etc.

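A hedged sketch of a section difference stored with `.int`. The names `test1`, `globaldata1` and `data_distance` are hypothetical, and the layout is only one possible arrangement.

```
.rocm
.gpu Fiji
.kernel test1
    .config
        .dims x
        .default_hsa_features
.globaldata                     # go to global data (.rodata) section
globaldata1:
        .int 1, 2, 3, 4
.text
test1:
.skip 256                       # skip ROCm kernel configuration
        s_endpgm
data_distance:
        .int .-globaldata1      # distance from this place to globaldata1
```

The value written by `.int` cannot be known while assembling the line; it is filled in once the section layout is determined.
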
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .arg

Syntax: .arg [NAME]\[, "TYPENAME"], SIZE, [ALIGN], VALUEKIND, VALUETYPE[,POINTEEALIGN]\[, ADDRSPACE]\[,ACCQUAL]\[,ACTACCQUAL] \[FLAG1\] \[FLAG2\]...

This pseudo-op must be inside kernel configuration (`.config`).
Define a kernel argument in metadata info. The argument name, type name and alignment are
optional. ADDRSPACE is the address space and is present only if the value kind is
`globalbuf` or `dynshptr`. POINTEEALIGN is the pointee alignment in bytes and is present
only if the value kind is `dynshptr`. ACCQUAL defines the access qualifier and is present
only if the value kind is `image` or `pipe`. ACTACCQUAL defines the actual access qualifier
and is present only if the value kind is `image`, `pipe` or `globalbuf`.
FLAGS is a list of flags delimited by spaces.

The list of value kinds:

* complact - hidden completion action
* defqueue - hidden default command queue
* dynshptr - dynamic shared pointer (local, private)
* globalbuf - global buffer
* gox, globaloffsetx - hidden global offset x
* goy, globaloffsety - hidden global offset y
* goz, globaloffsetz - hidden global offset z
* image - image object
* none - hidden none to make space between arguments
* pipe - OpenCL 2.0 pipe object
* printfbuf - hidden printf buffer
* queue - command queue
* sampler - image sampler
* value - ByValue - argument holds a value (integer, float)

The list of value types:

* i8, char - signed 8-bit integer
* i16, short - signed 16-bit integer
* i32, int - signed 32-bit integer
* i64, long - signed 64-bit integer
* u8, uchar - unsigned 8-bit integer
* u16, ushort - unsigned 16-bit integer
* u32, uint - unsigned 32-bit integer
* u64, ulong - unsigned 64-bit integer
* f16, half - 16-bit half floating point
* f32, float - 32-bit single floating point
* f64, double - 64-bit double floating point
* struct - structure

The list of address spaces:

* constant - constant space (???)
* generic - generic (global or scratch or local)
* global - global memory
* local - local memory
* private - private memory
* region - ???

The list of access qualifiers:

* default - default access qualifier
* read_only, rdonly - read only
* read_write, rdwr - read and write
* write_only, wronly - write only

The list of flags:

* const - constant value (only for global buffer)
* restrict - restrict value (only for global buffer)
* volatile - volatile (only for global buffer)
* pipe - only for pipe value kind

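The fragment below annotates three `.arg` lines of the kind used in the metadata sample at the end of this document, to show how the fields of the syntax line map onto real arguments. The comments are an interpretation of the syntax description above, not output of the assembler.

```
.kernel vectorAdd
    .config
        .md_language "OpenCL", 1, 2
        # NAME=n, TYPENAME="uint", SIZE=4, ALIGN omitted, kind=value, type=u32
        .arg n, "uint", 4, , value, u32
        # globalbuf: ADDRSPACE=global, ACTACCQUAL=default, flags: const volatile
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        # hidden argument: no name, empty type name, kind=gox (global offset x)
        .arg , "", 8, , gox, i64
```

The empty field after SIZE is the omitted ALIGN; the commas must still be present so that the following fields keep their positions.
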
### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set the reqd_work_group_size hint for this kernel in metadata info.

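A short sketch of the pseudo-op in use. The kernel name and the sizes are arbitrary example values.

```
.kernel test1
    .config
        .dims x
        .cws 64, 1, 1       # same as: .reqd_work_group_size 64, 1, 1
```
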
### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets the default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a private_elem_size of 4 bytes.

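Based on the list above, the single pseudo-op should be equivalent to writing the features out one by one. A sketch with a hypothetical kernel name:

```
.kernel test1
    .config
        .dims x
        .use_private_segment_buffer     # the four features and the element size
        .use_dispatch_ptr               # below are what .default_hsa_features
        .use_kernarg_segment_ptr        # is described to set
        .use_ptr64
        .private_elem_size 4
```
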
### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) will be used to determine the space of the kernel execution.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .gotsym

Syntax: .gotsym SYMBOL[, OUTSYMBOL]

Add a GOT entry for SYMBOL. SYMBOL must be defined in global scope. Optionally, the pseudo-op
sets OUTSYMBOL to the position of the GOT entry if OUTSYMBOL was given. A GOT entry takes 8 bytes.

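A hedged sketch of adding a GOT entry for a global data symbol. The names `gdata1` and `gdata1_got` are hypothetical, and placing `.gotsym` directly after the symbol definition is an assumption.

```
.globaldata
gdata1:
        .int 1, 2, 3, 4
.gotsym gdata1, gdata1_got      # 8-byte GOT entry; gdata1_got is its position
```
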
### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....

Syntax: .kcode +

Open code that will belong to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label name.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` can be nested any number of times; each nested `.kcode` adds kernels to or removes
kernels from the code's membership. The main reason this feature was added is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 # this code belongs to kernel1, kernel2
    .kcode -kernel1 # this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA config.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

This pseudo-operation must be inside kernel. Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op sets the new binary format.

### .nosectdiffs

This pseudo-op disables section difference resolving. After disabling it, the global data
and GOT sections are absolutely addressable. This is the old ROCm mode, kept for compatibility
with older versions of the assembler.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control the dimension setup
will be ignored. The SCRATCH_EN bit will be ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to metadata info. The first argument is the ID (must be unique)
and is optional. The next arguments are the argument sizes for the printf call. The last argument
is the format string.

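A sketch of two printf entries, written from the syntax line above. The IDs, argument sizes and format strings are made-up example values.

```
.kernel test1
    .config
        .dims x
        .printf 1, 4, 4, "x=%d, y=%d\n"     # ID 1, two 4-byte arguments
        .printf 2, 8, "ptr=%p\n"            # ID 2, one 8-byte argument
```
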
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

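The example from the description, written out as a file header. This is a sketch in which the target string in the comment is the one stated above for the Fiji device.

```
.rocm
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"   # target: "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```

By the descriptions of both pseudo-ops, the same target could presumably be given in full with `.target "amdgcn-amd-amdhsa-amdgizcl-gfx803"`.
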
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
according to the given dimensions.

### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is an OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

with kernel configuration:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```

The sample with metadata info:

```
.rocm
.gpu Fiji
.arch_minor 0
.arch_stepping 4
.eflags 2
.newbinfmt
.tripple "amdgcn-amd-amdhsa-amdgizcl"
.md_version 1, 0
.kernel vectorAdd
    .config
        .dims x
        .codeversion 1, 1
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
    .control_directive
        .fill 128, 1, 0x00
    .config
        .md_language "OpenCL", 1, 2
        .arg n, "uint", 4, , value, u32
        .arg a, "float*", 8, , globalbuf, f32, global, default const volatile
        .arg b, "float*", 8, , globalbuf, f32, global, default const
        .arg c, "float*", 8, , globalbuf, f32, global, default
        .arg , "", 8, , gox, i64
        .arg , "", 8, , goy, i64
        .arg , "", 8, , goz, i64
        .arg , "", 8, , printfbuf, i8
.text
vectorAdd:
.skip 256           # skip ROCm kernel configuration (required)
...
```