source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3748

Last change on this file since 3748 was 3748, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Fix typo. Add description of the ROCm metadata pseudo-ops.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(especially designed for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled code for GPUs.

The ROCm binary format implementation and this documentation are based on the following source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored in an ELF file. The symbol table holds the symbols of kernels and data.
The main `.text` section contains the code of all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Each kernel symbol points to the configuration of its kernel. A special offset field in the
configuration data points to the kernel code.

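The configuration that a kernel symbol points to is the 256-byte `amd_kernel_code_t` structure described in the ROCm-ComputeABI-Doc. The C sketch below mirrors that layout. It assumes the field order and sizes from that document and simplifies `control_directive` to a raw 128-byte array. The helper `kernel_code_address` is hypothetical; it shows how the entry offset field locates the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 256-byte kernel configuration (amd_kernel_code_t layout,
   assumed from the ROCm-ComputeABI-Doc; control_directive kept as raw bytes). */
typedef struct {
    uint32_t amd_kernel_code_version_major;        /* .codeversion */
    uint32_t amd_kernel_code_version_minor;
    uint16_t amd_machine_kind;                     /* .machine */
    uint16_t amd_machine_version_major;
    uint16_t amd_machine_version_minor;
    uint16_t amd_machine_version_stepping;
    int64_t  kernel_code_entry_byte_offset;        /* .kernel_code_entry_offset */
    int64_t  kernel_code_prefetch_byte_offset;
    uint64_t kernel_code_prefetch_byte_size;
    uint64_t max_scratch_backing_memory_byte_size;
    uint32_t compute_pgm_rsrc1;                    /* .pgmrsrc1 */
    uint32_t compute_pgm_rsrc2;                    /* .pgmrsrc2 */
    uint32_t kernel_code_properties;               /* .use_* pseudo-ops */
    uint32_t workitem_private_segment_byte_size;
    uint32_t workgroup_group_segment_byte_size;
    uint32_t gds_segment_byte_size;
    uint64_t kernarg_segment_byte_size;
    uint32_t workgroup_fbarrier_count;
    uint16_t wavefront_sgpr_count;
    uint16_t workitem_vgpr_count;
    uint16_t reserved_vgpr_first;
    uint16_t reserved_vgpr_count;
    uint16_t reserved_sgpr_first;
    uint16_t reserved_sgpr_count;
    uint16_t debug_wavefront_private_segment_offset_sgpr;
    uint16_t debug_private_segment_buffer_sgpr;
    uint8_t  kernarg_segment_alignment;            /* stored as log2 */
    uint8_t  group_segment_alignment;              /* stored as log2 */
    uint8_t  private_segment_alignment;            /* stored as log2 */
    uint8_t  wavefront_size;                       /* stored as log2 */
    int32_t  call_convention;
    uint8_t  reserved3[12];
    uint64_t runtime_loader_kernel_symbol;
    uint8_t  control_directive[128];               /* .control_directive */
} amd_kernel_code_t;

_Static_assert(sizeof(amd_kernel_code_t) == 256, "configuration must be 256 bytes");
_Static_assert(offsetof(amd_kernel_code_t, control_directive) == 128,
               "control_directive starts at byte 128");

/* Hypothetical helper: address of the kernel code, given the address of
   the configuration that the kernel symbol points to. */
static uint64_t kernel_code_address(uint64_t config_addr, const amd_kernel_code_t *cfg)
{
    assert(cfg->kernel_code_entry_byte_offset >= 0);
    return config_addr + (uint64_t)cfg->kernel_code_entry_byte_offset;
}
```

With the default entry offset of 256, the code starts immediately after the configuration.
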
The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts does not matter.

Kernel functions should be aligned to a 256-byte boundary.

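When several kernels share one `.text` section, each kernel's start offset is rounded up to the next 256-byte boundary. Below is a minimal C sketch of that rounding; the name `align_kernel_offset` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_ALIGNMENT 256u  /* required kernel function alignment, in bytes */

/* Round a section offset up to the next 256-byte boundary.
   The mask trick works because KERNEL_ALIGNMENT is a power of two. */
static uint64_t align_kernel_offset(uint64_t offset)
{
    assert((KERNEL_ALIGNMENT & (KERNEL_ALIGNMENT - 1u)) == 0u);
    return (offset + (KERNEL_ALIGNMENT - 1u)) & ~(uint64_t)(KERNEL_ALIGNMENT - 1u);
}
```
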
Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and the kernel arguments.

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden by the `.sgprsnum` and `.vgprsnum` pseudo-ops.

## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

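The counting described above can be sketched in C. This is an illustration under stated assumptions, not the CLRX implementation:

* `used_sgprs` is assumed to be the number of SGPRs from `s0` up to the highest register the code references.
* VCC, FLAT_SCRATCH and XNACK_MASK are 64-bit registers, so each is counted as a pair of SGPRs.
* The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: total SGPR count including the extra registers.
   Each extra register (VCC, FLAT_SCRATCH, XNACK_MASK) is 64 bits wide,
   so it occupies a pair of SGPRs. */
static unsigned total_sgprs(unsigned used_sgprs, bool flat_scratch, bool xnack_mask)
{
    unsigned total = used_sgprs + 2u;  /* VCC is included by default */
    if (flat_scratch)
        total += 2u;                   /* FLAT_SCRATCH */
    if (xnack_mask)
        total += 2u;                   /* XNACK_MASK */
    return total;
}
```
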
## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set the architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set the architecture stepping number.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set the AMD code version.

### .config

Open the kernel configuration. Must be inside a kernel.

The kernel metadata info pseudo-ops that can be used inside a configuration:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

### .control_directive

Open the control directive section. This section must be 128 bytes long. Its content
is stored in the `control_directive` field of the kernel configuration.
Must be defined inside a kernel.

### .cws, .reqd_work_group_size

Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]
Syntax: .reqd_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside a kernel configuration.
Set the reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Defines which dimensions
(from the list: x, y, z) are used to determine the space of the kernel execution.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set the value of the ELF header's e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set the exception mask in the PGMRSRC2 register value. The value should be 7-bit.

### .fixed_work_group_size

Syntax: .fixed_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside a kernel configuration.
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark the given kernel as a function in ROCm. Must be inside a kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`). Defines the float mode:
sets the FP_ROUND and FP_DENORM fields of the MODE register. The default value is 0xc0.

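The `.floatmode` byte packs FP_ROUND (low four bits) and FP_DENORM (high four bits). The C sketch below splits each field further. It assumes the GCN MODE register layout:

* bits [1:0] and [5:4] control single precision;
* bits [3:2] and [7:6] control double and half precision.

The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the .floatmode byte (MODE register bits 7:0). */
typedef struct {
    unsigned round_f32;   /* MODE bits 1:0, single precision rounding (0 = nearest even) */
    unsigned round_f64;   /* MODE bits 3:2, double/half precision rounding */
    unsigned denorm_f32;  /* MODE bits 5:4, single precision denormals (0 = flush, 3 = keep) */
    unsigned denorm_f64;  /* MODE bits 7:6, double/half precision denormals */
} FloatMode;

static FloatMode decode_floatmode(uint8_t value)
{
    FloatMode m;
    m.round_f32  = value & 0x3u;
    m.round_f64  = (value >> 2) & 0x3u;
    m.denorm_f32 = (value >> 4) & 0x3u;
    m.denorm_f64 = (value >> 6) & 0x3u;
    return m;
}
```

Under this layout, the default 0xc0 selects round-to-nearest-even for both precisions. It flushes single-precision denormals and keeps double/half-precision denormals.
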
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_byte_size` field in kernel configuration.

### .globaldata

Go to the constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_alignment` field in kernel configuration.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets the default HSA kernel features and register features (extra SGPR register usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr` and `.use_ptr64`; it also sets `.private_elem_size` to 4 bytes.

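In the kernel configuration, these features end up in the `kernel_code_properties` word. The C sketch below assumes these bit positions from the ROCm-ComputeABI-Doc:

* `enable_sgpr_private_segment_buffer`: bit 0
* `enable_sgpr_dispatch_ptr`: bit 1
* `enable_sgpr_kernarg_segment_ptr`: bit 3
* `private_element_size`: bits 17-18, encoded as log2(size) - 1
* `is_ptr64`: bit 19

The macro and function names are hypothetical. The result agrees with the bytes `0x0b, 0x00, 0x0a, 0x00` at offset 56 of the first example in the Sample code section.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from amd_kernel_code_t's kernel_code_properties. */
#define PROP_PRIVATE_SEGMENT_BUFFER (1u << 0)   /* .use_private_segment_buffer */
#define PROP_DISPATCH_PTR           (1u << 1)   /* .use_dispatch_ptr */
#define PROP_KERNARG_SEGMENT_PTR    (1u << 3)   /* .use_kernarg_segment_ptr */
#define PROP_PRIVATE_ELEM_SHIFT     17          /* .private_elem_size, 2-bit field */
#define PROP_PTR64                  (1u << 19)  /* .use_ptr64 */

/* Encode .private_elem_size (2, 4, 8 or 16 bytes) as log2(size) - 1. */
static uint32_t private_elem_size_code(unsigned size)
{
    switch (size) {
    case 2:  return 0;
    case 4:  return 1;
    case 8:  return 2;
    case 16: return 3;
    default: assert(!"private_elem_size must be 2, 4, 8 or 16"); return 0;
    }
}

/* Properties word produced by the features that .default_hsa_features enables. */
static uint32_t default_hsa_properties(void)
{
    return PROP_PRIVATE_SEGMENT_BUFFER | PROP_DISPATCH_PTR |
           PROP_KERNARG_SEGMENT_PTR | PROP_PTR64 |
           (private_elem_size_code(4) << PROP_PRIVATE_ELEM_SHIFT);
}
```
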
### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Enable the IEEE mode.

### .kcode

Syntax: .kcode KERNEL1,....
Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
`.kcode` clauses can be nested any number of times; each nested `.kcode` adds kernels to
or removes kernels from the membership of the code. The main reason for this feature is
register usage calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

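The nesting rules can be modelled as a stack of kernel sets. The C sketch below is a hypothetical model, not CLRX's implementation. It makes these assumptions:

* Kernels are bits in a mask.
* Each clause starts from the set of its enclosing clause, and from an empty set at the outermost level.
* `.kcode +` selects every kernel.
* `-name` removes a kernel.

```c
#include <assert.h>
#include <stdint.h>

#define KCODE_MAX_DEPTH 16

/* Hypothetical model of .kcode/.kcodeend membership; bit i = kernel i. */
typedef struct {
    uint32_t all_kernels;             /* mask of every defined kernel */
    uint32_t stack[KCODE_MAX_DEPTH];  /* membership of each open clause */
    int depth;                        /* number of open .kcode clauses */
} KcodeState;

/* .kcode: start from the enclosing clause's set (assumed empty at the
   outermost level), select all kernels for '+', then add and remove kernels. */
static uint32_t kcode_open(KcodeState *st, int plus_all, uint32_t add, uint32_t remove)
{
    assert(st->depth < KCODE_MAX_DEPTH);
    uint32_t set = st->depth > 0 ? st->stack[st->depth - 1] : 0u;
    if (plus_all)
        set = st->all_kernels;
    set = (set | add) & ~remove;
    st->stack[st->depth++] = set;
    return set;
}

/* .kcodeend: close the innermost clause. */
static void kcode_close(KcodeState *st)
{
    assert(st->depth > 0);
    st->depth--;
}
```

Take kernel1 as bit 0 and kernel2 as bit 1. Then `kcode_open(&st, 0, 0x3, 0)` yields `0x3`, and the nested `kcode_open(&st, 0, 0, 0x1)` yields `0x2` (only kernel2). This matches the comments in the sample usage.
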
### .kcodeend

Close a `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

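This alignment, like `.group_segment_align`, `.private_segment_align` and `.wavefront_size`, must be a power of two. In the first example of the Sample code section, the 16-byte alignments appear as the byte `0x04` and the wavefront size of 64 as `0x06`. In other words, these configuration fields hold the base-2 logarithm of the value. Below is a C sketch of that encoding; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return log2(value) for a power of two, or -1 if value is not a power of two. */
static int encode_power_of_two(uint64_t value)
{
    if (value == 0 || (value & (value - 1)) != 0)
        return -1;
    int shift = 0;
    while (value > 1) {
        value >>= 1;
        shift++;
    }
    return shift;
}
```
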
### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines the initial
local memory size used by the kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set the maximum flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in the format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set the kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Defines the number of scalar registers for the kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Defines the wavefront size in metadata info. If not specified, the value is taken from the HSA configuration.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Defines the number of vector registers for the kernel in metadata info.

### .newbinfmt

This pseudo-op sets the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines the value of the PGMRSRC1 register.

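The second example in the Sample code section sets `.sgprsnum 16`, `.vgprsnum 8`, `.dx10clamp`, `.floatmode 0xc0` and `.priority 0`, together with `.pgmrsrc1 0x002c0041`. The C sketch below shows how those values can be packed into PGMRSRC1. It assumes the GCN COMPUTE_PGM_RSRC1 layout and register granularities; the function name is hypothetical.

* VGPRS: bits 0-5 (allocated in blocks of 4)
* SGPRS: bits 6-9 (allocated in blocks of 8)
* PRIORITY: bits 10-11
* FLOAT_MODE: bits 12-19
* PRIV: bit 20
* DX10_CLAMP: bit 21
* DEBUG_MODE: bit 22
* IEEE_MODE: bit 23

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder for PGMRSRC1 (GCN 1.2 field layout assumed). */
static uint32_t encode_pgmrsrc1(unsigned vgprs, unsigned sgprs, unsigned priority,
                                uint8_t floatmode, bool priv, bool dx10clamp,
                                bool debugmode, bool ieeemode)
{
    assert(vgprs >= 1 && sgprs >= 1 && priority <= 3);
    uint32_t v = (vgprs - 1) / 4;   /* VGPRs allocated in blocks of 4 */
    uint32_t s = (sgprs - 1) / 8;   /* SGPRs allocated in blocks of 8 */
    return (v & 0x3fu)
         | ((s & 0xfu) << 6)
         | ((uint32_t)priority << 10)
         | ((uint32_t)floatmode << 12)
         | ((uint32_t)priv << 20)
         | ((uint32_t)dx10clamp << 21)
         | ((uint32_t)debugmode << 22)
         | ((uint32_t)ieeemode << 23);
}
```
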
### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines the value of the PGMRSRC2 register. If dimensions are set, the bits that control
the dimension setup are ignored. The SCRATCH_EN bit is ignored.

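In the second example of the Sample code section, `.pgmrsrc2 0x00000090` is consistent with `.userdatanum 8` and `.dims x`. The C sketch below packs those values. It assumes the GCN COMPUTE_PGM_RSRC2 layout:

* SCRATCH_EN: bit 0
* USER_SGPR: bits 1-5
* TGID_X/Y/Z_EN: bits 7-9
* TG_SIZE_EN: bit 10
* EXCP_EN: bits 24-30

The function name is hypothetical, and the sketch handles only these fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder for part of PGMRSRC2 (GCN layout assumed).
   dims is a string of 'x', 'y', 'z' as accepted by .dims. */
static uint32_t encode_pgmrsrc2(unsigned userdatanum, const char *dims,
                                bool tgsize, unsigned exceptions)
{
    assert(userdatanum <= 31u && exceptions <= 0x7fu);
    uint32_t value = (uint32_t)userdatanum << 1;   /* USER_SGPR, bits 1-5 (.userdatanum) */
    if (strchr(dims, 'x')) value |= 1u << 7;       /* TGID_X_EN */
    if (strchr(dims, 'y')) value |= 1u << 8;       /* TGID_Y_EN */
    if (strchr(dims, 'z')) value |= 1u << 9;       /* TGID_Z_EN */
    if (tgsize) value |= 1u << 10;                 /* TG_SIZE_EN (.tgsize) */
    value |= (uint32_t)exceptions << 24;           /* EXCP_EN, 7 bits (.exceptions) */
    /* SCRATCH_EN (bit 0) is left clear; the description above says it is ignored. */
    return value;
}
```
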
### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Adds a new printf info entry to metadata info. The first argument is the ID, which is optional
and must be unique. The next arguments are the sizes of the printf call arguments. The last
argument is the format string.

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines the scratch buffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of scalar
registers spilled to the scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of vector
registers spilled to the scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set the LLVM target including the device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set the LLVM target without the device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

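This C sketch assumes, as in the example above, that the full target is the triple and the device's `gfx` name joined by a hyphen. The device table is deliberately minimal and not exhaustive. Only the Fiji/gfx803 pair comes from this document; Carrizo (gfx801) and Tonga (gfx802) are added for illustration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device -> gfx name table (not exhaustive). */
static const char *gfx_name(const char *device)
{
    static const struct { const char *device, *gfx; } table[] = {
        { "Carrizo", "gfx801" },
        { "Tonga",   "gfx802" },
        { "Fiji",    "gfx803" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].device, device) == 0)
            return table[i].gfx;
    return NULL;
}

/* Build "<triple>-<gfxname>"; returns 0 on success, -1 on unknown device or truncation. */
static int make_target(char *out, size_t size, const char *triple, const char *device)
{
    const char *gfx = gfx_name(device);
    if (gfx == NULL)
        return -1;
    int n = snprintf(out, size, "%s-%s", triple, gfx);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
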
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration,
according to the given dimensions.

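This sketch maps a DIMENSIONS string to the three enable bits. It assumes the bit positions of these fields in `kernel_code_properties` from the ROCm-ComputeABI-Doc: `enable_sgpr_grid_workgroup_count_X`, `_Y` and `_Z` at bits 7, 8 and 9. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a dimension string such as "xz" to the grid workgroup count enable bits. */
static uint32_t grid_workgroup_count_bits(const char *dims)
{
    uint32_t bits = 0;
    for (; *dims != '\0'; dims++) {
        switch (*dims) {
        case 'x': bits |= 1u << 7; break;  /* enable_sgpr_grid_workgroup_count_X */
        case 'y': bits |= 1u << 8; break;  /* enable_sgpr_grid_workgroup_count_Y */
        case 'z': bits |= 1u << 9; break;  /* enable_sgpr_grid_workgroup_count_Z */
        default:  assert(!"dimensions must be x, y or z"); break;
        }
    }
    return bits;
}
```
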
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set the number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for the kernel in metadata info. The argument is an OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

### .work_group_size_hint

Syntax: .work_group_size_hint SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-operation must be inside a kernel configuration.
Set work_group_size_hint for this kernel in metadata info.

## Sample code

This is a sample kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same setup for `test1`, written with a kernel configuration (`.config`):

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```
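
As a cross-check between the two samples, this C sketch reads a few raw bytes of the first sample as little-endian values. It compares them with the `.config` values of the second sample. The byte rows are copied from the first sample, and sit at configuration offsets 48, 80 and 96. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian readers for the raw configuration bytes. */
static uint32_t read_le16(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8;
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Byte rows copied from the first sample (configuration offsets 48, 80 and 96). */
static const uint8_t row48[8] = { 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00 };
static const uint8_t row80[8] = { 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00 };
static const uint8_t row96[8] = { 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06 };

/* Returns 1 if the raw bytes agree with the .config values of the second sample. */
static int sample_bytes_match_config(void)
{
    return read_le32(row48) == 0x002c0041u       /* .pgmrsrc1 */
        && read_le32(row48 + 4) == 0x00000090u   /* .pgmrsrc2 */
        && read_le16(row80 + 4) == 15u           /* .wavefront_sgpr_count */
        && read_le16(row80 + 6) == 7u            /* .workitem_vgpr_count */
        && (1u << row96[4]) == 16u               /* .kernarg_segment_align (stored as log2) */
        && (1u << row96[7]) == 64u;              /* .wavefront_size (stored as log2) */
}
```
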