source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3750

Last change on this file since 3750 was 3750, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Replace 'Defines' by 'Define'.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on the source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored in an ELF file. The symbol table holds kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the configuration
data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

Additional kernel and binary information is stored in the metadata ELF note.
It holds information about `printf` calls, the kernel configuration and the kernel arguments.

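
The layout described above can be sketched as follows. This is a minimal, untested sketch: the kernel names `first` and `second` are hypothetical, `s_endpgm` stands in for real kernel code, and `.p2align 8` (alignment to 2^8 = 256 bytes) is assumed to be available from the general CLRX pseudo-op set. The `.skip 256` reserves room for the kernel configuration, as in the sample code at the end of this document.

```
.rocm
.gpu Fiji
.kernel first
    .config
        .dims x
        .default_hsa_features
.kernel second
    .config
        .dims x
        .default_hsa_features
.text
.p2align 8          # align the kernel to a 256-byte boundary
first:
.skip 256           # room for the kernel configuration
        s_endpgm
.p2align 8          # align the next kernel as well
second:
.skip 256           # room for the kernel configuration
        s_endpgm
```
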
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and SGPRs.
This setup can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

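
A short sketch of such an override is shown below. The kernel name `example` and the register counts are arbitrary, illustrative values.

```
.kernel example
    .config
        .dims x
        .sgprsnum 24        # use this SGPR count instead of the computed one
        .vgprsnum 12        # use this VGPR count instead of the computed one
```
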
## Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

The kernel metadata info config pseudo-ops:

* .arg - add kernel argument
* .md_language - kernel language
* .cws, .reqd_work_group_size - reqd_work_group_size
* .work_group_size_hint - work_group_size_hint
* .fixed_work_group_size - fixed work group size
* .max_flat_work_group_size - max flat work group size
* .vectypehint - vector type hint
* .runtime_handle - runtime handle symbol name
* .md_kernarg_segment_align - kernel argument segment alignment
* .md_kernarg_segment_size - kernel argument segment size
* .md_group_segment_fixed_size - group segment fixed size
* .md_private_segment_fixed_size - private segment fixed size
* .md_symname - kernel symbol name
* .md_sgprsnum - number of SGPRs
* .md_vgprsnum - number of VGPRs
* .spilledsgprs - number of spilled SGPRs
* .spilledvgprs - number of spilled VGPRs
* .md_wavefront_size - wavefront size

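
The metadata info pseudo-ops listed above might be combined as in the following untested sketch. The kernel name `example`, the language string and all numeric values are assumptions chosen for illustration. `.arg` is left out because its syntax is not described in this document.

```
.md_version 1, 0
.kernel example
    .config
        .dims x
        .md_symname "example@kd"
        .md_language "OpenCL C", 1, 2
        .reqd_work_group_size 64, 1, 1
        .max_flat_work_group_size 64
        .md_kernarg_segment_size 8
        .md_kernarg_segment_align 8
        .md_group_segment_fixed_size 0
        .md_private_segment_fixed_size 0
        .md_sgprsnum 16
        .md_vgprsnum 8
        .md_wavefront_size 64
```
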
### .control_directive

Open control directive section. This section must be exactly 128 bytes long. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

### .cws, .reqd_work_group_size

Syntax: .cws SIZEHINT[, SIZEHINT[, SIZEHINT]]
Syntax: .reqd_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-op must be inside kernel configuration (`.config`).
Set reqd_work_group_size hint for this kernel in metadata info.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) will be used to determine the space of the kernel execution.

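
For instance, a kernel executed over a two-dimensional space might be configured as in this sketch. The kernel name `example2d` is hypothetical.

```
.kernel example2d
    .config
        .dims xy            # use the x and y dimensions, z is not used
```
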
### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. The value should fit in 7 bits.

### .fixed_work_group_size

Syntax: .fixed_work_group_size SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-op must be inside kernel configuration (`.config`).
Set fixed_work_group_size for this kernel in metadata info.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set float-mode (the FP_ROUND and FP_DENORM fields of the MODE register). The default value is 0xc0.

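
As an illustration, and assuming the usual GCN MODE register layout in which the low four bits of the byte are FP_ROUND and the high four bits are FP_DENORM, the two values below differ only in the denormal setting. The kernel names are hypothetical and the second value is just an example, not a recommendation.

```
.kernel fm_default
    .config
        .dims x
        .floatmode 0xc0     # default: FP_ROUND=0x0, FP_DENORM=0xc
.kernel fm_alldenorm
    .config
        .dims x
        .floatmode 0xf0     # FP_ROUND=0x0, FP_DENORM=0xf (all denormal bits set)
```
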
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a `.private_elem_size` of 4 bytes.

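
Going by the description above, the two configurations in this sketch should enable the same features. The kernel names `short_form` and `long_form` are hypothetical.

```
.kernel short_form
    .config
        .dims x
        .default_hsa_features
.kernel long_form
    .config
        .dims x
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```
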
### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....  
Syntax: .kcode +

Open a block of code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes which kernels the code belongs to.
`.kcode` clauses can be nested to any depth. Each nested `.kcode` adds kernels to or removes
kernels from the current membership. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_flat_work_group_size

Syntax: .max_flat_work_group_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set max flat work group size in metadata info.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .md_group_segment_fixed_size

Syntax: .md_group_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set group segment fixed size in metadata info.

### .md_kernarg_segment_align

Syntax: .md_kernarg_segment_align ALIGNMENT

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment alignment in metadata info.

### .md_kernarg_segment_size

Syntax: .md_kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel argument segment size in metadata info.

### .md_private_segment_fixed_size

Syntax: .md_private_segment_fixed_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Set private segment fixed size in metadata info.

### .md_symname

Syntax: .md_symname "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel symbol name in metadata info. It should be in the format "NAME@kd".

### .md_language

Syntax: .md_language "LANGUAGE"[, MAJOR, MINOR]

This pseudo-op must be inside kernel configuration (`.config`).
Set kernel language and its version in metadata info. The language name is given as a string.

### .md_sgprsnum

Syntax: .md_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of scalar registers for kernel in metadata info.

### .md_version

Syntax: .md_version MAJOR, MINOR

This pseudo-op defines the metadata format version.

### .md_wavefront_size

Syntax: .md_wavefront_size SIZE

This pseudo-op must be inside kernel configuration (`.config`).
Define wavefront size in metadata info. If not specified, the value is taken from the HSA configuration.

### .md_vgprsnum

Syntax: .md_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`).
Define number of vector registers for kernel in metadata info.

### .metadata

Go to metadata (metadata ELF note) section.

### .newbinfmt

This pseudo-op selects the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
are ignored. The SCRATCH_EN bit is also ignored.

### .printf

Syntax: .printf [ID]\[,ARGSIZE,....],"FORMAT"

This pseudo-op must be inside kernel configuration (`.config`).
Add a new printf info entry to the metadata info. The first argument is an optional ID
(it must be unique). The following arguments are the sizes of the printf call's arguments.
The last argument is the format string.

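
Following the syntax above, printf entries might look like this sketch. The kernel name, IDs, argument sizes (in bytes, which is an assumption) and format strings are all illustrative.

```
.kernel example
    .config
        .dims x
        .printf 1, 4, "value=%d\n"          # ID 1, one 4-byte argument
        .printf 2, 4, 4, "x=%d y=%d\n"      # ID 2, two 4-byte arguments
```
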
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

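
The count arithmetic for `.reserved_sgprs` and `.reserved_vgprs` can be illustrated with the sketch below. The kernel name and register ranges are arbitrary.

```
.kernel example
    .config
        .dims x
        .reserved_sgprs 4, 7    # reserved_sgpr_first=4, reserved_sgpr_count=7-4+1=4
        .reserved_vgprs 2, 2    # reserved_vgpr_first=2, reserved_vgpr_count=2-2+1=1
```
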
### .runtime_handle

Syntax: .runtime_handle "SYMBOLNAME"

This pseudo-op must be inside kernel configuration (`.config`).
Set runtime handle in metadata info.

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
The count includes VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer (in metadata info).

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer (in metadata info).

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

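
The relation between `.tripple` and `.target` can be sketched as below, using the strings from the descriptions above. The `.target` line is shown only as a comment, since it is assumed that one of the two forms is enough.

```
.rocm
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"   # device name is appended: ...-amdgizcl-gfx803
# alternative form with the device name given explicitly:
# .target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```
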
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration
according to the given dimensions.

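
A small sketch of `.use_grid_workgroup_count`, assuming DIMENSIONS uses the same notation as in `.dims`. The kernel name is hypothetical.

```
.kernel example2d
    .config
        .dims xy
        .use_grid_workgroup_count xy    # enable the _X and _Y fields, leave _Z disabled
```
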
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint "OPENCLTYPE"

This pseudo-op must be inside kernel configuration (`.config`).
Set vectypehint for kernel in metadata info. The argument is an OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

### .work_group_size_hint

Syntax: .work_group_size_hint SIZEHINT[, SIZEHINT[, SIZEHINT]]

This pseudo-op must be inside kernel configuration (`.config`).
Set work_group_size_hint for this kernel in metadata info.

## Sample code

This is a sample kernel setup with the configuration given as raw bytes:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same setup written with kernel configuration pseudo-ops:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```