source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3702

Last change on this file since 3702 was 3702, checked in by matszpk, 3 years ago

CLRadeonExtender: CLRXDocs: Add info about registers kernel setup.

## CLRadeonExtender Assembler ROCm handling

ROCm is a new open-source platform created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). The platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds the kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example, global constant data) is also stored in the `.text` section.
Each kernel symbol points to the configuration of that kernel. A special offset field in
the configuration data points to the kernel code.

The assembler source code is divided into the following parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts doesn't matter.

Each kernel function should be aligned to a 256-byte boundary.

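The layout described above can be sketched for two kernels as follows. This is a minimal
sketch, not taken from the CLRX sources: the kernel names `first` and `second` are
hypothetical, the one-line configurations are placeholders, and `.p2align` is assumed to be
available from the assembler's general pseudo-op set. The 256-byte `.skip` mirrors the
sample at the end of this document.

```
.rocm
.gpu Carrizo
.kernel first
    .config
        .dims x
.kernel second
    .config
        .dims x
.text
first:
.skip 256           # room for the configuration of `first`
        s_endpgm
.p2align 8          # assumed pseudo-op: pad to the next 256-byte boundary
second:
.skip 256           # room for the configuration of `second`
        s_endpgm
```
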
## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
These values can be overridden with the pseudo-ops `.sgprsnum` and `.vgprsnum`.

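As an illustration, an explicit override might look like the fragment below. The values are
borrowed from the sample at the end of this document. Whether the assembler checks them
against the registers actually used is not covered here.

```
.kernel test1
    .config
        .dims x
        .sgprsnum 16    # replaces the automatically computed SGPR count
        .vgprsnum 8     # replaces the automatically computed VGPR count
```
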
## Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs are added. The VCC register is included by default.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

### .control_directive

Open control directive section. This section must be 128 bytes long. The content of this
section will be stored in the `control_directive` field of the kernel configuration.
Must be defined inside kernel.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Defines what dimensions
(from list: x, y, z) will be used to determine space of the kernel execution.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set value of ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .fkernel

Mark given kernel as function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .globaldata

Go to constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and `.private_elem_size` set to 4 bytes.

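Going by the list above, `.default_hsa_features` should be equivalent to writing out the
individual pseudo-ops by hand. The fragment below is a sketch of that expansion under this
reading, not output produced by the assembler:

```
.kernel test1
    .config
        .dims x
        # expected to match a single `.default_hsa_features` line
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```
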
### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....
Syntax: .kcode +

Open code that will belong to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes which kernels the code belongs to.
`.kcode` clauses can be nested to any depth; each nested `.kcode` adds kernels to or removes
kernels from the set that the enclosed code belongs to. The most important reason for this
feature is register usage calculation. Any kernel given in this pseudo-operation must
already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .newbinfmt

This pseudo-op selects the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines value of the PGMRSRC2. If dimensions are set, the bits that control the dimension
setup are ignored. The SCRATCH_EN bit is also ignored.

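To show how these raw values relate to the individual pseudo-ops, the fragment below
annotates the values used in the sample at the end of this document. The bit positions in
the comments follow the usual GCN layout of COMPUTE_PGM_RSRC1 and COMPUTE_PGM_RSRC2. They
are an assumption brought in from AMD's register documentation, not something this
document defines.

```
.kernel test1
    .config
        .dims x                 # PGMRSRC2 bit 7 (TGID_X_EN)
        .sgprsnum 16            # PGMRSRC1 bits 6-9:   16/8 - 1 = 1
        .vgprsnum 8             # PGMRSRC1 bits 0-5:   8/4 - 1 = 1
        .priority 0             # PGMRSRC1 bits 10-11
        .floatmode 0xc0         # PGMRSRC1 bits 12-19
        .dx10clamp              # PGMRSRC1 bit 21
        .userdatanum 8          # PGMRSRC2 bits 1-5
        .pgmrsrc1 0x002c0041    # 1 | 1<<6 | 0xc0<<12 | 1<<21
        .pgmrsrc2 0x00000090    # 8<<1 | 1<<7
```
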
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

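A worked instance of the LASTREG-FIRSTREG+1 rule is sketched below. The register ranges
are hypothetical, and the fragment assumes that FIRSTREG and LASTREG are written as plain
register indices.

```
.kernel test1
    .config
        .dims x
        .reserved_sgprs 4, 7    # reserved_sgpr_first = 4, reserved_sgpr_count = 7-4+1 = 4
        .reserved_vgprs 0, 1    # reserved_vgpr_first = 0, reserved_vgpr_count = 1-0+1 = 2
```
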
### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines scratchbuffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .target

Syntax: .target "TARGET"

Set LLVM target with device name. For example: "amdgcn-amd-amdhsa-amdgizcl-gfx803".

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .tripple

Syntax: .tripple "TRIPPLE"

Set LLVM target without device name. For example, "amdgcn-amd-amdhsa-amdgizcl" with
the Fiji device generates the target "amdgcn-amd-amdhsa-amdgizcl-gfx803".

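The difference between `.tripple` and `.target` can be sketched with the strings from the
descriptions above. The fragment assumes that `Fiji` is accepted as a `.gpu` name and that
only one of the two pseudo-ops is used in a given source file.

```
.rocm
.gpu Fiji
.tripple "amdgcn-amd-amdhsa-amdgizcl"   # device part is appended: ...-amdgizcl-gfx803
# alternatively, give the complete target including the device name:
# .target "amdgcn-amd-amdhsa-amdgizcl-gfx803"
```
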
### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration
for the given dimensions, respectively.

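For instance, enabling the X and Y counts might look like the fragment below. It assumes
that DIMENSIONS uses the same letter list as `.dims`, written without separators.

```
.kernel test1
    .config
        .dims xy
        .use_grid_workgroup_count xy    # enables the ..._count_X and ..._count_Y fields
```
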
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same setup expressed with a kernel configuration (`.config`):

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```