source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3668

Last change on this file since 3668 was 3668, checked in by matszpk, 3 years ago

CLRadeonExtender: AsmROCm: Add '.newbinfmt' and '.globaldata' pseudo-ops. ROCmBinGen: small fixes.

## CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs
(especially designed for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store code compiled for GPUs.

The ROCm binary format implementation and this documentation are based on the following source:
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored in an ELF file. The symbol table holds the kernel and data symbols.
The main `.text` section contains the code of all kernels. Data
(for example, global constant data) can also be stored in the `.text` section.
Kernel symbols point to the kernel configuration. A special offset field in the
configuration points to the kernel code.

The assembler source code is divided into two parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts doesn't matter.

Kernel functions should be aligned to a 256-byte boundary.
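The alignment rule amounts to rounding each kernel's start offset up to a multiple of 256. A minimal sketch of that arithmetic (the `align_up` helper is hypothetical and not part of CLRX):

```python
KERNEL_ALIGN = 256  # kernel alignment stated in the text above


def align_up(offset: int, alignment: int = KERNEL_ALIGN) -> int:
    """Round `offset` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


# If the previous kernel ended at byte 0x1234 of .text,
# the next kernel would start at 0x1300.
print(hex(align_up(0x1234)))
```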

## Scalar register allocation

The assembler for the ROCm format counts all SGPR registers used by a kernel and adds
the extra registers (FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPRs have been added. The VCC register is included by default.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set the architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set the architecture stepping number.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside the kernel configuration (`.config`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside the kernel configuration (`.config`). Set the AMD code version.

### .config

Open the kernel configuration. Must be inside a kernel.

### .control_directive

Open the control directive section. This section must be 128 bytes long. Its content
is stored in the `control_directive` field of the kernel configuration.
Must be defined inside a kernel.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`debug_private_segment_buffer_sgpr` field in the kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`debug_wavefront_private_segment_offset_sgpr` field in the kernel configuration.

### .debugmode

This pseudo-op must be inside the kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside the kernel configuration (`.config`). Define which dimensions
(any of x, y, z) are used to determine the space of the kernel execution.
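Under the assumption that `.dims` drives the TGID_X_EN, TGID_Y_EN and TGID_Z_EN bits (bits 7, 8 and 9 of COMPUTE_PGM_RSRC2 on GCN), a dimension string maps to PGMRSRC2 bits as in this hypothetical helper. With `.dims x` it yields 0x80, which is the bit set in the sample's `.pgmrsrc2 0x00000090`; any other effects of `.dims` are not modelled here.

```python
# Assumed bit positions of TGID_X/Y/Z_EN in COMPUTE_PGM_RSRC2 (GCN layout).
TGID_EN_BIT = {"x": 7, "y": 8, "z": 9}


def dims_to_tgid_bits(dims: str) -> int:
    """Return the PGMRSRC2 bits that enable work-group IDs for the given dimensions."""
    value = 0
    for d in dims.lower():
        if d not in TGID_EN_BIT:
            raise ValueError(f"unknown dimension {d!r}")
        value |= 1 << TGID_EN_BIT[d]
    return value


print(hex(dims_to_tgid_bits("x")))    # 0x80
print(hex(dims_to_tgid_bits("xyz")))  # 0x380
```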

### .dx10clamp

This pseudo-op must be inside the kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set the value of the ELF header `e_flags` field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside the kernel configuration (`.config`).
Set the exception mask in the PGMRSRC2 register value. The value should fit in 7 bits.

### .fkernel

Mark the given kernel as a function in ROCm. Must be inside a kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside the kernel configuration (`.config`). Set the float mode
(the FP_ROUND and FP_DENORM fields of the MODE register). The default value is 0xc0.
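Assuming the GCN MODE register layout (FP_ROUND in bits 3:0 and FP_DENORM in bits 7:4, each split into a 2-bit single-precision field and a 2-bit double-precision field), the default 0xc0 decomposes as shown by this small hypothetical helper. Under that assumption, 0xc0 means round-to-nearest-even for both precisions, with single-precision denormals flushed and double-precision denormals kept.

```python
def split_floatmode(floatmode: int) -> dict:
    """Split a .floatmode byte into its assumed 2-bit sub-fields."""
    if not 0 <= floatmode <= 0xff:
        raise ValueError("floatmode must fit in one byte")
    return {
        "round_f32": floatmode & 3,          # FP_ROUND[1:0]
        "round_f64": (floatmode >> 2) & 3,   # FP_ROUND[3:2]
        "denorm_f32": (floatmode >> 4) & 3,  # FP_DENORM[1:0]
        "denorm_f64": (floatmode >> 6) & 3,  # FP_DENORM[3:2]
    }


print(split_floatmode(0xc0))
# {'round_f32': 0, 'round_f64': 0, 'denorm_f32': 0, 'denorm_f64': 3}
```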

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`gds_segment_byte_size` field in the kernel configuration.

### .globaldata

Go to the constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`group_segment_alignment` field in the kernel configuration.

### .default_hsa_features

This pseudo-op must be inside the kernel configuration (`.config`).
It sets the default HSA kernel features and register features (usage of extra SGPR registers).
The default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr` and `.use_ptr64`, with the private element size set to 4 bytes.
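The default feature set can be checked against the raw bytes in the Sample code section. Assuming the `kernel_code_properties` bit layout of `amd_kernel_code_t` from the ROCm-ComputeABI-Doc (bit 0 private segment buffer, bit 1 dispatch pointer, bit 3 kernarg segment pointer, bits 17-18 private element size, bit 19 `is_ptr64`), the hypothetical helper below builds the property word for these defaults. The result, 0x000a000b, matches the little-endian bytes `0x0b, 0x00, 0x0a, 0x00` at offset 56 of the sample configuration.

```python
# Assumed bit positions in amd_kernel_code_t.kernel_code_properties.
PROPERTY_BITS = {
    "use_private_segment_buffer": 0,
    "use_dispatch_ptr": 1,
    "use_queue_ptr": 2,
    "use_kernarg_segment_ptr": 3,
    "use_ptr64": 19,
}
PRIVATE_ELEM_SIZE_SHIFT = 17              # 2-bit field at bits 17-18
PRIVATE_ELEM_SIZE_CODE = {2: 0, 4: 1, 8: 2, 16: 3}


def kernel_code_properties(features, private_elem_size: int) -> int:
    """Combine feature flags and the element size code into one property word."""
    value = 0
    for name in features:
        value |= 1 << PROPERTY_BITS[name]
    code = PRIVATE_ELEM_SIZE_CODE[private_elem_size]
    return value | (code << PRIVATE_ELEM_SIZE_SHIFT)


defaults = ["use_private_segment_buffer", "use_dispatch_ptr",
            "use_kernarg_segment_ptr", "use_ptr64"]
print(hex(kernel_code_properties(defaults, 4)))  # 0xa000b
```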

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside the kernel configuration (`.config`). Enable the IEEE mode.

### .kcode

Syntax: .kcode KERNEL1,...

Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation changes the membership of the code to the specified kernels.
`.kcode` clauses can be nested any number of times; each nested `.kcode` adds kernels to
or removes kernels from the membership. The main reason for this feature is register usage
calculation. Every kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 # this code belongs to kernel1, kernel2
    .kcode -kernel1 # this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```
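To make the nesting rules concrete, here is a hypothetical Python model of `.kcode` membership. It assumes that a plain or `+`-prefixed name adds a kernel, a `-`-prefixed name removes one, `+` alone selects every defined kernel, and the outermost `.kcode` starts from an empty selection. The text above does not pin down all of these details, so treat the model as an illustration rather than CLRX's exact algorithm.

```python
def kcode_selection(args, current, all_kernels):
    """Hypothetical model of one .kcode line: return the new kernel selection.

    args        -- the comma-separated arguments of .kcode
    current     -- selection of the enclosing .kcode (empty for the outermost)
    all_kernels -- names of the kernels defined so far
    """
    selection = set(current)
    for arg in (a.strip() for a in args):
        if arg == "+":
            selection = set(all_kernels)  # '+' alone: every defined kernel
            continue
        remove = arg.startswith("-")
        name = arg.lstrip("+-")
        if name not in all_kernels:
            raise ValueError(f"kernel {name!r} is not defined yet")
        if remove:
            selection.discard(name)  # '-name' removes the kernel
        else:
            selection.add(name)      # 'name' or '+name' adds the kernel
    return selection


# Replaying the nested sample with a stack; .kcodeend pops one level.
kernels = {"kernel1", "kernel2"}
stack = [set()]
stack.append(kcode_selection(["kernel1", "kernel2"], stack[-1], kernels))
stack.append(kcode_selection(["-kernel1"], stack[-1], kernels))
print(sorted(stack[-1]))  # ['kernel2']
stack.pop()               # inner .kcodeend
print(sorted(stack[-1]))  # ['kernel1', 'kernel2']
```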

### .kcodeend

Close the `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`kernarg_segment_alignment` field in the kernel configuration. The value must be a power of two.
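In the raw configuration bytes of the sample, `.kernarg_segment_align 16` shows up as the byte 0x04 and `.wavefront_size 64` as 0x06, which suggests that these fields store the base-2 logarithm of the value. A hypothetical encoder under that assumption:

```python
def encode_alignment(align: int) -> int:
    """Return log2(align), the assumed stored form of an *_alignment field."""
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")
    return align.bit_length() - 1


print(encode_alignment(16))  # 4
```

The power-of-two requirement follows naturally from this encoding: only exact powers of two have an integer logarithm.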

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`kernarg_segment_byte_size` field in the kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`kernel_code_entry_byte_offset` field in the kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`kernel_code_prefetch_byte_offset` field in the kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`kernel_code_prefetch_byte_size` field in the kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Define the initial
local memory size used by the kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside the kernel configuration (`.config`). Set the
machine version fields in the kernel configuration.
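The MAJOR, MINOR and STEPPING values form the version triple that AMD tools also use in target names. As a rough illustration (the helper `gfx_name` is hypothetical), the sample's `.machine 1, 8, 0, 1`, which goes together with `.gpu Carrizo` and `.arch_stepping 1`, corresponds to the `gfx801` target name under the usual convention of concatenating major, minor and stepping. The sketch only handles single-digit minor and stepping values.

```python
def gfx_name(major: int, minor: int, stepping: int) -> str:
    """Conventional AMDGPU target name such as gfx801 (single-digit minor/stepping)."""
    if not (0 <= minor <= 9 and 0 <= stepping <= 9):
        raise ValueError("this sketch only handles single-digit minor and stepping")
    return f"gfx{major}{minor}{stepping}"


# .machine 1, 8, 0, 1 from the sample: KIND=1, MAJOR=8, MINOR=0, STEPPING=1
kind, major, minor, stepping = 1, 8, 0, 1
print(gfx_name(major, minor, stepping))  # gfx801
```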

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`max_scratch_backing_memory_byte_size` field in the kernel configuration.

### .newbinfmt

This pseudo-op selects the new binary format.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside the kernel configuration (`.config`).
Define the value of PGMRSRC1.
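How the separate `.config` settings end up in PGMRSRC1 can be sketched as below, assuming the GCN COMPUTE_PGM_RSRC1 layout: VGPRs in granules of 4 (bits 5:0), SGPRs in granules of 8 (bits 9:6), PRIORITY (bits 11:10), FLOAT_MODE (bits 19:12), followed by PRIV, DX10_CLAMP, DEBUG_MODE and IEEE_MODE in bits 20 to 23. The `encode_pgmrsrc1` helper is hypothetical; with the sample's settings it reproduces the sample's `.pgmrsrc1 0x002c0041`.

```python
def encode_pgmrsrc1(vgprs: int, sgprs: int, priority: int = 0, floatmode: int = 0xc0,
                    priv: bool = False, dx10clamp: bool = False,
                    debugmode: bool = False, ieeemode: bool = False) -> int:
    """Sketch of a COMPUTE_PGM_RSRC1 value (assumed GCN field layout)."""
    if vgprs < 1 or sgprs < 1:
        raise ValueError("register counts must be positive")
    value = (vgprs - 1) // 4               # VGPRS: granules of 4, bits 5:0
    value |= ((sgprs - 1) // 8) << 6       # SGPRS: granules of 8, bits 9:6
    value |= (priority & 3) << 10          # PRIORITY, bits 11:10
    value |= (floatmode & 0xff) << 12      # FLOAT_MODE, bits 19:12
    value |= int(priv) << 20               # PRIV
    value |= int(dx10clamp) << 21          # DX10_CLAMP
    value |= int(debugmode) << 22          # DEBUG_MODE
    value |= int(ieeemode) << 23           # IEEE_MODE
    return value


# Sample settings: .vgprsnum 8, .sgprsnum 16, .priority 0, .floatmode 0xc0, .dx10clamp
print(hex(encode_pgmrsrc1(8, 16, priority=0, floatmode=0xc0, dx10clamp=True)))
# 0x2c0041
```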

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside the kernel configuration (`.config`).
Define the value of PGMRSRC2. If dimensions are set (`.dims`), the bits that control
the dimension setup are ignored. The SCRATCH_EN bit is also ignored.
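The following sketch splits a PGMRSRC2 value into its fields, assuming the GCN COMPUTE_PGM_RSRC2 layout; the field names come from that layout, and the `decode_pgmrsrc2` helper is hypothetical. Applied to the sample value 0x00000090, it shows USER_SGPR = 8 (matching `.userdatanum 8`) and TGID_X_EN = 1 (matching `.dims x`).

```python
def decode_pgmrsrc2(value: int) -> dict:
    """Split a COMPUTE_PGM_RSRC2 value into fields (assumed GCN layout)."""
    return {
        "scratch_en": value & 1,              # bit 0, ignored by .pgmrsrc2
        "user_sgpr": (value >> 1) & 0x1f,     # bits 5:1, cf. .userdatanum
        "trap_present": (value >> 6) & 1,     # bit 6
        "tgid_x_en": (value >> 7) & 1,        # bits 7-9, cf. .dims
        "tgid_y_en": (value >> 8) & 1,
        "tgid_z_en": (value >> 9) & 1,
        "tg_size_en": (value >> 10) & 1,      # bit 10, cf. .tgsize
        "tidig_comp_cnt": (value >> 11) & 3,  # bits 12:11
        "excp_en_msb": (value >> 13) & 3,     # bits 14:13
        "lds_size": (value >> 15) & 0x1ff,    # bits 23:15, cf. .localsize
        "excp_en": (value >> 24) & 0x7f,      # bits 30:24, cf. .exceptions (7 bits)
    }


fields = decode_pgmrsrc2(0x00000090)  # value from the sample configuration
print(fields["user_sgpr"], fields["tgid_x_en"])  # 8 1
```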

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside the kernel configuration (`.config`). Define the priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the `private_element_size`
field in the kernel configuration. The value must be a power of two between 2 and 16.
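The power-of-two restriction fits a compact 2-bit encoding. Assuming the `amd_kernel_code_t` convention in which codes 0 to 3 stand for 2, 4, 8 and 16 bytes (code = log2(size) - 1), the hypothetical helper below performs the mapping. For the sample's `.private_elem_size 4` it gives 1, which is consistent with bits 17-18 of the sample's property word 0x000a000b.

```python
def encode_private_elem_size(size: int) -> int:
    """Map 2/4/8/16 bytes to the assumed 2-bit code log2(size) - 1."""
    if size not in (2, 4, 8, 16):
        raise ValueError("private element size must be 2, 4, 8 or 16")
    return size.bit_length() - 2


print(encode_private_elem_size(4))  # 1
```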

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`private_segment_alignment` field in the kernel configuration. The value must be a power of two.

### .privmode

This pseudo-op must be inside the kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`reserved_sgpr_first` and `reserved_sgpr_count` fields in the kernel configuration.
`reserved_sgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).
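The count includes both ends of the range. A hypothetical helper that takes plain register indices (the exact form of FIRSTREG and LASTREG accepted by the assembler is not spelled out here) computes both fields; for example, registers 10 through 15 give a first register of 10 and a count of 6. The same arithmetic applies to `.reserved_vgprs`.

```python
def reserved_range(first: int, last: int) -> tuple:
    """Return (reserved_*_first, reserved_*_count) for FIRSTREG..LASTREG inclusive."""
    if last < first:
        raise ValueError("LASTREG must not be lower than FIRSTREG")
    return first, last - first + 1


print(reserved_range(10, 15))  # (10, 6)
```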

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`reserved_vgpr_first` and `reserved_vgpr_count` fields in the kernel configuration.
`reserved_vgpr_count` is filled with the number of registers (LASTREG-FIRSTREG+1).

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`runtime_loader_kernel_symbol` field in the kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Define the scratch buffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside the kernel configuration (`.config`). Set the number of scalar
registers that can be used during kernel execution.
This number includes the VCC, FLAT_SCRATCH and XNACK_MASK registers.

### .tgsize

This pseudo-op must be inside the kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .use_debug_enabled

This pseudo-op must be inside the kernel configuration (`.config`). Enable the `is_debug_enabled`
field in the kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_dispatch_id` field in the kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_dispatch_ptr` field in the kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`is_dynamic_call_stack` field in the kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_flat_scratch_init` field in the kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in the kernel configuration
for the given dimensions (x, y and z, respectively).

### .use_kernarg_segment_ptr

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_kernarg_segment_ptr` field in the kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_ordered_append_gds` field in the kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_private_segment_buffer` field in the kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_private_segment_size` field in the kernel configuration.

### .use_ptr64

This pseudo-op must be inside the kernel configuration (`.config`). Enable the `is_ptr64` field
in the kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`enable_sgpr_queue_ptr` field in the kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside the kernel configuration (`.config`). Enable the
`is_xnack_enabled` field in the kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside the kernel configuration (`.config`). Set the number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside the kernel configuration (`.config`). Set the number of vector
registers that can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`wavefront_sgpr_count` field in the kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside the kernel configuration (`.config`). Set the `wavefront_size`
field in the kernel configuration. The value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`workgroup_fbarrier_count` field in the kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`workgroup_group_segment_byte_size` field in the kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`workitem_private_segment_byte_size` field in the kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside the kernel configuration (`.config`). Set the
`workitem_vgpr_count` field in the kernel configuration.

## Sample code

This is a sample kernel setup with the kernel configuration given as raw bytes:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same kernel written with kernel configuration pseudo-operations:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```
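The raw bytes in the first listing and the pseudo-ops in the second appear to describe the same configuration. As a cross-check, the sketch below decodes the raw 256 bytes with Python's `struct` module, assuming the little-endian `amd_kernel_code_t` field offsets from the ROCm-ComputeABI-Doc (code version at 0, machine version at 8, entry offset at 16, resource registers at 48, properties at 56, kernarg size at 72, register counts at 84, log2 alignment and wavefront-size bytes at 100). The names `CONFIG` and `decode_config` are illustrative only.

```python
import struct

# Raw 256-byte kernel configuration copied from the first sample listing.
CONFIG = (
    bytes([0x01, 0, 0, 0, 0, 0, 0, 0,
           0x01, 0, 0x08, 0, 0, 0, 0x01, 0,
           0, 0x01, 0, 0, 0, 0, 0, 0])
    + bytes(24)
    + bytes([0x41, 0, 0x2c, 0, 0x90, 0, 0, 0,
             0x0b, 0, 0x0a, 0, 0, 0, 0, 0])
    + bytes(8)
    + bytes([0x08, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0x0f, 0, 0x07, 0])
    + bytes(8)
    + bytes([0, 0, 0, 0, 0x04, 0x04, 0x04, 0x06])
    + bytes(152)
)


def decode_config(cfg: bytes) -> dict:
    """Decode selected amd_kernel_code_t fields (assumed little-endian layout)."""
    if len(cfg) != 256:
        raise ValueError("kernel configuration must be 256 bytes")
    major, minor = struct.unpack_from("<II", cfg, 0)
    kind, m_major, m_minor, m_stepping = struct.unpack_from("<HHHH", cfg, 8)
    (entry_offset,) = struct.unpack_from("<q", cfg, 16)
    pgmrsrc1, pgmrsrc2 = struct.unpack_from("<II", cfg, 48)  # low/high dword
    (properties,) = struct.unpack_from("<I", cfg, 56)
    (kernarg_size,) = struct.unpack_from("<Q", cfg, 72)
    sgpr_count, vgpr_count = struct.unpack_from("<HH", cfg, 84)
    log2_fields = struct.unpack_from("<BBBB", cfg, 100)
    return {
        "codeversion": (major, minor),
        "machine": (kind, m_major, m_minor, m_stepping),
        "kernel_code_entry_offset": entry_offset,
        "pgmrsrc1": pgmrsrc1,
        "pgmrsrc2": pgmrsrc2,
        "kernel_code_properties": properties,
        "kernarg_segment_size": kernarg_size,
        "wavefront_sgpr_count": sgpr_count,
        "workitem_vgpr_count": vgpr_count,
        # kernarg, group and private alignment plus wavefront size, as log2
        "log2_fields": log2_fields,
    }


info = decode_config(CONFIG)
print(hex(info["pgmrsrc1"]), hex(info["pgmrsrc2"]))  # 0x2c0041 0x90
```

Each decoded value lines up with a pseudo-op in the second listing, for example `.machine 1, 8, 0, 1`, `.kernel_code_entry_offset 0x100`, `.wavefront_sgpr_count 15` and `.workitem_vgpr_count 7`.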