source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmRocm.md @ 3663

Last change on this file since 3663 was 3663, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Add '.eflags' pseudo-ops to CLRX docs.

## CLRadeonExtender Assembler ROCm handling

ROCm is a new open-source platform created by AMD for Radeon GPUs
(designed especially for HPC and AMD's professional products). This platform uses the HSACO
binary object file format to store compiled GPU code.

The ROCm binary format implementation and this documentation are based on
[ROCm-ComputeABI-Doc](https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc).

## Binary format

The binary is stored as an ELF file. The symbol table holds the kernel and data symbols.
The main `.text` section contains the code for all kernels. Data
(for example global constant data) is also stored in the `.text` section.
Each kernel symbol points to the configuration for that kernel. A special offset field in
the configuration data points to the kernel code.

The assembler source code is divided into two parts:

* kernel configuration
* kernel code and data (in the `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

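One way to satisfy this in source is to align the location counter before the kernel label. The fragment below is a sketch that assumes the assembler's generic `.p2align` pseudo-op (GNU as style, not described in this document) is available in ROCm mode; `test1` is the kernel name used in the sample code.

```
.text
.p2align 8          # assumption: align to 2**8 = 256 bytes
test1:
.skip 256           # kernel configuration, followed by the kernel code
```
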
## Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds the extra registers
(FLAT_SCRATCH, XNACK_MASK). Special fields determine
which extra SGPR registers are added. The VCC register is included by default.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set the architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set the architecture stepping number.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`). Set the AMD code version.

### .config

Open kernel configuration. Must be inside kernel.

### .control_directive

Open the control directive section. This section must be 128 bytes long. The content of this
section is stored in the control_directive field in kernel configuration.
Must be defined inside kernel.

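A minimal sketch, mirroring the sample code at the end of this document: the section is opened inside the kernel and filled with exactly 128 bytes, here all zero. The other configuration settings are omitted.

```
.kernel test1
    .config
        .dims x
    .control_directive
        .fill 128, 1, 0x00      # 128 one-byte values of 0x00
```
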
### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_private_segment_buffer_sgpr` field in kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`debug_wavefront_private_segment_offset_sgpr` field in kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Defines which dimensions
(from the list: x, y, z) are used to determine the kernel execution space.

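As a small illustration, the fragment below selects two dimensions. It assumes the dimension letters are written together without separators, by analogy with the single-letter `.dims x` in the sample code; the kernel name `my_kernel` is hypothetical.

```
.kernel my_kernel       # hypothetical kernel name
    .config
        .dims xy        # assumed spelling: kernel executes over X and Y
```
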
### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .eflags

Syntax: .eflags EFLAGS

Set the value of the ELF header e_flags field.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set the exception mask in the PGMRSRC2 register value. The value should fit in 7 bits.

### .fkernel

Mark the given kernel as a function in ROCm. Must be inside kernel.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Set the float mode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

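The default value can be read as two 4-bit fields. The sketch below assumes the usual GCN MODE register layout, with FP_ROUND in the low four bits and FP_DENORM in the high four bits; the comments are an interpretation, not text taken from the assembler sources.

```
.config
    # 0xc0 = 0b1100_0000
    #   bits 0-3 (FP_ROUND)  = 0x0
    #   bits 4-7 (FP_DENORM) = 0xc
    .floatmode 0xc0
```
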
### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`gds_segment_size` field in kernel configuration.

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`group_segment_align` field in kernel configuration.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`).
It sets the default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_dispatch_ptr`,
`.use_kernarg_segment_ptr`, `.use_ptr64` and a private_elem_size of 4 bytes.

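Going by the list above, `.default_hsa_features` can be read as shorthand for the explicit settings below. This is a sketch of that reading; whether the assembler treats both forms identically in every respect is an assumption.

```
.config
    .use_private_segment_buffer
    .use_dispatch_ptr
    .use_kernarg_segment_ptr
    .use_ptr64
    .private_elem_size 4
```
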
### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....  
Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label's name.
This pseudo-operation changes the membership of the code to the specified kernels.
`.kcode` can be nested any number of times: each nested `.kcode` adds kernels to, or removes
kernels from, the membership of the code. The main reason for this feature is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_alignment` field in kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernarg_segment_byte_size` field in kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_entry_byte_offset` field in kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

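The default matches the layout used in the sample code at the end of this document: the kernel label marks the start of the 256-byte configuration, and the code follows it. A condensed sketch of that layout, with the other configuration settings omitted:

```
.kernel test1
    .config
        .kernel_code_entry_offset 0x100     # 0x100 = 256, the default
.text
test1:
.skip 256           # space for the kernel configuration
        s_load_dword    s2, s[4:5], 0x4     # code starts 256 bytes after test1
```
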
### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_offset` field in kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`kernel_code_prefetch_byte_size` field in kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines the initial
local memory size used by the kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`). Set
machine version fields in kernel configuration.

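In the sample code at the end of this document, the MINOR and STEPPING arguments repeat the values given to `.arch_minor` and `.arch_stepping`. The fragment below condenses that pairing. Reading KIND 1 and MAJOR 8 as "AMD GPU, architecture major version 8" is an assumption based on the ROCm-ComputeABI-Doc conventions, not a statement from this text.

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .machine 1, 8, 0, 1     # KIND, MAJOR, MINOR (= .arch_minor), STEPPING (= .arch_stepping)
```
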
### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`max_scratch_backing_memory_byte_size` field in kernel configuration.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines the value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines the value of the PGMRSRC2. If dimensions are set (see `.dims`), the bits that control
dimension setup are ignored. The SCRATCH_EN bit is also ignored.

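For orientation, the value used in the sample code can be decoded by hand. The bit positions in the comments assume the standard GCN COMPUTE_PGM_RSRC2 layout (SCRATCH_EN in bit 0, USER_SGPR in bits 1-5, TGID_X_EN in bit 7); they are not taken from this document.

```
.config
    .dims x                 # dimensions are set, so the dimension bits below are ignored
    .userdatanum 8
    # 0x00000090 = 0b1001_0000
    #   bit 0    (SCRATCH_EN) = 0, ignored by the assembler in any case
    #   bits 1-5 (USER_SGPR)  = 8, consistent with .userdatanum 8
    #   bit 7    (TGID_X_EN)  = 1, a dimension bit
    .pgmrsrc2 0x00000090
```
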
### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`). Set `private_element_size`
field in kernel configuration. Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`). Set
`private_segment_alignment` field in kernel configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_sgpr_first` and `reserved_sgpr_count` fields in kernel configuration.
`reserved_sgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

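A worked instance of the count formula, as a sketch only. It assumes that FIRSTREG and LASTREG are written as plain register indices rather than register names such as `s4`; this document does not state which form the assembler expects.

```
.config
    # assumed form: plain register indices
    # reserved_sgpr_first = 4, reserved_sgpr_count = 7-4+1 = 4
    .reserved_sgprs 4, 7
```
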
### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`). Set
`reserved_vgpr_first` and `reserved_vgpr_count` fields in kernel configuration.
`reserved_vgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`). Set
`runtime_loader_kernel_symbol` field in kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines the scratch buffer size.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of scalar
registers which can be used during kernel execution.
The number includes VCC, FLAT_SCRATCH and XNACK_MASK.

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN.

### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_debug_enabled`
field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_id` field in kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_dispatch_ptr` field in kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_dynamic_call_stack` field in kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_flat_scratch_init` field in kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in kernel configuration
that correspond to the given dimensions.

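As an illustration, the fragment below requests the workgroup counts for two dimensions. It assumes that DIMENSIONS uses the same letter notation as `.dims` (letters written together, as in `.dims x` in the sample code).

```
.config
    # assumed spelling; enables enable_sgpr_grid_workgroup_count_X and _Y only
    .use_grid_workgroup_count xy
```
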
### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_kernarg_segment_ptr` field in kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_ordered_append_gds` field in kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_buffer` field in kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_private_segment_size` field in kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`). Enable `is_ptr64` field
in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`). Enable
`enable_sgpr_queue_ptr` field in kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`). Enable
`is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set the number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`). Set `wavefront_size`
field in kernel configuration. Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_fbarrier_count` field in kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workgroup_group_segment_byte_size` field in kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_private_segment_byte_size` field in kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set
`workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample of the kernel setup:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
.kernel test2
.text
test1:
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x08, 0x00, 0x00, 0x00, 0x01, 0x00
        .byte 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 24, 1, 0x00
        .byte 0x41, 0x00, 0x2c, 0x00, 0x90, 0x00, 0x00, 0x00
        .byte 0x0b, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00
        .fill 8, 1, 0x00
        .byte 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x07, 0x00
        .fill 8, 1, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x06
        .fill 152, 1, 0x00
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
....
```

The same kernel written with kernel configuration:

```
.rocm
.gpu Carrizo
.arch_minor 0
.arch_stepping 1
.kernel test1
    .config
        .dims x
        .sgprsnum 16
        .vgprsnum 8
        .dx10clamp
        .floatmode 0xc0
        .priority 0
        .userdatanum 8
        .pgmrsrc1 0x002c0041
        .pgmrsrc2 0x00000090
        .codeversion 1, 0
        .machine 1, 8, 0, 1
        .kernel_code_entry_offset 0x100
        .use_private_segment_buffer
        .use_dispatch_ptr
        .use_kernarg_segment_ptr
        .private_elem_size 4
        .use_ptr64
        .kernarg_segment_size 8
        .wavefront_sgpr_count 15
        .workitem_vgpr_count 7
        .kernarg_segment_align 16
        .group_segment_align 16
        .private_segment_align 16
        .wavefront_size 64
        .call_convention 0x0
    .control_directive          # optional
        .fill 128, 1, 0x00
.text
test1:
.skip 256           # skip ROCm kernel configuration (required)
/*c0020082 00000004*/ s_load_dword    s2, s[4:5], 0x4
/*c0060003 00000000*/ s_load_dwordx2  s[0:1], s[6:7], 0x0
/*bf8c007f         */ s_waitcnt       lgkmcnt(0)
/*8602ff02 0000ffff*/ s_and_b32       s2, s2, 0xffff
/*92020802         */ s_mul_i32       s2, s2, s8
/*32000002         */ v_add_u32       v0, vcc, s2, v0
/*2202009f         */ v_ashrrev_i32   v1, 31, v0
/*d28f0001 00020082*/ v_lshlrev_b64   v[1:2], 2, v[0:1]
/*32060200         */ v_add_u32       v3, vcc, s0, v1
...
```