source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmGallium.md @ 3714

Last change on this file since 3714 was 3714, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Typo in AmdCL2. Gallium: Add info about scratch symbol relocations.

## CLRadeonExtender Assembler Gallium handling

GalliumCompute is an open-source OpenCL implementation for the Mesa3D
drivers. It is divided into three components: CLover, libclc and LLVM AMDGPU. Since LLVM v3.6
and Mesa3D v10.5, GalliumCompute uses a binary format with native code. CLRadeonExtender
supports only these binaries.

## Binary format

The binary format contains kernel information and the main binary in the ELF format.
The main `.text` section contains the code for all kernels. Optionally, the
`.rodata` section contains constant global data for all kernels.
The main binary has the kernel configuration (ProgInfo) in the `.AMDGPU.config` section.
ProgInfo holds three addresses and values that describe the runtime environment for a kernel:
floating point setup, register usage, local data usage and the rest.

The assembler source code is divided into three parts:

* kernel configuration
* kernel constant data (in `.rodata` section)
* kernel code (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.

## Relocations

The CLRX assembler handles relocations to the scratch symbol (`.scratchsym` pseudo-op).
These relocations can be applied to places that accept
32-bit literal immediates. Only two types of relocations are allowed:

* `place`, `place&0xffffffff`, `place%0x100000000`, `place%%0x100000000` -
low 32 bits of value
* `place>>32`, `place/0x100000000`, `place//0x100000000` - high 32 bits of value

Here `place` denotes an expression containing the scratch symbol. Additional offsets
are not accepted (only the scratch symbol itself).

Examples:

```
s_mov_b32       s13, scratchsym>>32
s_mov_b32       s12, scratchsym&0xffffffff
```

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.
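
As an illustration, the automatically computed register counts could be overridden inside a kernel configuration as sketched below. The kernel name `example` and both register counts are hypothetical, and the fragment omits the argument definitions and code that a complete kernel needs.

```
.kernel example
    .config
        .dims x
        .sgprsnum 24    # hypothetical value: replaces the computed SGPR count
        .vgprsnum 12    # hypothetical value: replaces the computed VGPR count
```

When neither pseudo-op is given, the assembler keeps the counts it derived from the kernel code.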

## Scalar register allocation

The assembler for the GalliumCompute format counts all SGPR registers and adds extra registers
(VCC, FLAT_SCRATCH, XNACK_MASK), if any are used, to the register pool.
The VCC register is included by default.
In the AMDHSA configuration (LLVM >= 4.0.0), special fields determine
which extra SGPRs have been added.

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number. Used only if LLVM version is 4.0.0 or later.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number. Used only if LLVM version is 4.0.0 or later.

### .arg

Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]

Adds a kernel argument definition. Must be inside argument configuration.
The first argument is the type:

* scalar - scalar value (including vector values like uint4)
* constant - constant pointer (32-bit ???)
* global - global pointer (64-bit)
* local - local pointer
* image2d_rdonly - ??
* image2d_wronly - ??
* image3d_rdonly - ??
* image3d_wronly - ??
* sampler - ??
* griddim - shortcut for griddim argument definition
* gridoffset - shortcut for gridoffset argument definition

The second argument is the size of the argument. The third argument is targetSize, which
should be a multiple of 4. The fourth argument is the target alignment. By default the target
alignment is a power of 2 not less than the size.
The fifth argument determines how to extend a numeric value to a larger target size:
`sext` - sign extend, `zext` - zero extend. If the argument is smaller than 4 bytes,
`sext` can be used to define a signed integer and `zext` an unsigned integer.
The sixth argument is the semantic:

* general - general argument
* griddim - griddim argument
* gridoffset - gridoffset argument
* imgsize - image size
* imgformat - image format

Example argument definition:

```
.arg scalar, 4, 4, 4, zext, general
.arg global, 8, 8, 8, zext, general
.arg scalar, 2, 4, 4, sext, general # short
.arg scalar, 16, 16, 16, zext, general # uint4 or double2
.arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
.arg scalar, 4, 4, 4, zext, gridoffset # shortcut: .arg gridoffset
```

The last two arguments (griddim, gridoffset) must be defined in every kernel definition.

### .args

Open kernel argument configuration. Must be inside kernel.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel. Kernel configuration cannot be
defined if proginfo configuration was defined (by using `.proginfo`).
The following pseudo-ops can be inside kernel config:

* .debugmode - enables use of DEBUG_MODE
* .dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.
* .dx10clamp - enables use of DX10_CLAMP
* .floatmode VALUE - choose float mode for kernel (byte value).
Default value is 0xc0
* .ieeemode - choose IEEE mode for kernel
* .localsize SIZE - initial local data size for kernel in bytes
* .pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)
* .priority VALUE - set priority for kernel (0-3). Default value is 0.
* .privmode - enables use of PRIV (privileged mode)
* .scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.
* .sgprsnum NUMBER - number of SGPR registers used by kernel (excluding VCC,FLAT_SCRATCH).
By default, automatically computed by assembler.
* .vgprsnum NUMBER - number of VGPR registers used by kernel.
By default, automatically computed by assembler.
* .userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.
* .tgsize - enables use of TG_SIZE_EN (we recommend always adding this)
* .spilledsgprs - number of scalar registers to spill
* .spilledvgprs - number of vector registers to spill
* AMDHSA pseudo-ops

Example configuration:

```
.config
    .dims xyz
    .tgsize
```

### .control_directive

Open control directive section. This section must be 128 bytes. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel. Can be used only if LLVM version is 4.0.0 or later.
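
A minimal sketch of a control directive section follows. It assumes that the assembler's generic `.fill` data pseudo-op can be used here to emit bytes. The kernel name `example` is hypothetical, and the rest of the kernel definition is omitted.

```
.kernel example
    .control_directive
        .fill 128, 1, 0     # 128 one-byte zeros, so the section is exactly 128 bytes
```

Any other mix of data pseudo-ops would do, as long as the section content adds up to 128 bytes.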

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `debug_private_segment_buffer_sgpr` field in
kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `debug_wavefront_private_segment_offset_sgpr` field in
kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. It sets default HSA kernel features and register features
(extra SGPR registers usage). These default features are `.use_private_segment_buffer`,
`.use_dispatch_ptr`, `.use_kernarg_segment_ptr`, `.use_ptr64` and
private_elem_size set to 4 bytes.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Defines what dimensions
(from list: x, y, z) will be used to determine space of the kernel execution.

### .driver_version

Syntax: .driver_version VERSION

Set driver (Mesa3D) version for this binary. Version in form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .entry

Syntax: .entry ADDRESS, VALUE

Add entry of proginfo. Must be inside proginfo configuration. Sample proginfo:

```
.entry 0x0000b848, 0x000c0080
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
```

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. Value should be 7-bit.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`). Defines float-mode.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `gds_segment_size` field in kernel configuration.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Store current driver version to SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.

### .get_llvm_version

Syntax: .get_llvm_version SYMBOL

Store current LLVM compiler version to SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.
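
This encoding lends itself to version checks at assembly time. The sketch below assumes the assembler's generic `.if`/`.endif` conditional pseudo-ops; `LLVMVER` and `MESAVER` are hypothetical symbol names.

```
.get_llvm_version LLVMVER
.get_driver_version MESAVER
.if LLVMVER >= 40000        # 4.0.0 encodes as 4*10000 + 0*100 + 0
    # setup that requires LLVM 4.0.0 or later would go here
.endif
```

`MESAVER` could be tested the same way against an encoded Mesa3D version.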

### .globaldata

Go to constant global data section (`.rodata`).
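
A short sketch of how constant data might be placed in this section. It assumes the assembler's generic `.int` data pseudo-op; the label `lookup` and the values are hypothetical.

```
.globaldata
lookup:                     # hypothetical label for the constant table
    .int 1, 2, 4, 8         # four 32-bit constants stored in .rodata
.text                       # switch back to the kernel code section
```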

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `group_segment_align` field in kernel configuration.

### .hsa_debugmode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DEBUG_MODE in kernel HSA configuration.

### .hsa_dims

Syntax: .hsa_dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines what dimensions (from list: x, y, z) will be used
to determine space of the kernel execution in kernel HSA configuration.

### .hsa_dx10clamp

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DX10_CLAMP in kernel HSA configuration.

### .hsa_exceptions

Syntax: .hsa_exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set exception mask in PGMRSRC2 register value in
kernel HSA configuration. Value should be 7-bit.

### .hsa_floatmode

Syntax: .hsa_floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines float-mode in kernel HSA configuration.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .hsa_ieeemode

Syntax: .hsa_ieeemode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set ieee-mode in kernel HSA configuration.

### .hsa_localsize

Syntax: .hsa_localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines initial local memory size used by kernel in
kernel HSA configuration.

### .hsa_pgmrsrc1

Syntax: .hsa_pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines value of the PGMRSRC1 in kernel HSA configuration.

### .hsa_pgmrsrc2

Syntax: .hsa_pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines value of the PGMRSRC2 in kernel HSA configuration.
If dimensions are set, the bits that control dimension setup will be ignored.
SCRATCH_EN bit will be ignored.

### .hsa_priority

Syntax: .hsa_priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines priority (0-3) in kernel HSA configuration.

### .hsa_privmode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the PRIV (privileged mode) in
kernel HSA configuration.

### .hsa_scratchbuffer

Syntax: .hsa_scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Defines scratchbuffer size in kernel HSA configuration.

### .hsa_sgprsnum

Syntax: .hsa_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of scalar registers which can be used during
kernel execution in kernel HSA configuration.

### .hsa_tgsize

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the TG_SIZE_EN in kernel HSA configuration.

### .hsa_userdatanum

Syntax: .hsa_userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of registers for USERDATA in
kernel HSA configuration.

### .hsa_vgprsnum

Syntax: .hsa_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of vector registers which can be used during
kernel execution in kernel HSA configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....
Syntax: .kcode +

Open code that will belong to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label name.
This pseudo-operation can change the membership of the code to the specified kernels.
You can nest `.kcode` any number of times. Each nested `.kcode` adds or removes code
membership for the given kernels. The most important reason why this feature has been
added is register usage calculation. Any kernel given in this pseudo-operation must be
already defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernarg_segment_alignment` field in
kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernarg_segment_byte_size` field in
kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_entry_byte_offset` field in
kernel configuration. This field stores the offset between the configuration and the kernel code.
The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_prefetch_byte_offset` field in kernel
configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_prefetch_byte_size` field in kernel configuration.

### .llvm_version

Syntax: .llvm_version VERSION

Set LLVM compiler version for this binary. Version in form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set machine version fields in kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `max_scratch_backing_memory_byte_size` field
in kernel configuration.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Defines value of the PGMRSRC2. If dimensions are set, the bits that control dimension setup
will be ignored. SCRATCH_EN bit will be ignored.

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Defines priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `private_element_size` field in kernel configuration.
Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `private_segment_alignment` field in kernel
configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .proginfo

Open progInfo definition. Must be inside kernel.
ProgInfo must contain 3 entries. ProgInfo cannot be defined if kernel config
was defined (by using `.config`).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `reserved_sgpr_first` and `reserved_sgpr_count`
fields in kernel configuration. `reserved_sgpr_count` is filled with the number of registers
(LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `reserved_vgpr_first` and `reserved_vgpr_count`
fields in kernel configuration. `reserved_vgpr_count` is filled with the number of registers
(LASTREG-FIRSTREG+1).
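
To make the count arithmetic concrete, here is a hedged sketch. It assumes that FIRSTREG and LASTREG are given as plain register indices and that the binary targets LLVM 4.0.0 or later; the register ranges are hypothetical.

```
.config
    .reserved_sgprs 8, 11   # reserved_sgpr_first = 8, reserved_sgpr_count = 11-8+1 = 4
    .reserved_vgprs 4, 5    # reserved_vgpr_first = 4, reserved_vgpr_count = 5-4+1 = 2
```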

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `runtime_loader_kernel_symbol` field in kernel
configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Defines scratchbuffer size.

### .scratchsym

Syntax: .scratchsym SYMBOL

Set symbol as the scratch symbol. This symbol points to the scratch buffer offset and will be used
while generating scratch buffer relocations.
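
A small sketch of declaring the scratch symbol and then using it in the two relocation forms described in the Relocations section. The symbol name `scratch` is hypothetical, and where the declaration sits relative to the kernel definitions is an assumption.

```
.scratchsym scratch                         # 'scratch' is a hypothetical symbol name
s_mov_b32       s12, scratch&0xffffffff     # low 32 bits of the scratch symbol value
s_mov_b32       s13, scratch>>32            # high 32 bits of the scratch symbol value
```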

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill in scratch buffer. It is meaningful only for LLVM 3.9 or later.

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill in scratch buffer. It is meaningful only for LLVM 3.9 or later.

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN. Should be set.

### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_debug_enabled` field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_dispatch_id` field in kernel
configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_dispatch_ptr` field in kernel
configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_dynamic_call_stack` field in
kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_flat_scratch_init` field in
kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_grid_workgroup_count_X`,
`enable_sgpr_grid_workgroup_count_Y` and `enable_sgpr_grid_workgroup_count_Z` fields
in kernel configuration, according to the given dimensions.

### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_kernarg_segment_ptr` field in
kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_ordered_append_gds` field in
kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_private_segment_buffer` field in
kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_private_segment_size` field in
kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_ptr64` field in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_queue_ptr` field in
kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `wavefront_size` field in kernel configuration.
Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workgroup_fbarrier_count` field in
kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workgroup_group_segment_byte_size` field in
kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workitem_private_segment_byte_size` field in
kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workitem_vgpr_count` field in kernel configuration.


## Sample code

This is a sample kernel setup:

```
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
```

With kernel configuration:

```
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .config
        .dims xyz
        .tgsize
```

All code:

```
.gallium
.gpu CapeVerde
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
.text
DCT:
/*c0030106         */ s_load_dword    s6, s[0:1], 0x6
/*c0038107         */ s_load_dword    s7, s[0:1], 0x7
/* the remaining instructions are skipped; this only demonstrates how to write a GalliumCompute program */
/*bf810000         */ s_endpgm
```
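
For comparison, the complete listing could also be written with a kernel configuration in place of ProgInfo. This is a sketch assembled from the fragments shown in this section, not a tested program: the encoding comments are dropped and the kernel body is still truncated.

```
.gallium
.gpu CapeVerde
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .config
        .dims xyz
        .tgsize
.text
DCT:
        s_load_dword    s6, s[0:1], 0x6
        s_load_dword    s7, s[0:1], 0x7
        /* the remaining instructions are skipped, as in the listing above */
        s_endpgm
```

With `.config`, the register counts are left to the assembler's automatic calculation instead of being encoded by hand in the `.entry` values.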