## CLRadeonExtender Assembler Gallium handling

GalliumCompute is an open-source OpenCL implementation for the Mesa3D
drivers. It is divided into three components: CLover, libclc and LLVM AMDGPU. Since LLVM v3.6
and Mesa3D v10.5, GalliumCompute uses a binary format with native code. CLRadeonExtender
supports only these binaries.

## Binary format

The binary format contains kernel information and the main binary in the ELF format.
The main `.text` section contains the code for all kernels. Optionally,
the `.rodata` section contains constant global data for all kernels.
The main binary holds the kernel configuration (ProgInfo) in the `.AMDGPU.config` section.
ProgInfo holds three addresses and values that describe the runtime environment for a kernel:
floating point setup, register usage, local data usage and the rest.

The assembler source code is divided into three parts:

* kernel configuration
* kernel constant data (in `.rodata` section)
* kernel code (in `.text` section)

The order of these parts doesn't matter.

A kernel function should be aligned to a 256-byte boundary.
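
As a rough illustration of the three parts, the sketch below lays out a source file with
two kernels. The kernel names, the data values and the use of the generic `.int` and
`.p2align` pseudo-ops are assumptions made for this example; it is not taken from
an existing program:

```
.gallium
.kernel first               # hypothetical kernel name
    .args
        .arg griddim
        .arg gridoffset
    .config
        .dims x
        .tgsize
.kernel second              # hypothetical kernel name
    .args
        .arg griddim
        .arg gridoffset
    .config
        .dims x
        .tgsize
.globaldata                 # constant data for all kernels (.rodata)
    .int 1, 2, 3, 4
.text                       # code for all kernels
first:
    s_endpgm
.p2align 8                  # assumed way to reach the next 256-byte boundary
second:
    s_endpgm
```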

## Relocations

The CLRX assembler handles relocations to the scratch symbol (`.scratchsym` pseudo-op).
These relocations can be applied to places that accept
32-bit literal immediates. Only two types of relocations are allowed:

* `place`, `place&0xffffffff`, `place%0x100000000`, `place%%0x100000000` -
low 32 bits of value
* `place>>32`, `place/0x100000000`, `place//0x100000000` - high 32 bits of value

The `place` indicates an expression with the scratch symbol. Additional offsets
are not accepted (only the scratch symbol itself).

Examples:

```
s_mov_b32       s13, scratchsym>>32
s_mov_b32       s12, scratchsym&0xffffffff
```

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
This setup can be overridden by the pseudo-ops `.sgprsnum` and `.vgprsnum`.

## Scalar register allocation

The assembler for the GalliumCompute format counts all SGPR registers and adds extra registers
(VCC, FLAT_SCRATCH, XNACK_MASK), if any are used, to the register pool.
The VCC register is included by default.
In the AMDHSA configuration (LLVM >= 4.0.0), special fields determine
which extra SGPR registers (FLAT_SCRATCH, VCC and XNACK_MASK) have been added.

The `.sgprsnum` pseudo-op sets the number of all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK.
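
For instance, the automatic register counts could be overridden as in the following
fragment. The kernel name and the register counts are hypothetical values chosen only
for illustration:

```
.kernel scale               # hypothetical kernel name
    .config
        .dims x
        .sgprsnum 24        # all SGPRs, including VCC, FLAT_SCRATCH and XNACK_MASK
        .vgprsnum 8         # hypothetical VGPR count
```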

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number. Used only if LLVM version is 4.0.0 or later.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number. Used only if LLVM version is 4.0.0 or later.

### .arg

Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]

Adds a kernel argument definition. Must be inside argument configuration.
The first argument is the type:

* scalar - scalar value (including vector values like uint4)
* constant - constant pointer (32-bit ???)
* global - global pointer (64-bit)
* local - local pointer
* image2d_rdonly - ??
* image2d_wronly - ??
* image3d_rdonly - ??
* image3d_wronly - ??
* sampler - ??
* griddim - shortcut for griddim argument definition
* gridoffset - shortcut for gridoffset argument definition

The second argument is the size of the argument. The third argument is targetSize, which
should be a multiple of 4. The fourth argument is the target alignment. By default, the target
alignment is a power of 2 not less than the size.
The fifth argument determines how to extend a numeric value to a larger target size:
`sext` - sign extend, `zext` - zero extend. If the argument is smaller than 4 bytes,
then `sext` can be used to define a signed integer and `zext` an unsigned integer.
The sixth argument is the semantic:

* general - general argument
* griddim - griddim argument
* gridoffset - gridoffset argument
* imgsize - image size
* imgformat - image format

Example argument definition:

```
.arg scalar, 4, 4, 4, zext, general
.arg global, 8, 8, 8, zext, general
.arg scalar, 2, 4, 4, sext, general # short
.arg scalar, 16, 16, 16, zext, general # uint4 or double2
.arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
.arg scalar, 4, 4, 4, zext, gridoffset # shortcut: .arg gridoffset
```

The last two arguments (griddim, gridoffset) must be defined in every kernel definition.

### .args

Open kernel argument configuration. Must be inside kernel.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set call convention for kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set AMD code version.

### .config

Open kernel configuration. Must be inside kernel. Kernel configuration can not be
defined if ProgInfo configuration was defined (by using `.proginfo`).
The following pseudo-ops can be inside kernel config:

* .debugmode - enables using of DEBUG_MODE
* .dims DIMS - choose dimensions used by kernel function. Can be: x,y,z.
* .dx10clamp - enables using of DX10_CLAMP
* .floatmode VALUE - choose float mode for kernel (byte value).
Default value is 0xc0.
* .ieeemode - choose IEEE mode for kernel
* .localsize SIZE - initial local data size for kernel in bytes
* .pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)
* .priority VALUE - set priority for kernel (0-3). Default value is 0.
* .privmode - enables using of PRIV (privileged mode)
* .scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.
* .sgprsnum NUMBER - number of SGPR registers used by kernel (including VCC, FLAT_SCRATCH
and XNACK_MASK). By default, automatically computed by assembler.
* .vgprsnum NUMBER - number of VGPR registers used by kernel.
By default, automatically computed by assembler.
* .userdatanum NUMBER - number of USERDATA used by kernel (0-16). Default value is 4.
* .tgsize - enables using of TG_SIZE_EN (we recommend always adding this)
* .spilledsgprs - number of scalar registers to spill
* .spilledvgprs - number of vector registers to spill
* AMDHSA pseudo-ops

Example configuration:

```
.config
    .dims xyz
    .tgsize
```

### .control_directive

Open control directive section. This section must be 128 bytes. The content of this
section will be stored in the control_directive field in kernel configuration.
Must be defined inside kernel. Can be used only if LLVM version is 4.0.0 or later.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `debug_private_segment_buffer_sgpr` field in
kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `debug_wavefront_private_segment_offset_sgpr` field in
kernel configuration.

### .debugmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. It sets default HSA kernel features and register features
(extra SGPR registers usage). These default features are `.use_private_segment_buffer`,
`.use_dispatch_ptr`, `.use_kernarg_segment_ptr`, `.use_ptr64` and
private_elem_size set to 4 bytes.
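
A minimal sketch of a configuration that relies on these defaults, assuming the LLVM
version of the binary is 4.0.0 or later. The commented lines only restate the settings
listed above and are not part of the configuration:

```
.config
    .dims x
    .default_hsa_features
    # according to the description above, this stands for:
    #   .use_private_segment_buffer
    #   .use_dispatch_ptr
    #   .use_kernarg_segment_ptr
    #   .use_ptr64
    #   .private_elem_size 4
```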

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`). Define which dimensions
(from list: x, y, z) will be used to determine the space of the kernel execution.

### .driver_version

Syntax: .driver_version VERSION

Set driver (Mesa3D) version for this binary. Version in form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

### .dx10clamp

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the DX10_CLAMP.

### .entry

Syntax: .entry ADDRESS, VALUE

Add an entry to ProgInfo. Must be inside ProgInfo configuration. Sample ProgInfo:

```
.entry 0x0000b848, 0x000c0080
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
```

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`).
Set exception mask in PGMRSRC2 register value. The value should be 7-bit.

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`). Define float-mode.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `gds_segment_size` field in kernel configuration.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Store current driver version to SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.

### .get_llvm_version

Syntax: .get_llvm_version SYMBOL

Store current LLVM compiler version to SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.
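
The stored value can be tested with conditional assembly. The fragment below is a sketch
that assumes the generic `.ifge` and `.endif` pseudo-ops of the CLRX assembler, which are
not described in this document; the symbol name `LLVM_VERSION` is an arbitrary choice:

```
.get_llvm_version LLVM_VERSION
    .config
        .dims x
.ifge LLVM_VERSION-40000        # 4.0.0 encodes as 4*10000 + 0*100 + 0 = 40000
        .default_hsa_features   # AMDHSA pseudo-op, only for LLVM 4.0.0 or later
.endif
```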

### .globaldata

Go to constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `group_segment_align` field in kernel configuration.

### .hsa_debugmode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DEBUG_MODE in kernel HSA configuration.

### .hsa_dims

Syntax: .hsa_dims DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define which dimensions (from list: x, y, z) will be used
to determine the space of the kernel execution in kernel HSA configuration.

### .hsa_dx10clamp

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the DX10_CLAMP in kernel HSA configuration.

### .hsa_exceptions

Syntax: .hsa_exceptions EXCPMASK

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set exception mask in PGMRSRC2 register value in
kernel HSA configuration. The value should be 7-bit.

### .hsa_floatmode

Syntax: .hsa_floatmode BYTE-VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define float-mode in kernel HSA configuration.
Set floatmode (FP_ROUND and FP_DENORM fields of the MODE register). Default value is 0xc0.

### .hsa_ieeemode

Syntax: .hsa_ieeemode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set ieee-mode in kernel HSA configuration.

### .hsa_localsize

Syntax: .hsa_localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define initial local memory size used by kernel in
kernel HSA configuration.

### .hsa_pgmrsrc1

Syntax: .hsa_pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define value of the PGMRSRC1 in kernel HSA configuration.

### .hsa_pgmrsrc2

Syntax: .hsa_pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define value of the PGMRSRC2 in kernel HSA configuration.
If dimensions are set, then the bits that control dimension setup will be ignored.
The SCRATCH_EN bit will be ignored.

### .hsa_priority

Syntax: .hsa_priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define priority (0-3) in kernel HSA configuration.

### .hsa_privmode

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the PRIV (privileged mode) in
kernel HSA configuration.

### .hsa_scratchbuffer

Syntax: .hsa_scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Define scratchbuffer size in kernel HSA configuration.

### .hsa_sgprsnum

Syntax: .hsa_sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of scalar registers which can be used during
kernel execution in kernel HSA configuration.

### .hsa_tgsize

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable usage of the TG_SIZE_EN in kernel HSA configuration.

### .hsa_userdatanum

Syntax: .hsa_userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of registers for USERDATA in
kernel HSA configuration.

### .hsa_vgprsnum

Syntax: .hsa_vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set number of vector registers which can be used during
kernel execution in kernel HSA configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside kernel configuration (`.config`). Set ieee-mode.

### .kcode

Syntax: .kcode KERNEL1,....
Syntax: .kcode +

Open code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel with the first label name.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` can be nested any number of times. Each nested `.kcode` adds kernels to or removes
kernels from the membership of the code. The main reason this feature was added is register usage
calculation. Any kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 #  this code belongs to kernel1, kernel2
    .kcode -kernel1 #  this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```

### .kcodeend

Close `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernarg_segment_alignment` field in
kernel configuration. Value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernarg_segment_byte_size` field in
kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_entry_byte_offset` field in
kernel configuration. This field stores the offset between the configuration and the kernel code.
The default value is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_prefetch_byte_offset` field in kernel
configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `kernel_code_prefetch_byte_size` field in kernel configuration.

### .llvm_version

Syntax: .llvm_version VERSION

Set LLVM compiler version for this binary. Version in form: MajorVersion*100+MinorVersion.
This pseudo-op replaces driver info.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define initial
local memory size used by kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set machine version fields in kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `max_scratch_backing_memory_byte_size` field
in kernel configuration.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside kernel configuration (`.config`).
Define value of the PGMRSRC2. If dimensions are set, then the bits that control dimension setup
will be ignored. The SCRATCH_EN bit will be ignored.

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside kernel configuration (`.config`). Define priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `private_element_size` field in kernel configuration.
Must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `private_segment_alignment` field in kernel
configuration. Value must be a power of two.

### .privmode

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the PRIV (privileged mode).

### .proginfo

Open ProgInfo definition. Must be inside kernel.
ProgInfo must contain 3 entries. ProgInfo can not be defined if kernel config
was defined (by using `.config`).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `reserved_sgpr_first` and `reserved_sgpr_count`
fields in kernel configuration. `reserved_sgpr_count` is filled with the number of registers
(LASTREG-FIRSTREG+1).
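
As a worked example of the count formula, the fragment below reserves a hypothetical
range of scalar registers. It assumes that FIRSTREG and LASTREG are written as plain
register indices and that the LLVM version is 4.0.0 or later:

```
.config
    .dims x
    # reserved_sgpr_first = 10, reserved_sgpr_count = 15-10+1 = 6
    .reserved_sgprs 10, 15
```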

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `reserved_vgpr_first` and `reserved_vgpr_count`
fields in kernel configuration. `reserved_vgpr_count` is filled with the number of registers
(LASTREG-FIRSTREG+1).

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `runtime_loader_kernel_symbol` field in kernel
configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside kernel configuration (`.config`). Define scratchbuffer size.

### .scratchsym

Syntax: .scratchsym SYMBOL

Set symbol as the scratch symbol. This symbol points to the scratch buffer offset and will be used
while generating scratch buffer relocations.
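
A short sketch that ties this pseudo-op to the relocations described in the Relocations
section. The symbol name `scratchsym` and the target registers follow the relocation
example there; placing `.scratchsym` at the top level of the source is an assumption:

```
.scratchsym scratchsym          # declare the scratch symbol (placement assumed)
# later, inside kernel code:
s_mov_b32       s12, scratchsym&0xffffffff  # low 32 bits of the relocated value
s_mov_b32       s13, scratchsym>>32         # high 32 bits of the relocated value
```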

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers which can be used during kernel execution.
It counts SGPR registers including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of scalar
registers to spill into the scratch buffer. It is meaningful for LLVM 3.9 or later.

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers to spill into the scratch buffer. It is meaningful for LLVM 3.9 or later.

### .tgsize

This pseudo-op must be inside kernel configuration (`.config`).
Enable usage of the TG_SIZE_EN. Should be set.

### .use_debug_enabled

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_debug_enabled` field in kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_dispatch_id` field in kernel
configuration.

### .use_dispatch_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_dispatch_ptr` field in kernel
configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_dynamic_call_stack` field in
kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_flat_scratch_init` field in
kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_grid_workgroup_count_X`,
`enable_sgpr_grid_workgroup_count_Y` and `enable_sgpr_grid_workgroup_count_Z` fields
in kernel configuration, according to the given dimensions.

### .use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_kernarg_segment_ptr` field in
kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_ordered_append_gds` field in
kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_private_segment_buffer` field in
kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_private_segment_size` field in
kernel configuration.

### .use_ptr64

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_ptr64` field in kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `enable_sgpr_queue_ptr` field in
kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Enable `is_xnack_enabled` field in kernel configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside kernel configuration (`.config`). Set number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside kernel configuration (`.config`). Set number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `wavefront_sgpr_count` field in kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `wavefront_size` field in kernel configuration.
Value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workgroup_fbarrier_count` field in
kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workgroup_group_segment_byte_size` field in
kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workitem_private_segment_byte_size` field in
kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside kernel configuration (`.config`) and can be used only if
LLVM version is 4.0.0 or later. Set `workitem_vgpr_count` field in kernel configuration.

## Sample code

This is a sample kernel setup:

```
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
```

The same setup with kernel configuration:

```
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .config
        .dims xyz
        .tgsize
```

Complete code:

```
.gallium
.gpu CapeVerde
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
.text
DCT:
/*c0030106         */ s_load_dword    s6, s[0:1], 0x6
/*c0038107         */ s_load_dword    s7, s[0:1], 0x7
/* the remaining instructions are skipped; this only demonstrates how to write a GalliumCompute program */
/*bf810000         */ s_endpgm
```
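
The complete listing can presumably also be written with `.config` in place of `.proginfo`,
by combining the kernel configuration variant with the rest of the program. The following
is a sketch assembled from the snippets in this section under that assumption, not a
tested program:

```
.gallium
.gpu CapeVerde
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .config                 # replaces the three .entry lines of .proginfo
        .dims xyz
        .tgsize
.text
DCT:
    s_load_dword    s6, s[0:1], 0x6
    s_load_dword    s7, s[0:1], 0x7
    # remaining instructions omitted, as in the listing above
    s_endpgm
```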