source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmGallium.md @ 3702

Last change on this file since 3702 was 3702, checked in by matszpk, 2 years ago

CLRadeonExtender: CLRXDocs: Add info about registers kernel setup.

## CLRadeonExtender Assembler Gallium handling

GalliumCompute is an open-source OpenCL implementation for the Mesa3D
drivers. It is divided into three components: Clover, libclc and LLVM AMDGPU. Since LLVM v3.6
and Mesa3D v10.5, GalliumCompute has generated binaries that contain native code.
CLRadeonExtender supports only these binaries.

## Binary format

The binary contains kernel information and the main binary in ELF format.
The main `.text` section contains the code of all kernels. Optionally,
the `.rodata` section contains constant global data for all kernels.
The main binary holds the kernel configuration (ProgInfo) in the `.AMDGPU.config` section.
ProgInfo holds three address/value pairs that describe the runtime environment of a kernel:
floating-point setup, register usage, local data usage and other settings.

The assembler source code is divided into three parts:

* kernel configuration
* kernel constant data (in `.rodata` section)
* kernel code (in `.text` section)

The order of these parts does not matter.

Each kernel function should be aligned to a 256-byte boundary.
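Rounding a code offset up to the next 256-byte boundary is plain integer arithmetic. The Python sketch below is only an illustration: `align_up` and `kernel_offsets` are hypothetical helpers, and the sketch assumes that kernels are laid out back to back in `.text` in the order they are defined.

```python
KERNEL_ALIGNMENT = 256  # required alignment of a kernel function, in bytes


def align_up(offset: int, alignment: int = KERNEL_ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def kernel_offsets(code_sizes):
    """Hypothetical layout: start offset of each kernel when kernels are
    placed one after another, each starting on a 256-byte boundary."""
    offsets = []
    pos = 0
    for size in code_sizes:
        pos = align_up(pos)
        offsets.append(pos)
        pos += size
    return offsets


print(kernel_offsets([100, 300, 20]))  # [0, 256, 768]
```

The bit-mask form of `align_up` works only because 256 is a power of two; for other alignments a division-based rounding would be needed.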

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
These values can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.
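The register counts are not stored verbatim. Assuming the GCN 1.0-1.2 layout of COMPUTE_PGM_RSRC1 (VGPRS in bits 0-5 counted in blocks of 4, SGPRS in bits 6-9 counted in blocks of 8, each stored minus one), the encoding can be sketched as follows. This is not taken from the CLRX sources and the helper names are hypothetical; for 16 VGPRs and 56 SGPRs it gives 0x183, the low bits of the first `.entry` value in the sample code at the end of this document.

```python
VGPR_GRANULE = 4  # VGPRs per allocation block (assumed GCN 1.0-1.2 value)
SGPR_GRANULE = 8  # SGPRs per allocation block (assumed GCN 1.0-1.2 value)


def encode_reg_counts(vgprs: int, sgprs: int) -> int:
    """Pack register counts into the VGPRS (bits 0-5) and SGPRS (bits 6-9)
    fields of PGM_RSRC1, rounding up to whole allocation blocks."""
    vgpr_blocks = max(vgprs - 1, 0) // VGPR_GRANULE
    sgpr_blocks = max(sgprs - 1, 0) // SGPR_GRANULE
    return (vgpr_blocks & 0x3F) | ((sgpr_blocks & 0xF) << 6)


def allocated_regs(rsrc1: int) -> tuple:
    """Number of VGPRs and SGPRs actually reserved by a PGM_RSRC1 value."""
    vgprs = ((rsrc1 & 0x3F) + 1) * VGPR_GRANULE
    sgprs = (((rsrc1 >> 6) & 0xF) + 1) * SGPR_GRANULE
    return vgprs, sgprs


print(hex(encode_reg_counts(16, 56)))  # 0x183
print(allocated_regs(0x000C0183))      # (16, 56)
```

Because of the block granularity, 13 VGPRs and 50 SGPRs encode to the same value as 16 and 56.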

## Scalar register allocation

For the GalliumCompute format, the assembler counts all used SGPRs and adds the extra
registers (VCC, FLAT_SCRATCH, XNACK_MASK) to the register pool if any of them are used.
The VCC register is included by default.
In the AMDHSA configuration (LLVM 4.0.0 or later), special fields determine
which extra SGPRs have been added.
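A simplified model of this counting is sketched below. It assumes that the base count is the highest SGPR index referenced plus one, and that each extra register (VCC, FLAT_SCRATCH, XNACK_MASK) is a 64-bit register pair adding two SGPRs. The real assembler may place these registers differently depending on the GPU architecture, so `count_sgprs` is a hypothetical illustration, not the CLRX algorithm.

```python
# Simplifying assumption: every extra register is a 64-bit pair (2 SGPRs)
# that is simply added on top of the ordinary SGPRs.
EXTRA_SGPRS = {"vcc": 2, "flat_scratch": 2, "xnack_mask": 2}


def count_sgprs(used_sgprs, extras=()):
    """Hypothetical SGPR count: highest used index + 1, plus 2 SGPRs per
    extra register. VCC is always counted, as described above."""
    base = max(used_sgprs) + 1 if used_sgprs else 0
    wanted = {"vcc"} | {name.lower() for name in extras}
    return base + sum(EXTRA_SGPRS[name] for name in wanted)


# The highest used register is s7, so the base count is 8; VCC adds 2.
print(count_sgprs([0, 1, 6, 7]))                    # 10
print(count_sgprs([0, 1, 6, 7], ["FLAT_SCRATCH"]))  # 12
```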

## List of the specific pseudo-operations

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Sets the architecture minor number. Used only if the LLVM version is 4.0.0 or later.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Sets the architecture stepping number. Used only if the LLVM version is 4.0.0 or later.

### .arg

Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]

Adds a kernel argument definition. Must be inside an argument configuration (`.args`).
The first argument is the type:

* scalar - scalar value (including vector values like uint4)
* constant - constant pointer (32-bit ???)
* global - global pointer (64-bit)
* local - local pointer
* image2d_rdonly - read-only 2D image
* image2d_wronly - write-only 2D image
* image3d_rdonly - read-only 3D image
* image3d_wronly - write-only 3D image
* sampler - sampler
* griddim - shortcut for the griddim argument definition
* gridoffset - shortcut for the gridoffset argument definition

The second argument is the size of the argument. The third argument is the target size,
which should be a multiple of 4. The fourth argument is the target alignment. By default,
the target alignment is the smallest power of two not less than the size.
The fifth argument determines how a numeric value is extended to the larger target size:
`sext` - sign extension, `zext` - zero extension. If the argument is smaller than 4 bytes,
`sext` defines a signed integer and `zext` an unsigned integer.
The sixth argument is the semantic:

* general - general argument
* griddim - griddim argument
* gridoffset - gridoffset argument
* imgsize - image size
* imgformat - image format

Example argument definitions:

```
.arg scalar, 4, 4, 4, zext, general
.arg global, 8, 8, 8, zext, general
.arg scalar, 2, 4, 4, sext, general # short
.arg scalar, 16, 16, 16, zext, general # uint4 or double2
.arg scalar, 4, 4, 4, zext, griddim # shortcut: .arg griddim
.arg scalar, 4, 4, 4, zext, gridoffset # shortcut: .arg gridoffset
```

The last two arguments (griddim and gridoffset) must be defined in every kernel definition.
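The default target alignment rule can be written as a small Python function. The second function, `kernarg_layout`, is a hypothetical illustration that places each argument at the next offset aligned to its target alignment and advances by the target size; this simple packing is an assumption, not taken from CLRX or Clover. Each tuple holds (size, target size, alignment), where an alignment of `None` selects the default.

```python
def default_alignment(size: int) -> int:
    """Smallest power of two that is not less than size."""
    align = 1
    while align < size:
        align <<= 1
    return align


def kernarg_layout(args):
    """Hypothetical packing of (size, target_size, align) tuples.
    Returns the offset of each argument and the total buffer size."""
    offsets = []
    pos = 0
    for size, target_size, align in args:
        if align is None:
            align = default_alignment(size)
        pos = (pos + align - 1) // align * align  # round up to the alignment
        offsets.append(pos)
        pos += target_size  # the value occupies its (extended) target size
    return offsets, pos


# A short extended to 4 bytes, a global pointer and a uint4,
# as in the example definitions above.
print(kernarg_layout([(2, 4, 4), (8, 8, 8), (16, 16, 16)]))  # ([0, 8, 16], 32)
```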

### .args

Opens the kernel argument configuration. Must be inside a kernel.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the AMD code version.

### .config

Opens the kernel configuration. Must be inside a kernel. A kernel configuration cannot be
defined if a ProgInfo configuration was defined (by using `.proginfo`).
The following pseudo-ops can be used inside a kernel configuration:

* .debugmode - enables DEBUG_MODE
* .dims DIMS - chooses the dimensions used by the kernel function (any of: x, y, z).
* .dx10clamp - enables DX10_CLAMP
* .floatmode VALUE - chooses the float mode for the kernel (byte value).
Default value is 0xc0.
* .ieeemode - enables IEEE mode for the kernel
* .localsize SIZE - initial local data size for the kernel in bytes
* .pgmrsrc2 VALUE - value of the PGMRSRC2 (only bits that are not set by other pseudo-ops)
* .priority VALUE - sets the priority for the kernel (0-3). Default value is 0.
* .privmode - enables PRIV (privileged mode)
* .scratchbuffer SIZE - size of the scratch buffer (???). Default value is 0.
* .sgprsnum NUMBER - number of SGPRs used by the kernel (excluding VCC, FLAT_SCRATCH).
By default, automatically computed by the assembler.
* .vgprsnum NUMBER - number of VGPRs used by the kernel.
By default, automatically computed by the assembler.
* .userdatanum NUMBER - number of USERDATA registers used by the kernel (0-16). Default value is 4.
* .tgsize - enables TG_SIZE_EN (we recommend always adding this)
* .spilledsgprs - number of scalar registers to spill
* .spilledvgprs - number of vector registers to spill
* AMDHSA pseudo-ops

Example configuration:

```
.config
    .dims xyz
    .tgsize
```

### .control_directive

Opens the control directive section. This section must be 128 bytes long. The content of this
section will be stored in the control_directive field of the kernel configuration.
Must be defined inside a kernel. Can be used only if the LLVM version is 4.0.0 or later.

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `debug_private_segment_buffer_sgpr` field in
the kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `debug_wavefront_private_segment_offset_sgpr`
field in the kernel configuration.

### .debugmode

This pseudo-op must be inside a kernel configuration (`.config`).
Enables usage of DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. It sets the default HSA kernel features and register
features (usage of extra SGPRs). The defaults are `.use_private_segment_buffer`,
`.use_dispatch_ptr`, `.use_kernarg_segment_ptr`, `.use_ptr64` and a
private element size (`.private_elem_size`) of 4 bytes.

### .dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside a kernel configuration (`.config`). Defines which dimensions
(from the list: x, y, z) will be used to determine the space of the kernel execution.
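In the generated PGM_RSRC2 value, the chosen dimensions correspond to the TGID_X/Y/Z_EN bits (bits 7-9) and the TIDIG_COMP_CNT field (bits 11-12). The sketch below assumes the GCN register layout and that TIDIG_COMP_CNT is set from the highest dimension used; `dims_to_pgmrsrc2` is a hypothetical name. For `xyz` it yields 0x1380, which equals the dimension bits of the `0x00001788` value in the `.entry` example.

```python
TGID_EN_SHIFT = 7          # TGID_X_EN is bit 7, TGID_Y_EN bit 8, TGID_Z_EN bit 9
TIDIG_COMP_CNT_SHIFT = 11  # 2-bit field: 0 = x, 1 = x,y, 2 = x,y,z


def dims_to_pgmrsrc2(dims: str) -> int:
    """Hypothetical mapping from a .dims string (e.g. "xyz") to PGM_RSRC2 bits."""
    indices = sorted({"xyz".index(d) for d in dims.lower()})
    bits = 0
    for i in indices:
        bits |= 1 << (TGID_EN_SHIFT + i)
    if indices:
        # thread ID components needed up to the highest dimension used
        bits |= indices[-1] << TIDIG_COMP_CNT_SHIFT
    return bits


print(hex(dims_to_pgmrsrc2("xyz")))  # 0x1380
print(hex(dims_to_pgmrsrc2("x")))    # 0x80
```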

### .driver_version

Syntax: .driver_version VERSION

Sets the driver (Mesa3D) version for this binary. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.
This pseudo-op overrides the detected driver version.

### .dx10clamp

This pseudo-op must be inside a kernel configuration (`.config`).
Enables usage of DX10_CLAMP.

### .entry

Syntax: .entry ADDRESS, VALUE

Adds a ProgInfo entry. Must be inside a ProgInfo configuration (`.proginfo`). Sample ProgInfo:

```
.entry 0x0000b848, 0x000c0080
.entry 0x0000b84c, 0x00001788
.entry 0x0000b860, 0x00000000
```
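The addresses 0xb848, 0xb84c and 0xb860 are the GCN register offsets of COMPUTE_PGM_RSRC1, COMPUTE_PGM_RSRC2 and COMPUTE_TMPRING_SIZE. The decoder below is a sketch that assumes the GCN 1.0-1.2 bit layouts; it is not part of CLRX, and `decode_proginfo` is a hypothetical name. For the sample entries it reports 4 VGPRs, 24 SGPRs, float mode 0xc0, 4 user SGPRs, TG_SIZE_EN and all three dimensions, consistent with the `.config` defaults and the `.dims xyz`/`.tgsize` example.

```python
RSRC1, RSRC2, TMPRING = 0xB848, 0xB84C, 0xB860  # COMPUTE_PGM_RSRC1/2, COMPUTE_TMPRING_SIZE


def field(value: int, shift: int, width: int) -> int:
    """Extract `width` bits of `value` starting at bit `shift`."""
    return (value >> shift) & ((1 << width) - 1)


def decode_proginfo(entries):
    """Decode ProgInfo (address, value) pairs, assuming GCN 1.0-1.2 layouts."""
    regs = dict(entries)
    r1, r2 = regs[RSRC1], regs[RSRC2]
    return {
        # PGM_RSRC1
        "vgprs": (field(r1, 0, 6) + 1) * 4,  # allocated in blocks of 4
        "sgprs": (field(r1, 6, 4) + 1) * 8,  # allocated in blocks of 8
        "priority": field(r1, 10, 2),
        "floatmode": field(r1, 12, 8),
        "privmode": bool(field(r1, 20, 1)),
        "dx10clamp": bool(field(r1, 21, 1)),
        "debugmode": bool(field(r1, 22, 1)),
        "ieeemode": bool(field(r1, 23, 1)),
        # PGM_RSRC2
        "scratch_en": bool(field(r2, 0, 1)),
        "user_sgprs": field(r2, 1, 5),
        "dims": "".join(d for i, d in enumerate("xyz") if field(r2, 7 + i, 1)),
        "tgsize": bool(field(r2, 10, 1)),
        "tidig_comp_cnt": field(r2, 11, 2),
        "lds_size": field(r2, 15, 9),
        "exceptions": field(r2, 24, 7),
        # COMPUTE_TMPRING_SIZE (scratch setup) is kept raw here
        "tmpring_size": regs.get(TMPRING, 0),
    }


sample = [(0xB848, 0x000C0080), (0xB84C, 0x00001788), (0xB860, 0x00000000)]
info = decode_proginfo(sample)
print(info["vgprs"], info["sgprs"], hex(info["floatmode"]), info["user_sgprs"], info["dims"])
# 4 24 0xc0 4 xyz
```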

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-op must be inside a kernel configuration (`.config`).
Sets the exception mask in the PGMRSRC2 register value. The value should be 7-bit.
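A 7-bit exception mask ends up in the EXCP_EN field of PGM_RSRC2. The sketch below assumes the GCN 1.0-1.2 layout, where EXCP_EN occupies bits 24-30; `set_exceptions` is a hypothetical helper, and the meaning of the individual mask bits is not covered here.

```python
EXCP_EN_SHIFT = 24   # assumed position of EXCP_EN in PGM_RSRC2
EXCP_EN_MASK = 0x7F  # 7-bit field


def set_exceptions(pgmrsrc2: int, excp_mask: int) -> int:
    """Return pgmrsrc2 with its EXCP_EN field replaced by excp_mask."""
    if not 0 <= excp_mask <= EXCP_EN_MASK:
        raise ValueError("exception mask must fit in 7 bits")
    cleared = pgmrsrc2 & ~(EXCP_EN_MASK << EXCP_EN_SHIFT)
    return cleared | (excp_mask << EXCP_EN_SHIFT)


print(hex(set_exceptions(0x00001788, 0x7F)))  # 0x7f001788
```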

### .floatmode

Syntax: .floatmode BYTE-VALUE

This pseudo-op must be inside a kernel configuration (`.config`). Defines the float mode:
sets the FP_ROUND and FP_DENORM fields of the MODE register. The default value is 0xc0.
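The float mode byte packs four 2-bit fields. The decoder below assumes the layout and encodings used by LLVM's AMDGPU backend: 32-bit rounding in bits 0-1, 64-bit rounding in bits 2-3, 32-bit denormal handling in bits 4-5 and 64-bit denormal handling in bits 6-7. Under these assumptions the default 0xc0 means round-to-nearest-even everywhere, 32-bit denormals flushed and 64-bit denormals preserved. `decode_floatmode` is a hypothetical name.

```python
# Assumed 2-bit encodings of the FP_ROUND and FP_DENORM fields.
ROUND_MODES = ["nearest-even", "+inf", "-inf", "zero"]
DENORM_MODES = ["flush in and out", "flush out", "flush in", "no flush"]


def decode_floatmode(mode: int) -> dict:
    """Split a .floatmode byte into its rounding and denormal fields."""
    return {
        "round32": ROUND_MODES[mode & 3],
        "round64": ROUND_MODES[(mode >> 2) & 3],
        "denorm32": DENORM_MODES[(mode >> 4) & 3],
        "denorm64": DENORM_MODES[(mode >> 6) & 3],
    }


print(decode_floatmode(0xC0))
```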

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `gds_segment_size` field in the kernel configuration.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Stores the current driver version in SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.

### .get_llvm_version

Syntax: .get_llvm_version SYMBOL

Stores the current LLVM compiler version in SYMBOL. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.
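The `major_version*10000 + minor_version*100 + micro_version` form can be produced and taken apart as shown below. The helper names are hypothetical. The comparison against 40000 corresponds to the "LLVM 4.0.0 or later" condition used throughout this document, assuming the version is compared in this encoded form.

```python
def encode_version(major: int, minor: int = 0, micro: int = 0) -> int:
    """Pack a version as major*10000 + minor*100 + micro."""
    if not (0 <= minor < 100 and 0 <= micro < 100):
        raise ValueError("minor and micro must be in range 0-99")
    return major * 10000 + minor * 100 + micro


def decode_version(value: int) -> tuple:
    """Inverse of encode_version."""
    return value // 10000, value // 100 % 100, value % 100


LLVM_4_0_0 = encode_version(4)  # 40000


def supports_amdhsa_config(llvm_version: int) -> bool:
    """True when the AMDHSA-related pseudo-ops apply (LLVM 4.0.0 or later)."""
    return llvm_version >= LLVM_4_0_0


print(encode_version(3, 9, 1), decode_version(170200))  # 30901 (17, 2, 0)
print(supports_amdhsa_config(encode_version(3, 9)))    # False
```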

### .globaldata

Switches to the constant global data section (`.rodata`).

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `group_segment_alignment` field in the kernel
configuration.

### .hsa_debugmode

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables usage of DEBUG_MODE in the kernel HSA configuration.

### .hsa_dims

Syntax: .hsa_dims DIMENSIONS

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines which dimensions (from the list: x, y, z) will be
used to determine the space of the kernel execution in the kernel HSA configuration.

### .hsa_dx10clamp

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables usage of DX10_CLAMP in the kernel HSA configuration.

### .hsa_exceptions

Syntax: .hsa_exceptions EXCPMASK

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the exception mask in the PGMRSRC2 register value in
the kernel HSA configuration. The value should be 7-bit.

### .hsa_floatmode

Syntax: .hsa_floatmode BYTE-VALUE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the float mode in the kernel HSA configuration:
sets the FP_ROUND and FP_DENORM fields of the MODE register. The default value is 0xc0.

### .hsa_ieeemode

Syntax: .hsa_ieeemode

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables IEEE mode in the kernel HSA configuration.

### .hsa_localsize

Syntax: .hsa_localsize SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the initial local memory size used by the kernel
in the kernel HSA configuration.

### .hsa_pgmrsrc1

Syntax: .hsa_pgmrsrc1 VALUE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the value of the PGMRSRC1 in the kernel HSA
configuration.

### .hsa_pgmrsrc2

Syntax: .hsa_pgmrsrc2 VALUE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the value of the PGMRSRC2 in the kernel HSA
configuration. If dimensions are set, the bits that control the dimension setup will be
ignored. The SCRATCH_EN bit will be ignored.

### .hsa_priority

Syntax: .hsa_priority PRIORITY

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the priority (0-3) in the kernel HSA configuration.

### .hsa_privmode

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables usage of PRIV (privileged mode) in the
kernel HSA configuration.

### .hsa_scratchbuffer

Syntax: .hsa_scratchbuffer SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Defines the scratch buffer size in the kernel HSA
configuration.

### .hsa_sgprsnum

Syntax: .hsa_sgprsnum REGNUM

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the number of scalar registers which can be used
during kernel execution in the kernel HSA configuration.

### .hsa_tgsize

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables usage of TG_SIZE_EN in the kernel HSA configuration.

### .hsa_userdatanum

Syntax: .hsa_userdatanum NUMBER

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the number of registers for USERDATA in the
kernel HSA configuration.

### .hsa_vgprsnum

Syntax: .hsa_vgprsnum REGNUM

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the number of vector registers which can be used
during kernel execution in the kernel HSA configuration.

### .ieeemode

Syntax: .ieeemode

This pseudo-op must be inside a kernel configuration (`.config`). Enables IEEE mode.

### .kcode

Syntax: .kcode KERNEL1,....

Syntax: .kcode +

Opens code that belongs to the specified kernels. By default, any code between
two consecutive kernel labels belongs to the kernel named by the first label.
This pseudo-operation can change the membership of the code to the specified kernels.
`.kcode` clauses can be nested any number of times; each nested `.kcode` adds kernels to
or removes kernels from the membership of the code. The main reason for this feature is
register usage calculation. Every kernel given in this pseudo-operation must already be defined.

Sample usage:

```
.kcode + # this code belongs to all kernels
.kcodeend
.kcode kernel1, kernel2 # this code belongs to kernel1, kernel2
    .kcode -kernel1 # this code belongs only to kernel2 (kernel1 removed)
    .kcodeend
.kcodeend
```
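The membership rules can be modelled as a stack of kernel sets. The Python sketch below is a hypothetical model, not the CLRX implementation. It assumes that outside any `.kcode` clause the selection is empty, that `+` alone selects all kernels, that `-NAME` removes a kernel and that a plain or `+`-prefixed name adds one. It replays the sample above with one extra kernel, `kernel3`, so that `+` is distinguishable.

```python
class KCodeTracker:
    """Hypothetical model of nested .kcode/.kcodeend clauses."""

    def __init__(self, kernels):
        self.kernels = set(kernels)  # kernels defined so far
        self.stack = [set()]         # selection outside any .kcode clause

    def kcode(self, *items):
        selection = set(self.stack[-1])  # start from the enclosing clause
        for item in items:
            if item == "+":
                selection = set(self.kernels)
                continue
            name = item.lstrip("+-")
            if name not in self.kernels:
                raise ValueError(f"kernel {name!r} is not defined")
            if item.startswith("-"):
                selection.discard(name)
            else:
                selection.add(name)
        self.stack.append(selection)

    def kcodeend(self):
        if len(self.stack) == 1:
            raise RuntimeError(".kcodeend without .kcode")
        self.stack.pop()

    def current(self):
        return sorted(self.stack[-1])


t = KCodeTracker(["kernel1", "kernel2", "kernel3"])
t.kcode("+")
print(t.current())  # ['kernel1', 'kernel2', 'kernel3']
t.kcodeend()
t.kcode("kernel1", "kernel2")
t.kcode("-kernel1")
print(t.current())  # ['kernel2']
t.kcodeend()
print(t.current())  # ['kernel1', 'kernel2']
t.kcodeend()
```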

### .kcodeend

Closes a `.kcode` clause. Refer to `.kcode`.

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `kernarg_segment_alignment` field in the
kernel configuration. The value must be a power of two.
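In AMD's `amd_kernel_code_t` structure, segment alignments (and the wavefront size) are stored as a power-of-two exponent rather than a byte count. Assuming the assembler converts the given value the same way (this document does not say so explicitly), the power-of-two check and the conversion can be sketched as follows; `alignment_exponent` is a hypothetical helper.

```python
def alignment_exponent(align: int) -> int:
    """Return log2(align), rejecting values that are not powers of two."""
    if align <= 0 or align & (align - 1):
        raise ValueError(f"{align} is not a power of two")
    return align.bit_length() - 1


print(alignment_exponent(16))  # 4
print(alignment_exponent(64))  # 6
```

The `align & (align - 1)` test is zero exactly when a single bit is set, which is why it serves as the power-of-two check.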

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `kernarg_segment_byte_size` field in the
kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `kernel_code_entry_byte_offset` field in the
kernel configuration. This field stores the offset between the configuration and the kernel
code. The default value is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `kernel_code_prefetch_byte_offset` field in the
kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `kernel_code_prefetch_byte_size` field in the
kernel configuration.

### .llvm_version

Syntax: .llvm_version VERSION

Sets the LLVM compiler version for this binary. Version in form:
`major_version*10000 + minor_version*100 + micro_version`.
This pseudo-op overrides the detected LLVM version.

### .localsize

Syntax: .localsize SIZE

This pseudo-op must be inside a kernel configuration (`.config`). Defines the initial
local memory size used by the kernel.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the machine version fields in the kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `max_scratch_backing_memory_byte_size` field
in the kernel configuration.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-op must be inside a kernel configuration (`.config`).
Defines the value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-op must be inside a kernel configuration (`.config`).
Defines the value of the PGMRSRC2. If dimensions are set, the bits that control the dimension
setup will be ignored. The SCRATCH_EN bit will be ignored.
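The rule above can be read as a merge: the dimension bits and SCRATCH_EN come from the assembler, while the remaining bits are kept from the `.pgmrsrc2` value. The sketch assumes the GCN layout (SCRATCH_EN bit 0, USER_SGPR bits 1-5, TGID_X/Y/Z_EN bits 7-9, TG_SIZE_EN bit 10, TIDIG_COMP_CNT bits 11-12). It also assumes that USER_SGPR is taken from `.userdatanum`, following the `.config` list note that `.pgmrsrc2` supplies only bits not set by other pseudo-ops. The function and parameter names are hypothetical. With 4 user SGPRs, `.tgsize` and `.dims xyz` it yields 0x1788, the value in the sample ProgInfo.

```python
SCRATCH_EN = 1 << 0
USER_SGPR_SHIFT = 1
USER_SGPR_MASK = 0x1F << USER_SGPR_SHIFT  # bits 1-5
TG_SIZE_EN = 1 << 10
DIMS_MASK = (0x7 << 7) | (0x3 << 11)      # TGID_X/Y/Z_EN and TIDIG_COMP_CNT


def dims_bits(dims: str) -> int:
    """TGID_*_EN bits plus TIDIG_COMP_CNT set to the highest dimension used."""
    idx = sorted({"xyz".index(d) for d in dims})
    bits = 0
    for i in idx:
        bits |= 1 << (7 + i)
    if idx:
        bits |= idx[-1] << 11
    return bits


def merge_pgmrsrc2(user_value: int, dims: str, user_sgprs: int = 4,
                   tgsize: bool = False, scratch_en: bool = False) -> int:
    """Hypothetical merge: dimension bits, SCRATCH_EN and USER_SGPR come from
    the assembler; all other bits are kept from the .pgmrsrc2 value."""
    value = user_value & ~(DIMS_MASK | SCRATCH_EN | USER_SGPR_MASK)
    value |= dims_bits(dims)
    value |= (user_sgprs << USER_SGPR_SHIFT) & USER_SGPR_MASK
    if tgsize:
        value |= TG_SIZE_EN
    if scratch_en:
        value |= SCRATCH_EN
    return value


# 4 user SGPRs (the .userdatanum default), .tgsize and .dims xyz
print(hex(merge_pgmrsrc2(0, "xyz", tgsize=True)))  # 0x1788
```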

### .priority

Syntax: .priority PRIORITY

This pseudo-op must be inside a kernel configuration (`.config`). Defines the priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `private_element_size` field in the kernel
configuration. The value must be a power of two between 2 and 16.
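In `amd_kernel_code_t`, the private element size is a 2-bit enumeration in which 2, 4, 8 and 16 bytes map to codes 0-3, that is log2(size) - 1. Assuming the assembler uses the same encoding, validation and conversion can be sketched as follows; `encode_private_elem_size` is a hypothetical helper.

```python
def encode_private_elem_size(size: int) -> int:
    """Map 2, 4, 8 or 16 bytes to the 2-bit codes 0-3 (log2(size) - 1)."""
    if size not in (2, 4, 8, 16):
        raise ValueError("private element size must be 2, 4, 8 or 16")
    return size.bit_length() - 2


print([encode_private_elem_size(s) for s in (2, 4, 8, 16)])  # [0, 1, 2, 3]
```

Under this assumption, the 4-byte default set by `.default_hsa_features` corresponds to code 1.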

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `private_segment_alignment` field in the kernel
configuration. The value must be a power of two.

### .privmode

This pseudo-op must be inside a kernel configuration (`.config`).
Enables usage of PRIV (privileged mode).

### .proginfo

Opens a ProgInfo definition. Must be inside a kernel.
ProgInfo must contain 3 entries. ProgInfo cannot be defined if a kernel configuration
was defined (by using `.config`).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `reserved_sgpr_first` and `reserved_sgpr_count`
fields in the kernel configuration. `reserved_sgpr_count` is filled with the number of
registers (LASTREG-FIRSTREG+1).

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `reserved_vgpr_first` and `reserved_vgpr_count`
fields in the kernel configuration. `reserved_vgpr_count` is filled with the number of
registers (LASTREG-FIRSTREG+1).
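Both `.reserved_sgprs` and `.reserved_vgprs` store an inclusive register range as a first register and a count. A minimal sketch of that conversion follows; `reserved_range` is a hypothetical helper, and rejecting a reversed range is an assumption of this sketch.

```python
def reserved_range(first: int, last: int) -> tuple:
    """Convert an inclusive FIRSTREG..LASTREG range to (first, count)."""
    if last < first:
        raise ValueError("LASTREG must not be lower than FIRSTREG")
    return first, last - first + 1


# Reserving s10..s13 (or v10..v13) gives first=10, count=4.
print(reserved_range(10, 13))  # (10, 4)
```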

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `runtime_loader_kernel_symbol` field in the
kernel configuration.

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-op must be inside a kernel configuration (`.config`). Defines the scratch buffer size.

### .scratchsym

Syntax: .scratchsym SYMBOL

Sets the symbol as the scratch symbol. This symbol points to the scratch buffer offset and
will be used while generating scratch buffer relocations.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside a kernel configuration (`.config`). Sets the number of scalar
registers which can be used during kernel execution.
It counts SGPRs including VCC, FLAT_SCRATCH and XNACK_MASK.

### .spilledsgprs

Syntax: .spilledsgprs REGNUM

This pseudo-op must be inside a kernel configuration (`.config`). Sets the number of scalar
registers to spill into the scratch buffer. It is meaningful only for LLVM 3.9 or later.

### .spilledvgprs

Syntax: .spilledvgprs REGNUM

This pseudo-op must be inside a kernel configuration (`.config`). Sets the number of vector
registers to spill into the scratch buffer. It is meaningful only for LLVM 3.9 or later.

### .tgsize

This pseudo-op must be inside a kernel configuration (`.config`).
Enables usage of TG_SIZE_EN. It should be set.

### .use_debug_enabled

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `is_debug_enabled` field in the kernel
configuration.

### .use_dispatch_id

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_dispatch_id` field in the kernel
configuration.

### .use_dispatch_ptr

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_dispatch_ptr` field in the kernel
configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `is_dynamic_call_stack` field in the
kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_flat_scratch_init` field in the
kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_grid_workgroup_count_X`,
`enable_sgpr_grid_workgroup_count_Y` and `enable_sgpr_grid_workgroup_count_Z` fields
in the kernel configuration for the given dimensions x, y and z, respectively.

### .use_kernarg_segment_ptr

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_kernarg_segment_ptr` field in the
kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_ordered_append_gds` field in the
kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_private_segment_buffer` field in
the kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_private_segment_size` field in
the kernel configuration.

### .use_ptr64

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `is_ptr64` field in the kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `enable_sgpr_queue_ptr` field in the
kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Enables the `is_xnack_enabled` field in the kernel
configuration.

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside a kernel configuration (`.config`). Sets the number of
registers for USERDATA.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside a kernel configuration (`.config`). Sets the number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `wavefront_sgpr_count` field in the kernel
configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `wavefront_size` field in the kernel configuration.
The value must be a power of two.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `workgroup_fbarrier_count` field in the
kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `workgroup_group_segment_byte_size` field in the
kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `workitem_private_segment_byte_size` field in the
kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside a kernel configuration (`.config`) and can be used only if
the LLVM version is 4.0.0 or later. Sets the `workitem_vgpr_count` field in the kernel
configuration.

## Sample code

This is a sample kernel setup:

```
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
```

The same kernel setup using a kernel configuration instead of ProgInfo:

```
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .config
        .dims xyz
        .tgsize
```

All code:

```
.gallium
.gpu CapeVerde
.kernel DCT
    .args
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg global, 8, 8, 8, zext, general
        .arg local, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, general
        .arg scalar, 4, 4, 4, zext, griddim
        .arg scalar, 4, 4, 4, zext, gridoffset
    .proginfo
        .entry 0x0000b848, 0x000c0183
        .entry 0x0000b84c, 0x00001788
        .entry 0x0000b860, 0x00000000
.text
DCT:
/*c0030106         */ s_load_dword    s6, s[0:1], 0x6
/*c0038107         */ s_load_dword    s7, s[0:1], 0x7
/* the rest of the instructions is skipped; this only shows how to write a GalliumCompute program */
/*bf810000         */ s_endpgm
```