source: CLRX/CLRadeonExtender/trunk/doc/ClrxAsmAmdCl2.md @ 3996

Last change on this file since 3996 was 3996, checked in by matszpk, 13 months ago

CLRadeonExtender: CLRXDocs: add extra info about setting up number of the SGPRs registers.

File size: 20.6 KB
## CLRadeonExtender Assembler AMD Catalyst OpenCL 2.0 handling

The AMD Catalyst driver provides its own OpenCL implementation, which can generate
its own binaries of OpenCL programs. The CLRX assembler supports both the OpenCL 1.2
and the OpenCL 2.0 binary formats. This chapter describes the AMD OpenCL 2.0 binary format.
The first Catalyst drivers used this format for OpenCL 2.0 programs.
Current AMD drivers use this format for both OpenCL 1.2 and OpenCL 2.0 programs on
GCN 1.1 and later architectures.

## Binary format

The AMD Catalyst binary format for OpenCL 2.0 differs significantly from
the previous binary format for OpenCL 1.2. The code of all kernels is stored in a single
text section of the inner binary. Instead of AMD CAL notes and ProgInfo entries, the kernel
setup is stored in a structure with a special format. The metadata mainly holds the argument
definitions of the kernels.

CLRadeonExtender supports two versions of the binary format for OpenCL 2.0: a newer one (since
AMD OpenCL driver 1912.05) and an older one (before driver version 1912.05).

Special sections for global data shared by all kernels:

* `.rodata`, `.globaldata` - read-only constant (global) data
* `.rwdata`, `.data` - read-write global data
* `.bss`, `.bssdata` - allocatable read-write data

## Relocations

The CLRX assembler handles relocations in kernel code to symbols in the global data,
global rwdata and global bss data sections. These relocations can be applied to places
that accept 32-bit literal immediates. Only two types of relocations are allowed:

* `place`, `place&0xffffffff`, `place%0x100000000`, `place%%0x100000000` - low 32 bits of the value
* `place>>32`, `place/0x100000000`, `place//0x100000000` - high 32 bits of the value

The `place` is an expression whose result points to some place in one of the
allowed sections.

Examples:

```
s_mov_b32       s13, (gdata+152)>>32
s_mov_b32       s12, (gdata+152)&0xffffffff
s_mov_b32       s15, (gdata+160)>>32
s_mov_b32       s14, (gdata+160)&0xffffffff
```
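
A fuller sketch showing where a symbol like `gdata` might come from. It assumes a hypothetical
`gdata` label at the start of the constant global data section (`.globaldata`) and a hypothetical
kernel named `reltest`; the two `s_mov_b32` instructions load the high and low halves of the
64-bit address `gdata+152`, which the assembler should emit as relocations:

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
/* constant global data shared by all kernels */
.globaldata
gdata:
    /* 42 zero-filled 4-byte words = 168 bytes, so gdata+152 lies inside */
    .fill 42, 4, 0
.kernel reltest
    .config
        .dims x
    .text
        /* high 32 bits of the address of gdata+152 */
        s_mov_b32       s13, (gdata+152)>>32
        /* low 32 bits of the address of gdata+152 */
        s_mov_b32       s12, (gdata+152)&0xffffffff
        s_endpgm
```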

## Layout of the source code

The CLRX assembler allows one of two ways to configure the kernel setup:
a human-readable one (`.config`) and one for quick recompilation (raw kernel setup, stub and metadata content).

## Register usage setup

The CLRX assembler automatically sets the number of used VGPRs and the number of used SGPRs.
These values can be overridden with the `.sgprsnum` and `.vgprsnum` pseudo-ops.

## Scalar register allocation

Depending on the configuration options, the assembler adds the extra SGPRs of VCC, FLAT_SCRATCH
and XNACK_MASK (if `.useenqueue` or `.usegeneric` is enabled).
In the HSA configuration mode, special fields determine
which extra SGPR registers (FLAT_SCRATCH, VCC and XNACK_MASK) have been added.

When the HSA kernel configuration (`.hsaconfig`) is used, `.sgprsnum` sets the number of all SGPRs,
including VCC, FLAT_SCRATCH and XNACK_MASK.
When the old-style kernel configuration (`.config`) is used, `.sgprsnum` sets the number of all SGPRs
except VCC, FLAT_SCRATCH and XNACK_MASK (the same rule as in the AMD binary format support).
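
As an illustration, assume a kernel that needs 20 SGPRs of its own and for which the only
extra registers added by the assembler are the two SGPRs of VCC. Under the rules above, the same
register budget would be written differently in each configuration style. The kernel name
`regtest` is hypothetical, and the two fragments are alternatives, not parts of one source:

```
/* alternative 1: old-style configuration */
.kernel regtest
    .config
        .dims x
        /* VCC, FLAT_SCRATCH and XNACK_MASK are not counted here */
        .sgprsnum 20

/* alternative 2: HSA configuration */
.kernel regtest
    .hsaconfig
        .dims x
        /* 20 kernel SGPRs + 2 SGPRs of VCC are all counted here */
        .sgprsnum 22
```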

## List of the specific pseudo-operations

### .acl_version

Syntax: .acl_version "STRING"

Set the ACL version string.

### .arch_minor

Syntax: .arch_minor ARCH_MINOR

Set the architecture minor number.

### .arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set the architecture stepping number.

### .arg

Syntax for scalar: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, unused]  
Syntax for structure: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, STRUCTSIZE[, unused]]  
Syntax for image: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, [ACCESS] [, RESID[, unused]]]  
Syntax for sampler: .arg ARGNAME\[, "ARGTYPENAME"], ARGTYPE[, RESID[, unused]]  
Syntax for global pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]  
Syntax for local pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE[, [ACCESS] [, unused]]]  
Syntax for constant pointer: .arg ARGNAME\[, "ARGTYPENAME"],
ARGTYPE\[\[, STRUCTSIZE], PTRSPACE\[, [ACCESS] [, [CONSTSIZE] [, unused]]]

Adds a kernel argument definition. Must be inside any kernel configuration. The first argument is
the argument name from the OpenCL kernel definition. The next, optional argument is the argument
type name from the OpenCL kernel definition. The next argument is the argument type:

* char, uchar, short, ushort, int, uint, ulong, long, float, double - simple scalar types
* charX, ucharX, shortX, ushortX, intX, uintX, ulongX, longX, floatX, doubleX - vector types
(X indicates the number of elements: 2, 3, 4, 8 or 16)
* structure - structure
* image, image1d, image1d_array, image1d_buffer, image2d, image2d_array, image3d -
image types
* sampler - sampler
* queue - command queue
* clkevent - device-side event (`clk_event_t`)
* type* - pointer to data of the given type (for example `float*` or `structure*`)

The rest of the arguments depends on the type of the kernel argument. STRUCTSIZE determines
the size of the structure. ACCESS for images can be one of: `read_only`, `rdonly`,
`write_only`, `wronly`, `read_write` or `rdwr`.
PTRSPACE determines the address space the pointer points to.
It can be one of: `local`, `constant` or `global`.
ACCESS for pointers can be `const`, `restrict` or `volatile`.
CONSTSIZE determines the maximum size in bytes of the constant buffer.
RESID determines the resource id (only for samplers and images):

* for read-only images the range is 0-127,
* for other images the range is 0-63,
* for samplers the range is 0-15.

The last argument, `unused`, indicates that the argument will not be used by the kernel.
In its place, `rdonly` (the argument is only read) or `wronly` (the argument is only written)
can be given.

Sample usage:

```
.arg v1,"double_t",double
.arg v2,double2
.arg v3,double3
.arg v23,image2d,
.arg v30,image2d,,5
.arg v41,ulong16  *,global
.arg v42,ulong16  *,global, restrict
.arg v57,structure*,82,global
```
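
A few more forms, sketched from the syntax descriptions above (all argument names are
hypothetical): a read-only image with resource id 3, a sampler with resource id 2, a constant
pointer with an empty ACCESS and a CONSTSIZE of 256 bytes, and a scalar argument marked as unused:

```
.arg img,image2d,read_only,3
.arg smp,sampler,2
.arg cbuf,float*,constant,,256
.arg dummy,uint,unused
```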

### .bssdata

Syntax: .bssdata [align=ALIGNMENT]

Go to the global bss data section. The optional argument sets the alignment of the section.

### .call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
Set the call convention for the kernel.

### .codeversion

Syntax: .codeversion MAJOR, MINOR

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
Set the AMD code version.

### .compile_options

Syntax: .compile_options "STRING"

Set the compile options for this binary.

### .config

Open the kernel configuration. Must be inside a kernel. A kernel configuration cannot be
defined if any ISA metadata (`.isametadata`), metadata (`.metadata`) or stub (`.stub`) has been defined.
The following pseudo-ops can be used inside the kernel configuration:

* .arg
* .cws
* .debugmode
* .dims
* .dx10clamp
* .exceptions
* .localsize
* .ieeemode
* .pgmrsrc1
* .pgmrsrc2
* .priority
* .privmode
* .sampler
* .scratchbuffer
* .setupargs
* .sgprsnum
* .tgsize
* .uavid
* .useargs
* .useenqueue
* .usegeneric
* .usesetup
* .vgprsnum

### .control_directive

Open the control directive section. This section must be 128 bytes long. Its content
is stored in the `control_directive` field of the kernel configuration.
Must be defined inside a kernel.
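
A minimal sketch, assuming a hypothetical kernel `cdtest` whose control directive is simply
zero-filled; `.fill` emits exactly the 128 bytes that the section requires:

```
.kernel cdtest
    .control_directive
        /* 128 zero bytes = the required size of the control directive */
        .fill 128, 1, 0
```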

### .cws, .reqd_work_group_size

Syntax: .cws [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]  
Syntax: .reqd_work_group_size [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set the reqd_work_group_size hint for this kernel.
In versions earlier than 0.1.7 this pseudo-op was broken: it set zeroes instead of ones
in the last two components. We recommend filling in all components.
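
For example, a kernel intended to run with work-groups of 256x1x1 work-items could give all
three components explicitly, which avoids depending on how omitted components are filled
(the kernel name `cwstest` is hypothetical):

```
.kernel cwstest
    .config
        .dims x
        /* all three components given explicitly */
        .cws 256, 1, 1
```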

### .debug_private_segment_buffer_sgpr

Syntax: .debug_private_segment_buffer_sgpr SGPRREG

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`debug_private_segment_buffer_sgpr` field in the kernel configuration.

### .debug_wavefront_private_segment_offset_sgpr

Syntax: .debug_wavefront_private_segment_offset_sgpr SGPRREG

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`debug_wavefront_private_segment_offset_sgpr` field in the kernel configuration.

### .debugmode

This pseudo-operation must be inside any kernel configuration.
Enable usage of the DEBUG_MODE.

### .default_hsa_features

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
It sets the default HSA kernel features and register features (extra SGPR registers usage).
These default features are `.use_private_segment_buffer`, `.use_kernarg_segment_ptr`,
`.use_ptr64` (for 64-bit binaries) and a private_elem_size of 4 bytes.
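
Based on the list above, `.default_hsa_features` in a 64-bit binary should be roughly equivalent
to enabling the features one by one. A sketch of the explicit form, which may be useful when one
of the defaults (for example the private element size) has to be changed:

```
    .hsaconfig
        /* expected to match .default_hsa_features in a 64-bit (.64bit) binary */
        .use_private_segment_buffer
        .use_kernarg_segment_ptr
        .use_ptr64
        .private_elem_size 4
```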

### .dims

Syntax: .dims DIMENSIONS

This pseudo-operation must be inside any kernel configuration. Define which dimensions
(any of x, y, z) are used to determine the kernel execution space.

### .driver_version

Syntax: .driver_version VERSION

Set the driver version for this binary. The version has the form MajorVersion*100+MinorVersion
(for example, 191205 for driver 1912.05). This pseudo-op replaces the driver info.

### .dx10clamp

This pseudo-operation must be inside any kernel configuration.
Enable usage of the DX10_CLAMP.

### .exceptions

Syntax: .exceptions EXCPMASK

This pseudo-operation must be inside any kernel configuration.
Set the exception mask in the PGMRSRC2 register value. The value should be 7-bit.

### .gds_segment_size

Syntax: .gds_segment_size SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`gds_segment_size` field in the kernel configuration.

### .gdssize

Syntax: .gdssize SIZE

This pseudo-operation must be inside any kernel configuration. Set the GDS
(global data share) size.

### .get_driver_version

Syntax: .get_driver_version SYMBOL

Store the current driver version in SYMBOL. The version has the form `version*100 + revision`.
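
A sketch of conditional assembly based on the driver version, assuming the CLRX conditional
pseudo-ops `.if`, `.else` and `.endif` and a hypothetical symbol name `drvver`. With the
`version*100 + revision` encoding, driver 1912.05 (where the newer binary format begins)
corresponds to 191205:

```
/* store the current driver version in drvver */
.get_driver_version drvver
.if drvver >= 191205
    /* newer binary format (driver 1912.05 and later) */
.else
    /* older binary format (before driver 1912.05) */
.endif
```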

### .globaldata

Go to the constant global data section.

### .group_segment_align

Syntax: .group_segment_align ALIGN

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`group_segment_align` field in the kernel configuration.

### .hsaconfig

Open the kernel HSA configuration. Must be inside a kernel. A kernel configuration cannot be
defined if any ISA metadata, metadata or stub has been defined. Do not mix with `.config`.
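
A minimal skeleton of a kernel using the HSA configuration, assuming a hypothetical kernel
`vecadd` with three float pointers and one `uint` argument. Register counts are left to the
assembler, and the other `.hsaconfig` pseudo-ops listed in this chapter can be added as needed:

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
.kernel vecadd
    .hsaconfig
        .dims x
        /* private segment buffer, kernarg pointer, ptr64, 4-byte private elements */
        .default_hsa_features
        /* setup arguments must come before the other arguments */
        .setupargs
        .arg a,float*
        .arg b,float*
        .arg c,float*
        .arg n,uint
    .text
        s_endpgm
```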

### .ieeemode

This pseudo-op must be inside any kernel configuration. Enable the IEEE mode.

### .inner

Go to the inner binary. By default, the assembler is in the main binary.

### .isametadata

This pseudo-operation must be inside a kernel. Go to the ISA metadata content
(only for older driver binaries).

### .kernarg_segment_align

Syntax: .kernarg_segment_align ALIGN

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`kernarg_segment_alignment` field in the kernel configuration. The value must be a power of two.

### .kernarg_segment_size

Syntax: .kernarg_segment_size SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`kernarg_segment_byte_size` field in the kernel configuration.

### .kernel_code_entry_offset

Syntax: .kernel_code_entry_offset OFFSET

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`kernel_code_entry_byte_offset` field in the kernel configuration. This field
stores the offset between the configuration and the kernel code. The default is 256.

### .kernel_code_prefetch_offset

Syntax: .kernel_code_prefetch_offset OFFSET

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`kernel_code_prefetch_byte_offset` field in the kernel configuration.

### .kernel_code_prefetch_size

Syntax: .kernel_code_prefetch_size SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`kernel_code_prefetch_byte_size` field in the kernel configuration.

### .localsize

Syntax: .localsize SIZE

This pseudo-operation must be inside any kernel configuration. Set the initial
local data size.

### .machine

Syntax: .machine KIND, MAJOR, MINOR, STEPPING

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
machine version fields in the kernel configuration.

### .max_scratch_backing_memory

Syntax: .max_scratch_backing_memory SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`max_scratch_backing_memory_byte_size` field in the kernel configuration.

### .metadata

This pseudo-operation must be inside a kernel. Go to the metadata content.

### .pgmrsrc1

Syntax: .pgmrsrc1 VALUE

This pseudo-operation must be inside a kernel.
Define the value of the PGMRSRC1.

### .pgmrsrc2

Syntax: .pgmrsrc2 VALUE

This pseudo-operation must be inside any kernel configuration. Set the PGMRSRC2 value.
If the dimensions are set, the bits that control the dimension setup are ignored.
The SCRATCH_EN bit is also ignored.

### .priority

Syntax: .priority PRIORITY

This pseudo-operation must be inside a kernel. Define the priority (0-3).

### .private_elem_size

Syntax: .private_elem_size ELEMSIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
Set the `private_element_size` field in the kernel configuration.
The value must be a power of two between 2 and 16.

### .private_segment_align

Syntax: .private_segment_align ALIGN

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`private_segment_alignment` field in the kernel configuration. The value must be a power of two.

### .privmode

This pseudo-operation must be inside a kernel.
Enable usage of the PRIV (privileged mode).

### .reserved_sgprs

Syntax: .reserved_sgprs FIRSTREG, LASTREG

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`reserved_sgpr_first` and `reserved_sgpr_count` fields in the kernel configuration.
`reserved_sgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).
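
For example, reserving the SGPRs from 10 to 15 (hypothetical values) would set
`reserved_sgpr_first` to 10 and `reserved_sgpr_count` to 15-10+1 = 6:

```
    .hsaconfig
        /* reserved_sgpr_first = 10, reserved_sgpr_count = 6 */
        .reserved_sgprs 10, 15
```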

### .reserved_vgprs

Syntax: .reserved_vgprs FIRSTREG, LASTREG

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`reserved_vgpr_first` and `reserved_vgpr_count` fields in the kernel configuration.
`reserved_vgpr_count` is set to the number of registers (LASTREG-FIRSTREG+1).

### .runtime_loader_kernel_symbol

Syntax: .runtime_loader_kernel_symbol ADDRESS

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`runtime_loader_kernel_symbol` field in the kernel configuration.

### .rwdata

Go to the read-write global data section.

### .sampler

Syntax: .sampler VALUE,...

Inside the main or inner binary: add sampler definitions. This is legal only when
there is no samplerinit section. Inside a kernel configuration:
add samplers to the kernel (the values are sampler ids).

### .samplerinit

Go to the samplerinit content section. Legal only if there are no sampler definitions.

### .samplerreloc

Syntax: .samplerreloc OFFSET, SAMPLERID

Add a sampler relocation that points to the constant global data (rodata).

### .scratchbuffer

Syntax: .scratchbuffer SIZE

This pseudo-operation must be inside any kernel configuration.
Set the scratch buffer size.

### .setup

Go to the kernel setup content section.

### .setupargs

This pseudo-op must be inside any kernel configuration. Add the kernel setup arguments
as the first arguments. This pseudo-op must appear before any other argument definitions.

### .sgprsnum

Syntax: .sgprsnum REGNUM

This pseudo-op must be inside any kernel configuration. Set the number of scalar
registers which can be used during kernel execution. In the old-config style (`.config`),
it counts SGPR registers excluding VCC, FLAT_SCRATCH and XNACK_MASK.
In the HSA-config style (`.hsaconfig`), it counts SGPR registers including VCC, FLAT_SCRATCH
and XNACK_MASK (as in ROCm).

### .stub

Go to the kernel stub content section. Only allowed for older driver version binaries.

### .tgsize

This pseudo-op must be inside any kernel configuration.
Enable usage of the TG_SIZE_EN.

### .use_debug_enabled

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`is_debug_enabled` field in the kernel configuration.

### .use_dispatch_id

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_dispatch_id` field in the kernel configuration.

### .use_dispatch_ptr

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_dispatch_ptr` field in the kernel configuration.

### .use_dynamic_call_stack

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`is_dynamic_call_stack` field in the kernel configuration.

### .use_flat_scratch_init

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_flat_scratch_init` field in the kernel configuration.

### .use_grid_workgroup_count

Syntax: .use_grid_workgroup_count DIMENSIONS

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_grid_workgroup_count_X`, `enable_sgpr_grid_workgroup_count_Y`
and `enable_sgpr_grid_workgroup_count_Z` fields in the kernel configuration
for the given dimensions (x, y and z respectively).

### .use_kernarg_segment_ptr

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_kernarg_segment_ptr` field in the kernel configuration.

### .use_ordered_append_gds

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_ordered_append_gds` field in the kernel configuration.

### .use_private_segment_buffer

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_private_segment_buffer` field in the kernel configuration.

### .use_private_segment_size

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_private_segment_size` field in the kernel configuration.

### .use_ptr64

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
Enable the `is_ptr64` field in the kernel configuration.

### .use_queue_ptr

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`enable_sgpr_queue_ptr` field in the kernel configuration.

### .use_xnack_enabled

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Enable the
`is_xnack_enabled` field in the kernel configuration.

### .useargs

This pseudo-op must be inside the non-HSA kernel configuration (`.config`).
Indicate that the kernel uses arguments.

### .useenqueue

This pseudo-op must be inside the non-HSA kernel configuration (`.config`).
Indicate that the kernel uses the enqueue mechanism.

### .usegeneric

This pseudo-op must be inside the non-HSA kernel configuration (`.config`).
Indicate that the kernel uses the generic pointers mechanism (FLAT instructions).

### .usesetup

This pseudo-op must be inside the non-HSA kernel configuration (`.config`).
Indicate that the kernel uses setup data (global sizes, local sizes, number of work groups).

### .userdatanum

Syntax: .userdatanum NUMBER

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the number of
registers for USERDATA.

### .vectypehint

Syntax: .vectypehint OPENCLTYPE

This pseudo-operation must be inside any kernel configuration.
Set the vectypehint for the kernel. The argument is an OpenCL type.

### .vgprsnum

Syntax: .vgprsnum REGNUM

This pseudo-op must be inside any kernel configuration. Set the number of vector
registers which can be used during kernel execution.

### .wavefront_sgpr_count

Syntax: .wavefront_sgpr_count REGNUM

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`wavefront_sgpr_count` field in the kernel configuration.

### .wavefront_size

Syntax: .wavefront_size POWEROFTWO

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`).
Set the `wavefront_size` field in the kernel configuration. The value must be a power of two.

### .work_group_size_hint

Syntax: .work_group_size_hint [SIZEHINT][, [SIZEHINT][, [SIZEHINT]]]

This pseudo-operation must be inside any kernel configuration.
Set the work_group_size_hint for this kernel.

### .workgroup_fbarrier_count

Syntax: .workgroup_fbarrier_count COUNT

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`workgroup_fbarrier_count` field in the kernel configuration.

### .workgroup_group_segment_size

Syntax: .workgroup_group_segment_size SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`workgroup_group_segment_byte_size` field in the kernel configuration.

### .workitem_private_segment_size

Syntax: .workitem_private_segment_size SIZE

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`workitem_private_segment_byte_size` field in the kernel configuration.

### .workitem_vgpr_count

Syntax: .workitem_vgpr_count REGNUM

This pseudo-op must be inside the kernel HSA configuration (`.hsaconfig`). Set the
`workitem_vgpr_count` field in the kernel configuration.

## Sample code

This is a sample of a kernel defined by raw metadata and kernel setup content:

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
.compile_options "-I ./ -cl-std=CL2.0"
.acl_version "AMD-COMP-LIB-v0.8 (0.0.SC_BUILD_NUMBER)"
.kernel DCT
    .metadata
        .byte 0xe0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        ...,
    .setup
        .byte 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
        .byte 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        ....
    .text
/*c0000501         */ s_load_dword    s0, s[4:5], 0x1
....
/*bf810000         */ s_endpgm
```

This is a sample of a kernel with a kernel configuration (`.config`):

```
.amdcl2
.64bit
.gpu Bonaire
.driver_version 191205
.compile_options "-I ./ -cl-std=CL2.0"
.acl_version "AMD-COMP-LIB-v0.8 (0.0.SC_BUILD_NUMBER)"
.kernel DCT
    .config
        .dims xy
        .useargs
        .usesetup
        .setupargs
        .arg output,float*
        .arg input,float*
        .arg dct8x8,float*
        .arg dct8x8_trans,float*
        .arg inter,float*,local
        .arg width,uint
        .arg blockWidth,uint
        .arg inverse,uint
        .......
    .text
/*c0000501         */ s_load_dword    s0, s[4:5], 0x1
....
/*bf810000         */ s_endpgm
```