source: CLRX/CLRadeonExtender/trunk/doc/AmdCl2Abi.md @ 3552

Last change on this file since 3552 was 3552, checked in by matszpk, 4 months ago

CLRadeonExtender: CLRXDocs: Update AmdCL2ABI: add info about additional setup arguments.

## AMD Catalyst OpenCL 2.0 ABI description

This chapter describes how a kernel gets its arguments and how it accesses constant data.
Because the kernel setup is an AMD HSA configuration, we recommend referring to the
ROCm-ABI documentation for information about kernel setup and kernel argument passing.
The assembler now provides all the AMD HSA configuration pseudo-ops needed for this.

In this chapter, sizes are given in dwords. A dword is a 4-byte value.

### Passing options

The CLRX assembler lets you specify in the kernel configuration which features the kernel
uses. The following features can be enabled:

* usesetup - the kernel uses size information. Adds the kernel setup pointer (sizes buffer)
to the user data registers.
* useargs - the kernel uses arguments. Adds the kernel argument pointer to the user data
registers.
* useenqueue - enables support for the enqueue mechanism.
* usegeneric - enables support for generic pointers.

The number of user data registers depends on the set of enabled features. The following
rules apply:

* if no feature is enabled, only 4 user data registers are used.
* if useargs is enabled, 6 user data registers are used. User data registers 4-5 hold
the argument pointer.
* if usesetup is enabled, 8 user data registers are used. User data registers 4-5 hold
the kernel setup pointer and registers 6-7 hold the argument pointer.
* if useenqueue is enabled, 10 user data registers are used. User data registers 4-5 hold
the kernel setup pointer and registers 6-7 hold the argument pointer.
* if usegeneric is enabled, 12 user data registers are used. User data registers 4-5 hold
the kernel setup pointer and registers 8-9 hold the argument pointer.
* for the VEGA (GFX9) architecture, 10 user data registers are used. User data registers
4-5 hold the kernel setup pointer and registers 6-7 hold the argument pointer.

### Argument passing and kernel setup

The first pointer present in the user data registers is the kernel setup pointer.
It points to the setup buffer, which holds the kernel execution setup in the following
dwords:

* dword 0 - general setup. Bits 16-31 - number of dimensions
* dword 1 - enqueued local size (?). Bits 0-15 - local size X, bits 16-31 - local size Y
* dword 2 - enqueued local size. Bits 0-15 - local size Z
* dwords 3-5 - global size for each dimension

The second pointer is the argument pointer. It points to the argument buffer, which
begins with the setup arguments:

* size_t global_offset_0 - 32-bit or 64-bit global offset for X
* size_t global_offset_1 - 32-bit or 64-bit global offset for Y
* size_t global_offset_2 - 32-bit or 64-bit global offset for Z
* void* printf_buffer - 32-bit or 64-bit printf buffer pointer
* void* vqueue_pointer - 32-bit or 64-bit pointer
* void* aqlwrap_pointer - 32-bit or 64-bit pointer

The remaining arguments in that buffer are the user arguments defined for the kernel.
Every pointer, command queue, image, sampler and structure argument takes 8 bytes
(a 64-bit pointer) in 64-bit AMD OpenCL 2.0, or 4 bytes (a 32-bit pointer) in 32-bit
AMD OpenCL 2.0. A 3-component vector takes as many bytes as the corresponding 4-component
vector. Smaller types (such as char and short) take only 1-3 bytes. Alignment is determined
by the type itself or, for vectors, by the element type.

For 64-bit AMD OpenCL 2.0, all setup arguments and pointers are 64-bit.
For 32-bit AMD OpenCL 2.0, all setup arguments and pointers are 32-bit.

### Image arguments

Images are passed as pointers in the argument buffer. An image pointer points to the image
resource and the image information. The image resource takes 8 dwords, and dword 8 holds
information about the channel data type. The following table lists the channel data type
values and their OpenCL counterparts:

 Value | OpenCL value          | Value | OpenCL value
-------|-----------------------|-------|-----------------------
 0     | CL_SNORM_INT8         | 8     | CL_SIGNED_INT8
 1     | CL_SNORM_INT16        | 9     | CL_SIGNED_INT16
 2     | CL_UNORM_INT8         | 10    | CL_SIGNED_INT32
 3     | CL_UNORM_INT16        | 11    | CL_UNSIGNED_INT8
 4     | CL_UNORM_INT24        | 12    | CL_UNSIGNED_INT16
 5     | CL_UNORM_SHORT_555    | 13    | CL_UNSIGNED_INT32
 6     | CL_UNORM_SHORT_565    | 14    | CL_HALF_FLOAT
 7     | CL_UNORM_INT_101010   | 15    | CL_FLOAT

Before looking up the table, the value should be masked: (value&0xf).

Likewise, dword 9 holds the channel order information. The following table lists the values
and their OpenCL counterparts:

 Value | OpenCL value | Value  | OpenCL value
-------|--------------|--------|------------------
 0     | CL_A         |  10    | CL_ARGB
 1     | CL_R         |  11    | CL_ABGR
 2     | CL_Rx        |  12    | CL_sRGB
 3     | CL_RG        |  13    | CL_sRGBx
 4     | CL_RGx       |  14    | CL_sRGBA
 5     | CL_RA        |  15    | CL_sBGRA
 6     | CL_RGB       |  16    | CL_INTENSITY
 7     | CL_RGBx      |  17    | CL_LUMINANCE
 8     | CL_RGBA      |  18    | CL_DEPTH
 9     | CL_BGRA      |  19    | CL_DEPTH_STENCIL

Before looking up the table, the value should be masked: (value&0x1f).

### Sampler arguments

Samplers are passed as pointers. A sampler pointer points to the sampler resource.

### Scratch buffer access

The first four scalar registers hold the scratch buffer descriptor. Refer to
[GCN Machine State](GcnState) to learn about the initial vector and scalar registers.

### Flat access

By default, FLAT instructions read and write values in main memory.
Generic addressing (usegeneric) makes it possible to access LDS and the scratch buffer
with FLAT instructions. The following rules describe how to set up this mechanism correctly:

* registers S[6-7] hold a pointer to a special buffer that contains the LDS and scratch
buffer base addresses for FLAT instructions.
* dword 16 of that buffer holds bits 32-63 of the LDS base address for FLAT instructions.
* dword 17 of that buffer holds bits 32-63 of the scratch buffer base address for
FLAT instructions.
* register S10 holds the base scratch buffer offset for FLAT_SCRATCH.
* register S11 holds the scratch size per thread (for FLAT_SCRATCH).