source: CLRX/CLRadeonExtender/trunk/doc/AmdCl2Abi.md @ 3449

Last change on this file since 3449 was 3449, checked in by matszpk, 3 years ago

CLRadeonExtender: CLRXDocs: Updsate AmdCL2ABI (VEGA and AMD HSA).

## AMD Catalyst OpenCL 2.0 ABI description

This chapter describes how a kernel gets its arguments and how it accesses constant data.
Because the kernel setup is an AMD HSA configuration, we recommend referring to the ROCm-ABI
documentation for information about kernel setup and kernel argument passing. The assembler
now has all the AMD HSA configuration pseudo-ops needed to do this.

In this chapter, sizes are given in dwords. A dword is a 4-byte value.

### Passing options

The CLRX assembler allows setting which features a kernel uses in its configuration.
The following features can be enabled:

* usesetup - use sizes information. Adds the kernel setup and sizes buffer
to user data registers.
* useargs - the kernel uses arguments. Adds kernel arguments to user data registers.
* useenqueue - enable enqueue mechanism support
* usegeneric - enable generic pointers support

The number of user data registers depends on the set of enabled features. The following
rules apply:

* if no feature is enabled, only 4 user data registers are used.
* if useargs is enabled, 6 user data registers are used. User data regs 4-5 are the
argument pointer.
* if usesetup is enabled, 8 user data registers are used. User data regs 4-5 are the kernel
setup pointer. User data regs 6-7 are the argument pointer.
* if useenqueue is enabled, 10 user data registers are used. User data regs 4-5
are the kernel setup pointer. User data regs 6-7 are the argument pointer.
* if usegeneric is enabled, 12 user data registers are used. User data regs 4-5
are the kernel setup pointer. User data regs 8-9 are the argument pointer.
* for the VEGA (GFX9) architecture, 10 user data registers are used. User data regs 4-5
are the kernel setup pointer. User data regs 6-7 are the argument pointer.
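
The rules above can be summarised as a small lookup. The following Python sketch is only an illustration: the function name `user_data_layout` and its parameters are hypothetical, and it assumes that when several features are enabled the rule with the largest register count applies, and that the VEGA rule takes precedence over the others. Neither assumption is stated by this document.

```python
def user_data_layout(usesetup=False, useargs=False, useenqueue=False,
                     usegeneric=False, vega=False):
    """Return (register_count, setup_pointer_regs, argument_pointer_regs).

    Pointer register pairs are (low, high) user data register indices,
    or None when the pointer is not present.
    ASSUMPTION: rules are checked from the largest register count down,
    and the VEGA (GFX9) rule overrides the feature-based rules.
    """
    if vega:
        return 10, (4, 5), (6, 7)
    if usegeneric:
        return 12, (4, 5), (8, 9)
    if useenqueue:
        return 10, (4, 5), (6, 7)
    if usesetup:
        return 8, (4, 5), (6, 7)
    if useargs:
        return 6, None, (4, 5)
    # No feature enabled: only the 4 base user data registers.
    return 4, None, None
```

For example, under these assumptions `user_data_layout(useargs=True)` gives 6 registers with the argument pointer in registers 4-5.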

### Argument passing and kernel setup

The first pointer present in the user data registers is the kernel setup pointer.
This pointer points to the setup buffer that holds the kernel execution setup. It contains
the following dwords:

* 0 dword - general setup. Bits 16-31 - number of dimensions
* 1 dword - enqueued local size ??. Bits 0-15 - local size X, bits 16-31 - local size Y
* 2 dword - enqueued local size. Bits 0-15 - local size Z
* 3-5 dword - global size for each dimension
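
As an illustration of the setup buffer layout listed above, here is a minimal Python sketch that unpacks the six dwords into named fields. The function name `decode_kernel_setup` and the result keys are hypothetical, and the sketch assumes the buffer has already been read into a sequence of 32-bit unsigned integers.

```python
def decode_kernel_setup(dwords):
    """Decode the first six dwords of the kernel setup buffer.

    `dwords` is a sequence of 32-bit unsigned integers (hypothetical input
    format; how the buffer is obtained is outside this sketch).
    """
    if len(dwords) < 6:
        raise ValueError("setup buffer needs at least 6 dwords")
    return {
        # dword 0, bits 16-31: number of dimensions
        "dimensions": (dwords[0] >> 16) & 0xFFFF,
        # dword 1: local size X (bits 0-15) and Y (bits 16-31);
        # dword 2: local size Z (bits 0-15)
        "local_size": (dwords[1] & 0xFFFF,
                       (dwords[1] >> 16) & 0xFFFF,
                       dwords[2] & 0xFFFF),
        # dwords 3-5: global size for each dimension
        "global_size": tuple(dwords[3:6]),
    }
```

The remaining bits of dwords 0 and 2 are not described in this chapter, so the sketch ignores them.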

The second pointer is the argument pointer. This pointer points to the argument buffer.
The first arguments are the setup arguments:

* ulong global_offset_0 - 64-bit global offset for X
* ulong global_offset_1 - 64-bit global offset for Y
* ulong global_offset_2 - 64-bit global offset for Z

Further arguments in that buffer are the user arguments defined for the kernel. Any pointer,
command queue, image, sampler or structure takes 8 bytes (64-bit pointer).
A 3-component vector takes the number of bytes of a 4-element vector.
Smaller types (such as char, short) take 1-3 bytes. Alignment depends on the type itself
or on the element type (for vectors).

For 32-bit AMD OpenCL 2.0, all setup arguments are 32-bit.
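
To make the layout more concrete, the following Python sketch computes byte offsets of user arguments placed after the three setup arguments. It is a rough illustration only: the names `align_up` and `layout_args` are hypothetical, the caller supplies each argument's size and alignment explicitly, and the sketch assumes that arguments are packed in declaration order with each offset rounded up to the argument's alignment.

```python
def align_up(offset, alignment):
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def layout_args(user_args, setup_arg_size=8):
    """Return ({name: byte_offset}, total_size) for the argument buffer.

    `user_args` is a list of (name, size, alignment) tuples in bytes.
    `setup_arg_size` is 8 for 64-bit setup arguments and 4 for 32-bit ones.
    ASSUMPTION: arguments are packed in order, each aligned to its own
    alignment; only the three global offsets listed above precede them.
    """
    offset = 3 * setup_arg_size  # global_offset_0, _1, _2
    layout = {}
    for name, size, alignment in user_args:
        offset = align_up(offset, alignment)
        layout[name] = offset
        offset += size
    return layout, offset
```

For instance, an `int`, a pointer, and a `float3` (16 bytes, aligned to its 4-byte element type) could be described as `[("n", 4, 4), ("buf", 8, 8), ("v", 16, 4)]`.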

### Image arguments

Images are passed via pointers in the argument buffer. An image pointer points to
the image resource and image information. The image resource takes 8 dwords. Dword 8 holds
information about the channel data type. The following table lists channel data type values
and their OpenCL counterparts:

 Value | OpenCL value          | Value | OpenCL value
-------|-----------------------|-------|-----------------------
 0     | CL_SNORM_INT8         | 8     | CL_SIGNED_INT8
 1     | CL_SNORM_INT16        | 9     | CL_SIGNED_INT16
 2     | CL_UNORM_INT8         | 10    | CL_SIGNED_INT32
 3     | CL_UNORM_INT16        | 11    | CL_UNSIGNED_INT8
 4     | CL_UNORM_INT24        | 12    | CL_UNSIGNED_INT16
 5     | CL_UNORM_SHORT_555    | 13    | CL_UNSIGNED_INT32
 6     | CL_UNORM_SHORT_565    | 14    | CL_HALF_FLOAT
 7     | CL_UNORM_INT_101010   | 15    | CL_FLOAT

Before looking up the table, the value should be masked: (value&0xf).
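
The masked lookup can be written as a small table. This Python sketch mirrors the table above; the names `CHANNEL_DATA_TYPES` and `channel_data_type` are hypothetical and the entries are plain strings rather than real OpenCL constants.

```python
# Index in this list is the masked value from dword 8 of the image data.
CHANNEL_DATA_TYPES = [
    "CL_SNORM_INT8", "CL_SNORM_INT16", "CL_UNORM_INT8", "CL_UNORM_INT16",
    "CL_UNORM_INT24", "CL_UNORM_SHORT_555", "CL_UNORM_SHORT_565",
    "CL_UNORM_INT_101010", "CL_SIGNED_INT8", "CL_SIGNED_INT16",
    "CL_SIGNED_INT32", "CL_UNSIGNED_INT8", "CL_UNSIGNED_INT16",
    "CL_UNSIGNED_INT32", "CL_HALF_FLOAT", "CL_FLOAT",
]


def channel_data_type(dword8):
    """Return the OpenCL channel data type name for image dword 8."""
    # Only the low 4 bits select the table entry.
    return CHANNEL_DATA_TYPES[dword8 & 0xF]
```

Because the mask leaves only 16 possible values, every masked value has an entry in the table.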

Likewise, dword 9 holds channel order information. The following table lists the values and
their OpenCL counterparts:

 Value | OpenCL value | Value  | OpenCL value
-------|--------------|--------|------------------
 0     | CL_A         |  10    | CL_ARGB
 1     | CL_R         |  11    | CL_ABGR
 2     | CL_Rx        |  12    | CL_sRGB
 3     | CL_RG        |  13    | CL_sRGBx
 4     | CL_RGx       |  14    | CL_sRGBA
 5     | CL_RA        |  15    | CL_sBGRA
 6     | CL_RGB       |  16    | CL_INTENSITY
 7     | CL_RGBx      |  17    | CL_LUMINANCE
 8     | CL_RGBA      |  18    | CL_DEPTH
 9     | CL_BGRA      |  19    | CL_DEPTH_STENCIL

Before looking up the table, the value should be masked: (value&0x1f).
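
The channel order lookup follows the same pattern. In this Python sketch the names `CHANNEL_ORDERS` and `channel_order` are hypothetical. The 5-bit mask allows values up to 31 while the table defines only 0-19, so returning `None` for the rest is a choice of this sketch, not documented behaviour.

```python
# Index in this list is the masked value from dword 9 of the image data.
CHANNEL_ORDERS = [
    "CL_A", "CL_R", "CL_Rx", "CL_RG", "CL_RGx",
    "CL_RA", "CL_RGB", "CL_RGBx", "CL_RGBA", "CL_BGRA",
    "CL_ARGB", "CL_ABGR", "CL_sRGB", "CL_sRGBx", "CL_sRGBA",
    "CL_sBGRA", "CL_INTENSITY", "CL_LUMINANCE", "CL_DEPTH",
    "CL_DEPTH_STENCIL",
]


def channel_order(dword9):
    """Return the OpenCL channel order name for image dword 9, or None."""
    value = dword9 & 0x1F  # only the low 5 bits select the table entry
    if value < len(CHANNEL_ORDERS):
        return CHANNEL_ORDERS[value]
    return None  # values 20-31 are not listed in the table
```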

### Sampler arguments

Samplers are passed via pointers. A sampler pointer points to the sampler resource.

### Scratch buffer access

The first four scalar registers hold the scratch buffer descriptor. Refer to
[GCN Machine State](GcnState) to learn about the initial vector and scalar registers.

### Flat access

By default, FLAT instructions read or write values from main memory.
Generic addressing (usegeneric) allows access to the LDS and the scratch buffer using
FLAT instructions. The following rules describe how to set up that mechanism correctly.
Registers S[6-7] hold the pointer to a special buffer that holds the LDS and scratch buffer
base addresses for FLAT instructions.
Dword 16 of that buffer holds bits 32-63 of the LDS base address for FLAT instructions.
Dword 17 of that buffer holds bits 32-63 of the scratch buffer base address for
FLAT instructions.
Register S10 holds the base scratch buffer offset for FLAT_SCRATCH. Register S11 holds
the size of scratch per thread (for FLAT_SCRATCH).
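
To illustrate where the two base addresses sit in that buffer, here is a small Python sketch that pulls dwords 16 and 17 out of a raw byte copy of the buffer. The function name `flat_base_high_dwords` and the result keys are hypothetical, and the sketch assumes the buffer is available as little-endian bytes on the host. In a kernel these values would be loaded with scalar memory instructions instead.

```python
import struct


def flat_base_high_dwords(buffer_bytes):
    """Extract the high halves of the FLAT base addresses.

    `buffer_bytes` is a bytes-like copy of the buffer pointed to by S[6-7]
    (hypothetical host-side view, little-endian dwords assumed).
    """
    # Read the first 18 dwords; only dwords 16 and 17 are used here.
    dwords = struct.unpack_from("<18I", buffer_bytes, 0)
    return {
        "lds_base_hi": dwords[16],      # bits 32-63 of the LDS base address
        "scratch_base_hi": dwords[17],  # bits 32-63 of the scratch base address
    }
```

The low 32 bits of each base address are not described in this chapter, so the sketch returns only the high halves.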