source: CLRX/CLRadeonExtender/trunk/doc/AmdCl2Abi.md @ 3255

Last change on this file since 3255 was 3255, checked in by matszpk, 3 years ago

CLRadeonExtender: CLRXDocs: small typo in AmdCL2ABI (wrong name of use).

## AMD Catalyst OpenCL 2.0 ABI description

This chapter describes how a kernel receives its arguments and how it accesses constant data.

In this chapter, sizes are given in dwords unless stated otherwise. A dword is a 4-byte value.

### Passing options

The CLRX assembler lets you choose, in the kernel configuration, which features a kernel uses.
The following features can be enabled:

* usesetup - use sizes information. Adds the kernel setup and sizes buffer
to the user data registers.
* useargs - the kernel uses arguments. Adds the kernel arguments to the user data registers.
* useenqueue - enable support for the enqueue mechanism
* usegeneric - enable support for generic pointers

The number of user data registers depends on the set of enabled features. The following
rules apply:

* if no feature is enabled, only 4 user data registers are used.
* if useargs is enabled, 6 user data registers are used. User data registers 4-5 hold
the argument pointer.
* if usesetup is enabled, 8 user data registers are used. User data registers 4-5 hold
the kernel setup pointer; registers 6-7 hold the argument pointer.
* if useenqueue is enabled, 10 user data registers are used. User data registers 4-5 hold
the kernel setup pointer; registers 6-7 hold the argument pointer.
* if usegeneric is enabled, 12 user data registers are used. User data registers 4-5 hold
the kernel setup pointer; registers 8-9 hold the argument pointer.
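
The rules above can be sketched as a small lookup. This is only an illustration, not CLRX code: the name `user_data_layout` and the dictionary keys are hypothetical. The sketch also assumes that when several features are enabled, the rule listed last above (usegeneric, then useenqueue, usesetup, useargs) decides the layout; the text gives one rule per feature and does not state how they combine.

```python
# Hypothetical helper, not part of CLRX: maps enabled features to the
# user data register layout described in the list above.
# Each rule: (feature, register count, setup pointer regs, argument pointer regs).
RULES = [
    ("usegeneric", 12, (4, 5), (8, 9)),
    ("useenqueue", 10, (4, 5), (6, 7)),
    ("usesetup", 8, (4, 5), (6, 7)),
    ("useargs", 6, None, (4, 5)),
]


def user_data_layout(features):
    """Return the register layout for a set of enabled feature names.

    Assumption: rules are checked from usegeneric down to useargs and the
    first enabled feature wins; the source only lists one rule per feature.
    """
    for name, count, setup_regs, arg_regs in RULES:
        if name in features:
            return {"count": count, "setup": setup_regs, "args": arg_regs}
    # No feature enabled: only 4 user data registers, no pointers.
    return {"count": 4, "setup": None, "args": None}


print(user_data_layout({"useargs"}))
print(user_data_layout({"usesetup", "usegeneric"}))
```

Keeping the rules in a table makes it easy to adjust the precedence if the real combination behaviour turns out to differ from this assumption.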

### Argument passing and kernel setup

The first pointer in the user data registers is the kernel setup pointer.
It points to the setup buffer, which holds the kernel execution setup in the following
dwords:

* dword 0 - general setup. Bits 16-31 - number of dimensions
* dword 1 - enqueued local size ??. Bits 0-15 - local size X, bits 16-31 - local size Y
* dword 2 - enqueued local size. Bits 0-15 - local size Z
* dwords 3-5 - global size for each dimension
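
As an illustration of the layout above, the following sketch decodes a copy of the setup buffer on the host side. The function name `decode_setup` and the sample values are made up for this example, and little-endian dword order is assumed.

```python
import struct


def decode_setup(buf):
    """Decode the first six dwords of a kernel setup buffer.

    Assumption: dwords are stored little-endian.
    """
    d = struct.unpack_from("<6I", buf, 0)
    return {
        "dims": d[0] >> 16,            # dword 0, bits 16-31
        "local_size": (
            d[1] & 0xFFFF,             # dword 1, bits 0-15: X
            d[1] >> 16,                # dword 1, bits 16-31: Y
            d[2] & 0xFFFF,             # dword 2, bits 0-15: Z
        ),
        "global_size": d[3:6],         # dwords 3-5
    }


# Made-up example: 2 dimensions, local size 16x8x1, global size 1024x512x1.
sample = struct.pack("<6I", 2 << 16, (8 << 16) | 16, 1, 1024, 512, 1)
print(decode_setup(sample))
```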

The second pointer is the argument pointer. It points to the argument buffer.
The first arguments in that buffer are the setup arguments:

* ulong global_offset_0 - 64-bit global offset for X
* ulong global_offset_1 - 64-bit global offset for Y
* ulong global_offset_2 - 64-bit global offset for Z

The remaining arguments in that buffer are the user arguments defined for the kernel. Any pointer,
command queue, image, sampler, or structure takes 8 bytes (a 64-bit pointer).
A 3-component vector takes as many bytes as a 4-component vector.
Smaller types (such as char and short) take 1-3 bytes. Alignment depends on the type itself
or on the element type (for vectors).

For 32-bit AMD OpenCL 2.0, all setup arguments are 32-bit.
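
The argument buffer layout described above can be sketched as an offset calculation. This is a rough model under stated assumptions, not the driver's algorithm: `arg_offsets` and `align_up` are hypothetical names, each argument is assumed to be placed at the next offset that satisfies its alignment, and the sizes and alignments in the sample list are chosen by hand from the rules in the text.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def arg_offsets(user_args, is_64bit=True):
    """Return the byte offset of each user argument in the argument buffer.

    user_args is a list of (name, size, alignment) tuples, all in bytes.
    Assumption: arguments are packed in order, each at its own alignment.
    """
    # The three setup arguments (global offsets) come first:
    # 8 bytes each in 64-bit mode, 4 bytes each in 32-bit mode.
    offset = 3 * (8 if is_64bit else 4)
    offsets = {}
    for name, size, alignment in user_args:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets


# Made-up kernel signature: (global float* out, char flag, float3 v, int n)
args = [
    ("out", 8, 8),   # pointer: 8 bytes (alignment of 8 is an assumption)
    ("flag", 1, 1),  # char
    ("v", 16, 4),    # float3 takes the size of float4, aligned to its element type
    ("n", 4, 4),     # int
]
print(arg_offsets(args))
```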

### Image arguments

Images are passed as pointers in the argument buffer. An image pointer points to
the image resource and image information. The image resource takes 8 dwords. Dword 8 holds
information about the channel data type. The following table lists the channel data type values
and their OpenCL counterparts:

 Value | OpenCL value          | Value | OpenCL value
-------|-----------------------|-------|-----------------------
 0     | CL_SNORM_INT8         | 8     | CL_SIGNED_INT8
 1     | CL_SNORM_INT16        | 9     | CL_SIGNED_INT16
 2     | CL_UNORM_INT8         | 10    | CL_SIGNED_INT32
 3     | CL_UNORM_INT16        | 11    | CL_UNSIGNED_INT8
 4     | CL_UNORM_INT24        | 12    | CL_UNSIGNED_INT16
 5     | CL_UNORM_SHORT_555    | 13    | CL_UNSIGNED_INT32
 6     | CL_UNORM_SHORT_565    | 14    | CL_HALF_FLOAT
 7     | CL_UNORM_INT_101010   | 15    | CL_FLOAT

Before looking up the table, the value should be masked: (value&0xf).

Likewise, dword 9 holds the channel order information. The following table lists the values and
their OpenCL counterparts:

 Value | OpenCL value | Value  | OpenCL value
-------|--------------|--------|------------------
 0     | CL_A         |  10    | CL_ARGB
 1     | CL_R         |  11    | CL_ABGR
 2     | CL_Rx        |  12    | CL_sRGB
 3     | CL_RG        |  13    | CL_sRGBx
 4     | CL_RGx       |  14    | CL_sRGBA
 5     | CL_RA        |  15    | CL_sBGRA
 6     | CL_RGB       |  16    | CL_INTENSITY
 7     | CL_RGBx      |  17    | CL_LUMINANCE
 8     | CL_RGBA      |  18    | CL_DEPTH
 9     | CL_BGRA      |  19    | CL_DEPTH_STENCIL

Before looking up the table, the value should be masked: (value&0x1f).
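
The two lookups can be sketched together as follows. The function name `image_format` and the sample data are hypothetical; the sketch assumes that dword indices are zero-based (so dwords 8 and 9 directly follow the 8-dword image resource) and that dwords are little-endian.

```python
import struct

# Index in each list equals the value in the corresponding table above.
CHANNEL_DATA_TYPES = [
    "CL_SNORM_INT8", "CL_SNORM_INT16", "CL_UNORM_INT8", "CL_UNORM_INT16",
    "CL_UNORM_INT24", "CL_UNORM_SHORT_555", "CL_UNORM_SHORT_565",
    "CL_UNORM_INT_101010", "CL_SIGNED_INT8", "CL_SIGNED_INT16",
    "CL_SIGNED_INT32", "CL_UNSIGNED_INT8", "CL_UNSIGNED_INT16",
    "CL_UNSIGNED_INT32", "CL_HALF_FLOAT", "CL_FLOAT",
]
CHANNEL_ORDERS = [
    "CL_A", "CL_R", "CL_Rx", "CL_RG", "CL_RGx", "CL_RA", "CL_RGB", "CL_RGBx",
    "CL_RGBA", "CL_BGRA", "CL_ARGB", "CL_ABGR", "CL_sRGB", "CL_sRGBx",
    "CL_sRGBA", "CL_sBGRA", "CL_INTENSITY", "CL_LUMINANCE", "CL_DEPTH",
    "CL_DEPTH_STENCIL",
]


def image_format(image_data):
    """Return (channel data type, channel order) names for an image.

    image_data holds the 8-dword image resource followed by the image
    information. Assumption: zero-based, little-endian dwords.
    """
    dwords = struct.unpack_from("<10I", image_data, 0)
    data_type = dwords[8] & 0xF    # mask before the data type lookup
    order = dwords[9] & 0x1F       # mask before the channel order lookup
    # The mask allows values up to 31, but the table only defines 0-19.
    order_name = CHANNEL_ORDERS[order] if order < len(CHANNEL_ORDERS) else None
    return CHANNEL_DATA_TYPES[data_type], order_name


# Made-up data: empty resource, extra high bits set to show the masking.
sample = struct.pack("<10I", *([0] * 8), 0x10F, 0x28)
print(image_format(sample))
```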

### Sampler arguments

Samplers are passed via pointers. A sampler pointer points to the sampler resource.

### Scratch buffer access

The first four scalar registers hold the scratch buffer descriptor. Refer to
[GCN Machine State](GcnState) to learn about the initial vector and scalar registers.

### Flat access

By default, FLAT instructions read or write values in main memory.
Generic addressing (usegeneric) allows access to the LDS and the scratch buffer through
FLAT instructions. The following rules describe how to set up that mechanism correctly.
Registers S[6-7] hold a special buffer that contains the LDS and scratch buffer base addresses for
FLAT instructions.
Dword 16 of that buffer holds bits 32-63 of the LDS base address for FLAT instructions.
Dword 17 of that buffer holds bits 32-63 of the scratch buffer base address for
FLAT instructions.
Register S10 holds the base scratch buffer offset for FLAT_SCRATCH. Register S11 holds
the size of scratch per thread (for FLAT_SCRATCH).
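
As a rough host-side illustration of the two dwords described above, the following sketch rebuilds the 64-bit base addresses from a copy of that buffer. The name `flat_bases` and the sample values are made up, little-endian dwords are assumed, and the low 32 bits of both base addresses are assumed to be zero, which the text does not state.

```python
import struct


def flat_bases(buf):
    """Return (lds_base, scratch_base) used by FLAT instructions.

    buf is a copy of the special buffer; dwords 16 and 17 hold bits 32-63
    of the respective base addresses.
    Assumption: bits 0-31 of both base addresses are zero.
    """
    lds_hi, scratch_hi = struct.unpack_from("<2I", buf, 16 * 4)
    return lds_hi << 32, scratch_hi << 32


# Made-up buffer: 16 zero dwords followed by the two high halves.
sample = bytes(64) + struct.pack("<2I", 0x1, 0x2)
lds_base, scratch_base = flat_bases(sample)
print(hex(lds_base), hex(scratch_base))
```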