AMD Catalyst OpenCL 2.0 ABI description
This chapter describes how a kernel receives its arguments and how it accesses constant data.
In this chapter, sizes are given in dwords. A dword is a 4-byte value.
Passing options
The CLRX assembler lets the kernel configuration select which features the kernel uses. The following features can be enabled:
- usesetup - use sizes information. Adds the kernel setup and sizes buffer to the user data registers.
- useargs - the kernel uses arguments. Adds the kernel arguments to the user data registers.
- useenqueue - enables enqueue mechanism support
- usegeneric - enables generic pointers support
The number of user data registers depends on the set of enabled features. The following rules apply:
- if no feature is enabled, only 4 user data registers are used.
- if useargs is enabled, 6 user data registers are used. User data registers 4-5 hold the argument pointer.
- if usesetup is enabled, 8 user data registers are used. User data registers 4-5 hold the kernel setup pointer and registers 6-7 hold the argument pointer.
- if useenqueue is enabled, 10 user data registers are used. User data registers 4-5 hold the kernel setup pointer and registers 6-7 hold the argument pointer.
- if usegeneric is enabled, 12 user data registers are used. User data registers 4-5 hold the kernel setup pointer and registers 6-7 hold the argument pointer.
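The rules above can be sketched as a small lookup. This is only an illustration: the function name `user_data_count` is hypothetical, assigning the 12-register case to usegeneric is an assumption of this sketch, and so is taking the largest count when several features are enabled at once.

```python
# Register counts per feature, following the list above.
# ASSUMPTION: the 12-register case belongs to usegeneric.
USER_DATA_REGS = {
    "useargs": 6,
    "usesetup": 8,
    "useenqueue": 10,
    "usegeneric": 12,
}
BASE_USER_DATA_REGS = 4  # no feature enabled


def user_data_count(features):
    """Return the assumed number of user data registers for a feature set.

    ASSUMPTION: with several features enabled, the largest count wins.
    """
    count = BASE_USER_DATA_REGS
    for name in features:
        if name not in USER_DATA_REGS:
            raise ValueError(f"unknown feature: {name}")
        count = max(count, USER_DATA_REGS[name])
    return count
```

For example, `user_data_count(["useargs", "usesetup"])` yields 8 under these assumptions.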
Argument passing and kernel setup
The first pointer present in the user data registers is the kernel setup pointer. It points to the setup buffer, which holds the kernel execution setup. The buffer contains the following dwords:
- dword 0 - general setup. Bits 16-31 - number of dimensions
- dword 1 - enqueued local size ??. Bits 0-15 - local size X, bits 16-31 - local size Y
- dword 2 - enqueued local size. Bits 0-15 - local size Z
- dwords 3-5 - global size for each dimension
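The bit fields listed above can be unpacked as in the following sketch. It works on a host-side copy of the setup buffer; the function name `decode_kernel_setup` and the little-endian byte order are assumptions, and bits not described above are ignored.

```python
import struct


def decode_kernel_setup(buf):
    """Decode the first six dwords of a setup buffer copy.

    ASSUMPTION: dwords are stored little-endian.
    """
    d = struct.unpack_from("<6I", buf, 0)
    return {
        # dword 0, bits 16-31: number of dimensions
        "dims": d[0] >> 16,
        # dword 1: X in bits 0-15, Y in bits 16-31; dword 2: Z in bits 0-15
        "local_size": (d[1] & 0xFFFF, d[1] >> 16, d[2] & 0xFFFF),
        # dwords 3-5: global size per dimension
        "global_size": (d[3], d[4], d[5]),
    }
```

A buffer built with `struct.pack("<6I", 2 << 16, 64 | (4 << 16), 1, 1024, 512, 1)` decodes to 2 dimensions, local size (64, 4, 1) and global size (1024, 512, 1).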
The second pointer is the argument pointer. It points to the argument buffer. The first arguments in that buffer are the setup arguments:
- ulong global_offset_0 - 64-bit global offset for X
- ulong global_offset_1 - 64-bit global offset for Y
- ulong global_offset_2 - 64-bit global offset for Z
Further arguments in that buffer are the user arguments defined for the kernel. Any pointer, command queue, image, sampler or structure takes 8 bytes (a 64-bit pointer). A 3-component vector takes as many bytes as a 4-element vector. Smaller types (such as char and short) take 1-3 bytes. Alignment depends on the type itself or, for vectors, on the element type.
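As a rough illustration of this layout, the sketch below computes byte offsets for user arguments. It assumes that user arguments start directly after the three 8-byte global offsets (byte 24) and that each argument is placed at the next multiple of its alignment; neither detail is confirmed here. The names `layout_user_args` and `align_up` are hypothetical.

```python
SETUP_ARGS_SIZE = 3 * 8  # three ulong global offsets


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_user_args(args):
    """Map argument names to byte offsets in the argument buffer.

    args is a list of (name, size, alignment) tuples, all in bytes.
    ASSUMPTION: user arguments follow the setup arguments immediately.
    """
    offset = SETUP_ARGS_SIZE
    offsets = {}
    for name, size, alignment in args:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets
```

For instance, an `int`, a pointer, a `char` and a `float3` (16 bytes, aligned to its 4-byte element) would be described as `[("n", 4, 4), ("buf", 8, 8), ("c", 1, 1), ("v", 16, 4)]`.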
Image arguments
Images are passed via pointers in the argument buffer. An image pointer points to the image resource and the image information. The image resource takes 8 dwords. Dword 8 holds the channel data type. The following table lists the channel data type values and their OpenCL counterparts:
Value | OpenCL value | Value | OpenCL value |
---|---|---|---|
0 | CL_SNORM_INT8 | 8 | CL_SIGNED_INT8 |
1 | CL_SNORM_INT16 | 9 | CL_SIGNED_INT16 |
2 | CL_UNORM_INT8 | 10 | CL_SIGNED_INT32 |
3 | CL_UNORM_INT16 | 11 | CL_UNSIGNED_INT8 |
4 | CL_UNORM_INT24 | 12 | CL_UNSIGNED_INT16 |
5 | CL_UNORM_SHORT_555 | 13 | CL_UNSIGNED_INT32 |
6 | CL_UNORM_SHORT_565 | 14 | CL_HALF_FLOAT |
7 | CL_UNORM_INT_101010 | 15 | CL_FLOAT |
Before looking up the table, the value should be masked: (value&0xf).
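A minimal sketch of this lookup, with the table transcribed from above. The function name `channel_data_type` is hypothetical, and the caller is assumed to have already read dword 8 of the image data.

```python
# Index = masked value, entry = OpenCL channel data type name.
CHANNEL_DATA_TYPES = [
    "CL_SNORM_INT8", "CL_SNORM_INT16", "CL_UNORM_INT8", "CL_UNORM_INT16",
    "CL_UNORM_INT24", "CL_UNORM_SHORT_555", "CL_UNORM_SHORT_565",
    "CL_UNORM_INT_101010", "CL_SIGNED_INT8", "CL_SIGNED_INT16",
    "CL_SIGNED_INT32", "CL_UNSIGNED_INT8", "CL_UNSIGNED_INT16",
    "CL_UNSIGNED_INT32", "CL_HALF_FLOAT", "CL_FLOAT",
]


def channel_data_type(dword8):
    """Mask the dword with 0xf and return the OpenCL name from the table."""
    return CHANNEL_DATA_TYPES[dword8 & 0xF]
```

Because the mask keeps 4 bits and the table has 16 entries, every masked value has an entry.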
Likewise, dword 9 holds the channel order. The following table lists its values and their OpenCL counterparts:
Value | OpenCL value | Value | OpenCL value |
---|---|---|---|
0 | CL_A | 10 | CL_ARGB |
1 | CL_R | 11 | CL_ABGR |
2 | CL_Rx | 12 | CL_sRGB |
3 | CL_RG | 13 | CL_sRGBx |
4 | CL_RGx | 14 | CL_sRGBA |
5 | CL_RA | 15 | CL_sBGRA |
6 | CL_RGB | 16 | CL_INTENSITY |
7 | CL_RGBx | 17 | CL_LUMINANCE |
8 | CL_RGBA | 18 | CL_DEPTH |
9 | CL_BGRA | 19 | CL_DEPTH_STENCIL |
Before looking up the table, the value should be masked: (value&0x1f).
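The channel order lookup can be sketched the same way. The mask keeps 5 bits, so values 20-31 fall outside the table; returning `None` for them is a choice made for this sketch, not documented behaviour. The function name `channel_order` is hypothetical.

```python
# Index = masked value, entry = OpenCL channel order name.
CHANNEL_ORDERS = [
    "CL_A", "CL_R", "CL_Rx", "CL_RG", "CL_RGx",
    "CL_RA", "CL_RGB", "CL_RGBx", "CL_RGBA", "CL_BGRA",
    "CL_ARGB", "CL_ABGR", "CL_sRGB", "CL_sRGBx", "CL_sRGBA",
    "CL_sBGRA", "CL_INTENSITY", "CL_LUMINANCE", "CL_DEPTH",
    "CL_DEPTH_STENCIL",
]


def channel_order(dword9):
    """Mask the dword with 0x1f and return the OpenCL name, if listed."""
    index = dword9 & 0x1F
    if index >= len(CHANNEL_ORDERS):
        return None  # values 20-31 are not in the table
    return CHANNEL_ORDERS[index]
```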
Sampler arguments
Samplers are passed via pointers. A sampler pointer points to the sampler resource.
Scratch buffer access
The first four scalar registers hold the scratch buffer descriptor. The register s[n+enabled_dims+tgsize] holds the wavefront offset into the scratch buffer, where n is userdatanum, enabled_dims is the number of enabled dimensions, and tgsize is 1 if tgsize is enabled and 0 otherwise.
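The register index from the formula above can be computed as in this short sketch. The function name `scratch_offset_sgpr` is hypothetical; the parameters mirror the symbols used in the text.

```python
def scratch_offset_sgpr(userdatanum, enabled_dims, tgsize_enabled):
    """Return the index of the SGPR holding the wavefront scratch offset.

    Implements n + enabled_dims + tgsize, where tgsize is 1 when enabled.
    """
    tgsize = 1 if tgsize_enabled else 0
    return userdatanum + enabled_dims + tgsize
```

With 8 user data registers, 2 enabled dimensions and tgsize enabled, the offset would be found in s11.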