Changeset 3568 in CLRX
Timestamp: Dec 29, 2017, 7:17:27 AM
Location: CLRadeonExtender/trunk/doc
Files: 5 edited

CLRadeonExtender/trunk/doc/AmdAbi.md
r2331 r3568
     7     7  ### User data classes
     8     8
-    9        User data is stored in first scalar registers. Data class indicates what data are stored.
+          9  User data is stored in first scalar registers. Data class indicates what a data are stored.
    10    10  Following data classes:
    11    11
     …     …
    63    63  Second const buffer (id=1) holds arguments aligned to 4 dwords.
    64    64
-   65        Global pointers holds vector offset (64bit for 64bit binary) to memory.
+         65  Global pointers holds vector offset (64bit for 64bit binary) to the memory.
    66    66  Local pointers holds its offset in bytes (1 dword).
    67    67
CLRadeonExtender/trunk/doc/AmdCl2Abi.md
r3552 r3568
     3     3  This chapter describes how kernel gets its argument, how access to constant data. Because
     4     4  Kernel setup is AMD HSA configuration, hence we recommend to refer to ROCmABI documentation
-    5        to get information about kernel setup and kernel arguments passing. Now assembler have
+          5  to get information about kernel setup and kernel arguments passing. Now an assembler have
     6     6  all the AMD HSA configuration's pseudoops to do it.
     7     7
     …     …
    19    19  * usegeneric  enable generic pointers support
    20    20
-   21        Number of user data registers depends on set of an enabled features. Following rules will
+         21  The number of user data registers depends on set of an enabled features. Following rules will
    22    22  be applied:
    23    23
     …     …
    55    55  * void* aqlwrap_pointer  32bit or 64bit
    56    56
-   57        Further arguments in that buffer are an user arguments defined for kernel. Any pointer,
+         57  Further arguments in that buffer are an user arguments defined for a kernel. Any pointer,
    58    58  command queue, image, sampler, structure tooks 8 bytes (64bit pointer) or
    59    59  4 bytes (32bit pointer) in 32bit AMD OpenCL 2.0.
    60    60  3 component vector tooks number of bytes of 4 element vector.
-   61        Smaller types likes (char, short) tooks 13 bytes. Alignment depends on same type
+         61  Smaller types likes (char, short) tooks 13 bytes. An alignment depends on same type
    62    62  or type of element (for vectors).
    63    63
CLRadeonExtender/trunk/doc/GalliumAbi.md
r3263 r3568
    14    14  * 68  local size for each dimension
    15    15
-   16        Argument griddim holds number of dimensions. Argument gridoffset holds 3 values of the
+         16  An argument griddim holds number of dimensions. Argument gridoffset holds 3 values of the
    17    17  global offset.
    18    18
     …     …
    24    24  ### Other data and resources
    25    25
-   26        Section '.rodata' ('.globaldata') hold constant data for kernels.
-   27        Constant data is placed after code of kernels. Use PC pointer to get this data.
+         26  The section '.rodata' ('.globaldata') hold constant data for kernels.
+         27  The constant data is placed after code of kernels. Use PC pointer to get this data.
    28    28
    29    29  ## Gallium ABI description AMDHSA
     …     …
    38    38  * 13  global offsets for each dimensions
    39    39
-   40        Local sizes and other kernel setup is in memory which address is stored in s[4:5].
+         40  Local sizes and other kernel setup is in the memory which address is stored in s[4:5].
    41    41  List of data (number is dword offset after kernel argument):
    42    42
CLRadeonExtender/trunk/doc/GcnInstrsVop1.md
r3501 r3568
   269   269  Opcode VOP3A: 389 (0x185) for GCN 1.2
   270   270  Syntax: V_CEIL_F16 VDST, SRC0
-  271        Description: Truncate half floating point valu from SRC0 with rounding to positive infinity
+        271  Description: Truncate half floating point value from SRC0 with rounding to positive infinity
   272   272  (ceilling), and store result to VDST. Implemented by flooring.
   273   273  If SRC0 is infinity or NaN then copy SRC0 to VDST.
     …     …
   285   285  Opcode VOP3A: 418 (0x1a2) for GCN 1.0/1.1; 349 (0x15d) for GCN 1.2
   286   286  Syntax: V_CEIL_F32 VDST, SRC0
-  287        Description: Truncate floating point valu from SRC0 with rounding to positive infinity
+        287  Description: Truncate floating point value from SRC0 with rounding to positive infinity
   288   288  (ceilling), and store result to VDST. Implemented by flooring.
   289   289  If SRC0 is infinity or NaN then copy SRC0 to VDST.
     …     …
   301   301  Opcode VOP3A: 408 (0x198) for GCN 1.1; 344 (0x158) for GCN 1.2
   302   302  Syntax: V_CEIL_F64 VDST(2), SRC0(2)
-  303        Description: Truncate double floating point valu from SRC0 with rounding to
+        303  Description: Truncate double floating point value from SRC0 with rounding to
   304   304  positive infinity (ceilling), and store result to VDST. Implemented by flooring.
   305   305  If SRC0 is infinity or NaN then copy SRC0 to VDST.
     …     …
   969   969  Opcode VOP3A: 422 (0x1a6) for GCN 1.0/1.1
   970   970  Syntax: V_LOG_CLAMP_F32 VDST, SRC0
-  971        Description: Approximate logarithm of base 2 from floating point value SRC0 with
+        971  Description: Approximate logarithm of the base 2 from floating point value SRC0 with
   972   972  clamping infinities to MAX_FLOAT. Result is stored in VDST.
   973   973  If SRC0 is negative then store NaN to VDST. This instruction doesn't handle denormalized
     …     …
   993   993  Opcode VOP3A: 384 (0x180) for GCN 1.2
   994   994  Syntax: V_LOG_F16 VDST, SRC0
-  995        Description: Approximate logarithm of base 2 from half floating point value SRC0, and store
-  996        result to VDST. If SRC0 is negative then store NaN to VDST.
+        995  Description: Approximate logarithm of the base 2 from half floating point value SRC0,
+        996  and store result to VDST. If SRC0 is negative then store NaN to VDST.
   997   997  Operation:
   998   998  ```
     …     …
  1011  1011  Opcode VOP3A: 423 (0x1a7) for GCN 1.0/1.1; 353 (0x161) for GCN 1.2
  1012  1012  Syntax: V_LOG_F32 VDST, SRC0
- 1013        Description: Approximate logarithm of base 2 from floating point value SRC0, and store
+       1013  Description: Approximate logarithm of base the 2 from floating point value SRC0, and store
  1014  1014  result to VDST. If SRC0 is negative then store NaN to VDST.
  1015  1015  This instruction doesn't handle denormalized values regardless FLOAT MODE register setup.
     …     …
  1030  1030  Opcode VOP3A: 453 (0x1c5) for GCN 1.1; 396 (0x18c) for GCN 1.2
  1031  1031  Syntax: V_LOG_LEGACY_F32 VDST, SRC0
- 1032        Description: Approximate logarithm of base 2 from floating point value SRC0, and store
+       1032  Description: Approximate logarithm of the base 2 from floating point value SRC0, and store
  1033  1033  result to VDST. If SRC0 is negative then store NaN to VDST.
  1034  1034  This instruction doesn't handle denormalized values regardless FLOAT MODE register setup.
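Reviewer note: the V_CEIL_* and V_LOG_* descriptions touched in GcnInstrsVop1.md can be checked against a small host-side model. The Python sketch below is only an illustration of the wording in the changed lines (ceiling implemented by flooring, infinity and NaN copied through, negative logarithm input giving NaN). The function names `v_ceil_f32` and `v_log_f32` are hypothetical helpers, not CLRX code. The result for a zero logarithm input is an assumption, since the changed lines do not state it, and the model ignores single-precision rounding, signed zero, and denormal handling.

```python
import math


def v_ceil_f32(src0: float) -> float:
    """Ceiling computed via floor, following the V_CEIL_* description."""
    # Infinity and NaN are copied to the destination unchanged.
    if math.isinf(src0) or math.isnan(src0):
        return src0
    floored = float(math.floor(src0))
    # Assumption: a non-integral input is bumped by one after flooring.
    if floored != src0:
        floored += 1.0
    return floored


def v_log_f32(src0: float) -> float:
    """Base-2 logarithm; a negative input yields NaN."""
    if math.isnan(src0) or src0 < 0.0:
        return math.nan
    if src0 == 0.0:
        # Assumption: zero maps to -infinity; the changeset text does not say.
        return -math.inf
    return math.log2(src0)
```

Calling `v_ceil_f32(-1.5)` exercises the floor-then-adjust path, and `v_log_f32(-1.0)` exercises the NaN rule.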
CLRadeonExtender/trunk/doc/GcnOperands.md
r3469 r3568
     2     2
     3     3  The GCN1.0/1.1 delivers maximum 104 registers (with VCC). Basic list of destination
-    4        scalar operands have 128 entries. Source operands codes is in range 0-255.
+          4  scalar operands have 128 entries. The source operands codes is in range 0-255.
     5     5
     6     6  **Important**: Two SGPR's must be aligned to 2. Four or more SGPR's must be aligned to 4.
-    7        This rule do not apply to vector instruction where is more complex rule:
-    8        SGPR's can be unaligned only if SGPR register range do not cross line (4 SGPR registers).
+          7  This rule do not apply to the vector instruction where is more complex rule:
+          8  SGPR's can be unaligned only if SGPR register range do not cross a line (4 SGPR registers).
     9     9
    10    10  Following list describes all operand codes values:
     …     …
    70    70  ### Operand syntax
    71    71
-   72        Single operands can be given by their name: `s0`, `v54`. CLRX assemblers accepts syntax with
+         72  THe Single operands can be given by their name: `s0`, `v54`.
+         73  CLRX assembler accepts the syntax with
    73    74  brackets: `s[0]`, `s[z]`, `v[66]`. In many instructions operands are
    74    75  64bit, 96bit or even 128bit. These operands consists several registers that can be
     …     …
    76    77  last register's number.
    77    78
-   78        Names of the registers are case-insensitive.
+         79  The names of the registers are case-insensitive.
    79    80
-   80        Constant values are automatically resolved if expression have already value.
+         81  The constant values are automatically resolved if an expression have already value.
    81    82  The 1/(2*PI), 1.0, 2.0 and other floating point constant values will be
    82    83  resolved if that accurate floating point value will be given.
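Reviewer note: the SGPR alignment rule and the operand syntax described in GcnOperands.md can be sketched as a small checker. This is a hedged Python illustration of the documented rule, not the CLRX assembler's implementation. `parse_reg` and `sgpr_range_ok` are hypothetical names, symbolic forms such as `s[z]` are not handled, and the treatment of a 3-register range (aligned like 4 or more) is an assumption because the text only covers two and four or more registers.

```python
import re

# Accepts s0, v54, s[0], v[66] and ranges such as s[4:5] (numeric forms only).
_REG_RE = re.compile(r"^([sv])(?:(\d+)|\[(\d+)(?::(\d+))?\])$", re.IGNORECASE)


def parse_reg(text):
    """Return (kind, first, count) for a register operand string."""
    m = _REG_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported operand: {text!r}")
    kind = m.group(1).lower()  # register names are case-insensitive
    if m.group(2) is not None:
        first = last = int(m.group(2))
    else:
        first = int(m.group(3))
        last = int(m.group(4)) if m.group(4) is not None else first
    if last < first:
        raise ValueError(f"empty register range: {text!r}")
    return kind, first, last - first + 1


def sgpr_range_ok(first, count, vector_instr=False):
    """Check the SGPR alignment rule for a range of `count` registers."""
    if count <= 1:
        return True
    # Two SGPRs align to 2; four or more align to 4.
    # Assumption: a 3-register range is treated like four or more.
    align = 2 if count == 2 else 4
    aligned = first % align == 0
    if not vector_instr:
        return aligned
    # Vector instructions: unaligned is allowed if the range stays in one 4-register line.
    same_line = first // 4 == (first + count - 1) // 4
    return aligned or same_line
```

For example, `parse_reg("s[4:5]")` gives a range that `sgpr_range_ok(4, 2)` accepts, while `s[3:4]` is rejected for both scalar and vector instructions because it is unaligned and crosses a line boundary.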