source: CLRX/CLRadeonExtender/trunk/doc/GcnOperands.md @ 3568

Last change on this file since 3568 was 3568, checked in by matszpk, 12 months ago

CLRadeonExtender: CLRXDocs: Small updates and fixes (typo/grammar).

### Operand encoding

GCN1.0/1.1 provides at most 104 registers (including VCC). The basic list of destination
scalar operands has 128 entries. Source operand codes are in the range 0-255.

**Important**: Two SGPR's must be aligned to 2. Four or more SGPR's must be aligned to 4.
This rule does not apply to vector instructions, which follow a more complex rule:
SGPR's can be unaligned only if the SGPR register range does not cross a line (4 SGPR registers).

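The alignment rule above can be sketched as a small check. This is only an illustration, not code from the CLRX assembler: the function name `sgpr_range_ok` is hypothetical, a "line" is assumed to be an aligned group such as s[0:3] or s[4:7], and range sizes the rule does not mention (1 and 3 registers) are assumed to be unconstrained.

```python
def sgpr_range_ok(first, count, vector=False):
    """Check an SGPR range of `count` registers starting at `first`."""
    last = first + count - 1
    if count == 2:
        aligned = first % 2 == 0        # two SGPR's: aligned to 2
    elif count >= 4:
        aligned = first % 4 == 0        # four or more SGPR's: aligned to 4
    else:
        # Assumption: sizes 1 and 3 are not covered by the rule.
        aligned = True
    if aligned:
        return True
    # Vector instructions: an unaligned range is accepted only if it
    # stays inside one line of 4 SGPR's (assumed to be s[4k:4k+3]).
    return vector and first // 4 == last // 4


if __name__ == "__main__":
    print(sgpr_range_ok(3, 2))               # scalar, unaligned pair
    print(sgpr_range_ok(1, 2, vector=True))  # vector, pair inside one line
```
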
The following list describes all operand code values:

Code     | Name              | Description
---------|-------------------|------------------------
0-103    | S0 - S103         | SGPR's (GCN1.0/1.1)
0-101    | S0 - S101         | SGPR's (GCN1.2)
104-105  | FLAT_SCRATCH      | FLAT_SCRATCH register (GCN1.1)
104      | FLAT_SCRATCH_LO   | Low half of FLAT_SCRATCH register (GCN1.1)
105      | FLAT_SCRATCH_HI   | High half of FLAT_SCRATCH register (GCN1.1)
102-103  | FLAT_SCRATCH      | FLAT_SCRATCH register (GCN1.2)
102      | FLAT_SCRATCH_LO   | Low half of FLAT_SCRATCH register (GCN1.2)
103      | FLAT_SCRATCH_HI   | High half of FLAT_SCRATCH register (GCN1.2)
104-105  | XNACK_MASK        | XNACK_MASK register
104      | XNACK_MASK_LO     | Low half of XNACK_MASK register
105      | XNACK_MASK_HI     | High half of XNACK_MASK register
106-107  | VCC               | VCC (vector carry register) two last SGPR's
106      | VCC_LO            | Low half of VCC
107      | VCC_HI            | High half of VCC
108-109  | TBA               | Trap handler base address
108      | TBA_LO            | Low half of TBA register
109      | TBA_HI            | High half of TBA register
110-111  | TMA               | Pointer to data in memory used by trap handler
110      | TMA_LO            | Low half of TMA register
111      | TMA_HI            | High half of TMA register
112-123  | TTMP0 - TTMP11    | Trap handler temporary registers (GCN 1.0/1.1/1.2)
108-123  | TTMP0 - TTMP15    | Trap handler temporary registers (GCN 1.4)
124      | M0                | M0. Memory register
125      | -                 | reserved
126-127  | EXEC              | EXEC register
126      | EXEC_LO           | Low half of EXEC register
127      | EXEC_HI           | High half of EXEC register
128      | 0                 | 0
129-192  | 1-64              | 1 to 64 constant value
193-208  | -1 - -16          | -1 to -16 constant value
209-239  | -                 | reserved
235      | SRC_SHARED_BASE   | Memory aperture
236      | SRC_SHARED_LIMIT  | Memory aperture
237      | SRC_PRIVATE_BASE  | Memory aperture
238      | SRC_PRIVATE_LIMIT | Memory aperture
239      | POPS_EXITING_WAVE_ID | Primitive Ordered Pixel Shading wave ID
240      | 0.5               | 0.5 floating point value
241      | -0.5              | -0.5 floating point value
242      | 1.0               | 1.0 floating point value
243      | -1.0              | -1.0 floating point value
244      | 2.0               | 2.0 floating point value
245      | -2.0              | -2.0 floating point value
246      | 4.0               | 4.0 floating point value
247      | -4.0              | -4.0 floating point value
248      | 1/(2*PI)          | 1/(2*PI)
249      | --                | SDWA dword (GCN1.2)
250      | --                | DPP dword (GCN1.2)
251      | VCCZ              | VCCZ register
252      | EXECZ             | EXECZ register
253      | SCC               | SCC register
254      | LDS_DIRECT        | LDS direct access
254      | LDS               | LDS direct access
254      | SRC_LDS_DIRECT    | LDS direct access
255      | 255               | Literal constant (follows instruction dword)
256-511  | V0-V255           | VGPR's (only VOP3 encoding operands)

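To make the table easier to follow, here is a rough sketch of a lookup that maps a source operand code to a name. It covers only the rows that do not depend on the GCN version (S0-S101, VCC, M0, EXEC, the inline constants, the literal marker and the VGPR's) and returns `None` for everything else. The function name `describe_source_operand` is hypothetical and is not part of CLRX.

```python
FLOAT_CONSTANTS = {
    240: "0.5", 241: "-0.5", 242: "1.0", 243: "-1.0",
    244: "2.0", 245: "-2.0", 246: "4.0", 247: "-4.0",
    248: "1/(2*PI)",
}

SPECIAL_REGISTERS = {
    106: "vcc_lo", 107: "vcc_hi", 124: "m0",
    126: "exec_lo", 127: "exec_hi",
}


def describe_source_operand(code):
    """Return a name for an operand code, or None if it is not covered here."""
    if not 0 <= code <= 511:
        raise ValueError("operand code out of range: %d" % code)
    if code <= 101:
        return "s%d" % code              # SGPR's common to all listed versions
    if code in SPECIAL_REGISTERS:
        return SPECIAL_REGISTERS[code]
    if code == 128:
        return "0"
    if 129 <= code <= 192:
        return str(code - 128)           # integer constants 1 to 64
    if 193 <= code <= 208:
        return str(192 - code)           # integer constants -1 to -16
    if code in FLOAT_CONSTANTS:
        return FLOAT_CONSTANTS[code]
    if code == 255:
        return "literal"                 # value follows the instruction dword
    if code >= 256:
        return "v%d" % (code - 256)      # VGPR's
    return None                          # version-dependent or reserved code
```
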
### Operand syntax

Single operands can be given by their name: `s0`, `v54`.
The CLRX assembler also accepts the syntax with
brackets: `s[0]`, `s[z]`, `v[66]`. In many instructions operands are
64-bit, 96-bit or even 128-bit. These operands consist of several registers that can be
expressed by ranges: `v[3:4]`, `s[8:11]`, `s[16:23]`, where the second value is
the number of the last register.

The names of the registers are case-insensitive.

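The register syntax described above can be illustrated with a small parser. This is a simplified sketch, not the CLRX implementation: the name `parse_register` is hypothetical, only plain numbers are accepted inside the brackets (symbolic expressions such as `s[z]` are not handled), and named registers such as VCC are left out.

```python
import re

# s0, v54, s[0], v[66], v[3:4], s[16:23]; register names are case-insensitive
_REGISTER = re.compile(r"^([sv])(?:(\d+)|\[(\d+)(?::(\d+))?\])$", re.IGNORECASE)


def parse_register(text):
    """Return (kind, first, last) for a scalar or vector register operand."""
    match = _REGISTER.match(text.strip())
    if match is None:
        raise ValueError("not a plain register operand: %r" % text)
    kind = match.group(1).lower()
    if match.group(2) is not None:       # form without brackets: s0, v54
        first = last = int(match.group(2))
    else:                                # bracket form: s[0] or s[8:11]
        first = int(match.group(3))
        last = int(match.group(4)) if match.group(4) is not None else first
    if last < first:
        raise ValueError("last register is lower than first: %r" % text)
    return kind, first, last


if __name__ == "__main__":
    kind, first, last = parse_register("S[16:23]")
    print(kind, first, last, "registers:", last - first + 1)
```
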
Constant values are resolved automatically if the expression already has a value.
The 1/(2*PI), 1.0, -2.0 and other floating point constant values are
resolved only if the exact floating point value is given.

In the instruction syntax, operands are listed by the name of the encoding field. Optionally,
the number of registers is given in parentheses. Ranges of the number of registers have the
form 'START:LAST'. Example:

Syntax: S_SUB_I32 SDST, SSRC0, SSRC1  
Syntax: S_AND_B64 SDST(2), SSRC0(2), SSRC1(2)  
Syntax: S_AND_B64 SDST(2), SSRC0(2), SSRC1(2:4)  

### Constants and literals

There are two ways to supply an immediate value to a GCN instruction: the first is built-in
constants (both integer and floating point) and the second is a 32-bit immediate (literal).
Some encodings allow immediates of other sizes (16-bit or 12-bit).

Literals are treated differently in scalar/vector instructions and in
double floating point operands of vector instructions.
In scalar or vector instructions, if an operand is 64-bit, the literal is the exact
64-bit value (sign or zero extended). By contrast, for FP64 operands in vector instructions,
the literal is the higher 32 bits of the 64-bit value (the lower 32 bits are zero).
Unfortunately, the CLRX assembler always encodes and decodes a literal immediate as a 32-bit
value (except floating point values).
Immediate constants are always the exact value, for both 32-bit and 64-bit operands.
For example, the instructions `v_frexp_exp_i32_f64 v3, lit(45)` and
`v_frexp_exp_i32_f64 v3, 45` generate different results, because the literal and the constant
have different meanings.

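The "higher 32 bits" rule for FP64 operands can be checked numerically. The sketch below only does the bit arithmetic described above using Python's standard `struct` module; it does not call the CLRX assembler, and the helper names `fp64_high_dword` and `fp64_from_literal` are hypothetical.

```python
import struct


def fp64_high_dword(value):
    """Return the upper 32 bits of the IEEE 754 double encoding of `value`."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits >> 32


def fp64_from_literal(literal):
    """Interpret a 32-bit literal as the upper half of a double (lower half zero)."""
    bits = (literal & 0xFFFFFFFF) << 32
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    # 1.0 survives the round trip because its lower 32 bits are already zero.
    print(hex(fp64_high_dword(1.0)), fp64_from_literal(fp64_high_dword(1.0)))
    # 0.1 loses precision: its lower 32 bits are dropped.
    print(fp64_from_literal(fp64_high_dword(0.1)))
    # The integer 45 placed in the upper half is a tiny denormal, not 45.0.
    print(fp64_from_literal(45))
```

Under this reading, a literal such as `lit(45)` in an FP64 operand does not stand for the number 45, which is consistent with the example above giving different results for the literal and the constant.
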
**NOTE:** The same literals and constants give different values for a 64-bit operand in
vector instructions. To distinguish them, use the `lit()` function.

**OLD_VERSIONS**: This version of CLRadeonExtender adds the '--buggyFPLit' option to support
sources written for older versions (up to 0.1.2). Versions up to 0.1.2 handle floating
point literals and constants incorrectly due to wrong assumptions. This and later versions fix
that behaviour.

Old and buggy behaviour:

* support only half and single floating point literals (and constants)
* shorten literals to constants only for single floating point literals

New behaviour:

* support half, single and double (only higher 32-bits) floating point literals
(and constants)
* shorten literals to constants for half, single and double literals (the type depends
on the operand type)

### Hardware registers

These registers can be read or written by the S_GETREG_\* and S_SETREG_\* instructions.

List of hardware registers:

* GPR_ALLOC, HWREG_GPR_ALLOC -
* HW_ID, HWREG_HW_ID -
* IB_DBG0, HWREG_DBG0 -
* IB_STS, HWREG_IB_STS -
* INST_DW0, HWREG_INST_DW0 -
* INST_DW1, HWREG_INST_DW1 -
* LDS_ALLOC, HWREG_LDS_ALLOC -
* MODE, HWREG_MODE -
* PC_HI, HWREG_PC_HI -
* PC_LO, HWREG_PC_LO -
* STATUS, HWREG_STATUS -
* TRAPSTS, HWREG_TRAPSTS -

### LDS direct access

LDS direct access allows a VOP instruction to read LDS memory directly by supplying the
LDS, LDS_DIRECT or SRC_LDS_DIRECT keyword as the first source operand. The data from
LDS is then used in place of that operand.

M0 must hold the offset in bytes (in bits 0-15) and the format of the data (in bits 16-18).
Table of formats:

 Value | Format
-------|----------------
0      | Unsigned byte
1      | Unsigned 16-bit word
2      | Unsigned 32-bit word
3      | unused (same as 2)
4      | Signed byte
5      | Signed 16-bit word

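As an illustration of the M0 layout described above, the following sketch packs the byte offset into bits 0-15 and the format into bits 16-18. The function name `lds_direct_m0` and the short format names are hypothetical; only the numeric values come from the table.

```python
# Format values from the table above; the key names are illustrative only.
LDS_FORMATS = {"u8": 0, "u16": 1, "u32": 2, "s8": 4, "s16": 5}


def lds_direct_m0(offset, fmt):
    """Build the M0 value for an LDS direct access."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in bits 0-15")
    if not 0 <= fmt <= 7:
        raise ValueError("format must fit in bits 16-18")
    return (fmt << 16) | offset


if __name__ == "__main__":
    # Unsigned 32-bit word at byte offset 0x40
    print(hex(lds_direct_m0(0x40, LDS_FORMATS["u32"])))
```
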
An LDS direct access doesn't require `S_WAITCNT LGKMCNT(0)` (unverified, to be checked).

### Parametrizable modifiers

Many instruction modifiers can take a parameter with the value 0 or 1, which makes
modifiers easy to parametrize. A non-zero value (in versions up to 0.1.5, only the value 1)
enables the modifier and zero disables it: `tfe:0` disables the TFE modifier, `tfe:1` enables it.
The value of the parameter is an expression.
The `omod` modifier with a parameter (expression) replaces the `mul` and `div` modifiers.
The `format` in the MTBUF encoding is also parametrizable if the data and/or
number format expression is preceded by the `@` character (example: `format[@1,@4]`).
A special case is `bound_ctrl`. To parametrize bound_ctrl you must use the syntax
`bound_ctrl:0:expr` or `bound_ctrl:1:expr`.
The `abs`, `neg` and `sext` modifiers with a parameter (expression) select which
source operands get the operand modifier. The bit number in the value corresponds to the
number of the source operand. The `abs`, `neg` and `sext` modifiers also accept a binary
array of expressions like `[bit0,bit1,...]`.

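The relation between the bit numbers and the source operands can be sketched as follows. This assumes that the array form `[bit0,bit1,...]` and the single-value form describe the same bit mask, with bit 0 belonging to the first source operand; the helper name `operand_modifier_mask` is hypothetical.

```python
def operand_modifier_mask(bits):
    """Combine per-operand flags [bit0, bit1, ...] into a single mask value."""
    mask = 0
    for index, bit in enumerate(bits):
        if bit:
            mask |= 1 << index   # bit N selects source operand N
    return mask


if __name__ == "__main__":
    # Modifier applied to the first and third source operands
    print(operand_modifier_mask([1, 0, 1]))
```
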
The HW registers and send message parameters (message and GSOP) are parametrizable if
they are preceded by `@` (example: `hwreg(@5, 8, 16)`).