8 | | supports only these binaries. |
9 | | |
10 | | ## Binary format |
11 | | |
12 | | The binary format consists of the kernel information and the main binary in the ELF format. |
13 | | The main `.text` section contains the code for all kernels. Optionally, |
14 | | the `.rodata` section contains constant global data for all kernels. |
15 | | The main binary holds the kernel configuration (ProgInfo) in the `.AMDGPU.config` section. |
| 7 | supports only these binaries.</p> |
| 8 | <h2>Binary format</h2> |
| 9 | <p>The binary format consists of the kernel information and the main binary in the ELF format. |
| 10 | The main <code>.text</code> section contains the code for all kernels. Optionally, |
| 11 | the <code>.rodata</code> section contains constant global data for all kernels. |
| 12 | The main binary holds the kernel configuration (ProgInfo) in the <code>.AMDGPU.config</code> section. |
17 | | floating point setup, register usage, local data usage and the rest. |
18 | | |
19 | | The assembler source code is divided into three parts: |
20 | | |
21 | | * kernel configuration |
22 | | * kernel constant data (in `.rodata` section) |
23 | | * kernel code (in `.text` section) |
24 | | |
25 | | The order of these parts does not matter. |
26 | | |
27 | | Kernel functions should be aligned to a 256-byte boundary. |
28 | | |
29 | | ## List of the specific pseudo-operations |
30 | | |
31 | | ### .arg |
32 | | |
33 | | Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]] |
34 | | |
35 | | Adds a kernel argument definition. Must be inside the argument configuration. |
36 | | The first argument is the type: |
37 | | |
38 | | * scalar - scalar value |
39 | | * constant - constant pointer (32-bit ???) |
40 | | * global - global pointer (64-bit) |
41 | | * local - local pointer |
42 | | * image2d_rdonly - ?? |
43 | | * image2d_wronly - ?? |
44 | | * image3d_rdonly - ?? |
45 | | * image3d_wronly - ?? |
46 | | * sampler - ?? |
47 | | * griddim - shortcut for griddim argument definition |
48 | | * gridoffset - shortcut for gridoffset argument definition |
49 | | |
50 | | The second argument is the size of the argument. The third argument is targetSize which |
| 14 | floating point setup, register usage, local data usage and the rest.</p> |
| 15 | <p>The assembler source code is divided into three parts:</p> |
| 16 | <ul> |
| 17 | <li>kernel configuration</li> |
| 18 | <li>kernel constant data (in <code>.rodata</code> section)</li> |
| 19 | <li>kernel code (in <code>.text</code> section)</li> |
| 20 | </ul> |
| 21 | <p>The order of these parts does not matter.</p> |
| 22 | <p>Kernel functions should be aligned to a 256-byte boundary.</p> |
| 23 | <h2>List of the specific pseudo-operations</h2> |
| 24 | <h3>.arg</h3> |
| 25 | <p>Syntax: .arg ARGTYPE, SIZE[, TARGETSIZE[, ALIGNMENT[, NUMEXT[, SEMANTIC]]]]</p> |
| 26 | <p>Adds a kernel argument definition. Must be inside the argument configuration. |
| 27 | The first argument is the type:</p> |
| 28 | <ul> |
| 29 | <li>scalar - scalar value</li> |
| 30 | <li>constant - constant pointer (32-bit ???)</li> |
| 31 | <li>global - global pointer (64-bit)</li> |
| 32 | <li>local - local pointer</li> |
| 33 | <li>image2d_rdonly - ??</li> |
| 34 | <li>image2d_wronly - ??</li> |
| 35 | <li>image3d_rdonly - ??</li> |
| 36 | <li>image3d_wronly - ??</li> |
| 37 | <li>sampler - ??</li> |
| 38 | <li>griddim - shortcut for griddim argument definition</li> |
| 39 | <li>gridoffset - shortcut for gridoffset argument definition</li> |
| 40 | </ul> |
| 41 | <p>The second argument is the size of the argument. The third argument is targetSize which |
53 | | `sext` - sign extend, `zext` - zero extend. If the argument is smaller than 4 bytes, |
54 | | then `sext` can be used to define a signed integer and `zext` an unsigned integer. |
55 | | The sixth argument is the semantic: |
56 | | |
57 | | * general - general argument |
58 | | * griddim - griddim argument |
59 | | * gridoffset - gridoffset argument |
60 | | * imgsize - image size |
61 | | * imgformat - image format |
62 | | |
63 | | Example argument definition: |
64 | | |
65 | | ``` |
66 | | .arg scalar, 4, 4, 4, zext, general |
| 44 | <code>sext</code> - sign extend, <code>zext</code> - zero extend. If the argument is smaller than 4 bytes, |
| 45 | then <code>sext</code> can be used to define a signed integer and <code>zext</code> an unsigned integer. |
| 46 | The sixth argument is the semantic:</p> |
| 47 | <ul> |
| 48 | <li>general - general argument</li> |
| 49 | <li>griddim - griddim argument</li> |
| 50 | <li>gridoffset - gridoffset argument</li> |
| 51 | <li>imgsize - image size</li> |
| 52 | <li>imgformat - image format</li> |
| 53 | </ul> |
| 54 | <p>Example argument definition:</p> |
| 55 | <p><code>.arg scalar, 4, 4, 4, zext, general |
69 | | .arg scalar, 4, 4, 4, zext, gridoffset # shortcut .arg gridoffset |
70 | | ``` |
71 | | |
72 | | The last two arguments (griddim, gridoffset) must be defined in every kernel definition. |
73 | | |
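The optional fields of `.arg` can be illustrated with a short sketch. The kernel name `example` is hypothetical, and the target size and alignment of 4 chosen for the 2-byte scalar are assumptions made for illustration, not values taken from a real kernel:

```
.kernel example                             # hypothetical kernel name
    .args
        .arg global, 8                      # global pointer (64-bit)
        .arg scalar, 2, 4, 4, sext, general # 2-byte signed scalar, sign-extended
        .arg griddim                        # shortcut form
        .arg gridoffset                     # shortcut form
```

The two shortcut lines supply the griddim and gridoffset arguments that every kernel definition needs.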
74 | | ### .args |
75 | | |
76 | | Opens the kernel argument configuration. Must be inside a kernel. |
77 | | |
78 | | ### .config |
79 | | |
80 | | Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be |
81 | | defined if the proginfo configuration was defined (by using `.proginfo`). |
82 | | The following pseudo-ops can be used inside the kernel configuration: |
83 | | |
84 | | * .dims DIMS - chooses the dimensions used by the kernel function. Can be: x, y, z. |
85 | | * .floatmode VALUE - chooses the float mode for the kernel (byte value). |
86 | | Default value is 0xc0. |
87 | | * .ieeemode - chooses IEEE mode for the kernel |
88 | | * .localsize SIZE - initial local data size for the kernel in bytes |
89 | | * .pgmrsrc2 VALUE - value of PGMRSRC2 (only the bits that are not set by other pseudo-ops) |
90 | | * .priority VALUE - sets the priority for the kernel (0-3). Default value is 0. |
91 | | * .scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0. |
92 | | * .sgprsnum NUMBER - number of SGPR registers used by the kernel (excluding VCC, FLAT_SCRATCH). |
93 | | By default, automatically computed by the assembler. |
94 | | * .vgprsnum NUMBER - number of VGPR registers used by the kernel. |
95 | | By default, automatically computed by the assembler. |
96 | | * .userdatanum NUMBER - number of USERDATA registers used by the kernel (0-16). Default value is 4. |
97 | | * .tgsize - enables use of TG_SIZE_EN (we recommend always adding this) |
98 | | |
99 | | Example configuration: |
100 | | |
101 | | ``` |
102 | | .config |
| 58 | .arg scalar, 4, 4, 4, zext, gridoffset # shortcut .arg gridoffset</code></p> |
| 59 | <p>The last two arguments (griddim, gridoffset) must be defined in every kernel definition.</p> |
| 60 | <h3>.args</h3> |
| 61 | <p>Opens the kernel argument configuration. Must be inside a kernel.</p> |
| 62 | <h3>.config</h3> |
| 63 | <p>Opens the kernel configuration. Must be inside a kernel. The kernel configuration cannot be |
| 64 | defined if the proginfo configuration was defined (by using <code>.proginfo</code>). |
| 65 | The following pseudo-ops can be used inside the kernel configuration:</p> |
| 66 | <ul> |
| 67 | <li>.dims DIMS - chooses the dimensions used by the kernel function. Can be: x, y, z.</li> |
| 68 | <li>.floatmode VALUE - chooses the float mode for the kernel (byte value). |
| 69 | Default value is 0xc0.</li> |
| 70 | <li>.ieeemode - chooses IEEE mode for the kernel</li> |
| 71 | <li>.localsize SIZE - initial local data size for the kernel in bytes</li> |
| 72 | <li>.pgmrsrc2 VALUE - value of PGMRSRC2 (only the bits that are not set by other pseudo-ops)</li> |
| 73 | <li>.priority VALUE - sets the priority for the kernel (0-3). Default value is 0.</li> |
| 74 | <li>.scratchbuffer SIZE - size of scratchbuffer (???). Default value is 0.</li> |
| 75 | <li>.sgprsnum NUMBER - number of SGPR registers used by the kernel (excluding VCC, FLAT_SCRATCH). |
| 76 | By default, automatically computed by the assembler.</li> |
| 77 | <li>.vgprsnum NUMBER - number of VGPR registers used by the kernel. |
| 78 | By default, automatically computed by the assembler.</li> |
| 79 | <li>.userdatanum NUMBER - number of USERDATA registers used by the kernel (0-16). Default value is 4.</li> |
| 80 | <li>.tgsize - enables use of TG_SIZE_EN (we recommend always adding this)</li> |
| 81 | </ul> |
| 82 | <p>Example configuration:</p> |
| 83 | <p><code>.config |
104 | | .tgsize |
105 | | ``` |
106 | | |
107 | | ### .dims |
108 | | |
109 | | Syntax: .dims DIMENSIONS |
110 | | |
111 | | This pseudo-op must be inside the kernel configuration (`.config`). Defines which dimensions |
112 | | (from the list: x, y, z) are used to determine the kernel's execution space. |
113 | | |
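A minimal sketch of `.dims` inside a kernel configuration follows. It assumes the dimension letters are written together as one word (here `xy` for a two-dimensional kernel); treat that spelling as an assumption rather than a confirmed rule:

```
.config
    .dims xy    # assumed spelling: use the x and y dimensions
```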
114 | | ### .entry |
115 | | |
116 | | Syntax: .entry ADDRESS, VALUE |
117 | | |
118 | | Adds a proginfo entry. Must be inside the proginfo configuration. Sample proginfo: |
119 | | |
120 | | ``` |
121 | | .entry 0x0000b848, 0x000c0080 |
| 85 | .tgsize</code></p> |
| 86 | <h3>.dims</h3> |
| 87 | <p>Syntax: .dims DIMENSIONS</p> |
| 88 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Defines which dimensions |
| 89 | (from the list: x, y, z) are used to determine the kernel's execution space.</p> |
| 90 | <h3>.entry</h3> |
| 91 | <p>Syntax: .entry ADDRESS, VALUE</p> |
| 92 | <p>Adds a proginfo entry. Must be inside the proginfo configuration. Sample proginfo:</p> |
| 93 | <p><code>.entry 0x0000b848, 0x000c0080 |
123 | | .entry 0x0000b860, 0x00000000 |
124 | | ``` |
125 | | |
126 | | ### .floatmode |
127 | | |
128 | | Syntax: .floatmode BYTE-VALUE |
129 | | |
130 | | This pseudo-op must be inside the kernel configuration (`.config`). Defines the float mode. |
131 | | |
132 | | ### .globaldata |
133 | | |
134 | | Switches to the constant global data section (`.rodata`). |
135 | | |
136 | | ### .ieeemode |
137 | | |
138 | | Syntax: .ieeemode |
139 | | |
140 | | This pseudo-op must be inside the kernel configuration (`.config`). Sets IEEE mode. |
141 | | |
142 | | ### .kcode |
143 | | |
144 | | Syntax: .kcode KERNEL1,.... |
145 | | Syntax: .kcode + |
146 | | |
147 | | Opens code that belongs to the specified kernels. By default any code between |
| 95 | .entry 0x0000b860, 0x00000000</code></p> |
| 96 | <h3>.floatmode</h3> |
| 97 | <p>Syntax: .floatmode BYTE-VALUE</p> |
| 98 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Defines the float mode.</p> |
| 99 | <h3>.globaldata</h3> |
| 100 | <p>Switches to the constant global data section (<code>.rodata</code>).</p> |
| 101 | <h3>.ieeemode</h3> |
| 102 | <p>Syntax: .ieeemode</p> |
| 103 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Sets IEEE mode.</p> |
| 104 | <h3>.kcode</h3> |
| 105 | <p>Syntax: .kcode KERNEL1,....<br /> |
| 106 | Syntax: .kcode +</p> |
| 107 | <p>Opens code that belongs to the specified kernels. By default any code between
194 | | was defined (by using `.config`). |
195 | | |
196 | | ### .scratchbuffer |
197 | | |
198 | | Syntax: .scratchbuffer SIZE |
199 | | |
200 | | This pseudo-op must be inside the kernel configuration (`.config`). Defines the scratchbuffer size. |
201 | | |
202 | | ### .sgprsnum |
203 | | |
204 | | Syntax: .sgprsnum REGNUM |
205 | | |
206 | | This pseudo-op must be inside the kernel configuration (`.config`). Sets the number of scalar |
207 | | registers that can be used during kernel execution. |
208 | | |
209 | | ### .tgsize |
210 | | |
211 | | This pseudo-op must be inside the kernel configuration (`.config`). |
212 | | Enables use of TG_SIZE_EN. Should be set. |
213 | | |
214 | | ### .userdatanum |
215 | | |
216 | | Syntax: .userdatanum NUMBER |
217 | | |
218 | | This pseudo-op must be inside the kernel configuration (`.config`). Sets the number of |
219 | | registers for USERDATA. |
220 | | |
221 | | ### .vgprsnum |
222 | | |
223 | | Syntax: .vgprsnum REGNUM |
224 | | |
225 | | This pseudo-op must be inside the kernel configuration (`.config`). Sets the number of vector |
226 | | registers that can be used during kernel execution. |
227 | | |
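Taken together, the resource-related pseudo-ops might be combined as in the fragment below. The kernel name and the register counts are illustrative assumptions; `.userdatanum 4` and `.scratchbuffer 0` simply restate the documented defaults:

```
.kernel example          # hypothetical kernel name
    .config
        .sgprsnum 16     # illustrative value; computed by the assembler if omitted
        .vgprsnum 8      # illustrative value; computed by the assembler if omitted
        .userdatanum 4   # documented default
        .scratchbuffer 0 # documented default
        .tgsize          # recommended
```

Omitting `.sgprsnum` and `.vgprsnum` leaves both counts to the assembler.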
228 | | ## Sample code |
229 | | |
230 | | This is a sample kernel setup: |
231 | | |
232 | | ``` |
233 | | .kernel DCT |
| 137 | was defined (by using <code>.config</code>).</p> |
| 138 | <h3>.scratchbuffer</h3> |
| 139 | <p>Syntax: .scratchbuffer SIZE</p> |
| 140 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Defines the scratchbuffer size.</p> |
| 141 | <h3>.sgprsnum</h3> |
| 142 | <p>Syntax: .sgprsnum REGNUM</p> |
| 143 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Sets the number of scalar |
| 144 | registers that can be used during kernel execution.</p> |
| 145 | <h3>.tgsize</h3> |
| 146 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). |
| 147 | Enables use of TG_SIZE_EN. Should be set.</p> |
| 148 | <h3>.userdatanum</h3> |
| 149 | <p>Syntax: .userdatanum NUMBER</p> |
| 150 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Sets the number of |
| 151 | registers for USERDATA.</p> |
| 152 | <h3>.vgprsnum</h3> |
| 153 | <p>Syntax: .vgprsnum REGNUM</p> |
| 154 | <p>This pseudo-op must be inside the kernel configuration (<code>.config</code>). Sets the number of vector |
| 155 | registers that can be used during kernel execution.</p> |
| 156 | <h2>Sample code</h2> |
| 157 | <p>This is a sample kernel setup:</p> |
| 158 | <p><code>.kernel DCT |