Changes between Version 10 and Version 11 of GcnTimings


Timestamp:
05/28/16 17:00:59 (8 years ago)
Author:
trac
Comment:

--

  • GcnTimings

    v10 v11  
    1010to execution due to bigger size in memory and limits of instruction dispatching.
    1111To achieve the best performance, we recommend using single-dword instructions.</p>
     12<p>Some tables contain the DPFACTOR term. This term indicates that the number of cycles depends
     13on the GPU model, as follows:</p>
     14<table>
     15<thead>
     16<tr>
     17<th>DPFACTOR</th>
     18<th>DP speed</th>
     19<th>GPU subfamily</th>
     20</tr>
     21</thead>
     22<tbody>
     23<tr>
     24<td>1</td>
     25<td>1/2</td>
     26<td>professional Hawaii</td>
     27</tr>
     28<tr>
     29<td>2</td>
     30<td>1/4</td>
     31<td>High-end Tahiti: Radeon HD7970</td>
     32</tr>
     33<tr>
     34<td>4</td>
     35<td>1/8</td>
     36<td>High-end Hawaii: R9 290</td>
     37</tr>
     38<tr>
     39<td>8</td>
     40<td>1/16</td>
     41<td>Other GPUs</td>
     42</tr>
     43</tbody>
     44</table>
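<p>The DPFACTOR table can be read as a simple lookup. The sketch below is illustrative only: the dictionary keys and the function names <code>dp_speed</code> and <code>dp_cycles</code> are assumptions made for this example, and the relation "DP speed = 1/(2*DPFACTOR)" is just what the four rows of the table happen to show, not a documented formula.</p>

```python
from fractions import Fraction

# DPFACTOR values copied from the table above. The keys are illustrative
# labels chosen for this sketch, not identifiers used by any tool.
DPFACTOR_BY_SUBFAMILY = {
    "professional Hawaii": 1,
    "high-end Tahiti": 2,
    "high-end Hawaii": 4,
    "other": 8,
}

def dp_speed(dpfactor):
    """DP speed as listed in the table: 1/2, 1/4, 1/8, 1/16 for DPFACTOR 1, 2, 4, 8."""
    return Fraction(1, 2 * dpfactor)

def dp_cycles(dpfactor, multiplier):
    """Cycles for a timing entry written as DPFACTOR*multiplier."""
    return dpfactor * multiplier

if __name__ == "__main__":
    tahiti = DPFACTOR_BY_SUBFAMILY["high-end Tahiti"]
    # An entry listed as DPFACTOR*8 evaluated for a high-end Tahiti GPU.
    print(dp_speed(tahiti), dp_cycles(tahiti, 8))
```

<p>Under these assumptions, an entry listed as DPFACTOR*8 evaluates to 16 cycles on a high-end Tahiti GPU and to 64 cycles on GPUs in the "other" row.</p>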
    1245<h3>Occupancy table</h3>
    1346<table>
     
    143176<h3>SOPP Instruction timings</h3>
    144177<p>A jump costs 4 cycles if it is not taken, or 20 cycles (???) if it is taken.</p>
     178<h3>VOP1 Instruction timings</h3>
     179<p>The timings of VOP1 instructions are listed in this table:</p>
     180<table>
     181<thead>
     182<tr>
     183<th>Instruction</th>
     184<th>Cycles</th>
     185<th>Instruction</th>
     186<th>Cycles</th>
     187</tr>
     188</thead>
     189<tbody>
     190<tr>
     191<td>V_BFREV_B32</td>
     192<td>4</td>
     193<td>V_FREXP_EXP_I32_F32</td>
     194<td>4</td>
     195</tr>
     196<tr>
     197<td>V_CEIL_F32</td>
     198<td>4</td>
     199<td>V_FREXP_EXP_I32_F64</td>
     200<td>DPFACTOR*4</td>
     201</tr>
     202<tr>
     203<td>V_CEIL_F64</td>
     204<td>DPFACTOR*4</td>
     205<td>V_FREXP_MANT_F32</td>
     206<td>4</td>
     207</tr>
     208<tr>
     209<td>V_CLREXCP</td>
     210<td>4</td>
     211<td>V_FREXP_MANT_F64</td>
     212<td>DPFACTOR*4</td>
     213</tr>
     214<tr>
     215<td>V_COS_F32</td>
     216<td>16</td>
     217<td>V_LOG_CLAMP_F32</td>
     218<td>16</td>
     219</tr>
     220<tr>
     221<td>V_CVT_F16_F32</td>
     222<td>4</td>
     223<td>V_LOG_F32</td>
     224<td>16</td>
     225</tr>
     226<tr>
     227<td>V_CVT_F32_F16</td>
     228<td>4</td>
     229<td>V_LOG_LEGACY_F32</td>
     230<td>16</td>
     231</tr>
     232<tr>
     233<td>V_CVT_F32_F64</td>
     234<td>DPFACTOR*4</td>
     235<td>V_MOVRELD_B32</td>
     236<td>4</td>
     237</tr>
     238<tr>
     239<td>V_CVT_F32_I32</td>
     240<td>4</td>
     241<td>V_MOVRELSD_B32</td>
     242<td>4</td>
     243</tr>
     244<tr>
     245<td>V_CVT_F32_U32</td>
     246<td>4</td>
     247<td>V_MOVRELS_B32</td>
     248<td>4</td>
     249</tr>
     250<tr>
     251<td>V_CVT_F32_UBYTE0</td>
     252<td>4</td>
     253<td>V_MOV_B32</td>
     254<td>4</td>
     255</tr>
     256<tr>
     257<td>V_CVT_F32_UBYTE1</td>
     258<td>4</td>
     259<td>V_MOV_FED_B32</td>
     260<td>4</td>
     261</tr>
     262<tr>
     263<td>V_CVT_F32_UBYTE2</td>
     264<td>4</td>
     265<td>V_NOP</td>
     266<td>4</td>
     267</tr>
     268<tr>
     269<td>V_CVT_F32_UBYTE3</td>
     270<td>4</td>
     271<td>V_NOT_B32</td>
     272<td>4</td>
     273</tr>
     274<tr>
     275<td>V_CVT_F64_F32</td>
     276<td>DPFACTOR*4</td>
     277<td>V_RCP_CLAMP_F32</td>
     278<td>16</td>
     279</tr>
     280<tr>
     281<td>V_CVT_F64_I32</td>
     282<td>DPFACTOR*4</td>
     283<td>V_RCP_CLAMP_F64</td>
     284<td>DPFACTOR*8</td>
     285</tr>
     286<tr>
     287<td>V_CVT_F64_U32</td>
     288<td>DPFACTOR*4</td>
     289<td>V_RCP_F32</td>
     290<td>16</td>
     291</tr>
     292<tr>
     293<td>V_CVT_FLR_I32_F32</td>
     294<td>4</td>
     295<td>V_RCP_F64</td>
     296<td>DPFACTOR*8</td>
     297</tr>
     298<tr>
     299<td>V_CVT_I32_F32</td>
     300<td>4</td>
     301<td>V_RCP_IFLAG_F32</td>
     302<td>16</td>
     303</tr>
     304<tr>
     305<td>V_CVT_I32_F64</td>
     306<td>DPFACTOR*4</td>
     307<td>V_RCP_LEGACY_F32</td>
     308<td>16</td>
     309</tr>
     310<tr>
     311<td>V_CVT_OFF_F32_I4</td>
     312<td>4</td>
     313<td>V_READFIRSTLANE_B32</td>
     314<td>4</td>
     315</tr>
     316<tr>
     317<td>V_CVT_RPI_I32_F32</td>
     318<td>4</td>
     319<td>V_RNDNE_F32</td>
     320<td>4</td>
     321</tr>
     322<tr>
     323<td>V_CVT_U32_F32</td>
     324<td>4</td>
     325<td>V_RNDNE_F64</td>
     326<td>DPFACTOR*4</td>
     327</tr>
     328<tr>
     329<td>V_CVT_U32_F64</td>
     330<td>DPFACTOR*4</td>
     331<td>V_RSQ_CLAMP_F32</td>
     332<td>16</td>
     333</tr>
     334<tr>
     335<td>V_EXP_F32</td>
     336<td>16</td>
     337<td>V_RSQ_CLAMP_F64</td>
     338<td>DPFACTOR*8</td>
     339</tr>
     340<tr>
     341<td>V_EXP_LEGACY_F32</td>
     342<td>16</td>
     343<td>V_RSQ_F32</td>
     344<td>16</td>
     345</tr>
     346<tr>
     347<td>V_FFBH_I32</td>
     348<td>4</td>
     349<td>V_RSQ_F64</td>
     350<td>DPFACTOR*8</td>
     351</tr>
     352<tr>
     353<td>V_FFBH_U32</td>
     354<td>4</td>
     355<td>V_RSQ_LEGACY_F32</td>
     356<td>16</td>
     357</tr>
     358<tr>
     359<td>V_FFBL_B32</td>
     360<td>4</td>
     361<td>V_SIN_F32</td>
     362<td>16</td>
     363</tr>
     364<tr>
     365<td>V_FLOOR_F32</td>
     366<td>4</td>
     367<td>V_SQRT_F32</td>
     368<td>16</td>
     369</tr>
     370<tr>
     371<td>V_FLOOR_F64</td>
     372<td>DPFACTOR*4</td>
     373<td>V_SQRT_F64</td>
     374<td>DPFACTOR*8</td>
     375</tr>
     376<tr>
     377<td>V_FRACT_F32</td>
     378<td>4</td>
     379<td>V_TRUNC_F32</td>
     380<td>4</td>
     381</tr>
     382<tr>
     383<td>V_FRACT_F64</td>
     384<td>DPFACTOR*4</td>
     385<td>V_TRUNC_F64</td>
     386<td>DPFACTOR*4</td>
     387</tr>
     388</tbody>
     389</table>
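<p>As a rough illustration of how the entries in the VOP1 table can be evaluated, the sketch below turns a few rows into numbers for a given DPFACTOR. The names <code>VOP1_CYCLES</code>, <code>entry_cycles</code> and <code>total_cycles</code> are hypothetical, only a handful of rows are copied, and adding the per-instruction values together is a simplifying assumption that ignores dependencies, stalls and dispatch limits.</p>

```python
# A few entries copied verbatim from the VOP1 table above (not the full table).
VOP1_CYCLES = {
    "V_MOV_B32": "4",
    "V_SIN_F32": "16",
    "V_CVT_F64_F32": "DPFACTOR*4",
    "V_SQRT_F64": "DPFACTOR*8",
}

PREFIX = "DPFACTOR*"

def entry_cycles(entry, dpfactor):
    """Evaluate a table entry of the form 'N' or 'DPFACTOR*N'."""
    if entry.startswith(PREFIX):
        return dpfactor * int(entry[len(PREFIX):])
    return int(entry)

def total_cycles(instructions, dpfactor):
    """Sum the table values for a list of instructions (a simplification)."""
    return sum(entry_cycles(VOP1_CYCLES[name], dpfactor) for name in instructions)

if __name__ == "__main__":
    # DPFACTOR 8 corresponds to the "other GPUs" row of the DPFACTOR table.
    print(total_cycles(["V_MOV_B32", "V_CVT_F64_F32", "V_SQRT_F64"], 8))
```

<p>Keeping the entries as strings mirrors the way the table is written, so the same lookup can be evaluated for any DPFACTOR value.</p>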
     390<h3>VOP2 Instruction timings</h3>
     391<p>All VOP2 instructions take 4 cycles.</p>
    145392}}}