| 25 | <li>the best place to jump to is one of the first 5 dwords of a 32-byte block. A jump to |
| 26 | any later dword causes 1-3 penalties, depending on the dword index (N-4, where N is the |
| 27 | zero-based dword index within the block). This rule seems not to apply to backward jumps (not confirmed).</li> |
| 28 | <li>any conditional jump instruction should be placed in the first half of a 32-byte block; |
| 29 | otherwise, if the jump is not taken, 1-4 penalties are added, depending on the dword index |
| 30 | (N-3, where N is the zero-based dword index within the block).</li> |
27 | | <p>Between any vector operation that operates on VCC and any scalar ALU instruction is |
28 | | 16-cycle delay.</p> |
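The jump-alignment arithmetic in the list items above can be modeled in a few lines. This is a hypothetical Python helper, not part of any documented toolchain. It assumes that offsets are byte offsets of the jump target (or of the conditional jump instruction itself), that dwords are counted from zero within each 32-byte block, and that one penalty equals 4 cycles, the value given for the VCCZ/EXECZ rule. Backward jumps are treated as penalty-free only because the text suggests so without confirmation.

```python
# Hypothetical model of the jump-alignment rules; all names are illustrative.
BLOCK_BYTES = 32
DWORD_BYTES = 4
CYCLES_PER_PENALTY = 4  # assumption: same 4-cycle penalty as the VCCZ/EXECZ rule


def dword_index(byte_offset: int) -> int:
    """Zero-based index (0..7) of the dword holding byte_offset within its 32-byte block."""
    return (byte_offset % BLOCK_BYTES) // DWORD_BYTES


def jump_target_penalties(target_offset: int, backward: bool = False) -> int:
    """Dwords 0-4 are free; dwords 5-7 cost N-4 penalties (1-3)."""
    if backward:
        return 0  # the text suggests backward jumps are exempt (not confirmed)
    return max(0, dword_index(target_offset) - 4)


def cond_jump_not_taken_penalties(jump_offset: int) -> int:
    """A not-taken conditional jump in dwords 0-3 is free; dwords 4-7 cost N-3 penalties (1-4)."""
    return max(0, dword_index(jump_offset) - 3)


for off in (0x100, 0x110, 0x114, 0x11C):
    n = dword_index(off)
    t = jump_target_penalties(off)
    c = cond_jump_not_taken_penalties(off)
    print(f"offset {off:#x}: dword {n}, jump target {t * CYCLES_PER_PENALTY} cycles, "
          f"not-taken branch {c * CYCLES_PER_PENALTY} cycles")
```

For example, a forward jump to offset 0x11C (dword 7) would cost 3 penalties, while a conditional jump placed at that same offset would cost 4 penalties when it is not taken.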
| 33 | <ul> |
| 34 | <li>between any integer V_ADD*, V_SUB*, V_READFIRSTLANE_B32 or V_READLANE_B32 instruction |
| 35 | and any scalar ALU instruction there is a 16-cycle delay.</li> |
| 36 | <li>a conditional jump that checks VCCZ or EXECZ directly after an instruction that changes |
| 37 | VCC or EXEC adds a single penalty (4 cycles).</li> |
| 38 | <li>a conditional jump that checks SCC directly after an instruction that changes SCC, |
| 39 | EXEC or VCC adds a single penalty (4 cycles).</li> |
| 40 | </ul> |
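The three hazard rules in the list above can be checked mechanically over a straight-line instruction sequence. This is a simplified, hypothetical model: `Instr`, `pair_penalty` and `total_penalty` are illustrative names, each instruction is annotated by hand with the special registers it writes and whether it is a scalar ALU instruction, integer V_ADD*/V_SUB* opcodes are recognized by a name heuristic, and only directly adjacent pairs are examined, so any hiding of the 16-cycle delay by intervening instructions is ignored.

```python
from dataclasses import dataclass

# Hypothetical model; cycle counts are taken from the rules above.
VALU_TO_SALU_DELAY = 16  # cycles
BRANCH_PENALTY = 4       # cycles ("single penalty")

VCC_EXEC_BRANCHES = {"S_CBRANCH_VCCZ", "S_CBRANCH_VCCNZ",
                     "S_CBRANCH_EXECZ", "S_CBRANCH_EXECNZ"}
SCC_BRANCHES = {"S_CBRANCH_SCC0", "S_CBRANCH_SCC1"}


@dataclass(frozen=True)
class Instr:
    name: str                        # mnemonic, e.g. "V_ADD_I32"
    writes: frozenset = frozenset()  # special registers written: "VCC", "EXEC", "SCC"
    is_salu: bool = False            # True for scalar ALU (SOP1/SOP2/SOPK/SOPC) instructions


def is_delaying_valu(name: str) -> bool:
    """Vector instructions that the text says delay a following scalar ALU instruction."""
    if name in ("V_READFIRSTLANE_B32", "V_READLANE_B32"):
        return True
    # Assumption: integer V_ADD*/V_SUB* variants are recognized by an _I32/_U32 suffix.
    return name.startswith(("V_ADD", "V_SUB")) and name.endswith(("_I32", "_U32"))


def pair_penalty(prev: Instr, cur: Instr) -> int:
    """Extra cycles charged to `cur` when it directly follows `prev` (adjacent pairs only)."""
    if is_delaying_valu(prev.name) and cur.is_salu:
        return VALU_TO_SALU_DELAY
    if cur.name in VCC_EXEC_BRANCHES and prev.writes & {"VCC", "EXEC"}:
        return BRANCH_PENALTY
    if cur.name in SCC_BRANCHES and prev.writes & {"SCC", "EXEC", "VCC"}:
        return BRANCH_PENALTY
    return 0


def total_penalty(program: list) -> int:
    """Sum of penalties over every adjacent instruction pair."""
    return sum(pair_penalty(a, b) for a, b in zip(program, program[1:]))


example = [
    Instr("V_ADD_I32", writes=frozenset({"VCC"})),                        # integer add, carry-out to VCC
    Instr("S_MOV_B32", is_salu=True),                                     # SALU right after V_ADD_I32: +16
    Instr("S_AND_B64", writes=frozenset({"EXEC", "SCC"}), is_salu=True),  # writes EXEC and SCC
    Instr("S_CBRANCH_EXECZ"),                                             # checks EXECZ right after EXEC write: +4
]

print(total_penalty(example))  # prints 20
```

In the example, `S_MOV_B32` directly after `V_ADD_I32` incurs the 16-cycle delay, and `S_CBRANCH_EXECZ` directly after `S_AND_B64` adds one 4-cycle penalty, for 20 extra cycles in total.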