<h4>V_MQSAD_U8, V_MQSAD_PK_U16_U8</h4>
<p>Opcode: 371 (0x173) for GCN 1.0/1.1; 486 (0x1e6) for GCN 1.2<br />
Syntax (GCN 1.0): V_MQSAD_U8 VDST(2), SRC0(2), SRC1, SRC2(2)<br />
Syntax (GCN 1.1/1.2): V_MQSAD_PK_U16_U8 VDST(2), SRC0(2), SRC1, SRC2(2)<br />
Description: Compute four masked sums of absolute differences with accumulation.
Operation N (N = 0 to 3) takes its first argument from the four bytes of SRC0 at byte
offsets N to N+3, its second argument from SRC1, and its third argument (the accumulator)
from the N'th 16-bit word of SRC2. Bytes that are zero in SRC1 are skipped.<br />
Operation:<br />
<code>UINT32 MSADU8(UINT32 S0, UINT32 S1, UINT32 S2)
{
    UINT32 OUT = S2;
    for (UINT8 i = 0; i < 4; i++)
        if (((S1 >> (i*8)) & 0xff) != 0)
            OUT += ABS(((S0 >> (i*8)) & 0xff) - ((S1 >> (i*8)) & 0xff));
    return OUT;
}
VDST = MSADU8((UINT32)SRC0, SRC1, SRC2 & 0xffff)
VDST |= (UINT64)MSADU8((UINT32)(SRC0>>8), SRC1, (SRC2>>16) & 0xffff)<<16
VDST |= (UINT64)MSADU8((UINT32)(SRC0>>16), SRC1, (SRC2>>32) & 0xffff)<<32
VDST |= (UINT64)MSADU8((UINT32)(SRC0>>24), SRC1, (SRC2>>48) & 0xffff)<<48</code></p>
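<p>The operation above can be modelled on the host as a minimal C sketch. The function
names <code>msad_u8</code> and <code>mqsad_pk_u16_u8</code> are hypothetical, and the
truncation of each lane to 16 bits is an assumption of this sketch, not something the
description above states.</p>

```c
#include <stdint.h>

/* Masked SAD of four byte pairs plus an accumulator.
   Bytes that are zero in the reference s1 are skipped. */
static uint32_t msad_u8(uint32_t s0, uint32_t s1, uint32_t acc)
{
    uint32_t out = acc;
    for (int i = 0; i < 4; i++) {
        uint32_t a = (s0 >> (i * 8)) & 0xff;
        uint32_t b = (s1 >> (i * 8)) & 0xff;
        if (b != 0)
            out += (a > b) ? a - b : b - a;
    }
    return out;
}

/* Hypothetical host-side model of V_MQSAD_PK_U16_U8.
   ASSUMPTION: each lane is truncated to 16 bits so that lanes cannot
   overlap; real hardware may treat overflow differently. */
static uint64_t mqsad_pk_u16_u8(uint64_t src0, uint32_t src1, uint64_t src2)
{
    uint64_t vdst = 0;
    for (int n = 0; n < 4; n++) {
        /* Four bytes of src0 starting at byte offset n. */
        uint32_t window = (uint32_t)(src0 >> (n * 8));
        /* N'th 16-bit word of src2 is the accumulator for this lane. */
        uint32_t acc = (uint32_t)(src2 >> (n * 16)) & 0xffff;
        uint64_t lane = msad_u8(window, src1, acc) & 0xffff;
        vdst |= lane << (n * 16);
    }
    return vdst;
}
```

<p>For example, with SRC0 holding the bytes 1, 2, 3, 4, 5 (lowest byte first),
SRC1 = 0x04030201 and SRC2 = 0, this sketch yields the four lanes 0, 4, 10 and 13.</p>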
<h4>V_MSAD_U8</h4>
<p>Opcode: 369 (0x171) for GCN 1.0/1.1; 484 (0x1e4) for GCN 1.2<br />
Syntax: V_MSAD_U8 VDST, SRC0, SRC1, SRC2<br />
Description: Calculate the sum of absolute differences between the bytes of SRC0 and SRC1,
skipping bytes that are zero in SRC1; add SRC2 to the result and store it in VDST.<br />
Operation:<br />
<code>VDST = SRC2
for (UINT8 i = 0; i < 4; i++)
    if (((SRC1 >> (i*8)) & 0xff) != 0)
        VDST += ABS(((SRC0 >> (i*8)) & 0xff) - ((SRC1 >> (i*8)) & 0xff))</code></p>
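<p>A minimal C sketch of the same masked accumulation, useful for checking expected
values on the host. The function name <code>v_msad_u8</code> is hypothetical and only
mirrors the pseudocode above.</p>

```c
#include <stdint.h>

/* Hypothetical host-side model of V_MSAD_U8: start from src2, then add
   |src0 byte - src1 byte| for every byte position where the src1 byte
   is non-zero. */
static uint32_t v_msad_u8(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t vdst = src2;
    for (int i = 0; i < 4; i++) {
        uint32_t a = (src0 >> (i * 8)) & 0xff;
        uint32_t b = (src1 >> (i * 8)) & 0xff;
        if (b != 0)  /* a zero byte in src1 masks this position out */
            vdst += (a > b) ? a - b : b - a;
    }
    return vdst;
}
```

<p>With SRC0 = 0x10203040, SRC1 = 0x11003244 and SRC2 = 100, the byte at position 2
is masked out (it is zero in SRC1), so the sketch returns 100 + 4 + 2 + 1 = 107.</p>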
<h4>V_QSAD_U8, V_QSAD_PK_U16_U8</h4>
<p>Opcode: 370 (0x172) for GCN 1.0/1.1; 485 (0x1e5) for GCN 1.2<br />
Syntax (GCN 1.0): V_QSAD_U8 VDST(2), SRC0(2), SRC1, SRC2(2)<br />
Syntax (GCN 1.1/1.2): V_QSAD_PK_U16_U8 VDST(2), SRC0(2), SRC1, SRC2(2)<br />
Description: Compute four sums of absolute differences with accumulation. Operation N
(N = 0 to 3) takes its first argument from the four bytes of SRC0 at byte offsets N to N+3,
its second argument from SRC1, and its third argument (the accumulator) from the N'th
16-bit word of SRC2.<br />
Operation:<br />
<code>UINT32 SADU8(UINT32 S0, UINT32 S1, UINT32 S2)
{
    UINT32 OUT = S2;
    for (UINT8 i = 0; i < 4; i++)
        OUT += ABS(((S0 >> (i*8)) & 0xff) - ((S1 >> (i*8)) & 0xff));
    return OUT;
}
VDST = SADU8((UINT32)SRC0, SRC1, SRC2 & 0xffff)
VDST |= (UINT64)SADU8((UINT32)(SRC0>>8), SRC1, (SRC2>>16) & 0xffff)<<16
VDST |= (UINT64)SADU8((UINT32)(SRC0>>16), SRC1, (SRC2>>32) & 0xffff)<<32
VDST |= (UINT64)SADU8((UINT32)(SRC0>>24), SRC1, (SRC2>>48) & 0xffff)<<48</code></p>
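<p>Because the four lanes compare SRC1 against four overlapping windows of SRC0, the
packed result can be read as a small sliding-window match. The C sketch below models
that on the host; <code>sad_u8</code>, <code>qsad_pk_u16_u8</code> and
<code>best_lane</code> are hypothetical names, and the 16-bit truncation of each lane
is an assumption of this sketch.</p>

```c
#include <stdint.h>

/* Unmasked SAD of four byte pairs plus an accumulator. */
static uint32_t sad_u8(uint32_t s0, uint32_t s1, uint32_t acc)
{
    uint32_t out = acc;
    for (int i = 0; i < 4; i++) {
        uint32_t a = (s0 >> (i * 8)) & 0xff;
        uint32_t b = (s1 >> (i * 8)) & 0xff;
        out += (a > b) ? a - b : b - a;
    }
    return out;
}

/* Hypothetical host-side model of V_QSAD_PK_U16_U8.
   ASSUMPTION: each lane is truncated to 16 bits. */
static uint64_t qsad_pk_u16_u8(uint64_t src0, uint32_t src1, uint64_t src2)
{
    uint64_t vdst = 0;
    for (int n = 0; n < 4; n++) {
        uint32_t window = (uint32_t)(src0 >> (n * 8));
        uint32_t acc = (uint32_t)(src2 >> (n * 16)) & 0xffff;
        vdst |= (uint64_t)(sad_u8(window, src1, acc) & 0xffff) << (n * 16);
    }
    return vdst;
}

/* Hypothetical helper: index of the lane with the smallest SAD, i.e. the
   byte offset in src0 where src1 matches best (first lane wins ties). */
static int best_lane(uint64_t packed)
{
    int best = 0;
    for (int n = 1; n < 4; n++)
        if (((packed >> (n * 16)) & 0xffff) < ((packed >> (best * 16)) & 0xffff))
            best = n;
    return best;
}
```

<p>With SRC0 holding the bytes 1, 2, 3, 4, 5 (lowest byte first), SRC1 = 0x05040302 and
SRC2 = 0, the lanes are 4, 0, 8 and 13, so <code>best_lane</code> reports offset 1.</p>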