Changes between Version 4 and Version 5 of GcnSdwaDpp
- Timestamp: 06/05/17 15:00:43 (6 years ago)
GcnSdwaDpp
--- v4
+++ v5
@@ -175,5 +175,5 @@
 (fill bits after part) to source operand while for source operand was not
 selected whole dword (SDWA_DWORD not choosen).</p>
-<p>Examples:
+<p>Examples:<br />
 <code>v_xor_b32 v1,v2,v3 dst_sel:byte_1 src0_sel:byte1 src1_sel:word1
 v_xor_b32 v1,v2,v3 dst_sel:b1 src0_sel:b1 src1_sel:w1
@@ -185,32 +185,29 @@
 // SRC0_DST = dest. SRC0, SRC1_DST = dest. SRC1, DST_DST = VDST dest.
 // OPERATION(SRC0, SRC1) - instruction operation, VDST - VDST register before instruction
-if (HAVE_SRC0)
-{
-    switch(SRC0_SEL)
-    {
-        case SDWA_BYTE_0:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC & 0xff)) : SRC0_SRC & 0xff
-            break;
-        case SDWA_BYTE_1:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC>>8) & 0xff)) :
-                        (SRC0_SRC>>8) & 0xff
-            break;
-        case SDWA_BYTE_2:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC>>16) & 0xff)) :
-                        (SRC0_SRC>>16) & 0xff
-            break;
-        case SDWA_BYTE_1:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC>>24)) : SRC0_SRC>>24
-            break;
-        case SDWA_WORD_0:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC & 0xffff)) : SRC0_SRC & 0xffff
-            break;
-        case SDWA_WORD_1:
-            SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC >> 16)) : SRC0_SRC >> 16
-            break;
-        case SDWA_DWORD:
-            SRC0_DST = SRC0_SRC
-            break;
-    }
+switch(SRC0_SEL)
+{
+    case SDWA_BYTE_0:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC & 0xff)) : SRC0_SRC & 0xff
+        break;
+    case SDWA_BYTE_1:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC>>8) & 0xff)) :
+                    (SRC0_SRC>>8) & 0xff
+        break;
+    case SDWA_BYTE_2:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT8((SRC0_SRC>>16) & 0xff)) :
+                    (SRC0_SRC>>16) & 0xff
+        break;
+    case SDWA_BYTE_1:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT8(SRC0_SRC>>24)) : SRC0_SRC>>24
+        break;
+    case SDWA_WORD_0:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC & 0xffff)) : SRC0_SRC & 0xffff
+        break;
+    case SDWA_WORD_1:
+        SRC0_DST = (SRC0_SEXT) ? INT32(INT16(SRC0_SRC >> 16)) : SRC0_SRC >> 16
+        break;
+    case SDWA_DWORD:
+        SRC0_DST = SRC0_SRC
+        break;
 }
 if (HAVE_SRC1)
@@ -243,5 +240,5 @@
     }
 }
-DST_SRC = OPERATION(SRC0,SRC1)
+DST_SRC = OPERATION(SRC0_DST,SRC1_DST)
 UNT32 tmp
 switch(DST_SEL)
@@ -306,5 +303,5 @@
 }</code></p>
 <h3>VOP_DPP</h3>
-<p>The VOP_DPP encoding is enabled by setting 0xfa in VSRC0 field in VOP1/VOP2/VOPC encoding.
+<p>The VOP_DPP encoding is enabled by setting 0xfa in SRC0 field in VOP1/VOP2/VOPC encoding.
 List of fields:</p>
 <table>
@@ -364,7 +361,7 @@
 </tbody>
 </table>
-<p>The operation on wavefronts applied to VSRC0 operand in VOP instruction.
+<p>The operation on wavefronts applied to SRC0 operand in VOP instruction.
 The wavefront contains 4 rows (16 threads), and each row contains 4 banks (4 threads).
-The DPP_CTRL choose which operation will be applied to VSRC0.
+The DPP_CTRL choose which operation will be applied to SRC0.
 List of data parallel operations:</p>
 <table>
@@ -446,6 +443,6 @@
 <tr>
 <td>0x143</td>
-<td>DPP_ROW_BCAST15</td>
-<td>row_bcast:15</td>
+<td>DPP_ROW_BCAST31</td>
+<td>row_bcast:31</td>
 <td>Broadcast 31 thread to row 2 and row 3</td>
 </tr>
@@ -454,11 +451,127 @@
 <p>The BOUND_CTRL flag (modifier <code>bound_ctrl</code> or <code>bound_ctrl:0</code>) control how to fill invalid
 threads (for example that last threads after left shifting). Zero value (no modifier)
-sets invalid threads by original VSRC0 value for particular thread. One value (with modifier)
-fills invalid threads by 0 thread VSRC0 value.</p>
+do not perform operation in thread that source threads are invalid.
+One value (with modifier) fills invalid threads by 0 value.</p>
 <p>The field BANK_MASK (modifier <code>bank_mask:value</code>) choose which banks will be enabled during
 data parallel operation in each enabled row. The Nth bit represents Nth bank in each row.
-Disabled bank will be filled by original VSRC0 value for particular thread</p>
+Threads in disabled banks do not perform operation.</p>
 <p>The field ROW_MASK (modifier <code>row_mask:value</code>) choose which rows will be enabled during
 data parallel operation. The Nth bit represents Nth row.
-Disabled row will be filled by original VSRC0 value for particular thread.</p>
+Threads in disabled rows do not perform operation.</p>
+<p>Examples:<br />
+<code>v_xor_b32 v1,v2,v3 quad_perm:[2,3,0,1]
+v_xor_b32 v1,v2,v3 row_shl:5
+v_xor_b32 v1,v2,v3 row_shr:7
+v_xor_b32 v1,v2,v3 row_ror:8
+v_xor_b32 v1,v2,v3 wave_shl:1
+v_xor_b32 v1,v2,v3 wave_shl
+v_xor_b32 v1,v2,v3 wave_shr:1
+v_xor_b32 v1,v2,v3 wave_shr
+v_xor_b32 v1,v2,v3 wave_rol:1
+v_xor_b32 v1,v2,v3 wave_rol
+v_xor_b32 v1,v2,v3 wave_ror:1
+v_xor_b32 v1,v2,v3 wave_ror
+v_xor_b32 v1,v2,v3 row_mirror
+v_xor_b32 v1,v2,v3 row_half_mirror
+v_xor_b32 v1,v2,v3 row_bcast:15
+v_xor_b32 v1,v2,v3 row_bcast:31
+v_xor_b32 v1,v2,v3 row_shr:7 bound_ctrl
+v_xor_b32 v1,v2,v3 row_shr:7 bound_ctrl:0
+v_xor_b32 v1,v2,v3 row_shl:5 row_mask:0b1100
+v_xor_b32 v1,v2,v3 row_shl:5 bank_mask:0b0101</code></p>
+<p>Operation code:<br />
+<code>// SRC0_SRC[X] - original VSRC0 value from thread X
+// SRC0_DST[X] - destination VSRC0 value from thread X
+// OPERATION(SRC0, SRC1) - instruction operation, VDST - VDST register before instruction
+BYTE invalid = 0
+BYTE srcLane
+if (DPP_CTRL>=DPP_QUAD_PERM00 && DPP_CTRL<=DPP_QUAD_PERMFF)
+{
+    BYTE p0 = DPP_CTRL&3
+    BYTE p1 = (DPP_CTRL>>2)&3
+    BYTE p2 = (DPP_CTRL>>4)&3
+    BYTE p3 = (DPP_CTRL>>6)&3
+    BYTE curL4 = LANEID&~3
+    if (LANEID&3==0)
+        srcLane = curL4 + p0
+    else if (LANEID&3==1)
+        srcLane = curL4 + p1
+    else if (LANEID&3==2)
+        srcLane = curL4 + p2
+    else if (LANEID&3==3)
+        srcLane = curL4 + p3
+}
+else if (DPP_CTRL>=DPP_ROW_SL1 && DPP_CTRL<=DPP_ROW_SL15)
+{
+    BYTE shift = DPP_CTRL&15
+    BYTE slid = LANEID&15
+    BYTE curR = LANEID&~15
+    if (slid+shift<=15)
+        srcLane = curR + slid + shift
+    else
+        srcLane = LANESNUM
+}
+else if (DPP_CTRL>=DPP_ROW_SR1 && DPP_CTRL<=DPP_ROW_SR15)
+{
+    BYTE shift = DPP_CTRL&15
+    BYTE slid = LANEID&15
+    BYTE curR = LANEID&~15
+    if (slid>=shift)
+        srcLane = curR + slid - shift
+    else
+        srcLane = LANESNUM
+}
+else if (DPP_CTRL>=DPP_ROW_RR1 && DPP_CTRL<=DPP_ROW_RR15)
+{
+    BYTE shift = DPP_CTRL&15
+    BYTE slid = LANEID&15
+    BYTE curR = LANEID&~15
+    srcLane = curR + ((16+slid - shift)&15)
+}
+else if (DPP_CTRL==DPP_WF_SL1)
+    srcLane = LANEID+1
+else if (DPP_CTRL==DPP_WF_SR1)
+    srcLane = LANEID-1
+else if (DPP_CTRL==DPP_WF_RL1)
+    srcLane = (LANEID+1)&63
+else if (DPP_CTRL==DPP_WF_RR1)
+    srcLane = (LANEID-1)&63
+else if (DPP_CTRL==DPP_ROW_MIRROR)
+{
+    BYTE curR = LANEID&~15
+    srcLane = curR + ((LANEID&15)^15)
+}
+else if (DPP_CTRL==DPP_ROW_HALF_MIRROR)
+{
+    BYTE curR = LANEID&~7
+    srcLane = curR + ((LANEID&7)^7)
+}
+else if (DPP_CTRL==DPP_BCAST_15)
+{
+    BYTE curR = LANEID&~15
+    if (LANEID<15)
+        srcLane = LANEID
+    else
+        srcLane = ((LANEID-16)&~15)+15
+}
+else if (DPP_CTRL==DPP_BCAST_31)
+{
+    BYTE curR = LANEID&~31
+    if (LANEID<31)
+        srcLane = LANEID
+    else
+        srcLane = ((LANEID-31)&~31)+31
+}
+if (dstLane < LANESNUM)
+    SRC0_DST[LANEID] = SRC0_SRC[srcLane]
+else if (BOUND_CTRL==0)
+    SRC0_DST[LANEID] = 0
+else
+    invalid = 1
+if ((ROW_MASK & (1U<<(LANEID>>4)))==0)
+    invalid = 1
+if ((BANK_MASK & (1U<<((LANEID>>2)&3)))==0)
+    invalid = 1
+if (!invalid)
+    VDST = OPERATION(SRC0_DST,SRC1)</code></p>
 }}}
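The lane selection described by the v5 operation code can be tried out with a small simulation. The following Python is a minimal sketch, not the documented behaviour of any tool: the names `dpp_src_lane` and `apply_dpp` are hypothetical, operations are chosen by their assembler modifier names rather than by numeric DPP_CTRL values (which this excerpt does not list), and `row_bcast` is left out. Two assumptions depart from the listing as written: the undefined `dstLane` is read as `srcLane`, and BOUND_CTRL follows the prose (with the modifier set, an invalid source reads as 0; without it, the thread does not perform the operation).

```python
LANES = 64  # wavefront size; corresponds to LANESNUM in the listing (assumed 64)


def dpp_src_lane(kind, arg, lane):
    """Return the lane that `lane` reads SRC0 from, or None if the source is invalid."""
    row = lane & ~15   # first lane of the current 16-lane row
    slid = lane & 15   # position of the lane inside its row
    if kind == "quad_perm":          # arg: four selectors, each in 0..3
        return (lane & ~3) + arg[lane & 3]
    if kind == "row_shl":            # arg: shift in 1..15
        return row + slid + arg if slid + arg <= 15 else None
    if kind == "row_shr":
        return row + slid - arg if slid >= arg else None
    if kind == "row_ror":
        return row + ((16 + slid - arg) & 15)
    if kind == "wave_shl":
        return lane + 1 if lane + 1 < LANES else None
    if kind == "wave_shr":
        return lane - 1 if lane >= 1 else None
    if kind == "wave_rol":
        return (lane + 1) & 63
    if kind == "wave_ror":
        return (lane - 1) & 63
    if kind == "row_mirror":
        return row + (slid ^ 15)
    if kind == "row_half_mirror":
        return (lane & ~7) + ((lane & 7) ^ 7)
    raise ValueError(f"unsupported DPP operation: {kind}")


def apply_dpp(op, src0, src1, vdst, kind, arg=None,
              bound_ctrl=False, row_mask=0xF, bank_mask=0xF):
    """Apply op(SRC0 after DPP, SRC1) per lane; skipped lanes keep their old VDST."""
    out = list(vdst)
    for lane in range(LANES):
        if ((row_mask >> (lane >> 4)) & 1) == 0:
            continue                 # row disabled: thread does not perform the operation
        if ((bank_mask >> ((lane >> 2) & 3)) & 1) == 0:
            continue                 # bank disabled: thread does not perform the operation
        src = dpp_src_lane(kind, arg, lane)
        if src is None:
            if not bound_ctrl:
                continue             # assumption (from the prose): invalid source, no operation
            value = 0                # assumption (from the prose): bound_ctrl fills with 0
        else:
            value = src0[src]
        out[lane] = op(value, src1[lane])
    return out


# Rough equivalent of: v_xor_b32 v1,v2,v3 row_shr:7 bound_ctrl
v2 = list(range(LANES))
v3 = [0] * LANES
v1 = [-1] * LANES
result = apply_dpp(lambda a, b: a ^ b, v2, v3, v1, "row_shr", 7, bound_ctrl=True)
```

Returning `None` for an invalid source takes the place of the `srcLane = LANESNUM` sentinel in the listing; it keeps the validity check explicit and does not rely on unsigned wrap-around for `wave_shr` on lane 0.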