Changes between Version 9 and Version 10 of GcnInstrsFlat
- Timestamp:
- 11/28/17 22:00:30 (6 years ago)
GcnInstrsFlat
v9 v10 5 5 <p>These instructions allow access to main memory, LDS and scratch buffer. 6 6 FLAT instructions fetch the address from 2 vector registers that hold a 64-bit address. 7 FLAT instruction presents only in GCN 1.1 or later architecture.</p> 8 <p>List of fields for the FLAT encoding (GCN 1.1 - 1.4):</p> 7 FLAT instructions are present only in GCN 1.1 or later architectures.</p> 8 <p>List of fields for the FLAT encoding (GCN 1.1/1.2):</p> 9 9 <table> 10 10 <thead> … … 75 75 <td>NV</td> 76 76 <td>Non-Volatile (GCN 1.4)</td> 77 </tr> 78 <tr> 79 <td>56-63</td> 80 <td>VDST</td> 81 <td>Vector destination register</td> 82 </tr> 83 </tbody> 84 </table> 85 <p>List of fields for the FLAT encoding (GCN 1.4):</p> 86 <table> 87 <thead> 88 <tr> 89 <th>Bits</th> 90 <th>Name</th> 91 <th>Description</th> 92 </tr> 93 </thead> 94 <tbody> 95 <tr> 96 <td>0-12</td> 97 <td>OFFSET</td> 98 <td>Byte offset</td> 99 </tr> 100 <tr> 101 <td>13</td> 102 <td>LDS</td> 103 <td>Transfer DATA to LDS and memory</td> 104 </tr> 105 <tr> 106 <td>14-15</td> 107 <td>SEG</td> 108 <td>Memory segment (instruction type)</td> 109 </tr> 110 <tr> 111 <td>16</td> 112 <td>GLC</td> 113 <td>Operation globally coherent</td> 114 </tr> 115 <tr> 116 <td>17</td> 117 <td>SLC</td> 118 <td>System level coherent</td> 119 </tr> 120 <tr> 121 <td>18-24</td> 122 <td>OPCODE</td> 123 <td>Operation code</td> 124 </tr> 125 <tr> 126 <td>25-31</td> 127 <td>ENCODING</td> 128 <td>Encoding type. 
Must be 0b110111</td> 129 </tr> 130 <tr> 131 <td>32-39</td> 132 <td>VADDR</td> 133 <td>Vector address registers</td> 134 </tr> 135 <tr> 136 <td>40-47</td> 137 <td>VDATA</td> 138 <td>Vector data register</td> 139 </tr> 140 <tr> 141 <td>48-54</td> 142 <td>SADDR</td> 143 <td>Scalar SGPR offset (for GLOBAL/SCRATCH) (0x7f value disables it)</td> 144 </tr> 145 <tr> 146 <td>55</td> 147 <td>NV</td> 148 <td>Non-Volatile</td> 77 149 </tr> 78 150 <tr> … … 117 189 SCRATCH instruction syntax: INSTRUCTION VADDR(2), VDATA, SADDR|OFF [MODIFIERS]</p> 118 190 <p>Modifiers can be supplied in any order. Modifiers list: SLC, GLC, TFE, 119 LDS, NV, OFFSET:OFFSET. The TFE flag requires an additional VDATA register. 191 LDS, NV, INST_OFFSET:OFFSET. The TFE flag requires an additional VDATA register. 120 192 LDS, NV and OFFSET are available only in the GCN 1.4 architecture.</p> 121 193 <p>FLAT instructions can complete out of order with each other. This can be caused by the different 122 194 resources from/to which an instruction can load/store. 
A FLAT instruction increments VMCNT if it accesses 123 195 main memory, or LGKMCNT if it accesses LDS.</p> 124 <p>OFFSET can be 13-bit signed for GLOBAL_* and SCRATCH_* instructions or 125 12-bit unsigned for FLAT_* instructions.</p> 196 <p>OFFSET (INST_OFFSET modifier) can be 13-bit signed for GLOBAL_* and SCRATCH_* 197 instructions or 12-bit unsigned for FLAT_* instructions.</p> 126 198 <h3>Instructions by opcode</h3> 127 199 <p>List of the FLAT instructions by opcode (GCN 1.1/1.2):</p> … … 527 599 </tbody> 528 600 </table> 601 <p>List of the FLAT/GLOBAL/SCRATCH instructions by opcode (GCN 1.4):</p> 602 <table> 603 <thead> 604 <tr> 605 <th>Opcode</th> 606 <th>FLAT</th> 607 <th>GLOBAL</th> 608 <th>SCRATCH</th> 609 <th>Mnemonic</th> 610 </tr> 611 </thead> 612 <tbody> 613 <tr> 614 <td>16 (0x10)</td> 615 <td>✓</td> 616 <td>✓</td> 617 <td>✓</td> 618 <td>*_LOAD_UBYTE</td> 619 </tr> 620 <tr> 621 <td>17 (0x11)</td> 622 <td>✓</td> 623 <td>✓</td> 624 <td>✓</td> 625 <td>*_LOAD_SBYTE</td> 626 </tr> 627 <tr> 628 <td>18 (0x12)</td> 629 <td>✓</td> 630 <td>✓</td> 631 <td>✓</td> 632 <td>*_LOAD_USHORT</td> 633 </tr> 634 <tr> 635 <td>19 (0x13)</td> 636 <td>✓</td> 637 <td>✓</td> 638 <td>✓</td> 639 <td>*_LOAD_SSHORT</td> 640 </tr> 641 <tr> 642 <td>20 (0x14)</td> 643 <td>✓</td> 644 <td>✓</td> 645 <td>✓</td> 646 <td>*_LOAD_DWORD</td> 647 </tr> 648 <tr> 649 <td>21 (0x15)</td> 650 <td>✓</td> 651 <td>✓</td> 652 <td>✓</td> 653 <td>*_LOAD_DWORDX2</td> 654 </tr> 655 <tr> 656 <td>22 (0x16)</td> 657 <td>✓</td> 658 <td>✓</td> 659 <td>✓</td> 660 <td>*_LOAD_DWORDX3</td> 661 </tr> 662 <tr> 663 <td>23 (0x17)</td> 664 <td>✓</td> 665 <td>✓</td> 666 <td>✓</td> 667 <td>*_LOAD_DWORDX4</td> 668 </tr> 669 <tr> 670 <td>24 (0x18)</td> 671 <td>✓</td> 672 <td>✓</td> 673 <td>✓</td> 674 <td>*_STORE_BYTE</td> 675 </tr> 676 <tr> 677 <td>25 (0x19)</td> 678 <td>✓</td> 679 <td>✓</td> 680 <td>✓</td> 681 <td>*_STORE_BYTE_D16_HI</td> 682 </tr> 683 <tr> 684 <td>26 (0x1a)</td> 685 <td>✓</td> 686 <td>✓</td> 687 <td>✓</td> 688 
<td>*_STORE_SHORT</td> 689 </tr> 690 <tr> 691 <td>27 (0x1b)</td> 692 <td>✓</td> 693 <td>✓</td> 694 <td>✓</td> 695 <td>*_STORE_SHORT_D16_HI</td> 696 </tr> 697 <tr> 698 <td>28 (0x1c)</td> 699 <td>✓</td> 700 <td>✓</td> 701 <td>✓</td> 702 <td>*_STORE_DWORD</td> 703 </tr> 704 <tr> 705 <td>29 (0x1d)</td> 706 <td>✓</td> 707 <td>✓</td> 708 <td>✓</td> 709 <td>*_STORE_DWORDX2</td> 710 </tr> 711 <tr> 712 <td>30 (0x1e)</td> 713 <td>✓</td> 714 <td>✓</td> 715 <td>✓</td> 716 <td>*_STORE_DWORDX3</td> 717 </tr> 718 <tr> 719 <td>31 (0x1f)</td> 720 <td>✓</td> 721 <td>✓</td> 722 <td>✓</td> 723 <td>*_STORE_DWORDX4</td> 724 </tr> 725 <tr> 726 <td>32 (0x20)</td> 727 <td>✓</td> 728 <td>✓</td> 729 <td>✓</td> 730 <td>*_LOAD_UBYTE_D16</td> 731 </tr> 732 <tr> 733 <td>33 (0x21)</td> 734 <td>✓</td> 735 <td>✓</td> 736 <td>✓</td> 737 <td>*_LOAD_UBYTE_D16_HI</td> 738 </tr> 739 <tr> 740 <td>34 (0x22)</td> 741 <td>✓</td> 742 <td>✓</td> 743 <td>✓</td> 744 <td>*_LOAD_SBYTE_D16</td> 745 </tr> 746 <tr> 747 <td>35 (0x23)</td> 748 <td>✓</td> 749 <td>✓</td> 750 <td>✓</td> 751 <td>*_LOAD_SBYTE_D16_HI</td> 752 </tr> 753 <tr> 754 <td>36 (0x24)</td> 755 <td>✓</td> 756 <td>✓</td> 757 <td>✓</td> 758 <td>*_LOAD_SHORT_D16</td> 759 </tr> 760 <tr> 761 <td>37 (0x25)</td> 762 <td>✓</td> 763 <td>✓</td> 764 <td>✓</td> 765 <td>*_LOAD_SHORT_D16_HI</td> 766 </tr> 767 <tr> 768 <td>64 (0x40)</td> 769 <td>✓</td> 770 <td>✓</td> 771 <td></td> 772 <td>*_ATOMIC_SWAP</td> 773 </tr> 774 <tr> 775 <td>65 (0x41)</td> 776 <td>✓</td> 777 <td>✓</td> 778 <td></td> 779 <td>*_ATOMIC_CMPSWAP</td> 780 </tr> 781 <tr> 782 <td>66 (0x42)</td> 783 <td>✓</td> 784 <td>✓</td> 785 <td></td> 786 <td>*_ATOMIC_ADD</td> 787 </tr> 788 <tr> 789 <td>67 (0x43)</td> 790 <td>✓</td> 791 <td>✓</td> 792 <td></td> 793 <td>*_ATOMIC_SUB</td> 794 </tr> 795 <tr> 796 <td>68 (0x44)</td> 797 <td>✓</td> 798 <td>✓</td> 799 <td></td> 800 <td>*_ATOMIC_SMIN</td> 801 </tr> 802 <tr> 803 <td>69 (0x45)</td> 804 <td>✓</td> 805 <td>✓</td> 806 <td></td> 807 <td>*_ATOMIC_UMIN</td> 808 
</tr> 809 <tr> 810 <td>70 (0x46)</td> 811 <td>✓</td> 812 <td>✓</td> 813 <td></td> 814 <td>*_ATOMIC_SMAX</td> 815 </tr> 816 <tr> 817 <td>71 (0x47)</td> 818 <td>✓</td> 819 <td>✓</td> 820 <td></td> 821 <td>*_ATOMIC_UMAX</td> 822 </tr> 823 <tr> 824 <td>72 (0x48)</td> 825 <td>✓</td> 826 <td>✓</td> 827 <td></td> 828 <td>*_ATOMIC_AND</td> 829 </tr> 830 <tr> 831 <td>73 (0x49)</td> 832 <td>✓</td> 833 <td>✓</td> 834 <td></td> 835 <td>*_ATOMIC_OR</td> 836 </tr> 837 <tr> 838 <td>74 (0x4a)</td> 839 <td>✓</td> 840 <td>✓</td> 841 <td></td> 842 <td>*_ATOMIC_XOR</td> 843 </tr> 844 <tr> 845 <td>75 (0x4b)</td> 846 <td>✓</td> 847 <td>✓</td> 848 <td></td> 849 <td>*_ATOMIC_INC</td> 850 </tr> 851 <tr> 852 <td>76 (0x4c)</td> 853 <td>✓</td> 854 <td>✓</td> 855 <td></td> 856 <td>*_ATOMIC_DEC</td> 857 </tr> 858 <tr> 859 <td>96 (0x60)</td> 860 <td>✓</td> 861 <td>✓</td> 862 <td></td> 863 <td>*_ATOMIC_SWAP_X2</td> 864 </tr> 865 <tr> 866 <td>97 (0x61)</td> 867 <td>✓</td> 868 <td>✓</td> 869 <td></td> 870 <td>*_ATOMIC_CMPSWAP_X2</td> 871 </tr> 872 <tr> 873 <td>98 (0x62)</td> 874 <td>✓</td> 875 <td>✓</td> 876 <td></td> 877 <td>*_ATOMIC_ADD_X2</td> 878 </tr> 879 <tr> 880 <td>99 (0x63)</td> 881 <td>✓</td> 882 <td>✓</td> 883 <td></td> 884 <td>*_ATOMIC_SUB_X2</td> 885 </tr> 886 <tr> 887 <td>100 (0x64)</td> 888 <td>✓</td> 889 <td>✓</td> 890 <td></td> 891 <td>*_ATOMIC_SMIN_X2</td> 892 </tr> 893 <tr> 894 <td>101 (0x65)</td> 895 <td>✓</td> 896 <td>✓</td> 897 <td></td> 898 <td>*_ATOMIC_UMIN_X2</td> 899 </tr> 900 <tr> 901 <td>102 (0x66)</td> 902 <td>✓</td> 903 <td>✓</td> 904 <td></td> 905 <td>*_ATOMIC_SMAX_X2</td> 906 </tr> 907 <tr> 908 <td>103 (0x67)</td> 909 <td>✓</td> 910 <td>✓</td> 911 <td></td> 912 <td>*_ATOMIC_UMAX_X2</td> 913 </tr> 914 <tr> 915 <td>104 (0x68)</td> 916 <td>✓</td> 917 <td>✓</td> 918 <td></td> 919 <td>*_ATOMIC_AND_X2</td> 920 </tr> 921 <tr> 922 <td>105 (0x69)</td> 923 <td>✓</td> 924 <td>✓</td> 925 <td></td> 926 <td>*_ATOMIC_OR_X2</td> 927 </tr> 928 <tr> 929 <td>106 (0x6a)</td> 930 
<td>✓</td> 931 <td>✓</td> 932 <td></td> 933 <td>*_ATOMIC_XOR_X2</td> 934 </tr> 935 <tr> 936 <td>107 (0x6b)</td> 937 <td>✓</td> 938 <td>✓</td> 939 <td></td> 940 <td>*_ATOMIC_INC_X2</td> 941 </tr> 942 <tr> 943 <td>108 (0x6c)</td> 944 <td>✓</td> 945 <td>✓</td> 946 <td></td> 947 <td>*_ATOMIC_DEC_X2</td> 948 </tr> 949 </tbody> 950 </table> 951 <p>The '*' means prefix of instruction (FLAT, GLOBAL or SCRATCH).</p> 529 952 <h3>Instruction set</h3> 530 953 <p>Alphabetically sorted instruction list:</p> 531 954 <h4>FLAT_ATOMIC_ADD</h4> 532 <p>Opcode: 50 (0x32) for GCN 1.1; 66 (0x42) for GCN 1.2 <br />955 <p>Opcode: 50 (0x32) for GCN 1.1; 66 (0x42) for GCN 1.2/1.4<br /> 533 956 Syntax: FLAT_ATOMIC_ADD VDST, VADDR(2), VDATA<br /> 534 Description: Add VDATA to value of VADDRaddress, and store result to this address.957 Description: Add VDATA to value of memory address, and store result to this address. 535 958 If GLC flag is set then return previous value from this address to VDST, 536 959 otherwise keep VDST value. Operation is atomic.<br /> 537 960 Operation:<br /> 538 <code>UINT32* VM = (UINT32*) VADDR961 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 539 962 UINT32 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 540 963 <h4>FLAT_ATOMIC_ADD_X2</h4> 541 <p>Opcode: 82 (0x52) for GCN 1.1; 98 (0x62) for GCN 1.2 <br />964 <p>Opcode: 82 (0x52) for GCN 1.1; 98 (0x62) for GCN 1.2/1.4<br /> 542 965 Syntax: FLAT_ATOMIC_ADD_X2 VDST(2), VADDR(2), VDATA(2)<br /> 543 Description: Add 64-bit VDATA to 64-bit value of VADDRaddress, and store result966 Description: Add 64-bit VDATA to 64-bit value of memory address, and store result 544 967 to this address. If GLC flag is set then return previous value from address to VDST, 545 968 otherwise keep VDST value. Operation is atomic.<br /> 546 969 Operation:<br /> 547 <code>UINT64* VM = (UINT64*) VADDR970 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 548 971 UINT64 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 549 972 <h4>FLAT_ATOMIC_AND</h4> 550 <p>Opcode: 57 (0x39) for GCN 1.1; 72 (0x48) for GCN 1.2 <br />973 <p>Opcode: 57 (0x39) for GCN 1.1; 72 (0x48) for GCN 1.2/1.4<br /> 551 974 Syntax: FLAT_ATOMIC_AND VDST, VADDR(2), VDATA<br /> 552 Description: Do bitwise AND on VDATA and value of VADDRaddress,975 Description: Do bitwise AND on VDATA and value of memory address, 553 976 and store result to this address. If GLC flag is set then return previous value 554 977 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 555 978 Operation:<br /> 556 <code>UINT32* VM = (UINT32*) VADDR979 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 557 980 UINT32 P = *VM; *VM = *VM & VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 558 981 <h4>FLAT_ATOMIC_AND_X2</h4> 559 <p>Opcode: 89 (0x59) for GCN 1.1; 104 (0x68) for GCN 1.2 <br />982 <p>Opcode: 89 (0x59) for GCN 1.1; 104 (0x68) for GCN 1.2/1.4<br /> 560 983 Syntax: FLAT_ATOMIC_AND_X2 VDST(2), VADDR(2), VDATA(2)<br /> 561 Description: Do 64-bit bitwise AND on VDATA and value of VADDRaddress,984 Description: Do 64-bit bitwise AND on VDATA and value of memory address, 562 985 and store result to this address. If GLC flag is set then return previous value 563 986 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 564 987 Operation:<br /> 565 <code>UINT64* VM = (UINT64*) VADDR988 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 566 989 UINT64 P = *VM; *VM = *VM & VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 567 990 <h4>FLAT_ATOMIC_CMPSWAP</h4> 568 <p>Opcode: 49 (0x31) for GCN 1.1; 65 (0x41) for GCN 1.2 <br />991 <p>Opcode: 49 (0x31) for GCN 1.1; 65 (0x41) for GCN 1.2/1.4<br /> 569 992 Syntax: FLAT_ATOMIC_CMPSWAP VDST, VADDR(2), VDATA(2)<br /> 570 Description: Store lower VDATA dword into VADDRaddress if previous value993 Description: Store lower VDATA dword into memory address if previous value 571 994 from that address is equal VDATA>>32, otherwise keep old value from address. 572 995 If GLC flag is set then return previous value from address to VDST, 573 996 otherwise keep VDST value. Operation is atomic.<br /> 574 997 Operation:<br /> 575 <code>UINT32* VM = (UINT32*) VADDR998 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 576 999 UINT32 P = *VM; *VM = *VM==(VDATA>>32) ? VDATA&0xffffffff : *VM // part of atomic 577 1000 VDST = (GLC) ? P : VDST // last part of atomic</code></p> 578 1001 <h4>FLAT_ATOMIC_CMPSWAP_X2</h4> 579 <p>Opcode: 81 (0x51) for GCN 1.1; 97 (0x61) for GCN 1.2 <br />1002 <p>Opcode: 81 (0x51) for GCN 1.1; 97 (0x61) for GCN 1.2/1.4<br /> 580 1003 Syntax: FLAT_ATOMIC_CMPSWAP_X2 VDST(2), VADDR(2), VDATA(4)<br /> 581 Description: Store lower VDATA 64-bit word into VADDRaddress if previous value1004 Description: Store lower VDATA 64-bit word into memory address if previous value 582 1005 from address is equal VDATA>>64, otherwise keep old value from VADDR. 583 1006 If GLC flag is set then return previous value from VADDR to VDST, 584 1007 otherwise keep VDST value. Operation is atomic.<br /> 585 1008 Operation:<br /> 586 <code>UINT64* VM = (UINT64*) VADDR1009 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 587 1010 UINT64 P = *VM; *VM = *VM==(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic 588 1011 VDST = (GLC) ? 
P : VDST // last part of atomic</code></p> 589 1012 <h4>FLAT_ATOMIC_DEC</h4> 590 <p>Opcode: 61 (0x3d) for GCN 1.1; 76 (0x4c) for GCN 1.2 <br />1013 <p>Opcode: 61 (0x3d) for GCN 1.1; 76 (0x4c) for GCN 1.2/1.4<br /> 591 1014 Syntax: FLAT_ATOMIC_DEC VDST, VADDR(2), VDATA<br /> 592 Description: Compare value from VADDRaddress and if less or equal than VDATA593 and this value is not zero, then decrement value from VADDRaddress,1015 Description: Compare value from memory address and if less or equal than VDATA 1016 and this value is not zero, then decrement value from memory address, 594 1017 otherwise store VDATA to this address. If GLC flag is set then return previous value 595 1018 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 596 1019 Operation:<br /> 597 <code>UINT32* VM = (UINT32*) VADDR1020 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 598 1021 UINT32 P = *VM; *VM = (*VM <= VDATA && *VM!=0) ? *VM-1 : VDATA // atomic 599 1022 VDST = (GLC) ? P : VDST // atomic</code></p> 600 1023 <h4>FLAT_ATOMIC_DEC_X2</h4> 601 <p>Opcode: 93 (0x5d) for GCN 1.1; 108 (0x6c) for GCN 1.2 <br />1024 <p>Opcode: 93 (0x5d) for GCN 1.1; 108 (0x6c) for GCN 1.2/1.4<br /> 602 1025 Syntax: FLAT_ATOMIC_DEC_X2 VDST(2), VADDR(2), VDATA(2)<br /> 603 Description: Compare 64-bit value from VADDRaddress and if less or equal than VDATA604 and this value is not zero, then decrement value from VADDRaddress,1026 Description: Compare 64-bit value from memory address and if less or equal than VDATA 1027 and this value is not zero, then decrement value from memory address, 605 1028 otherwise store VDATA to this address. If GLC flag is set then return previous value 606 1029 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 607 1030 Operation:<br /> 608 <code>UINT64* VM = (UINT64*) VADDR1031 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 609 1032 UINT64 P = *VM; *VM = (*VM <= VDATA && *VM!=0) ? 
*VM-1 : VDATA // atomic 610 1033 VDST = (GLC) ? P : VDST // atomic</code></p> … … 612 1035 <p>Opcode: 62 (0x3e) for GCN 1.1<br /> 613 1036 Syntax: FLAT_ATOMIC_FCMPSWAP VDST, VADDR(2), VDATA(2)<br /> 614 Description: Store lower VDATA dword into VADDR address if previous single floating point 1037 Description: Store lower VDATA dword into memory address if previous single floating point 615 1038 value from address is equal to the single floating point value VDATA>>32, 616 otherwise keep old value from VADDR address. 1039 otherwise keep old value from memory address. 617 1040 If GLC flag is set then return previous value from this address to VDST, 618 1041 otherwise keep VDST value. Operation is atomic.<br /> 619 1042 Operation:<br /> 620 <code>FLOAT* VM = (FLOAT*) VADDR 1043 <code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET) 621 1044 FLOAT P = *VM; *VM = *VM==ASFLOAT(VDATA>>32) ? VDATA&0xffffffff : *VM // part of atomic 622 1045 VDST = (GLC) ? P : VDST // last part of atomic</code></p> … … 624 1047 <p>Opcode: 94 (0x5e) for GCN 1.1<br /> 625 1048 Syntax: FLAT_ATOMIC_FCMPSWAP_X2 VDST(2), VADDR(2), VDATA(4)<br /> 626 Description: Store lower VDATA 64-bit word into VADDR address if previous double 1049 Description: Store lower VDATA 64-bit word into memory address if previous double 627 1050 floating point value from address is equal to the double floating point value VDATA>>64, 628 otherwise keep old value from VADDR address. 1051 otherwise keep old value from memory address. 629 1052 If GLC flag is set then return previous value from address to VDST, otherwise keep 630 1053 VDST value. Operation is atomic.<br /> 631 1054 Operation:<br /> 632 <code>DOUBLE* VM = (DOUBLE*) VMADDR 1055 <code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET) 633 1056 DOUBLE P = *VM; *VM = *VM==ASDOUBLE(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic 634 1057 VDST = (GLC) ? 
P : VDST // last part of atomic</code></p> … … 637 1060 Syntax: FLAT_ATOMIC_FMAX VDST, VADDR(2), VDATA<br /> 638 1061 Description: Choose greatest single floating point value from VDATA and from 639 VADDR address, and store result to this address. 1062 memory address, and store result to this address. 640 1063 If GLC flag is set then return previous value from address to VDST, otherwise keep 641 1064 VDST value. Operation is atomic.<br /> 642 1065 Operation:<br /> 643 <code>FLOAT* VM = (FLOAT*) VADDR 1066 <code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET) 644 1067 FLOAT P = *VM; *VM = MAX(*VM, ASFLOAT(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p> 645 1068 <h4>FLAT_ATOMIC_FMAX_X2</h4> … … 647 1070 Syntax: FLAT_ATOMIC_FMAX_X2 VDST(2), VADDR(2), VDATA(2)<br /> 648 1071 Description: Choose greatest double floating point value from VDATA and from 649 VADDR address, and store result to this address. 1072 memory address, and store result to this address. 650 1073 If GLC flag is set then return previous value from address to VDST, 651 1074 otherwise keep VDST value. Operation is atomic.<br /> 652 1075 Operation:<br /> 653 <code>DOUBLE* VM = (DOUBLE*) VADDR 1076 <code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET) 654 1077 DOUBLE P = *VM; *VM = MAX(*VM, ASDOUBLE(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p> 655 1078 <h4>FLAT_ATOMIC_FMIN</h4> … … 657 1080 Syntax: FLAT_ATOMIC_FMIN VDST, VADDR(2), VDATA<br /> 658 1081 Description: Choose smallest single floating point value from VDATA and from 659 VADDR address, and store result to this address. 1082 memory address, and store result to this address. 660 1083 If GLC flag is set then return previous value from address to VDST, otherwise keep 661 1084 VDST value. Operation is atomic.<br /> 662 1085 Operation:<br /> 663 <code>FLOAT* VM = (FLOAT*) VADDR 1086 <code>FLOAT* VM = (FLOAT*)(VADDR + INST_OFFSET) 664 1087 FLOAT P = *VM; *VM = MIN(*VM, ASFLOAT(VDATA)); VDST = (GLC) ? 
P : VDST // atomic</code></p> 665 1088 <h4>FLAT_ATOMIC_FMIN_X2</h4> … … 667 1090 Syntax: FLAT_ATOMIC_FMIN_X2 VDST(2), VADDR(2), VDATA(2)<br /> 668 1091 Description: Choose smallest double floating point value from VDATA and from 669 VADDR address, and store result to this address. 1092 memory address, and store result to this address. 670 1093 If GLC flag is set then return previous value from address to VDST, 671 1094 otherwise keep VDST value. Operation is atomic.<br /> 672 1095 Operation:<br /> 673 <code>DOUBLE* VM = (DOUBLE*) VADDR 1096 <code>DOUBLE* VM = (DOUBLE*)(VADDR + INST_OFFSET) 674 1097 DOUBLE P = *VM; *VM = MIN(*VM, ASDOUBLE(VDATA)); VDST = (GLC) ? P : VDST // atomic</code></p> 675 1098 <h4>FLAT_ATOMIC_INC</h4> 676 <p>Opcode: 60 (0x3c) for GCN 1.1; 75 (0x4b) for GCN 1.2 <br /> 1099 <p>Opcode: 60 (0x3c) for GCN 1.1; 75 (0x4b) for GCN 1.2/1.4<br /> 677 1100 Syntax: FLAT_ATOMIC_INC VDST, VADDR(2), VDATA<br /> 678 Description: Compare value from VADDR address and if less than VDATA, 1101 Description: Compare value from memory address and if less than VDATA, 679 1102 then increment value from address, otherwise store zero to address. 680 1103 If GLC flag is set then return previous value from this address to VDST, 681 1104 otherwise keep VDST value. Operation is atomic.<br /> 682 1105 Operation:<br /> 683 <code>UINT32* VM = (UINT32*) VADDR 1106 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 684 1107 UINT32 P = *VM; *VM = (*VM < VDATA) ? *VM+1 : 0; VDST = (GLC) ? 
P : VDST // atomic</code></p> 685 1108 <h4>FLAT_ATOMIC_INC_X2</h4> 686 <p>Opcode: 92 (0x5c) for GCN 1.1; 107 (0x6b) for GCN 1.2 <br /> 1109 <p>Opcode: 92 (0x5c) for GCN 1.1; 107 (0x6b) for GCN 1.2/1.4<br /> 687 1110 Syntax: FLAT_ATOMIC_INC_X2 VDST(2), VADDR(2), VDATA(2)<br /> 688 Description: Compare 64-bit value from VADDR address and if less than VDATA, 1111 Description: Compare 64-bit value from memory address and if less than VDATA, 689 1112 then increment value from address, otherwise store zero to address. 690 1113 If GLC flag is set then return previous value from this address to VDST, 691 1114 otherwise keep VDST value. Operation is atomic.<br /> 692 1115 Operation:<br /> 693 <code>UINT64* VM = (UINT64*) VADDR 1116 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 694 1117 UINT64 P = *VM; *VM = (*VM < VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p> 695 1118 <h4>FLAT_ATOMIC_OR</h4> 696 <p>Opcode: 58 (0x3a) for GCN 1.1; 73 (0x49) for GCN 1.2 <br /> 1119 <p>Opcode: 58 (0x3a) for GCN 1.1; 73 (0x49) for GCN 1.2/1.4<br /> 697 1120 Syntax: FLAT_ATOMIC_OR VDST, VADDR(2), VDATA<br /> 698 Description: Do bitwise OR on VDATA and value of VADDR address, 1121 Description: Do bitwise OR on VDATA and value of memory address, 699 1122 and store result to this address. If GLC flag is set then return previous value 700 1123 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 701 1124 Operation:<br /> 702 <code>UINT32* VM = (UINT32*) VADDR 1125 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 703 1126 UINT32 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 704 1127 <h4>FLAT_ATOMIC_OR_X2</h4> 705 <p>Opcode: 90 (0x5a) for GCN 1.1; 105 (0x69) for GCN 1.2 <br />1128 <p>Opcode: 90 (0x5a) for GCN 1.1; 105 (0x69) for GCN 1.2/1.4<br /> 706 1129 Syntax: FLAT_ATOMIC_OR_X2 VDST(2), VADDR(2), VDATA(2)<br /> 707 Description: Do 64-bit bitwise OR on VDATA and value of VADDRaddress,1130 Description: Do 64-bit bitwise OR on VDATA and value of memory address, 708 1131 and store result to this address. If GLC flag is set then return previous value 709 1132 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 710 1133 Operation:<br /> 711 <code>UINT64* VM = (UINT64*) VADDR1134 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 712 1135 UINT64 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 713 1136 <h4>FLAT_ATOMIC_SMAX</h4> 714 <p>Opcode: 55 (0x37) for GCN 1.1; 70 (0x46) for GCN 1.2 <br />1137 <p>Opcode: 55 (0x37) for GCN 1.1; 70 (0x46) for GCN 1.2/1.4<br /> 715 1138 Syntax: FLAT_ATOMIC_SMAX VDST, VADDR(2), VDATA<br /> 716 Description: Choose greatest signed 32-bit value from VDATA and from VADDRaddress,1139 Description: Choose greatest signed 32-bit value from VDATA and from memory address, 717 1140 and store result to this address. 718 1141 If GLC flag is set then return previous value from this address to VDST, otherwise keep 719 1142 VDST value. Operation is atomic.<br /> 720 1143 Operation:<br /> 721 <code>INT32* VM = (INT32*) VADDR1144 <code>INT32* VM = (INT32*)(VADDR + INST_OFFSET) 722 1145 UINT32 P = *VM; *VM = MAX(*VM, (INT32)VDATA); VDST = (GLC) ? 
P : VDST // atomic</code></p> 723 1146 <h4>FLAT_ATOMIC_SMAX_X2</h4> 724 <p>Opcode: 87 (0x57) for GCN 1.1; 102 (0x66) for GCN 1.2 <br />1147 <p>Opcode: 87 (0x57) for GCN 1.1; 102 (0x66) for GCN 1.2/1.4<br /> 725 1148 Syntax: FLAT_ATOMIC_SMAX_X2 VDST(2), VADDR(2), VDATA(2)<br /> 726 Description: Choose greatest signed 64-bit value from VDATA and from VADDRaddress,1149 Description: Choose greatest signed 64-bit value from VDATA and from memory address, 727 1150 and store result to this address. 728 1151 If GLC flag is set then return previous value from this address to VDST, otherwise keep 729 1152 VDST value. Operation is atomic.<br /> 730 1153 Operation:<br /> 731 <code>INT64* VM = (INT64*) VADDR1154 <code>INT64* VM = (INT64*)(VADDR + INST_OFFSET) 732 1155 UINT64 P = *VM; *VM = MAX(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 733 1156 <h4>FLAT_ATOMIC_SMIN</h4> 734 <p>Opcode: 53 (0x35) for GCN 1.1; 68 (0x44) for GCN 1.2 <br />1157 <p>Opcode: 53 (0x35) for GCN 1.1; 68 (0x44) for GCN 1.2/1.4<br /> 735 1158 Syntax: FLAT_ATOMIC_SMIN VDST, VADDR(2), VDATA<br /> 736 Description: Choose smallest signed 32-bit value from VDATA and from VADDRaddress,1159 Description: Choose smallest signed 32-bit value from VDATA and from memory address, 737 1160 and store result to this address. 738 1161 If GLC flag is set then return previous value from this address to VDST, otherwise keep 739 1162 VDST value. Operation is atomic.<br /> 740 1163 Operation:<br /> 741 <code>INT32* VM = (INT32*) VADDR1164 <code>INT32* VM = (INT32*)(VADDR + INST_OFFSET) 742 1165 UINT32 P = *VM; *VM = MIN(*VM, (INT32)VDATA); VDST = (GLC) ? 
P : VDST // atomic</code></p> 743 1166 <h4>FLAT_ATOMIC_SMIN_X2</h4> 744 <p>Opcode: 85 (0x55) for GCN 1.1; 100 (0x64) for GCN 1.2 <br />1167 <p>Opcode: 85 (0x55) for GCN 1.1; 100 (0x64) for GCN 1.2/1.4<br /> 745 1168 Syntax: FLAT_ATOMIC_SMIN_X2 VDST(2), VADDR(2), VDATA(2)<br /> 746 Description: Choose smallest signed 64-bit value from VDATA and from VADDRaddress,1169 Description: Choose smallest signed 64-bit value from VDATA and from memory address, 747 1170 and store result to this address. 748 1171 If GLC flag is set then return previous value from this address to VDST, otherwise keep 749 1172 VDST value. Operation is atomic.<br /> 750 1173 Operation:<br /> 751 <code>INT64* VM = (INT64*) VADDR1174 <code>INT64* VM = (INT64*)(VADDR + INST_OFFSET) 752 1175 UINT64 P = *VM; *VM = MIN(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 753 1176 <h4>FLAT_ATOMIC_SUB</h4> 754 <p>Opcode: 51 (0x33) for GCN 1.1; 67 (0x43) for GCN 1.2 <br />1177 <p>Opcode: 51 (0x33) for GCN 1.1; 67 (0x43) for GCN 1.2/1.4<br /> 755 1178 Syntax: FLAT_ATOMIC_SUB VDST, VADDR(2), VDATA<br /> 756 Description: Subtract VDATA from value of VADDRaddress, and store result to this address.1179 Description: Subtract VDATA from value of memory address, and store result to this address. 757 1180 If GLC flag is set then return previous value from this address to VDST, 758 1181 otherwise keep VDST value. Operation is atomic.<br /> 759 1182 Operation:<br /> 760 <code>UINT32* VM = (UINT32*) VADDR1183 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 761 1184 UINT32 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 762 1185 <h4>FLAT_ATOMIC_SUB_X2</h4> 763 <p>Opcode: 83 (0x53) for GCN 1.1; 99 (0x63) for GCN 1.2 <br />1186 <p>Opcode: 83 (0x53) for GCN 1.1; 99 (0x63) for GCN 1.2/1.4<br /> 764 1187 Syntax: FLAT_ATOMIC_SUB_X2 VDST(2), VADDR(2), VDATA(2)<br /> 765 Description: Subtract 64-bit VDATA from 64-bit value of VADDRaddress, and store result1188 Description: Subtract 64-bit VDATA from 64-bit value of memory address, and store result 766 1189 to this address. If GLC flag is set then return previous value from address to VDST, 767 1190 otherwise keep VDST value. Operation is atomic.<br /> 768 1191 Operation:<br /> 769 <code>UINT64* VM = (UINT64*) VADDR1192 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 770 1193 UINT64 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 771 1194 <h4>FLAT_ATOMIC_SWAP</h4> 772 <p>Opcode: 48 (0x30) for GCN 1.1; 64 (0x40) for GCN 1.2 <br />773 Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA 774 Description: Store VDATA dword into VADDRaddress. If GLC flag is set then775 return previous value from VADDRaddress to VDST, otherwise keep old value from VDST.1195 <p>Opcode: 48 (0x30) for GCN 1.1; 64 (0x40) for GCN 1.2/1.4<br /> 1196 Syntax: FLAT_ATOMIC_SWAP VDST, VADDR(2), VDATA<br /> 1197 Description: Store VDATA dword into memory address. If GLC flag is set then 1198 return previous value from memory address to VDST, otherwise keep old value from VDST. 776 1199 Operation is atomic.<br /> 777 1200 Operation:<br /> 778 <code>UINT32* VM = (UINT32*) VADDR1201 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 779 1202 UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 780 1203 <h4>FLAT_ATOMIC_SWAP_X2</h4> 781 <p>Opcode: 80 (0x50) for GCN 1.1; 96 (0x60) for GCN 1.2 <br />782 Syntax: FLAT_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2) 783 Description: Store VDATA 64-bit word into VADDRaddress. 
If GLC flag is set then784 return previous value from VADDRaddress to VDST, otherwise keep old value from VDST.1204 <p>Opcode: 80 (0x50) for GCN 1.1; 96 (0x60) for GCN 1.2/1.4<br /> 1205 Syntax: FLAT_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2)<br /> 1206 Description: Store VDATA 64-bit word into memory address. If GLC flag is set then 1207 return previous value from memory address to VDST, otherwise keep old value from VDST. 785 1208 Operation is atomic.<br /> 786 1209 Operation:<br /> 787 <code>UINT64* VM = (UINT64*) VADDR1210 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 788 1211 UINT64 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 789 1212 <h4>FLAT_ATOMIC_UMAX</h4> 790 <p>Opcode: 56 (0x38) for GCN 1.1; 71 (0x47) for GCN 1.2 <br />1213 <p>Opcode: 56 (0x38) for GCN 1.1; 71 (0x47) for GCN 1.2/1.4<br /> 791 1214 Syntax: FLAT_ATOMIC_UMAX VDST, VADDR(2), VDATA<br /> 792 Description: Choose greatest unsigned 32-bit value from VDATA and from VADDRaddress,1215 Description: Choose greatest unsigned 32-bit value from VDATA and from memory address, 793 1216 and store result to this address. 794 1217 If GLC flag is set then return previous value from this address to VDST, otherwise keep 795 1218 VDST value. Operation is atomic.<br /> 796 1219 Operation:<br /> 797 <code>UINT32* VM = (UINT32*) VADDR1220 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 798 1221 UINT32 P = *VM; *VM = MAX(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 799 1222 <h4>FLAT_ATOMIC_UMAX_X2</h4> 800 <p>Opcode: 88 (0x58) for GCN 1.1; 103 (0x67) for GCN 1.2 <br />1223 <p>Opcode: 88 (0x58) for GCN 1.1; 103 (0x67) for GCN 1.2/1.4<br /> 801 1224 Syntax: FLAT_ATOMIC_UMAX_X2 VDST(2), VADDR(2), VDATA(2)<br /> 802 Description: Choose greatest unsigned 64-bit value from VDATA and from VADDRaddress,1225 Description: Choose greatest unsigned 64-bit value from VDATA and from memory address, 803 1226 and store result to this address. 
804 1227 If GLC flag is set then return previous value from this address to VDST, otherwise keep 805 1228 VDST value. Operation is atomic.<br /> 806 1229 Operation:<br /> 807 <code>UINT64* VM = (UINT64*) VADDR1230 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 808 1231 UINT64 P = *VM; *VM = MAX(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 809 1232 <h4>FLAT_ATOMIC_UMIN</h4> 810 <p>Opcode: 54 (0x36) for GCN 1.1; 69 (0x45) for GCN 1.2 <br />1233 <p>Opcode: 54 (0x36) for GCN 1.1; 69 (0x45) for GCN 1.2/1.4<br /> 811 1234 Syntax: FLAT_ATOMIC_UMIN VDST, VADDR(2), VDATA<br /> 812 Description: Choose smallest unsigned 32-bit value from VDATA and from VADDRaddress,1235 Description: Choose smallest unsigned 32-bit value from VDATA and from memory address, 813 1236 and store result to this address. 814 1237 If GLC flag is set then return previous value from this address to VDST, otherwise keep 815 1238 VDST value. Operation is atomic.<br /> 816 1239 Operation:<br /> 817 <code>UINT32* VM = (UINT32*) VADDR1240 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 818 1241 UINT32 P = *VM; *VM = MIN(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 819 1242 <h4>FLAT_ATOMIC_UMIN_X2</h4> 820 <p>Opcode: 86 (0x56) for GCN 1.1; 101 (0x65) for GCN 1.2 <br />1243 <p>Opcode: 86 (0x56) for GCN 1.1; 101 (0x65) for GCN 1.2/1.4<br /> 821 1244 Syntax: FLAT_ATOMIC_UMIN_X2 VDST(2), VADDR(2), VDATA(2)<br /> 822 Description: Choose smallest unsigned 64-bit value from VDATA and from VADDRaddress,1245 Description: Choose smallest unsigned 64-bit value from VDATA and from memory address, 823 1246 and store result to this address. 824 1247 If GLC flag is set then return previous value from this address to VDST, otherwise keep 825 1248 VDST value. Operation is atomic.<br /> 826 1249 Operation:<br /> 827 <code>UINT64* VM = (UINT64*) VADDR1250 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 828 1251 UINT64 P = *VM; *VM = MIN(*VM, (UINT64)VDATA); VDST = (GLC) ? 
P : VDST // atomic</code></p> 829 1252 <h4>FLAT_ATOMIC_XOR</h4> 830 <p>Opcode: 59 (0x3b) for GCN 1.1; 74 (0x4a) for GCN 1.2 <br />1253 <p>Opcode: 59 (0x3b) for GCN 1.1; 74 (0x4a) for GCN 1.2/1.4<br /> 831 1254 Syntax: FLAT_ATOMIC_XOR VDST, VADDR(2), VDATA<br /> 832 Description: Do bitwise XOR on VDATA and value of VADDRaddress,1255 Description: Do bitwise XOR on VDATA and value of memory address, 833 1256 and store result to this address. If GLC flag is set then return previous value 834 1257 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 835 1258 Operation:<br /> 836 <code>UINT32* VM = (UINT32*) VADDR1259 <code>UINT32* VM = (UINT32*)(VADDR + INST_OFFSET) 837 1260 UINT32 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 838 1261 <h4>FLAT_ATOMIC_XOR_X2</h4> 839 <p>Opcode: 91 (0x5b) for GCN 1.1; 106 (0x6a) for GCN 1.2 <br />1262 <p>Opcode: 91 (0x5b) for GCN 1.1; 106 (0x6a) for GCN 1.2/1.4<br /> 840 1263 Syntax: FLAT_ATOMIC_XOR_X2 VDST(2), VADDR(2), VDATA(2)<br /> 841 Description: Do 64-bit bitwise XOR on VDATA and value of VADDRaddress,1264 Description: Do 64-bit bitwise XOR on VDATA and value of memory address, 842 1265 and store result to this address. If GLC flag is set then return previous value 843 1266 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 844 1267 Operation:<br /> 845 <code>UINT64* VM = (UINT64*) VADDR1268 <code>UINT64* VM = (UINT64*)(VADDR + INST_OFFSET) 846 1269 UINT64 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 847 1270 <h4>FLAT_LOAD_DWORD</h4> 848 <p>Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2<br /> 1271 <p>Opcode: 12 (0xc) for GCN 1.1; 20 (0x14) for GCN 1.2/1.4<br /> 849 1272 Syntax: FLAT_LOAD_DWORD VDST, VADDR(2)<br /> 850 Description: Load dword to VDST from VADDR address.<br /> 851 Operation:<br /> 852 <code>VDST = *(UINT32*)VADDR</code></p> 1273 Description: Load dword to VDST from memory address.<br /> 1274 Operation:<br /> 1275 <code>VDST = *(UINT32*)(VADDR + INST_OFFSET)</code></p> 853 1276 <h4>FLAT_LOAD_DWORDX2</h4> 854 <p>Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2<br /> 1277 <p>Opcode: 13 (0xd) for GCN 1.1; 21 (0x15) for GCN 1.2/1.4<br /> 855 1278 Syntax: FLAT_LOAD_DWORDX2 VDST(2), VADDR(2)<br /> 856 Description: Load two dwords to VDST from VADDR address.<br /> 857 Operation:<br /> 858 <code>VDST = *(UINT64*)VADDR</code></p> 1279 Description: Load two dwords to VDST from memory address.<br /> 1280 Operation:<br /> 1281 <code>VDST = *(UINT64*)(VADDR + INST_OFFSET)</code></p> 859 1282 <h4>FLAT_LOAD_DWORDX3</h4> 860 <p>Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2<br /> 1283 <p>Opcode: 15 (0xf) for GCN 1.1; 22 (0x16) for GCN 1.2/1.4<br /> 861 1284 Syntax: FLAT_LOAD_DWORDX3 VDST(3), VADDR(2)<br /> 862 Description: Load three dwords to VDST from VADDR address.<br /> 863 Operation:<br /> 864 <code>VDST[0] = *(UINT32*)VADDR 865 VDST[1] = *(UINT32*)(VADDR+4) 866 VDST[2] = *(UINT32*)(VADDR+8)</code></p> 1285 Description: Load three dwords to VDST from memory address.<br /> 1286 Operation:<br /> 1287 <code>BYTE* VM = (VADDR + INST_OFFSET) 1288 VDST[0] = *(UINT32*)VM 1289 VDST[1] = *(UINT32*)(VM+4) 1290 VDST[2] = *(UINT32*)(VM+8)</code></p> 867 1291 <h4>FLAT_LOAD_DWORDX4</h4> 868 <p>Opcode: 14 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2<br /> 1292 <p>Opcode: 14 (0xe) for GCN 1.1; 23 (0x17) for GCN 1.2/1.4<br /> 869 1293 Syntax: FLAT_LOAD_DWORDX4 VDST(4), VADDR(2)<br /> 870 Description: Load four dwords to VDST from VADDR address.<br
/> 871 Operation:<br /> 872 <code>VDST[0] = *(UINT32*)VADDR 873 VDST[1] = *(UINT32*)(VADDR+4) 874 VDST[2] = *(UINT32*)(VADDR+8) 875 VDST[3] = *(UINT32*)(VADDR+12)</code></p> 1294 Description: Load four dwords to VDST from memory address.<br /> 1295 Operation:<br /> 1296 <code>BYTE* VM = (VADDR + INST_OFFSET) 1297 VDST[0] = *(UINT32*)VM 1298 VDST[1] = *(UINT32*)(VM+4) 1299 VDST[2] = *(UINT32*)(VM+8) 1300 VDST[3] = *(UINT32*)(VM+12)</code></p> 876 1301 <h4>FLAT_LOAD_SBYTE</h4> 877 <p>Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2<br /> 1302 <p>Opcode: 9 (0x9) for GCN 1.1; 17 (0x11) for GCN 1.2/1.4<br /> 878 1303 Syntax: FLAT_LOAD_SBYTE VDST, VADDR(2)<br /> 879 Description: Load byte to VDST from VADDR address with sign extending.<br /> 880 Operation:<br /> 881 <code>VDST = *(INT8*)VADDR</code></p> 1304 Description: Load byte to VDST from memory address with sign extending.<br /> 1305 Operation:<br /> 1306 <code>VDST = *(INT8*)(VADDR + INST_OFFSET)</code></p> 1307 <h4>FLAT_LOAD_SBYTE_D16</h4> 1308 <p>Opcode: 34 (0x22) for GCN 1.4<br /> 1309 Syntax: FLAT_LOAD_SBYTE_D16 VDST, VADDR(2)<br /> 1310 Description: Load byte to lower 16-bit part of VDST from 1311 memory address with sign extending.<br /> 1312 Operation:<br /> 1313 <code>BYTE* VM = (VADDR + INST_OFFSET) 1314 VDST = ((UINT16)*(INT8*)VM) | (VDST&0xffff0000)</code></p> 1315 <h4>FLAT_LOAD_SBYTE_D16_HI</h4> 1316 <p>Opcode: 35 (0x23) for GCN 1.4<br /> 1317 Syntax: FLAT_LOAD_SBYTE_D16_HI VDST, VADDR(2)<br /> 1318 Description: Load byte to higher 16-bit part of VDST from 1319 memory address with sign extending.<br /> 1320 Operation:<br /> 1321 <code>BYTE* VM = (VADDR + INST_OFFSET) 1322 VDST = (((UINT32)*(INT8*)VM)<<16) | (VDST&0xffff)</code></p> 1323 <h4>FLAT_LOAD_SHORT_D16</h4> 1324 <p>Opcode: 36 (0x24) for GCN 1.4<br /> 1325 Syntax: FLAT_LOAD_SHORT_D16 VDST, VADDR(2)<br /> 1326 Description: Load 16-bit word to lower 16-bit part of VDST from memory address.<br /> 1327 Operation:<br /> 1328 <code>BYTE* VM = (VADDR 
+ INST_OFFSET) 1329 VDST = *(UINT16*)VM | (VDST & 0xffff0000)</code></p> 1330 <h4>FLAT_LOAD_SHORT_D16_HI</h4> 1331 <p>Opcode: 37 (0x25) for GCN 1.4<br /> 1332 Syntax: FLAT_LOAD_SHORT_D16_HI VDST, VADDR(2)<br /> 1333 Description: Load 16-bit word to higher 16-bit part of VDST from memory address.<br /> 1334 Operation:<br /> 1335 <code>BYTE* VM = (VADDR + INST_OFFSET) 1336 VDST = (((UINT32)*(UINT16*)VM)<<16) | (VDST & 0xffff)</code></p> 882 1337 <h4>FLAT_LOAD_SSHORT</h4> 883 <p>Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2<br /> 1338 <p>Opcode: 11 (0xb) for GCN 1.1; 19 (0x13) for GCN 1.2/1.4<br /> 884 1339 Syntax: FLAT_LOAD_SSHORT VDST, VADDR(2)<br /> 885 Description: Load 16-bit word to VDST from VADDR address with sign extending.<br /> 886 Operation:<br /> 887 <code>VDST = *(INT16*)VADDR</code></p> 1340 Description: Load 16-bit word to VDST from memory address with sign extending.<br /> 1341 Operation:<br /> 1342 <code>VDST = *(INT16*)(VADDR + INST_OFFSET)</code></p> 888 1343 <h4>FLAT_LOAD_UBYTE</h4> 889 <p>Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2<br /> 1344 <p>Opcode: 8 (0x8) for GCN 1.1; 16 (0x10) for GCN 1.2/1.4<br /> 890 1345 Syntax: FLAT_LOAD_UBYTE VDST, VADDR(2)<br /> 891 Description: Load byte to VDST from VADDR address with zero extending.<br /> 892 Operation:<br /> 893 <code>VDST = *(UINT8*)VADDR</code></p> 1346 Description: Load byte to VDST from memory address with zero extending.<br /> 1347 Operation:<br /> 1348 <code>VDST = *(UINT8*)(VADDR + INST_OFFSET)</code></p> 1349 <h4>FLAT_LOAD_UBYTE_D16</h4> 1350 <p>Opcode: 32 (0x20) for GCN 1.4<br /> 1351 Syntax: FLAT_LOAD_UBYTE_D16 VDST, VADDR(2)<br /> 1352 Description: Load byte to lower 16-bit part of VDST from 1353 memory address with zero extending.<br /> 1354 Operation:<br /> 1355 <code>BYTE* VM = (VADDR + INST_OFFSET) 1356 VDST = ((UINT16)*(UINT8*)VM) | (VDST&0xffff0000)</code></p> 1357 <h4>FLAT_LOAD_UBYTE_D16_HI</h4> 1358 <p>Opcode: 33 (0x21) for GCN 1.4<br /> 1359 Syntax: 
FLAT_LOAD_UBYTE_D16_HI VDST, VADDR(2)<br /> 1360 Description: Load byte to higher 16-bit part of VDST from 1361 memory address with zero extending.<br /> 1362 Operation:<br /> 1363 <code>BYTE* VM = (VADDR + INST_OFFSET) 1364 VDST = (((UINT32)*(UINT8*)VM)<<16) | (VDST&0xffff)</code></p> 894 1365 <h4>FLAT_LOAD_USHORT</h4> 895 <p>Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2<br /> 1366 <p>Opcode: 10 (0xa) for GCN 1.1; 18 (0x12) for GCN 1.2/1.4<br /> 896 1367 Syntax: FLAT_LOAD_USHORT VDST, VADDR(2)<br /> 897 Description: Load 16-bit word to VDST from VADDR address with zero extending.<br /> 898 Operation:<br /> 899 <code>VDST = *(UINT16*)VADDR</code></p> 1368 Description: Load 16-bit word to VDST from memory address with zero extending.<br /> 1369 Operation:<br /> 1370 <code>VDST = *(UINT16*)(VADDR + INST_OFFSET)</code></p> 900 1371 <h4>FLAT_STORE_BYTE</h4> 901 1372 <p>Opcode: 24 (0x18)<br /> 902 1373 Syntax: FLAT_STORE_BYTE VADDR(2), VDATA<br /> 903 Description: Store byte from VDATA to VADDR address.<br /> 904 Operation:<br /> 905 <code>*(UINT8*)VADDR = VDATA&0xff</code></p> 1374 Description: Store byte from VDATA to memory address.<br /> 1375 Operation:<br /> 1376 <code>*(UINT8*)(VADDR + INST_OFFSET) = VDATA&0xff</code></p> 1377 <h4>FLAT_STORE_BYTE_D16_HI</h4> 1378 <p>Opcode: 25 (0x19) for GCN 1.4<br /> 1379 Syntax: FLAT_STORE_BYTE_D16_HI VADDR(2), VDATA<br /> 1380 Description: Store byte from 16-23 bits of VDATA to memory address.<br /> 1381 Operation:<br /> 1382 <code>*(UINT8*)(VADDR + INST_OFFSET) = (VDATA>>16)&0xff</code></p> 906 1383 <h4>FLAT_STORE_DWORD</h4> 907 1384 <p>Opcode: 28 (0x1c)<br /> 908 1385 Syntax: FLAT_STORE_DWORD VADDR(2), VDATA<br /> 909 Description: Store dword from VDATA to VADDR address.<br /> 910 Operation:<br /> 911 <code>*(UINT32*)VADDR = VDATA</code></p> 1386 Description: Store dword from VDATA to memory address.<br /> 1387 Operation:<br /> 1388 <code>*(UINT32*)(VADDR + INST_OFFSET) = VDATA</code></p> 912 1389 
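<p>The D16 load variants listed above write only one 16-bit half of VDST and keep the other half. The following is a minimal C sketch of that merge rule, not CLRX or hardware code: every function name is hypothetical, and memory is modelled as an ordinary host pointer that already includes INST_OFFSET.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: merge a 16-bit value into one half of a 32-bit
 * VDST while preserving the other half. */
static uint32_t merge_d16_lo(uint32_t vdst, uint16_t value)
{
    return (vdst & 0xffff0000u) | value;
}

static uint32_t merge_d16_hi(uint32_t vdst, uint16_t value)
{
    return (vdst & 0x0000ffffu) | ((uint32_t)value << 16);
}

/* FLAT_LOAD_UBYTE_D16: zero-extend the byte to 16 bits, write the low half. */
uint32_t flat_load_ubyte_d16(uint32_t vdst, const void *vm)
{
    assert(vm != NULL);
    return merge_d16_lo(vdst, *(const uint8_t *)vm);
}

/* FLAT_LOAD_SBYTE_D16: sign-extend the byte to 16 bits, write the low half. */
uint32_t flat_load_sbyte_d16(uint32_t vdst, const void *vm)
{
    assert(vm != NULL);
    return merge_d16_lo(vdst, (uint16_t)(int16_t)*(const int8_t *)vm);
}

/* FLAT_LOAD_SBYTE_D16_HI: sign-extend the byte, write the high half. */
uint32_t flat_load_sbyte_d16_hi(uint32_t vdst, const void *vm)
{
    assert(vm != NULL);
    return merge_d16_hi(vdst, (uint16_t)(int16_t)*(const int8_t *)vm);
}

/* FLAT_LOAD_SHORT_D16_HI: copy a 16-bit word into the high half. */
uint32_t flat_load_short_d16_hi(uint32_t vdst, const void *vm)
{
    uint16_t value;
    assert(vm != NULL);
    memcpy(&value, vm, sizeof value); /* avoids unaligned access */
    return merge_d16_hi(vdst, value);
}
```

<p>Each helper returns the new VDST value instead of writing a register, so the untouched half is easy to inspect in a test.</p>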
<h4>FLAT_STORE_DWORDX2</h4> 913 1390 <p>Opcode: 29 (0x1d)<br /> 914 1391 Syntax: FLAT_STORE_DWORDX2 VADDR(2), VDATA(2)<br /> 915 Description: Store two dwords from VDATA to VADDR address.<br /> 916 Operation:<br /> 917 <code>*(UINT64*)VADDR = VDATA</code></p> 1392 Description: Store two dwords from VDATA to memory address.<br /> 1393 Operation:<br /> 1394 <code>*(UINT64*)(VADDR + INST_OFFSET) = VDATA</code></p> 918 1395 <h4>FLAT_STORE_DWORDX3</h4> 919 <p>Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2<br /> 1396 <p>Opcode: 31 (0x1f) for GCN 1.1; 30 (0x1e) for GCN 1.2/1.4<br /> 920 1397 Syntax: FLAT_STORE_DWORDX3 VADDR(2), VDATA(3)<br /> 921 Description: Store three dwords from VDATA to VADDR address.<br /> 922 Operation:<br /> 923 <code>*(UINT32*)(VADDR) = VDATA[0] 924 *(UINT32*)(VADDR+4) = VDATA[1] 925 *(UINT32*)(VADDR+8) = VDATA[2]</code></p> 1398 Description: Store three dwords from VDATA to memory address.<br /> 1399 Operation:<br /> 1400 <code>BYTE* VM = (VADDR + INST_OFFSET) 1401 *(UINT32*)(VM) = VDATA[0] 1402 *(UINT32*)(VM+4) = VDATA[1] 1403 *(UINT32*)(VM+8) = VDATA[2]</code></p> 926 1404 <h4>FLAT_STORE_DWORDX4</h4> 927 <p>Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1f) for GCN 1.2<br /> 1405 <p>Opcode: 30 (0x1e) for GCN 1.1; 31 (0x1f) for GCN 1.2/1.4<br /> 928 1406 Syntax: FLAT_STORE_DWORDX4 VADDR(2), VDATA(4)<br /> 929 Description: Store four dwords from VDATA to VADDR address.<br /> 930 Operation:<br /> 931 <code>*(UINT32*)(VADDR) = VDATA[0] 932 *(UINT32*)(VADDR+4) = VDATA[1] 933 *(UINT32*)(VADDR+8) = VDATA[2] 934 *(UINT32*)(VADDR+12) = VDATA[3]</code></p> 1407 Description: Store four dwords from VDATA to memory address.<br /> 1408 Operation:<br /> 1409 <code>*(UINT32*)(VADDR + INST_OFFSET) = VDATA[0] 1410 *(UINT32*)(VADDR + INST_OFFSET + 4) = VDATA[1] 1411 *(UINT32*)(VADDR + INST_OFFSET + 8) = VDATA[2] 1412 *(UINT32*)(VADDR + INST_OFFSET + 12) = VDATA[3]</code></p> 935 1413 <h4>FLAT_STORE_SHORT</h4> 936 1414 <p>Opcode: 26 (0x1a)<br /> 937 1415 Syntax: FLAT_STORE_SHORT VADDR(2), VDATA<br /> 938 Description: Store 16-bit word from 
VDATA to VADDR address.<br /> 939 Operation:<br /> 940 <code>*(UINT16*)VADDR = VDATA&0xffff</code></p> 1416 Description: Store 16-bit word from VDATA to memory address.<br /> 1417 Operation:<br /> 1418 <code>*(UINT16*)(VADDR + INST_OFFSET) = VDATA&0xffff</code></p> 1419 <h4>FLAT_STORE_SHORT_D16_HI</h4> 1420 <p>Opcode: 27 (0x1b) for GCN 1.4<br /> 1421 Syntax: FLAT_STORE_SHORT_D16_HI VADDR(2), VDATA<br /> 1422 Description: Store 16-bit word from higher 16-bit part of VDATA to memory address.<br /> 1423 Operation:<br /> 1424 <code>*(UINT16*)(VADDR + INST_OFFSET) = VDATA>>16</code></p> 1425 <h4>GLOBAL_ATOMIC_ADD</h4> 1426 <p>Opcode: 66 (0x42) for GCN 1.4<br /> 1427 Syntax: GLOBAL_ATOMIC_ADD VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1428 Description: Add VDATA to value of global address, and store result to this address. 1429 If GLC flag is set then return previous value from this address to VDST, 1430 otherwise keep VDST value. Operation is atomic.<br /> 1431 Operation:<br /> 1432 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1433 UINT32 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1434 <h4>GLOBAL_ATOMIC_ADD_X2</h4> 1435 <p>Opcode: 98 (0x62) for GCN 1.4<br /> 1436 Syntax: GLOBAL_ATOMIC_ADD_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1437 Description: Add 64-bit VDATA to 64-bit value of global address, and store result 1438 to this address. If GLC flag is set then return previous value from address to VDST, 1439 otherwise keep VDST value. Operation is atomic.<br /> 1440 Operation:<br /> 1441 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1442 UINT64 P = *VM; *VM = *VM + VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1443 <h4>GLOBAL_ATOMIC_AND</h4> 1444 <p>Opcode: 72 (0x48) for GCN 1.4<br /> 1445 Syntax: GLOBAL_ATOMIC_AND VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1446 Description: Do bitwise AND on VDATA and value of global address, 1447 and store result to this address. 
If GLC flag is set then return previous value 1448 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1449 Operation:<br /> 1450 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1451 UINT32 P = *VM; *VM = *VM & VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1452 <h4>GLOBAL_ATOMIC_AND_X2</h4> 1453 <p>Opcode: 104 (0x68) for GCN 1.4<br /> 1454 Syntax: GLOBAL_ATOMIC_AND_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1455 Description: Do 64-bit bitwise AND on VDATA and value of global address, 1456 and store result to this address. If GLC flag is set then return previous value 1457 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1458 Operation:<br /> 1459 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1460 UINT64 P = *VM; *VM = *VM & VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1461 <h4>GLOBAL_ATOMIC_CMPSWAP</h4> 1462 <p>Opcode: 65 (0x41) for GCN 1.4<br /> 1463 Syntax: GLOBAL_ATOMIC_CMPSWAP VDST, VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1464 Description: Store lower VDATA dword into global address if previous value 1465 from that address is equal VDATA>>32, otherwise keep old value from address. 1466 If GLC flag is set then return previous value from address to VDST, 1467 otherwise keep VDST value. Operation is atomic.<br /> 1468 Operation:<br /> 1469 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1470 UINT32 P = *VM; *VM = *VM==(VDATA>>32) ? VDATA&0xffffffff : *VM // part of atomic 1471 VDST = (GLC) ? P : VDST // last part of atomic</code></p> 1472 <h4>GLOBAL_ATOMIC_CMPSWAP_X2</h4> 1473 <p>Opcode: 97 (0x61) for GCN 1.4<br /> 1474 Syntax: GLOBAL_ATOMIC_CMPSWAP_X2 VDST(2), VADDR(2), VDATA(4), SADDR(2)|OFF<br /> 1475 Description: Store lower VDATA 64-bit word into global address if previous value 1476 from address is equal VDATA>>64, otherwise keep old value from VADDR. 
1477 If GLC flag is set then return previous value from VADDR to VDST, 1478 otherwise keep VDST value. Operation is atomic.<br /> 1479 Operation:<br /> 1480 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1481 UINT64 P = *VM; *VM = *VM==(VDATA[2:3]) ? VDATA[0:1] : *VM // part of atomic 1482 VDST = (GLC) ? P : VDST // last part of atomic</code></p> 1483 <h4>GLOBAL_ATOMIC_DEC</h4> 1484 <p>Opcode: 76 (0x4c) for GCN 1.4<br /> 1485 Syntax: GLOBAL_ATOMIC_DEC VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1486 Description: Compare value from global address and if less or equal than VDATA 1487 and this value is not zero, then decrement value from global address, 1488 otherwise store VDATA to this address. If GLC flag is set then return previous value 1489 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1490 Operation:<br /> 1491 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1492 UINT32 P = *VM; *VM = (*VM <= VDATA && *VM!=0) ? *VM-1 : VDATA // atomic 1493 VDST = (GLC) ? P : VDST // atomic</code></p> 1494 <h4>GLOBAL_ATOMIC_DEC_X2</h4> 1495 <p>Opcode: 108 (0x6c) for GCN 1.4<br /> 1496 Syntax: GLOBAL_ATOMIC_DEC_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1497 Description: Compare 64-bit value from global address and if less or equal than VDATA 1498 and this value is not zero, then decrement value from global address, 1499 otherwise store VDATA to this address. If GLC flag is set then return previous value 1500 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1501 Operation:<br /> 1502 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1503 UINT64 P = *VM; *VM = (*VM <= VDATA && *VM!=0) ? *VM-1 : VDATA // atomic 1504 VDST = (GLC) ? 
P : VDST // atomic</code></p> 1505 <h4>GLOBAL_ATOMIC_INC</h4> 1506 <p>Opcode: 75 (0x4b) for GCN 1.4<br /> 1507 Syntax: GLOBAL_ATOMIC_INC VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1508 Description: Compare value from global address and if less than VDATA, 1509 then increment value from address, otherwise store zero to address. 1510 If GLC flag is set then return previous value from this address to VDST, 1511 otherwise keep VDST value. Operation is atomic.<br /> 1512 Operation:<br /> 1513 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1514 UINT32 P = *VM; *VM = (*VM < VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p> 1515 <h4>GLOBAL_ATOMIC_INC_X2</h4> 1516 <p>Opcode: 107 (0x6b) for GCN 1.4<br /> 1517 Syntax: GLOBAL_ATOMIC_INC_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1518 Description: Compare 64-bit value from global address and if less than VDATA, 1519 then increment value from address, otherwise store zero to address. 1520 If GLC flag is set then return previous value from this address to VDST, 1521 otherwise keep VDST value. Operation is atomic.<br /> 1522 Operation:<br /> 1523 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1524 UINT64 P = *VM; *VM = (*VM < VDATA) ? *VM+1 : 0; VDST = (GLC) ? P : VDST // atomic</code></p> 1525 <h4>GLOBAL_ATOMIC_OR</h4> 1526 <p>Opcode: 73 (0x49) for GCN 1.4<br /> 1527 Syntax: GLOBAL_ATOMIC_OR VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1528 Description: Do bitwise OR on VDATA and value of global address, 1529 and store result to this address. If GLC flag is set then return previous value 1530 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1531 Operation:<br /> 1532 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1533 UINT32 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 1534 <h4>GLOBAL_ATOMIC_OR_X2</h4> 1535 <p>Opcode: 105 (0x69) for GCN 1.4<br /> 1536 Syntax: GLOBAL_ATOMIC_OR_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1537 Description: Do 64-bit bitwise OR on VDATA and value of global address, 1538 and store result to this address. If GLC flag is set then return previous value 1539 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1540 Operation:<br /> 1541 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1542 UINT64 P = *VM; *VM = *VM | VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1543 <h4>GLOBAL_ATOMIC_SMAX</h4> 1544 <p>Opcode: 70 (0x46) for GCN 1.4<br /> 1545 Syntax: GLOBAL_ATOMIC_SMAX VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1546 Description: Choose greatest signed 32-bit value from VDATA and from global address, 1547 and store result to this address. 1548 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1549 VDST value. Operation is atomic.<br /> 1550 Operation:<br /> 1551 <code>INT32* VM = (INT32*)(VADDR + SADDR + INST_OFFSET) 1552 UINT32 P = *VM; *VM = MAX(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1553 <h4>GLOBAL_ATOMIC_SMAX_X2</h4> 1554 <p>Opcode: 102 (0x66) for GCN 1.4<br /> 1555 Syntax: GLOBAL_ATOMIC_SMAX_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1556 Description: Choose greatest signed 64-bit value from VDATA and from global address, 1557 and store result to this address. 1558 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1559 VDST value. Operation is atomic.<br /> 1560 Operation:<br /> 1561 <code>INT64* VM = (INT64*)(VADDR + SADDR + INST_OFFSET) 1562 UINT64 P = *VM; *VM = MAX(*VM, (INT64)VDATA); VDST = (GLC) ? 
P : VDST // atomic</code></p> 1563 <h4>GLOBAL_ATOMIC_SMIN</h4> 1564 <p>Opcode: 68 (0x44) for GCN 1.4<br /> 1565 Syntax: GLOBAL_ATOMIC_SMIN VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1566 Description: Choose smallest signed 32-bit value from VDATA and from global address, 1567 and store result to this address. 1568 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1569 VDST value. Operation is atomic.<br /> 1570 Operation:<br /> 1571 <code>INT32* VM = (INT32*)(VADDR + SADDR + INST_OFFSET) 1572 UINT32 P = *VM; *VM = MIN(*VM, (INT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1573 <h4>GLOBAL_ATOMIC_SMIN_X2</h4> 1574 <p>Opcode: 100 (0x64) for GCN 1.4<br /> 1575 Syntax: GLOBAL_ATOMIC_SMIN_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1576 Description: Choose smallest signed 64-bit value from VDATA and from global address, 1577 and store result to this address. 1578 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1579 VDST value. Operation is atomic.<br /> 1580 Operation:<br /> 1581 <code>INT64* VM = (INT64*)(VADDR + SADDR + INST_OFFSET) 1582 UINT64 P = *VM; *VM = MIN(*VM, (INT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1583 <h4>GLOBAL_ATOMIC_SUB</h4> 1584 <p>Opcode: 67 (0x43) for GCN 1.4<br /> 1585 Syntax: GLOBAL_ATOMIC_SUB VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1586 Description: Subtract VDATA from value of global address, and store result to this address. 1587 If GLC flag is set then return previous value from this address to VDST, 1588 otherwise keep VDST value. Operation is atomic.<br /> 1589 Operation:<br /> 1590 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1591 UINT32 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 1592 <h4>GLOBAL_ATOMIC_SUB_X2</h4> 1593 <p>Opcode: 99 (0x63) for GCN 1.4<br /> 1594 Syntax: GLOBAL_ATOMIC_SUB_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1595 Description: Subtract 64-bit VDATA from 64-bit value of global address, and store result 1596 to this address. If GLC flag is set then return previous value from address to VDST, 1597 otherwise keep VDST value. Operation is atomic.<br /> 1598 Operation:<br /> 1599 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1600 UINT64 P = *VM; *VM = *VM - VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1601 <h4>GLOBAL_ATOMIC_SWAP</h4> 1602 <p>Opcode: 64 (0x40) for GCN 1.4<br /> 1603 Syntax: GLOBAL_ATOMIC_SWAP VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1604 Description: Store VDATA dword into global address. If GLC flag is set then 1605 return previous value from global address to VDST, otherwise keep old value from VDST. 1606 Operation is atomic.<br /> 1607 Operation:<br /> 1608 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1609 UINT32 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1610 <h4>GLOBAL_ATOMIC_SWAP_X2</h4> 1611 <p>Opcode: 96 (0x60) for GCN 1.4<br /> 1612 Syntax: GLOBAL_ATOMIC_SWAP_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1613 Description: Store VDATA 64-bit word into global address. If GLC flag is set then 1614 return previous value from global address to VDST, otherwise keep old value from VDST. 1615 Operation is atomic.<br /> 1616 Operation:<br /> 1617 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1618 UINT64 P = *VM; *VM = VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1619 <h4>GLOBAL_ATOMIC_UMAX</h4> 1620 <p>Opcode: 71 (0x47) for GCN 1.4<br /> 1621 Syntax: GLOBAL_ATOMIC_UMAX VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1622 Description: Choose greatest unsigned 32-bit value from VDATA and from global address, 1623 and store result to this address. 
1624 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1625 VDST value. Operation is atomic.<br /> 1626 Operation:<br /> 1627 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1628 UINT32 P = *VM; *VM = MAX(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1629 <h4>GLOBAL_ATOMIC_UMAX_X2</h4> 1630 <p>Opcode: 103 (0x67) for GCN 1.4<br /> 1631 Syntax: GLOBAL_ATOMIC_UMAX_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1632 Description: Choose greatest unsigned 64-bit value from VDATA and from global address, 1633 and store result to this address. 1634 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1635 VDST value. Operation is atomic.<br /> 1636 Operation:<br /> 1637 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1638 UINT64 P = *VM; *VM = MAX(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1639 <h4>GLOBAL_ATOMIC_UMIN</h4> 1640 <p>Opcode: 69 (0x45) for GCN 1.4<br /> 1641 Syntax: GLOBAL_ATOMIC_UMIN VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1642 Description: Choose smallest unsigned 32-bit value from VDATA and from global address, 1643 and store result to this address. 1644 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1645 VDST value. Operation is atomic.<br /> 1646 Operation:<br /> 1647 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1648 UINT32 P = *VM; *VM = MIN(*VM, (UINT32)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1649 <h4>GLOBAL_ATOMIC_UMIN_X2</h4> 1650 <p>Opcode: 101 (0x65) for GCN 1.4<br /> 1651 Syntax: GLOBAL_ATOMIC_UMIN_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1652 Description: Choose smallest unsigned 64-bit value from VDATA and from global address, 1653 and store result to this address. 1654 If GLC flag is set then return previous value from this address to VDST, otherwise keep 1655 VDST value. 
Operation is atomic.<br /> 1656 Operation:<br /> 1657 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1658 UINT64 P = *VM; *VM = MIN(*VM, (UINT64)VDATA); VDST = (GLC) ? P : VDST // atomic</code></p> 1659 <h4>GLOBAL_ATOMIC_XOR</h4> 1660 <p>Opcode: 74 (0x4a) for GCN 1.4<br /> 1661 Syntax: GLOBAL_ATOMIC_XOR VDST, VADDR(2), VDATA, SADDR(2)|OFF<br /> 1662 Description: Do bitwise XOR on VDATA and value of global address, 1663 and store result to this address. If GLC flag is set then return previous value 1664 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1665 Operation:<br /> 1666 <code>UINT32* VM = (UINT32*)(VADDR + SADDR + INST_OFFSET) 1667 UINT32 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? P : VDST // atomic</code></p> 1668 <h4>GLOBAL_ATOMIC_XOR_X2</h4> 1669 <p>Opcode: 106 (0x6a) for GCN 1.4<br /> 1670 Syntax: GLOBAL_ATOMIC_XOR_X2 VDST(2), VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1671 Description: Do 64-bit bitwise XOR on VDATA and value of global address, 1672 and store result to this address. If GLC flag is set then return previous value 1673 from this address to VDST, otherwise keep VDST value. Operation is atomic.<br /> 1674 Operation:<br /> 1675 <code>UINT64* VM = (UINT64*)(VADDR + SADDR + INST_OFFSET) 1676 UINT64 P = *VM; *VM = *VM ^ VDATA; VDST = (GLC) ? 
P : VDST // atomic</code></p> 1677 <h4>GLOBAL_LOAD_DWORD</h4> 1678 <p>Opcode: 20 (0x14) for GCN 1.4<br /> 1679 Syntax: GLOBAL_LOAD_DWORD VDST, VADDR(2), SADDR(2)|OFF<br /> 1680 Description: Load dword to VDST from global address.<br /> 1681 Operation:<br /> 1682 <code>VDST = *(UINT32*)(VADDR + SADDR + INST_OFFSET)</code></p> 1683 <h4>GLOBAL_LOAD_DWORDX2</h4> 1684 <p>Opcode: 21 (0x15) for GCN 1.4<br /> 1685 Syntax: GLOBAL_LOAD_DWORDX2 VDST(2), VADDR(2), SADDR(2)|OFF<br /> 1686 Description: Load two dwords to VDST from global address.<br /> 1687 Operation:<br /> 1688 <code>VDST = *(UINT64*)(VADDR + SADDR + INST_OFFSET)</code></p> 1689 <h4>GLOBAL_LOAD_DWORDX3</h4> 1690 <p>Opcode: 22 (0x16) for GCN 1.4<br /> 1691 Syntax: GLOBAL_LOAD_DWORDX3 VDST(3), VADDR(2), SADDR(2)|OFF<br /> 1692 Description: Load three dwords to VDST from global address.<br /> 1693 Operation:<br /> 1694 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1695 VDST[0] = *(UINT32*)VM 1696 VDST[1] = *(UINT32*)(VM+4) 1697 VDST[2] = *(UINT32*)(VM+8)</code></p> 1698 <h4>GLOBAL_LOAD_DWORDX4</h4> 1699 <p>Opcode: 23 (0x17) for GCN 1.4<br /> 1700 Syntax: GLOBAL_LOAD_DWORDX4 VDST(4), VADDR(2), SADDR(2)|OFF<br /> 1701 Description: Load four dwords to VDST from global address.<br /> 1702 Operation:<br /> 1703 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1704 VDST[0] = *(UINT32*)VM 1705 VDST[1] = *(UINT32*)(VM+4) 1706 VDST[2] = *(UINT32*)(VM+8) 1707 VDST[3] = *(UINT32*)(VM+12)</code></p> 1708 <h4>GLOBAL_LOAD_SBYTE</h4> 1709 <p>Opcode: 17 (0x11) for GCN 1.4<br /> 1710 Syntax: GLOBAL_LOAD_SBYTE VDST, VADDR(2), SADDR(2)|OFF<br /> 1711 Description: Load byte to VDST from global address with sign extending.<br /> 1712 Operation:<br /> 1713 <code>VDST = *(INT8*)(VADDR + SADDR + INST_OFFSET)</code></p> 1714 <h4>GLOBAL_LOAD_SBYTE_D16</h4> 1715 <p>Opcode: 34 (0x22) for GCN 1.4<br /> 1716 Syntax: GLOBAL_LOAD_SBYTE_D16 VDST, VADDR(2), SADDR(2)|OFF<br /> 1717 Description: Load byte to lower 16-bit part of VDST from 1718 global 
address with sign extending.<br /> 1719 Operation:<br /> 1720 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1721 VDST = ((UINT16)*(INT8*)VM) | (VDST&0xffff0000)</code></p> 1722 <h4>GLOBAL_LOAD_SBYTE_D16_HI</h4> 1723 <p>Opcode: 35 (0x23) for GCN 1.4<br /> 1724 Syntax: GLOBAL_LOAD_SBYTE_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br /> 1725 Description: Load byte to higher 16-bit part of VDST from 1726 global address with sign extending.<br /> 1727 Operation:<br /> 1728 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1729 VDST = (((UINT32)*(INT8*)VM)<<16) | (VDST&0xffff)</code></p> 1730 <h4>GLOBAL_LOAD_SHORT_D16</h4> 1731 <p>Opcode: 36 (0x24) for GCN 1.4<br /> 1732 Syntax: GLOBAL_LOAD_SHORT_D16 VDST, VADDR(2), SADDR(2)|OFF<br /> 1733 Description: Load 16-bit word to lower 16-bit part of VDST from global address.<br /> 1734 Operation:<br /> 1735 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1736 VDST = *(UINT16*)VM | (VDST & 0xffff0000)</code></p> 1737 <h4>GLOBAL_LOAD_SHORT_D16_HI</h4> 1738 <p>Opcode: 37 (0x25) for GCN 1.4<br /> 1739 Syntax: GLOBAL_LOAD_SHORT_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br /> 1740 Description: Load 16-bit word to higher 16-bit part of VDST from global address.<br /> 1741 Operation:<br /> 1742 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1743 VDST = (((UINT32)*(UINT16*)VM)<<16) | (VDST & 0xffff)</code></p> 1744 <h4>GLOBAL_LOAD_SSHORT</h4> 1745 <p>Opcode: 19 (0x13) for GCN 1.4<br /> 1746 Syntax: GLOBAL_LOAD_SSHORT VDST, VADDR(2), SADDR(2)|OFF<br /> 1747 Description: Load 16-bit word to VDST from global address with sign extending.<br /> 1748 Operation:<br /> 1749 <code>VDST = *(INT16*)(VADDR + SADDR + INST_OFFSET)</code></p> 1750 <h4>GLOBAL_LOAD_UBYTE</h4> 1751 <p>Opcode: 16 (0x10) for GCN 1.4<br /> 1752 Syntax: GLOBAL_LOAD_UBYTE VDST, VADDR(2), SADDR(2)|OFF<br /> 1753 Description: Load byte to VDST from global address with zero extending.<br /> 1754 Operation:<br /> 1755 <code>VDST = *(UINT8*)(VADDR + SADDR + INST_OFFSET)</code></p> 1756 
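<p>Every GLOBAL operation above forms its address as VADDR + SADDR + INST_OFFSET, with SADDR dropped when OFF is given. The C sketch below illustrates that sum and a GLOBAL_LOAD_DWORDX3-style read from a simulated memory buffer. It rests on assumptions: the function names are hypothetical, a NULL pointer stands in for OFF, and INST_OFFSET is taken as an already decoded signed byte count.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Effective address of a GLOBAL_* instruction.
 * saddr == NULL models the OFF operand (no scalar base is added). */
uint64_t global_address(uint64_t vaddr, const uint64_t *saddr,
                        int32_t inst_offset)
{
    uint64_t base = (saddr != NULL) ? *saddr : 0;
    /* Sign-extend the offset first; unsigned wrap-around then gives the
     * expected result for negative offsets too. */
    return vaddr + base + (uint64_t)(int64_t)inst_offset;
}

/* GLOBAL_LOAD_DWORDX3 against a byte buffer that stands in for memory:
 * VDST[0..2] receive the dwords at address, address+4 and address+8. */
void global_load_dwordx3(uint32_t vdst[3], const uint8_t *memory,
                         size_t memory_size, uint64_t address)
{
    assert(address <= memory_size && memory_size - address >= 12);
    memcpy(vdst, memory + address, 3 * sizeof(uint32_t));
}
```

<p>Passing the scalar base by pointer keeps the OFF case explicit at the call site, mirroring the SADDR(2)|OFF operand in the syntax lines.</p>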
<h4>GLOBAL_LOAD_UBYTE_D16</h4> 1757 <p>Opcode: 32 (0x20) for GCN 1.4<br /> 1758 Syntax: GLOBAL_LOAD_UBYTE_D16 VDST, VADDR(2), SADDR(2)|OFF<br /> 1759 Description: Load byte to lower 16-bit part of VDST from 1760 global address with zero extending.<br /> 1761 Operation:<br /> 1762 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1763 VDST = ((UINT16)*(UINT8*)VM) | (VDST&0xffff0000)</code></p> 1764 <h4>GLOBAL_LOAD_UBYTE_D16_HI</h4> 1765 <p>Opcode: 33 (0x21) for GCN 1.4<br /> 1766 Syntax: GLOBAL_LOAD_UBYTE_D16_HI VDST, VADDR(2), SADDR(2)|OFF<br /> 1767 Description: Load byte to higher 16-bit part of VDST from 1768 global address with zero extending.<br /> 1769 Operation:<br /> 1770 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1771 VDST = (((UINT32)*(UINT8*)VM)<<16) | (VDST&0xffff)</code></p> 1772 <h4>GLOBAL_LOAD_USHORT</h4> 1773 <p>Opcode: 18 (0x12) for GCN 1.4<br /> 1774 Syntax: GLOBAL_LOAD_USHORT VDST, VADDR(2), SADDR(2)|OFF<br /> 1775 Description: Load 16-bit word to VDST from global address with zero extending.<br /> 1776 Operation:<br /> 1777 <code>VDST = *(UINT16*)(VADDR + SADDR + INST_OFFSET)</code></p> 1778 <h4>GLOBAL_STORE_BYTE</h4> 1779 <p>Opcode: 24 (0x18) for GCN 1.4<br /> 1780 Syntax: GLOBAL_STORE_BYTE VADDR(2), VDATA, SADDR(2)|OFF<br /> 1781 Description: Store byte from VDATA to global address.<br /> 1782 Operation:<br /> 1783 <code>*(UINT8*)(VADDR + SADDR + INST_OFFSET) = VDATA&0xff</code></p> 1784 <h4>GLOBAL_STORE_BYTE_D16_HI</h4> 1785 <p>Opcode: 25 (0x19) for GCN 1.4<br /> 1786 Syntax: GLOBAL_STORE_BYTE_D16_HI VADDR(2), VDATA, SADDR(2)|OFF<br /> 1787 Description: Store byte from 16-23 bits of VDATA to global address.<br /> 1788 Operation:<br /> 1789 <code>*(UINT8*)(VADDR + SADDR + INST_OFFSET) = (VDATA>>16)&0xff</code></p> 1790 <h4>GLOBAL_STORE_DWORD</h4> 1791 <p>Opcode: 28 (0x1c) for GCN 1.4<br /> 1792 Syntax: GLOBAL_STORE_DWORD VADDR(2), VDATA, SADDR(2)|OFF<br /> 1793 Description: Store dword from VDATA to global address.<br /> 1794 Operation:<br
/> 1795 <code>*(UINT32*)(VADDR + SADDR + INST_OFFSET) = VDATA</code></p> 1796 <h4>GLOBAL_STORE_DWORDX2</h4> 1797 <p>Opcode: 29 (0x1d) for GCN 1.4<br /> 1798 Syntax: GLOBAL_STORE_DWORDX2 VADDR(2), VDATA(2), SADDR(2)|OFF<br /> 1799 Description: Store two dwords from VDATA to global address.<br /> 1800 Operation:<br /> 1801 <code>*(UINT64*)(VADDR + SADDR + INST_OFFSET) = VDATA</code></p> 1802 <h4>GLOBAL_STORE_DWORDX3</h4> 1803 <p>Opcode: 30 (0x1e) for GCN 1.4<br /> 1804 Syntax: GLOBAL_STORE_DWORDX3 VADDR(2), VDATA(3), SADDR(2)|OFF<br /> 1805 Description: Store three dwords from VDATA to global address.<br /> 1806 Operation:<br /> 1807 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1808 *(UINT32*)(VM) = VDATA[0] 1809 *(UINT32*)(VM+4) = VDATA[1] 1810 *(UINT32*)(VM+8) = VDATA[2]</code></p> 1811 <h4>GLOBAL_STORE_DWORDX4</h4> 1812 <p>Opcode: 31 (0x1f) for GCN 1.4<br /> 1813 Syntax: GLOBAL_STORE_DWORDX4 VADDR(2), VDATA(4), SADDR(2)|OFF<br /> 1814 Description: Store four dwords from VDATA to global address.<br /> 1815 Operation:<br /> 1816 <code>BYTE* VM = (VADDR + SADDR + INST_OFFSET) 1817 *(UINT32*)(VM) = VDATA[0] 1818 *(UINT32*)(VM+4) = VDATA[1] 1819 *(UINT32*)(VM+8) = VDATA[2] 1820 *(UINT32*)(VM+12) = VDATA[3]</code></p> 1821 <h4>GLOBAL_STORE_SHORT</h4> 1822 <p>Opcode: 26 (0x1a) for GCN 1.4<br /> 1823 Syntax: GLOBAL_STORE_SHORT VADDR(2), VDATA, SADDR(2)|OFF<br /> 1824 Description: Store 16-bit word from VDATA to global address.<br /> 1825 Operation:<br /> 1826 <code>*(UINT16*)(VADDR + SADDR + INST_OFFSET) = VDATA&0xffff</code></p> 1827 <h4>GLOBAL_STORE_SHORT_D16_HI</h4> 1828 <p>Opcode: 27 (0x1b) for GCN 1.4<br /> 1829 Syntax: GLOBAL_STORE_SHORT_D16_HI VADDR(2), VDATA, SADDR(2)|OFF<br /> 1830 Description: Store 16-bit word from higher 16-bit part of VDATA to global address.<br /> 1831 Operation:<br /> 1832 <code>*(UINT16*)(VADDR + SADDR + INST_OFFSET) = VDATA>>16</code></p> 941 1833 }}}
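The wrap-around rules given earlier for GLOBAL_ATOMIC_INC and GLOBAL_ATOMIC_DEC can be emulated on a host CPU with a compare-and-swap loop. The C11 sketch below is an illustration under assumptions, not the hardware implementation: the function names are hypothetical, memory is a plain `_Atomic` variable, and the previous value is always returned, so the caller decides whether to keep it (the GLC case) or discard it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* *VM = (*VM < VDATA) ? *VM + 1 : 0; returns the previous value P. */
uint32_t atomic_inc_wrap(_Atomic uint32_t *vm, uint32_t vdata)
{
    assert(vm != NULL);
    uint32_t old = atomic_load(vm);
    uint32_t next;
    do {
        next = (old < vdata) ? old + 1 : 0;
        /* On failure, old is refreshed with the current memory value. */
    } while (!atomic_compare_exchange_weak(vm, &old, next));
    return old;
}

/* *VM = (*VM <= VDATA && *VM != 0) ? *VM - 1 : VDATA; returns P. */
uint32_t atomic_dec_wrap(_Atomic uint32_t *vm, uint32_t vdata)
{
    assert(vm != NULL);
    uint32_t old = atomic_load(vm);
    uint32_t next;
    do {
        next = (old <= vdata && old != 0) ? old - 1 : vdata;
    } while (!atomic_compare_exchange_weak(vm, &old, next));
    return old;
}
```

With VDATA = 3, repeated increments cycle the stored value through 0, 1, 2, 3, 0; a decrement of a zero value, or of a value above VDATA, reloads VDATA.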