GcnInstrsVop3 – CLRadeonExtender


GCN ISA VOP3 instructions

VOP3 instructions require two dwords in the program code. The encoding of these instructions gives access to all features of the VOP3 encoding: all possible modifiers and any combination of source operands.

List of fields for the VOP3A/VOP3B encoding (GCN 1.0/1.1):

Bits	Name	Description
0-7	VDST	Vector destination operand
8-10	ABS	Absolute modifiers for source operands (VOP3A)
8-14	SDST	Scalar destination operand (VOP3B)
11	CLAMP	CLAMP modifier (VOP3A)
15	CLAMP	CLAMP modifier (VOP3B)
17-25	OPCODE	Operation code
26-31	ENCODING	Encoding type. Must be 0b110100
32-40	SRC0	First (scalar or vector) source operand
41-49	SRC1	Second (scalar or vector) source operand
50-58	SRC2	Third (scalar or vector) source operand
59-60	OMOD	OMOD modifier. Multiplication modifier
61-63	NEG	Negation modifier for source operands

List of fields for VOP3A/VOP3B encoding (GCN 1.2):

Bits	Name	Description
0-7	VDST	Destination vector operand
8-10	ABS	Absolute modifiers for source operands (VOP3A)
8-14	SDST	Scalar destination operand (VOP3B)
15	CLAMP	CLAMP modifier
16-25	OPCODE	Operation code
26-31	ENCODING	Encoding type. Must be 0b110100
32-40	SRC0	First (scalar or vector) source operand
41-49	SRC1	Second (scalar or vector) source operand
50-58	SRC2	Third (scalar or vector) source operand
59-60	OMOD	OMOD modifier. Multiplication modifier
61-63	NEG	Negation modifier for source operands
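
The field layouts in the two tables above can be illustrated with a small decoder. This is a minimal sketch, not code from CLRadeonExtender: the function name `decode_vop3` and its flags are hypothetical, and it assumes that `word0` holds bits 0-31 and `word1` holds bits 32-63 of the instruction.

```python
def decode_vop3(word0, word1, gcn12=False, vop3b=False):
    """Split the two VOP3 dwords into the fields listed in the tables above.

    word0 is assumed to carry bits 0-31, word1 bits 32-63.
    """
    fields = {
        "vdst": word0 & 0xFF,               # bits 0-7
        "encoding": (word0 >> 26) & 0x3F,   # bits 26-31, expected 0b110100
        "src0": word1 & 0x1FF,              # bits 32-40
        "src1": (word1 >> 9) & 0x1FF,       # bits 41-49
        "src2": (word1 >> 18) & 0x1FF,      # bits 50-58
        "omod": (word1 >> 27) & 0x3,        # bits 59-60
        "neg": (word1 >> 29) & 0x7,         # bits 61-63
    }
    if vop3b:
        fields["sdst"] = (word0 >> 8) & 0x7F    # bits 8-14 (VOP3B)
    else:
        fields["abs"] = (word0 >> 8) & 0x7      # bits 8-10 (VOP3A)
    # CLAMP sits at bit 11 only for VOP3A on GCN 1.0/1.1, otherwise at bit 15.
    clamp_bit = 15 if (gcn12 or vop3b) else 11
    fields["clamp"] = (word0 >> clamp_bit) & 1
    if gcn12:
        fields["opcode"] = (word0 >> 16) & 0x3FF    # bits 16-25
    else:
        fields["opcode"] = (word0 >> 17) & 0x1FF    # bits 17-25
    return fields
```

For example, a GCN 1.0 word with opcode 0x141 decodes to the V_MAD_F32 row of the opcode table below. Operand values are returned as raw 9-bit codes; mapping them to register names is outside this sketch.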

Typical syntax: INSTRUCTION VDST, SRC0, SRC1, SRC2 [MODIFIERS]

Modifiers:

CLAMP - clamps the destination floating point value to the range 0.0-1.0
MUL:2, MUL:4, DIV:2 - OMOD modifiers. Multiply the destination floating point value by 2.0, 4.0 or 0.5 respectively. Clamping is applied after the OMOD modifier.
-SRC - negate the floating point value from the source operand. Applied after the ABS modifier.
ABS(SRC) - apply absolute value to the source operand

NOTE: The OMOD modifier doesn't work if output denormals are allowed (bit 5 of the MODE register for single precision, or bit 7 for double precision).
NOTE: The OMOD and CLAMP modifiers affect only instructions whose output is a floating point value.
NOTE: ABS and negation are applied to source operands for any instruction.

Negation and absolute value can be combined: -ABS(V0). The CLAMP and OMOD (MUL:2, MUL:4 and DIV:2) modifiers can be given in any order.
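
The order in which the modifiers take effect (ABS before negation on sources, OMOD before CLAMP on the result) can be sketched as follows. The helper names and the string keys for OMOD are hypothetical, values are ordinary Python floats rather than single-precision registers, and NaN and denormal handling are not modelled.

```python
# Multiplication factors for the OMOD modifier (None means no OMOD).
OMOD_FACTORS = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}

def apply_src_modifiers(value, use_abs=False, use_neg=False):
    """Apply ABS first, then negation, to one source operand."""
    if use_abs:
        value = abs(value)
    if use_neg:
        value = -value
    return value

def apply_dst_modifiers(value, omod=None, clamp=False):
    """Apply OMOD first, then clamp the result to the range 0.0-1.0."""
    value = value * OMOD_FACTORS[omod]
    if clamp:
        value = min(max(value, 0.0), 1.0)
    return value
```

With this ordering, -ABS(V0) always yields a non-positive value, and MUL:4 combined with CLAMP saturates any input of 0.25 or more to 1.0.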

Limitations for operands:

only one SGPR can be read by an instruction. Multiple occurrences of the same SGPR are allowed
only one literal constant can be used, and only when no SGPR or M0 is used in the source operands
only SRC0 can hold LDS_DIRECT

Unaligned pairs of SGPRs are allowed in source operands.

List of the instructions by opcode (GCN 1.0/1.1):

Opcode	Mnemonic (GCN 1.0)	Mnemonic (GCN 1.1)
320 (0x140)	V_MAD_LEGACY_F32	V_MAD_LEGACY_F32
321 (0x141)	V_MAD_F32	V_MAD_F32
322 (0x142)	V_MAD_I32_I24	V_MAD_I32_I24
323 (0x143)	V_MAD_U32_U24	V_MAD_U32_U24
324 (0x144)	V_CUBEID_F32	V_CUBEID_F32
325 (0x145)	V_CUBESC_F32	V_CUBESC_F32
326 (0x146)	V_CUBETC_F32	V_CUBETC_F32
327 (0x147)	V_CUBEMA_F32	V_CUBEMA_F32
328 (0x148)	V_BFE_U32	V_BFE_U32
329 (0x149)	V_BFE_I32	V_BFE_I32
330 (0x14a)	V_BFI_B32	V_BFI_B32
331 (0x14b)	V_FMA_F32	V_FMA_F32
332 (0x14c)	V_FMA_F64	V_FMA_F64
333 (0x14d)	V_LERP_U8	V_LERP_U8
334 (0x14e)	V_ALIGNBIT_B32	V_ALIGNBIT_B32
335 (0x14f)	V_ALIGNBYTE_B32	V_ALIGNBYTE_B32
336 (0x150)	V_MULLIT_F32	V_MULLIT_F32
337 (0x151)	V_MIN3_F32	V_MIN3_F32
338 (0x152)	V_MIN3_I32	V_MIN3_I32
339 (0x153)	V_MIN3_U32	V_MIN3_U32
340 (0x154)	V_MAX3_F32	V_MAX3_F32
341 (0x155)	V_MAX3_I32	V_MAX3_I32
342 (0x156)	V_MAX3_U32	V_MAX3_U32
343 (0x157)	V_MED3_F32	V_MED3_F32
344 (0x158)	V_MED3_I32	V_MED3_I32
345 (0x159)	V_MED3_U32	V_MED3_U32
346 (0x15a)	V_SAD_U8	V_SAD_U8
347 (0x15b)	V_SAD_HI_U8	V_SAD_HI_U8
348 (0x15c)	V_SAD_U16	V_SAD_U16
349 (0x15d)	V_SAD_U32	V_SAD_U32
350 (0x15e)	V_CVT_PK_U8_F32	V_CVT_PK_U8_F32
351 (0x15f)	V_DIV_FIXUP_F32	V_DIV_FIXUP_F32
352 (0x160)	V_DIV_FIXUP_F64	V_DIV_FIXUP_F64
353 (0x161)	V_LSHL_B64	V_LSHL_B64
354 (0x162)	V_LSHR_B64	V_LSHR_B64
355 (0x163)	V_ASHR_I64	V_ASHR_I64
356 (0x164)	V_ADD_F64	V_ADD_F64
357 (0x165)	V_MUL_F64	V_MUL_F64
358 (0x166)	V_MIN_F64	V_MIN_F64
359 (0x167)	V_MAX_F64	V_MAX_F64
360 (0x168)	V_LDEXP_F64	V_LDEXP_F64
361 (0x169)	V_MUL_LO_U32	V_MUL_LO_U32
362 (0x16a)	V_MUL_HI_U32	V_MUL_HI_U32
363 (0x16b)	V_MUL_LO_I32	V_MUL_LO_I32
364 (0x16c)	V_MUL_HI_I32	V_MUL_HI_I32
365 (0x16d)	V_DIV_SCALE_F32 (VOP3B)	V_DIV_SCALE_F32 (VOP3B)
366 (0x16e)	V_DIV_SCALE_F64 (VOP3B)	V_DIV_SCALE_F64 (VOP3B)
367 (0x16f)	V_DIV_FMAS_F32	V_DIV_FMAS_F32
368 (0x170)	V_DIV_FMAS_F64	V_DIV_FMAS_F64
369 (0x171)	V_MSAD_U8	V_MSAD_U8
370 (0x172)	V_QSAD_U8	V_QSAD_PK_U16_U8
371 (0x173)	V_MQSAD_U8	V_MQSAD_PK_U16_U8
372 (0x174)	V_TRIG_PREOP_F64	V_TRIG_PREOP_F64
373 (0x175)	--	V_MQSAD_U32_U8
374 (0x176)	--	V_MAD_U64_U32 (VOP3B)
375 (0x177)	--	V_MAD_I64_I32 (VOP3B)

List of the instructions by opcode (GCN 1.2):

Opcode	Mnemonic
448 (0x1c0)	V_MAD_LEGACY_F32
449 (0x1c1)	V_MAD_F32
450 (0x1c2)	V_MAD_I32_I24
451 (0x1c3)	V_MAD_U32_U24
452 (0x1c4)	V_CUBEID_F32
453 (0x1c5)	V_CUBESC_F32
454 (0x1c6)	V_CUBETC_F32
455 (0x1c7)	V_CUBEMA_F32
456 (0x1c8)	V_BFE_U32
457 (0x1c9)	V_BFE_I32
458 (0x1ca)	V_BFI_B32
459 (0x1cb)	V_FMA_F32
460 (0x1cc)	V_FMA_F64
461 (0x1cd)	V_LERP_U8
462 (0x1ce)	V_ALIGNBIT_B32
463 (0x1cf)	V_ALIGNBYTE_B32
464 (0x1d0)	V_MIN3_F32
465 (0x1d1)	V_MIN3_I32
466 (0x1d2)	V_MIN3_U32
467 (0x1d3)	V_MAX3_F32
468 (0x1d4)	V_MAX3_I32
469 (0x1d5)	V_MAX3_U32
470 (0x1d6)	V_MED3_F32
471 (0x1d7)	V_MED3_I32
472 (0x1d8)	V_MED3_U32
473 (0x1d9)	V_SAD_U8
474 (0x1da)	V_SAD_HI_U8
475 (0x1db)	V_SAD_U16
476 (0x1dc)	V_SAD_U32
477 (0x1dd)	V_CVT_PK_U8_F32
478 (0x1de)	V_DIV_FIXUP_F32
479 (0x1df)	V_DIV_FIXUP_F64
480 (0x1e0)	V_DIV_SCALE_F32 (VOP3B)
481 (0x1e1)	V_DIV_SCALE_F64 (VOP3B)
482 (0x1e2)	V_DIV_FMAS_F32
483 (0x1e3)	V_DIV_FMAS_F64
484 (0x1e4)	V_MSAD_U8
485 (0x1e5)	V_QSAD_PK_U16_U8
486 (0x1e6)	V_MQSAD_PK_U16_U8
487 (0x1e7)	V_MQSAD_U32_U8
488 (0x1e8)	V_MAD_U64_U32 (VOP3B)
489 (0x1e9)	V_MAD_I64_I32 (VOP3B)
490 (0x1ea)	V_MAD_F16
491 (0x1eb)	V_MAD_U16
492 (0x1ec)	V_MAD_I16
493 (0x1ed)	V_PERM_B32
494 (0x1ee)	V_FMA_F16
495 (0x1ef)	V_DIV_FIXUP_F16
496 (0x1f0)	V_CVT_PKACCUM_U8_F32
624 (0x270)	V_INTERP_P1_F32 (VINTRP)
625 (0x271)	V_INTERP_P2_F32 (VINTRP)
626 (0x272)	V_INTERP_MOV_F32 (VINTRP)
627 (0x273)	V_INTERP_P1LL_F16 (VINTRP)
628 (0x274)	V_INTERP_P1LV_F16 (VINTRP)
629 (0x275)	V_INTERP_P2_F16 (VINTRP)
640 (0x280)	V_ADD_F64
641 (0x281)	V_MUL_F64
642 (0x282)	V_MIN_F64
643 (0x283)	V_MAX_F64
644 (0x284)	V_LDEXP_F64
645 (0x285)	V_MUL_LO_U32
646 (0x286)	V_MUL_HI_U32
647 (0x287)	V_MUL_HI_I32
648 (0x288)	V_LDEXP_F32
649 (0x289)	V_READLANE_B32
650 (0x28a)	V_WRITELANE_B32
651 (0x28b)	V_BCNT_U32_B32
652 (0x28c)	V_MBCNT_LO_U32_B32
653 (0x28d)	V_MBCNT_HI_U32_B32
654 (0x28e)	V_MAC_LEGACY_F32
655 (0x28f)	V_LSHLREV_B64
656 (0x290)	V_LSHRREV_B64
657 (0x291)	V_ASHRREV_I64
658 (0x292)	V_TRIG_PREOP_F64
659 (0x293)	V_BFM_B32
660 (0x294)	V_CVT_PKNORM_I16_F32
661 (0x295)	V_CVT_PKNORM_U16_F32
662 (0x296)	V_CVT_PKRTZ_F16_F32
663 (0x297)	V_CVT_PK_U16_U32
664 (0x298)	V_CVT_PK_I16_I32

Instruction set

Alphabetically sorted instruction list:

V_CUBEID_F32

Opcode: 324 (0x144) for GCN 1.0/1.1; 452 (0x1c4) for GCN 1.2
Syntax: V_CUBEID_F32 VDST, SRC0, SRC1, SRC2
Description: Cubemap face identification. Determine the face by comparing three single-precision FP values: SRC0 (X), SRC1 (Y), SRC2 (Z). Choose the one with the highest absolute value and check whether it is negative or positive. Store the floating point value of the face ID: (DIM*2.0)+(V[DIM]>=0?0:1), where DIM is the number of the chosen dimension (X - 0, Y - 1, Z - 2) and V is the vector [SRC0, SRC1, SRC2].
Operation:
FLOAT SF0 = ASFLOAT(SRC0)
FLOAT SF1 = ASFLOAT(SRC1)
FLOAT SF2 = ASFLOAT(SRC2)
FLOAT OUT
if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
    OUT = (SF2 >= 0.0) ? 4 : 5
else if (ABS(SF1) >= ABS(SF0))
    OUT = (SF1 >= 0.0) ? 2 : 3
else
    OUT = (SF0 >= 0.0) ? 0 : 1
VDST = OUT
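
The face selection in the Operation above can be tried out with a direct Python transcription. This is an illustrative sketch: the function name is hypothetical, and the inputs are taken as Python floats instead of raw register bits, so the ASFLOAT step is omitted.

```python
def v_cubeid_f32(src0, src1, src2):
    """Return the cubemap face ID (0.0 to 5.0) for the vector (X, Y, Z)."""
    x, y, z = src0, src1, src2
    # Z wins ties against both X and Y; Y wins ties against X.
    if abs(z) >= abs(y) and abs(z) >= abs(x):
        return 4.0 if z >= 0.0 else 5.0
    if abs(y) >= abs(x):
        return 2.0 if y >= 0.0 else 3.0
    return 0.0 if x >= 0.0 else 1.0
```

For instance, the vector (0.0, -2.0, 1.0) has its largest magnitude on Y with a negative sign, so the result is 3.0.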

V_MAD_F32

Opcode: 321 (0x141) for GCN 1.0/1.1; 449 (0x1c1) for GCN 1.2
Syntax: V_MAD_F32 VDST, SRC0, SRC1, SRC2
Description: Multiply FP value from SRC0 by FP value from SRC1 and add SRC2, and store result to VDST.
Operation:
VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(SRC2)
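
ASFLOAT in the Operation reinterprets the 32-bit register contents as a single-precision value. A rough model in Python is shown below; `asfloat`, `asbits` and `v_mad_f32` are hypothetical helper names, and the arithmetic is done in double precision and rounded once at the end, which is an assumption of this sketch and may not match hardware rounding or denormal handling.

```python
import struct

def asfloat(bits):
    """Reinterpret a 32-bit register value as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def asbits(value):
    """Round a Python float to single precision and return its bit pattern.

    struct raises OverflowError for finite values outside the single range.
    """
    return struct.unpack("<I", struct.pack("<f", value))[0]

def v_mad_f32(src0, src1, src2):
    """Model VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(SRC2) on raw bits."""
    return asbits(asfloat(src0) * asfloat(src1) + asfloat(src2))
```

With the bit patterns 0x3FC00000 (1.5), 0x40000000 (2.0) and 0x3E800000 (0.25), the function returns 0x40500000, the pattern of 3.25.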

V_MAD_I32_I24

Opcode: 322 (0x142) for GCN 1.0/1.1; 450 (0x1c2) for GCN 1.2
Syntax: V_MAD_I32_I24 VDST, SRC0, SRC1, SRC2
Description: Multiply 24-bit signed integer value from SRC0 by 24-bit signed value from SRC1, add SRC2 to this product, and store result to VDST.
Operation:
INT32 V0 = (INT32)((SRC0&0x7fffff) | (SRC0&0x800000 ? 0xff800000 : 0))
INT32 V1 = (INT32)((SRC1&0x7fffff) | (SRC1&0x800000 ? 0xff800000 : 0))
VDST = V0 * V1 + SRC2
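
The sign extension from bit 23 used in the Operation can be checked with a short Python model. The function names are hypothetical, registers are passed as unsigned 32-bit integers, and wrapping the result to 32 bits is an assumption based on VDST being a 32-bit register.

```python
def sign_extend_24(value):
    """Interpret the low 24 bits of value as a signed integer."""
    value &= 0xFFFFFF
    return value - 0x1000000 if value & 0x800000 else value

def v_mad_i32_i24(src0, src1, src2):
    """Signed 24-bit multiply plus SRC2, wrapped to a 32-bit register value."""
    result = sign_extend_24(src0) * sign_extend_24(src1) + src2
    return result & 0xFFFFFFFF
```

Bits 24-31 of SRC0 and SRC1 do not influence the result: 0x12FFFFFF and 0x00FFFFFF are both treated as -1.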

V_MAD_LEGACY_F32

Opcode: 320 (0x140) for GCN 1.0/1.1; 448 (0x1c0) for GCN 1.2
Syntax: V_MAD_LEGACY_F32 VDST, SRC0, SRC1, SRC2
Description: Multiply FP value from SRC0 by FP value from SRC1, add SRC2 to this product, and store result to VDST. If SRC0 or SRC1 is 0.0, always store SRC2 to VDST (do not apply IEEE rules for 0.0*x).
Operation:
if (ASFLOAT(SRC0)!=0.0 && ASFLOAT(SRC1)!=0.0)
    VDST = ASFLOAT(SRC0) * ASFLOAT(SRC1) + ASFLOAT(SRC2)
else
    VDST = ASFLOAT(SRC2)
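
The difference from IEEE behaviour shows up when a zero factor meets an infinity. The following sketch follows the Description (SRC2 is stored when either factor is 0.0); the function name is hypothetical and the inputs are Python floats, not raw register bits.

```python
import math

def v_mad_legacy_f32(src0, src1, src2):
    """Multiply-add where a zero factor makes the product vanish entirely."""
    if src0 == 0.0 or src1 == 0.0:
        # Legacy rule: skip the multiplication, so 0.0 * inf does not give NaN.
        return src2
    return src0 * src1 + src2

# IEEE arithmetic gives NaN for the same inputs.
ieee_result = 0.0 * math.inf + 3.0
legacy_result = v_mad_legacy_f32(0.0, math.inf, 3.0)
```

Here `ieee_result` is NaN while `legacy_result` is 3.0. Negative zero also counts as zero in this model, because `-0.0 == 0.0` holds in Python.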

V_MAD_U32_U24

Opcode: 323 (0x143) for GCN 1.0/1.1; 451 (0x1c3) for GCN 1.2
Syntax: V_MAD_U32_U24 VDST, SRC0, SRC1, SRC2
Description: Multiply 24-bit unsigned integer value from SRC0 by 24-bit unsigned value from SRC1, add SRC2 to this product and store result to VDST.
Operation:
VDST = (UINT32)(SRC0&0xffffff) * (UINT32)(SRC1&0xffffff) + SRC2
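
A Python model of the Operation is given below as a sketch. The function name is hypothetical, and the final mask to 32 bits is an assumption reflecting that the UINT32 result is stored in a 32-bit register.

```python
def v_mad_u32_u24(src0, src1, src2):
    """Unsigned 24-bit multiply plus SRC2, wrapped to a 32-bit register value."""
    product = (src0 & 0xFFFFFF) * (src1 & 0xFFFFFF)
    return (product + src2) & 0xFFFFFFFF
```

The product of two 24-bit values can need up to 48 bits, so only its low 32 bits survive: 0xFFFFFF * 0xFFFFFF yields 0xFE000001.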
