Changeset 1783 in CLRX
Timestamp: Dec 5, 2015, 10:19:54 PM
Location: CLRadeonExtender/trunk
Files: 5 edited

CLRadeonExtender/trunk/amdasm/GCNInstructions.cpp
--- r1523
+++ r1783
@@ -1703,5 +1703,5 @@
     { "v_alignbit_b32", GCNENC_VOP3A, GCN_STDMODE, 334, ARCH_GCN_1_0_1 },
     { "v_alignbyte_b32", GCNENC_VOP3A, GCN_STDMODE, 335, ARCH_GCN_1_0_1 },
-    { "v_mullit_f32", GCNENC_VOP3A, GCN_SRC2_NONE, /* ??? */336, ARCH_GCN_1_0_1 },
+    { "v_mullit_f32", GCNENC_VOP3A, GCN_STDMODE, 336, ARCH_GCN_1_0_1 },
     { "v_min3_f32", GCNENC_VOP3A, GCN_STDMODE, 337, ARCH_GCN_1_0_1 },
     { "v_min3_i32", GCNENC_VOP3A, GCN_STDMODE, 338, ARCH_GCN_1_0_1 },

CLRadeonExtender/trunk/doc/GcnInstrsSop2.md
--- r1743
+++ r1783
@@ -199,5 +199,5 @@
 Syntax: S_BFE_I32 SDST, SSRC0, SSRC1
 Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f)
-and extend sign from last bit of extracted value.
+and extend sign from last bit of extracted value, and store result to SDST.
 If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
 Operation:
@@ -219,5 +219,5 @@
 Syntax: S_BFE_I64 SDST, SSRC0, SSRC1
 Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f)
-and extend sign from last bit of extracted value.
+and extend sign from last bit of extracted value, and store result to SDST.
 If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
 Operation:
@@ -238,6 +238,6 @@
 Opcode: 39 (0x27) for GCN 1.0/1.1; 37 (0x25) for GCN 1.2
 Syntax: S_BFE_U32 SDST, SSRC0, SSRC1
-Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f).
-If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
+Description: Extracts bits in SSRC0 from range (SSRC1&31) with length ((SSRC1>>16)&0x7f),
+and store result to SDST. If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
 Operation:
 ```
@@ -257,6 +257,6 @@
 Opcode: 41 (0x29) for GCN 1.0/1.1; 39 (0x27) for GCN 1.2
 Syntax: S_BFE_U64 SDST(2), SSRC0(2), SSRC1
-Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f).
-If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
+Description: Extracts bits in SSRC0 from range (SSRC1&63) with length ((SSRC1>>16)&0x7f),
+and store result to SDST. If result is nonzero store 1 to SCC, otherwise store 0 to SCC.
 SDST, SSRC0 are 64bit, SSRC1 is 32bit.
 Operation:

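The S_BFE_* descriptions take the field offset from SSRC1&31 and the width from (SSRC1>>16)&0x7f, then set SCC when the result is nonzero. A minimal C++ sketch of the 32-bit variants follows. It assumes the same edge-case handling as the V_BFE_* pseudocode added to GcnInstrsVop3.md in this changeset (a zero-width field gives 0; a field reaching past bit 31 keeps every bit above the offset). The names `ScalarBfe`, `field_to_top`, `s_bfe_u32` and `s_bfe_i32` are hypothetical, not CLRX code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the value written to SDST and the new SCC bit.
struct ScalarBfe {
    uint32_t sdst;
    bool scc;
};

// Moves the field [shift, shift+length) to the top of the word.
// Only valid when 0 < length and shift + length < 32, which keeps
// both shift counts used below within 1..31.
static uint32_t field_to_top(uint32_t x, uint32_t shift, uint32_t length) {
    assert(length > 0 && shift + length < 32);
    return x << (32 - shift - length);
}

// S_BFE_U32 sketch: offset = SSRC1 & 31, width = (SSRC1 >> 16) & 0x7f.
ScalarBfe s_bfe_u32(uint32_t ssrc0, uint32_t ssrc1) {
    const uint32_t shift = ssrc1 & 31;
    const uint32_t length = (ssrc1 >> 16) & 0x7f;
    uint32_t out;
    if (length == 0)
        out = 0;                                   // empty field
    else if (shift + length < 32)
        out = field_to_top(ssrc0, shift, length) >> (32 - length);  // zero fill
    else
        out = ssrc0 >> shift;                      // field runs past bit 31
    return {out, out != 0};                        // SCC = 1 when result is nonzero
}

// S_BFE_I32 sketch: same field, sign-extended from its top bit.
// Assumes two's-complement conversion and arithmetic >> on int32_t
// (guaranteed since C++20, and what mainstream compilers do before that).
ScalarBfe s_bfe_i32(uint32_t ssrc0, uint32_t ssrc1) {
    const uint32_t shift = ssrc1 & 31;
    const uint32_t length = (ssrc1 >> 16) & 0x7f;
    int32_t out;
    if (length == 0)
        out = 0;
    else if (shift + length < 32)
        out = static_cast<int32_t>(field_to_top(ssrc0, shift, length)) >> (32 - length);
    else
        out = static_cast<int32_t>(ssrc0) >> shift;
    return {static_cast<uint32_t>(out), out != 0};
}
```

For example, SSRC1 = (8<<16)|4 selects an 8-bit field starting at bit 4, so `s_bfe_u32(0x12345678, (8<<16)|4)` gives 0x67 with SCC set, while the signed variant applied to a 4-bit field holding 0xf gives -1.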
CLRadeonExtender/trunk/doc/GcnInstrsVop3.md
--- r1782
+++ r1783
@@ -217,4 +217,74 @@
 Alphabetically sorted instruction list:
 
+#### V_ALIGNBIT_B32
+
+Opcode: 334 (0x14e) for GCN 1.0/1.1; 462 (0x1ce) for GCN 1.2
+Syntax: V_ALIGNBIT_B32 VDST, SRC0, SRC1, SRC2
+Description: Align bit. Shift right bits in 64bit stored in SRC1 (low part) and
+SRC0 (high part) by SRC2&31 bits, and store low 32bit of the result in VDST.
+Operation:
+```
+VDST = ((((UINT64)SRC0)<<32) | SRC1) >> (SRC2&31)
+```
+
+#### V_ALIGNBYTE_B32
+
+Opcode: 335 (0x14f) for GCN 1.0/1.1; 463 (0x1cf) for GCN 1.2
+Syntax: V_ALIGNBYTE_B32 VDST, SRC0, SRC1, SRC2
+Description: Align byte. Shift right bits in 64bit stored in SRC1 (low part) and
+SRC0 (high part) by (SRC2&3)*8 bits, and store low 32bit of the result in VDST.
+Operation:
+```
+VDST = ((((UINT64)SRC0)<<32) | SRC1) >> ((SRC2&3)*8)
+```
+
+#### V_BFE_I32
+
+Opcode: 329 (0x149) for GCN 1.0/1.1; 457 (0x1c9) for GCN 1.2
+Syntax: V_BFE_I32 VDST, SRC0, SRC1, SRC2
+Description: Extracts bits in SRC0 from range (SRC1&31) with length (SRC2&31)
+and extend sign from last bit of extracted value, and store result to VDST.
+Operation:
+```
+UINT8 shift = SRC1 & 31
+UINT8 length = SRC2 & 31
+if (length==0)
+    VDST = 0
+else if (shift+length < 32)
+    VDST = (INT32)(SRC0 << (32 - shift - length)) >> (32 - length)
+else
+    VDST = (INT32)SRC0 >> shift
+```
+
+#### V_BFE_U32
+
+Opcode: 328 (0x148) for GCN 1.0/1.1; 456 (0x1c8) for GCN 1.2
+Syntax: V_BFE_U32 VDST, SRC0, SRC1, SRC2
+Description: Extracts bits in SRC0 from range SRC1&31 with length SRC2&31, and
+store result to VDST.
+Operation:
+```
+UINT8 shift = SRC1 & 31
+UINT8 length = SRC2 & 31
+if (length==0)
+    VDST = 0
+else if (shift+length < 32)
+    VDST = SRC0 << (32 - shift - length) >> (32 - length)
+else
+    VDST = SRC0 >> shift
+```
+
+#### V_BFI_B32
+
+Opcode: 330 (0x14a) for GCN 1.0/1.1; 458 (0x1ca) for GCN 1.2
+Syntax: V_BFI_B32 VDST, SRC0, SRC1, SRC2
+Description: Replace bits in SRC2 by bits from SRC1 marked by bits in SRC0, and store result
+to VDST.
+Operation:
+```
+VDST = (SRC0 & SRC1) | (~SRC0 & SRC2)
+```
+
+
 #### V_CUBEID_F32
 
@@ -241,4 +311,104 @@
 ```
 
+#### V_CUBEMA_F32
+
+Opcode: 327 (0x147) for GCN 1.0/1.1; 455 (0x1c7) for GCN 1.2
+Syntax: V_CUBEMA_F32 VDST, SRC0, SRC1, SRC2
+Description: Cubemap Major Axis. Choose highest absolute value from all three FP values
+(SRC0, SRC1, SRC2) and multiply chosen FP value by two. Result is stored in VDST.
+Operation:
+```
+FLOAT SF0 = ASFLOAT(SRC0)
+FLOAT SF1 = ASFLOAT(SRC1)
+FLOAT SF2 = ASFLOAT(SRC2)
+if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
+    OUT = 2*SF2
+else if (ABS(SF1) >= ABS(SF0))
+    OUT = 2*SF1
+else
+    OUT = 2*SF0
+VDST = OUT
+```
+
+#### V_CUBESC_F32
+
+Opcode: 325 (0x145) for GCN 1.0/1.1; 453 (0x1c5) for GCN 1.2
+Syntax: V_CUBESC_F32 VDST, SRC0, SRC1, SRC2
+Description: Cubemap S coordinate. Algorithm below.
+Operation:
+```
+FLOAT SF0 = ASFLOAT(SRC0)
+FLOAT SF1 = ASFLOAT(SRC1)
+FLOAT SF2 = ASFLOAT(SRC2)
+if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
+    OUT = SIGN(SF2) * SF0
+else if (ABS(SF1) >= ABS(SF0))
+    OUT = SF0
+else
+    OUT = -SIGN(SF0) * SF2
+VDST = OUT
+```
+
+#### V_CUBETC_F32
+
+Opcode: 326 (0x146) for GCN 1.0/1.1; 454 (0x1c6) for GCN 1.2
+Syntax: V_CUBETC_F32 VDST, SRC0, SRC1, SRC2
+Description: Cubemap T coordinate. Algorithm below.
+Operation:
+```
+FLOAT SF0 = ASFLOAT(SRC0)
+FLOAT SF1 = ASFLOAT(SRC1)
+FLOAT SF2 = ASFLOAT(SRC2)
+if (ABS(SF2) >= ABS(SF1) && ABS(SF2) >= ABS(SF0))
+    OUT = -SF1
+else if (ABS(SF1) >= ABS(SF0))
+    OUT = SIGN(SF1) * SF2
+else
+    OUT = -SF1
+VDST = OUT
+```
+
+#### V_FMA_F32
+
+Opcode: 331 (0x14b) for GCN 1.0/1.1; 459 (0x1cb) for GCN 1.2
+Syntax: V_FMA_F32 VDST, SRC0, SRC1, SRC2
+Description: Fused multiply addition on single floating point values from
+SRC0, SRC1 and SRC2. Result stored in VDST.
+Operation:
+```
+// SRC0*SRC1+SRC2
+VDST = FMA(ASFLOAT(SRC0), ASFLOAT(SRC1), ASFLOAT(SRC2))
+```
+
+#### V_FMA_F64
+
+Opcode: 332 (0x14c) for GCN 1.0/1.1; 460 (0x1cc) for GCN 1.2
+Syntax: V_FMA_F64 VDST(2), SRC0(2), SRC1(2), SRC2(2)
+Description: Fused multiply addition on double floating point values from
+SRC0, SRC1 and SRC2. Result stored in VDST.
+Operation:
+```
+// SRC0*SRC1+SRC2
+VDST = FMA(ASDOUBLE(SRC0), ASDOUBLE(SRC1), ASDOUBLE(SRC2))
+```
+
+#### V_LERP_U8
+
+Opcode: 333 (0x14d) for GCN 1.0/1.1; 461 (0x1cd) for GCN 1.2
+Syntax: V_LERP_U8 VDST, SRC0, SRC1, SRC2
+Description: For each byte of dword, calculate average from SRC0 byte and SRC1 byte with
+rounding mode defined by the first bit of the corresponding SRC2 byte. If rounding bit is set then result for
+that byte is rounded, otherwise truncated. All bytes will be stored in VDST.
+Operation:
+```
+for (UINT8 i = 0; i < 4; i++)
+{
+    UINT8 S0 = (SRC0 >> (i*8)) & 0xff
+    UINT8 S1 = (SRC1 >> (i*8)) & 0xff
+    UINT8 S2 = (SRC2 >> (i*8)) & 1
+    VDST = (VDST & ~(255U<<(i*8))) | (((S0+S1+S2) >> 1) << (i*8))
+}
+```
+
 #### V_MAD_F32
 
@@ -288,2 +458,8 @@
 VDST = (UINT32)(SRC0&0xffffff) * (UINT32)(SRC1&0xffffff) + SRC2
 ```
+
+#### V_MIN3_F32
+
+Opcode: 337 (0x151) for GCN 1.0/1.1; 464 (0x1d0) for GCN 1.2
+Syntax: V_MIN3_F32 VDST, SRC0, SRC1, SRC2
+

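The integer pseudocode in these new VOP3 entries maps directly onto C++. Below is a minimal sketch of four of them (V_ALIGNBIT_B32, V_ALIGNBYTE_B32, V_BFI_B32, V_LERP_U8), written only from the descriptions and pseudocode in this hunk. The lowercase function names are illustrative and not part of CLRX, and each function returns the VDST value instead of writing a register.

```cpp
#include <cassert>
#include <cstdint>

// V_ALIGNBIT_B32: treat SRC0:SRC1 as one 64-bit value (SRC0 = high half),
// shift it right by SRC2&31 bits and keep the low 32 bits.
uint32_t v_alignbit_b32(uint32_t src0, uint32_t src1, uint32_t src2) {
    const uint64_t joined = (static_cast<uint64_t>(src0) << 32) | src1;
    return static_cast<uint32_t>(joined >> (src2 & 31));
}

// V_ALIGNBYTE_B32: same idea, but the shift is (SRC2&3) whole bytes.
uint32_t v_alignbyte_b32(uint32_t src0, uint32_t src1, uint32_t src2) {
    const uint64_t joined = (static_cast<uint64_t>(src0) << 32) | src1;
    return static_cast<uint32_t>(joined >> ((src2 & 3) * 8));
}

// V_BFI_B32: each set bit of SRC0 takes the bit from SRC1, each clear bit from SRC2.
uint32_t v_bfi_b32(uint32_t src0, uint32_t src1, uint32_t src2) {
    return (src0 & src1) | (~src0 & src2);
}

// V_LERP_U8: per byte, (S0 + S1 + R) >> 1, where R is bit 0 of the matching
// SRC2 byte (R = 1 rounds halves up, R = 0 truncates).
uint32_t v_lerp_u8(uint32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t vdst = 0;
    for (unsigned i = 0; i < 4; i++) {
        const uint32_t s0 = (src0 >> (i * 8)) & 0xff;
        const uint32_t s1 = (src1 >> (i * 8)) & 0xff;
        const uint32_t r = (src2 >> (i * 8)) & 1;
        const uint32_t avg = (s0 + s1 + r) >> 1;
        assert(avg <= 0xff);  // (255 + 255 + 1) >> 1 == 255: no carry into the next byte
        vdst |= avg << (i * 8);
    }
    return vdst;
}
```

Doing the alignment through a 64-bit intermediate mirrors the `(UINT64)SRC0 << 32` form of the pseudocode and avoids the undefined 32-bit shift by 32 that a two-shift formulation would hit when SRC2&31 is 0.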
CLRadeonExtender/trunk/tests/amdasm/GCNAsmOpc11.cpp
--- r1776
+++ r1783
@@ -1646,6 +1646,7 @@
     { " v_alignbyte_b32 v55, v79, v166, v229",
         0xd29e0037U, 0x07974d4fU, true, true, "" },
-    { " v_mullit_f32 v55, v79, v166", 0xd2a00037U, 0x00034d4fU, true, true, "" },
-    { " v_mullit_f32 v55, s79, v166", 0xd2a00037U, 0x00034c4fU, true, true, "" },
+    { " v_mullit_f32 v55, v79, v166, s0", 0xd2a00037U, 0x00034d4fU, true, true, "" },
+    { " v_mullit_f32 v55, v79, v166, s0", 0xd2a00037U, 0x00034d4fU, true, true, "" },
+    { " v_mullit_f32 v55, s79, v166, v229", 0xd2a00037U, 0x07974c4fU, true, true, "" },
     { " v_min3_f32 v55, v79, v166, v229", 0xd2a20037U, 0x07974d4fU, true, true, "" },
     { " v_min3_i32 v55, v79, v166, v229", 0xd2a40037U, 0x07974d4fU, true, true, "" },

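The updated assembler tests give v_mullit_f32 a third source operand, and the expected dwords can be reproduced with a small VOP3A encoder sketch. It assumes the GCN 1.0/1.1 VOP3A layout (first dword: VDST in bits 0-7, OP in bits 17-25, encoding prefix 110100b in bits 26-31; second dword: SRC0, SRC1, SRC2 in bits 0-8, 9-17, 18-26), with SGPRs encoded as operand codes 0-103 and VGPRs as 256-511. All modifiers (abs, neg, clamp, omod) are left at zero. `encode_vop3a`, `sgpr` and `vgpr` are hypothetical helpers, not CLRX functions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// 9-bit source operand codes (assumed GCN 1.0/1.1 numbering).
constexpr uint32_t sgpr(uint32_t n) { return n; }        // s0..s103
constexpr uint32_t vgpr(uint32_t n) { return 256 + n; }  // v0..v255

// Build the two VOP3A dwords for "op vDST, SRC0, SRC1, SRC2" without modifiers.
std::pair<uint32_t, uint32_t> encode_vop3a(uint32_t opcode, uint32_t vdst,
                                           uint32_t src0, uint32_t src1, uint32_t src2) {
    assert(opcode < 512 && vdst < 256);              // 9-bit OP, 8-bit VDST
    assert(src0 < 512 && src1 < 512 && src2 < 512);  // 9-bit source fields
    const uint32_t word0 = (0x34u << 26) | (opcode << 17) | vdst;
    const uint32_t word1 = src0 | (src1 << 9) | (src2 << 18);
    return {word0, word1};
}
```

With opcode 336 (v_mullit_f32 in the GCNInstructions.cpp table of this changeset), `encode_vop3a(336, 55, sgpr(79), vgpr(166), vgpr(229))` yields 0xd2a00037 and 0x07974c4f, the pair expected by the new `v_mullit_f32 v55, s79, v166, v229` test line; an SRC2 of s0 (code 0) leaves bits 18-26 clear, giving 0x00034d4f for `v55, v79, v166, s0`.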
CLRadeonExtender/trunk/tests/amdasm/GCNDisasmOpc11.cpp
--- r1453
+++ r1783
@@ -1850,7 +1850,7 @@
     { 0xd29e0037U, 0x07974d4fU, true, " v_alignbyte_b32 v55, v79, v166, v229\n" },
     { 0xd2a00037U, 0x07974d4fU, true, " v_mullit_f32 "
-        "v55, v79, v166 vsrc2=0x1e5\n" },
+        "v55, v79, v166, v229\n" },
     { 0xd2a00037U, 0x00034d4fU, true, " v_mullit_f32 "
-        "v55, v79, v166\n" },
+        "v55, v79, v166, s0\n" },
     { 0xd2a20037U, 0x07974d4fU, true, " v_min3_f32 v55, v79, v166, v229\n" },
     { 0xd2a40037U, 0x07974d4fU, true, " v_min3_i32 v55, v79, v166, v229\n" },

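The disassembler now prints the SRC2 field of v_mullit_f32 as a regular operand instead of the raw `vsrc2=0x1e5` annotation. 0x1e5 is 485, which is VGPR code 256 + 229, hence the new `v229` in the expectation. A minimal decoding sketch follows, assuming the GCN 1.0/1.1 VOP3A layout (first dword: VDST bits 0-7, OP bits 17-25, prefix 110100b in bits 26-31; second dword: SRC0/SRC1/SRC2 in bits 0-8/9-17/18-26) and only SGPR (0-103) and VGPR (256-511) operand codes. `Vop3a`, `decode_vop3a`, `operand_name` and `format_operands` are hypothetical names, not the CLRX disassembler API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Fields of a VOP3A instruction; modifiers (abs/neg/clamp/omod) are ignored here.
struct Vop3a {
    uint32_t opcode;
    uint32_t vdst;
    uint32_t src0, src1, src2;
};

// Split the two dwords using the assumed GCN 1.0/1.1 VOP3A bit layout.
Vop3a decode_vop3a(uint32_t word0, uint32_t word1) {
    assert((word0 >> 26) == 0x34);  // encoding prefix 110100b marks VOP3
    return {(word0 >> 17) & 0x1ff, word0 & 0xff,
            word1 & 0x1ff, (word1 >> 9) & 0x1ff, (word1 >> 18) & 0x1ff};
}

// Name a 9-bit source operand: 0..103 -> sN, 256..511 -> vN.
// Other codes (VCC, EXEC, inline constants, ...) are not modelled; the
// "src(N)" spelling is only a placeholder, not CLRX syntax.
std::string operand_name(uint32_t code) {
    if (code >= 256)
        return "v" + std::to_string(code - 256);
    if (code <= 103)
        return "s" + std::to_string(code);
    return "src(" + std::to_string(code) + ")";
}

// Operand list in the "vDST, SRC0, SRC1, SRC2" form the updated tests expect.
std::string format_operands(const Vop3a& ins) {
    return "v" + std::to_string(ins.vdst) + ", " + operand_name(ins.src0) + ", " +
           operand_name(ins.src1) + ", " + operand_name(ins.src2);
}
```

Decoding 0xd2a00037, 0x07974d4f this way gives opcode 336 and the operand list `v55, v79, v166, v229`; with 0x00034d4f the SRC2 field is 0 and the list ends in `s0`, matching the two updated disassembler expectations.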