= Change Log =

CLRadeonExtender 0.1.9:

* add AMD Navi support for assembler and disassembler
* add shorter addressing of FLAT/GLOBAL/SCRATCH
* add literal immediate for SMRD addressing for GCN1.1
* add Amd3 OpenCL binary format for AMD Navi for AMD OpenCL implementation
* include specific extension in device name for ROCm-OpenCL platform

CLRadeonExtender 0.1.8:

* add chapter about binary formats to CLRX documentation
* add some information about compilation under FreeBSD
* add '.nosectdiffs' to disable new section difference behaviour if the new ROCm format is chosen
* small optimization in the AsmScope destructor
* add extra info about setting the number of SGPR registers in documentation
* fixed OpenCL detection for AMDGPU-PRO
* add '.enum' pseudo-op to simplify defining enumerations
* add CLRX_VERSION_NUMBER and CLRX_POLICY_UNIFIED_SGPR_COUNT
* add policy to unify SGPR counting for all binary formats (by default disabled)
* fix some mistakes about building in documentation
* add preliminary support for CPU architectures (untested): SPARC, IA64 and MIPS
* add new '.dims' syntax to distinguish vector group ids and scalar local ids
* improve CLZ32/64 for MSVC
* introduce CTZ32/64
* while disassembling, determine minimal AMD driver version for GPU device type (better code detection while disassembling)
* fixed some types in documentation
* update list of GPU devices in documentation
* fix stupid and old bug in ImageMix sample
* change a GPU device name for VEGA11 to GFX902
* fixed segfault when attempting to disassemble old Gallium binaries using new Gallium binary format
* sort the kernels by offset order while disassembling
* better input data checking while disassembling code
* add HSALayout mode for AMDCL2 format (similar code layout like in ROCm and Gallium formats)
* introduce kernel code parts ('.kcode' and '.kcodeend') to AMDCL2
* check sanity of LDS use in AMD VEGA architecture (can be used only in SCRATCH and GLOBAL)
* add new types in source code: GPUArchMask, AsmKernelId and AsmSectionId
* allow constant literals in symreg ranges
* fixed symreg ranges checking
* fixed handling of some symbol names similar to register names (like exec_masc)
* add new GPU devices to list (gfx904, gfx905, gfx906 and gfx907)
* add AMD VEGA 20 instruction set
* add much stuff to handle register allocation (it still doesn't work and is not finished)
* add a DTree structure to save memory in storing register allocation structures
* fixed possible segfault while preparing to write when ASMKERN_INNER is present

CLRadeonExtender 0.1.7:

* update AmdCL2ABI chapter
* fixed kernel arguments sizes in GalliumCompute binary format
* add new GPU devices gfx902-gfx905
* update device tables for Amd Crimson drivers
* small fixes in DynLibrary interface
* add relocations to GalliumCompute binary format (for scratch buffer symbols)
* make getXXXDisasmInputFromBinaryXX a public interface
* speed up evaluation of simple expressions without symbols
* add '.for' and '.while' pseudo-ops ('for' and 'while' loops)
* fixed some grammar/typos in CLRX documentation
* add GPU device names from ROCm-OpenCL
* handle new ROCm binary format with YAML metadata (assembler and disassembler)
* add a few pseudo-ops to ROCm handling
* add new pseudo-ops to set parameters in ROCm YAML metadata
* fixes in GalliumCompute binary generator (for conformance with standards)
* add '.reqd_work_group_size' pseudo-op (equivalent of '.cws')
* add support for work_group_size_hint and vec_type_hint in Amd OpenCL 2.0 binary format
* some small bug fixes in ROCm disassembler
* updates in README.md and INSTALL files
* small sanitizations in DisasmAmd, DisasmAmdCL2 (argument type checking)
* change behaviour of '.cws' (.reqd_work_group_size) while setting default values
* add calculation of section differences in expressions (for ROCm handling)
* fixed invalid reads (potential segfault) after undefining symbol
* fixed old stupid bug: resolve symbol value by using the new value (or, if undefined, do not resolve the symbol) instead of the old unresolved symbol value later, when the expression has been evaluated
* add GOT table handling in ROCm binary format
* add new option '--newROCmBinFormat'
* add untested support for ROCm in CLHelper and VectorAdd sample
* add support for multiple OpenCL platforms in CLHelper and samples
* allow call_convention to be set to 0xffffffff in AMDHSA config
* handle special cases with relatives while evaluating binary/logical operators
* small fixes in CLRX documentation and Unix manuals
* developing unfinished AsmRegAlloc
* add a missing access qualifier to images 'read_write' for AMD OpenCL 2.0

CLRadeonExtender 0.1.6:

* add support for Mesa3D 17.3.0 (GPU detection)
* fixed segfaults while disassembling new Gallium binaries with AMD HSA
* add ability to supply defined symbols when using the CLHelper
* fixed CLRXDocs mistakes in GcnSrmdInstrs, GcmSmemInstrs, GcnVopXInstrs chapters
* add GCN1.4 (VEGA) instruction descriptions to CLRXDocs
* add support for GCN 1.4 (VEGA) to samples
* fixed encoding/decoding of SMEM instructions with SGPR offset (GCN 1.4)
* add missing GCN 1.4 instructions
* fixed encoding/decoding of OP_SEL (GCN 1.4)
* fixed encoding/decoding of DS_READ_ADDTID_B32 (GCN 1.4)
* fixed encoding/decoding of TBUFFER_x_D16/BUFFER_x_D16 instructions for GCN 1.4
* fixed encoding of CLAMP VOP3/VOPC instructions (GCN 1.4)
* allow to use OMOD, NEG, ABS, CLAMP modifiers in VOP3/VINTRP instructions
* add new VOP3/VINTRP instruction descriptions to CLRXDocs
* update GCN timings chapter in CLRXDocs

CLRadeonExtender 0.1.5r1:

* add detection of OpenGL to CMakeLists.txt
* add more comments in the source code
* fixed hanging when ROCm code has hundreds or more kernels
* parameter in modifier can have any value
* add 'get_version' pseudo-operation
* add oldModParam mode (old modifier parameter's policy)
* fixes for ROCm disassembler module
* fixes for Gallium binary reader (accept new binaries with many kernels)
* added support for Mesa3D 17.2.x
* added Mesa3D/Gallium device names for AMD Polaris
* add new exceptions to code (to distinguish type of exception)
* fixed position in disassembler code in comments (mainly for Gallium/ROCm)
* add CLRXCLHelper library to facilitate running assembler code on OpenCL
* move some GPU architecture versions tables to GPUId
* add new testcase GPUId

CLRadeonExtender 0.1.5:

* ignore case in access qualifier names (Amd and AmdCL2)
* improve handling of '\()' and '\@'
* add SDWA and DPP words to set instruction encoding
* fix a few CLRXDocs typos
* fixes for AMD RX VEGA (GFX900)
* disassembler prints an instruction's position in comments
* update GcnTimings
* update VectorAdd and ReverseBits for LLVM 4.0 and Mesa3D 17.0.0
* updates in ImageMix (correct workSize calculation for kernel)
* small fixes in disassembler
* disassembler can correctly disassemble GalliumCompute for LLVM 4.0
* add '--llvmVersion' to clrxdisasm
* dump AMD HSA configuration for GalliumCompute and AmdCL2 (like in ROCm format)
* disassembler adds '@' to hwreg and sendmsg to make dump compatible with clrxasm
* add '--HSAConfig' to dump AmdCL2 kernel configuration as AMD HSA config
* add AMD HSA configuration pseudo-ops to GalliumCompute and AmdCL2 binary formats
* update device list for Gallium and ROCm binary formats for recognizing device
* fixed support for LLVM>=3.9 and Mesa3D>=17.0.0 in GalliumCompute
* add pseudo-op '.default_hsa_features' to AmdCL2, Gallium and ROCm formats
* update headers in code
* make error handling more compact in assembler's code
* fixed '.machine', '.codeversion' handling (do not print obsolete warnings)
* add pkg-config files to installation
* remove obsolete warnings in CMakeLists.txt
* added GFX901 support (RX VEGA with HBCC ?)
* add Config.h and amdbin/Elf.h headers to Doxygen documentation
* change lowest device for GCN 1.2 to Iceland in GPUId
* add support for Windows development environments: CygWin and MinGW
* make 64-bit detection more portable in CMakeLists.txt (use the compiler to do it)
* check whether std::call_once is available for not fully supported std threads
* use only C++ compiler to check features (Int128Detect.cpp)

CLRadeonExtender 0.1.4r1:

* fixed code operation in SMRD and SMEM instructions
* fixed parsing of symbol register ranges beginning with 'exec', 'vcc', 'tma', ...
* check end of line when parsing symbol and regvar register ranges

CLRadeonExtender 0.1.4:

* add AMD RX VEGA support (GCN 1.4/VEGA)
* add symbol scopes
* add support for 32-bit AMD OpenCL 2.0 binaries
* update GPU device ids to latest drivers
* add Ellesmere and Baffin support for AMD OpenCL 1.2 binaries
* add support for LLVM 3.9, LLVM 4.0 and Mesa3D 17.0
* add new options to clrxasm (--llvmVersion)
* add GCN 1.2 instruction set documentation
* add new SMEM instruction (s_buffer_atomics)
* add GDS segment size to AMD OpenCL 2.0 binaries
* add code of samples for GCN 1.2
* add option to use old AMD OpenCL 1.2 binary format to samples
* add editors' syntax (NotePad++, Kate, Gedit, VIM)
* minor fixes in GCN assembler
* add modifier's parametrization
* add options to control case sensitivity in macro names
* fixed handling AMDOCL names for 32-bit Windows environment
* add installation rules for AMDGPU-PRO drivers (OpenSUSE and Ubuntu)
* add new pseudo-ops '.get_64bit', '.get_arch', '.get_format', '.get_gpu'
* add autodetection for LLVM and Mesa3D version
* find correct AMDOCL, MesaOCL and llvm-config at runtime

CLRadeonExtender 0.1.3:

* ROCm binary format support
* fixed '.format' pseudo-op
* fixed resolving variables in some specific cases
* fixed handling AmdCL2 format for device type later than GCN 1.1
* small fixes in documentation
* fixed disassembling of s_waitcnt
* fixed handling floating point literals in assembler and compatibility mode (bugFP)
* ARMv8 (AArch64) architecture support
* Android support

CLRadeonExtender 0.1.2:

* AMD OpenCL 2.0 support
* 64-bit Gallium binary format support
* support for new closed Linux and Windows drivers
* new samples
* documentation for OpenCL 2.0 support (includes ABI)
* documentation for GCN ISA FLAT encoding
* lit() specifier to distinguish literal and inline constant
* alternate macro syntax
* correct counting registers for automatic configuration
* fixed handling of conditionals and macro pseudo-ops
* disassembler can dump configuration in user-friendly form

CLRadeonExtender 0.1.1:

* support for Windows
* register ranges, and symbols of register ranges
* GCN ISA documentation
* fixed AMD Catalyst and Gallium compute binary generator
* fixed clrxasm

CLRadeonExtender 0.1:

* first published version