Version 1 (modified by trac, 7 years ago)

CLRadeonExtender Assembler ROCm handling

The ROCm platform is a new open-source environment created by AMD for Radeon GPUs (designed especially for HPC and AMD's professional products). This platform uses the HSACO binary object file format to store compiled GPU code.

Binary format

The binary is stored as an ELF file. The symbol table holds the kernel and data symbols. The main .text section contains the code for all kernels. Data (for example, global constant data) is also stored in the .text section. Each kernel symbol points to the configuration for that kernel. A special offset field in the configuration data indicates where the kernel code is located.
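As a rough illustration of how the offset field could be used, the Python sketch below computes where a kernel's code starts inside the .text section. The byte position of the offset field (16), its signed 64-bit little-endian encoding, and the reading of the offset as relative to the start of the configuration are assumptions not stated on this page. The name kernel_code_address is hypothetical.

```python
import struct

# Assumption: the offset field sits 16 bytes into the kernel configuration
# and is a signed 64-bit little-endian value.
ENTRY_OFFSET_POS = 16


def kernel_code_address(text: bytes, config_start: int) -> int:
    """Return the offset in .text where the kernel code begins.

    config_start is the value of the kernel symbol, i.e. the place where
    the kernel configuration starts inside .text.
    """
    (entry_offset,) = struct.unpack_from(
        "<q", text, config_start + ENTRY_OFFSET_POS
    )
    # Assumption: the stored offset is relative to the configuration start.
    return config_start + entry_offset


# Toy .text image: a configuration placed at offset 256 whose offset field
# says "the code starts 256 bytes after the configuration".
toy_text = bytearray(1024)
struct.pack_into("<q", toy_text, 256 + ENTRY_OFFSET_POS, 256)
print(kernel_code_address(bytes(toy_text), 256))  # prints 512
```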

The assembler source code is divided into the following parts:

  • kernel configuration
  • kernel code and data (in the .text section)

The order of these parts does not matter.

Kernel functions should be aligned to a 256-byte boundary.
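The alignment requirement comes down to simple arithmetic. The Python sketch below shows how many filler bytes would be needed before the next kernel so that it starts on a 256-byte boundary. The helper names align_up and padding_needed are hypothetical and are not part of the assembler.

```python
KERNEL_ALIGNMENT = 256  # bytes, as required for kernel functions


def align_up(offset: int, alignment: int = KERNEL_ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def padding_needed(offset: int, alignment: int = KERNEL_ALIGNMENT) -> int:
    """Number of filler bytes to place before the next kernel."""
    return align_up(offset, alignment) - offset


# A kernel that ends at offset 300 is followed by 212 filler bytes,
# so the next kernel starts at offset 512.
print(align_up(300), padding_needed(300))  # prints 512 212
```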

Scalar register allocation

The assembler for the ROCm format counts all used SGPR registers and adds the extra registers (VCC, FLAT_SCRATCH, XNACK_MASK), if any are used, to the register pool. Special fields determine which extra SGPRs have been added.
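The counting rule can be pictured with a small Python sketch. It assumes that each extra register occupies a pair of SGPRs and that the extras are simply added on top of the ordinary SGPRs; neither detail is stated on this page, and the real assembler may reserve them differently depending on the architecture. The names EXTRA_SGPR_SIZES and sgpr_pool_size are hypothetical.

```python
from typing import Iterable

# Assumption: each special register occupies a pair of SGPRs.
EXTRA_SGPR_SIZES = {"VCC": 2, "FLAT_SCRATCH": 2, "XNACK_MASK": 2}


def sgpr_pool_size(used_sgprs: int, extras_used: Iterable[str]) -> int:
    """Return the ordinary SGPR count plus the size of every extra in use."""
    extras = set(extras_used)  # each extra register is counted once
    unknown = extras - set(EXTRA_SGPR_SIZES)
    if unknown:
        raise ValueError(f"unknown extra registers: {sorted(unknown)}")
    return used_sgprs + sum(EXTRA_SGPR_SIZES[name] for name in extras)


# A kernel that uses 10 ordinary SGPRs plus VCC and FLAT_SCRATCH.
print(sgpr_pool_size(10, ["VCC", "FLAT_SCRATCH"]))  # prints 14
```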

List of the specific pseudo-operations

.arch_minor

Syntax: .arch_minor ARCH_MINOR

Set architecture minor number.

.arch_stepping

Syntax: .arch_stepping ARCH_STEPPING

Set architecture stepping number.

.call_convention

Syntax: .call_convention CALL_CONV

This pseudo-op must be inside kernel configuration (.config). Set the call convention for the kernel.

.codeversion

This pseudo-op must be inside kernel configuration (.config).

.config

.control_directive

.debug_private_segment_buffer_sgpr

This pseudo-op must be inside kernel configuration (.config).

.debug_wavefront_private_segment_offset_sgpr

This pseudo-op must be inside kernel configuration (.config).

.debugmode

This pseudo-op must be inside kernel configuration (.config). Enable usage of DEBUG_MODE.

.dims

Syntax: .dims DIMENSIONS

This pseudo-op must be inside kernel configuration (.config). Defines which dimensions (from the list: x, y, z) are used to determine the kernel execution space.
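A minimal Python sketch of how a DIMENSIONS argument might be validated is shown here. It assumes the dimensions are written as letters run together (for example xy), which this page does not spell out. The function parse_dims is hypothetical and only models the rule that every letter must come from the list x, y, z.

```python
def parse_dims(dimensions: str) -> list:
    """Return the dimensions named in the argument, in x, y, z order."""
    text = dimensions.strip()
    if not text:
        raise ValueError("at least one dimension is required")
    unknown = set(text) - set("xyz")
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    # Report each requested dimension once, in canonical order.
    return [dim for dim in "xyz" if dim in text]


print(parse_dims("zx"))  # prints ['x', 'z']
```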

.dx10clamp

This pseudo-op must be inside kernel configuration (.config). Enable usage of DX10_CLAMP.

.exceptions

This pseudo-op must be inside kernel configuration (.config).

.fkernel

.floatmode

This pseudo-op must be inside kernel configuration (.config).

.gds_segment_size

This pseudo-op must be inside kernel configuration (.config).

.group_segment_align

This pseudo-op must be inside kernel configuration (.config).

.ieeemode

This pseudo-op must be inside kernel configuration (.config).

.kcode

.kcodeend

.kernarg_segment_align

This pseudo-op must be inside kernel configuration (.config).

.kernarg_segment_size

This pseudo-op must be inside kernel configuration (.config).

.kernel_code_entry_offset

This pseudo-op must be inside kernel configuration (.config).

.kernel_code_prefetch_offset

This pseudo-op must be inside kernel configuration (.config).

.kernel_code_prefetch_size

This pseudo-op must be inside kernel configuration (.config).

.localsize

This pseudo-op must be inside kernel configuration (.config).

.machine

This pseudo-op must be inside kernel configuration (.config).

.max_scratch_backing_memory

This pseudo-op must be inside kernel configuration (.config).

.pgmrsrc1

This pseudo-op must be inside kernel configuration (.config).

.pgmrsrc2

This pseudo-op must be inside kernel configuration (.config).

.priority

This pseudo-op must be inside kernel configuration (.config).

.private_elem_size

This pseudo-op must be inside kernel configuration (.config).

.private_segment_align

This pseudo-op must be inside kernel configuration (.config).

.privmode

This pseudo-op must be inside kernel configuration (.config).

.reserved_sgprs

This pseudo-op must be inside kernel configuration (.config).

.reserved_vgprs

This pseudo-op must be inside kernel configuration (.config).

.runtime_loader_kernel_symbol

This pseudo-op must be inside kernel configuration (.config).

.scratchbuffer

This pseudo-op must be inside kernel configuration (.config).

.sgprsnum

This pseudo-op must be inside kernel configuration (.config).

.tgsize

This pseudo-op must be inside kernel configuration (.config).

.use_debug_enabled

This pseudo-op must be inside kernel configuration (.config).

.use_dispatch_id

This pseudo-op must be inside kernel configuration (.config).

.use_dispatch_ptr

This pseudo-op must be inside kernel configuration (.config).

.use_dynamic_call_stack

This pseudo-op must be inside kernel configuration (.config).

.use_flat_scratch_init

This pseudo-op must be inside kernel configuration (.config).

.use_grid_workgroup_count

This pseudo-op must be inside kernel configuration (.config).

.use_kernarg_segment_ptr

This pseudo-op must be inside kernel configuration (.config).

.use_ordered_append_gds

This pseudo-op must be inside kernel configuration (.config).

.use_private_segment_buffer

This pseudo-op must be inside kernel configuration (.config).

.use_private_segment_size

This pseudo-op must be inside kernel configuration (.config).

.use_ptr64

This pseudo-op must be inside kernel configuration (.config).

.use_queue_ptr

This pseudo-op must be inside kernel configuration (.config).

.use_xnack_enabled

This pseudo-op must be inside kernel configuration (.config).

.userdatanum

This pseudo-op must be inside kernel configuration (.config).

.vgprsnum

This pseudo-op must be inside kernel configuration (.config).

.wavefront_sgpr_count

This pseudo-op must be inside kernel configuration (.config).

.wavefront_size

This pseudo-op must be inside kernel configuration (.config).

.workgroup_fbarrier_count

This pseudo-op must be inside kernel configuration (.config).

.workgroup_group_segment_size

This pseudo-op must be inside kernel configuration (.config).

.workitem_private_segment_size

This pseudo-op must be inside kernel configuration (.config).

.workitem_vgpr_count

This pseudo-op must be inside kernel configuration (.config).