C API (`dlpack.h`)¶

Macros¶

DLPACK_EXTERN_C¶: Compatibility with C++.

DLPACK_MAJOR_VERSION¶: The current major version of dlpack.

DLPACK_MINOR_VERSION¶: The current minor version of dlpack.

DLPACK_DLL¶: DLPACK_DLL prefix for windows.

DLPACK_FLAG_BITMASK_READ_ONLY¶: bit mask to indicate that the tensor is read only.

DLPACK_FLAG_BITMASK_IS_COPIED¶

bit mask to indicate that the tensor is a copy made by the producer.

If set, the tensor is considered solely owned throughout its lifetime by the consumer, until the producer-provided deleter is invoked.

DLPACK_FLAG_BITMASK_IS_SUBBYTE_TYPE_PADDED¶

bit mask to indicate that whether a sub-byte type is packed or padded.

The default for sub-byte types (ex: fp4/fp6) is assumed packed. This flag can be set by the producer to signal that a tensor of sub-byte type is padded.

Enumerations¶

enum DLDeviceType¶

The device type in DLDevice.

Values:

enumerator kDLCPU¶: CPU device.

enumerator kDLCUDA¶: CUDA GPU device.

enumerator kDLCUDAHost¶: Pinned CUDA CPU memory by cudaMallocHost.

enumerator kDLOpenCL¶: OpenCL devices.

enumerator kDLVulkan¶: Vulkan buffer for next generation graphics.

enumerator kDLMetal¶: Metal for Apple GPU.

enumerator kDLVPI¶: Verilog simulator buffer.

enumerator kDLROCM¶: ROCm GPUs for AMD GPUs.

enumerator kDLROCMHost¶: Pinned ROCm CPU memory allocated by hipMallocHost.

enumerator kDLExtDev¶: Reserved extension device type, used for quickly test extension device The semantics can differ depending on the implementation.

enumerator kDLCUDAManaged¶: CUDA managed/unified memory allocated by cudaMallocManaged.

enumerator kDLOneAPI¶: Unified shared memory allocated on a oneAPI non-partititioned device. Call to oneAPI runtime is required to determine the device type, the USM allocation type and the sycl context it is bound to.

enumerator kDLWebGPU¶: GPU support for next generation WebGPU standard.

enumerator kDLHexagon¶: Qualcomm Hexagon DSP.

enumerator kDLMAIA¶: Microsoft MAIA devices.

enumerator kDLTrn¶: AWS Trainium.

enum DLDataTypeCode¶

The type code options DLDataType.

Values:

enumerator kDLInt¶: signed integer

enumerator kDLUInt¶: unsigned integer

enumerator kDLFloat¶: IEEE floating point.

enumerator kDLOpaqueHandle¶: Opaque handle type, reserved for testing purposes. Frameworks need to agree on the handle data type for the exchange to be well-defined.

enumerator kDLBfloat¶: bfloat16

enumerator kDLComplex¶: complex number (C/C++/Python layout: compact struct per complex number)

enumerator kDLBool¶: boolean

enumerator kDLFloat8_e3m4¶: FP8 data types.

enumerator kDLFloat8_e4m3¶

enumerator kDLFloat8_e4m3b11fnuz¶

enumerator kDLFloat8_e4m3fn¶

enumerator kDLFloat8_e4m3fnuz¶

enumerator kDLFloat8_e5m2¶

enumerator kDLFloat8_e5m2fnuz¶

enumerator kDLFloat8_e8m0fnu¶

enumerator kDLFloat6_e2m3fn¶: FP6 data types Setting bits != 6 is currently unspecified, and the producer must ensure it is set while the consumer must stop importing if the value is unexpected.

enumerator kDLFloat6_e3m2fn¶

enumerator kDLFloat4_e2m1fn¶: FP4 data types Setting bits != 4 is currently unspecified, and the producer must ensure it is set while the consumer must stop importing if the value is unexpected.

Structs¶

struct DLPackVersion¶

The DLPack version.

A change in major version indicates that we have changed the data layout of the ABI - DLManagedTensorVersioned.

A change in minor version indicates that we have added new code, such as a new device type, but the ABI is kept the same.

If an obtained DLPack tensor has a major version that disagrees with the version number specified in this header file (i.e. major != DLPACK_MAJOR_VERSION), the consumer must call the deleter (and it is safe to do so). It is not safe to access any other fields as the memory layout will have changed.

In the case of a minor version mismatch, the tensor can be safely used as long as the consumer knows how to interpret all fields. Minor version updates indicate the addition of enumeration values.

Public Members

uint32_t major¶: DLPack major version.

uint32_t minor¶: DLPack minor version.

struct DLDevice¶

A Device for Tensor and operator.

Public Members

DLDeviceType device_type¶: The device type used in the device.

int32_t device_id¶: The device index. For vanilla CPU memory, pinned memory, or managed memory, this is set to 0.

struct DLDataType¶

The data type the tensor can hold. The data type is assumed to follow the native endian-ness. An explicit error message should be raised when attempting to export an array with non-native endianness.

Examples

float: type_code = 2, bits = 32, lanes = 1
float4(vectorized 4 float): type_code = 2, bits = 32, lanes = 4
int8: type_code = 0, bits = 8, lanes = 1
std::complex<float>: type_code = 5, bits = 64, lanes = 1
bool: type_code = 6, bits = 8, lanes = 1 (as per common array library convention, the underlying storage size of bool is 8 bits)
float8_e4m3: type_code = 8, bits = 8, lanes = 1 (packed in memory)
float6_e3m2fn: type_code = 16, bits = 6, lanes = 1 (packed in memory)
float4_e2m1fn: type_code = 17, bits = 4, lanes = 1 (packed in memory)

When a sub-byte type is packed, DLPack requires the data to be in little bit-endian, i.e., for a packed data set D ((D >> (i * bits)) && bit_mask) stores the i-th element.

Public Members

uint8_t code¶: Type code of base types. We keep it uint8_t instead of DLDataTypeCode for minimal memory footprint, but the value should be one of DLDataTypeCode enum values.

uint8_t bits¶: Number of bits, common choices are 8, 16, 32.

uint16_t lanes¶: Number of lanes in the type, used for vector types.

struct DLTensor¶

Plain C Tensor object, does not manage memory.

Public Members

void *data¶

The data pointer points to the allocated data. This will be CUDA device pointer or cl_mem handle in OpenCL. It may be opaque on some device types. This pointer is always aligned to 256 bytes as in CUDA. The byte_offset field should be used to point to the beginning of the data.

Note that as of Nov 2021, multiple libraries (CuPy, PyTorch, TensorFlow, TVM, perhaps others) do not adhere to this 256 byte alignment requirement on CPU/CUDA/ROCm, and always use byte_offset=0. This must be fixed (after which this note will be updated); at the moment it is recommended to not rely on the data pointer being correctly aligned.

For given DLTensor, the size of memory required to store the contents of data is calculated as follows:

static inline size_t GetDataSize(const DLTensor* t) {
  size_t size = 1;
  for (tvm_index_t i = 0; i < t->ndim; ++i) {
    size *= t->shape[i];
  }
  size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;
  return size;
}

Note that if the tensor is of size zero, then the data pointer should be set to NULL.

DLDevice device¶: The device of the tensor.

int32_t ndim¶: Number of dimensions.

DLDataType dtype¶: The data type of the pointer.

int64_t *shape¶: The shape of the tensor.

int64_t *strides¶: strides of the tensor (in number of elements, not bytes) can be NULL, indicating tensor is compact and row-majored.

uint64_t byte_offset¶: The offset in bytes to the beginning pointer to data.

struct DLManagedTensor¶

C Tensor object, manage memory of DLTensor. This data structure is intended to facilitate the borrowing of DLTensor by another framework. It is not meant to transfer the tensor. When the borrowing framework doesn’t need the tensor, it should call the deleter to notify the host that the resource is no longer needed.

C API (dlpack.h)¶

Macros¶

Enumerations¶

Structs¶

C API (`dlpack.h`)¶