C API (dlpack.h)¶
Macros¶
-
DLPACK_EXTERN_C¶
Compatibility with C++.
-
DLPACK_MAJOR_VERSION¶
The current major version of dlpack.
-
DLPACK_MINOR_VERSION¶
The current minor version of dlpack.
-
DLPACK_DLL¶
DLPACK_DLL prefix for Windows.
-
DLPACK_FLAG_BITMASK_READ_ONLY¶
Bit mask to indicate that the tensor is read-only.
-
DLPACK_FLAG_BITMASK_IS_COPIED¶
Bit mask to indicate that the tensor is a copy made by the producer.
If set, the tensor is considered solely owned throughout its lifetime by the consumer, until the producer-provided deleter is invoked.
-
DLPACK_FLAG_BITMASK_IS_SUBBYTE_TYPE_PADDED¶
Bit mask to indicate whether a sub-byte type is packed or padded.
The default for sub-byte types (ex: fp4/fp6) is assumed packed. This flag can be set by the producer to signal that a tensor of sub-byte type is padded.
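These flags are single-bit masks combined in the flags field of DLManagedTensorVersioned. A minimal sketch of testing them, with the bit values replicated from dlpack.h so the snippet is self-contained:

```c
#include <stdint.h>

/* Bit values replicated from dlpack.h for a self-contained sketch. */
#define DLPACK_FLAG_BITMASK_READ_ONLY (1UL << 0UL)
#define DLPACK_FLAG_BITMASK_IS_COPIED (1UL << 1UL)
#define DLPACK_FLAG_BITMASK_IS_SUBBYTE_TYPE_PADDED (1UL << 2UL)

/* Returns 1 if the consumer must not write through the tensor's data pointer. */
static int dlpack_is_read_only(uint64_t flags) {
  return (flags & DLPACK_FLAG_BITMASK_READ_ONLY) != 0;
}

/* Returns 1 if the producer made a copy, i.e. the consumer solely owns the
 * data until it invokes the producer-provided deleter. */
static int dlpack_is_copied(uint64_t flags) {
  return (flags & DLPACK_FLAG_BITMASK_IS_COPIED) != 0;
}
```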
Enumerations¶
-
enum DLDeviceType¶
The device type in DLDevice.
Values:
-
enumerator kDLCPU¶
CPU device.
-
enumerator kDLCUDA¶
CUDA GPU device.
-
enumerator kDLCUDAHost¶
Pinned CUDA CPU memory allocated by cudaMallocHost.
-
enumerator kDLOpenCL¶
OpenCL devices.
-
enumerator kDLVulkan¶
Vulkan buffer for next generation graphics.
-
enumerator kDLMetal¶
Metal for Apple GPU.
-
enumerator kDLVPI¶
Verilog simulator buffer.
-
enumerator kDLROCM¶
ROCm device for AMD GPUs.
-
enumerator kDLROCMHost¶
Pinned ROCm CPU memory allocated by hipMallocHost.
-
enumerator kDLExtDev¶
Reserved extension device type, used to quickly test extension devices. The semantics can differ depending on the implementation.
-
enumerator kDLCUDAManaged¶
CUDA managed/unified memory allocated by cudaMallocManaged.
-
enumerator kDLOneAPI¶
Unified shared memory allocated on a oneAPI non-partitioned device. A call to the oneAPI runtime is required to determine the device type, the USM allocation type, and the SYCL context it is bound to.
-
enumerator kDLWebGPU¶
GPU support for next generation WebGPU standard.
-
enumerator kDLHexagon¶
Qualcomm Hexagon DSP.
-
enumerator kDLMAIA¶
Microsoft MAIA devices.
-
enumerator kDLTrn¶
AWS Trainium.
-
enum DLDataTypeCode¶
The type code options of DLDataType.
Values:
-
enumerator kDLInt¶
signed integer
-
enumerator kDLUInt¶
unsigned integer
-
enumerator kDLFloat¶
IEEE floating point.
-
enumerator kDLOpaqueHandle¶
Opaque handle type, reserved for testing purposes. Frameworks need to agree on the handle data type for the exchange to be well-defined.
-
enumerator kDLBfloat¶
bfloat16
-
enumerator kDLComplex¶
complex number (C/C++/Python layout: compact struct per complex number)
-
enumerator kDLBool¶
boolean
-
enumerator kDLFloat8_e3m4¶
FP8 data types.
-
enumerator kDLFloat8_e4m3¶
-
enumerator kDLFloat8_e4m3b11fnuz¶
-
enumerator kDLFloat8_e4m3fn¶
-
enumerator kDLFloat8_e4m3fnuz¶
-
enumerator kDLFloat8_e5m2¶
-
enumerator kDLFloat8_e5m2fnuz¶
-
enumerator kDLFloat8_e8m0fnu¶
-
enumerator kDLFloat6_e2m3fn¶
FP6 data types. Setting bits != 6 is currently unspecified; the producer must ensure bits is set correctly, and the consumer must stop importing if the value is unexpected.
-
enumerator kDLFloat6_e3m2fn¶
-
enumerator kDLFloat4_e2m1fn¶
FP4 data types. Setting bits != 4 is currently unspecified; the producer must ensure bits is set correctly, and the consumer must stop importing if the value is unexpected.
Typedefs¶
-
typedef int (*DLPackManagedTensorAllocator)(DLTensor *prototype, DLManagedTensorVersioned **out, void *error_ctx, void (*SetError)(void *error_ctx, const char *kind, const char *message))¶
Request a producer library to create a new tensor.
Create a new DLManagedTensorVersioned within the context of the producer library. The allocation is defined via the prototype DLTensor. This function is exposed by the framework through the DLPackExchangeAPI.
See also
Note
- As a C function, must not throw C++ exceptions.
- Errors are propagated via SetError to avoid any direct need for the Python API. Because of this, SetError may have to ensure the GIL is held, since it will presumably set a Python error.
- Param prototype:
The prototype DLTensor. Only the dtype, ndim, shape, and device fields are used.
- Param out:
The output DLManagedTensorVersioned.
- Param error_ctx:
Context for SetError.
- Param SetError:
The function to set the error.
- Return:
0 on success, -1 on failure. SetError is called exactly when failure is returned (the implementor must ensure this).
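As an illustration only (the names are hypothetical and the SetError/error_ctx parameters are omitted for brevity), a CPU-only producer could implement this hook roughly as follows, honoring only the dtype, ndim, and shape fields of the prototype as documented above. Minimal struct stand-ins replace the real dlpack.h definitions so the sketch is self-contained:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-ins for the dlpack.h definitions used by this sketch. */
typedef struct { int32_t device_type; int32_t device_id; } DLDevice;
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
typedef struct {
  void *data; DLDevice device; int32_t ndim; DLDataType dtype;
  int64_t *shape; int64_t *strides; uint64_t byte_offset;
} DLTensor;
typedef struct DLManagedTensorVersioned {
  struct { uint32_t major, minor; } version;  /* stand-in for DLPackVersion */
  void *manager_ctx;
  void (*deleter)(struct DLManagedTensorVersioned *self);
  uint64_t flags;
  DLTensor dl_tensor;
} DLManagedTensorVersioned;

/* Deleter that releases the malloc'd buffers and then self. */
static void CpuDeleter(DLManagedTensorVersioned *self) {
  free(self->dl_tensor.data);
  free(self->dl_tensor.shape);
  free(self);
}

/* Hypothetical allocator: 0 on success, -1 on failure. */
static int MyManagedTensorAllocator(DLTensor *prototype,
                                    DLManagedTensorVersioned **out) {
  DLManagedTensorVersioned *t = calloc(1, sizeof(*t));
  if (t == NULL) return -1;
  /* Element size in bytes, rounded up, times the number of elements. */
  size_t size = (size_t)(prototype->dtype.bits * prototype->dtype.lanes + 7) / 8;
  for (int32_t i = 0; i < prototype->ndim; ++i) {
    size *= (size_t)prototype->shape[i];
  }
  t->dl_tensor.data = malloc(size ? size : 1);
  t->dl_tensor.shape = malloc(sizeof(int64_t) * (size_t)prototype->ndim);
  if (t->dl_tensor.data == NULL ||
      (prototype->ndim > 0 && t->dl_tensor.shape == NULL)) {
    CpuDeleter(t);  /* free(NULL) is safe for the missing pieces */
    return -1;
  }
  memcpy(t->dl_tensor.shape, prototype->shape,
         sizeof(int64_t) * (size_t)prototype->ndim);
  t->dl_tensor.ndim = prototype->ndim;
  t->dl_tensor.dtype = prototype->dtype;
  t->dl_tensor.device = prototype->device;
  t->deleter = CpuDeleter;  /* a real producer also sets version and flags */
  *out = t;
  return 0;
}
```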
-
typedef int (*DLPackManagedTensorFromPyObjectNoSync)(void *py_object, DLManagedTensorVersioned **out)¶
Exports a PyObject* Tensor/NDArray to a DLManagedTensorVersioned.
This function does not perform any stream synchronization. The consumer should query DLPackCurrentWorkStream to get the current work stream and launch kernels on it.
This function is exposed by the framework through the DLPackExchangeAPI.
See also
Note
- As a C function, must not throw C++ exceptions.
- Param py_object:
The Python object to convert. Must have the same type as the one the DLPackExchangeAPI was discovered from.
- Param out:
The output DLManagedTensorVersioned.
- Return:
0 on success, -1 on failure with a Python exception set. If the data cannot be described using DLPack, the exception should be a BufferError if possible.
-
typedef int (*DLPackManagedTensorToPyObjectNoSync)(DLManagedTensorVersioned *tensor, void **out_py_object)¶
Imports a DLManagedTensorVersioned to a PyObject* Tensor/NDArray.
Convert an owning DLManagedTensorVersioned* to the Python tensor of the producer (implementor) library with the correct type.
This function does not perform any stream synchronization.
This function is exposed by the framework through the DLPackExchangeAPI.
See also
- Param tensor:
The DLManagedTensorVersioned to convert; ownership of the tensor is stolen.
- Param out_py_object:
The output Python object.
- Return:
0 on success, -1 on failure with a Python exception set.
-
typedef int (*DLPackDLTensorFromPyObjectNoSync)(void *py_object, DLTensor *out)¶
Exports a PyObject* Tensor/NDArray to a provided DLTensor.
This function provides a faster interface for temporary, non-owning, exchange. The producer (implementor) still owns the memory of data, strides, shape. The liveness of the DLTensor and the data it views is only guaranteed until control is returned.
This function currently assumes that the producer (implementor) can fill in the DLTensor shape and strides without the need for temporary allocations.
This function does not perform any stream synchronization. The consumer should query DLPackCurrentWorkStream to get the current work stream and launch kernels on it.
This function is exposed by the framework through the DLPackExchangeAPI.
See also
Note
- As a C function, must not throw C++ exceptions.
- Param py_object:
The Python object to convert. Must have the same type as the one the DLPackExchangeAPI was discovered from.
- Param out:
The output DLTensor, whose space is pre-allocated on stack.
- Return:
0 on success, -1 on failure with a Python exception set.
-
typedef int (*DLPackCurrentWorkStream)(DLDeviceType device_type, int32_t device_id, void **out_current_stream)¶
Obtain the current work stream of a device.
Obtain the current work stream of a device from the producer framework. For example, it should map to torch.cuda.current_stream in PyTorch.
When device_type is kDLCPU, the consumer does not have to query the stream, and the producer can simply return NULL when queried. The consumer does not have to do anything for stream synchronization or stream setting. A CPU-only framework can therefore provide a dummy implementation that always sets out_current_stream[0] to NULL.
See also
Note
- As a C function, must not throw C++ exceptions.
- Param device_type:
The device type.
- Param device_id:
The device id.
- Param out_current_stream:
The output current work stream.
- Return:
0 on success, -1 on failure with a Python exception set.
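The dummy CPU-only implementation described above is small enough to sketch in full; a stand-in enum replaces the real dlpack.h definition so the snippet is self-contained:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-in for the dlpack.h enum; only the value used here. */
typedef enum { kDLCPU = 1 } DLDeviceType;

/* Dummy implementation for a CPU-only framework, as described above:
 * it always reports "no stream" and never fails. */
static int DummyCurrentWorkStream(DLDeviceType device_type, int32_t device_id,
                                  void **out_current_stream) {
  (void)device_type;
  (void)device_id;
  *out_current_stream = NULL;  /* CPU has no work-stream concept. */
  return 0;
}
```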
Structs¶
-
struct DLPackVersion¶
The DLPack version.
A change in major version indicates that we have changed the data layout of the ABI - DLManagedTensorVersioned.
A change in minor version indicates that we have added new code, such as a new device type, but the ABI is kept the same.
If an obtained DLPack tensor has a major version that disagrees with the version number specified in this header file (i.e. major != DLPACK_MAJOR_VERSION), the consumer must call the deleter (and it is safe to do so). It is not safe to access any other fields as the memory layout will have changed.
In the case of a minor version mismatch, the tensor can be safely used as long as the consumer knows how to interpret all fields. Minor version updates indicate the addition of enumeration values.
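The compatibility rules above can be sketched as a consumer-side check. The version numbers replicated here are illustrative placeholders; a real consumer uses the DLPACK_MAJOR_VERSION it was compiled against:

```c
#include <stdint.h>

/* Illustrative placeholders; use the values from your dlpack.h. */
#define DLPACK_MAJOR_VERSION 1
#define DLPACK_MINOR_VERSION 2

typedef struct { uint32_t major; uint32_t minor; } DLPackVersion;

/* Consumer-side check following the rules above: on a major mismatch only
 * the deleter may be invoked; a different minor version is fine as long as
 * the consumer understands every field it reads. */
static int dlpack_version_is_compatible(DLPackVersion v) {
  return v.major == DLPACK_MAJOR_VERSION;
}
```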
-
struct DLDevice¶
A Device for Tensor and operator.
Public Members
-
DLDeviceType device_type¶
The device type used in the device.
-
int32_t device_id¶
The device index. For vanilla CPU memory, pinned memory, or managed memory, this is set to 0.
-
struct DLDataType¶
The data type the tensor can hold. The data type is assumed to follow the native endianness. An explicit error message should be raised when attempting to export an array with non-native endianness.
Examples
float: type_code = 2, bits = 32, lanes = 1
float4(vectorized 4 float): type_code = 2, bits = 32, lanes = 4
int8: type_code = 0, bits = 8, lanes = 1
std::complex<float>: type_code = 5, bits = 64, lanes = 1
bool: type_code = 6, bits = 8, lanes = 1 (as per common array library convention, the underlying storage size of bool is 8 bits)
float8_e4m3: type_code = 8, bits = 8, lanes = 1 (packed in memory)
float6_e3m2fn: type_code = 16, bits = 6, lanes = 1 (packed in memory)
float4_e2m1fn: type_code = 17, bits = 4, lanes = 1 (packed in memory)
When a sub-byte type is packed, DLPack requires the data to be in little bit-endian order, i.e., for packed data D, ((D >> (i * bits)) & bit_mask) stores the i-th element.
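The little bit-endian packing rule can be made concrete: for bits = 4, a byte holds element 0 in its low nibble and element 1 in its high nibble. A small sketch of the extraction formula above:

```c
#include <stdint.h>

/* Extract the i-th element of a sub-byte packed buffer, following the rule
 * (D >> (i * bits)) & bit_mask stated above. Assumes bits divides 8 so no
 * element straddles a byte boundary (true for fp4 with bits = 4). */
static uint8_t unpack_subbyte(const uint8_t *data, int i, int bits) {
  int per_byte = 8 / bits;                       /* elements per byte */
  uint8_t byte = data[i / per_byte];             /* byte holding element i */
  uint8_t mask = (uint8_t)((1u << bits) - 1u);   /* bit_mask for one element */
  return (uint8_t)((byte >> ((i % per_byte) * bits)) & mask);
}
```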
Public Members
-
uint8_t code¶
Type code of base types. We keep it uint8_t instead of DLDataTypeCode for minimal memory footprint, but the value should be one of DLDataTypeCode enum values.
-
uint8_t bits¶
Number of bits, common choices are 8, 16, 32.
-
uint16_t lanes¶
Number of lanes in the type, used for vector types.
-
struct DLTensor¶
Plain C Tensor object, does not manage memory.
Public Members
-
void *data¶
The data pointer points to the allocated data. This will be a CUDA device pointer or cl_mem handle in OpenCL. It may be opaque on some device types. This pointer is always aligned to 256 bytes as in CUDA. The byte_offset field should be used to point to the beginning of the data.
Note that as of Nov 2021, multiple libraries (CuPy, PyTorch, TensorFlow, TVM, perhaps others) do not adhere to this 256 byte alignment requirement on CPU/CUDA/ROCm, and always use byte_offset=0. This must be fixed (after which this note will be updated); at the moment it is recommended to not rely on the data pointer being correctly aligned.
For a given DLTensor, the size of memory required to store the contents of data is calculated as follows:

```c
static inline size_t GetDataSize(const DLTensor* t) {
  size_t size = 1;
  for (tvm_index_t i = 0; i < t->ndim; ++i) {
    size *= t->shape[i];
  }
  size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;
  return size;
}
```

Note that if the tensor is of size zero, then the data pointer should be set to NULL.
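The size computation can be exercised as follows; minimal struct stand-ins replace the real dlpack.h definitions, and a plain int32_t loop index replaces the tvm_index_t used in the header's comment:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-ins for the dlpack.h definitions used by this sketch. */
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
typedef struct {
  void *data; int32_t ndim; DLDataType dtype;
  int64_t *shape; int64_t *strides; uint64_t byte_offset;
} DLTensor;

/* Same computation as GetDataSize above: number of elements times the
 * per-element size in bytes, rounded up. */
static size_t get_data_size(const DLTensor *t) {
  size_t size = 1;
  for (int32_t i = 0; i < t->ndim; ++i) {
    size *= (size_t)t->shape[i];
  }
  size *= (size_t)(t->dtype.bits * t->dtype.lanes + 7) / 8;
  return size;
}
```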
-
int32_t ndim¶
Number of dimensions.
-
DLDataType dtype¶
The data type of the pointer.
-
int64_t *shape¶
The shape of the tensor.
When ndim == 0, shape can be set to NULL.
-
int64_t *strides¶
Strides of the tensor (in number of elements, not bytes). Cannot be NULL if ndim != 0; it must point to an array of ndim elements that specifies the strides, so the consumer can always rely on strides[dim] being valid for 0 <= dim < ndim.
When ndim == 0, strides can be set to NULL.
Note
Before DLPack v1.2, strides could be NULL to indicate contiguous data. This is not allowed in DLPack v1.2 and later; the rationale is to simplify consumer handling.
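Since strides are counted in elements, the byte address of an element combines byte_offset, the strides, and the element size. A sketch for the 2-D case:

```c
#include <stdint.h>

/* Byte offset of element (i, j) of a 2-D tensor with element-count strides,
 * mirroring the field descriptions above. elem_bytes is
 * dtype.bits * dtype.lanes / 8 for byte-aligned types. */
static uint64_t element_byte_offset(uint64_t byte_offset,
                                    const int64_t *strides,
                                    int64_t i, int64_t j,
                                    uint64_t elem_bytes) {
  return byte_offset + (uint64_t)(i * strides[0] + j * strides[1]) * elem_bytes;
}
```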
-
uint64_t byte_offset¶
The offset in bytes to the beginning pointer to data.
-
void *data¶
-
struct DLManagedTensor¶
C Tensor object that manages the memory of a DLTensor. This data structure is intended to facilitate the borrowing of a DLTensor by another framework. It is not meant to transfer the tensor. When the borrowing framework no longer needs the tensor, it should call the deleter to notify the host that the resource is no longer needed.
See also
Note
This data structure is used as the legacy DLManagedTensor in DLPack exchange and is deprecated after DLPack v0.8. Use DLManagedTensorVersioned instead. This data structure may get renamed or deleted in future versions.
Public Members
-
void *manager_ctx¶
The context of the original host framework in which the DLManagedTensor is used. It can also be NULL.
-
void (*deleter)(struct DLManagedTensor *self)¶
Destructor - this should be called to destruct the manager_ctx which backs the DLManagedTensor. It can be NULL if there is no way for the caller to provide a reasonable destructor. The destructor deletes the argument self as well.
-
struct DLManagedTensorVersioned¶
A versioned and managed C Tensor object that manages the memory of a DLTensor.
This data structure is intended to facilitate the borrowing of DLTensor by another framework. It is not meant to transfer the tensor. When the borrowing framework doesn’t need the tensor, it should call the deleter to notify the host that the resource is no longer needed.
Note
This is the current standard DLPack exchange data structure.
Public Members
-
DLPackVersion version¶
The API and ABI version of the current managed Tensor.
-
void *manager_ctx¶
The context of the original host framework in which the DLManagedTensorVersioned is used. It can also be NULL.
-
void (*deleter)(struct DLManagedTensorVersioned *self)¶
Destructor.
This should be called to destruct manager_ctx which holds the DLManagedTensorVersioned. It can be NULL if there is no way for the caller to provide a reasonable destructor. The destructor deletes the argument self as well.
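A deleter for a hypothetical producer whose manager_ctx is a plain malloc'd buffer follows the contract above: release the context, then self. Minimal struct stand-ins make the sketch self-contained:

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for the dlpack.h struct, dl_tensor omitted. */
typedef struct DLManagedTensorVersioned DLManagedTensorVersioned;
struct DLManagedTensorVersioned {
  struct { uint32_t major, minor; } version;  /* stand-in for DLPackVersion */
  void *manager_ctx;
  void (*deleter)(DLManagedTensorVersioned *self);
  uint64_t flags;
};

/* Deleter for a producer whose manager_ctx is a malloc'd buffer: destruct
 * manager_ctx, then delete self, as the contract above requires. */
static void SimpleDeleter(DLManagedTensorVersioned *self) {
  free(self->manager_ctx);
  free(self);
}
```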
-
uint64_t flags¶
Additional bitmask flags information about the tensor.
By default the flags should be set to 0.
See also
Note
Future ABI changes should keep everything up to and including this field stable, to ensure that the deleter can be correctly called.
-
struct DLPackExchangeAPIHeader¶
DLPackExchangeAPI stable header.
See also
Public Members
-
DLPackVersion version¶
The provided DLPack version. The consumer must check major version compatibility before using this struct.
-
struct DLPackExchangeAPIHeader *prev_api¶
Optional pointer to an older DLPackExchangeAPI in the chain.
It must be NULL if the framework does not support older versions. If the current major version is larger than the one supported by the consumer, the consumer may walk this to find an earlier supported version.
See also
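A consumer that receives a too-new API table can walk the prev_api chain as described above. A sketch with stand-in struct definitions (the replicated version macro is illustrative):

```c
#include <stdint.h>
#include <stddef.h>

#define DLPACK_MAJOR_VERSION 1  /* illustrative; use the value from dlpack.h */

typedef struct { uint32_t major; uint32_t minor; } DLPackVersion;
typedef struct DLPackExchangeAPIHeader {
  DLPackVersion version;
  struct DLPackExchangeAPIHeader *prev_api;
} DLPackExchangeAPIHeader;

/* Walk the prev_api chain past headers whose major version is newer than the
 * consumer supports; returns NULL if the chain is exhausted. The caller must
 * still verify the returned header's major version is one it understands. */
static const DLPackExchangeAPIHeader *find_supported_api(
    const DLPackExchangeAPIHeader *api) {
  while (api != NULL && api->version.major > DLPACK_MAJOR_VERSION) {
    api = api->prev_api;
  }
  return api;
}
```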
-
struct DLPackExchangeAPI¶
Framework-specific function pointers table for DLPack exchange.
In addition to __dlpack__(), we define a C function table sharable by Python implementations via __c_dlpack_exchange_api__. This attribute must be set on the type as a Python integer compatible with PyLong_FromVoidPtr/PyLong_AsVoidPtr.
A consumer library may use a pattern such as:

```c
PyObject *api_obj = type(tensor_obj).__c_dlpack_exchange_api__;  // as C-code
MyDLPackExchangeAPI *api = PyLong_AsVoidPtr(api_obj);
if (api == NULL && PyErr_Occurred()) {
  goto handle_error;
}
```
Note that this must be defined on the type. The consumer should look up the attribute on the type and may cache the result for each unique type.
The precise API table is given by:
```cpp
struct MyDLPackExchangeAPI : public DLPackExchangeAPI {
  MyDLPackExchangeAPI() {
    header.version.major = DLPACK_MAJOR_VERSION;
    header.version.minor = DLPACK_MINOR_VERSION;
    header.prev_api = nullptr;
    managed_tensor_allocator = MyDLPackManagedTensorAllocator;
    managed_tensor_from_py_object_no_sync = MyDLPackManagedTensorFromPyObjectNoSync;
    managed_tensor_to_py_object_no_sync = MyDLPackManagedTensorToPyObjectNoSync;
    dltensor_from_py_object_no_sync = MyDLPackDLTensorFromPyObjectNoSync;
    current_work_stream = MyDLPackCurrentWorkStream;
  }

  static const DLPackExchangeAPI* Global() {
    static MyDLPackExchangeAPI inst;
    return &inst;
  }
};
```
Guidelines for leveraging DLPackExchangeAPI:
There are generally two kinds of consumer needs for DLPack exchange:
N0: library support, where consumer.kernel(x, y, z) would like to run a kernel with the data from x, y, z. The consumer is also expected to run the kernel with the same stream context as the producer. For example, when x, y, z are torch.Tensor, the consumer should query exchange_api->current_work_stream to get the current stream and launch the kernel on it. This setup avoids synchronization at kernel launch and maximizes compatibility with CUDA graph capture in the producer. This is the desirable behavior for library extension support in frameworks like PyTorch.
N1: data ingestion and retention
Note that obj.__dlpack__() API should provide useful ways for N1. The primary focus of the current DLPackExchangeAPI is to enable faster exchange N0 with the support of the function pointer current_work_stream.
Array/Tensor libraries should statically create and initialize this structure, then expose a pointer to the DLPackExchangeAPI as an int value on the Tensor/Array type. The DLPackExchangeAPI* must stay alive throughout the lifetime of the process.
One simple way to do so is to create a static instance of DLPackExchangeAPI within the framework and return a pointer to it. The following code shows an example to do so in C++. It should also be reasonably easy to do so in other languages.
Public Members
-
DLPackExchangeAPIHeader header¶
The header that remains stable across versions.
-
DLPackManagedTensorAllocator managed_tensor_allocator¶
Producer function pointer for DLPackManagedTensorAllocator. This function must not be NULL.
See also
-
DLPackManagedTensorFromPyObjectNoSync managed_tensor_from_py_object_no_sync¶
Producer function pointer for DLPackManagedTensorFromPyObject. This function must not be NULL.
See also
DLPackManagedTensorFromPyObject
-
DLPackManagedTensorToPyObjectNoSync managed_tensor_to_py_object_no_sync¶
Producer function pointer for DLPackManagedTensorToPyObject. This function must not be NULL.
See also
-
DLPackDLTensorFromPyObjectNoSync dltensor_from_py_object_no_sync¶
Producer function pointer for DLPackDLTensorFromPyObject. This function can be NULL when the producer does not support it.
See also
-
DLPackCurrentWorkStream current_work_stream¶
Producer function pointer for DLPackCurrentWorkStream. This function must not be NULL.
See also