High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.
Daft is a high-performance distributed data processing engine built specifically for AI and multimodal workloads. Its DataFrame API handles images, videos, audio, documents, and tensor data natively alongside structured data, so pipelines need neither manual conversion between data types nor a separate processing path for each modality.
Designed for the data-heavy preprocessing and feature extraction stages of ML pipelines, Daft processes multimodal datasets at scale with lazy evaluation, intelligent query optimization, and GPU acceleration for compute-intensive transforms.
The engine's DataFrame model builds on the familiar pandas-style DataFrame API and adds native image loading and decoding, embedding generation, model inference as a column transform, and tensor operations, so that end-to-end ML data pipelines can be expressed as composable DataFrame operations.
Daft integrates with cloud storage (S3, GCS, Azure Blob), data lakehouse formats (Parquet, Delta Lake, Iceberg), and compute platforms (Ray, AWS EC2) for distributed execution across heterogeneous hardware clusters combining CPUs and GPUs.
Daft is open-source under the Apache 2.0 license and developed by Eventual, a company founded by former AWS and data infrastructure engineers.
It is particularly relevant for ML teams building large-scale computer vision, video understanding, or multimodal AI systems. In these settings, existing tools like Spark require brittle serialization workarounds to handle non-tabular data, and pandas lacks the scalability and GPU support needed for ML preprocessing at production data volumes.