Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.
Oxen is a data version control system designed specifically for machine learning datasets, providing Git-like versioning semantics for large files, binary data, images, and structured data that Git and Git LFS handle poorly at ML-relevant scales.
It enables teams to track dataset versions, roll back to previous dataset states, branch datasets for experiments, and collaborate on data the same way they collaborate on code, with a command-line interface that mirrors Git's workflow for developers already familiar with version control concepts.
The system is optimized for the large file sizes and high file counts common in ML datasets: a dataset of millions of images or gigabytes of text is handled efficiently by Oxen's content-addressed storage and transfer protocols, which deduplicate unchanged files between versions rather than storing full copies.
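To make the deduplication idea concrete, here is a minimal sketch of content-addressed storage in Python. It is an illustration of the general technique only, not Oxen's actual implementation: the `ContentStore` class, its `commit` method, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, not Oxen's real design).

    Each blob is stored once under the SHA-256 digest of its bytes.
    A version is just a manifest mapping file paths to digests.
    """

    def __init__(self):
        self.blobs = {}     # digest -> file bytes, stored once per unique content
        self.versions = []  # one manifest (path -> digest) per committed version

    def commit(self, files):
        """Record a new version; return how many new blobs had to be stored."""
        manifest = {}
        new_blobs = 0
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only content not seen in any earlier version is written.
                self.blobs[digest] = data
                new_blobs += 1
            manifest[path] = digest
        self.versions.append(manifest)
        return new_blobs


store = ContentStore()
v1 = {"img/cat.jpg": b"cat-bytes", "img/dog.jpg": b"dog-bytes"}
v2 = {"img/cat.jpg": b"cat-bytes", "img/dog.jpg": b"dog-bytes-edited"}

first = store.commit(v1)   # both files are new
second = store.commit(v2)  # only the edited file is stored again
```

In this sketch the second version adds a single blob, because the unchanged file hashes to a digest that is already present. The same principle would let a transfer protocol skip files the remote side already holds.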
The diff and merge capabilities handle tabular data (CSV, Parquet) intelligently, showing which rows changed between dataset versions rather than treating data files as binary blobs where any change produces an opaque before/after comparison.
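A row-level diff of this kind can be sketched with Python's standard `csv` module. This is a simplified illustration under stated assumptions, not how Oxen computes diffs: the `row_diff` function is hypothetical, and it assumes each table has a unique key column (here `id`) that identifies a row across versions.

```python
import csv
import io


def row_diff(old_csv, new_csv, key):
    """Compare two CSV texts row by row, matching rows on a key column.

    Hypothetical helper for illustration. Returns (added, removed, modified),
    where modified is a list of (old_row, new_row) pairs.
    """
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}

    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    modified = [(old[k], new[k]) for k in old if k in new and old[k] != new[k]]
    return added, removed, modified


OLD = "id,label\n1,cat\n2,dog\n3,bird\n"
NEW = "id,label\n1,cat\n2,wolf\n4,fish\n"

added, removed, modified = row_diff(OLD, NEW, "id")
for row in added:
    print("+", row)
for row in removed:
    print("-", row)
for before, after in modified:
    print("~", before, "->", after)
```

Compared with a byte-level comparison, which would only report that the file changed, this output names the row that was added, the row that was removed, and the row whose label was edited.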
ML teams that have outgrown ad-hoc dataset management (shared folders, manual naming conventions like 'dataset_v3_final_FINAL') and can no longer reconstruct which dataset version trained which model use Oxen to bring version control discipline to their data.
Data engineers building data pipelines where lineage tracking is a compliance requirement find Oxen's version history useful for demonstrating exactly what data was used at each point in the pipeline.
Because its workflow mirrors Git's, developers who already know Git face a lower learning curve than with DVC or other specialized data versioning tools.