What is Pachyderm?
Pachyderm is an open source data science platform that combines Data Lineage with End-to-End Pipelines on Kubernetes, engineered for the enterprise. It is a cost-effective solution that enables data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Pachyderm's unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking, delivering the ultimate CI/CD engine for data Pachyderm is the leader in data versioning and pipelines for MLOps, providing the data foundation that allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. With over $40 million in three rounds of funding from leading investors, Pachyderm offers both a commercial Pachyderm Enterprise Edition and an open source Pachyderm Community Edition, helping customers get their ML and AI projects to market faster, lower data processing and storage costs, and support strict data governance requirements
Highlights
- Data Lineage and End-to-End Pipelines
- Parallelized Processing of Multi-stage, Language-agnostic Pipelines
- Data Versioning and Reproducibility for Machine Learning Lifecycle Automation
Features
Deployed with CoreOS
Git-like File System
Microservice Architecture
Dockerized MapReduce