DTA Provenance Standards Demo
- Quick Start - Get up and running in 60 seconds with Git-native provenance tracking
- Documentation - Comprehensive guides, API reference, and examples
- Docker Support - One-command setup with Docker Compose
- Open Source - MIT licensed, contributions welcome!
What is This?
This project demonstrates data provenance tracking using the official Data & Trust Alliance (DTA) standards v1.0.0.
It includes two complete implementations:
- Git-Native Approach - Cryptographic audit logs using Git commits
- Blockchain Approach - Smart contracts on Ethereum/Polygon
Use it to understand when blockchain is warranted and when a traditional solution is enough for data provenance in AI/ML pipelines, supply chains, and regulated industries.
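The "cryptographic audit log" idea behind the Git-native approach can be illustrated with a hash chain: each entry's hash covers its parent's hash, so editing any earlier entry invalidates every link after it. The sketch below is a simplified model of that property only. It is not Git's object format and not this project's code, and the function names (`entry_hash`, `build_chain`, `verify_chain`) are hypothetical.

```python
import hashlib
import json


def entry_hash(parent_hash: str, payload: dict) -> str:
    """Hash an entry together with its parent's hash, as a commit chain does."""
    body = json.dumps({"parent": parent_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads: list[dict]) -> list[dict]:
    """Link payloads into a chain where each entry commits to its predecessor."""
    chain, parent = [], ""
    for payload in payloads:
        digest = entry_hash(parent, payload)
        chain.append({"parent": parent, "payload": payload, "hash": digest})
        parent = digest
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; an edited entry no longer matches its stored hash."""
    parent = ""
    for entry in chain:
        if entry["parent"] != parent:
            return False
        if entry_hash(parent, entry["payload"]) != entry["hash"]:
            return False
        parent = entry["hash"]
    return True


chain = build_chain([
    {"file": "dataset.csv", "version": 1},
    {"file": "dataset.csv", "version": 2},
])
print(verify_chain(chain))           # True
chain[0]["payload"]["version"] = 99  # tamper with history
print(verify_chain(chain))           # False
```

A blockchain offers the same tamper evidence; what it adds is agreement among parties who do not trust a single host, which is why the comparison later on this page hinges on trust rather than on integrity.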
Why This Project?
- ✅ Official Standards - Implements DTA v1.0.0 specification
- ✅ Production Quality - Comprehensive tests, CLI tools, extensive documentation
- ✅ Real-World Examples - Healthcare, ML training, IoT sensors, financial data
- ✅ Educational - Learn what works (and what doesn't) in provenance tracking
- ✅ Honest Comparison - Shows why most projects DON'T need blockchain
Features
Git-Native Implementation
```python
from src.provenance import ProvenanceTracker, ProvenanceMetadata

# Create DTA-compliant metadata.
# The "..." entries are placeholders for fields omitted here; replace them
# with real key/value pairs before running.
metadata = ProvenanceMetadata(
    source={"datasetName": "Training Data", "providerName": "ML Team"},
    provenance={"dataGenerationMethod": "SQL export", ...},
    use={"intendedUse": "Model training", ...}
)

# Commit the file together with its provenance metadata
tracker = ProvenanceTracker("./my-project")
commit_hash = tracker.commit_with_provenance(
    ["dataset.csv"],
    metadata,
    "Add training data v1.0"
)

# Verify integrity of the recorded commit
is_valid, message = tracker.verify_integrity(commit_hash)
```
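How `verify_integrity` works internally is not described on this page. One common way to get an `(is_valid, message)` result like the one above is to record a SHA-256 digest per file at commit time and recompute it later. The following is a minimal sketch under that assumption; `file_digest`, `record_digests`, and `check_digests` are hypothetical helpers, not part of `src.provenance`.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_digests(paths: list[Path]) -> dict[str, str]:
    """Map each file path to its digest at recording time."""
    return {str(p): file_digest(p) for p in paths}


def check_digests(recorded: dict[str, str]) -> tuple[bool, str]:
    """Recompute digests and report the first missing or changed file."""
    for name, expected in recorded.items():
        p = Path(name)
        if not p.exists():
            return False, f"missing file: {name}"
        if file_digest(p) != expected:
            return False, f"digest mismatch: {name}"
    return True, "all files match their recorded digests"


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "dataset.csv"
    data.write_text("id,label\n1,cat\n")
    recorded = record_digests([data])
    print(check_digests(recorded)[0])  # True
    data.write_text("id,label\n1,dog\n")  # modify the file after recording
    print(check_digests(recorded)[0])  # False
```

Storing the recorded digests inside the commit (for example in the commit message or a tracked manifest) is what would tie them to Git's own history; that step is left out here.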
Blockchain Implementation
```solidity
// Register provenance on-chain
function registerProvenance(
    string memory _datasetName,
    string memory _metadataURI, // IPFS/Arweave
    bytes32 _metadataHash
) public returns (bytes32 recordId)
```
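The `_metadataHash` argument is a 32-byte value that has to be computed off-chain before calling the contract. The sketch below shows one way to derive a stable 32-byte digest from a metadata document by hashing a canonical JSON serialization. Which hash function the contract actually expects is not stated here; SHA-256 is an assumption (many Ethereum projects use Keccak-256 instead, which Python's `hashlib.sha3_256` does not implement). The helper names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(metadata: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace so the hash is stable."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def metadata_hash_hex(metadata: dict) -> str:
    """Return a 0x-prefixed 32-byte digest, the size of a Solidity bytes32."""
    return "0x" + hashlib.sha256(canonical_bytes(metadata)).hexdigest()


metadata = {
    "source": {"datasetName": "Training Data", "providerName": "ML Team"},
}
digest = metadata_hash_hex(metadata)
print(len(digest))  # 66: "0x" plus 64 hex characters (32 bytes)
```

Canonicalizing before hashing matters because two JSON files with the same content but different key order or whitespace would otherwise produce different hashes, and a verifier fetching the document from `_metadataURI` could not reproduce the on-chain value.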
CLI Tools
```bash
# Validate DTA standards
dta-provenance validate metadata.json

# Commit with provenance
dta-provenance commit dataset.csv \
    --metadata provenance.json \
    --message "Add training data"

# Verify integrity
dta-provenance verify HEAD

# Generate audit trail
dta-provenance trace dataset.csv
```
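To give a feel for what a `validate` step might check, here is a minimal structural check in Python. It only verifies that the three top-level sections used in the Python example above (`source`, `provenance`, `use`) are present and non-empty. This is an assumption for illustration: the real command presumably validates against the full DTA v1.0.0 schema, and `validate_metadata` is a hypothetical function, not the CLI's implementation.

```python
import json

# Assumption: the three top-level sections shown in the Python example.
REQUIRED_SECTIONS = ("source", "provenance", "use")


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks fine."""
    errors = []
    for section in REQUIRED_SECTIONS:
        value = metadata.get(section)
        if value is None:
            errors.append(f"missing section: {section}")
        elif not isinstance(value, dict):
            errors.append(f"section must be an object: {section}")
        elif not value:
            errors.append(f"section is empty: {section}")
    return errors


doc = json.loads('{"source": {"datasetName": "Training Data"}, "use": {}}')
for problem in validate_metadata(doc):
    print(problem)
# missing section: provenance
# section is empty: use
```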
Use Cases
Healthcare Imaging
Complete HIPAA-compliant provenance for medical imaging datasets with de-identification documentation.
ML Training Data
Track provenance of training datasets from Hugging Face with license compliance verification.
IoT Sensor Streams
Real-time environmental monitoring with sensor calibration and quality indicators.
Financial Transactions
Anonymized payment data with multi-layer privacy protection and risk assessment.
When to Use What?
| Scenario | Recommendation | Why? |
|---|---|---|
| Internal ML pipeline | Git-Native ✅ | Fast, free, integrates with existing workflows |
| Cross-company supply chain | Blockchain (maybe) ⚠️ | Only if trust is an issue AND you can't use APIs |
| Single organization | Git-Native ✅ | No need for blockchain's trust properties |
| Regulatory audit trail | Either ✅ | Both provide cryptographic integrity |
| High-frequency updates | Git-Native ✅ | No gas fees, instant commits |
| Public transparency | Blockchain ✅ | Immutable, publicly verifiable |
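The table can be read as a small decision procedure. The sketch below encodes it as a Python function for illustration. The `Scenario` fields and the order in which the rules are checked are assumptions made here, not part of the project; the "Regulatory audit trail" row is not modelled because either approach satisfies it.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical description of a deployment, mirroring the table's rows."""
    multiple_organizations: bool = False
    parties_trust_each_other: bool = True
    shared_api_possible: bool = True
    public_verifiability_required: bool = False
    high_frequency_updates: bool = False


def recommend(s: Scenario) -> str:
    """Return a recommendation following the table (rule order is an assumption)."""
    if s.public_verifiability_required:
        return "blockchain"          # immutable, publicly verifiable
    if s.high_frequency_updates:
        return "git-native"          # no gas fees, instant commits
    if (s.multiple_organizations
            and not s.parties_trust_each_other
            and not s.shared_api_possible):
        return "blockchain (maybe)"  # only if trust is an issue AND no APIs
    return "git-native"              # single org or internal pipeline


print(recommend(Scenario()))  # git-native
print(recommend(Scenario(multiple_organizations=True,
                         parties_trust_each_other=False,
                         shared_api_possible=False)))  # blockchain (maybe)
print(recommend(Scenario(public_verifiability_required=True)))  # blockchain
```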
Quick Links
Installation
Community
- GitHub: Report issues
- DTA Alliance: Official Website
- Documentation: Full Docs
License
MIT License - see LICENSE for details.
The DTA standards are used under their original license. See Credits for full attribution.
Ready to get started? → Quick Start Guide