1 of 12

datatrac

A developer-first tool for easier dataset management and discovery.

Presented By: Anant Chaudhary

2 of 12

Data Woes Developers Face

  • The same data gets accidentally re-downloaded or re-generated multiple times.
  • There is no easy way to know what data already exists.
  • Tracking data origins for reproducibility is difficult.

3 of 12

Solution: datatrac & datatracweb

  • datatrac offers a powerful command-line interface
  • datatracweb offers an intuitive web dashboard for visual discovery
  • A centralized backend ties it all together

4 of 12

System Architecture

Decoupled architecture for high extensibility

Core Logic Module is the system's brain

CLI and API expose core functionality

Web UI interacts via the API
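
To make the decoupling concrete, here is a minimal sketch of the pattern, not datatrac's actual code. Every name in it (`DatasetRegistry`, `run_cli`, `handle_api_request`, the `register` and `search` actions) is hypothetical, and an in-memory dict stands in for the real backend. The point is only that the CLI and API layers parse input and delegate, while all logic lives in one core class.

```python
import argparse
import json


class DatasetRegistry:
    """Hypothetical core module: all dataset logic lives here."""

    def __init__(self):
        # In-memory stand-in for the real backend (an assumption for this sketch).
        self._datasets = {}

    def register(self, name, source):
        self._datasets[name] = {"name": name, "source": source}
        return self._datasets[name]

    def search(self, term):
        # Return matching entries in name order.
        return [d for n, d in sorted(self._datasets.items()) if term in n]


def run_cli(registry, argv):
    """Thin CLI layer: parse arguments, delegate to the core, return JSON text."""
    parser = argparse.ArgumentParser(prog="example-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    add = sub.add_parser("register")
    add.add_argument("name")
    add.add_argument("source")
    find = sub.add_parser("search")
    find.add_argument("term")
    args = parser.parse_args(argv)
    if args.command == "register":
        return json.dumps(registry.register(args.name, args.source))
    return json.dumps(registry.search(args.term))


def handle_api_request(registry, request):
    """Thin API layer: take a request-like dict, delegate to the same core."""
    if request["action"] == "register":
        return registry.register(request["name"], request["source"])
    if request["action"] == "search":
        return registry.search(request["term"])
    raise ValueError(f"unknown action: {request['action']}")


# A dataset registered through the CLI layer is visible through the API layer,
# because both share the same core object.
registry = DatasetRegistry()
run_cli(registry, ["register", "mnist-train", "/data/mnist"])
api_result = handle_api_request(registry, {"action": "search", "term": "mnist"})
```

Adding a third interface under this pattern would mean writing one more thin adapter, with no change to the core class.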

5 of 12

Database Schema

  • The datasets table tracks the datasets the organisation currently has
  • The local copies table tracks datasets that have been deleted from the registry but are still present locally
  • The lineage table powers the data traceability features
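
The three tables can be sketched with SQLite as follows. This is an illustration under assumptions, not the project's real schema: the column names (`hash`, `name`, `dataset_hash`, `local_path`, `parent_hash`, `child_hash`) and the sample rows are invented for the example. It shows how a lineage table can answer "where did this dataset come from?" and how local copies can outlive their registry entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (
    hash TEXT PRIMARY KEY,       -- content hash identifies a dataset
    name TEXT NOT NULL
);
-- No foreign key here on purpose: a local copy may remain
-- after its dataset has been deleted from the registry.
CREATE TABLE local_copies (
    dataset_hash TEXT NOT NULL,
    local_path   TEXT NOT NULL
);
CREATE TABLE lineage (
    parent_hash TEXT NOT NULL,   -- the dataset that was used as input
    child_hash  TEXT NOT NULL    -- the dataset derived from it
);
""")

# Hypothetical sample data: raw -> cleaned -> features.
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?)",
    [("h_raw", "raw"), ("h_clean", "cleaned"), ("h_feat", "features")],
)
conn.executemany(
    "INSERT INTO lineage VALUES (?, ?)",
    [("h_raw", "h_clean"), ("h_clean", "h_feat")],
)
conn.executemany(
    "INSERT INTO local_copies VALUES (?, ?)",
    [("h_feat", "/tmp/features.csv"), ("h_old", "/tmp/old.csv")],
)

# Walk the lineage table upwards to collect every ancestor of a dataset.
ancestor_rows = conn.execute(
    """
    WITH RECURSIVE ancestors(hash) AS (
        SELECT parent_hash FROM lineage WHERE child_hash = ?
        UNION
        SELECT l.parent_hash
        FROM lineage AS l JOIN ancestors AS a ON l.child_hash = a.hash
    )
    SELECT hash FROM ancestors
    """,
    ("h_feat",),
).fetchall()
ancestors = sorted(row[0] for row in ancestor_rows)

# Local copies whose dataset is no longer in the registry.
orphaned = conn.execute(
    """
    SELECT lc.dataset_hash, lc.local_path
    FROM local_copies AS lc
    LEFT JOIN datasets AS d ON d.hash = lc.dataset_hash
    WHERE d.hash IS NULL
    """
).fetchall()
```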

6 of 12

CLI: Power Workflow

  • CLI for efficient dataset management
  • Push, download, and track datasets
  • Track datasets with just four commands

7 of 12

DataTrac Hub in Action

Web UI for visual data discovery

Search for datasets easily

View top datasets and their details

Explore data lineage interactively

8 of 12

Secure Data Transfer

  • Encrypted Transfers: All data is encrypted in transit using scp over SSH.
  • No New Passwords: Leverages existing developer SSH keys for authentication, eliminating the need for new credentials.
  • Guaranteed Integrity: Files are verified by their SHA-256 hash, ensuring downloads are bit-for-bit identical to uploads.
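
The integrity check can be illustrated with Python's standard `hashlib`. This is a sketch of the general technique rather than datatrac's implementation: the function names `sha256_of_file` and `verify_download` are hypothetical, and the transfer itself (scp over SSH) is left out. The hash recorded at upload time is compared with the hash of the downloaded file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hash):
    """Return True only if the file's SHA-256 matches the recorded hash."""
    return sha256_of_file(path) == expected_hash


# Demonstration with a temporary file standing in for a downloaded dataset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.bin")
    with open(path, "wb") as handle:
        handle.write(b"hello")
    recorded_hash = sha256_of_file(path)            # hash stored at upload time
    ok_before = verify_download(path, recorded_hash)

    with open(path, "ab") as handle:                # simulate corruption
        handle.write(b"!")
    ok_after = verify_download(path, recorded_hash)
```

A single changed byte produces a different digest, so a mismatch signals that the download should be rejected or retried.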

9 of 12

Decoupled Core: Why it Matters

Zero code duplication across interfaces

Enabled rapid development of CLI and API

Guarantees future extensibility for the system

10 of 12

Key Learnings & Takeaways

  • Evolving database schema for multi-user support
  • Packaging React frontend with Python backend
  • Tools enhance existing developer workflows

11 of 12

Future Work & Next Steps

  • Implement robust user authentication system
  • Automate updates for derived datasets
  • Visualize dataset history with interactive graphs
  • Improve CLI commands for better user experience

12 of 12

Thank You

Feel free to ask any questions

I appreciate your time and attention

https://github.com/anant-c/datatrac

pip install --extra-index-url https://test.pypi.org/simple/ datatrac==0.0.5
