ML Training Platform
March 15, 2026 · 446 tests · 5 PRs open
End-to-end ML training platform built on ATHENA: model artifact management, feature extraction, TF inference, a TFX training pipeline, and ClearML experiment tracking. All services run locally via a standalone dev server.
Pull Requests
| # | PR | Service | Tests | Description |
|---|----|---------|-------|-------------|
| 1 | #1083 | Model Artifacts | 89 | SavedModel versioning, exclusive tagging, REST API, browser UI |
| 2 | #1084 | Feature Extraction | 180 | Feature registry, lineage tracking, model-feature mapping, access analytics |
| 3 | #1085 | Evaluation | 82 | TF inference pipeline, LRU cache, logging, Docker integration tests |
| 4 | #1088 | TFX Pipeline | 25 | 9-component TFX pipeline, Jupyter notebook, model registration scripts |
| 5 | #1090 | Experiment Tracking | 70 | ClearML offline tracking, experiment comparison, best-model selection |
| | Total | | 446 | |
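The Evaluation row lists an LRU cache without saying what it holds. As a minimal sketch, assuming it caches loaded models keyed by an identifier, the idea could look like the following. `LRUCache`, `fake_loader`, and the keys are hypothetical names for illustration, not the code in #1085.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` values, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, loader):
        # Cache hit: mark the entry as most recently used and return it.
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: load the value, store it, then evict the oldest
        # entry if the cache has grown past its capacity.
        value = loader(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
        return value

    def keys(self):
        # Ordered from least to most recently used.
        return list(self._items)


loads = []


def fake_loader(key):
    # Stand-in for an expensive model load (assumption for this sketch).
    loads.append(key)
    return f"model:{key}"


cache = LRUCache(capacity=2)
cache.get("a", fake_loader)
cache.get("b", fake_loader)
cache.get("a", fake_loader)  # hit, so "a" becomes most recently used
cache.get("c", fake_loader)  # miss, evicts "b"
```

With a capacity of 2, the loader runs once each for "a", "b", and "c", and "b" is the entry evicted because "a" was touched more recently.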
Architecture
Raw Data → Feature Service (extract) → Evaluation Service (inference) → Prediction
↑
TFX Pipeline (train) → Model Artifacts (register) → Evaluation Service (load)
↑
ClearML (track) → Experiment Service (compare) → Best Model → Tag as prod
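The bottom row of the diagram (compare experiments, pick the best model, tag it as prod) can be sketched as below. This assumes "exclusive tagging" means a tag points at exactly one version of a model at a time. `Experiment`, `TagRegistry`, the `val_accuracy` metric, and the sample runs are all hypothetical and only illustrate the flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    model_name: str
    version: int
    val_accuracy: float  # hypothetical comparison metric


def select_best(experiments):
    """Return the experiment with the highest validation accuracy."""
    return max(experiments, key=lambda e: e.val_accuracy)


class TagRegistry:
    """Maps each (model, tag) pair to exactly one version."""

    def __init__(self):
        self._tags = {}

    def tag(self, model_name, version, tag):
        # Assigning a tag overwrites the previous holder, which is what
        # makes the tag exclusive. The previous version is returned.
        previous = self._tags.get((model_name, tag))
        self._tags[(model_name, tag)] = version
        return previous

    def resolve(self, model_name, tag):
        return self._tags.get((model_name, tag))


# Made-up runs for illustration only.
runs = [
    Experiment("churn", 1, 0.81),
    Experiment("churn", 2, 0.86),
    Experiment("churn", 3, 0.84),
]

registry = TagRegistry()
registry.tag("churn", 1, "prod")  # version 1 currently serves prod

best = select_best(runs)
replaced = registry.tag(best.model_name, best.version, "prod")
```

Returning the previous holder from `tag` makes it easy to log or roll back a promotion; a real registry would presumably persist this mapping in its database.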
Key Design Decisions
- SQLAlchemy ORM with dedicated Base per service (SQLite local, PostgreSQL prod) — separate from Snowflake
- Self-contained — all services run without Redis, Snowflake, or other production deps
- ClearML offline mode — no server required, stores experiment data locally
- TFX on Linux/Colab only — Apple Silicon not supported (no ARM64 wheels for ml-metadata)
- 446 self-contained tests — in-memory SQLite, temp dirs, no local machine dependencies
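As a rough illustration of the first and last decisions above, the sketch below gives two services their own declarative Base and backs each with in-memory SQLite, the way a self-contained test might. The table and class names (`ModelVersion`, `Feature`, `make_session`) are hypothetical and not taken from the PRs; it assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

# One declarative Base per service keeps each service's table metadata
# isolated from the others.
ArtifactBase = declarative_base()
FeatureBase = declarative_base()


class ModelVersion(ArtifactBase):  # hypothetical table
    __tablename__ = "model_versions"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    version = Column(Integer, nullable=False)


class Feature(FeatureBase):  # hypothetical table
    __tablename__ = "features"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)


def make_session(base, url="sqlite:///:memory:"):
    """Create an engine, build only `base`'s tables, return (session, engine)."""
    engine = create_engine(url)
    base.metadata.create_all(engine)
    return sessionmaker(bind=engine)(), engine


# Each service gets its own in-memory database, so tests need no files
# or external servers.
artifact_session, artifact_engine = make_session(ArtifactBase)
feature_session, feature_engine = make_session(FeatureBase)

artifact_session.add(ModelVersion(name="demo", version=1))
artifact_session.commit()

artifact_tables = inspect(artifact_engine).get_table_names()
feature_tables = inspect(feature_engine).get_table_names()
```

Because each Base carries its own metadata, `create_all` on one service never creates another service's tables, and swapping the URL for a PostgreSQL one would leave the model classes unchanged.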
Merge Order
#1083 → #1084 → #1085 → #1088 → #1090