Project

Modular, reproducible financial observability infrastructure for Morocco’s evolving markets.

AtlasVanguard is more than a codebase. It is a shared infrastructure layer where financial domain expertise, quantitative research, and software engineering converge to produce trustworthy market observability. The system ingests raw market data, applies rigorous analytical methods, and surfaces derived metrics that serve researchers, risk analysts, and regulators alike.

This page describes the core technical system: how data flows from sources to storage, how quant engines produce reproducible analytics, and how the architecture supports transparency and incremental improvement. Whether you come from finance, engineering, or supervision, you should leave with a clear understanding of what AtlasVanguard builds and how it operates.

Why this infrastructure exists

Morocco’s introduction of derivatives and ongoing market modernisation create a new observability burden. Institutions need systematic visibility into volatility regimes, cross-asset correlations, and systemic stress signals. AtlasVanguard meets that need with open, reproducible infrastructure rather than ad-hoc analysis.

The core pipeline repository is private during the early build-out phase and is shared with all accepted members. It will be opened to the public as the infrastructure stabilises. In the meantime, the architecture and design decisions are documented on this page.

Pipeline architecture

  • Idempotent writes: safe to re-run
  • Incremental ingestion: only new data processed

How it works

Data Sources — where the raw inputs come from

Public market feeds (YFinance) and curated historical CSV files from local exchanges (e.g., Casablanca) provide raw OHLCV inputs. The system currently supports equities and indices and is designed to add futures or other asset classes as data availability improves.

Extraction & Normalisation — turning heterogeneous feeds into a canonical series

Raw records are fetched, aligned, and mapped to a canonical OHLCV schema with explicit metadata fields (source, retrieval time, value_kind). Normalisation includes parsing timestamps, renaming columns, filling obvious gaps, and enforcing sanity checks (for example, high ≥ low). This step is critical: credible analytics require consistent, provenance-rich inputs.
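
The normalisation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column mapping, the `normalise_ohlcv` function name, the `retrieved_at` column name, and the choice to drop (rather than flag) rows that fail a sanity check are all assumptions. Gap filling is left out because the source does not say how gaps are filled.

```python
import pandas as pd

# Hypothetical mapping from one source's column names to canonical names.
COLUMN_MAP = {
    "Date": "ts", "Open": "open", "High": "high",
    "Low": "low", "Close": "close", "Volume": "volume",
}


def normalise_ohlcv(raw: pd.DataFrame, source: str, value_kind: str = "price") -> pd.DataFrame:
    """Map a raw frame to a canonical OHLCV layout with provenance columns."""
    df = raw.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())].copy()

    # Parse timestamps; unparseable values become NaT and are dropped.
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df = df.dropna(subset=["ts"]).sort_values("ts").drop_duplicates("ts", keep="last")

    # Sanity checks (assumed policy: drop failing rows): high >= low, close present.
    valid = (df["high"] >= df["low"]) & df["close"].notna()
    df = df[valid].reset_index(drop=True)

    # Provenance metadata recorded alongside every observation.
    df["source"] = source
    df["retrieved_at"] = pd.Timestamp.now(tz="UTC")
    df["value_kind"] = value_kind
    return df


raw = pd.DataFrame({
    "Date": ["2024-01-03", "2024-01-02", "bad", "2024-01-04"],
    "Open": [10.0, 9.5, 1.0, 11.0],
    "High": [10.5, 10.0, 1.0, 10.0],   # last row has high < low
    "Low": [9.8, 9.4, 1.0, 10.8],
    "Close": [10.2, 9.9, 1.0, 10.9],
    "Volume": [100, 200, 1, 300],
})
clean = normalise_ohlcv(raw, source="csv")
```

In this sample, the row with an unparseable date and the row where high is below low are both excluded, and the remaining rows come back sorted by timestamp.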

Idempotent Loader — safe, efficient persistence

Clean records are written to PostgreSQL using bulk operations with unique constraints and upsert semantics (ON CONFLICT DO NOTHING / controlled upserts). Idempotency ensures re-running jobs does not create duplicates; incremental ingestion consults each asset’s last stored date to fetch only new observations, keeping runs efficient and bounded.
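
The pattern can be illustrated with a small self-contained sketch. To keep it runnable without a database server, an in-memory SQLite database stands in for PostgreSQL here (both accept `ON CONFLICT ... DO NOTHING`); the `prices` table, its columns, and the function names are hypothetical and not taken from the project.

```python
import sqlite3

# SQLite (in-memory) stands in for PostgreSQL so the sketch is self-contained.
# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    asset_id INTEGER NOT NULL,
    ts       TEXT    NOT NULL,
    close    REAL    NOT NULL,
    UNIQUE (asset_id, ts)
)
"""

INSERT = """
INSERT INTO prices (asset_id, ts, close)
VALUES (?, ?, ?)
ON CONFLICT (asset_id, ts) DO NOTHING
"""


def last_stored_date(conn, asset_id):
    """Return the latest stored ISO date for an asset, or None if none exists."""
    row = conn.execute(
        "SELECT MAX(ts) FROM prices WHERE asset_id = ?", (asset_id,)
    ).fetchone()
    return row[0]


def load_incremental(conn, asset_id, records):
    """Insert only records newer than the last stored date; return rows added."""
    cutoff = last_stored_date(conn, asset_id)
    # ISO-8601 date strings compare correctly as plain strings.
    fresh = [(asset_id, ts, close) for ts, close in records
             if cutoff is None or ts > cutoff]
    before = conn.total_changes
    conn.executemany(INSERT, fresh)  # bulk insert; duplicates are skipped
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
batch = [("2024-01-02", 9.9), ("2024-01-03", 10.2)]
first = load_incremental(conn, 1, batch)                            # both rows are new
second = load_incremental(conn, 1, batch + [("2024-01-04", 10.9)])  # only one new row
```

The date cutoff keeps each run bounded, while the unique constraint acts as a second line of defence: even if a record slips past the cutoff filter, the conflict clause prevents a duplicate row.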

Quant Engines — reproducible analytics

Engines are implemented as parameterised Python/Pandas functions with explicit inputs and outputs. They compute returns (simple/log), realised and EWMA volatility, rolling correlations, beta estimates, and drawdowns. Jobs run engines against canonical series and persist results to derived tables using the same idempotent patterns.
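
As a rough illustration of what "parameterised functions with explicit inputs and outputs" might look like, the sketch below implements a few of the listed measures in Pandas. The function names, the annualisation factor of 252 trading days, and the reading of a "20-day EWMA" as `span=20` are assumptions made for the example, not details confirmed by the project.

```python
import numpy as np
import pandas as pd


def simple_returns(close: pd.Series) -> pd.Series:
    """Period-over-period simple returns: P_t / P_{t-1} - 1."""
    return close.pct_change()


def log_returns(close: pd.Series) -> pd.Series:
    """Log returns: ln(P_t / P_{t-1})."""
    return np.log(close / close.shift(1))


def realised_vol(returns: pd.Series, window: int = 20,
                 periods_per_year: int = 252) -> pd.Series:
    """Rolling sample standard deviation of returns, annualised (assumed factor)."""
    return returns.rolling(window).std() * np.sqrt(periods_per_year)


def ewma_vol(returns: pd.Series, span: int = 20,
             periods_per_year: int = 252) -> pd.Series:
    """Square root of the exponentially weighted mean of squared returns, annualised."""
    ewma_var = returns.pow(2).ewm(span=span, adjust=False).mean()
    return np.sqrt(ewma_var * periods_per_year)


def drawdown(close: pd.Series) -> pd.Series:
    """Fractional decline from the running maximum (0 at new highs)."""
    return close / close.cummax() - 1.0


close = pd.Series([100.0, 110.0, 99.0, 120.0])
rets = simple_returns(close)
dd = drawdown(close)
```

Because each function takes a series plus explicit parameters and returns a new series, the same inputs and parameters always produce the same output, which is what makes results re-runnable.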

API — programmatic access to observability signals

A lightweight HTTP API exposes time-series slices, asset metadata, and pre-computed analytics for dashboards, reproducible reports, and external consumers. Endpoints are designed for clarity (time range, asset id, metric) and provenance (version, parameters used).
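
One way to picture the "clarity plus provenance" design is a response payload that echoes the query and records how the numbers were produced. The sketch below is framework-agnostic and purely illustrative: `MetricQuery`, `build_response`, and the field names are hypothetical, and the actual endpoint shapes are not described in the source. In practice a function like this could sit behind an HTTP route handler.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class MetricQuery:
    """Hypothetical query shape: which asset, which metric, which time range."""
    asset_id: int
    metric: str
    start: date
    end: date


def build_response(query, rows, engine_version, params):
    """Slice (date, value) rows to the requested range and attach provenance."""
    points = [
        {"date": d.isoformat(), "value": v}
        for d, v in rows
        if query.start <= d <= query.end
    ]
    return {
        # Echo the query back, with dates serialised as ISO strings.
        "query": {**asdict(query),
                  "start": query.start.isoformat(),
                  "end": query.end.isoformat()},
        # Record which engine version and parameters produced the values.
        "provenance": {"engine_version": engine_version, "parameters": params},
        "data": points,
    }


rows = [(date(2024, 1, 2), 0.1), (date(2024, 1, 3), 0.2), (date(2024, 1, 5), 0.3)]
query = MetricQuery(asset_id=1, metric="realised_vol",
                    start=date(2024, 1, 3), end=date(2024, 1, 5))
response = build_response(query, rows, engine_version="0.1.0", params={"window": 20})
```

Returning the parameters with every slice lets a consumer reproduce or challenge a figure without having to look up the job configuration separately.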

Quant engines — at a glance

Category | What it computes | Example use
Returns | Simple, log and cumulative returns; rolling multi-period returns | Asset performance, benchmark-relative returns
Volatility | Realised (rolling) and EWMA volatility measures | Detecting regime shifts, cross-sector risk comparisons
Correlation | Rolling Pearson correlations between series | Contagion detection and stress analysis
Beta | Rolling market-sensitivity estimates | Hedge ratio estimation and systematic risk decomposition
Rolling stats & drawdown | Moving averages, z-scores, and drawdown calculations | Signal smoothing, outlier detection, drawdown monitoring
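
The correlation and beta categories listed above can be sketched in a few lines of Pandas. This is an illustrative version under stated assumptions: the function names and the 20-period default window are hypothetical, and beta is computed here as rolling covariance divided by rolling market variance, which is one common estimator and not necessarily the one the project uses.

```python
import pandas as pd


def rolling_corr(x: pd.Series, y: pd.Series, window: int = 20) -> pd.Series:
    """Rolling Pearson correlation between two return series."""
    return x.rolling(window).corr(y)


def rolling_beta(asset: pd.Series, market: pd.Series, window: int = 20) -> pd.Series:
    """Rolling beta: cov(asset, market) / var(market) over the same window."""
    cov = asset.rolling(window).cov(market)
    var = market.rolling(window).var()
    return cov / var


# Toy data: the asset moves exactly twice as much as the market.
market = pd.Series([0.01, -0.02, 0.015, 0.03, -0.01, 0.005])
asset = 2.0 * market
beta = rolling_beta(asset, market, window=3)
corr = rolling_corr(asset, market, window=3)
```

With this toy input, every full window yields a beta of about 2 and a correlation of about 1, which is a quick sanity check that the estimator behaves as intended.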

Engines are declaratively scheduled. The default configuration computes simple and log returns, 20-day realised volatility, and 20-day EWMA volatility.
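
A declarative configuration of this kind might look like the sketch below. The structure, the engine names, and the `fingerprint` helper are hypothetical; they only illustrate how a default set of engines and parameters could be declared as data and given a stable identifier for provenance.

```python
import hashlib
import json

# Hypothetical declarative spec mirroring the defaults described above.
DEFAULT_ENGINES = [
    {"engine": "simple_returns", "params": {}},
    {"engine": "log_returns", "params": {}},
    {"engine": "realised_vol", "params": {"window": 20}},
    {"engine": "ewma_vol", "params": {"span": 20}},
]


def fingerprint(spec: dict) -> str:
    """Stable short hash of an engine spec, independent of key order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# One identifier per (engine, parameters) pair, usable as a provenance key.
fingerprints = {spec["engine"]: fingerprint(spec) for spec in DEFAULT_ENGINES}
```

Storing such an identifier next to each derived row would make it possible to tell which parameter set produced a given value, and to detect when a configuration change calls for recomputation.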

Current status

Implemented

  • Data extraction (YFinance, CSV)
  • Canonical normalisation (OHLCV schema & provenance)
  • Idempotent loader into PostgreSQL
  • Asset management and metadata
  • Quant engine execution and persistence (returns, volatility, correlation, beta, drawdown)
  • HTTP API exposing time-series and aggregated metrics

In progress / planned

  • Expanded automated test coverage and CI
  • Production-grade logging, monitoring and alerting
  • Orchestration for scheduled incremental runs and DAGs
  • Tighter integration with report-generation tooling

Design principles

Reproducible by design

Every analytical step is versioned, parameterised, and re-runnable; claims must be auditable.

Incremental & idempotent

Safe to re-run daily: database constraints and upserts prevent duplication.

Modular & composable

Extractors, normalisers, engines, and storage are separated so components can evolve independently.

Transparent & inspectable

Code, assumptions, and provenance are recorded to build trust with researchers and supervisors.

Technology stack

  • Data processing: Python (Pandas, NumPy)
  • Database: PostgreSQL (SQLAlchemy)
  • Sources: YFinance, Casablanca CSV feeds
  • Execution: Python runner with idempotent upserts
  • API: FastAPI (Uvicorn)
  • Migrations & CI: Alembic, GitHub

Source code

The full source is maintained in the private atlasvanguard-core repository on GitHub during the early build-out phase.

Access is currently restricted to members. Public access will follow as part of the roadmap.