02 · PROJECT · DATA ENGINEERING

Market Data
Pipeline

Production-grade ETL system ingesting equities + crypto OHLCV data, computing financial indicators, orchestrating via Apache Airflow, stored in PostgreSQL and served through FastAPI.

Python 3.11Apache AirflowPostgreSQLFastAPIDockerNext.jsyfinanceCoinGecko
DATA SOURCES
2
Yahoo Finance + CoinGecko
ASSETS TRACKED
11
7 stocks · 4 crypto
ROWS STORED
OHLCV + indicators
PIPELINE STATUS
Airflow DAGs
NEW YORK
--:--
○ CLOSED
PARIS
--:--
○ CLOSED
HONG KONG
--:--
○ CLOSED
0
Equities · Click to chart
Crypto · Click to chart
SPY · 1Y · Yahoo Finance
NO DATA AVAILABLE FOR THIS RANGE
Pipeline · Airflow DAGs
stocks_1min
Weekdays · every minute
ACTIVE
Yahoo Finance · yfinance (1m interval)
1
Extract 1m OHLCV via yfinance
2
Validate + clean rows
3
Compute SMA-20/50, daily return
4
INSERT ON CONFLICT DO NOTHING
5
Log run to pipeline_runs
SPYNVDAMSFTSIE.DEGOOGLPLTRURTH
crypto_20min
24/7 · every 20 minutes
ACTIVE
CoinGecko public API
1
Fetch OHLCV from CoinGecko
2
Handle rate limits (1.5s delay)
3
Transform + deduplicate
4
Load to crypto_prices
5
Log run to pipeline_runs
BTCETHSOLXRP
DAGLAST RUNSTATUSROWS INSERTEDDURATION
NO RUNS RECORDED YET
System Architecture · ETL Flow
Yahoo Financestocks OHLCVCoinGeckocrypto OHLCVAirflow DAGschedule + retryPython ETLextract+transformPostgreSQLOHLCV + indicatorsFastAPIREST endpointsNext.jsdashboardDOCKER COMPOSECLOUD RUNVERCEL
01 · EXTRACT

yfinance pulls daily OHLCV for 7 equities. CoinGecko delivers crypto bars. Retry logic + exponential backoff handle rate limits and transient failures.

02 · TRANSFORM

Python validates schema, drops nulls and corrupt rows (high < low), deduplicates on (symbol, time). Computes SMA-20, SMA-50 via rolling window, daily return via pct_change().

03 · LOAD

Idempotent INSERT ... ON CONFLICT DO NOTHING into PostgreSQL. Running the pipeline twice yields the same result. Every run is logged to pipeline_runs for the status API.

REST API · Endpoints
GET/api/stocks/{symbol}/ohlcv
GET/api/stocks/{symbol}/latest
GET/api/stocks/{symbol}/indicators
GET/api/crypto/{symbol}/ohlcv
GET/api/crypto/{symbol}/latest
GET/api/pipeline/status
GET/api/assets/list
INTERACTIVE DOCS ↗VIEW SOURCE ↗
What I Learned
DAGs & orchestration

How Airflow schedules, retries, and tracks tasks. The difference between a cron job and a proper orchestrated pipeline with dependency management and failure isolation.

Idempotent pipelines

ON CONFLICT DO NOTHING means running the same pipeline twice yields the same DB state. Critical for reliability - restarts and replays should not corrupt data.

Time-series data modelling

How to structure OHLCV tables for efficient range queries. Why indexing on (symbol, time DESC) matters for the query patterns a financial API actually runs.

Financial indicators

What SMA-20 and SMA-50 reveal about price trends. How daily return normalises changes across assets at different scales.

← BACK TO PORTFOLIOAHMED BERRADA · 2026