Production-grade ETL system that ingests equities and crypto OHLCV data, computes financial indicators, runs under Apache Airflow orchestration, stores results in PostgreSQL, and serves them through FastAPI.
yfinance pulls daily OHLCV for 7 equities. CoinGecko delivers crypto bars. Retry logic + exponential backoff handle rate limits and transient failures.
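The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `retry_with_backoff`, its parameters, and the `flaky_fetch` stand-in are all hypothetical, and the real pipeline would wrap its yfinance or CoinGecko call instead.

```python
import random
import time


def retry_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel tasks do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Stand-in for a rate-limited API call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in production the default `time.sleep` would apply.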
Python validates schema, drops nulls and corrupt rows (high < low), deduplicates on (symbol, time). Computes SMA-20, SMA-50 via rolling window, daily return via pct_change().
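A rough sketch of this transform step, assuming pandas and a flat DataFrame with the column names listed in `REQUIRED` (the column names, the function name `clean_and_enrich`, and the sample rows are assumptions for illustration only):

```python
import pandas as pd

# Assumed column layout; the real schema may differ.
REQUIRED = ["symbol", "time", "open", "high", "low", "close", "volume"]


def clean_and_enrich(raw: pd.DataFrame) -> pd.DataFrame:
    # Schema check: fail fast if an expected column is absent.
    missing = set(REQUIRED) - set(raw.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    df = raw[REQUIRED].dropna()                         # drop rows with nulls
    df = df[df["high"] >= df["low"]]                    # drop corrupt rows (high < low)
    df = df.drop_duplicates(subset=["symbol", "time"])  # one bar per (symbol, time)
    df = df.sort_values(["symbol", "time"]).reset_index(drop=True)

    # Indicators are computed per symbol so windows never span two assets.
    close = df.groupby("symbol")["close"]
    df["sma_20"] = close.transform(lambda s: s.rolling(20).mean())
    df["sma_50"] = close.transform(lambda s: s.rolling(50).mean())
    df["daily_return"] = close.transform(lambda s: s.pct_change())
    return df


# Made-up sample: 25 valid bars plus one duplicate, one corrupt row, one null.
rows = [
    {"symbol": "AAA", "time": f"2024-01-{d:02d}", "open": float(d),
     "high": d + 1.0, "low": d - 1.0, "close": float(d), "volume": 100}
    for d in range(1, 26)
]
rows.append(dict(rows[0]))                                               # duplicate
rows.append({**rows[1], "time": "2024-01-26", "high": 0.5, "low": 3.0})  # high < low
rows.append({**rows[2], "time": "2024-01-27", "close": None})            # null close
out = clean_and_enrich(pd.DataFrame(rows))
```

The first 19 rows of `sma_20` (and every row of `sma_50` in this short sample) are NaN, because a rolling window only produces a value once it is full.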
Idempotent INSERT ... ON CONFLICT DO NOTHING into PostgreSQL. Running the pipeline twice yields the same result. Every run is logged to pipeline_runs for the status API.
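The idempotent load can be illustrated with a small self-contained sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same ON CONFLICT ... DO NOTHING clause); the `ohlcv` table layout and the `pipeline_runs` columns shown here are assumptions, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute("""
    CREATE TABLE ohlcv (
        symbol TEXT NOT NULL,
        time   TEXT NOT NULL,
        close  REAL NOT NULL,
        PRIMARY KEY (symbol, time)   -- the key ON CONFLICT checks against
    )
""")
# Hypothetical run-log columns; the real pipeline_runs table may differ.
conn.execute("""
    CREATE TABLE pipeline_runs (
        id INTEGER PRIMARY KEY,
        started_at TEXT,
        rows_seen INTEGER,
        rows_inserted INTEGER
    )
""")


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM ohlcv").fetchone()[0]


def load(conn, rows):
    """Insert rows, skipping any (symbol, time) already present, and log the run."""
    before = count_rows(conn)
    with conn:  # one transaction: the data and its run-log entry commit together
        conn.executemany(
            "INSERT INTO ohlcv (symbol, time, close) VALUES (?, ?, ?) "
            "ON CONFLICT (symbol, time) DO NOTHING",
            rows,
        )
        inserted = count_rows(conn) - before
        conn.execute(
            "INSERT INTO pipeline_runs (started_at, rows_seen, rows_inserted) "
            "VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), len(rows), inserted),
        )
    return inserted


batch = [("AAA", "2024-01-01", 10.0), ("AAA", "2024-01-02", 11.0)]  # made-up bars
first_run = load(conn, batch)
second_run = load(conn, batch)  # replay of the same batch
```

The replay inserts nothing into `ohlcv` but still adds a row to `pipeline_runs`, so the status API can report that the run happened and found no new data.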
How Airflow schedules, retries, and tracks tasks. The difference between a cron job and a proper orchestrated pipeline with dependency management and failure isolation.
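ON CONFLICT DO NOTHING skips any row that would violate a unique constraint, so with a unique key such as (symbol, time) running the same pipeline twice yields the same DB state. Critical for reliability: restarts and replays should not corrupt or duplicate data.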
How to structure OHLCV tables for efficient range queries. Why indexing on (symbol, time DESC) matters for the query patterns a financial API actually runs.
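To make the indexing point concrete, here is a small sketch of the "latest bars for one symbol in a date range" query that such an index is meant to serve. It again uses sqlite3 as a stand-in for PostgreSQL; the table columns, the index name `idx_ohlcv_symbol_time`, and the sample bars are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.execute(
    "CREATE TABLE ohlcv (symbol TEXT NOT NULL, time TEXT NOT NULL, close REAL NOT NULL)"
)
# Composite index: equality on symbol first, then time stored newest-first,
# which matches "WHERE symbol = ? ... ORDER BY time DESC" lookups.
conn.execute("CREATE INDEX idx_ohlcv_symbol_time ON ohlcv (symbol, time DESC)")

# Made-up bars for two hypothetical symbols.
bars = [("AAA", f"2024-01-0{d}", 9.0 + d) for d in range(1, 6)]
bars += [("BBB", f"2024-01-0{d}", 99.0 + d) for d in range(1, 6)]
conn.executemany("INSERT INTO ohlcv VALUES (?, ?, ?)", bars)

RANGE_QUERY = """
    SELECT time, close
    FROM ohlcv
    WHERE symbol = ? AND time >= ? AND time <= ?
    ORDER BY time DESC
    LIMIT ?
"""
params = ("AAA", "2024-01-02", "2024-01-04", 2)
latest = conn.execute(RANGE_QUERY, params).fetchall()

# The planner's report can be inspected to check whether the index is used;
# the exact wording depends on the database engine and version.
plan = conn.execute("EXPLAIN QUERY PLAN " + RANGE_QUERY, params).fetchall()
```

Because the index order matches both the filter and the sort, the database can typically read the matching rows already ordered and stop after LIMIT rows, instead of scanning and sorting the whole table. In PostgreSQL the equivalent check would be EXPLAIN on the same query.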
What SMA-20 and SMA-50 reveal about price trends. How daily return normalises changes across assets at different scales.
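A small worked example of both ideas, in plain Python with made-up prices. The helper names are hypothetical, and the toy windows of 2 and 4 stand in for SMA-20 and SMA-50 so the arithmetic stays readable.

```python
def daily_returns(closes):
    """Fractional change from each close to the next: (cur - prev) / prev."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def sma(closes, window):
    """Simple moving average; one value per full window."""
    return [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


# Two assets at very different price scales (made-up numbers).
stock = [100.0, 101.0, 99.99]
coin = [50000.0, 50500.0, 49995.0]

stock_returns = daily_returns(stock)  # roughly +1%, then -1%
coin_returns = daily_returns(coin)    # roughly +1%, then -1%

# A steadily rising series: the short average sits above the long one.
rising = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
short_sma = sma(rising, 2)[-1]  # (14 + 15) / 2
long_sma = sma(rising, 4)[-1]   # (12 + 13 + 14 + 15) / 4
```

The absolute moves differ by a factor of 500 (1 vs 500 price units), yet the returns are directly comparable. Likewise, a short average above a long average is commonly read as an uptrend, and the reverse as a downtrend.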