System Design
The system was designed as a layered, event-driven data platform. PostgreSQL and Microsoft SQL Server serve as source systems. Debezium captures their changes through log-based change data capture (CDC) and streams the resulting events through Google Cloud Pub/Sub. Raw data is ingested into BigQuery, and Apache Airflow orchestrates the ETL jobs. A custom UI generates YAML configurations that are automatically compiled into Airflow DAGs. Access control is enforced through dataset-level IAM in BigQuery. Separating the ingestion, transformation, and consumption layers allows each layer to scale independently and keeps a fault in one layer from spreading to the others.
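The configuration-to-DAG compilation step can be illustrated with a minimal sketch. It is not the platform's actual compiler: the configuration schema (dag_id, schedule, tasks, command, depends_on), the example pipeline, and the function names task_order and render_dag are all hypothetical. The configuration is shown as the dictionary a YAML parser would produce, task ids are assumed to be valid Python identifiers, and the generated source assumes Airflow 2.4 or later (the import path for BashOperator and the schedule argument may differ in other versions).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline config, shown as the dict a YAML parser would return.
CONFIG = {
    "dag_id": "orders_etl",
    "schedule": "@hourly",
    "tasks": [
        {"id": "load_raw", "command": "echo load", "depends_on": []},
        {"id": "transform", "command": "echo transform", "depends_on": ["load_raw"]},
        {"id": "publish", "command": "echo publish", "depends_on": ["transform"]},
    ],
}


def task_order(config):
    """Return task ids in dependency order; reject unknown or cyclic dependencies."""
    tasks = {t["id"]: t for t in config["tasks"]}
    graph = {}
    for tid, task in tasks.items():
        deps = task.get("depends_on", [])
        unknown = [d for d in deps if d not in tasks]
        if unknown:
            raise ValueError(f"{tid} depends on unknown task(s): {unknown}")
        graph[tid] = set(deps)
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


def render_dag(config):
    """Render the Python source of an Airflow DAG file from the config."""
    tasks = {t["id"]: t for t in config["tasks"]}
    order = task_order(config)
    lines = [
        "from datetime import datetime",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",  # Airflow 2.x import path
        "",
        f"with DAG(dag_id={config['dag_id']!r}, schedule={config['schedule']!r},",
        "         start_date=datetime(2024, 1, 1), catchup=False) as dag:",
    ]
    # One operator per task, declared in dependency order.
    for tid in order:
        cmd = tasks[tid]["command"]
        lines.append(f"    {tid} = BashOperator(task_id={tid!r}, bash_command={cmd!r})")
    # One edge per declared dependency.
    for tid in order:
        for dep in tasks[tid].get("depends_on", []):
            lines.append(f"    {dep} >> {tid}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dag(CONFIG))
```

Validating dependencies before rendering means a malformed configuration fails at compile time rather than when the scheduler parses the generated file. The rendered text would be written into the Airflow DAGs folder; that deployment step is outside this sketch.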