Vincere.dev
Healthcare Production System

Meditap

Data Warehouse Automation Platform for Healthcare Analytics

Hospitals Served: 60+
Data Latency: ~1 min
Pipelines Deployed: 100+
Database Outages: 0

Executive Summary

We designed and built a fully automated, event-driven data warehouse platform on Google Cloud that decouples analytics from transactional systems. Using CDC, Pub/Sub, and BigQuery, we eliminated database downtime and achieved ~1 minute data latency for near real-time reporting across 60+ hospitals.

The Problem

Meditap needed a scalable data platform for a healthcare network operating 60+ private hospitals in Indonesia. Analytics queries ran directly against the primary transactional database, causing 3–5 outages daily. Key challenges included achieving near real-time reporting, ensuring data consistency across distributed sources, migrating historical data without disrupting live systems, implementing fine-grained data governance, and handling continuous data ingestion at scale. All of this had to be manageable by a small data team of 3–5 people.

Hospitals: 60+
Daily Outages: 3–5
Data Team Size: 3–5

Services Delivered: AI Integration, Dedicated Team

Architecture Overview

Data Layer: PostgreSQL, Microsoft SQL Server, Debezium, Google Cloud Pub/Sub
Backend & Orchestration: FastAPI, Apache Airflow
Frontend: React
Infrastructure: GCP, GKE, BigQuery

Key Technical Decisions

System Design

The system was designed as a layered, event-driven data platform. PostgreSQL and Microsoft SQL Server serve as source systems. Debezium captures database changes via log-based CDC, streaming events through Google Cloud Pub/Sub. Raw data is ingested into BigQuery, with ETL orchestrated via Apache Airflow. A custom UI generates YAML configurations that are compiled into Airflow DAGs automatically. Dataset-level IAM within BigQuery enforces access control. This architecture separates ingestion, transformation, and consumption layers for independent scaling and fault isolation.
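As a rough illustration of the YAML-to-DAG step, the sketch below takes a pipeline definition (shown as the dictionary a YAML parser would produce) and derives a valid task execution order from the declared dependencies. The config schema, field names, and task names are assumptions made for this example; the source does not describe the actual format or compiler used on the project.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline definition, as it might look after parsing a
# UI-generated YAML file. All field and task names are illustrative only.
pipeline_config = {
    "pipeline_id": "daily_admissions_report",
    "schedule": "@hourly",
    "tasks": [
        {"id": "build_report", "depends_on": ["clean_admissions"]},
        {"id": "load_raw", "depends_on": []},
        {"id": "clean_admissions", "depends_on": ["load_raw"]},
    ],
}


def compile_task_order(config: dict) -> list[str]:
    """Return task ids in an order that respects every declared dependency.

    Raises graphlib.CycleError if the dependencies form a cycle, which is
    the kind of configuration a DAG generator would have to reject.
    """
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {task["id"]: set(task["depends_on"]) for task in config["tasks"]}
    return list(TopologicalSorter(graph).static_order())


order = compile_task_order(pipeline_config)
print(order)  # ['load_raw', 'clean_admissions', 'build_report']
```

In a real generator, each task id in this order would presumably be mapped to an Airflow operator from a reusable template rather than printed.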

Key Decisions

CDC was adopted over batch ingestion to enable near real-time data availability with ~1 minute delay. A template-driven Airflow platform eliminated the need for manual DAG coding, allowing non-engineers to operate pipelines. BigQuery was chosen as the analytical backend for its serverless scalability and simplified infrastructure management. The tradeoff was reduced flexibility in custom pipeline logic in exchange for operational simplicity and speed of delivery.
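To show why change events suit near real-time availability, here is a minimal sketch that folds a stream of change events into the current state of a table. It assumes the default Debezium event envelope (`op`, `before`, `after`, `ts_ms`); the `id` key column, the sample rows, and the in-memory dictionary are hypothetical stand-ins, not the project's actual consumer.

```python
from typing import Any


def apply_change_events(
    events: list[dict[str, Any]], key: str = "id"
) -> dict[Any, dict[str, Any]]:
    """Replay change events in timestamp order and return the latest rows.

    Ordering by ts_ms is a simplification for this sketch; a production
    consumer would normally order by the source log position instead.
    """
    state: dict[Any, dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["ts_ms"]):
        op = event["op"]
        if op in ("c", "r", "u"):  # create, snapshot read, update
            row = event["after"]
            state[row[key]] = row
        elif op == "d":  # delete: only the "before" image is populated
            state.pop(event["before"][key], None)
    return state


# Hypothetical events for an admissions table.
events = [
    {"op": "c", "ts_ms": 1000, "before": None,
     "after": {"id": 1, "status": "admitted"}},
    {"op": "c", "ts_ms": 1010, "before": None,
     "after": {"id": 2, "status": "admitted"}},
    {"op": "u", "ts_ms": 1020, "before": {"id": 1, "status": "admitted"},
     "after": {"id": 1, "status": "discharged"}},
    {"op": "d", "ts_ms": 1030, "before": {"id": 2, "status": "admitted"},
     "after": None},
]

current = apply_change_events(events)
print(current)  # {1: {'id': 1, 'status': 'discharged'}}
```

Because each event carries only the changed row, the analytical side can stay current without re-querying the transactional database, which is the contention that batch extraction would reintroduce.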

Implementation Highlights

Log-based CDC using Debezium provided reliable change tracking from source databases. An event streaming pipeline via Pub/Sub enabled decoupled ingestion. A multi-layer warehouse design separated raw, processed, and reporting-ready data. Schema validation at the platform level prevented bad configurations before DAG generation. Hybrid processing combined near real-time ingestion with batch transformations for aggregation and reporting.
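The platform-level validation step could look something like the following sketch, which checks a pipeline definition and returns a list of problems before any DAG is generated. The required fields, the layer names (`raw`, `processed`, `reporting`), and the error messages are assumptions for illustration; the source does not specify the real validation rules.

```python
# Hypothetical layer names, loosely following the raw / processed /
# reporting-ready split described above.
ALLOWED_LAYERS = {"raw", "processed", "reporting"}
REQUIRED_FIELDS = ("pipeline_id", "schedule", "tasks")


def validate_pipeline_config(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in config:
            errors.append(f"missing required field: {field}")

    tasks = config.get("tasks", [])
    known_ids = {task.get("id") for task in tasks}
    for task in tasks:
        task_id = task.get("id")
        if not task_id:
            errors.append("task without an id")
            continue
        if task.get("layer") not in ALLOWED_LAYERS:
            errors.append(f"{task_id}: unknown layer {task.get('layer')!r}")
        for dep in task.get("depends_on", []):
            if dep not in known_ids:
                errors.append(f"{task_id}: unknown dependency {dep!r}")
    return errors


good_config = {
    "pipeline_id": "admissions",
    "schedule": "@hourly",
    "tasks": [
        {"id": "load_raw", "layer": "raw", "depends_on": []},
        {"id": "clean", "layer": "processed", "depends_on": ["load_raw"]},
    ],
}
bad_config = {
    "pipeline_id": "admissions",
    "tasks": [{"id": "clean", "layer": "staging", "depends_on": ["load_raw"]}],
}

good_errors = validate_pipeline_config(good_config)
bad_errors = validate_pipeline_config(bad_config)
print(good_errors)
print(bad_errors)
```

Returning every error at once, rather than failing on the first, is one way to make such a check usable by people who configure pipelines through a UI instead of code.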

Results & Validation

Eliminated primary database downtime (previously 3–5 outages per day).

Reduced report generation time from minutes to seconds.

Achieved ~1 minute data latency for near real-time reporting.

Enabled a small data team (3–5 members) to manage the entire pipeline ecosystem.

Successfully deployed and operated 100+ pipelines via the platform.

Key Insights

Near real-time analytics: the architecture achieves ~1 minute data freshness without overloading source systems.

Operational abstraction: a platform layer allows pipeline creation without writing code.

Scalable pipeline management: reusable templates support 100+ data pipelines.

Full separation of concerns: transactional and analytical workloads are decoupled, eliminating contention.

Built-in governance: dataset-level IAM ensures secure, role-based access across stakeholders.

Who This Applies To

This architecture applies to organizations with high-volume transactional systems that need near real-time analytics without compromising operational stability, particularly in healthcare, fintech, and multi-tenant enterprise environments.

Healthcare, Data Warehousing, Real-Time Analytics, CDC & Streaming, Platform Engineering

Technologies Used

Backend: FastAPI
Frontend: React
Infrastructure: GCP, GKE
Data & Integrations: PostgreSQL, Google Pub/Sub, Microsoft SQL Server, Apache Airflow, BigQuery
Patterns & Techniques: Debezium CDC, YAML DAGs, IAM
Tools: Bitbucket, Jira, Keycloak

Building something similar?

We specialize in AI integration and dedicated teams for healthcare companies. If you're facing challenges like the ones we solved for Meditap, let's talk.