Carro

Core Capacities: Data pipeline engineering
Project Focus: Real-time, AI-ready infrastructure
Year: 2025
Overview
1
Running analytics and live operations against the same database was unsustainable. As merchant volume grew, query performance degraded, and the data science team had no reliable foundation for AI-driven recommendations or demand intelligence. The company needed to decouple the two environments without disrupting the live platform, and to do so in a way that could support advanced AI use cases, not just reporting.


2
Sierra designed and built a fully decoupled streaming data pipeline using Kafka-based Change Data Capture (CDC), replicating production data in real time with zero impact on live operations. Before building anything, Sierra's engineers investigated and reverse-engineered the existing database schemas so that the new architecture reflected how the data actually behaved, not how it was assumed to behave. The pipeline fed into a dedicated Databricks analytics environment structured across Bronze, Silver, and Gold layers, from raw ingestion to business-ready outputs.
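To make the replication idea concrete, here is a minimal sketch of how a consumer might apply change events to keep a replica in step with production. It assumes a Debezium-style event envelope (`op`, `before`, `after`), which the case study does not confirm; the function name, field names, and sample rows are all hypothetical, and a real pipeline would read these events from Kafka topics rather than from a list.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_change_events(
    events: Iterable[Dict[str, Any]],
    replica: Optional[Dict[Any, Row]] = None,
    key_field: str = "id",
) -> Dict[Any, Row]:
    """Apply change events in order to an in-memory replica keyed by primary key.

    Assumes a Debezium-style envelope: op is "c" (create), "r" (snapshot read),
    "u" (update), or "d" (delete), with the row images in "before" and "after".
    """
    state: Dict[Any, Row] = {} if replica is None else dict(replica)
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # Inserts, snapshot reads, and updates all upsert the new row image.
            row = event["after"]
            state[row[key_field]] = row
        elif op == "d":
            # Deletes carry the old row image; drop the key if it is present.
            state.pop(event["before"][key_field], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return state


# Hypothetical sample events for an orders table.
events = [
    {"op": "c", "before": None,
     "after": {"id": 1, "merchant": "acme", "status": "pending"}},
    {"op": "c", "before": None,
     "after": {"id": 2, "merchant": "globex", "status": "pending"}},
    {"op": "u",
     "before": {"id": 1, "merchant": "acme", "status": "pending"},
     "after": {"id": 1, "merchant": "acme", "status": "paid"}},
    {"op": "d",
     "before": {"id": 2, "merchant": "globex", "status": "pending"},
     "after": None},
]

replica = apply_change_events(events)
print(replica)
```

Because the replica is rebuilt from the stream of changes rather than by querying production tables, analytics reads never touch the live database in this model.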
3
• Conducted investigatory analysis and reverse-engineered existing database schemas before a line of architecture was committed.
• Built real-time CDC ingestion using Confluent Kafka, eliminating analytics load from the production system entirely.
• Defined the full infrastructure in Terraform, making deployments reproducible and reconfigurable as the business evolves.
• Structured data across a medallion architecture optimized for machine learning, recommendation engines, and downstream AI workflows.
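The medallion layering can be illustrated with a small sketch in plain Python. This is only an illustration of the Bronze, Silver, and Gold idea under assumed data: the record fields (`order_id`, `seq`, `merchant`, `amount`, `deleted`) and both function names are hypothetical, and on Databricks the same steps would typically be written against Spark tables rather than Python lists.

```python
from collections import defaultdict
from typing import Any, Dict, List


def to_silver(bronze: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Bronze -> Silver: keep the latest version of each order and clean fields."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in bronze:
        key = rec["order_id"]
        # A higher sequence number means a more recent change to the same order.
        if key not in latest or rec["seq"] > latest[key]["seq"]:
            latest[key] = rec
    return [
        {
            "order_id": r["order_id"],
            "merchant": r["merchant"].strip().lower(),  # normalize names
            "amount": float(r["amount"]),               # cast raw strings
        }
        for r in sorted(latest.values(), key=lambda r: r["order_id"])
        if not r.get("deleted", False)                  # drop deleted orders
    ]


def to_gold(silver: List[Dict[str, Any]]) -> Dict[str, float]:
    """Silver -> Gold: aggregate into a business-ready total per merchant."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["merchant"]] += row["amount"]
    return dict(totals)


# Bronze: raw change records exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": 1, "seq": 1, "merchant": " Acme ", "amount": "10.50"},
    {"order_id": 1, "seq": 2, "merchant": "acme", "amount": "12.00"},
    {"order_id": 2, "seq": 3, "merchant": "Globex", "amount": "5.00"},
    {"order_id": 3, "seq": 4, "merchant": "acme", "amount": "8.00"},
    {"order_id": 3, "seq": 5, "merchant": "acme", "amount": "8.00", "deleted": True},
]

silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Keeping the raw Bronze records untouched means the Silver and Gold layers can be rebuilt whenever cleaning rules or business definitions change, which is one reason the layered layout suits downstream machine learning work.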

4
Jobs that previously took an hour now run in minutes. More significantly, the new foundation unlocked capabilities that had not been feasible before — machine learning models, recommendation engines, and AI-driven demand intelligence built on clean, real-time data. Carro now has the infrastructure to pursue AI-powered merchant experiences without putting operational reliability at risk.