What is change data capture (CDC)?
Change data capture is a technique for detecting and streaming every insert, update, and delete in a database as it happens, so other systems can react to changes in near real time instead of repeatedly polling.
Change data capture (CDC) is the practice of turning the changes happening in a source database into a continuous stream of events that downstream systems can consume. The alternative, periodically re-querying a table to find what changed, is slow, adds load to the database, and misses both deletes and any intermediate updates that happen between polls. CDC instead reads the database's own transaction log (the write-ahead log in Postgres, the binlog in MySQL) and emits an event for each row inserted, updated, or deleted, in commit order. Because the log already records every committed change, this log-based approach adds little overhead to the source and does not skip changes, which is why it has become the standard way to keep systems in sync: replicating into a data warehouse, invalidating caches, populating a search index, feeding analytics, or driving event-driven workflows. For AI systems, CDC is increasingly relevant for keeping retrieval fresh: when a source row changes, a CDC pipeline can trigger re-embedding so that a vector store or RAG index reflects current data instead of a stale snapshot. Tools like Debezium popularized log-based CDC, and many managed databases now expose it as a feature. The main design concerns are ordering guarantees, handling schema changes, and the eventual-consistency lag between a write and its propagation downstream.
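To make the consumer side concrete, here is a minimal sketch in Python of a downstream system applying an ordered stream of change events. It is an illustration under stated assumptions, not the event format of Debezium or any particular database: the `ChangeEvent` fields (`lsn`, `op`, `key`, `after`), the `Replica` class, and the `replay` helper are all hypothetical names. It assumes events arrive in commit order, each carrying a monotonically increasing log position, and that redelivered events should be ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change. Field names are hypothetical, for illustration."""
    lsn: int                      # position in the source log, increasing in commit order
    op: str                       # "insert", "update", or "delete"
    key: int                      # primary key of the changed row
    after: Optional[dict] = None  # row image after the change; None for deletes


@dataclass
class Replica:
    """A downstream copy of the table, kept in sync by applying events."""
    rows: dict = field(default_factory=dict)
    last_lsn: int = 0                             # highest log position applied so far
    stale_keys: set = field(default_factory=set)  # keys whose derived data (e.g. embeddings) need a refresh

    def apply(self, event: ChangeEvent) -> bool:
        """Apply one event. Returns False if the event was already applied."""
        if event.lsn <= self.last_lsn:
            # Redelivered event: skip it so that replays are idempotent.
            return False
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            self.rows[event.key] = event.after
        else:
            raise ValueError(f"unknown op: {event.op}")
        self.stale_keys.add(event.key)
        self.last_lsn = event.lsn
        return True


def replay(replica: Replica, events: list) -> int:
    """Apply events in order and return how many were newly applied."""
    return sum(replica.apply(e) for e in events)


events = [
    ChangeEvent(lsn=1, op="insert", key=1, after={"name": "a"}),
    ChangeEvent(lsn=2, op="insert", key=2, after={"name": "b"}),
    ChangeEvent(lsn=3, op="update", key=1, after={"name": "a2"}),
    ChangeEvent(lsn=4, op="delete", key=2),
]

replica = Replica()
applied_first = replay(replica, events)   # applies all four events
applied_again = replay(replica, events)   # redelivery: nothing is applied twice
```

Tracking the last applied log position is one simple way to cope with at-least-once delivery. The `stale_keys` set stands in for whatever downstream work a change should trigger, such as cache invalidation or re-embedding a row for a vector index.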