Real time data streaming in PostgreSQL
I am a Software Engineer in Bangalore and am with the Platform team of Endurance International Group Pvt Ltd, APAC. A life hacker by nature and a database enthusiast by interest. Am currently in pursuit of exploring different data stores and understanding the art of persistence.
No video of the event yet, sorry!
Real time data streaming of any Data Modification event on the database can prove to be instrumental as a form of derived event sourcing to update heterogenous and non slave data stores. PostgreSQL out of the box provides functionalities and settings which can enable the generation of such events. When combined with services like Debezium which can monitor and record the events we get a pipeline which is popularly termed as Change Data Capture(CDC).In addition to this, CDC has numerous applications in a microservice environment ranging from Cache Invalidation to maintaining Data Integrity where there can be multiple processes that update the database. A detailed setup of such a CDC pipeline elucidates that it is very easy to build, maintain and monitor it. These features are being augmented to dissolve the lag that is caused by having a pull based mechanism to update an Elasticsearch index that is frequently queried by the search service.
Evaluation on real world data shows that if you consider two pipelines:
- a push based mechanism based on CDC principles.
- a pull based mechanism which can be achieved by running periodic queries to fetch the recently updated queries.
The CDC variant places an order of magnitude less load on the database, and can operate in a more reactive manner by waiting for the logical information to be written into the WAL file rather than the pull based mechanism which can be prone to have multiple no hits among frequent polls. Additionally different modes and configuration of the Debezium service are explained along with its impact on the entire pipeline.
- 40 min
- PGConf India, 2020
- Case Study