Presented by:

9f7b8d4e8d33fc6b4080a5060265fcc3

Ashutosh Bapat

from EnterpriseDB

Ashutosh Bapat is a Major Contributor to PostgreSQL and a developer in BDR team in EnterpriseDB. Apart from BDR and logical replication, he has contributed in the areas of partitioning, FDW, query optimization. He has spoken at PostgreSQL conferences across the globe. He has contributed features to Hive in the areas of replication and metastore.

No video of the event yet, sorry!

PostgreSQL is first and foremost an OLTP DBMS. But it has a rich set of features which allows it to be used as an OLAP DBMS.

  • Highly compliant with SQL standard
  • Analytical features like windowing.
  • Many data types important for analytical workload e.g. JSON, Geo-spatial datatypes
  • Full-text search on document data types
  • Non-traditional indexes like GIN, BRIN
  • Partial and expression indexes
  • Extensibility through deeper integrations - user defined data types, functions, procedures
  • Parallel querying capability to process very large datasets

There are other more complex and system level features that allow it to be used as a Data Warehouse.

  • Physical or logical replication for low maintenance ELT system to separate OLAP and OLTP workloads
  • Foreign data wrappers allow querying external data sources in today's hybrid data models.
  • Partitioning along with FDW paves path for horizontal scaling

Yet when it comes to processing TBs for data it falls short compared to the other Data Warehouses and Data Lakes built specially for analytical workloads. However PostgreSQL ecosystem is stepping up to this challenge. One of the current trends is to use a combination of cloud columnar storage format, columnar execution engine and FDW.

  • Columnar execution engine and columnar storage make it possible to process large amounts of data efficiently.
  • FDW provides a way to access the columnar engine from PostgreSQL.
  • PostgreSQL itself provides rich SQL and handles authentication, governance and metadata.

These Data Lakehouse systems are easier to maintain, do not involve data movement and are yet reasonably efficient.

In this talk we will discuss how PostgreSQL can be used for OLAP workloads as described above.

Date:
2024 February 29 - 14:00
Duration:
40 min
Room:
Grand Victoria B
Conference:
PGConf India, 2024
Language:
Track:
Application Developer
Difficulty:
Medium

Happening at the same time:

  1. An Introduction to PostGIS, H3 and its Integration with Greenplum
  2. Start Time:
    2024 February 29 14:00

    Room:
    Robusta + Arabica

  3. Why Your Security Best Practices Don’t Work
  4. Start Time:
    2024 February 29 14:00

    Room:
    Grand Victoria A