PGConf India

Presented by:

Vaijayanti Bharadwaj

from EnterpriseDB

Vaijayanti has 20+ years of experience in designing and architecting Enterprise products in areas of storage, file systems, information retrieval, E-discovery, database storage engines working for companies like Veritas, Symantec, NetApp and Greenplum. She has 30+ filed patents. She is a PostgreSQL enthusiast with a deep interest in transactions, concurrency control, inverted indexes. She has authored a E-book on Kindle: "Database Transactions and Concurrency Control: Isolation Levels and MVCC Demystified with Study of PostgreSQL." (https://www.amazon.com/dp/B0B9YQF82L)

Checkout the slides

Background

Today, Elasticsearch is a very popular datastore for full-text searches. Using Elasticsearch has certain considerations such as: â€¢ Maintaining a separate cluster just for keeping a full-text index â€¢ A proper schema design, given requirements on performance and Elasticsearch limitations around support for one to many and many to many relationships â€¢ Making sure the cluster is kept consistent, given that Elasticsearch doesnâ€™t support full ACID semantics. â€¢ Typically, resource requirements of Elasticsearch are know to be high and proper sizing is essential, given requirements on types of queries, indexing throughput and search latencies, etc. PostgreSQL has a rich set of features that support full-text indexing. Databases have some great benefits for full-text indexing, especially when there are several entities with complex relationships among them, with a few fields that require full-text indexing. Some of the these benefits are: 1. Full text search can be combined with other non-text or keyword fields 2. Utilize full power and expressiveness of SQL 3. Get ACID semantics 4. All data is in the same datastore, without having to worry about transferring to another datastore and keeping both datastores consistent 5. Databases are good at doing JOINS efficiently, even with tables that do not fit in memory. This is a much-needed feature as schema gets complex. In contrast, datastores like Elasticsearch/Lucene are not good at expressing even one to many or parent child relationships, leave alone many to many. 6. Full text search can be used in conjunction with other fields such as geo-spatial data, graph data, using the power of SQL to do powerful queries

Overview of the Paper

The goal of this talk is to look briefly at PostgreSQLâ€™s full-text indexing and search capabilities, and take a text-heavy dataset such as Wikipedia data as a sample, and study the behavior of PostgreSQL and share learnings as it performs indexing and searches on this dataset. Also, compare and contrast the capabilities and performance with Elasticsearch in following areas: 1. Schema Design 2. Size of index 3. Doing other queries such as graph queries in conjunction with full-text searches 4. Resource requirements for indexing and search 5. Queries over text as well as other fields

Date:: 2023 February 23 - 10:45
Duration:: 45 min
Room:: Grand Victoria 1
Conference:: PGConf India, 2023
Language:
Track:: Case Study
Difficulty:: Medium

Full Text Search: Capabilities of PostgreSQL and a Comparison with Elasticsearch Using Wikipedia Dataset

Presented by:

Vaijayanti Bharadwaj

Happening at the same time: