Presented by:


Joe Conway

from AWS

Joe Conway has been involved with the PostgreSQL community for more than 25 years, presently as a PostgreSQL Committer, Major Contributor, and Infrastructure Team member. He currently leads the PostgreSQL Contributors Team at Amazon Web Services.

No video of the event yet, sorry!

Background: "libc" is common shorthand for the "standard C library", a library of standard functions available to all C programs. glibc is the GNU C Library implementation, which is used on all major Linux distributions (e.g. Amazon Linux, RHEL, Debian/Ubuntu, SUSE). The glibc library, libc.so, provides most of the foundational C routines such as open, read, write, malloc, printf, and thousands more. It also provides the interface to the Linux kernel via syscalls.

For the purposes of this talk, the facility of interest is the locale functionality, and more specifically the functions that sort strings according to localized collation rules. For PostgreSQL to work durably and correctly, sort order must be deterministic and immutable. Since glibc implements the sort order, if glibc changes that order from one version to the next, it breaks the contract with PostgreSQL and thereby causes data corruption. Indexes that were persisted to storage under the old version may now hold the data in the wrong order according to the currently installed version of glibc.
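The failure mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `collate_old` and `collate_new` are hypothetical stand-ins for the collation rules of two glibc versions (they are not glibc's actual rules), and a sorted list searched with `bisect` stands in for a B-tree index.

```python
import bisect

def collate_old(s):
    # Hypothetical "old" rule: punctuation and spaces are ignored,
    # with the raw string as a tie-breaker.
    return ("".join(ch for ch in s.lower() if ch.isalnum()), s)

def collate_new(s):
    # Hypothetical "new" rule: every character is significant.
    return (s.lower(), s)

def build_index(values, collate):
    # An "index" here is just the values stored in collation order.
    return sorted(values, key=collate)

def index_contains(index, value, collate):
    # Binary search, which is only valid if the stored order
    # matches the order implied by `collate`.
    keys = [collate(v) for v in index]
    pos = bisect.bisect_left(keys, collate(value))
    return pos < len(index) and index[pos] == value

rows = ["1-1", "11", "1 2", "12"]

# Index persisted while the old rules were in effect.
index = build_index(rows, collate_old)
found_before = [index_contains(index, r, collate_old) for r in rows]

# Same stored order, but lookups now use the new rules.
found_after = [index_contains(index, r, collate_new) for r in rows]

# Rebuilding the index under the new rules (the analogue of REINDEX).
reindexed = build_index(rows, collate_new)
found_reindexed = [index_contains(reindexed, r, collate_new) for r in rows]

print("before upgrade:", found_before)
print("after upgrade: ", found_after)
print("after rebuild: ", found_reindexed)
```

The rows themselves never change; only the comparison rule does. Lookups that succeed before the rule change can miss rows afterwards, and they succeed again once the stored order is rebuilt under the new rule.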

A two-pronged approach was taken to mitigate this issue and will be covered at a high level -- one backward looking, and one forward looking. The former demonstrates a method to build a collation compatibility library on a system with a very specific glibc base version. That library may then be used on another Linux system to provide stable collation, and thus avoid breakage due to glibc and/or OS upgrades. The latter solution involves a new built-in collation provider in PostgreSQL 17. This new method comes with significant benefits in both robustness and performance.
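As a rough illustration of why a built-in provider can be stable, the sketch below assumes that the provider's C-style locales order strings by Unicode code point. That is an assumption based on the PostgreSQL 17 documentation, not something stated in this abstract. Code point order needs no external library: for valid UTF-8 it is the same as plain byte-wise comparison of the encoded strings, so it cannot shift when the OS is upgraded. The sample words are made up for the example.

```python
# Hypothetical sample data, including non-ASCII and a non-BMP character.
words = ["zebra", "Apple", "apple", "Émile", "1-800", "𝒳"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Byte-wise comparison of the UTF-8 encoding yields the same order,
# because UTF-8 preserves code point ordering.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

The trade-off is that code point order is not a linguistically natural order (all uppercase ASCII letters sort before all lowercase ones, for instance), so natural-language sorting would still need a different collation when it matters.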

Summary: If a PostgreSQL database resides on, for example, a RHEL 7 system with glibc version 2.17, and the operating system (OS) is upgraded to RHEL 8+ with glibc version 2.28+, the majority of indexes built on collatable columns will be broken. This talk will walk through examples of the types of breakage that can occur, the proposed solutions at a high level, and a demonstration of the solutions in action.

Date:
Duration: 45 min
Room:
Conference: PGConf India, 2025
Language:
Track: Database Administration
Difficulty: Medium