A quick intro into partitioning for PostgreSQL

Database partitioning is the process of breaking a large database table down into smaller, more manageable parts. The best part is that doing so can increase performance: instead of running queries against a billion records, they run against a relevant subset of the data.

Two key reasons to partition are to simplify maintenance of large data sets and to potentially improve query performance when we are only interested in a subset of the data.

PostgreSQL has very high hard limits, which give you a lot of wiggle room:

| Item | Upper Limit | Comment |
| --- | --- | --- |
| database size | unlimited | |
| number of databases | 4,294,950,911 | |
| relations per database | 1,431,650,303 | |
| relation size | 32 TB | with the default BLCKSZ of 8192 bytes |
| rows per table | limited by the number of tuples that can fit onto 4,294,967,295 pages | |
| columns per table | 1600 | further limited by tuple size fitting on a single page |
| columns in a result set | 1664 | |
| field size | 1 GB | |
| identifier length | 63 bytes | can be increased by recompiling PostgreSQL |
| indexes per table | unlimited | constrained by maximum relations per database |
| columns per index | 32 | can be increased by recompiling PostgreSQL |
| partition keys | 32 | can be increased by recompiling PostgreSQL |

Recently I had to look into optimizing a relation that stores millions of rows of transactional data. We anticipated this relation would grow exponentially, causing query performance problems within a few months.


While doing my research I looked at sharding as an option, and since this project used PostgreSQL as the primary data store, I investigated what patterns were available to us.

This led me to a really great 2021 talk on YouTube by Jimmy Angelakos.

It gave me some great insights into what partitioning is and how it's implemented within PostgreSQL.

Declarative Partitioning

With PostgreSQL, we declare that a table is divided into partitions. The DDL that creates the table specifies the partitioning method and the columns that make up the partition key.

PostgreSQL has three types of partitioning:

  1. Range Partitioning

  2. List Partitioning

  3. Hash Partitioning

The example code snippets are from the official PostgreSQL documentation: https://www.postgresql.org/docs/15/ddl-partitioning.html

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

In the above example, we are partitioning the measurement table by month. Range partitioning lets you partition data that falls between a "range" of values. These can be dates, or numeric ranges such as city_id between 100 and 1000.
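As a sketch of what a numeric range key could look like (the table, column, and bounds here are hypothetical, not from the official docs):

```sql
-- Hypothetical example: partition by a numeric column instead of a date.
CREATE TABLE readings (
    city_id   int not null,
    reading   int
) PARTITION BY RANGE (city_id);

-- Range bounds are inclusive at the lower end and exclusive at the upper end,
-- so this partition holds rows with city_id >= 100 and city_id < 1000.
CREATE TABLE readings_small_cities PARTITION OF readings
    FOR VALUES FROM (100) TO (1000);
```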

Another popular mechanism is to partition by list or hash values.
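A list partition routes rows by an explicit set of key values, while a hash partition spreads rows evenly by hashing the key. These snippets are a sketch following the syntax in the PostgreSQL documentation; the table and partition names are made up for illustration:

```sql
-- List partitioning: each partition holds an explicit set of key values.
CREATE TABLE customers (
    customer_id  int not null,
    region       text not null
) PARTITION BY LIST (region);

CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('EU', 'UK', 'MEA');

-- Hash partitioning: rows are distributed by a hash of the key,
-- useful when there is no natural range or list to split on.
CREATE TABLE orders (
    order_id  bigint not null
) PARTITION BY HASH (order_id);

CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```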

One big win from partitions is the ability to move them to a different tablespace, or to detach and drop them, without the overhead of deleting millions of records with the DELETE command, which has a few side effects.

ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;
DROP TABLE measurement_y2006m02;

/* This also means that we can add new partitions easily */
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01')
    TABLESPACE fasttablespace;
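The reverse is also possible: a table can be created and bulk-loaded on its own, then attached as a partition. This sketch follows the pattern shown in the PostgreSQL documentation (the constraint name and dates are illustrative); the CHECK constraint matching the partition bounds lets ATTACH PARTITION skip a full validation scan:

```sql
-- Create a standalone table matching the parent's structure.
CREATE TABLE measurement_y2008m03
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- A CHECK constraint matching the partition bounds lets PostgreSQL
-- avoid scanning the table for out-of-range rows during ATTACH.
ALTER TABLE measurement_y2008m03 ADD CONSTRAINT y2008m03_check
    CHECK (logdate >= DATE '2008-03-01' AND logdate < DATE '2008-04-01');

-- ... bulk load and index the table here ...

ALTER TABLE measurement ATTACH PARTITION measurement_y2008m03
    FOR VALUES FROM ('2008-03-01') TO ('2008-04-01');
```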


However, we need to keep in mind the following limitations of partitioned tables:

  • Unique/primary key constraints cannot include expressions or function calls in partition keys and must include all partition key columns.

  • Exclusion constraints can only be applied to individual leaf partitions.

  • BEFORE ROW triggers on INSERT cannot change the destination partition for a new row.

  • Mixing temporary and permanent relations in the same partition tree is not allowed; if the partitioned table is temporary, its partitions must be temporary too and must belong to the same session.

Considerations before opting to use Declarative Partitioning:

  • Careful design is crucial to avoid negative performance effects on query planning and execution.

  • Choosing the partition key based on commonly used WHERE clauses can prune unneeded partitions.

  • Removal of unwanted data can be optimized by placing it in a single partition.

  • Choosing the target number of partitions is important to balance index size and query planning time.

  • Sub-partitioning and range partitioning with multiple columns can help with uneven data distribution.

  • Query planner can handle partition hierarchies with a few thousand partitions if pruning is effective.

  • Large number of partitions can result in longer planning times, higher memory consumption, and slower re-partitioning.

  • Data warehouse workloads may benefit from a larger number of partitions.

  • Simulations can help optimize partitioning strategy; never assume more partitions are better.
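To verify that pruning based on the partition key is actually happening, you can inspect the query plan. For the measurement table above, a query filtered on the partition key should show only the matching partition being scanned (the exact plan output depends on your data and version):

```sql
-- With a WHERE clause on the partition key, the planner should prune
-- every partition except measurement_y2006m02.
EXPLAIN SELECT count(*) FROM measurement
WHERE logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01';

-- Partition pruning is governed by this setting (on by default):
SHOW enable_partition_pruning;
```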

PostgreSQL uses table inheritance in the background to implement partitioning. If we require more control over the partition key, with more constraints and rules around it, inheritance-based partitioning works best. We will explore that in a different article.



Support Gayan Hewa by becoming a sponsor. Any amount is appreciated!