Your database is ticking along nicely - until a simple DELETE brings it to its knees. What went wrong? While we tend to focus on optimizing SELECT and INSERT operations, we often overlook the hidden complexities of DELETE. Yet, removing unnecessary data is just as critical. Outdated or irrelevant data can bloat your database, degrade performance, and make maintenance a nightmare. Worse, retaining some types of data without valid justification might even lead to compliance issues.

At first glance, the DELETE command seems straightforward. Even the PostgreSQL documentation provides simple examples like:

DELETE FROM films WHERE kind <> 'Musical';
DELETE FROM films;

These queries might work effortlessly on your development machine, where only a few hundred records exist. But what happens when you try running a similar DELETE in production, where datasets are orders of magnitude larger?

In this article, we’ll uncover why DELETE operations demand careful consideration and explore how to handle them effectively.

What really happens when you DELETE data?

A DELETE query may look like a single step, but once it is executed, a series of intricate operations unfolds:

  1. Row Identification: Similar to a SELECT operation, the query identifies rows visible to the current transaction (considering MVCC) and checks for locks.
  2. Lock Acquisition: The database acquires row-level exclusive locks to prevent other operations on the targeted rows.
  3. BEFORE DELETE trigger: If a BEFORE DELETE trigger is defined, it is executed at this point.
  4. Marking Rows as Deleted: Instead of being physically removed, the rows are marked as deleted in the current transaction, rendering them invisible to future queries (depending on the transaction isolation level). If the table stores large values out of line, the associated TOAST table has to be involved too.
  5. Index Updates: The corresponding index entries are also marked for deletion (if applicable).
  6. Cascaded Actions: Cascading operations, such as ON DELETE CASCADE, are performed on related tables.
  7. AFTER DELETE Trigger: If an AFTER DELETE trigger is defined, it is executed.
  8. Write-Ahead Log (WAL): Changes are recorded in the WAL - first at the row level, followed by index-level updates.

Only when the transaction is committed do these changes become permanent and visible to transactions starting afterward. However, even at this point, the data is not physically removed. This is how bloat is created.

Until the autovacuum process or a manual VACUUM operation reclaims the space, the “deleted” data remains. This leftover data contributes to bloat, which can degrade query performance over time.
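
You can actually observe this: once the DELETE commits (and before any vacuum runs), the dead tuples show up in the statistics views. A quick check against the films table from the earlier example:

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'films';  -- dead tuples linger here until (auto)vacuum reclaims them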

The key question now is whether DELETEs are truly the hardest operation we can subject our database to. The answer? Quite possibly. While UPDATEs come close in complexity, they’re typically designed in ways that make them less challenging:

  • UPDATEs usually modify only a limited number of columns, reducing the number of index updates required.
  • Not all UPDATEs trigger a full-row (COLD) update, where the old row is marked as dead and a new row is created. With careful table and query design, you can minimize these cases. HOT (Heap-Only Tuple) updates, for instance, are easier to achieve with fixed-length columns (the query after this list shows how often your tables manage them).
  • Unlike DELETEs, UPDATEs rarely involve cascaded actions - ON UPDATE CASCADE only fires when the referenced key columns themselves change - and they only run the triggers that are explicitly defined.
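
As an aside, pg_stat_user_tables also tracks how many UPDATEs ended up as HOT updates, so you can check whether your design actually achieves them:

SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;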

And then comes AUTOVACUUM

When autovacuum kicks in (and you really want it to kick in) - typically triggered by thresholds for the number of dead tuples or changes to the table - a significant amount of work is required to clean them up. Let’s break it down step by step:

  • The process begins by scanning the table. While it’s not always a full table scan, the autovacuum checks the visibility map and pages where dead tuples might exist. This can happen incrementally, allowing even large tables to be processed - provided your autovacuum settings allow it to run frequently enough.
  • Each tuple is checked to ensure it’s no longer visible to any active or pending transactions.
  • Dead tuples that pass the visibility check are physically removed from the table.
  • Corresponding index entries for the removed tuples are deleted from the indexes.
  • The now-empty space is marked for reuse in future INSERT or UPDATE operations.
  • Table statistics are updated to reflect the current state, helping the query planner make better decisions.
  • The changes, including tuple removals and index updates, are logged in the Write-Ahead Log (WAL) for durability and replication.
  • If the table has TOASTed data (large out-of-line values), the associated TOAST tables are processed as well.
  • The visibility map is updated to mark cleaned pages as fully visible again.
  • Autovacuum resets the statistics counters that determine when the next vacuum operation should occur.

This work is throttled by the vacuum cost mechanism: each time the process reaches the configured cost limit (autovacuum_vacuum_cost_limit in the case of autovacuum), it pauses briefly before continuing. While autovacuum helps keep your database in check, it’s clear that reclaiming dead tuples is no small task - and it underscores why DELETE operations can have lasting effects on database performance.
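
For reference, with default settings a table qualifies for autovacuum once its dead tuples exceed autovacuum_vacuum_threshold (50) plus autovacuum_vacuum_scale_factor (0.2) times the estimated row count. A rough way to see which tables are getting close:

SELECT relname,
       n_dead_tup,
       -- approximation: uses n_live_tup instead of pg_class.reltuples
       50 + 0.2 * n_live_tup AS approx_autovacuum_threshold
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;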

While autovacuum might sound like bad news, without it your database would quickly become bloated with dead tuples, leading to degraded performance, slower queries, increased storage usage, and even the potential for out-of-disk errors, as unused space could never be reclaimed.

Further considerations

For what seems like a simple DELETE, a surprising amount of work has already taken place, but the complexity doesn’t stop there. DELETE operations can introduce additional challenges, particularly when replication, resource contention, or the size of the operation comes into play.

In environments that replicate to a hot standby or other replicas, DELETEs become more time-sensitive. With synchronous replication, the transaction cannot complete until the corresponding WAL (Write-Ahead Log) records have been written to disk on the standby - a fundamental requirement for maintaining data consistency in high-availability setups, where at least one synchronous standby is typically involved. Additionally, if the standby is actively serving read operations, it must account for the DELETEs before confirming the changes, potentially introducing further delays.
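
If you want to see how far your standbys fall behind during such an operation, PostgreSQL 10 and later expose the replication lag directly on the primary:

SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;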

The size of the DELETE operation also plays a critical role. Small DELETEs, such as removing a single row, tend to have minimal impact. However, as the size of the operation grows, so does the volume of WAL records generated. Large DELETEs can overwhelm the system, slowing down transactions and straining the replication process. Standby servers must work harder to process the incoming WAL stream, which can bottleneck performance if their throughput is insufficient.

Resource contention adds yet another layer of complexity, particularly for large DELETEs. Generating WAL records, handling regular transactional workloads, and running background processes can collectively push the system towards I/O saturation. This creates competition for CPU and memory resources, leading to slower operations across the board.

Finally, once the data is marked for deletion, the autovacuum process must eventually step in to physically remove it. This introduces its own set of challenges, as autovacuum must deal with the same resource contention and I/O demands, compounding the overall impact of the initial DELETE operation.

Soft-deletes are not the solution

Soft-deletes might seem like an easy way to sidestep the complexities of traditional DELETE operations. After all, updating a deleted_at field is straightforward and unlikely to trigger a COLD update. However, soft-deletes are not a true mechanism for data removal, and they come with their own set of complications.

While soft-deletes can provide a simple way to implement “undo” functionality, they raise serious questions about data consistency. For instance, do you only mark the main entity as deleted, or do you also cascade the status to all related records in referenced tables? Failing to cascade properly can leave your database in an inconsistent state, making it difficult to maintain data integrity.

Soft-deletes also require consideration in your application logic. Every query must include appropriate filters to exclude “deleted” rows, which can complicate query design and increase the risk of oversights. One missed filter could expose data that should no longer be visible, leading to potential security or business logic issues.
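
To illustrate, here is a sketch assuming a hypothetical orders table with a deleted_at column:

-- hypothetical schema: orders(customer_id, deleted_at, ...)
-- Every live-data query has to remember the filter:
SELECT *
FROM orders
WHERE customer_id = 42
  AND deleted_at IS NULL;  -- miss this once and "deleted" rows leak out

-- A partial index at least keeps these filtered lookups fast:
CREATE INDEX orders_live_customer_idx
    ON orders (customer_id)
    WHERE deleted_at IS NULL;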

Finally, soft-deletes don’t solve the problem - they merely postpone it. The data is still in your database, consuming storage and potentially contributing to performance degradation over time. Sooner or later, you’ll need to deal with the actual removal of this data, bringing you back to the same challenges that DELETEs pose in the first place.

At the time of writing this article, we can only speculate how much the support for temporal PRIMARY KEY and UNIQUE constraints in PostgreSQL 18 will shift this balance in the future. But given the complexity of the feature, I wouldn’t bet on it just yet.

Batching is the answer

Giving PostgreSQL time to process and catch up with large-scale changes is critical when dealing with operations like DELETEs. The core issue here is the duration and magnitude of the transaction. The shorter the transaction and the fewer changes made, the better PostgreSQL can manage and reconcile those changes. This principle is universal across all database operations and underscores the importance of minimizing the impact of individual transactions.

While you can optimize certain aspects, like row identification (using indexes, clustering, or similar techniques), larger datasets demand a more strategic approach - batching. For example, deleting 1 million rows in a single transaction is a textbook case of what not to do. Instead, splitting the operation into smaller batches, such as deleting 10,000 rows across 100 iterations, is far more effective.

Will this method be faster than performing one massive DELETE? Likely not, especially if you include a wait time between batches to allow PostgreSQL to handle other workloads. However, the trade-off is worthwhile. By batching, you give PostgreSQL more breathing room to manage changes without overwhelming regular transactional workloads - unless, of course, you’ve scheduled dedicated maintenance time for the operation.

How to Batch DELETEs

The easiest way to batch DELETEs is to use a subquery or a Common Table Expression (CTE) to limit the number of rows affected in each iteration. For example, instead of executing a bulk DELETE like this:

DELETE FROM films WHERE kind <> 'Musical';

You can break the operation into smaller chunks. Using a query like the following, you can repeatedly delete rows in manageable batches (for instance, using \watch in psql to automate iterations):

DELETE FROM films
WHERE ctid IN (
    SELECT ctid FROM films
    WHERE kind <> 'Musical'
    LIMIT 250
);

The ctid used here is a PostgreSQL system column that identifies the physical location of each row version, which makes it a convenient unique row identifier. Since DELETE doesn’t accept a LIMIT clause in PostgreSQL, selecting ctid values in a subquery is what caps the number of rows affected in each iteration.

If you don’t feel comfortable with ctid (which might deserve an article of its own), you can use a regular lookup by primary key with LIMIT, as sketched below.
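
A sketch of that variant, assuming films has an integer primary key film_id:

DELETE FROM films
WHERE film_id IN (
    SELECT film_id FROM films  -- film_id: assumed primary key
    WHERE kind <> 'Musical'
    ORDER BY film_id
    LIMIT 250
);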

Planning for Autovacuum

Batching alone doesn’t directly solve the issue of autovacuum catching up with the changes. You’ll need to plan for that separately. Adjusting autovacuum settings or triggering manual VACUUM and VACUUM ANALYZE runs can help manage bloat created during the DELETE process. However, disabling autovacuum is rarely advisable unless you’ve carefully planned for manual maintenance throughout the batch operations. Skipping this step risks leaving behind performance-impacting bloat that will require even more effort to address later.
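
Both approaches fit in a couple of lines; as an illustration, sticking with the films table:

-- Make autovacuum react sooner on this delete-heavy table
ALTER TABLE films SET (autovacuum_vacuum_scale_factor = 0.05);

-- Or reclaim space and refresh statistics manually between batches
VACUUM (ANALYZE) films;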

Drop whole ranges of data with partitioning

Data that is naturally segmented - for example, by creation time - is an excellent candidate for removal through partitioning. Partitioning allows you to bypass DELETE operations altogether by simply dropping or truncating the relevant partitions. This approach is far more efficient and avoids the overhead of scanning, locking, and marking rows as deleted, effectively eliminating the bloat problem.
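
A minimal sketch, assuming a hypothetical events table range-partitioned by month:

-- hypothetical table for illustration
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- A whole month of data disappears without a single row-level DELETE:
DROP TABLE events_2024_01;
-- ...or detach it first if you want to keep it around as an archive:
-- ALTER TABLE events DETACH PARTITION events_2024_01;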

While partitioning adds some complexity to schema design and query planning, it can provide significant performance benefits for DELETE-heavy workloads, especially when combined with automated partition management.

Conclusion

DELETE operations are often a source of unpleasant surprises - not just by affecting performance and creating bloat, but by striking back when we least expect it. To handle them effectively, focus on strategies such as batching, monitoring autovacuum, or leveraging partitioning for large datasets. By considering DELETE operations during schema design, you can maintain an efficient database, reduce maintenance headaches, and ensure it continues to run smoothly as your data grows.