Beyond Simple Upserts with MERGE in PostgreSQL

Understanding how comfortable someone is with databases and SQL often comes down to the features they use. In PostgreSQL, one such feature that distinguishes more advanced users is the MERGE command, introduced in version 15 and expanded in version 17 (in beta at the time of writing this article). Before MERGE, developers typically relied on INSERT ... ON CONFLICT DO UPDATE for upserts—a method introduced in PostgreSQL 9.5 that has since become a staple in many developers' toolkits.

While ON CONFLICT offers a straightforward solution for simple upsert scenarios, it can quickly become limiting as business logic grows in complexity. This is where the MERGE command excels. Introduced in the SQL:2003 standard, MERGE allows for more sophisticated data synchronisation tasks by combining multiple operations—such as conditional inserts, updates, and deletes—into a single, atomic statement.

In this article, we’ll explore the capabilities of the MERGE command, comparing it with traditional upsert methods and examining how it can streamline database operations. Through practical examples, we'll illustrate how MERGE can simplify even the most complex workflows, making it an indispensable tool for developers working with PostgreSQL.

This introduction sets the stage by highlighting the evolution from ON CONFLICT to MERGE, explains the context in which MERGE becomes valuable, and clearly outlines what the reader will gain from the article.

Hands-on Example: Managing a Score Points System for a Mobile Game

Imagine you’re managing a tournament score system for an online game, where players earn special status by participating in weekly tournaments. The simple business logic would be:

Players can earn score points by participating in the tournament.
Players must participate in all tournaments to maintain their score (i.e., maintain the streak).
Players who participate in a second consecutive tournament can achieve the status of veteran (if they reach 1,000 score points) or gain the status of star (if they reach 100 or more score points).

The score points are updated weekly, based on the outcome of the previous week's participation.

We will start with a simple database schema consisting of a table tournament_scores that tracks (as the name suggests) only active users.

CREATE TABLE tournament_scores (
    player_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    player_name TEXT UNIQUE NOT NULL,
    score INT NOT NULL DEFAULT 0,
    status TEXT NOT NULL DEFAULT 'newbie' CHECK (status IN ('newbie', 'veteran', 'star'))
);

And populate it with some sample data:

INSERT INTO tournament_scores (player_name, score, status)
VALUES 
    ('PlayerOne', 900, 'newbie'),   -- Regular player, close to Veteran promotion
    ('PlayerTwo', 1200, 'veteran'), -- Already a Veteran player
    ('PlayerThree', 300, 'newbie'); -- Regular player with a lower score

Upsert using `ON CONFLICT`

When activity data is received, there are multiple ways to process it. ON CONFLICT is used here for demonstration purposes.

INSERT INTO tournament_scores (player_name, score)
VALUES 
    ('PlayerOne', 50),    -- Add points for PlayerOne
    ('PlayerTwo', 120),   -- Add points for PlayerTwo
    ('PlayerFour', 70)    -- New player without an account, PlayerFour
ON CONFLICT (player_name) 
DO UPDATE SET
    score = tournament_scores.score + EXCLUDED.score,
    status = CASE WHEN (tournament_scores.score + EXCLUDED.score) > 1000 THEN 'veteran' ELSE 'newbie' END;

This performs the basic requirements, updating the scores and possibly evaluating the status for existing tournament users. To remove users who no longer participated (and hence broke the streak) and apply other conditional logic, you would need to use separate statements.

Handling Upserts with `MERGE`

Now, let’s introduce the MERGE command and implement the same logic as above.

MERGE INTO tournament_scores ts
USING (
    VALUES 
        ('PlayerOne', 50),    -- Add points for PlayerOne
        ('PlayerTwo', 120),   -- Add points for PlayerTwo
        ('PlayerFour', 70)    -- New player without an account, PlayerFour
) AS v(player_name, score_added)
ON ts.player_name = v.player_name
WHEN MATCHED THEN
    UPDATE SET
      score = ts.score + v.score_added,
      status = CASE WHEN (ts.score + v.score_added) > 1000 THEN 'veteran' ELSE 'newbie' END
WHEN NOT MATCHED THEN
    INSERT (player_name, score)
    VALUES (v.player_name, v.score_added);

Here, we define the table tournament_scores as the target and the list of VALUES as the source. Using a conditional clause, we define the logic for both matched and unmatched entries, giving us both INSERT and UPDATE paths.

The evaluation paths in this case are called when_clauses. The MERGE statement allows you to specify multiple clauses with different conditions, for example, allowing you to award users star status if they gain more than 100 score points within a given tournament.

MERGE INTO tournament_scores ts
USING (
    VALUES 
        ('PlayerOne', 50),    -- Add points for PlayerOne
        ('PlayerTwo', 120),   -- Add points for PlayerTwo
        ('PlayerFour', 70)    -- New player without an account, PlayerFour
) AS v(player_name, score_added)
ON ts.player_name = v.player_name
WHEN MATCHED AND v.score_added > 100 THEN
    UPDATE SET score = ts.score + v.score_added, status = 'star'
WHEN MATCHED THEN
    UPDATE SET
      score = ts.score + v.score_added,
      status = CASE WHEN (ts.score + v.score_added) > 1000 THEN 'veteran' ELSE 'newbie' END
WHEN NOT MATCHED THEN
    INSERT (player_name, score)
    VALUES (v.player_name, v.score_added);

Here’s a summary of how it works:

The MERGE command evaluates the when_clauses in the order they are written.
If a row matches the condition specified in a clause, the action defined in that clause is performed, and the row is no longer eligible to be matched against subsequent when_clauses.

While we introduced multiple evaluation paths, the MERGE command in PostgreSQL goes beyond that and expands the WHEN NOT MATCHED clause, which effectively becomes WHEN NOT MATCHED [BY TARGET]. PostgreSQL allows you to specify WHEN NOT MATCHED BY SOURCE to perform the necessary merge statement for the data not present in the source table.

Handling DELETEs with `MERGE`

The functionality completely missing from regular upserts with ON CONFLICT is the ability to delete missing entries. In our sample scenario, we want to penalise players who broke their streak and did not participate in last week's tournament by effectively deleting their entries from the target table.

MERGE INTO tournament_scores ts
USING (
    VALUES 
        ('PlayerOne', 50),    -- Add points for PlayerOne
        ('PlayerTwo', 120),   -- Add points for PlayerTwo
        ('PlayerFour', 70)    -- New player without an account, PlayerFour
) AS v(player_name, score_added)
ON ts.player_name = v.player_name
WHEN MATCHED AND v.score_added > 100 THEN
    UPDATE SET score = ts.score + v.score_added, status = 'star'
WHEN MATCHED THEN
    UPDATE SET score = ts.score + v.score_added
WHEN NOT MATCHED THEN
    INSERT (player_name, score)
    VALUES (v.player_name, v.score_added)
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;

The introduction of the DELETE merge operation complements all possible MERGE outcomes:

The first is merge_update, applicable to WHEN MATCHED and WHEN NOT MATCHED clauses, consisting of regular UPDATE operations.
The second is merge_insert for the WHEN NOT MATCHED clause, allowing (as the name implies) data to be inserted.
The DELETE we introduce is part of merge_delete.

Technically, there's also the ability to specify DO NOTHING for all available when_clauses, similar to upserts using ON CONFLICT.

Using `MERGE` Output

Starting from PostgreSQL 17, the MERGE command has been extended to support RETURNING clauses to process merged data further. When an INSERT or UPDATE action is performed, the new values of the target table's columns are used. When a DELETE is performed, the old values of the target table's columns are used.

Conclusion

The MERGE command significantly enhances the ability to handle complex INSERT and UPDATE logic by allowing multiple operations within a single query while also capturing even the most intricate scenarios. In this article, we have explored the syntax and common use cases of MERGE. For further exploration, note that the source dataset is where you can perform any data transformations required, making MERGE particularly well-suited for ETL operations, data archival, cleanup tasks, and more.

Incorporating MERGE into your PostgreSQL toolkit can simplify database operations, reduce the risk of data inconsistencies, and streamline your

Hands-on Example: Managing a Score Points System for a Mobile Game#

Upsert using ON CONFLICT#

Handling Upserts with MERGE#

Handling DELETEs with MERGE#

Using MERGE Output#

Conclusion#