Designing a database schema, whether for a new application or a new feature, always raises a lot of questions. The choices you make can have a big impact on how well your database performs and how easy it is to maintain and scale. Whether you’re just getting started with PostgreSQL or consider yourself a seasoned pro, it’s easy to rely on old habits or outdated advice. In this article, I want to take a fresh look at one of those topics that often sparks debate: the use of ENUMs in PostgreSQL.

I have to admit, not so long ago, I would advise "don't use ENUMs" without giving it much thought. Relying only on random articles and some personal but outdated experience, this had been my go-to answer for years. And while ENUMs in PostgreSQL had several limitations in the (distant) past, support improved a long time ago.

The improvements are so long-standing that:

  • PostgreSQL 9.1 (released in 2011) introduced the option to add new values to ENUMs without a table rewrite. As far as I can tell, this limitation alone remains the biggest source of misconceptions about ENUMs.
  • Renaming of values was added in version 10 (2017).
  • The ability to ALTER TYPE ... ADD VALUE in a transaction block was introduced in PostgreSQL 12 (2019).

And PostgreSQL 17, upcoming at the time of writing this article, allows newly added values to be used within the same transaction block (previously not possible without an explicit commit).

Therefore, I want to correct even myself and say—let's give ENUMs another chance. This article will go into detail and help us correctly decide when it makes sense to use them or not.

How are ENUMs Implemented?

Every stored ENUM value occupies 4 bytes on disk. This is the size of the OID representing the actual ENUM value, which is stored as a row in the pg_enum table. You can see this for yourself by querying the table directly, although on a fresh database it contains no rows.

# select * from pg_catalog.pg_enum;

 oid | enumtypid | enumsortorder | enumlabel
-----+-----------+---------------+-----------
(0 rows)

But once you define a new ENUM data type, using:

CREATE TYPE order_status AS ENUM ('new', 'pending', 'processing', 'shipped', 'delivered');

You will get the actual OIDs.

# select * from pg_catalog.pg_enum;

  oid   | enumtypid | enumsortorder | enumlabel
--------+-----------+---------------+------------
 211018 |    211016 |             1 | new
 211020 |    211016 |             2 | pending
 211022 |    211016 |             3 | processing
 211024 |    211016 |             4 | shipped
 211026 |    211016 |             5 | delivered

The table clearly identifies both the type and its individual values, and also records the enumsortorder, which is the foundation for placing new values with BEFORE/AFTER.

ALTER TYPE order_status ADD VALUE 'cancelled' BEFORE 'shipped';

Resulting in:

  oid   | enumtypid | enumsortorder | enumlabel
--------+-----------+---------------+------------
 211018 |    211016 |             1 | new
 211020 |    211016 |             2 | pending
 211022 |    211016 |             3 | processing
 211024 |    211016 |             4 | shipped
 211026 |    211016 |             5 | delivered
 211027 |    211016 |           3.5 | cancelled

To confirm the actual storage size of the ENUM value, you can use a sample table, the previously defined type order_status, a single data row, and the pg_column_size function.

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    status order_status NOT NULL,
    order_date DATE NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO orders (status, order_date) VALUES
('new', '2024-09-01');

This gives you confirmation of the actual 4-byte storage.

# SELECT id, status, pg_column_size(status) AS status_size FROM orders;

 id | status | status_size
----+--------+-------------
  1 | new    |           4

Returning to the pg_enum table, the structure of the table provides other important clues:

  • enumlabel is of type name, which is effectively a 63-byte varchar for storing system identifiers.
  • The UNIQUE CONSTRAINT pg_enum_typid_label_index prevents us from creating two values with the same name.
  • However, labels are case-sensitive, allowing you to define new and NEW as two distinct values.
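A handy way to inspect the values of a single ENUM type is to join pg_enum to pg_type (a small sketch using the order_status type defined above):

```sql
SELECT e.enumlabel, e.enumsortorder
FROM pg_catalog.pg_enum e
JOIN pg_catalog.pg_type t ON t.oid = e.enumtypid
WHERE t.typname = 'order_status'
ORDER BY e.enumsortorder;
```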

Life with ENUMs

Now that we understand how ENUMs are implemented, let's review the lifecycle of such a data type in a typical database.

As demonstrated above, creation of new ENUMs is simple, and since version 12 (and expanded in 17), it's possible even in transaction blocks, making it easier to work with database migration tools without needing to explicitly worry about transaction management.
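For example, a migration like the following can run inside an ordinary transaction block since PostgreSQL 12 (a minimal sketch using the order_status type from above; the 'returned' value is hypothetical):

```sql
BEGIN;

-- Works in a transaction block since PostgreSQL 12;
-- since PostgreSQL 17, the new value can even be used before COMMIT.
ALTER TYPE order_status ADD VALUE IF NOT EXISTS 'returned' AFTER 'delivered';

COMMIT;
```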

Adding new values does not require a table rewrite and can be done without further hesitation. You can also safely rename existing values. Similarly, you can use ENUMs in arrays, indexes, or compare them as needed (based on their order).

ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ]

ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value
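Because every ENUM value carries a sort order, comparisons and ORDER BY work directly on the column (a small sketch against the orders table defined above):

```sql
-- Rows whose status comes before 'shipped' in the ENUM order:
SELECT * FROM orders WHERE status < 'shipped';

-- Sorting follows enumsortorder, not alphabetical order:
SELECT * FROM orders ORDER BY status;
```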

It's also important to mention that ENUMs allow you to use a default value, like this:

status order_status NOT NULL DEFAULT 'new'

Unfortunately, this is where the flexibility ends at the moment. There's no easy way to remove or re-order existing values (please, don't modify pg_enum, or any system tables, for that matter). The topic of removing an existing ENUM value can easily open up a whole discussion about the best way to 'deprecate' and drop a value over time. The only suitable way to perform these operations is by creating a new type and altering the column. Please be aware that the caveats described in How not to change PostgreSQL column type apply in this scenario.
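As a sketch of that last option, here is the usual pattern, assuming we want to drop the 'processing' value (any rows still holding it must be migrated first, or the cast will fail):

```sql
-- Create a replacement type without the value to be removed.
CREATE TYPE order_status_new AS ENUM ('new', 'pending', 'shipped', 'delivered');

-- Rewrite the column (table rewrite, ACCESS EXCLUSIVE lock).
ALTER TABLE orders
    ALTER COLUMN status TYPE order_status_new
    USING status::text::order_status_new;

-- Swap the new type in under the original name.
DROP TYPE order_status;
ALTER TYPE order_status_new RENAME TO order_status;
```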

Using CHECK Constraints as an Alternative to ENUMs

While ENUMs are a powerful tool in PostgreSQL for enforcing a fixed set of values in a column, they come with the limitations mentioned above. An alternative approach to achieving similar functionality is to use a CHECK constraint with a TEXT column. This method offers more flexibility when managing the allowed values, especially in scenarios where the set of valid values might change frequently.

A CHECK constraint can be applied to a column to enforce that its value must be one of a predefined set of values. Here’s how you can define a table using a CHECK constraint as an alternative to an ENUM:

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    status TEXT NOT NULL,
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    CHECK (status IN ('new', 'pending', 'processing', 'shipped', 'delivered'))
);

Modification of the constraints is possible, but you will always have to DROP CONSTRAINT and then ADD CONSTRAINT.

ALTER TABLE orders DROP CONSTRAINT orders_status_check;

ALTER TABLE orders ADD CONSTRAINT orders_status_check CHECK (status IN ('new', 'pending', 'processing', 'shipped', 'delivered', 'returned'));

While this operation is not as 'heavyweight' as changing the data type, please be aware that an ACCESS EXCLUSIVE lock will be acquired for the entire table—meaning no other operations can be performed on the table. With a growing table size and increasing concurrency, this might still lead to application-level downtime.
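One way to shorten the lock window (a standard PostgreSQL technique, not specific to this example) is to add the new constraint as NOT VALID and validate it in a second step:

```sql
ALTER TABLE orders DROP CONSTRAINT orders_status_check;

-- NOT VALID skips checking existing rows, so the ACCESS EXCLUSIVE lock is held only briefly.
ALTER TABLE orders ADD CONSTRAINT orders_status_check
    CHECK (status IN ('new', 'pending', 'processing', 'shipped', 'delivered', 'returned'))
    NOT VALID;

-- Validation scans the table but only takes a SHARE UPDATE EXCLUSIVE lock,
-- so reads and writes can continue.
ALTER TABLE orders VALIDATE CONSTRAINT orders_status_check;
```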

Another limitation is the lack of direct support for sorting, meaning you will have to implement it manually in every query or abstract the sorting detail using a view.

SELECT id, status, order_date
FROM orders
ORDER BY
    ARRAY_POSITION(
        ARRAY['new', 'pending', 'processing', 'shipped', 'delivered', 'returned'],
        status
    );

Reference Tables as the Real Alternative

It might be surprising, but one of the motivations for using either ENUMs or CHECK constraints is to avoid additional JOINs. It's no surprise that JOINs are a source of pain for many people who struggle with SQL, and therefore they try to limit their use. While there might be negligible impact on performing such an operation, there are no other reasons not to use reference tables—it’s not storage (unless you over-optimise with smallint and extremely large datasets) and definitely not operational considerations.

In fact, reference tables are often better in most cases. They not only address all the use cases supported by both ENUMs and CHECK constraints, but they can also resolve the limitations of these methods.

Creating and Using a Reference Table

Let’s create a reference table similar to the ENUM data type we’ve discussed above:

CREATE TABLE order_statuses (
    status_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status_name TEXT NOT NULL
);

CREATE UNIQUE INDEX unique_status_name ON order_statuses (LOWER(status_name));

INSERT INTO order_statuses (status_name) VALUES
('new'),
('pending'),
('processing'),
('shipped'),
('delivered'),
('cancelled');

Updating the Orders Table to Use the Reference Table

To use the reference table, you’ll need to alter your existing orders table. This involves dropping the current status column if it’s using an ENUM or CHECK constraint, and replacing it with a foreign key that references the order_statuses table:

ALTER TABLE orders DROP COLUMN IF EXISTS status;

ALTER TABLE orders ADD COLUMN status_id INT,
ADD CONSTRAINT fk_order_status
FOREIGN KEY (status_id) REFERENCES order_statuses(status_id);
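Inserting a row then goes through a lookup (or through an id cached at the application level):

```sql
INSERT INTO orders (status_id, order_date)
SELECT status_id, '2024-09-01'
FROM order_statuses
WHERE status_name = 'new';
```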

Impact of Using Reference Tables

While reference tables do not provide ENUM ordering or the simplicity of CHECK constraints out of the box, they offer far greater flexibility. Here are some examples of what you can achieve with reference tables:

  • Custom Sorting Logic: You can implement specific sorting by adding an additional sort_order column to the order_statuses table. This allows you to easily change the order of statuses without altering the schema.
  • Deprecating Values: By adding a deleted_at column (or a similar field), you can deprecate certain status values. Combined with a trigger, you can prevent these deprecated values from being used in new data while maintaining historical records.
  • Renaming Values: Unlike ENUMs, renaming a status is straightforward - simply update the value in the order_statuses table.
  • Internationalisation: If your application needs to support multiple languages, you can extend the order_statuses table with additional columns for different languages or even create a separate lookup table that stores translations.
  • Additional Metadata: Reference tables can store extra information about each status, such as descriptions, colours (for UI purposes), or associated icons, which can be particularly useful in applications with rich user interfaces.

Having covered the advantages of reference tables, it's also important to mention that setting a default value for such a reference is not straightforward, certainly not in the obvious way ENUMs provide. Note that PostgreSQL does not allow a subquery in a DEFAULT expression, so you cannot write DEFAULT (SELECT ...). Instead, create the column without a default and set one per database using ALTER COLUMN ... SET DEFAULT with the known id (or a function wrapping the lookup):

ALTER TABLE orders
ALTER COLUMN status_id SET DEFAULT 1; -- assuming 1 is the status_id of 'new'

Performance Considerations: ENUMs, CHECK Constraints, and Reference Tables

Thanks to their internal representation as OIDs, ENUMs offer predictable performance and indexing strategies, often outperforming text-based CHECK constraints. Since ENUM comparisons rely on integers, they tend to be faster than string comparisons required for CHECK constraints. Reference tables, on the other hand, inherently require a JOIN, which introduces the overhead of accessing two tables and potentially working with multiple indexes.

While I initially planned to publish benchmarks comparing the performance of all three approaches, I decided not to include specific numbers. It’s true that ENUMs consistently performed the fastest, and reference tables were somewhat slower due to the necessary JOIN. However, as with most benchmarks, the scenarios were artificial. The actual performance difference, though measurable, is unlikely to be significant in practical applications. In any reasonably complex query, other factors are far more likely to impact performance than whether you use ENUMs or not.

Conclusion: Use ENUMs Where They Make Sense

As with many decisions in software development, the use of ENUMs comes down to "it depends." This article has provided clarity on how ENUMs work, where their limitations lie, and how to make an informed decision about when to use them.

To recap:

  • ENUMs are ideal for small, static sets of values that require strict type enforcement and minimal changes, preferably data types that are not directly tied to the business logic.
  • CHECK constraints offer more flexibility than ENUMs, making them a good choice when the allowed values might change, the data type remains simple, and there's no need for the extra metadata a reference table can carry.
  • Reference tables provide the most flexibility and scalability, making them the go-to choice for dynamic or complex value sets and when you want to avoid the limitations of ENUMs and CHECK constraints.

But in the end, no plan ever survives first contact with reality. It’s essential to revisit and reconsider your decisions as your application evolves, ensuring that your database schema continues to serve the needs of your project as effectively as possible.

And while it's not meant as advice, personally I'm still going to gravitate towards "don't use ENUMs". I have yet to see a value-based field that does not change over time.