Working with PostgreSQL, and virtually any database system, extends far beyond merely inserting and retrieving data. Many application and business processes, maintenance tasks, reporting, and orchestration tasks require the integration of a job scheduler. While third-party tools can drive automation, you can also automate the execution of predefined tasks directly within the database environment. Although system-level cron might be a starting point, the power of the database system lies in its ability to store all the necessary information alongside your data/schema. In this article, we will explore pg_cron and pg_timetable as two distinct PostgreSQL-specific tools for scheduled task automation.

The Many Roles of Job Scheduling

Usually, the first requirement to automate job execution is the optimisation of the PostgreSQL cluster and databases. Except in very low usage scenarios, routine maintenance is the foundation of all database deployments. Whether it is VACUUMing, index rebuilding, or updating statistics, these are tasks that you will eventually need to automate to maintain operational efficiency.

Job scheduling is also indispensable for maintaining data quality, particularly through operations like refreshing materialised views or batch removal of outdated data from the system. From there, it is just a step to reporting, which can involve the automation of generating operational and business reports.

I also believe that databases are an excellent place to coordinate business processes. Automating data flows between components, teams, and departments helps create agile systems. As mentioned above, there is no better place to store the description of such automations than alongside your data.

pg_cron: Automation's First Gear

Traditionally, the role of automation started with operating system tools like cron or Task Scheduler. That's where pg_cron comes in. It's a PostgreSQL extension that provides the simplicity and familiarity of cron's scheduling directly within the database environment.

Due to its dependency on a shared library (because of the use of the background worker), it requires a full cluster restart. Nevertheless, due to its popularity, it is available within most managed cloud environments.

# requires configuration update in postgresql.conf (or conf.d)
shared_preload_libraries = 'pg_cron'

# add extension as superuser 

As long as you are familiar with cron-like syntax, you can get started immediately by using commands like:

SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);

By default, pg_cron exposes its functionality in the postgres database (configurable), where it expects its metadata tables. Personally, I consider this a drawback, as it makes it less obvious to the casual DBA who might not be aware of the scheduling logic present.

In the same database, you can perform basic monitoring, for example, getting details of running and recently completed jobs:

SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 5;

There is no direct support to talk to other PostgreSQL clusters (you would have to facilitate this, for example, using Foreign Data Wrappers).

While cron-like syntax is beneficial for adoption, it is where pg_cron falls short, as it does not support advanced cases like task chaining, dependencies, and triggers.

Full Speed with pg_timetable

If pg_cron offers automation in first gear, pg_timetable is where you can go full speed ahead. It not only provides cron-like syntax but elevates it to a whole new level with task chaining, parameter support, multiple execution clients, enhanced scheduling, and much more.

The first difference you might notice is in its distribution. Timetable is not a PostgreSQL extension but a standalone binary (or available as a Docker image) that you have to configure and run. This immediately raises the bar in terms of the infrastructure it requires. On the other hand, not being distributed as an extension makes it compatible with all managed services by default (if you can get the process up and running).

While the basic syntax might be similar to pg_cron:

SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'SELECT public.my_func()');

this is not anywhere near the limits of pg_timetable. add_job is a helper function that creates a simple one-task chain. For task execution, it directly supports more options, including built-in tasks (like sending emails if properly configured, downloading files, sleeping, copying files, etc.), external commands, the ability to choose which client should execute the job, concurrency, and more.

In its full definition, add_job is quite powerful (see documentation for details):

SELECT timetable.add_job(
    job_name            => 'notify every minute',
    job_schedule        => '* * * * *',
    job_command         => 'SELECT pg_notify($1, $2)',
    job_parameters      => '["TT_CHANNEL", "Hello World!"]'::jsonb,
    job_kind            => 'SQL'::timetable.command_kind,
    job_client_name     => NULL,
    job_max_instances   => 1,
    job_live            => TRUE,
    job_self_destruct   => FALSE,
    job_ignore_errors   => TRUE
) AS chain_id;

The components of pg_timetable consist of the command (SQL/program or built-in), task controlling the configuration for the command execution (error handling, timeout, or database connection to use), and chain which can contain a number of tasks chained together.

The big difference comes in the scheduling options. While pg_cron is limited (as its name suggests) to cron-like syntax, pg_timetable uses it for simple use cases but offers much more. It supports schedules @every and @after, allowing repeated execution and breaking away from the limitations of cron notation as it can go down to custom intervals (including second intervals). Another case is @reboot for instances when the pg_timetable controller reconnects to the database.

The timetable setup manages own schema migrations and stores all the configuration in schema timetable (by default), where you can find the definitions of chains/tasks and relevant auditing information (logs).

And yes, if you have been paying attention, pg_timetable architecture allows for running tasks across multiple PostgreSQL clusters, making it an advanced orchestration tool. The database connection, which can be configured in-place or via a drop-in connection service file, can be set on a per-task basis.

With chains, the setup can be much more complex:

DO $$
  v_chain_id BIGINT;
  v_notify_task_id BIGINT;
    INSERT INTO timetable.chain (
    VALUES (
        'Generate Weekly Report',
        '5 4 * * 1',
    RETURNING chain_id INTO v_chain_id;

    INSERT INTO timetable.task (
    VALUES (
        1,                                    -- task_order
        'generate_report',                    -- task_name
        'SELECT generate_weekly_report();',   -- command
        NULL                                  -- database_connection

    INSERT INTO timetable.task (
    VALUES (
        2,                                   -- task_order (second task in the chain)
        'notify_management',                 -- task_name
        'SELECT pg_notify($1, $2)',          -- command
        'service=service_db'                 -- database_connection
    RETURNING task_id INTO v_notify_task_id;

    INSERT INTO timetable.task_parameter (task_id, order_id, value) 
        (v_notify_task_id, 1, 'management_notifications'),
        (v_notify_task_id, 2, 'Weekly report');
$$ LANGUAGE plpgsql;

The Comparison

When deciding between pg_cron and pg_timetable, it's essential to consider the specific needs of your use case, as both tools target different scenarios and scales of deployment. Here’s a comparison to help you decide which tool is more suitable for your requirements:

DistributionPostgreSQL extensionStandalone executable
SchedulingLimited (cron-like)Extensive (intervals, calendar, custom)
Job TypesSQLSQL, Built-in, Shell
Job ChainingNoYes
Error HandlingBasicIntermediate
Ease of UseEasyModerate
OrchestrationSingle ClusterAdvanced (multiple clusters)

When to Use pg_cron

  1. If your requirements are straightforward, such as running maintenance tasks, refreshing materialised views, or generating periodic reports using simple SQL commands, pg_cron is an excellent choice. Its cron-like syntax is familiar and easy to use.
  2. pg_cron is a PostgreSQL extension, making it easier to install and configure within your existing PostgreSQL environment. This makes it ideal for users who prefer minimal setup effort.
  3. When your environment is limited to a single PostgreSQL cluster, pg_cron is well-suited for the job. It does not support orchestration across multiple clusters, so it’s best used in environments where all operations are confined to one database cluster.
  4. If Basic Error Handling and Logging is sufficient, pg_cron provides the necessary functionalities without additional complexity.

When to Use pg_timetable

  1. If you need more advanced scheduling capabilities and task depedencies and chaining, such as task chaining, complex intervals, and custom execution times, pg_timetable is the better choice. It supports extensive scheduling options beyond the typical cron syntax.
  2. pg_timetable supports a variety of job types, including SQL commands, built-in tasks (like sending emails or downloading files), and shell commands. This makes it ideal for more complex automation needs that go beyond simple SQL execution.
  3. If your environment spans multiple PostgreSQL clusters, pg_timetable can handle advanced orchestration, allowing tasks to be executed across different clusters seamlessly.
  4. Intermediate Error Handling and Logging provides better insights and control over job executions.
  5. Although bit more complex to setup, pg_timetable works as standalone Service, which makes it suitable for environments where extensions cannot be installed directly.

Choosing between pg_cron and pg_timetable depends on your specific needs. For simpler, single-cluster tasks, pg_cron offers ease of use and straightforward setup. For more complex requirements involving advanced scheduling, task dependencies, and multi-cluster orchestration, pg_timetable provides a more powerful and flexible solution.

By understanding the strengths and limitations of each tool, you can make an informed decision that best suits your database automation needs.

PS: there's also pgAgent, but it is generally not used and does not seem to be actively maintained.