PostgreSQL performance for Laravel applications

Sebastian Ravenscroft
6 min readMay 10, 2021

--

When you run an SQL query the database uses a number of techniques in order to optimise your query and return the results as quickly as possible.

In this article we will run some experiments to get a better understanding of how to improve performance.

If you feel like coding along we will start out with a fresh install of Laravel 8 hooked up with a local PostgreSQL 13 database.

How can we seed dummy data?

The easiest way to create dummy data is using Laravel model factories. We can use the Faker package to generate realistic dummy data to fill the table.

Imagine we are working on software for a publication and need store information about thousands of articles.

First lets run a migration to create an articles table:

php artisan make:migration create_articles_table

Now here is our code for the migration:

Now run the migration and you will be able to see the table using a database manager such as TablePlus.

php artisan migrate

Now we can create a data model using this command:

php artisan make:model Article

Here’s the contents of the generated Article.php model.

Finally run a command to create a model factory which makes it easy to seed dummy data:

php artisan make:factory ArticleFactory

Again the system generates a new ArticleFactory.php file for us which should look like this:

Inside the definition method we need to specify the kind of data we should use to seed each column.

Our factory extends the Illuminate\Database\Eloquent\Factories\Factory.php class which gives us access to the Faker library.

Here is our updated code for the ArticleFactory.php class:

Now open up Tinker (the Artisan console) and you can seed the table with a single command. This will create 10,000 random entries in our articles table.

php artisan tinker

Article::factory()->count(10000)->create();

This can take a few seconds to run. Once it completes take another look at the articles table and you should find it full of dummy data.

This is great progress!

However we need more data. If we really want to investigate PostgreSQL performance under pressure then we need a table with millions of rows.

Artisan factories are powerful and they are perfect for running unit tests. However they are not particularly fast, and it would take about 15 minutes to generate a million rows.

How to seed millions of rows

We can achieve a faster result by creating a Seeder class that prepares an array of values and then inserts them into the database in bulk.

php artisan make:seeder ArticleSeeder

Open the new ArticleSeeder.php file and add the following code. Notice we had to break our insert into chunks of 1000 rows. If we just ran a single loop we would run out of memory in PHP.

Find the DatabaseSeeder.php file and add a call to our ArticleSeeder.

Now we can run our seeder from Artisan:

php artisan db:seed

In just a couple of minutes we have seeded our table with a million rows.

Reading from the Database

In the TablePlus SQL editor enter a select all command:

SELECT * FROM articles;

This took 5.002 seconds to run! The database had to read through every row of our table in order to return the data.

What if we don’t need all the data? Let’s try selecting just the a couple of fields.

SELECT id, publication_date, title FROM articles;

This was a modest improvement, on my laptop the query took 3.254s.

Selecting all our data will always be slow so we’re much better off limiting our selection. If we need to display a lot of data we can paginate the results in the UI.

Sequential Scans

When we run where queries Postgres can use a few different strategies to find our data.

A Sequential Scan is the most basic option. It reads through the table and simply filters out any rows that don’t match our condition.

Say we want to select all the articles written since the start of the year:

SELECT * FROM articles WHERE publication_date > ‘2021–01–01’;

This took 107ms to run. We can use the explain command to understand how Postgres plans to execute a query:

EXPLAIN SELECT * FROM articles WHERE publication_date > ‘2021–01–01’;

Instead of returning us the results, Postgres returns a query plan. In this case it chose to run a Parallel Sequential Scan which just means it shared the load over two processes to make it faster.

For each step Postgres describes a cost, which estimates how expensive that particular step will be. It predicts a cost of 0.00 to get up and running, and a cost of 46878.79 to complete the scan. These values are a score used to compare the relative cost of different steps and strategies.

Postgres also returns a rows value which is an estimate for the number of rows the query could return, and a width value which is an indication of how much data it will fetch from each row of the table.

Indexes and Bitmaps

Our ‘where’ query ran in 107ms which limits us to around 10 queries per second. If we expect to see a lot of traffic calling this query then we would need to make it faster.

We can add an index to the publication_date column so that Postgres can simply look up the results, rather than having to read through the whole table every time.

We can add an index on-the-fly using an SQL query:

CREATE INDEX ON articles (publication_date);

This took 526ms. Bear in mind that Postgres locks writing to the table whilst it creates the index. If you were working on a live system then you can create the index concurrently which doesn’t lock writes to the table.

CREATE INDEX CONCURRENTLY ON articles (publication_date);

Now we have our index we can check how it affects our performance. Calling the explain command again:

In our query plan the indentation conveys the order of execution with the inner commands running first.

This time Postgres can use our index and run a Bitmap Index Scan which is much faster — the relative cost is just 72.00 compared to 46878.79 for the sequential scan!

Once that’s complete Postgres still has to fetch our data. It calls a Bitmap Heap Scan on the articles table with a cost of 17083.46. This means Postgres is still visiting each page of data on the disk. However it visits each page only once, grabbing the relevant rows from each page.

This time the query ran in 36ms which is around three times faster.

How much does it cost to create an index?

Creating an index on a column makes it faster to find or filter results based on the value in that column.

However it creates more work when we write to the database. Whenever we add a row to the database Postgres needs to also update the index to include the new row.

Let’s try seeding our database again with the new index and see how much longer it takes to write a million rows.

First run a truncate command to delete the contents of the articles table:

TRUNCATE articles RESTART IDENTITY;

Restart identity will just reset the id column so the count starts from 1 again.

Now run the seeder again from your terminal:

php artisan db:seed

This time it took 78.40 seconds to write a million rows, compared with 76.00 seconds without the index. So we took a 3% hit on write performance, for a 3x gain in speed for that particular ‘where’ query.

Adding indexes does cost us some write performance, and also the disk space to store the index. When deciding which columns to index we need to consider a few things:

  • Whether we are writing (or updating) rows frequently
  • How often, and how quickly, we need to read data from the table
  • Which is most important to us — the write speed or the read speed?

--

--

Sebastian Ravenscroft
Sebastian Ravenscroft

No responses yet