Text Search With Trigrams in PostgreSQL – Part 2
In the previous blog post, we learned how to use trigrams for text searching and how they work behind the scenes in PostgreSQL. We also ran some queries and found the results. But that’s not the end. Let’s see how fast the text search is happening and look for ways to improve the query performance.
To see how much time a given query takes to execute, we can use the EXPLAIN ANALYZE command.
EXPLAIN ANALYZE rest of your SQL command goes here;
The following query analyzes how long it takes to find forenames related to “pery”.
EXPLAIN ANALYZE SELECT DISTINCT(forenames_normalized) FROM persons WHERE forenames_normalized %> 'pery';
As you can see, the above query takes 3467ms(see execution time in the second image) to complete. It is not a good performance. The execution time will only keep on increasing with an increase in the number of records.
Causes of Slow Performance
Now the query is executing slowly because of the following reasons:
- It iterates all the records in the database row by row to check for matching names.
- At each row, it needs to extract the trigrams of the column and compare them with the trigrams of the given word.
Two things are slowing down the search. The iteration of the table row by row and the calculation of trigrams that takes place at each iteration.
Can we pre-calculate the trigrams and store them somewhere? Can we reduce the number of linear searches that takes place? Yes, We can do that with the help of indices.
To follow along you can restore this dump file in your local databse.
Improving trigrams performance with the Indices
pg_trgm module provides two types of indices for trigrams, a GIN index, and a GIST index.
CREATE INDEX index_name ON table_name USING index_type(column_name ops_type)
You can create a GIN Index using the following command.
CREATE INDEX forenames_normalized_idx ON persons USING GIN(forenames_normalized gin_trgm_ops);
To create a GiST index use the following command.
CREATE INDEX forenames_normalized_idx ON persons USING GIST(forenames_normalized gist_trgm_ops);
Performance After Indexing
I have created a GIN index. The good thing about creating an index is it improves the speed of searching and provides trigram-based search support for the operators LIKE, ILIKE, ~, and ~*.
We will search for a person’s forename containing “mond”.
SELECT DISTINCT(forenames_normalized) from persons WHERE forenames_normalized % 'mond';
You can see in the above image the execution time went from 3000ms to 300ms after indexing. That’s about 10 times faster than before.
Let’s run some more queries to analyze the performance before and after indexing.
Search for forenames with “mond” using word similarity.
SELECT DISTINCT(forenames_normalized)from persons WHERE forenames_normalized %> 'mond';
Let’s search for forenames that end with ‘man’.
SELECT DISTINCT(forenames_normalized) FROM persons WHERE forenames_normalized ~ 'man$';
In the above queries, we saw huge performance improvement. The developer can decide which type of index(GIn or GIST) to use. Now we have understood how the indices can help improve the performance of text search.
In the third part of this series of blogs, we will see how we can implement the trigrams search functionality in a Rails Application.
To learn the basics of trigrams in Postgres(Previous Blog): https://engineering.rently.com/text-search-with-trigrams-in-postgresql/
To learn more about Postgres trigrams, have a look at this official documentation: https://www.postgresql.org/docs/current/pgtrgm.html