Have you ever wondered that there is always a missing piece when working with Views? Yes, you guessed it. It is indexing and specially composite indexing on underlying tables. Materialized views offer much more than indexing but comes with its own drawbacks as well. Let’s look back at SQL views first and then into materialized views.
Views and When to go for views?
Let’s consider the below schema which has users and orders table.
Consider a case where we need to get the total order amount across each location, then below will be our query.
Let’s say we will be using this query in many places and we have decided to make it as a view instead, so that our query looks clean and tidy. Then the derived view would like below.
Now we shall query using the view that we had just created.
Great! We had optimized a join query!!!!!!
Actually, no! What we have done is just the abstraction of the original query. In fact, the SQL view which we have created is gonna work like an ‘alias’ and nothing beyond in terms of performance. It will still execute the original query, whenever the view gets called and will have the same performance of the original query.
So what advantage SQL views bring in?
The answer is the characteristics of SQL views itself. Since this is an abstracted version of the actual query, it simplifies the way of using it. In terms of security, you can grant access to views instead of granting it to a table which hides sensitive columns to be accessed.
Everything looks great here. But how is this going to boost performance?
Let’s dive in, how materialized view works when it comes to performance.
Imagine that you are running a coffee shop and you need to serve coffee on demand. There are multiple ways of achieving this. Upon receiving the coffee order, get the desired amount of coffee powder, milk and sugar. Heat up the milk, add coffee powder, mix the sugar in it and you are done.
The other way is to get a coffee vending machine. Add milk, water and coffee seeds in the right bucket. Whenever you receive any number of coffee orders, all you have to do is to press the right button. Now the vending machine takes care of the entire process. If you notice, the vending machine is gonna do the same process as the first way, but it reduces the human work involved. You can compare this to SQL views where the view takes care of the query part and all we need to do is to call the view alone.
Then there is another way where you have a vending machine and instead of waiting for coffee orders, you just fill up coffee in a flask and keep it ready to serve. The difference here is that, whenever a coffee order comes in, you are not instructing the vending machine to prepare coffee, rather you are serving the pre-processed coffee that you already have. This is how a materialized view works.
A materialized view is a pre-processed result set which is stored in a physical table and is ready to serve without executing the underlying query. Since the query beneath the materialized view doesn’t execute every time, it boosts the performance significantly.
Creating materialized views
Creating a materialized view is like creating a SQL view with minor differences. Let’s take the same example above, then the materialized view query will look like below.
Awesome! Now your query is being served much faster. But hang on! Wouldn’t my materialized view’s data get outdated?
Updating materialized views - Default way
That’s right, materialized views don’t give you the latest data when the underlying tables get updated. So you may need to manually update the materialized view as below.
Under the hood, this wipes out the existing data in the materialized view and reprocesses the query, to get the updated data. In short, the existing materialized view gets dropped and creates a fresh set of data from scratch.
Updating materialized views - Concurrently
The default way would solve the problem of serving outdated data. But the database would acquire an exclusive lock until the entire refresh process gets completed. In other words, till the entire process gets completed, no read and write operations could be performed on the materialized view.
Alert! New problem!
This is when you may need concurrent refresh. This will allow you to read materialized views, though it is getting refreshed in parallel. Below is the way to refresh concurrently.
In order to refresh concurrently, you would need to create an unique index on materialized view. The created index should include all our records and shouldn’t be applied partially using WHERE clause.In our case, we shall use the id of users and orders table.
How does concurrent refresh work under the hood?
So the materialized view is refreshed using temporary tables which is well supported by the unique index which we had created earlier. Since this is achieved by deleting expired records and inserting the new ones, it can lead to dead tuples and eventually the table starts bloating up.
Materialized views and dead tuples
It is obvious that you may see a lot of dead tuples because of the way that materialized view works. VACUUM may remove the dead rows, but it cannot reduce the bloat. The other way is to use Vacuum (FULL) which would lock the view itself and block all read and write operations.
This can be measured using pgstattuple which is offered by postgres.
Let’s delete few rows, followed by a concurrent refresh and check the status of pgstattuple.
As we could see, a lot of dead tuples began to pop in. Even if we try to clean up via VACUUM it would remove the dead tuples but then it creates a lot of free space which is not returned to the OS.
Cool, but what if I need to change that view?
It is always a tricky part when you may require changes to view. The straightforward way would be to drop the existing materialized view and create a new one. Now the obvious question is, will it cause db down time? Yes, for sure it will.
One of the ways to solve this problem is using a temporary materialized view. The plan is simple.
This might also have a chance of bringing the table unavailable for sometime, but it is fairly less compared to the straightforward way. Since, all the query operations including joins, applying necessary indices and data retrieval is done before dropping of old view, it may tend to give less impact while switching up to a newer version.
Scenic - The Savior
When it comes to Rails, we have this wonderful gem called Scenic which comes in with a lot of support functionalities for developers. One can use this gem to generate materialized views like creating a migration.
Another awesome feature is, it works like a version control system where it maintains each version of the view. Let’s say you may need to change the existing materialized view query and all you have to do is to run the above generator again. Scenic senses that it already has a view named the same and it creates the second version now.
So, it is a wrong myth that a denormalized table will always give us the worst performance. In fact, materialized views is a type of denormalized table and yet, it serves the best performance than a traditional table or views. But it is prone to give outdated data if it is not refreshed from time to time and that is the price you pay for boosting up the performance.
Scenarios where you don’t always require the latest data or if the underlying data doesn’t get updated often, then materialized view is a knight to boost your performance. And do remember that, you may have to monitor VACUUM status of materialized view.
Happy coding!!! Cheers
Technical Lead at Rently