Why you shouldn’t index your temporary tables.

Good morning friends !! In today’s post we will see why we should not index our temporary tables. When you have a process that uses temp tables and you want to speed it up, it can be tempting to index the temp table to help get the job done more quickly. However, in most cases, not all, but in most, it’s a bad idea …

CREATE OR ALTER PROC dbo.usp_TempTable_Heap 
 @DisplayName NVARCHAR (40) AS
 BEGIN
 CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR (40));
 INSERT INTO #MyUsers (Id, DisplayName)
 SELECT Id, DisplayName
 FROM dbo.Users;
 SELECT Id, DisplayName FROM #MyUsers WHERE DisplayName = @DisplayName; END
 GO

The first statement loads about 10 million rows into a temporary table, and the second statement only extracts the rows that match what we are looking for. We will activate the time statistics to be able to have a rough idea of where SQL Server spends time: the first statement or the second:

SET STATISTICS TIME ON;
EXEC dbo.usp_TempTable_Heap N'Gpsos';

I don’t usually use time statistics here on the blog, so a quick explanation: on the Messages tab in SSMS, you get one line for each statement in the batch, plus a total:

SQL Server Execution Times:
 CPU time = 4453 ms, elapsed time = 4458 ms.
 SQL Server Execution Times:
 CPU time = 1031 ms, elapsed time = 1017 ms.
 SQL Server Execution Times:
 CPU time = 5484 ms, elapsed time = 5477 ms.

The first statement, loading the temporary table, took approximately 4.5 seconds.

The second statement, finding Gpsos, took about a second.

Could an index on DisplayName speed up the second query?

To find out, let’s add a new version of our stored procedure, this time one that creates an index on DisplayName before our data is loaded:

CREATE OR ALTER PROC dbo.usp_TempTable_IndexAddedBeforeLoad
 @DisplayName NVARCHAR (40) AS
 BEGIN
 CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR (40));
 CREATE INDEX DisplayName ON #MyUsers (DisplayName); / * THIS IS NEW * /
 INSERT INTO #MyUsers (Id, DisplayName)
 SELECT Id, DisplayName
 FROM dbo.Users;
 SELECT Id, DisplayName FROM #MyUsers WHERE DisplayName = @DisplayName; END
 GO

Now when we run the new query:

EXEC dbo.usp_TempTable_IndexAddedBeforeLoad N'Gpsos';

The weather statistics paint a gruesome picture:

SQL Server Execution Times:
 CPU time = 56594 ms, elapsed time = 56772 ms.
 SQL Server Execution Times:
 CPU time = 0 ms, elapsed time = 75 ms.
 SQL Server Execution Times:
 CPU time = 56594 ms, elapsed time = 56850 ms.

Sure, the second statement drops from 1,017 milliseconds to 75, but… who cares?!?! The additional overhead of creating the index is much, much greater, making the query take ten times longer overall.

Part of the problem is that our heap (table) needs to be loaded first, and then the data needs to be sorted by DisplayName, and then an index needs to be created on DisplayName. We can’t do both in parallel at the same time because the non-clustered index must be able to point to a specific row in the heap, and to do that we need its physical location.

Penalty fee. What about a clustered index?

We can reduce the overhead of the process by having only one structure to store the data. Instead of a heap plus a non-clustered index on DisplayName, we can simply define a unique structure for the temporary table: a clustered index on DisplayName. It can’t be a unique index because multiple users share the same DisplayName, but that’s okay. Here we go:

Why you shouldn’t index your temporary tables.

CREATE OR ALTER PROC dbo.usp_TempTable_ClusteredIndex
 @DisplayName NVARCHAR (40) AS
 BEGIN
 CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR (40));
 CREATE CLUSTERED INDEX DisplayName ON #MyUsers (DisplayName); / * THIS IS NEW * /
 INSERT INTO #MyUsers (Id, DisplayName)
 SELECT Id, DisplayName
 FROM dbo.Users;
 SELECT Id, DisplayName FROM #MyUsers WHERE DisplayName = @DisplayName; END
 GO

And when we run it:

Why you shouldn't index your temporary tables

At first it seems much faster than the last method:

SQL Server Execution Times:
 CPU time = 42107 ms, elapsed time = 18071 ms.
 SQL Server Execution Times:
 CPU time = 0 ms, elapsed time = 0 ms.
 SQL Server Execution Times:
 CPU time = 42107 ms, elapsed time = 18145 ms.

The execution time was reduced from 57 to 18 seconds, but there is a catch. A big part of the reason it’s faster is that the query now goes in parallel. Note that the CPU time is greater than the elapsed time; that’s your clue that the query was done in parallel across more CPU cores. Now we are going to have a CPU problem if multiple of these queries run simultaneously.

And it’s still not as fast as our original solution, the heap.

So how do we make temporary tables faster?

Just load them with the data you really need. When loading temporary tables, or any object, really, be ruthless about filtering ASAP.

If you’re only going to access the data once, leave it in a heap. Indexes make more sense when the temporary table is to be repeatedly reused in many statements that filter, join, or sort using the same keys.

If you want to know more about indexes, do not hesitate and visit our entry: https://www.gpsos.es/2021/01/indices-sql-server-2019/

More information in the official documentation: https://docs.microsoft.com/es-es/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15

If you don’t want to miss our publications, you can subscribe to our newsletter. With an email a month you will be informed. You have more info in: GPS Open Source News. Subscribe to our newsletter

Still do not know Query Performance ? Find out how it can help you in your Oracle environment. More information on their page LinkedIn .

Follow GPS on LinkedIn

Leave a Reply

Your email address will not be published. Required fields are marked *