Thursday, 26 July 2012

SQL Server Heap Index

A heap is a table without a clustered index. Heaps have one row in sys.partitions, with index_id = 0 for each partition used by the heap. By default, a heap has a single partition. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures; one in each partition.
Depending on the data types in the heap, each heap structure will have one or more allocation units to store and manage the data for a specific partition. At a minimum, each heap will have one IN_ROW_DATA allocation unit per partition. The heap will also have one LOB_DATA allocation unit per partition, if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition, if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization.
The column first_iam_page in the sys.system_internals_allocation_units system view points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition. SQL Server uses the IAM pages to move through the heap. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.

The sys.system_internals_allocation_units system view is reserved for Microsoft SQL Server internal use only. Future compatibility is not guaranteed.


Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, this means that serial heap scans progress sequentially through each file. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted.
The following illustration shows how the SQL Server Database Engine uses IAM pages to retrieve data rows in a single partition heap.


Unique and non-unique SQL Server indexes on a heap table

In the upcoming weblog postings I want to work out the differences between unique and non-unique indexes in SQL Server. I assume that you already know the concepts about clustered- and non clustered indexes and how they are used in SQL Server.
In the past I've done a lot of trainings and consulting regarding SQL Server performance tuning and it seems that some people doesn't know the differences and implications between unique and non-unique indexes. And as you will see in the upcoming postings there are really big differences how SQL Server stores those two variants that impact the size and the efficiency of your indexes.
Let's start today with unique and non unique non clustered indexes on a table without a clustered index, a so-called heap table in SQL Server. The following listing shows how to create our test table and populate it with 80.000 records. Each record needs 400 bytes, therefore SQL Server can put 20 records on each data page. This means that our heap table contains 4.000 data pages and 1 IAM page.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes -- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24 CREATE TABLE CustomersHeap ( CustomerID INT NOT NULL, CustomerName CHAR(100) NOT NULL, CustomerAddress CHAR(100) NOT NULL, Comments CHAR(189) NOT NULL ) GO
-- Insert 80.000 records DECLARE @i INT = 1 WHILE (@i <= 80000) BEGIN INSERT INTO CustomersHeap VALUES ( @i, 'CustomerName' + CAST(@i AS CHAR), 'CustomerAddress' + CAST(@i AS CHAR), 'Comments' + CAST(@i AS CHAR) )
SET @i += 1 END GO
-- Retrieve physical information about the heap table SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID('NonClusteredIndexStructureHeap'), OBJECT_ID('CustomersHeap'), NULL, NULL, 'DETAILED' ) GO

After the creation of the heap table and the data loading, you can now define a unique and non-unique non-clustered index on the column CustomerID of our heap table. We will define both indexes on the same column so that we can analyze the differences between unique- and non-unique non-clustered indexes.

-- Create a unique non clustered index CREATE UNIQUE NONCLUSTERED INDEX IDX_UniqueNCI_CustomerID ON CustomersHeap(CustomerID) GO
-- Create a non-unique non clustered index CREATE NONCLUSTERED INDEX IDX_NonUniqueNCI_CustomerID ON CustomersHeap(CustomerID) GO 

If you want to define a unique non-clustered index on a column that doesn't contain unique data, you will get back an error message from SQL Server. Important to know is that SQL Server creates a non-unique non-clustered index if you don't specify the UNIQUE property when creating a non-clustered index. So by default you will always get a non-unique non-clustered index!
After the creation of both indexes you can analyze their size, their index depth, their size etc. with the DMV sys.dm_db_index_physical_stats. You can to pass in as the 3rd parameter the index-id. The IDs of all non-clustered indexes starts at 2, therefore the first non-clustered index gets the ID 2 and the second one the ID 3

-- Retrieve physical information about the unique non-clustered index SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID('NonClusteredIndexStructureHeap'), OBJECT_ID('CustomersHeap'), 2, NULL, 'DETAILED' ) GO
-- Retrieve physical information about the non-unique non-clustered index SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID('NonClusteredIndexStructureHeap'), OBJECT_ID('CustomersHeap'), 3, NULL, 'DETAILED' ) GO

As you can see from both outputs, the index root page of the unique non-clustered index is occupied of around 24%, where the index root page of the non-unique non-clustered index is occupied of around 39%, so there must be a difference in the storage format of unique/non-unique non-clustered indexes on a heap table! In the next step we create a simple helper table that stores the output of the DBCC IND command. The structure of this helper table is directly taken from the excellent book SQL Server 2008 Internals. 

-- Create a helper table CREATE TABLE sp_table_pages ( PageFID TINYINT,PagePID INT, IAMFID TINYINT, IAMPID INT, ObjectID INT, IndexID TINYINT, PartitionNumber TINYINT, PartitionID BIGINT, iam_chain_type VARCHAR(30), PageType TINYINT, IndexLevel TINYINT, NextPageFID TINYINT, NextPagePID INT, PrevPageFID TINYINT, PrevPagePID INT, PRIMARY KEY (PageFID, PagePID) ) GO

After the creation of this helper table we can dump out all pages that are belonging to our non-clustered indexes to this helper table with the following two calls to DBCC INC in combination with the INSERT INTO statement

-- Write everything in a table for further analysis INSERT INTO sp_table_pages EXEC('DBCC IND(NonClusteredIndexStructureHeap, CustomersHeap, 2)') GO
-- Write everything in a table for further analysis INSERT INTO sp_table_pages EXEC('DBCC IND(NonClusteredIndexStructureHeap, CustomersHeap, 3)') GO


Now we can start analyzing our non-clustered indexes by using the undocumented DBCC PAGE command. To get some information back from DBCC PAGE you have to enable the flag 3604 of DBCC:

DBCC TRACEON(3604) GO

Let's dump out the index root page of our unique non-clustered index by the following command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 3) GO

This will result in the following result in SQL Server Management Studio:


As you can see from this screenshot SQL Server stores the child page of the B-tree where the minimum key of the non-clustered index is located. The child page 4161 contains for example the record with the minimum key of 540 up to the maximum key of 1078. When you dump out the index root page with the dump option 1 you get the byte by byte representation of all index records on the index root page:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 1) GO

SQL Server needs here 11 bytes for storing an index row. These 11 bytes are storing the following information:
  • 1 byte: Status Bits
  • 4 bytes: Customer ID, like 540
  • 4 bytes: child PageID, like 4161
  • 2 bytes: FileID, like 1
As you can see it's up to the length of the non-clustered key how long an index row is. This also means that SQL Server is able to store more index rows on an index page if you choose a smaller non-clustered key. If you choose for example a CHAR(100) as a non-clustered index key, then SQL Server needs more index pages for your non-clustered index, which is not so efficient as using a smaller index key. The T-SQL script enclosed to this posting shows you how you can decode those bytes from the hexadecimal representation.
Finally you can dump out the child page 4161, which is located on the leaf-level of the non-clustered index.

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4161, 3) GO

As you can see from the figure, SQL Server stores for each index key on which data page and on which slot the corresponding record is located. Because we have not defined a clustered index on our table, SQL Server uses here the RID (Row Identifier) to point to the correct record on the data page. Index pages on the leaf-level on a heap table are different from leaf-level index pages defined on a clustered table (a table that contains a clustered index).When you dump out the leaf-level index page of the non-clustered index you can see that SQL Server needs 13 bytes per index row:
  • 1 byte: Status Bits
  • 4 bytes: CustomerID, like 540
  • 4 bytes: PageID, like 178,
  • 2 bytes: FileID, like 1
  • 2 bytes: Slot number, like 19
Finally with this information in your hand, it is very easy to locate the correct record on the data page, because you know the PageID, FileID, and also the slot number where the record on the data page is located. Easy, isn't it?
Let's move on now to non-unique non-clustered indexes. Earlier we have already created such an index, which gets the index-id of 3 from SQL Server, because it's the second non-clustered index we have defined. In my case the index root page of the non-unique non-clustered index is located on page 4264, therefore I dump it out with the following command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4264, 3) GO


But wait! Now the result from DBCC PAGE on the root index page on a non-unique non-clustered index is different! As you can see SQL Server returns here an additional column named "HEAP RID (key)". The value in this column is used to make your non-unique non-clustered index unique. The HEAP RID column uses 8 additional bytes in your index row, which encodes the following information that are granted to be unique on a heap table:
  • 4 bytes: PageID, like 178
  • 2 bytes: FileID, like 1
  • 2 bytes: Slot number, like 19
The overead of a non-unique non-clustered index on a heap table costs you 8 additional bytes per index row - on all index levels, expect the leaf-level, because SQL Server stores here always the HEAP RID as you have seen previously! So please keep this 8 bytes of additional index record overhead in mind, when you create non-clustered indexed that are NOT unique! And as I have said earlier, they are NOT unique by default!!!
In this example your non-unique non-clustered index is about 2 times bigger than the unique non-clustered index, because the unique index needs 11 bytes and the non-unique index needs 19 bytes (overhead of 8 bytes). When you look back to the output of the DMV sys.dm_db_index_physical_stats you can see that the index root page of the unique non-clustered index has a page space usage of around 24% where the index root page of the non-unique non-clustered index has a page space usage of around 39%. This will make a big difference on large non-clustered indexes!




1 comment: