Tuesday 26 June 2012

Column Store Index in SQL Server 2012

A Column Store Index stores a single column in a separate SQL Server Page. As opposed to Row Store Index mentioned above where collection of rows reside in a Page, in a Column Store Index, a column is allocated a separate Page.

Column Store Indexes are designed for Business-Intelligence (BI) queries in Online Analytical Processing (OLAP) environment where you have to play with a huge volume of data. In such BI requests, data often needs to be queried as filtered columns from a data warehouse.

The question now arises what are the benefits of allocating a separate Page to each column? Here are the benefits:
(1) Only required pages can be fetched from the disk resulting in improved performance of query processing. As each column can be accessed independently, IO is reduced.
(2) Column Store Index technology makes use of compression algorithm to eliminate redundancy in column values.
(3) Frequently accessed columns remain in memory.
The Column Store Index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.

Limitations:


  • At this point of time, Column Store Indexes are read-only in nature and therefore DML statements are not possible in regular OLTP environment.
  • Computed columns cannot be part of a Column Store Index.
  • Column Store Index cannot be created on indexed views.

Creating a Column Store Index

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_CustomerOrders_ColumnStore] ON [CustomerOrders] (TransactionDate, StockQty, ItemID)

Performance Test

AdventureWorks sample database is used for performing tests.

--Create the Test Table
USE [AdventureWorks2008R2]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test_Person](
 [BusinessEntityID] [int] NOT NULL,
 [PersonType] [nchar](2) NOT NULL,
 [NameStyle] [dbo].[NameStyle] NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] [dbo].[Name] NOT NULL,
 [MiddleName] [dbo].[Name] NULL,
 [LastName] [dbo].[Name] NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [EmailPromotion] [int] NOT NULL,
 [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
 [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- We Populated this table with the Data Stored in Table Person.Person.
-- As we need Plenty of data so we ran the loop 100 times.
INSERT INTO [dbo].[Test_Person] 
SELECT P1.*
FROM Person.Person P1
GO 100
-- At this point we have 1,997,200 rows in the table.
-- Create Clustered Index  on Coloun [BusinessEntityID] 
CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
( [BusinessEntityID])
GO
-- Creating Non - CLustered Index on 3 Columns
CREATE NONCLUSTERED INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

-- Creating Non - CLustered  ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.  

Query Without ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

We have used the OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint to not to use the ColumnStore Index this time.  

Query With ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]


Here are the Actual Execution Plans for both queries:



We can see the cost when using the NonClustered Index is 59 % while using the ColumnStore index is 13%.
Now if we hover the mouse over the Index Scans we can see details for these operations. The below is a comparison:


It is clear from the results that the query performs extremely fast after creating the ColumnStore Index as the column needed for the query is stored in the same page and the query does not have to go through every single page to read these columns.

Performing INSERT, DELETE or UPDATE Operations

We cannot perform DML ( Insert\ Update \ Delete ) operations on a table having a ColumnStore Index, because this puts the data in a Read Only mode. So one big advantage of using this feature is a Data Warehouse where most operations are read only.
For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:

Msg 35330, Level 15, State 1, Line 1
DELETE statement failed because data cannot be updated
in a table with a columnstore index. Consider disabling the
columnstore index before issuing the DELETE statement,
then rebuilding the columnstore index after DELETE is complete.

However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:
 
ALTER INDEX 'Index Name' on 'table name' DISABLE


Creating a ColumnStore Index using Management Studio

Right click and select New Index and select Non-Clustered Columnstore Index...


Click add to add the columns for the index.



After selecting the columns click OK to create the index.



No comments:

Post a Comment