Sunday, 18 November 2012

Columnstore Indexes

SQL Server 2012 introduces a new type of index specifically designed to improve the speed of queries that are typically used in data warehouse reporting applications, such as:
SELECT [Size], [Weight], SUM(ListPrice)
FROM [Production].[Product]
GROUP BY [Size], [Weight]

Aggregates, groups and related joins, in particular, are catered for because unlike normal indexes, which index rows in a table, columnstore indexes index each column and then join the columns together to produce a single index.

Creating a column store index is easy. Just decide which columns you need in the queries for your reporting application and add them all to the index.

You only need one columnstore index on each table. In fact you can only create one columnstore index on a table and columnstore indexes are always non-clustered. Also worth noting is that as soon as you add a columnstore index to a table; the table becomes read only, yet further proof that this type of index is for reporting applications. There restrictions will no doubt be relaxed in future versions.

Here is an example columnstore index for the above query (all based on the AdventureWorks sample database):
 
CREATE COLUMNSTORE INDEX [Product_csindex]
ON [Production].[Product] ([ListPrice], [Size], [Weight])

This index is also used for the query below:
 
SELECT [Weight], SUM(ListPrice)
FROM [Production].[Product]
GROUP BY [Weight]

Columns that have a considerable number of duplicate values are the best type of column to add to a columnstore index; because column values are grouped and compressed in the index thereby saving a considerable amount of space when compared to a normal index.  The [size] column in the above table has only 19 unique values for the 504 rows and so is a good candidate for columnstore indexing.

In summary, columnstore indexes are designed to support large aggregation based queries and are a new tool in SQL Server 2012 for us to use when creating data warehousing and reporting applications.