SQL Server @ BallardChalmers: FILESTREAM and Partitioning

There is a requirement for a large number of large PDF documents to be managed by a SQL Server database without the contents of the documents actually being stored in the database, for size reasons. FILESTREAM meets these requirements.

There is an additional requirement to store documents of different types in different folders so as to be able to take advantage of multiple disk drives. Partitioning allows a single table to be allocated to multiple file groups according to values in certain columns and so again this meets the requirements.

Combining FILESTREAM and partitioning, however, turned out not to be as simple as expected.

I started by enabling FILESTREAM on SQL Server using:

sp_configure 'filestream access level', 2

RECONFIGURE

I then created a database with two FILESTREAM file groups using:

CREATE DATABASE [FileStreamExample] ON PRIMARY

( NAME = N'FileStreamTest', FILENAME = N'c:\FileStreamExample\FileStreamExample.mdf'),

FILEGROUP [FStream1] CONTAINS FILESTREAM

( NAME = N'FStream1', FILENAME = N'c:\FileStreamExample\FStream1' ),

FILEGROUP [FStream2] CONTAINS FILESTREAM

( NAME = N'FStream2', FILENAME = N'c:\FileStreamExample\FStream2' )

LOG ON

( NAME = N'FileStreamTest_log', FILENAME = N'c:\FileStreamExample\FileStreamExample_log.ldf')

Note that each FILESTREAM file group can only one have one folder associated with it, hence the need for multiple file groups.

I then proceeded to setup partitioning by creating a partitioning function. The document type will be either 1 or 2 and each one of these should be assigned to a different file group:

CREATE PARTITION FUNCTION FSPartFunc ( int )

AS RANGE LEFT

FOR VALUES (1)

Next I assigned created two partitioning schemes, one for the FILESTREAM content and one for the other columns in the table. I need both because the table and the FILESTREAM need to follow the same partitioning scheme:

CREATE PARTITION SCHEME FSPartScheme

AS PARTITION FSPartFunc

TO (FStream1, FStream2)

CREATE PARTITION SCHEME PartScheme

AS PARTITION FSPartFunc

TO ([PRIMARY], [PRIMARY])

Then I created the table:

CREATE TABLE FSPartExampleTable (

ID uniqueidentifier not null ROWGUIDCOL,

DataType int NOT NULL,

TheData varbinary(max) filestream NOT NULL)

ON PartScheme(DataType)

FILESTREAM_ON FSPartScheme

But I got error: A table with FILESTREAM column(s) must have a non-NULL unique ROWGUID column.

So I added a PRIMARY KEY on the ID column as below:

CREATE TABLE FSPartExampleTable (

ID uniqueidentifier not null ROWGUIDCOL PRIMARY KEY,

DataType int NOT NULL,

TheData varbinary(max) filestream NOT NULL)

ON PartScheme(DataType)

FILESTREAM_ON FSPartScheme

But then I got the error: Partition columns, DataType in this case, for a unique index must be a subset of the index key. I tried again with a UNIQUE constraint and got the same error.

It seemed like there were two conflicting requirements here: Firstly the ID column must be unique and secondly the unique column must contain the partitioning column. So I tried to create a primary key based on both ID and DataType:

CREATE TABLE FSPartExampleTable (

ID uniqueidentifier not null ROWGUIDCOL,

DataType int NOT NULL,

TheData varbinary(max) filestream NOT NULL,

PRIMARY KEY (ID, DataType ))

ON PartScheme(DataType)

FILESTREAM_ON FSPartScheme

But then I was back to the first error: But I got error: A table with FILESTREAM column(s) must have a non-NULL unique ROWGUID column.

So FILESTREAM is pretty insistent on having a unique index on the ID column, whereas partitioning wants to have a composite index if the index is to be partitioned. That’s when I realised what the issue was, the unique indexes are being associated with the partitioning scheme of the table and actually I do not need this to be the case. So I changed my primary key to be non-clustered, so it is separate from the table, and associated it with a different file group, [PRIMARY] in this example, and the issue was solved:
CREATE TABLE FSPartExampleTable (

ID uniqueidentifier not null ROWGUIDCOL

PRIMARY KEY NONCLUSTERED ON [PRIMARY],

DataType int NOT NULL,

TheData varbinary(max) filestream NOT NULL)

ON PartScheme(DataType)

FILESTREAM_ON FSPartScheme

Now the FILESTREAM data for DataType 1 and 2 are stored in different folders as required:

INSERT INTO FSPartExampleTable

VALUES (NEWID(), 1, CAST ('Part 1 ...data' AS varbinary(max)))

INSERT INTO FSPartExampleTable

VALUES (NEWID(), 2, CAST ('Part 2 ...data' AS varbinary(max)))

SQL Server @ BallardChalmers

Monday, 7 May 2012

FILESTREAM and Partitioning

No comments:

Post a Comment