Tuesday, 13 December 2011

Sequential Unique Identifiers

Using Globally Unique Identifiers as the primary key for SQL Server database tables has become common place in physical database design. It replaces the use of int identifiers and the IDENTITY clause which date back to the first versions of SQL Server. So what are the pros and cons:

Using int and IDENTITY is efficient because an int is only 4 bytes long and so does not take up much space in the index associated with the primary key. IDENTITY also generates sequential numbers and so new records are appended to the end of the index and so the table grows in a controlled manner. This is particularly important when the primary key is based on a clustered index, which they usually are, because the clustered index requires that the keys are sorted in order. An example table is:
CREATE TABLE table1 (
      ID int IDENTITY(1,1) NOT NULL
PRIMARY KEY CLUSTERED,
      OtherData varchar(1000)
)            
Using a uniqueidentifier ID and the NEWID() function requires  16 bytes and the values are not sequential and so when used as a clustered primary key the  associated  index will quickly become fragmented as new records are inserted between existing records. This just sounds like disadvantages so far, but the big advantage is that the keys are unique and the remain unique even if the record is copied to another table in the database, or even to a table in another database or even to a table on another server altogether. This means that  uniqueidentifiers are very good for any data that needs to be replicated in any way and this cannot be said for int IDENTITY columns. The extra 12 bytes is also not a big deal either because storage is relatively cheap.  An example table is:
CREATE TABLE table2 (
      ID uniqueidentifier DEFAULT NEWID() NOT NULL
PRIMARY KEY CLUSTERED,
      OtherData varchar(1000)
)
The non-sequential nature of the uniqueidentifier and the associated index / table fragmentation remained an issue until it was resolved in SQL Server 2005 with the introduction of the NEWSEQUENTIALID() function. NEWSEQUENTIALID()generates sequential identifiers  thereby solving the fragmentation problems associated with NEWID(). It can only be used in the DEFAULT clause of a column in a table but given its purpose this is hardly a big limitation.
An example of sequential identifiers in action is shown below:
CREATE TABLE table3 (
      ID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL
            PRIMARY KEY CLUSTERED,
      OtherData varchar(1000)
)
GO

INSERT INTO table3 (OtherData)
VALUES ('Test1'), ('Test2'), ('Test3')
GO

SELECT ID, OtherData
FROM table3
ORDER BY ID