Thursday, 27 December 2012

Master Data Services

Recently a client asked me the following question: what is Master Data Services, what is it for, and what does it have to do with SQL Server? I answered as succinctly as I could. It then occurred to me that the answer would make a good blog posting, as others may well be asking themselves the same question.
Master Data Services was first released with SQL Server 2008 R2. It is also part of SQL Server 2012, which adds some additional capabilities.

Master Data Services (MDS) is a managed, central data repository. It defines data schemas, applies business rules to validate the data and provides interfaces for external systems to use the data. It is important to note that MDS is not just a data dictionary: it also stores the actual data.
Data can be versioned so that applications using the data can reference a particular version and are not forced to continually update to the latest version when the schema of the data changes.

Consider a simple example to put all this into perspective: a centralised calendar that tracks the days on which core business process jobs run. MDS would contain:
·         Entity and attribute definitions to define the schema for the calendar.

·         Central business rules defining the validations to apply to the data. In this simple example there will be one validation rule that prevents calendar entries prior to 01/01/2012

·         The data for the calendar

·         The methods by which external applications can access the shared calendar data. In this case this will be a SQL Server database view, although other methods such as Web Services are available
To get started with MDS, the first step is to create a database and the Master Data Manager web site for MDS. This is done using the Master Data Services Configuration Manager, which comes as part of SQL Server. The resulting Master Data Manager web site is shown below:

The System Administration menu in the Master Data Manager web site is used to define:
·         A new Model called Calendar. This will hold the entity definitions and data

·         The JobType entity. This is a very simple entity that holds the Names of the types of jobs that are allowed: StockCheck, ProcessDirectDebits and IssueStatements

·         The Calendar entity. This will consist primarily of a Name, Date and JobType

·         A business rule that prevents calendar entries before 01/01/2012
The various screens are easy to use. The one for defining the validation rule is:

The Explorer menu in the Master Data Manager web site is used to enter the actual data. It is relatively straightforward to use: just pick an entity, add the data and apply the validation rules. The example below shows how one of the entries, the one for 30/04/1999, has failed the validation rule:

The last step is to use the Integration Management menu in the Master Data Manager to create the SQL Server views that will expose the Calendar for use by the external applications. In this case I created a view called CalendarView. The following SQL query uses the view to display the Calendar data in MDS:

SELECT *  FROM [MDS].[mdm].[CalendarView] ORDER BY Code

As well as the actual data in the entity, the view includes a lot of other information, such as the version number, entry date and time, author and so on. Note that not all of the fields are shown in the screen above for space reasons.
Admittedly, a lot of detail has been left out of this simple example, including security, web service interfaces, APIs, interfaces to Excel, data imports, versioning and so on. However, this information is readily available in the SQL Server documentation, and the MDS system as a whole is easy to use and manage.

So if MDS provides a service that you need, it should be relatively easy to get started.

Sunday, 18 November 2012

Columnstore Indexes

SQL Server 2012 introduces a new type of index specifically designed to improve the speed of queries that are typically used in data warehouse reporting applications, such as:
SELECT [Size], [Weight], SUM(ListPrice)
FROM [Production].[Product]
GROUP BY [Size], [Weight]

Aggregates, groups and related joins in particular are catered for because, unlike normal indexes, which index rows in a table, columnstore indexes store each column separately and then join the columns together to produce a single index.

Creating a columnstore index is easy: just decide which columns you need in the queries for your reporting application and add them all to the index.

You only need one columnstore index on each table. In fact you can only create one columnstore index per table, and columnstore indexes are always non-clustered. Also worth noting is that as soon as you add a columnstore index to a table, the table becomes read-only: further proof that this type of index is aimed at reporting applications. These restrictions will no doubt be relaxed in future versions.
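Because of the read-only restriction, loading new data means temporarily taking the index out of action. A minimal sketch of the pattern (the index name matches the example created later in this post):

```sql
-- Disable the columnstore index so the table becomes writable again
ALTER INDEX [Product_csindex] ON [Production].[Product] DISABLE;

-- ... run the INSERT/UPDATE/DELETE statements here ...

-- Rebuild the index so queries can use it once more
ALTER INDEX [Product_csindex] ON [Production].[Product] REBUILD;
```

Alternatives are to drop and re-create the index, or to switch new data in using table partitioning.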

Here is an example columnstore index for the above query (all based on the AdventureWorks sample database):
 
CREATE COLUMNSTORE INDEX [Product_csindex]
ON [Production].[Product] ([ListPrice], [Size], [Weight])

This index is also used for the query below:
 
SELECT [Weight], SUM(ListPrice)
FROM [Production].[Product]
GROUP BY [Weight]

Columns that have a considerable number of duplicate values are the best type of column to add to a columnstore index, because column values are grouped and compressed in the index, thereby saving a considerable amount of space compared to a normal index. The [Size] column in the above table has only 19 unique values across its 504 rows and so is a good candidate for columnstore indexing.
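A quick way to gauge whether a column is a good candidate is to compare its distinct values with the total row count. The query below is a sketch against the AdventureWorks sample database:

```sql
-- A low distinct-to-total ratio suggests the column will compress well
SELECT COUNT(*)                 AS TotalRows,
       COUNT(DISTINCT [Size])   AS DistinctSizes,
       COUNT(DISTINCT [Weight]) AS DistinctWeights
FROM [Production].[Product];
```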

In summary, columnstore indexes are designed to support large aggregation based queries and are a new tool in SQL Server 2012 for us to use when creating data warehousing and reporting applications.

Friday, 26 October 2012

Power View: Business Intelligence for Power Users

Power View is new with SQL Server 2012 and is part of SQL Server 2012 Reporting Services (SSRS). Interestingly, however, it only works when SSRS is in SharePoint Integrated mode, so you need to have SharePoint installed as well to use it.
Power View is designed to be easy to use, allowing users to create their own reports and to slice and dice data without help from a professional developer. However, being easy for the user does not necessarily mean that it is equally straightforward to install. My first experience with Power View was to install SQL Server 2012 and SharePoint 2010 and then look around for Power View, without finding it. So I took it upon myself to work out how to install and configure SQL Server and SharePoint for use with Power View.
While I am sure that there are many ways to get Power View ready to use, these are the steps that I followed:
1. Install SQL Server 2012, including all features (except for Analysis Services)
2. Install SharePoint 2010 and create a web application and site collection based on the Business Intelligence Site Template
3. Return to SQL Server Setup and add the features:
·         Reporting Services - SharePoint
·         Reporting Services Add-in for SharePoint Products


4. Return to SQL Server Setup again and install SQL Server PowerPivot. Power View uses either PowerPivot or Analysis Services in Tabular Mode as data sources:

5. Run the PowerPivot Configuration Tool, and complete the prerequisites it requests:

6. Download and install Silverlight. Power View is based on Silverlight
7. Start SharePoint 2010 Management Shell and run two commands to install the SharePoint Application Services for Reporting Services:
·         Install-SPRSService
·         Install-SPRSServiceProxy
8. Start SharePoint 2010 Central Administration:
·         Select the Services On Server menu and verify that the PowerPivot and SQL Server Reporting Services services are running

9. Select the Manage Service Applications menu
·         Select New > SQL Server Reporting Services Service Application to create a new Reporting Services application
10. Navigate to the SharePoint Business Intelligence site created earlier and use the Site Settings > Site Collection Features menu to verify that the Power View and Reporting Server Integration features are installed:

11. Create a new site in the Business Intelligence site using menu Site Actions > More Options… and site template PowerPivot Site:

12. Download and install Microsoft® SQL Server® 2012 PowerPivot for Microsoft® Excel® 2010
13. Start Excel, select the PowerPivot menu and then click the PowerPivot Window button and create a data source based on the AdventureWorks.Production.Product table in SQL Server. Then save this as a file called PowerPivotExample.xlsx:

14. Navigate to the newly created PowerPivot site, select the PowerPivot Gallery library and upload file PowerPivotExample.xlsx into it. This is the source for the Power View report:

15. Click the Power View icon in the top right of the new data source to invoke the Power View designer. Then carry out the following steps to add a matrix of average product price by product colour and class:
·         Drag ListPrice onto the page, select it with the mouse and then select the Matrix option from the Design toolbar
·         Drag Class and Color onto the Row and Column Groups

16. Then add two slicers to allow easy filtering:
·         Drag Class onto the report page, select it with the mouse and select the Slicer option in the Design toolbar
·         Repeat the above using Color

17. Then click Full Screen in the Home toolbar to view the report. Selecting one or more colours or classes in the slicers redisplays the matrix:

18. Finally click File and Save to save the report back into the SharePoint Library for future use
In summary, there are quite a few steps for the administrator to get everything in place before it’s possible to start using Power View. But once it’s all there, it’s very easy for users to slice and dice their data and produce reports of their own.

Sunday, 16 September 2012

Source Control for Database Applications

It is second nature for a .NET developer to store source code in a source control system such as TFS or SourceSafe. Source control systems are a cornerstone of software development and are essential for tracking what changed, who changed it and why, managing versions and allowing developers to work on different branches of the same system.

However, when it comes to database development, developers often neglect proper source control management. It is just easier to start SQL Server Management Studio, right click and add a new table, column or stored procedure. They should of course generate SQL scripts for everything they do and add these to the source control system, along with all the related .NET code, but this is often done too late, or incorrectly, or not at all. Things get even worse when deploying a new version of an existing database system, because a SQL script is required to update the schema (using ALTER TABLE etc.) and this script depends on which version of the system is being updated.

One of the main reasons that .NET developers find it easy to use source control is that .NET source code is just text and text can be managed easily in a source control system. In contrast database schemas tend to be created using graphical tools and not as SQL scripts.

There are third-party version control offerings for SQL developers, such as SQL Source Control from Red Gate Software and the open source Liquibase. However, there is something closer to home, in the form of Data-Tier Applications and Visual Studio Database Projects, that makes database source control management easier.

A Data-Tier Application (or DAC) is a single entity that contains the definition of a database and all the database objects in that database that a particular application uses. Fundamentally it is a single file with the extension .dacpac that contains all the database definitions: no need for multiple script files to create databases, tables, views and stored procedures.

DAC files can be created from an existing database using SQL Server Management Studio by right clicking on a database and selecting the All Tasks menu. Note that the examples in this article relate to SQL Server 2012, but the DAC concept was introduced in SQL Server 2008 R2.

DAC files can be used by DBAs for deploying databases to servers. However it is the creation of DAC files by developers using Visual Studio that is of particular interest here because of its implications for source control.

Take the following simple example of creating a Database Project and a DAC using SQL Server Data Tools (aka Visual Studio 2010):
·         Start SQL Server Data Tools
·         Create a new project of type SQL Server Database Project
·         Use menu Project > Add New Item to add a new Table called Table1
·         Add a new column called data1 using the table designer (see below)

Note that the definition of the new table is stored in T-SQL in file Table1.sql. Also note that the project and the files in it can be added to the source control system and checked in and out like any other Visual Studio project. In this example SQL Server Data Tools is using the source control in Team Foundation Server 2010 and the red ticks next to the files indicate they are checked out.

To create and deploy the .dacpac file for this Data-Tier application, right click on the project and select Publish. The .dacpac file is created in the bin > Debug folder and the database and the table are deployed to a SQL Server of your choice as shown below (in SQL Server Management Studio).
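Publishing can also be scripted, which is handy for automated builds. A sketch using the SqlPackage command-line utility that ships with SQL Server Data Tools (the file path, server and database names below are illustrative):

```shell
REM Deploy the compiled .dacpac; SqlPackage carries out the schema
REM compare and generates any ALTER statements the target needs.
SqlPackage.exe /Action:Publish ^
    /SourceFile:bin\Debug\Database1.dacpac ^
    /TargetServerName:. ^
    /TargetDatabaseName:Database1
```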

This is not just for one-off deployments, however: it is also able to manage the transition between versions. When publishing a DAC, a database schema compare is automatically carried out, and ALTER TABLE statements are generated and executed to manage the change on the database server. Let’s take a very simple example of this in practice:
·         Database project and Table1.sql have been checked into TFS
·         Database1 and Table1 have already been published to SQL Server
·         Table1 is edited in Visual Studio and a new column data2 is added

·         When development is complete the files are checked back into TFS so that the source control system has a full record of all schema changes
·         The database project is then published to SQL Server to add the new column onto Table1 as shown below in SQL Server Management Studio

·         The actual SQL script used to carry out the deployment is also stored in the bin > Debug folder and on inspection it contains the required statement:
ALTER TABLE [dbo].[Table1]
    ADD [data2] NCHAR (10) NULL;

The above shows how to manage a database development using Visual Studio while:
·         Still using graphical tools to design tables etc
·         Keeping track of all source changes in a source control system
·         Deploying new databases and database changes to SQL Servers in a controlled manner using a single file

Obviously the above is a very simple example using just a single table. The same process, however, can be applied to as many tables as you need, along with views, stored procedures, functions and everything else.

Wednesday, 8 August 2012

Contained Databases

Ever since the first version of SQL Server was released there has been a distinction between a server login and a database user.

Server logins are recorded in the master database and control connection to the server. They consist of:
·         Windows Logins
·         Windows Groups
·         SQL Server Logins plus password

Database users are recorded in each database and manage access to the database and permissions. Each database user is associated with a server login.

The screen shot below shows a login called INTERNAL\Administrator mapped to a database user called ReportUser in database ReportServer and to database user dbo in a number of other databases:

The concept of having server logins and database users has served SQL Server well over the years, but it does have a problem because it ties the database to the server.

Moving a database to another server causes an issue because the connection between the database user and the server login is broken and the user becomes orphaned. The result is that it is no longer possible to connect to the database as that user. Permissions are attached to the database user and are stored in the database, so moving the database to another server does transfer all the permissions; it is just the login that causes the issue.

Microsoft has a solution for this issue in the form of a stored procedure that can be used to fix the orphaned users:
·         sp_change_users_login 'report'
o   Reports on orphaned users
·         sp_change_users_login 'auto_fix', 'user', NULL, 'password'
o   Fixes a single user by associating it with a login or creating a login if it does not exist (the login parameter must be NULL when using auto_fix)
o   The default login name is the same as the user
o   Wildcards are supported in the user name so multiple users can be fixed in one go
·         sp_change_users_login 'update_one', 'user', 'login'
o   Fixes a user by connecting it to a login
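Put together, a typical repair session looks like this (the user and login names are illustrative):

```sql
-- List the orphaned users in the current database
EXEC sp_change_users_login 'Report';

-- Re-map an orphaned user to an existing server login
EXEC sp_change_users_login 'Update_One', 'ReportUser', 'ReportLogin';
```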

Although sp_change_users_login is one solution to the login issue, SQL Server 2012 has a better solution in the form of Contained Databases. These allow authentication against a database without a server login, and as a result contained databases can be transferred between servers without any issues.

To create a contained database, or restore or attach one, it is first necessary to configure the server to allow contained databases. This is disabled by default:
sp_configure 'contained database authentication', 1
RECONFIGURE

Then to create a contained database use:
CREATE DATABASE ContainedDB
CONTAINMENT = PARTIAL
The only other option for the CONTAINMENT clause is NONE.
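To confirm the containment level of a database, query sys.databases; the containment_desc column is new in SQL Server 2012:

```sql
-- Shows PARTIAL for the database created above, NONE for ordinary databases
SELECT name, containment_desc
FROM sys.databases
WHERE name = N'ContainedDB';
```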

Users can be created in the new database without an associated login. For example, the following commands create one SQL user and one Windows user:
USE ContainedDB
GO
CREATE USER [SQLUser1] WITH PASSWORD = N'######'
CREATE USER [INTERNAL\WindowsUser1]

Then, when connecting to a contained database, just specify the database as part of the connection; no server login is required.

In a .NET application the database is included as part of the connection string:
Server=.; Database=ContainedDB; User ID=SQLUser1; Password=######; Trusted_Connection=False;

In SQL Server Management Studio the database is selected on the Connection Properties tab:

Contained Databases are a very simple but effective concept and make databases independent of the SQL Server they are hosted on.