Sunday, 25 September 2011

Querying Data Streams in Real-Time

Processing and acting on high-volume, real-time data is a requirement for many organisations today, including:
·         Financial: processing real-time trading information
·         Utilities: monitoring power consumption, tracking supply vs. demand
·         Communications: monitoring network usage, tracking failure rates, processing banner ad clicks on websites
However, the traditional approach to data analysis is to store the data in a database, run queries to extract and calculate the required figures, and then produce output in the form of a report. With 1,000 new data records arriving every second, that store-then-query cycle is clearly impractical, or at least it was until StreamInsight came along.

StreamInsight is part of SQL Server 2008 R2. It is designed to accept and query real-time data streams at volumes of thousands of records per second, and it is built on familiar .NET and LINQ technology. There are two editions:
·         Standard edition, for data volumes of less than 5,000 events per second
·         Premium edition, for data volumes of up to 100,000 events per second
You need SQL Server 2008 R2 Datacenter Edition to get the Premium edition, while the Standard edition comes with SQL Server 2008 R2 Standard and Enterprise editions. So how does it work?
·         First, an input adaptor reads the data stream from its source. StreamInsight comes with a set of standard adaptors for typical data sources, and new custom adaptors are typically built from these.
·         The data is then passed to the queries, which are written in LINQ. These queries run continuously and operate on the data as it arrives.
·         The query results are sent as a data stream to output adaptors, which in turn route them to the systems that need them, such as a reporting solution or a trading application.
Architecturally, then, data flows from the input adaptors, through the standing LINQ queries, to the output adaptors. StreamInsight typically runs as a Windows service on a server, but it can also be embedded directly into an application if required.
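To make the adaptor, query, adaptor pipeline a little more concrete, here is a minimal conceptual sketch in Python. It is not StreamInsight's API (real StreamInsight queries are written in LINQ on .NET). Every name below (Event, input_adaptor, tumbling_window_average, output_adaptor) is hypothetical, and the sketch assumes events arrive in timestamp order. The "query" is a simple tumbling-window aggregate: it keeps a per-source count and sum for the current fixed-width time window and emits results as soon as the window closes, so it never needs to store the whole stream first.

```python
from collections import namedtuple
from typing import Iterable, Iterator, List

# Hypothetical event shape: timestamp in seconds, a source id and a reading.
Event = namedtuple("Event", ["timestamp", "source", "value"])
WindowResult = namedtuple("WindowResult", ["window_start", "source", "count", "average"])


def input_adaptor(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical input adaptor: parses 'timestamp,source,value' lines into events."""
    for line in raw_lines:
        ts, source, value = line.strip().split(",")
        yield Event(float(ts), source, float(value))


def _flush(window_start: float, totals: dict) -> Iterator[WindowResult]:
    """Emit one result per source for a closed window."""
    for source in sorted(totals):
        count, total = totals[source]
        yield WindowResult(window_start, source, count, total / count)


def tumbling_window_average(events: Iterator[Event], width: float) -> Iterator[WindowResult]:
    """Standing query: per-source count and average over non-overlapping windows.

    A window's results are emitted when the first event of a later window
    arrives (or the stream ends). Assumes events arrive in timestamp order.
    """
    current_start = None
    totals = {}  # source -> (count, running sum) for the current window
    for event in events:
        start = (event.timestamp // width) * width
        if current_start is not None and start != current_start:
            yield from _flush(current_start, totals)
            totals = {}
        current_start = start
        count, total = totals.get(event.source, (0, 0.0))
        totals[event.source] = (count + 1, total + event.value)
    if current_start is not None:
        yield from _flush(current_start, totals)


def output_adaptor(results: Iterator[WindowResult]) -> List[WindowResult]:
    """Hypothetical output adaptor: prints each result and collects it."""
    collected = []
    for result in results:
        print(result)
        collected.append(result)
    return collected


# Usage: wire the three stages together; generators pull events lazily.
raw = [
    "0.5,meter1,10",
    "1.2,meter1,20",
    "3.0,meter2,5",
    "5.1,meter1,30",
]
results = output_adaptor(tumbling_window_average(input_adaptor(raw), width=5.0))
```

Because each stage is a generator, events flow through one at a time. This mirrors the idea that the query operates on data as it arrives rather than after it has been stored. A real deployment would replace the list of strings with a live feed and the print call with whatever downstream system consumes the results.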

All in all, this is a pretty cool extension to SQL Server: real-time decision making on high-volume data can now become part of any organisation's analysis and reporting capabilities.
