Thursday, 2 August 2012

Microsoft BI Tools with Hadoop - Big Data Analytics

Microsoft’s goal for Big Data is to provide insights to all users from structured or unstructured data of any size. While very scalable, accommodating, and powerful, most Big Data solutions based on Hadoop require highly trained staff to deploy and manage. In addition, the benefits are limited to a few highly technical users who are as comfortable programming their requirements as they are using advanced statistical techniques to extract value. For those of us who have been around the BI industry for a few years, this may sound similar to the early 90s, when the benefits of our field were limited to a few people within the corporation through Executive Information Systems.

Analysis on Hadoop for Everyone

Microsoft entered the Business Intelligence industry to enable orders of magnitude more users to make better decisions from applications they use every day. This was the motivation behind becoming the first DBMS vendor to include an OLAP engine, with SQL Server 7.0 OLAP Services, which enabled Excel users to ask business questions at the speed of thought. It remained the motivation behind PowerPivot in SQL Server 2008 R2, a self-service BI offering that allowed end users to build their own solutions without depending on IT, while also giving IT insight into how data was being consumed within the organization. And with the release of Power View in SQL Server 2012, that goal will put rich interactive exploration directly into the hands of every user within an organization.
Enabling end users to merge data stored in a Hadoop deployment with data from other systems or with their own personal data is a natural next step. To that end, we introduced the Hive ODBC driver, currently in Community Technology Preview, at the PASS Summit in October. This driver provides connectivity to Apache Hive, which in turn facilitates querying and managing large datasets residing in distributed storage by exposing them as a data warehouse.
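The pattern described above, pushing a query down to Hive over ODBC and then combining the result with a user's own data, can be sketched in Python as follows. This is a minimal, hedged sketch rather than Microsoft's tooling: the DSN name `HiveDSN`, the table `web_logs`, its `region` column, the `targets` lookup, and the use of the third-party `pyodbc` package are all hypothetical assumptions, and the sample rows stand in for a live cluster.

```python
"""Sketch: query Hive through an ODBC DSN and merge the result with local data.

Assumptions (hypothetical, not from the original post): an ODBC data source
named "HiveDSN" configured for a Hive ODBC driver, a Hive table "web_logs"
with a "region" column, and the third-party pyodbc package.
"""
from typing import Dict, Iterable, List, Tuple

# HiveQL aggregate: Hive runs this as batch work on the cluster,
# so only the small per-region summary travels back over ODBC.
SUMMARY_QUERY = (
    "SELECT region, COUNT(*) AS visits "
    "FROM web_logs "  # hypothetical table name
    "GROUP BY region"
)


def fetch_hive_summary(dsn: str = "HiveDSN") -> List[Tuple[str, int]]:
    """Run SUMMARY_QUERY against Hive via ODBC and return (region, visits) rows."""
    import pyodbc  # imported lazily so the merge logic below works without it

    # Hive ODBC drivers typically do not support transactions,
    # so autocommit is commonly enabled on the connection.
    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(SUMMARY_QUERY)
        return [(row[0], int(row[1])) for row in cursor.fetchall()]
    finally:
        conn.close()


def merge_with_personal_data(
    hive_rows: Iterable[Tuple[str, int]],
    targets: Dict[str, int],
) -> List[Tuple[str, int, int, bool]]:
    """Join Hive aggregates with a user's own targets, keyed by region.

    Returns (region, visits, target, met_target) for regions present in both,
    sorted by region.
    """
    merged = []
    for region, visits in hive_rows:
        if region in targets:
            target = targets[region]
            merged.append((region, visits, target, visits >= target))
    return sorted(merged)


if __name__ == "__main__":
    # Offline stand-in for fetch_hive_summary() so the example runs without a cluster.
    sample_rows = [("EMEA", 1200), ("APAC", 800), ("AMER", 1500)]
    my_targets = {"EMEA": 1000, "AMER": 2000}
    for record in merge_with_personal_data(sample_rows, my_targets):
        print(record)
```

Keeping the `GROUP BY` in HiveQL lets the cluster do the heavy batch work, so only a small summary crosses the ODBC connection; the local merge with personal data is the kind of step an end user would otherwise perform in Excel or PowerPivot.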

Microsoft BI connectivity with Hadoop

This connectivity brings the benefits of the entire Microsoft BI stack and ecosystem to Hive. A few examples include:
- Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel
- Build a PowerPivot workbook using data in Hive
- Build Power View reports on top of Hive
- Instead of manually refreshing a PowerPivot workbook based on Hive on their desktop, users can use the scheduled data refresh feature of PowerPivot for SharePoint to refresh a central copy shared with others, without worrying about the time or resources it takes.
- BI professionals can build BI Semantic Models or Reporting Services reports on Hive in SQL Server Data Tools.
- Of course, all of the third-party client applications built on the Microsoft BI stack can now access Hive data as well!
Klout is a great example of a customer that’s leveraging the Microsoft BI stack on Big Data to provide mission-critical analysis to both its internal users and its customers.

Best of both worlds

One size doesn’t fit all, and it’s important to recognize the inherent strengths of the available options in order to choose when to use which. Hadoop broadly provides:
- an inexpensive and highly scalable store for data in any shape,
- a robust execution infrastructure for data cleansing, shaping and analytical operations typically in a batch mode, and
- a growing ecosystem that provides highly skilled users many options to process data.
The Microsoft BI stack is targeted at a significantly larger user population and provides:
- functionality in tools such as Excel and SharePoint that users are already familiar with,
- interactive queries at the speed of thought,
- a business layer that allows users to understand the data, combine it with other sources, and express business logic in more accessible ways, and
- mechanisms to publish results for others to consume and build on themselves.
Successful projects may use both of these technologies in a complementary manner, as Klout does. Enabling this choice has been the primary motivator for providing Hive ODBC connectivity, as well as for investing in a Hadoop-based distribution for Windows Server and Windows Azure.
