Thursday 2 August 2012

Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud

This post describes the steps to connect Hadoop from PowerPivot.
Pre-requisites

Configuration Steps

The steps to follow are the:
  • Install the HiveODBC Driver (we will configure the DSN later)
  • Steps 1 – 3 from Using the Excel Hive Add-In to open the ports in Hadoop on Azure

2) Create a Hive ODBC Data Source > File DSN
Here, we will go about creating a File DSN Hive ODBC Data Source.

To do this:
  • Go to the ODBC Data Sources Administrator and click on the File DSN tab.

    Click on Add, Choose HIVE, Click Next, Click Browse to choose a location of the file; click Finish.

    Open the File DSN you just created and click Configure.  The ODBC Hive Setup and configure the host (e.g. [clustername].cloudapp.net) and authentication information (the username is what you had specified when you had created the cluster)

    3) Connect PowerPivot to Hadoop on Azure via the HiveODBC File DSN
    • Open up the PowerPivot ribbon and click on the Get External Data from Other Sources.

From the Table Import Wizard, click on the Others (OLEDB/ODBC) and click Next

  • From here, click Build and the Data Link Properties, click on Provider, and ensure the Microsoft OLEDB Provider for ODBC Drivers is selected. Click Next.
  • In the Data Link Properties dialog, choose “Use connection string”, and click Build and choose the File DSN you had created from Step #2.  Enter in the password to your Hadoop on Azure cluster.  Click OK.


The Data Link Properties now contains a connection string do the Hadoop on Azure cluster.
Note, after this dialog, verify that the password has been entered into the connection string that that has been built into the Table Import Wizard.  Note, the blue arrow points to a lack of a PWD=<password> clause.  If the password isn’t specified, make sure to add it back in.

  • Click OK, click Next.  From here you will get the Table Import Wizard and we are back to the usual PowerPivot steps.
  • Click on “Select from a list of tables and views to choose the data to import”
Choose your table (e.g. hivesampletable) and import the data in.
It looks like a lot of steps but once you get into the flow of things, it’s actually a relatively easy flow.

No comments:

Post a Comment