Thursday 2 August 2012

Connect Excel to Hadoop on Azure via HiveODBC

One key feature of Microsoft’s Big Data Solution is solid integration of Apache Hadoop with the Microsoft Business Intelligence (BI) components. A good example of this is the ability for Excel to connect to the Hive data warehouse framework in the Hadoop cluster. This section walks you through using Excel via the Hive ODBC driver.

Install the Hive ODBC Driver
Prerequisites:


  • Ensure that Excel 2010 64-bit is already installed before installing the HiveODBC Driver.
  • Download the 64-bit Hive ODBC driver MSI file from the Portal by clicking the Download tile after logging into HadoopOnAzure.com .
  • Note, the HiveODBC driver includes both the HiveODBC Driver and the Excel Hive-AddIn.




  1. Double click HiveODBCSetupx64.msi to start the installation.
  2. Read the license agreement.
  3. If you agree to the license agreement, click I agree and Install.

 4. Once the installation has completed, click Finish to exit the Setup Wizard.

Install the Microsoft Excel Hive Add-In

Prerequisites:

  • Microsoft Excel 2010 64-bit
  • 64bit Hive ODBC driver installed

  1. Start Microsoft Excel 2010 64-bit.
  2. You will be asked to install the HiveExcel add-in. Click Install.
  3. Once the add-in has been installed, click the Data tab in Microsoft Excel 2010. You should see the Hive Panel as shown in the screenshot below.


Create a Hive ODBC Data Source to use with Excel

  1. Click Start->Control Panel to launch the Control Panel for Microsoft Windows.
  2. In the Control Panel, Click System and Security->Administrative Tools->Data Sources (ODBC). This will launch the ODBC Data Source Administrator dialog.
  1. In the ODBC Data Source Administrator dialog, click the System DSN tab.
  2. Click Add to add a new data source.
  3. Click the HIVE driver in the ODBC driver list.
4. Then click Finish. This will launch the ODBC Hive Setup dialog shown in the screenshot below.

  1. Enter a data source a name in the Data Source Name box. For Example, “MyHiveData”.
  2. In the Host box , enter the host name of the cluster you created on the portal. For example, “myhadoopcluster.cloudapp.net”.
  3. Enter the username you used to authenticate on the portal.
  4. Click OK to save the new Hive data source.
  5. Click OK to close the ODBC Data Source Administrator dialog.


Using the Excel Hive Add-In





  1. Go to HadoopOnAzure.com and click Sign In to sign into your Hadoop cluster.
  2. Click the Open Ports tile to access port configurations for your cluster.
3. On the Configure Ports page, click the toggle switch for the ODBC Server port to turn it on.
  1. Open Microsoft Excel 2010 64-bit.
  2. In Microsoft Excel 2010 64-bit, click the Data tab.
  3. Click the Hive Panel to open the Hive panel in Excel.
  4. In the drop-down list labeled Select or Enter Hive Connection, select the data source name you previously created.
  5. You will be asked to enter the password to authenticate with the cluster on the portal. Enter the password for the user name.
  6. In the drop-down list labeled Select the Hive Object to Query, select hivesampletable [Table].
  7. Select the checkbox for each of the columns in the table. The Hive Query panel should look similar to the following.

11. Click Execute Query.


No comments:

Post a Comment