BIRT and Cloudera: Give Your Hadoop Data Meaning


BIRT and BIRT iHub are now certified with Cloudera 5!

What is Cloudera?

For those who don’t know, Cloudera Enterprise 5 is one of the industry’s leading Hadoop data management platforms. It provides a single integrated platform that brings diverse users and application workloads onto a common infrastructure, with security, enterprise-grade data auditing, fault tolerance, automated data backup, system and data management, and more.


Why is this Certification Important?

With BIRT Designer Professional and BIRT iHub now certified with Cloudera 5, it is easy to rapidly gather, filter, and analyze massive amounts of data. Critical insights can then be formulated and quickly communicated to end users through meaningful, interactive BIRT-based visualizations. All you have to do is connect to your Hadoop data via HiveServer2 or Cloudera Impala.

Loading Your Data

Data from HDFS can be loaded into Hive tables served by your Cloudera HiveServer2 in several different ways. To run HQL queries from the command terminal, simply type “beeline” and then connect with your database URL, as seen in the image below:

Alternatively, you can use the Hue web interface, managed through Cloudera Manager, to browse HDFS and run Hive and Impala queries.
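If you would rather script the load than type it into beeline or Hue, the same HQL can be sent over a HiveServer2 JDBC connection. The Java sketch below shows one way to do that, assuming the file is already in HDFS and is comma-delimited. The class name HiveLoader, the table, the columns, and the HDFS path are hypothetical and do not come from this post.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Minimal sketch (hypothetical helper): load a comma-delimited file that is
// already in HDFS into a Hive table by sending HQL over a HiveServer2 JDBC
// connection. Table names, columns and paths passed in are illustrative only.
public class HiveLoader {

    // Builds a CREATE TABLE statement for a simple comma-delimited text table.
    public static String createTableHql(String table, String columns) {
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + columns + ") "
             + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE";
    }

    // Builds a LOAD DATA statement. Without LOCAL, INPATH refers to an HDFS
    // path, and Hive moves (not copies) the file into the table's directory.
    public static String loadDataHql(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }

    // Executes the given HQL statements, in order, on an already-open connection.
    public static void run(Connection conn, List<String> statements) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String hql : statements) {
                stmt.execute(hql);
            }
        }
    }
}
```

For example, run(conn, Arrays.asList(createTableHql("sales", "id INT, amount DOUBLE"), loadDataHql("/user/cloudera/sales.csv", "sales"))) would create the table and then move the file into it. The same two statements can also be typed directly at the beeline prompt.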

Connecting to Your Data

Now that your data is loaded and available through HiveServer2, you can write queries against it from the terminal or Hue, as seen in the image above. Or you can use a tool like BIRT to pull the data over a JDBC connection and turn it into something meaningful for your end users.

BIRT Designer Professional has a Cloudera-specific data source. With open source BIRT, you’d use the JDBC data source instead. In this post, I’m using BIRT Designer Professional.


First things first: grab the 0.12.0 Hive2 jars from your Cloudera install and add them to your BIRT install in the folder <BDPro location>/eclipse/plugins/org.eclipse.birt.report.data.oda.jdbc_4.2.3.v20131216-0430/drivers/. In the Cloudera VM I used, they were located in /opt/cloudera/parcels/CDH/lib/hive/lib/.
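Before opening BIRT, it can help to confirm that the copied jars actually provide the HiveServer2 driver class, org.apache.hive.jdbc.HiveDriver. The minimal Java check below simply tries to load a class by name; the class name DriverCheck is hypothetical. Catching LinkageError as well as ClassNotFoundException means a class that is found but fails to initialize is reported as unavailable instead of crashing the check. A true result does not prove that every dependent jar is present, so a real connection test is still the stronger check.

```java
// Minimal sketch (hypothetical helper): check whether a JDBC driver class,
// such as the Hive2 driver from the copied jars, is loadable on the classpath.
public class DriverCheck {

    // Driver class used for jdbc:hive2:// URLs (HiveServer2 and Impala).
    public static final String HIVE2_DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Returns true if the named class can be found and initialized, false otherwise.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(HIVE2_DRIVER + " available: " + isClassAvailable(HIVE2_DRIVER));
    }
}
```

One way to run it against the Cloudera jars is java -cp ".:/opt/cloudera/parcels/CDH/lib/hive/lib/*" DriverCheck (use ; instead of : as the path separator on Windows).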

Now all you need to do is create a new data source in BIRT and choose the Cloudera Hive Data Source type.

Next, enter your connection information and test it to make sure you can connect to your server. In the image below, a connection to HiveServer2 is made. If you want to connect via Impala instead, you’d use a URL like: jdbc:hive2://192.168.40.130:21050/;auth=noSasl
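Outside of BIRT, you can reproduce the same connection test with plain JDBC, which is handy when troubleshooting. The Java sketch below builds both URL forms and tries to open a connection. The Impala form mirrors the URL above; HiveServer2’s usual port 10000, the default database, the class name HiveConnectionTest, and the cloudera user name are assumptions, so substitute your own values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Minimal sketch (hypothetical helper): build hive2 JDBC URLs and check
// whether a connection can be opened. Host, ports, database and user are
// assumptions based on a typical quickstart setup.
public class HiveConnectionTest {

    // HiveServer2 URL, e.g. jdbc:hive2://host:10000/default
    public static String hiveServer2Url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Impala URL in the same form as this post's example (port 21050, no SASL).
    public static String impalaUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    // Opens and immediately closes a connection; returns true on success.
    public static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn != null;
        } catch (SQLException e) {
            System.err.println("Connection failed for " + url + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // from the copied Hive2 jars
        String host = "192.168.40.130";
        System.out.println("HiveServer2: " + canConnect(hiveServer2Url(host, 10000, "default"), "cloudera", ""));
        System.out.println("Impala:      " + canConnect(impalaUrl(host, 21050), "cloudera", ""));
    }
}
```

If both lines print true, the same URLs should work in the BIRT data source dialog.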

With your connection made, you can now create a data set using this data source and write your HQL queries.

What Now?

Now that you’re able to connect to your data, it’s just BIRT. As with any other data source, you can perform further computations and joins and build tables, charts, crosstabs, and more to present your data in the way that is most useful to your end users. Then you deploy the reports to make them accessible to your users.

For the certification, I deployed these reports to my BIRT iHub 3 and scheduled 20 reports to run against HiveServer2 and Impala, exercising a variety of HQL query functions. For those who don’t know iHub, it’s a powerful BIRT-based platform that provides security, scheduling, distribution, interactivity, and more. To learn more about iHub, see the product page. You can download a free trial here to experience all that Actuate and BIRT can do for you and your Cloudera-managed Hadoop data.

If you have questions about using BIRT with Cloudera, feel free to post them in the blog comment section or ask in the community forums. Thanks for reading.

-Michael
