Sydney Cloudera Data Analyst Training Using Pig, Hive & Impala

Event Information

Date and Time




60 Margaret Street

Sydney, NSW 2000



Refund Policy

Refunds up to 30 days before event

Event description


For all enquiries, please email amy@contexti.com


Cloudera University’s four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, will teach you to apply traditional data analytics and business intelligence skills to big data. The course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.


4 Days


Hadoop Fundamentals

The Motivation for Hadoop

Hadoop Overview

Data Storage: HDFS

Distributed Data Processing: YARN, MapReduce, and Spark

Data Processing and Analysis: Pig, Hive, and Impala

Data Integration: Sqoop

Other Hadoop Data Tools

Exercise Scenarios Explanation

Introduction to Pig

What Is Pig?

Pig’s Features

Pig Use Cases

Interacting with Pig

Basic Data Analysis with Pig

Pig Latin Syntax

Loading Data

Simple Data Types

Field Definitions

Data Output

Viewing the Schema

Filtering and Sorting Data

Commonly Used Functions

Processing Complex Data with Pig

Storage Formats

Complex/Nested Data Types

Grouping

Built-In Functions for Complex Data

Iterating Grouped Data

Multi-Dataset Operations with Pig

Techniques for Combining Data Sets

Joining Data Sets in Pig

Set Operations

Splitting Data Sets

Pig Troubleshooting and Optimization

Troubleshooting Pig


Using Hadoop’s Web UI

Data Sampling and Debugging

Performance Overview

Understanding the Execution Plan

Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive and Impala

What Is Hive?

What Is Impala?

Schema and Data Storage

Comparing Hive to Traditional Databases

Hive Use Cases

Querying with Hive and Impala

Databases and Tables

Basic Hive and Impala Query Language Syntax

Data Types

Differences Between Hive and Impala Query Syntax

Using Hue to Execute Queries

Using the Impala Shell

Data Management

Data Storage

Creating Databases and Tables

Loading Data

Altering Databases and Tables

Simplifying Queries with Views

Storing Query Results

Data Storage and Performance

Partitioning Tables

Choosing a File Format

Managing Metadata

Controlling Access to Data

Relational Data Analysis with Hive and Impala

Joining Data Sets

Common Built-In Functions

Aggregation and Windowing

Working with Impala

How Impala Executes Queries

Extending Impala with User-Defined Functions

Improving Impala Performance

Analyzing Text and Complex Data with Hive

Complex Values in Hive

Using Regular Expressions in Hive

Sentiment Analysis and N-Grams


Hive Optimization

Understanding Query Performance

Controlling the Job Execution Plan


Indexing Data

Extending Hive


Data Transformation with Custom Scripts

User-Defined Functions

Parameterized Queries

Choosing the Best Tool for the Job

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

Which to Choose?
