BigData

The Hive Query Language is a query language for Hive to process, analyze the structured data in a Hadoop Eco system. We have seen creation of database and table operation of Hive, also we have seen creation of table from the text file in HDFS. This blog will cover some basic data manipulation operations.

Select clauses:

SELECT statement is used to retrieve the data from the hive table.

Syntax:

select * from table_name;

hive> select * from hadoopd_tbl limit 5;

88-124 2017-02-25 69.0

13-92 2017-02-25 9.0

13-85 2017-02-25 35.0

08-76 2017-02-25 3.0

01322 2017-02-25 10.0

Time taken: 0.356 seconds, Fetched: 5 row(s)

Syntax:

Select column_name from table_name;

hive> select item_id from hadoopd_tbl limit 5;

88-124

13-92

13-85

08-76

01322

Time taken: 0.324 seconds, Fetched: 5 row(s)

hive>

Entered Limit clause just to see only 5 rows in the screen.

To see the Row count from the table.

Syntax:

Select count(*) from Table_Name;

Select count(column_Name) from Table_Name;

Count(*) will be taking us to a Map Reduce job. Also, when applying any function the query will be run as a MapReduce job to get the advantage of distributed computing.

hive> select count(*) from hadoopd_tbl ;

Query ID = hive_20170413065454_39ce59cd-7e32-4781-b5a0-efa68f1d4c75

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.

Status: Running (Executing on YARN cluster with App id application_1492062882009_0003)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 23.87 s

--------------------------------------------------------------------------------

Time taken: 53.579 seconds, Fetched: 1 row(s)

hive>

Where clauses:

WHERE clause works like a condition. It filters the data using the condition and gives you a finite result. The built-in operators and functions generate an expression, which satisfies this condition.

Syntax:

Select [column1,column2, … ] from [Table_name] where [condition] Group by [column1,column2, …] order by [column1,column2, …] ;

hive> select item_id,date,count(*) from hadoopd_tbl where date > 2017-02-24 group by item_id,date order by date;

Query ID = hive_20170413093434_ea787a5f-326b-4503-bbb8-fd5b5c5e651c

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1492062882009_0005)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

Reducer 3 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 18.72 s

--------------------------------------------------------------------------------

88-124 2017-02-25 1

7-50T 2017-02-25 1

641 2017-02-25 2

13-85 2017-02-25 2

08-76 2017-02-25 1

0474 2017-02-25 1

01322 2017-02-25 2

Time taken: 20.475 seconds, Fetched: 7 row(s)

hive>

Joins:

Join is the function used to combine 2 tables values to a single table.

Lets join the 2 tables here.

EMPLOYEE_part:

ID	Name	Salary	Country
16	john	4000	USA
17	robert	2000	USA
18	andrew	4000	USA
19	katty	2000	USA
27	edward	4000	UK
29	alan	3000	UK
31	kerry	4000	UK
34	tom	3000	UK
35	zack	2000	UK

Employee:

ID	Name
16	john
17	robert
18	andrew
19	katty
21	tom
22	tim
23	james
24	paul
27	edward
29	alan
31	kerry
34	terri

Join

JOIN operation is used to combine and retrieve the records from multiple tables.

Example:

SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as a JOIN EMPLOYEE as b ON (a.ID = b.ID);

ID	Name	Salary	Country
16	john	4000	USA
17	robert	2000	USA
18	andrew	4000	USA
19	katty	2000	USA
27	edward	4000	UK
29	alan	3000	UK
31	kerry	4000	UK
34	tom	3000	UK

Left outer join

LEFT OUTER JOIN will return all the rows from the left table, even though there are no matches in the right-side table. The ON condition matches no record in the right-side table, this JOIN will return a row in the result, but with NULL value in the columns of the right table.

Example:

SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as a LEFT OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);

ID	Name	Salary	Country
16	john	4000	USA
17	robert	2000	USA
18	andrew	4000	USA
19	katty	2000	USA
27	edward	4000	UK
29	alan	3000	UK
31	kerry	4000	UK
34	tom	3000	UK
NULL	zack	2000	UK

Right outer join

RIGHT OUTER JOIN will return all the rows from the RIGHT table, even though there are no matches in the left table. The ON condition matches no record in the left-side table, this JOIN will return a row in the result, but with NULL value in the columns of the left table.

Example:

SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as a RIGHT OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);

ID	Name	Salary	Country
16	john	4000	USA
17	robert	2000	USA
18	andrew	4000	USA
19	katty	2000	USA
21	NULL	NULL	NULL
22	NULL	NULL	NULL
23	NULL	NULL	NULL
24	NULL	NULL	NULL
27	edward	4000	UK
29	alan	3000	UK
31	kerry	4000	UK
34	tom	3000	UK

Full outer join

FULL OUTER JOIN combines all the records of both the left and the right tables. The joined table contains all the records from both the tables and fills in NULL values for missing matched columns on both the side.

Example:

SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as a FULL OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);

ID	Name	Salary	Country
16	john	4000	USA
17	robert	2000	USA
18	andrew	4000	USA
19	katty	2000	USA
21	NULL	NULL	NULL
22	NULL	NULL	NULL
23	NULL	NULL	NULL
24	NULL	NULL	NULL
27	edward	4000	UK
29	alan	3000	UK
31	kerry	4000	UK
34	tom	3000	UK
NULL	zack	2000	UK

BigData

Friday, 21 April 2017

Hive joins

Select clauses:

Where clauses:

Joins:

Join

Left outer join

Right outer join

Full outer join