The Hive Query Language is a query language for Hive to
process, analyze the structured data in a Hadoop Eco system. We have seen
creation of database and table operation of Hive, also we have seen creation of
table from the text file in HDFS. This blog will cover some basic data
manipulation operations.
Select clauses:
SELECT statement is used to retrieve the data from the hive
table.
Syntax:
select * from table_name;
hive> select * from hadoopd_tbl limit 5;
OK
88-124
2017-02-25 69.0
13-92
2017-02-25 9.0
13-85
2017-02-25 35.0
08-76
2017-02-25 3.0
01322
2017-02-25 10.0
Time taken: 0.356 seconds, Fetched: 5 row(s)
Syntax:
Syntax:
Select column_name from table_name;
hive> select item_id from hadoopd_tbl limit 5;
OK
88-124
13-92
13-85
08-76
01322
Time taken: 0.324 seconds, Fetched: 5 row(s)
hive>
Entered Limit clause just to see only 5 rows in the screen.
To see the Row count from the table.
Syntax:
Select count(*) from Table_Name;
Select count(column_Name) from Table_Name;
Count(*) will be taking us to a Map Reduce job. Also, when
applying any function the query will be run as a MapReduce job to get the
advantage of distributed computing.
hive> select count(*) from hadoopd_tbl ;
Query ID =
hive_20170413065454_39ce59cd-7e32-4781-b5a0-efa68f1d4c75
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id
application_1492062882009_0003)
--------------------------------------------------------------------------------
VERTICES STATUS
TOTAL COMPLETED RUNNING
PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 ..........
SUCCEEDED 1 1 0
0 0 0
Reducer 2 ......
SUCCEEDED 1 1 0
0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02
[==========================>>] 100% ELAPSED TIME: 23.87 s
--------------------------------------------------------------------------------
OK
19
Time taken: 53.579 seconds, Fetched: 1 row(s)
hive>
Where clauses:
WHERE clause works like a condition. It filters the data
using the condition and gives you a finite result. The built-in operators and
functions generate an expression, which satisfies this condition.
Syntax:
Select [column1,column2, … ] from [Table_name] where
[condition] Group by [column1,column2, …] order by [column1,column2, …] ;
hive> select item_id,date,count(*) from hadoopd_tbl where
date > 2017-02-24 group by item_id,date order by date;
Query ID = hive_20170413093434_ea787a5f-326b-4503-bbb8-fd5b5c5e651c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id
application_1492062882009_0005)
--------------------------------------------------------------------------------
VERTICES
STATUS TOTAL COMPLETED
RUNNING PENDING FAILED
KILLED
--------------------------------------------------------------------------------
Map 1 ..........
SUCCEEDED 1 1 0
0 0 0
Reducer 2 ...... SUCCEEDED 1
1 0 0
0 0
Reducer 3 ......
SUCCEEDED 1 1 0
0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03
[==========================>>] 100% ELAPSED TIME: 18.72 s
--------------------------------------------------------------------------------
OK
88-124
2017-02-25 1
7-50T
2017-02-25 1
641
2017-02-25 2
13-85
2017-02-25 2
08-76
2017-02-25 1
0474
2017-02-25 1
01322
2017-02-25 2
Time taken: 20.475 seconds, Fetched: 7 row(s)
hive>
Joins:
Join is the function used to combine 2 tables values to a single table.
Lets join the 2 tables here.
EMPLOYEE_part:
ID
|
Name
|
Salary
|
Country
|
16
|
john
|
4000
|
USA
|
17
|
robert
|
2000
|
USA
|
18
|
andrew
|
4000
|
USA
|
19
|
katty
|
2000
|
USA
|
27
|
edward
|
4000
|
UK
|
29
|
alan
|
3000
|
UK
|
31
|
kerry
|
4000
|
UK
|
34
|
tom
|
3000
|
UK
|
35
|
zack
|
2000
|
UK
|
Employee:
ID
|
Name
|
16
|
john
|
17
|
robert
|
18
|
andrew
|
19
|
katty
|
21
|
tom
|
22
|
tim
|
23
|
james
|
24
|
paul
|
27
|
edward
|
29
|
alan
|
31
|
kerry
|
34
|
terri
|
Join
JOIN
operation is used to combine and retrieve the records from multiple tables.
Example:
SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as
a JOIN EMPLOYEE as b ON (a.ID = b.ID);
ID
|
Name
|
Salary
|
Country
|
16
|
john
|
4000
|
USA
|
17
|
robert
|
2000
|
USA
|
18
|
andrew
|
4000
|
USA
|
19
|
katty
|
2000
|
USA
|
27
|
edward
|
4000
|
UK
|
29
|
alan
|
3000
|
UK
|
31
|
kerry
|
4000
|
UK
|
34
|
tom
|
3000
|
UK
|
Left outer join
LEFT OUTER JOIN will return all the rows from the left
table, even though there are no matches in the right-side table. The ON
condition matches no record in the right-side table, this JOIN will return a
row in the result, but with NULL value in the columns of the right table.
Example:
SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as
a LEFT OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);
ID
|
Name
|
Salary
|
Country
|
16
|
john
|
4000
|
USA
|
17
|
robert
|
2000
|
USA
|
18
|
andrew
|
4000
|
USA
|
19
|
katty
|
2000
|
USA
|
27
|
edward
|
4000
|
UK
|
29
|
alan
|
3000
|
UK
|
31
|
kerry
|
4000
|
UK
|
34
|
tom
|
3000
|
UK
|
NULL
|
zack
|
2000
|
UK
|
Right outer join
RIGHT OUTER JOIN will return all the rows from the RIGHT
table, even though there are no matches in the left table. The ON condition
matches no record in the left-side table, this JOIN will return a row in the
result, but with NULL value in the columns of the left table.
Example:
SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as
a RIGHT OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);
ID
|
Name
|
Salary
|
Country
|
16
|
john
|
4000
|
USA
|
17
|
robert
|
2000
|
USA
|
18
|
andrew
|
4000
|
USA
|
19
|
katty
|
2000
|
USA
|
21
|
NULL
|
NULL
|
NULL
|
22
|
NULL
|
NULL
|
NULL
|
23
|
NULL
|
NULL
|
NULL
|
24
|
NULL
|
NULL
|
NULL
|
27
|
edward
|
4000
|
UK
|
29
|
alan
|
3000
|
UK
|
31
|
kerry
|
4000
|
UK
|
34
|
tom
|
3000
|
UK
|
Full outer join
FULL OUTER JOIN combines all the records of both the left
and the right tables. The joined table
contains all the records from both the tables and fills in NULL values for
missing matched columns on both the side.
Example:
SELECT b.ID,a.Name,a.Salary,a.country from EMPLOYEE_part as
a FULL OUTER JOIN EMPLOYEE as b ON (a.ID = b.ID);
ID
|
Name
|
Salary
|
Country
|
16
|
john
|
4000
|
USA
|
17
|
robert
|
2000
|
USA
|
18
|
andrew
|
4000
|
USA
|
19
|
katty
|
2000
|
USA
|
21
|
NULL
|
NULL
|
NULL
|
22
|
NULL
|
NULL
|
NULL
|
23
|
NULL
|
NULL
|
NULL
|
24
|
NULL
|
NULL
|
NULL
|
27
|
edward
|
4000
|
UK
|
29
|
alan
|
3000
|
UK
|
31
|
kerry
|
4000
|
UK
|
34
|
tom
|
3000
|
UK
|
NULL
|
zack
|
2000
|
UK
|