Hive is a SQL like query language to querying and analyzing
data on Hadoop distributed file system. When there was a bottleneck where
Hadoop can be used only by the java known peoples, facebook came up with the
SQL like language where database engineers to get some benefit of the Hadoop.
So, Hive is
originally developed by Facebook is a data warehouse built on top of Hadoop and
provides the SQL like query language called HiveQL for querying data, data
summarization and analysis from the Hadoop eco system. It also Hive makes
querying faster through indexing. which makes the database engineers to get
into the Hadoop distributed file system.
To the simpler note, Hive is a data warehouse application
that build on top of the Hadoop Distributed file system. Hive can interact with
the many parts of the HDFS. Hive is a SQL like language it easy for the analyst
and the data engineers but Hive internally it converts as a MapReduce program
will get the major advantage of the Distributed computing and parallel
processing concepts.
Getting starting with Hive:
Login with the Hive Shell
[hive@sandbox ~]$ hive
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
Show databases:
hive> show databases;
OK
default
ipl
xademo
Time taken: 0.306 seconds, Fetched: 3 row(s)
hive>
Create databases:
hive> create database hadoopd_demo;
OK
Time taken: 0.658 seconds
hive>
Use database:
hive> use hadoopd_demo;
OK
Time taken: 0.429 seconds
hive>
show tables:
hive> show databases;
OK
default
hadoopd_demo
ipl
xademo
Time taken: 0.163 seconds, Fetched: 4 row(s)
hive> use hadoopd_demo;
OK
Time taken: 0.432 seconds
hive> show tables;
OK
Time taken: 0.582 seconds
hive> use default;
OK
Time taken: 0.366 seconds
hive> show tables;
OK
hadoopd_tbl
sample_07
sample_08
Time taken: 0.196 seconds, Fetched: 3 row(s)
hive>
Create tables
Table creation syntax shown as below. Each query must be
terminated with the semicolon(;).
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS]
[db_name.] table_name
[(col_name1 data_type [COMMENT col_comment],
col_name2 data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format];
Copy the data to HDFS and Create the table based on the data.
[hive@sandbox tmp]$ head online.txt
88-124 2017-02-25 69.0
13-92 2017-02-25 9.0
13-85 2017-02-25 35.0
08-76 2017-02-25 3.0
01322 2017-02-25 10.0
10539 2017-02-25 050.0
0474 2017-02-25 78.0
L1 2017-02-25 30.0
7-50T 2017-02-25 57.0
641 2017-02-25 00.0
[hive@sandbox tmp]$ hdfs dfs -put online.txt /tmp/online/
Creating the hive table:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS hadoopd_tbl (
> item_id STRING,
> date date,
> clicks float
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION '/tmp/hadoopd/';
OK
Time taken: 0.488 seconds
hive>
Show table after the table creation to make sure the table created:
hive> show tables;
OK
hadoopd_tbl
sample_07
sample_08
Time taken: 0.192 seconds, Fetched: 3 row(s)
hive>
Data types:
There are numerus data type available to define the column
based on the data. Hive is basically case insensitive.
INT:
Integer type data can be specified using INT. When the data
range exceeds, we need to go for the BIGINT and if the data range is small then
use SMALLINT. Also, TINYINT available for the data which is smaller than the
SMALLINT.
Syntax: Column_Name
INT
STRING:
String type data types can be specified as single quotes ('
') or double quotes (" "). This STRING data type contains two data types:
VARCHAR and CHAR.
Syntax: Column_Name
STRING
Timestamp:
It supports UNIX timestamp upto the nanosecond. Also, It
supports java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and format
“yyyy-mm-dd hh:mm:ss.ffffffffff”.
Syntax: Column_Name
TIMESTAMP
Dates:
DATE values are described in the form of year-month-day
{{YYYY-MM-DD}}.
Syntax: Column_Name
DATE
Decimals:
The DECIMAL type is as same as Big Decimal format of Java.
It is used for representing immutable arbitrary precision.
Floating Point Types:
Floating point type is nothing but numbers with decimal
point values. Generally, this type of data is composed of DOUBLE data type.
Syntax: Column_Name
FLOAT
Decimal Type:
Decimal type data is nothing but floating point value with
higher range than DOUBLE data type. The range of decimal type is approximately
-10-308 to 10308.
Null Value:
Missing values are represented as the special value NULL.
Arrays:
Arrays in Hive are similar to the array in the Java.
Syntax: ARRAY <data_type>
Maps:
Maps in Hive are similar to the Java Maps.
Syntax: MAP <primitive_type,
data_type>
Structs:
Structs are similar to using complex data with comment.
Syntax: STRUCT <col_name : data_type [COMMENT
col_comment], ...>
Alter the table:
It is used to alter a table in Hive.
Rename the table Name:
Renaming the table Name in the Hive.
Syntax:
ALTER TABLE name RENAME TO new_name;
Example:
hive> ALTER TABLE hadoopd_tbl RENAME to hadoopd_demo;
OK
Time taken: 0.884 seconds
hive>
Add the column to the table:
Add the additional columns to the table.
Syntax:
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
Example:
hive> alter table hadoopd_demo add columns ( Customer_ID
STRING );
OK
Time taken: 0.757 seconds
hive> describe hadoopd_demo;
OK
item_id
string
date
date
clicks
float
customer_id
string
Time taken: 0.935 seconds, Fetched: 4 row(s)
hive>
Alter the data type:
Changing the data type of the columns.
Syntax:
ALTER TABLE name CHANGE column_name new_name new_type;
Example:
hive> describe hadoopd_demo;
OK
item_id
string
date
date
clicks
float
customer_id
string
Time taken: 0.857 seconds, Fetched: 4 row(s)
hive> ALTER TABLE hadoopd_demo CHANGE customer_id
customer_id INT AFTER clicks;
OK
Time taken: 0.683 seconds
hive> describe hadoopd_demo;
OK
item_id
string
date
date
clicks
float
customer_id
int
Time taken: 0.758 seconds, Fetched: 4 row(s)
hive>
Alter the table with columns and its datatypes:
Changing the columns and its data types.
Syntax:
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Example:
hive> ALTER TABLE hadoopd_demo REPLACE COLUMNS( item_id
STRING, date date, clicks float);
OK
Time taken: 0.632 seconds
hive> describe hadoopd_demo;
OK
item_id
string
date
date
clicks
float
Time taken: 0.806 seconds, Fetched: 3 row(s)
hive>
Drop the table and database
Drop table:
Dropping the table from the database.
Syntax:
drop table table_Name;
Example:
hive> drop table hadoopd_demo;
OK
Time taken: 0.817 seconds
hive>
Drop database:
Dropping the database from Hive.
Syntax:
Drop database database_Name;
Example:
hive> show databases;
OK
default
hadoopd_demo
ipl
xademo
Time taken: 0.091 seconds, Fetched: 4 row(s)
hive> drop database hadoopd_demo;
OK
Time taken: 0.606 seconds
hive> show databases;
OK
default
ipl
xademo
Time taken: 0.058 seconds, Fetched: 3 row(s)
hive>
No comments:
Post a Comment