Monday 10 April 2017

Hive - Basics, Create and update tables in Hive, Data types available in Hive

Hive provides a SQL-like query language for querying and analyzing data on the Hadoop Distributed File System (HDFS). Hadoop originally had a bottleneck in that it could be used only by people who knew Java, so Facebook came up with a SQL-like language that lets database engineers also get the benefit of Hadoop.

So Hive, originally developed at Facebook, is a data warehouse built on top of Hadoop that provides a SQL-like query language called HiveQL for querying, summarizing and analyzing data from the Hadoop ecosystem. Hive also makes querying faster through indexing, which helps database engineers work with the Hadoop Distributed File System.


To put it more simply, Hive is a data warehouse application built on top of the Hadoop Distributed File System, and it can interact with many parts of HDFS. Because Hive offers a SQL-like language it is easy for analysts and data engineers to use, while internally each query is converted into a MapReduce program, so it gets the major advantages of distributed computing and parallel processing.

Getting started with Hive:


Log in to the Hive shell


[hive@sandbox ~]$ hive

Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>

Show databases:


hive> show databases;
OK
default
ipl
xademo
Time taken: 0.306 seconds, Fetched: 3 row(s)
hive>


Create a database:


hive> create database hadoopd_demo;
OK
Time taken: 0.658 seconds
hive>

Use a database:


hive> use hadoopd_demo;
OK
Time taken: 0.429 seconds
hive>

Show tables:


hive> show databases;
OK
default
hadoopd_demo
ipl
xademo
Time taken: 0.163 seconds, Fetched: 4 row(s)

hive> use hadoopd_demo;
OK
Time taken: 0.432 seconds
hive> show tables;
OK
Time taken: 0.582 seconds

hive> use default;
OK
Time taken: 0.366 seconds

hive> show tables;
OK
hadoopd_tbl
sample_07
sample_08
Time taken: 0.196 seconds, Fetched: 3 row(s)
hive>


Create tables

The table creation syntax is shown below. Each query must be terminated with a semicolon (;).

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name1 data_type [COMMENT col_comment],
col_name2 data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format];


Copy the data to HDFS and create the table based on that data.


[hive@sandbox tmp]$ head online.txt
88-124          2017-02-25      69.0
13-92           2017-02-25      9.0
13-85           2017-02-25      35.0
08-76           2017-02-25      3.0
01322           2017-02-25      10.0
10539           2017-02-25      050.0
0474            2017-02-25      78.0
L1              2017-02-25      30.0
7-50T           2017-02-25      57.0
641             2017-02-25      00.0
[hive@sandbox tmp]$ hdfs dfs -put online.txt /tmp/hadoopd/

Creating the Hive table:


hive> CREATE EXTERNAL TABLE IF NOT EXISTS hadoopd_tbl (
    > item_id STRING,
    > date date,
    > clicks float
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > LOCATION '/tmp/hadoopd/';
OK
Time taken: 0.488 seconds
hive>

Run SHOW TABLES after the table creation to make sure the table was created:


hive> show tables;
OK
hadoopd_tbl
sample_07
sample_08
Time taken: 0.192 seconds, Fetched: 3 row(s)
hive>
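
Once the external table points at the HDFS directory holding the data, it can be queried with ordinary HiveQL, and Hive converts the query into a MapReduce job behind the scenes. A minimal sketch, assuming online.txt from above was placed in the table's location:

SELECT item_id, clicks
FROM hadoopd_tbl
WHERE clicks > 50.0;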


Data types:


There are numerous data types available to define columns based on the data. Hive keywords and data type names are case insensitive.

INT:

Integer data can be specified using INT. When the values exceed the INT range, use BIGINT; if the range is small, use SMALLINT. TINYINT is also available for data that is even smaller than SMALLINT.

Syntax: Column_Name  INT
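
For example, a hypothetical table (the table and column names below are only for illustration) that uses each of the integer types:

CREATE TABLE int_types_demo (
  tiny_val   TINYINT,   -- 1-byte integer
  small_val  SMALLINT,  -- 2-byte integer
  normal_val INT,       -- 4-byte integer
  big_val    BIGINT     -- 8-byte integer
);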

STRING:

String literals can be specified with single quotes (' ') or double quotes (" "). Alongside STRING, Hive also provides the related VARCHAR and CHAR types.

Syntax: Column_Name  STRING
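
A small sketch with a hypothetical table showing STRING alongside the related fixed- and variable-length types (available in recent Hive versions):

CREATE TABLE string_types_demo (
  product_code CHAR(10),      -- fixed length, padded to 10 characters
  product_name VARCHAR(100),  -- variable length, maximum 100 characters
  description  STRING         -- unbounded variable-length text
);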

Timestamp:


It supports the traditional UNIX timestamp with optional nanosecond precision, as well as the java.sql.Timestamp format "yyyy-mm-dd hh:mm:ss.fffffffff".

Syntax: Column_Name  TIMESTAMP
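
For instance, a hypothetical table with a TIMESTAMP column and a filter on it (the names here are assumptions, not taken from the example above):

CREATE TABLE click_events (
  item_id    STRING,
  event_time TIMESTAMP
);

SELECT item_id
FROM click_events
WHERE event_time >= '2017-02-25 00:00:00';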

Dates:

DATE values are described in the form year-month-day (YYYY-MM-DD).

Syntax: Column_Name  DATE

Decimals:

The DECIMAL type is based on Java's BigDecimal and is used for representing immutable arbitrary-precision decimal numbers.
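
When declaring a DECIMAL column, a precision and scale can also be given (Hive 0.13 and later); a hypothetical example:

CREATE TABLE payments_demo (
  order_id STRING,
  amount   DECIMAL(10,2)  -- up to 10 digits in total, 2 of them after the decimal point
);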

Floating Point Types:

Floating point types hold numbers with decimal point values. FLOAT is a single-precision type, and DOUBLE is a double-precision type with a wider range.

Syntax: Column_Name  FLOAT

Decimal Type:

Decimal type data holds exact numeric values with higher precision than the DOUBLE data type; in Hive the precision can go up to 38 digits.

Null Value:

Missing values are represented as the special value NULL.

Arrays:

Arrays in Hive are similar to arrays in Java.

Syntax: ARRAY  <data_type>
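
A minimal sketch with a hypothetical table, showing how an ARRAY column can be declared and how its items are delimited in the underlying text file:

CREATE TABLE array_demo (
  user_id       STRING,
  visited_pages ARRAY<STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

Individual elements are accessed by index, for example visited_pages[0].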

Maps:

Maps in Hive are similar to Java Maps.

Syntax: MAP  <primitive_type, data_type>
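
A similar hypothetical sketch for a MAP column, where the key-value pairs in the data file are separated with ':':

CREATE TABLE map_demo (
  user_id STRING,
  scores  MAP<STRING, INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

A value is looked up by its key, for example scores['math'].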

Structs:

Structs group a set of named fields of possibly different types into one complex value, and each field can carry an optional comment.

Syntax: STRUCT <col_name : data_type [COMMENT col_comment], ...>
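
A hypothetical sketch showing a STRUCT column; its fields are addressed with dot notation, for example address.city:

CREATE TABLE struct_demo (
  name    STRING,
  address STRUCT<street:STRING, city:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';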

Alter the table:

The ALTER TABLE statement is used to modify an existing table in Hive.

Rename the table Name:

Renaming a table in Hive.

Syntax:
ALTER TABLE name RENAME TO new_name;

Example:
hive> ALTER TABLE hadoopd_tbl RENAME to hadoopd_demo;
OK
Time taken: 0.884 seconds
hive>

Add the column to the table:

Adding additional columns to the table.

Syntax:
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])

Example:
hive> alter table hadoopd_demo add columns ( Customer_ID STRING );
OK
Time taken: 0.757 seconds
hive> describe hadoopd_demo;
OK
item_id                 string
date                    date
clicks                  float
customer_id             string
Time taken: 0.935 seconds, Fetched: 4 row(s)
hive>

Alter the data type:

Changing the data type of a column.

Syntax:
ALTER TABLE name CHANGE column_name new_name new_type;

Example:
hive> describe hadoopd_demo;
OK
item_id                 string
date                    date
clicks                  float
customer_id             string
Time taken: 0.857 seconds, Fetched: 4 row(s)

hive> ALTER TABLE hadoopd_demo CHANGE customer_id customer_id INT AFTER clicks;
OK
Time taken: 0.683 seconds

hive> describe hadoopd_demo;
OK
item_id                 string
date                    date
clicks                  float
customer_id             int
Time taken: 0.758 seconds, Fetched: 4 row(s)
hive>

Alter the table with columns and its datatypes:

Replacing the table's columns and their data types.

Syntax:
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])

Example:
hive> ALTER TABLE hadoopd_demo REPLACE COLUMNS( item_id STRING, date date, clicks float);
OK
Time taken: 0.632 seconds

hive> describe hadoopd_demo;
OK
item_id                 string
date                    date
clicks                  float
Time taken: 0.806 seconds, Fetched: 3 row(s)
hive>

Drop the table and database


Drop table:

Dropping a table from the database.

Syntax:
drop table table_Name;

Example:
hive> drop table hadoopd_demo;
OK
Time taken: 0.817 seconds
hive>

Drop database:

Dropping a database from Hive.

Syntax:
Drop database database_Name;

Example:
hive> show databases;
OK
default
hadoopd_demo
ipl
xademo
Time taken: 0.091 seconds, Fetched: 4 row(s)

hive> drop database hadoopd_demo;
OK
Time taken: 0.606 seconds

hive> show databases;
OK
default
ipl
xademo
Time taken: 0.058 seconds, Fetched: 3 row(s)
hive>
