(1) Hive Internal Tables
CREATE TABLE [IF NOT EXISTS] table_name
When the table is dropped, both the metadata and the data are deleted.
Table Type: MANAGED_TABLE indicates an internal (managed) table.
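As a minimal sketch of the syntax above (the table name person_managed and its columns are just an illustration, assuming a default Hive setup):

```sql
-- Create a managed (internal) table; with no LOCATION clause, Hive stores
-- the data under its default warehouse path.
CREATE TABLE IF NOT EXISTS person_managed (
    id   int    COMMENT "unique id",
    name string COMMENT "name"
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ",";

-- DESC FORMATTED should report Table Type: MANAGED_TABLE
DESC FORMATTED person_managed;
```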
(2) Hive External Tables
CREATE EXTERNAL TABLE [IF NOT EXISTS] table_name LOCATION hdfs_path
Dropping an external table deletes only the metadata in the metastore; the table data in HDFS is kept.
Table Type: EXTERNAL_TABLE indicates an external table.
First upload the person.txt file to the /usr/external directory in HDFS:
[root@node4 ~]# hdfs dfs -mkdir -p /usr/external
[root@node4 ~]# hdfs dfs -put /root/data/person.txt /usr/external
[root@node4 ~]# hdfs dfs -ls /usr/external
Found 1 items
-rw-r--r-- 3 root supergroup 496 2021-11-11 19:23 /usr/external/person.txt
Table creation script:
create external table person4(
id int comment "唯一标识id",
name string comment "名称",
likes array<string> comment "爱好",
address map<string,string> comment "地址"
)
row format delimited
fields terminated by ","
collection items terminated by "-"
map keys terminated by ":"
lines terminated by "\n"
location '/usr/external';
Create the table:
hive> create external table person4(
> id int comment "唯一标识id",
> name string comment "名称",
> likes array<string> comment "爱好",
> address map<string,string> comment "地址"
> )
> row format delimited
> fields terminated by ","
> collection items terminated by "-"
> map keys terminated by ":"
> lines terminated by "\n"
> location '/usr/external';
OK
Time taken: 0.851 seconds
View the table details:
hive> desc formatted person4;
OK
# col_name data_type comment
id int 唯一标识id
name string 名称
likes array<string> 爱好
address map<string,string> 地址
# Detailed Table Information
Database: default
OwnerType: USER
Owner: root
CreateTime: Thu Nov 11 19:28:23 CST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/usr/external
Table Type: EXTERNAL_TABLE # indicates an external table
Table Parameters:
EXTERNAL TRUE
bucketing_version 2
numFiles 1
totalSize 496
transient_lastDdlTime 1636630103
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
collection.delim -
field.delim ,
line.delim \n
mapkey.delim :
serialization.format ,
Time taken: 0.41 seconds, Fetched: 36 row(s)
Query the data in the table:
hive> select * from person4;
OK
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 2.947 seconds, Fetched: 9 row(s)
Differences between internal and external tables:
1. Creation: an internal table is stored under the default HDFS warehouse path (/user/hive_remote/warehouse); an external table requires you to specify a path with LOCATION.
2. Deletion: dropping an internal table removes both the data and the metadata; dropping an external table removes only the metadata and leaves the data in place.
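A related point: the two table types are interchangeable after creation by flipping the EXTERNAL table property. A sketch, reusing person4 from this section for illustration (the 'TRUE'/'FALSE' values are case-sensitive in some Hive versions):

```sql
-- Convert a managed table to external: a later DROP TABLE
-- will then leave the data files in HDFS untouched.
ALTER TABLE person4 SET TBLPROPERTIES('EXTERNAL'='TRUE');

-- Convert back to managed: DROP TABLE will delete the data again.
ALTER TABLE person4 SET TBLPROPERTIES('EXTERNAL'='FALSE');
```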
Before dropping the external table:
mysql> use hive_remote;
Database changed
mysql> select * from TBLS where TBL_NAME='person4' ;
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+----------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+----------------+--------------------+--------------------+--------------------+
| 26 | 1636630103 | 1 | 0 | root | USER | 0 | 26 | person4 | EXTERNAL_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+----------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=26;
+-------+----------------+-------------+--------------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+----------------+-------------+--------------------+-------------+
| 26 | 地址 | address | map<string,string> | 3 |
| 26 | 唯一标识id | id | int | 0 |
| 26 | 爱好 | likes | array<string> | 2 |
| 26 | 名称 | name | string | 1 |
+-------+----------------+-------------+--------------------+-------------+
4 rows in set (0.00 sec)
Drop the table in Hive:
hive> drop table person4;
OK
Time taken: 0.687 seconds
Check the metadata again:
mysql> select * from TBLS where TBL_NAME='person4';
Empty set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=26;
Empty set (0.00 sec)
Verify that the corresponding HDFS directory and file still exist:
[root@node4 ~]# hdfs dfs -ls /usr/external
Found 1 items
-rw-r--r-- 3 root supergroup 496 2021-11-11 19:26 /usr/external/person.txt
Conclusion: when an external table is dropped, the metadata is deleted, but the data files under the table's path remain.
Demonstration of dropping an internal table:
Before dropping:
hive> select * from person2_2;
OK
1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.377 seconds, Fetched: 9 row(s)
View it from the HDFS side:
[root@node4 ~]# hdfs dfs -ls /user/hive_remote/warehouse/person2_2
Found 1 items
-rw-r--r-- 3 root supergroup 424 2021-11-11 15:21 /user/hive_remote/warehouse/person2_2/000000_0
View the metadata:
mysql> select * from TBLS where TBL_NAME='person2_2';
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
| 23 | 1636614907 | 1 | 0 | root | USER | 0 | 23 | person2_2 | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=23;
+-------+----------------+-------------+--------------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+----------------+-------------+--------------------+-------------+
| 23 | 地址 | address | map<string,string> | 2 |
| 23 | 唯一标识id | id | int | 0 |
| 23 | 爱好 | likes | array<string> | 1 |
+-------+----------------+-------------+--------------------+-------------+
Drop the person2_2 table in Hive:
hive> drop table person2_2;
OK
Time taken: 0.519 seconds
Check the metadata again:
mysql> select * from TBLS where TBL_NAME='person2_2';
Empty set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=23;
Empty set (0.00 sec)
Check the files on HDFS:
[root@node4 ~]# hdfs dfs -ls /user/hive_remote/warehouse/
# the person2_2 directory no longer exists
The corresponding HDFS directory and files are gone.
Conclusion: dropping an internal table deletes the metadata, the table's directory, and the data files in it.
Note: Hive performs schema-on-read checking (the schema is decoupled from the data, which makes loading fast), whereas relational databases perform schema-on-write checking.
Demonstration: loading a file that does not match the table's format still succeeds, but unparseable fields come back as NULL at query time.
hive> load data local inpath '/root/anaconda-ks.cfg' into table person;
Loading data to table default.person
OK
Time taken: 1.592 seconds
hive> select * from person;
OK
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
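If a file like this is loaded by mistake into a managed table, one way to clean up is to empty the table. A sketch (note the tie-in to this section: TRUNCATE works only on managed tables; for an external table you would instead remove the offending file from the table's HDFS directory):

```sql
-- Empties the managed table's directory in the warehouse;
-- the table definition itself is kept.
TRUNCATE TABLE person;
```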