Dataset:
id,name,hobbies,address
1,小明1,lol-book-movie,beijing:xisanqi-shanghai:pudong
2,小明2,lol-book-movie,beijing:xisanqi-shanghai:pudong
3,小明3,lol-book-movie,beijing:xisanqi-shanghai:pudong
4,小明4,lol-book-movie,beijing:xisanqi-shanghai:pudong
5,小明5,lol-movie,beijing:xisanqi-shanghai:pudong
6,小明6,lol-book-movie,beijing:xisanqi-shanghai:pudong
7,小明7,lol-book,beijing:xisanqi-shanghai:pudong
8,小明8,lol-book,beijing:xisanqi-shanghai:pudong
9,小明9,lol-book-movie,beijing:xisanqi-shanghai:pudong
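To make the layout of each record concrete, here is a minimal Python sketch that splits one raw line into the same shape the query results take later on (an id, a name, an array of hobbies, and a map of addresses). The delimiters (`,` between fields, `-` between collection items, `:` between a map key and its value) are inferred from the data and the query output; the table definition itself is not shown in this section. The function name `parse_person` is hypothetical.

```python
def parse_person(line: str):
    """Split one raw line into (id, name, hobbies, address).

    Assumed delimiters, inferred from the sample data:
    ',' between fields, '-' between collection items, ':' between map key and value.
    """
    person_id, name, hobbies, address = line.strip().split(",")
    # '-' separates the elements of the hobbies array
    hobby_list = hobbies.split("-")
    # '-' also separates map entries; ':' separates key from value inside an entry
    address_map = dict(entry.split(":", 1) for entry in address.split("-"))
    return int(person_id), name, hobby_list, address_map


row = parse_person("5,小明5,lol-movie,beijing:xisanqi-shanghai:pudong")
print(row)
```

This only illustrates the record structure; Hive itself does the splitting according to the delimiters declared when the table is created.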
Syntax:
load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, partcol2=val2 ...)]
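With local: filepath refers to a file on the local file system; the file is uploaded (copied) to the HDFS directory that corresponds to the table.
Without local: filepath refers to a file that is already on HDFS; the file is moved into the directory that corresponds to the table.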
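The difference between the two modes comes down to copy versus move. The sketch below imitates this on an ordinary local file system using temporary directories; it does not talk to Hive or HDFS, and the names `load_data`, `demo`, `local_person.txt`, and `hdfs_person.txt` are hypothetical stand-ins chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool) -> Path:
    """Imitate LOAD DATA: copy the file when local=True, move it otherwise."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy(src, dest)            # the source file stays where it was
    else:
        shutil.move(str(src), str(dest))  # the source file is gone afterwards
    return dest


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        table_dir = root / "warehouse" / "person"  # stand-in for the table directory
        local_file = root / "local_person.txt"     # stand-in for a local file
        hdfs_file = root / "hdfs_person.txt"       # stand-in for a file already on HDFS
        local_file.write_text("1,a\n")
        hdfs_file.write_text("2,b\n")

        load_data(local_file, table_dir, local=True)
        load_data(hdfs_file, table_dir, local=False)

        return {
            "local_source_exists": local_file.exists(),
            "hdfs_source_exists": hdfs_file.exists(),
            "table_files": sorted(p.name for p in table_dir.iterdir()),
        }


print(demo())
```

After the run, the "local" source file still exists while the "HDFS" source file has disappeared, and both files are present in the table directory.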
Hands-on demonstration:
Preparation:
[root@node4 ~]# mkdir data
[root@node4 ~]# cd data/
[root@node4 data]# vim person.txt
1,小明1,lol-book-movie,beijing:xisanqi-shanghai:pudong
2,小明2,lol-book-movie,beijing:xisanqi-shanghai:pudong
3,小明3,lol-book-movie,beijing:xisanqi-shanghai:pudong
4,小明4,lol-book-movie,beijing:xisanqi-shanghai:pudong
5,小明5,lol-movie,beijing:xisanqi-shanghai:pudong
6,小明6,lol-book-movie,beijing:xisanqi-shanghai:pudong
7,小明7,lol-book,beijing:xisanqi-shanghai:pudong
8,小明8,lol-book,beijing:xisanqi-shanghai:pudong
9,小明9,lol-book-movie,beijing:xisanqi-shanghai:pudong
Loading data from a local file:
hive> load data local inpath '/root/data/person.txt' into table person;
Loading data to table default.person
OK
Time taken: 6.155 seconds
hive> select * from person;
OK
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 3.611 seconds, Fetched: 9 row(s)
After the load, a new file person.txt appears in the table's directory on HDFS, /user/hive_remote/warehouse/person.
Loading data from a file on HDFS:
First, upload the data file person.txt to the / directory on HDFS:
[root@node4 data]# ls
person.txt
[root@node4 data]# hdfs dfs -put person.txt /
2021-11-11 14:46:43,462 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@node4 data]# hdfs dfs -ls /
Found 6 items
-rw-r--r-- 3 root supergroup 496 2021-11-11 14:46 /person.txt
Use load to add the data to the person table:
hive> load data inpath '/person.txt' into table person;
Loading data to table default.person
OK
Time taken: 0.817 seconds
hive> select * from person;
OK
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.379 seconds, Fetched: 18 row(s)
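Checking the files on HDFS shows that /person.txt no longer exists, while the directory /user/hive_remote/warehouse/person now contains an additional file, person_copy_1.txt. In other words, the file was moved from "/" to /user/hive_remote/warehouse/person. Because that directory already contained a file named person.txt, the incoming person.txt was given a new name.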
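The renaming step can be sketched as a small helper that picks a free file name in the target directory. Only the first result, person_copy_1.txt, is confirmed by the demonstration above; continuing the counter (`_copy_2`, `_copy_3`, ...) for further collisions is an assumption of this sketch, and `resolve_name` is a hypothetical name, not a Hive function.

```python
from pathlib import PurePosixPath


def resolve_name(filename: str, existing: set) -> str:
    """Return filename if it is free, otherwise the first free '<stem>_copy_N<suffix>'."""
    if filename not in existing:
        return filename
    path = PurePosixPath(filename)
    n = 1
    while True:
        # Assumption: the counter keeps increasing until an unused name is found
        candidate = f"{path.stem}_copy_{n}{path.suffix}"
        if candidate not in existing:
            return candidate
        n += 1


print(resolve_name("person.txt", set()))
print(resolve_name("person.txt", {"person.txt"}))
```

With an empty directory the name is kept as-is; once person.txt is already present, the helper returns person_copy_1.txt, matching what was observed on HDFS.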
Note: with overwrite, the data already in the table is replaced; without it, the new data is appended.
hive> load data local inpath '/root/data/person.txt' overwrite into table person;
Loading data to table default.person
OK
Time taken: 0.897 seconds
hive> select * from person;
OK
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
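The append and overwrite behaviour seen in the three loads above (9 rows, then 18 rows, then 9 rows again) can be imitated with a toy in-memory model in which a table is just a list of files and each file is a list of rows. This is a sketch under that simplifying assumption; `load` and `count_rows` are hypothetical helpers and do not call Hive.

```python
def load(table: list, rows: list, overwrite: bool = False) -> None:
    """Add one 'file' of rows to the table; drop the existing files first if overwrite is set."""
    if overwrite:
        table.clear()          # overwrite: existing data files are removed
    table.append(list(rows))   # the new file is always added


def count_rows(table: list) -> int:
    """Total number of rows across all files, like counting the rows of select *."""
    return sum(len(file_rows) for file_rows in table)


rows = [f"{i},小明{i}" for i in range(1, 10)]  # nine records, as in person.txt
table = []

load(table, rows)                  # first load
load(table, rows)                  # second load without overwrite: appended
print(count_rows(table))           # 18

load(table, rows, overwrite=True)  # load with overwrite: replaces everything
print(count_rows(table))           # 9
```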