# 3. Word Count Case Study
## 3.1 Running the Built-in wordcount
### 3.1.1 Commands
```shell
[root@node1 ~]# cd /opt/hadoop-3.1.3/share/hadoop/mapreduce/
[root@node1 mapreduce]# pwd
/opt/hadoop-3.1.3/share/hadoop/mapreduce
[root@node1 mapreduce]# ll *examples-3.1.3.jar
-rw-r--r-- 1 itbaizhan itbaizhan 316382 Sep 12 2019 hadoop-mapreduce-examples-3.1.3.jar
[root@node1 mapreduce]# cd
[root@node1 ~]# vim wc.txt
hello tom
andy joy
hello rose
hello joy
mark andy
hello tom
andy rose
hello joy
[root@node1 ~]# hdfs dfs -mkdir -p /wordcount/input
[root@node1 ~]# hdfs dfs -put wc.txt /wordcount/input
[root@node1 ~]# hdfs dfs -ls /wordcount/input
Found 1 items
-rw-r--r-- 3 root supergroup 80 2021-10-28 09:53 /wordcount/input/wc.txt
[root@node1 ~]# ll
-rw-r--r-- 1 root root 80 Oct 28 09:52 wc.txt
[root@node1 ~]# cd -
/opt/hadoop-3.1.3/share/hadoop/mapreduce
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.1.3.jar wordcount /wordcount/input /wordcount/output
```

The job may fail with the following error:

```shell
Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:4>
```

By default the ApplicationMaster (AM) requests 1.5 GB of memory. Lower the AM's resource request to fit within the allocated physical memory limit: edit mapred-site.xml (on all four nodes), restart the Hadoop cluster, and rerun the job.
```xml
<property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>256</value>
</property>
<!-- Map and reduce tasks also request 1 GB each by default; lower them to suitable values as well. -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>256</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>256</value>
</property>
```

If these values are set too low, however, out-of-memory failures can occur instead: `ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space`
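The failed request above comes down to a simple admission rule: a container request must fit within the scheduler's maximum allowed allocation in every dimension. The sketch below illustrates that rule; the function name and structure are purely illustrative, not Hadoop API.

```python
def fits_allocation(requested_mb, requested_vcores,
                    max_mb=1024, max_vcores=4):
    """Return True if a container request fits within the scheduler's
    maximum allowed allocation (memory AND vcores must both fit).
    Defaults mirror the limits shown in the error message above."""
    return requested_mb <= max_mb and requested_vcores <= max_vcores

# The default AM request of 1536 MB exceeds the 1024 MB maximum:
print(fits_allocation(1536, 1))   # False -> the job cannot start
# After lowering yarn.app.mapreduce.am.resource.mb to 256:
print(fits_allocation(256, 1))    # True
```

This is why lowering `yarn.app.mapreduce.am.resource.mb` (and the map/reduce memory settings) below `yarn.scheduler.maximum-allocation-mb` lets the job be scheduled.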
```shell
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.1.3.jar wordcount /wordcount/input /wordcount/output
2021-10-28 10:33:10,194 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2021-10-28 10:33:10,635 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1635388346175_0001
2021-10-28 10:33:10,837 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-10-28 10:33:11,853 INFO input.FileInputFormat: Total input files to process : 1
2021-10-28 10:33:11,927 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-10-28 10:33:12,042 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-10-28 10:33:12,092 INFO mapreduce.JobSubmitter: number of splits:1
2021-10-28 10:33:12,299 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-10-28 10:33:12,364 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1635388346175_0001
2021-10-28 10:33:12,364 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-10-28 10:33:12,662 INFO conf.Configuration: resource-types.xml not found
2021-10-28 10:33:12,663 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-10-28 10:33:13,177 INFO impl.YarnClientImpl: Submitted application application_1635388346175_0001
2021-10-28 10:33:13,235 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1635388346175_0001/
2021-10-28 10:33:13,235 INFO mapreduce.Job: Running job: job_1635388346175_0001
2021-10-28 10:33:22,435 INFO mapreduce.Job: Job job_1635388346175_0001 running in uber mode : false
2021-10-28 10:33:22,438 INFO mapreduce.Job:  map 0% reduce 0%
2021-10-28 10:33:30,575 INFO mapreduce.Job:  map 100% reduce 0%
2021-10-28 10:33:36,661 INFO mapreduce.Job:  map 100% reduce 100%
2021-10-28 10:33:37,685 INFO mapreduce.Job: Job job_1635388346175_0001 completed successfully
2021-10-28 10:33:37,827 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=71
        FILE: Number of bytes written=442495
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=184
        HDFS: Number of bytes written=41
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5238
        Total time spent by all reduces in occupied slots (ms)=3935
        Total time spent by all map tasks (ms)=5238
        Total time spent by all reduce tasks (ms)=3935
        Total vcore-milliseconds taken by all map tasks=5238
        Total vcore-milliseconds taken by all reduce tasks=3935
        Total megabyte-milliseconds taken by all map tasks=2681856
        Total megabyte-milliseconds taken by all reduce tasks=2014720
    Map-Reduce Framework
        Map input records=8
        Map output records=16
        Map output bytes=144
        Map output materialized bytes=71
        Input split bytes=104
        Combine input records=16
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=71
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=411
        CPU time spent (ms)=1930
        Physical memory (bytes) snapshot=375353344
        Virtual memory (bytes) snapshot=3779186688
        Total committed heap usage (bytes)=210911232
        Peak Map Physical memory (bytes)=203862016
        Peak Map Virtual memory (bytes)=1874542592
        Peak Reduce Physical memory (bytes)=171491328
        Peak Reduce Virtual memory (bytes)=1904644096
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=80
    File Output Format Counters
        Bytes Written=41
```
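When scripting around job runs, the counter section of the output above can be turned into a nested dict for quick checks. This is a minimal hypothetical helper (not part of any Hadoop tooling), assuming the group/`Name=value` layout shown above:

```python
def parse_counters(lines):
    """Parse counter output lines into {group: {counter_name: int}}.
    A line containing '=' is a counter; any other non-empty line
    starts a new counter group."""
    counters, group = {}, None
    for line in lines:
        line = line.strip()
        if "=" in line:
            # rpartition handles names that themselves contain ':' or spaces
            name, _, value = line.rpartition("=")
            counters.setdefault(group, {})[name.strip()] = int(value)
        elif line:
            group = line
    return counters

sample = [
    "Map-Reduce Framework",
    "    Map input records=8",
    "    Map output records=16",
]
print(parse_counters(sample))
# {'Map-Reduce Framework': {'Map input records': 8, 'Map output records': 16}}
```

With the full log, this makes it easy to assert, for example, that `Map input records` equals the number of lines in wc.txt.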
* input: the HDFS directory containing the input data
* output: an HDFS directory that must not exist yet; the MR job writes its results there

```shell
[root@node1 mapreduce]# hdfs dfs -ls /wordcount/output
Found 2 items
-rw-r--r-- 3 root supergroup 0 2021-10-28 10:33 /wordcount/output/_SUCCESS
-rw-r--r-- 3 root supergroup 41 2021-10-28 10:33 /wordcount/output/part-r-00000
[root@node1 mapreduce]# hdfs dfs -cat /wordcount/output/part-r-00000
andy	3
hello	5
joy	3
mark	1
rose	2
tom	2
```

* /_SUCCESS: a marker (signal) file indicating the job completed successfully
* /part-r-00000: the data file written by the reducer; the "r" stands for reduce, and 00000 is the reducer number. With multiple reducers there is one such data file per reducer.
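The result above can be sanity-checked locally. The sketch below reproduces the same counts from wc.txt in plain Python (it is not Hadoop code, just the same map-then-count logic the wordcount example performs):

```python
from collections import Counter

# The exact contents of wc.txt from the steps above
wc_txt = """hello tom
andy joy
hello rose
hello joy
mark andy
hello tom
andy rose
hello joy"""

# Map: split the input into words; Reduce: count occurrences per word
counts = Counter(wc_txt.split())
for word in sorted(counts):          # reducer output is sorted by key
    print(f"{word}\t{counts[word]}")
# andy	3
# hello	5
# joy	3
# mark	1
# rose	2
# tom	2
```

The output matches part-r-00000 line for line, which confirms the job counted correctly.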