大数据全系列 教程
1869个小节阅读:467.7k
408考研
JAVA全系列 教程
面向对象的程序设计语言
Python全系列 教程
Python3.x版本,未来主流的版本
人工智能 教程
顺势而为,AI创新未来
大厂算法 教程
算法,程序员自我提升必经之路
C++ 教程
一门通用计算机编程语言
微服务 教程
目前业界流行的框架组合
web前端全系列 教程
通向WEB技术世界的钥匙
大数据全系列 教程
站在云端操控万千数据
AIGC全能工具班
A A
White Night
在ReduceTask的run方法中:
xxxxxxxxxx
if (useNewApi) {//hadoop2.x+
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
} else {//hadoop1.x
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
runNewReducer():
xxxxxxxxxx
private <INKEY,INVALUE,OUTKEY,OUTVALUE>
void runNewReducer(JobConf job,
final TaskUmbilicalProtocol umbilical,
final TaskReporter reporter,
RawKeyValueIterator rIter,
RawComparator<INKEY> comparator,
Class<INKEY> keyClass,
Class<INVALUE> valueClass
) throws IOException,InterruptedException,
ClassNotFoundException {
// wrap value iterator to report progress.
final RawKeyValueIterator rawIter = rIter;
rIter = new RawKeyValueIterator() {
public void close() throws IOException {
rawIter.close();
}
public DataInputBuffer getKey() throws IOException {
return rawIter.getKey();
}
public Progress getProgress() {
return rawIter.getProgress();
}
public DataInputBuffer getValue() throws IOException {
return rawIter.getValue();
}
public boolean next() throws IOException {
boolean ret = rawIter.next();
reporter.setProgress(rawIter.getProgress().getProgress());
return ret;
}
};
// make a task context so we can get the classes
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
getTaskID(), reporter);
// make a reducer 创建Reduce实例
org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
(org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
new NewTrackingRecordWriter<OUTKEY, OUTVALUE>(this, taskContext);
job.setBoolean("mapred.skip.on", isSkipping());
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
org.apache.hadoop.mapreduce.Reducer.Context
reducerContext = createReduceContext(reducer, job, getTaskID(),
rIter, reduceInputKeyCounter,
reduceInputValueCounter,
trackedRW,
committer,
reporter, comparator, keyClass,
valueClass);
try {
reducer.run(reducerContext);
} finally {
trackedRW.close(reducerContext);
}
}
通过taskContext.getReducerClass()获取自定义Reducer类的Class对象,然后在通过反射ReflectionUtils.newInstance()创建对应的实例。
getReducerClass():
xxxxxxxxxx
public Class<? extends Reducer<?,?,?,?>> getReducerClass()
throws ClassNotFoundException;
Ctrl+Alt+B->JobContextImpl类的getReducerClass()方法:
xxxxxxxxxx
public Class<? extends Reducer<?,?,?,?>> getReducerClass()
throws ClassNotFoundException {
return (Class<? extends Reducer<?,?,?,?>>)
conf.getClass(REDUCE_CLASS_ATTR, Reducer.class);
}
先根据常量REDUCE_CLASS_ATTR对应的字符串找配置文件中的value对应的Class,如果没有找到自定义的Reducer类,则使用默认的Reducer类。
如何设置自定义的Reducer类呢?
xxxxxxxxxx
job.setReducerClass(WCReducer.class);
底层是如何对应的?
xxxxxxxxxx
public void setReducerClass(Class<? extends Reducer> cls
) throws IllegalStateException {
ensureState(JobState.DEFINE);
conf.setClass(REDUCE_CLASS_ATTR, cls, Reducer.class);
}
设置时也是通过常量REDUCE_CLASS_ATTR进行设置的,所以读取的就是我们设置的自定义Reducer类。