RDD: Creating an RDD by Reading Small Files - 百战程序员


// path: the directory that contains the small files
// minPartitions: minimum number of partitions (optional)
def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions): RDD[(String, String)]
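The effect of the optional minPartitions argument can be checked with a minimal sketch like the one below. The object name, the temporary directory and the two sample files are assumptions made for illustration only. minPartitions is treated here as a hint: the number of partitions Spark actually creates may differ, so the sketch prints it rather than assuming a value.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object, not part of the original tutorial.
object WholeTextFilesMinPartitions {
  def main(args: Array[String]): Unit = {
    // Create two small sample files in a temporary directory (illustrative data).
    val dir: Path = Files.createTempDirectory("tiny_files_demo")
    Files.write(dir.resolve("a.txt"), "hello spark".getBytes(StandardCharsets.UTF_8))
    Files.write(dir.resolve("b.txt"), "hello rdd".getBytes(StandardCharsets.UTF_8))

    val conf = new SparkConf().setMaster("local[*]").setAppName("WholeTextFilesMinPartitions")
    val sc = new SparkContext(conf)
    try {
      // Pass the directory as a file: URI so the path works on any platform.
      val path = dir.toUri.toString
      val byDefault = sc.wholeTextFiles(path)                   // uses defaultMinPartitions
      val withHint = sc.wholeTextFiles(path, minPartitions = 4) // explicit hint

      // wholeTextFiles yields one (path, content) record per file.
      println(s"records: ${byDefault.count()}")
      println(s"partitions (default): ${byDefault.getNumPartitions}")
      println(s"partitions (minPartitions = 4): ${withHint.getNumPartitions}")
    } finally {
      sc.stop()
    }
  }
}
```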

Code demonstration:


package com.itbaizhan.rdd
// 1. Import the required classes
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CreateByWholeTextFiles {
  def main(args: Array[String]): Unit = {
    // 2. Build a SparkConf, setting local mode and the application name
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("WholeTextFiles")
    // 3. Build the SparkContext from the conf object
    val sc = new SparkContext(conf)
    // 4. Read the small files in the given directory
    val rdd: RDD[(String, String)] = sc.wholeTextFiles("data/tiny_files")
    // Each element is a (file path, file content) pair:
    // (filePath1, "content1"), (filePath2, "content2"), ..., (filePathN, "contentN")
    val tuples: Array[(String, String)] = rdd.collect()
    tuples.foreach(ele => println((ele._1, ele._2)))
    // 5. Extract only the contents of the small files
    val array: Array[String] = rdd.map(_._2).collect()
    println("---------------------------")
    println(array.mkString("|"))
    // 6. Stop the SparkContext
    sc.stop()
  }
}

Output:


(file:/D:/codes/itbaizhan/sparkdemo/data/tiny_files/file1.txt,hello Linux
hello Zookeper
hello Maven
hello hive
hello spark)
(file:/D:/codes/itbaizhan/sparkdemo/data/tiny_files/file2.txt,Spark Core
Spark RDD
Spark Sql)
---------------------------
hello Linux
hello Zookeper
hello Maven
hello hive
hello spark|Spark Core
Spark RDD
Spark Sql
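Because each record keeps the file path next to the file's full content, per-file processing is straightforward. The following is a minimal sketch, assuming the same data/tiny_files directory as the demonstration above. The object name and the helpers fileName and nonEmptyLines are hypothetical and are not part of the Spark API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object, not part of the original tutorial.
object PerFileLineCount {
  // Keep only the last path segment, e.g. "file:/.../file1.txt" -> "file1.txt".
  def fileName(uri: String): String = uri.substring(uri.lastIndexOf('/') + 1)

  // Split a file's content into lines, dropping blank ones.
  def nonEmptyLines(content: String): Array[String] =
    content.split("\\r?\\n").filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("PerFileLineCount")
    val sc = new SparkContext(conf)
    try {
      // One (path, content) record per file in the directory.
      val files: RDD[(String, String)] = sc.wholeTextFiles("data/tiny_files")
      // Map each record to (short file name, number of non-blank lines).
      val counts: RDD[(String, Int)] = files.map { case (uri, content) =>
        (fileName(uri), nonEmptyLines(content).length)
      }
      counts.collect().foreach { case (name, n) => println(s"$name: $n lines") }
    } finally {
      sc.stop()
    }
  }
}
```

With the two files shown in the output above, this should report five lines for file1.txt and three for file2.txt.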
