大数据全系列 教程
1869个小节阅读:464.8k
JAVA全系列 教程
面向对象的程序设计语言
Python全系列 教程
Python3.x版本,未来主流的版本
人工智能 教程
顺势而为,AI创新未来
大厂算法 教程
算法,程序员自我提升必经之路
C++ 教程
一门通用计算机编程语言
微服务 教程
目前业界流行的框架组合
web前端全系列 教程
通向WEB技术世界的钥匙
大数据全系列 教程
站在云端操控万千数据
AIGC全能工具班
A A
White Night
根据需求将表分成如下三层:
xxxxxxxxxx
hive> create database baizhan_music;
OK
Time taken: 0.664 seconds
hive> use baizhan_music;
OK
Time taken: 0.053 second
基于以上逻辑表建立物理模型如下:
ODS层
TO_CLIENT_SONG_PLAY_OPERATE_REQ_D 客户端歌曲播放表(每日歌曲点播日志数据)
xxxxxxxxxx
CREATE EXTERNAL TABLE IF NOT EXISTS `TO_CLIENT_SONG_PLAY_OPERATE_REQ_D`(
`SONGID` string,
`MID` string,
`OPTRATE_TYPE` string,
`UID` string,
`CONSUME_TYPE` string,
`DUR_TIME` string,
`SESSION_ID` string,
`SONGNAME` string,
`PKG_ID` string,
`ORDER_ID` string
)
partitioned by (data_dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TO_CLIENT_SONG_PLAY_OPERATE_REQ_D';
TO_SONG_INFO_D 歌库歌曲表
xxxxxxxxxx
CREATE EXTERNAL TABLE `TO_SONG_INFO_D`(
`NBR` string,
`NAME` string,
`OTHER_NAME` string,
`SOURCE` int,
`ALBUM` string,
`PRDCT` string,
`LANG` string,
`VIDEO_FORMAT` string,
`DUR` int,
`SINGER_INFO` string,
`POST_TIME` string,
`PINYIN_FST` string,
`PINYIN` string,
`SING_TYPE` int,
`ORI_SINGER` string,
`LYRICIST` string,
`COMPOSER` string,
`BPM_VAL` int,
`STAR_LEVEL` int,
`VIDEO_QLTY` int,
`VIDEO_MK` int,
`VIDEO_FTUR` int,
`LYRIC_FTUR` int,
`IMG_QLTY` int,
`SUBTITLES_TYPE` int,
`AUDIO_FMT` int,
`ORI_SOUND_QLTY` int,
`ORI_TRK` int,
`ORI_TRK_VOL` int,
`ACC_VER` int,
`ACC_QLTY` int,
`ACC_TRK_VOL` int,
`ACC_TRK` int,
`WIDTH` int,
`HEIGHT` int,
`VIDEO_RSVL` int,
`SONG_VER` int,
`AUTH_CO` string,
`STATE` int,
`PRDCT_TYPE` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TO_SONG_INFO_D';
EDS层
TW_SONG_BASEINFO_D 歌曲基本信息日全量表
由ODS层TO_SONG_INFO_D表数据清洗后得到。
xxxxxxxxxx
CREATE EXTERNAL TABLE `TW_SONG_BASEINFO_D`(
`NBR` string,
`NAME` string,
`SOURCE` int,
`ALBUM` string,
`PRDCT` string,
`LANG` string,
`VIDEO_FORMAT` string,
`DUR` int,
`SINGER1` string,
`SINGER2` string,
`SINGER1ID` string,
`SINGER2ID` string,
`MAC_TIME` int,
`POST_TIME` string,
`PINYIN_FST` string,
`PINYIN` string,
`SING_TYPE` int,
`ORI_SINGER` string,
`LYRICIST` string,
`COMPOSER` string,
`BPM_VAL` int,
`STAR_LEVEL` int,
`VIDEO_QLTY` int,
`VIDEO_MK` int,
`VIDEO_FTUR` int,
`LYRIC_FTUR` int,
`IMG_QLTY` int,
`SUBTITLES_TYPE` int,
`AUDIO_FMT` int,
`ORI_SOUND_QLTY` int,
`ORI_TRK` int,
`ORI_TRK_VOL` int,
`ACC_VER` int,
`ACC_QLTY` int,
`ACC_TRK_VOL` int,
`ACC_TRK` int,
`WIDTH` int,
`HEIGHT` int,
`VIDEO_RSVL` int,
`SONG_VER` int,
`AUTH_CO` string,
`STATE` int,
`PRDCT_TYPE` array<string> )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS PARQUET
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TW_SONG_BASEINFO_D';
TW_SONG_FTUR_D 歌曲特征日统计表
xxxxxxxxxx
CREATE EXTERNAL TABLE `TW_SONG_FTUR_D`(
`NBR` string,
`NAME` string,
`SOURCE` int,
`ALBUM` string,
`PRDCT` string,
`LANG` string,
`VIDEO_FORMAT` string,
`DUR` int,
`SINGER1` string,
`SINGER2` string,
`SINGER1ID` string,
`SINGER2ID` string,
`MAC_TIME` int,
`SING_CNT` int,
`SUPP_CNT` int,
`USR_CNT` int,
`ORDR_CNT` int,
`RCT_7_SING_CNT` int,
`RCT_7_SUPP_CNT` int,
`RCT_7_TOP_SING_CNT` int,
`RCT_7_TOP_SUPP_CNT` int,
`RCT_7_USR_CNT` int,
`RCT_7_ORDR_CNT` int,
`RCT_30_SING_CNT` int,
`RCT_30_SUPP_CNT` int,
`RCT_30_TOP_SING_CNT` int,
`RCT_30_TOP_SUPP_CNT` int,
`RCT_30_USR_CNT` int,
`RCT_30_ORDR_CNT` int
)
PARTITIONED BY (data_dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TW_SONG_FTUR_D';
TW_SINGER_RSI_D 歌手影响力日统计表
xxxxxxxxxx
CREATE EXTERNAL TABLE `TW_SINGER_RSI_D`(
`PERIOD` string,
`SINGER_ID` string,
`SINGER_NAME` string,
`RSI` string,
`RSI_RANK` int
)
PARTITIONED BY (data_dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TW_SINGER_RSI_D';
TW_SONG_RSI_D 歌曲影响力日统计表
xxxxxxxxxx
CREATE EXTERNAL TABLE `TW_SONG_RSI_D`(
`PERIOD` string,
`NBR` string,
`NAME` string,
`RSI` string,
`RSI_RANK` int
)
PARTITIONED BY (data_dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://mycluster/user/hive_remote/warehouse/data/song/TW_SONG_RSI_D';
DM层两张表会在后期处理过程中,以SparkSQL方式每天覆盖更新到mysql表中,这里不单独建立对应的物理模型。以上各个物理表之间的流转关系如下: