大数据全系列 教程
1869个小节阅读:467.5k
408考研
JAVA全系列 教程
面向对象的程序设计语言
Python全系列 教程
Python3.x版本,未来主流的版本
人工智能 教程
顺势而为,AI创新未来
大厂算法 教程
算法,程序员自我提升必经之路
C++ 教程
一门通用计算机编程语言
微服务 教程
目前业界流行的框架组合
web前端全系列 教程
通向WEB技术世界的钥匙
大数据全系列 教程
站在云端操控万千数据
AIGC全能工具班
A A
White Night
缕清思路,编写伪sql语句:
xxxxxxxxxx
select u_ud,pl, 转换s_time成为yyyy-MM-dd,count(p_url) ct
from event_logs
where en='e_pv'
and p_url is not null
and pl is not null
and s_time >= ('yyyy-MM-dd')转换为时间戳
and s_time< ('yyyy-MM-dd+1')转换为时间戳
group by pl,u_ud,s_time对应的天;
编写hql语句:
xxxxxxxxxx
select u_ud,pl,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd') day,count(p_url) ct
from event_logs
where en='e_pv'
and p_url is not null
and pl is not null
and s_time >= unix_timestamp('2022-01-20','yyyy-MM-dd')*1000
and s_time< unix_timestamp('2022-01-21','yyyy-MM-dd')*1000
group by pl,u_ud,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd');
执行结果如下图:
下一步转换处理,先写伪hql语句:
xxxxxxxxxx
select u_ud,pl,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd') day,count(p_url) ct case when
1 pv1
2 pv2
3 pv3
4 pv4
[5,10) pv5_10
[10,30) pv10_30
[30,60) pv30_60
[60,++) pv60_plus
from event_logs
where en='e_pv'
and p_url is not null
and pl is not null
and s_time >= unix_timestamp('2022-01-20','yyyy-MM-dd')*1000
and s_time< unix_timestamp('2022-01-21','yyyy-MM-dd')*1000
group by pl,u_ud,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd');
编写正式的hql语句:
xxxxxxxxxx
select u_ud,pl,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd') day,
(case when count(p_url)==1 then 'pv1'
when count(p_url)==2 then 'pv2'
when count(p_url)==3 then 'pv3'
when count(p_url)==4 then 'pv4'
when count(p_url)>=5 and count(p_url)<10 then 'pv5_10'
when count(p_url)>=10 and count(p_url)<30 then 'pv10_30'
when count(p_url)>=30 and count(p_url)<60 then 'pv30_60'
else 'pv60_plus' end) as pv
from event_logs
where en='e_pv'
and p_url is not null
and pl is not null
and s_time >= unix_timestamp('2022-01-20','yyyy-MM-dd')*1000
and s_time< unix_timestamp('2022-01-21','yyyy-MM-dd')*1000
group by pl,u_ud,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd');
查询过如下图所示:
进一步升级hql语句:
xxxxxxxxxx
select pl,day,pv,count(u_ud) ct
from (select u_ud,pl,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd') day,
(case when count(p_url)==1 then 'pv1'
when count(p_url)==2 then 'pv2'
when count(p_url)==3 then 'pv3'
when count(p_url)==4 then 'pv4'
when count(p_url)>=5 and count(p_url)<10 then 'pv5_10'
when count(p_url)>=10 and count(p_url)<30 then 'pv10_30'
when count(p_url)>=30 and count(p_url)<60 then 'pv30_60'
else 'pv60_plus' end) as pv
from event_logs
where en='e_pv'
and p_url is not null
and pl is not null
and s_time >= unix_timestamp('2022-01-20','yyyy-MM-dd')*1000
and s_time< unix_timestamp('2022-01-21','yyyy-MM-dd')*1000
group by pl,u_ud,from_unixtime(cast(s_time/1000 as bigint),'yyyy-MM-dd')) tmp
where u_ud is not null
group by pl,day,pv;
执行查询结果如下:
xxxxxxxxxx
pl day pv ct
website 2022-01-20 pv1 233
website 2022-01-20 pv2 82
website 2022-01-20 pv3 18
website 2022-01-20 pv4 7
将以上结果保存到临时表中,然后在进行行转列的操作
xxxxxxxxxx
pl day pv1 pv2 pv3 pv4 pv5_10 pv10_30 pv30_60 pv60_plus
website 2022-01-20 233 82 18 7 0 0 0 0 0