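I have a dataset of counts by state and county, and I want to compute the median and the average by state and by county. For example:
Have: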
ID state county count
1 MD aa 2
2 MD aa 4
3 VA bb 1
4 VA bb 2
5 VA bb 4
6 VA cc 7
7 VA cc 8
Want:
This is what I have so far, and it gives me an error:
Select id, STATE,COUNTY,count,
percentile(cast(count as BIGINT), 0.5) OVER() as overall_median,
round(avg(count),2) OVER() as overall_avg,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
percentile(cast(count as bigint),0.5) as med_county,
AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
AVG(count) AS avg_county,
from have
group by id, state, county
I also get an error when I leave out the GROUP BY.
Code without the GROUP BY:
Select id, STATE,county,count,
percentile(cast(count as BIGINT), 0.5) OVER() as overall_median,
round(avg(count),2) OVER() as overall_avg,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE,county) as med_county,
AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
AVG(count) OVER (PARTITION BY id, STATE, county) as avg_county,
from have
Thanks!
Best answer
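Fix: the OVER() clause has to follow the aggregate avg(), not round(), so write round(avg(count) OVER(), 2). The trailing comma before FROM also has to go.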
select
id, STATE, county, count,
percentile(cast(count as BIGINT), 0.5) OVER() as overall_median,
round(avg(count) OVER(), 2) as overall_avg,
percentile(cast(count as bigint), 0.5) OVER(PARTITION BY id, STATE) as med_state,
percentile(cast(count as bigint), 0.5) OVER(PARTITION BY id, STATE, county) as med_county,
AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
AVG(count) OVER (PARTITION BY id, STATE, county) as avg_county
from
have
Tip: don't use a keyword (here, count) as a column name; it will cause you a lot of problems later.
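One further point, offered as a sketch and not as part of the original answer: in the sample data id is unique per row, so any window that includes id in PARTITION BY holds exactly one row, and the "state" and "county" medians and averages simply repeat that row's own count. Assuming the goal is one value per state and one per state/county pair, the partitions would list only those columns. The backticks around count are an assumption about how the column would be quoted in Hive, following the tip above.

```sql
-- Sketch only: same table (have) and columns as in the question.
-- id is selected but deliberately left out of every PARTITION BY.
SELECT
  id, state, county, `count`,
  percentile(cast(`count` AS BIGINT), 0.5) OVER () AS overall_median,
  round(avg(`count`) OVER (), 2) AS overall_avg,
  -- one median/average per state
  percentile(cast(`count` AS BIGINT), 0.5) OVER (PARTITION BY state) AS med_state,
  avg(`count`) OVER (PARTITION BY state) AS avg_state,
  -- one median/average per (state, county) pair
  percentile(cast(`count` AS BIGINT), 0.5) OVER (PARTITION BY state, county) AS med_county,
  avg(`count`) OVER (PARTITION BY state, county) AS avg_county
FROM have;
```

With the sample rows, avg_state for VA would then be computed over the five VA counts (1, 2, 4, 7, 8) instead of over a single row.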
For more on "hadoop - Hive: computing the median and average by group", see the similar question on Stack Overflow: https://stackoverflow.com/questions/60420645/