Currently we read the date using a Calendar instance to pick the last month's records with Spark SQL. Now we need the following: in case extra events are added to a previous day, we must also be able to manually insert summary start and end dates, so that we can manually re-run the job for a previous time period.

E.g. a manual re-run table could look like this:

    rprtng_period_type_  summary_start_date  summary_end_date  summary_iv
    M                    2018-01-01          2018-01-31        2018-01
    D                    2018-03-05          2018-03-05        2018-03-05

This should tell the job to calculate a monthly summary for Jan 18 and two daily summaries, one for 05 March and one for 27 March.

The job should take summary_start_date and summary_end_date and ensure that only events with an event_dt between those two dates are included in the calculations.

My current code snippet looks like:

    import java.util.Calendar
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, desc, rank}
    import org.apache.spark.storage.StorageLevel

    def execute(): DataFrame = {
      // log files
      val hivecntxt = SparkContextLoader.hiveContext
      val eventsourceTable = cusotmermetricConstants.source_table

      // Calendar information: step back one month from today
      val abc = Calendar.getInstance
      abc.add(Calendar.MONTH, -1)
      val month = abc.get(Calendar.MONTH)
      var year = abc.get(Calendar.YEAR)
      val fileMonth = month + 1 // Calendar.MONTH is zero-based
      var monthStr = if (fileMonth <= 9) "0" + fileMonth.toString else fileMonth.toString

      // testing purpose
      monthStr = "11"
      year = 2016

      // select the events of the chosen month
      val monthlyEventDf = hivecntxt.sql(
        "select * from " + referenceDB + "." + eventsourceTable +
          " where unix_timestamp(event_dt, 'yyyy-MM') = unix_timestamp('" + year + "-" + monthStr + "', 'yyyy-MM')")
      val uniqueDf = monthlyEventDf.repartition(col("event_ID"))
        .withColumn("rank", rank().over(Window.partitionBy("event_ID").orderBy(desc("somevalue"))))
      val monthlyEventfinal = monthlyEventDf.persist(StorageLevel.MEMORY_AND_DISK)
      monthlyEventfinal
    }

Where can we add this requirement in the current module? Looking for suggestions.

Solution

You can use the filter function to select the records in a range, as below:

    // Input df
    +---+----------+----------+
    |   |start_date|  end_date|
    +---+----------+----------+
    |  M|2018-01-01|2018-01-31|
    |   |          |          |
    |   |          |          |
    +---+----------+----------+

    // Parameters startDate and endDate
    val startDate = ""
    val endDate = ""

    // Filter condition
    df.filter(s"start_date >= '$startDate' and end_date <= '$endDate'").show

    // Sample output:
    +---+----------+----------+
    |   |start_date|  end_date|
    +---+----------+----------+
    |   |          |          |
    |   |          |          |
    +---+----------+----------+

I hope this helps. If you want to do any calculation on the filtered records, you have to pass the columns to a UDF.
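The date-range idea from the answer can also be sketched without a Spark session, which makes the boundary behaviour easy to check. The following is a minimal sketch under stated assumptions, not the asker's job: `SummaryWindow`, `Event`, `DateRangeSketch` and its three methods are hypothetical names introduced here for illustration. It assumes both bounds are inclusive, that dates are ISO `yyyy-MM-dd` values, and that the job falls back to the previous calendar month when no manual re-run row is supplied.

```scala
import java.time.LocalDate

// Hypothetical row of the manual re-run table described in the question.
final case class SummaryWindow(periodType: String, start: LocalDate, end: LocalDate)

// Hypothetical event record; only event_dt matters for the range check.
final case class Event(eventId: String, eventDt: LocalDate)

object DateRangeSketch {
  // Assumed default when no manual row is given: the previous calendar month.
  def previousMonth(today: LocalDate): SummaryWindow = {
    val firstOfPrev = today.minusMonths(1).withDayOfMonth(1)
    SummaryWindow("M", firstOfPrev, firstOfPrev.plusMonths(1).minusDays(1))
  }

  // Keep only events whose event_dt lies inside the window (both bounds inclusive).
  def eventsIn(window: SummaryWindow, events: Seq[Event]): Seq[Event] =
    events.filter(e => !e.eventDt.isBefore(window.start) && !e.eventDt.isAfter(window.end))

  // The same predicate as a SQL fragment, in the style of the answer's filter string.
  // LocalDate.toString renders ISO yyyy-MM-dd, which matches the dates in the question.
  def sqlPredicate(window: SummaryWindow, column: String = "event_dt"): String =
    s"$column >= '${window.start}' and $column <= '${window.end}'"
}
```

In a Spark job, the string returned by `sqlPredicate` could be passed to a DataFrame's `filter` in place of the hard-coded month comparison, with one call per row of the re-run table. How those rows are loaded is left open here, since the question does not say.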