Problem description
From my investigation of Spark SQL, it appears that more than two tables cannot be joined directly; a subquery has to be used to make it work. So I am using a subquery and am able to join 3 tables:
Using the following query:
"选择姓名、年龄、性别、dpi.msisdn、订阅类型、maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming,dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age,subsc.gender, subsc.msisdn, subsc.subscriptionType,subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime,cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr哪里 subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') 温度,DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn";
But when I try to join 4 tables in the same pattern, it throws the following exception:
java.lang.RuntimeException: [1.517] failure: identifier expected
Query joining the 4 tables:
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
Can anyone please help me make this query work?
Thanks in advance. The error is as follows:
09/02/2015 02:55:24 [ERROR] org.apache.spark.Logging$class: Error running job streaming job 1423479307000 ms.0
java.lang.RuntimeException: [1.517] failure: identifier expected
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
at org.apache.spark.sql.api.java.JavaSQLContext.sql(JavaSQLContext.scala:49)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:596)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:546)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Answer
The exception occurred because you used Spark SQL's reserved keyword "inner" as an identifier in your SQL. Avoid using Spark SQL keywords as custom identifiers.
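A minimal fix, then, is to rename the outer subquery alias from inner to any non-reserved identifier; the name combined below is an arbitrary choice, everything else in the query is unchanged:

SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) combined, BILLING_META billing WHERE combined.msisdn = billing.msisdn

This also explains the wording of the error: "inner" is reserved because of the INNER JOIN syntax, so at the point where the parser expected an alias identifier it found a keyword instead and reported "identifier expected".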