How to delete rows from a table created from a Spark DataFrame?
Problem Description
Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:
This is the script I'm using:
from pyspark.sql import SparkSession

sq = (
    SparkSession.builder
    .config("spark.rpc.message.maxSize", "1536")
    .config("spark.sql.shuffle.partitions", str(shuffle_value))
    .getOrCreate()
)
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice, sep=";", header=True)
# adsqaureJoined is a DataFrame produced earlier in the pipeline (not shown)
adsquare_grid = adsqaureJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")
sql = """
DELETE a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 > 1 """
sq.sql(sql)
Note: the codepoint table is created during execution.
Is there any other way I can delete the rows with the above conditions?
Recommended Answer
You cannot delete rows from a DataFrame, but you can create a new DataFrame that excludes the unwanted records:
sql = """
Select a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 <= 1 """
sq.sql(sql)
This way you create a new DataFrame. Note the inverted condition dis2 <= 1, which keeps exactly the rows the DELETE would have left behind.