Problem description
I would like to remove a single data table from the Spark context ('sc'). I know a single cached table can be un-cached, but that isn't the same as removing the object from sc, as far as I can gather.
library(sparklyr)
library(dplyr)
library(titanic)
library(Lahman)
spark_install(version = "2.0.0")
sc <- spark_connect(master = "local")
batting_tbl <- copy_to(sc, Lahman::Batting, "batting")
titanic_tbl <- copy_to(sc, titanic_train, "titanic", overwrite = TRUE)
src_tbls(sc)
# [1] "batting" "titanic"
tbl_cache(sc, "batting") # Speeds up computations -- loaded into memory
src_tbls(sc)
# [1] "batting" "titanic"
tbl_uncache(sc, "batting")
src_tbls(sc)
# [1] "batting" "titanic"
To disconnect the complete sc, I would use spark_disconnect(sc), but in this example that would destroy both the "titanic" and "batting" tables stored inside of sc.
Rather, I would like to delete just "batting" with something like spark_disconnect(sc, tableToRemove = "batting"), but this doesn't seem possible.
Recommended answer
dplyr::db_drop_table(sc, "batting")
I tried this function and it seems to work.
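As a quick sanity check, here is a minimal sketch continuing the session built up in the question. One caveat worth noting: the db_* generics were later moved from dplyr to dbplyr, so on recent versions the equivalent call may be dbplyr::db_drop_table(sc, "batting").
src_tbls(sc)
# [1] "batting" "titanic"
dplyr::db_drop_table(sc, "batting") # drops only "batting"; the connection stays open
src_tbls(sc)
# [1] "titanic"                     # "titanic" survives, so sc itself is intact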