Problem Description
I am new to Apache Spark. I created several RDDs and DataFrames, cached them, and now I want to unpersist some of them using the command below:
rddName.unpersist()
However, I can't remember their names. I used sc.getPersistentRDDs, but the output does not include the names. I also used the browser (the Spark web UI) to view the cached RDDs, but again there is no name information. Am I missing something?
@Dikei's answer is actually correct, but I believe what you are looking for is sc.getPersistentRDDs combined with setName, so that the entries it returns are recognizable:
scala> val rdd1 = sc.makeRDD(1 to 100)
# rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27
scala> val rdd2 = sc.makeRDD(10 to 1000)
# rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27
scala> rdd2.cache.setName("rdd_2")
# res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27
scala> sc.getPersistentRDDs
# res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27)
scala> rdd1.cache.setName("foo")
# res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27
scala> sc.getPersistentRDDs
# res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
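As a side note, if you want a quick overview of everything that is currently cached, you can iterate over the returned map yourself. A minimal sketch, assuming the same running SparkContext (the printed format is my own choice, not Spark output):

// Print the id, name, and storage level of every currently cached RDD.
// An RDD's name stays unset until setName is called on it.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"id=$id name=${rdd.name} storage=${rdd.getStorageLevel.description}")
}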
Now let's create one more RDD and name it as well, but this time without caching it:
scala> val rdd3 = sc.makeRDD(100 to 1000)
# rdd3: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:27
scala> rdd3.setName("bar")
# res4: rdd3.type = bar ParallelCollectionRDD[2] at makeRDD at <console>:27
scala> sc.getPersistentRDDs
# res5: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
Notice that rdd3 does not show up in the map: it was never actually persisted, because we only called setName on it and never cache or persist.
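Coming back to the original question: once your cached RDDs carry names, unpersisting a specific one is just a matter of filtering sc.getPersistentRDDs by name. A minimal sketch, reusing the name "foo" from the example above:

// Unpersist every cached RDD whose name matches; RDDs with a different
// name (or no name at all) are simply skipped by the filter.
sc.getPersistentRDDs
  .values
  .filter(_.name == "foo")
  .foreach(_.unpersist())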
I hope this helps.