val df = sc.parallelize(Seq((1,"Emailab"), (2,"Phoneab"), (3, "Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+
I want to filter out the records whose column "c2" starts with the 3 characters "MSL" or "HCP".
So the output should look like this.
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+
Can anyone help?
I know that
df.filter($"c2".rlike("MSL"))
selects the matching records, but how do I exclude them?
Version: Spark 1.6.2
Scala: 2.10
Best answer
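import org.apache.spark.sql.functions.{col, not, substring}
// Keep rows whose first 3 characters of c2 are neither "MSL" nor "HCP" (substring positions are 1-based)
df.filter(not(
  substring(col("c2"), 1, 3).isin("MSL", "HCP")
))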
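The same exclusion could probably also be written with the rlike call from the question. Note that rlike("MSL") matches "MSL" anywhere in the string, so the pattern needs a ^ anchor to restrict it to the start of the value. The following is only a sketch, assuming a Spark 1.6 spark-shell session where sc and sqlContext are predefined; the names byRegex and byPrefix are made up for illustration.

```scala
// Assumption: run inside spark-shell (Spark 1.6), where `sc` and `sqlContext` already exist.
import org.apache.spark.sql.functions.not
import sqlContext.implicits._

val df = sc.parallelize(Seq(
  (1, "Emailab"), (2, "Phoneab"), (3, "Faxab"), (4, "Mail"), (5, "Other"),
  (6, "MSL12"), (7, "MSL"), (8, "HCP"), (9, "HCP12")
)).toDF("c1", "c2")

// Option 1: anchor the regex with ^ so only a leading MSL or HCP matches, then negate.
val byRegex = df.filter(not($"c2".rlike("^(MSL|HCP)")))

// Option 2: combine two startsWith checks and negate the result.
val byPrefix = df.filter(not($"c2".startsWith("MSL") || $"c2".startsWith("HCP")))

byRegex.show()
byPrefix.show()
```

Both variants should keep only rows 1 to 5 for the sample data. The regex form is shorter when there are many prefixes, while startsWith avoids having to escape prefixes that contain regex metacharacters.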