Problem description
I have the following map:
val pairs = lines.map( l => ( if (l.split(",")(1).toInt < 60) { "rest" } else if (l.split(",")(1).toInt > 110) { "sport" }, 10) ).reduceByKeyAndWindow((a:Int, b:Int) => (a+b), Seconds(12))
Basically, when someone's HR (heart rate) is below 60, it's classified as rest; above 110, it's classified as sport. The second element of the tuple represents that the person has been doing it for 10 minutes.
Right now, this maps an empty key for values between 60 and 110. What I want is to discard them completely. How can that be achieved?
So, from
("rest", 30)
("sport", 120)
((),10)
I want to filter out ((),10).
I've tried:
pairs.filter{case (key, value) => key.length < 3} //error: value length is not a member of Any
pairs.filter(_._1 != "") //no error, just still keeps the empty keys, too
Neither seems to work.
Recommended answer
Your problem is that your if expression returns either a String in case of a match or Unit in case of a miss. You can fix your filter easily:
val pairs = lines.map(
l => (if (l.split(",")(1).toInt < 60) {"rest"} else if (l.split(",")(1).toInt > 110) {"sport"}, 10))
.filter(_._1 != ())
() in Scala is the only value of type Unit.
But this is not really the right way. Even after filtering, the keys are still typed as Any rather than String, which is why key.length did not compile in the question. You're losing the type with this if statement, because a branch that can fall through without an else yields Unit.
The correct way is either to filter your data first and then use an exhaustive if:
val pairs =
lines.map(_.split(",")(1).toInt)
.filter(hr => hr < 60 || hr > 110)
.map(hr => (if (hr < 60) "rest" else "sport", 10))
Or use collect, which is a shortcut for .filter.map:
val pairs =
lines.map(_.split(",")(1).toInt)
.collect{
case hr if hr < 60 => "rest" -> 10
case hr if hr > 110 => "sport" -> 10
}
This variant is probably more readable.
Also, note how I moved split into a separate step. This avoids calling split a second time for the second if branch.
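To check that the two variants agree, here is a minimal sketch on a plain Scala List instead of a Spark stream. The sample lines and the object name are made up for illustration; only the "something,heartRate" layout is taken from the question. Whether the stream type used in the question offers collect with a partial function is worth checking against the Spark version in use.

```scala
object CollectVsFilterMap {
  // Hypothetical sample input; the heart rate is the second comma-separated field.
  val lines: List[String] = List("a,50", "b,80", "c,120")

  // Parse once, drop the 60..110 band, then label with an exhaustive if/else.
  val viaFilterMap: List[(String, Int)] =
    lines.map(_.split(",")(1).toInt)
      .filter(hr => hr < 60 || hr > 110)
      .map(hr => (if (hr < 60) "rest" else "sport", 10))

  // Same result in one step: values that match no case are dropped.
  val viaCollect: List[(String, Int)] =
    lines.map(_.split(",")(1).toInt).collect {
      case hr if hr < 60  => "rest" -> 10
      case hr if hr > 110 => "sport" -> 10
    }

  def main(args: Array[String]): Unit = {
    println(viaFilterMap) // List((rest,10), (sport,10))
    println(viaCollect)   // List((rest,10), (sport,10))
  }
}
```

Both results are typed List[(String, Int)], so the keys stay Strings and no () key is ever produced.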
UPD. Another approach is to use flatMap, as suggested in the comments:
val pairs =
lines.flatMap(_.split(",")(1).toInt match{
case hr if hr < 60 => Some("rest" -> 10)
case hr if hr > 110 => Some("sport" -> 10)
case _ => None
})
This may or may not be more efficient: it avoids the filter step, but adds wrapping and unwrapping of elements in Option. You can test the performance of the different approaches and tell us the results.
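The flatMap idea can also be tried locally before running it on a stream. The sketch below pulls the match into a helper named classify (a name introduced here, not part of the answer) and uses made-up sample lines. The groupBy at the end is only a local stand-in for summing the minutes per key and is not Spark's reduceByKeyAndWindow.

```scala
object FlatMapOption {
  // Hypothetical helper: label a heart rate, or return None for the 60..110 band.
  def classify(hr: Int): Option[(String, Int)] = hr match {
    case h if h < 60  => Some("rest" -> 10)
    case h if h > 110 => Some("sport" -> 10)
    case _            => None
  }

  // Hypothetical sample input; the heart rate is the second comma-separated field.
  val lines: List[String] = List("a,50", "b,80", "c,120", "d,55")

  // flatMap drops every None and unwraps every Some.
  val pairs: List[(String, Int)] =
    lines.flatMap(l => classify(l.split(",")(1).toInt))

  // Local stand-in for a per-key sum of the minutes.
  val totals: Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    println(pairs)  // List((rest,10), (sport,10), (rest,10))
    println(totals)
  }
}
```

Keeping the classification in a named function also makes it easy to unit test without a Spark context.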