A follow-up to this question about aget performance
Something odd seems to be going on with optimization. We know the following holds:
=> (def xa (int-array (range 100000)))
#'user/xa
=> (set! *warn-on-reflection* true)
true
=> (time (reduce + (for [x xa] (aget ^ints xa x))))
"Elapsed time: 42.80174 msecs"
4999950000
=> (time (reduce + (for [x xa] (aget xa x))))
"Elapsed time: 2067.673859 msecs"
4999950000
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
Reflection warning, NO_SOURCE_PATH:1 - call to aget can't be resolved.
However, some further experimentation really surprised me:
=> (for [f [get nth aget]] (time (reduce + (for [x xa] (f xa x)))))
("Elapsed time: 71.898128 msecs"
"Elapsed time: 62.080851 msecs"
"Elapsed time: 46.721892 msecs"
4999950000 4999950000 4999950000)
No reflection warnings, and no type hints needed. The same behavior shows up when aget is bound to a root var or in a let.
=> (let [f aget] (time (reduce + (for [x xa] (f xa x)))))
"Elapsed time: 43.912129 msecs"
4999950000
Any idea why a bound aget seems to "know" how to optimize, while the core function called directly does not?
Best answer
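It has to do with the :inline directive on aget, which expands to (. clojure.lang.RT (aget ~a (int ~i))), whereas a regular function call goes through Reflector. Inlining only applies when aget appears directly in call position; when it is passed around as a value (as f in your let and for examples), the ordinary function body runs instead. Without a type hint, the inlined RT call cannot be resolved at compile time and falls back to reflection. Try these: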
user> (time (reduce + (map #(clojure.lang.Reflector/prepRet
(.getComponentType (class xa)) (. java.lang.reflect.Array (get xa %))) xa)))
"Elapsed time: 63.484 msecs"
4999950000
user> (time (reduce + (map #(. clojure.lang.RT (aget xa (int %))) xa)))
Reflection warning, NO_SOURCE_FILE:1 - call to aget can't be resolved.
"Elapsed time: 2390.977 msecs"
4999950000
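To see where the two code paths come from, the metadata on aget can be inspected at the REPL. This is a minimal sketch, assuming a stock clojure.core; the names aget-meta and the small ten-element xa are introduced here only for illustration.

```clojure
;; Look at the inlining metadata attached to clojure.core/aget.
(def aget-meta (meta #'clojure.core/aget))

;; :inline holds a function that builds the replacement form at compile time;
;; :inline-arities decides which argument counts are eligible for inlining.
(println "two-arg call inlinable?" (boolean ((:inline-arities aget-meta) 2)))

;; Calling the :inline function by hand with symbols prints the form the
;; compiler substitutes for a direct (aget xa x) call.
(println ((:inline aget-meta) 'xa 'x))

;; A function used as a value (here through let) is never inlined,
;; so this call always runs the ordinary function body.
(def xa (int-array (range 10)))   ; illustrative array, not the one timed above
(let [f aget]
  (println (f xa 3)))
```

The printed form is the same interop call that the second timing above invokes by hand, which is why that version triggers the reflection warning when xa carries no hint.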
So you might wonder what the point of inlining is, then. Well, check out these results:
user> (def xa (int-array (range 1000000))) ;; going to one million elements
#'user/xa
user> (let [f aget] (time (dotimes [n 1000000] (f xa n))))
"Elapsed time: 187.219 msecs"
user> (time (dotimes [n 1000000] (aget ^ints xa n)))
"Elapsed time: 8.562 msecs"
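It turns out that in your example, once you get past the reflection warnings, the new bottleneck is the reduce + part and not the array access. This example eliminates that and shows an order-of-magnitude advantage for the type-hinted, inlined aget.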
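If the goal is the sum itself, one way to keep both the type hint and the inlined aget while avoiding the lazy sequence and boxed reduce + is a primitive loop. The following is a sketch under that assumption; sum-ints is a hypothetical helper name, not something from the question.

```clojure
(def xa (int-array (range 100000)))

(defn sum-ints
  "Sums an int array with a primitive loop. The ^ints hint lets the
  inlined aget and alength calls resolve without reflection."
  [^ints arr]
  ;; areduce expands to a loop over the indices; acc starts as a long,
  ;; so the total does not overflow for this input.
  (areduce arr i acc 0 (+ acc (aget arr i))))

(println (sum-ints xa))
```

Wrapping the call in time on your own machine shows how it compares with the variants above; no timing is claimed here.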