apache-spark - How to do predicate pushdown with array_contains and the ElasticSearch data source?

I am trying to query an array in ElasticSearch:

data: "names":[{"name":"allen"},{"name":"bill"},{"name":"dave"},{"name":"poter"}]
goal: "select names from table where array_contains(names.name, "bill")"

However, Spark does not push the predicate down when the SQL statement uses the array_contains function.
hint: names.name = ["allen","bill","dave","poter"]

I have tried:

select * from table where array_contains(names.name,"bill")
-- and
select * from (select explode(names.name) as name from table) as t1 where name = "bill"
-- and
select * from table where cast(names.name as string) like '%bill%'

None of these were pushed down. Is there any other way?

Best Answer

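The failed pushdown is expected. For a predicate to be delegated, the data source has to support it, and the ElasticSearch connector does not list array_contains among its pushed operations, which today include: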

=, >, <, >=, <=

is_null/is_not_null

in

StringStartsWith, StringEndsWith, StringContains

Null-safe equality.

Boolean combinations of the above using AND/OR/NOT.

In addition, any other transformation applied to the column (including CAST) disables predicate pushdown.
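To make the rule above concrete, here is a toy Python model of the pushability check. It is not the connector's code: the Col/Func/Pred types and the is_pushable function are hypothetical, and the filter names only mirror Spark's org.apache.spark.sql.sources filter classes. It also simplifies real Spark behaviour, which splits top-level AND conjuncts and pushes the translatable ones separately while evaluating the rest itself.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Leaf filters the answer lists as supported by the ES connector
# (names modelled on Spark's data source Filter classes).
PUSHABLE_LEAVES = {
    "EqualTo", "GreaterThan", "LessThan", "GreaterThanOrEqual", "LessThanOrEqual",
    "IsNull", "IsNotNull", "In",
    "StringStartsWith", "StringEndsWith", "StringContains",
    "EqualNullSafe",
}
# Boolean operators that are pushable only if all their children are.
BOOLEAN_OPS = {"And", "Or", "Not"}


@dataclass(frozen=True)
class Col:
    """A bare column reference, e.g. names.name."""
    name: str


@dataclass(frozen=True)
class Func:
    """Any transformation wrapped around a column, e.g. CAST (hypothetical type)."""
    name: str
    arg: "Expr"


Expr = Union[Col, Func, str, int]


@dataclass(frozen=True)
class Pred:
    """A predicate: an operator name plus its arguments or child predicates."""
    op: str
    args: Tuple


def is_pushable(p: Pred) -> bool:
    """Toy check: True if the whole predicate could be handed to the source."""
    if p.op in BOOLEAN_OPS:
        # AND/OR/NOT are only delegated when every child is delegable.
        return all(is_pushable(child) for child in p.args)
    if p.op not in PUSHABLE_LEAVES:
        # e.g. ArrayContains: not in the connector's list.
        return False
    # The filtered operand must be a plain column; CAST etc. blocks pushdown.
    return isinstance(p.args[0], Col)


if __name__ == "__main__":
    print(is_pushable(Pred("EqualTo", (Col("name"), "bill"))))
    print(is_pushable(Pred("ArrayContains", (Col("names.name"), "bill"))))
    print(is_pushable(Pred("StringContains", (Func("cast", Col("names.name")), "bill"))))
```

Under this model, all three of the question's attempts fail for the reasons the answer gives: array_contains is not a supported operation, and both explode and CAST wrap the column in a transformation before the comparison.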

Source: this question (apache-spark - How to do predicate pushdown with array_contains and the ElasticSearch data source?) on Stack Overflow: https://stackoverflow.com/questions/47320597/