Problem Description
By default, spark_read_jdbc() reads an entire database table into Spark. I've used the following syntax to create these connections.
library(sparklyr)
library(dplyr)

# Put the MySQL connector jar on the Spark driver class path
config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "mysql-connector-java-5.1.43/mysql-connector-java-5.1.43-bin.jar"

sc <- spark_connect(master = "local",
                    version = "1.6.0",
                    hadoop_version = "2.4",
                    config = config)
db_tbl <- sc %>%
  spark_read_jdbc(sc = .,
                  name = "table_name",
                  options = list(url = "jdbc:mysql://localhost:3306/schema_name",
                                 user = "root",
                                 password = "password",
                                 dbtable = "table_name"))
However, I've now encountered a scenario where I have a table in a MySQL database and I would prefer to read only a subset of it into Spark.
How do I get spark_read_jdbc to accept a predicate? I've tried adding the predicate to the options list without success:
db_tbl <- sc %>%
  spark_read_jdbc(sc = .,
                  name = "table_name",
                  options = list(url = "jdbc:mysql://localhost:3306/schema_name",
                                 user = "root",
                                 password = "password",
                                 dbtable = "table_name",
                                 predicates = "field > 1"))
Recommended Answer
You can replace dbtable with a query:
db_tbl <- sc %>%
  spark_read_jdbc(sc = .,
                  name = "table_name",
                  options = list(url = "jdbc:mysql://localhost:3306/schema_name",
                                 user = "root",
                                 password = "password",
                                 dbtable = "(SELECT * FROM table_name WHERE field > 1) as my_query"))
But with a simple condition like this, Spark should push it down automatically when you filter:
db_tbl %>% filter(field > 1)
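To confirm the filter is applied to the lazy table reference rather than after a full read, you can render the SQL that dplyr generates for Spark. This uses show_query(), a standard dplyr generic that works on sparklyr table references:

db_tbl %>%
  filter(field > 1) %>%
  show_query()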
Just make sure to set memory = FALSE in spark_read_jdbc; with the default memory = TRUE, the table is cached eagerly, which materializes the entire JDBC read before your filter runs.
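Putting the two suggestions together, here is a minimal sketch; the compute() step and the name "table_subset" are illustrative additions, not part of the original answer:

# Lazy JDBC read: memory = FALSE avoids eagerly caching the full table
db_tbl <- spark_read_jdbc(sc,
                          name = "table_name",
                          options = list(url = "jdbc:mysql://localhost:3306/schema_name",
                                         user = "root",
                                         password = "password",
                                         dbtable = "table_name"),
                          memory = FALSE)

# The filter can be pushed into the JDBC scan; compute() then materializes
# only the filtered rows as a Spark table named "table_subset"
subset_tbl <- db_tbl %>%
  filter(field > 1) %>%
  compute("table_subset")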