Question
In Redshift, I run the following to unload data from a table into a file in S3:
unload('select * from table')
to 's3://bucket/unload/file_'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>'
I would like to do the same in Python. Any suggestions on how to replicate this? I have seen examples using an access key and secret, but that is not an option for me; I need to use role-based credentials on a non-public bucket.
Answer
You will need two sets of credentials: IAM credentials, supplied via an IAM role, so that Redshift can access the S3 bucket, and Redshift login credentials so that your program can execute SQL commands.
Create a Python program that connects to Redshift in the same way you would connect to other databases such as SQL Server, and executes your query. This program needs Redshift login credentials (username and password), not IAM credentials.
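For example, here is a minimal sketch that connects with psycopg2 directly (rather than through SQLAlchemy, as in the full example further down) and runs the UNLOAD; the host, credentials, bucket, and role ARN are placeholders:

import psycopg2

# Placeholders: replace with your cluster endpoint and database credentials.
conn = psycopg2.connect(
    host="your-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dbname",
    user="username",
    password="password",
)
conn.autocommit = True  # avoid leaving the UNLOAD in an open transaction

with conn.cursor() as cur:
    cur.execute("""
        unload('select * from table')
        to 's3://bucket/unload/file_'
        iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>';
    """)
conn.close()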
The IAM credentials for S3 are assigned to Redshift as a role so that Redshift can write the results to S3. That is the iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>' part of the UNLOAD statement in your question.
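The role must already be attached to the cluster before the UNLOAD can use it. If it is not attached yet, one way to attach it is through the Redshift API, for example with boto3; the cluster identifier and role ARN below are placeholders:

import boto3

# Placeholders: cluster identifier and role ARN.
redshift = boto3.client("redshift")
redshift.modify_cluster_iam_roles(
    ClusterIdentifier="my-cluster",
    AddIamRoles=["arn:aws:iam::<aws-account-id>:role/<role_name>"],
)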
Note that you do not need boto3 (or boto) to access the data in Redshift; it is only needed if you interact with the Redshift API itself (as in the role-attachment sketch above), which manages clusters rather than querying the data stored inside them.
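For instance, the API can tell you the cluster endpoint to put in your connection string, but it cannot run SQL against your tables. A hypothetical sketch (the cluster identifier is a placeholder):

import boto3

redshift = boto3.client("redshift")
cluster = redshift.describe_clusters(ClusterIdentifier="my-cluster")["Clusters"][0]
# Cluster metadata only -- the endpoint you connect to, not the data itself.
print(cluster["Endpoint"]["Address"], cluster["Endpoint"]["Port"])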
Here is an example Python program for accessing Redshift, credit due to Varun Verma.
There are other examples on the Internet to help you get started.
############ REQUIREMENTS ####################
# sudo apt-get install python-pip
# sudo apt-get install libpq-dev
# sudo pip install psycopg2
# sudo pip install sqlalchemy
# sudo pip install sqlalchemy-redshift
##############################################

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# >>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dbname"
USER = "username"
PASSWORD = "password"
HOST = "host"
PORT = "5439"        # default Redshift port
SCHEMA = "public"    # default is "public"

####### connection and session creation ##############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, PORT, DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker(bind=engine)
s = session()
s.execute(sa.text("SET search_path TO %s" % SCHEMA))

################ write queries from here ######################
query = "unload('select * from table') to 's3://bucket/unload/file_' iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>';"
s.execute(sa.text(query))
s.commit()   # UNLOAD runs inside the session's transaction; commit it

# UNLOAD writes files to S3 and returns no rows, so there is nothing to fetch.
# For an ordinary SELECT you could fetch and print the results like this:
def pretty(all_results):
    for row in all_results:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row:
            print("    ----", r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

# rows = s.execute(sa.text("select * from some_table limit 10")).fetchall()
# pretty(rows)

########## close session in the end ###############
s.close()