Problem Description
I have a set of text files that are uploaded to Google Cloud Storage every 5 minutes, and I want to load them into BigQuery on the same 5-minute cadence. I understand that text files cannot be uploaded into BigQuery directly. What is the best approach for this?
Sample of the text files
Thanks.
Recommended Answer
Here is an alternative approach that uses an event-based Cloud Function to load the data into BigQuery. Create a Cloud Function with its "Trigger Type" set to Cloud Storage. As soon as a file is uploaded to the Cloud Storage bucket, the event triggers the Cloud Function, which loads the data from Cloud Storage into BigQuery.
import pandas as pd
from google.cloud import bigquery

def bqDataLoad(event, context):
    """Triggered by a file upload to Cloud Storage; loads that file into BigQuery."""
    bucketName = event['bucket']
    blobName = event['name']
    fileName = "gs://" + bucketName + "/" + blobName

    bigqueryClient = bigquery.Client()
    tableRef = bigqueryClient.dataset("bq-dataset-name").table("bq-table-name")

    # Reading a gs:// path with pandas requires the gcsfs package
    # to be listed in the function's requirements.
    dataFrame = pd.read_csv(fileName)
    bigqueryJob = bigqueryClient.load_table_from_dataframe(dataFrame, tableRef)
    bigqueryJob.result()  # wait for the load job to complete
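As a variation on the function above, BigQuery can also load the file straight from its Cloud Storage URI with `load_table_from_uri`, which avoids pulling the data through pandas inside the function. The sketch below assumes a CSV file with a header row and a hypothetical table id `your-project.bq-dataset-name.bq-table-name`; the small helper that builds the `gs://` URI from the event payload is split out so it can be reasoned about on its own.

```python
def buildGcsUri(event):
    # Construct the gs:// URI from the Cloud Storage event payload,
    # which carries the bucket and object name of the uploaded file.
    return "gs://" + event["bucket"] + "/" + event["name"]

def bqLoadFromUri(event, context):
    # Deferred import keeps the URI helper importable without the
    # google-cloud-bigquery package installed.
    from google.cloud import bigquery

    bigqueryClient = bigquery.Client()
    jobConfig = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,       # let BigQuery infer the schema from the file
        skip_leading_rows=1,   # assumes the file has a header row
    )
    loadJob = bigqueryClient.load_table_from_uri(
        buildGcsUri(event),
        "your-project.bq-dataset-name.bq-table-name",  # hypothetical table id
        job_config=jobConfig,
    )
    loadJob.result()  # wait for the load job to complete
```

Either function would be deployed with a Cloud Storage trigger (for example, `gcloud functions deploy` with `--trigger-bucket`), so the upload event itself tells the function which file to load.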