This article describes how to handle a BigQuery job that fails with the error "Bad character (ASCII 0) encountered". It should be a useful reference for anyone running into the same problem.

Problem description

I have a job that is failing with the error:

Line:14222274 / Field:1: Bad character (ASCII 0) encountered. Rest of file not processed.

The data is compressed and I have verified that no ASCII 0 character exists in the file. There are only 14222273 lines in the file, so the line number that is printed in the error message is one line past the end of the file. I have other chunks from the same data set which have uploaded successfully, so I suspect that this is either a BQ bug, or the error message is not indicative of the underlying issue. Any help solving this problem would be appreciated. Thanks.

>>> data = open("data.csv").read()
>>> chr(0) in data
False
>>> data[-1]
'\n'
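
The check above only covers the uncompressed local copy. A similar check can be run against the compressed chunk as it is actually uploaded; this is only a sketch, assuming the chunk is gzip-compressed and named data.csv.gz (an illustrative name), and it counts NUL bytes in the decompressed stream:

gunzip -c data.csv.gz | tr -cd '\000' | wc -c
# prints the number of ASCII 0 bytes found; 0 means the decompressed stream is clean
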
Solution

I had a similar problem trying to load a compressed file (saved in Google Cloud Storage) into BigQuery. These are the logs:

File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)

To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do that, I executed the following command from a Compute Engine instance with the SDK installed:

gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file

By using a pipeline, I avoid having to hold everything on the hard disk (1 GB compressed + 52 GB uncompressed). The first program gets the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.
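
As a quick sanity check (a sketch using the same tools, not part of the original answer), the cleaned object can be streamed back and any remaining NUL bytes counted:

gsutil cat gs://bucket_987234/uncompress_and_clean_file | tr -cd '\000' | wc -c
# expected output is 0 if all ASCII 0 characters were removed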

I don't compress the result when I upload it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
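
The load itself can then be started with the bq command-line tool; a minimal sketch, in which the destination dataset and table names are placeholders and schema auto-detection is assumed to be acceptable:

bq load --source_format=CSV --autodetect my_dataset.my_table gs://bucket_987234/uncompress_and_clean_file
# my_dataset.my_table is hypothetical; replace it with the real destination table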

This concludes the article on BigQuery jobs failing with "Bad character (ASCII 0) encountered". We hope the answer above helps.
