Bulk-translating a large number of records with Google Translate

Problem description

I need to translate a rather large set of records from various languages (not known in advance) into English. The set contains about 3M records, and each record is a rather short text. These are not elaborate texts, mostly just item descriptions, something like 'Mobile router 3G by Nokia Black', written in all kinds of languages (nothing too exotic, though: mostly German, French, Arabic, Russian, etc.). I also don't know in advance which language each record is written in, so I need to rely on automatic language detection.

As of now, I am able to accomplish this task using the Google Cloud API.

It is rather straightforward: I pass one record at a time to the API without specifying the source language, and it translates the records properly.

The issue is that the process is painfully slow. We pick up a single text string, connect to the API, send it over, get the result, and store it. Processing each record adds significant overhead for communicating with the API, and translating several million records this way takes a really long time.
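To make the overhead concrete, here is a rough back-of-the-envelope sketch. The 0.2 seconds per request is purely an assumed figure for illustration (the question gives no timings), and the sketch simplifies by treating the time per request as constant regardless of how many records it carries. The function name `estimated_hours` is hypothetical.

```python
import math


def estimated_hours(n_records: int, seconds_per_request: float,
                    records_per_request: int = 1) -> float:
    """Estimate the wall-clock hours spent issuing requests sequentially."""
    # Number of round trips needed when each request carries several records.
    requests = math.ceil(n_records / records_per_request)
    return requests * seconds_per_request / 3600


# ASSUMPTION: 0.2 s per round trip is an illustrative guess, not a measurement.
ASSUMED_SECONDS_PER_REQUEST = 0.2

one_by_one = estimated_hours(3_000_000, ASSUMED_SECONDS_PER_REQUEST)
batched = estimated_hours(3_000_000, ASSUMED_SECONDS_PER_REQUEST,
                          records_per_request=100)
print(f"{one_by_one:.1f} h one at a time, {batched:.1f} h in batches of 100")
```

Under these assumed numbers, sending one record per request takes roughly 167 hours, while 100 records per request takes under 2 hours. The real figures depend on actual latency and on how request time grows with payload size.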

I was wondering whether there is any way to perform this operation in bulk. Maybe by sending many string records at a time for translation, to minimize the overhead of communicating with the Google API? Or maybe there is some way to upload a file containing all the records I need translated directly to Google, and download the result when it is available?

Recommended answer

I think that the Cloud Translation API service doesn't currently support bulk or file translation requests. Given that, you could use the GCP Client Libraries to develop a solution that concatenates your text strings into a single delimited string, so that several values are translated within the same call. Once the full string has been translated into the desired language, you can split it on the delimiter to get back an array of the individual translated strings. Keep in mind that it is recommended to send fewer than 5000 characters per request in order to avoid performance issues.
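The workaround described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `make_batches` and `translate_all` are hypothetical helper names, and `translate` is a placeholder callable standing in for whatever client call you actually use to translate one string into English. The sketch assumes that records contain no newline characters and that the service preserves newlines in its output; when the number of parts that comes back does not match, it falls back to translating that batch one record at a time.

```python
from typing import Callable, Iterable, Iterator, List

MAX_CHARS = 5000   # per-request size recommended in the answer above
DELIMITER = "\n"   # ASSUMPTION: no record contains a newline


def make_batches(records: Iterable[str], max_chars: int = MAX_CHARS,
                 delimiter: str = DELIMITER) -> Iterator[List[str]]:
    """Group records so each joined batch stays within max_chars."""
    batch: List[str] = []
    size = 0
    for record in records:
        if delimiter in record:
            raise ValueError("record contains the delimiter")
        # Joining adds one delimiter before every record except the first.
        added = len(record) + (len(delimiter) if batch else 0)
        if batch and size + added > max_chars:
            yield batch
            batch, size = [], 0
            added = len(record)
        # A single record longer than max_chars still gets its own batch.
        batch.append(record)
        size += added
    if batch:
        yield batch


def translate_all(records: Iterable[str], translate: Callable[[str], str],
                  max_chars: int = MAX_CHARS,
                  delimiter: str = DELIMITER) -> List[str]:
    """Translate records in delimited batches, preserving input order.

    `translate` is a hypothetical stand-in: it takes one string and
    returns its translation.
    """
    results: List[str] = []
    for batch in make_batches(records, max_chars, delimiter):
        parts = translate(delimiter.join(batch)).split(delimiter)
        if len(parts) != len(batch):
            # The delimiter did not survive translation: retry one by one.
            parts = [translate(record) for record in batch]
        results.extend(parts)
    return results


# Demo with a fake translator (upper-casing) in place of a real API call.
demo = translate_all(["routeur mobile 3G", "schwarzer Router"], str.upper)
```

One caveat with this design: language detection then applies to the joined string as a whole, so a batch that mixes several languages may be detected less reliably than single records. As far as I know, the v2 Python client's `translate` method also accepts a list of strings in a single call, which would avoid the delimiter altogether, but check that against the current client documentation before relying on it.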

If this workaround doesn't cover your current needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation. You can also take a look at the Issue Tracker tool to raise a Translation API feature request and let Google know about this desired functionality.


08-15 15:27