How to use Google's text-to-speech service for Chinese characters on Android?

Question

I'm trying to pull an audio file from google's text-to-speech function. Basically, you toss in the link and then concat whatever you want to be spoken at the end of it. I've gotten the below code to work just fine for English, so I think the problem must be how the Chinese characters are getting encoded in the request. Here's what I've got:

String text = "text to be spoken";
public static final String AUDIO_CHINESE = "http://www.translate.google.com/translate_tts?tl=zh&q=";
public static final String AUDIO_ENGLISH = "http://www.translate.google.com/translate_tts?tl=en&q=";

URL url = new URL(AUDIO_ENGLISH + text);

HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.setRequestProperty("Accept-Charset", Variables.UTF_8);

DataInputStream in;
if (urlConnection.getResponseCode() == 200) {
     // success: read the response body
     in = new DataInputStream(urlConnection.getInputStream());
} else {
     // failure: read the error body instead
     in = new DataInputStream(urlConnection.getErrorStream());
}
// IOUtils comes from Apache Commons IO
byte[] bytes = IOUtils.toByteArray(in);

in.close();
urlConnection.disconnect();

return bytes;

When I try this with Chinese characters, though, it returns something that I can't get to play in the mediaplayer (I suspect it's not a proper audio file as the vast majority of bytes are '85'). So I've tried both

String chText = "你好";
URL url = new URL(AUDIO_CHINESE + URLEncoder.encode(chText, "UTF-8"));

URL url = new URL(AUDIO_CHINESE + Uri.encode(chText, "UTF-8"));

and then adding

urlConnection.setRequestProperty("content-type", "application/x-www-form-urlencoded; charset=UTF-8");

to the request header. This just made it worse, though, because now it doesn't even return a 200 code, instead stating "FileNotFound" in logcat.

So on a whim, I went back and tried the URL/Uri encoding with the English text, and now the English won't return a valid result either. Not sure what's going on here: the raw url in the debugger works fine if I copy and paste into Chrome, but for some reason the urlConnection just doesn't work. Feel like I'm missing something obvious.

EDIT

Fiddling with it some more has revealed no answer, just more confusion (and exasperation). For some reason, when sent over HttpURLConnection, the Google TTS service reads the UTF-8 percent-encoded text as UTF-16, at least as far as I can tell. For example, the character "維" (wei2) is %E7%B6%AD, but if you pass it through the connection, you'll get a file that pronounces "see" ("ç", to be precise).

ç, as it turns out, is 0x00E7 in UTF-16 (its UTF-8 percent-encoded version is %C3%A7). I have no idea why it does that in Java, because putting the appropriate percent-encoded text at the end of the link in any browser works properly. Thus far, I have tried various ways of getting the TTS service to read the entirety of %E7%B6%AD, without much success.
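
To make the byte values in this edit concrete, here is a minimal Java sketch. The class and method names (PercentEncodingDemo, encodeUtf8, firstByteAsLatin1) are hypothetical and not part of the original code. It shows that "維" percent-encodes to %E7%B6%AD and "ç" to %C3%A7, and that the first UTF-8 byte of "維" (0xE7), read on its own as a single-byte character, is "ç". That is consistent with the symptom described, but it is only an illustration of the bytes involved, not a confirmed account of what the service does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {

    // Percent-encodes the text using its UTF-8 bytes.
    static String encodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Takes only the first UTF-8 byte of the text and decodes it as ISO-8859-1,
    // i.e. treats that single byte as a whole character.
    static String firstByteAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, 0, 1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String wei = "\u7DAD";     // the character 維
        String cedilla = "\u00E7"; // the character ç

        System.out.println(encodeUtf8(wei));     // %E7%B6%AD
        System.out.println(encodeUtf8(cedilla)); // %C3%A7
        // The lone byte 0xE7 maps to U+00E7, which is ç.
        System.out.println(firstByteAsLatin1(wei).equals(cedilla)); // true
    }
}
```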

EDIT2

Solution to my problem found! See below for answer. The problem wasn't in the encoding, it was in the parsing on Google's end. Have edited the title accordingly. Cheers!

Recommended answer

So, as it turns out, the problem at the end wasn't the encoding at all; it was the processing at Google's end. To get the service to correctly recognize UTF-8, you need to use this link http://www.translate.google.com/translate_tts?ie=utf-8&tl=zh-cn&q= instead of the one above. Note the ie=utf-8 added to the parameter. So you can just URLEncoder.encode("你好嗎", "UTF-8"), append it to the link, and send it up as per usual. Whew!
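
The fix described above can be sketched as follows. This is a minimal example rather than the exact code used: the class name TtsUrlBuilder, the constant AUDIO_CHINESE_UTF8 and the method buildTtsUrl are made up for illustration, while the base link and the ie=utf-8 parameter are taken from the answer. The sketch only builds the request URL; it does not open a connection or check whether the endpoint responds.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TtsUrlBuilder {

    // Base link from the answer. The ie=utf-8 parameter tells the service
    // how to interpret the percent-encoded query text.
    static final String AUDIO_CHINESE_UTF8 =
            "http://www.translate.google.com/translate_tts?ie=utf-8&tl=zh-cn&q=";

    // Percent-encodes the text as UTF-8 and appends it to the base link.
    static String buildTtsUrl(String text) {
        try {
            return AUDIO_CHINESE_UTF8 + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4F60\u597D\u55CE" is 你好嗎, the text used in the answer.
        System.out.println(buildTtsUrl("\u4F60\u597D\u55CE"));
    }
}
```

The resulting string can be passed to new URL(...) and fetched with the same HttpURLConnection code as in the question.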
