Problem description
I am using Apache HttpComponents (4.1-alpha2) to upload files to Dropbox. This is done using multipart form data. What is the correct way to encode filenames that contain international (non-ASCII) characters in a multipart form?
If I use their standard API, the server returns HTTP status 403 Forbidden. If I modify the upload code so that the filename is URL-encoded:
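// URL-encoding the name avoids the 403, but the server then stores the encoded string as the filename
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
// FileBody(file, filename, mimeType, charset): the MIME type comes before the charset
FileBody bin = new FileBody(file_obj, URLEncoder.encode(file_obj.getName(), HTTP.UTF_8), HTTP.OCTET_STREAM_TYPE, HTTP.UTF_8);
entity.addPart("file", bin);
req.setEntity(entity);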
The file is uploaded, but I end up with a filename that is still encoded, e.g. %D1%82%D0%B5%D1%81%D1%82.txt instead of тест.txt.
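URL-encoding turns every UTF-8 byte of a non-ASCII character into a %XX escape; the server apparently stores the parameter verbatim and never decodes it, so the escapes survive in the stored name. The stdlib-only check below (class and method names are made up for illustration; it needs Java 10+ for the Charset overloads) reproduces the exact name from the question and shows that only an explicit decode would restore the original.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodedNameDemo {
    // URL-encodes a filename the way the question's FileBody call does (UTF-8 bytes, %XX escapes)
    static String urlEncodeName(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Cyrillic filename from the question, written with Unicode escapes
        String name = "\u0442\u0435\u0441\u0442.txt";
        String encoded = urlEncodeName(name);
        System.out.println(encoded); // %D1%82%D0%B5%D1%81%D1%82.txt
        // Only an explicit decode on the receiving side would restore the original name
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

Each Cyrillic letter is two UTF-8 bytes, hence two escapes per letter, while the ASCII ".txt" passes through unchanged.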
Recommended answer
To solve this for the Dropbox server specifically, I had to encode the filename in UTF-8. To do this, I declared my multipart entity as follows:
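// null boundary: let the library generate one; the UTF-8 charset is used to write the part headers, including the filename
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));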
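With this fix the filename travels as raw UTF-8 inside the Content-Disposition header instead of being URL-encoded. The stdlib-only sketch below builds such a part by hand to show roughly what ends up on the wire; it approximates what BROWSER_COMPATIBLE mode with a UTF-8 charset produces and is not httpmime's code. The helper `buildFilePart`, the boundary string and the field name are hypothetical, and it needs Java 11+ for `ByteArrayOutputStream.writeBytes`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8MultipartSketch {
    // Hypothetical helper: builds a one-part multipart/form-data body whose
    // Content-Disposition header carries the filename as raw UTF-8 bytes.
    // Assumes the filename contains no quotes or line breaks (no escaping is done).
    static byte[] buildFilePart(String boundary, String fieldName, String fileName, byte[] content) {
        String headers = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // UTF-8 turns each Cyrillic letter into two bytes >= 0x80, so the header is not 7-bit
        out.writeBytes(headers.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Cyrillic filename from the question, written with Unicode escapes
        String name = "\u0442\u0435\u0441\u0442.txt";
        byte[] body = buildFilePart("sketchBoundary42", "file", name,
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```

Because the exact bytes sent are the ones that get signed, any later re-encoding of the name (such as URL-encoding) changes what the server sees.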
I was getting the 403 Forbidden because the entity signed with OAuth did not match the entity actually sent (it was being URL-encoded).
For those interested in what the standards say about this, I did some reading of the RFCs. If the standard is strictly adhered to, all headers should be encoded as 7-bit, which would make UTF-8 encoding of the filename illegal. However, RFC 2388 states:
Many posts suggest using either RFC 2231 or RFC 2047 to encode non-US-ASCII headers as 7-bit. However, RFC 2047 explicitly states (section 5) that an encoded-word MUST NOT be used in a parameter of a Content-Disposition field. That leaves only RFC 2231, but it is an extension and cannot be relied on to be implemented by all servers. In practice, most major browsers send non-US-ASCII characters as UTF-8 (hence the HttpMultipartMode.BROWSER_COMPATIBLE mode in Apache HttpClient), and because of this most web servers support it.
Also note that if you use HttpMultipartMode.STRICT on the multipart entity, the library replaces each non-ASCII character in the filename with a question mark (?).
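For comparison, here is a stdlib-only sketch of the two other outcomes discussed above: an RFC 2231-style `filename*` parameter (only useful if the receiving server is known to support it; the later RFC 7578, which obsoletes RFC 2388, says not to use `filename*` in multipart/form-data at all), and the "?" substitution that forcing the name into US-ASCII produces, which matches the STRICT-mode behavior described. The class and method names are hypothetical, and the unencoded character set used is the conservative RFC 5987 `attr-char` subset of what RFC 2231 allows.

```java
import java.nio.charset.StandardCharsets;

public class Rfc2231FilenameSketch {
    // Characters left unencoded: the RFC 5987 attr-char set, a conservative
    // subset of the characters RFC 2231 permits in an extended parameter value.
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Builds filename*=UTF-8''<percent-encoded UTF-8 bytes> (empty language tag)
    static String rfc2231Filename(String name) {
        StringBuilder sb = new StringBuilder("filename*=UTF-8''");
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && ATTR_CHARS.indexOf(v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // Forces the name into US-ASCII; Java replaces each unmappable character with '?'
    static String asciiOnly(String name) {
        return new String(name.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Cyrillic filename from the question, written with Unicode escapes
        String name = "\u0442\u0435\u0441\u0442.txt";
        System.out.println(rfc2231Filename(name)); // filename*=UTF-8''%D1%82%D0%B5%D1%81%D1%82.txt
        System.out.println(asciiOnly(name));       // ????.txt
    }
}
```

The "????.txt" result illustrates why STRICT mode loses the original name, while the `filename*` form stays 7-bit but depends on server support.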