What is wrong with this code:
Document doc = Jsoup.connect("http://www.dw.com/ur/مارشل-لاء-کا-مطالبہ-سازش-یا-خواہش؟/a-19395440?maca=urd-rss-urd-all-1497-xml-mrss").get();
When I try to open the connection, it opens www.dw.com, but I want to open www.dw.com/ur/مارشل-لاء-کا-مطالبہ-سازش-یا-خواہش؟/a-19395440?maca=urd-rss-urd-all-1497-xml-mrss.
I think this is because the URL contains Urdu words.
How can I fix this?
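The Urdu-characters diagnosis can be checked with the JDK alone. The sketch below uses a swapped-in technique, not the one from the answer. It assumes the URL splits into the host `www.dw.com`, the raw Urdu path, and the `maca=...` query. It builds a `java.net.URI` from those parts and calls `toASCIIString()`, which percent-encodes every non-ASCII character as UTF-8 bytes. The class `UrduUrl` and the method `asciiUrl` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds an ASCII-only form of the dw.com URL from the question.
public class UrduUrl {

    // Assumption: the URL splits into this scheme, host, raw (unencoded) path and query.
    static String asciiUrl() {
        try {
            URI uri = new URI(
                    "http",
                    "www.dw.com",
                    "/ur/مارشل-لاء-کا-مطالبہ-سازش-یا-خواہش؟/a-19395440",
                    "maca=urd-rss-urd-all-1497-xml-mrss",
                    null);
            // toASCIIString() escapes each non-ASCII character as %XX UTF-8 octets
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String url = asciiUrl();
        System.out.println(url);
        // The ASCII string could then be passed to jsoup, e.g. Jsoup.connect(url).get()
    }
}
```

In the printed URL, the Urdu letters become `%D9%85...` sequences, and the Arabic question mark `؟` appears as `%D8%9F`. The `/` separators and the query stay literal.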
Best answer
Use HttpClient and URL encoding:
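// Percent-encode only the Urdu path segment; encoding the whole URL would also escape "://" and "/"
String slug = "مارشل-لاء-کا-مطالبہ-سازش-یا-خواہش؟";
// Convert URLEncoder's form encoding to RFC 3986 style: space as %20, '*' escaped, '~' literal
String encodedSlug = StringUtils.replaceEach(URLEncoder.encode(slug, "UTF-8"), new String[]{"+", "*", "%7E"}, new String[]{"%20", "%2A", "~"});
String url = "http://www.dw.com/ur/" + encodedSlug + "/a-19395440?maca=urd-rss-urd-all-1497-xml-mrss";
HttpClient httpClient = HttpClientBuilder.create().build();
HttpGet httpget = new HttpGet(url);
HttpResponse response = httpClient.execute(httpget);
// BasicResponseHandler returns the body as a String and throws on status codes >= 300
BasicResponseHandler bh = new BasicResponseHandler();
String res = bh.handleResponse(response);
// Pass the page URL as the base URI so relative links resolve correctly
Document doc = Jsoup.parse(res, url);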
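The encoding step can be tried without HttpClient or commons-lang. Here is a JDK-only sketch of it. It assumes Java 10 or later for the `URLEncoder.encode(String, Charset)` overload, and it uses chained `String.replace` calls in place of `StringUtils.replaceEach`. Sequential replacement gives the same result here because none of the replacements can create a new match. The class `PathSegmentEncoder` and its methods are hypothetical names. The sketch also shows why the encoder should only be applied to a single path segment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the answer's URLEncoder + replaceEach step.
public class PathSegmentEncoder {

    // URLEncoder emits application/x-www-form-urlencoded output; the replacements
    // move it toward RFC 3986 percent-encoding (space as %20, '*' escaped, '~' literal).
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Encode only the Urdu slug; scheme, host and "/" separators stay literal.
    static String buildUrl(String slug) {
        return "http://www.dw.com/ur/" + encodeSegment(slug)
                + "/a-19395440?maca=urd-rss-urd-all-1497-xml-mrss";
    }

    public static void main(String[] args) {
        // Encoding a full URL escapes ":" and "/", so the result is no longer an absolute URL
        System.out.println(encodeSegment("http://www.dw.com/")); // http%3A%2F%2Fwww.dw.com%2F
        System.out.println(buildUrl("مارشل-لاء-کا-مطالبہ-سازش-یا-خواہش؟"));
    }
}
```

Encoding is scoped to one path segment. This keeps the `/` between `ur`, the slug, and `a-19395440` intact. Only the Urdu characters and the `؟` become `%XX` sequences.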
A similar question, "java - jsoup cannot connect to a URL containing Urdu words", can be found on Stack Overflow: https://stackoverflow.com/questions/38342871/