I originally had this question:
Having trouble fetching the proper site in Java (second word for website search query gets cut off)
Basically, when I searched a website for an item with two words, for example "summer clothes", I was redirected to a search for just "summer". From that answer, I suspect it's because Sears uses JavaScript to redirect and Jsoup does not execute JavaScript, so it never follows that redirect. Is there any way to fetch that website while still using Jsoup?
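Before chasing the redirect, it may be worth ruling out a simpler cause of a truncated two-word query: a raw space in the URL. The sketch below assumes the search is a plain GET with the query in a `q` parameter. `SearchUrl`, `build` and the example.com endpoint are hypothetical placeholders, not the real site's search URL. It uses the standard `URLEncoder` so the space is sent as `+` instead of as a raw space, which some clients or servers cut off or reject.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrl {
    // Hypothetical endpoint and parameter name; the real site's may differ.
    static final String BASE = "https://www.example.com/search?q=";

    // Encodes the whole query, so "summer clothes" becomes "summer+clothes"
    // and the second word is not separated from the first by a raw space.
    public static String build(String query) {
        return BASE + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("summer clothes"));
        // prints https://www.example.com/search?q=summer+clothes
    }
}
```

If the encoded URL still lands on a one-word search, the redirect is the more likely culprit.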
The code below checks for both a meta "refresh" tag (<meta http-equiv="refresh" content="0; url=...">) and a JavaScript redirect (window.location.href = '...' inside a script). If either one exists, the RedirectedUrl variable is set, so you know your target.
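import org.jsoup.nodes.Element;

String RedirectedUrl = null;
// 1) <meta http-equiv="refresh" content="0; url=...">; ~= matches the value case-insensitively
Element refresh = page.select("meta[http-equiv~=(?i)refresh]").first();
if (refresh != null) {
    String content = refresh.attr("content");
    int idx = content.toLowerCase().indexOf("url=");
    if (idx >= 0) {
        // keep everything after "url=", because the URL itself may contain '='
        RedirectedUrl = content.substring(idx + 4).replace("'", "").replace("\"", "").trim();
    }
} else if (page.toString().contains("window.location.href")) {
    // 2) <script>window.location.href = '...';</script>
    for (Element script : page.select("script")) {
        String s = script.data().trim();
        if (s.startsWith("window.location.href")) {
            int start = s.indexOf("=");
            int end = s.indexOf(";");
            if (start > 0 && end > start) {
                s = s.substring(start + 1, end);
                RedirectedUrl = s.replace("'", "").replace("\"", "").trim();
                break; // the first redirect found is enough
            }
        }
    }
}
// ... now retrieve the redirected page again ...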
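Retrieving the redirected page may need to repeat, because the new page can redirect again and targets are often relative. Below is a minimal sketch of that loop under stated assumptions. It deliberately swaps Jsoup's parser for plain `java.util.regex` so it runs with the standard library alone. That makes it more fragile than a real HTML parser; for example, it assumes `http-equiv` appears before `content` in the meta tag. `RedirectFollower`, `extractRedirect`, `follow` and the example.com URLs are hypothetical. The `fetch` function stands in for the real HTTP call, such as something built on `Jsoup.connect(url).get().html()`.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedirectFollower {

    // <meta http-equiv="refresh" content="0; url=...">, matched case-insensitively.
    // Assumes http-equiv comes before content inside the tag.
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // window.location.href = '...' or "..." inside a script.
    private static final Pattern JS_HREF = Pattern.compile(
            "window\\.location\\.href\\s*=\\s*[\"']([^\"']+)[\"']");

    // Returns the redirect target found in the page, or empty if there is none.
    public static Optional<String> extractRedirect(String html) {
        Matcher meta = META_REFRESH.matcher(html);
        if (meta.find()) {
            String content = meta.group(1);
            int idx = content.toLowerCase().indexOf("url=");
            if (idx >= 0) {
                // Keep everything after "url=", so '=' inside the query string survives.
                String target = content.substring(idx + 4).trim();
                if (!target.isEmpty()) {
                    return Optional.of(target);
                }
            }
        }
        Matcher js = JS_HREF.matcher(html);
        return js.find() ? Optional.of(js.group(1).trim()) : Optional.empty();
    }

    // Follows meta-refresh / JavaScript redirects at most maxHops times.
    // fetch is a placeholder for the real HTTP call: absolute URL -> page HTML.
    public static String follow(String startUrl, Function<String, String> fetch, int maxHops) {
        String url = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            Optional<String> target = extractRedirect(fetch.apply(url));
            if (!target.isPresent()) {
                return url; // no further redirect: this is the final page
            }
            // Targets are often relative, so resolve them against the current URL.
            url = URI.create(url).resolve(target.get()).toString();
        }
        return url; // hop limit reached; stop rather than loop forever
    }

    public static void main(String[] args) {
        // Fake "web" standing in for real HTTP responses.
        Map<String, String> pages = Map.of(
                "http://example.com/start",
                "<meta http-equiv=\"Refresh\" content=\"0; URL=/search?q=summer+clothes&page=1\">",
                "http://example.com/search?q=summer+clothes&page=1",
                "<script>window.location.href = \"/results?q=summer+clothes\";</script>",
                "http://example.com/results?q=summer+clothes",
                "<p>results</p>");
        System.out.println(follow("http://example.com/start", pages::get, 5));
        // prints http://example.com/results?q=summer+clothes
    }
}
```

Capping the number of hops is a design choice that avoids spinning forever on a page that redirects to itself. With Jsoup, `fetch` would wrap the connect/get call so that its checked `IOException` becomes unchecked, since `Function` cannot throw it.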