


I am trying to extract the content of a webpage A. Using Groovy, I've tried the following:

String urlStr = "url-of-webpage-A"
String pageText = urlStr.toURL().text
//println pageText


The above code retrieves the text of webpage A as long as A doesn't redirect to another webpage B. If A redirects to B, the content of webpage B ends up in the pageText variable instead. Is there a way to detect in code whether webpage A redirects to another webpage (in Groovy or Java)?


PS: The above piece of code is not part of server-side logic. I am executing it on the client side within the scope of a desktop application.



In Groovy, you could do what Joachim suggests like this:

String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null

while( location ) {
  new URL( location ).openConnection().with { con ->
    // We'll do redirects ourselves
    con.instanceFollowRedirects = false

    // Get the location to jump to (null if this response is not a redirect)
    location = con.getHeaderField( "Location" )
    if( !wasRedirected && location ) {
      wasRedirected = true
    }

    // Read the HTML and close the inputstream
    pageContent = con.inputStream.withReader { it.text }
  }
}

println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"


If you don't want to be redirected, and want the contents of the first page, you simply need to do:

String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
  // Don't follow redirects automatically
  con.instanceFollowRedirects = false

  // Get the location to jump to (in case of a redirect)
  location = con.getHeaderField( "Location" )

  // Read the HTML and close the inputstream
  con.inputStream.withReader { it.text }
}

if( location ) {
  println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent
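
Since the question also asks about Java, here is a rough sketch of the same check in plain Java. The class name RedirectCheck, its helper methods, and the example.com URL are made up for illustration. It also assumes the server may send a relative Location header, so the header is resolved against the requested URL before being reported.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, not part of any library.
final class RedirectCheck {

    // True for the 3xx status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // The Location header may be relative, so resolve it against the requested URL.
    static String resolveLocation(String requestedUrl, String location) {
        return URI.create(requestedUrl).resolve(location).toString();
    }

    // Returns the redirect target of urlStr, or null if the first response is not a redirect.
    static String redirectTarget(String urlStr) throws IOException {
        HttpURLConnection con =
            (HttpURLConnection) URI.create(urlStr).toURL().openConnection();
        // Inspect the first response ourselves instead of following it
        con.setInstanceFollowRedirects(false);
        try {
            int status = con.getResponseCode();
            String location = con.getHeaderField("Location");
            if (isRedirect(status) && location != null) {
                return resolveLocation(urlStr, location);
            }
            return null;
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real address of webpage A as the first argument
        String urlStr = args.length > 0 ? args[0] : "http://example.com/";
        String target = redirectTarget(urlStr);
        System.out.println(target == null ? "No redirect" : "Redirects to " + target);
    }
}
```

Checking the status code as well as the header avoids treating a stray Location header on a non-3xx response as a redirect. Redirects done in the page itself (a meta refresh tag or JavaScript) are not visible at this level.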

