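Beyond knowing that it asks a server for a web page, I am having trouble understanding how an HTTP GET request works. Today I wrote a class that tries to use an HTTP GET request to fetch the HTML of a web page. Here is the class, followed by what confuses me: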

import java.io.*;
import java.net.*;

public class HTMLFetcher
{
    private static final int PORT = 80;
    private URL url;


    public HTMLFetcher(String url) throws Exception // url = http://www.-----.com/birds.html
    {
        this.url = new URL(url);
        fetch(this.url.getHost());
    }

    private  String createRequest(URL url) { // Is there a problem with this request?
        String request = "GET" + "/index.html" + "HTTP/1.1\n";
        request += "Host: www.cs.usfca.edu\n";
        request += "Connection: close";
        request += "\r\n";
        return request;
    }

    public void fetch(String urlDomain) throws Exception {

        System.out.println(urlDomain + ":" + PORT);

        // TODO: create a new socket here for a given urlDomain and a given PORT
        Socket socket = new Socket(urlDomain, PORT);

        // TODO: create PrintWriter for the socket's output stream
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));

        BufferedReader reader =
                new BufferedReader(new InputStreamReader(socket.getInputStream()));

        String request = createRequest(urlDomain); // the compiler complains here: createRequest expects a URL, not a String
        System.out.println(request);
        writer.write(request);
        writer.flush();

        StringBuilder string = new StringBuilder();
        boolean htmlFound = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!htmlFound) {
                if (line.toLowerCase().startsWith("<html>")) {
                    htmlFound = true;
                } else {
                    continue;
                }
            }
            System.out.println("This is each line: " + line);
            string.append(line + "\n");
        }

        reader.close();
        writer.close();
        socket.close();

        //System.out.println(string.toString());
        System.out.println("[done]");
    }
}


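So, basically, I am confused about how to pass the String urlDomain to createRequest when the method expects a URL. Does the HTTP request even need the parameter that createRequest takes? Am I setting up the request correctly?

Right now, it prints the following: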

www.cs.usfca.edu:80
GET/index.htmlHTTP/1.1
Host: www.cs.usfca.edu
Connection: close

This is each line: <html><head>
This is each line: <title>501 Method Not Implemented</title>
This is each line: </head><body>
This is each line: <h1>Method Not Implemented</h1>
This is each line: <p>GET/index.htmlHTTP/1.1 to /index.html not supported.<br />
This is each line: </p>
This is each line: <hr>
This is each line: <address>Apache/2.2.15 (CentOS) Server at www.cs.usfca.edu Port 80</address>
This is each line: </body></html>
[done]


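Thanks for your help. Please let me know if I can be more specific.

Best answer

As I understand it, the Host header matters when a site lives on a shared hosting server: several domains map to the same IP address, and the server needs the Host header to identify which virtual host the request should be routed to. HTTP/1.1 also requires the header on every request, so include it.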

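By the way, the request line in your current code contains no spaces, so the server cannot tell the method, the path, and the protocol version apart. That is why you get an error page (501 Method Not Implemented) as the response.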

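private String createRequest(String host) { // takes the host name that fetch() passes in
    // The request line needs single spaces between method, path, and version.
    String request = "GET " + "/ " + "HTTP/1.1\r\n";
    // Every header line ends with \r\n, not a bare \n.
    request += "Host: " + host + "\r\n";
    // Ask the server to close the connection so that readLine() eventually returns null.
    request += "Connection: close\r\n";
    // An empty line marks the end of the headers.
    request += "\r\n";
    return request;
}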
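
For reference, here is a minimal, self-contained sketch of building such a request from a host and a path. The class name GetRequestSketch and the method buildGetRequest are made up for illustration; the host and path would typically come from URL.getHost() and URL.getPath().

```java
class GetRequestSketch {

    // Builds a minimal HTTP/1.1 GET request.
    // host and path are illustrative parameters (e.g. from URL.getHost() and URL.getPath()).
    static String buildGetRequest(String host, String path) {
        // A URL without a path, such as http://example.com, must be requested as "/".
        String target = (path == null || path.isEmpty()) ? "/" : path;
        return "GET " + target + " HTTP/1.1\r\n"   // method, space, path, space, version
             + "Host: " + host + "\r\n"            // mandatory in HTTP/1.1
             + "Connection: close\r\n"             // server closes the socket after responding
             + "\r\n";                             // empty line ends the header section
    }

    public static void main(String[] args) {
        // Print the request exactly as it would be written to the socket.
        System.out.print(buildGetRequest("www.cs.usfca.edu", "/index.html"));
    }
}
```

Passing the host and path in, rather than hard-coding them, lets the same method serve any URL the constructor receives.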


Also, do not check for the start of the HTML like this:

if (line.toLowerCase().startsWith("<html>"))


Use this instead, so that an opening tag with attributes, such as <html lang="en">, also matches:

if (line.toLowerCase().startsWith("<html"))
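
A prefix check still misses pages that begin with a DOCTYPE declaration or a comment. Since the headers of an HTTP response always end with an empty line, another option is to skip everything up to that line and treat the rest as the body. The sketch below shows the idea; the class and method names are hypothetical, and it assumes the body is not sent with chunked transfer encoding (which would need extra decoding).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ResponseBodySketch {

    // Reads a raw HTTP response and returns only the body.
    // Assumption: the body is not chunk-encoded.
    static String readBody(BufferedReader reader) throws IOException {
        String line = reader.readLine();
        // Skip the status line and the headers; the first empty line ends them.
        while (line != null && !line.isEmpty()) {
            line = reader.readLine();
        }
        // Everything after the empty line is the body.
        StringBuilder body = new StringBuilder();
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // A canned response stands in for the socket's input stream.
        String raw = "HTTP/1.1 200 OK\r\n"
                   + "Content-Type: text/html\r\n"
                   + "\r\n"
                   + "<!DOCTYPE html>\r\n"
                   + "<html lang=\"en\"><body>hi</body></html>\r\n";
        System.out.print(readBody(new BufferedReader(new StringReader(raw))));
    }
}
```

In fetch(), the same method could be given the BufferedReader that wraps socket.getInputStream().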


By the way, why do this the hard way? Use HttpURLConnection instead.
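
A rough sketch of what that could look like follows. The class name UrlConnectionSketch and the method fetch are made up for illustration, and decoding the body as UTF-8 is an assumption (a real page may declare a different charset). HttpURLConnection builds the request line and the Host header itself and strips the response headers before handing over the body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UrlConnectionSketch {

    // Fetches the body of the page at the given address.
    // Assumption: the body is decoded as UTF-8.
    static String fetch(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("GET");
        // getInputStream() throws an IOException for 4xx and 5xx responses.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // The URL from the question; any http:// address works the same way.
        System.out.print(fetch("http://www.cs.usfca.edu/index.html"));
    }
}
```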
