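I'm having trouble understanding the concept of an HTTP GET request, beyond knowing that it asks a server for a web page. Today I wrote a class that tries to use an HTTP GET request to fetch the HTML of a web page. Let me include the class and explain my confusion: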
import java.io.*;
import java.net.*;

public class HTMLFetcher
{
    private static final int PORT = 80;
    private URL url;

    public HTMLFetcher(String url) throws Exception // url = http://www.-----.com/birds.html
    {
        this.url = new URL(url);
        fetch(this.url.getHost());
    }

    private String createRequest(URL url) { // Is there a problem with this request?
        String request = "GET" + "/index.html" + "HTTP/1.1\n";
        request += "Host: www.cs.usfca.edu\n";
        request += "Connection: close";
        request += "\r\n";
        return request;
    }

    public void fetch(String urlDomain) throws Exception {
        System.out.println(urlDomain + ":" + PORT);
        // TODO: create a new socket here for a given urlDomain and a given PORT
        Socket socket = new Socket(urlDomain, PORT);
        // TODO: create PrintWriter for the socket's output stream
        PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(socket.getInputStream()));
        String request = createRequest(urlDomain); // createRequest is complaining that it is a string and not a URL
        System.out.println(request);
        writer.write(request);
        writer.flush();
        StringBuilder string = new StringBuilder();
        boolean htmlFound = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!htmlFound) {
                if (line.toLowerCase().startsWith("<html>")) {
                    htmlFound = true;
                } else {
                    continue;
                }
            }
            System.out.println("This is each line: " + line);
            string.append(line + "\n");
        }
        reader.close();
        writer.close();
        socket.close();
        //System.out.println(string.toString());
        System.out.println("[done]");
    }
}
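So basically, I'm confused about how to pass the String urlDomain to createRequest when it expects a URL. Does the HTTP request need createRequest's parameter at all? Am I setting up the request correctly?
Right now it outputs the following: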
www.cs.usfca.edu:80
GET/index.htmlHTTP/1.1
Host: www.cs.usfca.edu
Connection: close
This is each line: <html><head>
This is each line: <title>501 Method Not Implemented</title>
This is each line: </head><body>
This is each line: <h1>Method Not Implemented</h1>
This is each line: <p>GET/index.htmlHTTP/1.1 to /index.html not supported.<br />
This is each line: </p>
This is each line: <hr>
This is each line: <address>Apache/2.2.15 (CentOS) Server at www.cs.usfca.edu Port 80</address>
This is each line: </body></html>
[done]
Thanks for your help. Please let me know if I can be more specific.
Best answer
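As I understand it, the Host header matters when a site lives on a shared hosting server: several domains map to the same IP address, and the server needs the Host header to work out which virtual host the request should be routed to. HTTP/1.1 also requires clients to send it, so always include it in the request.
By the way, the request line in your current code contains no spaces. The server expects the method, the path and the version to be separated by single spaces ("GET /index.html HTTP/1.1"), so it reads the whole string "GET/index.htmlHTTP/1.1" as an unknown method name. That is why you get the 501 error page as the response. Try this instead: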
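private String createRequest(String host) { // called as createRequest(urlDomain)
    // Method, path and version must be separated by single spaces.
    String request = "GET " + "/ " + "HTTP/1.1\r\n";
    // Every header line ends with \r\n, not a bare \n.
    request += "Host: " + host + "\r\n";
    // Ask the server to close the connection so readLine() eventually returns null.
    request += "Connection: close\r\n";
    // An empty line marks the end of the headers.
    request += "\r\n";
    return request;
}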
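If you would rather keep the URL parameter from the question, the request can be derived from the URL object instead of a hard-coded path and host. The following is only a minimal sketch: `RequestBuilder` and `buildRequest` are made-up names, and it assumes plain HTTP on the default port 80 (a non-default port would also have to appear in the Host header).

```java
import java.net.MalformedURLException;
import java.net.URL;

class RequestBuilder {
    // Hypothetical helper: builds a minimal HTTP/1.1 GET request for the given URL.
    // Assumes the default port 80, so the Host header carries only the host name.
    static String buildRequest(URL url) {
        // getPath() is empty for a URL such as http://example.com, so fall back to "/".
        String target = url.getPath().isEmpty() ? "/" : url.getPath();
        // Keep the query string, if there is one, as part of the request target.
        if (url.getQuery() != null) {
            target += "?" + url.getQuery();
        }
        return "GET " + target + " HTTP/1.1\r\n"
             + "Host: " + url.getHost() + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // the empty line ends the header section
    }

    public static void main(String[] args) throws MalformedURLException {
        // Prints the raw request text that would be written to the socket.
        System.out.print(buildRequest(new URL("http://www.cs.usfca.edu/index.html")));
    }
}
```

With something like this, the constructor in the question could pass `this.url` along to `fetch` and use `url.getHost()` only when opening the socket.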
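Also, don't detect the start of the document like this, because it misses an opening tag that carries attributes, such as <html lang="en">:
if (line.toLowerCase().startsWith("<html>"))
Use this instead:
if (line.toLowerCase().startsWith("<html"))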
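Another option is to stop looking for a tag altogether and rely on the layout of the response: the status line and the headers come first, and an empty line separates them from the body. Here is a rough sketch of that idea; `ResponseBody` and `readBody` are made-up names, and the canned response in `main` is invented for illustration. It assumes the body is not sent with chunked transfer encoding, which an HTTP/1.1 server is allowed to use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ResponseBody {
    // Hypothetical helper: skips the status line and headers, then returns the rest.
    static String readBody(BufferedReader reader) throws IOException {
        String line;
        // The header section ends at the first empty line.
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            // Status line or header line; ignored in this sketch.
        }
        StringBuilder body = new StringBuilder();
        // Everything after the empty line is the body (assumed not chunked).
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Invented response text, standing in for what the socket would deliver.
        String canned = "HTTP/1.1 200 OK\r\n"
                      + "Content-Type: text/html\r\n"
                      + "\r\n"
                      + "<!DOCTYPE html>\r\n"
                      + "<html lang=\"en\">\r\n"
                      + "</html>\r\n";
        System.out.print(readBody(new BufferedReader(new StringReader(canned))));
    }
}
```

In the question's `fetch` method, the same reader that wraps `socket.getInputStream()` could be passed to `readBody`, which also keeps a leading `<!DOCTYPE html>` line that a tag check would drop.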
By the way, why do it the hard way? Use HttpURLConnection instead.
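For comparison, this is roughly what the same fetch could look like with HttpURLConnection, which builds the request line and headers and parses the response for you. Treat it as a sketch under stated assumptions: `SimpleFetcher`, `fetch` and `readAll` are made-up names, the 5-second timeouts are arbitrary, and the body is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class SimpleFetcher {
    // Hypothetical helper: reads a stream line by line into one string.
    // Assumes the content is UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Hypothetical helper: performs a GET and returns the response body.
    static String fetch(String address) throws IOException {
        HttpURLConnection connection =
            (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5000); // arbitrary timeout, in milliseconds
        connection.setReadTimeout(5000);    // arbitrary timeout, in milliseconds
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            // The stream contains only the body; headers are already parsed.
            return readAll(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }
}
```

A call such as `String html = SimpleFetcher.fetch("http://www.cs.usfca.edu/index.html");` would then replace the socket, writer and reader handling in the question.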