My professor has asked us to remove HTML tags (anything between < and >) without using the removeAll method.
This is what I have so far:
public static void main(String[] args)
        throws FileNotFoundException {
    Scanner input = new Scanner(new File("src/HTML_1.txt"));
    while (input.hasNext())
    {
        String html = input.next();
        System.out.println(stripHtmlTags(html));
    }
}

static String stripHtmlTags(String html)
{
    int i;
    String[] str = html.split("");
    String s = "";
    boolean tag = false;
    for (i = html.indexOf("<"); i < html.indexOf(">"); i++)
    {
        tag = true;
    }
    if (!tag)
    {
        for (i = 0; i < str.length; i++)
        {
            s += str[i];
        }
    }
    return s;
}
This is the content of the file:
<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>
The output should look like this:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
Best answer
String is immutable in Java, and you never display anything.
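To illustrate the immutability point, here is a minimal sketch. The class and method names (ImmutabilityDemo, appendToString, appendToBuilder) are hypothetical and used only for this illustration. It shows that calling a method on a String without keeping the result leaves the original unchanged, while a StringBuilder is modified in place.

```java
public class ImmutabilityDemo {
    // Hypothetical helper: String.concat returns a NEW String.
    // Because the result is discarded here, s is returned unchanged.
    static String appendToString(String s) {
        s.concat("!");
        return s;
    }

    // Hypothetical helper: StringBuilder.append mutates the builder itself.
    static String appendToBuilder(String s) {
        StringBuilder sb = new StringBuilder(s);
        sb.append("!");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendToString("cat"));  // the String was not modified
        System.out.println(appendToBuilder("cat")); // the builder was modified
    }
}
```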
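I suggest closing the Scanner when you are done with it (as a best practice) and reading the HTML_1.txt file from the user's HOME directory. The easiest way to close it is with try-with-resources: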
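import java.io.File;
import java.util.Scanner;

public static void main(String[] args) {
    // try-with-resources closes the Scanner automatically, even on error
    try (Scanner input = new Scanner(new File(
            System.getProperty("user.home"), "HTML_1.txt"))) {
        while (input.hasNextLine()) {
            // Read whole lines so the spaces between words are preserved
            String html = stripHtmlTags(input.nextLine().trim());
            if (!html.isEmpty()) { // <-- removes empty lines.
                System.out.println(html);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}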
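Because String is immutable, I suggest using a StringBuilder to remove the HTML tags, for example:
static String stripHtmlTags(String html) {
    StringBuilder sb = new StringBuilder(html);
    int open;
    // Repeatedly find the next '<' and delete everything through its '>'
    while ((open = sb.indexOf("<")) != -1) {
        int close = sb.indexOf(">", open + 1);
        if (close == -1) {
            break; // unclosed '<' on this line: stop rather than delete an invalid range
        }
        sb.delete(open, close + 1);
    }
    return sb.toString();
}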
When I run the above, I get:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
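For comparison, the boolean-flag idea from the question can also be made to work as a single pass over the characters. The following is only a sketch of that alternative, not the approach used in the answer. The class name TagStripper and the variable inTag are hypothetical, and it assumes every tag opens and closes on the same line, as in the sample file.

```java
public class TagStripper {
    // Copies every character that is not inside <...> into the result.
    static String stripHtmlTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false; // true while we are between '<' and '>'
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;          // a tag starts: stop copying
            } else if (c == '>' && inTag) {
                inTag = false;         // the tag ends: resume copying
            } else if (!inTag) {
                out.append(c);         // ordinary text (including a stray '>')
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripHtmlTags("as well as my <b>very cool</b> blog page,"));
    }
}
```

Note that with this sketch an unclosed < causes the rest of the line to be dropped, whereas a > that appears outside any tag is kept as text.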