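Preparation:
Environment:
A. MySQL 5.6
Setting the MySQL encoding:
0. If you are sure your MySQL already uses UTF-8, skip straight to step 4 to verify.
1. Download https://github.com/pgq10240817/PlayNews/blob/master/conf/db/my.ini
2. Put the file downloaded in step 1 into your MySQL installation directory and name it my.ini (this file does not exist by default; only my-default.ini does).
3. Restart MySQL: open cmd, run services.msc to open the service manager, locate MySQL, then right-click it and choose Restart.
4. Open cmd, change to %MYSQL%/bin (skip this if you have set the MySQL environment variable), and run mysql.
Then enter: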
mysql> show variables like 'char%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
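Open the Windows environment variable settings and add a system variable:
PLAY_OPTS: -Dfile.encoding=GBK
I tried setting the value to UTF8 and to UTF-8 instead, but the output was still garbled.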
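To see which encoding the JVM actually ended up with, a small check like the following can help. It is only a sketch: the class name EncodingCheck and the helper roundTrip are made up for illustration, and it assumes the -Dfile.encoding value from PLAY_OPTS really reaches the JVM. It prints the effective settings and shows how text becomes garbled when bytes are written with one charset and read with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the PlayNews project.
class EncodingCheck {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // The values the running JVM actually uses.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String sample = "\u65b0\u95fb";

        // Same charset on both sides: the text survives.
        String intact = roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8 intact: " + intact.equals(sample));

        // Mismatched charsets: the text is garbled. GBK is guarded because
        // not every Java runtime ships it.
        if (Charset.isSupported("GBK")) {
            String garbled = roundTrip(sample, StandardCharsets.UTF_8, Charset.forName("GBK"));
            System.out.println("UTF-8 -> GBK intact:   " + garbled.equals(sample));
        }
    }
}
```

If the two printed settings do not match what you configured, the variable is probably not being passed to the JVM at all.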
C. Create the database:
Run: https://github.com/pgq10240817/PlayNews/blob/master/conf/db/database.sql
This creates the database; the main thing the script does is set its default encoding.
D. Play configuration:
1. Database configuration:
See https://github.com/pgq10240817/PlayNews/blob/master/client/myNews/conf/db.conf
# Database configuration
# db.default.url="jdbc:mysql://127.0.0.1:3306/dbnews1?characterEncoding=utf8&zeroDateTimeBehavior=convertToNull"
db.default.driver=com.mysql.jdbc.Driver
db.default.url="jdbc:mysql://127.0.0.1/dbnews1"
db.default.user=root
db.default.pass="123456"
# Ebean configuration
#ebean.default="com.yhpl.model.*"
#evolutionplugin=disabled
ebean.default="models.*"
The settings above can be written directly into the Play project's conf/application.conf, or you can create a separate db.conf and add
include "db.conf" to application.conf.
2. Constants configuration:
See: https://github.com/pgq10240817/PlayNews/blob/master/client/myNews/conf/http.conf
This file configures some commonly used NetEase News URLs, and https://github.com/pgq10240817/PlayNews/blob/master/client/myNews/app/com/yhpl/utils/NewsUrlUtil.java
(NewsUrlUtil) provides read access to them.
3. Play route configuration:
https://github.com/pgq10240817/PlayNews/blob/master/client/myNews/conf/routes
For now only the following routes are configured:
GET /initChannels controllers.CaptureController.initChannels()
GET /initNews controllers.CaptureController.initNews()
Here initChannels initializes the channels, and initNews reads the stored channels' values (their IDs) to initialize the news.
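Data analysis:
Data source:
The data comes from NetEase News. Two kinds of data are collected: 1. the news channels, 2. the news under each channel.
Data format:
Capturing traffic with Fiddler shows that the NetEase News packets look like this:
1. Channels
https://github.com/pgq10240817/PlayNews/blob/master/conf/data/fiddler/channels.txt
2. News
https://github.com/pgq10240817/PlayNews/blob/master/conf/data/fiddler/news.txt
The capture in 2 needs a channel ID from 1.
Analysis:
The data format shows that the URL for fetching the channel list is fixed, while fetching news takes three parameters: the channel ID, page, and pageCount.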
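The three parameters can be combined into a request URL along these lines. This is a minimal sketch only: the class NewsUrlSketch, the method buildNewsUrl, the host example.invalid and the path layout are all placeholders made up for illustration; the real template lives in http.conf and NewsUrlUtil and may look quite different.

```java
// Hypothetical sketch, not the project's NewsUrlUtil.
class NewsUrlSketch {

    /**
     * Builds a news-list URL from a base URL plus the three parameters
     * named in the analysis. The "/<channelId>/<page>-<pageCount>.html"
     * layout is an assumed placeholder.
     */
    static String buildNewsUrl(String baseUrl, String channelId, int page, int pageCount) {
        if (channelId == null || channelId.isEmpty()) {
            throw new IllegalArgumentException("channelId must not be empty");
        }
        if (page < 0 || pageCount <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageCount > 0");
        }
        return baseUrl + "/" + channelId + "/" + page + "-" + pageCount + ".html";
    }

    public static void main(String[] args) {
        // "T123" is a made-up channel ID; real IDs come from the channel list.
        System.out.println(buildNewsUrl("http://example.invalid/news", "T123", 0, 20));
        // prints http://example.invalid/news/T123/0-20.html
    }
}
```

Keeping the template in one place means a change in the upstream URL format only needs a change in one method.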
Writing the beans:
The bean classes are located under:
https://github.com/pgq10240817/PlayNews/tree/master/client/myNews/app/models
namely Channels.java and News.java.
Data collection:
1. Channel collection:
public static Result initChannels() {
    // Fetch the channel list and map the JSON onto NewChannalsVo.
    NewChannalsVo chanals = (NewChannalsVo) JsonFileUtil.getGetUrlContentAsObject(
            NewsUrlUtil.getChannelUrl(), NewChannalsVo.class);
    if (chanals != null) {
        NewChannalVo[] channelArray = chanals.gettList();
        // Convert each VO into a Channels bean.
        List<Channels> beans = new ArrayList<Channels>();
        for (NewChannalVo jsonObj : channelArray) {
            Channels bean = new Channels();
            bean.cname = jsonObj.getTname();
            bean.cid = jsonObj.getTid();
            bean.subnum = jsonObj.getSubnum();
            beans.add(bean);
        }
        // Save only the channels whose name is not in the database yet.
        Ebean.beginTransaction();
        try {
            for (Channels bean : beans) {
                Channels target = Channels.getChannelWithCname(bean.cname);
                if (target == null) {
                    Ebean.save(bean);
                } else {
                    System.out.println("exist -- " + target.cname);
                }
            }
            Ebean.commitTransaction();
        } finally {
            // Rolls back if the commit was never reached; does nothing after a commit.
            Ebean.endTransaction();
        }
    }
    return ok("init Channels success");
}
2. News collection:
public static Result initNews() {
    Page<Channels> pageChannel = Channels.page(1, 20, "id", "asc");
    if (pageChannel.getTotalRowCount() > 0) {
        List<Channels> channelBeans = pageChannel.getList();
        if (!CollectionUtil.isEmpty(channelBeans)) {
            for (Channels channelBean : channelBeans) {
                // A. Parse the data: the response is keyed by the channel ID.
                String url = NewsUrlUtil.getChannelNewsUrlWithCidPageCount(channelBean.cid);
                JsonNode node = JsonFileUtil.getGetUrlContentAsJsonNode(url);
                if (node == null || !(node.get(channelBean.cid) instanceof ArrayNode)) {
                    // Skip channels whose response is missing or malformed.
                    continue;
                }
                ArrayNode arrayNodes = (ArrayNode) node.get(channelBean.cid);
                Iterator<JsonNode> iter = arrayNodes.iterator();
                List<News> mNews = new ArrayList<News>();
                while (iter.hasNext()) {
                    JsonNode childNode = iter.next();
                    NewsVo childNews = Json.fromJson(childNode, NewsVo.class);
                    News news = new News();
                    news.cid = channelBean.cid;
                    news.cp = childNews.getSource();
                    news.icon = childNews.getImgsrc();
                    news.url = childNews.getUrl();
                    news.title = childNews.getTitle();
                    news.snapDetail = childNews.getDigest();
                    news.time = DateUtil.getDateFromString(childNews.getPtime());
                    mNews.add(news);
                    System.out.println("child:" + childNews);
                }
                // B. Filter against the database: drop news whose title is already stored.
                CollectionUtil.trimListWithFilter(mNews, new TrimFilter<News>() {
                    @Override
                    public boolean isFilter(News t) {
                        return t != null && News.getNewsWithTitle(t.title) != null;
                    }
                });
                // C. POJO -> DB
                if (!CollectionUtil.isEmpty(mNews)) {
                    System.out.println("save --- > :" + channelBean.cid);
                    Ebean.save(mNews);
                }
            }
        }
    }
    return ok("init News success");
}
3. After starting Play, the first initialization will ask you to add scripts (Play's database evolution prompt); just click add.
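Step B of initNews relies on CollectionUtil.trimListWithFilter, whose source is not shown here. A helper with that shape could be sketched as follows, assuming isFilter returning true means "remove this item", as the call site suggests. The class TrimSketch is hypothetical, and an in-memory set of titles stands in for the News.getNewsWithTitle database lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the project's CollectionUtil / TrimFilter.
class TrimSketch {

    /** Return true from isFilter to have the item removed from the list. */
    interface TrimFilter<T> {
        boolean isFilter(T item);
    }

    /** Removes, in place, every element the filter flags. */
    static <T> void trimListWithFilter(List<T> list, TrimFilter<T> filter) {
        if (list == null || filter == null) {
            return;
        }
        // Removing through the iterator avoids ConcurrentModificationException.
        Iterator<T> it = list.iterator();
        while (it.hasNext()) {
            if (filter.isFilter(it.next())) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Titles assumed to be in the database already (made-up sample data).
        final Set<String> storedTitles = new HashSet<String>(Arrays.asList("A", "B"));
        List<String> fetched = new ArrayList<String>(Arrays.asList("A", "C", "B", "D"));

        trimListWithFilter(fetched, new TrimFilter<String>() {
            @Override
            public boolean isFilter(String title) {
                return title != null && storedTitles.contains(title);
            }
        });

        System.out.println(fetched); // prints [C, D]
    }
}
```

The version in initNews issues one database query per fetched title; loading the known titles into a set once, as in this sketch, is one possible way to cut that down when the lists grow.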
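Next:
A vulnerability I submitted to wooyun yesterday failed review yet again. The reason given:
the vendor could not be contacted and the issue has little impact...
Could some kind soul lend me a hand here...
The next update will probably come after the National Day holiday. It will mainly cover writing the interfaces for the client to call; there will be two of them: get the channels, and get the news under a channel.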