MATLAB text strings / HTML parsing


Problem description

I am trying to get information from a website (HTML) into MATLAB. I can read the page into a string using:

  urlread('http://www.websiteNameHere.com...');

Once I have the string, I have a very long string variable containing the entire contents of the HTML file. From this variable, I am looking for the values/characters inside very specific classes. For example, the HTML will have a bunch of lines, and then the classes of interest in the following form:

  ...
  <h4 class="price">
  <span class="priceSort">$39,991</span>
  </h4>
  <div class="mileage">
  <span class="milesSort">19,570 mi.</span>
  </div>
  ...
  <h4 class="price">
  <span class="priceSort">$49,999</span>
  </h4>
  <div class="mileage">
  <span class="milesSort">9,000 mi.</span>
  </div>
  ...

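I need to be able to get the information between <span class="priceSort"> and </span>, i.e. $39,991 and $49,999 in the example above. What is the best way to go about this? If the tags had specific opening and closing names that matched (such as <price> and </price>), I would have no problem.

I also need to know the most robust method, since I would like to be able to find <span class="milesSort"> and other information of this sort too. Thanks!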

Solution

A simple solution using strsplit:

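  s = urlread('http://www.websiteNameHere.com...');

  x = 'class="priceSort">';   % starting string x
  y = 'class="milesSort">';   % starting string y
  z = '</span>';              % ending string z

  s2 = strsplit(s, x);        % split at starting string x
  s3 = strsplit(s, y);        % split at starting string y

  result1 = cell(size(s2,2)-1, 1);   % one cell per priceSort match
  result2 = cell(size(s3,2)-1, 1);   % one cell per milesSort match

  % Loop through the pieces, ignoring the first one: it holds the text
  % before the first match, not a value.
  % (Change ind=2:size(s2,2) to ind=1:size(s2,2) to see why.)

  % starting string x loop
  for ind = 2:size(s2,2)
      m = strsplit(s2{1,ind}, z);    % cut the piece at the closing tag
      result1{ind-1} = m{1,1};       % keep the text before </span>
  end

  % starting string y loop
  for ind = 2:size(s3,2)
      m = strsplit(s3{1,ind}, z);
      result2{ind-1} = m{1,1};
  end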
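
To see why the loops start at index 2, it may help to run strsplit on a short string. The snippet below is only an illustration: the sample text is made up, standing in for the downloaded page, and the variable names `parts` and `firstValue` are not from the original answer.

```matlab
% Made-up stand-in for the page contents returned by urlread.
s = 'junk class="priceSort">$39,991</span> more class="priceSort">$49,999</span> tail';

% Split at the start marker. A multi-character char delimiter is treated
% as one delimiter, so this yields three pieces.
parts = strsplit(s, 'class="priceSort">');

% parts{1} is 'junk ': everything before the first marker, which holds no
% value. That is the piece the loops skip by starting at index 2.
disp(parts{1})

% Each later piece starts with a value, followed by the closing tag.
m = strsplit(parts{2}, '</span>');
firstValue = m{1};
disp(firstValue)   % prints: $39,991
```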

Hope this helps.
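
As a possible alternative to the strsplit loops, the same values could probably be pulled out with regexp and its 'tokens' option. This is a sketch, not part of the original answer. It assumes the attribute is always written exactly as class="..." and that the spans are not nested. The helper `spanText` and the variables `priceValues` and `mileValues` are hypothetical names. The sample HTML is copied from the question and stands in for the output of urlread.

```matlab
% Sample HTML from the question, standing in for the downloaded page.
s = ['<h4 class="price"> <span class="priceSort">$39,991</span> </h4>' ...
     '<div class="mileage"> <span class="milesSort">19,570 mi.</span> </div>' ...
     '<h4 class="price"> <span class="priceSort">$49,999</span> </h4>' ...
     '<div class="mileage"> <span class="milesSort">9,000 mi.</span> </div>'];

% Hypothetical helper: text of every <span> with the given class.
% (.*?) is a lazy capture, so each match stops at the first </span>.
spanText = @(html, cls) regexp(html, ...
    ['<span class="' cls '">(.*?)</span>'], 'tokens');

prices = spanText(s, 'priceSort');   % cell array of 1x1 cells
miles  = spanText(s, 'milesSort');

% Flatten the nested cells into plain cell arrays of char vectors.
prices = [prices{:}];   % {'$39,991', '$49,999'}
miles  = [miles{:}];    % {'19,570 mi.', '9,000 mi.'}

% Optional: strip the non-numeric characters and convert to numbers.
priceValues = str2double(regexprep(prices, '[^0-9.]', ''));  % [39991 49999]
mileValues  = str2double(regexprep(miles,  '[^0-9]',  ''));  % [19570 9000]
```

Adding another field should then only need another call to `spanText` with a different class name. For pages whose markup varies (extra attributes, single quotes, nested tags), a pattern like this can miss matches, so a real HTML parser would likely be the more robust choice.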
