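Problem description
I am trying to get information from a website (HTML) into MATLAB. I can read the page's HTML into a string using:
urlread('http://www.websiteNameHere.com...');
Once I have the string, I have one very long string variable containing the entire contents of the HTML file. From this variable, I am looking for the values/characters inside very specific classes. For example, the HTML will have a bunch of lines, and then the classes of interest appear in the following form: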
...
<h4 class="price">
<span class="priceSort">$39,991</span>
</h4>
<div class="mileage">
<span class="milesSort">19,570 mi.</span>
</div>
...
<h4 class="price">
<span class="priceSort">$49,999</span>
</h4>
<div class="mileage">
<span class="milesSort">9,000 mi.</span>
</div>
...
I need to be able to get the information between <span class="priceSort"> and </span>, i.e. $39,991 and $49,999 in the above example. What is the best way to go about this? If the tags had specific beginnings and ends that were also the same (such as <price> and </price>), I would have no problem...
I also need to know the most robust method, since I would like to be able to find <span class="milesSort"> and other information of this sort too. Thanks!

A simple solution using strsplit (hope this helps):
s = urlread('http://www.websiteNameHere.com...');
x = 'class="priceSort">'; %starting string x
y = 'class="milesSort">'; %starting string y
z = '</span>'; %ending string z
s2 = strsplit(s,x); %split for starting string x
s3 = strsplit(s,y); %split for starting string y
result1 = cell(size(s2,2)-1,1); %create cell array 1
result2 = cell(size(s3,2)-1,1); %create cell array 2
%loop through values ignoring first value
%(change ind=2:size(s2,2) to ind=1:size(s2,2) to see why)
%starting string x loop
for ind=2:size(s2,2)
m = strsplit(s2{1,ind},z);
result1{ind-1} = m{1,1};
end
%starting string y loop
for ind=2:size(s3,2)
m = strsplit(s3{1,ind},z);
result2{ind-1} = m{1,1};
end
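
Since the question also asks for a more robust method, here is a sketch of an alternative that uses regexp instead of strsplit. It is not part of the answer above. It assumes the page writes the attributes exactly as class="priceSort" and class="milesSort" (double quotes, no extra attributes), and it uses a hard-coded sample string in place of the urlread output so it runs without a network connection. The names priceValues and mileValues are made up for illustration.

```matlab
% Sample HTML standing in for the output of urlread (assumed layout,
% copied from the snippet in the question).
s = ['<h4 class="price">' ...
     '<span class="priceSort">$39,991</span></h4>' ...
     '<div class="mileage">' ...
     '<span class="milesSort">19,570 mi.</span></div>' ...
     '<h4 class="price">' ...
     '<span class="priceSort">$49,999</span></h4>' ...
     '<div class="mileage">' ...
     '<span class="milesSort">9,000 mi.</span></div>'];

% Capture everything between the opening span of a given class and the
% next </span>. The lazy (.*?) stops at the first closing tag it meets.
prices = regexp(s, '<span class="priceSort">(.*?)</span>', 'tokens');
miles  = regexp(s, '<span class="milesSort">(.*?)</span>', 'tokens');

% 'tokens' returns a cell array of 1x1 cells; flatten it to plain
% strings and trim surrounding whitespace.
prices = cellfun(@(c) strtrim(c{1}), prices, 'UniformOutput', false);
miles  = cellfun(@(c) strtrim(c{1}), miles,  'UniformOutput', false);

% Optional: drop every non-digit character and convert to numbers.
% Note this would also drop a decimal point, so it only suits
% whole-number values like the ones in the sample.
priceValues = str2double(regexprep(prices, '[^0-9]', ''));
mileValues  = str2double(regexprep(miles,  '[^0-9]', ''));
```

With the sample string, prices holds '$39,991' and '$49,999', and miles holds '19,570 mi.' and '9,000 mi.'. Adding another class only means changing the class name in the pattern, so no extra loop is needed. If the real page varies its quoting or adds attributes to the span, the pattern would have to be loosened accordingly.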