I have a school calendar online, but I want to use it in my own application. Unfortunately I can't get it working with PHP and regex.
The problem is that the table cells are not divided evenly (lessons span different numbers of rows), and the layout changes per class. You can find the schedule here and here.
The regex I tried is this:
<td rowspan='(?:[0-9]{1,3})' class='value'>(.+?)<br/>(.+?)<br/>(.+?)<br/><br/><br/></td>
But it does not work correctly!
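A quick check with two cells modelled on the timetable markup suggests why this regex breaks. It insists on exactly three `<br/>`-separated values followed by three more `<br/>` tags, so a cell that only holds `Hk<br/>` never matches. It also skips cells of class `empty`, which loses the information about which column (and so which day) a lesson sits in. The sample strings below are assumptions, not fetched from the real page.

```php
<?php
// The regex from the question, with '#' delimiters added so PHP can use it.
$pattern = "#<td rowspan='(?:[0-9]{1,3})' class='value'>(.+?)<br/>(.+?)<br/>(.+?)<br/><br/><br/></td>#";

// Assumed sample cells, modelled on the timetable's markup.
$fullCell  = "<td rowspan='5' class='value'>3D<br/>009<br/>Hk<br/><br/><br/></td>";
$shortCell = "<td rowspan='3' class='value'>Hk<br/></td>";

// The three-value cell matches: groups 1-3 are "3D", "009" and "Hk".
var_dump(preg_match($pattern, $fullCell, $m)); // int(1)

// The single-value cell has no second <br/>-terminated value, so it fails.
var_dump(preg_match($pattern, $shortCell));    // int(0)
```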
The end array must look something like this:
[0] => Array
(
[0] => maandag //the day
[1] => 1 //lesson period
[2] => MEN, 16, dm //content of the cell
)
I hope this question is clear enough; English is not my first language ;)
Solution
Good luck with this one, it's going to be tricky... simply 'using an HTML parser' isn't going to avoid the major problem, which is the nature of a table that uses rowspans. While it is always good advice to use an HTML parser for parsing large amounts of HTML, if you can break that HTML down into smaller, reliable chunks, then parsing those chunks with other techniques is usually simpler and faster (but obviously more prone to breaking on subtle, unexpected differences in the HTML).
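To illustrate why a parser alone doesn't solve it: even with PHP's DOM extension, the rowspan is still there to deal with. In the tiny table below (an assumed sample, not the real page), the second row has fewer `<td>` elements because one cell from the first row spans into it, so a cell's index in its row no longer tells you its column. `cellCountsPerRow` is a hypothetical helper name.

```php
<?php
// Count <td> elements per <tr> using PHP's DOM extension (ext-dom).
function cellCountsPerRow(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings instead of printing them (scraped HTML is rarely clean).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $counts = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $counts[] = $tr->getElementsByTagName('td')->length;
    }
    return $counts;
}

// Assumed sample: "MEN" occupies periods 1 and 2 in the middle column.
$sample = "<table>"
    . "<tr><td>1</td><td rowspan='2'>MEN</td><td>A</td></tr>"
    . "<tr><td>2</td><td>B</td></tr>"
    . "</table>";

echo implode(',', cellCountsPerRow($sample)), "\n"; // prints 3,2
```

The parser gives you clean cell contents, but "B" is still at index 1 of its row even though it belongs in the third column, which is exactly the normalisation problem described below.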
Normalise the table
If it were me, I'd start with something that can detect where your table starts and ends (I wouldn't want to parse the entire page if I don't need to, even when using an HTML parser):
$table = $start = $end = false;
/// 'Vrijdag' should be unique enough, but will fail if it appears elsewhere
$pos = strpos($html, 'Vrijdag');
/// find your start and end based on reliable tags
if ( $pos !== false ) {
$start = stripos($html, '<tr>', $pos);
if ( $start !== false ) {
$end = stripos($html, '</table>', $start);
}
}
if ( $start !== false && $end !== false ) {
/// we can now grab our table $html;
$table = substr($html, $start, $end - $start);
}
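The same start/end search can be wrapped in a small function and tried on an inline sample before pointing it at the live page. `extractTable` and the sample HTML are hypothetical. Two consequences of the approach are worth knowing: if 'Vrijdag' sits in the header row (as assumed here), the search starts at the *next* `<tr>`, so the header row is excluded; and a row written as `<tr class='...'>` would not be found by the literal `'<tr>'` search.

```php
<?php
// Hypothetical helper wrapping the answer's start/end search.
// Returns the HTML between the first <tr> after $marker and </table>, or false.
function extractTable(string $html, string $marker = 'Vrijdag')
{
    $pos = strpos($html, $marker);
    if ($pos === false) {
        return false;
    }
    // First literal "<tr>" after the marker (case-insensitive).
    $start = stripos($html, '<tr>', $pos);
    if ($start === false) {
        return false;
    }
    $end = stripos($html, '</table>', $start);
    if ($end === false) {
        return false;
    }
    return substr($html, $start, $end - $start);
}

// Assumed page snippet; the real timetable markup may differ.
$page = "<p>intro</p><table>"
    . "<tr><th>Maandag</th><th>Vrijdag</th></tr>"
    . "<tr><td class='heading'>1</td></tr>"
    . "</table><p>footer</p>";

echo extractTable($page), "\n";
// <tr><td class='heading'>1</td></tr>
```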
Then, because the cells are spanned vertically in a haphazard way (but seem to be uniform horizontally), I would break the table into rows and cells, then choose a 'day' column and work downwards.
if ( $table ) {
/// break apart based on rows
$rows = preg_split('#</tr>#i', $table);
///
foreach ( $rows as $key => $row ) {
$rows[$key] = preg_split('#</td>#i', $row);
}
}
The above should give you something like:
array (
'0' => array (
'0' => "<td class='heading'>1",
'1' => "<td rowspan='1' class='empty'>",
'2' => "<td rowspan='5' class='value'>3D<br/>009<br/>Hk<br/><br/><br/>",
...
),
'1' => array (
'0' => "<td class='heading'>2",
'1' => "<td rowspan='2' class='empty'>",
'2' => "<td rowspan='3' class='value'>Hk<br/>",
...
),
)
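Running the same two `preg_split` calls on a tiny two-row fragment (an assumed sample) shows two details the array above glosses over: the first cell of each row still carries the leading `<tr>`, and every split leaves a trailing empty string, including one extra empty "row" after the last `</tr>`. Both are harmless: the final clean-up step (`strip_tags` plus `trim`) removes them, and the extra row means `$rows[$rkey+1]` exists for the last real row. `splitRowsAndCells` is a hypothetical name.

```php
<?php
// Hypothetical helper mirroring the two preg_split calls above.
function splitRowsAndCells(string $table): array
{
    // A closing </tr> marks the end of each row...
    $rows = preg_split('#</tr>#i', $table);
    // ...and a closing </td> marks the end of each cell.
    foreach ($rows as $key => $row) {
        $rows[$key] = preg_split('#</td>#i', $row);
    }
    return $rows;
}

// Assumed two-row fragment, as it would come out of the start/end search.
$table = "<tr><td class='heading'>1</td><td rowspan='2' class='value'>MEN</td></tr>"
       . "<tr><td class='heading'>2</td></tr>";

var_export(splitRowsAndCells($table));
echo "\n";
```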
Now that you have that, you can scan across each row, and wherever preg_match finds a rowspan greater than 1, copy that cell's information into the row below (at the same column position, with the rowspan reduced by one), so as to build a complete table structure without rowspans.
/// can't use foreach here because we want to modify the array within the loop
$lof = count($rows);
for ( $rkey=0; $rkey<$lof; $rkey++ ) {
/// pull out the row
$row = $rows[$rkey];
foreach ( $row as $ckey => $cell ) {
if ( preg_match('/ rowspan=.([0-9]+)./', $cell, $regs) ) {
$rowspan = (int) $regs[1];
if ( $rowspan > 1 ) {
/// there was a gotcha here: I realised afterwards I was constructing
/// a replacement pattern that looked like '$14$2', which meant
/// PHP tried to find capture group 14. To get around this
/// problem, PHP allows the group reference numbers to be wrapped with {},
/// so we now get the values of '${1}' and '${2}' inserted around a literal number
$newcell = preg_replace('/( rowspan=.)[0-9]+(.)/', '${1}'.($rowspan-1).'${2}', $cell);
/// a length of 0 inserts $newcell into the row below without removing anything
array_splice( $rows[$rkey+1], $ckey, 0, $newcell );
}
}
}
}
The above should normalise the table so that the rowspans are no longer a problem.
(Please note the above is theoretical code: I typed it by hand and have yet to test it, which I will do shortly.)
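To try the push-down idea in isolation, here is a sketch of the same loop as a self-contained function, run on a tiny in-memory table (an assumed sample, not the real page). `normaliseRowspans` is a hypothetical name. It uses `array_splice` with a length of 0 so the copied cell is inserted rather than replacing anything, the braced `${1}`/`${3}` references from the gotcha comment above, and an `isset` guard so a rowspan on the very last row cannot touch a missing row.

```php
<?php
// Hypothetical helper: copy each cell whose rowspan is > 1 into the row
// below (same column index, rowspan reduced by one), so every row ends up
// with one entry per visual column.
function normaliseRowspans(array $rows): array
{
    $count = count($rows);
    for ($r = 0; $r < $count; $r++) {
        // foreach reads $rows[$r] as it is *now*, which already includes
        // any cells pushed down from the row above.
        foreach ($rows[$r] as $c => $cell) {
            if (!preg_match('/ rowspan=.([0-9]+)./', $cell, $m)) {
                continue;
            }
            $span = (int) $m[1];
            if ($span > 1 && isset($rows[$r + 1])) {
                // ${1} and ${3} keep the group numbers apart from the new digit.
                $copy = preg_replace('/( rowspan=.)([0-9]+)(.)/', '${1}' . ($span - 1) . '${3}', $cell);
                // Length 0: insert at column $c without removing anything.
                array_splice($rows[$r + 1], $c, 0, [$copy]);
            }
        }
    }
    return $rows;
}

// Assumed sample: the MEN lesson spans three periods.
$rows = [
    ["<td class='heading'>1", "<td rowspan='3' class='value'>MEN", "<td class='value'>A"],
    ["<td class='heading'>2", "<td class='value'>B"],
    ["<td class='heading'>3", "<td class='value'>C"],
];

$normalised = normaliseRowspans($rows);
echo implode(' | ', $normalised[2]), "\n";
// <td class='heading'>3 | <td rowspan='1' class='value'>MEN | <td class='value'>C
```

Processing rows top to bottom is what makes a rowspan of 3 work: the copy placed in row 2 still says `rowspan='2'`, so it is pushed down once more when row 2 is scanned.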
After testing
There were a few small bugs in the theoretical code above, which I have since fixed, mostly arguments to certain PHP functions given in the wrong order... After sorting those out, it seems to work:
/// grab the html
$html = file_get_contents('http://www.cibap.nl/beheer/modules/roosters/create_rooster.php?element=CR13A&soort=klas&week=37&jaar=2012');
/// start with nothing
$table = $start = $end = false;
/// 'Vrijdag' should be unique enough, but will fail if it appears elsewhere
$pos = strpos($html, 'Vrijdag');
/// find your start and end based on reliable tags
if ( $pos !== false ) {
$start = stripos($html, '<tr>', $pos);
if ( $start !== false ) {
$end = stripos($html, '</table>', $start);
}
}
/// make sure we have a start and end
if ( $start !== false && $end !== false ) {
/// we can now grab our table $html;
$table = substr($html, $start, $end - $start);
/// convert brs to something that won't be removed by strip_tags
$table = preg_replace('#<br ?/>#i', "\n", $table);
}
if ( $table ) {
/// break apart based on rows (a close tr is quite reliable to find)
$rows = preg_split('#</tr>#i', $table);
/// break apart the cells (a close td is quite reliable to find)
foreach ( $rows as $key => $row ) {
$rows[$key] = preg_split('#</td>#i', $row);
}
}
else {
/// create so we avoid errors
$rows = array();
}
/// changed this here from a foreach to a for because it seems
/// foreach was working from a copy of $rows and so any modifications
/// we made to $rows while the loop was happening were ignored.
$lof = count($rows);
for ( $rkey=0; $rkey<$lof; $rkey++ ) {
/// pull out the row
$row = $rows[$rkey];
/// step each cell in the row
foreach ( $row as $ckey => $cell ) {
/// pull out our rowspan value
if ( preg_match('/ rowspan=.([0-9]+)./', $cell, $regs) ) {
/// if rowspan is greater than one (i.e. spread across multiple rows)
$rowspan = (int) $regs[1];
if ( $rowspan > 1 ) {
/// then copy this cell into the next row down, but decrease its rowspan
/// so that when we find it in the next row we know how many more times
/// it should span down.
$newcell = preg_replace('/( rowspan=.)([0-9]+)(.)/', '${1}'.($rowspan-1).'${3}', $cell);
array_splice( $rows[$rkey+1], $ckey, 0, $newcell );
}
}
}
}
/// now finally step the normalised table, get rid of the unwanted tags
/// that remain and, at the same time, split our values into something more useful
foreach ( $rows as $rkey => $row ) {
foreach ( $row as $ckey => $cell ) {
$rows[$rkey][$ckey] = preg_split('/\n+/',trim(strip_tags( $cell )));
}
}
echo '<xmp>';
print_r($rows);
echo '</xmp>';
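To get from the cleaned-up `$rows` to the shape asked for in the question (`[day, period, content]`), something along these lines could work. It is a sketch under a hypothetical layout: column 0 holds the period number and each following column is one day, in the order given in `$days`. The real timetable also has `empty` cells, so the column-to-day mapping will likely need adjusting after inspecting the `print_r` output. `toLessons` and the sample data are illustrative only.

```php
<?php
// Hypothetical helper. ASSUMED layout: column 0 = lesson period,
// column $i + 1 = $days[$i]. Check this against the real print_r output.
function toLessons(array $rows, array $days): array
{
    $lessons = [];
    foreach ($rows as $row) {
        // Each cell is already an array of text lines (see the last step above).
        $period = $row[0][0] ?? '';
        if ($period === '') {
            continue; // e.g. the empty fragment left after the final </tr>
        }
        foreach ($days as $i => $day) {
            $lines = $row[$i + 1] ?? [];
            $nonEmpty = array_filter($lines, function ($line) {
                return $line !== '';
            });
            $content = implode(', ', $nonEmpty);
            if ($content !== '') {
                $lessons[] = [$day, $period, $content];
            }
        }
    }
    return $lessons;
}

// Illustrative rows in the cleaned-up format; not real timetable data.
$rows = [
    [['1'], ['MEN', '16', 'dm'], ['']],
    [['2'], ['MEN', '16', 'dm'], ['Hk']],
    [['']],
];
$days = ['maandag', 'dinsdag'];

$lessons = toLessons($rows, $days);
print_r($lessons);
```

Because every rowspanned lesson has been copied into each period it covers, a two-hour lesson shows up once per period, which matches the one-entry-per-period shape in the question.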