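I am trying to insert text extracted from a web page, but it is not being inserted into the database. I am using an XPath expression to extract the data, and the data on the web page sits inside multiple HTML paragraph or list item tags.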

Here is the code:

<?php
set_time_limit(0);
$dbhost = "localhost";
$dbuser = "root";
$dbpass = "";
$dbname = "olx";
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ("Error connecting to database");
mysql_select_db($dbname, $conn);
$res1 = mysql_query("SELECT * FROM `item_url` WHERE id=10");
while($r1 = mysql_fetch_array($res1))
{
    $url = $r1['url'];

    $html = file_get_contents($url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $details = $xpath->evaluate("//div[@id='description-text']/child::div");
    foreach ($details as $detail) {
        $nodes = $detail->childNodes;
        foreach ($nodes as $node) {
        $string = $node->nodeValue;
        $string = preg_replace('/[^a-zA-Z0-9@.\-]/', ' ', $string); //allow required character
        $string = strip_tags($string); //remove html tags
        echo $string . '<br>';
        }
    }

  mysql_query("INSERT INTO `test` (`detail`) VALUES ('$string')") or die(mysql_error());
 }
 ?>


It displays the data like this:

Performs skilled technical work in the maintenance, repair, replacement, and installation of air conditioning systems.

Installs, troubleshoots and repairs air conditioning units.

Replaces expansion valves, compressors, motors, coil units and other component parts.

Technicians work in residential homes, schools, hospitals, office buildings, or factories.


This data cannot be inserted into the DB. It is a problem with the XPath nodes: each line is inside its own tag on the web page.

Here is the HTML of the web page:

<div id="description-text">
    <h2 class="title-desc">
    <span>Ad details</span>
    </h2>
    <ul class="item-optionals">
    <li style="background-color: rgb(251, 251, 251);">
    </ul>
  <div style="padding-right: 30px; width: 388px;">
      <p> Performs skilled technical work in the maintenance, repair, replacement, and installation of air conditioning systems.</p>
      <p>Installs, troubleshoots and repairs air conditioning units.</p>

      <p>Replaces expansion valves, compressors, motors, coil units and other component parts.</p>

      <p>Technicians work in residential homes, schools, hospitals, office buildings, or factories.</p>

   </div>
</div>

Best answer

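Your code is fine. The only problem is that mysql_query is called after the loops that process the individual nodes of the HTML page have finished, so at most the value from the last node processed (which may be nothing but whitespace) reaches the database. To fix it, it is enough to call mysql_query inside the innermost foreach loop.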

<?php
set_time_limit(0);
$dbhost = "localhost";
$dbuser = "root";
$dbpass = "";
$dbname = "olx";
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ("Error connecting to database");
mysql_select_db($dbname, $conn);
$res1 = mysql_query("SELECT * FROM `item_url` WHERE id=10");
while($r1 = mysql_fetch_array($res1))
{
    $url = $r1['url'];

    $html = file_get_contents($url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $details = $xpath->evaluate("//div[@id='description-text']/child::div");
    foreach ($details as $detail) {
        $nodes = $detail->childNodes;
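        foreach ($nodes as $node) {
            $string = $node->nodeValue;
            $string = preg_replace('/[^a-zA-Z0-9@.\-]/', ' ', $string); // keep only the allowed characters
            $string = trim(strip_tags($string)); // remove HTML tags and surrounding whitespace
            if ($string === '') {
                continue; // skip whitespace-only text nodes between the <p> elements
            }
            echo $string . '<br>';
            // Insert inside the loop so every node gets its own row.
            // Table and column names take backticks, not single quotes.
            mysql_query("INSERT INTO `test` (`detail`) VALUES ('$string')") or die(mysql_error());
        }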
    }


}
?>
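
The mysql_* functions used above were removed in PHP 7, so on a current PHP version the same idea could be sketched with PDO and a prepared statement. This is only a minimal sketch, not the original answer's method: the helper names extractDetails and insertDetails are hypothetical, the markup is a trimmed copy of the HTML from the question, and an in-memory SQLite database stands in for the MySQL `olx` database so that the example runs on its own. It selects the <p> elements directly with XPath instead of walking childNodes, which avoids the whitespace-only text nodes.

```php
<?php
// Sample markup copied from the question; a real run would fetch it with file_get_contents($url).
$html = <<<'HTML'
<div id="description-text">
  <h2 class="title-desc"><span>Ad details</span></h2>
  <div style="padding-right: 30px; width: 388px;">
    <p> Performs skilled technical work in the maintenance, repair, replacement, and installation of air conditioning systems.</p>
    <p>Installs, troubleshoots and repairs air conditioning units.</p>
    <p>Replaces expansion valves, compressors, motors, coil units and other component parts.</p>
    <p>Technicians work in residential homes, schools, hospitals, office buildings, or factories.</p>
  </div>
</div>
HTML;

/**
 * Hypothetical helper: returns the trimmed text of each <p> in the description block.
 */
function extractDetails(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $details = [];
    // Select the <p> elements directly, so whitespace-only text nodes are never visited.
    foreach ($xpath->query("//div[@id='description-text']/div/p") as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $details[] = $text;
        }
    }
    return $details;
}

/**
 * Hypothetical helper: inserts one row per detail using a prepared statement.
 */
function insertDetails(PDO $pdo, array $details): int
{
    // The placeholder lets the driver handle quoting, so the text needs no manual escaping.
    $stmt = $pdo->prepare('INSERT INTO test (detail) VALUES (:detail)');
    foreach ($details as $detail) {
        $stmt->execute([':detail' => $detail]);
    }
    return count($details);
}

// Assumption: in-memory SQLite keeps the sketch self-contained. For the database in the
// question the DSN would look like 'mysql:host=localhost;dbname=olx' plus user and password.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE test (id INTEGER PRIMARY KEY, detail TEXT)');

$details = extractDetails($html);
$inserted = insertDetails($pdo, $details);
echo "Inserted $inserted rows\n";
```

Because the statement is prepared once and executed once per paragraph, each line of the description ends up in its own row, and punctuation such as commas and apostrophes can be stored as-is instead of being replaced with spaces.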

Regarding "php - Data/text extracted from a web page is not inserted into MySQL DB", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23988863/
