I have a database with two tables. "speechesLCMcoded" holds 400,000 rows of coded text, and "concreteness" holds 80,000 words with scores.

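I wrote a script that reads the table of parsed text (speechesLCMcoded), strips the tags from each word, looks the word up in the other table (concreteness), and adds up the resulting scores.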

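I am a beginner in PHP and my code is not optimized at all. I don't mind if the script runs for a whole day, but I can't have it running for a week. How would you suggest I optimize it?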

My script does everything I need. It is just too slow.

<?php
//Include functions
        include "functions.php";
        ini_set('max_execution_time', 900000);
        echo 'Time Limit = ' . ini_get('max_execution_time');

//Connecting to the database
        if (!$conn) {
         die('Not connected : ' . mysql_error());}

// make LCM the current db
        mysql_select_db('senate');
        $data = mysql_query("SELECT `key`, `tagged` FROM speechesLCMcoded") or die(mysql_error());

// puts the "data" info into the $info array
        while($info = mysql_fetch_array( $data) ){
        $key=$info['key'];
        $tagged=$info['tagged'];
        unset($weight);
        unset($count);
        $weight=0;
        $count=0;

// Print out the contents of the entry
        Print "<b>Key:</b> ".$info['key'] .  " <br>";

// Explodes the sentence
        $speech = explode(" ", $tagged);

// Loop every word
        foreach($speech as $word) {

//Print each word
        //Print "<b>Key:</b> ".$word .  " <br>";

//Check if string contains our tag

if(!preg_match('/({V}|{J}|{N}|{RB})/', $word, $matches)) {} else{

//Removes our tags
        $word = str_replace("{V}", "", $word);
        $word = str_replace("{RB}", "", $word);
        $word = str_replace("{J}", "", $word);
        $word = str_replace("{N}", "", $word);
        $word = str_replace("{/V}", "", $word);
        $word = str_replace("{/RB}", "", $word);
        $word = str_replace("{/J}", "", $word);
        $word = str_replace("{/N}", "", $word);

        //print $word .  " <br>";

        //Check for the score
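        // Escape the word so an apostrophe (e.g. "don't") cannot break the query
        $checksql = "SELECT word, score FROM concreteness WHERE word = '" . mysql_real_escape_string($word) . "'";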
        $query = mysql_query("$checksql");
        $check_count = mysql_num_rows($query);
            if($check_count > 0 ){
            $data2 = mysql_fetch_assoc($query);
            $weight=$weight+$data2['score'];
            $count=$count +1;
        //  echo $weight;
        //  print "<br>";
        //  echo $count;
        //  print "<br>";
            } else {
        //  echo"The word was NOT found.<br>";
 }   }
        }

        $sql = "UPDATE speechesLCMcoded SET weight='$weight', count='$count' WHERE `key`='$key';" ;
        $retval = mysql_query( $sql, $conn );
        if(! $retval )
        {die('Could not update data: ' . mysql_error());}
        echo "Updated data successfully\n";

}?>

Best answer

For each row from speechesLCMcoded (400K rows), you run str_replace and an SQL query for every tagged word.

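You can move the tag removal into the first SQL query by using the REPLACE function (http://dev.mysql.com/doc/refman/5.0/en/replace.html). Then you no longer need to run str_replace 8 times for every tagged word.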
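A minimal sketch of what that first query could look like, assuming MySQL's REPLACE() string function and PHP 7 or later. The helper name buildStripTagsExpr and the alias "cleaned" are made up for illustration; the tags are hard-coded, so nothing user-supplied ends up in the SQL. One caveat: the original loop only scores words that carry a tag, so stripping every tag in SQL also removes the marker that filter relies on. This shows the REPLACE part only.

```php
<?php
// Hypothetical helper: wraps a column in one REPLACE() call per tag.
function buildStripTagsExpr(string $column, array $tags): string
{
    $expr = $column;
    foreach ($tags as $tag) {
        // Each pass nests the previous expression inside another REPLACE().
        $expr = "REPLACE($expr, '$tag', '')";
    }
    return $expr;
}

// The eight tags removed by the original script.
$tags = ['{V}', '{/V}', '{RB}', '{/RB}', '{J}', '{/J}', '{N}', '{/N}'];

// "cleaned" is an assumed alias, not a column from the original schema.
$sql = 'SELECT `key`, ' . buildStripTagsExpr('`tagged`', $tags)
     . ' AS cleaned FROM speechesLCMcoded';

echo $sql, "\n";
```

The generated string can be passed to the same query call the script already uses, and the PHP loop then reads the "cleaned" column instead of "tagged".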

That is the first step.

As a second step, you can use a single query with a JOIN to fetch all the data from both tables.
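A JOIN needs one word per row, which the "tagged" column does not provide directly. A related way to get the same benefit (reading "concreteness" once instead of querying it for every word) is to load its 80K rows into a PHP array before the main loop and look words up in memory. The sketch below shows that swapped-in technique, not the JOIN itself. It assumes PHP 7.1 or later; scoreSpeech and the sample scores are made up for illustration.

```php
<?php
// Hypothetical helper: score one tagged speech against a preloaded
// word => score map and return [weight, count].
function scoreSpeech(string $tagged, array $scores): array
{
    $tags = ['{V}', '{/V}', '{RB}', '{/RB}', '{J}', '{/J}', '{N}', '{/N}'];
    $weight = 0.0;
    $count = 0;
    foreach (explode(' ', $tagged) as $token) {
        // Same filter as the original script: skip tokens without an opening tag.
        if (!preg_match('/\{(V|J|N|RB)\}/', $token)) {
            continue;
        }
        // str_replace accepts an array of needles, so one call strips all eight tags.
        $word = str_replace($tags, '', $token);
        if (isset($scores[$word])) {
            $weight += $scores[$word];
            $count++;
        }
    }
    return [$weight, $count];
}

// Made-up sample data. In the real script the map would be filled once from
// "SELECT word, score FROM concreteness" before looping over the speeches.
$scores = ['dog' => 4.5, 'run' => 3.5];
[$weight, $count] = scoreSpeech('the {N}dog{/N} can {V}run{/V} {RB}fast{/RB}', $scores);
echo "weight=$weight count=$count\n";
```

PHP array keys are case-sensitive, while MySQL string comparison is case-insensitive under its default collations, so lower-casing both the keys and the looked-up words may be needed to match what the original query found.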

10-04 10:56