Problem description
Sorry for duplicating this question, but here I have tried to explain it in more detail. I need to parse the data from a certain file and store it in a database (MySQL). This is how the data is displayed in the file:
戚谊
戚誼
[m1][b]qīyì[/b][/m]
[m2]translation 1[/m]
[m1][b]qīyi[b][/m]
[m2]translation 2[/m]
三州府
[m1][b]sānzhōufǔ[/b][/m]
[m2]translation of other character[/m]
etc.
The first and second lines represent the same character: the first line is the simplified form and the second line is the traditional form. I need to store them in the ch_simplified and ch_trad columns respectively.
The third line, which begins with [m1], is a transcription (pinyin); the fourth line (beginning with [m2]) is a translation of the character. There is also a second translation of the character; note that it has a different transcription.
We need to store both transcriptions (sometimes there are more than two transcriptions for the same character) in a separate column (transcription), and then store the whole translation part in the translation column.
The table in the MySQL database looks like this:
ID | ch_simplified | ch_trad | transcription            | translation
---+---------------+---------+--------------------------+----------------------------------------
1  | 戚谊          | 戚誼    | [m1][b]qīyì[/b][/m];     | [m1][b]qīyì[/b][/m]
   |               |         | [m1][b]qīyi[b][/m]       | [m2]translation 1[/m]
   |               |         |                          | [m1][b]qīyi[b][/m]
   |               |         |                          | [m2]translation 2[/m]
---+---------------+---------+--------------------------+----------------------------------------
2  | 三州府        | 三州府  | [m1][b]sānzhōufǔ[/b][/m] | [m1][b]sānzhōufǔ[/b][/m]
   |               |         |                          | [m2]translation of other character[/m]
The problem is that I don't know how to parse this data using PHP. I tried to start with
$content = file_get_contents('myfile.txt', true);
and got stuck at the step where I have to separate the data of the first character from the data of the second character (戚谊 and 三州府).
Any help would be greatly appreciated!
P.S. Sorry for such a long text and confusing explanation.
Recommended answer
Your data fields are on separate lines, so Phil's explode() call would split on the newline character. The basic data field acquisition is something like this:
$content = file_get_contents('myfile.txt', true);
foreach (explode("\n", $content) as $line)
{
    $line = trim($line); // remove leading and trailing white space (including "\r")
    if ($line === '') {
        continue; // skip empty lines
    }
    switch (substr($line, 0, 4)) // examine the first four bytes
    {
        case '[m1]':
            // the brackets and the slash are escaped in the regular expression
            if (preg_match('/^\[m1\](.+)\[\/m\]$/', $line, $matches)) {
                $field = $matches[1];
                echo "pinyin: '$field'\n";
            }
            break;
        case '[m2]':
            if (preg_match('/^\[m2\](.+)\[\/m\]$/', $line, $matches)) {
                $field = $matches[1];
                echo "translation: '$field'\n";
            }
            break;
        default:
            $field = $line; // for clarity
            echo "character: '$field'\n";
            break;
    }
}
Here, I have not attempted to identify (a) the start of a new record, or (b) which character lines are simplified and which are traditional. These issues can probably be addressed by counting character lines: the first one is simplified, the second is traditional, and the first character line after a run of [m1]/[m2] lines indicates a new record. But that's your job.
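The counting idea can be sketched roughly as follows. This is only one reading of the file format, under stated assumptions: the function name groupEntries and the array keys are made up for illustration, a new record is assumed to begin at the first character line that follows an [m1] or [m2] line, and an entry with a single character line (like 三州府) is assumed to use the same form for both columns. The [m1] lines are collected as transcriptions, and both [m1] and [m2] lines are kept in the translation part, mirroring the table in the question.

```php
<?php
// Hypothetical helper: groups the lines of the file into records
// by tracking whether the previous line was markup or a character line.
function groupEntries(string $content): array
{
    $entries = [];
    $current = null;
    $afterMarkup = true; // true at the start and after any [m1]/[m2] line

    foreach (explode("\n", $content) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip empty lines
        }
        $tag = substr($line, 0, 4);
        if ($tag === '[m1]' || $tag === '[m2]') {
            if ($current === null) {
                continue; // markup before any character line: ignore it
            }
            if ($tag === '[m1]') {
                $current['transcription'][] = $line;
            }
            $current['translation'][] = $line;
            $afterMarkup = true;
            continue;
        }
        if ($afterMarkup) {
            // First character line after markup: a new record starts here
            if ($current !== null) {
                $entries[] = $current;
            }
            $current = [
                'ch_simplified' => $line,
                'ch_trad'       => $line, // assumption: same form unless a second line follows
                'transcription' => [],
                'translation'   => [],
            ];
            $afterMarkup = false;
        } else {
            // Second consecutive character line: the traditional form
            $current['ch_trad'] = $line;
        }
    }
    if ($current !== null) {
        $entries[] = $current; // do not lose the last record
    }
    return $entries;
}
```

Each entry's transcription list could then be joined with implode('; ', ...) and its translation list with implode("\n", ...) before being bound to a prepared INSERT statement.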
Nor have I assessed any issues relating to the non-ASCII character set. I assume you are on top of that stuff.
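On the character set question, here is a minimal sketch of two checks that are often useful before parsing, assuming the file is UTF-8 encoded (the question does not say so). It strips a leading byte-order mark, which would otherwise end up glued to the first character field, and rejects input that is not valid UTF-8. The function name normalizeUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: prepares raw file content for line-by-line parsing.
// Assumption: the input file is meant to be UTF-8.
function normalizeUtf8(string $content): string
{
    // Remove a UTF-8 byte-order mark (bytes EF BB BF) if present
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    // An empty pattern with the /u modifier matches only if the
    // subject is valid UTF-8; otherwise preg_match() returns false
    if (preg_match('//u', $content) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $content;
}
```

On the MySQL side, the usual counterpart is to use the utf8mb4 character set for both the table columns and the connection (with PDO, via charset=utf8mb4 in the DSN).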
I have taken the opportunity to separate the content from presentational markup (like the [b] tags). It's just good practice to keep those semantics separate from the data proper.
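If the goal is to store only the text, without any of the bracketed tags, a small sketch along these lines could be applied to each field. The function name stripMarkup is hypothetical, and the pattern assumes that tags are short lowercase names with an optional number, such as [b], [/b], [m1] and [/m]; a translation that legitimately contains similar bracketed text would need a stricter pattern.

```php
<?php
// Hypothetical helper: removes presentational tags from a single field.
function stripMarkup(string $field): string
{
    // Matches [b], [/b], [m1], [m2], [/m] and similar tags
    $clean = preg_replace('/\[\/?[a-z]+\d*\]/', '', $field);
    // preg_replace() returns null on error; fall back to the input
    return trim($clean ?? $field);
}
```

Because the pattern removes every matching tag regardless of pairing, it also copes with the unbalanced [b] in the second transcription of the sample data.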