The Unicode Consortium provides a text file containing the classification and names of emoji.
The latest version is available here:
http://unicode.org/Public/emoji/5.0/emoji-test.txt
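Each emoji belongs to one of 8 broad groups, and each group is then divided into a number of subgroups. For example, the subgroups of the Animals & Nature group are shown below: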
# group: Smileys & People
# group: Animals & Nature
# subgroup: animal-mammal
# subgroup: animal-bird
# subgroup: animal-amphibian
# subgroup: animal-reptile
# subgroup: animal-marine
# subgroup: animal-bug
# subgroup: plant-flower
# subgroup: plant-other
# group: Food & Drink
# group: Travel & Places
# group: Activities
# group: Objects
# group: Symbols
# group: Flags
The emoji in each subgroup are then listed under that subgroup. For example, the animal-bird subgroup lists these emoji:
1F983 ; fully-qualified # 🦃 turkey
1F414 ; fully-qualified # 🐔 chicken
1F413 ; fully-qualified # 🐓 rooster
1F423 ; fully-qualified # 🐣 hatching chick
1F424 ; fully-qualified # 🐤 baby chick
1F425 ; fully-qualified # 🐥 front-facing baby chick
1F426 ; fully-qualified # 🐦 bird
1F427 ; fully-qualified # 🐧 penguin
1F54A FE0F ; fully-qualified # 🕊️ dove
1F54A ; non-fully-qualified # 🕊 dove
1F985 ; fully-qualified # 🦅 eagle
1F986 ; fully-qualified # 🦆 duck
1F989 ; fully-qualified # 🦉 owl
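So each emoji has the following properties. Taking the turkey emoji as an example:
Group: Animals & Nature
Subgroup: animal-bird
Name: 1F983
Status: fully-qualified
Emoji: 🦃
Description: turkey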
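The mapping from one data line to these properties can be sketched with a single regular expression. This is only an illustration, assuming the 5.0 line layout shown above (`<code points> ; <status> # <emoji> <description>`); the function name `parseEmojiLine` is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: split one data line of emoji-test.txt (5.0 layout
// assumed) into its fields. Returns null for blank lines, comment lines and
// anything else that does not look like a data line.
function parseEmojiLine(string $line): ?array
{
    // <hex code points> ; <status> # <emoji> <description>
    $pattern = '/^([0-9A-F ]+?)\s*;\s*([a-z-]+)\s*#\s*(\S+)\s+(.+)$/u';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return [
        'name'        => $m[1], // e.g. "1F983" or "1F54A FE0F"
        'status'      => $m[2], // e.g. "fully-qualified"
        'emoji'       => $m[3], // the emoji itself
        'description' => $m[4], // e.g. "turkey"
    ];
}
```

The group and subgroup are not on the data line itself; they come from the most recent `# group:` and `# subgroup:` headers above it.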
I have a MySQL table that I want to store the emoji details in:
CREATE TABLE `xx_emoji` (
`fld_id` int(11) NOT NULL AUTO_INCREMENT,
`fld_group` varchar(255) DEFAULT NULL,
`fld_cat` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`fld_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`fld_status` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`fld_emoji` varbinary(255) DEFAULT NULL,
`fld_description` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`fld_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
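I could go through the text file manually, one emoji at a time, and save the details into the MySQL table to get the data into the table in this form.
However, I'd like to know whether PHP can be used to parse the text file instead.
I imagine it would need a series of nested loops: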
foreach group {
foreach subgroup {
loop through emoji list and save into MySQL table...
group
subgroup
name
status
emoji
description
end loop
}
}
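As a rough illustration of the outline above: because every `# group:` and `# subgroup:` header appears before the entries it covers, nested loops are not strictly needed. A single pass that remembers the current group and subgroup is enough. The sketch below assumes the 5.0 file layout, and the function name `parseEmojiTest` is hypothetical.

```php
<?php
// Sketch: one pass over the lines of emoji-test.txt (5.0 layout assumed),
// remembering the most recent group and subgroup headers.
function parseEmojiTest(string $text): array
{
    $rows = [];
    $group = null;
    $subgroup = null;
    // \R matches any line ending, so "\n" and "\r\n" files both work
    foreach (preg_split('/\R/', $text) as $line) {
        if (strpos($line, '# group:') === 0) {
            $group = trim(substr($line, strlen('# group:')));
        } elseif (strpos($line, '# subgroup:') === 0) {
            $subgroup = trim(substr($line, strlen('# subgroup:')));
        } elseif (preg_match('/^([0-9A-F ]+?)\s*;\s*([a-z-]+)\s*#\s*(\S+)\s+(.+)$/u', $line, $m)) {
            // data line: <code points> ; <status> # <emoji> <description>
            $rows[] = [
                'group'       => $group,
                'subgroup'    => $subgroup,
                'name'        => $m[1],
                'status'      => $m[2],
                'emoji'       => $m[3],
                'description' => trim($m[4]),
            ];
        }
        // every other line (blank lines, other comments) is ignored
    }
    return $rows;
}
```

It could be called as `parseEmojiTest(file_get_contents('emoji-test.txt'))`, giving a flat list with one array per emoji, which maps directly onto one table row each.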
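I know this is only a very basic outline, and I apologise for asking such a broad question.
I looked on the Unicode website to see whether the emoji data is available in a more useful format, such as XML or JSON, but I couldn't find anything; all I can see is what is in the directory for the current emoji version: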
https://unicode.org/Public/emoji/5.0/
Best answer
It's not very pretty, and it may break if they change the format, but here you go; at least it points you in the right direction :p
<?php
if (!file_exists('emoji-test.txt')) {
    file_put_contents('emoji-test.txt', file_get_contents('http://unicode.org/Public/emoji/5.0/emoji-test.txt'));
}
// normalise line endings, then break into blocks separated by blank lines
// (PHP_EOL is "\r\n" on Windows and would not match a file that uses "\n")
$text = str_replace("\r\n", "\n", file_get_contents('emoji-test.txt'));
$blocks = explode("\n\n", $text);
// unset header
unset($blocks[0]);
$emoji = [];
$group = '';
foreach ($blocks as $chunk) {
    $lines = explode("\n", $chunk);
    $top = $lines[0];
    if (substr($top, 0, strlen('# group:')) == '# group:') {
        $group = trim(str_replace('# group:', '', $top));
    } elseif (substr($top, 0, strlen('# subgroup:')) == '# subgroup:') {
        $subgroup = trim(str_replace('# subgroup:', '', $top));
        unset($lines[0]);
        foreach ($lines as $line) {
            // skip blank lines and comment lines, which carry no emoji data
            if (trim($line) === '' || $line[0] === '#') {
                continue;
            }
            // limit each split to 2 parts so that a '#' inside the emoji
            // itself (the "keycap: #" entry) does not break the parsing
            $linegroup = explode(';', $line, 2);
            $parts = explode('#', $linegroup[1], 2);
            $icon = explode(' ', trim($parts[1]), 2);
            $emoji[$group][$subgroup][] = [
                'group' => $group,
                'subgroup' => $subgroup,
                'name' => trim($linegroup[0]),
                'status' => trim($parts[0]),
                'emoji' => trim($icon[0]),
                'description' => trim($icon[1]),
            ];
        }
    }
}
print_r($emoji);
The output looks like this: grouped by group, then nested by subgroup, so it can easily be looped over and inserted into the database.
Array
(
[Smileys & People] => Array
(
[face-positive] => Array
(
[0] => Array
(
[group] => Smileys & People
[subgroup] => face-positive
[name] => 1F600
[status] => fully-qualified
[emoji] => 😀
[description] => grinning face
)
[1] => Array
(
[group] => Smileys & People
[subgroup] => face-positive
[name] => 1F601
[status] => fully-qualified
[emoji] => 😁
[description] => beaming face with smiling eyes
)
[2] => Array
(
[group] => Smileys & People
[subgroup] => face-positive
[name] => 1F602
[status] => fully-qualified
[emoji] => 😂
[description] => face with tears of joy
)
...snip
Hope that helps.
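The "loop over and insert" step could look roughly like the sketch below. It is a minimal illustration rather than a definitive implementation: the function names `flattenEmoji` and `insertEmoji` are hypothetical, `$pdo` is assumed to be an already open PDO connection to the database holding `xx_emoji`, and storing the subgroup in `fld_cat` is a guess at what that column is meant for.

```php
<?php
// Flatten the nested $emoji[group][subgroup][] array into a flat list of rows.
function flattenEmoji(array $emoji): array
{
    $rows = [];
    foreach ($emoji as $subgroups) {
        foreach ($subgroups as $entries) {
            foreach ($entries as $entry) {
                $rows[] = $entry;
            }
        }
    }
    return $rows;
}

// Insert the rows into xx_emoji using one prepared statement, reused per row.
// Assumption: fld_cat is where the subgroup belongs.
function insertEmoji(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO xx_emoji
            (fld_group, fld_cat, fld_name, fld_status, fld_emoji, fld_description)
         VALUES
            (:group, :subgroup, :name, :status, :emoji, :description)'
    );
    foreach ($rows as $row) {
        $stmt->execute([
            ':group'       => $row['group'],
            ':subgroup'    => $row['subgroup'],
            ':name'        => $row['name'],
            ':status'      => $row['status'],
            ':emoji'       => $row['emoji'],
            ':description' => $row['description'],
        ]);
    }
    return count($rows);
}
```

With a connection opened along the lines of `new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', $user, $pass)` (host, database name and credentials are placeholders), the call would be `insertEmoji($pdo, flattenEmoji($emoji))`. Requesting `utf8mb4` on the connection matters because most emoji need four bytes in UTF-8.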
Regarding php - parsing the Unicode emoji text file with PHP, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47429694/