php - How do I import a "large" amount of data into MySQL using PHP and foreign keys?

I have these tables:

create table person (
    person_id int unsigned auto_increment,
    person_key varchar(40) not null,
    primary key (person_id),
    constraint uc_person_key unique (person_key)
)
-- person_key is a varchar(40) that identifies an individual, unique
-- person in the initial data that is imported from a CSV file to this table

create table marathon (
    marathon_id int unsigned auto_increment,
    marathon_name varchar(60) not null,
    primary key (marathon_id)
)

create table person_marathon (
    person_marathon_id int unsigned auto_increment,

    person_id int unsigned,
    marathon_id int unsigned,

    primary key (person_marathon_id),
    foreign key (person_id) references person (person_id),
    foreign key (marathon_id) references marathon (marathon_id),

    constraint uc_marathon_person unique (person_id, marathon_id)
)

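The person table is populated from a CSV containing about 130,000 rows. The CSV contains a unique varchar(40) for each person plus some other person data. There are no IDs in the CSV.
For each marathon, I get a CSV list containing anywhere from 1,000 to 30,000 people. The CSV contains essentially just a list of person_key values showing which people took part in that particular marathon.
What is the best way to import the data into the person_marathon table so that the FK relationships are maintained?
These are the ideas I can think of so far:
Pull the person_id + person_key information out of MySQL and merge it with the person_marathon data in PHP, to get the person_id in there before inserting into the person_marathon table
Use a temporary table for the insert... but this is for work, and I have been asked never to use temporary tables in this particular database
Not use person_id at all and use only the person_key field, but then I would have to join on a varchar(40), which is usually not a good thing
Or make the insert look something like this: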

insert  into person_marathon (person_id, marathon_id)
select  p.person_id, m.marathon_id
from    ( select 'person_a' as p_key, 'marathon_a' as m_name union
          select 'person_b' as p_key, 'marathon_a' as m_name )
          as imported_marathon_person_list
        join person p
           on p.person_key = imported_marathon_person_list.p_key
        join marathon m
           on m.marathon_name = imported_marathon_person_list.m_name

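The problem with this insert is that, to build it in PHP, imported_marathon_person_list would be huge, because it could easily contain 30,000 select union items. I'm not sure how else to do it, though.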

Best answer

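I have dealt with similar data conversion problems, though at a smaller scale. If I understand your problem correctly (which I'm not sure of), it sounds like the detail that makes your situation challenging is this: you are trying to do two things in the same step:
import a large number of rows from CSV into MySQL, and
do a conversion so that the person-marathon associations work through person_id and marathon_id, rather than the (clunky and undesirable) varchar personkey column.
In short, I would do everything possible to avoid doing both of these in the same step. Break it into those two steps: first import all of the data in a tolerable form, and optimize it afterwards. MySQL is a good environment for this kind of conversion, because the IDs are set up for you as you import the data into the persons and marathons tables.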
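Step 1: Import the data
I find data conversions easier to perform inside MySQL than outside of it. So get the data into MySQL in a form that preserves the person-marathon associations, even if it isn't optimal, and worry about changing the association approach afterwards.
You mention temporary tables, but I don't think you need any. Set up a temporary column "personkey" on the persons_marathons table. When you import all of the associations, leave person_id blank for now and import only personkey. Importantly, make sure personkey is an indexed column on both the associations table and the persons table. Then you can go through later and fill in the correct person_id for each personkey without worrying about MySQL being inefficient.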
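As a rough sketch of that temporary column, using the table names from the question (person_marathon, person) rather than the plural names used in this answer. The column name personkey and the index name idx_pm_personkey are illustrative choices, not anything the question defines:

```sql
-- Add a nullable staging column for the imported key, plus an index on it
-- so the later lookup against person.person_key stays cheap.
ALTER TABLE person_marathon
    ADD COLUMN personkey VARCHAR(40) NULL,
    ADD INDEX idx_pm_personkey (personkey);

-- person.person_key is already indexed through the unique constraint
-- uc_person_key, so no extra index is needed on that side.
```

In the question's schema, person_id on person_marathon is already nullable, and MySQL unique indexes permit multiple NULLs, so rows with an empty person_id should not collide on uc_marathon_person during the import.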
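I'm not clear on the nature of the marathons table data. Do you have thousands of marathons to enter? If so, I don't envy you the work of handling one spreadsheet per marathon. But if there are fewer, you can perhaps set up the marathons table by hand. Let MySQL generate the marathon IDs for you. Then, as you import the person_marathon CSV for each marathon, be sure to specify that marathon's ID in each association relevant to that marathon.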
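One way the per-marathon import might look, assuming a nullable personkey column has been added to person_marathon as described above, that the CSV holds one person_key per line, and that LOCAL INFILE is enabled on both client and server. The file path and marathon name are placeholders:

```sql
-- Create the marathon row and capture its generated ID.
INSERT INTO marathon (marathon_name) VALUES ('marathon_a');
SET @marathon_id = LAST_INSERT_ID();

-- Load only the key column; person_id stays NULL for now, and every
-- loaded row is stamped with this marathon's ID.
LOAD DATA LOCAL INFILE '/path/to/marathon_a.csv'
INTO TABLE person_marathon
LINES TERMINATED BY '\n'
(personkey)
SET marathon_id = @marathon_id;
```

If LOAD DATA is not available in your environment, the same rows could be sent from PHP as multi-row INSERT statements that carry the same marathon_id value.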
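Once you're done importing the data, you have three tables:
* persons - you have the ugly personkey, as well as a newly generated person_id, plus any other fields
* marathons - you should have a marathon_id at this point, right? Either newly generated, or a number you've carried over from some older system.
* persons_marathons - this table should have marathon_id filled in and pointing to the correct row in the marathons table, right? You also have personkey (ugly but present) and person_id (still null).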
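Step 2: Use the personkey to fill in person_id for each row in the association table
Then either use straight MySQL or write a simple PHP script to fill in person_id for each row in the persons_marathons table. If I'm having trouble getting MySQL to do this directly, I'll often write a PHP script to deal with a single row at a time. The steps in this would be simple:
look up any one row where person_id is null but personkey is not null
look up that personkey's person_id
write that person_id into the association table for that row
You can tell PHP to repeat this 100 times and then end the script, or 1000 times, if you keep running into timeout problems or anything like that.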
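For the "straight MySQL" route, the fill-in could be sketched roughly as below. This uses the question's table names and assumes the staging column is called personkey. The batch size of 1000 is arbitrary.

```sql
-- One-shot version: a multi-table UPDATE joining on the indexed key.
UPDATE person_marathon pm
JOIN person p ON p.person_key = pm.personkey
SET pm.person_id = p.person_id
WHERE pm.person_id IS NULL;

-- Batched version: multi-table UPDATE does not accept LIMIT, so this uses
-- a single-table UPDATE with a correlated subquery. Re-run it until it
-- reports 0 affected rows.
UPDATE person_marathon pm
SET pm.person_id = (SELECT p.person_id
                    FROM person p
                    WHERE p.person_key = pm.personkey)
WHERE pm.person_id IS NULL
  AND pm.personkey IS NOT NULL
  -- Skip keys with no matching person so they are not retried forever.
  AND EXISTS (SELECT 1 FROM person p2 WHERE p2.person_key = pm.personkey)
LIMIT 1000;
```

Either form can fail with a duplicate-key error on uc_marathon_person if the same person_key appears twice in one marathon's CSV, so it may be worth de-duplicating the staged rows first.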
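This conversion involves a huge number of lookups, but each lookup only needs to be for a single row. That's appealing because at no time do you need to ask MySQL (or PHP) to "hold the whole dataset in its head".
At this point, your association table should have person_id filled in for every row. It's now safe to delete the personkey column, and voila, you have your efficient foreign keys.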
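Before dropping the column, a quick check along these lines may be worthwhile. It again assumes the question's table names and a staging column named personkey:

```sql
-- Should return 0. Any remaining rows have a key with no matching person.
SELECT COUNT(*) AS unresolved
FROM person_marathon
WHERE person_id IS NULL;

-- Inspect the unmatched keys, if there are any, before discarding them.
SELECT pm.personkey
FROM person_marathon pm
LEFT JOIN person p ON p.person_key = pm.personkey
WHERE p.person_id IS NULL;

-- Once everything resolves, remove the staging column. MySQL also drops
-- an index whose only column is the one being removed.
ALTER TABLE person_marathon DROP COLUMN personkey;
```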

Regarding php - How do I import a "large" amount of data into MySQL using PHP and foreign keys?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17793322/