本文介绍了内存使用从数据库导出到csv在php的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要将数据从mysql导出到csv。我必须从几个表中选择数据,将它们放在数组中,然后处理它们并将其返回给浏览器.csv。
我注意到数组消耗了大量的行数。
例如,我在1.8M的数据库中导入了.csv,然后我尝试从.csv中的数据库导出此数据。 memory_get_peak_usage()显示超过128M的数据存储数据。

I need to export data from mysql to csv. I have to select data from several tables putting them in arrays, then process them and return them to browser as .csv. I noticed that arrays consume huge ammount of rows.For example, I imported a .csv in the database which is 1.8M, then I try to export this data from the database in .csv. The memory_get_peak_usage() shows more than 128M to store arrays with data.

例如,这个小数组需要700多个字节:

For example this small array takes more than 700 bytes:

$startMemory = memory_get_usage();  
        //get constant fields of the subscriber
        $data = array(array('subscriber_id' => 1315444, 'email_address' => '[email protected]',
                            'first_name' => 'Michael', 'last_name' => 'Allen'));
        echo memory_get_usage() - $startMemory;

所以导出甚至几兆字节的数据,在PHP脚本中需要数百兆字节的内存。
有没有办法解决这个问题?
表:

So exporting even several megabytes of data, require hundreds megabytes of memory in php script.Is there a way to solve this problem? Tables:

    CREATE TABLE `subscribers` (
     `subscriber_id` int(10) unsigned NOT NULL auto_increment,
     `list_id` int(10) unsigned NOT NULL,
     `account_id` int(10) unsigned NOT NULL,
     `email_address` varchar(100) collate utf8_unicode_ci NOT NULL,
     `first_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
     `last_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
     `ip` int(10) unsigned default NULL COMMENT '\nThe ip address of the subscriber that we can get when he opens the \nthe email or subscribe using subsribe form.\nTheoretically it can be used to segment by Location (which is not correct if someone uses proxy).',
     `preferred_format` tinyint(4) NOT NULL default '0' COMMENT 'Preferred format of \n0 - HTML, \n1 -Text,\n2 - Mobile',
     `state` tinyint(4) NOT NULL default '1' COMMENT '1 - subscribed\n2 - unsubscribed\n3 - cleaned\n4 - not confirmed, it means the user subscribed but has not confirmed it yet.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n',
     `cause_of_cleaning` tinyint(4) NOT NULL default '0' COMMENT '\nThis field is the cause of moving the subscriber to the \n0 - not used\n1 - spam complaint\n2 - hard bounce\n3 - several soft bounces',
     `date_added` datetime NOT NULL COMMENT 'The data when the subscriber was added. I suppose this field can be used in the conditions forming the segment',
     `last_changed` datetime NOT NULL,
     PRIMARY KEY  (`subscriber_id`),
     UNIQUE KEY `email_list_id` (`email_address`,`list_id`),
     KEY `FK_list_id` (`list_id`),
     CONSTRAINT `FK_list_id` FOREIGN KEY (`list_id`) REFERENCES `lists` (`list_id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB 
    CREATE TABLE `subscribers_multivalued` (
     `id` int(10) unsigned NOT NULL auto_increment,
     `subscriber_id` int(10) unsigned NOT NULL,
     `field_id` int(10) unsigned NOT NULL,
     `value` varchar(100) collate utf8_unicode_ci NOT NULL,
     `account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
     PRIMARY KEY  (`id`),
     KEY `subscriber_fk` (`subscriber_id`),
     KEY `field_fk` (`field_id`),
     CONSTRAINT `field_fk_string_multivalued` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
     CONSTRAINT `subscriber_fk_multivalued` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB
CREATE TABLE `subscribers_custom_data_string` (
 `subscriber_id` int(10) unsigned NOT NULL,
 `field_id` int(10) unsigned NOT NULL,
 `value` varchar(255) collate utf8_unicode_ci NOT NULL,
 `account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
 PRIMARY KEY  (`subscriber_id`,`field_id`),
 KEY `subscriber_fk` (`subscriber_id`),
 KEY `field_fk` (`field_id`),
 CONSTRAINT `field_fk_string` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `subscriber_fk_string` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
) 

还有其他与表字段类似的表,其中包含数字,日期的字符串。对于他们,主键是subscriber_id,field_id。

There are other tables for fields similar to the table with strings for numbers, dates. For them primary key is subscriber_id, field_id.

查询失败时(例如我们有几个自定义字段):

When query fails (for example we have several custom fields):

SELECT 订阅者 email_address 订阅者。 c $ c> first_name ,订阅者 last_name
GROUP_CONCAT(t1 ASVAL,GROUP_CONCAT(t2.value SEPARATOR'|')AS语言
FROM 订阅者
LEFT JOIN subscribers_multivalued AS t1 ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN subscribers_multivalued AS t2 ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE(list_id = 40)
GROUP BY 订阅者 email_address 订阅者 first_name 订阅者 last_name

SELECT subscribers.email_address, subscribers.first_name, subscribers.last_name,GROUP_CONCAT(t1.value SEPARATOR '|') AS Colors, GROUP_CONCAT(t2.value SEPARATOR '|') AS LanguagesFROM subscribersLEFT JOIN subscribers_multivalued AS t1 ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=112LEFT JOIN subscribers_multivalued AS t2 ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=111WHERE (list_id=40)GROUP BY subscribers.email_address, subscribers.first_name, subscribers.last_name

它将返回:

[email protected] Mi chelle布什红色|红色|蓝色|蓝色英语|西班牙语|英语|西班牙语而不是
[email protected]米歇尔布什红色|蓝色
英语|西班牙语

[email protected] Michelle Bush Red|Red|Blue|Blue English|Spanish|English|Spanish instead [email protected] Michelle Bush Red|BlueEnglish|Spanish

感谢您的任何信息。

推荐答案

仅使用两个表:

您的原始查询:

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       t1.value AS Languages 
  FROM subscribers 
  LEFT JOIN (SELECT subscriber_id, 
                    field_id, 
                    GROUP_CONCAT(value SEPARATOR '|') AS value 
               FROM subscribers_multivalued 
              WHERE field_id=37 
              GROUP BY subscriber_id, field_id
            ) AS t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=37 
 WHERE (list_id=49) 
   AND (state=1)

给出解释计划:

id  select_type  table                    type  possible_keys  key         key_len  ref    rows  Extra
1   PRIMARY      subscribers              ref   FK_list_id     FK_list_id  4        const  2     Using where
1   PRIMARY      <derived2>               ALL   NULL           NULL        NULL     NULL   5      
2   DERIVED      subscribers_multivalued  ALL   field_fk       field_fk    4               11    Using filesort

我的加入建议:

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       GROUP_CONCAT(t1.value SEPARATOR '|') AS Languages 
  FROM subscribers 
  LEFT JOIN subscribers_multivalued t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=37 
 WHERE (list_id=49) 
   AND (state=1)
 GROUP BY subscribers.email_address, 
          subscribers.first_name, 
          subscribers.last_name

给出解释计划:

id  select_type  table        type  possible_keys           key            key_len  ref                             rows  Extra
1   SIMPLE       subscribers  ref   FK_list_id              FK_list_id     4        const                           2     Using where; Using filesort
1   SIMPLE       t1           ref   subscriber_fk,field_fk  subscriber_fk  4        test.subscribers.subscriber_id  1      

虽然我仅使用非常小的数据量填充这两个表,但这表明我的版本的查询将对数据库执行更有效,因为它不使用您的查询生成的派生表。

While I only populated those two tables with a very small volume of data, this suggests to me that my version of the query will execute more efficiently against the database, because it isn't using the derived table that your query generates.

其他表可以以相同的方式链接到查询中,并且整个结果直接假脱机到一个csv文件,而不是用PHP进一步解析。

The other tables could be linked into the query in much the same way, and the entire result spooled directly to a csv file rather than parsed further with PHP.

这应该可以让您运行速度更快,内存更高效。

This should give you a run that is both faster and more memory efficient.

编辑

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors, 
       GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages 
  FROM subscribers 
  LEFT JOIN subscribers_multivalued AS t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=112 
  LEFT JOIN subscribers_multivalued AS t2 
    ON subscribers.subscriber_id=t2.subscriber_id 
   AND t2.field_id=37 
 WHERE (list_id=49) 
 GROUP BY subscribers.email_address, 
          subscribers.first_name, 
          subscribers.last_name

请注意在GROUP_CONCAT()函数中使用DISTINCT

Note the use of DISTINCT in the GROUP_CONCAT() function

这篇关于内存使用从数据库导出到csv在php的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-23 08:04