问题描述
我需要将数据从mysql导出到csv。我必须从几个表中选择数据,将它们放在数组中,然后处理它们并将其返回给浏览器.csv。
我注意到数组消耗了大量的行数。
例如,我在1.8M的数据库中导入了.csv,然后我尝试从.csv中的数据库导出此数据。 memory_get_peak_usage()显示超过128M的数据存储数据。
I need to export data from mysql to csv. I have to select data from several tables putting them in arrays, then process them and return them to browser as .csv. I noticed that arrays consume huge ammount of rows.For example, I imported a .csv in the database which is 1.8M, then I try to export this data from the database in .csv. The memory_get_peak_usage() shows more than 128M to store arrays with data.
例如,这个小数组需要700多个字节:
For example this small array takes more than 700 bytes:
$startMemory = memory_get_usage();
//get constant fields of the subscriber
$data = array(array('subscriber_id' => 1315444, 'email_address' => '[email protected]',
'first_name' => 'Michael', 'last_name' => 'Allen'));
echo memory_get_usage() - $startMemory;
所以导出甚至几兆字节的数据,在PHP脚本中需要数百兆字节的内存。
有没有办法解决这个问题?
表:
So exporting even several megabytes of data, require hundreds megabytes of memory in php script.Is there a way to solve this problem? Tables:
CREATE TABLE `subscribers` (
`subscriber_id` int(10) unsigned NOT NULL auto_increment,
`list_id` int(10) unsigned NOT NULL,
`account_id` int(10) unsigned NOT NULL,
`email_address` varchar(100) collate utf8_unicode_ci NOT NULL,
`first_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
`last_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
`ip` int(10) unsigned default NULL COMMENT '\nThe ip address of the subscriber that we can get when he opens the \nthe email or subscribe using subsribe form.\nTheoretically it can be used to segment by Location (which is not correct if someone uses proxy).',
`preferred_format` tinyint(4) NOT NULL default '0' COMMENT 'Preferred format of \n0 - HTML, \n1 -Text,\n2 - Mobile',
`state` tinyint(4) NOT NULL default '1' COMMENT '1 - subscribed\n2 - unsubscribed\n3 - cleaned\n4 - not confirmed, it means the user subscribed but has not confirmed it yet.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n',
`cause_of_cleaning` tinyint(4) NOT NULL default '0' COMMENT '\nThis field is the cause of moving the subscriber to the \n0 - not used\n1 - spam complaint\n2 - hard bounce\n3 - several soft bounces',
`date_added` datetime NOT NULL COMMENT 'The data when the subscriber was added. I suppose this field can be used in the conditions forming the segment',
`last_changed` datetime NOT NULL,
PRIMARY KEY (`subscriber_id`),
UNIQUE KEY `email_list_id` (`email_address`,`list_id`),
KEY `FK_list_id` (`list_id`),
CONSTRAINT `FK_list_id` FOREIGN KEY (`list_id`) REFERENCES `lists` (`list_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB
CREATE TABLE `subscribers_multivalued` (
`id` int(10) unsigned NOT NULL auto_increment,
`subscriber_id` int(10) unsigned NOT NULL,
`field_id` int(10) unsigned NOT NULL,
`value` varchar(100) collate utf8_unicode_ci NOT NULL,
`account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
PRIMARY KEY (`id`),
KEY `subscriber_fk` (`subscriber_id`),
KEY `field_fk` (`field_id`),
CONSTRAINT `field_fk_string_multivalued` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `subscriber_fk_multivalued` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB
CREATE TABLE `subscribers_custom_data_string` (
`subscriber_id` int(10) unsigned NOT NULL,
`field_id` int(10) unsigned NOT NULL,
`value` varchar(255) collate utf8_unicode_ci NOT NULL,
`account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
PRIMARY KEY (`subscriber_id`,`field_id`),
KEY `subscriber_fk` (`subscriber_id`),
KEY `field_fk` (`field_id`),
CONSTRAINT `field_fk_string` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `subscriber_fk_string` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
)
还有其他与表字段类似的表,其中包含数字,日期的字符串。对于他们,主键是subscriber_id,field_id。
There are other tables for fields similar to the table with strings for numbers, dates. For them primary key is subscriber_id, field_id.
查询失败时(例如我们有几个自定义字段):
When query fails (for example we have several custom fields):
SELECT 订阅者
。 email_address
,订阅者
。 c $ c> first_name ,订阅者
。 last_name
,
GROUP_CONCAT(t1 ASVAL,GROUP_CONCAT(t2.value SEPARATOR'|')AS语言
FROM 订阅者
LEFT JOIN subscribers_multivalued
AS t1
ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN subscribers_multivalued
AS t2
ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE(list_id = 40)
GROUP BY 订阅者
。 email_address
,订阅者
。 first_name
,订阅者
。 last_name
SELECT subscribers
.email_address
, subscribers
.first_name
, subscribers
.last_name
,GROUP_CONCAT(t1.value SEPARATOR '|') AS Colors, GROUP_CONCAT(t2.value SEPARATOR '|') AS LanguagesFROM subscribers
LEFT JOIN subscribers_multivalued
AS t1
ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=112LEFT JOIN subscribers_multivalued
AS t2
ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=111WHERE (list_id=40)GROUP BY subscribers
.email_address
, subscribers
.first_name
, subscribers
.last_name
它将返回:
[email protected] Mi chelle布什红色|红色|蓝色|蓝色英语|西班牙语|英语|西班牙语而不是
[email protected]米歇尔布什红色|蓝色
英语|西班牙语
[email protected] Michelle Bush Red|Red|Blue|Blue English|Spanish|English|Spanish instead [email protected] Michelle Bush Red|BlueEnglish|Spanish
感谢您的任何信息。
推荐答案
仅使用两个表:
您的原始查询:
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
t1.value AS Languages
FROM subscribers
LEFT JOIN (SELECT subscriber_id,
field_id,
GROUP_CONCAT(value SEPARATOR '|') AS value
FROM subscribers_multivalued
WHERE field_id=37
GROUP BY subscriber_id, field_id
) AS t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=37
WHERE (list_id=49)
AND (state=1)
给出解释计划:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY subscribers ref FK_list_id FK_list_id 4 const 2 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 11 Using filesort
我的加入建议:
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
GROUP_CONCAT(t1.value SEPARATOR '|') AS Languages
FROM subscribers
LEFT JOIN subscribers_multivalued t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=37
WHERE (list_id=49)
AND (state=1)
GROUP BY subscribers.email_address,
subscribers.first_name,
subscribers.last_name
给出解释计划:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id FK_list_id 4 const 2 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 test.subscribers.subscriber_id 1
虽然我仅使用非常小的数据量填充这两个表,但这表明我的版本的查询将对数据库执行更有效,因为它不使用您的查询生成的派生表。
While I only populated those two tables with a very small volume of data, this suggests to me that my version of the query will execute more efficiently against the database, because it isn't using the derived table that your query generates.
其他表可以以相同的方式链接到查询中,并且整个结果直接假脱机到一个csv文件,而不是用PHP进一步解析。
The other tables could be linked into the query in much the same way, and the entire result spooled directly to a csv file rather than parsed further with PHP.
这应该可以让您运行速度更快,内存更高效。
This should give you a run that is both faster and more memory efficient.
编辑
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors,
GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages
FROM subscribers
LEFT JOIN subscribers_multivalued AS t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=112
LEFT JOIN subscribers_multivalued AS t2
ON subscribers.subscriber_id=t2.subscriber_id
AND t2.field_id=37
WHERE (list_id=49)
GROUP BY subscribers.email_address,
subscribers.first_name,
subscribers.last_name
请注意在GROUP_CONCAT()函数中使用DISTINCT
Note the use of DISTINCT in the GROUP_CONCAT() function
这篇关于内存使用从数据库导出到csv在php的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!