I am trying to import a CSV file into a MySQL table, and I currently have a script that runs line by line because I need to hash an ID combined with another ID, as well as format the date for MySQL.

The CSV file has MORE columns than I am currently importing. Is it easier to just import all of the columns?

I was reading about LOAD DATA INFILE (http://dev.mysql.com/doc/refman/5.1/en/load-data.html), but I am wondering how I can use it and still hash the IDs and format the date without doing row-by-row execution.
My current script is taking way too long and causes site performance issues while it runs. Here is what I have:

```php
$url = 'http://www.example.com/directory/file.csv';

if (($handle = fopen($url, "r")) !== FALSE) {
    fgetcsv($handle, 1000, ","); // read and discard the header line

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $EvID      = $data[0];
        $Ev        = $data[1];
        $PerID     = $data[2];
        $Per       = $data[3];
        $VName     = $data[4];
        $VID       = $data[5];
        $VSA       = $data[6];
        $DateTime  = $data[7];
        $PCatID    = $data[8];
        $PCat      = $data[9];
        $CCatID    = $data[10];
        $CCat      = $data[11];
        $GCatID    = $data[12];
        $GCat      = $data[13];
        $City      = $data[14];
        $State     = $data[15];
        $StateID   = $data[16];
        $Country   = $data[17];
        $CountryID = $data[18];
        $Zip       = $data[19];
        $TYN       = $data[20];
        $IMAGEURL  = $data[21];
        $URLLink   = $data[22];

        // reformat the date for MySQL's DATETIME type
        $data[7] = strtotime($data[7]);
        $data[7] = date("Y-m-d H:i:s", $data[7]);

        if ((($PCatID == '2') && (($CountryID == '217') || ($CountryID == '38')))
            || (($GCatID == '16') || ($GCatID == '19') || ($GCatID == '30') || ($GCatID == '32'))) {
            // id is the md5 hash of the event ID and performer ID
            $sql = "INSERT IGNORE INTO TNDB_CSV2
                        (id, EvID, Event, PerID, Per, VName, VID, VSA, DateTime,
                         PCatID, PCat, CCatID, CCat, GCatID, GCat, City, State,
                         StateID, Country, CountryID, Zip, TYN, IMAGEURL)
                    VALUES ('" . md5($EvID . $PerID) . "','" . addslashes($data[0]) . "','" . addslashes($data[1]) . "',
                            '" . addslashes($data[2]) . "','" . addslashes($data[3]) . "','" . addslashes($data[4]) . "',
                            '" . addslashes($data[5]) . "','" . addslashes($data[6]) . "','" . addslashes($data[7]) . "',
                            '" . addslashes($data[8]) . "','" . addslashes($data[9]) . "','" . addslashes($data[10]) . "',
                            '" . addslashes($data[11]) . "','" . addslashes($data[12]) . "','" . addslashes($data[13]) . "',
                            '" . addslashes($data[14]) . "','" . addslashes($data[15]) . "','" . addslashes($data[16]) . "',
                            '" . addslashes($data[17]) . "','" . addslashes($data[18]) . "','" . addslashes($data[19]) . "',
                            '" . addslashes($data[20]) . "','" . addslashes($data[21]) . "')";
            if (!mysql_query($sql)) {
                exit("<br>" . mysql_error());
            }
        }
    }
    fclose($handle);
}
```

Any help is always greatly appreciated. Thanks in advance.

Solution

Try optimising your script first.
First off, never run single-row queries when importing unless you have no other choice; the network overhead can be a killer. Try something like this (obviously untested and coded in the SO textbox, so check that the brackets match, etc.):

```php
$url = 'http://www.example.com/directory/file.csv';

if (($handle = fopen($url, "r")) !== FALSE) {
    fgetcsv($handle, 1000, ","); // read and discard the header line

    $imports = array();

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $EvID      = $data[0];
        $Ev        = $data[1];
        $PerID     = $data[2];
        $Per       = $data[3];
        $VName     = $data[4];
        $VID       = $data[5];
        $VSA       = $data[6];
        $DateTime  = $data[7];
        $PCatID    = $data[8];
        $PCat      = $data[9];
        $CCatID    = $data[10];
        $CCat      = $data[11];
        $GCatID    = $data[12];
        $GCat      = $data[13];
        $City      = $data[14];
        $State     = $data[15];
        $StateID   = $data[16];
        $Country   = $data[17];
        $CountryID = $data[18];
        $Zip       = $data[19];
        $TYN       = $data[20];
        $IMAGEURL  = $data[21];
        $URLLink   = $data[22];

        // reformat the date for MySQL's DATETIME type
        $data[7] = strtotime($data[7]);
        $data[7] = date("Y-m-d H:i:s", $data[7]);

        if ((($PCatID == '2') && (($CountryID == '217') || ($CountryID == '38')))
            || (($GCatID == '16') || ($GCatID == '19') || ($GCatID == '30') || ($GCatID == '32'))) {
            // collect one VALUES tuple per row instead of running a query per row
            $imports[] = "('" . md5($EvID . $PerID) . "','" . addslashes($data[0]) . "','" . addslashes($data[1]) . "',
                           '" . addslashes($data[2]) . "','" . addslashes($data[3]) . "','" . addslashes($data[4]) . "',
                           '" . addslashes($data[5]) . "','" . addslashes($data[6]) . "','" . addslashes($data[7]) . "',
                           '" . addslashes($data[8]) . "','" . addslashes($data[9]) . "','" . addslashes($data[10]) . "',
                           '" . addslashes($data[11]) . "','" . addslashes($data[12]) . "','" . addslashes($data[13]) . "',
                           '" . addslashes($data[14]) . "','" . addslashes($data[15]) . "','" . addslashes($data[16]) . "',
                           '" . addslashes($data[17]) . "','" . addslashes($data[18]) . "','" . addslashes($data[19]) . "',
                           '" . addslashes($data[20]) . "','" . addslashes($data[21]) . "')";
        }
    }

    // insert in batches of 100 rows per query
    $importarrays = array_chunk($imports, 100);
    foreach ($importarrays as $arr) {
        if (!mysql_query("INSERT IGNORE INTO TNDB_CSV2
                              (id, EvID, Event, PerID, Per, VName, VID, VSA, DateTime,
                               PCatID, PCat, CCatID, CCat, GCatID, GCat, City, State,
                               StateID, Country, CountryID, Zip, TYN, IMAGEURL)
                          VALUES " . implode(',', $arr))) {
            die("error: " . mysql_error());
        }
    }

    fclose($handle);
}
```

Play around with the number in array_chunk: too big and it may cause problems such as the query being too long (yes, there is a configurable limit in my.cnf; see the note on max_allowed_packet at the end of this answer), too small and it adds unnecessary overhead.

You could also drop the practice of assigning each $data[x] to a variable, as it's a waste given how small the script is; just use $data[x] directly in the query and so on. (It won't give a massive improvement, but depending on your import size it could save a little.)

The next thing would be to use low-priority inserts/updates; check out this to get you started: How to give priority to certain queries?

After all of that, you could think about MySQL configuration optimisation, but that's really one for Google to explain, as the best settings are different for everyone and their unique situation.

Edit: Another thing I've done before: if you have a lot of keys set up that aren't required for the import, you can drop those keys temporarily and add them back when the script is done. This can yield good time improvements too, but as you're working on a live database there are pitfalls to work around if you go down that route (a sketch of the pattern follows below).
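A sketch of that drop-and-recreate pattern, assuming a hypothetical non-unique index named idx_city_state on TNDB_CSV2 — the index name and columns here are placeholders, not taken from the original post:

```sql
-- Before the import: drop secondary indexes the load does not need.
-- Do NOT drop the PRIMARY KEY or any UNIQUE key that INSERT IGNORE
-- relies on to skip duplicate rows.
ALTER TABLE TNDB_CSV2 DROP INDEX idx_city_state;

-- ... run the import script here ...

-- After the import: rebuild the index in a single pass.
ALTER TABLE TNDB_CSV2 ADD INDEX idx_city_state (City, State);
```

And on the query-length limit mentioned above: the relevant server variable is max_allowed_packet. A minimal way to check and raise it, assuming a suitably privileged MySQL account (16 MB is only an example value):

```sql
-- Show the current limit, in bytes, on a single query/packet
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it server-wide if a batched INSERT is rejected as too long
-- (requires the SUPER privilege)
SET GLOBAL max_allowed_packet = 16 * 1024 * 1024;
```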
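As for the LOAD DATA INFILE idea raised in the question: it can apply the MD5 hash and the date conversion server-side through user variables and a SET clause, so no row-by-row PHP is needed. A sketch, untested, assuming the file has already been downloaded to a local path and the columns arrive in the order the script reads them; the STR_TO_DATE format string is a guess and must be adjusted to the file's actual date format, and any additional trailing columns would need extra @dummy placeholders:

```sql
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE TNDB_CSV2
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES  -- skip the header row
(@EvID, Event, @PerID, Per, VName, VID, VSA, @DateTime,
 PCatID, PCat, CCatID, CCat, GCatID, GCat, City, State, StateID,
 Country, CountryID, Zip, TYN, IMAGEURL, @URLLink)
SET EvID     = @EvID,
    PerID    = @PerID,
    id       = MD5(CONCAT(@EvID, @PerID)),               -- hash the two IDs into the row id
    DateTime = STR_TO_DATE(@DateTime, '%m/%d/%Y %H:%i'); -- match this to the CSV's date format
```

One caveat: LOAD DATA has no WHERE clause, so the PCatID/GCatID/CountryID filter from the script cannot be applied during the load itself; you would either load everything into a staging table and copy the matching rows across, or delete the unwanted rows afterwards.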