Parse a CSV and replace commas enclosed in double quotes

Problem description

Hello Gurus

I have a very complex CSV file that contains data like the row below. In fact, I removed a few columns and it is still a lengthy row, but I want to give such a complex row as an example to get the best output.

10.0,"[""Categories""]",,0,,,,,"[""Automotive and Transport"", ""Business and Finance"", ""Chemicals and Materials"", ""Company Reports"", ""Consumer Goods and Services"", ""Country Reports"", ""Energy and Natural Resources"", ""Food and Beverage"", ""Government and Public Sector"", ""Healthcare"", ""Humanities Books"", ""Industry Standards"", ""Manufacturing and Construction"", ""Military Aerospace and Defense"", ""Pharmaceuticals"", ""Science Books"", ""Telecommunications and Computing""]","[""Automotive"", ""Commercial Aerospace"", ""Maritime"", ""Private Transport Services"", ""Public Transport"", ""Rail"", ""Banking"", ""Business"", ""Economics"", ""Finance"", ""Insurance"", ""Legal"", ""Property"", ""Adhesives and Sealants"", ""Advanced Materials"", ""Ceramics"", ""Chemicals"", ""Composites"", ""Glass"", ""Metals and Minerals"", ""Nanomaterials"", ""Packaging"", ""Plastics"", ""Pulp and Paper"", ""Recycling and Recycled Materials"", ""Rubber"", ""Textiles"", ""Wax"", ""Wood and Wood Products"", ""Arts and Crafts Supplies"", ""Baby and Child Products"", ""Clothing"", ""Consumer Electronics"", ""Consumer Services"", ""Household"", ""Media and Entertainment"", ""Personal Care Products"", ""Retail"", ""Sporting Goods and Equipment"", ""Toys"", ""Travel and Tourism"", ""ICON Country Market Reports"", ""Biofuels"", ""Combined Heat and Power (CHP)"", ""Electricity"", ""Energy Maps"", ""Energy Storage"", ""Environmental"", ""Fossil Fuels"", ""Mining"", ""Nuclear Power"", ""Pipelines"", ""Renewable Energy"", ""Utilities"", ""Water"", ""Agriculture"", ""Beverage"", ""Food"", ""Food Ingredients"", ""Food Processing"", ""Food Safety"", ""Food Service"", ""Tobacco"", ""Government"", ""International Relations"", ""Public Sector"", ""Biotechnology"", ""Healthcare Services"", ""Laboratory Equipment"", ""Medical Devices"", ""Library and Information Science"", ""Political Science"", ""Psychology"", ""Society and Social Sciences"", ""Construction"", ""Engineering"", 
""Manufacturing and Industry"", ""Security Services"", ""Ammunition"", ""Biodefense"", ""C4ISR"", ""Homeland Defense"", ""Infantry Weapons and Equipment"", ""Military Aircraft"", ""Military Logistics"", ""Military Unmanned Systems"", ""Missiles and Missiles Technology"", ""Naval Vessels"", ""Radar Systems"", ""Tanks and Armored Vehicles"", ""Anesthetic Drugs"", ""Animal Pharmaceuticals"", ""Antibiotics"", ""Biopharmaceuticals"", ""Clinical Trials"", ""Drug Delivery"", ""Drug Discovery"", ""Drugs by Therapeutic Area"", ""Generic Drugs"", ""Oncology Drugs"", ""Over the Counter (OTC) Drugs"", ""Pharmaceutical Intermediates"", ""Pharmaceutical Manufacturing"", ""Pharmacoeconomics"", ""Vitamins and Dietary Supplements"", ""Biology and Life Sciences"", ""Chemistry"", ""Geography"", ""Mathematics"", ""Physics"", ""Computing and Technology"", ""Internet and E-Commerce"", ""Telecommunications and Networks""]","","width=device-width, initial-scale=1.0, maximum-scale=1.0",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"{""Content-Type"":""text/html; charset=utf-8"",""Via"":""1.1 varnish"",""Fastly-Debug-Digest"":""e8f4fd7caeda67cf3daa747e38e1310cb8d84d9e2af22e6156b3d792b4ef7821"",""Content-Length"":""400501"",""Accept-Ranges"":""bytes"",""Date"":""Wed, 05 Apr 2017 21:56:34 GMT"",""Age"":""16376"",""Connection"":""keep-alive"",""X-Served-By"":""cache-lhr6350-LHR, cache-iad2128-IAD"",""X-Cache"":""HIT, HIT"",""X-Cache-Hits"":""1, 1"",""X-Timer"":""S1491429394.073711,VS0,VE7"",""Cache-Control"":""no-cache, no-store, private, must-revalidate, max-age=0, max-stale=0, post-check=0, pre-check=0"",""Expires"":""0"",""Pragma"":""no-cache""}"





As you can see, there are additional commas inside the double-quoted fields. I want to ignore those and replace only the comma delimiters with a unique character. The data contains many special characters such as |, &, and ^, so I will have to decide on a unique delimiter.

I am looking for something in either PowerShell or VBScript.

I recorded a macro that opens the CSV file in Excel and then replaces all commas with "##". This works when done manually, but when the macro is executed, I can still see many rows with commas enclosed in double quotes.

The file is actually huge, about 7 GB, so I can't really do it manually.

What I have tried:

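' Open the CSV in Excel (ToPath holds the path to the file)
Set wb = Application.Workbooks.Open(ToPath, UpdateLinks:=0)

    ' DisplayAlerts is a property of Application
    Application.DisplayAlerts = False
    ActiveSheet.Cells.Select
    ' Replace every comma found inside the cell values
    Cells.Replace What:=",", Replacement:=";", LookAt:=xlPart, SearchOrder _
        :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
    wb.Close savechanges:=True
    Application.WindowState = xlNormal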

Recommended answer
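
One possible approach, sketched in Python rather than PowerShell or VBScript so the logic is easy to check: walk the text once, track whether the current position is inside a double-quoted field, and replace a comma only when it is outside one. A doubled quote ("") flips the state twice, so escaped quotes need no special handling, and carrying the state from chunk to chunk keeps memory use flat for a file of about 7 GB. The function names, the "##" default, and the chunk size are illustrative assumptions, and "##" should first be checked against the data to confirm it never occurs there.

```python
import io


def replace_delimiters(text, in_quotes=False, old=",", new="##"):
    """Replace `old` with `new` only outside double-quoted fields.

    Returns the converted text and the quote state at the end of `text`,
    so the state can be carried into the next chunk.
    """
    out = []
    for ch in text:
        if ch == '"':
            # Every quote toggles the state; an escaped "" toggles twice.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == old and not in_quotes:
            out.append(new)  # a real field delimiter
        else:
            out.append(ch)   # ordinary character, or a comma inside quotes
    return "".join(out), in_quotes


def convert_stream(src, dst, old=",", new="##", chunk_size=1 << 20):
    """Copy `src` to `dst` chunk by chunk, replacing only real delimiters."""
    in_quotes = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        converted, in_quotes = replace_delimiters(chunk, in_quotes, old, new)
        dst.write(converted)


if __name__ == "__main__":
    # Shortened sample in the style of the row from the question.
    sample = '10.0,"[""Categories""]",,0,"width=device-width, initial-scale=1.0"\n'
    src, dst = io.StringIO(sample), io.StringIO()
    convert_stream(src, dst)
    print(dst.getvalue(), end="")
    # prints: 10.0##"[""Categories""]"####0##"width=device-width, initial-scale=1.0"
```

For a real file, `src` and `dst` would be file objects opened in text mode with `newline=""` and the file's actual encoding (not stated in the question, so it has to be determined first). The same quote-state loop can be ported to PowerShell or VBScript by reading the file through a stream reader instead of loading it whole.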



08-05 12:32