Problem description
I know this (or something similar) has been asked many times, but having tried out numerous possibilities, I've not been able to find a regex that works 100%.
I've got a CSV file and I'm trying to split it into an array, but encountering two problems: quoted commas and empty elements.
The CSV looks like:
123,2.99,AMO024,Title,"Description, more info",,123987564
The regex I've tried to use is:
thisLine.split(/,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/)
The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.
Solution
Description
Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.
This expression will:
- split your sample text on the comma delimiters
- handle empty values
- ignore commas inside double quotes, provided the double quotes are not nested
- trim the delimiting comma from the returned value
- trim the surrounding quotes from the returned value
Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
Example
Sample Text
123,2.99,AMO024,Title,"Description, more info",,123987564
ASP example using the non-java expression
' Configure the regular expression object
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
sourcestring = "your source string"
regEx.Pattern = "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)"
Set Matches = regEx.Execute(sourcestring)
' List every match followed by each of its submatches (capture groups)
For z = 0 to Matches.Count-1
  results = results & "Matches(" & z & ") = " & chr(34) & Server.HTMLEncode(Matches(z)) & chr(34) & chr(13)
  For zz = 0 to Matches(z).SubMatches.Count-1
    results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & chr(34) & chr(13)
  next
  results=Left(results,Len(results)-1) & chr(13)
next
Response.Write "<pre>" & results
Matches using the non-java expression
Group 0 gets the entire substring which includes the comma
Group 1 gets the quote if it's used
Group 2 gets the value not including the comma
[0][0] = 123
[0][1] =
[0][2] = 123
[1][0] = ,2.99
[1][1] =
[1][2] = 2.99
[2][0] = ,AMO024
[2][1] =
[2][2] = AMO024
[3][0] = ,Title
[3][1] =
[3][2] = Title
[4][0] = ,"Description, more info"
[4][1] = "
[4][2] = Description, more info
[5][0] = ,
[5][1] =
[5][2] =
[6][0] = ,123987564
[6][1] =
[6][2] = 123987564
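JavaScript sketch of the same match-instead-of-split idea

The question uses JavaScript, whose regex engine does not support the conditional construct (?(1)...|...) that appears in the expression above. What follows is a minimal sketch, not the answer's own expression: it swaps the conditional for a plain alternation and matches each field together with its trailing delimiter. The function name parseCsvLine and the pattern are illustrative assumptions. Like the original, it assumes quoted fields contain no embedded double quotes.

```javascript
// Hypothetical helper: parse one CSV line by matching fields, not by splitting.
// Assumption: quoted fields do not contain embedded (escaped) double quotes.
function parseCsvLine(line) {
  // Group 1: contents of a quoted field. Group 2: an unquoted (possibly empty) field.
  // Group 3: the delimiter that ends the field, either "," or "" at end of input.
  // The sticky flag (y) forces each match to start exactly where the last one ended.
  const fieldPattern = /(?:"([^"]*)"|([^",]*))(,|$)/y;
  const fields = [];
  while (true) {
    const start = fieldPattern.lastIndex;
    const m = fieldPattern.exec(line);
    if (m === null) {
      throw new Error("Malformed CSV near index " + start);
    }
    fields.push(m[1] !== undefined ? m[1] : m[2]);
    if (m[3] === "") {
      return fields; // matched the end of the line, so this was the last field
    }
  }
}

const sample = '123,2.99,AMO024,Title,"Description, more info",,123987564';
console.log(parseCsvLine(sample)); // seven fields; index 5 is the empty string
```

Because every match consumes the field plus its comma, empty fields at the start, middle, or end of the line each produce their own empty string, and the surrounding quotes are dropped from quoted values.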