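I have a CSV file with 6 columns. One of the columns holds text that itself contains commas, for example BOLT,RD HD SQ SHORT NECK,METRIC.

When I read this file into R, that column overflows and the remaining data shifts onto a new row.

A few rows are pasted below: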


014003051906,ETN5080,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F
014003051906,ETN5967,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F
014003051906,ETN64267,0470,TILT UNIT SENSOR,1.000,F

014003065376,03M7184,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G
014003065376,03M7386,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G
014003065376,14M7296,0090,NUT, METRIC, HEX FLANGE,14.000,G


The last two rows are the problem. "NUT, METRIC, HEX FLANGE" should end up in a single variable.

How can I solve this?

Best answer

data <- readLines(con = textConnection("014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F
014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F
014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F

014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G
014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G
014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"))

pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"

library(stringr)
str_match(data, pattern)[, - 1]
#      [,1]           [,2]        [,3]   [,4]                                     [,5]     [,6]
# [1,] "014003051906" "ETN5080 "  "0450" "BOLT KIT UPPER SHAFT WITH 5 SPEED"      "1.000"  "F"
# [2,] "014003051906" "ETN5967 "  "0460" "SENSOR SENSOR FH BACKSHAFT SPEED"       "1.000"  "F"
# [3,] "014003051906" "ETN64267 " "0470" "TILT UNIT SENSOR"                       "1.000"  "F"
# [4,] NA             NA          NA     NA                                       NA       NA
# [5,] "014003065376" "03M7184 "  "0020" "BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc" "4.000"  "G"
# [6,] "014003065376" "03M7386 "  "0090" "BOLT, RD HD SQ SHORT NECK, METRIC"      "18.000" "G"
# [7,] "014003065376" "14M7296 "  "0090" "NUT, METRIC, HEX FLANGE"                "14.000" "G"
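
If a data frame with typed columns is wanted instead of a character matrix, the matched groups can be converted afterwards. The following is only a sketch under a few assumptions: it uses base R's regexec() and regmatches() in place of str_match() so that no extra package is needed, it drops blank lines before matching, and the column names are hypothetical because the question does not name the columns.

```r
# A few lines from the sample data; in practice these would come from readLines()
lines <- c(
  "014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F",
  "",
  "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G",
  "014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"
)
pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"

# Drop blank lines, which would otherwise produce rows without a match
lines <- lines[nzchar(lines)]

# Each list element holds the full match followed by the six groups
m <- regmatches(lines, regexec(pattern, lines))
fields <- do.call(rbind, lapply(m, function(x) x[-1]))

# Column names below are hypothetical, chosen only for illustration
df <- data.frame(fields, stringsAsFactors = FALSE)
names(df) <- c("order_id", "part_no", "position", "description", "quantity", "unit")

df$part_no  <- trimws(df$part_no)       # remove the trailing space in the part number
df$quantity <- as.numeric(df$quantity)  # IDs stay character to keep leading zeros
```

Keeping the first and third columns as character is a deliberate choice here, since converting them to numbers would drop the leading zeros.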


Edit:
An explanation of the regular expression for beginners, in plain language, so please forgive any inaccuracies below:


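The leading ^ and the trailing $ mark the start and the end of the string.
Parentheses are used for grouping (these are the groups that str_match() extracts).
. means any character, and .* means any number of any characters.
[^,] means any character that is not a comma.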


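Put together, this reads: start of string - substring without a comma - comma (repeated 3 times) - substring possibly containing commas - comma - substring without a comma - comma - substring without a comma - end of string, and only the groups in parentheses are extracted. Because only one group is allowed to contain commas, the pattern works when exactly one column has embedded commas.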
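
The difference between [^,]* and .* is what makes the pattern work. It can be seen on a small fragment of the sample data. This is a minimal base R sketch, and the variable names are only illustrative.

```r
# Tail end of one sample row: description, quantity, unit
x <- "NUT, METRIC, HEX FLANGE,14.000,G"

# "[^,]*" stops at the first comma
first_piece <- regmatches(x, regexpr("^[^,]*", x))

# ".*" on its own runs to the end of the string
everything <- regmatches(x, regexpr("^.*", x))

# With two comma-free fields anchored at the end, ".*" is forced to stop
# before the last two commas, so it captures the whole description
tail_pattern <- "^(.*),([^,]*),([^,]*)$"
description <- sub(tail_pattern, "\\1", x)
quantity    <- sub(tail_pattern, "\\2", x)
unit        <- sub(tail_pattern, "\\3", x)
```

Here first_piece is "NUT", everything is the full string, and the anchored pattern splits the fragment into "NUT, METRIC, HEX FLANGE", "14.000" and "G".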

This question, "r - reading a CSV file with commas in one column", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/41613014/
