问题描述
我确定有一个简单的解决方案,但我正在尝试找到它。任何帮助都非常感激。
我有一个包含2列的数据框; 亲和pep。
pro被格式化为因素,并包含220; 300; 4的条目;有时更多的数字(由;分隔)),有时只是一个数字(而不是;)。
pep列格式化为整数,并包含单个数字,例如20.
我想做的是扩展行pro:220; 300; 4和pep:20
到三行一,pro:220和pep:20,一个与pro:300和pep:20,一个与pro:4和pep:20。
我想为整个数据帧执行此操作,从而最终得到一个包含两个字符格式的列的数据帧,其中所有行最初都包含多个;分隔的数字已被扩展。
我希望避免循环,因为数据框架相当大(> 100000行)
$ b $我很抱歉,我没有能够以更具代表性的方式发布这个...我是新来的,并以代码格式丢失。
在来自西蒙的非常赞赏的请求中:
> dput(head(dat,10))
structure(list(Protein.Group.IDs = structure(c(1095L,60L,299L,
242L,1091L,147L,161L,884L,783L,1040L ),.Label = c(0,
1,10,100,101,102,103,104 ,107,
108,109,11,110,111,112,113,114 b115,116,117,118,119,12,120,121,121; 920; 530,
530; 589,121; 920; 530; 589; 934,121; 920; 589,121; 920; 934,
122; 351,122; 351; 122; 351; 950; 224; 904,122; 351; 950; 687,
122; 901; 224; 904,122; 901; 351 351; 950,122; 901; 351; 950; 224,
122; 901; 351; 950; 224; 890; 904,122; 901; 351; 950; 224; 890; 904; 687,
122; 901; 351; 950; 890; 687,122; 901; 950,122; 901; 950; 904; 687,
950,123,124,125,126,127,127,952,128,
129,13,130 131,131,204,132,133,134,135,
136,137,138,139,14 140,140; 259; 436,141,
142,143,144,1 45,146,147,148,149,15,
150,151,152,153,154 ,156,157,158,
159,16,16; 331,16; 331; 329,16; 16; 331; 329; 910,
16; 331; 329; 910; 62,16; 331; 62,16; 331; 910,160,161 b $ b162,163,164,165,166,166; 743,167,167; 595,
168 ,17,170,170,48,171,172,173,174,
175,176,177 ,179,18,180,181,182,
183,184,185,186,187 188; 813,188; 813; 852,
189,19,19; 14,19; 6; 9; 14; 11,19; 884; 9; 14; 20; 26; 11; 1,
19; 9,19; 9; 14,190,190; 260,191,192 193,194,
195,196,197,198,199,2,20,20; 26,200 b $ b201,202,203,204,205,206,207,208,209,
21 4,210,211,212,213,214,215,216,
217 ,220,221,222,223,224,
224; 890,224; 890; 904,225 1,225; 221; 308,225; 295,
226,227,228,228; 396,228; 396; 73 73,229,
23,23; 137,23; 17; 137,230,231,232,233 b $ b235,236,237,238,239,24,240,241,242,
243,244,245,246,247,248,249,25,
250,251,252 ,256,255,256,257,258,
259,26,260,261 264,265,266,
267,268,269,27,270,271,272,273 541; 905,
273; 905,274,275,276,277,278,279,28,280,
281,281,192,282,283,284,285,286,287,288,
289 ,290,291,292,293,294,295,296,
297,298,299,3 30,300,301,302,303,304,
304,770,305,306,307,308 309,31,310,311,
312,313; 293,314,314; 658,315,316 ,318,
319,32,320,321,322,323,32 4,324; 34; 564; 637; 282; 229; 565,
324; 564; 282,324; 637; 229; 565,325,326 327,328,
328; 586,329,33,330,331,332,333,334,335,
336,337,338,339,34,340,341,342,343,
344 ,346,346,523,347,348,349,35,350,
351 353,353,277,354,355,356,
357,358,359,36,360 362,363,364,
365,366,367,368,369,37,370,371 ,
373,374,375,376,377,377; 938,378,379,38,
,381,382,382; 147,383,384,385,386,387,
388,389 ,39; 417,390,391,392,393,394,
395,396,397,398 ,399; 955,4,40,400,
401,402,403,404,405,406 ,408,409,
41,410,411,412,413,414,415,416,417 b $ b418,419,42,420,421,422,423 4,424,640,b $ b425,426,427,427; 930,428,429,43,430 ,
432,433,434,435,436,437,438,438; 178,439,
,440,441,442,443,444,445,446,447,
448,449,45 450,451,452,453,454,455,
456,457,458,459,46 ,461,462,463,
464,465,466,467,468,469,47 471,b7,47,478,477,471,
472,473,474,475,476 ,481,482,483,484,485,486,487,488,
488; 648,489 ,490,491,492,493,494,495,
496,497,498,499,5 50,500,501,502,503,
504,505,506,507,508,509 ,510,511,
512,513,514,515,516,516; 603; 845,516; 603; 845; 837 ,
517,518,519,52,520,521,522,523,524,
526,527,527; 509,528,529,53,530,531,
532,533,534,535,536,537,538,539,54,
541,542,543,544,545,546,547,
548,549,55 550,549,551,552,553,554,
555,556,557,558,559 560,561,562,
563,564,564; 282,564; 637,565,566,567 ,
568; 569,568; 569; 286,568; 569; 574,568; 569; 574; 286,568; 574,
57,573,574,575,576,
577,578,579 579; 577; 578,579; 577; 580,579; 577; 580; 578,
58,580,581,582 584,585,585; 609,586,
587,587; 167,587; 167; 595,587; 167; 595; 557 588,589,
59,590,591,592,593,594,595,596 $ b598,599,6,60,600,601,601; 10,602,603,
604 606,607,608,609,61,610,611,
612,613,614 615; 269,615; 926; 269,616,
617,61 62,627,628,62,62,62 ,623,636,635,636,637,63 ,638,639,64,
64; 72,640,641,642,643,643; 529,644 645,
646,647,648,649,65,650,651,652,653 ,655,656,657,658,659,66,660,661,
662,663 ,664,665,666,667,668,669,
67,670,671,672 674,675,676,677,
678,679,68,680,681,681; 97,682 683,689,689,689,689,689,698 b693,694,695,696,697,698,699,7,7; 25; 5,
7; ,7,752; 24,7; 752; 25; 24; 8,70,700,701,702,
703,704 705,706,707,708,709,71,710,
711,712,713,714 ,716,717,718,719,
72,72; 746; 944,72; 746; 944; 772 ; 772,72,727,728,727,727,727 729,
73,730,731,732,733,734,735,735; 522,735; 665 $ b735; 665; 522,735; 665; 876,735; 876,735; 876; 522,736,
737,738 74,746,746,742,743,744,
745,746 ,747,748,749,
75,750,751,752,752; 24,753,754 756,
757,758,759,76,76; 313,76; 313; 293,760,761,
762,763,764,765,766,767,768,769,77,
770 772,773,774,775,776,777,778,
779,78,780,781 783,784,785,786,
787,788,789,79 792,793,792,793,
793,863,794,795,796,797 ,
800,801,802,803,804,805,806,807,808,
,809,81,810,811,812,813,814,815 ,
815; 413,815; 777,815; 777; 339,815; 777; 838,815; 838,
816,817 ,818,818; 7; 752,818; 7; 752; 23; 25; 17; 8,819,
82,820,821 822,823,824,824,823,824,824 838,832,833,832,833,834,
835,836,837,838 ,842,
843,844,845,846,847,847; 560; 590,848,849,
852,853,853,420,854,
855,856,857 ,858,858; 638,858; 638; 409,859,
86,860,861,861; 593,862 863,864,865,871,863,864,865,866 872,873,873,
874,875,876,877,878,879,88 b882,883,884,884; 6,884; 6; 9,885,886,887,
888 189,889,89,890,890; 904,891,891; 953,
892,892; 941,893 894\" , 895, 896, 897, 898, 899,
9, 90, 900,901,901,224,902,903,904,905,
906,907,908 91,910,911,912,913,
914,915,916,917 922,921,922,923,924,924,924 ,925,
926,927,928,929,93,930,931,932,933,
934,938,936,937,938,939,94,940,941,
,945,946,947,948,949,95,
950,951,952,953 955,956,957,958,
959,96,960,961,962,963,964 ,966,
967,97,98,99,99; 392),class =factor),Mod..Peptide.ID = c(23L,
24L,25L,26L,27L,29L,30L,31L,32L,33L)),Names = c(Protein.Group.IDs,
Mod..Peptide.ID) ,row.names = c(318L,344L,380L,406L,409L,
417L,436L,462L,494L,505L),class =data.frame)
此致
的Mads
我已经长大了的真正的爱 data.table
对于这样的任务。这是非常简单的但是,首先,我们来做一些样例数据(你应该提供理想的)。
#样本数据
set.seed (1)
df = data.frame(pep = replicate(3,paste(sample(999,3),collapse =;)),pro = sample(3),stringsAsFactors = FALSE)
'/ code>
现在我们使用 data.table
包做重塑在几行...
#负载data.table包
需要(data.table)
#打开data.frame成data.table,它看起来像..
DT< - data.table(DF)
#PEP亲
#1: 266; 372; 572 1
#2:908; 202; 896 3
#3:944; 660; 628 2
# 。
dt [,list(pep = unlist(strsplit(pep,;))),by = pro]
#pro pep
#1:1 266
# 2:1 372
#3:1 572
#4:3 908
#5:3 202
#6:3 896
#7:2 944
#8:2 660
#9:2 628
I am sure there is a simple solution to this, but i am going nuts trying to find it. Any help is very much appreciated.
I have a data frame with 2 columns; "pro" and "pep".pro is formatted as factors and contains entries in the form 220;300;4 sometimes more numbers (seperated by ";") and sometimes just a single number (and no ";").The pep column is formatted as integers and contains single numbers, e.g. 20.What i would like to do is to "expand" e.g. the row pro: 220;300;4 and pep: 20to three rows one with pro: 220 and pep: 20, one with pro: 300 and pep: 20 and one with pro: 4 and pep: 20.
I want to do this for the whole data frame and thus end up with a data frame with two character formatted columns where all the rows originally containing multiple ";" seperated numbers have been expanded.
I would prefer to avoid loops since the data frame is fairly large (>100000 rows)
I am sorry that i havent been able to post this in a more case-representative way...i am new here and got lost in the code format.
On a much appreciated request from simon:
> dput( head( dat , 10 ) )
structure(list(Protein.Group.IDs = structure(c(1095L, 60L, 299L,
242L, 1091L, 147L, 161L, 884L, 783L, 1040L), .Label = c("0",
"1", "10", "100", "101", "102", "103", "104", "105", "106", "107",
"108", "109", "11", "110", "111", "112", "113", "114", "114;680",
"115", "116", "117", "118", "119", "12", "120", "121", "121;920;530",
"121;920;530;589", "121;920;530;589;934", "121;920;589", "121;920;934",
"122;351", "122;351;950", "122;351;950;224;904", "122;351;950;687",
"122;901;224;904", "122;901;351", "122;901;351;950", "122;901;351;950;224",
"122;901;351;950;224;890;904", "122;901;351;950;224;890;904;687",
"122;901;351;950;890;687", "122;901;950", "122;901;950;904;687",
"122;950", "123", "124", "125", "126", "127", "127;952", "128",
"129", "13", "130", "131", "131;204", "132", "133", "134", "135",
"136", "137", "138", "139", "14", "140", "140;259;436", "141",
"142", "143", "144", "145", "146", "147", "148", "149", "15",
"150", "151", "152", "153", "154", "155", "156", "157", "158",
"159", "16", "16;331", "16;331;329", "16;331;329;62", "16;331;329;910",
"16;331;329;910;62", "16;331;62", "16;331;910", "160", "161",
"162", "163", "164", "165", "166", "166;743", "167", "167;595",
"168", "169", "17", "170", "170;48", "171", "172", "173", "174",
"175", "176", "177", "178", "179", "18", "180", "181", "182",
"183", "184", "185", "186", "187", "188", "188;813", "188;813;852",
"189", "19", "19;14", "19;6;9;14;11", "19;884;6;9;14;20;26;11;1",
"19;9", "19;9;14", "190", "190;260", "191", "192", "193", "194",
"195", "196", "197", "198", "199", "2", "20", "20;26", "200",
"201", "202", "203", "204", "205", "206", "207", "208", "209",
"21", "21;4", "210", "211", "212", "213", "214", "215", "216",
"217", "218", "219", "22", "220", "221", "222", "223", "224",
"224;890", "224;890;904", "225", "225;221", "225;221;308", "225;295",
"226", "227", "228", "228;396", "228;396;73", "228;73", "229",
"23", "23;137", "23;17;137", "230", "231", "232", "233", "234",
"235", "236", "237", "238", "239", "24", "240", "241", "242",
"242;171", "243", "244", "245", "246", "247", "248", "249", "25",
"250", "251", "252", "253", "254", "255", "256", "257", "258",
"259", "26", "260", "261", "262", "263", "264", "265", "266",
"267", "268", "269", "27", "270", "271", "272", "273", "273;541;905",
"273;905", "274", "275", "276", "277", "278", "279", "28", "280",
"281", "281;192", "282", "283", "284", "285", "286", "287", "288",
"289", "29", "290", "291", "292", "293", "294", "295", "296",
"297", "298", "299", "3", "30", "300", "301", "302", "303", "304",
"304;770", "305", "306", "307", "308", "309", "31", "310", "311",
"312", "313;293", "314", "314;658", "315", "316", "317", "318",
"319", "32", "320", "321", "322", "323", "324", "324;34;564;637;282;229;565",
"324;564;282", "324;637;229;565", "325", "326", "327", "328",
"328;586", "329", "33", "330", "331", "332", "333", "334", "335",
"336", "337", "338", "339", "34", "340", "341", "342", "343",
"344", "345", "346", "346;523", "347", "348", "349", "35", "350",
"351", "351;890", "352", "353", "353;277", "354", "355", "356",
"357", "358", "359", "36", "360", "361", "362", "363", "364",
"365", "366", "367", "368", "369", "37", "370", "371", "372",
"373", "374", "375", "376", "377", "377;938", "378", "379", "38",
"380", "381", "382", "382;147", "383", "384", "385", "386", "387",
"388", "389", "39", "39;417", "390", "391", "392", "393", "394",
"395", "396", "397", "398", "399", "399;955", "4", "40", "400",
"401", "402", "403", "404", "405", "406", "407", "408", "409",
"41", "410", "411", "412", "413", "414", "415", "416", "417",
"418", "419", "42", "420", "421", "422", "423", "424", "424;640",
"425", "426", "427", "427;930", "428", "429", "43", "430", "431",
"432", "433", "434", "435", "436", "437", "438", "438;178", "439",
"44", "440", "441", "442", "443", "444", "445", "446", "447",
"448", "449", "45", "450", "451", "452", "453", "454", "455",
"456", "457", "458", "459", "46", "460", "461", "462", "463",
"464", "465", "466", "467", "468", "469", "47", "470", "471",
"472", "473", "474", "475", "476", "477", "478", "479", "48",
"480", "481", "482", "483", "484", "485", "486", "487", "488",
"488;648", "489", "49", "490", "491", "492", "493", "494", "495",
"496", "497", "498", "499", "5", "50", "500", "501", "502", "503",
"504", "505", "506", "507", "508", "509", "51", "510", "511",
"512", "513", "514", "515", "516", "516;603;845", "516;603;845;837",
"517", "518", "519", "52", "520", "521", "522", "523", "524",
"525", "526", "527", "527;509", "528", "529", "53", "530", "531",
"532", "533", "534", "535", "536", "537", "538", "539", "54",
"540", "540;67", "541", "542", "543", "544", "545", "546", "547",
"548", "549", "55", "550", "550;549", "551", "552", "553", "554",
"555", "556", "557", "558", "559", "56", "560", "561", "562",
"563", "564", "564;282", "564;637", "565", "566", "567", "568",
"568;569", "568;569;286", "568;569;574", "568;569;574;286", "568;574",
"569", "57", "570", "571", "572", "573", "574", "575", "576",
"577", "578", "579", "579;577;578", "579;577;580", "579;577;580;578",
"58", "580", "581", "582", "583", "584", "585", "585;609", "586",
"587", "587;167", "587;167;595", "587;167;595;557", "588", "589",
"59", "590", "591", "592", "593", "594", "595", "596", "597",
"598", "599", "6", "60", "600", "601", "601;10", "602", "603",
"604", "605", "606", "607", "608", "609", "61", "610", "611",
"612", "613", "614", "615", "615;269", "615;926;269", "616",
"617", "618", "619", "62", "620", "621", "622", "623", "624",
"625", "626", "627", "628", "629", "63", "63;397", "630", "631",
"632", "633", "634", "635", "636", "637", "638", "639", "64",
"64;72", "640", "641", "642", "643", "643;529", "644", "645",
"646", "647", "648", "649", "65", "650", "651", "652", "653",
"654", "655", "656", "657", "658", "659", "66", "660", "661",
"662", "663", "663;819", "664", "665", "666", "667", "668", "669",
"67", "670", "671", "672", "673", "674", "675", "676", "677",
"678", "679", "68", "680", "681", "681;97", "682", "683", "684",
"685", "686", "687", "688", "689", "69", "690", "691", "692",
"693", "694", "695", "696", "697", "698", "699", "7", "7;25;5",
"7;752", "7;752;24", "7;752;25;24;8", "70", "700", "701", "702",
"703", "704", "705", "706", "707", "708", "709", "71", "710",
"711", "712", "713", "714", "715", "716", "717", "718", "719",
"72", "72;746;944", "72;746;944;772", "72;772", "72;927", "720",
"721", "722", "723", "724", "725", "726", "727", "728", "729",
"73", "730", "731", "732", "733", "734", "735", "735;522", "735;665",
"735;665;522", "735;665;876", "735;876", "735;876;522", "736",
"737", "738", "739", "74", "740", "741", "742", "743", "744",
"745", "746", "746;944", "746;944;772", "747", "748", "749",
"75", "750", "751", "752", "752;24", "753", "754", "755", "756",
"757", "758", "759", "76", "76;313", "76;313;293", "760", "761",
"762", "763", "764", "765", "766", "767", "768", "769", "77",
"770", "771", "772", "773", "774", "775", "776", "777", "778",
"779", "78", "780", "781", "782", "783", "784", "785", "786",
"787", "788", "789", "79", "790", "790;552", "791", "792", "793",
"793;863", "794", "795", "796", "797", "798", "799", "8", "80",
"800", "801", "802", "803", "804", "805", "806", "807", "808",
"808;21", "809", "81", "810", "811", "812", "813", "814", "815",
"815;413", "815;777", "815;777;339", "815;777;838", "815;838",
"816", "817", "818", "818;7;752", "818;7;752;23;25;17;8", "819",
"82", "820", "821", "822", "823", "824", "824;957", "825", "826",
"827", "828", "829", "83", "830", "831", "832", "833", "834",
"835", "836", "837", "838", "839", "84", "840", "841", "842",
"843", "844", "845", "846", "847", "847;560;590", "848", "849",
"85", "850", "850;817", "851", "852", "853", "853;420", "854",
"855", "856", "857", "858", "858;638", "858;638;409", "859",
"86", "860", "861", "861;593", "862", "863", "864", "865", "866",
"867", "868", "869", "869;614", "87", "870", "871", "872", "873",
"874", "875", "876", "877", "878", "879", "88", "880", "881",
"882", "883", "884", "884;6", "884;6;9", "885", "886", "887",
"888", "888;189", "889", "89", "890", "890;904", "891", "891;953",
"892", "892;941", "893", "894", "895", "896", "897", "898", "899",
"9", "90", "900", "901", "901;224", "902", "903", "904", "905",
"906", "907", "908", "909", "91", "910", "911", "912", "913",
"914", "915", "916", "917", "918", "918;947", "919", "92", "920;530;589",
"920;530;589;934", "921", "922", "923", "924", "924;576", "925",
"926", "927", "928", "929", "93", "930", "931", "932", "933",
"934", "935", "936", "937", "938", "939", "94", "940", "941",
"942", "943", "944", "945", "946", "947", "948", "949", "95",
"950", "951", "952", "953", "954", "955", "956", "957", "958",
"959", "96", "960", "961", "962", "963", "964", "965", "966",
"967", "97", "98", "99", "99;392"), class = "factor"), Mod..Peptide.ID = c(23L,
24L, 25L, 26L, 27L, 29L, 30L, 31L, 32L, 33L)), .Names = c("Protein.Group.IDs",
"Mod..Peptide.ID"), row.names = c(318L, 344L, 380L, 406L, 409L,
417L, 436L, 462L, 494L, 505L), class = "data.frame")
Kind RegardsMads
I've grown to really love data.table
for this kind of task. It is so very simple. But first, let's make some sample data (which you should provide idealy!)
# Sample data
set.seed(1)
df = data.frame( pep = replicate( 3 , paste( sample(999,3) , collapse=";") ) , pro = sample(3) , stringsAsFactors = FALSE )
Now we use the data.table
package to do the reshaping in a couple of lines...
# Load data.table package
require(data.table)
# Turn data.frame into data.table, which looks like..
dt <- data.table(df)
# pep pro
#1: 266;372;572 1
#2: 908;202;896 3
#3: 944;660;628 2
# Transform it in one line like this...
dt[ , list( pep = unlist( strsplit( pep , ";" ) ) ) , by = pro ]
# pro pep
#1: 1 266
#2: 1 372
#3: 1 572
#4: 3 908
#5: 3 202
#6: 3 896
#7: 2 944
#8: 2 660
#9: 2 628
这篇关于R:如何展开包含“列表”的行到几行...每个列表成员一个?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!