I'm trying to use fread to read genomic alignments into a data.table in R. Here is a snapshot of the alignment file:

USI-EAS28:1:100:1786:674#0/1    +   1_maternal  68326824      CTCAATTATACTGAAAGAAACACAATATATCATA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1786:940#0/1    +   16_maternal 11407541    CTATTAGTGACCTGCTGTGGGACCTTGGGATGGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1786:705#0/1    +   1_maternal  63849584    CTGAGGGTTTGTGTCAGGAAGGGGTGTGGAATTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:T>C
USI-EAS28:1:100:1786:1168#0/1   -   5_maternal  31381649    GCATCATTCATGAAACAATTTTCAAGAGAGGAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1787:582#0/1   +   10_maternal 54587781    CTACAATAATAATAGGGGACTAAAACACCCCACT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1787:62#0/1    +   10_maternal 70390747     CTATTTGCTACTGAATTGTTAATTTTAAAACAGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:573#0/1   -   7_maternal  92583837     CACTGTCAACATTAGACAGACCAATGAGACAAAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:854#0/1   +   7_maternal  129611206    GTTTGTTTTTTTTTTTGAGATGGAGTCTCATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   32:C>T
USI-EAS28:1:100:1788:185#0/1   -   13_maternal 23694307    CAAACAAACTCAAAATGGACTATCGACTGAAAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0
USI-EAS28:1:100:1788:1339#0/1  -   13_maternal 33699510    TTAACTCTAGTTTTTAGGGATTGCAAATTAGACG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:A>G

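The second column reports the strand each read aligned to (+ for forward, - for reverse). Unfortunately, fread tries to read this column as integer and always assigns it the value 0. The column should instead be read as character, or even as a boolean. Trying the sep and sep2 arguments didn't help.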

Best answer

Thanks for reporting. This is now fixed in v1.8.9, revision 849. A + or - is now treated as character, and a test has been added.

By the way, we also intend to add colClasses so that you can override the column types fread detects. The to-do list for fread is at the top of the source file here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/src/fread.c?view=markup&root=datatable
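
For readers on a later data.table release: fread has since gained a colClasses argument, so the strand column can be forced to character regardless of what type detection decides. The following is only a minimal sketch, assuming a recent data.table version that supports the `text` and `colClasses` arguments. The two rows are taken from the snapshot in the question, the tab separator is an assumption about the file, and the column names are hypothetical, chosen for illustration.

```r
library(data.table)

# Two rows from the snapshot in the question, assumed to be tab-separated.
lines <- c(
  "USI-EAS28:1:100:1786:674#0/1\t+\t1_maternal\t68326824\tCTCAATTATACTGAAAGAAACACAATATATCATA\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\t0",
  "USI-EAS28:1:100:1786:1168#0/1\t-\t5_maternal\t31381649\tGCATCATTCATGAAACAATTTTCAAGAGAGGAAA\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\t0"
)

aln <- fread(
  text = lines,
  sep = "\t",
  header = FALSE,
  # Hypothetical column names; the file itself has no header row.
  col.names = c("read", "strand", "chrom", "pos", "seq", "qual", "mismatches"),
  # Force column 2 (the strand) to character instead of relying on detection.
  colClasses = list(character = 2)
)

# Optional: derive the boolean the question asks about (TRUE = forward strand).
aln[, forward := strand == "+"]
```

For the full file, which has an optional trailing mismatch column on some rows, `fill = TRUE` may also be needed so that shorter rows are padded rather than rejected.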

This question, about reading a strand (+, -) column with fread from the data.table package, comes from Stack Overflow: https://stackoverflow.com/questions/15388714/
