Question
What would be the most primitive way of parsing a tab-separated file in Java so that the tabular data does not lose its structure? I am not looking for a way to do it with Bean or Jsoup, since I am a beginner and not familiar with them. I need advice on the logic behind it and on an efficient way to do it. For example, suppose I have a table like this:
ID reference | Identifier | Type 1 | Type 2 | Type 3 |
1 | red#01 | 15% | 20% | 10% |
2 | yellow#08 | 13% | 20% | 10% |
Correction: in this example I have Types 1 to 3, but my question applies to any number N of types.
Can I parse the table using just arrays, or is there a different data structure in Java that would be better for this task? This is how I think I should do it:
- Scan/read the first line, splitting at "\t", and create a String array.
- Split that array into sub-arrays of one table heading per sub-array.
- Then start reading the next line of the table and, for each sub-array, add the corresponding values from the columns.
Does this plan sound right, or am I overcomplicating things or getting it completely wrong? Is there an easier way to do it? (Bear in mind that I still don't know how to split arrays into sub-arrays or how to populate the sub-arrays with the values from the table.)
Recommended answer
I would strongly suggest you use a flat-file parsing library for this, like the excellent OpenCSV.
Failing that, here is a solution in Java 8.
First, create a class to represent your data:
static class Bean {
    private final int id;
    private final String name;
    private final List<Integer> types;

    public Bean(int id, String name, List<Integer> types) {
        this.id = id;
        this.name = name;
        this.types = types;
    }

    //getters
}
Your suggestion to use various lists is very much a scripting approach. Java is object-oriented, so you should use that to your advantage.
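For readers who want to see the class without the elided getters, here is a minimal sketch of how it might be filled in. The getter names, the defensive copy of the list, and the toString format are assumptions added for illustration; the original answer only says "//getters".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative completion of the Bean class; getter names are assumed.
class Bean {
    private final int id;
    private final String name;
    private final List<Integer> types;

    Bean(int id, String name, List<Integer> types) {
        this.id = id;
        this.name = name;
        // Copy the list so later changes to the caller's list do not affect this row.
        this.types = Collections.unmodifiableList(new ArrayList<>(types));
    }

    int getId() { return id; }

    String getName() { return name; }

    List<Integer> getTypes() { return types; }

    @Override
    public String toString() {
        return "Bean{id=" + id + ", name=" + name + ", types=" + types + "}";
    }

    public static void main(String[] args) {
        Bean row = new Bean(1, "red#01", Arrays.asList(15, 20, 10));
        System.out.println(row); // prints Bean{id=1, name=red#01, types=[15, 20, 10]}
    }
}
```

Because the types are kept in a List, the same class works for any number of Type columns, which is what the correction in the question asks for.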
Now we just need to parse the file:
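// Needs: java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths,
// java.util.Arrays, java.util.List, java.util.stream.Collectors, java.util.stream.Stream
public static void main(final String[] args) throws Exception {
    final Path path = Paths.get("path", "to", "file.tsv");
    final List<Bean> parsed;
    try (final Stream<String> lines = Files.lines(path)) {
        parsed = lines
                .skip(1) // skip the header row
                .map(line -> line.split("\\s*\\|\\s*")) // split on '|', dropping surrounding whitespace
                .map(fields -> {
                    final int id = Integer.parseInt(fields[0]);
                    final String name = fields[1];
                    // every remaining column is a type; strip non-digits such as '%'
                    final List<Integer> types = Arrays.stream(fields)
                            .skip(2)
                            .map(t -> Integer.parseInt(t.replaceAll("\\D", "")))
                            .collect(Collectors.toList());
                    return new Bean(id, name, types);
                })
                .collect(Collectors.toList());
    }
}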
In essence, the code skips the first line, then loops over the remaining lines in the file and, for each line:
- Split the line on the delimiter, which seems to be |. String.split takes a regex, so you need to escape the pipe because it is a special character. The pattern also consumes any whitespace before and after the delimiter.
- Create a new Bean for each line by parsing the array elements:
  - First, parse the id to an int.
  - Next, get the name.
  - Finally, get a Stream of the line's fields, skip the first two elements, and parse the remaining ones into a List<Integer>.
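If streams and lambdas are unfamiliar, the same idea can be sketched with plain loops and arrays, which is closer to the plan in the question. The sketch below assumes the file really is tab-separated (the stream version above splits on | because that is how the sample is displayed). The TsvTable class and its parse and cell methods are hypothetical names made up for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps the header and each row as String arrays.
class TsvTable {
    final String[] header;                          // column names from the first line
    final List<String[]> rows = new ArrayList<>();  // one String[] per data line

    TsvTable(String[] header) {
        this.header = header;
    }

    // Builds a table from lines of text, e.g. the result of Files.readAllLines(path).
    static TsvTable parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no header line");
        }
        // A limit of -1 keeps trailing empty cells, so columns stay aligned.
        TsvTable table = new TsvTable(lines.get(0).split("\t", -1));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            table.rows.add(line.split("\t", -1));
        }
        return table;
    }

    // Looks up one cell by row index and column name.
    String cell(int row, String column) {
        int col = Arrays.asList(header).indexOf(column);
        if (col < 0) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        return rows.get(row)[col];
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ID reference\tIdentifier\tType 1\tType 2\tType 3",
                "1\tred#01\t15%\t20%\t10%",
                "2\tyellow#08\t13%\t20%\t10%");
        TsvTable table = parse(lines);
        System.out.println(Arrays.toString(table.header)); // [ID reference, Identifier, Type 1, Type 2, Type 3]
        System.out.println(table.cell(1, "Type 1"));       // 13%
    }
}
```

There is no need to split the header into one sub-array per heading: the position of a value in each row array already tells you which column it belongs to, so a single header array plus a list of row arrays keeps the structure for any number of Type columns.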