Problem description
I'm building an XLSX processor that transforms an XLSX file into a CSV file. Because the files can get quite big, I'm using the event-based approach with XSSFSheetXMLHandler.
This works perfectly well, but my XLSX files contain long numbers (13 digits) that are unique identification numbers, not real numbers. When I run my code on a Windows machine it extracts the numbers correctly, but on a Linux machine it converts them to E-notation.
For example, the source value is 7401075293087. On Windows this is correctly extracted into my CSV, but on Linux the value comes through as 7.40108E+12.
The problem with XSSFSheetXMLHandler is that it reads the XLSX under the covers and then fires events that are handled by a SheetContentsHandler that you need to implement. One of the methods in SheetContentsHandler is a cell method with the signature: cell(String cellReference, String formattedValue, XSSFComment comment)
As you can see, this method already receives the formatted cell value (so in my case it receives "7.40108E+12"). All the rest of the logic happens under the covers.
Based on my investigations, I believe the solution lies in defining a custom DataFormatter that specifically treats 13-digit integers as strings instead of formatting them in E-notation.
Unfortunately, my plan didn't work as expected and I couldn't find any help online. I tried the following in the processSheet method:
Locale locale = new Locale.Builder().setLanguage("en").setRegion("ZA").build();
DataFormatter formatter = new DataFormatter(locale);
Format format = new MessageFormat("{0,number,full}");
formatter.addFormat("#############", format);
Here's an extract of my code.
Main code:
public void process(String Filename) throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
    ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
    XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
    StylesTable styles = xssfReader.getStylesTable();
    XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
    while (iter.hasNext()) {
        InputStream stream = iter.next();
        String sheetName = iter.getSheetName();
        outStream = new FileOutputStream(Filename);
        logger.info(sheetName);
        this.output = new PrintWriter(Filename);
        processSheet(styles, strings, new SheetToCSV(), stream);
        logger.info("Done with Sheet :" + sheetName);
        output.flush();
        stream.close();
        outStream.close();
        output.close();
        ++index;
    }
}

public void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler, InputStream sheetInputStream)
        throws IOException, ParserConfigurationException, SAXException {
    InputSource sheetSource = new InputSource(sheetInputStream);
    try {
        XMLReader sheetParser = SAXHelper.newXMLReader();
        ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
        sheetParser.setContentHandler(handler);
        sheetParser.parse(sheetSource);
    } catch (ParserConfigurationException e) {
        throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
    }
}
Here is the custom handler:
private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
        for (int i = 0; i < number; i++) {
            for (int j = 0; j < minColumns; j++) {
                output.append(',');
            }
            output.append('\n');
        }
    }

    public void startRow(int rowNum) {
        // If there were gaps, output the missing rows
        outputMissingRows(rowNum - currentRow - 1);
        // Prepare for this row
        firstCellOfRow = true;
        currentRow = rowNum;
        currentCol = -1;
    }

    public void endRow(int rowNum) {
        // Ensure the minimum number of columns
        for (int i = currentCol; i < minColumns; i++) {
            output.append(',');
        }
        output.append('\n');
    }

    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        logger.info("CellRef :: Formatted Value :" + cellReference + " :: " + formattedValue);
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }
        // gracefully handle missing CellRef here in a similar way as XSSFCell does
        if (cellReference == null) {
            cellReference = new CellRangeAddress(currentRow, currentCol, currentCol, currentCol).formatAsString();
        }
        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i = 0; i < missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;
        // Number or string?
        try {
            Double.parseDouble(formattedValue);
            output.append(formattedValue);
        } catch (NumberFormatException e) {
            //formattedValue = formattedValue.replaceAll("\\t", "");
            //formattedValue = formattedValue.replaceAll("\\n", "");
            //formattedValue = formattedValue.trim();
            output.append('"');
            output.append(formattedValue.replace("\"", "\\\"").trim());
            output.append('"');
        }
    }

    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Skip, no headers or footers in CSV
    }

    @Override
    public void ovveriddenFormat(String celRef, int formatIndex,
            String formatedString) {
        // TODO Auto-generated method stub
    }
}
Recommended answer
I cannot reproduce this if the file is generated using Excel and the cells containing the 13-digit numbers are formatted using number format 0 or #, not General.
But what is meant by "running on a Linux machine"? If I create the *.xlsx file using LibreOffice Calc, with the cells containing the 13-digit numbers formatted using number format General, then Calc will show them as 13-digit numbers but Excel will not. To show the 13-digit numbers in Excel, the cells must be formatted using number format 0 or #.
The Apache POI DataFormatter is made to work like Excel would. And Excel shows values of 12 digits or more in scientific notation when they are formatted using General.
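Because the handler only ever sees the already formatted string, the lost digits cannot be recovered afterwards in the cell method. The following is a minimal sketch using only java.text and java.math, not POI's actual implementation: the pattern "0.#####E0" and the class and method names are assumptions chosen to mimic the shape of the output reported in the question.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class ScientificLossSketch {

    // Hypothetical stand-in for a "General"-like scientific rendering:
    // one integer digit, at most five fraction digits, then the exponent.
    static String scientific(double value) {
        DecimalFormat fmt = new DecimalFormat(
                "0.#####E0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return fmt.format(value);
    }

    // Expands a scientific-notation string back into plain digits.
    // Digits that were rounded away during formatting come back as zeros.
    static String expandBack(String formatted) {
        return new BigDecimal(formatted).toPlainString();
    }

    public static void main(String[] args) {
        String formatted = scientific(7401075293087d);
        System.out.println(formatted);
        System.out.println(expandBack(formatted));
    }
}
```

The expanded value no longer matches the original identifier, so the fix has to happen in the formatter before the value reaches the handler.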
You could change this behavior using:
...
public void processSheet(
        StylesTable styles,
        ReadOnlySharedStringsTable strings,
        SheetContentsHandler sheetHandler,
        InputStream sheetInputStream) throws IOException, SAXException {
    DataFormatter formatter = new DataFormatter();
    formatter.addFormat("General", new java.text.DecimalFormat("#.###############"));
...
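The effect of the DecimalFormat pattern used above can be checked without POI. This is a small stand-alone sketch, assuming only the JDK: the class and method names are hypothetical, and Locale.ROOT symbols are fixed here so that the decimal separator does not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PlainNumberSketch {

    // Same pattern as registered for "General" above: no grouping,
    // no exponent, and up to 15 optional fraction digits.
    static String formatPlain(double value) {
        DecimalFormat plain = new DecimalFormat(
                "#.###############", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return plain.format(value);
    }

    public static void main(String[] args) {
        // A 13-digit identifier stored as a numeric cell value
        System.out.println(formatPlain(7401075293087d)); // prints 7401075293087
        // Fractional values keep their fraction digits
        System.out.println(formatPlain(1.5d));           // prints 1.5
    }
}
```

Since the pattern contains no exponent part, a 13-digit value is written out digit by digit, which is what the CSV export needs for identifiers.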