本文介绍了Java:Apache Poi 的 excel 到 csv 日期转换问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当使用以下实用程序将大型 excel 文件转换为 csv 时,由于 excel 单元格格式定义为 *format,某些日期值转换不正确.

代码位置:https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java

/* ====================================================================根据一项或多项授权给 Apache 软件基金会 (ASF)贡献者许可协议.请参阅随附的 NOTICE 文件这项工作以获取有关版权所有权的其他信息.ASF 根据 Apache 许可,版本 2.0 向您授予此文件的许可(许可证");除非符合以下规定,否则您不得使用此文件许可证.您可以在以下网址获取许可证副本http://www.apache.org/licenses/LICENSE-2.0除非适用法律要求或书面同意,否则软件根据许可证分发的分发是按原样"的基础分发的,不作任何形式的明示或暗示的保证或条件.请参阅许可证以了解管理权限的特定语言和许可证下的限制.====================================================================== */包 org.apache.poi.xssf.eventusermodel;导入 java.io.File;导入 java.io.IOException;导入 java.io.InputStream;导入 java.io.PrintStream;导入 javax.xml.parsers.ParserConfigurationException;导入 org.apache.poi.openxml4j.exceptions.OpenXML4JException;导入 org.apache.poi.openxml4j.opc.OPCPackage;导入 org.apache.poi.openxml4j.opc.PackageAccess;导入 org.apache.poi.ss.usermodel.DataFormatter;导入 org.apache.poi.ss.util.CellAddress;导入 org.apache.poi.ss.util.CellReference;导入 org.apache.poi.util.SAXHelper;导入 org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;导入 org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;导入 org.apache.poi.xssf.model.StylesTable;导入 org.apache.poi.xssf.usermodel.XSSFComment;导入 org.xml.sax.ContentHandler;导入 org.xml.sax.InputSource;导入 org.xml.sax.SAXException;导入 org.xml.sax.XMLReader;/*** 一个基本的 XLSX ->CSV 处理器以* POI 示例程序 XLS2CSVmra 来自包* org.apache.poi.hssf.eventusermodel.examples.* 与 HSSF 版本一样,这会尝试发现缺失* 行和单元格,并为它们输出空条目.* <p/>* 使用 SAX 解析器读取数据表以保持* 内存占用比较小,所以这个应该是* 能够阅读大量的工作簿.样式表和* 共享字符串表必须保存在内存中.这* 使用标准 POI 样式表类,但自定义*(只读)类用于共享字符串表* 因为标准的 POI SharedStringsTable 增长得非常快* 快速获取唯一字符串的数量.* <p/>* 更高级的 SAX 事件解析实现* XLSX 文件,请参阅 {@link XSSFEventBasedExcelExtractor}* 和 {@link XSSFSheetXMLHandler}.请注意,在许多情况下,* 可以简单地使用具有自定义的那些* {@link SheetContentsHandler} 并且不需要 SAX 代码* 你自己!*/公共类 XLSX2CSV {/*** 使用 XSSF Event SAX 助手完成大部分工作* 解析Sheet XML,并输出内容* 作为(基本)CSV.*/私有类 SheetToCSV 实现 SheetContentsHandler {私有布尔 firstCellOfRow = false;私有 int currentRow = -1;私人 int currentCol = -1;private void outputMissingRows(int number) {for (int i=0; iCSV转换器** @param pkg 要处理的 XLSX 包* @param output 将 CSV 输出到的 PrintStream* @param minColumns 要输出的最小列数,或 -1 表示没有最小值*/公共 XLSX2CSV(OPCPackage pkg,PrintStream 输出,int minColumns){this.xlsxPackage = pkg;this.output = 输出;this.minColumns = minColumns;}/*** 解析并显示一张表的内容* 使用指定的样式和共享字符串表.** @param 样式表中的单元格可能引用的样式表* @param strings 工作表中单元格可能引用的字符串表* @param sheetInputStream 要从中读取工作表数据的流.* @exception java.io.IOException 来自解析器的 IO 异常,* 可能来自字节流或字符流* 由应用程序提供.* @throws SAXException 如果解析 XML 数据失败.*/public void processSheet(样式表样式,ReadOnlySharedStringsTable 字符串,SheetContentsHandler sheetHandler,InputStream sheetInputStream) 抛出 IOException, SAXException {DataFormatter formatter = new DataFormatter();InputSource sheetSource = new InputSource(sheetInputStream);尝试 {XMLReader sheetParser = SAXHelper.newXMLReader();ContentHandler handler = new XSSFSheetXMLHandler(样式,空,字符串,sheetHandler,格式化程序,假);sheetParser.setContentHandler(handler);sheetParser.parse(sheetSource);} catch(ParserConfigurationException e) {throw new RuntimeException("SAX 解析器似乎坏了 - " + e.getMessage());}}/*** 开始将 XLS 工作簿文件处理为 CSV.** @throws IOException 如果从包中读取数据失败.* @throws SAXException 如果解析 XML 数据失败.*/public void process() 抛出 IOException、OpenXML4JException、SAXException {ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);StylesTable 样式 = xssfReader.getStylesTable();XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();整数索引 = 0;而 (iter.hasNext()) {InputStream stream = iter.next();String sheetName = iter.getSheetName();this.output.println();this.output.println(sheetName + " [index=" + index + "]:");processSheet(样式,字符串,新的 SheetToCSV(),流);流.关闭();++索引;}}public static void main(String[] args) 抛出异常 {如果 (args.length < 1) {System.err.println("使用:");System.err.println(" XLSX2CSV  [最小列数]");返回;}File xlsxFile = new File(args[0]);如果 (!xlsxFile.exists()) {System.err.println("未找到或不是文件:" + xlsxFile.getPath());返回;}int minColumns = -1;如果 (args.length >= 2)minColumns = Integer.parseInt(args[1]);//包打开是即时的,应该是这样.OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);XLSX2CSV xlsx2csv = 新的 XLSX2CSV(p, System.out, minColumns);xlsx2csv.process();p.close();}}

输入格式图片 - 点击此处查看日期格式

输入:

date1 date21/1/1900 1/1/19002/28/2012 2/28/20123/15/1965 3/15/19652000 年 1 月 1 日 2000 年 1 月 1 日1/1/2100 1/1/21001/1/2115 1/1/2115

输出:

date1 date21/1/2000 1/1/19002/28/2012 2/28/20123/15/1965 3/15/19652000 年 1 月 1 日 2000 年 1 月 1 日1/1/2000 1/1/21001/1/2015 1/1/2115

如果您查看输入数据,单元格(即 date1 列)的格式是使用区域设置的星号,具有该格式的单元格会受到影响,因为 1900 被转换为 2000,因此任何高于 2099 的数据...如果单元格(即 Date2 列)的格式不带 *,然后值会按预期输出.这是此实用程序的限制还是有解决方法?

如果我在不应用区域设置格式的情况下以另一种方式格式化单元格,我会得到正确的输出.

解决方案

无法使用 apache poi 3.15 final 重现该行为.

您显示的代码产生以下输出:

Sheet1 [index=0]:日期1",日期2"1/1/00"、1/1/1900"2/28/12"、2/28/2012"3/15/65"、3/15/1965"1/1/00"、1/1/2000"1/1/00"、1/1/2100"1/1/15"、1/1/2115"

因此对于默认日期格式(格式 ID = 0xe = 短日期 = *3/14/2012)使用 m/d/yy.这是在 BuiltinFormats.

当然,如果您将该输出直接放入 CSV 文件中,则不存在世纪,Excel 在打开此 CSV 文件时必须猜测世纪.

您可以稍微更改代码,例如:

...public void processSheet(样式表样式,ReadOnlySharedStringsTable 字符串,SheetContentsHandler sheetHandler,InputStream sheetInputStream) 抛出 IOException, SAXException {DataFormatter 格式化程序 = new DataFormatter() {@覆盖public String formatRawCellContents(double value, int formatIndex, String formatString, boolean use1904Windowing) {if ("m/d/yy".equals(formatString)) formatString = "m/d/yyyy";返回 super.formatRawCellContents(value, formatIndex, formatString, use1904Windowing);}};InputSource sheetSource = new InputSource(sheetInputStream);尝试 {XMLReader sheetParser = SAXHelper.newXMLReader();ContentHandler handler = new XSSFSheetXMLHandler(样式,空,字符串,sheetHandler,格式化程序,假);sheetParser.setContentHandler(handler);sheetParser.parse(sheetSource);} catch(ParserConfigurationException e) {throw new RuntimeException("SAX 解析器似乎坏了 - " + e.getMessage());}}...

这将产生以下输出:

Sheet1 [index=0]:日期1",日期2"1/1/1900"、1/1/1900"2/28/2012"、2/28/2012"3/15/1965"、3/15/1965"2000 年 1 月 1 日"、2000 年 1 月 1 日"1/1/2100"、1/1/2100"1/1/2115"、1/1/2115"

When using below utility to convert large excel files to csv, some date values are converted incorrectly due to excel cell format defined as *format.

Code Location : https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java

/* ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================== */

package org.apache.poi.xssf.eventusermodel;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.util.SAXHelper;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/**
 * A rudimentary XLSX -> CSV processor modeled on the
 * POI sample program XLS2CSVmra from the package
 * org.apache.poi.hssf.eventusermodel.examples.
 * As with the HSSF version, this tries to spot missing
 *  rows and cells, and output empty entries for them.
 * <p/>
 * Data sheets are read using a SAX parser to keep the
 * memory footprint relatively small, so this should be
 * able to read enormous workbooks.  The styles table and
 * the shared-string table must be kept in memory.  The
 * standard POI styles table class is used, but a custom
 * (read-only) class is used for the shared string table
 * because the standard POI SharedStringsTable grows very
 * quickly with the number of unique strings.
 * <p/>
 * For a more advanced implementation of SAX event parsing
 * of XLSX files, see {@link XSSFEventBasedExcelExtractor}
 * and {@link XSSFSheetXMLHandler}. Note that for many cases,
 * it may be possible to simply use those with a custom
 * {@link SheetContentsHandler} and no SAX code needed of
 * your own!
 */
public class XLSX2CSV {
    /**
     * Uses the XSSF Event SAX helpers to do most of the work
     *  of parsing the Sheet XML, and outputs the contents
     *  as a (basic) CSV.
     */
    private class SheetToCSV implements SheetContentsHandler {
        private boolean firstCellOfRow = false;
        private int currentRow = -1;
        private int currentCol = -1;

        private void outputMissingRows(int number) {
            for (int i=0; i<number; i++) {
                for (int j=0; j<minColumns; j++) {
                    output.append(',');
                }
                output.append('\n');
            }
        }

        @Override
        public void startRow(int rowNum) {
            // If there were gaps, output the missing rows
            outputMissingRows(rowNum-currentRow-1);
            // Prepare for this row
            firstCellOfRow = true;
            currentRow = rowNum;
            currentCol = -1;
        }

        @Override
        public void endRow(int rowNum) {
            // Ensure the minimum number of columns
            for (int i=currentCol; i<minColumns; i++) {
                output.append(',');
            }
            output.append('\n');
        }

        @Override
        public void cell(String cellReference, String formattedValue,
                XSSFComment comment) {
            if (firstCellOfRow) {
                firstCellOfRow = false;
            } else {
                output.append(',');
            }

            // gracefully handle missing CellRef here in a similar way as XSSFCell does
            if(cellReference == null) {
                cellReference = new CellAddress(currentRow, currentCol).formatAsString();
            }

            // Did we miss any cells?
            int thisCol = (new CellReference(cellReference)).getCol();
            int missedCols = thisCol - currentCol - 1;
            for (int i=0; i<missedCols; i++) {
                output.append(',');
            }
            currentCol = thisCol;

            // Number or string?
            try {
                //noinspection ResultOfMethodCallIgnored
                Double.parseDouble(formattedValue);
                output.append(formattedValue);
            } catch (NumberFormatException e) {
                output.append('"');
                output.append(formattedValue);
                output.append('"');
            }
        }

        @Override
        public void headerFooter(String text, boolean isHeader, String tagName) {
            // Skip, no headers or footers in CSV
        }
    }


    ///////////////////////////////////////

    private final OPCPackage xlsxPackage;

    /**
     * Number of columns to read starting with leftmost
     */
    private final int minColumns;

    /**
     * Destination for data
     */
    private final PrintStream output;

    /**
     * Creates a new XLSX -> CSV converter
     *
     * @param pkg        The XLSX package to process
     * @param output     The PrintStream to output the CSV to
     * @param minColumns The minimum number of columns to output, or -1 for no minimum
     */
    public XLSX2CSV(OPCPackage pkg, PrintStream output, int minColumns) {
        this.xlsxPackage = pkg;
        this.output = output;
        this.minColumns = minColumns;
    }

    /**
     * Parses and shows the content of one sheet
     * using the specified styles and shared-strings tables.
     *
     * @param styles The table of styles that may be referenced by cells in the sheet
     * @param strings The table of strings that may be referenced by cells in the sheet
     * @param sheetInputStream The stream to read the sheet-data from.

     * @exception java.io.IOException An IO exception from the parser,
     *            possibly from a byte stream or character stream
     *            supplied by the application.
     * @throws SAXException if parsing the XML data fails.
     */
    public void processSheet(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetHandler,
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        InputSource sheetSource = new InputSource(sheetInputStream);
        try {
            XMLReader sheetParser = SAXHelper.newXMLReader();
            ContentHandler handler = new XSSFSheetXMLHandler(
                  styles, null, strings, sheetHandler, formatter, false);
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
         } catch(ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
         }
    }

    /**
     * Initiates the processing of the XLS workbook file to CSV.
     *
     * @throws IOException If reading the data from the package fails.
     * @throws SAXException if parsing the XML data fails.
     */
    public void process() throws IOException, OpenXML4JException, SAXException {
        ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
        XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
        StylesTable styles = xssfReader.getStylesTable();
        XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
        int index = 0;
        while (iter.hasNext()) {
            InputStream stream = iter.next();
            String sheetName = iter.getSheetName();
            this.output.println();
            this.output.println(sheetName + " [index=" + index + "]:");
            processSheet(styles, strings, new SheetToCSV(), stream);
            stream.close();
            ++index;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Use:");
            System.err.println("  XLSX2CSV <xlsx file> [min columns]");
            return;
        }

        File xlsxFile = new File(args[0]);
        if (!xlsxFile.exists()) {
            System.err.println("Not found or not a file: " + xlsxFile.getPath());
            return;
        }

        int minColumns = -1;
        if (args.length >= 2)
            minColumns = Integer.parseInt(args[1]);

        // The package open is instantaneous, as it should be.
        OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
        XLSX2CSV xlsx2csv = new XLSX2CSV(p, System.out, minColumns);
        xlsx2csv.process();
        p.close();
    }
}

Input format Image - Click here to see the formatting for dates

date1        date2
1/1/1900    1/1/1900
2/28/2012   2/28/2012
3/15/1965   3/15/1965
1/1/2000    1/1/2000
1/1/2100    1/1/2100
1/1/2115    1/1/2115
date1        date2
1/1/2000    1/1/1900
2/28/2012   2/28/2012
3/15/1965   3/15/1965
1/1/2000    1/1/2000
1/1/2000    1/1/2100
1/1/2015    1/1/2115

If you look at the input data, cells (i.e. date1 column) is formatted with asterisk which is utilizing regional settings, cell with that format are getting affected as 1900 is converted as 2000 so is any data above 2099... If the Cell (i.e. Date2 Column) is formatted without * then value is coming out as expected. Is that a limitation with this utility or is there a workaround ?

If I format the cell the other way without applying regional settings format, I get the correct output.

解决方案

Cannot reproducing that behavior using apache poi 3.15 final.

The code you have shown produces the following output:

Sheet1 [index=0]:
"date1","date2"
"1/1/00","1/1/1900"
"2/28/12","2/28/2012"
"3/15/65","3/15/1965"
"1/1/00","1/1/2000"
"1/1/00","1/1/2100"
"1/1/15","1/1/2115"

So for the default date format (format-id = 0xe = short date = *3/14/2012) m/d/yy is used. This is the one which is defined in BuiltinFormats.

Of course, if you put that output directly into a CSV file, then there is not a century present and Excel must guessing the century while opening this CSV file.

You could changing the code a little bit, like:

...
    public void processSheet(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetHandler,
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter() {
            @Override
            public String formatRawCellContents(double value, int formatIndex, String formatString, boolean use1904Windowing) {
                if ("m/d/yy".equals(formatString)) formatString = "m/d/yyyy";
                return super.formatRawCellContents(value, formatIndex, formatString, use1904Windowing);
            }
        };
        InputSource sheetSource = new InputSource(sheetInputStream);
        try {
            XMLReader sheetParser = SAXHelper.newXMLReader();
            ContentHandler handler = new XSSFSheetXMLHandler(
                  styles, null, strings, sheetHandler, formatter, false);
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
         } catch(ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
         }
    }
...

This will produce the following output:

Sheet1 [index=0]:
"date1","date2"
"1/1/1900","1/1/1900"
"2/28/2012","2/28/2012"
"3/15/1965","3/15/1965"
"1/1/2000","1/1/2000"
"1/1/2100","1/1/2100"
"1/1/2115","1/1/2115"

这篇关于Java:Apache Poi 的 excel 到 csv 日期转换问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-05 21:44