



I am working on an automation project to learn new tricks with Java and data science (at a very basic level), all self-taught.



Here is an example .csv file of how I store this data.

Date when obtained
Format for identifying the numbers below


CSV I am currently using.

Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
Tejido,252 229,12.86,43.14,$18.87,
Ropa,132 392,18.09,46.02,$177.58,
Gorra de visera,87 676,14.42,42.46,$122.48,
Cerveza,44 593,2.72,17.79,$18.71,
Mercancías de playa,44 593,8.26,39.56,$200.78,
Bebidas alcohólicas,27 306,4.30,23.88,$31.95,
Artículos de cuero,16 147,21.08,43.91,$207.49,
Bolsas y carteras,6 552,21.11,40.59,$1 195.41,
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,


I want to make it dynamic and also bigger. Instead of multiple .csv files classified by date, I decided to have one big .csv file to store everything, and that is the result.


The code I have used so far can read a single .csv, but if I add more data below it, it doesn't work. I can see in the debugger that it is something related to the loop, but I still can't find the right solution.


import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CSVinput {

    static String[] nombre = new String[8];
    static int[] cantidad = new int[8];
    static double[] calidad = new double[8];
    static double[] realmQ = new double[8];
    static double[] coste = new double[8];

    public static void ImportData(String path) throws FileNotFoundException {
        /* Can only load one csv with 8 items in it */
        System.out.println("Presenting data...");

        try (Scanner scan = new Scanner(new File(path))) {
            scan.useDelimiter(",");  // Fields are comma-separated.
            String date = scan.nextLine();
            System.out.println("fecha: " + date);

            int index = 0;
            while (scan.hasNext()) {
                // After the trailing comma, the next token starts with the line break.
                String name = scan.next().replaceAll("\n", "");
                nombre[index] = name;
                System.out.println("nombre: " + name);
                int quantity = Integer.parseInt(scan.next().replaceAll(" ", ""));
                cantidad[index] = quantity;
                System.out.println("cantidad: " + quantity);
                double quality = Double.parseDouble(scan.next());
                calidad[index] = quality;
                System.out.println("calidad: " + quality);
                double realmq = Double.parseDouble(scan.next());
                realmQ[index] = realmq;
                System.out.println("realmQ: " + realmq);
                double cost = Double.parseDouble(scan.next().replace("$", "").replace(" ", ""));
                coste[index] = cost;
                System.out.println("coste: $" + cost);
                index++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reached once a ninth row is read; the arrays only hold 8 entries.
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        ImportData(args[0]);  // Path to the .csv file, passed on the command line.
    }
}



The code posted is the one that works with a single .csv: you input the data below, and the code "splits" it to make it easy to work with.

Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,



What I want is this: if I add more .csv data below the previous data (appended), the code should read all of it, no matter how big the .csv is.


Thanks for the interest in this question.



CSV ➙ flat table

The CSV format was invented to represent a single simple flat table of data. Ditto for Tab-delimited files.


You have a hierarchy of a date mapping to a collection of name-quantity-quality-realmQ-cost tuples. That is not simple flat tabular data.


If you want to store that in CSV, you must flatten by adding a column for the date and repeating the date value across the collection of tuples, to become date-name-quantity-quality-realmQ-cost tuples.

2018-12-29,Tejido,321 908,13.55,43.18,$15.98
2018-12-29,Ropa,195 045,20.55,45.93,$123.01
2018-12-29,Gorra de visera,126 561,17.43,42.32,$79.54
2018-12-29,Cerveza,80 109,3.37,17.93,$12.38
2018-12-29,Mercancías de playa,75 065,11.48,39.73,$105.93
2018-12-29,Bebidas alcohólicas,31 215,4.84,27.90,$32.29
2018-12-29,Artículos de cuero,19 098,23.13,44.09,$198.74
2018-12-29,Bolsas y carteras,7 754,23.09,41.34,$1 176.54


That data could now be read and written to CSV files.


And watch your delimiters. Notice there should be no comma after the last field of each row.
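
As a rough illustration of that flattening, the sketch below assumes the original file puts each date alone on a line (a line with no comma), followed by that date's product rows. The class name `FlattenSketch` and the method `flatten` are hypothetical, and the date-line convention is an assumption, since the Question does not show the date lines themselves. It also drops the trailing comma from each row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
class FlattenSketch {

    // Assumption: a non-blank line containing no comma is a date header
    // that applies to every product row until the next such line.
    static List<String> flatten(List<String> lines) {
        List<String> rows = new ArrayList<>();
        String currentDate = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;  // Skip blank lines.
            }
            if (!trimmed.contains(",")) {
                currentDate = trimmed;  // Remember the date for the rows that follow.
                continue;
            }
            if (currentDate == null) {
                throw new IllegalArgumentException("Row appears before any date line: " + trimmed);
            }
            // Drop the trailing comma, if any, then prefix the date.
            String row = trimmed.endsWith(",") ? trimmed.substring(0, trimmed.length() - 1) : trimmed;
            rows.add(currentDate + "," + row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "2018-12-29",
                "Tejido,321 908,13.55,43.18,$15.98,",
                "Ropa,195 045,20.55,45.93,$123.01,",
                "2018-12-30",
                "Tejido,252 229,12.86,43.14,$18.87,");
        flatten(input).forEach(System.out::println);
    }
}
```

Because every output row carries its own date, appending a new day's data is just a matter of adding more rows at the end of the same file.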

The Apache Commons CSV library will perform the CSV parsing, reading, and writing for you. It has worked well for me a few times.


Let’s parse a data.csv file with this content, with a flattened version of your example data. The data has been cleaned up:

  • Switched dates to standard ISO 8601 format
  • Eliminated SPACE character in integer numbers
  • Removed $ character
  • Deleted the extra comma at end of each row
  • Translated the product names to English (for this English edition of Stack Overflow).

date,name,quantity,quality,realmQ,cost
2018-12-29,Visor Cap,126561,17.43,42.32,79.54
2018-12-29,Beach goods,75065,11.48,39.73,105.93
2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
2018-12-29,Leather goods,19098,23.13,44.09,198.74
2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54
2018-12-30,Visor Cap,87676,14.42,42.46,122.48
2018-12-30,Beach goods,44593,8.26,39.56,200.78
2018-12-30,Alcoholic beverages,27306,4.30,23.88,31.95
2018-12-30,Leather goods,16147,21.08,43.91,207.49
2018-12-30,Bags and wallets,6552,21.11,40.59,1195.41
2019-01-02,Visor Cap,126561,17.43,42.32,79.54
2019-01-02,Beach goods,75065,11.48,39.73,105.93
2019-01-02,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-02,Leather goods,19098,23.13,44.09,198.74
2019-01-02,Bags and wallets,7754,23.09,41.34,1176.54
2019-01-03,Visor Cap,126561,17.43,42.32,79.54
2019-01-03,Beach goods,75065,11.48,39.73,105.93
2019-01-03,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-03,Leather goods,19098,23.13,44.09,198.74
2019-01-03,Bags and wallets,7754,23.09,41.34,1176.54


We define a class to hold each tuple.

package com.basilbourque.example;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Objects;

public class DailyProduct {
    // date,name,quantity,quality,realmQ,cost
    // 2018-12-29,Fabric,321908,13.55,43.18,15.98
    // 2018-12-29,Clothing,195045,20.55,45.93,123.01
    // 2018-12-29,Visor Cap,126561,17.43,42.32,79.54
    // 2018-12-29,Beer,80109,3.37,17.93,12.38
    // 2018-12-29,Beach goods,75065,11.48,39.73,105.93
    // 2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
    // 2018-12-29,Leather goods,19098,23.13,44.09,198.74
    // 2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54

    // One constant per column of the CSV file.
    public enum Header {
        DATE, NAME, QUANTITY, QUALITY, REALMQ, COST;
    }

    // ----------|  Member vars  |-----------------------------------
    public LocalDate localDate;
    public String name;
    public Integer quantity;
    public BigDecimal quality, realmQ, cost;

    // ----------|  Constructor  |-----------------------------------
    public DailyProduct ( LocalDate localDate , String name , Integer quantity , BigDecimal quality , BigDecimal realmq , BigDecimal cost ) {
        this.localDate = Objects.requireNonNull( localDate );
        this.name = Objects.requireNonNull( name );
        this.quantity = Objects.requireNonNull( quantity );
        this.quality = Objects.requireNonNull( quality );
        this.realmQ = Objects.requireNonNull( realmq );
        this.cost = Objects.requireNonNull( cost );
    }

    // ----------|  `Object` overrides  |-----------------------------------
    @Override
    public String toString ( ) {
        return "com.basilbourque.example.DailyProduct{ " +
                "localDate=" + localDate +
                " | name='" + name + '\'' +
                " | quantity=" + quantity +
                " | quality=" + quality +
                " | realmq=" + realmQ +
                " | cost=" + cost +
                " }";
    }

    // Two objects are equal when they share the same date and product name.
    @Override
    public boolean equals ( Object o ) {
        if ( this == o ) return true;
        if ( o == null || getClass() != o.getClass() ) return false;
        DailyProduct that = ( DailyProduct ) o;
        return localDate.equals( that.localDate ) &&
                name.equals( that.name );
    }

    @Override
    public int hashCode ( ) {
        return Objects.hash( localDate , name );
    }
}


Write a class to read and write files containing the data of the DailyProduct objects.

package com.basilbourque.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class DailyProductFileHandler {
    public List < DailyProduct > read ( Path path ) {
        // TODO: Add a check for valid file existing.

        List < DailyProduct > list = List.of();  // Default to empty list.
        try {
            // Prepare list, sized by the number of lines in the file.
            int initialCapacity;
            try ( Stream < String > lines = Files.lines( path ) ) {
                initialCapacity = ( int ) lines.count();
            }
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            try ( BufferedReader reader = Files.newBufferedReader( path ) ) {
                Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
                for ( CSVRecord record : records ) {
                    // date,name,quantity,quality,realmQ,cost
                    LocalDate localDate = LocalDate.parse( record.get( "date" ) );
                    String name = record.get( "name" );
                    Integer quantity = Integer.valueOf( record.get( "quantity" ) );
                    BigDecimal quality = new BigDecimal( record.get( "quality" ) );
                    BigDecimal realmQ = new BigDecimal( record.get( "realmQ" ) );  // Note: case-sensitive.
                    BigDecimal cost = new BigDecimal( record.get( "cost" ) );
                    // Instantiate `DailyProduct` object, and collect it.
                    DailyProduct dailyProduct = new DailyProduct( localDate , name , quantity , quality , realmQ , cost );
                    list.add( dailyProduct );
                }
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final List < DailyProduct > dailyProducts , final Path path ) {
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "date" , "name" , "quantity" , "quality" , "realmQ" , "cost" ).print( path , StandardCharsets.UTF_8 ) ) {
            for ( DailyProduct dp : dailyProducts ) {
                printer.printRecord( dp.localDate , dp.name , dp.quantity , dp.quality , dp.realmQ , dp.cost );
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args ) {
        DailyProductFileHandler fileHandler = new DailyProductFileHandler();

        Path pathInput = Paths.get( "/Users/basilbourque/data.csv" );
        List < DailyProduct > list = fileHandler.read( pathInput );
        System.out.println( list );

        // Name the output file after the current moment; colons are swapped out to keep the file name safe.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/data_" + when + ".csv" );
        fileHandler.write( list , pathOutput );
        System.out.println( "Writing file: " + pathOutput );
    }
}



Writing file: /Users/basilbourque/data_2019-01-05T03•48•37Z.csv

ISO 8601


By the way, when serializing date-time values to text, always use the standard ISO 8601 formats. For a date-only value without time-of-day and without time zone, that would be YYYY-MM-DD.
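
A small sketch of what that looks like with `java.time`, which the code above already relies on: `LocalDate.parse` reads the ISO 8601 form by default, and `toString` writes it back out. The class name `IsoDateSketch` is made up for illustration.

```java
import java.time.LocalDate;

// Hypothetical demo class, not part of the original code.
class IsoDateSketch {
    public static void main(String[] args) {
        // LocalDate.parse expects ISO 8601 (YYYY-MM-DD) when no formatter is given.
        LocalDate date = LocalDate.parse("2018-12-29");

        // LocalDate.toString generates the same ISO 8601 text.
        String text = date.toString();
        System.out.println(text);
    }
}
```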

If you want to preserve the hierarchy, use some file format other than CSV. Commonly XML or JSON is used for such data.


Your Question does not provide enough detail to know for certain, but I get the feeling you should be using a database rather than text files. If you are reading, editing, and appending new data, for large amounts of data (large meaning enough to be concerned about impacting memory limits) or you are using multiple processes/threads/users, then a database is called for. A database is designed to efficiently handle data too large to fit entirely into memory. And a database is designed to handle concurrent access.

That is not "large" as you put it. Even a Raspberry Pi or Beaglebone Black has enough RAM to load several thousand such tuples into memory.

You need to learn about the Java Collections Framework, rather than using simple arrays.

In particular, your date-to-tuple hierarchy would commonly be represented by using a Map (also called a dictionary by some folks). This data structure is a collection of key-value pairs, where the date would be your key and a Set or List of your tuples would be your value.


Define a class for your tuple data, named something like Product. Add member variables: name, quantity, quality, realmq, and cost. Instantiate an object for each tuple.

Create a Map such as a TreeMap. Being a SortedMap, it keeps your dates in chronological order.

SortedMap< LocalDate , List< Product > > map = new TreeMap<>() ;

Use LocalDate for your date values, the key in your map.

LocalDate ld = LocalDate.of( 2018 , 1 , 23 ) ;
map.put( ld , new ArrayList< Product >() ) ; // Pass an initial capacity in those parens if you know a likely size of the list.


For each Product object, retrieve the list from the map for the relevant date, and add the product to the list.
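
The steps above can be sketched roughly as follows. The `Product` class here is a cut-down stand-in with only two of the fields, and `GroupByDateSketch` and its `add` method are hypothetical names. `Map.computeIfAbsent` creates the list the first time a date is seen, so there is no need to `put` an empty list beforehand.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original code.
class GroupByDateSketch {

    // Minimal stand-in for the Product class described above; only two fields shown.
    static final class Product {
        final String name;
        final int quantity;

        Product(String name, int quantity) {
            this.name = name;
            this.quantity = quantity;
        }
    }

    // Adds a product under its date, creating the list on first use.
    static void add(SortedMap<LocalDate, List<Product>> map, LocalDate date, Product product) {
        map.computeIfAbsent(date, key -> new ArrayList<>()).add(product);
    }

    public static void main(String[] args) {
        SortedMap<LocalDate, List<Product>> map = new TreeMap<>();
        add(map, LocalDate.parse("2018-12-30"), new Product("Beer", 44593));
        add(map, LocalDate.parse("2018-12-29"), new Product("Fabric", 321908));
        add(map, LocalDate.parse("2018-12-29"), new Product("Clothing", 195045));

        // TreeMap iterates its keys in chronological order, regardless of insertion order.
        map.forEach((date, products) ->
                System.out.println(date + " -> " + products.size() + " product(s)"));
    }
}
```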


When serializing, use an XML or JSON framework to write the map to storage.

Or do so yourself, writing your own data format. Get all the keys from the map, loop them, writing each date to file. And for each date, extract its list from the map (each value for each key). Loop the Product objects in the list. Write out each product’s member variables. Use any field and row delimiters. Though not often used for reasons I have never understood, ASCII (a subset of Unicode) has specific delimiter characters. I suggest you use these separators. The code points:

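  • 31 for field (Information Separator One)
  • 30 for row (Information Separator Two)
  • 29 for group (Information Separator Three)
  • 28 for file (Information Separator Four)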
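
To make the idea concrete, here is a minimal sketch that uses code point 31 between fields and code point 30 between rows. The class `AsciiSeparatorSketch` and its `encode` and `decode` methods are hypothetical names, and the sketch assumes the data itself never contains those two control characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class AsciiSeparatorSketch {
    static final char FIELD_SEPARATOR = (char) 31;  // Information Separator One, between fields.
    static final char ROW_SEPARATOR = (char) 30;    // Information Separator Two, between rows.

    // Joins fields with code point 31 and ends each row with code point 30.
    static String encode(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join(String.valueOf(FIELD_SEPARATOR), row)).append(ROW_SEPARATOR);
        }
        return sb.toString();
    }

    // Splits the text back into rows and fields.
    static List<List<String>> decode(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String row : text.split(String.valueOf(ROW_SEPARATOR))) {
            if (!row.isEmpty()) {
                // Limit -1 keeps trailing empty fields.
                rows.add(Arrays.asList(row.split(String.valueOf(FIELD_SEPARATOR), -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("2018-12-29", "Bags and wallets", "7754", "23.09", "41.34", "1176.54"),
                List.of("2018-12-30", "Bags and wallets", "6552", "21.11", "40.59", "1195.41"));
        String encoded = encode(rows);
        System.out.println(decode(encoded));
    }
}
```

Since commas, spaces, and line breaks are no longer delimiters, values containing them need no quoting or escaping.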

All of these issues have been addressed many times on Stack Overflow. Search to learn more.


When serializing data, do not include extraneous text.

The $ in your cost column is just noise. If you meant to indicate a particular currency, a simple $ fails to do the job, as it could be Canadian dollars, United States dollars, Mexican pesos, or perhaps other currencies. So use a standard currency code such as CAD, USD, or MXN. If all the values are in a single known currency such as CAD, then omit the ‘$’ entirely.
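
If you do need to record the currency, one possible approach is to store the plain number and the three-letter code as separate values. The sketch below uses `java.util.Currency` from the JDK; the class `CurrencyCodeSketch`, the method `serialize`, and the choice of CAD are assumptions made only for illustration.

```java
import java.math.BigDecimal;
import java.util.Currency;

// Hypothetical demo class, not part of the original code.
class CurrencyCodeSketch {

    // Formats a cost as "amount,code", suitable for two separate CSV columns.
    static String serialize(BigDecimal amount, Currency currency) {
        return amount.toPlainString() + "," + currency.getCurrencyCode();
    }

    public static void main(String[] args) {
        // Currency.getInstance throws IllegalArgumentException for an unsupported code.
        Currency cad = Currency.getInstance("CAD");
        System.out.println(serialize(new BigDecimal("1176.54"), cad));
    }
}
```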


Preface: If you are frequently moving data in and out of these files for updating, you should be using a database rather than text files.


No need to worry about performance of CSV versus XML versus JSON.


Firstly, you are falling into the evil trap of premature optimization (google/duckduckgo that phrase).


Secondly, you would need an enormous amount of data, processed frequently, for any performance difference to be significant, far beyond that of common business apps. Accessing files of any format from storage, even from SSDs, is so slow that it dwarfs the time taken for the CPU-driven processing of the data.


Choose a format based on fitting the needs of your data and app.


For simple flat data, use CSV or Tab-delimited or the ASCII/Unicode codes for delimiting (codepoints 28-31).

For hierarchical data, use XML. XML has the advantage of being very precisely defined by specification. So much tooling has been built for XML. And XML Schema is also well-defined. This provides a powerful way to validate incoming data files before attempting to process.

As for JSON, use only if you must, and only for small amounts of relatively simple data. It lacks the well-defined specs and schema of XML. It is not intended to work well with deep hierarchies or vast collections. JSON only exists because it is convenient for JavaScript programmers, and because of the IT industry’s masochistic penchant for reinventing the wheel over and over again.


XML and JSON share one major advantage: binding. In the Java world, there are both standard and handy-but-non-standard frameworks for automatically serializing your Java objects as XML or JSON text. Going the other direction, the frameworks can instantiate Java objects directly from your incoming XML/JSON. So you needn’t write code yourself to handle each field of data.

This binding feature is not worth the bother for the simple data shown in the Question. For that, CSV or Tab-delimited is appropriate, with Apache Commons CSV as shown in this Answer.


Tip: You should send a hash (MD5, SHA, etc) of each data file. Upon receiving the file and the hash, the receiving computer recalculates the hash of the incoming file. Then compare hash results to verify that the data file arrived without corruption in its data.
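
A minimal sketch of computing such a hash with the JDK's `MessageDigest`, here using SHA-256. The class `FileHashSketch` and the method `sha256Hex` are hypothetical names, and the sketch reads the whole file into memory, which assumes the file is small.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of the original code.
class FileHashSketch {

    // Returns the SHA-256 digest of the file's bytes as lowercase hex.
    static String sha256Hex(Path path) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(path));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a received data file.
        Path file = Files.createTempFile("data", ".csv");
        Files.write(file, "2018-12-29,Beer,80109,3.37,17.93,12.38\n".getBytes(StandardCharsets.UTF_8));

        String sentHash = sha256Hex(file);      // Computed by the sender.
        String receivedHash = sha256Hex(file);  // Recomputed by the receiver.
        System.out.println(sentHash.equals(receivedHash) ? "File intact" : "File corrupted");

        Files.delete(file);
    }
}
```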

