How to read multiple CSV files from a folder in Spring Boot Batch

In a Spring Boot Batch application, a CSV file is read with FlatFileItemReader, an implementation of the ItemReader interface. The FlatFileItemReader reads the file line by line and converts each line into an object. This post shows how to read a single CSV file and then how to read every CSV file in a folder. A batch job typically reads a large amount of data, processes it, and saves it to a destination, and the whole process runs automatically without human interaction.

Reading from a CSV file is a common requirement in practical applications. A CSV (comma-separated values) file stores one record per line, with the values of the record separated by commas. Spring Boot Batch reads each line, splits it into fields, and uses reflection to set the values on a Java bean. It includes classes and APIs for converting comma-separated data to Java objects and back.

The FlatFileItemReader class can transform each line of a CSV file into the format the application needs, such as a Java bean, a String, or a Java collection. The following sections first read data from a single CSV file and then from a folder containing several CSV files.



Sample Test Data

The CSV file format is often used to transfer data from one application to another. The source application exports a CSV file from its database and places it in a pre-configured location. When the receiving application finds a CSV file in that folder, it begins processing the data. The following input.csv file is used as input for the Spring Boot Batch application and is placed in the data folder.

data/input.csv

id,name,salary
1,name01,1000
2,name02,2000
3,name03,3000


How to read data from a CSV file

A CSV file is read using a Spring Boot Batch configuration class and a FlatFileItemReader. The Java code is shown below. The configuration class creates the batch job and the batch step, and the reader is a custom class that extends FlatFileItemReader, which implements the ItemReader interface.

BatchConfig.java

package com.yawintutor;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfig {
	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;
	
	@Autowired
	MyCustomReader myCustomReader;
	
	@Autowired
	MyCustomWriter myCustomWriter;

	@Bean
	public Job createJob() {
		return jobBuilderFactory.get("MyJob")
				.incrementer(new RunIdIncrementer())
				.flow(createStep()).end().build();
	}

	@Bean
	public Step createStep() {
		return stepBuilderFactory.get("MyStep")
				.<Employee, Employee> chunk(1)
				.reader(myCustomReader)
				.writer(myCustomWriter)
				.build();
	}	
}
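
The step above is chunk-oriented: with chunk(1) the writer receives the items one at a time, while a larger value would hand them over in groups. The idea can be pictured with a plain-Java sketch. It does not use Spring Batch and is not the framework's implementation; ChunkLoopSketch and run are hypothetical names, and the reader and writer are simplified to a Supplier and a Consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ChunkLoopSketch {

    // Reads items until the reader returns null and hands them to the writer
    // in lists of at most chunkSize items. Returns the number of chunks written.
    static <T> int run(Supplier<T> reader, Consumer<List<T>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>();
        T item;
        while ((item = reader.get()) != null) {
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                // Pass a copy so the writer is not affected when the buffer is cleared.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunks++;
            }
        }
        // Write the last, possibly smaller, chunk.
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<String> lines =
                Arrays.asList("1,name01,1000", "2,name02,2000", "3,name03,3000").iterator();
        int chunks = ChunkLoopSketch.<String>run(
                () -> lines.hasNext() ? lines.next() : null,
                chunk -> System.out.println("Writing chunk: " + chunk),
                1);
        System.out.println("Chunks written: " + chunks);
    }
}
```

With a chunk size of 1, as in the configuration above, the three sample records are written as three separate chunks.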

The step reads its input through the ItemReader interface, here implemented by a subclass of FlatFileItemReader. The reader skips the header line, reads each remaining line of the file, and converts it to an Employee object. If a line cannot be parsed or mapped, an exception is thrown. The names of the Employee properties must be configured on the line tokenizer, in the same order as the columns in the file.

MyCustomReader.java

package com.yawintutor;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.FileSystemResource;
import org.springframework.stereotype.Component;

@Component
public class MyCustomReader extends FlatFileItemReader<Employee> implements ItemReader<Employee>{
	
	public MyCustomReader() {
		setResource(new FileSystemResource("data/input.csv"));
		setLinesToSkip(1); 
		setLineMapper(getDefaultLineMapper());
	}
	
	public DefaultLineMapper<Employee> getDefaultLineMapper() {
		DefaultLineMapper<Employee> defaultLineMapper = new DefaultLineMapper<Employee>();
		
		DelimitedLineTokenizer delimitedLineTokenizer =new DelimitedLineTokenizer();
		delimitedLineTokenizer.setNames(new String[] { "id", "name", "salary" });
		defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
		
		BeanWrapperFieldSetMapper<Employee> beanWrapperFieldSetMapper = new BeanWrapperFieldSetMapper<Employee>();
		beanWrapperFieldSetMapper.setTargetType(Employee.class);
		defaultLineMapper.setFieldSetMapper(beanWrapperFieldSetMapper);
		
		return defaultLineMapper;
	}
}
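
To make the mapping step concrete, here is a minimal plain-Java sketch of what the line tokenizer and the bean mapper accomplish together: split a line on commas and set each value on a bean through reflection. It does not use Spring Batch and is not how the framework is implemented; EmployeeBean, CsvLineToBeanSketch, map and findSetter are hypothetical names introduced for illustration, and only int and String properties are handled.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the tutorial's Employee bean.
class EmployeeBean {
    private int id;
    private String name;
    private int salary;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getSalary() { return salary; }
    public void setSalary(int salary) { this.salary = salary; }
}

class CsvLineToBeanSketch {

    // Splits one CSV line on commas and sets each value on the bean through
    // the setter that matches the column name (for example "id" -> setId).
    static <T> T map(String line, String[] names, Class<T> type) {
        String[] tokens = line.split(",", -1);
        if (tokens.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " fields but found " + tokens.length + ": " + line);
        }
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < names.length; i++) {
                Method setter = findSetter(type, names[i]);
                Class<?> target = setter.getParameterTypes()[0];
                String raw = tokens[i].trim();
                // Only int and String properties are handled in this sketch.
                if (target == int.class) {
                    setter.invoke(bean, Integer.parseInt(raw));
                } else {
                    setter.invoke(bean, raw);
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not populate " + type.getName(), e);
        }
    }

    // Looks up a public single-argument setter for the given property name.
    private static Method findSetter(Class<?> type, String property) throws NoSuchMethodException {
        String name = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (Method method : type.getMethods()) {
            if (method.getName().equals(name) && method.getParameterCount() == 1) {
                return method;
            }
        }
        throw new NoSuchMethodException(name);
    }

    public static void main(String[] args) {
        EmployeeBean e = map("1,name01,1000", new String[] {"id", "name", "salary"}, EmployeeBean.class);
        System.out.println(e.getId() + " : " + e.getName() + " : " + e.getSalary());
    }
}
```

A line with the wrong number of fields is rejected with an exception, which corresponds to the error case described above.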

When you run the Spring Boot Batch application, the CSV file is read line by line and each line is converted into an Employee object. The ItemWriter then prints the Employee data to the console, as shown in the log below.

2021-07-25 08:39:53.949  INFO 2178 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] launched with the following parameters: [{run.id=1, time=2021-07-25 08:39:45.239}]
2021-07-25 08:39:53.986  INFO 2178 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [MyStep]
MyCustomWriter    : Writing data    : 1 : name01 : 1000
MyCustomWriter    : Writing data    : 2 : name02 : 2000
MyCustomWriter    : Writing data    : 3 : name03 : 3000
2021-07-25 08:39:54.023  INFO 2178 --- [           main] o.s.batch.core.step.AbstractStep         : Step: [MyStep] executed in 37ms
2021-07-25 08:39:54.027  INFO 2178 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] completed with the following parameters: [{run.id=1, time=2021-07-25 08:39:45.239}] and the following status: [COMPLETED] in 51ms


How to read data from CSV files in a folder

In real-world applications, input CSV files are usually delivered to a folder that may contain one or more files. Spring Boot Batch should read and process each file in turn. The following example reads all CSV files in a folder using a MultiResourceItemReader, which passes each file to the FlatFileItemReader-based delegate.

BatchConfig.java

package com.yawintutor;

import java.io.IOException;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.io.support.ResourcePatternResolver;

@Configuration
@EnableBatchProcessing
public class BatchConfig {
	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;
	
	@Autowired
	MyCustomReader myCustomReader;
	
	@Autowired
	MyCustomWriter myCustomWriter;

	@Bean
	public Job createJob() {
		return jobBuilderFactory.get("MyJob")
				.incrementer(new RunIdIncrementer())
				.flow(createStep()).end().build();
	}

	@Bean
	public Step createStep() {
		return stepBuilderFactory.get("MyStep")
				.<Employee, Employee> chunk(1)
				.reader(reader())
				.writer(myCustomWriter)
				.build();
	}
	
	@Bean
	public ItemReader<Employee> reader() {
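	    Resource[] resources;
	    ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
	    try {
	        // Resolve every CSV file in the data folder, relative to the working directory.
	        resources = patternResolver.getResources("file:./data/*.csv");
	    } catch (IOException e) {
	        // Fail fast instead of passing a null array to the reader.
	        throw new IllegalStateException("Could not resolve CSV files in ./data", e);
	    }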

	    MultiResourceItemReader<Employee> reader = new MultiResourceItemReader<>();
	    reader.setResources(resources);
	    reader.setDelegate(myCustomReader);
	    return reader;
	}
}
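
The combined effect of reading several files and skipping the header of each can be pictured with a small plain-Java sketch: list the *.csv files in a folder, visit them in name order, and drop the first line of each file. This is an illustration only, not Spring Batch code; CsvFolderReaderSketch, readAll, createSampleFolder and the file names input1.csv and input2.csv are hypothetical, and visiting the files in name order is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CsvFolderReaderSketch {

    // Returns the data lines of every *.csv file in the folder, visiting the
    // files in name order and dropping the first (header) line of each file.
    static List<String> readAll(Path folder) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.csv")) {
            for (Path file : stream) {
                files.add(file);
            }
            Collections.sort(files);

            List<String> records = new ArrayList<>();
            for (Path file : files) {
                List<String> lines = Files.readAllLines(file);
                // Skip the header line of this file, if the file is not empty.
                records.addAll(lines.subList(Math.min(1, lines.size()), lines.size()));
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary folder with two small CSV files for the demo.
    static Path createSampleFolder() {
        try {
            Path folder = Files.createTempDirectory("csv-folder");
            Files.write(folder.resolve("input1.csv"), Arrays.asList("id,name,salary", "1,name01,1000"));
            Files.write(folder.resolve("input2.csv"), Arrays.asList("id,name,salary", "21,name21,21000"));
            return folder;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        readAll(createSampleFolder()).forEach(System.out::println);
    }
}
```

Both header lines are dropped, so only the two data records are printed, one from each file.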

The reader class is the same as in the single-file example, except that its constructor no longer sets a resource: the MultiResourceItemReader assigns each CSV file to this delegate in turn. The FlatFileItemReader still skips the header line of every file, reads each remaining line, and converts it to an Employee object. If a line cannot be parsed or mapped, an exception is thrown.

MyCustomReader.java

package com.yawintutor;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.stereotype.Component;

@Component
public class MyCustomReader extends FlatFileItemReader<Employee> implements ItemReader<Employee>{
	
	public MyCustomReader() {
		// No resource is set here: the MultiResourceItemReader assigns each file to this delegate.
		setLinesToSkip(1); 
		setLineMapper(getDefaultLineMapper());
	}
	
	public DefaultLineMapper<Employee> getDefaultLineMapper() {
		DefaultLineMapper<Employee> defaultLineMapper = new DefaultLineMapper<Employee>();
		
		DelimitedLineTokenizer delimitedLineTokenizer =new DelimitedLineTokenizer();
		delimitedLineTokenizer.setNames(new String[] { "id", "name", "salary" });
		defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
		
		BeanWrapperFieldSetMapper<Employee> beanWrapperFieldSetMapper = new BeanWrapperFieldSetMapper<Employee>();
		beanWrapperFieldSetMapper.setTargetType(Employee.class);
		defaultLineMapper.setFieldSetMapper(beanWrapperFieldSetMapper);
		
		return defaultLineMapper;
	}
}

When you start the Spring Boot Batch application, the CSV files are read one after another, line by line, and each line is converted into an Employee object. The ItemWriter prints the Employee data in the terminal window. The log below shows the records read from multiple CSV files.

Batch job starting
2021-07-25 09:08:53.427  INFO 3729 --- [   scheduling-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] launched with the following parameters: [{time=2021-07-25 09:08:53.420}]
2021-07-25 09:08:53.433  INFO 3729 --- [   scheduling-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [MyStep]
MyCustomWriter    : Writing data    : 1 : name01 : 1000
MyCustomWriter    : Writing data    : 2 : name02 : 2000
MyCustomWriter    : Writing data    : 3 : name03 : 3000
MyCustomWriter    : Writing data    : 21 : name21 : 21000
MyCustomWriter    : Writing data    : 22 : name22 : 22000
MyCustomWriter    : Writing data    : 23 : name23 : 23000
2021-07-25 09:08:53.448  INFO 3729 --- [   scheduling-1] o.s.batch.core.step.AbstractStep         : Step: [MyStep] executed in 15ms
2021-07-25 09:08:53.450  INFO 3729 --- [   scheduling-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] completed with the following parameters: [{time=2021-07-25 09:08:53.420}] and the following status: [COMPLETED] in 22ms
Batch job executed successfully


Other dependent files

The remaining files needed to run the complete Spring Boot application are listed below.

Employee.java

package com.yawintutor;

public class Employee {
	private int id;
	private String name;
	private int salary;
	public int getId() {
		return id;
	}
	public void setId(int id) {
		this.id = id;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public int getSalary() {
		return salary;
	}
	public void setSalary(int salary) {
		this.salary = salary;
	}	
}


ItemWriter Implementation

MyCustomWriter.java

package com.yawintutor;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.stereotype.Component;

@Component
public class MyCustomWriter implements ItemWriter<Employee> {

	@Override
	public void write(List<? extends Employee> list) throws Exception {
		for (Employee data : list) {
			System.out.println("MyCustomWriter    : Writing data    : " + data.getId()+" : "+data.getName()+" : "+data.getSalary());
		}
	}
}


Scheduler Configuration class

SchedulerConfig.java

package com.yawintutor;

import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class SchedulerConfig {

	@Autowired
	JobLauncher jobLauncher;

	@Autowired
	Job job;

	SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");

	@Scheduled(fixedDelay = 5000, initialDelay = 5000)
	public void scheduleByFixedDelay() throws Exception {
		System.out.println("Batch job starting");
		JobParameters jobParameters = new JobParametersBuilder()
				.addString("time", format.format(Calendar.getInstance().getTime())).toJobParameters();
		jobLauncher.run(job, jobParameters);
		System.out.println("Batch job executed successfully\n");
	}
}


Application properties in Resources folder

application.properties

spring.datasource.url=jdbc:h2:file:./DB
spring.batch.initialize-schema=ALWAYS


