How to read data from CSV file and store it in the database Spring Boot Batch

In this article, we’ll explore at how to read data from a CSV file and save it to a database using Spring Boot Batch. In reality, data is transferred from one application to another using CSV files. Spring boot batch reads data from a CSV file and stores it to a database. In this post, we’ll be using the MySQL database. Spring boot batch will read data from the CSV file, process it, and save it in the MySQL database without any manual involvement on a regular basis.

The CSV (comma separated values) file stores data in comma separated strings on each line. The spring boot batch application reads the data from the CSV file line by line and converts it to a java bean object. JPA repository is used to store the java bean object in the database. To read, process, and store data from a CSV file to a relational database, three interfaces are used.



Test Data

The test sample data file name should begin with input*.csv in the data folder created in the project folder structure. In this example, two csv files are utilized, each having Employee information in a comma separated string. Each employee’s information will be stored in the csv file on a single line. This post makes use of the two CVS input files shown below.

input.csv

id,name.salary
1,name01,1000
2,name02,2000
3,name03,3000

input2.csv

id,name.salary
21,name21,21000
22,name22,22000
23,name23,23000


Database SQL Query

create table employee (
id int primary key,
name varchar(100),
salary int
);


Data Model class

In this example, the java bean data model class is created. The employee class has fields for id, name, and salary. The Employee class will be assigned the data from the example csv file. The spring boot batch reads data from a csv file and assigns it to the Employee class.

Employee.java

package com.yawintutor;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Employee {
	@GeneratedValue(strategy = GenerationType.AUTO)
	@Id
	private int id;
	private String name;
	private int salary;
	public int getId() {
		return id;
	}
	public void setId(int id) {
		this.id = id;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public int getSalary() {
		return salary;
	}
	public void setSalary(int salary) {
		this.salary = salary;
	}	
}


JPA Repository

The Employee JPA Repository stores employee information in the MySQL Database. Database configurations are added to the application.properties file. The JPA Repository will run the save command to save the employee data in the database.

EmployeeRepository.java

package com.yawintutor;

import org.springframework.data.jpa.repository.JpaRepository;

public interface EmployeeRepository extends JpaRepository<Employee, Long> {

}


Spring Boot Batch Configurations

The spring boot batch configuration file stores the batch job and batch step configurations. A csv file is read and parsed using the FlatFileItemReader class. Column names and the number of lines to skip are specified in the FlatFileItemReader class. The MultiResourceItemReader is used in the spring boot batch to build an ItemReader. The MultiResourceItemReader reads all csv files in a given directory and provides batch steps for executing them.

The spring boot step receives data from CSV files and converts it to a java bean object using the ItemReader interface. The ItemProcessor interface is used to process the java bean object. The java bean object is saved in the MySQL database using JPA Repository in the ItemWriter Interface.

BatchConfig.java

package com.yawintutor;

import java.io.IOException;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.io.support.ResourcePatternResolver;

@Configuration
@EnableBatchProcessing
public class BatchConfig {
	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;
	
	@Autowired
	MyCustomReader myCustomReader;
	
	@Autowired
	MyCustomWriter myCustomWriter;

	@Bean
	public Job createJob() {
		return jobBuilderFactory.get("MyJob")
				.incrementer(new RunIdIncrementer())
				.flow(createStep()).end().build();
	}

	@Bean
	public Step createStep() {
		return stepBuilderFactory.get("MyStep")
				.<Employee, Employee> chunk(1)
				.reader(reader())
				.writer(myCustomWriter)
				.build();
	}
	
	@Bean
	public ItemReader<Employee> reader() {
	    Resource[] resources = null;
	    ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();   
	    try {
	        resources = patternResolver.getResources("file:./data/*.csv");
	    } catch (IOException e) {
	        e.printStackTrace();
	    }

	    MultiResourceItemReader<Employee> reader = new MultiResourceItemReader<>();
	    reader.setResources(resources);
	    reader.setDelegate(myCustomReader);
	    return reader;
	}
}


ItemReader Implementation

The FlatFileItemReader class is extended to implement the ItemReader interface. FlatFileItemReader is a class that reads a CSV file and converts it to a java bean object. ItemReader interface help to read the data from a CSV file.

MyCustomReader.java

package com.yawintutor;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.stereotype.Component;

@Component
public class MyCustomReader extends FlatFileItemReader<Employee> implements ItemReader<Employee>{
	
	public MyCustomReader() {
//		setResource(new FileSystemResource("data/input.csv"));
		setLinesToSkip(1); 
		setLineMapper(getDefaultLineMapper());
	}
	
	public DefaultLineMapper<Employee> getDefaultLineMapper() {
		DefaultLineMapper<Employee> defaultLineMapper = new DefaultLineMapper<Employee>();
		
		DelimitedLineTokenizer delimitedLineTokenizer =new DelimitedLineTokenizer();
		delimitedLineTokenizer.setNames(new String[] { "id", "name", "salary" });
		defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
		
		BeanWrapperFieldSetMapper<Employee> beanWrapperFieldSetMapper = new BeanWrapperFieldSetMapper<Employee>();
		beanWrapperFieldSetMapper.setTargetType(Employee.class);
		defaultLineMapper.setFieldSetMapper(beanWrapperFieldSetMapper);
		
		return defaultLineMapper;
	}
}


ItemWriter Implementation

The ItemWriter Interface is used to save data from a CSV file to a relational database such as MySQL. The data will be stored by the ItemWriter in the JPA Repository. The JPA Repository stores data in the database.

MyCustomWriter.java

package com.yawintutor;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class MyCustomWriter implements ItemWriter<Employee> {

	@Autowired
	EmployeeRepository employeeRepository;
	
	@Override
	public void write(List<? extends Employee> list) throws Exception {
		for (Employee data : list) {
			System.out.println("MyCustomWriter    : Writing data    : " + data.getId()+" : "+data.getName()+" : "+data.getSalary());
			employeeRepository.save(data);
		}
	}
}


Scheduler class

SchedulerConfig.java

package com.yawintutor;

import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class SchedulerConfig {

	@Autowired
	JobLauncher jobLauncher;

	@Autowired
	Job job;

	SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");

	@Scheduled(fixedDelay = 5000, initialDelay = 5000)
	public void scheduleByFixedRate() throws Exception {
		System.out.println("Batch job starting");
		JobParameters jobParameters = new JobParametersBuilder()
				.addString("time", format.format(Calendar.getInstance().getTime())).toJobParameters();
		jobLauncher.run(job, jobParameters);
		System.out.println("Batch job executed successfully\n");
	}
}


Spring boot Batch Main Class

The main class for the spring boot batch will be shown as below. The default implementation will be used for the main class. The main method in this file will be executed when the spring boot batch application starts.

SpringBootBatch4Application.java

package com.yawintutor;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringBootBatch4Application {

	public static void main(String[] args) {
		SpringApplication.run(SpringBootBatch4Application.class, args);
	}
}


Application properties details

The application properties will be placed in the resources folder’s application.properties files. The application.properties now has two types of setting. One is for the mysql database configuration, and the other is for allowing batch-related tables to be created in the database.

application.properties

spring.datasource.url=jdbc:mysql://localhost/testdb
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.show-sql=true
spring.jpa.hibernate.ddl-auto=update
spring.batch.initialize-schema=ALWAYS


Application configuration using pom.xml file

The required application configurations and dependencies are created using maven pom.xml file. The Maven POM.xml file contains dependencies for spring boot batch, JPA Repository and mysql database. If you are using some other database other than MySQL Database, you need to add the dependency to the respective database in the pom.xml file.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.5.3</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.yawintutor</groupId>
	<artifactId>SpringBootBatchCSVToDB</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>SpringBootBatchCSVToDB</name>
	<description>Demo project for Spring Boot</description>
	<properties>
		<java.version>11</java.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-jpa</artifactId>
		</dependency>
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.batch</groupId>
			<artifactId>spring-batch-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>


How to run

When. the spring boot application starts, the following logs will be shown in the console window.

2021-07-27 11:38:31.345  INFO 52298 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] launched with the following parameters: [{run.id=5, time=2021-07-26 15:55:08.729}]
2021-07-27 11:38:31.393  INFO 52298 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [MyStep]
MyCustomWriter    : Writing data    : 1 : name01 : 1000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
MyCustomWriter    : Writing data    : 2 : name02 : 2000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
MyCustomWriter    : Writing data    : 3 : name03 : 3000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
MyCustomWriter    : Writing data    : 21 : name21 : 21000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
MyCustomWriter    : Writing data    : 22 : name22 : 22000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
MyCustomWriter    : Writing data    : 23 : name23 : 23000
Hibernate: select employee0_.id as id1_0_0_, employee0_.name as name2_0_0_, employee0_.salary as salary3_0_0_ from employee employee0_ where employee0_.id=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into employee (name, salary, id) values (?, ?, ?)
2021-07-27 11:38:31.582  INFO 52298 --- [           main] o.s.batch.core.step.AbstractStep         : Step: [MyStep] executed in 189ms
2021-07-27 11:38:31.597  INFO 52298 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] completed with the following parameters: [{run.id=5, time=2021-07-26 15:55:08.729}] and the following status: [COMPLETED] in 219ms



Leave a Reply