In this tutorial, we walk through the steps to implement batch upload and processing of PDF files with ASP.NET Core. Clients upload a ZIP file containing multiple PDF files; the server extracts the archive, reads each PDF file’s form field values, and saves those values into an MSSQL database.
1. Receive the Upload File
The first step is to create an API endpoint that allows clients to upload a ZIP file containing multiple PDF files. We can use the IFormFile interface provided by ASP.NET Core to handle file uploads. Here is an example code snippet:
    [HttpPost("upload")]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        // validate file extension and content type
        // save file to disk
        return Ok();
    }
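The two placeholder comments in the action can be filled in along the following lines. This is a minimal sketch, not a definitive implementation: the ZipUploadRules helper, the PdfBatchController name, the route, the 50 MB limit, and the accepted content types are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Hypothetical helper: keeps the validation rules testable without an HTTP request.
public static class ZipUploadRules
{
    // Assumed size limit for illustration; tune it for your workload.
    public const long MaxBytes = 50L * 1024 * 1024;

    public static bool IsAcceptable(string fileName, string contentType, long length)
    {
        if (length <= 0 || length > MaxBytes) return false;

        var hasZipExtension = string.Equals(
            Path.GetExtension(fileName), ".zip", StringComparison.OrdinalIgnoreCase);

        // Assumed set of content types that clients commonly send for ZIP files.
        var hasZipContentType =
            string.Equals(contentType, "application/zip", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(contentType, "application/x-zip-compressed", StringComparison.OrdinalIgnoreCase);

        return hasZipExtension && hasZipContentType;
    }
}

[ApiController]
[Route("api/pdf-batches")] // hypothetical route
public class PdfBatchController : ControllerBase
{
    [HttpPost("upload")]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        if (file is null ||
            !ZipUploadRules.IsAcceptable(file.FileName, file.ContentType, file.Length))
        {
            return BadRequest("Expected a non-empty .zip upload.");
        }

        // Do not reuse the client-supplied name as a path; generate one instead.
        var zipFilePath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");

        // System.IO.File is fully qualified because ControllerBase has a File(...) method.
        await using (var target = System.IO.File.Create(zipFilePath))
        {
            await file.CopyToAsync(target);
        }

        return Ok();
    }
}
```

The extension and content type are both supplied by the client, so these checks only filter out obvious mistakes. Opening the archive in the next step is what actually confirms the file is a ZIP.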
2. Extract the ZIP File
Once we have received the ZIP file, we need to extract it to a temporary directory. We can use the System.IO.Compression namespace to extract the ZIP file. Here is an example code snippet:
    using System.IO.Compression;
    ...
    var tempDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    ZipFile.ExtractToDirectory(zipFilePath, tempDir);
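ZipFile.ExtractToDirectory unpacks every entry in the archive. Since only the PDF files matter here, one alternative is to walk the entries and extract just those. The sketch below assumes a hypothetical PdfZipExtractor helper; it flattens any folder structure inside the archive and prefixes each file with a running index so that same-named files from different folders do not collide.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the original tutorial.
public static class PdfZipExtractor
{
    // Extracts only the *.pdf entries into targetDir and returns the extracted paths.
    public static List<string> ExtractPdfs(string zipFilePath, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        var extracted = new List<string>();

        using var archive = ZipFile.OpenRead(zipFilePath);
        foreach (var entry in archive.Entries)
        {
            // Directory entries have an empty Name, so this check skips them
            // along with every non-PDF file.
            if (!entry.Name.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // entry.Name is the file name without its folder path inside the archive.
            var destination = Path.Combine(targetDir, $"{extracted.Count}_{entry.Name}");
            entry.ExtractToFile(destination);
            extracted.Add(destination);
        }

        return extracted;
    }
}
```

A caller would typically wrap the later processing in try/finally and call Directory.Delete(targetDir, recursive: true) at the end, so that temporary files do not accumulate on the server.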
3. Read Each PDF File in ZIP
Now that we have extracted the ZIP file, we can loop through each PDF file in the directory and read its form field values. We can use the iTextSharp library to read the field values. Here is an example code snippet:
    using iTextSharp.text.pdf;
    ...
    foreach (var pdfFile in Directory.GetFiles(tempDir, "*.pdf"))
    {
        var pdfReader = new PdfReader(pdfFile);
        var pdfFormFields = pdfReader.AcroFields.Fields;
        foreach (var pdfFormField in pdfFormFields)
        {
            var fieldValue = pdfReader.AcroFields.GetField(pdfFormField.Key);
            // save fieldValue to database
        }
        pdfReader.Close(); // release the file handle before moving to the next PDF
    }
4. Read the PDF File Field’s Value
As mentioned in the previous step, we can use the iTextSharp library to read the field values of each PDF file. We loop through each field and get its value using the GetField method of the AcroFields class.
5. Save the Value into MSSQL Database
Finally, we can save the field values into an MSSQL database. We can use the System.Data.SqlClient namespace to connect to the database and execute SQL commands. Here is an example code snippet:
    using System.Data.SqlClient;
    ...
    var connectionString = "Data Source=MyServer;Initial Catalog=MyDatabase;User ID=MyUsername;Password=MyPassword;";
    using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();
    foreach (var pdfFormField in pdfFormFields)
    {
        var fieldValue = pdfReader.AcroFields.GetField(pdfFormField.Key);
        using var sqlCommand = new SqlCommand(
            "INSERT INTO MyTable (FieldName, FieldValue) VALUES (@FieldName, @FieldValue)", connection);
        sqlCommand.Parameters.AddWithValue("@FieldName", pdfFormField.Key);
        sqlCommand.Parameters.AddWithValue("@FieldValue", fieldValue);
        await sqlCommand.ExecuteNonQueryAsync();
    }
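The loop above creates a new command for every field and commits each row separately. A possible variation, sketched below, reuses one command with explicitly typed parameters and wraps all inserts for a PDF in a single transaction, so a failure partway through does not leave a partial set of rows. The PdfFieldStore class is hypothetical, and the column sizes (NVARCHAR(256) for FieldName, NVARCHAR(MAX) for FieldValue) are assumptions about the MyTable schema.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the original tutorial.
public static class PdfFieldStore
{
    // Builds the INSERT command once. The parameter types and sizes are assumed
    // to match the MyTable columns.
    public static SqlCommand CreateInsertCommand()
    {
        var command = new SqlCommand(
            "INSERT INTO MyTable (FieldName, FieldValue) VALUES (@FieldName, @FieldValue)");
        command.Parameters.Add("@FieldName", SqlDbType.NVarChar, 256);
        command.Parameters.Add("@FieldValue", SqlDbType.NVarChar, -1); // -1 means NVARCHAR(MAX)
        return command;
    }

    public static async Task SaveAsync(
        string connectionString, IReadOnlyDictionary<string, string> fields)
    {
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        // Disposing the transaction without calling Commit rolls it back.
        using var transaction = connection.BeginTransaction();
        using var command = CreateInsertCommand();
        command.Connection = connection;
        command.Transaction = transaction;

        foreach (var field in fields)
        {
            command.Parameters["@FieldName"].Value = field.Key;
            // Store NULL rather than failing when a field has no value.
            command.Parameters["@FieldValue"].Value = (object)field.Value ?? DBNull.Value;
            await command.ExecuteNonQueryAsync();
        }

        transaction.Commit();
    }
}
```

Here the field names and values are assumed to have been collected into a dictionary while the PDF was open, which also lets the PdfReader be closed before any database work starts. Explicit parameter types are used instead of AddWithValue because AddWithValue infers the type and size from each value, which can differ from the column definition.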
And that’s it! We have implemented a batch upload and processing of PDF files with ASP.NET Core.