
File Uploads

Objective

The file upload module handles every file uploaded through the HireXL user interfaces. It validates each upload, applies security checks, and manages server resources efficiently.

Tools & Technologies

  1. NestJS
  2. Multer - handles multipart form uploads
  3. libreoffice-convert - converts documents to PDF
  4. mmmagic - identifies the MIME type of a file
  5. textract - extracts the text content of a file

API Contract

  1. The client sends the file as multipart form data, as a binary field with the key “file”.
  2. The file size must not exceed 10 MB (to avoid resource overhead on the server).
  3. Allowed file types are PDF, DOC, DOCX, ODT, XLS, XLSX, CSV, and ZIP. ZIP files may contain only PDF, DOC, DOCX, and ODT files.
  4. The server should provide a progress indicator to the client.
  5. File validation is required on both the client and the server.
  6. File metadata must eventually be saved to improve search.
  7. The text content must be extracted from the file.
  8. The server either saves or returns the extracted content, depending on the API endpoint or a flag.
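
The size and type rules in the contract can be sketched as a small check that runs before upload on the client, and again on the server. This is only an illustration: the names `UploadCandidate`, `validateUpload`, and `MAX_FILE_SIZE_BYTES` are hypothetical and not part of the HireXL codebase, and an extension check alone is not sufficient on the server, where the file content should also be inspected.

```typescript
// Hypothetical names throughout; limits and types are taken from the contract above.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// Extensions accepted for a direct upload, and for entries inside a ZIP archive.
const ALLOWED_EXTENSIONS = ["pdf", "doc", "docx", "odt", "xls", "xlsx", "csv", "zip"];
const ALLOWED_IN_ZIP = ["pdf", "doc", "docx", "odt"];

interface UploadCandidate {
  name: string; // original file name, e.g. "resume.pdf"
  size: number; // size in bytes
}

// Returns a list of problems; an empty list means the file passes both checks.
function validateUpload(file: UploadCandidate, insideZip = false): string[] {
  const errors: string[] = [];
  const dot = file.name.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  const allowed = insideZip ? ALLOWED_IN_ZIP : ALLOWED_EXTENSIONS;

  if (!allowed.includes(ext)) {
    errors.push(`Unsupported file type: ${ext || "(none)"}`);
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    errors.push("File is larger than 10 MB");
  }
  return errors;
}
```

For example, `validateUpload({ name: "data.csv", size: 10 }, true)` reports one problem, because CSV files are not allowed inside a ZIP archive.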

Implementation

Code & business logic of the codebase

Upload File flow

HTTP Request

POST {{ROOT_URL}}/job-application/upload

# Body (multipart/form-data)
{
  file: binary
}
// Valid file types (MIME types accepted for individual CV files)

const validFileType = [
  "application/pdf",
  "application/msword",
  "application/octet-stream",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
];

Detect Valid file type

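const mmm = require("mmmagic");
const Magic = mmm.Magic;

// MAGIC_MIME_TYPE makes detectFile return only the MIME type, e.g. "application/pdf"
const magic = new Magic(mmm.MAGIC_MIME_TYPE);

// Wrap the callback-based detectFile in a Promise
const promisifyDetectFile = (filePath: string) => {
  return new Promise((resolve, reject) => {
    magic.detectFile(filePath, (err, result) => {
      if (err) {
        return reject(err);
      }
      resolve(result);
    });
  });
};

// Usage, inside an async function; `file` is the path of the file on disk
const fileMime: any = await promisifyDetectFile(file);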
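
mmmagic is a binding to libmagic, which identifies a file from its content rather than its extension. The idea can be illustrated with a much simpler check of the leading "magic bytes". This is a rough sketch and not how mmmagic is implemented: `sniffMimeType` is a hypothetical helper, it knows only three signatures, and the label returned for OLE2 files is illustrative.

```typescript
// Returns true when `data` begins with the given byte signature.
function startsWith(data: Uint8Array, signature: number[]): boolean {
  if (data.length < signature.length) return false;
  return signature.every((byte, i) => data[i] === byte);
}

// Hypothetical helper: guesses a MIME type from the first bytes of a file.
function sniffMimeType(data: Uint8Array): string {
  // "%PDF-"
  if (startsWith(data, [0x25, 0x50, 0x44, 0x46, 0x2d])) {
    return "application/pdf";
  }
  // "PK\x03\x04": ZIP container, also used by DOCX, XLSX and ODT
  if (startsWith(data, [0x50, 0x4b, 0x03, 0x04])) {
    return "application/zip";
  }
  // OLE2 compound file header: legacy DOC and XLS (illustrative label)
  if (startsWith(data, [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) {
    return "application/x-ole-storage";
  }
  // Unknown content falls back to the generic binary type
  return "application/octet-stream";
}
```

Because DOCX, XLSX, and ODT files are ZIP containers, a signature check this simple cannot tell them apart from a plain ZIP archive; a full detector such as libmagic looks further into the file.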

Upload API Business logic

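import {
  BadRequestException,
  Body,
  Post,
  UploadedFile,
  UseInterceptors,
} from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { diskStorage } from 'multer';

// Assumed definition: not shown in the original, inferred from the error message below
const bulkUploadFileType = ['application/zip', 'text/csv'];

@Post('upload')
@UseInterceptors(
  FileInterceptor('file', {
    // Store the raw upload in a temporary directory
    storage: diskStorage({
      destination: './uploads/.tmp',
    }),
    // Reject anything that is not a ZIP or CSV before it is written to disk
    fileFilter: (request, file, callback) => {
      if (!bulkUploadFileType.includes(file.mimetype)) {
        return callback(
          new BadRequestException('Provide a valid zip or CSV file'),
          false,
        );
      }
      callback(null, true);
    },
    limits: {
      // 1 MB; the API contract allows up to 10 MB (10 * 1024 ** 2)
      fileSize: Math.pow(1024, 2),
    },
  }),
)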
async uploadFile(@UploadedFile() file, @Body() body) {
  let applications = [];
  const inValidFiles = [];

  if (file.mimetype === 'application/zip') {
    // Extract the archive, then keep only entries whose detected MIME type is allowed
    const cvs: any = await APIFeatures.extractZip(file.path, 'uploads/cvs');
    for (const entry of cvs) {
      // Declared outside the try block so the catch block reports the CV path,
      // not the uploaded ZIP itself
      const cvPath = entry.cv;
      try {
        const fileMime: any = await promisifyDetectFile(cvPath);
        if (validFileType.includes(fileMime)) {
          applications.push({
            cv: cvPath,
            jobId: Number(body.job),
          });
        } else {
          // Unsupported type: report it instead of dropping it silently
          inValidFiles.push(cvPath);
        }
      } catch (err) {
        console.log('mime error', err);
        inValidFiles.push(cvPath);
      }
    }
  }
  if (file.mimetype === 'text/csv') {
    // Each CSV row becomes one application for the given job
    const rows = await APIFeatures.extractCSV(file.path);

    applications = rows.map(row => {
      return {
        ...row,
        jobId: Number(body.job),
      };
    });
  }
  await this.applicationService.createMultiple(applications);

  return {
    message: 'File received successfully',
    status: 'success',
    files: applications,
    inValidFiles,
  };
}
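
The ZIP branch of the handler sorts extracted files by their detected MIME type. That step can be sketched as a standalone function that takes the detector as a parameter, which makes it testable without mmmagic or a real archive. The names `partitionByMime`, `DetectMime`, and `PartitionResult` are hypothetical, and this sketch assumes that files with an unsupported type, or files whose detection fails, should both be reported as invalid.

```typescript
// Any async function that maps a file path to a MIME type.
type DetectMime = (path: string) => Promise<string>;

interface PartitionResult {
  valid: string[];
  invalid: string[];
}

// Splits paths into those with an allowed MIME type and everything else.
async function partitionByMime(
  paths: string[],
  detect: DetectMime,
  allowed: string[],
): Promise<PartitionResult> {
  const result: PartitionResult = { valid: [], invalid: [] };
  for (const path of paths) {
    try {
      const mime = await detect(path);
      if (allowed.includes(mime)) {
        result.valid.push(path);
      } else {
        result.invalid.push(path);
      }
    } catch {
      // Detection failed (unreadable or corrupt file): treat as invalid
      result.invalid.push(path);
    }
  }
  return result;
}
```

In the upload handler, the detector would be `promisifyDetectFile` and the allowed list would be `validFileType`.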

Handle Progress indicator at frontend

// axios request config
// Other fields on the progress event: total, bytes, estimated, rate, upload
onUploadProgress: ({ loaded, progress = 0 }) => {
  console.log("loaded", loaded, progress);
  setFileProgress({
    isSubmitting: true,
    progress: Math.round(progress * 100),
  });
},
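
The `progress` field of the event can be missing when the total size is not known, so it can help to derive the percentage defensively. A minimal sketch, assuming an event shaped like the one above; `ProgressLike` and `toPercent` are hypothetical names:

```typescript
// Subset of the progress event fields used by the handler above.
interface ProgressLike {
  loaded: number;
  total?: number;
  progress?: number; // fraction between 0 and 1, when available
}

// Converts a progress event into a whole-number percentage between 0 and 100.
function toPercent({ loaded, total, progress }: ProgressLike): number {
  // Prefer the precomputed fraction; otherwise derive it from loaded / total
  const fallback = total !== undefined && total > 0 ? loaded / total : 0;
  const fraction = progress ?? fallback;
  return Math.min(100, Math.max(0, Math.round(fraction * 100)));
}
```

The handler could then call `setFileProgress({ isSubmitting: true, progress: toPercent(event) })`.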

Create a PDF file stream if files are in another format

import { join, extname } from "path";
import { createReadStream } from "fs";
import { readFile } from "fs/promises";
import { Readable } from "stream";
import { promisify } from "util";
import libre from "libreoffice-convert";

// libreoffice-convert exposes a callback API:
// convert(document, format, filter, callback)
const convertAsync = promisify(libre.convert);

async function viewFile({ file }, res) {
  const ext = ".pdf";
  const fileExt = extname(file).toLowerCase();
  let fileStream = null;

  if (fileExt !== ext) {
    // Convert non-PDF documents to PDF in memory before streaming
    const data = await readFile(file);
    const convertedFile = await convertAsync(data, ext, undefined);
    fileStream = Readable.from(convertedFile);
  } else {
    // PDFs are streamed straight from disk
    fileStream = createReadStream(join(process.cwd(), file));
  }

  res.set("Content-Type", "application/pdf");
  res.set("Content-Disposition", "attachment; filename=resume.pdf");
  fileStream.pipe(res);
}

Textract - Text extraction (including OCR)

import { promisify } from "util";
import textract from "textract";

const fromFileWithMimeAndPathAsync = promisify(
  textract.fromFileWithMimeAndPath
);

class FileDataExtractor {
  static async extractFileData(mimetype, file) {
    const text = await fromFileWithMimeAndPathAsync(mimetype, file);

    // Heuristic: treat the first two words of the document as the name
    const name = text.split(" ").slice(0, 2).join(" ");

    // String.match returns null when nothing matches, so guard before indexing
    const email =
      text.match(
        /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi
      )?.[0] ?? null;
    const mobile = text.match(/(\+\d{1,3}[- ]?)?\d{10}/gi)?.[0] ?? null;

    return {
      name,
      email,
      mobile,
      filePath: file,
    };
  }
}
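
The parsing step can also be separated from the extraction step, so that it can be exercised on plain strings without textract installed. A minimal sketch using the same regular expressions; `extractContactFields` and `ContactFields` are hypothetical names, and taking the first two words as the name remains a rough heuristic:

```typescript
interface ContactFields {
  name: string | null;
  email: string | null;
  mobile: string | null;
}

const EMAIL_PATTERN = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/;
const MOBILE_PATTERN = /(\+\d{1,3}[- ]?)?\d{10}/;

// Pulls candidate contact details out of already-extracted text.
// Fields that cannot be found are returned as null.
function extractContactFields(text: string): ContactFields {
  // Split on any whitespace so line breaks do not end up inside the name
  const words = text.trim().split(/\s+/).filter(Boolean);
  return {
    name: words.length > 0 ? words.slice(0, 2).join(" ") : null,
    email: text.match(EMAIL_PATTERN)?.[0] ?? null,
    mobile: text.match(MOBILE_PATTERN)?.[0] ?? null,
  };
}
```

For the input `"Jane Doe\njane.doe@example.com +91 9876543210"`, this returns the name `"Jane Doe"`, the email `"jane.doe@example.com"`, and the mobile number `"+91 9876543210"`.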