Blob Service (src/azure/blob.service.ts)

Overview

The BlobService manages document storage and retrieval in the BidScript backend using Azure Blob Storage. It exposes methods for uploading, downloading, listing, and deleting documents, and supports metadata management and access URL generation.

Dependencies

import { Readable } from 'stream'; // needed for the Readable type in uploadDocument's signature
import { Injectable, Logger, NotFoundException } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { BlobServiceClient, ContainerClient, BlockBlobClient } from '@azure/storage-blob';
import { DocumentMetadata } from '../types/interfaces/document.interface';

Key Features

  • Document upload and storage
  • Document retrieval and download
  • URL generation for document access
  • Metadata management
  • Container and blob management
  • Error handling and logging

Core Methods

uploadDocument

async uploadDocument(
  file: Buffer | Readable,
  metadata?: DocumentMetadata,
  options?: {
    filename?: string;
    contentType?: string;
    containerName?: string;
  }
): Promise<{
  id: string;
  url: string;
  metadata: DocumentMetadata;
  contentType: string;
  size: number;
}>

Uploads a document to Azure Blob Storage.

Parameters:

  • file: The document content as a Buffer or Readable stream
  • metadata: Optional metadata for the document
  • options: Upload options
      • filename: Custom filename (defaults to a generated UUID)
      • contentType: MIME type (auto-detected if not provided)
      • containerName: Target container name (uses the default if not specified)

Returns: An object containing the document ID, URL, metadata, content type, and size.

Example:

const documentBuffer = fs.readFileSync('document.pdf');
const result = await this.blobService.uploadDocument(documentBuffer, {
  title: 'Contract Document',
  tags: ['contract', 'legal'],
  userId: '123'
}, {
  contentType: 'application/pdf'
});
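
The defaults described for the options parameter (a generated UUID filename and an auto-detected MIME type) could be resolved roughly as follows. This is a minimal sketch, not the service's actual implementation: resolveUploadOptions, the MIME_BY_EXTENSION table, and the application/octet-stream fallback are all hypothetical names and choices made for illustration.

```typescript
import { randomUUID } from 'node:crypto';
import { extname } from 'node:path';

// Hypothetical option shape mirroring the `options` parameter of uploadDocument.
interface UploadOptions {
  filename?: string;
  contentType?: string;
  containerName?: string;
}

// Small illustrative lookup table; a real service would likely rely on a MIME library.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['.pdf', 'application/pdf'],
  ['.txt', 'text/plain'],
  ['.json', 'application/json'],
]);

// Fill in the documented defaults: UUID filename, extension-based content type,
// and the default container name.
function resolveUploadOptions(
  options: UploadOptions = {},
  defaultContainerName = 'documents',
): Required<UploadOptions> {
  const filename = options.filename ?? randomUUID();
  const contentType =
    options.contentType ??
    MIME_BY_EXTENSION.get(extname(filename).toLowerCase()) ??
    'application/octet-stream'; // assumed fallback when the type cannot be inferred
  const containerName = options.containerName ?? defaultContainerName;
  return { filename, contentType, containerName };
}
```

Note that with a generated UUID filename there is no extension to inspect, so in this sketch the content type falls back to the generic binary type unless the caller supplies one.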

getDocument

async getDocument(
  documentId: string,
  options?: {
    containerName?: string;
  }
): Promise<{
  content: Buffer;
  metadata: DocumentMetadata;
  contentType: string;
  size: number;
}>

Retrieves a document from Azure Blob Storage.

Parameters:

  • documentId: The ID of the document to retrieve
  • options: Retrieval options
      • containerName: Source container name (uses the default if not specified)

Returns: An object containing the document content, metadata, content type, and size.

Throws:

  • NotFoundException: If the document doesn't exist

Example:

try {
  const document = await this.blobService.getDocument('document-id');
  console.log(`Retrieved document: ${document.metadata.title}`);
  console.log(`Size: ${document.size} bytes`);
  // Process document.content
} catch (error) {
  if (error instanceof NotFoundException) {
    console.error('Document not found');
  } else {
    console.error('Error retrieving document:', error);
  }
}
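
getDocument returns the content as a Buffer, whereas in Node.js a blob download from the Azure SDK exposes its body as a readable stream. One way to bridge the two is to collect the stream's chunks, as in the sketch below. The helper name streamToBuffer is hypothetical, and whether the service does this or uses a different SDK call is not stated here.

```typescript
import { Readable } from 'node:stream';

// Collect every chunk of a readable stream into a single Buffer.
// String chunks are converted to Buffers (UTF-8) before concatenation.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk as string));
  }
  return Buffer.concat(chunks);
}
```

This holds the whole document in memory, which is reasonable for typical documents but worth reconsidering for very large blobs.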

getDocumentUrl

async getDocumentUrl(
  documentId: string,
  options?: {
    containerName?: string;
    expiresInMinutes?: number;
    permissions?: 'read' | 'write' | 'delete' | 'all';
  }
): Promise<string>

Generates a URL for accessing a document.

Parameters:

  • documentId: The ID of the document
  • options: URL generation options
      • containerName: Container name
      • expiresInMinutes: URL expiration time in minutes (default: 60)
      • permissions: Access permissions (default: 'read')

Returns: A URL for accessing the document.

Example:

// Generate a read-only URL that expires in 30 minutes
const url = await this.blobService.getDocumentUrl('document-id', {
  expiresInMinutes: 30,
  permissions: 'read'
});
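
Time-limited, permission-scoped URLs in Azure Blob Storage are typically shared access signature (SAS) URLs, although this page does not say how the URL is produced. Assuming a SAS is used, the options could translate into SAS inputs as sketched below. toSasParameters and SasParameters are hypothetical names; the letters r, w, and d are Azure's SAS permission characters for read, write, and delete, and mapping 'all' to 'rwd' is an assumption.

```typescript
type DocumentPermission = 'read' | 'write' | 'delete' | 'all';

// Hypothetical container for the values a SAS generator would need.
interface SasParameters {
  permissions: string; // SAS permission letters, e.g. 'r'
  startsOn: Date;
  expiresOn: Date;
}

// Assumed mapping from the documented permission names to SAS letters.
const SAS_LETTERS: Record<DocumentPermission, string> = {
  read: 'r',
  write: 'w',
  delete: 'd',
  all: 'rwd',
};

// Apply the documented defaults (60 minutes, 'read') and compute the expiry time.
function toSasParameters(
  options: { expiresInMinutes?: number; permissions?: DocumentPermission } = {},
  now: Date = new Date(),
): SasParameters {
  const expiresInMinutes = options.expiresInMinutes ?? 60;
  if (!Number.isFinite(expiresInMinutes) || expiresInMinutes <= 0) {
    throw new RangeError('expiresInMinutes must be a positive number');
  }
  return {
    permissions: SAS_LETTERS[options.permissions ?? 'read'],
    startsOn: now,
    expiresOn: new Date(now.getTime() + expiresInMinutes * 60_000),
  };
}
```

Passing `now` explicitly keeps the calculation deterministic, which makes the expiry logic easy to unit test.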

listDocuments

async listDocuments(
  options?: {
    containerName?: string;
    prefix?: string;
    maxResults?: number;
  }
): Promise<{
  id: string;
  url: string;
  metadata: DocumentMetadata;
  contentType: string;
  size: number;
  lastModified: Date;
}[]>

Lists documents in a container.

Parameters:

  • options: Listing options
      • containerName: Container name
      • prefix: Filter by name prefix
      • maxResults: Maximum number of results

Returns: An array of document objects.

Example:

// List up to 20 documents whose names start with 'pdf/'
const documents = await this.blobService.listDocuments({
  prefix: 'pdf/',
  maxResults: 20
});
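
Blob listings in the Azure SDK are consumed as async iterables, so a maxResults cap can be applied by stopping iteration early. The sketch below shows that idea over any async iterable of named items. collectUpTo and NamedItem are hypothetical, and the prefix check is done in memory here only to keep the example self-contained; the SDK's listing call accepts a prefix option that filters on the server.

```typescript
// Minimal shape needed for prefix filtering; real blob items carry more fields.
interface NamedItem {
  name: string;
}

// Collect items whose name starts with `prefix`, stopping once `maxResults` is reached.
async function collectUpTo<T extends NamedItem>(
  items: AsyncIterable<T>,
  options: { prefix?: string; maxResults?: number } = {},
): Promise<T[]> {
  const { prefix = '', maxResults = Infinity } = options;
  const results: T[] = [];
  if (maxResults <= 0) return results;
  for await (const item of items) {
    if (!item.name.startsWith(prefix)) continue;
    results.push(item);
    if (results.length >= maxResults) break; // stop early instead of listing everything
  }
  return results;
}
```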

deleteDocument

async deleteDocument(
  documentId: string,
  options?: {
    containerName?: string;
  }
): Promise<boolean>

Deletes a document from Azure Blob Storage.

Parameters:

  • documentId: The ID of the document to delete
  • options: Deletion options
      • containerName: Container name

Returns: A boolean indicating whether the deletion was successful.

Example:

const deleted = await this.blobService.deleteDocument('document-id');
if (deleted) {
  console.log('Document deleted successfully');
}

Implementation Details

Azure Blob Storage Client Initialization

private blobServiceClient: BlobServiceClient;
private containerClient: ContainerClient;
private defaultContainerName: string;

constructor(private configService: ConfigService) {
  const connectionString = this.configService.get<string>('AZURE_STORAGE_CONNECTION_STRING');

  if (!connectionString) {
    throw new Error('Azure Storage connection string not configured');
  }

  this.blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
  this.defaultContainerName = this.configService.get<string>('AZURE_STORAGE_CONTAINER', 'documents');
  this.containerClient = this.blobServiceClient.getContainerClient(this.defaultContainerName);

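  // Container creation is asynchronous and cannot be awaited in a constructor;
  // it runs in onModuleInit below.
}

// NestJS calls onModuleInit once the module's dependencies are resolved, so a
// failure here surfaces at startup instead of as an unhandled promise rejection.
async onModuleInit(): Promise<void> {
  await this.initializeContainer();
}

private async initializeContainer(): Promise<void> {
  try {
    await this.containerClient.createIfNotExists();
    this.logger.log(`Container '${this.defaultContainerName}' initialized`);
  } catch (error) {
    // Caught values are not guaranteed to be Error instances
    const message = error instanceof Error ? error.message : String(error);
    this.logger.error(`Failed to initialize container: ${message}`);
    throw error;
  }
}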

Metadata Serialization

private serializeMetadata(metadata: DocumentMetadata): Record<string, string> {
  const serialized: Record<string, string> = {};

  for (const [key, value] of Object.entries(metadata)) {
    if (value === undefined || value === null) continue;

    if (typeof value === 'object') {
      serialized[key] = JSON.stringify(value);
    } else {
      serialized[key] = String(value);
    }
  }

  return serialized;
}

private deserializeMetadata(metadata: Record<string, string>): DocumentMetadata {
  const deserialized: DocumentMetadata = {};

  for (const [key, value] of Object.entries(metadata)) {
    try {
      // Try to parse as JSON
      deserialized[key] = JSON.parse(value);
    } catch (e) {
      // Not JSON, use as-is
      deserialized[key] = value;
    }
  }

  return deserialized;
}
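
Blob metadata values are stored as strings, which is why the two helpers above exist. The round trip is not lossless, though: any string that happens to be valid JSON is parsed on the way back. The standalone sketch below reproduces the two helpers with a loose Metadata type (an assumption, since the real DocumentMetadata interface is not shown here) to illustrate the effect.

```typescript
// Loose stand-in for DocumentMetadata, whose real shape is not shown in this page.
type Metadata = Record<string, unknown>;

function serializeMetadata(metadata: Metadata): Record<string, string> {
  const serialized: Record<string, string> = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (value === undefined || value === null) continue;
    serialized[key] = typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return serialized;
}

function deserializeMetadata(metadata: Record<string, string>): Metadata {
  const deserialized: Metadata = {};
  for (const [key, value] of Object.entries(metadata)) {
    try {
      deserialized[key] = JSON.parse(value); // arrays, objects, numbers, booleans
    } catch {
      deserialized[key] = value; // not JSON, keep the raw string
    }
  }
  return deserialized;
}

const original = { title: 'Contract Document', tags: ['contract', 'legal'], userId: '123' };
const roundTripped = deserializeMetadata(serializeMetadata(original));

console.log(roundTripped.tags);          // the array is restored
console.log(typeof roundTripped.userId); // 'number': the string '123' came back as 123
```

If string identifiers such as userId must keep their type, callers need to convert them back explicitly, or the deserializer needs to parse only values that start with `{` or `[`.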

Integration with Other Services

The BlobService integrates with:

  • DocumentParseService: For processing documents after upload
  • RAG Module: For document storage as part of the RAG pipeline
  • Editor Module: For saving edited documents

Error Handling

The service handles the following error scenarios:

  • Connection errors: Issues connecting to Azure Storage
  • Authentication errors: Invalid connection string or credentials
  • Not found errors: Document not found in container
  • Permission errors: Insufficient permissions
  • Throttling errors: Exceeding Azure Storage request limits
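
The categories above can be told apart from the HTTP status and error code attached to a failed request. The following is a minimal sketch under stated assumptions: classifyStorageError and StorageErrorLike are hypothetical, the statusCode and code fields mirror the shape of the Azure SDK's RestError, and the specific status and code values should be checked against the Azure Blob Storage error code reference before being relied on.

```typescript
type StorageErrorKind =
  | 'connection'
  | 'authentication'
  | 'not_found'
  | 'permission'
  | 'throttling'
  | 'unknown';

// Subset of fields assumed to be present on errors thrown by the storage client.
interface StorageErrorLike {
  statusCode?: number;
  code?: string;
}

// Node.js network error codes treated as connection failures (illustrative list).
const NETWORK_CODES = new Set(['ENOTFOUND', 'ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT']);

function classifyStorageError(error: StorageErrorLike): StorageErrorKind {
  if (error.statusCode === undefined) {
    // No HTTP response was received, so only network-level codes are meaningful.
    return NETWORK_CODES.has(error.code ?? '') ? 'connection' : 'unknown';
  }
  if (error.statusCode === 404) return 'not_found';
  if (error.statusCode === 401 || error.code === 'AuthenticationFailed') return 'authentication';
  if (error.statusCode === 403) return 'permission';
  if (error.statusCode === 429 || error.statusCode === 503) return 'throttling';
  return 'unknown';
}
```

A caller could then map 'not_found' to NotFoundException and retry with backoff on 'throttling', while treating the remaining categories as configuration or infrastructure problems.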

Logging

The service uses NestJS Logger for detailed logging:

private readonly logger = new Logger(BlobService.name);

// Usage (file.length assumes a Buffer; a Readable stream has no length property)
this.logger.log(`Uploading document with size ${file.length} bytes`);
this.logger.error(`Error uploading document: ${error.message}`, error.stack);

Configuration

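Environment variables (AZURE_STORAGE_CONNECTION_STRING is required; AZURE_STORAGE_CONTAINER is optional and defaults to documents):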

AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net
AZURE_STORAGE_CONTAINER=documents
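
A connection string in the format shown above is a semicolon-separated list of key=value pairs. The sketch below splits one into its parts so that missing settings can be reported at startup. parseConnectionString and missingConnectionKeys are hypothetical helpers, not part of the service or the Azure SDK, and the list of required keys is an assumption based on the example above.

```typescript
// Split "Key=Value;Key=Value" into a lookup table. Only the first '=' in each
// segment separates key from value, because account keys are base64 and may
// end with '=' padding.
function parseConnectionString(connectionString: string): Record<string, string> {
  const parts: Record<string, string> = {};
  for (const segment of connectionString.split(';')) {
    if (segment.trim() === '') continue; // tolerate a trailing ';'
    const separator = segment.indexOf('=');
    if (separator <= 0) {
      throw new Error('Malformed connection string segment (expected key=value)');
    }
    parts[segment.slice(0, separator).trim()] = segment.slice(separator + 1);
  }
  return parts;
}

// Report which of the assumed-required keys are absent.
function missingConnectionKeys(
  connectionString: string,
  required: string[] = ['AccountName', 'AccountKey'],
): string[] {
  const parts = parseConnectionString(connectionString);
  return required.filter((key) => !parts[key]);
}
```

The error message deliberately omits the offending segment so that a malformed value containing the account key is never written to logs.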