Data Persistence in the Tender Parsing Module¶
Technologies Used¶
- NestJS: The primary backend framework, providing structure for services, controllers, and dependency injection.
- Redis: An in-memory key-value store, utilised as a high-speed, temporary cache for document and constraint data during active editing and processing.
- MongoDB: A NoSQL database, serving as the durable, long-term storage for all tender documents, constraints, tags, and analysis results.
- BullMQ (if enabled): Used for background job processing, including scheduled persistence tasks and cache management.
- Python Microservices: Invoked for advanced tagging and reference extraction, with results persisted via the NestJS services.
Historical Note: Data Storage Evolution
Originally, document data was stored in Azure Blob Storage, using naming conventions to read and write blob data. While this approach offered low latency and cost benefits, it proved inefficient and cumbersome for querying and aggregating data. Transitioning to MongoDB enabled more flexible and performant queries. Additionally, by leveraging MongoDB's vCore tier, the system now supports advanced features such as vectorisation and efficient indexing, further enhancing data retrieval and analysis capabilities.
High-Level Logic¶
Two-Tiered Persistence Strategy
The module uses a two-tiered data persistence strategy: Redis for fast, in-memory access during active editing, and MongoDB for durable, long-term storage. This ensures both high performance and reliability.
- Document and Constraint Ingestion
    - When a document is uploaded or updated (including any constraints or tags), it is parsed and tagged.
    - The resulting document object, which includes all constraints and metadata, is immediately cached in Redis for rapid access.
- Redis as the Primary Store During Active Editing
    - All reads and writes for documents and constraints during active sessions are performed against Redis.
    - Redis keys are structured as follows:
        - Document: `tenderdoc:${projectId}:${documentId}:data`
        - Constraints (if stored separately): `tenderdoc:${projectId}:${documentId}:constraints`
    - Each key is set with a Time-To-Live (TTL), typically 15 minutes, to ensure cache freshness and to trigger eventual persistence (see the caching sketch after this list).
- Persistence to MongoDB
    - When a document or its constraints are updated in Redis, a background job is scheduled (or a direct call is made) to persist the latest state to MongoDB.
    - If the TTL expires and Redis evicts the key, the latest version should already have been flushed to MongoDB.
    - On certain triggers (such as explicit save, project closure, or scheduled flush), all Redis-cached documents for a project are upserted into MongoDB.
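For concreteness, here is a minimal sketch of the caching step, assuming an ioredis-style client; the `cacheDocument` helper name and the document shape are illustrative assumptions, not the module's actual API.

```typescript
import Redis from 'ioredis';

const TTL_SECONDS = 15 * 60; // typical 15-minute cache window

/** Illustrative helper: cache a parsed tender document under its structured key. */
async function cacheDocument(
  redis: Redis,
  projectId: string,
  documentId: string,
  doc: Record<string, unknown>,
): Promise<void> {
  const key = `tenderdoc:${projectId}:${documentId}:data`;
  // 'EX' sets the TTL in seconds; expiry bounds cache staleness and acts as
  // the backstop that forces eventual persistence to MongoDB.
  await redis.set(key, JSON.stringify(doc), 'EX', TTL_SECONDS);
}
```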
Detailed Flow: Constraints and Document Data¶
1. Caching Document Data¶
Redis Key Example
Example Redis key for a document:
`tenderdoc:${projectId}:${documentId}:data`
A TTL of 15 minutes is set to ensure that the cache remains fresh and that data is eventually persisted to MongoDB.
2. Updating Constraints¶
- Constraints, such as question tags, deadlines, and requirements, are embedded within the document object.
- When constraints are updated:
    - The document object in Redis is updated with the new constraints.
    - This update triggers a background job (or immediate call) to persist the updated document to MongoDB, ensuring that no changes are lost (a sketch of this path follows).
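A minimal sketch of this update path, assuming an ioredis-style client and a BullMQ queue; the `tender-persistence` queue name and the `updateConstraints` helper are illustrative assumptions, not the module's actual API.

```typescript
import Redis from 'ioredis';
import { Queue } from 'bullmq';

// Illustrative queue; the real queue name and payload shape may differ.
const persistQueue = new Queue('tender-persistence');

async function updateConstraints(
  redis: Redis,
  projectId: string,
  documentId: string,
  constraints: unknown[],
): Promise<void> {
  const key = `tenderdoc:${projectId}:${documentId}:data`;
  const cached = await redis.get(key);
  if (!cached) return; // a cache miss is handled by the read path (section 4)

  const doc = JSON.parse(cached);
  doc.constraints = constraints; // constraints live inside the document object
  await redis.set(key, JSON.stringify(doc), 'EX', 15 * 60); // rewrite and refresh TTL

  // Enqueue a persistence job so the change survives even if Redis evicts the key.
  await persistQueue.add('persist-document', { projectId, documentId });
}
```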
3. Flushing from Redis to MongoDB¶
Flush Triggers
- TTL expiry (handled by scheduled jobs or on-demand flush)
- Explicit save or project closure
- Bulk flush (e.g., on server shutdown or at scheduled intervals)
- All relevant Redis keys for a project are scanned using a pattern such as `tenderdoc:${projectId}:*`.
- Each document is deserialised from Redis.
- The document is upserted into MongoDB using its unique identifiers (`projectId`, `documentId`).
- If constraints are stored separately, they are merged into the document before upsert.
- MongoDB Upsert:
    - The system uses `updateOne` with `upsert: true` to ensure the document is created or updated as necessary.
    - All fields, including constraints, tags, and metadata, are persisted to guarantee data integrity.
4. Cache Miss Handling¶
If a document is requested and not found in Redis, the service queries MongoDB for the latest version of the document. The document is then re-cached in Redis for subsequent fast access, maintaining performance for future operations.
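A sketch of that read-through path, assuming an ioredis client and the official MongoDB Node.js driver; the `getDocument` helper name is an illustrative assumption.

```typescript
import Redis from 'ioredis';
import { Collection } from 'mongodb';

async function getDocument(
  redis: Redis,
  documents: Collection,
  projectId: string,
  documentId: string,
): Promise<unknown | null> {
  const key = `tenderdoc:${projectId}:${documentId}:data`;

  // Fast path: serve straight from the in-memory cache.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Cache miss: fall back to the durable copy in MongoDB.
  const doc = await documents.findOne({ projectId, documentId });
  if (doc) {
    // Re-cache so subsequent reads hit Redis again.
    await redis.set(key, JSON.stringify(doc), 'EX', 15 * 60);
  }
  return doc;
}
```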
Example: Code Snippet for Persistence¶
```typescript
// filepath: src/resourceHub/tender-parsing/tender-pack-db.service.ts
async persistRedisDocsToMongoDB(projectId: string) {
  // Collect every cached document key for this project.
  const keys = await this.redisClient.keys(`tenderdoc:${projectId}:*:data`);
  for (const key of keys) {
    const docData = await this.redisClient.get(key);
    if (docData) {
      const document = JSON.parse(docData);
      // updateOne with upsert: true creates the record if it is missing,
      // otherwise overwrites its fields with the latest cached state.
      await this.mongoCollection.updateOne(
        { projectId: document.projectId, documentId: document.documentId },
        { $set: document },
        { upsert: true }
      );
    }
  }
}
```
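If BullMQ is enabled, this method could be driven by a repeatable job so caches are flushed well inside the 15-minute TTL. The sketch below is hypothetical wiring: the `tender-persistence` queue name, the five-minute interval, and the injected service handle are all assumptions rather than the module's actual configuration.

```typescript
import { Queue, Worker } from 'bullmq';

// Assumed handle to the service that owns persistRedisDocsToMongoDB.
declare const tenderPackDbService: {
  persistRedisDocsToMongoDB(projectId: string): Promise<void>;
};

const flushQueue = new Queue('tender-persistence');

// Re-run the flush for a project every 5 minutes, comfortably inside the TTL.
await flushQueue.add(
  'flush-project',
  { projectId: 'example-project' },
  { repeat: { every: 5 * 60 * 1000 } },
);

// Worker side: drain flush jobs and write the cached state to MongoDB.
new Worker('tender-persistence', async (job) => {
  await tenderPackDbService.persistRedisDocsToMongoDB(job.data.projectId);
});
```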
Rationale for This Approach¶
Why This Pattern?
- Performance: Redis provides extremely low-latency access for active editing and collaboration, ensuring a responsive user experience.
- Durability: MongoDB guarantees that all data is safely stored for long-term retrieval, audit, and compliance.
- Consistency: All updates are made in Redis first and then flushed to MongoDB, so the persisted copy converges on the latest cached state and remains recoverable.
- Scalability: This pattern supports high concurrency and large document sets without overloading the database, making it suitable for enterprise-scale tender management.
Summary Table¶
| Layer | Technology | Purpose | TTL/Flush Mechanism |
|---|---|---|---|
| Caching | Redis | Fast, in-memory document/constraint cache | 15 min TTL, explicit flush |
| Persistence | MongoDB | Durable storage for all data | Upsert on flush/cache miss |
Additional Notes¶
Atomicity & Consistency
- All constraints and document updates are persisted as part of the document object, ensuring atomicity and consistency.
- The system ensures that no data is lost by flushing Redis to MongoDB before TTL expiry or on explicit triggers such as project closure.
- This approach balances speed for users with reliability and auditability for the business, supporting both collaborative workflows and robust data management.