NoSQL Data Modeling with Banking Use Cases
Yen Trinh
Cloud Platform Engineer - VPBank
Agenda
NoSQL Databases in Banking
Advantages of NoSQL in Banking
- High Scalability: horizontal scaling
- High Performance: fast query response
- Flexible Data Structure: adaptable schema
- Multi-source Analysis: handles JSON, XML, CSV, binary, and plain-text data
NoSQL Database Types in Banking
Document-oriented (MongoDB)
- A flexible schema accommodates varied customer data.
- Nested arrays (e.g., 'accounts', 'transactions') keep related data together for efficient retrieval.
- New fields (e.g., 'credit_score') can be added without affecting existing records.
- Writes to a single document are atomic, keeping the data within that document consistent.
Example: Storing customer information and transactions:
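A minimal sketch of such a document, written as a Python dict so it runs without a database. All field names and values are hypothetical, and amounts are stored as integers in the smallest currency unit (an assumption; pymongo cannot encode Python `Decimal` directly). The last two dicts show how one account inside the document could be debited atomically with pymongo's `update_one`.

```python
from copy import deepcopy

# Hypothetical customer document; amounts are integers in the smallest currency unit.
customer = {
    "_id": "CUST-001",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "accounts": [  # nested array: the customer's accounts
        {
            "account_id": "ACC-1001",
            "type": "savings",
            "balance": 500_000,
            "transactions": [  # nested array: activity on this account
                {"txn_id": "T-1", "type": "deposit", "amount": 500_000},
            ],
        },
        {"account_id": "ACC-1002", "type": "checking", "balance": 120_000, "transactions": []},
    ],
}

# A newer document can carry an extra field; documents written earlier are unchanged.
newer_customer = deepcopy(customer)
newer_customer["_id"] = "CUST-002"
newer_customer["credit_score"] = 720


def total_balance(doc):
    """Sum the balances of the accounts embedded in one customer document."""
    return sum(acc["balance"] for acc in doc["accounts"])


# (filter, update) pair in MongoDB syntax for pymongo's update_one: the positional
# operator `$` targets the matched account, and the change is atomic because it
# touches a single document.
debit_filter = {"_id": "CUST-001", "accounts.account_id": "ACC-1002"}
debit_update = {"$inc": {"accounts.$.balance": -20_000}}
```

Embedding the transactions directly is only reasonable while the arrays stay small; the later sections move full history into a separate collection.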
Key-Value (Redis)
- Extremely fast read/write operations, ideal for caching frequently accessed data.
- Can store various data types (strings, lists, sets) as values.
- Supports atomic operations and transactions.
- Use case: storing session data and caching account balances for quick access in high-traffic periods.
Example: Caching account information for quick retrieval
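A minimal cache-aside sketch. `TTLCache` is an in-memory stand-in that mimics Redis `SETEX`/`GET` semantics so the example runs without a server; with redis-py, `r.setex(key, ttl, value)` and `r.get(key)` have the same shape. The key scheme `account:<id>`, the 30-second TTL, and the database lookup function are hypothetical.

```python
import json
import time


class TTLCache:
    """In-memory stand-in for Redis SETEX/GET (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def setex(self, key, ttl_seconds, value):
        # Store the value together with its absolute expiry time.
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: the next read goes to the database
            return None
        return value


def get_account(cache, db_lookup, account_id, ttl_seconds=30):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"account:{account_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    account = db_lookup(account_id)
    cache.setex(key, ttl_seconds, json.dumps(account))
    return account


# Usage with a fake clock and a fake database lookup (both hypothetical).
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def fake_db_lookup(account_id):
    db_calls.append(account_id)
    return {"account_id": account_id, "balance": 1_500_000}


first = get_account(cache, fake_db_lookup, "ACC-1001")   # miss: reads the database
second = get_account(cache, fake_db_lookup, "ACC-1001")  # hit: served from the cache
now[0] = 31.0                                            # move past the 30 s TTL
third = get_account(cache, fake_db_lookup, "ACC-1001")   # expired: reads the database again
```

The TTL bounds how stale a cached balance can be; for balances shown before a payment, a short TTL or explicit invalidation on each write would likely be needed.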
Column-family (Cassandra)
- Optimized for write-heavy workloads and time-series data.
- Partition key (account_id) determines data distribution across nodes.
- Clustering key (transaction_time) sorts data within partitions.
- Efficient for querying recent transactions or time ranges for a specific account.
- Scales horizontally to handle massive amounts of transaction data.
Example: Storing time-series transaction history data
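A sketch of the partition/clustering layout, assuming a hypothetical table `transactions_by_account` (its CQL is kept as a string for reference). The Python class models one time-sorted partition per account, so a time-range query for one account is a contiguous slice, which is what the clustering key provides in Cassandra.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# Hypothetical CQL table mirrored by the toy model below: account_id is the
# partition key; transaction_time (newest first) and transaction_id cluster rows.
CQL_SCHEMA = """
CREATE TABLE transactions_by_account (
    account_id text,
    transaction_time timestamp,
    transaction_id text,
    amount bigint,
    PRIMARY KEY ((account_id), transaction_time, transaction_id)
) WITH CLUSTERING ORDER BY (transaction_time DESC, transaction_id ASC);
"""


class TransactionsByAccount:
    """In-memory model of the layout: one time-sorted partition per account."""

    def __init__(self):
        # partition key -> rows kept sorted by (transaction_time, transaction_id)
        self._partitions = defaultdict(list)

    def insert(self, account_id, transaction_time, transaction_id, amount):
        bisect.insort(self._partitions[account_id], (transaction_time, transaction_id, amount))

    def time_range(self, account_id, start, end):
        """Rows with start <= transaction_time < end for one account, newest first.

        CQL equivalent: SELECT * FROM transactions_by_account
        WHERE account_id = ? AND transaction_time >= ? AND transaction_time < ?
        """
        rows = self._partitions.get(account_id, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi][::-1]


store = TransactionsByAccount()
store.insert("ACC-1", datetime(2024, 1, 5), "t2", 200)
store.insert("ACC-1", datetime(2024, 1, 1), "t1", 100)
store.insert("ACC-1", datetime(2024, 2, 1), "t3", 300)
store.insert("ACC-2", datetime(2024, 1, 3), "t9", 900)
january = store.time_range("ACC-1", datetime(2024, 1, 1), datetime(2024, 2, 1))
```

Because every query names the partition key, a read touches only the node(s) that own that account's partition.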
Graph (Neo4j)
- Nodes represent entities (customers, transactions, merchants).
- Relationships show connections between entities.
- The Cypher query language is designed for graph traversal.
- No need for complex joins as in SQL; relationships are first-class citizens.
- Efficient for detecting patterns, such as potential fraud scenarios.
Example: Analyzing customer relationships, fraud detection
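A minimal sketch of one fraud-detection idea: customers linked through shared identifiers (devices, phone numbers) form groups that may indicate a fraud ring. The Cypher string shows how the shared-device case might be queried in Neo4j (labels and relationship types are hypothetical); the Python function performs the same traversal in memory with breadth-first search. The group-size threshold is an assumption.

```python
from collections import defaultdict, deque

# Hypothetical Cypher for Neo4j: customers that use the same device.
SHARED_DEVICE_QUERY = """
MATCH (c1:Customer)-[:USES]->(d:Device)<-[:USES]-(c2:Customer)
WHERE c1 <> c2
RETURN d.id AS device, collect(DISTINCT c1.id) AS customers
"""


def fraud_rings(edges, min_size=3):
    """Group customers connected through shared identifiers.

    edges: (customer_id, identifier) pairs, e.g. ("C1", "device:abc").
    Returns sorted customer-ID lists for groups with at least min_size members.
    """
    graph = defaultdict(set)
    customers = set()
    for customer, identifier in edges:
        graph[("cust", customer)].add(("id", identifier))
        graph[("id", identifier)].add(("cust", customer))
        customers.add(customer)

    seen, rings = set(), []
    for start in sorted(customers):
        node = ("cust", start)
        if node in seen:
            continue
        # Breadth-first traversal over customer <-> identifier relationships.
        component, queue = [], deque([node])
        seen.add(node)
        while queue:
            current = queue.popleft()
            if current[0] == "cust":
                component.append(current[1])
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


edges = [("C1", "device:x"), ("C2", "device:x"), ("C2", "phone:1"),
         ("C3", "phone:1"), ("C4", "device:y")]
rings = fraud_rings(edges)
```

C1 and C3 share no identifier directly but are linked through C2; multi-hop links like this are what a graph database traverses without join tables.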
NoSQL Applications in Banking
NoSQL vs SQL in Banking Context
Aspect | SQL | NoSQL |
1. Schema Flexibility | Rigid schema, ALTER TABLE for changes | Flexible schema, easily add new fields |
2. Scalability | Vertical scaling primarily | Designed for horizontal scaling |
3. Querying | Complex joins, aggregations in-database | Simple queries, complex operations often in application layer |
4. Consistency Models | ACID transactions | Often eventual consistency, with tunable consistency levels |
5. Use Cases in Banking | Core banking systems, regulatory reporting | Real-time analytics, high-volume transactions, customer 360 views |
Data Modeling Introduction
What is Data Modeling?
Data Modeling: Crafting the Digital Blueprint
- Process of creating an abstract representation of data
- Defines data structure, relationships, and constraints
- Foundation for effective database design

Traditional vs. NoSQL Approach
- Relational: structured, normalized, ACID properties
- NoSQL: flexible schema, often denormalized, frequently eventual or tunable consistency
Importance of Data Modeling in Banking
Why Does Data Modeling Matter in Banking?
- Ensures financial data integrity and accuracy
- Supports business decisions and risk management
- Facilitates regulatory compliance and data governance
Banking's Data Evolution: Relational to NoSQL
Modeling for RDBMS
Step 1: Define the schema
Step 2: Develop the application and queries
Concerns: correctness and (de)normalization; the data dictates the schema
NoSQL Data Modeling (e.g., with MongoDB)
Iterative cycle: Define the data model -> Develop the application -> Improve the application -> Improve the data model
- Designed for the usage pattern
- Data model evolution is easy
- Can evolve without any downtime
- Many design options
NoSQL: Powering Modern Banking
- Scalability for millions of daily transactions
- Flexibility for rapid product innovation
- Real-time data processing for instant insights
NoSQL Data Modeling for Banking: Principles and Practices
Understanding Banking Domain for Data Modeling
Core Banking Entities
1. Customer
2. Account
3. Transaction
4. Product
5. Branch/ATM

Key Relationships
- Customer (1) --- (N) Account
- Account (1) --- (N) Transaction
- Customer (N) --- (N) Product
- Branch (1) --- (N) Account

Critical Banking Operations
1. Account Opening and KYC
2. Transaction Processing
3. Loan Origination
4. Fraud Detection
5. Regulatory Reporting

Data Characteristics
- High write volume (transactions)
- Complex read patterns (reporting, analytics)
- Long-term data retention
- Strict consistency requirements for financial data
NoSQL Data Modeling Techniques for Banking
Technique | Description | Banking Example | Pros | Cons |
Embedding | Nesting related data within a document | Embedding account details in customer document | Fast reads, reduces joins | Data duplication, potential for large documents |
Referencing/Link | Using document references or foreign keys | Storing the account_id in each transaction document | Maintains normalization, flexible relationships | Requires multiple queries for related data |
Denormalization | Duplicating data to optimize read performance | Storing customer name in transaction documents | Improves query performance | Data inconsistency risks, update overhead |
Aggregation | Grouping related data in a single document | Storing monthly statement summaries in account document | Efficient for reporting and analytics | May require frequent updates |
Polymorphic Schema | Flexible document structures for similar entities | Different account types (savings, checking) in one collection | Accommodates diverse product types | Can complicate querying and indexing |
Computed Data | Storing pre-calculated values for quick access | Maintaining running balance in account documents | Fast access to derived data | Requires careful management of updates |
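The Computed Data row can be illustrated with a minimal sketch, assuming an account document that keeps a pre-computed `balance` and `txn_count` (hypothetical field names, integer amounts). `apply_transaction` keeps the computed fields in step with each transaction in memory; `debit_update` builds a conditional MongoDB update so that the funds check and the decrement happen in one atomic single-document write.

```python
def apply_transaction(account, txn):
    """Apply one transaction and keep the pre-computed fields up to date."""
    signed = txn["amount"] if txn["type"] == "credit" else -txn["amount"]
    if account["balance"] + signed < 0:
        raise ValueError("insufficient funds")
    account["balance"] += signed
    account["txn_count"] += 1
    account["last_txn_id"] = txn["txn_id"]
    return account


def debit_update(account_id, amount):
    """(filter, update) for pymongo's update_one.

    The filter only matches when the stored balance covers the debit, so the
    check and the $inc are applied together on one document.
    """
    return (
        {"_id": account_id, "balance": {"$gte": amount}},
        {"$inc": {"balance": -amount, "txn_count": 1}},
    )


account = {"_id": "ACC-1001", "balance": 10_000, "txn_count": 0}
apply_transaction(account, {"txn_id": "T1", "type": "credit", "amount": 5_025})
apply_transaction(account, {"txn_id": "T2", "type": "debit", "amount": 2_000})
```

If `update_one` reports `matched_count == 0`, the debit was rejected (or the account does not exist), which the caller would need to handle.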
To link or embed?
How often does the embedded information get accessed?
Is the data queried using the embedded information?
Does the embedded information change often?
What can be linked?
Relationships:
One-to-one
One-to-many
Many-to-many
Example: entities and relationships in a blog
- Entities: users, articles, tags, categories, comments
- Relationships: three 1-to-N and two N-to-N
Step-by-step iteration
1. Evaluate the application workload
   - Inputs: business domain expertise; current and predicted scenarios; production logs and stats
   - Outputs: data size; a list of operations ranked by importance
2. Map out entities and their relationships
   - Output: CRD (Collection Relationship Diagram): link or embed?
3. Identify and apply relevant design patterns
4. Finalize the data model for each collection
   - Outputs: collections with document fields and shapes for each; data size; database queries and indexes; current operations assumptions and growth projections
Designing a Customer-Centric Data Model
Design Considerations
1. Embed frequently accessed data (account summaries)
2. Use references for large, variable data (full transaction history)
3. Include computed fields for quick access (risk profile)
4. Implement versioning for auditable fields (addresses)
Querying Patterns
1. Customer lookup by ID or SSN
2. Aggregation for customer 360° view
3. Filtering based on risk profile or KYC status
Customer Document Structure
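A minimal sketch of a customer document that follows the four design considerations and supports the querying patterns listed above. All field names are hypothetical, storing a hashed SSN is an assumption, and the in-memory filter stands in for a MongoDB query on the same fields.

```python
# Hypothetical customer document; amounts are integers in the smallest currency unit.
customer = {
    "_id": "CUST-001",
    "ssn_hash": "hypothetical-hash",  # lookup key; storing a hash is an assumption
    "name": "Jane Doe",
    "kyc_status": "verified",
    # 1. Embedded account summaries: small and read on almost every request
    "account_summaries": [
        {"account_id": "ACC-1001", "type": "savings", "balance": 250_000},
        {"account_id": "ACC-1002", "type": "checking", "balance": 83_050},
    ],
    # 2. The full transaction history lives in another collection, keyed by account_id
    # 3. Computed field, refreshed by a separate process (assumption)
    "risk_profile": {"score": 42, "level": "medium", "computed_at": "2024-06-01"},
    # 4. Versioned addresses: new versions are appended, never overwritten
    "addresses": [
        {"version": 1, "line1": "1 Old Street", "valid_from": "2019-03-01"},
        {"version": 2, "line1": "9 New Avenue", "valid_from": "2023-07-15"},
    ],
}


def current_address(doc):
    """The address with the highest version number is the current one."""
    return max(doc["addresses"], key=lambda a: a["version"])


def add_address(doc, line1, valid_from):
    """Append a new address version, keeping older versions for audit."""
    version = current_address(doc)["version"] + 1 if doc["addresses"] else 1
    doc["addresses"].append({"version": version, "line1": line1, "valid_from": valid_from})
    return version


def filter_customers(docs, risk_level=None, kyc_status=None):
    """In-memory version of a filter such as
    {"risk_profile.level": "high", "kyc_status": "pending"}."""
    return [
        d for d in docs
        if (risk_level is None or d["risk_profile"]["level"] == risk_level)
        and (kyc_status is None or d["kyc_status"] == kyc_status)
    ]
```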
Account and Transaction Modeling
Modeling Strategies
1. Embed recent transactions in account document for quick access
2. Store detailed transactions in a separate collection for scalability
3. Use time-based bucketing for efficient querying of historical data
4. Implement compound indexes for common query patterns
Common Query Patterns
1. Balance inquiry and recent transactions
2. Statement generation for a given time period
3. Transaction search by various criteria (amount, type, date range)
Transaction Document
Account Document
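A minimal sketch of the two documents and of statement generation (query pattern 2), assuming hypothetical field names and integer amounts. The account document embeds a short `recent_transactions` list, full transactions live in a separate collection, and `STATEMENT_INDEX` is a pymongo-style key list for a compound index on account and date that the statement filter can use.

```python
from datetime import datetime

# Hypothetical account document with a short embedded list of recent activity.
account = {
    "_id": "ACC-1001",
    "customer_id": "CUST-001",
    "balance": 75_000,  # integer amount in the smallest currency unit (assumption)
    "recent_transactions": [
        {"txn_id": "T3", "date": datetime(2024, 6, 2), "type": "debit", "amount": 5_000},
    ],
}

# Hypothetical documents from a separate "transactions" collection.
transactions = [
    {"_id": "T1", "account_id": "ACC-1001", "date": datetime(2024, 5, 3), "type": "credit", "amount": 100_000},
    {"_id": "T2", "account_id": "ACC-1001", "date": datetime(2024, 5, 20), "type": "debit", "amount": 20_000},
    {"_id": "T3", "account_id": "ACC-1001", "date": datetime(2024, 6, 2), "type": "debit", "amount": 5_000},
    {"_id": "T9", "account_id": "ACC-2002", "date": datetime(2024, 5, 10), "type": "credit", "amount": 7_000},
]

# Compound index for "one account, date range" queries (pymongo-style key list).
STATEMENT_INDEX = [("account_id", 1), ("date", -1)]


def statement_filter(account_id, start, end):
    """MongoDB filter for one account's transactions with start <= date < end."""
    return {"account_id": account_id, "date": {"$gte": start, "$lt": end}}


def statement(txns, account_id, start, end):
    """In-memory equivalent of statement_filter, plus credit/debit totals."""
    rows = sorted(
        (t for t in txns if t["account_id"] == account_id and start <= t["date"] < end),
        key=lambda t: t["date"],
    )
    credits = sum(t["amount"] for t in rows if t["type"] == "credit")
    debits = sum(t["amount"] for t in rows if t["type"] == "debit")
    return {"transactions": rows, "credits": credits, "debits": debits}


may = statement(transactions, "ACC-1001", datetime(2024, 5, 1), datetime(2024, 6, 1))
```

The balance inquiry (pattern 1) reads only the account document; the statement (pattern 2) reads the transactions collection through the compound index.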
Handling Relationships in NoSQL for Banking - Customer-Account Relationship (One-to-Many)
Technique:
Embedding with References
Approach Explanation:
- Basic account information is embedded in the customer document for quick access.
- Detailed account information is stored in separate documents, referenced by account_id.
- This approach balances fast retrieval of essential account data with the ability to store and manage detailed account information separately.
- It allows for efficient querying of customer data along with basic account details in a single operation.
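A minimal sketch of embedding with references, assuming hypothetical field names. The customer document carries only account summaries; full account documents are fetched by `account_id` when needed, either client-side or server-side with the `$lookup` aggregation stage shown (for example, `db.customers.aggregate([{"$match": {"_id": "CUST-001"}}, LOOKUP_STAGE])` in pymongo).

```python
# Hypothetical customer document: only the account fields most screens need.
customer = {
    "_id": "CUST-001",
    "name": "Jane Doe",
    "accounts": [
        {"account_id": "ACC-1001", "type": "savings", "balance": 75_000},
        {"account_id": "ACC-1002", "type": "checking", "balance": 12_000},
    ],
}

# Hypothetical detailed documents in a separate "accounts" collection.
accounts_by_id = {
    "ACC-1001": {"_id": "ACC-1001", "branch": "HN-01", "interest_rate_bps": 450, "opened": "2020-01-15"},
    "ACC-1002": {"_id": "ACC-1002", "branch": "HN-01", "overdraft_limit": 50_000, "opened": "2021-07-01"},
}

# Aggregation stage that joins the details server-side.
LOOKUP_STAGE = {
    "$lookup": {
        "from": "accounts",
        "localField": "accounts.account_id",
        "foreignField": "_id",
        "as": "account_details",
    }
}


def with_account_details(cust, accounts):
    """Client-side equivalent of LOOKUP_STAGE for one customer document."""
    ids = [a["account_id"] for a in cust["accounts"]]
    return {**cust, "account_details": [accounts[i] for i in ids if i in accounts]}


summary_balances = [a["balance"] for a in customer["accounts"]]  # one read, no join
detailed = with_account_details(customer, accounts_by_id)        # only when needed
```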
Handling Relationships in NoSQL for Banking - Account-Transaction Relationship (One-to-Many)
Technique:
Referencing with Recent Transactions Embedded
Approach Explanation:
- Recent transactions are embedded directly in the account document for quick access.
- All transactions are referenced by their IDs, with full details stored in separate documents.
- This hybrid approach provides fast access to recent transaction data while maintaining a complete transaction history.
- It's particularly useful for displaying account summaries and recent activity without additional queries.
Handling Relationships in NoSQL for Banking - Customer-Product Relationship (Many-to-Many)
Technique:
Referencing
Approach Explanation:
- Both customer and product documents contain arrays of references to each other.
- A separate subscription document stores detailed information about each customer-product relationship.
- This approach allows for efficient querying of products associated with a customer and vice versa.
- The subscription document enables storing additional relationship-specific data without cluttering the main documents.
- It provides flexibility in managing complex many-to-many relationships while maintaining good read performance.
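A minimal sketch of the many-to-many layout, with in-memory dicts standing in for the customers, products, and subscriptions collections (all names hypothetical). `subscribe` updates both reference arrays and records the relationship-specific details in a subscription document; in MongoDB these three writes would normally run inside one multi-document transaction.

```python
class ProductSubscriptions:
    """Customers and products reference each other; a subscription document
    holds the relationship-specific details."""

    def __init__(self):
        self.customers = {}      # _id -> {"_id", "product_ids"}
        self.products = {}       # _id -> {"_id", "name", "customer_ids"}
        self.subscriptions = []  # one document per customer-product pair

    def add_customer(self, customer_id):
        self.customers[customer_id] = {"_id": customer_id, "product_ids": []}

    def add_product(self, product_id, name):
        self.products[product_id] = {"_id": product_id, "name": name, "customer_ids": []}

    def subscribe(self, customer_id, product_id, **details):
        """Link both sides and store the subscription; False if already linked."""
        cust = self.customers[customer_id]
        prod = self.products[product_id]
        if product_id in cust["product_ids"]:
            return False
        cust["product_ids"].append(product_id)
        prod["customer_ids"].append(customer_id)
        self.subscriptions.append({"customer_id": customer_id, "product_id": product_id, **details})
        return True

    def products_of(self, customer_id):
        return [self.products[p]["name"] for p in self.customers[customer_id]["product_ids"]]

    def customers_of(self, product_id):
        return list(self.products[product_id]["customer_ids"])


catalog = ProductSubscriptions()
for cid in ("C1", "C2"):
    catalog.add_customer(cid)
catalog.add_product("P-CC", "Credit Card")
catalog.add_product("P-LN", "Home Loan")
catalog.subscribe("C1", "P-CC", start_date="2024-01-10", credit_limit=50_000_000)
catalog.subscribe("C2", "P-CC", start_date="2024-02-01", credit_limit=20_000_000)
catalog.subscribe("C1", "P-LN", start_date="2024-03-05", term_months=240)
```

For very popular products the `customer_ids` array can grow without bound; an indexed `product_id` field on the subscriptions collection could answer "customers of a product" instead.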
Handling Relationships in NoSQL for Banking - Considerations
Key Consideration | Description |
Technique Selection | Choose the appropriate technique based on data access patterns and query requirements. |
Performance Balance | Balance between read performance and write complexity. |
Data Integrity | Consider data consistency and integrity in distributed NoSQL systems. |
Continuous Optimization | Regularly review and optimize data models as business needs evolve. |
Indexing Strategies for Banking Data (MongoDB)
Purpose of Indexing
- Enhance query performance for critical banking operations
- Support real-time financial transactions and balance inquiries
- Enable efficient data retrieval for regulatory reporting and compliance
- Facilitate fast search capabilities for customer service operations
- Optimize performance for high-volume transaction processing
- Support complex analytical queries for risk assessment and fraud detection
- Ensure scalability of banking systems as data volume grows
Indexing Strategies for Banking Data (MongoDB) – Key considerations
Consideration | Description |
Data Access Patterns | Analyze common queries in banking operations (e.g., account lookups, transaction searches) |
Regulatory Compliance | Ensure indexes support required reporting and auditing capabilities |
Data Security | Consider the impact of indexes on data encryption and access control |
Write-Heavy Workloads | Balance index benefits against performance impact on high-volume transaction insertions |
Read vs. Write Trade-offs | Optimize for read-heavy operations like balance checks while managing write performance |
Index Size and RAM Usage | Ensure critical indexes fit in memory for optimal performance |
Data Cardinality | Focus on high-cardinality fields like account numbers or transaction IDs |
Time-Based Queries | Support efficient historical data access for statements and analytics |
Scalability | Design indexes to support future growth in data volume and user base |
MongoDB Indexing Strategies for Banking Data - Single Field Indexes
Useful for queries on a single field, like account number lookups.
Banking application: Quick customer account retrieval:
MongoDB Indexing Strategies for Banking Data - Compound Indexes
For queries involving multiple fields, such as transaction searches.
Banking application: Efficient transaction history retrieval and reporting
MongoDB Indexing Strategies for Banking Data - Multikey Indexes
For indexing array fields, useful for multi-party accounts or transaction categories.
Banking application: Quick lookup of joint accounts or categorized transactions
MongoDB Indexing Strategies for Banking Data - Text Indexes
Enables full-text search capabilities, useful for transaction descriptions or customer notes.
Banking application: Enhanced transaction search functionality for customer service
MongoDB Indexing Strategies for Banking Data - Wildcard Indexes
Useful for fields with unpredictable structures, like custom attributes on accounts or transactions.
Banking application: Flexible querying on custom account attributes or dynamic transaction metadata
MongoDB Indexing Strategies for Banking Data - Geospatial Indexes
For location-based queries, useful for ATM or branch locators.
Banking application: Nearest ATM/branch finder for mobile banking apps
MongoDB Indexing Strategies for Banking Data - Hashed Indexes
For equality queries and supporting sharding, useful for large-scale transaction systems.
Banking application: Efficient sharding of large transaction collections for scalability
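The index types above can be collected in one place. This sketch lists one hypothetical index per type as pymongo-style `(collection, keys, options)` tuples, plus example queries for the text and geospatial cases; collection and field names are assumptions. `create_all` would pass each spec to pymongo's `Collection.create_index`, and `default_index_name` reproduces MongoDB's default naming (`field_type` pairs joined by underscores).

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

# (collection, keys, options) per index type; names are hypothetical.
INDEX_SPECS = {
    "single_field": ("accounts", [("account_number", ASCENDING)], {"unique": True}),
    "compound": ("transactions", [("account_id", ASCENDING), ("date", DESCENDING)], {}),
    # holders is an array, so MongoDB builds this as a multikey index automatically
    "multikey": ("accounts", [("holders.customer_id", ASCENDING)], {}),
    "text": ("transactions", [("description", "text")], {}),
    "wildcard": ("accounts", [("custom_attributes.$**", ASCENDING)], {}),
    "geospatial": ("atms", [("location", "2dsphere")], {}),
    "hashed": ("transactions", [("account_id", "hashed")], {}),
}

# Example queries the text and geospatial indexes are meant to serve.
EXAMPLE_QUERIES = {
    "text": {"$text": {"$search": "coffee refund"}},
    "geospatial": {"location": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [105.85, 21.03]},  # [lng, lat]
        "$maxDistance": 2000,  # metres
    }}},
}


def default_index_name(keys):
    """MongoDB's default index name: field and type joined by underscores."""
    return "_".join(f"{field}_{kind}" for field, kind in keys)


def create_all(db):
    """Create every index; `db` is a pymongo Database (not executed here)."""
    return [db[coll].create_index(keys, **opts) for coll, keys, opts in INDEX_SPECS.values()]
```

A hashed index on `account_id` is also what a hashed shard key on that field would rely on to spread transactions evenly across shards.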
MongoDB Indexing Strategies for Banking Data - Indexing Strategies
The ESR (Equality, Sort, Range) Rule
Optimize compound indexes for query patterns:
- Equality (E) fields first
- Sort (S) fields next
- Range (R) fields last
Example: For queries filtering by account type, sorting by date, and ranging on amount
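The ESR rule can be expressed as a small helper. This is a sketch with hypothetical field names: it orders equality keys first, then the sort keys with their directions, then range keys, producing a pymongo-style key list for the query in the example.

```python
def esr_index(equality, sort, range_fields):
    """Order compound-index keys by the ESR rule.

    equality:     field names matched with exact values
    sort:         (field, direction) pairs in the query's sort order
    range_fields: field names filtered with $gt/$gte/$lt/$lte
    """
    keys = [(field, 1) for field in equality]
    keys += list(sort)
    used = {field for field, _ in keys}
    keys += [(field, 1) for field in range_fields if field not in used]
    return keys


# Query from the example: equality on account_type, sort by date, range on amount.
query = {"account_type": "savings", "amount": {"$gte": 1_000_000, "$lte": 5_000_000}}
sort = [("date", -1)]
index_keys = esr_index(["account_type"], sort, ["amount"])
# With pymongo: collection.create_index(index_keys); collection.find(query).sort(sort)
```

Putting the equality field first narrows the scan to one contiguous index range; placing the sort key next lets MongoDB return documents already in `date` order; keeping the range key last avoids breaking that order, which would otherwise force an in-memory sort.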
MongoDB Indexing Strategies for Banking Data - Create Indexes to Support Your Queries
Analyze query patterns and create indexes accordingly:
- Use db.collection.explain() to understand query execution
- Create indexes for frequently used queries
- Consider read/write ratio when adding indexes
Banking application:
Optimize for common operations like balance checks, recent transactions, and reporting queries
MongoDB Indexing Strategies for Banking Data - Use Indexes to Sort Query Results
Leverage indexes for efficient sorting
Banking application:
Efficient retrieval of recent transactions or statement generation
MongoDB Indexing Strategies for Banking Data - Ensure Indexes Fit in RAM
Monitor index size and server RAM:
- Use db.collection.stats() to check index sizes
- Ensure frequently used indexes fit in RAM
- Consider using partial indexes for large collections
Banking application:
Optimize performance for critical real-time operations like balance checks and fraud detection
MongoDB Indexing Strategies for Banking Data - Create Indexes to Ensure Query Selectivity
Focus on high-cardinality fields for better query performance
Banking application:
Prioritize indexing on unique identifiers like account numbers or transaction IDs
Advanced NoSQL Patterns for Banking - Bucketing Pattern
Use case:
Group related data into buckets to optimize query performance.
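A minimal sketch of the Bucketing Pattern for transactions, assuming one bucket document per account per month with a size cap (the cap and field names are hypothetical). `bucket_transactions` builds the buckets in memory; `bucket_upsert` shows the common MongoDB write for the same idea: an upsert whose filter only matches a non-full bucket, so a new bucket is created automatically once the current one is full.

```python
from datetime import datetime


def bucket_transactions(transactions, max_per_bucket=200):
    """Group transactions into per-account, per-month bucket documents.

    A new bucket starts when the month changes or the current bucket is full.
    """
    buckets = []
    open_buckets = {}  # (account_id, "YYYY-MM") -> bucket currently being filled
    for txn in sorted(transactions, key=lambda t: t["date"]):
        month = txn["date"].strftime("%Y-%m")
        key = (txn["account_id"], month)
        bucket = open_buckets.get(key)
        if bucket is None or bucket["count"] >= max_per_bucket:
            bucket = {"account_id": txn["account_id"], "month": month,
                      "count": 0, "total": 0, "transactions": []}
            buckets.append(bucket)
            open_buckets[key] = bucket
        bucket["transactions"].append(txn)
        bucket["count"] += 1
        bucket["total"] += txn["amount"]
    return buckets


def bucket_upsert(txn, max_per_bucket=200):
    """(filter, update) for pymongo's update_one(filter, update, upsert=True)."""
    month = txn["date"].strftime("%Y-%m")
    return (
        {"account_id": txn["account_id"], "month": month, "count": {"$lt": max_per_bucket}},
        {"$push": {"transactions": txn}, "$inc": {"count": 1, "total": txn["amount"]}},
    )


sample = [
    {"account_id": "ACC-1", "date": datetime(2024, 5, d), "amount": 1_000 * d}
    for d in (2, 9, 15, 28)
] + [{"account_id": "ACC-1", "date": datetime(2024, 6, 1), "amount": 500}]
buckets = bucket_transactions(sample, max_per_bucket=3)
```

The pre-computed `count` and `total` let a monthly summary be read from the bucket documents without scanning every transaction.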
Advanced NoSQL Patterns for Banking - Versioning Pattern
Use case:
Maintain history of changes for auditing and compliance.
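A minimal sketch of the Versioning Pattern, assuming the current document lives in one collection (for example "customers") and every superseded version is copied into a history collection (for example "customers_history") before the update. Both are modeled in memory here; collection names and audit fields are hypothetical. In MongoDB the history insert and the update would typically run in one transaction.

```python
from copy import deepcopy
from datetime import datetime, timezone


class VersionedStore:
    """Current documents plus an append-only history of superseded versions."""

    def __init__(self):
        self.current = {}  # _id -> latest version
        self.history = []  # superseded versions, never modified

    def save(self, doc_id, changes, changed_by, at=None):
        at = at or datetime.now(timezone.utc)
        old = self.current.get(doc_id)
        if old is None:
            new = {"_id": doc_id, **changes, "version": 1}
        else:
            self.history.append(deepcopy(old))  # keep the old version for audit
            new = {**deepcopy(old), **changes, "version": old["version"] + 1}
        new["updated_at"] = at
        new["updated_by"] = changed_by
        self.current[doc_id] = new
        return new

    def version(self, doc_id, number):
        """Return a specific version of a document, or None if it does not exist."""
        doc = self.current.get(doc_id)
        if doc is not None and doc["version"] == number:
            return doc
        for old in self.history:
            if old["_id"] == doc_id and old["version"] == number:
                return old
        return None


store = VersionedStore()
store.save("CUST-001", {"address": "1 Old Street"}, changed_by="teller-07")
store.save("CUST-001", {"address": "9 New Avenue"}, changed_by="mobile-app")
```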
Advanced NoSQL Patterns for Banking - Polymorphic Pattern
Use case:
Use a single collection for similar but varying entities.
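A minimal sketch of the Polymorphic Pattern, assuming a single hypothetical "accounts" collection in which every document shares `_id`, `account_type`, and `balance`, while other fields depend on the type. The required-field table and the spending rules are illustrative assumptions, not product rules.

```python
# Hypothetical accounts of different types in one collection.
accounts = [
    {"_id": "A1", "account_type": "savings", "balance": 100_000, "interest_rate_bps": 450},
    {"_id": "A2", "account_type": "checking", "balance": 25_000, "overdraft_limit": 50_000},
    {"_id": "A3", "account_type": "loan", "balance": -1_500_000, "term_months": 36, "apr_bps": 890},
]

# Fields each type must carry in addition to the shared ones (assumption).
REQUIRED_FIELDS = {
    "savings": {"interest_rate_bps"},
    "checking": {"overdraft_limit"},
    "loan": {"term_months", "apr_bps"},
}


def missing_fields(acc):
    """Type-specific fields that a document lacks (empty set when valid)."""
    return REQUIRED_FIELDS[acc["account_type"]] - set(acc)


def available_funds(acc):
    """Type-specific logic dispatched on the shared account_type field."""
    kind = acc["account_type"]
    if kind == "savings":
        return acc["balance"]
    if kind == "checking":
        return acc["balance"] + acc["overdraft_limit"]
    if kind == "loan":
        return 0  # a loan balance is not spendable (assumption)
    raise ValueError(f"unknown account_type: {kind}")
```

Because the shared fields exist in every document, one index on `account_type` (or `customer_id`) serves queries across all account types.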
Advanced NoSQL Patterns for Banking - Subset Pattern
Use case:
Store a subset of data for quick access, with references to full data.
Best Practices for NoSQL Data Modeling in Banking
Case Study: Customer 360 View
Introduction �to Customer 360 View
Requirements and Challenges for Customer 360 View
Data Model Design - Overview
Data Model Design - Document Structure
Embedded vs. Referenced Data: This model primarily uses embedded data for faster read operations and to reduce the need for multiple queries.
Embedded personal_info: Frequently accessed data is embedded directly in the document for quick retrieval.
Arrays for multiple entries: accounts, recent_transactions, products, and interactions are stored as arrays, allowing for multiple entries while keeping related data together.
Scalability considerations: For very active customers with many transactions or interactions, consider implementing a capped array or moving older data to separate collections.
Flexibility: This structure allows for easy addition of new fields or sections as the application evolves.
Data Model Design - Document Structure
Rationale for Embedding Accounts
- Frequent access with customer data: Embedding accounts allows for quick retrieval of account information alongside customer details, reducing the need for separate queries.
- Relatively small, stable set of accounts per customer: Most customers have a limited number of accounts, making embedding a practical choice without risking document growth issues.
Data Model Design - Document Structure
Handling Historical Account Data
For closed accounts or historical data:
- Archiving strategy: Move closed accounts to a separate 'archived_accounts' array or collection after a specified period.
- Status field: Add a 'status' field (e.g., 'active', 'closed') to each account object for easy filtering.
- Date fields: Include 'opened_date' and 'closed_date' fields to track the account lifecycle.
- Periodic cleanup: Implement a process to move or delete old, closed accounts to prevent unbounded growth of the customer document.
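A minimal sketch of the periodic cleanup, assuming accounts carry the `status` and `closed_date` fields described above and a hypothetical one-year retention period. `archive_closed_accounts` moves old closed accounts out of the customer document and returns them for a separate collection; `archive_pull_update` builds the equivalent server-side `$pull`. Dates are `datetime` values because BSON has no date-only type.

```python
from datetime import datetime, timedelta


def archive_closed_accounts(customer, now, retention_days=365):
    """Remove accounts closed more than retention_days ago and return them
    for insertion into a separate collection (hypothetical: "archived_accounts")."""
    cutoff = now - timedelta(days=retention_days)
    keep, archived = [], []
    for acc in customer["accounts"]:
        closed = acc.get("closed_date")
        if acc.get("status") == "closed" and closed is not None and closed < cutoff:
            archived.append({**acc, "customer_id": customer["_id"]})
        else:
            keep.append(acc)
    customer["accounts"] = keep
    return archived


def archive_pull_update(cutoff):
    """MongoDB update that drops the same accounts server-side with $pull."""
    return {"$pull": {"accounts": {"status": "closed", "closed_date": {"$lt": cutoff}}}}


customer = {
    "_id": "CUST-001",
    "accounts": [
        {"account_id": "A1", "status": "active", "opened_date": datetime(2018, 1, 1)},
        {"account_id": "A2", "status": "closed", "opened_date": datetime(2015, 1, 1),
         "closed_date": datetime(2022, 1, 1)},
        {"account_id": "A3", "status": "closed", "opened_date": datetime(2020, 1, 1),
         "closed_date": datetime(2024, 5, 1)},
    ],
}
archived = archive_closed_accounts(customer, now=datetime(2024, 6, 1))
```

Writing the archive copies before running the `$pull` means a failure in between leaves a duplicate rather than a lost account record.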
Data Model Design - Recent Transactions
Keeping only recent transactions in the customer document
- Limited set: Store only the most recent transactions (e.g., last 30 days or last 50 transactions).
- Regular updates: Implement a system to add new transactions and remove old ones from this array.
Rationale
- Quick access: Enables fast retrieval of recent activity without querying a separate collection.
- Performance: Limits document size, maintaining fast read/write operations on the customer document.
- Relevance: Most common use cases involve recent transactions, making this data immediately available.
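The "limited set" and "regular updates" points can be handled in a single MongoDB write: `$push` with the `$each`, `$sort`, and `$slice` modifiers appends a transaction, re-sorts newest first, and trims the array to a fixed length. A minimal sketch, assuming the "last 50 transactions" variant and a hypothetical `date` field; `push_recent` reproduces the same behavior in memory.

```python
from datetime import datetime


def capped_push_update(txn, limit=50):
    """MongoDB update for pymongo's update_one: add txn, keep the newest `limit`."""
    return {"$push": {"recent_transactions": {
        "$each": [txn],
        "$sort": {"date": -1},  # newest first
        "$slice": limit,        # keep only the first `limit` elements after sorting
    }}}


def push_recent(doc, txn, limit=50):
    """In-memory equivalent of capped_push_update."""
    recent = doc.setdefault("recent_transactions", [])
    recent.append(txn)
    recent.sort(key=lambda t: t["date"], reverse=True)
    del recent[limit:]
    return doc


customer = {"_id": "CUST-001"}
for day in range(1, 6):
    push_recent(customer, {"txn_id": f"T{day}", "date": datetime(2024, 6, day)}, limit=3)
```

The "last 30 days" variant would instead need a separate `$pull` on `date` older than the cutoff.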
Data Model Design - Products and Interactions
MongoDB Implementation - Indexing
Performance Optimization - Caching
Key benefits of this architecture:
- Reduced load on MongoDB for frequently accessed data
- Improved read performance for cached items
- Ability to handle traffic spikes more efficiently
Real-time Updates - Change Streams
Considerations:
- Error handling: Implement robust error handling and reconnection logic for network issues
- Scaling: Use multiple change stream consumers for high-volume changes
- Filtering: Pass an aggregation pipeline (e.g., a $match stage) to watch() to filter out irrelevant changes and reduce unnecessary processing
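A minimal sketch of a change-stream consumer with pymongo that follows these considerations. A `$match` pipeline filters by operation type on the server, the resume token is saved after each event so the stream can restart where it left off, and transient errors trigger a bounded retry with backoff. The alert format and amount threshold are hypothetical. Change streams require a replica set or sharded cluster.

```python
import time

try:
    from pymongo.errors import PyMongoError
except ImportError:  # lets the pure-Python helpers run without pymongo installed
    PyMongoError = Exception

ALERT_THRESHOLD = 10_000_000  # hypothetical "large transaction" amount

# Server-side filter: only inserts and updates reach the application.
PIPELINE = [{"$match": {"operationType": {"$in": ["insert", "update"]}}}]


def to_alert(change, threshold=ALERT_THRESHOLD):
    """Turn a change event into an alert payload, or None if it is not relevant."""
    doc = change.get("fullDocument") or {}
    amount = doc.get("amount")
    if amount is None or amount < threshold:
        return None
    return {"op": change["operationType"], "txn_id": change["documentKey"]["_id"], "amount": amount}


def consume(collection, on_alert, max_retries=5):
    """Watch a pymongo collection and resume from the last token after errors."""
    token = None
    failures = 0
    while True:
        try:
            with collection.watch(PIPELINE, full_document="updateLookup", resume_after=token) as stream:
                for change in stream:
                    alert = to_alert(change)
                    if alert is not None:
                        on_alert(alert)
                    token = stream.resume_token  # progress marker for resumption
                    failures = 0
            return  # the stream was closed, e.g. after an invalidate event
        except PyMongoError:
            failures += 1
            if failures > max_retries:
                raise
            time.sleep(min(2 ** failures, 30))  # back off before reconnecting
```

Scaling out to several consumers would additionally require a way to partition events between them, which this sketch does not attempt.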
Complex Queries - Aggregation Pipeline
Explanation of pipeline stages:
- $unwind: Deconstructs the accounts array, creating a document for each account
- $group: Groups by customer ID, calculating total balance and count of accounts
- $match: Filters results to include only high-value customers (balance > $100,000)
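A sketch of the three-stage pipeline in MongoDB syntax (as pymongo would send it via `db.customers.aggregate(...)`), next to a plain-Python version that applies the same stages so the logic can be checked without a server. The `accounts.balance` field and the sample data are hypothetical; MongoDB does not guarantee the order of `$group` output unless a `$sort` stage is added.

```python
from collections import defaultdict


def high_value_pipeline(threshold=100_000):
    """$unwind -> $group -> $match, in MongoDB syntax."""
    return [
        {"$unwind": "$accounts"},
        {"$group": {
            "_id": "$_id",
            "total_balance": {"$sum": "$accounts.balance"},
            "account_count": {"$sum": 1},
        }},
        {"$match": {"total_balance": {"$gt": threshold}}},
    ]


def high_value_customers(customers, threshold=100_000):
    """Plain-Python equivalent of high_value_pipeline, stage by stage."""
    # $unwind: one (customer_id, account) pair per embedded account;
    # customers without accounts produce nothing, as with $unwind's default.
    unwound = [(c["_id"], acc) for c in customers for acc in c.get("accounts", [])]
    # $group: total balance and number of accounts per customer
    totals = defaultdict(lambda: {"total_balance": 0, "account_count": 0})
    for customer_id, acc in unwound:
        totals[customer_id]["total_balance"] += acc["balance"]
        totals[customer_id]["account_count"] += 1
    # $match: keep customers above the threshold
    return [{"_id": cid, **t} for cid, t in totals.items() if t["total_balance"] > threshold]


customers = [
    {"_id": "C1", "accounts": [{"balance": 80_000}, {"balance": 30_000}]},
    {"_id": "C2", "accounts": [{"balance": 50_000}]},
    {"_id": "C3", "accounts": []},
]
result = high_value_customers(customers)
```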
Monitoring and Performance Tuning
How MongoDB Helps with NoSQL Data Modeling Problems
NoSQL Benefits in Banking
=> MongoDB: Flexible document model, auto-scaling with Atlas, powerful aggregation framework.
- Flexible data modeling for diverse financial products
- Scalability for high-volume transactions
- Advanced real-time analytics capabilities
Key NoSQL Banking Use Cases
=> MongoDB Realm: Real-time sync for omnichannel banking experiences.
- Customer 360 View: ICICI Bank example
- Real-time Fraud Detection: Rabobank case study
- Regulatory Reporting: Deutsche Bank solution
AI/ML Integration
=> MongoDB: Atlas Data Lake for AI/ML integration, Atlas Search for NLP, native vector search.
- Predictive modeling for customer behavior
- ML model management and serving
- NLP for customer communications analysis
Hybrid Database Solutions
=> MongoDB Atlas: Multi-cloud support, SQL connectors, BI Connector for analytics.
- Polyglot persistence: SQL + NoSQL
- Multi-model databases
- Data virtualization for unified views
Edge Computing in Banking
=> MongoDB Realm: Edge computing support, offline-first functionality for mobile/IoT.
- Distributed transaction processing
- IoT data handling (ATMs, POS terminals)
- Blockchain data management
Challenges and Solutions
=> MongoDB: Advanced security features, MongoDB University for training, migration tools.
- Data governance and compliance
- Skill gap and training needs
- Legacy system integration
Thank you!