1 of 78

NoSQL Data Modeling with banking use case

Yen Trinh

Cloud Platform Engineer - VPBank

2 of 78

Agenda

  • NoSQL database in banking
  • Data Modeling introduction
  • NoSQL Data Modeling principles
  • Case study: Customer 360 View
  • Conclusion

3 of 78

NoSQL Databases in Banking

4 of 78

Advantages of NoSQL in Banking

High Scalability

Horizontal Scaling

High Performance

Fast Query Response

Flexible Data Structure

Adaptable Schema

High Performance

Multi-source Analysis

{JSON}

<XML>

CSV

Binary

Plain Text

5 of 78

NoSQL Database Types in Banking

6 of 78

Document-oriented (MongoDB)

- This structure allows for flexible schema, accommodating varied customer data.
- Nested arrays (e.g., 'accounts', 'transactions') enable efficient storage of related data.
- Easy to add new fields (e.g., 'credit_score') without affecting existing records.
- Supports atomic operations on entire documents, ensuring data consistency.

Example: Storing customer information and transactions:
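
The slide's example image is not part of this text, so the following is only a minimal sketch of what such a document could look like, written as a plain Python dictionary. Every field name and value (`accounts`, `transactions`, `credit_score`, the IDs) is an illustrative assumption, not the presenter's actual schema.

```python
from copy import deepcopy

# Hypothetical customer document; all field names and values are illustrative.
customer = {
    "_id": "CUST-001",
    "name": "Nguyen Van A",
    "accounts": [
        {
            "account_id": "ACC-1001",
            "type": "savings",
            "balance": 1500.0,
            # Related transactions nested inside the account they belong to.
            "transactions": [
                {"txn_id": "T1", "amount": -200.0, "type": "withdrawal"},
                {"txn_id": "T2", "amount": 500.0, "type": "deposit"},
            ],
        }
    ],
}


def with_credit_score(doc, score):
    """Return a copy of the document with a new field added.

    No schema change is needed, and documents without the field stay valid.
    """
    updated = deepcopy(doc)
    updated["credit_score"] = score
    return updated


scored = with_credit_score(customer, 720)
```

The original `customer` is left unchanged, which mirrors the point that adding a field to one document does not affect existing records.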

7 of 78

Key-Value (Redis)

- Extremely fast read/write operations, ideal for caching frequently accessed data.
- Can store various data types (strings, lists, sets) as values.
- Supports atomic operations and transactions.
- Use case: Storing session data, caching account balances for quick access in high-traffic periods.

Example: Caching account information for quick retrieval
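
As a rough illustration of the caching idea, here is a cache-aside sketch in Python. The `TTLCache` class is a small in-memory stand-in for a key-value store with expiry, not a Redis client, and the key format `account:<id>:balance`, the 30-second TTL, and the `load_from_db` callback are all assumptions.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like missing keys
            return None
        return value


def get_balance(cache, account_id, load_from_db, ttl_seconds=30):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"account:{account_id}:balance"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    balance = load_from_db(account_id)
    cache.set(key, balance, ttl_seconds)
    return balance
```

A short TTL keeps cached balances from drifting far from the system of record; the right value depends on how stale a balance the application can tolerate.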

8 of 78

Column-family (Cassandra)

- Optimized for write-heavy workloads and time-series data.
- Partition key (account_id) determines data distribution across nodes.
- Clustering key (transaction_time) sorts data within partitions.
- Efficient for querying recent transactions or time ranges for a specific account.
- Scales horizontally to handle massive amounts of transaction data.

Example: Storing time-series transaction history data
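
To make the roles of the two keys concrete, the sketch below imitates the layout in plain Python: a dictionary keyed by `account_id` plays the partition key, and each partition keeps its rows sorted by `transaction_time`. It is a toy model of the idea, not Cassandra code, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class TransactionTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # account_id (partition key) -> sorted list of (transaction_time, amount)
        self._partitions = defaultdict(list)

    def insert(self, account_id, transaction_time, amount):
        # insort keeps the partition ordered by transaction_time.
        insort(self._partitions[account_id], (transaction_time, amount))

    def range_query(self, account_id, start, end):
        """Rows for one account with start <= transaction_time <= end."""
        rows = self._partitions.get(account_id, [])
        lo = bisect_left(rows, (start, float("-inf")))
        hi = bisect_right(rows, (end, float("inf")))
        return rows[lo:hi]


table = TransactionTable()
table.insert("ACC-A", "2024-01-03T10:00", 30.0)
table.insert("ACC-A", "2024-01-01T09:00", 10.0)
table.insert("ACC-A", "2024-01-02T12:00", 20.0)
table.insert("ACC-B", "2024-01-02T12:00", 99.0)
recent = table.range_query("ACC-A", "2024-01-02T00:00", "2024-01-03T23:59")
```

Because a time-range query touches a single partition and reads a contiguous slice of sorted rows, it stays cheap even when the overall table is very large.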

9 of 78

Graph (Neo4j)

- Nodes represent entities (customers, transactions, merchants).
- Relationships show connections between entities.
- Cypher query language is designed for graph traversal.
- No need for complex joins as in SQL; relationships are first-class citizens.
- Efficient for detecting patterns, like potential fraud scenarios.

Example: Analyzing customer relationships, fraud detection
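
One simple pattern of this kind is "two customers connected to the same device". The sketch below expresses it in plain Python over an edge list rather than in Cypher; the relationship names (`USES_DEVICE`, `PAID`) and the sample data are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# (source, relationship, target) edges; all names are made-up sample data.
edges = [
    ("alice", "USES_DEVICE", "device-1"),
    ("bob", "USES_DEVICE", "device-1"),
    ("carol", "USES_DEVICE", "device-2"),
    ("bob", "PAID", "merchant-9"),
]


def customers_sharing(edges, relationship):
    """Return sorted pairs of customers linked to the same target node."""
    by_target = defaultdict(set)
    for source, rel, target in edges:
        if rel == relationship:
            by_target[target].add(source)
    pairs = set()
    for customers in by_target.values():
        for a, b in combinations(sorted(customers), 2):
            pairs.add((a, b))
    return sorted(pairs)


suspicious = customers_sharing(edges, "USES_DEVICE")
```

In a graph database the same question is a short traversal over stored relationships, which is why no join tables are needed.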

10 of 78

NoSQL Applications in Banking

11 of 78

NoSQL vs SQL in Banking Context

Aspect | SQL | NoSQL
1. Schema Flexibility | Rigid schema, ALTER TABLE for changes | Flexible schema, easily add new fields
2. Scalability | Vertical scaling primarily | Designed for horizontal scaling
3. Querying | Complex joins, aggregations in-database | Simple queries, complex operations often in application layer
4. Consistency Models | ACID transactions | Often eventual consistency, with tunable consistency levels
5. Use Cases in Banking | Core banking systems, regulatory reporting | Real-time analytics, high-volume transactions, customer 360 views

12 of 78

Data Modeling introduction

13 of 78

What is Data Modeling?

Data Modeling: Crafting the Digital Blueprint
- Process of creating an abstract representation
- Defines data structure, relationships, and constraints
- Foundation for effective database design

Traditional vs. NoSQL Approach
- Relational: Structured, normalized, ACID properties
- NoSQL: Flexible schema, denormalized, eventual consistency

14 of 78

Importance of Data Modeling in Banking

Why Data Modeling Matters in Banking?

- Ensures financial data integrity and accuracy

- Supports business decisions and risk management

- Facilitates regulatory compliance and data governance

15 of 78

Banking's Data Evolution: Relational to NoSQL

16 of 78

Modeling for RDBMS

Concerns

Step 1:

Define the schema

Step 2:

Develop the application and queries

CORRECT

17 of 78

Modeling for RDBMS

Concerns

?

?

DENORMALIZED

Step 1:

Define the schema

Step 2:

Develop the application and queries

18 of 78

Modeling for RDBMS

Concerns

Step 1:

Define the schema

Step 2:

Develop the application and queries

Data dictates

19 of 78

Modeling for RDBMS

Concerns

Step 1:

Define the schema

Step 2:

Develop the application and queries

20 of 78

NoSQL Data Modeling (e.g. with MongoDB)

Develop the application

Improve the application

Define the Data Model

Improve the Data Model

21 of 78

Designed for the usage pattern

Data Model evolution is easy

Can evolve without any downtime

Many design options

Improve the Data Model

Improve the application

22 of 78

NoSQL: Powering Modern Banking

- Scalability for millions of daily transactions
- Flexibility for rapid product innovation
- Real-time data processing for instant insights

23 of 78

NoSQL Data Modeling for Banking: Principles and Practices

24 of 78

Understanding Banking Domain for Data Modeling

Core Banking Entities

1. Customer
2. Account
3. Transaction
4. Product
5. Branch/ATM

Key Relationships

  • Customer (1) --- (N) Account
  • Account (1) --- (N) Transaction
  • Customer (1) --- (N) Product
  • Branch (1) --- (N) Account

Critical Banking Operations

1. Account Opening and KYC
2. Transaction Processing
3. Loan Origination
4. Fraud Detection
5. Regulatory Reporting

Data Characteristics

  • High write volume (transactions)
  • Complex read patterns (reporting, analytics)
  • Long-term data retention
  • Strict consistency requirements for financial data

25 of 78

NoSQL Data Modeling Techniques for Banking

Embedding

  • Description: Nesting related data within a document
  • Banking example: Embedding account details in customer document
  • Pros: Fast reads, reduces joins
  • Cons: Data duplication, potential for large documents

Referencing/Link

  • Description: Using document references or foreign keys
  • Banking example: Storing transaction IDs in account document
  • Pros: Maintains normalization, flexible relationships
  • Cons: Requires multiple queries for related data

Denormalization

  • Description: Duplicating data to optimize read performance
  • Banking example: Storing customer name in transaction documents
  • Pros: Improves query performance
  • Cons: Data inconsistency risks, update overhead

Aggregation

  • Description: Grouping related data in a single document
  • Banking example: Storing monthly statement summaries in account document
  • Pros: Efficient for reporting and analytics
  • Cons: May require frequent updates

Polymorphic Schema

  • Description: Flexible document structures for similar entities
  • Banking example: Different account types (savings, checking) in one collection
  • Pros: Accommodates diverse product types
  • Cons: Can complicate querying and indexing

Computed Data

  • Description: Storing pre-calculated values for quick access
  • Banking example: Maintaining running balance in account documents
  • Pros: Fast access to derived data
  • Cons: Requires careful management of updates

26 of 78

To link or embed?

How often does the embedded information get accessed?

Is the data queried using the embedded information?

Does the embedded information change often?

27 of 78

What can be linked?

Relationships:

One-to-one

One-to-many

Many-to-many

Example: entities and relationships in a blog

users

  • name
  • email

articles

  • title
  • date
  • text

tags

  • name
  • url

categories

  • name
  • url

comments

  • name
  • url

1-to-N

1-to-N

1-to-N

N-to-N

N-to-N

28 of 78

Step-by-step iteration

Identify and apply relevant design patterns

Evaluate the application workload

Map out entities and their relationships

Finalize the data model for each collection

Collections with documents fields and shapes for each

Data size

Database queries and indexes

Current operations assumptions, and growth projections

Data size

A list of operations ranked by importance

CRD: Collection Relationship Diagram (link or embed?)

Business domain expertise

Current and predicted scenarios

Production logs and stats

29 of 78

Designing Customer-Centric Data Model

Design Considerations

1. Embed frequently accessed data (account summaries)

2. Use references for large, variable data (full transaction history)

3. Include computed fields for quick access (risk profile)

4. Implement versioning for auditable fields (addresses)

Querying Patterns

1. Customer lookup by ID or SSN

2. Aggregation for customer 360° view

3. Filtering based on risk profile or KYC status

Customer Document Structure

30 of 78

Account and Transaction Modeling

Modeling Strategies

1. Embed recent transactions in account document for quick access

2. Store detailed transactions in a separate collection for scalability

3. Use time-based bucketing for efficient querying of historical data

4. Implement compound indexes for common query patterns

Common Query Patterns

1. Balance inquiry and recent transactions

2. Statement generation for a given time period

3. Transaction search by various criteria (amount, type, date range)

Transaction Document

Account Document

31 of 78

Handling Relationships in NoSQL for Banking - Customer-Account Relationship (One-to-Many)

Technique:

Embedding with References

Approach Explanation:

- Basic account information is embedded in the customer document for quick access.

- Detailed account information is stored in separate documents, referenced by account_id.

- This approach balances fast retrieval of essential account data with the ability to store and manage detailed account information separately.

- It allows for efficient querying of customer data along with basic account details in a single operation.
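
The approach can be sketched with two hypothetical collections modeled as Python dictionaries: the customer document carries a short summary of each account, and the full account documents live separately, reached through `account_id`. All field names and values are assumptions made for the illustration.

```python
# Two hypothetical collections modeled as plain Python structures.
customers = {
    "CUST-001": {
        "name": "Tran Thi B",
        # Embedded summaries: enough for a customer overview in one read.
        "accounts": [
            {"account_id": "ACC-1001", "type": "savings", "balance": 1500.0},
            {"account_id": "ACC-1002", "type": "checking", "balance": 320.5},
        ],
    }
}
accounts = {
    "ACC-1001": {"type": "savings", "balance": 1500.0,
                 "interest_rate": 0.045, "branch": "HN-01"},
    "ACC-1002": {"type": "checking", "balance": 320.5,
                 "overdraft_limit": 500.0, "branch": "HN-01"},
}


def customer_overview(customer_id):
    """Single lookup: customer data plus embedded account summaries."""
    doc = customers[customer_id]
    total = sum(account["balance"] for account in doc["accounts"])
    return {"name": doc["name"], "total_balance": total}


def account_details(customer_id, account_id):
    """Second lookup, only when details are needed, following the reference."""
    owned = {a["account_id"] for a in customers[customer_id]["accounts"]}
    if account_id not in owned:
        raise KeyError(account_id)
    return accounts[account_id]
```

The cost of this design is that `balance` exists in two places, so any write path has to update both copies or accept a short window of inconsistency.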

32 of 78

Handling Relationships in NoSQL for Banking - Account-Transaction Relationship (One-to-Many)

Technique:

Referencing with Recent Transactions Embedded

Approach Explanation:

- Recent transactions are embedded directly in the account document for quick access.

- All transactions are referenced by their IDs, with full details stored in separate documents.

- This hybrid approach provides fast access to recent transaction data while maintaining a complete transaction history.

- It's particularly useful for displaying account summaries and recent activity without additional queries.
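
A minimal Python sketch of this hybrid follows, with the account document and the transaction collection modeled as plain dictionaries. The cap `RECENT_LIMIT`, the field names, and the `record_transaction` helper are hypothetical and chosen only to show the mechanics.

```python
RECENT_LIMIT = 3  # assumed cap on embedded transactions, kept small for the demo


def record_transaction(account, transactions, txn):
    """Store the full transaction separately; keep a short recent list embedded."""
    transactions[txn["txn_id"]] = txn                  # full details, separate collection
    account["transaction_ids"].append(txn["txn_id"])   # reference by ID
    account["balance"] += txn["amount"]
    summary = {"txn_id": txn["txn_id"], "amount": txn["amount"], "date": txn["date"]}
    account["recent_transactions"].insert(0, summary)  # newest first
    del account["recent_transactions"][RECENT_LIMIT:]  # drop anything past the cap


account = {"account_id": "ACC-1001", "balance": 0.0,
           "transaction_ids": [], "recent_transactions": []}
transactions = {}
for day, amount in enumerate([100.0, -20.0, 50.0, -5.0], start=1):
    record_transaction(account, transactions, {
        "txn_id": f"T{day}", "amount": amount,
        "date": f"2024-01-0{day}", "channel": "mobile",
    })
```

Note that the list of transaction IDs still grows without bound in this sketch; for very active accounts, querying the transaction collection by `account_id` may be preferable to storing every ID on the account.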

33 of 78

Handling Relationships in NoSQL for Banking - Customer-Product Relationship (Many-to-Many)

Technique:

Referencing

Approach Explanation:

- Both customer and product documents contain arrays of references to each other.

- A separate subscription document stores detailed information about each customer-product relationship.

- This approach allows for efficient querying of products associated with a customer and vice versa.

- The subscription document enables storing additional relationship-specific data without cluttering the main documents.

- It provides flexibility in managing complex many-to-many relationships while maintaining good read performance.
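
The three document types described above can be sketched as Python dictionaries. The `subscribe` helper and all field names (`product_ids`, `customer_ids`, `credit_limit`, and so on) are assumptions for illustration only.

```python
def subscribe(customers, products, subscriptions, customer_id, product_id, details):
    """Link a customer and a product; relationship data lives in its own document."""
    key = (customer_id, product_id)
    if key in subscriptions:
        raise ValueError(f"{customer_id} already holds {product_id}")
    subscriptions[key] = {"customer_id": customer_id, "product_id": product_id, **details}
    # Reference arrays on both sides allow lookups in either direction.
    customers[customer_id]["product_ids"].append(product_id)
    products[product_id]["customer_ids"].append(customer_id)


customers = {
    "C1": {"name": "Le Van C", "product_ids": []},
    "C2": {"name": "Pham Thi D", "product_ids": []},
}
products = {
    "P1": {"name": "Credit Card", "customer_ids": []},
    "P2": {"name": "Home Loan", "customer_ids": []},
}
subscriptions = {}

subscribe(customers, products, subscriptions, "C1", "P1",
          {"start_date": "2024-01-15", "credit_limit": 2000})
subscribe(customers, products, subscriptions, "C1", "P2",
          {"start_date": "2024-03-01", "term_months": 240})
subscribe(customers, products, subscriptions, "C2", "P1",
          {"start_date": "2024-02-10", "credit_limit": 1000})
```

For a popular product the `customer_ids` array could become very large, in which case keeping only the subscription documents and indexing them by `product_id` may be the safer choice.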

34 of 78

Handling Relationships in NoSQL for Banking - Considerations

Key Consideration | Description
Technique Selection | Choose the appropriate technique based on data access patterns and query requirements.
Performance Balance | Balance between read performance and write complexity.
Data Integrity | Consider data consistency and integrity in distributed NoSQL systems.
Continuous Optimization | Regularly review and optimize data models as business needs evolve.

35 of 78

Indexing Strategies for Banking Data (MongoDB)

Purpose of Indexing

- Enhance query performance for critical banking operations

- Support real-time financial transactions and balance inquiries

- Enable efficient data retrieval for regulatory reporting and compliance

- Facilitate fast search capabilities for customer service operations

- Optimize performance for high-volume transaction processing

- Support complex analytical queries for risk assessment and fraud detection

- Ensure scalability of banking systems as data volume grows

36 of 78

Indexing Strategies for Banking Data (MongoDB) – Key considerations

Consideration | Description
Data Access Patterns | Analyze common queries in banking operations (e.g., account lookups, transaction searches)
Regulatory Compliance | Ensure indexes support required reporting and auditing capabilities
Data Security | Consider the impact of indexes on data encryption and access control
Write-Heavy Workloads | Balance index benefits against performance impact on high-volume transaction insertions
Read vs. Write Trade-offs | Optimize for read-heavy operations like balance checks while managing write performance
Index Size and RAM Usage | Ensure critical indexes fit in memory for optimal performance
Data Cardinality | Focus on high-cardinality fields like account numbers or transaction IDs
Time-Based Queries | Support efficient historical data access for statements and analytics
Scalability | Design indexes to support future growth in data volume and user base

37 of 78

MongoDB Indexing Strategies for Banking Data - Single Field Indexes

Useful for queries on a single field, like account number lookups.

Banking application: Quick customer account retrieval:

38 of 78

MongoDB Indexing Strategies for Banking Data - Compound Indexes

For queries involving multiple fields, such as transaction searches.

Banking application: Efficient transaction history retrieval and reporting

39 of 78

MongoDB Indexing Strategies for Banking Data - Multikey Indexes

For indexing array fields, useful for multi-party accounts or transaction categories.

Banking application: Quick lookup of joint accounts or categorized transactions

40 of 78

MongoDB Indexing Strategies for Banking Data - Text Indexes

Enables full-text search capabilities, useful for transaction descriptions or customer notes.

Banking application: Enhanced transaction search functionality for customer service

41 of 78

MongoDB Indexing Strategies for Banking Data - Wildcard Indexes

Useful for fields with unpredictable structures, like custom attributes on accounts or transactions.

Banking application: Flexible querying on custom account attributes or dynamic transaction metadata

42 of 78

MongoDB Indexing Strategies for Banking Data - Geospatial Indexes

For location-based queries, useful for ATM or branch locators.

Banking application: Nearest ATM/branch finder for mobile banking apps
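
To show what a nearest-ATM query computes, the sketch below does the same job in plain Python: great-circle distance via the haversine formula, then a sort. In MongoDB a geospatial index would avoid scanning every document; this brute-force version is only an illustration. The sample ATMs are invented, and their `location` fields follow the GeoJSON convention of `[longitude, latitude]`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_atms(atms, lat, lon, limit=3):
    """Return up to `limit` ATMs ordered by distance from (lat, lon)."""
    def distance(atm):
        atm_lon, atm_lat = atm["location"]["coordinates"]  # GeoJSON order: lon, lat
        return haversine_km(lat, lon, atm_lat, atm_lon)
    return sorted(atms, key=distance)[:limit]


# Made-up sample data using approximate city-centre coordinates.
atms = [
    {"name": "ATM Ho Chi Minh City", "location": {"type": "Point", "coordinates": [106.6297, 10.8231]}},
    {"name": "ATM Hanoi", "location": {"type": "Point", "coordinates": [105.8542, 21.0285]}},
    {"name": "ATM Da Nang", "location": {"type": "Point", "coordinates": [108.2022, 16.0544]}},
]
closest = nearest_atms(atms, lat=21.03, lon=105.85, limit=2)
```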

43 of 78

MongoDB Indexing Strategies for Banking Data - Hashed Indexes

For equality queries and supporting sharding, useful for large-scale transaction systems.

Banking application: Efficient sharding of large transaction collections for scalability

44 of 78

MongoDB Indexing Strategies for Banking Data - Indexing Strategies

The ESR (Equality, Sort, Range) Rule

Optimize compound indexes for query patterns:

- Equality (E) fields first

- Sort (S) fields next

- Range (R) fields last

Example: For queries filtering by account type, sorting by date, and ranging on amount
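
The ordering rule can be captured in a small helper that assembles the key list for a compound index. This is a sketch: `esr_index` is a hypothetical helper, and the field names `account_type`, `date`, and `amount` are assumed from the example sentence above. The result has the list-of-pairs shape that PyMongo's `Collection.create_index` accepts.

```python
ASCENDING, DESCENDING = 1, -1  # same integer values PyMongo uses for directions


def esr_index(equality, sort, range_):
    """Order compound-index keys as Equality, then Sort, then Range."""
    keys = [(field, ASCENDING) for field in equality]  # E: exact-match fields
    keys += list(sort)                                 # S: (field, direction) pairs
    keys += [(field, ASCENDING) for field in range_]   # R: range-filtered fields
    return keys


index_keys = esr_index(
    equality=["account_type"],
    sort=[("date", DESCENDING)],
    range_=["amount"],
)
# With a live PyMongo collection this could be applied as:
#     collection.create_index(index_keys)
```

Placing the range field last lets the index narrow by equality and return rows already in sort order before the range filter is applied, which is the motivation for the rule.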

45 of 78

MongoDB Indexing Strategies for Banking Data - Indexing Strategies

The ESR (Equality, Sort, Range) Rule

Optimize compound indexes for query patterns:

- Equality (E) fields first

- Sort (S) fields next

- Range (R) fields last

Example: For queries filtering by account type, sorting by date, and ranging on amount

46 of 78

MongoDB Indexing Strategies for Banking Data - Create Indexes to Support Your Queries

Analyze query patterns and create indexes accordingly:

- Use db.collection.explain() to understand query execution

- Create indexes for frequently used queries

- Consider read/write ratio when adding indexes

Banking application:

Optimize for common operations like balance checks, recent transactions, and reporting queries

47 of 78

MongoDB Indexing Strategies for Banking Data - Use Indexes to Sort Query Results

Leverage indexes for efficient sorting

Banking application:

Efficient retrieval of recent transactions or statement generation

48 of 78

MongoDB Indexing Strategies for Banking Data - Ensure Indexes Fit in RAM

Monitor index size and server RAM:

- Use db.collection.stats() to check index sizes

- Ensure frequently used indexes fit in RAM

- Consider using partial indexes for large collections

Banking application:

Optimize performance for critical real-time operations like balance checks and fraud detection

49 of 78

MongoDB Indexing Strategies for Banking Data - Create Indexes to Ensure Query Selectivity

Focus on high-cardinality fields for better query performance

Banking application:

Prioritize indexing on unique identifiers like account numbers or transaction IDs

50 of 78

Advanced NoSQL Patterns for Banking - Bucketing Pattern

Use case:

Group related data into buckets to optimize query performance.
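
As an illustration, the sketch below groups individual transactions into one bucket document per account and month, with a pre-computed count and total. The monthly granularity and all field names are assumptions; the slide's own example is not available in this text.

```python
def bucket_by_month(transactions):
    """Group transactions into one bucket document per (account, month)."""
    buckets = {}
    for txn in transactions:
        month = txn["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        key = (txn["account_id"], month)
        bucket = buckets.setdefault(key, {
            "account_id": txn["account_id"],
            "month": month,
            "count": 0,
            "total": 0.0,
            "transactions": [],
        })
        bucket["transactions"].append({"date": txn["date"], "amount": txn["amount"]})
        bucket["count"] += 1
        bucket["total"] += txn["amount"]
    return buckets


sample = [
    {"account_id": "ACC-1", "date": "2024-01-05", "amount": 100.0},
    {"account_id": "ACC-1", "date": "2024-01-20", "amount": -40.0},
    {"account_id": "ACC-1", "date": "2024-02-02", "amount": 25.0},
]
buckets = bucket_by_month(sample)
```

A monthly statement then needs a single bucket read instead of one read per transaction.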

51 of 78

Advanced NoSQL Patterns for Banking - Versioning Pattern

Use case:

Maintain history of changes for auditing and compliance.
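
One possible shape for this, sketched in Python: the current value carries a version number, and each change pushes the previous value into a history array. The `update_address` helper and the field names (`version`, `valid_from`, `valid_until`, `address_history`) are hypothetical.

```python
def update_address(customer, new_value, changed_at):
    """Replace the current address and keep the previous version for audit."""
    current = customer["address"]
    # Archive the outgoing version, stamped with the time it stopped being valid.
    customer.setdefault("address_history", []).append(
        {**current, "valid_until": changed_at}
    )
    customer["address"] = {
        "version": current["version"] + 1,
        "value": new_value,
        "valid_from": changed_at,
    }


customer = {
    "_id": "CUST-001",
    "address": {"version": 1, "value": "12 Le Loi, Hanoi", "valid_from": "2022-05-01"},
}
update_address(customer, "89 Lang Ha, Hanoi", "2024-06-15")
```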

52 of 78

Advanced NoSQL Patterns for Banking - Polymorphic Pattern

Use case:

Use a single collection for similar but varying entities.
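
A small sketch of the idea: documents for different account types share common fields and a `type` discriminator, and each type adds its own fields. The account types, field names, and the `available_funds` rule are assumptions for illustration.

```python
# One hypothetical collection holding differently shaped account documents.
accounts = [
    {"account_id": "A1", "type": "savings", "balance": 1000.0, "interest_rate": 0.04},
    {"account_id": "A2", "type": "checking", "balance": 200.0, "overdraft_limit": 500.0},
    {"account_id": "A3", "type": "loan", "balance": -5000.0, "term_months": 36},
]


def available_funds(account):
    """Dispatch on the 'type' discriminator to apply type-specific rules."""
    if account["type"] == "checking":
        return account["balance"] + account["overdraft_limit"]
    if account["type"] == "savings":
        return account["balance"]
    return 0.0  # loans and unknown types expose no spendable funds in this sketch


funds = {a["account_id"]: available_funds(a) for a in accounts}
```

Queries on the shared fields work across every type, while type-specific fields need care when indexing because they are absent from most documents.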

53 of 78

Advanced NoSQL Patterns for Banking - Subset Pattern

Use case:

Store a subset of data for quick access, with references to full data.

54 of 78

Best Practices for NoSQL Data Modeling in Banking

55 of 78

Case Study: Customer 360 View

56 of 78

Introduction to Customer 360 View

57 of 78

Requirements and Challenges for Customer 360 View

58 of 78

Data Model Design - Overview

59 of 78

Data Model Design - Document Structure

Embedded vs. Referenced Data: This model primarily uses embedded data for faster read operations and to reduce the need for multiple queries.

Embedded personal_info: Frequently accessed data is embedded directly in the document for quick retrieval.

Arrays for multiple entries: accounts, recent_transactions, products, and interactions are stored as arrays, allowing for multiple entries while keeping related data together.

Scalability considerations: For very active customers with many transactions or interactions, consider implementing a capped array or moving older data to separate collections.

Flexibility: This structure allows for easy addition of new fields or sections as the application evolves.

60 of 78

Data Model Design - Document Structure

Rationale for Embedding Accounts

- Frequent access with customer data: Embedding accounts allows for quick retrieval of account information alongside customer details, reducing the need for separate queries.

- Relatively small, stable set of accounts per customer: Most customers have a limited number of accounts, making embedding a practical choice without risking document growth issues.

61 of 78

Data Model Design - Document Structure

Handling Historical Account Data

For closed accounts or historical data:

- Archiving strategy: Move closed accounts to a separate 'archived_accounts' array or collection after a specified period.

- Status field: Add a 'status' field (e.g., 'active', 'closed') to each account object for easy filtering.

- Date fields: Include 'opened_date' and 'closed_date' fields to track the account lifecycle.

- Periodic cleanup: Implement a process to move or delete old, closed accounts to prevent unbounded growth of the customer document.

62 of 78

Data Model Design - Recent Transactions

Keeping only recent transactions in the customer document

- Limited set: Store only the most recent transactions (e.g., last 30 days or last 50 transactions).

- Regular updates: Implement a system to add new transactions and remove old ones from this array.

Rationale

- Quick access: Enables fast retrieval of recent activity without querying a separate collection.

- Performance: Limits document size, maintaining fast read/write operations on the customer document.

- Relevance: Most common use cases involve recent transactions, making this data immediately available.
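
In MongoDB, one way to keep such a bounded array is a `$push` update with the `$each`, `$sort`, and `$slice` modifiers. The sketch below builds that update document in Python and, so it can be tried without a server, imitates its effect locally. The field name `recent_transactions` comes from the slides; the `date` field, the limit of 50, and both helper functions are assumptions.

```python
def recent_transactions_update(txn, limit=50):
    """Build an update that appends txn and keeps only the newest `limit` entries."""
    return {
        "$push": {
            "recent_transactions": {
                "$each": [txn],
                "$sort": {"date": -1},  # newest first
                "$slice": limit,        # keep the first `limit` items after sorting
            }
        }
    }


def apply_locally(doc, update):
    """Local imitation of the update above, for illustration only."""
    spec = update["$push"]["recent_transactions"]
    items = doc.get("recent_transactions", []) + spec["$each"]
    items.sort(key=lambda t: t["date"], reverse=True)
    doc["recent_transactions"] = items[: spec["$slice"]]


customer = {"_id": "CUST-001", "recent_transactions": []}
for day in range(1, 5):
    txn = {"txn_id": f"T{day}", "date": f"2024-01-0{day}", "amount": 10.0 * day}
    apply_locally(customer, recent_transactions_update(txn, limit=3))
```

With a live collection, the update document would be passed to something like `update_one({"_id": customer_id}, update)`, so the append and the trim happen in one atomic operation on the document.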

63 of 78

Data Model Design - Recent Transactions

64 of 78

Data Model Design - Products and Interactions

65 of 78

Data Model Design - Products and Interactions

66 of 78

MongoDB Implementation - Indexing

67 of 78

Performance Optimization - Caching

Key benefits of this architecture:

- Reduced load on MongoDB for frequently accessed data

- Improved read performance for cached items

- Ability to handle traffic spikes more efficiently

68 of 78

Real-time Updates - Change Streams

Considerations:

- Error handling: Implement robust error handling and reconnection logic for network issues

- Scaling: Use multiple change stream consumers for high-volume changes

- Filtering: Utilize pipeline to filter changes and reduce unnecessary processing
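
The error-handling and filtering points can be sketched as a consumer loop. It assumes a PyMongo-style `collection.watch(pipeline, resume_after=...)` that returns an iterable context manager, and relies on each change event's `_id` being its resume token. The `consume_changes` function, the retry policy, and the `retry_on` default are hypothetical; with PyMongo, `retry_on` would more likely be set to `pymongo.errors.PyMongoError`.

```python
# Filter on the server so the consumer sees only the operations it cares about.
FILTER_PIPELINE = [
    {"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}}
]


def consume_changes(collection, handle, max_reconnects=3, retry_on=(ConnectionError,)):
    """Process change events, resuming after transient errors."""
    resume_token = None
    reconnects = 0
    while True:
        try:
            with collection.watch(FILTER_PIPELINE, resume_after=resume_token) as stream:
                for change in stream:
                    handle(change)
                    resume_token = change["_id"]  # remember the last processed event
            return  # the stream ended without error
        except retry_on:
            reconnects += 1
            if reconnects > max_reconnects:
                raise  # give up after too many consecutive attempts
```

Because the token is saved only after `handle` returns, an event whose handler fails midway is delivered again on resume, so handlers should be idempotent.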

69 of 78

Complex Queries - Aggregation Pipeline

Explanation of pipeline stages:

- $unwind: Deconstructs the accounts array, creating a document for each account

- $group: Groups by customer ID, calculating total balance and count of accounts

- $match: Filters results to include only high-value customers (balance > $100,000)
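
The pipeline itself is not included in this text, so the following is a reconstruction from the stage descriptions above, not the presenter's code. The field names `customer_id`, `accounts`, and `balance` are assumptions. A pure-Python equivalent is included so the logic can be checked without a database.

```python
HIGH_VALUE_THRESHOLD = 100_000

# Assumed shape of the aggregation pipeline described above.
pipeline = [
    {"$unwind": "$accounts"},
    {"$group": {
        "_id": "$customer_id",
        "total_balance": {"$sum": "$accounts.balance"},
        "account_count": {"$sum": 1},
    }},
    {"$match": {"total_balance": {"$gt": HIGH_VALUE_THRESHOLD}}},
]


def high_value_customers(customers):
    """Pure-Python equivalent of the three stages, for illustration."""
    totals = {}
    for customer in customers:
        for account in customer["accounts"]:            # $unwind
            entry = totals.setdefault(                  # $group by customer ID
                customer["customer_id"],
                {"total_balance": 0, "account_count": 0},
            )
            entry["total_balance"] += account["balance"]
            entry["account_count"] += 1
    return {                                            # $match on the grouped total
        cid: entry for cid, entry in totals.items()
        if entry["total_balance"] > HIGH_VALUE_THRESHOLD
    }


sample = [
    {"customer_id": "C1", "accounts": [{"balance": 80_000}, {"balance": 30_000}]},
    {"customer_id": "C2", "accounts": [{"balance": 5_000}]},
    {"customer_id": "C3", "accounts": []},
]
result = high_value_customers(sample)
```

The `$match` stage has to come after `$group` here because it filters on the computed total; any filter on raw customer fields is usually better placed at the start of the pipeline.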

70 of 78

Monitoring and Performance Tuning

71 of 78

How MongoDB helps with NoSQL Data Modeling problems

72 of 78

NoSQL Benefits in Banking

=> MongoDB: Flexible document model, auto-scaling with Atlas, powerful aggregation framework.

- Flexible data modeling for diverse financial products

- Scalability for high-volume transactions

- Advanced real-time analytics capabilities

73 of 78

Key NoSQL Banking Use Cases

=> MongoDB Realm: Real-time sync for omnichannel banking experiences.

- Customer 360 View: ICICI Bank example

- Real-time Fraud Detection: Rabobank case study

- Regulatory Reporting: Deutsche Bank solution

74 of 78

AI/ML Integration

=> MongoDB: Atlas Data Lake for AI/ML integration, Atlas Search for NLP, native vector search.

- Predictive modeling for customer behavior

- ML model management and serving

- NLP for customer communications analysis

75 of 78

Hybrid Database Solutions

=> MongoDB Atlas: Multi-cloud support, SQL connectors, BI Connector for analytics.

- Polyglot persistence: SQL + NoSQL

- Multi-model databases

- Data virtualization for unified views

76 of 78

Edge Computing in Banking

=> MongoDB Realm: Edge computing support, offline-first functionality for mobile/IoT.

- Distributed transaction processing

- IoT data handling (ATMs, POS terminals)

- Blockchain data management

77 of 78

Challenges and Solutions

=> MongoDB: Advanced security features, MongoDB University for training, migration tools.

- Data governance and compliance

- Skill gap and training needs

- Legacy system integration

78 of 78

Thank you!