1 of 43

Admin stuff

  • A2 available after lecture and doable after this lecture. There is an early submission bonus.
  • If you struggled with A1 (8+ hours), goto office hours for more help.
  • Expect to receive A1 marks by next lecture

2 of 43

CSCC09 Programming on the Web

Storing and Serving Data

Cho Yin Yong

3 of 43

So far

  • In memory array
  • What’s good?
    • Easy to develop with - Array manipulation!
  • What’s bad?

4 of 43

Problems

  • On server restart, everything is wiped
  • Hard to change data schema
  • Hard to support one-to-many, many-to-many relationships
  • Impossible to scale!

5 of 43

This Lecture

  • Extracting the persistence layer in the backend to a database and make the API stateless.�
  • Avoiding the N+1 problem�
  • Choosing a suitable pagination method�
  • Ways to upload files to the backend, and serve them properly

6 of 43

Now: One file approach

  • Unscalable
  • Messy code
  • Unmaintainable

7 of 43

Model View Controller (MVC)

  • Model: Business logic and storage layer
    • 1 model -> 1 database table�
  • View: Serving the HTML�
  • Controller: Linking Model and View together
    • Accepts HTTP requests as input and send commands to model for execution

8 of 43

Example folder structure

static/ # view

routers/ # controllers

chirps-router.js

models/

chirp.js

app.js # entrypoint

9 of 43

Persistent storage on disk - like SQL (in C43)

CRUD operations in SQL

C - INSERT INTO Chirps VALUES (...)�R - SELECT * FROM Chirps�U - UPDATE Chirps SET …�D - DELETE Chirps WHERE …

10 of 43

SQL Abstraction - Object Relational Mapper (ORM)

Active Record Pattern

C - await Chirp.create({ content: "Hello" });

R - await Chirp.findByPk(123);

After creating Chirp object:

U - await chirp.update({ content: "Bye" })

D - await chirp.destroy()

11 of 43

async / await

  • await keyword must be used within an async function.
  • You can only await a Promise!
  • Promise then will be evaluated SYNCHRONOUSLY.

app.get(“/messages”, async () => {

const msgs = await Message.findAll();

return res.json({ msgs });

})

12 of 43

Synchronous = blocking

for (let i = 0; i < 10; i++) {

user = await User.findByPk(i); // BLOCKING

console.log(user);

}

We can use Promise syntax instead

13 of 43

Resolving Promises in parallel

const users = await Promise.all([

User.findByPk(1),

User.findByPk(2),

User.findByPk(3),

])

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all

These must be independent calls

14 of 43

Chirper Demo

Express + sequelize + sqlite

We will:

  • See added complexities when we move away from the in memory array
  • Consider error handling with databases and output relevant error messages

Chirp

Has many

15 of 43

Database Structure

https://tableplus.com/

16 of 43

Interacting with databases with code

17 of 43

Example code for GET /chirps

const chirps = await Chirp.findAll();

let user;

for (const chirp of chirps) {

user = await User.findByPk(chirp.userId)

}

What’s the problem?

Let’s assume this makes a SQL call

SELECT * FROM User WHERE UserId=1;

18 of 43

N+1 problem

const chirps = await Chirp.findAll();

let user;

for (const chirp of chirps) {

user = await User.findByPk(chirp.userId);

}

SELECT * FROM Chirps;

SELECT * FROM Users WHERE id=UserId

How many SQL calls in total?

19 of 43

Problems

  • Too many database calls from application
  • Time it takes to respond will go up linearly depending on how many records you query
    • Application slows down
  • Nightmare of all backend developers!

20 of 43

Solution #1 - SQL JOINs

SELECT c1.*, c2.* FROM Chirps c1

LEFT JOIN Chirps c2 ON c1.ChirpId = c2.id;

id

ChirpId

1

2

2

NULL

3

2

c1.id

c1.ChirpId

c2.id

c2.ChirpId

1

2

2

NULL

3

2

2

NULL

2

NULL

21 of 43

ORM

Or… ORMs make it even easier.

await Chirp.findAll({

limit: 5,

include: { association: "Chirp", attributes: ["content"] },

});

22 of 43

Solution #2 - Batching Queries

messages = await Message.findAll();

users = await Users.findAll({

where: {id: messages.map((m) => m.UserId) }

}); // SELECT * FROM Users WHERE id IN (1, 2, 3)

// combine results here with code

...

23 of 43

N+1 problems are not limited to SQL calls

It applies to API calls as well!

24 of 43

Quick demo in Chirper

  • Code with N+1 problem
  • Code with SQL JOIN

25 of 43

Pagination in APIs

26 of 43

What is pagination?

Dividing a collection into multiple pages, serving it one page at a time.

GET /patients?page=1&limit=10

Show me the first page with 10 patients

Return a list

[{...}, {...}]

Return an object with more information:

{total: 123, items: [{...}, {...}], next_page_uri: ‘/patients?page=2’}

27 of 43

No pagination

Store everything in the frontend, show user a subset.

Good: Given that everything is already in memory, it is fast.

Bad: It consumes large amounts of memory. Unsure if end-user’s device can handle it

When should we not use pagination?

28 of 43

In-memory backend pagination

Retrieve collection in backend memory, but serve paginated version to frontend.

Good: For collection storage which does not have a pagination API

Bad: Huge memory usage in the backend

29 of 43

Offset-limit pagination

page=0, limit=10

First page = first 10 records

Second page = skip first 10 records, read record 11-20

xth page = skip (10*x) records,

read record (10*x)+1 to (10*(x+1))

30 of 43

SQL offset-limit pagination

Retrieve and serve paginated collection from database query

Good: Filtering/Ordering usually comes for free with a database call

SELECT * FROM Messages WHERE UserId = 5 ORDER BY created_at DESC OFFSET 100000 LIMIT 10;

Bad: Poor for frequent data updates / OFFSET is not efficient

31 of 43

Problematic Scenario

  1. You load page 1, which has 3 records [3, 2, 1]
  2. 3 new records came in. [6, 5, 4, 3, 2, 1]
  3. You try loading page 2 with 3 records [3, 2, 1]
  4. Page 2 is really just page 1.

[3, 2, 1, 0, -1, -2]

32 of 43

Cursor based pagination

GET /patients?limit=3&prev=2024-05-30T18:20:00Z

  • Return a pointer to the next record not on the current page�{items: [{...}, {...}], prev: ‘10’, next: ‘5’}�
  • Limitation: it requires a sequential column for consistent ordering�SELECT * FROM Chirps WHERE id >= 5 LIMIT 3;�
  • next_item_id could be a timestamp too!�SELECT * FROM Chirps WHERE createdAt < now()

8

7

6

10

5

5

4

3

12

11

10

next

prev

33 of 43

Demo: Infinite scrolling with cursor-based pagination

Back to Chirper!

34 of 43

What pagination to use in Assignment 2?

  • For image pagination
  • For comment pagination

Hint: both works, but which might be easier?

35 of 43

Uploading and serving files

36 of 43

Can’t you only put strings in the request body?

  • I lied.
  • Content-Type: multipart/form-data to the rescue!

37 of 43

Sending the file to the backend

  • Bruno
  • HTML Form action (with page refresh - old)

<form action="/url" method="POST" enctype="multipart/form-data">

  • Fetch

38 of 43

HTML File upload element

  • Browser Javascript: no direct access to filesystem
  • <input type=”file” .../>

39 of 43

What is received in the backend

  • Metadata
    • Filename
    • Content Type (file type)
    • Size
    • Path
  • File content
    • Usually binary, sometimes string

40 of 43

MIME Type (Multipurpose Internet Mail Extensions)

“Browsers use the MIME type, not the file extension, to determine how to process a URL”

  • text/css
  • application/json
  • text/html
  • image/png
  • image/jpg

This information is exposed through HTTP Response header Content-Type

41 of 43

Demo

Uploading images with Fetch API in Chirper

42 of 43

Important notes

  • Do NOT base64 encode file content and send it as JSON�
  • Store user-uploaded file content separately from static content. As with all user uploaded information, it shall not be trusted.�
  • When serving the file, return with correct Content-Type header.

43 of 43

Security-first mindset

  • The application should validate that the user is indeed uploading an image before storage.
    • Restrictions in the browser file selection
    • Restricting based on file upload type (basic)
    • Production: run file through security scan first