1 of 22

Intro to Fullstack Dev

Session 3

2 of 22

Today’s agenda

  • APIs
  • Distributed systems and system design
  • Databases

3 of 22

APIs

Oracles

4 of 22

Black boxes

  • API - Application Programming Interface
  • Defines inputs and outputs, but you don’t know or need to care how it works
  • Several kinds of APIs
    • Operating systems
    • Libraries and frameworks
    • Web APIs
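
A small sketch of the black-box idea, using Python's standard `json` library as the API. The sample data is made up for illustration; the point is that only the inputs and outputs matter to the caller.

```python
import json

# The json module is an API: we know what goes in and what comes out,
# but not (and we don't need to know) how the parser works internally.
text = '{"user": "alice", "credits": 42}'

data = json.loads(text)                        # in: JSON string, out: Python objects
round_trip = json.dumps(data, sort_keys=True)  # in: Python objects, out: JSON string

print(data["credits"])
print(round_trip)
```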

5 of 22

Web APIs

  • Interactions mediated through network clients (session 1)
  • Requests are sent to a URL, also called an endpoint
  • Web APIs are often built as distributed systems
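
A minimal sketch of a client calling an endpoint, using only the Python standard library. To stay self-contained it starts a throwaway local server first; the `/hello` path, the handler, and the response body are all hypothetical and only exist for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical web API with a single endpoint: GET /hello."""

    def do_GET(self):
        if self.path == "/hello":
            body = json.dumps({"message": "hello"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def call_endpoint():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/hello"
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as response:
            return response.status, json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


status, payload = call_endpoint()
print(status, payload)
```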

6 of 22

Distributed systems

You can’t get what you want

7 of 22

Distributed systems

  • “A system whose components are located on different computers, which communicate and coordinate their actions by passing messages” - Wikipedia
  • A single computer might not have the power to handle a problem by itself
  • Clusters of computers work together to achieve a common goal

8 of 22

Distributed systems are stupid

  • Computer hardware fails at a statistically significant rate
    • Need to handle random, unexpected failures across all of your hardware
    • Networks can be bottlenecked
    • Can’t assume homogeneous systems
  • Inconsistent information
    • You don’t know the physical structure of the system
    • Network latencies inject uncertainty
    • Each individual machine has a limited view of the system
    • Computers don’t share timers or clocks
  • Formalized as the CAP Theorem
    • When a network partition splits a distributed system, you must choose between availability and consistency; you cannot guarantee both at all times
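
The availability vs consistency choice can be sketched with a toy two-replica store. This is an illustrative model only, not a real database; `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are names invented for this example.

```python
class TwoNodeStore:
    """Toy store with two replicas, A and B. Writes arrive at replica A."""

    def __init__(self, mode):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.a = None             # value held by replica A
        self.b = None             # value held by replica B
        self.partitioned = False  # True when A cannot reach B

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistent: refuse the write rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach replica B")
            # Available: accept the write on A, leaving B stale.
            self.a = value
            return
        self.a = value
        self.b = value

    def read_from_b(self):
        return self.b


cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.write("v2")
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")

print(cp_result, cp.read_from_b())  # the CP store stayed consistent but went offline
print(ap.a, ap.read_from_b())       # the AP store stayed online but B is stale
```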

9 of 22

Modern systems

  • Favor availability over strong consistency
    • Refreshing the page to see an update is preferable to everyone going offline
  • Weak vs eventual consistency
    • Eventual - your changes will always be shown, but may take time to propagate
    • Weak - your changes are best effort and may never be seen
    • Weak is used in streaming, VoIP, many online games
  • Designed for scalability - the ability to handle rapid unexpected increases in traffic (Reddit hug of death)
    • Also need to scale down after traffic has passed; you don’t want to keep paying for servers you no longer need
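
Eventual consistency can be sketched as a primary copy plus a replica that catches up later. This is a simplified model under assumed names (`EventuallyConsistentStore`, `propagate`); real systems replicate in the background over a network.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: writes land on the primary and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        return self.replica.get(key)

    def propagate(self):
        # Stand-in for background replication: apply changes in write order.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("status", "posted")
before = store.read_replica("status")  # replica has not caught up yet
store.propagate()
after = store.read_replica("status")   # the change has now propagated
print(before, after)
```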

10 of 22

Microservices

  • Split up a single application into separate domains that handle one (or a few) things
  • Microservices communicate with messages - a distributed system of distributed systems
  • Scalability benefits
    • Bottlenecks can be scaled independently
    • Allows different parts of the business to grow independently
  • Criticisms
    • Complexity
    • Performance

12 of 22

Service components

  • Load balancers distribute requests so that no single server in the cluster they front is overloaded
  • Servers handle actual processing
  • Databases store shared data that can be accessed by (and only by) the cluster
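
One simple way a load balancer can spread requests is round robin, sketched below. Round robin is an assumption here (the slide does not name a strategy), and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def route(self, request):
        # A real balancer would forward the request; here we only pick a server.
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.route(f"request-{i}") for i in range(9)]
load = Counter(assignments)
print(load)
```

Real load balancers often also account for server health and current load; round robin is just the easiest strategy to show.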

13 of 22

Databases

So important, they get their own title

14 of 22

Overview

  • A set of related data and its organization
  • Access is provided by a DB management system, software that can query and update the underlying storage
  • DBMSs can be categorized by how their data is related and by the engineering tradeoffs in their design (CAP theorem)

15 of 22

Relational databases

  • First formally described in 1970
  • “Standard” database
  • Data is organized into tables of columns and rows
    • Each row is an entry, each column is an attribute
  • Tables can be merged (joined) to create composite tables
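
A minimal sketch of tables, rows, columns, and a join, using SQLite through Python's built-in `sqlite3` module. The `users` and `payments` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE payments (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO payments VALUES (1, 1, 500), (2, 1, 250), (3, 2, 100);
""")

# Join the two tables on the user id to build a composite result.
rows = conn.execute("""
    SELECT users.name, SUM(payments.amount_cents)
    FROM users
    JOIN payments ON payments.user_id = users.id
    GROUP BY users.id, users.name
    ORDER BY users.name
""").fetchall()

print(rows)
```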

16 of 22

RDBMS tradeoffs

  • CPU is more plentiful than storage
    • Prefer splitting data across tables to remove redundancies (normalized data)
    • Heavily depend on complicated query optimizers to speed up data access
  • Favor strong consistency
    • Can be tweaked, but true in general (ACID transactions)
    • Compromises availability in the case of database clusters
  • Almost all use a dialect of SQL
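
Strong consistency through transactions can be sketched with SQLite and Python's `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical; the point is that a failed transfer leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically. Returns True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # This update violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


too_much = transfer(conn, "alice", "bob", 150)  # fails; bob's credit is rolled back
after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))
ok = transfer(conn, "alice", "bob", 40)
after_success = dict(conn.execute("SELECT name, balance FROM accounts"))
print(too_much, after_failure, ok, after_success)
```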

17 of 22

NoSQL databases

  • Technically existed before relational models, but only became popular in the 2000s, driven by the needs of massive-scale web companies
  • RDBMS tradeoffs are unacceptable at this scale
  • Many different types of NoSQL databases based on their data model, some of which actually do use SQL
    • Different data models are specialized for different workloads
    • Graph databases for social networks, etc

18 of 22

NoSQL tradeoffs

  • Storage is more plentiful than CPU
    • Moore’s law is slowing down; CPUs are no longer getting significantly faster
    • Storage is still getting cheaper though
    • No complicated queries
  • Favor availability over consistency
    • Also has positive implications on access speeds
    • Most have eventual consistency models
  • Simpler data models than relational
    • Key-value, documents, etc
  • Many have their own specific query language
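
A rough sketch of the key-value/document style: instead of joining tables, everything needed for one lookup is stored together under one key. A plain Python dict stands in for the database, and the key format and document shape are assumptions for illustration.

```python
import json

# Stand-in for a key-value store: each key maps to one JSON document.
store = {}


def put(key, document):
    store[key] = json.dumps(document)


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# Denormalized: the user's payments live inside the user document,
# trading extra storage for a single lookup with no join.
put("user#alice", {
    "name": "alice",
    "payments": [{"amount_cents": 500}, {"amount_cents": 250}],
})

profile = get("user#alice")
total = sum(payment["amount_cents"] for payment in profile["payments"])
print(profile["name"], total)
```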

19 of 22

Do you need a NoSQL database?

  • Do you require sub-millisecond latency across sustained loads of thousands of requests a minute?
    • Most people don’t operate at that scale and are perfectly fine with RDBMS tradeoffs
  • Do you have database admins?
    • RDBMS typically run on always-on servers and require constant uptime ($$$$)
    • There’s a reason that database admin is an entirely separate job family
    • RDBMS are hugely complex and require specialized knowledge to operate at full potential
    • Backup and restore, migrations, query/index tuning, security, replication, etc
    • In general, NoSQL databases are simpler to operate and don’t have extensive tuning parameters

20 of 22

Our API

PepegaCredit

21 of 22

I swear I’m not a shill

  • Amazon API Gateway
    • Serverless load balancer
  • AWS Lambda
    • Serverless compute
  • Amazon DynamoDB
    • Serverless NoSQL document store database
    • Fully managed
    • I’m not a DBA, I’m not qualified to do RDBMS stuff
  • This is a standard implementation inside Amazon
    • If it’s good enough for Amazon, it’s good enough for you
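
A rough sketch of how the three pieces could fit together: API Gateway passes the request to a Lambda function as an `event`, and the function looks up data and returns a response. This is not the actual PepegaCredit code. The `user` path parameter, the item shape, and `FAKE_TABLE` (an in-memory dict standing in for a DynamoDB table) are assumptions, and the event shape assumes API Gateway's Lambda proxy integration.

```python
import json

# Assumption: an in-memory dict stands in for the DynamoDB table so the
# sketch runs anywhere. A real function would query DynamoDB instead.
FAKE_TABLE = {"alice": {"user": "alice", "credits": 42}}


def respond(status_code, payload):
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    # Hypothetical route: GET /credits/{user}
    user = (event.get("pathParameters") or {}).get("user")
    item = FAKE_TABLE.get(user)
    if item is None:
        return respond(404, {"error": "user not found"})
    return respond(200, item)


found = handler({"pathParameters": {"user": "alice"}}, None)
missing = handler({"pathParameters": {"user": "nobody"}}, None)
print(found["statusCode"], found["body"])
print(missing["statusCode"], missing["body"])
```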

22 of 22

Next time

  • What is a web browser?
  • Client side security
  • React