1 of 9

DS 4300 - Fall 2024

Amazon S3

Mark Fontenot, PhD

Northeastern University

Based in part on material from Gareth Eagar’s Data Engineering with AWS, Packt Publishing

2 of 9

Have you found some datasets?

3 of 9

S3

  • S3 → Simple Storage Service
  • Object storage allowing for massive scalability
  • Customers can use S3 to store and protect all sizes and types of data for:
    • websites
    • mobile apps
    • backup/restore/archive
    • data lakes
    • big data analytics

4 of 9

S3 Storage Classes

  • Production:
    • S3 Standard - Default class
    • S3 Express One Zone - High performance
  • Infrequent Access (IA):
    • S3 Standard-IA
    • S3 One Zone-IA
  • Archive:
    • S3 Glacier Instant Retrieval - Millisecond-level retrieval
    • S3 Glacier Flexible Retrieval - Retrieval in minutes to hours
    • S3 Glacier Deep Archive - Retrieval in hours; no real-time access
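
As a rough sketch of how these classes show up in practice: the storage class is chosen per object at upload time through the StorageClass request parameter. The helper name put_object_kwargs and the bucket and key in the usage note are hypothetical; the string values are the ones the S3 API uses for the classes listed above.

```python
# Map the class names on this slide to the StorageClass strings accepted by
# the S3 PutObject API (for example via boto3's put_object).
STORAGE_CLASS_API_VALUES = {
    "S3 Standard": "STANDARD",
    "S3 Express One Zone": "EXPRESS_ONEZONE",  # applies to directory buckets
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier Instant Retrieval": "GLACIER_IR",
    "S3 Glacier Flexible Retrieval": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
}


def put_object_kwargs(bucket, key, body, class_name="S3 Standard"):
    """Build keyword arguments for a PutObject request (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASS_API_VALUES[class_name],
    }
```

Assuming boto3 is installed and credentials are configured, the result could be passed along as boto3.client("s3").put_object(**put_object_kwargs("my-example-bucket", "backups/2024.tar", data, "S3 Standard-IA")).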

5 of 9

Bucket

  • A bucket is a container for objects.
  • You can store any number of objects in a bucket
  • Initial quota of 10,000 general purpose buckets per AWS account
  • Buckets have a globally unique name and live in a single region
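
A minimal sketch of how the name and region come together when a bucket is created. The helper name create_bucket_kwargs is hypothetical; it only builds the arguments for the CreateBucket request. One detail worth knowing: us-east-1 is the default location, so it is left out of the configuration rather than passed as a LocationConstraint.

```python
def create_bucket_kwargs(name, region):
    """Build keyword arguments for a CreateBucket request (hypothetical helper)."""
    kwargs = {"Bucket": name}
    # us-east-1 is the default location; the API rejects it as an explicit
    # LocationConstraint, so the configuration is only added for other regions.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

Assuming boto3 and valid credentials, this might be used as boto3.client("s3", region_name="us-west-2").create_bucket(**create_bucket_kwargs("ds4300-demo-bucket", "us-west-2")), where the bucket name is only a placeholder.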

6 of 9

Objects

  • Objects are the fundamental entities stored in S3.
  • Each object consists of:
    • data
    • metadata
      • Set of name-value pairs that describe the object
      • Some metadata is system-defined, but you can also specify custom metadata
  • Uniquely identified with an object key
    • Object key is unique to an object within a bucket
    • Every object has exactly one key
  • Every S3 object can be uniquely addressed with a URI
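
To make the bucket, key, and metadata pieces concrete, here is a small sketch that splits an s3:// style URI into its bucket and object key and then assembles an upload request with custom metadata. The function names and the example URI are hypothetical, and the s3://bucket/key form is assumed as the URI style.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    # Slashes inside the key are kept as-is because they are part of the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def object_put_kwargs(uri, data, metadata=None):
    """Build PutObject keyword arguments: data plus name-value metadata."""
    bucket, key = parse_s3_uri(uri)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": data,
        "Metadata": dict(metadata or {}),  # custom name-value pairs
    }
```

For example, parse_s3_uri("s3://my-example-bucket/datasets/flights.csv") returns the bucket "my-example-bucket" and the key "datasets/flights.csv".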

7 of 9

More on Objects

  • S3 can version objects
    • keep multiple variants of an object in the same bucket
    • preserve, retrieve, restore any of the prior versions
    • When enabled, every object gets a version ID.
  • In a bucket, objects can be organized in folders (a folder is a shared key prefix, e.g. reports/2024/)
  • Very granular access controls.
    • Access to a particular object can be granted to only those users who need it
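
The versioning and folder behavior above can be illustrated with a toy in-memory model. This is not the S3 API: the class ToyVersionedBucket, its methods, and the "v1", "v2" style IDs are all made up for illustration (real version IDs are opaque strings assigned by S3). It only shows the idea that each write to a key adds a version and that a folder is a key prefix.

```python
from itertools import count


class ToyVersionedBucket:
    """In-memory toy model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), oldest first
        self._ids = count(1)

    def put(self, key, data):
        """Store data under key and return the new version ID."""
        version_id = f"v{next(self._ids)}"  # hypothetical ID format
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific prior version if given."""
        versions = self._versions.get(key)
        if not versions:
            raise KeyError(key)
        if version_id is None:
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError((key, version_id))

    def list_folder(self, prefix):
        """List keys that share a prefix, which is how a folder is modeled."""
        return sorted(k for k in self._versions if k.startswith(prefix))
```

Writing the same key twice keeps both versions retrievable, and list_folder("reports/") returns only the keys that begin with that prefix.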

8 of 9

Small Demo

9 of 9

Questions?
