1 of 74

CSC-372

Machine Learning with Big Data

Introduction

2 of 74

About Me

  • Name:

Syed Fahad Sultan

سید فہد سلطان

  • Pronunciation (IPA):

ˈsæjjɪd fah(aː)d solˈtˤɑːn

  • Just call me “Dr. Sultan” (Pronounced: Sool-tahn)

  • Rising Junior at Furman

Syntax:

  • index=0, First name: Syed
  • index=1, Middle name: Fahad
  • index=2, Last name: Sultan

Semantics:

  • Given name: Fahad
  • Middle name: Sultan
  • Family name: Syed

3 of 74

How to reach me?

  • Office: Riley Hall 200-E

  • Email: fahad.sultan@furman.edu

  • No fixed office hours

  • You can schedule an appointment with me using this link this semester.

  • I am using Calendly, an appointment scheduling system, to make it easier for you to find a time that works for you.

Open door policy, when not in class or meeting

Drop by office, for any other time

OR Email to schedule time

4 of 74

About the Course

Weekly schedule (Monday to Sunday):

  • 08:30 - 10:00: CSC-372 (Riley 204), two sessions
  • 10:00 - 11:30: CSC-223 (Riley 204), two sessions
  • 11:30 - 01:00: no entries
  • 01:00 - 02:30: no entries
  • 02:30 - 04:30: CSC-223 Lab (Riley 203); CSC-372 Lab (Riley 203)

  • Class, Lab, Office hours: Guaranteed availability
  • Drop by Riley Hall 200-H, use the Calendly link to schedule a meeting, or email
  • Lecture/Labs for other courses
  • Email only
  • Meeting times: 8:30 - 10:00 AM Tuesdays and Thursdays (Riley 204)

  • Lab times: 02:30 - 04:30 PM Thursdays (Riley 203)

5 of 74

About this course

  • Course website:

    • https://fahadsultan.com/csc372 (NOT UP YET)

    • All of the content will be posted on the course website

      • Some might also be uploaded on Moodle, as needed/requested

  • All Programming and Written Assignment submissions on Moodle

6 of 74

Grade Breakdown

  • Written Assignments: 15
  • Programming Assignments: 15
  • Class Participation: 5
  • Professionalism: 5
  • Exam 1: 20
  • Exam 2: 20
  • Final Exam: 20

Chart labels: Finals, Midterm, Time & Effort; Finals, Exams, Time & Effort

7 of 74

Minimum Requirements

In order to pass this class, you must

1. Earn ≥ 60% of the total points

2. Attend ≥ 80% of the lectures and labs.

3. Submit ≥ 80% of all assignments

4. Take ALL tests and final!

In other words, you cannot blow off an entire aspect of the course and pass this class!

Note that meeting these basic requirements is necessary but not sufficient to pass the class.

8 of 74

Assignments

  • Opportunities to:

    • Learn

    • Prepare for Career/Grad School

    • Prepare for Exams

    • Start early and reach out if stuck

      • If you reach out, I will just tell you the answers.

    • Find the joy in the course material!

  • Weekly deadlines


9 of 74

Written Assignments

  • These will build on your understanding of concepts covered in class

  • Expect similar questions on the exams

  • Both handwritten and typed submissions are acceptable

  • Submit on Moodle

  • Written Assignments: 20
  • Programming Assignments: 20
  • Professionalism: 5
  • Class Participation: 5
  • Exam 1: 15
  • Exam 2: 15
  • Final Exam: 20

10 of 74

Programming Assignments

  • We’ll start these during the lab sessions.

  • Due date: Generally, in a week’s time.

  • Implementation in Python

  • Google Colab

  • Collaboration of ideas strongly encouraged
    • Don’t share code
    • Don’t submit someone else’s code

  • Using AI for assignments is not permitted and will be considered plagiarism

11 of 74

12 of 74

Professionalism


  • Attendance

  • Coming to class on time

  • Being respectful to others

  • No Cell phones in class
  • There is no such thing as a stupid question.

Under no circumstances is it acceptable to laugh at or ridicule someone else’s question.

Such behavior undermines a respectful and inclusive environment where all participants feel comfortable asking questions and engaging in meaningful discussions.

13 of 74

Class Participation


  • Come to class, labs and office hours

  • Ask questions during class

  • Answer questions and participate in discussions

  • Email your questions/comments

  • I may periodically give out class participation points during class for answering a question

  • Given the glut of information accessible online and otherwise in this day and age, meaningful interactions with your peers and teachers are essentially what you are paying college tuition for.

14 of 74

Exams

  • Exam 1: 20
  • Exam 2: 20
  • Exam 3 (Final): 20

Evaluation/Grade (50%):

  • You will be evaluated on your ability to apply knowledge to new problems -- “Show don’t tell”

    • Not just on your ability to retain and recall information

  • The exams are primarily going to determine your grade

  • All exams are going to be cumulative

    • With a focus on the topics covered since the last exam

  • You will be assigned an interim grade on Workday after every Exam

  • Diligent work on the homework and assignments will be rewarded here.

15 of 74

Academic Integrity

  • Do not plagiarize from friends/internet/AI. It is not worth it. You’ll suffer in the exams.

    • Come see me and just ask for the answer. For assignments, I will gladly share it with you

    • Ask for an extension instead.

  • 50 foot Policy:

    • Your code/answers should remain 50 feet away from your friend’s code/answers

16 of 74

Accommodations (SOAR)

  • Furman University recognizes a student with a disability as anyone whose impairment substantially limits one or more major life activities.

  • Students may receive a variety of services including classroom accommodations such as extended time on tests, test proctoring, note-taking assistance and access to assistive technology.

  • However, receipt of reasonable accommodations cannot guarantee success: all students are responsible for meeting academic standards.

  • Students with a diagnosed disability may be entitled to accommodations under the Americans with Disabilities Act (ADA).

17 of 74

Nondiscrimination Policy & Sexual Misconduct

  • Furman University and its faculty are committed to supporting our students and seeking an environment that is free of bias, discrimination, and harassment. Furman does not unlawfully discriminate on the basis of race, color, national origin, sex, sexual orientation, gender identity, pregnancy, disability, age, religion, veteran status, or any other characteristic or status protected by applicable local, state, or federal law in admission, treatment, or access to, or employment in, its programs and activities.

  • If you have encountered any form of discrimination or harassment, including sexual misconduct (e.g. sexual assault, sexual harassment or gender-based harassment, sexual exploitation or intimidation, stalking, intimate partner violence), we encourage you to report this to the institution. If you wish to report such an incident of misconduct, you may contact Furman’s Title IX Coordinator, Melissa Nichols (Trone Center, Suite 215; Melissa.nichols@furman.edu; 864.294.2221).

  • Additional information about Furman’s Sexual Misconduct Policy, how to report sexual misconduct and your rights can be found at the Furman Title IX Webpage. You do not have to go through the experience alone.

18 of 74

About the Course

  • This course builds on CSC-272 and focuses primarily on Neural Networks and Deep Learning

  • The shortest path from Linear Regression to Transformers

19 of 74

This course

Deep neural networks

How to train them

How to measure their performance

How to make that performance better

20 of 74

This course

Networks specialized to images

Image classification

Image segmentation

Pose estimation

21 of 74

This course

Networks specialized to text

Text generation

Automatic translation

ChatGPT

22 of 74

This course

Generative learning (unsupervised)

Generating random cats!

23 of 74

* Tentative Plan, subject to change

  • This is my first time teaching this course

  • Any and all feedback is welcome!

  • “Feedback (Anonymous)” on Moodle

    • Anonymously share any feedback

    • Share any changes you want me to make in the course

    • At any point in the semester.

  • You can submit multiple times over the span of the semester.

  • Think of it as a Complaints Box for the course

24 of 74

Textbooks

  • Both of them are free online
  • Uploaded on Moodle

25 of 74

26 of 74

27 of 74

28 of 74

29 of 74

30 of 74

31 of 74

Supervised learning

  • In Supervised Learning, Models define a mapping from Input to Output
  • Learn this mapping from paired input/output data examples

32 of 74

Supervised learning

  • In Supervised Learning, Models define a mapping from Input to Output
  • Learn this mapping from paired input/output data examples
  • A Model is simply a mathematical equation
    • It takes in a vector of numbers and returns as output a vector of numbers
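
As a minimal sketch of "a model is an equation from a vector of numbers to a vector of numbers", the snippet below implements a linear model y = Wx + b in plain Python. The function name `linear_model` and the parameter values `W` and `b` are illustrative assumptions, not taken from the course material.

```python
def linear_model(x, weights, biases):
    """Map an input vector x to an output vector y = W x + b."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, biases)
    ]

# Hypothetical parameters: 3 inputs -> 2 outputs.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, 1.0]

y = linear_model([2.0, 4.0, 6.0], W, b)
print(y)  # [-4.0, 7.0]
```

In supervised learning the numbers in `W` and `b` would be learned from paired input/output examples rather than written by hand.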

33 of 74

Classification vs. Regression

  • Supervised learning itself is of two types:
    • Regression
      • When Output(s) = Continuous number(s)
        • Univariate regression when there is one real-valued output
        • Multivariate regression when there is more than one real-valued output
    • Classification
      • When Output(s) = Category assignment(s)
        • Binary classification when output is one of two categories
        • Multiclass classification when the output is one of more than two categories
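
The difference between these output types can be sketched in a few lines. This is an illustration only: the scores, the category names, and the use of sigmoid and softmax to turn scores into probabilities are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of real scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Regression: the raw model output is used directly as a real value.
predicted_height_cm = 172.4  # hypothetical univariate output

# Binary classification: one score -> probability of the positive class.
p_positive = sigmoid(0.8)
binary_label = int(p_positive >= 0.5)

# Multiclass classification: one score per class -> most probable class.
class_names = ["rock", "jazz", "classical"]  # hypothetical categories
probs = softmax([1.2, 0.3, 2.5])
multiclass_label = class_names[probs.index(max(probs))]

print(binary_label, multiclass_label)  # 1 classical
```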

Diagram labels: Artificial Intelligence, Machine Learning, Supervised Learning, Regression, Classification, Deep Learning

34 of 74

Regression

35 of 74

Regression

  • Univariate regression problem (one output, real value)
  • Fully connected network

36 of 74

Graph regression

  • Multivariate regression problem (>1 output, real value)
  • Graph neural network

37 of 74

Classification

38 of 74

Text classification

  • Binary classification problem (two discrete classes)
  • Transformer network

39 of 74

Music genre classification

  • Multiclass classification problem (discrete classes, >2 possible values)
  • Recurrent neural network (RNN)

40 of 74

Image classification

  • Multiclass classification problem (discrete classes, >2 possible classes)
  • Convolutional network

41 of 74

What is a supervised learning model?

  • An equation (cyan curve) relating input (age) to output (height)
  • Search through family of possible equations to find one that fits training data well
    • Training data is input/output pairs (brown dots)
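
A minimal sketch of "searching a family of equations for one that fits the training data", assuming the simplest possible family (straight lines y = intercept + slope · x) and least squares as the measure of fit. The age/height pairs are made up for illustration; the curve on the slide need not be a straight line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up training pairs (age in years, height in cm), for illustration only.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 139]

intercept, slope = fit_line(ages, heights)

def predict(age):
    """Use the fitted equation to predict height for a new age."""
    return intercept + slope * age

print(round(slope, 2), round(intercept, 2))  # 6.6 74.4
```

Each choice of `intercept` and `slope` is one member of the family; fitting picks the member with the smallest squared error on the training pairs.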

42 of 74

Terms

  • Regression = continuous numbers as output
  • Classification = discrete classes as output
  • Two-class and multiclass classification are treated differently
  • Univariate = one output
  • Multivariate = more than one output

43 of 74

Deep Neural Networks

  • Deep neural networks are just a very flexible family of equations
    • These equations represent an extremely broad family of relationships between input and output

  • Fitting deep neural networks = “Deep Learning”

  • They can process inputs that are very large, of variable length, and contain various kinds of internal structures.

  • They can output single real numbers (regression), multiple numbers (multivariate regression), or probabilities over two or more classes (binary and multiclass classification, respectively).

    • Their outputs may also be very large, of variable length, and contain internal structure.
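
To make "a flexible family of equations" concrete, here is a rough sketch of a tiny network with one hidden layer and ReLU activations. The layer sizes and all parameter values are hand-picked assumptions; a trained network would learn them from data.

```python
def relu(z):
    """Rectified linear unit: pass positives through, clip negatives to 0."""
    return max(0.0, z)

def layer(x, weights, biases, activation=None):
    """One fully connected layer: activation(W x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def network(x):
    """Two stacked layers: 1 input -> 3 hidden units -> 1 output."""
    # Hypothetical parameters, chosen only so the arithmetic is easy to follow.
    h = layer(x, [[1.0], [-1.0], [2.0]], [0.0, 1.0, -2.0], activation=relu)
    return layer(h, [[1.0, 2.0, -1.0]], [0.5])

for x in (0.0, 0.5, 2.0):
    print(x, network([x]))
# 0.0 [2.5]
# 0.5 [2.0]
# 2.0 [0.5]
```

Changing the weights changes which piecewise-linear function the network computes, which is the sense in which the network describes a whole family of equations.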

44 of 74

Structured outputs: Image segmentation

  • Multivariate binary classification problem (many outputs, two discrete classes)
  • Convolutional encoder-decoder network
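
"Many outputs, two discrete classes" can be pictured as one binary decision per pixel. The sketch below assumes a network has already produced one real-valued score per pixel (the `scores` grid is made up) and only shows the final step of turning scores into a 0/1 mask.

```python
import math

def segment(score_map, threshold=0.5):
    """Apply a sigmoid to every pixel score and threshold it into a 0/1 mask."""
    return [[int(1.0 / (1.0 + math.exp(-s)) >= threshold) for s in row]
            for row in score_map]

# Hypothetical per-pixel scores that a network might output for a 2x3 image.
scores = [[-2.0, 0.3, 1.5],
          [-0.7, 2.2, -1.1]]
print(segment(scores))  # [[0, 1, 1], [0, 1, 0]]
```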

45 of 74

Structured outputs: Depth estimation

  • Multivariate regression problem (many outputs, continuous)
  • Convolutional encoder-decoder network

46 of 74

Structured outputs: Pose estimation

  • Multivariate regression problem (many outputs, continuous)
  • Convolutional encoder-decoder network

47 of 74

Structured Outputs: Translation

48 of 74

Structured Outputs: Image captioning

49 of 74

Structured Outputs: Text to Image

50 of 74

What do these examples have in common?

  • Very complex relationship between input and output
  • Sometimes there may be many possible valid answers
  • But outputs (and sometimes inputs) obey rules

Language obeys grammatical rules

Natural images also have “rules”

51 of 74

Complex Outputs: Idea

  • Learn the “grammar” of the data from unlabeled examples
    • The “grammar” is often the underlying probability distribution of the data
  • Can use a gargantuan amount of data to do this (since it is unlabeled)
  • Make the supervised learning task easier by having a lot of knowledge of possible outputs

52 of 74

53 of 74

Unsupervised Learning

  • Learning about a dataset without labels
    • Clustering
    • Finding outliers
    • Generating new examples
    • Filling in missing data
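
Clustering is the easiest of these to sketch. The example below runs plain k-means on one-dimensional data; the data, the starting centers, and the choice of k-means itself are assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: alternate assignment and mean updates."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else old
                   for c, old in zip(clusters, centers)]
    return centers, clusters

# Unlabeled, made-up measurements forming two obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(data, centers=[0.0, 10.0])
print([round(c, 2) for c in centers])  # [1.0, 5.0]
```

No labels are used anywhere: the two groups are discovered from the data alone.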

54 of 74

DeepCluster: Deep Clustering for Unsupervised Learning of Visual Features (Caron et al., 2018)

55 of 74

DeepCluster: Deep Clustering for Unsupervised Learning of Visual Features (Caron et al., 2018)

56 of 74

Unsupervised Learning

  • Learning about a dataset without labels
    • e.g., clustering
  • Generative models can create examples
    • e.g., generative adversarial networks

57 of 74

Unsupervised Learning

  • Learning about a dataset without labels
    • e.g., clustering
  • Generative models can create examples
    • e.g., generative adversarial networks
  • Probabilistic generative models (PGMs) learn a distribution over the data
    • e.g., variational autoencoders
    • e.g., normalizing flows
    • e.g., diffusion models

58 of 74

Unsupervised Generative Models

  • Generative unsupervised models learn to synthesize new data examples that are statistically indistinguishable from the training data.

  • Some generative models explicitly describe the probability distribution over the input data
    • New examples are generated by sampling from this distribution.

  • Others merely learn a mechanism to generate new examples without explicitly describing their distribution.
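
A minimal sketch of the first kind of model (one that explicitly describes a distribution), assuming one-dimensional data and a Gaussian as the distribution. The training values are made up, and real generative models use far richer distributions.

```python
import random
import statistics

# Made-up 1-D training data (e.g. heights in cm), for illustration only.
training_data = [160.0, 165.0, 170.0, 172.0, 175.0, 180.0, 168.0, 171.0]

# "Learning": estimate the parameters of an explicit distribution.
mu = statistics.mean(training_data)
sigma = statistics.stdev(training_data)

# "Generation": draw new examples by sampling from that distribution.
rng = random.Random(0)  # fixed seed so the run is repeatable
new_examples = [rng.gauss(mu, sigma) for _ in range(5)]

print(round(mu, 3), round(sigma, 3))
print([round(x, 1) for x in new_examples])
```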

59 of 74

Generative models

60 of 74

Conditional Synthesis

  • Generative models can also synthesize data under the constraint that some outputs are predetermined (termed conditional generation).
  • Examples include image inpainting and text completion
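
Text completion can be sketched with a toy model: the prompt is the predetermined part of the output, and the model fills in the rest. The snippet assumes a tiny made-up corpus and a greedy bigram model (always append the most frequent next word), which is a stand-in for illustration and not how large language models work internally.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for training text.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count which word follows which (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def complete(prompt, max_new_words=3):
    """Extend a prompt by repeatedly appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break  # the last word never appeared with a successor
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(complete("sat on", max_new_words=2))  # sat on the cat
```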

61 of 74

Latent Variables in Generative Models

  • Data can be lower dimensional than the raw number of observed variables suggests
    • The number of valid and meaningful English sentences is much smaller than the number of strings created by drawing words at random.
    • Similarly, real-world images are a tiny subset of the images that can be created by drawing random red, green, and blue (RGB) values for every pixel. This is because images are generated by physical processes (see figure below).
  • We can describe each data example using a smaller number of underlying latent variables.
  • Here, the role of deep learning is to describe the mapping between these latent variables and the data.
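
As a toy picture of latent variables, the sketch below uses a hand-written `decode` function that maps a single latent number to a two-dimensional data point on a circle. In deep learning this mapping would be a learned network; the circle and the function names are assumptions for illustration.

```python
import math

def decode(z):
    """Map one latent number z in [0, 1] to a 2-D data point on a circle."""
    angle = 2.0 * math.pi * z
    return (math.cos(angle), math.sin(angle))

def interpolate(z_start, z_end, steps):
    """Decode evenly spaced latent values between two endpoints."""
    return [decode(z_start + (z_end - z_start) * i / (steps - 1))
            for i in range(steps)]

# Each observed point has 2 coordinates, but 1 latent variable explains it.
points = interpolate(0.0, 0.25, steps=3)
for x, y in points:
    print(round(x, 3), round(y, 3))
```

Moving smoothly through the latent variable moves smoothly along the valid data (the circle), never through the invalid points in between.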

62 of 74

Latent variables

63 of 74

Interpolation

64 of 74

65 of 74

Reinforcement learning

  • An agent
  • A set of states
  • A set of actions
  • A set of rewards

  • Goal: take actions to change the state so that you receive rewards

  • You don’t receive any data – you have to explore the environment yourself to gather data as you go
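
The agent, states, actions, and rewards can be sketched as a short interaction loop. The `Corridor` environment and the purely random agent below are hypothetical and chosen for brevity; the point is only that the agent collects its own data by acting.

```python
import random

class Corridor:
    """States 0..4 on a line; reaching state 4 gives reward +1 and ends."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Apply action -1 (left) or +1 (right); return (state, reward, done)."""
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def run_episode(rng, max_steps=100):
    """Let a purely random agent explore and record what it experiences."""
    env = Corridor()
    experience = []  # the agent gathers its own (state, action, reward) data
    state = env.state
    for _ in range(max_steps):
        action = rng.choice([-1, 1])
        next_state, reward, done = env.step(action)
        experience.append((state, action, reward))
        state = next_state
        if done:
            break
    return experience

experience = run_episode(random.Random(0))
print(len(experience), "steps, total reward", sum(r for _, _, r in experience))
```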

66 of 74

Example: chess

  • States are valid states of the chess board
  • Actions at a given time are valid possible moves
  • Positive rewards for taking pieces, negative rewards for losing them

67 of 74


68 of 74

Why is this difficult?

  • Stochastic
    • Make the same move twice, the opponent might not do the same thing
    • Rewards also stochastic (opponent does or doesn’t take your piece)
  • Temporal credit assignment problem
    • Did we get the reward because of this move? Or because we made good tactical decisions somewhere in the past?
  • Exploration-exploitation trade-off
    • If we found a good opening, should we use this?
    • Or should we try other things, hoping for something better?
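
One common way to balance exploration and exploitation is an epsilon-greedy rule: usually pick the option that looks best so far, but occasionally try a random one. The sketch below assumes three chess openings with made-up win rates that the agent does not know in advance; the rule and all numbers are illustrative assumptions.

```python
import random

def epsilon_greedy(true_win_rates, epsilon=0.1, rounds=5000, seed=0):
    """Estimate the value of each option while mostly picking the best one."""
    rng = random.Random(seed)
    counts = [0] * len(true_win_rates)
    estimates = [0.0] * len(true_win_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.randrange(len(true_win_rates))  # explore
        else:
            choice = estimates.index(max(estimates))      # exploit
        reward = 1.0 if rng.random() < true_win_rates[choice] else 0.0
        counts[choice] += 1
        # Incremental running average of the rewards seen for this option.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return counts, estimates

# Hypothetical win rates of three chess openings (unknown to the agent).
counts, estimates = epsilon_greedy([0.40, 0.55, 0.50])
print(counts, [round(e, 2) for e in estimates])
```

With `epsilon=0` the agent would lock onto the first opening that happens to win and never find out whether another is better; with `epsilon=1` it would never use what it has learned.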

69 of 74

Landmarks in Deep Learning

  • 1958 Perceptron (Simple ‘neural’ model)
  • 1986 Backpropagation (Practical Deep Neural networks)
  • 1989 Convolutional networks (Supervised learning)
  • 2012 AlexNet Image classification (Supervised learning)
  • 2014 Generative adversarial networks (Unsupervised learning)
  • 2014 Deep Q-Learning -- Atari games (Reinforcement learning)
  • 2016 AlphaGo (Reinforcement learning)
  • 2017 Machine translation (Supervised learning)
  • 2019 Language models ((Un)supervised learning)
  • 2022 DALL·E 2 Image synthesis from text prompts ((Un)supervised learning)
  • 2022 ChatGPT ((Un)supervised learning)
  • 2023 GPT-4 Multimodal model ((Un)supervised learning)

70 of 74

71 of 74

2018 Turing Award winners (Godfathers of AI)

72 of 74

2024 Nobel Prize Winners

  • Sir Demis Hassabis and Dr. John Jumper were co-awarded the 2024 Nobel Prize in Chemistry for developing AlphaFold, a groundbreaking AI system that predicts the 3D structure of proteins from their amino acid sequences.
  • John Hopfield and Geoffrey Hinton won the 2024 Nobel Prize in Physics for foundational discoveries and inventions that enable machine learning with artificial neural networks.

73 of 74

Purposeful Pathways

All the information you need is on the Moodle Page

  • You must satisfy the purposeful pathways requirement to pass the course
    • Otherwise, you will get an Incomplete on Workday until you meet the requirements

  • Three Computer Science community meetings.

Please save one of these dates and times:

  • Wednesday, January 15 during the common hour – Riley 204
  • Friday, January 24 during the common hour – Riley 106
  • Monday, January 27 at 4:30 PM – Riley 106

Each CS and IT major is expected to attend one of these meetings. We’ve planned different days and times to accommodate as many schedules as possible. Attendance will count as a Computer Science Purposeful Pathways opportunity, but I hope the chance to shape our community together is reason enough to join.

74 of 74