1 of 25

CS476

Automata Theory and Formal Languages

http://cs.bilkent.edu.tr/~calkan/teaching/cs476/

2 of 25

Basics

  • Lecture times
    • Tuesdays 13:30 - 15:20 and Fridays 9:30 - 10:20 EE-04
  • TAs & Graders:
    • TBA
  • Textbook: Introduction to the Theory of Computation, 3rd Edition, Cengage Learning, Michael Sipser
  • Grade distribution
    • 10 Quizzes - every Tuesday (starting Sep 24) unless announced otherwise (10%)
    • 3 Classworks (30%)
    • Midterm (25%)
    • Final (35%)
  • FZ Policy: [(Classworks' Average x 0.3) + (Midterm x 0.25)] should be at least 18 (out of 55).
  • Read the FAQ at the course web page.

3 of 25

Management issues

  • I will assume you remember MATH 132 topics.
    • Polish up: http://cs.bilkent.edu.tr/~calkan/teaching/cs476/0-intro.pdf
  • All course notes are available in the course web page, and will be posted to Moodle
  • Midterm and Final exams are closed book, closed notes
  • Classworks are open lecture notes (clean copies only)
  • Makeup Policy: Medical report holders will be entitled for the midterm make up, and ONLY ONE classwork makeup in the last week of the semester. Both midterm and classwork makeups will be comprehensive.
  • Grading Policy: Note that I do not discuss with students about grades. Therefore, I will not answer any questions about the passing grades and/or students' requests for passing the course. Any emails sent to this effect will be omitted.

4 of 25

Planned Lecture Cancellations

The following lectures are canceled.

  • October 4
  • November 15

5 of 25

Outline

  • Automata Theory
    • Regular Languages
      • DFA/NFA/Regular Expressions
      • Pumping lemma for RL
    • Context-Free Languages
      • Context-Free Grammars
      • Pushdown Automata
      • Pumping lemma for CFL
  • Computability Theory
    • Turing Machines
    • Undecidable Languages
  • Complexity Theory
    • P vs NP
    • NP-Complete Problems

CW 1

CW 2

Midterm

Final

CW 3

6 of 25

Why is this a must course?

Computer Science is a form of applied mathematics

  • Fundamental math requirements for CS: Set Theory, Discrete Mathematics, Linear Algebra, Number Theory, Graph Theory
  • Computers “compute”: testing and verification of axioms
  • Computers have limitations
    • Unsolvable problems
    • Solvable but hard problems
    • Easy problems

7 of 25

Formal Languages

  • Formal Languages consist of words that are made up from an alphabet, and well-formed according to a specific set of rules
  • Natural languages (i.e., Turkish, English, Russian, etc.) are NOT formal languages -- but formal languages are used as a way to study their regularities
  • In this course:

Language = Set of words

8 of 25

This course ...

  • What are the fundamental capabilities and limitations of computers?
  • Automata Theory
    • definitions and properties of mathematical models of computation, which play a role in several applied areas of computer science (finite automaton used in text processing, compilers, etc., and context-free grammars used in programming languages and artificial intelligence).
  • Computability Theory
    • classification of problems is by those that are solvable and those that are not.
  • Complexity Theory
    • to classify problems that are easy ones and hard ones.

9 of 25

“Real” World: Regular Languages

  • Regular expressions are used in many systems.
    • e.g., UNIX a.*b.
    • e.g., Document Type Definitions (DTDs): XML, HTML, SGML describe XML tags with a regular expression format like person (name, addr, child*).
  • Finite automata can model protocols, electronic circuits.
    • CS223: Finite State Machines
    • CS421: Network protocols (TCP/IP, HTTPS, etc.)
  • Extending Finite Automata with a “probabilistic layer” -> Hidden Markov Models
    • Machine learning
    • Bioinformatics

10 of 25

“Real” World: Context-Free Languages

  • All programming languages are context-free languages
  • Context-free grammars are used to describe the syntax of essentially every programming language.
    • Not to forget their important role in describing natural languages.
  • Extending CFGs with a “probabilistic layer” -> Stochastic Context-Free Grammars
    • Natural Language Processing
    • Bioinformatics (RNA Folding)
    • Homology Search
    • ...

11 of 25

“Real” World: Computability and Complexity

  • When developing solutions to real problems, we often confront the limitations of what software can do.
    • Undecidable things – no program whatever can do it.
      • Computability Theory - TMs, undecidability
    • Intractable things – there are programs, but no fast programs.
      • Complexity Theory: NP-hard problems

12 of 25

“The conversation”

13 of 25

Other important stuff

  • All lecture notes are available, and same content as in the lectures -- that’s the point.
  • Still, attendance is important - sometimes I mention common mistakes
  • Notation, notation, notation!
  • Quizzes will be graded harshly
    • Note that each is just worth 1%
    • No contribution to FZ
    • The aim is to keep you warmed up.

14 of 25

Q & A

I don’t

15 of 25

Q?

16 of 25

Intro, definitions, and MATH 132 Refresher

17 of 25

Sets

  • A set is a group of objects, called its elements or members, represented as a unit. For example S = {7, 21, 57} and S = {n|n = m2 for some m ∈ N } , where N = {0, 1, 2, 3, ...} is an infinite set of natural numbers. The set with no members is called the empty set, written as ∅.
  • A is a subset of B, written A ⊆ B, if every member of A also is a member of B.
  • A is a proper subset of B, written A ⊊ B, if A is a subset of B, and not equal to B.
  • Union (A ∪ B), intersection (A ∩ B), and complement (A¯) are useful operators defined on sets.
  • The power set of A (shown as 2A) is the set of all subsets of A

18 of 25

Sequences and Tuples

  • A sequence of objects is an ordered list of these objects such as {7, 21, 57} The order doesn’t matter in a set, but in a sequence it does.
  • A sequence with k elements is a k-tuple.
  • If A and B are two sets, the Cartesian product or cross product of A and B, written A × B, is the set of all ordered pairs wherein the first element is a member of A, and the second element is a member of B.
    • A={a, b, c}; B={1,2}
    • A × B = {(a,1), (a,2), (b,1), (b,2) , (c,1), (c,2)}

19 of 25

Functions and Relations

  • A function f : D → R defines a mapping from inputs in its domain D to outputs in its range R. If f is a function whose output value is b when the input value is a, we write f(a) = b.
  • A function that use all the elements of the range is said to be onto the range.
  • A function with k arguments is called a k-ary function.
  • A predicate or property is a function whose range is {TRUE, FALSE} . A property whose domain is a set of k-tuples A × . . . × A is called a relation or a k-ary relation on A.

20 of 25

Graphs

An undirected graph, or simply a graph, is a set of discrete objects called nodes or vertices, where some of these objects are joined by links called edges. Formally, a graph G = (V, E) is defined by its non-empty vertex set V, and its edge set E.

G = (V, E)

V = {1, 2, 3, 4}

E = {{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4},

{3, 4}}

21 of 25

Graphs

  • The number of edges at a particular node is the degree of that node.
  • We say that graph G is a subgraph of graph H if the nodes of G are a subset of the nodes of H, and the edges of G are the edges of H on the corresponding nodes
  • A path in a graph is a sequence of nodes connected by edges. A simple path is a path that doesn’t repeat any nodes. A graph is connected if every two nodes have a path between them. A path is a cycle if it starts and ends in the same node. A simple cycle is one that contains at least three nodes and repeats only the first and last nodes.
  • A graph is a tree if it is connected and has no simple cycles.

22 of 25

Strings and Languages

  • An alphabet is a non-empty finite set, where the members are the symbols of the alphabet
    • Σ = {a, b, c, d, …., z}
  • A string over an alphabet is a finite sequence of symbols from that alphabet, usually written next to one another and not separated by commas
    • automata, coffee, chocolate, …
  • If w is a string over Σ, the length of w, written |w|, is the number of symbol it contains
    • w=coffee -> |w|=6
  • The reverse of a string w, written wR, is the string obtained by writing w in the opposite order. String z is a substring of w if z appears consecutively within w.
    • wR = eeffoc; z = off

23 of 25

Strings and Languages

  • Lexicographical order of strings is the same as the familiar dictionary order. We will occasionally use a modified lexicographical order, called shortlex order or simply string order, that is identical to lexicographical order, except that shorter strings precede longer strings (e.g. , 0, 1, 00, 01, 10, 11, 000, ...)
  • String x is a prefix of string y if a string z exists where xz = y, and x is a proper prefix of y if in addition x ≠ y.
    • y = coffee; x = cof (proper prefix); x = coffee (non-proper prefix)
  • A language is a set of strings. A language is prefix-free if no member is a proper prefix of another member

24 of 25

Symbols

  • ∀: For all
  • ∃: There exists
  • ∄: There does not exist
  • ⇔: If and only if (iff)
  • ¬: Not

25 of 25

Proofs

The only way to determine the truth or falsity of a mathematical statement is with a mathematical proof, which relies only on the facts (and those facts derived from the initial set of facts). One cannot base a mathematical proof on informal arguments, intuition, handwaving, and alike.

  • Proof by Construction: starting from given facts (and rules of mathematics) to derive newer facts until we reach the mathematical statement we are given to prove.
  • Proof by Contradiction: add the negation of the mathematical statement (our assumption) we are given to prove to the given facts, to eventually reach an obviously false consequence, also called a contradiction (for every possible case).
  • Proof by Induction: show that all elements of an infinite set have a specified property.