1 of 17

MOSS: Moo-sure Of Software Similarity

Lynbrook CS Club

12/6/21

2 of 17

Announcements

  • Member Presentations!
    • BECOME the presenter.
  • First USACO contest coming soon!
    • Contest window: December 17th to 20th
    • Very Exciting
    • Probably 4 hours long?

3 of 17

USACO: Do’s and Don’ts

  • DO:
    • Write your own code, from scratch (no templates)
    • Use a single account
    • Have fun, enjoy the problems
  • DON’T:
    • Discuss the questions until permitted
    • Use multiple accounts
    • Search up anything that isn’t syntax-specific
    • PLAGIARIZE OTHERS’ CODE

4 of 17

Why You Shouldn’t Plagiarize

  • Common Sense
  • You will get caught
  • You will get banned
  • USACO will contact Lynbrook
    • Possible academic disciplinary action

Brian Dean

USACO Cheaters

5 of 17

How Will They Catch You?

  • I’m glad you asked
  • Code plagiarism detectors!
    • Most famous for code is MOSS (Measure Of Software Similarity)
  • How do these plagiarism detectors work?

6 of 17

Designing our Algorithm

  • Copy-detection algorithm: given a set of documents, identifies which pairs may have been copied from one another
  • What properties do we need?
    • Whitespace/Variable Naming Insensitivity: ignore meaningless whitespace, unaffected by renaming variables
    • Noise Suppression: ignore small matches, and only flag significant matches
    • Position Independence: scrambling the order of the code should not affect matches (e.g., reordering functions)
  • We’ll generate “fingerprints” for each document

7 of 17

The Main Algorithm

  •  

8 of 17

Preprocessing

  • Remove whitespace?
    • Just Do It™
  • Fix variable names?
    • Just rename them all to the same thing (e.g., v or i)
    • Pros: easy
    • Cons: possible false positives
  • There are other, harder-to-implement ways to fix variable names
    • E.g., replace each with the number of tokens since the last occurrence of the same variable
    • Wonky, hard to implement, not necessarily better
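
The easy version of preprocessing above can be sketched in a few lines. This is only an illustration, not how MOSS itself does it: the name `normalize` and the token regex are made up for this example, it assumes Python-like source, and it does not handle comments or string literals.

```python
import keyword
import re

# Hypothetical tokenizer: an identifier-like word, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")

def normalize(source: str) -> str:
    """Drop all whitespace and rename every non-keyword identifier to 'v'."""
    out = []
    for tok in TOKEN.findall(source):
        is_name = tok[0].isalpha() or tok[0] == "_"
        if is_name and not keyword.iskeyword(tok):
            out.append("v")   # every variable/function name collapses to 'v'
        else:
            out.append(tok)   # keywords, digits, and punctuation are kept
    return "".join(out)

print(normalize("total = total + x"))   # v=v+v
print(normalize("sum  =\tsum + item"))  # v=v+v
```

Both lines normalize to the same string, which is the point. It also shows the "possible false positives" con: any statement of that shape now looks identical.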

9 of 17

#hashing

  •  
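
A minimal sketch of the hashing step, assuming the usual approach of hashing every k-gram (each run of k consecutive characters) of the preprocessed text. The names `poly_hash` and `kgram_hashes` and the constants `BASE` and `MOD` are illustrative choices, not values from MOSS.

```python
BASE = 257            # assumed base for the polynomial hash
MOD = (1 << 61) - 1   # assumed large prime modulus

def poly_hash(s: str) -> int:
    """Polynomial hash: treat the string as a number written in base BASE."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h

def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every length-k substring, in order of starting position."""
    return [poly_hash(text[i:i + k]) for i in range(len(text) - k + 1)]

hashes = kgram_hashes("abcab", 2)   # k-grams: ab, bc, ca, ab
print(hashes[0] == hashes[3])       # True: the repeated "ab" hashes the same
```

A hand-rolled hash is used instead of Python's built-in `hash()` because string hashes from `hash()` change between runs by default.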

10 of 17

 

  •  

11 of 17

Rolling Hash

  •  
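
A minimal sketch of a rolling hash, assuming a polynomial (Karp-Rabin style) hash. Instead of rehashing each k-gram from scratch, it removes the outgoing character and appends the incoming one, so each step is O(1). `BASE`, `MOD`, and `rolling_hashes` are illustrative names and values.

```python
BASE = 257            # assumed base
MOD = (1 << 61) - 1   # assumed large prime modulus

def rolling_hashes(text: str, k: int) -> list[int]:
    """Hashes of all k-grams of text, updating in O(1) per position."""
    n = len(text)
    if k <= 0 or n < k:
        return []
    h = 0
    for ch in text[:k]:                  # hash the first k-gram directly
        h = (h * BASE + ord(ch)) % MOD
    hashes = [h]
    top = pow(BASE, k - 1, MOD)          # weight of the leftmost character
    for i in range(k, n):
        h = (h - ord(text[i - k]) * top) % MOD   # drop the outgoing character
        h = (h * BASE + ord(text[i])) % MOD      # shift and add the incoming one
        hashes.append(h)
    return hashes

print(len(rolling_hashes("abcdefg", 5)))   # 3
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction step needs no extra correction here. In C++ or Java it would.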

12 of 17

Choosing Hashes (winnowing)

  •  
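
A minimal sketch of winnowing, assuming the standard rule: slide a window of w consecutive hashes, keep the minimum hash in each window (the rightmost one on ties), and record each selected position only once. The function name `winnow` is illustrative.

```python
def winnow(hashes: list[int], w: int) -> list[tuple[int, int]]:
    """Return (hash, position) fingerprints chosen by winnowing."""
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum inside this window.
        pos = start + (w - 1 - window[::-1].index(smallest))
        if pos != last_pos:              # skip a minimum we already recorded
            fingerprints.append((smallest, pos))
            last_pos = pos
    return fingerprints

hashes = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
print(winnow(hashes, 4))
# [(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]
```

This version rescans every window for clarity, which costs O(n·w). If the input has fewer than w hashes it returns nothing, which a real tool would probably want to handle differently.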

13 of 17

Choosing constants…

  •  
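
A small sketch of how the constants relate, assuming the usual winnowing parameterization: k is the noise threshold (matches shorter than k are ignored), t is the guarantee threshold (matches of length at least t are always caught), and the window size is w = t - k + 1. The function name and the example values are illustrative only.

```python
def window_size(k: int, t: int) -> int:
    """Window size w such that every match of length >= t contains a selected hash."""
    if k < 1 or t < k:
        raise ValueError("need 1 <= k <= t")
    # A match of length t contains t - k + 1 consecutive k-grams, i.e. one
    # full window, and winnowing selects at least one hash from every window.
    return t - k + 1

print(window_size(k=5, t=8))   # 4
```

Raising k cuts down on coincidental matches but lets short copied snippets slip through. Raising t shrinks the fingerprint set at the cost of missing shorter copies.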

14 of 17

Comparing Fingerprints

  • Many different metrics to pick
    • Each has different tradeoffs
  • Fairly simple, pretty good one for comparing similarity between two files:

  • Alternatively, rank by raw number of matches (more sensitive, but harder to cheat)
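
A minimal sketch of both ways to score a pair of files. The ratio metric here is Jaccard similarity (shared fingerprints divided by total distinct fingerprints), which is an assumption chosen for illustration and not necessarily the metric this talk or MOSS uses. `jaccard` and `raw_matches` are made-up names.

```python
def jaccard(a, b) -> float:
    """Shared fingerprints / all distinct fingerprints, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0                 # two empty files: treat as no evidence
    return len(a & b) / len(a | b)

def raw_matches(a, b) -> int:
    """Number of distinct fingerprint hashes the two files share."""
    return len(set(a) & set(b))

file_a = [17, 8, 39]
file_b = [8, 39, 50]
print(jaccard(file_a, file_b))      # 0.5
print(raw_matches(file_a, file_b))  # 2
```

The ratio can be diluted by padding a copied file with unrelated code. The raw count cannot, but it will also flag large files that happen to share common boilerplate.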

15 of 17

How To Beat Moss

A few tricks:

  • Write your own code
  • Don’t submit anything
  • Some convoluted method that has like a 30% chance of working, and requires writing code so bad it’d be immediately spotted by any human eye, and involves more effort than just doing the thing yourself

Basically, you don’t.

16 of 17

Sources (so it’s not plagiarism)

17 of 17

Thank you for coming!