1 of 12

Jeopardy! Trivia Site

Group 7 – Somang Han, Sally Hu, Dat Nguyen, David Weissman

1

Have some coffee and play a Jeopardy! game with us!

2 of 12

About Us

Motivation

  • Created by Merv Griffin, the show features a quiz competition in which contestants are presented with general knowledge clues in the form of answers and must phrase their responses in the form of questions.
  • We are building a Jeopardy! web application but with some extra fun!
  • The users can challenge themselves with more than 217,000 real questions from the show and see how they stand against Jeopardy! Contestants.

Welcome to Group 7 Presentation!

We are Somang, Sally, Dat and David.

2

3 of 12

Preview of Our Web Application

3

4 of 12

Preview of Our Web Application

4

5 of 12

Datasets

In our application, we will mainly leverage two datasets.

Kaggle Dataset

  • Originally came from Kaggle that contains 217,000+ questions with answers, rounds, value per question, category, daily double, and air date.
    • The file size is around 35 MB with 217,000 rows and 7 attributes. 
    • The mean value of the “value” column is 766.2 with a standard deviation of 677.1
    • There are a total of 27,975 unique categories.
    • There are a total of 216,123 unique questions.
    • There are a total of 88,266 unique answers.

J! Archive Dataset

  • The data comes from the J! Archive website, a fan-created archive for Jeopardy!.
    • Contestant play data: contains 20,450 rows and 12,748 unique contestants.
    • Contestant data: contains 12,748 rows and 12,748 unique contestants.�

5

6 of 12

Entity-Relationship (ER) Diagram

6

7 of 12

Relational and Schema

7

Relation Name and Schema

Foreign Keys and References

(if applicable)

# of Instances

contestants(cid, name, occupation, state)

N/A

12,748

jeopardy_qa(qid, category, question, answer)

N/A

216,780

jeopardy_round(rid, round, value)

N/A

118

jeopardy_show(showNum, season, airDate)

N/A

6,820

jeopardy_episode(eid, showNum, rid, qid)

showNum: jeopardy_show(showNum)

rid: jeopardy_round(rid)

qid: jeopardy_qa(qid)

216,910

contestants_play(cid, showNum, isWinner)

cid: contestants(cid)

showNum: jeopardy_show(showNum)

20,450

After normalizing, all of our relations are in 3NF.�

8 of 12

Application/Technologies

We utilized the following technologies.

8

Client

Server

Request

Network

Application Software (Client Portion)

Application Software (Server Portion)

Response

Communication Protocols

9 of 12

Complex Queries

See below some examples of our queries.

9

WITH top_answers AS (

SELECT *

FROM ( SELECT ANSWER

, COUNT(*) TOTAL_QUESTIONS

FROM JEOPARDY_QA qa

INNER JOIN JEOPARDY_EPISODE ep ON qa.qid = ep.qid

GROUP BY ANSWER

ORDER BY TOTAL_QUESTIONS DESC)

WHERE ROWNUM <= 20

),

top_categories as (

SELECT *

FROM (

SELECT DISTINCT qa1.category

,COUNT(qa1.answer) AS numAnswers

FROM jeopardy_qa qa1

INNER JOIN top_answers qa2 on qa1.answer = qa2.answer

GROUP BY qa1.category

ORDER BY numAnswers DESC

)

WHERE ROWNUM <= 10

)

SELECT tc.category ,qa.question ,ta.answer

FROM top_categories tc

INNER JOIN jeopardy_qa qa ON tc.category = qa.category

INNER JOIN jeopardy_episode ep ON qa.qid = ep.qid

INNER JOIN top_answers ta ON qa.answer = ta.answer

ORDER BY tc.category, ta.answer`;

WITH winners AS (

SELECT *

FROM contestants_view

WHERE isWinner=1

),

top_occupations AS (

SELECT *

FROM ( SELECT occupation ,COUNT(DISTINCT cid) as numOcc

FROM winners

GROUP BY occupation

ORDER BY numOcc DESC)

WHERE ROWNUM <= 20

),

top_players AS (

SELECT c.occupation,name ,COUNT(DISTINCT shownum) AS numWon

FROM winners c

INNER JOIN top_occupations o ON c.occupation = o.occupation

HAVING COUNT(DISTINCT shownum) > 1

GROUP BY c.occupation,name

ORDER BY c.occupation,numWon DESC

),

final_top AS (

select occupation ,name, numWon,

(ROW_NUMBER() OVER (PARTITION BY OCCUPATION ORDER BY NUMWON DESC)) AS grp

FROM top_players

)

SELECT occupation, name, numwon

FROM final_top

WHERE grp <= 3

ORDER BY occupation, numwon desc;

10 of 12

Performance

10

We employed several optimization techniques to speed up our performance.

Query

Timings (in seconds)

Improvements

Before

After

Delta

Creation & Indexing of View

Index on Jeopardy Q&A table

Restructure Query

Questions From Top Categories Of Top Answers

1.125

0.682

0.443

X

Top Winners From Top Occupations

0.227

0.173

0.054

X

X

Get Questions

0.194

0.176

0.018

X

Top Winners With Most Consecutive Wins

0.116

0.099

0.017

X

X

Top Questions by Category

0.133

0.123

0.01

X

11 of 12

Technical Challenges

11

This is not an easy feat!

Play Function

While implementing the play page, we spent some time debugging (ex. overriding answers) to better user experience.

Fuzzy Matching

Matching a user’s answer to the correct answer was challenging since there may be discrepancies between actual answers and the user’s submission.

Front-End Design

Designing the website in a nice format was difficult. We began by browsing Color Hunt and used Jeopardy! show pictures.

Missing Data on the

J! Archive website

Missing data on the web made html syntaxes different and led to getting information that we did not want.

12 of 12

Thank You!

12