In this project, you will create a movie database in MongoDB. You will first provide database functions that import, aggregate, and relate the data, in order to begin to make its structure useful. In subsequent deliverables, you will continue to refine the database structure while using technologies such as Node.js and Express.js to present the movie data, as a website, to users.

Deliverables:

D1: Relating Movies and Credits

In this deliverable, you will create a movie database in MongoDB, and provide functions on it. You need to use JavaScript or Python 2.7 for this deliverable (do not use Node).

Among the goals for this project are for you to learn about how to organize collections of data in MongoDB, with different relational and index structures, to meet different computational / user needs. As part of this, you need to understand Mongo Object IDs and differentiate them from other ids that are part of external data.

Set up MongoDB as a service on a cloud platform.
Project Hosting Options
Download and unzip this archive of movie data onto your cloud service machine:
https://ecologylab.net/courses/studio/data/dataJSON.zip
This will decompress into 2 data files:

movies.json
credits.json

Create a new database called movies_mongo. Use it while developing the functionality below.
Read from the data file movies.json. Form a movies collection.
Provide a function getRecordByMovieId(movie_id) that returns the full movie record, as json, for that id.
Provide a function getRecordByIMDBId(imdb_id) that returns the full movie record, as json, for that id.
Provide a function getMovieStats(), which returns the following aggregate reporting about the entire movies collection, as a string:

Movies: <num_movies>
Total Running Time: <hours:minutes>
Unique genres: <num_genres>
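
As a minimal sketch of the report format above, the following computes the three values from movie records that are already in memory. It assumes each record has a runtime field in minutes (possibly missing or null) and a genres array of objects with a name field; check these assumptions against movies.json. The function name format_movie_stats is hypothetical. In your real getMovieStats(), you would more likely compute the same values with a MongoDB aggregation.

```python
def format_movie_stats(movies):
    """Build the three-line stats report from an iterable of movie dicts."""
    num_movies = 0
    total_minutes = 0
    genres = set()
    for movie in movies:
        num_movies += 1
        # Assumption: runtime is in minutes and may be missing or null.
        total_minutes += int(movie.get("runtime") or 0)
        # Assumption: genres is a list of objects such as {"id": 18, "name": "Drama"}.
        for genre in movie.get("genres") or []:
            genres.add(genre["name"])
    hours, minutes = divmod(total_minutes, 60)
    return "Movies: %d\nTotal Running Time: %d:%02d\nUnique genres: %d" % (
        num_movies, hours, minutes, len(genres))
```

Zero-padding the minutes keeps a total such as 3 hours 5 minutes from printing as 3:5.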

Read from the data file credits.json. There are different ways that you might choose to form a credits collection:

Read the cast and crew objects into their own MongoDB credits collection. This, "the lesser way", resembles what you would do with a relational database.
Iterate through the array of credits objects. For each, directly merge the cast and crew into the matching movie record (like the orders for a customer, which we discussed in lecture).
This, "the better way", is REQUIRED FOR HONORS STUDENTS.
It will provide 5 points of extra credit for others.
For 10 extra points, to support person-centered browsing, create a collection of persons, each corresponding to cast and crew entries. Do not allow multiple entries for the same id / person. Include a movies collection, of MongoDB OBJECT_IDs, inside of each person entry.
Provide a function getPersonById(person_id) that returns a composite object that includes the person data and their associated movies.
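
To make "the better way" concrete, here is a hedged sketch of the merge step on records that have already been read from the two JSON files. It assumes each credits entry has an id that matches the id of a movie record, plus cast and crew arrays; verify this against your data. The function name merge_credits_into_movies is hypothetical, and writing the merged records into MongoDB is left out.

```python
def merge_credits_into_movies(movies, credits):
    """Attach cast and crew arrays to matching movie dicts, in place.

    Returns the list of credits ids that had no matching movie.
    """
    # Index the movies by their source-file id (not a Mongo Object ID).
    movies_by_id = dict((movie["id"], movie) for movie in movies)
    unmatched = []
    for entry in credits:
        movie = movies_by_id.get(entry["id"])
        if movie is None:
            unmatched.append(entry["id"])
            continue
        movie["cast"] = entry.get("cast", [])
        movie["crew"] = entry.get("crew", [])
    return unmatched
```

Returning the unmatched ids gives you something to report in your development log if the two files do not line up exactly.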

Provide a function getCastByMovieId(movie_id) that takes a source file movie id (from the json) and returns the cast for that id, as json.
For 10 extra points, provide a function getCreditsStats(), which returns the following aggregate reporting about the entire movies collection, as a string:

Credits Entries: <total_num_credits_entries>
Cast Members: <num_unique_cast_entries>
Crew Members: <num_unique_crew_entries>

Note: there may be some people who function both as cast and as crew. You can ascertain this using the credit_id field, which should be unique across the collections.
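
One possible reading of these three numbers, sketched on in-memory credits entries: count distinct credit_id values for the total, and distinct person id values for cast and crew. This interpretation is an assumption, as are the field names credit_id and id on each cast and crew entry and the hypothetical function name format_credits_stats; confirm the intended meaning before relying on it.

```python
def format_credits_stats(credits):
    """Build the credits report from an iterable of credits entries."""
    credit_ids = set()
    cast_people = set()
    crew_people = set()
    for entry in credits:
        for member in entry.get("cast") or []:
            credit_ids.add(member["credit_id"])
            cast_people.add(member["id"])
        for member in entry.get("crew") or []:
            credit_ids.add(member["credit_id"])
            crew_people.add(member["id"])
    # A person who acts and also works as crew is counted once in each set.
    return "Credits Entries: %d\nCast Members: %d\nCrew Members: %d" % (
        len(credit_ids), len(cast_people), len(crew_people))
```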

Provide a function getAggregateRecordByMovieId(movie_id) that takes a source file movie id (from the json) and returns an aggregate object, which contains the full movie record, cast, and crew for that id, as json.

Use MongoDB's createIndex() function to support the performance of each of the above functions.
In addition to returning output as JSON, each function must print, to the console, the executionStats for each query that it runs. These are available via db.collection_name.find(blah).explain("executionStats").
Each query must run in a reasonable amount of time. Of course, this will be determined primarily by the indexes that you create to support it. Here is some info on how to evaluate and optimize the performance of queries. Google for more!
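
For Python users, the following is a sketch of how an index, the executionStats printout, and the JSON return value could fit together in one function. It assumes db is a PyMongo Database handle, that the collection is named movies, and that the source-file ids live in fields named id and imdb_id; all of these are assumptions to check against your own setup. It runs the explain command through db.command() so that the executionStats verbosity can be requested explicitly.

```python
from collections import OrderedDict
import json


def ensure_movie_indexes(db):
    """Create single-field indexes for the two lookup functions."""
    db["movies"].create_index("id")
    db["movies"].create_index("imdb_id")


def get_record_by_movie_id(db, movie_id):
    """Print executionStats for the lookup, then return the record as JSON."""
    query = {"id": movie_id}
    # OrderedDict keeps "find" as the first key of the explained command.
    explain_cmd = OrderedDict([("find", "movies"), ("filter", query)])
    plan = db.command("explain", explain_cmd, verbosity="executionStats")
    print(json.dumps(plan.get("executionStats", {}), default=str, indent=2))
    # Exclude the Mongo Object ID so that the record serializes as plain JSON.
    record = db["movies"].find_one(query, {"_id": 0})
    return json.dumps(record, default=str)
```

In the printed stats, compare totalDocsExamined with nReturned: with a usable index, the two should be close, and the winning plan should show an index scan rather than a collection scan.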

teamwork → development log

Be strategic in dividing the work among team members. You may also want to sometimes work together on the same thing. Be thoughtful. Ensure that all teammates are up-to-date and on the same page.

Each team must maintain a development log (Google Doc, GitHub wiki, or similar) updated by the team members. Give access to the TA and PTs. This log will be graded. There is no designated format. We will check your daily progress.

Update your log daily.
Make sure to explicitly include a link to this log in GitHub as part of every deliverable.
Your log should include notes from meetings. These, in turn, should include statements of who is committing to do what and WHEN. Make sure that your meetings regularly feature explicit conversations about this and that these commitments, in turn, are regularly posted in your development log.
Include a link to your development log in your README for each deliverable.

submission

All functionality must be implemented in JavaScript and/or Python 2.7. See resources.
Make sure to separately provide the code that you use to

populate the database,
optimize the database,
provide functions on the database.
Note: we will not be running the first 2 code components, so make sure that they include plenty of explanation about what you did, in case it's not obvious from the code.

Provide the TA and PTs access to your:

Private GitHub repository
MongoDB Instance

Create a user with READ ONLY role: https://docs.mongodb.com/manual/tutorial/create-users/
Connect remotely to the instance using the user credentials.

AWS tutorial (You may have to add bindIpAll: true in /etc/mongod.conf and choose ‘All traffic’ instead of ‘Custom TCP’ in AWS Security Group)
Azure tutorial (similar issue with the config file)

Follow the instructions below and submit the URL of your GitHub repository, the DB connection string that gives the teaching team READ ONLY remote access to your MongoDB instance, and the URL of your team development log.

The DB connection string has the form:

mongo --username user --password pass --host hostname --port port_number

(e.g. mongo --username alice --password abc123 --host mongodb0.tutorials.com --port 28015)

For more details, see:

https://docs.mongodb.com/tutorials/connect-to-mongodb-shell/

Before you submit, verify the DB connection string in the MongoDB shell on a system other than your cloud service console.

**Do not use localhost (127.0.0.1) as the hostname for verification.**
**Python users**: Make sure to test your script by using the above connection string parameters as command-line arguments.

Initialize MongoClient() with these arguments. See: https://docs.mongodb.com/tutorials/connect-to-mongodb-python/#connect-using-host-and-port-parameters
However, do not put these arguments in GitHub. We will take them from the connection string submitted through the Google form.
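
A minimal sketch of reading the connection parameters from the command line, so that no credentials end up in your repository. The flag names mirror the connection string form above. The function names are hypothetical, and connect() assumes a PyMongo version whose MongoClient accepts username and password keyword arguments.

```python
import argparse


def parse_connection_args(argv=None):
    """Parse --username, --password, --host and --port from the command line."""
    parser = argparse.ArgumentParser(
        description="Connect to the movies_mongo database.")
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--host", required=True)
    parser.add_argument("--port", type=int, default=27017)
    return parser.parse_args(argv)


def connect(args):
    """Open a client with the parsed arguments and return the database handle."""
    # Imported here so that argument parsing can be tested without PyMongo.
    from pymongo import MongoClient
    client = MongoClient(host=args.host, port=args.port,
                         username=args.username, password=args.password)
    return client["movies_mongo"]
```

A script built this way would be run as, for example, python your_script.py --username alice --password abc123 --host mongodb0.tutorials.com --port 28015, where your_script.py is a placeholder name.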

In your GitHub README file, include any other specific instructions for connecting to your MongoDB instance and running your code.
Each team must submit only once!
Use this form to submit the URLs and connection string.

Each individual: use this form to submit your self and peer evaluation.

D2: Movies Web

[In GitHub, create a separate branch for this deliverable. See submission instructions for details.]

In this deliverable, you will create a movies website. Think of it as resembling IMDb. As a conceptual starting point, consider what you like and dislike about IMDb and any other movies websites that you use. You can also consider the website tmdb.org, which is where our data came from. Keep these story ideas in mind, as you design the user experience and look and feel of your Movies Website.

Remember: carefully read the data you were given. Understand it! Perhaps make one member of your team the data expert. Get some website to pretty print JSON values and stare at them long enough so you understand the story they tell.

Note: here is how to form URLs with the image file data that you find in the JSON:

To get low-res "thumbnail" images, appropriate for embedded presentations of data (such as the cast and crew members in a movie, or the movies a person has participated in), use fields such as profile_path and poster_path (there will be others!) from your database. This applies to any field whose value ends in .jpg. Say that you have such a value from a particular database record; let's call it img_path. (Note: the URL below has an extra slash, for readability here, which you will not want.)

https://image.tmdb.org/t/p/w138_and_h175_face/img_path

To get higher res images, appropriate for featured data presentations:

https://image.tmdb.org/t/p/w600_and_h900_bestv2/img_path
https://image.tmdb.org/t/p/w780/img_path
Forum post about image sizes https://www.themoviedb.org/talk/53c11d4ec3a3684cf4006400

There may be other convenient ways to form image URLs. Look around. Play around.
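
The URL forms above can be wrapped in a small helper. This sketch assumes that the stored path values usually begin with a slash (which is why the extra slash shown above is unwanted); it strips any leading slash and adds exactly one. The function name image_url is hypothetical.

```python
IMAGE_BASE_URL = "https://image.tmdb.org/t/p/"


def image_url(img_path, size="w138_and_h175_face"):
    """Return the full image URL for a stored path, or None if there is none."""
    if not img_path:
        return None
    # Normalize so the result has one slash between the size and the file name.
    return IMAGE_BASE_URL + size + "/" + img_path.lstrip("/")
```

Pass size="w780" or size="w600_and_h900_bestv2" for the higher-res variants.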

Begin by building the thinnest architectural spike components with a Node.js stack.

Set up Node.js (web server), Express.js (routing from URL paths to (your) JavaScript code and/or templates in the web server), Mongoose (connect to MongoDB in your web server JavaScript code), and a template engine (server-side structure for web pages, including passing data that you retrieved from MongoDB), such as Pug or Handlebars, on your cloud box.
Node.js Stack Resources
Create a Node webapp. Start the Node server.

Use wget or curl on your box to confirm that this webapp's default index page can be accessed from within your box. The default port, which is probably 3000, should be good.
Access your webapp's default index page in your web browser. You may need to open the port to do this. If you are using a high-numbered port, then you may not.

You will use a URL, http://your_box_hostname:port, to do this. Below, we will call this your_web_app_url. Make sure to save this and share it among your teammates.

Make a small edit in the project's default index file(s), just to see that you can. In my current environment, these files are index.js and index.pug. (Note: I am not advocating for Pug. I may change template engines at any time.)

Create a web service that provides data, as JSON, from your MongoDB instance through your web app. We highly recommend that you use Mongoose to do this. If so, install Mongoose. Use Express and (we expect) Mongoose to create 3 public web service API endpoints:

/dbservice/movie?movie_id=int
Return all the useful data about a movie, as JSON. The dataset includes many fields for each movie. Make sure to return all of them!

Additionally, make sure, within each movie object, to return one array of cast member objects and another array, of crew member objects. These aggregate objects should include data about each person and their role, sufficient for presentation in an associated web page.

name
id
role (character or department and its value)
img_path

/dbservice/person?person_id=int

Return useful data about a person, as JSON. You may notice, our dataset is relatively impoverished for each person, compared to the movie data.

If you have not already, use the credits data to create a collection of unique persons in order to support this function. The base part of each person record should include at least:

name
id
img_path

Also, aggregated in each person object should be one array of objects representing the movies in which they function as cast, and another array of the movies in which they function as crew. One of these arrays may be empty. Internally, each array entry should include:

MongoDB Object IDs to represent the associated movie object.
role (character or department and its value)
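
The person structure described above can be sketched as a pass over movie records that already contain merged cast and crew arrays. Assumptions: each movie dict carries its Mongo Object ID under _id, each credit has id, name and profile_path, cast credits have character, and crew credits have department. The function name build_persons is hypothetical, and inserting the result into a persons collection is left out.

```python
def build_persons(movies):
    """Return a dict of unique persons, keyed by the source person id."""
    persons = {}
    for movie in movies:
        for kind, role_key in (("cast", "character"), ("crew", "department")):
            for credit in movie.get(kind) or []:
                # setdefault keeps the first record and prevents duplicates.
                person = persons.setdefault(credit["id"], {
                    "id": credit["id"],
                    "name": credit["name"],
                    "img_path": credit.get("profile_path"),
                    "cast": [],
                    "crew": [],
                })
                person[kind].append({
                    "movie": movie["_id"],  # the Object ID, not the source id
                    "role": {role_key: credit.get(role_key)},
                })
    return persons
```

Storing the role as a one-key object keeps both the kind of role (character or department) and its value, as the list above asks.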

When you return data, in the service, convert these arrays of movie objects, with Object IDs, into JSON entries, each of which includes the data you would want to present inside a person web page or in search results for a person. These include (you could want more than this!) at least:

role (character or department and its value)
movie title
movie id
movie poster_path (for displaying, in a web page, a thumbnail image for the movie)

/dbservice/search?q=query_string&num=max_num_results
Return, as JSON, an array of matching movie objects and an array of matching people objects. The query string should be either a movie title or person name. To generate the results, perform 2 find operations, with the same query_string, perhaps padded with wildcard characters, on: (1) your movies collection and (2) your persons collection. For each find operation, build and return an array. In most cases, one of these arrays will be empty. Sometimes, both may be empty. The num parameter lets the caller limit the maximum number of results for each query.

In the JSON that you return, label the arrays movies and persons.

For each entry in each of these arrays, include:

id of the movie or person
title of the movie or name of the person. (You could choose to use the same label for both of these fields, even though they are different in the input data.)
img_path
role (character or department)
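
The logic of this endpoint is the same whatever language serves it, so here is a language-neutral sketch in Python of three pieces: a case-insensitive "contains" filter, validation of the num parameter, and the labeled response. The function names, the default of 10 and the cap of 50 are all assumptions. The filter is an ordinary MongoDB query document using $regex and $options, so the same shape can be built in your Node code.

```python
import re


def build_search_filter(field, query_string):
    """Match documents whose field contains query_string, ignoring case."""
    # re.escape stops characters such as "." or "(" from acting as regex syntax.
    return {field: {"$regex": re.escape(query_string), "$options": "i"}}


def clamp_num(num, default=10, maximum=50):
    """Turn the raw num parameter into a safe integer limit."""
    try:
        value = int(num)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, maximum))


def search_response(movie_docs, person_docs, num):
    """Label the two result arrays as the endpoint requires."""
    return {"movies": list(movie_docs)[:num],
            "persons": list(person_docs)[:num]}
```

You would run one find with build_search_filter("title", q) on movies and one with build_search_filter("name", q) on persons, each limited to the clamped num.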

Create a website that uses HTML5, including CSS and perhaps JavaScript, to present movies data to users.

Thoughtfully use ideas from the Tufte readings on Visual Information Design—e.g., layering, separation, micro/macro readings, small multiples—as you design your website to effectively present visual information, making it legible and attractive. Make sure to avoid overuse of saturation and 1+1=3 problems. Also, keep in mind Norman's principles of affordance, feedback and user models.

There is too much information! Be thoughtful about which information to give what emphasis, through your design.

Your website should include these pages:

A home page, as /.
This is a good place to include a description of your site, effectively present content, which includes links to popular movies and actors, and a text entry field for a search function. (To streamline the presentation, you may choose to have an about page, linked from here, which describes your website, and skip the description on your home page.)

For 10 extra points, effectively present the top 10 grossing films in the collection.

Search results page, as /present/results?query

Implement this as 2 separate find operations, one on the movies collection, another on the persons collection, using the same code as in your /dbservice/search?q=query_string. You may limit the number of results you show to a maximum of 10, to avoid dealing with partitioning the result set in case of large numbers of results.
Pages for each movie, as /present/movie?movie_id

Here you will effectively present at least all the above data, which is relevant to users, about each movie, including nice entries (with links) to the page for each cast and crew member that was part of it. Also include site navigation, such as to the home page.

For 5 extra points, incorporate a link to the IMDb page for each movie.
For 10 extra points, incorporate a link to the Wikipedia page for each movie.

Pages for each person (cast and crew), as /present/person?person_id

Here you will effectively present at least the data from your service, which is relevant to users, about each person, including nice entries (with links) to the page for each movie they performed in and each movie that they were crew for. Also include navigation, such as to the home page.

For 5 extra points, incorporate a link to the IMDb page for each person.
For 10 extra points, incorporate a link to the Wikipedia page for each person.
For 15 extra points, incorporate biography information for each person, which you would obtain from Wikipedia, IMDb, or some other web source.

d2 submission

Provide the TA and PTs access to your:

Private GitHub repository.
Use .gitignore. Only commit the files that you have changed. DO NOT commit all the files in Node.js!
You must create a separate branch for each deliverable. Use the following convention for naming your branch:

“P” + projectNumber + “D” + deliverableNumber

So, for this deliverable, title your branch P1D2.

In your README file, make sure to describe the repository’s organization of code components written for populating the database and manipulating the data. This refers to the code for insertion, merging, and modification of documents, and creation of any indexes, apart from any other operations, so as to effectively support the web service endpoints.
Use this form to submit the URLs to your:

GitHub repository
Website, i.e., your_web_app_url
Team development log

Each individual: use this form to submit your self and peer evaluation.

D3: Website Experience

Improve the user experience of your movies website. Improve both the interface and, as appropriate, backend functionality.

Of course, again, incorporate visual information design principles addressed by Tufte and interaction design principles addressed by Norman.

As part of this, finish / improve components of your previous deliverable. Additionally, develop these new functionalities:

Website interface / design.

Improved search experience.

Search interface on every page. Concisely include search on every page. Take just the right amount of the user's attention for this feature.
One set of search results. The previous D2 allowed you to separately present matching movies and people. Would any real website do this? Unify the interface to present a single set of search results. Include a mechanism for presenting more than one set of 10 results.

Movies / persons sorting.

When presenting cast and crew credits in a Browse Movie page, enable sorting by last name + first name, department, or character.
When presenting movies in a Browse Person page, enable sorting by year, genre, and popularity. Make one of these the default.
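
Sorting by last name + first name needs a rule for splitting a full name. This sketch treats the final whitespace-separated word as the last name, which is an assumption that will be wrong for some names (suffixes such as "Jr.", multi-word surnames). The function names are hypothetical, and the other sort keys are assumed to be plain string fields on each credit.

```python
def last_first_key(name):
    """Sort key: (last name, remaining names), both lowercased."""
    parts = name.split()
    if not parts:
        return ("", "")
    return (parts[-1].lower(), " ".join(parts[:-1]).lower())


def sort_credits(credits, by="name"):
    """Return a new list of credits sorted by name, department or character."""
    if by == "name":
        return sorted(credits, key=lambda c: last_first_key(c.get("name") or ""))
    # Missing values sort first because they become empty strings.
    return sorted(credits, key=lambda c: (c.get(by) or "").lower())
```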

Focused presentation: Cast / credits in Browse Movie, Movies in Browse Person.
Initially show only the top 2-3 entries for each. Provide a simple affordance that enables expanding the list -- inside the same page.
10 points extra credit.
Improved affordances and user experience. Refine your interface / user experience.

Backend - integrated with front end. All improvements must support the user experience.
Be clear on the user stories that are involved.

Improved search experience / function. In D2, you provided a simple, first cut search. Now, make it nice for users. Implement this improved search in your API and use it in your website.

Unify the search operation. Search over both the movies and persons collections at the same time. Return a single array of results, in order.
Add a start parameter. This says what result number to start with, when you return results. Note: Google does this for presenting multiple pages of search results.
/dbservice/search?q=query_string&num=max_num_results&start=first_result
Make sure to only search relevant fields, that is, for a movie: its title and the name of a collection if it is in one; for a person, their name. When searching a movie, DO NOT search its credits. When searching a person, DO NOT search their movies.
Make sure to only search for words, not just strings. For example, "hanks" should not match "Thanks".
In your website, support autocomplete, so that as the user types each character, you present a short list of ranked possibilities. Performance of autocomplete should be sufficient that the user doesn't feel like they are waiting.
Find apparent misspellings in search queries. Present corrected results to users in a manner similar to what Google does.
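
Three of the search requirements above (whole-word matching, a single result array, and the start parameter) can be sketched together on in-memory records. Assumptions: start is 0-based, results are ordered alphabetically (the ordering is your design decision), and only the movie title and the person name are searched, so the movie's collection name is omitted here. All function names are hypothetical.

```python
import re


def word_pattern(query_string):
    """Compile a case-insensitive pattern that matches only whole words."""
    query_string = query_string.strip()
    if not query_string:
        return None
    # \b anchors the match at word boundaries, so "hanks" cannot match "Thanks".
    return re.compile(r"\b" + re.escape(query_string) + r"\b", re.IGNORECASE)


def page(results, start=0, num=10):
    """Return num results beginning at the 0-based index start."""
    start = max(int(start), 0)
    num = max(int(num), 0)
    return results[start:start + num]


def unified_search(movies, persons, query_string, start=0, num=10):
    """Search movie titles and person names, returning one ordered array."""
    pattern = word_pattern(query_string)
    if pattern is None:
        return []
    hits = []
    for movie in movies:
        if pattern.search(movie.get("title") or ""):
            hits.append({"type": "movie", "id": movie["id"],
                         "title": movie["title"]})
    for person in persons:
        if pattern.search(person.get("name") or ""):
            hits.append({"type": "person", "id": person["id"],
                         "title": person["name"]})
    hits.sort(key=lambda hit: hit["title"].lower())
    return page(hits, start, num)
```

The same \b-anchored pattern string can be passed to MongoDB as a $regex, though a regex that is not anchored at the start of the field generally cannot use an index efficiently, which matters for autocomplete.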

Favorites by user. Enable the user to say what movies they like. When they seek to do this the first time, require them to register with the website. Thus, create user accounts. Require them to be logged into the website, in order for them to do this.

Incorporate into the website a page that displays a user's favorite movies. Require them to create an account / be logged into the website, in order for them to do this.
For 20 extra points, enable accounts to be created based on the user's Google account, using the OAuth2 protocol.

agile development

GitHub allows you to create agile boards and backlogs, assign issues, and track progress. For this deliverable, engage in an agile development cycle, by following the steps listed here.

Make sure to:

Create a backlog
Assign tasks to members
Move tasks through the board as you finish tasks

Your team’s processes will be graded.

d3 submission

Provide the TA and PTs access to your:

Private GitHub repository.
Use .gitignore. Only commit the files that you have changed. DO NOT commit all the files in Node.js!
Use the following convention for naming your branch:

“P” + projectNumber + “D” + deliverableNumber

So, for this deliverable, title your branch P1D3.

Use this form to submit the URLs to your:

GitHub repository
Website, i.e., your_web_app_url
Team development log

Each individual: use this form to submit your self and peer evaluation.