1 of 18

Lesson 4: Searching the web

Year 8 – Developing for the web

2 of 18

Key vocabulary

Starter activity

You probably use them every day, but what parts of a search engine can you identify?

Visited hyperlink

Search bar

Search term

Unvisited hyperlink

Child pages

4 of 18

Lesson 4: Searching the web

Objectives

In this lesson, you will:

Describe what a search engine is
Explain how search engines ‘crawl’ through the World Wide Web and how they select and rank results
Analyse how search engines select and rank results when searches are made

5 of 18

How do search engines work?

Activity 1

Search engines use keywords to categorise the web pages that they find.

When a user wants to find a useful web page, they enter keywords, and the search engine returns hyperlinks to the pages that match those keywords.

6 of 18

Gathering information

Activity 1

Search engines use programs known as crawlers or spiders to find content on the World Wide Web.

These crawlers follow links from one web page to another, recording common keywords that they find.

By travelling along these links, the crawlers can eventually find newly created content.

7 of 18

Crawling

Activity 1

8 of 18

Crawling

Activity 1

Step 1: The source code of the page is searched for meta tags that explain what the page is about
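
Step 1 can be illustrated with a minimal Python sketch. It uses the standard library's `html.parser` module; the class name `MetaTagReader` and the sample page are made up for this example, and real crawlers are far more involved.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


# Hypothetical page source, invented for this example
SAMPLE_PAGE = """
<html>
  <head>
    <title>Ladder safety</title>
    <meta name="description" content="How to inspect a ladder before use">
    <meta name="keywords" content="ladder, safety, inspection">
  </head>
  <body><h1>Ladder safety</h1></body>
</html>
"""

reader = MetaTagReader()
reader.feed(SAMPLE_PAGE)
print(reader.meta)
```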

9 of 18

Crawling

Activity 1

Step 2: Important keywords are recorded (headings and words near the top of the document are flagged as more important for search results)
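
Step 2 could be sketched like this in Python. The weighting is an assumption for illustration only: words inside heading tags count three times as much as other words, and position near the top of the page is not modelled. The names `KeywordCounter`, `HEADING_WEIGHT` and `STOP_WORDS` are invented for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}
HEADING_WEIGHT = 3  # assumed value: a heading word counts three times
STOP_WORDS = {"a", "the", "to", "and", "of", "is", "before", "you"}


class KeywordCounter(HTMLParser):
    """Scores words, giving extra weight to words found in headings."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self.heading_depth = 0  # greater than 0 while inside a heading tag

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.heading_depth > 0:
            self.heading_depth -= 1

    def handle_data(self, data):
        weight = HEADING_WEIGHT if self.heading_depth else 1
        for word in re.findall(r"[a-z]+", data.lower()):
            if word not in STOP_WORDS:
                self.scores[word] += weight


# Hypothetical page source, invented for this example
PAGE = (
    "<h1>Ladder safety</h1>"
    "<p>Check the ladder before you climb. A safe ladder is stable.</p>"
)

counter = KeywordCounter()
counter.feed(PAGE)
print(counter.scores.most_common(3))
```

Here ‘ladder’ scores highest because it appears once in the heading and twice in the paragraph.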

10 of 18

Crawling

Activity 1

Step 3: Hyperlinks are added to a queue, ready to be visited by the crawler once the search of the page is complete.
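
The queue in Step 3 might look like the following Python sketch. To keep it runnable without a network connection, the ‘web’ is a small made-up dictionary (`TOY_WEB`) that maps each page to the pages it links to; a real crawler would download each page and extract its links instead.

```python
from collections import deque

# Hypothetical mini web: page name -> pages it links to
TOY_WEB = {
    "home.html": ["about.html", "ladders.html"],
    "about.html": ["home.html"],
    "ladders.html": ["safety.html", "home.html"],
    "safety.html": [],
}


def crawl(start_page):
    """Visits pages in the order their links were queued."""
    queue = deque([start_page])
    seen = {start_page}  # pages already queued, so none is visited twice
    visit_order = []
    while queue:
        page = queue.popleft()
        visit_order.append(page)
        for link in TOY_WEB.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visit_order


print(crawl("home.html"))
# prints ['home.html', 'about.html', 'ladders.html', 'safety.html']
```

The `seen` set stops the crawler from looping forever when pages link back to each other.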

11 of 18

Indexing

Activity 1

When crawlers finish their journey, the information they have gathered is stored in a data structure called an index.

The index records the following about each web page:

Frequently used keywords
Type of content found (images, text, etc.)
Date of last update

Other useful information is recorded that may be used later when users run their searches.
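
One way to picture an index is the Python sketch below. The page names, keywords and dates are invented, and the layout (one record per page plus a keyword lookup) is an assumption for illustration, not how any particular search engine stores its data.

```python
from collections import defaultdict

# Hypothetical records a crawler might have produced
PAGE_RECORDS = {
    "shop.html": {
        "keywords": ["ladder", "buy", "price"],
        "content_types": ["text", "images"],
        "last_updated": "2024-03-01",
    },
    "safety.html": {
        "keywords": ["ladder", "safety", "inspection"],
        "content_types": ["text"],
        "last_updated": "2023-11-15",
    },
}


def build_keyword_lookup(records):
    """Maps each keyword to the set of pages that use it."""
    lookup = defaultdict(set)
    for page_name, record in records.items():
        for keyword in record["keywords"]:
            lookup[keyword].add(page_name)
    return dict(lookup)


keyword_lookup = build_keyword_lookup(PAGE_RECORDS)
print(sorted(keyword_lookup["ladder"]))
# prints ['safety.html', 'shop.html']
```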

12 of 18

Crawl and index

Activity 1

Open the ‘Crawl and index’ worksheet.
Read through the HTML source code for two different web pages.
Think about what a crawler might pick up as it reads the page.
Complete the index table to predict what a crawler might summarise about each page.

13 of 18

Needle in a haystack

Activity 2

A search engine index could hold millions of web pages that match a single keyword.

Searches query the index database to find pages with those keywords in them.
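
A query against an index could be sketched in Python as follows. The index contents and the `search` function are made up for this example; it simply returns every page that contains all of the keywords entered, with no ranking at all.

```python
# Hypothetical index: keyword -> set of pages containing it
KEYWORD_INDEX = {
    "ladder": {"shop.html", "safety.html", "history.html"},
    "buy": {"shop.html"},
    "safety": {"safety.html"},
}


def search(index, *keywords):
    """Returns the pages that contain every keyword given."""
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    if not matches:
        return set()
    return set.intersection(*matches)


print(sorted(search(KEYWORD_INDEX, "ladder")))
print(sorted(search(KEYWORD_INDEX, "ladder", "buy")))
```

Adding a second keyword narrows three matching pages down to one, but with millions of pages the list would still be far too long without ranking.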

If you are looking to buy a ladder, why might the web page on the right appear at the top of the search results?

Start Activity 2 by considering how an index of every web page found for a keyword would return an unusable number of results. Given the massive amount of information available on the World Wide Web, search engines have to rank pages by relevance. Simply returning the pages that contain the keyword most often is not useful.

Ask learners why the example web page on the slide would appear near the top of a list of search results if pages were ranked by keyword count alone. The keyword ‘ladder’ will be flagged by the crawler because it appears in the heading and several times throughout the text. However, this is an information page, not a page that sells ladders. The user could add extra keywords to narrow the results, but this does not guarantee the most relevant results. Search engines therefore also apply ranking algorithms to narrow the results further.

Image source: Bauwiki.png - Author: Flaaim (12 August 2019) [Creative Commons Attribution-Share Alike 4.0 International license] https://commons.wikimedia.org/wiki/File:Ladder_Inspection_Checklist.jpg

14 of 18

Spam

Activity 2

Web designers can use this knowledge to their advantage.

By filling a web page with the same keywords over and over, they can trick crawlers into thinking a page is more useful than it actually is.
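
A short Python sketch shows why this trick works against a search engine that only counts keywords. The function `naive_score` and both page texts are invented for this example.

```python
def naive_score(page_text, keyword):
    """Scores a page purely by how often the keyword appears."""
    return page_text.lower().split().count(keyword)


# Hypothetical page texts
honest_page = "Our guide explains how to choose a ladder and use it safely"
stuffed_page = "ladder ladder ladder cheap ladder best ladder ladder"

print(naive_score(honest_page, "ladder"))   # prints 1
print(naive_score(stuffed_page, "ladder"))  # prints 6
```

The stuffed page scores six times higher even though it tells the reader nothing useful.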

15 of 18

Ranking algorithms

Activity 2

Search engine designers create complex algorithms that attempt to rank the importance of web pages by looking beyond how frequently keywords appear.

How might ranking algorithms consider the following factors when judging the relevance of a web page?

When the page was last updated
Web pages that link to the crawled page
Other web pages that the crawled page links to
How long visitors to the page tend to stay

Remind learners that computer scientists design algorithms that can be coded to enable computers to carry out useful tasks. Search engines use algorithms to rank web pages of similar content so that better and more appropriate web pages are presented to searchers first.

Ask learners to consider the four bullet points and explain why they might be useful to a ranking algorithm.

When the page was last updated: If pages are not updated, the information in them becomes less relevant over time. ‘Fresher’ web pages are more likely to be more appropriate to the searcher.
Web pages that link to the crawled page: If other web pages link to a page, it suggests that their authors consider its content useful. A page with many links pointing to it is likely to be well thought of.
Other web pages that the crawled page links to: Similar to the previous point, this gives an indication of the type of content the page relates to as the links are likely to provide context for the page content.
How long visitors to the page tend to stay: The usability of a page is important. If a page is difficult to use, readers will simply go elsewhere. Such pages are unlikely to be useful to other searchers, so a ranking algorithm will consider them less appropriate.
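
For learners who want to see how such factors might be combined, here is a minimal Python sketch. Every number in it (the caps, the point values, the one-year and 60-second thresholds) is an assumption chosen for illustration, and the factor about outgoing links is left out for brevity. Real ranking algorithms are far more complex and are not published in this form.

```python
from datetime import date


def rank_score(page, keyword, today):
    """Adds up points from several ranking factors (illustrative weights)."""
    # Cap keyword points so repeating a keyword stops helping
    keyword_points = min(page["keyword_counts"].get(keyword, 0), 5)
    # Fresher pages earn points
    days_old = (today - page["last_updated"]).days
    freshness_points = 3 if days_old <= 365 else 0
    # Links from other pages act as recommendations
    link_points = min(page["inbound_links"], 10)
    # Longer visits suggest the page is usable
    dwell_points = 2 if page["average_visit_seconds"] >= 60 else 0
    return keyword_points + freshness_points + link_points + dwell_points


# Hypothetical pages
shop_page = {
    "keyword_counts": {"ladder": 4},
    "last_updated": date(2024, 1, 10),
    "inbound_links": 8,
    "average_visit_seconds": 120,
}
spam_page = {
    "keyword_counts": {"ladder": 40},
    "last_updated": date(2020, 5, 1),
    "inbound_links": 0,
    "average_visit_seconds": 5,
}

today = date(2024, 6, 1)
print(rank_score(shop_page, "ladder", today))  # prints 17
print(rank_score(spam_page, "ladder", today))  # prints 5
```

The spam page uses the keyword ten times more often, yet it ranks lower because the other factors count against it.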

16 of 18

Build a high-quality web page

Activity 2

Considering all that you have learnt, you now need to create a web page that would rank near the top of a list of search results.

Your page needs to summarise how search engines work, including:

How crawlers work
How web pages are indexed
How web pages could be ranked and why this is necessary

Use the ‘What makes a quality web page’ handout to give you some ideas about high-quality designs.

Save your web page as ‘search_engines.html’

17 of 18

Plenary

Swap seats with the person next to you.

Use the ‘Plenary review criteria’ handout to see if you think their web page has:

Clear headings using the heading tags
Important keywords near the top of each section of the page
Suitable meta tags
Suitable images
No unnecessary information
A complementary colour scheme that isn’t too strong
Key information obvious on the page (e.g. uses bold, italics, etc.)

18 of 18

Next lesson

Summary

In this lesson, you...

Explored how search engines find and rank the content of web pages in order to provide more appropriate web pages for searchers

Next lesson, you will...

Consider how users can tailor searches to narrow the results and begin linking web pages you create using hyperlinks