Web Architecture

What happens when you type a URL into a browser? This section answers that question, providing an overview of the process and protocols involved.

Static Web Pages
Static web pages contain text, images, and links. They look the same each time they are loaded into a browser-- the information on the page doesn't change.


Static web pages are relatively easy to create. You can use DreamWeaver or some other what-you-see-is-what-you-get (WYSIWYG) editor, and that software will create the Hypertext Markup Language (HTML) code for you. Or you can write HTML yourself. In either case, the HTML must be uploaded onto a web server. Generally this means using a file-transfer program like WFTP to move the file from your local folder to one on a web server.

Web pages are 'invoked' when a user types a uniform resource locator (URL) into the browser's address bar, clicks on a link, or fills out a form and clicks a submit button. The request is sent over the Internet to the server identified in the URL. Internet servers are actually identified with numbers-- Internet Protocol (IP) addresses-- instead of the domain names found in URLs. So the browser first consults the Domain Name System (DNS), which converts the domain part of the URL-- the part between the :// and the next /-- into an IP address. After communicating with a DNS server, the browser sends the request to the IP address that server returned.
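
The two steps just described, separating the domain from the rest of the URL and then looking up its IP address, can be sketched in Python. This is only an illustration using the standard library; the function names split_url and resolve are made up for this example, and the lookup needs network access, so its result is not shown.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, request): the domain part of a URL and everything after it."""
    parts = urlsplit(url)
    request = parts.path or "/"          # an empty path means the site's front page
    if parts.query:
        request += "?" + parts.query     # keep any form data that follows a '?'
    return parts.hostname, request


def resolve(host):
    """Ask the system's DNS resolver for an IPv4 address (needs network access)."""
    return socket.gethostbyname(host)


if __name__ == "__main__":
    host, request = split_url("http://usfca.edu/alumni")
    print(host, request)
    # resolve(host) would return whatever address DNS currently reports for the host.
```

A browser does the same work internally: only the host goes to DNS, and the request part is sent to the address that comes back.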


If you have access to a browser, try the following:

1. Enter the IP number 138.202.192.14 in the address bar. What web page appears?


2. Enter 138.202.192.14/alumni in the address bar. What page appears?


As you can see, the domain name usfca.edu is mapped to the IP address 138.202.192.14. Normally, a user will type usfca.edu into the address bar. The browser, behind the scenes, sends 'usfca.edu' to a DNS server, which returns 138.202.192.14 to the browser. The browser then sends the rest of the URL, the request, to USF's server. In the second example, the request is /alumni.


We call the request sent to the server an HTTP request. HTTP stands for Hypertext Transfer Protocol. This protocol is the standard by which web browsers and web servers communicate.
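
An HTTP request is ordinary text. As a rough sketch, the Python function below builds a minimal HTTP/1.1 GET request of the kind a browser might send for the /alumni example; the function name build_get_request is hypothetical, and real browsers add many more header lines.

```python
def build_get_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # the domain name the user asked for
        "Connection: close",      # ask the server to close the connection when done
        "",                       # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)     # HTTP separates lines with carriage return + newline


if __name__ == "__main__":
    print(build_get_request("usfca.edu", "/alumni"))
```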

Now let's consider what happens when the HTTP request finally reaches its destination server. For static web pages, the server doesn't have to do much work-- the request just refers to a file which lives on the server. The server just grabs that file and sends it back across the network to the client browser. If the page references images, CSS style files, or JavaScript, the browser requests each of those files in the same way. The browser then 'renders' these files and the user sees the requested page.
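
To see which additional files a page depends on, one can scan its HTML for references. The sketch below uses Python's standard html.parser module to list the image, style sheet, and script URLs a page refers to; the class and function names are invented for this example, and a real browser handles many more cases.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect URLs of images, style sheets, and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            if attrs.get("href"):
                self.resources.append(attrs["href"])


def find_resources(html):
    """Return referenced resource URLs in the order they appear in the HTML."""
    finder = ResourceFinder()
    finder.feed(html)
    return finder.resources
```

Each URL returned here corresponds to one more file that has to travel from the server to the browser before the page is complete.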

Things get more interesting when the request is for a dynamic page, which we'll explore next.

Dynamic Web Pages

A dynamic page is one that displays different information each time it is invoked. For instance, a news page that shows you up-to-date articles is a dynamic page. A page that responds to user input, such as when you fill out a form, is also considered a dynamic web page.

"Serving" dynamic pages is more complicated than serving static pages. The server doesn't just directly return HTML in response to a request. Instead, server "programming code" is called which processes the request. This code is not HTML but code written in a high-level programming language such as Java or Python. The server code often will access information in a database, using the Structured Query Language (SQL), and perform computations on that data. It then packages this computed data into dynamically generated HTML which is sent to the client.

But what does this fancy term, dynamically generated HTML, mean? You can think of it like a form letter you might create in a word processing program, one in which the name and address can be changed each time you send out a specific letter. With such programs, you write the letter with named stubs representing the name and address of the recipient.
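
The server-side flow described above, querying a database with SQL and packaging the result as HTML, can be sketched in Python with the built-in sqlite3 module. The articles table, its columns, and the sample rows are all assumptions made up for this illustration.

```python
import sqlite3
from html import escape


def make_demo_db():
    """Create an in-memory database with a hypothetical 'articles' table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (title TEXT, published TEXT)")
    db.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [
            ("Old story", "2024-01-01"),
            ("Breaking <news>", "2024-03-05"),
            ("Middle story", "2024-02-10"),
        ],
    )
    return db


def news_page(db, limit=2):
    """Query the newest articles with SQL and package them as HTML."""
    rows = db.execute(
        "SELECT title FROM articles ORDER BY published DESC LIMIT ?", (limit,)
    ).fetchall()
    # escape() keeps characters such as '<' in the data from being read as HTML
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return f"<h1>Latest news</h1><ul>{items}</ul>"
```

Each call to news_page reflects whatever is in the database at that moment, which is what makes the page dynamic.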

For web pages, the process is facilitated with HTML templates. HTML templates contain HTML code along with special template variables for the dynamic part-- the stubs-- of the page. Basically, once the server code has computed the dynamic data needed, it opens an HTML template file and fills in the template variables with real data. The resulting HTML code, with specific data, is then sent to the client.
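
The form-letter idea can be tried directly with Python's standard string.Template class, which marks stubs with a dollar sign. This is just one simple way to fill in a template; the template text and the names name and count are invented for this example, and the text is held in a string rather than read from a file.

```python
from string import Template

# A tiny HTML template; $name and $count are the stubs to be filled in.
PAGE_TEMPLATE = Template(
    "<h1>Hello, $name</h1>\n<p>You have $count new messages.</p>"
)


def render_page(name, count):
    """Fill the template's stubs with real data and return the finished HTML."""
    return PAGE_TEMPLATE.substitute(name=name, count=count)


if __name__ == "__main__":
    print(render_page("Ada", 3))
```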

For an example, let's consider what occurs when a user performs a search on Google. First, an HTTP request is sent to the Google servers. The Google server code accesses a large database of keyword-page mappings. The pages all have associated PageRanks-- a measure of their popularity. Google uses this popularity and a keyword matching algorithm to determine the top results. These are then packaged into an HTML template that looks like the Google result page but with stubs for the results. What gets sent to you, the user, is the HTML resulting from substituting the results for your query into this generic results page.
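
A toy version of this flow fits in a few lines of Python. It is emphatically not Google's algorithm: the index, the popularity scores, and the page names below are all invented, and the sketch only shows the shape of the process (look up keywords, rank by popularity, fill in a results page).

```python
from html import escape

# Hypothetical keyword-to-page mappings.
INDEX = {
    "python": ["python.org", "docs.example/python", "snakes.example"],
    "tutorial": ["docs.example/python", "learn.example"],
}
# Hypothetical popularity scores standing in for a PageRank-like measure.
POPULARITY = {
    "python.org": 0.9,
    "docs.example/python": 0.6,
    "learn.example": 0.5,
    "snakes.example": 0.2,
}


def search(query, top=2):
    """Return pages that match every keyword, most popular first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(INDEX.get(words[0], []))
    for word in words[1:]:
        matches &= set(INDEX.get(word, []))   # keep pages matching all keywords
    ranked = sorted(matches, key=lambda page: POPULARITY.get(page, 0.0), reverse=True)
    return ranked[:top]


def results_page(query):
    """Substitute the results for a query into a generic results page."""
    items = "".join(f"<li>{escape(page)}</li>" for page in search(query))
    return f"<h1>Results for {escape(query)}</h1><ol>{items}</ol>"
```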

There are a number of HTML templating systems. Django, a web framework for Python, includes a popular one. With Django, the template files have regular HTML along with special template variables denoted with double curly brackets. Here's an example:

    <h1> this is just a regular html header </h1>
    <h2> Your score on this test is {{score}} </h2>

In this sample, the template variable is {{score}}. The server code's job is to replace {{score}} with an actual number.
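
To make the replacement step concrete, here is a minimal Python sketch that fills in double-curly variables with a regular expression. It is not how Django itself is implemented, and Django's template engine does far more; the render function and its context argument are assumptions for this illustration.

```python
import re
from html import escape

# Matches {{name}} with optional spaces inside the brackets.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Replace each {{name}} in the template with the matching value in context."""
    def fill(match):
        value = context[match.group(1)]   # raises KeyError if the name is missing
        return escape(str(value))         # escape so data cannot inject HTML
    return PLACEHOLDER.sub(fill, template)


if __name__ == "__main__":
    print(render("<h2> Your score on this test is {{score}} </h2>", {"score": 87}))
```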

Template files can get more complicated. For instance, there is a way to show lists of data such as is necessary for the Google results example. The non-HTML code in a template file is called scripting code.

Deploying a Website

So you want to set up a web site? How do you do it? After you create your web pages, where do you put them? How do you set up a domain name?

There are a couple of options. One is to create your site in the 'cloud'. The cloud refers to the huge server farms to which companies like Google and Amazon provide access. Google Sites, for instance, allows you to create your site in a wiki manner and store it on Google's servers, free of charge. You need know nothing about HTTP, servers, or any of the discussion above.

The other option is to set up your own server or use a server set up by your organization. A server can run on commodity hardware-- a $1500 PC could do the job. The difficult part is setting up the server software, such as Apache's Tomcat. Even for simple sites, it requires some technical expertise.

Organizations you work for or belong to may also provide 'server space'. At the University of San Francisco, for instance, you can upload web pages to a web server using a tool called USF Files.

If you set up your own server, you just upload files to a special 'web' directory defined by the server software. That special web directory is the root of your website. So if your site is http://mysite.com, and your web directory is /web, then the URL http://mysite.com/maps/calif.html will cause your server to look in the subdirectory /web/maps for a file 'calif.html'.
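
The mapping from a URL path to a file under the web directory can be sketched as follows. This is an illustration rather than what any particular server does; the function name resolve_path is hypothetical, and the normalization step is an added precaution so that a request containing '..' cannot point outside the web directory.

```python
import posixpath


def resolve_path(web_root, url_path):
    """Map the path part of a URL to a file path under web_root."""
    # Normalize first so that '..' segments are collapsed before joining.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(web_root, clean.lstrip("/"))


if __name__ == "__main__":
    print(resolve_path("/web", "/maps/calif.html"))
```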

Whether you use the cloud or your own server, you'll probably want to buy a domain name. Domain names are overseen by ICANN-- the Internet Corporation for Assigned Names and Numbers. A number of companies, called registrars, are accredited by ICANN and allow you to obtain and register a domain. One is Yahoo-- check out http://smallbusiness.yahoo.com/domains/.

You can register a domain name for about $10 a year. The hard part is finding a domain name that nobody has claimed. It's a bit like a real estate land grab, and buying and selling domain names can be a very lucrative business. Check out this CNN article describing the art of domaining and its Godfather, Yun Ye: http://money.cnn.com/magazines/business2/business2_archive/2005/12/01/8364591/