To understand how the script updates the page, you need to know a few things about how a browser creates what you see on the display.
What you actually see on the screen is generated by a layout engine -
WebKit includes a very popular layout engine, used in Safari; Firefox uses its own layout engine, Gecko. Conceptually, the layout engine takes the DOM and makes calls to the underlying GUI to render the display - in practice, the layout, parsing, and downloading are performed concurrently to improve the user experience; refer to Section 6.3.1 in this
document.
Scripts incrementally update the DOM on page load/exit, periodically, or on events such as mouse motion, mouse clicks, keystrokes, etc. (Here is a
link to a comprehensive list of events). The layout engine then performs incremental layout, re-rendering only the parts of the page affected by the change - this leads to a much better experience for the user, since it's much faster than updating the entire browser window.
The DOM updates could be used to implement
image rollovers,
sortable tables, and
expandable content, which are all performed locally. Javascript can also call out to remote services and update the DOM based on the results; this is how search suggestions and stock quote updates are implemented. (The script may also be used to log browser actions, in which case no updates of the DOM are performed - this is how Google Analytics, Statcounter, and other such services are implemented.)
The scripts call the services with arguments passed in a specific format. The remote services could be written in any language: PHP, Python, and Java are common. I prefer Java, since it has the advantages of a well-designed programming language as opposed to a hacked scripting language, and will focus on services written in Java.
The Javascript computation model has some subtleties: for example, for security reasons, scripts cannot write to local files or access local data structures, and can only communicate with the domain they originated from. (File upload is an HTML feature and Javascript cannot change the contents of the fileupload element.) There are also restrictions on the windows that scripts can open, in terms of size, captions, and notification for closure. The Javascript engine runs within a single thread in the browser and cannot make concurrent calls to Javascript methods: much of what Javascript is used for is UI work, and UI code has to be single threaded, as discussed in Goetz's beautiful text,
Java Concurrency in Practice.
The actual data that the web server uses to populate the dynamic page is usually pulled out of a SQL database, although in principle it could come from anywhere: a flat file, a data structure in RAM, etc. The database offers features such as indexing, locking, backup, etc. Similarly, data coming from the client that needs to be saved by the server is saved in the database.
The software running in the web server is a specific example of what is referred to more generally as middleware. The term arises from the fact that the code sits between client requests and the backend databases.
Middleware separates the raw data from its presentation, and in this way the middleware can customize the same data for different browsers, languages, resolutions, bandwidths, etc. Middleware implements business logic - checking that inputs are well-formed, enforcing access permissions, etc. It can also implement caching and rate control. It can be viewed as an example of the
MVC pattern.
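To make the caching role of middleware concrete, here is a minimal sketch of a least-recently-used cache built on the standard LinkedHashMap class. The class name LruCache and the idea of keying it by request are my own illustration, not something taken from a particular servlet engine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a small in-memory cache that middleware could
// consult before going to the database.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // The third argument (true) orders entries by most recent access
        // instead of by insertion.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after every put; returning true evicts the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

This class is not thread-safe, so in a servlet (where requests arrive on several threads) it would need to be wrapped, for example with Collections.synchronizedMap.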
Although the service could be written in Java from scratch, using the
ServerSocket class to listen to a port, in practice it's better to use a servlet engine, such as Tomcat. The servlet engine handles many generic, mundane activities related to responding to the request. You write a
servlet class that extends
HttpServlet and overrides the method
doGet, which handles GET requests. A configuration file maps each URL path to the servlet class that handles it (this configuration file is called
web.xml and resides in the WEB-INF directory under the root of the web application).
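To get a feel for the mundane work a servlet engine takes off your hands, here is a rough sketch of the from-scratch ServerSocket approach. It assumes Java 7 or later; the class name TinyHttpServer, its methods, and the reply text are all made up for illustration. It only understands simple GET requests, ignores headers, and serves one connection at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical from-scratch server, for illustration only.
class TinyHttpServer {
    // Builds the full HTTP response for a request line such as
    // "GET /hello?qu=adnan HTTP/1.1".
    static String respond(String requestLine) {
        String[] parts = requestLine == null ? new String[0] : requestLine.split(" ");
        if (parts.length != 3) {
            return "HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";
        }
        if (!parts[0].equals("GET")) {
            return "HTTP/1.1 501 Not Implemented\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";
        }
        String body = "You asked for " + parts[1] + "\n";
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "HTTP/1.1 200 OK\r\n"
            + "Content-Type: text/plain; charset=utf-8\r\n"
            + "Content-Length: " + length + "\r\n"
            + "Connection: close\r\n\r\n"
            + body;
    }

    // Accepts a single connection, reads the request, and writes the reply.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            String requestLine = in.readLine();
            String header;
            while ((header = in.readLine()) != null && !header.isEmpty()) {
                // Headers end at the first empty line; this sketch ignores them.
            }
            OutputStream out = client.getOutputStream();
            out.write(respond(requestLine).getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                serveOnce(server);
            }
        }
    }
}
```

Everything missing here (header and parameter parsing, sessions, concurrent requests, mapping paths to handlers) is what a servlet engine such as Tomcat provides.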
The servlet engine manages things like parsing the HTTP header - so if you need the value of the argument
qu, you call
request.getParameter("qu"). (The variable request is passed in to the doGet method.) It also handles session management, e.g., as described here.
Closely related to a servlet is the notion of a JSP, which is essentially HTML with inline Java code.
For example, the Google Servlet Engine takes the URL http://www.google.com/complete/search?hl=en&js=true&qu=adnan and calls out to the servlet complete/search - it passes in the arguments as a hash {(hl,en), (js,true), (qu,adnan)}. Try experimenting with this URL: hl refers to the language (en is English, fr is French), js=true indicates Javascript is enabled in the browser, and qu=adnan is the keyword to generate suggestions for.
The combination of client-side Javascript and server-side servlets is sometimes called AJAX; it can be pictorially illustrated as below:
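The step from URL to hash can be sketched in plain Java. The class QueryParams below is a hypothetical helper of my own, not part of the servlet API; it only illustrates the kind of parsing the engine has already done by the time you call request.getParameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class QueryParams {
    // Splits the query part of a URL into decoded name/value pairs.
    // For a repeated name, the first value is kept.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = decode(eq < 0 ? "" : pair.substring(eq + 1));
            if (!params.containsKey(name)) {
                params.put(name, value);
            }
        }
        return params;
    }

    // Undoes URL encoding such as %20 and '+'.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> p =
            parse("http://www.google.com/complete/search?hl=en&js=true&qu=adnan");
        System.out.println(p); // prints {hl=en, js=true, qu=adnan}
    }
}
```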
Google App Engine
Conceptually, it is straightforward to create a website: all you need is the basic HTML/CSS/Javascript files, a server to serve them up and run the servlets (Tomcat is commonly used), and a database server (MySQL is the popular choice).
Creating a website that works in the real world has a number of challenges:
- Performance: as load on the site increases, responsiveness goes down, leading to a poor user experience
- Security: keeping the software up-to-date with respect to security patches is a challenge - the servlet engine, SQL server, JRE, and underlying OS are all targeted by hackers
- Denial of Service: cutting off repeated login requests & high-computation queries from a single client or a set of clients
- Latency: location of the servers has a major impact on the latency, and ideally you should have servers close to the client serve up content
- Access Control: managing users and passwords
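As an illustration of the Denial of Service item, here is a minimal sketch of a fixed-window rate limiter that a servlet could consult before doing expensive work. The class RateLimiter, its parameters, and the choice of a fixed window are my own assumptions; real deployments typically use something more elaborate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical limiter: at most `limit` requests per client in each
// window of `windowMillis` milliseconds.
class RateLimiter {
    private final int limit;
    private final long windowMillis;
    // Maps a client id to {start of current window, requests seen in it}.
    private final Map<String, long[]> state = new HashMap<String, long[]>();

    RateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request should be served, false if it should be
    // rejected. The caller supplies the time, e.g. System.currentTimeMillis().
    synchronized boolean allow(String clientId, long nowMillis) {
        long[] s = state.get(clientId);
        if (s == null || nowMillis - s[0] >= windowMillis) {
            s = new long[] { nowMillis, 0 };
            state.put(clientId, s);
        }
        if (s[1] >= limit) {
            return false;
        }
        s[1]++;
        return true;
    }
}
```

The client id could be the remote IP address or a login name. Note that this sketch never forgets a client, so a long-running server would also need to purge old entries.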
Fortunately, there is a solution to all these problems: use a hosted web services company. There are many commercial companies: Rackspace and Akamai are examples (Rackspace for dynamic content, i.e., when you need to run a JVM, and Akamai for static content such as images, HTML, scripts, CSS files, etc.). However, a better place to start is Google's App Engine. This has a high
quota of free requests (roughly equivalent to 25% of a machine), and is reasonably priced once you go beyond the free quota (roughly $2 per day for a Core2 class server running your apps 24x7). You can set limits on daily spend.
Furthermore, App Engine will handle bursts of requests - it will give you roughly 10 machines at peak. Issues such as using servers close to the clients, replicating the databases for higher read performance, user login, keeping the machines patched, managing DOS attacks, etc., are all taken care of for you. You have the option of redirecting your own domain name to your App Engine deployment.
There is a very well written
tutorial on App Engine. Note that you can do all your development locally - a local JVM runs the Google version of Tomcat, and you point your browser to a server running on your local machine. You can do all the development from the command line, and don't need to spend a lot of time coming up to speed on Eclipse and its project formats.
Javascript development in practice
Javascript suffers from the usual problems with interpreted languages -
- the lack of a type system makes it hard to catch errors at compile time,
- object-oriented programming is not natural,
- optimization is difficult,
- building nontrivial data structures is very hard
- porting is an issue (not all browsers implement the ECMAScript standard exactly), and
- internationalization is nontrivial.
There are a couple of solutions to this: Javascript libraries and GWT.
Javascript libraries
There's no need to reinvent the wheel - YUI, Google AJAX libraries, Dojo, and Script.aculo.us are examples of libraries of Javascript functions for commonly performed actions. Some of the functions these libraries provide include:
- Popup calendar to make text entry easier
- AJAX packaging of XML data
- Building sortable tables
- Inplace editor
GWT - The Google Web Toolkit
GWT takes a very different approach - GWT is a Java to Javascript compiler, together with a set of Java libraries which can be compiled to Javascript. The use of Java does away with the first four problems listed above. The use of a compiler makes porting the compiler's problem. The libraries included with GWT make it much easier to build Javascript functions for things like getting data ready for RPCs, spriting images (sending a collection of images as a single image to reduce the overhead of separate HTTP requests per image), etc.
The best way to learn GWT is to do this URL tutorial. One of the challenges is getting used to the directory structure - there are specific requirements on the locations of files for a site served up by a servlet engine (see next section). The GWT books I looked at were hopeless - out of date, and many examples didn't even compile. Some of the examples posted on the GWT site are very good.
Key Tools
- Firebug: Firefox plugin that allows you to inspect and manipulate DOM elements
- HttpFox: Firefox plugin that allows you to follow HTTP traffic to/from the browser - works before SSL encryption
- Selenium: Firefox addon used to write and check tests at the browser level
Of course, you also need to know how to use other tools - version control, build, test, debug, edit, bug tracking, etc.
There are several Java library classes that are especially useful when building websites. I've found the following to be useful: URL, Mail, Html Parser, Timer, Junit, and XML. Protocol Buffers can also be helpful for sharing data across servers and languages.
A visit to any large bookstore will demonstrate that there are more books on Java than you could ever hope to read. I've seen many of them, and very few really stand out. The following two are masterpieces that every programmer should have: Effective Java, 2nd Edition and Java Concurrency in Practice. For a Java reference, I like Java Precisely - it's in the Kernighan and Ritchie style, short and to the point (best for people already experienced in another language). For general questions on Java, I like Core Java, Volumes 1 & 2 from Sun.
Strawman Implementation of a Website
Files
My website consists of the following files (these have to be laid out in a specific directory structure, see the
- Content that is statically served up to the client: HTML/CSS and Javascript files
- Dynamically generated content: JSP files
- Servlets that respond to client requests (these are Java class files)
- Java library code (beyond that available in the standard Java libs) that the servlets call out to
- A web.xml file that specifies the mappings from servlet paths to servlet files, & the entry page
- The SQLiteJDBC Java library that allows Java to talk to SQLite
Servers
The website needs a servlet engine, Java compiler and interpreter, a database engine, and of course an OS and hardware to run it on. Here's my setup:
- Intel Q6400 with 4 GB, running 64-bit Debian 5.0
- Sun's Java Run Time Environment, version 1.6, which you can get here (this is the Java bytecode interpreter and Java run time libraries)
- Sun's Java compiler, which you can get here
- The Tomcat servlet engine version 6.0 (this also serves up static content)
- SQLite database engine, version 3.5.9-6 - you can get the latest version here. SQLite is a library - it does not run in daemon mode. It uses the file system for locking, and has surprisingly good performance.
You can download my sample website as a tar file linked
here (low MBytes in size). This consists of a set of files as outlined above. There is a commented Makefile that explains the steps in deploying the code to the Tomcat server - basically compile the servlets into class files, and create a single war file which gets copied into Tomcat's webapps directory. Point your browser to localhost:8080/servletexample - the specific examples in the site are shown
here.
You can browse the files that make up the website
here.
When experimenting from home, I use an ssh tunnel so I can point my browser to localhost:8080 and have that tunnel to the (firewalled) website above. You can tunnel as follows: ssh -N -L 8080:localhost:8080 username@your.domain.com