Internet Application Design and Development

Webserver Project
Table of Contents

Introduction

    1.1 Requirements

    1.2 Installing and Running

    1.3 Modifying Server Settings

2.1 Supported HTTP Request Methods

2.2 CGI

2.3 Access Control

2.4 Authentication

2.5 Mapping URLs to the file system by Alias

2.6 Logfile

2.7 Support for Client-side Caching

2.8 Architecture

2.9 Process Flow

3.1 Known Issues & Future Works

Method Diagrams


Introduction
The goal of this project was to understand how a web server works by implementing one. For authentication, I added support for the encrypted password file created by Apache’s tool, and I also implemented additional methods (DELETE, OPTIONS, TRACE).


1.1 Requirements

- Java JDK 1.5

- Windows / Linux / Unix

- Firefox 2.0.0.2

- Perl and Python (for CGI scripts)


1.2 Installing and Running

  1. Connect to thecity.sfsu.edu via ssh and log in.

  2. Download the Webserver.tar file.

  3. Extract the contents of the Webserver.tar file to your desired location.

  4. Place the htdocs directory anywhere on thecity.sfsu.edu under your user account, and make sure its path is set in the httpd.conf file as the document root directory.

    1. To change the path to the document root, modify the value of “DocumentRoot” in the httpd.conf configuration file.

  5. Change to the Webserver directory.

  6. In the shell, type ./Run_httpServer.sh

  7. The server is now running on port 8975.

  8. To see a test page, open a browser window and go to http://thecity.sfsu.edu:8975
    You should see the contents of the test page in your browser.

1.3 Modifying Server Settings

The server loads several configuration files at startup. httpd.conf is the most important file for defining the server's behavior. The supported settings are as follows:

    Name                  Description
    Listen                The port the server listens on
    ServerRoot            The server's root directory
    DocumentRoot          The server's document root
    LogFile               The location of the log file
    DefaultType           The default MIME type
    DirectoryIndex        The default file to show for a directory
    SendBufferSize        The size of the server's send buffer
    ScriptAlias           The directory set aside for CGI scripts
    Alias                 An alias that maps a URL path to a directory
    Directory directives  The authentication settings for a directory
    KeepAlive             Turns persistent connections on or off

  1. To change these settings, first make a backup of the httpd.conf file.

  2. Open the file in vi and make the appropriate changes.

  3. The mime.types file contains the MIME types known to the server.
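
To illustrate how settings like these could be loaded into a hashtable at startup, here is a minimal sketch. The class name HttpdConfSketch and the assumed file syntax (one "Name value" pair per line, "#" for comments, optional double quotes around the value) are my assumptions for illustration and are not taken from the project's HttpdConf class. Directives that may appear more than once, such as Alias, ScriptAlias, and Directory blocks, would need extra handling that this sketch leaves out.

```java
import java.util.Hashtable;

final class HttpdConfSketch {
    // Parses "Name value" lines into a hashtable.
    // Blank lines and lines starting with '#' are skipped (assumed syntax).
    static Hashtable<String, String> parse(String text) {
        Hashtable<String, String> settings = new Hashtable<String, String>();
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split into the directive name and the rest of the line.
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // directive without a value; ignored in this sketch
            }
            String value = parts[1].trim();
            // Remove one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            // A repeated directive overwrites the earlier value.
            settings.put(parts[0], value);
        }
        return settings;
    }
}
```

With this sketch, a file containing the line "Listen 8975" would yield the value "8975" under the key "Listen".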

2.1 Supported HTTP Request Methods

The server supports the following HTTP request methods, which are required for this project:

GET

HEAD

POST

PUT

Additionally, I’ve implemented the following methods:

DELETE

OPTIONS

TRACE

2.2 CGI

To make CGI programs work properly, the server is configured to permit CGI execution. This is done by adding a ScriptAlias directive to the httpd.conf file, which tells the server that a specific directory is set aside for CGI execution.
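
A CGI script receives the details of the request through environment variables. The following is a minimal sketch of how such a set of variables could be prepared. The class name CgiEnvironmentSketch and the selection of variables are my assumptions for illustration; the variable names themselves are standard CGI/1.1 names, but the project's Environment class may build a different set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CgiEnvironmentSketch {
    // Builds a small subset of the CGI/1.1 environment for one request.
    static Map<String, String> build(String method, String requestTarget, int port) {
        String path = requestTarget;
        String query = "";
        // Everything after the first '?' is the query string.
        int q = requestTarget.indexOf('?');
        if (q >= 0) {
            path = requestTarget.substring(0, q);
            query = requestTarget.substring(q + 1);
        }
        Map<String, String> env = new LinkedHashMap<String, String>();
        env.put("GATEWAY_INTERFACE", "CGI/1.1");
        env.put("REQUEST_METHOD", method);
        env.put("SCRIPT_NAME", path);
        env.put("QUERY_STRING", query);
        env.put("SERVER_PORT", Integer.toString(port));
        env.put("SERVER_PROTOCOL", "HTTP/1.1");
        return env;
    }
}
```

One way to use the result would be to copy the map into ProcessBuilder.environment() before starting the script process.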

2.3 Access Control

The server provides basic access control when it runs on a UNIX-based system. When a client requests a file or a directory, the server checks the permissions of that file or directory. If read permission is not set, the server returns “403 Forbidden”. If the file is not found, the server returns “404 Not Found”. See section 2.4 (Authentication) for protecting content with a username and password.
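
The check described above can be sketched with java.io.File. The class name AccessCheckSketch is hypothetical, and the order of the two tests is my assumption; this is only an illustration of the idea, not the project's actual code.

```java
import java.io.File;

final class AccessCheckSketch {
    // Maps the state of the requested file or directory to an HTTP status code.
    static int statusFor(File target) {
        if (!target.exists()) {
            return 404; // Not Found
        }
        if (!target.canRead()) {
            return 403; // Forbidden: read permission is not set
        }
        return 200; // OK: the contents may be served
    }
}
```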


2.4 Authentication

At startup, the server loads all the information it needs for authentication. When a request arrives, the server determines whether the requested path requires authentication by checking the internally stored authentication data. If it does, the server sends a response with the “WWW-Authenticate” header to request credentials. After the client responds with its credentials, the username and password are checked against the password file specified in the Directory’s AuthUserFile. If the username and password combination is not correct, the server returns “401 Unauthorized”.

The username and password file should be created with Apache’s tool. The password is hashed with MD5, in a variant slightly modified by Apache. Since MD5 is one-way, there is no easy way to turn the stored value back into plain text. Instead, our server hashes the submitted password the same way the Apache server does and compares the result with the stored value.
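
For the “Basic” scheme, the client sends the username and password as a Base64-encoded "username:password" string in the Authorization header. Below is a minimal sketch of decoding that header. It uses java.util.Base64, which requires Java 8 or later, as a stand-in for the borrowed Base64 class the project uses on JDK 1.5; the class name BasicAuthSketch is hypothetical. Checking the password against the MD5-based entry in the password file is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    // Returns {username, password}, or null if the value is not a usable
    // "Basic <base64>" credential.
    static String[] parse(String headerValue) {
        // Compare the scheme name case-insensitively.
        if (headerValue == null || !headerValue.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        String decoded;
        try {
            byte[] raw = Base64.getDecoder().decode(headerValue.substring(6).trim());
            decoded = new String(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64
        }
        // The username ends at the first colon; the password may contain colons.
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return null;
        }
        return new String[] { decoded.substring(0, colon), decoded.substring(colon + 1) };
    }
}
```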

2.5 Mapping URLs to the file system by Alias

When a client makes a request, the server must decide which file to serve. By default, it appends the URL path to the DocumentRoot path specified in the httpd.conf file. The server also checks the Alias directives, which map a path in the URL to a directory in the file system. For example, suppose httpd.conf contains the following:

Alias /da1/ "/home/students/username/htdocs/ddir1/"

and a request is made with the URL http://thecity.sfsu.edu:8975/da1/index.html

The server will then look for the index.html file in the /home/students/username/htdocs/ddir1 directory.
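
The mapping can be sketched as a prefix lookup that falls back to the document root. The class name AliasResolverSketch, the method names, and the rule that the first matching alias wins are my assumptions for illustration. The sketch also does not guard against ".." segments in the URL path, which a real server would have to reject.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AliasResolverSketch {
    private final String documentRoot;
    // URL prefix -> file system directory, kept in the order the aliases were added.
    private final Map<String, String> aliases = new LinkedHashMap<String, String>();

    AliasResolverSketch(String documentRoot) {
        this.documentRoot = documentRoot;
    }

    void addAlias(String urlPrefix, String directory) {
        aliases.put(urlPrefix, directory);
    }

    // Returns the file system path for a URL path.
    String resolve(String urlPath) {
        for (Map.Entry<String, String> alias : aliases.entrySet()) {
            if (urlPath.startsWith(alias.getKey())) {
                // Replace the alias prefix with the mapped directory.
                return alias.getValue() + urlPath.substring(alias.getKey().length());
            }
        }
        // No alias matched: append the URL path to the document root.
        return documentRoot + urlPath;
    }
}
```

With the alias from the example above, resolve("/da1/index.html") returns /home/students/username/htdocs/ddir1/index.html.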

2.6 Logfile

Every time a request is made, the server writes a log entry to a file whose location is specified in the httpd.conf file. Each entry contains basic information about the request and the server’s response, along with the date and the IP address of the requester.
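
As an illustration only, one log entry could be formatted as below. The exact layout of the project's log file is not described here, so the field order, the date pattern, and the class name AccessLogSketch are all assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class AccessLogSketch {
    // Formats one entry: client IP, date, request line, and response status.
    static String format(String clientIp, Date when, String requestLine, int status) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return clientIp + " [" + fmt.format(when) + "] \"" + requestLine + "\" " + status;
    }
}
```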

2.7 Support for Client-side Caching

To enable client-side caching, the server sends the file’s Last-Modified date with every OK (200) response. The server looks for an If-Modified-Since header in the request and checks the file’s modification date against the received date. If the file has not been modified, the server returns a 304 status; otherwise, it returns a 200 status with the contents.
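
The comparison can be sketched as follows. This version uses SimpleDateFormat with the usual HTTP date pattern as a stand-in for the borrowed DateUtil class; the class name ConditionalGetSketch is hypothetical, and treating an unparsable header as "send the full response" is my assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class ConditionalGetSketch {
    // HTTP dates are expressed in GMT. A new formatter is created per call
    // because SimpleDateFormat is not thread-safe.
    private static SimpleDateFormat httpDate() {
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt;
    }

    // Value for the Last-Modified response header.
    static String lastModifiedHeader(long lastModifiedMillis) {
        return httpDate().format(new Date(lastModifiedMillis));
    }

    // Returns 304 if the file has not changed since the given date, otherwise 200.
    static int statusFor(long lastModifiedMillis, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // no conditional header was sent
        }
        try {
            long since = httpDate().parse(ifModifiedSince).getTime();
            // HTTP dates have one-second resolution, so compare whole seconds.
            return (lastModifiedMillis / 1000L) <= (since / 1000L) ? 304 : 200;
        } catch (ParseException e) {
            return 200; // unparsable date: send the full response
        }
    }
}
```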

2.8 Architecture

I designed the architecture myself. It has three packages: server, method, and utility.

The classes in the method package are loaded by Class.forName(). To support an additional method, we only need to add the new method class; the other classes do not need any modification, which makes the software easier to maintain. (See the Method Diagram at the end.)
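
The idea of loading a method class by name can be sketched as below. To keep the example in one file, the method classes are nested inside a hypothetical MethodDispatchSketch class, and the Base class here is a simplified stand-in with an invented handle() method; in the project, the method classes live in their own package. The sketch also uses ReflectiveOperationException, which requires Java 7 or later.

```java
final class MethodDispatchSketch {
    // Simplified stand-in for the project's Base class.
    public abstract static class Base {
        public abstract String handle(String path);
    }

    public static class GET extends Base {
        public String handle(String path) {
            return "GET " + path;
        }
    }

    public static class HEAD extends Base {
        public String handle(String path) {
            return "HEAD " + path;
        }
    }

    // Returns a handler for the method name, or null if no usable class exists.
    static Base forMethod(String methodName) {
        try {
            // Nested classes have the binary name Outer$Inner.
            String className = MethodDispatchSketch.class.getName() + "$" + methodName;
            Class<?> type = Class.forName(className);
            return (Base) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null; // unknown method, or the class cannot be instantiated
        } catch (ClassCastException e) {
            return null; // the class exists but is not a method class
        }
    }
}
```

Adding support for another method then only means adding one more subclass of Base; forMethod() itself stays unchanged.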

Server Package (./server)

    AccessLog               Writes the log output
    Authentication          Determines whether authentication is required and processes it
    CGIHandler              Processes CGI requests
    CGITRuntimeThread       Handles the thread of CGIHandler
    Const                   Defines all constants
    Directory               Corresponds to the DIRECTORY tag
    DirectoryContainer      Contains multiple Directory instances
    Environment             Prepares and creates environment variables
    HttpdConf               Loads httpd.conf into a hashtable at runtime
    HttpRequestHandler      Called by SocketHandler
    HttpRequestThreadGroup  Thread group of HttpRequestHandler
    HttpServer              Calls SocketHandler
    KeepAlive               Used for persistent connections
    MimeTypes               Loads mime.types into a hashtable at runtime
    SocketHandler           Calls multiple HttpRequestHandler instances

Method Package (./method)

    Base          Base class of all methods (GET, POST…)
    DELETE        Compliant with RFC 2616’s DELETE method
    GET           Compliant with RFC 2616’s GET method
    HEAD          Compliant with RFC 2616’s HEAD method
    OPTIONS       Compliant with RFC 2616’s OPTIONS method
    POST          Compliant with RFC 2616’s POST method
    PUT           Compliant with RFC 2616’s PUT method
    TRACE         Compliant with RFC 2616’s TRACE method
    ResponseBase  Base class of ResponseXXX
    Response400   Compliant with RFC 2616’s “400 Bad Request”
    Response403   Compliant with RFC 2616’s “403 Forbidden”
    Response404   Compliant with RFC 2616’s “404 Not Found”

Utility Package (./utility)

    Base64 (*1)              Used to decode the username and password
    ByteArray                Used to store data as a byte array
    DateUtil (*2)            Used to compare dates for If-Modified-Since
    DateParseException (*3)  Used by DateUtil
    Debug                    Used for debugging purposes
    MD5 (*4)                 Used to encrypt the username and password
    MD5Crypt (*5)            Used to encrypt the username and password

Code that I borrowed

*1: Base64 Encoder and Decoder from http://iharder.net/base64

*2&3: Date conversion from Apache’s Jakarta project
http://apache.edgescape.com/jakarta/commons/httpclient/binary/commons-httpclient-3.0.1.tar.gz

*4&5: MD5 encryption code from The University of Texas at Austin
ftp://ftp.arlut.utexas.edu/pub/ganymede/1.0.12/ganymede-1.0.12.tar.gz

2.9 Process Flow

  1. When the server starts, HttpServer instantiates SocketHandler as a separate thread.

  2. SocketHandler loads all configuration files: httpd.conf, mime.types, and the authentication file if necessary.

  3. SocketHandler tries to open a socket to communicate with the client.

  4. If SocketHandler opens a socket, it instantiates HttpRequestHandler instances as multiple threads.

  5. If HttpRequestHandler finds that the headers have legal syntax, it reads all headers into buffers. If the client sends body contents, the server reads all of them into an instance of ByteArray.

  6. When HttpRequestHandler detects that the input stream is empty, it instantiates the appropriate method class as a separate thread.

  7. HttpRequestHandler chooses the appropriate method class by reading the first line of the buffer; the actual instantiation is done by Class.forName().

  8. All method classes share a base class, Base, which handles the steps below.

  9. The Base class handles authentication if the requested path is set in a DIRECTORY directive in httpd.conf.

  10. The Base class handles simple client-side caching if the If-Modified-Since header exists.

  11. The Base class handles CGI by instantiating CGIHandler if the requested path is script-aliased.

  12. Otherwise, the Base class responds to the client by sending back the contents of the requested path.
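
Step 7 depends on the first line of the request, which has the form "METHOD target HTTP-version". A minimal sketch of splitting that line is shown below; the class name RequestLineSketch and its fields are hypothetical and are not taken from HttpRequestHandler.

```java
final class RequestLineSketch {
    final String method;   // e.g. "GET"; this is the name used to pick the method class
    final String target;   // e.g. "/index.html"
    final String version;  // e.g. "HTTP/1.1"

    private RequestLineSketch(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Returns the parsed request line, or null if the syntax is not legal.
    static RequestLineSketch parse(String line) {
        if (line == null) {
            return null;
        }
        // A request line has exactly three parts separated by single spaces.
        String[] parts = line.trim().split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return null;
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }
}
```

A null result here would correspond to answering with “400 Bad Request”.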


3.1 Known Issues & Future Works

  1. Persistent connections, enabled by setting KeepAlive on in httpd.conf, do not work reliably except for image files. I think this is due to an architectural problem with reusing sockets.

  2. Security features are not fully implemented.

  3. Only “Basic” authentication is implemented.