Internet Application Design and Development
1.1 Requirements
1.2 Installing and Running
1.3 Modifying Server Settings
2.1 Supported HTTP Request Methods
2.2 CGI
2.3 Access Control
2.4 Authentication
2.5 Mapping URLs to the file system by Alias
2.6 Logfile
2.7 Support for Client-side Caching
2.8 Architecture
2.9 Process Flow
3.1 Known Issues & Future Works
Introduction
The goal of this project was to understand how a web server works by implementing one. Within the authentication process, I added support for the encrypted password file created by Apache’s tool, and I also implemented three additional methods (DELETE, OPTIONS, TRACE).
1.1 Requirements
- Java JDK 1.5
- Windows / Linux / Unix
- Firefox 2.0.0.2
- Perl and Python (for CGI scripts)
1.2 Installing and Running
Connect to thecity.sfsu.edu via ssh and log in.
Download the tar file.
Decompress the contents of the Webserver.tar file to your desired location.
Place the htdocs directory anywhere on the city server under your user account, and make sure its path is set in the httpd.conf file as the document root directory.
To change the path to the document root, modify the value of “DocumentRoot” in the httpd.conf configuration file.
Change directory to the location of the Webserver.
In the shell, type ./Run_httpServer.sh
Your server is now running on port 8975.
To see a test page, open a browser window and go to
http://thecity.sfsu.edu:8975
You should see the contents of the test page in your browser.
1.3 Modifying Server Settings
The server loads several configuration files at start up.
httpd.conf is the most important file for defining the server's behavior. The supported settings are as follows:
Name                  Description
Listen                The port the server runs on
ServerRoot            The server's root
DocumentRoot          The server’s document root
LogFile               The location of the log file
DefaultType           The default mime type
DirectoryIndex        The default file to show
SendBufferSize        The size of the server's buffer
ScriptAlias           The script directory is defined here
Alias                 The alias of a directory
Directory directives  The authentication settings go here
KeepAlive             Persistent connection
To change these settings, first make a backup of the httpd.conf file.
Open the file in vi and make the appropriate changes.
The mime.types file contains information about the mime types known to the server.
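As an illustration of how "Name value" settings like the ones above could be read into a hashtable, here is a minimal sketch. It is not the project's HttpdConf class: the class name ConfSketch and the sample DocumentRoot path are hypothetical, and the sketch only handles single-value directives (not Alias, ScriptAlias, or Directory blocks).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the project's HttpdConf class.
class ConfSketch {
    /** Parses "Name value" lines; blank lines and '#' comments are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // directive without a value: ignored in this sketch
            }
            String value = parts[1].trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            settings.put(parts[0], value);
        }
        return settings;
    }

    public static void main(String[] args) {
        // Sample content; the DocumentRoot path is made up for illustration.
        String sample = "# sample settings\n"
                + "Listen 8975\n"
                + "DocumentRoot \"/home/students/username/htdocs\"\n"
                + "KeepAlive on\n";
        System.out.println(parse(sample));
    }
}
```

Keeping the settings in a map means a new single-value directive needs no parser change, only a lookup by name where it is used.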
2.1 Supported HTTP Request Methods
The server supports the following HTTP request methods, which are required for this project:
GET
HEAD
POST
PUT
Additionally, I’ve implemented the following methods:
DELETE
OPTIONS
TRACE
2.2 CGI
To make CGI programs work properly, we configured our server to permit CGI execution. We did this by adding a ScriptAlias directive to the httpd.conf file, which tells the server that a specific directory is set aside for CGI execution.
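The sketch below shows one way a server could decide that a request targets the ScriptAlias directory and prepare a few CGI environment variables for the script. It is an assumption-laden illustration, not the project's CGIHandler or Environment class: the class name CgiEnvSketch, the "/cgi-bin/" prefix, and the script name test.pl are hypothetical. The variable names themselves come from the CGI/1.1 specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the project's Environment and CGIHandler classes may differ.
class CgiEnvSketch {
    /** True when the URL path falls under the ScriptAlias prefix. */
    static boolean isScript(String urlPath, String scriptAliasPrefix) {
        return urlPath.startsWith(scriptAliasPrefix);
    }

    /** Builds a few CGI/1.1 meta-variables from the parts of the request line. */
    static Map<String, String> environment(String method, String requestUri, String protocol) {
        int q = requestUri.indexOf('?');
        String scriptName = q < 0 ? requestUri : requestUri.substring(0, q);
        String query = q < 0 ? "" : requestUri.substring(q + 1);
        Map<String, String> env = new LinkedHashMap<String, String>();
        env.put("GATEWAY_INTERFACE", "CGI/1.1");
        env.put("SERVER_PROTOCOL", protocol);
        env.put("REQUEST_METHOD", method);
        env.put("SCRIPT_NAME", scriptName);
        env.put("QUERY_STRING", query);
        return env;
    }

    public static void main(String[] args) {
        String uri = "/cgi-bin/test.pl?name=value"; // made-up request
        if (isScript(uri, "/cgi-bin/")) {
            System.out.println(environment("GET", uri, "HTTP/1.1"));
        }
    }
}
```

A map like this can be copied into ProcessBuilder.environment() before the script process is started.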
2.3 Access Control
The server provides basic access control when it is running on a UNIX-based system. When a client requests a file or a directory, we check the permissions of that file or directory. If read permission is not set, the server returns a “403 Forbidden” response. If the file is not found, the server returns “404 Not Found”. To protect content with a username and password, see 2.4 Authentication.
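The permission check described above can be sketched with java.io.File. This is a simplified illustration under the assumption that existence and read permission are the only two checks; the class name AccessCheckSketch is hypothetical and the project's actual code may be organized differently.

```java
import java.io.File;

// Hypothetical sketch of the 403/404 decision.
class AccessCheckSketch {
    /** Returns the HTTP status code for a resolved file system target. */
    static int statusFor(File target) {
        if (!target.exists()) {
            return 404; // Not Found
        }
        if (!target.canRead()) {
            return 403; // Forbidden: read permission is not set
        }
        return 200; // OK: the content can be served
    }

    public static void main(String[] args) {
        File target = new File(args.length > 0 ? args[0] : ".");
        System.out.println(target + " -> " + statusFor(target));
    }
}
```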
2.4 Authentication
At startup, the server loads all the information it needs for authentication. When a request arrives, the server determines whether the requested path requires authentication by checking its internally stored authentication data. If it does, the server responds with a “WWW-Authenticate” header to request credentials. After the client resends the request with credentials, the username and password are checked against the password file specified by the Directory’s AuthUserFile. If the username and password combination is not correct, we return “401 Unauthorized”.
The username and password file should be created with Apache’s tool, which hashes the password with a variant of MD5 slightly modified by Apache. Since MD5 is one-way, there is no easy way to recover the plain text from the stored value. Instead, our server hashes the submitted password the same way the Apache server does and compares the result with the stored value.
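To illustrate the first half of this check, the sketch below extracts the username and password from a Basic "Authorization" request header. It is a sketch under stated assumptions: it uses java.util.Base64, which requires Java 8 or later, whereas this project targets JDK 1.5 and uses a separate Base64 class; the class name BasicAuthSketch is hypothetical. Comparing the password against the Apache-style hash in the password file is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; requires Java 8+ for java.util.Base64.
class BasicAuthSketch {
    /** Returns {username, password}, or null if the header is not valid Basic credentials. */
    static String[] decode(String authorizationHeader) {
        String prefix = "Basic ";
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return null;
        }
        String encoded = authorizationHeader.substring(prefix.length()).trim();
        byte[] raw;
        try {
            raw = Base64.getDecoder().decode(encoded);
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64
        }
        // The charset is an assumption of this sketch.
        String pair = new String(raw, StandardCharsets.UTF_8);
        int colon = pair.indexOf(':');
        if (colon < 0) {
            return null; // credentials must look like "username:password"
        }
        return new String[] { pair.substring(0, colon), pair.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] credentials = decode("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(credentials[0] + " / " + credentials[1]);
    }
}
```

When decode returns null, or the later password comparison fails, the server would answer with “401 Unauthorized”.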
2.5 Mapping URLs to the file system by Alias
When a client makes a request, our server decides which file to serve by taking the URL path and appending it to the DocumentRoot path specified in the httpd.conf file. The server also checks Alias directives, which map a URL path prefix to a directory elsewhere in the file system. For example, if httpd.conf contains
Alias /da1/ “/home/students/username/htdocs/ddir1/”
and a request is made with the following URL:
http://thecity.sfsu.edu:8975/da1/index.html
the server will look for the index.html file in the /home/students/username/htdocs/ddir1 directory.
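The mapping above can be sketched as a prefix lookup that falls back to DocumentRoot. This is a minimal illustration, assuming aliases are checked before the document root and that the first matching prefix wins; the class name AliasSketch is hypothetical, and the sketch does not guard against ".." segments in the URL path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of URL-to-file mapping with Alias support.
class AliasSketch {
    /** Maps a URL path to a file system path using aliases, else DocumentRoot. */
    static String resolve(String urlPath, String documentRoot, Map<String, String> aliases) {
        for (Map.Entry<String, String> alias : aliases.entrySet()) {
            if (urlPath.startsWith(alias.getKey())) {
                String rest = urlPath.substring(alias.getKey().length());
                return join(alias.getValue(), rest);
            }
        }
        return join(documentRoot, urlPath);
    }

    /** Joins a directory and a relative part with exactly one slash between them. */
    private static String join(String dir, String rest) {
        String left = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        String right = rest.startsWith("/") ? rest.substring(1) : rest;
        return left + "/" + right;
    }

    public static void main(String[] args) {
        Map<String, String> aliases = new LinkedHashMap<String, String>();
        aliases.put("/da1/", "/home/students/username/htdocs/ddir1/");
        String root = "/home/students/username/htdocs";
        System.out.println(resolve("/da1/index.html", root, aliases));
        System.out.println(resolve("/index.html", root, aliases));
    }
}
```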
2.6 Logfile
Every time a request is made, the server writes a log entry to a file whose location is specified in the httpd.conf file. Each entry contains basic information about the request and the server's response, along with the date and the IP address of the requester.
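The exact layout of a log entry is not specified here, so the following is only a sketch of one possible line format that carries the fields mentioned above (IP address, date, request line, response status). The class name LogEntrySketch and the layout itself are assumptions, not the project's AccessLog output.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; the real AccessLog format may differ.
class LogEntrySketch {
    /** Formats one log line: IP, date in GMT, quoted request line, status code. */
    static String format(String clientIp, Date when, String requestLine, int status) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return clientIp + " [" + fmt.format(when) + "] \"" + requestLine + "\" " + status;
    }

    public static void main(String[] args) {
        System.out.println(format("127.0.0.1", new Date(), "GET /index.html HTTP/1.1", 200));
    }
}
```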
2.7 Support for Client-side Caching
To enable client-side caching, the server sends a Last-Modified date for the file with every OK (200) response. The server looks for an If-Modified-Since header in the request and checks the file's modification date against the received date. If the file has not been modified since that date, the server returns a 304 status; otherwise a 200 status is returned with the contents.
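The comparison can be sketched as follows. This is a simplified illustration, not the project's DateUtil-based code: it uses SimpleDateFormat, accepts only the RFC 1123 date form (for example "Thu, 01 Jan 1970 00:00:10 GMT"), and treats an unparseable header as absent. The class name ConditionalGetSketch is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the If-Modified-Since decision.
class ConditionalGetSketch {
    /** Returns 304 if the file is unchanged since the given date, else 200. */
    static int statusFor(long lastModifiedMillis, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // no conditional header: send the contents
        }
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            long since = fmt.parse(ifModifiedSince).getTime();
            // HTTP dates have one-second resolution, so drop the milliseconds.
            long modified = (lastModifiedMillis / 1000L) * 1000L;
            return modified <= since ? 304 : 200;
        } catch (ParseException e) {
            return 200; // unparseable date: ignore the header
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor(5000L, "Thu, 01 Jan 1970 00:00:10 GMT"));
        System.out.println(statusFor(20000L, "Thu, 01 Jan 1970 00:00:10 GMT"));
    }
}
```

The lastModifiedMillis argument would typically come from File.lastModified() on the requested file.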
2.8 Architecture
I designed the architecture myself. It has three packages: server, method, and utility.
The classes in the method package are loaded by Class.forName(), so to support an additional method we only need to add the new method class. The other classes need no modification at all, which makes the software easier to maintain. (See the Method Diagram at the end.)
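The idea of loading a method class by name can be sketched as below. This is an illustration only: the Handler interface, the nested GET and HEAD classes, and the class name DispatchSketch are hypothetical stand-ins for the project's method package and Base class. ReflectiveOperationException requires Java 7 or later.

```java
// Hypothetical sketch of name-based dispatch with Class.forName().
class DispatchSketch {
    /** Stand-in for the common base type of all method classes. */
    interface Handler {
        String handle(String path);
    }

    static class GET implements Handler {
        public String handle(String path) { return "GET " + path; }
    }

    static class HEAD implements Handler {
        public String handle(String path) { return "HEAD " + path; }
    }

    /** Returns a handler for the HTTP method name, or null if none exists. */
    static Handler forMethod(String method) {
        // Only plain upper-case tokens are accepted, so a request cannot
        // name an arbitrary class to load.
        if (method == null || !method.matches("[A-Z]+")) {
            return null;
        }
        try {
            // Nested classes have binary names of the form Outer$Inner.
            Class<?> cls = Class.forName(DispatchSketch.class.getName() + "$" + method);
            if (!Handler.class.isAssignableFrom(cls)) {
                return null;
            }
            return (Handler) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null; // unknown method or class that cannot be instantiated
        }
    }

    public static void main(String[] args) {
        System.out.println(forMethod("GET").handle("/index.html"));
        System.out.println(forMethod("BREW")); // no such class in this sketch
    }
}
```

With this design, supporting another method only means adding one more class with the matching name; forMethod itself stays unchanged.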
Server Package (./server)
AccessLog               Writes the log output
Authentication          Determines whether authentication is required and processes it
CGIHandler              Processes CGI requests
CGITRuntimeThread       Handles the thread of CGIHandler
Const                   Defines all constants
Directory               Corresponds to the DIRECTORY tag
DirectoryContainer      Contains multiple Directory instances
Environment             Prepares and creates environment variables
HttpdConf               Loads httpd.conf into a hashtable at runtime
HttpRequestHandler      Called by SocketHandler
HttpRequestThreadGroup  Thread group of HttpRequestHandler
HttpServer              Calls SocketHandler
KeepAlive               Used for persistent connections
MimeTypes               Loads mime.types into a hashtable at runtime
SocketHandler           Calls multiple HttpRequestHandler instances
Method Package (./method)
Base          Base class of all methods (GET, POST…)
DELETE        Compliant with RFC2616’s delete method
GET           Compliant with RFC2616’s get method
HEAD          Compliant with RFC2616’s head method
OPTIONS       Compliant with RFC2616’s options method
POST          Compliant with RFC2616’s post method
PUT           Compliant with RFC2616’s put method
TRACE         Compliant with RFC2616’s trace method
ResponseBase  Base class of ResponseXXX
Response400   Compliant with RFC2616’s “400 Bad Request”
Response403   Compliant with RFC2616’s “403 Forbidden”
Response404   Compliant with RFC2616’s “404 Not Found”
Utility Package (./utility)
Base64 (*1)              Used to decode username and password
ByteArray                Used to store data as a byte array
DateUtil (*2)            Used to compare dates for If-Modified-Since
DateParseException (*3)  Used by DateUtil
Debug                    Used for debugging purposes
MD5 (*4)                 Used to encrypt username and password
MD5Crypt (*5)            Used to encrypt username and password
Code that I borrowed:
*1: Base64 Encoder and Decoder at http://iharder.net/base64
*2&3: Date conversion from Apache’s Jakarta project
http://apache.edgescape.com/jakarta/commons/httpclient/binary/commons-httpclient-3.0.1.tar.gz
*4&5: MD5 encryption related from The University of Texas at Austin
ftp://ftp.arlut.utexas.edu/pub/ganymede/1.0.12/ganymede-1.0.12.tar.gz
2.9 Process Flow
When the server starts, HttpServer instantiates SocketHandler as a separate thread.
SocketHandler loads all configuration files: httpd.conf, mime.types, and the authentication file if necessary.
SocketHandler tries to open a socket to communicate with the client.
When SocketHandler opens a socket, it instantiates an HttpRequestHandler in its own thread, so multiple requests can be handled at once.
If HttpRequestHandler finds that the headers are syntactically valid, it reads all headers into buffers. If the client sends body content, the server reads it all into an instance of ByteArray.
When HttpRequestHandler detects that the input stream is empty, it instantiates the appropriate method class in its own thread.
HttpRequestHandler chooses the method class by reading the first line of the buffer; the actual instantiation is done by Class.forName().
All method classes share a base class, Base, which does the following:
The Base class handles authentication if the requested path is set in a DIRECTORY directive in httpd.conf.
The Base class handles simple client-side caching support if the If-Modified-Since header exists.
The Base class handles CGI by instantiating CGIHandler if the requested path is script aliased.
Otherwise, the Base class responds to the client by sending back the contents of the requested path.
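The header-reading step in the flow above can be sketched as a small parser for the request line and headers. This is a hypothetical illustration, not the project's HttpRequestHandler: the class name RequestHeadSketch is made up, the input is assumed to be the already-buffered text up to the blank line, and header names are lower-cased for lookup as a choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of parsing the request line and headers.
class RequestHeadSketch {
    final String method;
    final String path;
    final String version;
    final Map<String, String> headers = new LinkedHashMap<String, String>();

    private RequestHeadSketch(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    /** Returns the parsed head, or null when the syntax is not legal. */
    static RequestHeadSketch parse(String head) {
        String[] lines = head.split("\r\n");
        if (lines.length == 0) {
            return null;
        }
        String[] first = lines[0].split(" ");
        if (first.length != 3 || !first[2].startsWith("HTTP/")) {
            return null; // malformed request line
        }
        RequestHeadSketch request = new RequestHeadSketch(first[0], first[1], first[2]);
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // a blank line ends the header block
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                return null; // header without a name
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            request.headers.put(name, lines[i].substring(colon + 1).trim());
        }
        return request;
    }

    public static void main(String[] args) {
        RequestHeadSketch r = parse("GET /index.html HTTP/1.1\r\nHost: thecity.sfsu.edu:8975\r\n\r\n");
        System.out.println(r.method + " " + r.path + " " + r.headers);
    }
}
```

A null result corresponds to the case where the server would answer with “400 Bad Request”; otherwise the method field is the name used to pick the method class.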
3.1 Known Issues & Future Works
Persistent connections, enabled by setting KeepAlive on in httpd.conf, do not fully work except for image files. I think this is due to an architectural problem with reusing sockets.
Security features are not fully implemented.
Only “Basic” authentication is implemented.