Google Summer of Code 2007 Project Proposal:

Drupal automated staging toolkit

Allister Beharry

I. Introduction

My project proposal for SoC 2007 is for a Drupal automated staging toolkit. This is a bit of a mouthful for a project name so I'll begin by stating what this multi-syllable contraption is supposed to do:

  1. Provide an easy, fast and automated way for a tester to create a complete Drupal site code-tree running specified versions of core and contrib-module code, with sample content and users populated in the database.

  2. Provide an easy, fast and automated way for a tester to generate a LAMP stack configuration for hosting a Drupal site defined by:

    1. Choice of web server (Apache/Lighttpd/....);

    2. Web server configuration for Drupal (mod_php/FastCGI PHP/reverse proxy caching...);

    3. Choice of database (MySQL/PostgreSQL/...);

    4. Configuration of database (storage engine/client connection parameters...);

    5. PHP/application server configuration (enabled extensions/settings/database interfaces/optimizations - zlib output compression/op-code optimizers/memcached....);

  3. Provide a completely automatic way to stage the generated Drupal site and LAMP stack configuration to a physical server location.

  4. Provide a completely automatic way to generate a virtual machine image containing:

    1. A minimal Linux environment;

    2. The LAMP stack configured and provisioned as in (2);

    3. The Drupal site configured as in (1).
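
As a rough illustration only, the stack choices listed in (2) could be captured in a small data structure that is checked before anything is generated. The sketch below uses Python purely for brevity (the toolkit itself is proposed as PHP scripts), and every name in it (StackSpec, validate, the option sets) is hypothetical rather than part of any existing Drupal tool.

```python
from dataclasses import dataclass, field

# Hypothetical option sets; the proposal lists these choices only as examples.
WEB_SERVERS = {"apache", "lighttpd"}
DATABASES = {"mysql", "postgresql"}
PHP_MODES = {"mod_php", "fastcgi"}


@dataclass
class StackSpec:
    """Illustrative description of the hosting stack for one test site."""
    web_server: str = "apache"
    php_mode: str = "mod_php"
    database: str = "mysql"
    db_options: dict = field(default_factory=dict)      # e.g. storage engine
    php_extensions: list = field(default_factory=list)  # e.g. zlib, memcached

    def validate(self):
        """Return a list of problems; an empty list means the spec is usable."""
        problems = []
        if self.web_server not in WEB_SERVERS:
            problems.append(f"unknown web server: {self.web_server}")
        if self.database not in DATABASES:
            problems.append(f"unknown database: {self.database}")
        if self.php_mode not in PHP_MODES:
            problems.append(f"unknown PHP mode: {self.php_mode}")
        # mod_php is an Apache module, so it cannot be paired with lighttpd.
        if self.web_server == "lighttpd" and self.php_mode == "mod_php":
            problems.append("mod_php requires Apache; use fastcgi with lighttpd")
        return problems
```

A tester could then write something like `StackSpec(web_server="lighttpd", php_mode="fastcgi", database="postgresql")` and have inconsistent combinations rejected before any staging work begins.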


II. Motivation and benefits to Drupal

  1. A key requirement of automated testing of Drupal is the ability to automate creating a Drupal site with a specific configuration. In the case of unit-testing module patches and code, and also for regression testing, the Drupal site must be built with specific versions of core and contrib-module code. There needs to be an automated way for a human tester or an automated testing tool to create a complete Drupal site by simply specifying what code versions he/she/it wants running, using some sort of interface or configuration file. The proposed Drupal testing framework SoC project has this as a desired objective.

  2. For performance and scalability testing, a tester needs to vary web, database, and application server configuration parameters. A tester also needs to be able to quickly specify which Drupal modules will be enabled during the testing, as well as quickly define a number of test users, and the size and nature of the sample content dataset.

  3. Once a Drupal site specification is created, it has to be staged to a physical server location. Testers need a way to automate the repetitive tasks of staging a Drupal site. Earl Miles (merlinofchaos) has already begun work towards this: http://drupal.org/node/52709

  4. A virtual machine image consisting of a minimal Linux environment, a LAMP stack, and Drupal, is the perfect way for rapidly bootstrapping a Drupal site for standard code testing and also for use in an environment for scalability and high performance testing. Automatic generation of a virtual image given a defined Drupal and LAMP configuration would be the next stage of evolution of Drupal testing environments and provide a number of significant benefits in other areas as well.


The automated staging toolkit is designed to be an intrinsic part of an automated unit- and regression-testing environment, by providing testers with a simple, fast way to automatically generate a complete Drupal site running specific code versions and using specific LAMP server configurations. It is also intended for use as part of a performance and scalability testing environment by providing the ability to rapidly build and then benchmark the effects of different application, web and database server configurations on Drupal site performance and scalability. The ability to quickly generate and stage a complete Drupal site will drastically improve the frequency, scope, depth, and sheer developer appreciation of and enthusiasm for the testing activity. This will significantly improve the quality of the Drupal code base and the maturity of the entire development lifecycle.


Time and resources permitting, an additional stage in the toolkit will also use the generated Drupal site and server configuration as input to build a self-contained virtual machine image in Xen, VMWare, or potentially the Amazon EC2 AMI format. This virtual image can also be used in testing environments, including advanced performance testing scenarios such as evaluating clustering, distributed database topologies, and alternative storage and computing models like Amazon S3 and Elastic Compute Cloud (EC2). The ability to rapidly generate self-contained Drupal virtual machine images will also be extremely valuable to Drupal consultants and solution providers for marketing, prototyping and demonstrating Drupal solutions to potential clients, and to large organizations and ASPs like CivicSpace looking to take advantage of the massive benefits of virtualization technology.


III. Implementation details and milestones

This project is obviously rather ambitious, and while it would be outstanding if everything I envision could be neatly packaged in a version 1.0.0 tarball at midnight on August 20th 2007, I've heard that software development projects can sometimes, you know, run late or have proposed functionality cut and scaled back.


I've envisioned the implementation plan for this toolkit as a set of independent stages. The core idea is that completion of each stage produces a project output that is self-contained, provides significant functionality, and can be used on its own. I'm a complete believer in iterative development and early and continuous integration; however, the spiral will have to occur within each stage, at least to the point where the output of that stage has its features frozen, its design and specification nailed down, and beta-quality code released. Therefore, if development of later stages cannot be completed within the time and resource constraints of SoC, Drupal will still have tools from the completed stages that are ready to be used by the community.


The major implementation stages are:

  1. Develop an XML (or other structured) file schema capable of describing a Drupal site code-tree, including modules installed/enabled/disabled, and the versions of each module to be present, including well-known macros like HEAD.

  2. Develop a parser capable of taking a file in the format of (1) and generating a PHP-CLI script which can download and build the specified site code-tree.

  3. Develop a code library with the functions to be called during execution of the script generated in (2).

  4. Develop a file schema capable of describing the LAMP stack web, database, and application/PHP server and configuration to be used to host the Drupal site.

  5. Develop a parser capable of taking a file in the format of (4) and generating a PHP script which can provision and configure the LAMP stack required for hosting the Drupal site.

  6. Develop a code library with the functions to be called during execution of the script generated in (5).

  7. Develop a file schema, parser, and code library for generating and executing a PHP script to populate a Drupal site with a specified dataset of sample users and content.

  8. Develop a file schema, parser, and code library for generating and executing a PHP script which will deploy the configured Drupal site and LAMP stack configuration to a specified physical server location.

  9. Develop a file schema, parsers, and code library for generating and executing a PHP script which will generate a virtual machine image containing a minimal Linux environment plus the Drupal site and LAMP stack configured and provisioned in the previous stages. The image format can be targeted to different virtualization environments – Xen, VMWare, and so on.
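
To make stages (1) and (2) more concrete, here is a minimal sketch of how a site description might be parsed into an ordered list of build steps. It is written in Python purely for illustration (the generated scripts are proposed to be PHP). The element and attribute names, the step names, and the sample version strings are all assumptions made up for this example, not a finished schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical site description; the real schema is still to be designed.
SAMPLE_SPEC = """
<site name="test-site">
  <core version="5.1"/>
  <module name="views" version="HEAD" enabled="yes"/>
  <module name="cck" version="5.x-1.4" enabled="no"/>
</site>
"""


def build_plan(xml_text):
    """Turn a site description into (action, name, version) build steps."""
    root = ET.fromstring(xml_text)
    core = root.find("core")
    if core is None:
        raise ValueError("site spec must contain a <core> element")

    # Core is always fetched first; modules follow in document order.
    steps = [("fetch_core", "drupal", core.get("version"))]
    for module in root.findall("module"):
        name = module.get("name")
        version = module.get("version", "HEAD")  # assumed default: HEAD
        steps.append(("fetch_module", name, version))
        # Modules are installed either way, but only enabled on request.
        if module.get("enabled", "yes") == "yes":
            steps.append(("enable_module", name, version))
    return steps


if __name__ == "__main__":
    for step in build_plan(SAMPLE_SPEC):
        print(step)
```

Keeping the plan as plain data before any script is generated means the same parsed description could feed the later stages (sample data, deployment, virtual machine images) without being parsed again.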


This project can leverage code and know-how from multiple Drupal and PHP projects, and pre-made virtualization appliances like those from Virtual Appliances: http://virtualappliances.net/products/lamp/. Although it appears to have a broad and ambitious scope at first blush, it is really just a high-level super-glue project: taking existing code, assets, and techniques, and wiring them together for the purpose of automation.


IV. About me

I'm currently reading for the B.A. with a double major in Mathematics and Linguistics at the University of the West Indies (UWI) at St. Augustine, Trinidad. I've worked as a software, web, and database developer in the local IT industry for 5 years, using mostly Microsoft-centric development technology. I did the MCSD certification in 2001 and the MCDBA in 2005. My experience with open-source CMS technology started with Plone, which, by virtue of being built on top of the Zope web application framework, still retains pride of place in my heart as the most technically advanced CMS I've ever seen. At UWI I joined the IT/dev/web team of the Student Activity Center (SAC), a student organization which tries to improve the day-to-day life of UWI students. The first project I worked on was the new SAC Online site; although I initially wanted Plone, the phenomenal collaboration and community features of Drupal won me over. We initially thought we would just download Drupal, install it, pick a theme and be off and running. As is usual with open source, however, we soon found ourselves inside the guts of module code, fixing database scripts, hacking theme templates, searching despondently on forums for causes of and solutions to the white screen of death, and rejoicing when we managed to get our stuff working. In the whole process of getting our site off the ground three things happened. Firstly, I became very impressed with Drupal: its node- and category-centric philosophy, its vast functionality, and its popularity. Secondly, I got a lot of insight into how Drupal, and PHP and MySQL applications in general, work. Thirdly, I began to hate the whole ritual of untarring the Drupal code, creating the database and user, setting permissions... which we did nearly every day. I get mad every time I think about how many Drupal web folders and databases named test-xxx we have littered all over our development server. There has to be a better way.


I've been programming since I was 15, and I'm pretty comfortable with software, web and database development. For a project on a platform or technology or language I've never used, I can get up to top speed very quickly. As I got older and got responsibility for leading projects, I've disciplined myself with regard to software development methodologies like test-driven and iterative development, continuous integration; the importance of requirements and feature freezing; pretty much all the software development life-cycle best practices in vogue right now. I think my major strength is the ability to 'zoom out' over an idea, see the conceptual design and implications, and rapidly connect-the-dots as to how different technologies can be integrated into a possible solution. I tend to suck a great deal of information from the Internet pipe on programming topics over a wide array of platforms and technologies, on a daily basis (more than is good for me probably.) My major weakness is my tendency to obsess and nitpick over minor details, and not be willing to compromise on or sacrifice secondary matters in favour of the more important task of getting the entire thing completed and out the door on time.


V. Why I want to participate in Summer of Code

There are three main things I hope to accomplish as a SoC participant. Firstly, I need to develop the professional focus and discipline required to execute and complete a project where the clients and managers are thousands of miles away and can only yell at you through email. Secondly, I would like the experience of working in a major open source project. Although I might have the technical skills, a certain professional shyness has always held me back from participating in projects with coders who were light-years ahead of me. Thirdly, I really want to raise the awareness of OSS in the country I live in, Trinidad and Tobago. Although we have the strongest economy in the Caribbean region, we are still a developing country. This has led to an absurd situation where the public and private sectors are pouring tens of millions of dollars into developing IT, but OSS is being ignored because our leaders, managers, and local IT solution providers are locked into business models and ideas from the 1980s. It is incredibly ironic that in a resource-constrained country like mine, 99% of our IT industry is Microsoft-centric. At UWI practically all of the server infrastructure, plus all the computers in student labs, staff offices, and libraries, run Windows. If I can at least make people here more aware of what OSS means to developing countries like ours, how technically far ahead the software is over most proprietary stuff, and how we do have people with the skills and knowledge to develop OSS solutions, then I would feel I have accomplished something in the spirit of the whole SoC programme.