Evidence Driven cultures in small teams

The Reluctant Tester

or...

Creating a Testing Culture
Evidence Driven Cultures
Everything I learned about Software Development I learned in my Junior High science class.
Measure, Think, Act
The Reluctant Tester
Promoting Proof Driven cultures in small teams

Draft: 2013-08

1. Introduction

2. Creating a Testing Culture

2.1. The Scientific Method

2.1.1. Laboratory to Production

2.2. Assumptions

2.2.1. Software Systems

2.2.2. Members are Motivated

2.3. What is a “Test”

2.3.1. Test Driven Development and Intensive Fast Failure

2.4. Risks of Testing

2.4.1. False Security

2.4.2. Gaming the System

2.4.3. Personal Failure

2.5. Responsibility for Testing

2.6. Key Phrases in Testing

2.7. Cross-cutting Concerns

2.7.1. Organizational Structure Level 0: Putting the Cart Before the Horse

2.7.2. Organizational Structure Level I: Having a Plan

2.7.3. Organizational Maturity Level III: Unity

2.7.4. QA Integration with other departments

3. Implementing Effective Testing

3.1. Documenting Tests

3.2. Information Required

3.2.1. Sources of Information

3.2.1.1. Business Analysts

3.2.1.2. Developers

3.2.1.3. Users

3.3. Automated Testers

3.3.1. Pros/Cons

3.3.1.1. Humans are Creative

3.3.1.2. Some activities are difficult to automate

3.3.1.3. Regression testing is Expensive

3.3.2. Examples of Good Automation Candidates

3.3.3. Examples of Poor Automation Candidates

4. Creating a Testing Team

4.1. Test Driven Development

4.2. Central Test Database

4.3. Central automated Test server

4.4. Designate an automation specialist

4.4.1. Key Points

4.4.2. Risks

4.5. Make system owners responsible for testing

4.5.1. System Health Test

4.5.2. Content Policing

4.5.3. Unit Testing

5. Integrating the Existing Processes

5.1. Test Runs

5.2. Automated Tools

5.2.0.1. CompanyTests Project

5.2.1. Sample Test Run

5.2.1.1. Sample 1: Run smoke tests

5.2.1.2. Sample 2: Run siteSearch

5.2.2. Project Layout

5.2.3. Writing Tests

5.3. Utilities

6. Bibliography

1. Introduction

This paper is a collection of my experiences developing large web based projects with small teams.

2. Creating a Testing Culture

2.1. The Scientific Method

https://www.youtube.com/watch?v=b240PGCMwV0

http://www.mheducation.ca/school/products/9780070726024/mhr+discovering+science+7/

ScienceSkill 1 Organizing and Communicating Scientific Data

http://en.wikipedia.org/wiki/Scientific_method#Elements_of_the_scientific_method

Observe
Hypothesize

use cases
user expectations
nullable

Test

defined process that will demonstrate the expectation
reproducible
the statement has been made, others should be able to reproduce your results: otherwise you are Just Making Shit Up®

Conclusion

pass/fail
did the process fail the test?
did the test fail the process?
Is the expectation realistic?

Peer Review

error reports
once software is released it has many eyes on it

When in

2.1.1. Laboratory to Production

Moving a developed process from the laboratory to the manufacturing plant requires that the high order thinking of the researcher, be broken down to simple steps that can be delegated with minimal training to employees.

2.2. Assumptions

To a certain extent, when investigating processes, we have to assume a certain “happy” example. Rather than dealing with every single problem that exists in the world, we work to solve just one set of problems.

Throughout this document certain assumptions are made regarding the processes and nature of the teams involved. While these assumptions are not necessarily true in any given environment, these assumptions are used to define the scope of this document.

2.2.1. Software Systems

While much of what is discussed is relevant to Quality Control in general, this discussion is based on experiences in Software and Digital systems, and therefore focuses on solutions for those fields.

2.2.2. Members are Motivated

For the most part this is not an issue, people take pride in their work, and are interested in making their company, their product, or their service better.

Unfortunately, this is not 100% true. Some people just don’t care. If the people involved don’t care about the system they are building, there is nothing that can be done to improve the system.

Solving the disinterested employee problem is out of the scope of this discussion. For the purposes of this discussion, I am assuming that the people involved care.

2.3. What is a “Test”

???Section of rambling thoughts???

Tests are a measure of the system’s conformation with user expectations.

This means that testing is a form of specification writing. The generation of a test is not an act unto itself, it is a part of the defining the scope of the system.

Notice that I say “the generation of a test”. Tests aren’t an action that is performed, they are a definition of a set of steps.

In order for the test to be meaningful it has to be consistently reproducible. In many ways it is a laboratory definition of the set of steps to produce a desired result. When we don’t achieve that result we have discovered something interesting. It is the reproducibility of the test that makes it meaningful.

In order for a test to be reproducible, the exact steps to reproduce the results need to be documented. By writing the steps down, you are confident that no steps will be missed. If a step is missed, then the test is invalid because you have not undertaken all the actions necessary to reproduce the results.

This is of huge benefit to the “the New Guy”. Coming in cold to any system is difficult, under these conditions, The New Guy has access to a significant amount of documentation, regardless of what his role is:

Users: acting as a tester will give a very detailed look at all of the usages they are expected to encounter. In this case, it becomes a sort of user manual.

Documentation Author: the tests represent detailed instructions for how to use the system. The new document writer can use the test list as a quick boost to writing the instructions in a distributable manner.

Developer: developers want to look for samples of code when learning a new system. Done correctly, tests act as an in-house library of code samples.

Closely related to New Guy, is Future Self. Tests are a way of communicating with ourselves when designing new features, or redesigning existing ones. By writing the expectations out in a detailed fashion, all parties have a written contract as to what their expectations are. Future Self also has a way to follow up with the rationale that prompted doing something the way it was done in the first place. While the expectations may change, we have clearly defined what they are at this point.

In the end a Test is simply a form of garnering Feedback. The results of tells the developers that Testing is about garnering feedback. The purpose of feedback is so we can understand where we need to take action.

Therefore testing is a guide for action, not the action itself.

2.3.1. Test Driven Development and Intensive Fast Failure

Dr. Jack Matson - Intensive Fast Failure - wrote “Innovate or Die”

2.4. Risks of Testing

As with anything, there are risks associated with testing.

Generally, the objective of Testing is to decrease the probability of bugs being released into our code that is delivered to the customer. More generally, we are trying to make things better.

Unfortunately, as with any process or tool, there are certain risks associated with Testing. These risks can actually increase the probability of problems appearing in the system. Generally, this is due to a misuse and mishandling of what the testing processes are attempting to achieve and what they are capable of.

Once understood, it is possible to mitigate the risks associated with Testing.

2.4.1. False Security

In 1981, studies discovered something interesting: removing a doctor’s mask during surgery actually reduced incidents of post-surgical infection. Several causes for these results have been postulated: that the addition of the protective equipment had lead to the doctors taking larger risks, or that improper use of the protective equipment (due to complacency) had led to an increase in problems.^[1]

Over the years similar situations have been identified, where the addition of safety equipment has led to an increase in incidents. To pull a couple examples from Wikipedia:

Skydiving
Anti-lock Braking Systems

This effect, known as Risk Compensation, is controversial, however there are some standard elements that can be derived:

Increasing the use of safety equipment, increases the risk tolerance. Effectively, with safety equipment you can get closer to “the edge”. This is the positive effect we are looking for.
The perceived capabilities of the equipment can exceed the actual capabilities. In other words, people may come to rely on the safety equipment and become complacent in their behaviour. This is the negative effect we need to watch for.

The balance between the two original examples is what we are striving for. In the case of Seatbelt usage, people tend to behave a little more dangerously, but receive a lot more protection.

In software development, this translates to higher productivity (greater risk taking), with reduced bug reporting (greater protection). The problem becomes how to control the negative side: by putting the safety mechanism in place some people tend to rely on the tool rather than their own good sense. To paraphrase Sanjay Malhotra (an instructor friend), “The safety on a firearm is installed between your ears”.

Testing is a quality control tool which acts as a safety mechanism to prevent Users from receiving a poor customer experience. By placing testers between the development team and the end user, developers can experience a sense of dissociation from the users. While this dissociation often increases productivity, it can also decrease the sense of responsibility. Instead of conducting thorough investigations of the components they produce, they begin to feel that the “safety net” (the testers) will catch the ones that get through, and therefore not worry about them as much. Rather than conducting their own investigation, they will rely on the investigation of the tester to catch their problems for them.

This causes two problems:

The developer is the person that is most familiar with the system and is, therefore, the most likely to be aware of the weaknesses and harmful edge cases; but is not looking at for them.
What was put in place as a redundant secondary system, has become the primary system with no redundancy.

To mitigate this issue, it is important to maintain a sense of responsibility and clear lines of accountability to individuals that are responsible for the quality of the system or its individual components.

2.4.2. Gaming the System

Goodhart’s Law states that “When a measure becomes a target, it ceases to be a good measure”. One of the most evident applications of Goodhart’s law is in the measurement of Gross Domestic Product (GDP). GDP is a measure of the productive wealth of a nation, but performs that measure by looking at the side effects of production. When governments set the objective “increase the nation’s GDP” it will begin to manipulate the side effects (directly within its influence), not the national wealth (not within its influence). The side effects increase, but the wealth will remain the same.

The same effect can be observed in the grocery store. Chickens are sold by weight. Customers demand more food stuff per dollar. Unfortunately, the measure of “foodstuff” is the weight of the food. The objective is therefore transformed from “more foodstuff” to “more weight”. Now that the measure has become the objective, lower cost fillers can be used to increase the mass of the food. In the case of chicken, this is often achieved by injecting water into the meat. Since water is not a foodstuff, the measure of weight has become meaningless.^[2]

This same problem often becomes evident in testing. As testing is simply a measure of the health of a system, people will tend to focus on meeting “measurable goals”. Pass/Fail measures of quality create the ability to use Volume of Pass as a quantitative measure. As people begin to observe the counts of successes, they will tend to view this as both a measure of their own success, and as an objective to be met.

There are fundamental flaws with treating qualitative data in a quantitative manner. Firstly, in an effort to meet objectives, test developers may make their tests easier to pass. In his paper The Joel Test, Joel Spolsky identifies an extreme case of a Microsoft employee writing poor code simply to achieve a goal of “done” (quantity), while not meeting the spirit of “done” (quality).

The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote "return 12;" and waited for the bug report to come in about how his function is not always correct. The schedule was merely a checklist.... In the post-mortem, this was referred to as "infinite defects methodology".

Attention must be paid to not allowing the “score” to matter, or even better, not allow their to be a concept of “score” in the first place.

Team members must always be actively encouraged to create harder tests; pushing the system (and themselves) to become progressively better over time. Rather than a scoring system, testing should be viewed like athletes view their measures. When a weight lifter reaches his goal of 200 kilograms, he redefines the goal by adding 20 kilograms; when a runner can run 10km, she increases the objective to 15km; when a developer gets a long loading report down to 5 seconds, he changes the objective to 3 seconds.

Managers can help foster this environment by never explicitly using the tests as a measure of performance. Rather they should incentivise efforts to increase the thoroughness of testing.

2.4.3. Personal Failure

Recently, I was asked by a developer to come over to his desk to “look at something”. I dropped what I was doing, walked over to his desk, and stood by while he asked “is this working?”.

I was annoyed.

I had reviewed the work once already, and found a minor flaw in the solution. The success/failure criteria were clearly laid out in the ticketing system: either the problem had been fixed, or it hadn’t. There was no need to pull me away from other testing, that had been in my queue for days, to run a test of the system on his computer, all he had to do was compare the results on his screen to what I had already said in writing. Worse, once he submitted the change, I would still be required to perform official, documented, and detailed, testing; all resulting in a longer amount of time being required to achieve the same results.

The worst part, this wasn’t the first time this developer had done this, or the first time other developers had done it. When asked, developers have always had the same answer: “I didn’t want it to fail again”. I find that statement odd since whether I’m sitting at my desk or theirs the test has still failed. However, from the developer's perspective, so long as the failure isn’t documented, it is perceived as less of a failure.

I don’t understand the phenomenon, but whenever you tell a person that there is a problem with their work, they assume you mean you have a problem with them; that they are a failure. This is closely intertwined with Gaming the System, and is probably the root cause. We are taught from a very early age that the scores and grades we receive on tests are what is important about us, not the knowledge we have gained, not the process, just the final score. Therefore, we engage in trying to change our score. This leads to a problem wherein people view a poor score as a reflection of themselves. Scores and measures are synonyms, therefore if you identify a failure, they take it as a personal attack on them.

This must not be allowed to happen. Rather, Success and Failure of tests is about creating dialogue.

A successful run of a test does not mean that the system is behaving well, instead it means that the person who defined the test, and the person that built the system agree as to what the behaviour should be.

It is possible that both are wrong.

Given that both could be wrong, in a success case; it stands that either could be wrong in a disagreement case. Therefore there are three states that can exist:

		Developer
		Pass	Fail
Tester	Pass	Agree-Pass	Disagree
Tester	Fail	Disagree	Agree-Fail

Agree-Fail: This should never occur. If the developer feels it has not met their standards, it should not be leaving their desk.
Agree-Pass: Both measurers (the test and the development) agree that the expectations have been met
Disagree: We don’t know who is correct, but there needs to be further discussion to determine what the customer’s expectations are. This may be cleared up by a quick discussion (oops, missed something), or could result in the entire project having to be redesigned due to a flawed assumption. In either case, the underlying truth has been discovered.

The key is we have found a problem and are dealing with it.

Developers must be aware of this ingrained response we have as humans, and fight it. Failing a test only means that there is a difference of expectations, a misunderstanding. Testing exists to find those misunderstandings.

2.5. Responsibility for Testing

Generally, testing is viewed as a the Quality Assurance person’s job, but that doesn’t quite capture the essence of the relationship. While QA’s primary tool is the test, the tests are the means to the end, rather than an end themselves.

If we look at any development process, developers always run their own code. They always check to see that it is working. Having a second set of eyes on it is important, but simply as part of being asked to solve a problem, they check to see that the problem is solved. This has been formalized, but it is important to recognize that the very definition of undertaking a task dictates a mechanism to determine that you have completed the task: a measure, or a Test.

This fundamental interpretation of taking action, indicates that “Testing” is part of developer’s responsibility. If we have assumed that Testing is the Quality Assurance Department’s responsibility and we have found that it is also the Development Department’s responsibility, we need to ask who else is responsible that we haven’t considered.

The answer is simple: everyone is a Tester, everyone is responsible for Testing.

Everyone has their area of expertise, and their area of expertise will create bias. This combination of bias and expertise means that some individuals are better at finding certain problems than others. We need to harness this range of expertise to maximize quality, while at the same time compensating for one another’s biases. This “Peer Review” process is what keeps us from falling into biased patterns.

Every stage of the development process is responsible to ensure the system is working as expected, and it is everyone’s responsibility to check one another’s work. Well documented tests allow us to verify and validate one another’s results, watching for mistakes that hide in our blind spots.

2.6. Key Phrases in Testing

Regression Testing

Test Driven Development

Test Suite

Test Case

Test Run

2.7. Cross-cutting Concerns

Testing is everybody’s business. Every aspect of production has an interest in seeing that their role is performing as expected. People will conduct tests to ensure that their part of the system is performing as expected.

This is why formalized testing is not just the role of a QA professional, nor should a QA professional be confined to the walls of a “Testing” department.

2.7.1. Organizational Structure Level 0: Putting the Cart Before the Horse

This is the obvious, and usually first choice for management to divide the organizational structure into.

Generally organizations will have the development department submit their finished product to the testing department who will then queue the work for testing. This unfortunately divides the two roles, and creates a gap between them, reducing overall communication and increasing the time lag between communication events when they occur.

The steps taken become very disjointed:

The developer assesses the expectations
The developer implements those expectations
The developer considers it “done” and puts it in Quality Assurance’s inbox for testing
Quality Assurance assesses the expectations
Quality assurance implements validation checks, some of which fail
QA places the work back in Development’s inbox, with the validation checks attached.
Goto Step 1

2.7.2. Organizational Structure Level I: Having a Plan

One obvious solution to the redundant effort noted is to place the test definition prior to the production. By defining the validity checks prior to the development process, developers do not risk developing against faulty expectations. The expectations of success are clearly defined prior to them writing the first line of code.

The drawback to this model is that as development occurs, new ideas get discovered, new ways of looking at things become apparent. Effectively, this is the waterfall method of development where developers are told what they will produce, and have no mechanism for offering feedback, except to put the item right back into the tester’s queue.

2.7.3. Organizational Maturity Level III: Unity

Rather a more vertical and integrated approach should be taken. In the end, we are all on the same team.

With this configuration, the workflow is much tighter and the feedback generated by both QA and the Developers allows them to react to one another’s suggestions immediately, reducing the need for massive adjustments later on.

Not only can the feedback cycle be reduced, experts are better able to learn from one another. When automating a test, who could be a better resource to ask for advice than a developer who sits across from you?

In fact, this is not limited to just the developers and the Testers. As the tests are implemented, feedback from the results may challenge the Business Analyst’s expectation of the anticipated workflow. Why not bring them into the mix? How about Technical Writers? Surely code documentation could benefit from a professional editor and clearly written test summaries can influence the approach taken by a junior tester.

Every member of the team is responsible for every aspect of quality.

2.7.4. QA Integration with other departments

At the very least, it is important for every member to have a basic understanding of the teams work in order to understand how to integrate and work together.

	Tester	Dev	Analyst	Writer	more...
Testing		Writes Automated tests
Dev			Debug Code/Interpret code capabilities
Analysis	Determine Dangerous System Usages	Implementation Decisions		Explain Need to Users
Writing	Defines Tests	Creates Documentation	Writes Need Assesments
more...

System Quality

is the hardware keeping up
is the system available?
things like alarms, monitors
system scans for known issues

Functional Behaviour

does it do what we expect it to do?
regression testing is important - when I change one thing, there is a risk of “bumping” other parts of the system

Business Expectations

are the specs meaningful?
we can test the specs by rewriting them

Standards Conformance

are all of the website’s links valid?
are content authors using using effective keywords
spell-checking

3. Implementing Effective Testing

3.1. Documenting Tests

http://www.psmag.com/science/invisible-gorilla-security-threat-69620/

Eventually we all get tired, bored, lazy. Something gets to us

test case ID
test case description
test step or order of execution number
related requirement(s)
depth
test category
author
check boxes for whether the test can be or has been automated
pass/fail
remarks

3.2. Information Required

3.2.1. Sources of Information

3.2.1.1. Business Analysts

Users generally do not have time to focus on communicating with developers for extended periods of time. They are (hopefully) too busy using the system. To allow users the ability to get on with the work of using the system, user advocates are used that can focus requests and engage with developers on behalf of the users.

The role of the Business Analyst is to gather as much of the required information from the user, in as short a time as possible. Analysts, offer the ability to discuss a vague description by a user, and based on experience, translate a vague request into a tangible description. From an Analyst’s perspective, you know you have done your job right, when the user is having trouble describing their idea, but excitedly exclaims: “Yes, that’s exactly what I meant!”

The Use Cases and User stories generated by the Business Analyst are descriptions of expected behaviour; and descriptions of expected behaviour are testable statements, the key to ongoing quality assurance.

3.2.1.2. Developers

Nobody knows the system better than the people that built it. Developers know the edge cases and the expected behaviour that they put into the system. Who else can do a better job of testing the marginal areas?

The key is that developers are the ones that make the expectations a reality. Therefore they should document what they did. This will allow other stakeholders to review their interpretation for accuracy, completeness, and most importantly sensibleness.

One of the key complaints often heard from developers is that they didn’t receive a complete specification. Generally, the user requesting a new feature, has made a request for a new feature, but has not given the developer sufficient information to complete the task. The lack of information comes from a simple core problem. Anytime, anyone makes a feature request, they have an image in their head of what they want; english is an imprecise language; therefore their description is likely insufficient for a computer to understand. Developers see this problem when they realize that a statement can be interpreted in more than one way.

While the problem is not solvable, it is reducible.

By having developers define their understanding of the expectations, we have a mechanism for communication: the developer redefines the specification to a technical level; if they are unable, there is insufficient information; if their definition doesn’t meet the expectations of the requester, there was insufficient information. This lack of information is not a failure, it is feedback that the specifications were not precise enough.Once the feedback is generated, action can be taken.

As a result, when teams consisting solely of programmers attack a problem, they prefer to express their solution in code, rather than in documents. They would much rather dive in and write code than produce a spec first. … My pet theory is that this problem can be fixed by teaching programmers to be less reluctant writers by sending them off to take an intensive course in writing. Another solution is to hire smart program managers who produce the written spec. In either case, you should enforce the simple rule "no code without spec". (Joel Spolsky)

While I agree with Joel regarding the importance of Specs, I also believe that writing tests is an excellent form of specification definition that the Developers need to be involved in. It is boring and tedious work, but it keeps Developers focussed on the problem. It forces developers to think about what they are attempting to achieve. Therefore it is not optional for Developers to write specifications, it is part of the planning process. All we can hope to do is reduce the pain experienced, a Test Driven approach reduces this pain.^[3]

Whatever approach you take, it is important to recognize that Developers are part of the Testing Team. As the first people to see new features, they are the first line of expectation conformance validation.

3.2.1.3. Users

Error reports from users are an invaluable tool in the process of test definition. Every error report that comes in from users is a test that has been performed on the system and failed.

While we aim to shield our users from errors, we should not lose sight of the value of their input. In his paper “The Cathedral and the Bazaar”, Eric S. Raymond identifies that it is not possible for any organization, with any amount of time or resources, to achieve the level of detailed testing that a brief release in front of actual users can achieve. This culminated in the concept of “Release Early, Release Often”.

While our objective is to achieve zero defects, we can also note that an error report from a user is a test definition we have already paid for (generally with embarrassment that they saw something bad). If an embarrassing test (user finding a bug) has happened, the best thing we can do is learn from our mistake. That is where an effective error report comes in.

A good error report from a user (as guided by support) takes a form very similar to a test with the same focus (https://developer.mozilla.org/en-US/docs/Mozilla/QA/Bug_writing_guidelines):

Objective
Steps to Reproduce
Expected Outcome
Actual Outcome

This format of information collection is nearly identical to the format of information required for testing with only “Steps to Reproduce”

The Test Engineer should take advantage of this, and any process should include a copy of every uniquely identified bug, being sent to the Testers for inclusion in their tests. The users and production support just saved the Test Engineer time, in identifying and documenting the nature of the bug.

To take this a step further, the point to testers is to produce bug reports. Really, another way to define tests is as predefined error reports.

Users, and production support, are part of your testing team. They are the two sides of the same coin, downright conjoined twins.

3.3. Automated Testers

Labour represents the highest cost to any system. When paying for a person’s time, you need to compensate them for massive amounts of resources that are effectively unused (food, clothing, and shelter).

Accuracy is another issue with manual testing procedures. Humans get tired, lazy, biased, complacent, and any number of other things that will cause them to make mistakes. This is especially true when involved in highly redundant tasks: people get bored and start taking shortcuts.

There are two solutions to solving this problem. Firstly, we can clearly outline the tasks to be undertaken by the individual. By defining the process in painstaking detail, we are able to encourage people to take the necessary steps to assure quality. By explicitly listing the steps a tester must take, you reduce the ability for the individual to create excuses for themselves, or to take shortcuts. The level of detail required for this type of documentation leads us to the ability to automate testing.

Automation harnesses all the advantages that computing has offered. The benefits of computers as given in my High School “Computing” class were simple: they never get tired, they never get bored, and they never vary their process. Essentially, computers are far more effective at being meticulous when working through checklists and taking readings.

Naturally, this is contrasted against a human’s ability to be creative, the one weakness of automated testing.

Automated testing systems cannot perform tests outside of their predefined parameters. It takes a human’s creativity to identify new tests. Further, computers cannot determine human desires. At the end of the day, software is built for humans, therefore a certain amount of testing must be done to gain the “human impression”.

3.3.1. Pros/Cons

3.3.1.1. Humans are Creative

This should be used to create tests. High-order employees are hired for their ability to identify the tests that need to be created and defined. The detailed definition of tests is, like Business Analysis, a very thought intensive process. Until a test has been accurately defined, it cannot be trusted to the infinitely stupid box that sits on top of our desks. The creativity of humans is a valuable asset that is required of the test creator.

3.3.1.2. Some activities are difficult to automate

Some tasks are just plain difficult to automate. At some point, the amount of effort required to allow a computer to accurately determine whether something is successful or not far exceeds the time it would take for a human to just sit down and check it themselves.

???Include chart showing cost curves???

3.3.1.3. Regression testing is Expensive

The volume of activity involved in a complete regression test should be phenomenal (if you are doing it thoroughly). If the organization has done an effective job of defining the boundaries and operations of their system, there should be a lot of tests to perform. This means that it is basically impossible for a complete regression test to be performed without a large number of testers.

Automation overcomes this problem in a number of ways. Firstly, computers can perform tests faster than humans; also, computers can work 24 hours a day. This means that rather than a complete regression test requiring months to complete, the effort can be reduced to hours, or even minutes. Also, as the deadline approaches, the computer does not feel the pressure to start taking shortcuts to meet forced deadlines. This ensures that the readings taken from the system are accurate and represent reality.

3.3.2. Examples of Good Automation Candidates

Latency Scan

Heartbeat Monitor/Deadman’s Switch

Simple CRUD Operations

3.3.3. Examples of Poor Automation Candidates

Poorly understood or defined requirements

- the objective should be to have humans test it in an effort to determine what they are doing

- having a human work through this a few times will likely help define the requirements

User Acceptance Testing

- computers don’t have “impressions” or “feelings”

- won’t pick up on a “font just feels too small” based on context

- computers can’t test “Not Look Stupid” (my favourite test)

4. Creating a Testing Team

Based on my experience working with small teams, and observing the existing processes, and existing culture, I hope to offer suggestions for proceeding that will have a large impact with minimal change. The processes I suggest have been drawn from experience at multiple organizations, and formalized processes in literature (Test Driven Development), but have been tailored as much as possible to the cultural environment, and normal processes found among small teams.

Primarily these suggestions revolve around formalizing processes that usually already exist, and are very close to testing best practices.

4.1. Test Driven Development

The act of writing a unit test is more an act of design than of verification. It is also more an act of documentation than of verification. The act of writing a unit test closes a remarkable number of feedback loops, the least of which is the one pertaining to verification of function (Robert C. Martin)

In the context of small teams and in order to facilitate change and adoption, I would recommend that TDD-like practices be implemented rather than attempting to implement a formal TDD process system; allowing for the a natural adoption of greater TDD practices at a later date. Vast volumes of text have been written on Test Driven Development and all of it should be read, but adoption and understanding takes time and part of the point is to make change in small, consumable, increments rather than large, overwhelming, ones. In this sense, it is better to take some small steps and alow them to grow over time.

In general, Developers should write tests as they develop code. This gives them a target to shoot for in their development, as well as integrating the regression test development into their process.

Rather than developing all of the tests right from the get-go, the first step would be to create a list of test stubs representing the list of features that are to be developed against. This list then becomes the development objectives for the development request.

Sample Test Stub

/**

* <h1>New Feature requested by User</h1>

* <p>

* Complete description based on user requirements

* (see the “Process” section of this document)

* </p>

@Test

public void testNewFeature(){

Assert.fail("Not yet implemented");

}

This creates a list of tests that need to be fleshed out, and all fail. Now the developer has a clear list of unmet objectives that need to be fulfilled before closing the ticket. As the developer completes the tasks, they should fill in the tests to a point that they give a success signal. The task is not considered complete until all tests pass.

This mechanism of writing a list of objectives creates a rapid feedback mechanism. Since the developer is forced to think out the entire task prior to even investigating the problem, they are better able to focus their efforts at solving the problem.

True Test Driven Development (TDD) has strict forms that are to be adhered to, but these strict forms are a function of creating a theoretic framework. The theoretic framework is supposed to model a dynamic and agile environment. Here I am going to describe what I see as some best practices based on a foundation of TDD. I have yet to meet a team willing to undertake the discipline required for true TDD right out of the box, therefore this is TDDlite. For a discussion of true TDD practices, I recommend Introduction to Test Driven Development by Scott Wambler.

4.2. Central Test Database

It is highly advisable that organizations maintain a central database of tests. These tests would form a regression library, documenting the knowledge acquired by the organization over time. Outside of the standard documentation benefits (training, risk assessment, etc.), this has several key benefits:

More thorough testing over time
Preservation of knowledge
Training Tool

Over time, as the expectations of users will be forgotten. This can occur because people come and go, or simply because people forget over time. There is a very low probability that anyone is going to remember that there was a key requirement of there being a minimum font size of 26px on a component 1 year after the initial development was complete. By putting that requirement in a standard checklist that people have to validate against, everytime they modify the component, it will act as a reminder when someone has to make a change in the future. This puts an end to people not understanding the full ramifications of changed specifications.

On the flip side, when a developer sees an odd piece of code, they may be hesitant to undo it, thinking there must have been a good reason to have done that. This level of documentation puts an end to the fear of making changes because “It must have done that for a good reason”. You know the reason, it is in written in the test, and you can confidently proceed to change the specification if required.

By putting the documentation in a single location, shared among all stakeholders, you significantly reduce the probability that any changes to the system will not be replicated through to other stakeholders. The maintenance of a list of bug reports, test reports, and specification reports means that changes to one set of documentation is not replicated through to the other datasets. A Central Test Database reduces this by becoming the authoritative data repository: everyone is required to keep the central repository up to date. This means that changes get replicated out to everybody immediately.

Some of this work has already been done.

Currently, the central repository of data is the “CompTests” project, consisting of automated tests and written documentation. The written documentation serves as the shared knowledge base of tests for everyone on the project. See “Writing an Automated Test” for more information on creating documentation.

This is suboptimal as it requires knowledge of HTML and looks very “code-like” when attempting to update the tests. A better solution would be to maintain a web application specifically designed for maintaining test libraries. These offer the benefits of enforcing complete documentation, and consistent formatting, and offering the capability of distributing testing efforts throughout the organization. Efforts were made to configure MozTrap on QABot (dhines.bpio.example.ca), unfortunately this was left incomplete due to time constraints. They system is in a functional state and does have tests in it, unfortunately, there is no distributed knowledge of how to manage the test database, or the communication with external testers.

4.3. Central automated Test server

Primarily this is to keep the running of tests from interfering with ongoing work people that people are doing
If their computer is busy running tests, it interferes with their work
Some tests are general in nature (not the unit tests)

these tests can be delegated to central Test Server

Regression scans are also a good candidate
RISK: developers see this as “someone else’s responsibility”

have already seen this
desire for “I write code, someone else does the testing work”
mentality arises from testing being viewed as drudgery
a separate system would reinforce this mentality
public shaming is the general practice to manage this
developers that check in code that cause a problem are publicly held accountable, this encourages developers to check their own work, before checking in. (this should be mild and short, not ongoing shaming)

4.4. Designate an automation specialist

An Automation Specialist would be a person that sets the standards by which the tests will be maintained.

Any time a process like this needs to be maintained in an ongoing fashion, there is a large amount of individual discretion that goes into the day to day operations of the system. While this is healthy and allows the system to evolve as needed, there needs to be one individual that is considered an expert, and can act as arbitrator when there is disagreement as to the exact details of implementation.

This individual should be the person that questions should be directed to. While it is important that everyone learn to write tests, having a single individual considered the expert forces them to accumulate knowledge as people as them questions. They become a single repository for all knowledge, while some people have some of the picture, having many of the questions filtered through one person makes that person the big picture of the problems. This further allows them to make better judgement calls in the case of disagreement.

This role is similar to and has overlapping responsibilities with the Code Auditor role, in that Automated Tests are a form of code, and should conform with the overall quality practices of the organization. However, it is also distinct from the Code Auditor role as it is more focussed on how to implement code as applicable to a testing framework. If possible, these responsibilities should be shifted to the Quality Auditor role, as maintaining the automated tests are more concerned with the descriptiveness of the tests with an eye to conformance.

This role has a pre-requisite of having a working knowledge of programming, this allows the individual to engage in becoming an expert in this specialized style of programming.

4.4.1. Key Points

The person that fills this role should have development experience, and should be familiar with Devlopment guildelines.
In particular, Jon Hartling (as the “Code Cop” or “Code Auditor” role) could fill this position. Alternately, the Quality Auditor could also fill this role as this is strongly related to the QA position. Likely this role should go to QA to shift workload away from CA, though this will not work if the QA has low development experience, or not available.
the specialist is only to act as the expert that can answer questions, not write other people’s tests
This role’s word is law in their domain: on process disagreement, code quality, style guidelines. This domain overlaps with the Code Cop’s domain.

4.4.2. Risks

The primary risk associated with this is that it will define a Testing role again. Testing is the responsibility of all individuals on the team, and creating a “specialist” creates a role that is equal with “the person that does that stuff”. In order to maintain the role of “specialist” it is important to reinforce that the role is the person that maintains the quality of the tests. The individual in the role must refuse to write tests for other people (they should continue to write their own), rather offering advice and education to people with questions. The first time they offer to do it on behalf of another, they have sacrificed their role to being “the person that writes the tests”.

4.5. Make system owners responsible for testing

creates a sense of responsibility
remove sense of “someone else’s problem”

This actually refers to two specific actions: Author Testing, and Web Team Testing.

4.5.1. System Health Test

already begun

4.5.2. Content Policing

C&M has certain rules they should adhere to
possible to regularly scan the system for rules
examples:

full links vs relative links
spelling errors
links are valid

4.5.3. Unit Testing

performed by developers
use TDD principles
already begun

5. Integrating the Existing Processes

Writing tests is part of the overall development cycle, and therefore begins with a good feature request or bug report:

User Issue Report

Title:

5.1. Test Runs

Performing a Test Run is as simple as running the pertinent automated tests from the test project, and following up with any non-passes that may occur.

“Following up” would be a matter of watching for anything that is not a “Pass”. This could be either a fail, or an Manual Test. The automated tests offer a way to eliminate sections of testing rapidly (no further work is required on automated passes). Basically, the tests should offer suggestions for further action

That’s it. You’re done.

5.2. Automated Tools

One of the first things I do when coming to any company is to implement an automated test framework for the consistent development of Unit Tests.

5.2.1. CompanyTests Project

The project should be self-contained to allow for maximum portability between all developers, testers, and anyone else we can convince to write tests. That means it should be a simple matter of checking the project out of SVN and opening it with Eclipse.

5.2.2. Sample Test Run

The simplest way to verify that your environment is configured correctly is to run a couple of tests. There are two types of tests stored in the project: system scans, and unit tests. Since the purpose differs, the way they will typically be used is different. It is a good idea to run through each of the two types of runs.

5.2.2.1. Sample 1: Run smoke tests

Tests can be grouped through the use of XML configuration files that select specific tests to run, and pass parameters into the system.

The Smoke Tests^[4] push information into various parts of the system and check for problems. They are not a detailed check but scan large portions of the system.

Open: CompanyTests > suites > smoke.xml
Check configuration parameters

SampleSize: 1.1
BaseURL: http://www-staging.cms.example.ca

In Package Explorer (left hand menu), right click on “smoke.xml”
Select “Debug As” > TestNG Suite

In the console you should see “[TestNG] Running:”. Next to the console tab, there is a “Results of running suite”, this should have a list of tests that have been run, are running, and have failed.

5.2.2.2. Sample 2: Run siteSearch

When working on a particular component, the developer should keep a copy of the tests open, as they make changes to the system, they can run the tests associated with the changes they are making to ensure the test and the code actually agree with one another.

Navigate: CompanyTests > src > ca.company.web.test.components.company > sitesearch.java > testSearchWroking
Right Click on “testSearchWorking”
Select “Debug As” > TestNG Suite

5.2.3. Project Layout

There are four main sections to the test packages:

+	Components	ca.company.web.tests.components.*
-	Templates	ca.company.web.tests.templates.*
+	System	ca.company.web.tests.crawlers.*
+	Utilities	ca.company.web.tests.*

Each of these represents a “testable” area, with the exception of the Utilities which are helper classes used to manage the tests in general.

Within each of these areas, the project is laid out to mirror the development project. In particular, for each component path in Dev, there is a corresponding component path in CompanyTests. This allows a developer to easily switch between the two code bases.

5.2.4. Writing Tests

Once a test has been documented, there is certainly an opportunity to automate this test. Automation is faster and more reliable, so we should attempt to automate every test.

Before writing the actual test steps, it is necessary to get a browser instance. Locally, we have a helper routine for getting browser instances (BrowserPool). BrowserPool and the drivers it returns are sensitive to setup and teardown requirements and will take care of much of the setup and releasing of the browser instances.

It is also best to set up any constants we are going to use right away. This is to match the pattern found in the documentation.

search - the text we are going to search for
suggest - the text we expect to get back
pageUrl - the URL of the page we are going to use for testing.^[5]

@Test(testName="rm1242")

public void testSearchSpellCheck() {

final String search = "webteam";

final String suggest = "web team";

final String pageUrl = baseUrl + this.pageUrl + "?q=" + search + "&stype=main";

try(BrowserPooledDriver pooleddriver = browsers.borrowObject()){

WebDriver driver = pooleddriver.getDriver();

// actual test steps go here

}

At this point, we are ready to actually automate the test.

For this, the documentation offers excellent pseudocode. The step-by-step procedure has been tested by humans running it, and is very detailed. Really, it is code for than runs on a “MeatBag Processor Unit”.

try(BrowserPooledDriver pooleddriver = browsers.borrowObject()){

WebDriver driver = pooleddriver.getDriver();

// STEP 1: GOTO: http://example.ca/search.html?q=webteam

// RESULT: search suggests "Did you mean: web team"</li>

// RESULT: "web team" should be a link

}

At this point, I would strongly suggest running the code. This would be a really convenient time to find out that you have made a mistake in the administrative code. Much more convenient than after you have put a bunch of test code in place.

To run the code:

Find the method in the “Package Explorer” or “Outline” view of Eclipse.
Right-click on the test
Select “Debug As” -> TestNG

You should see it kick in and run the test. Hopefully you get a green icon!

Now that we have the baseline in place, we can move on to actually automating the test steps. Our description can easily be translated into automated steps (all one of them), and the checks for results we are expecting.^[6]

try(BrowserPooledDriver pooleddriver = browsers.borrowObject()){

WebDriver driver = pooleddriver.getDriver();

// STEP 1: GOTO: http://example.ca/search.html?q=webteam

driver.get(pageUrl);

// RESULT: search suggests "Did you mean: web team"</li>

elem = driver.findElement(By.xpath("//div[@class='spelling']/a"));

Assert.assertNotNull(elem, "Presence of new search link");

// RESULT: "web team" should be a link

elem = driver.findElement(By.xpath("//div[@class='spelling']/a"));

Assert.assertEquals(elem,suggest,"Spelling suggestion made");

}

We use the Selenium Webdriver to operate a web browser instance. In our example above, we can see the driver does a “findElement” call, this looks up the webpage element (any HTML tag). For a parameter, you need to call one of the methods of the “By” class, which is how we address the element we are attempting to look up. In this case, we are attempting to find an “anchor” tag, by its xpath.

It is worth noting that Selenium is not the only way to gather data about our environment. In strict UnitTesting, we would be directly calling the methods on each class we write in our library. In the case we are checking header information, Selenium can’t help, we need to turn to other HTTP communication mechanisms. Perhaps we want to test a terminal application, we could use a command streamer to act as our interface.

Once we have obtained the element we can run some Assertions on it. Assertions are the falsifiable statements about the data we have gathered. TestNG gives us an Assert class that contains several assert methods^[7] that allow us to make different kinds of statements about the data. Every assertion should have a textual description associated with it. In the first case, we are going to assert that we found the element at all (did not get null); the second case checks to see that it contains the text we are anticipating.

Now would be a good time to run the test again.

Assuming you got a green-light from the test, it is worth taking a moment to look back on what we now have.

Automated Test - obviously we have created an automated test. Faster and more reliable.
Manual Test - using JavaDoc, we are able to maintain steps for the test to be performed and reviewed by non-technical individuals. If automation were not available, or not possible, we still have a test plan.
Regression Library - we have a complete regression library. This library can be used to review tests over time. When a major release is performed, we will have a list of known issues that need to be checked against. Stakeholders have a listing of everything that is being checked and can highlight areas that are being overlooked. This also gives the foundation for effort estimation as we can identify how long tests take to run and compare that to the number of tests required for a given run.
Complete Specifications - The tests in a regression library offer a complete description of the specifications of the system. If the specifications need to change, there is a source of information for understanding what the original functionality was, and what ramifications may be present.

5.3. Utilities

BrowserPool

BrowserList

PageSampler

6. Bibliography

Cathedral and the Bazaar

http://www.catb.org/esr/writings/homesteading/cathedral-bazaar/

Good Bug Reports

http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

Mozilla Bug Report Guidelines

https://developer.mozilla.org/en-US/docs/Mozilla/QA/Bug_writing_guidelines

Goodhart’s Law (measurement/objectives)

http://en.wikipedia.org/wiki/Goodhart's_law

Guessing the Teachers Password & Goodhart’s Law

http://lesswrong.com/lw/1ws/the_importance_of_goodharts_law/

http://lesswrong.com/lw/iq/guessing_the_teachers_password/

Selenium Web Driver

http://docs.seleniumhq.org/docs/

TestNG Test Framework

http://testng.org/doc/index.html

MozTrap

https://wiki.documentfoundation.org/MozTrap

Test Driven Development

http://www.agiledata.org/essays/tdd.html

Page of

[1] The specific study was performed by Orr in 1981, and has since been demonstrated to have been flawed. Rather, later studies have shown there to be no risk or benefit to wearing, or not wearing, a mask in the operating theatre. To this day, the debate continues.

[2] Other descriptions of this phenomenon can be found on LessWrong.com in the articles “The Importance of Goodhart’s Law” and “Guessing the Teacher’s Password”; as well as Technology Review’s The Dictatorship of Data; School-Teacher Cheating

[3] This appears to be the general consensus among TDD proponents such as Wambler

[4] The concept of Smoke Tests comes from plumbing. Plumbers used to use smoke bombs to find leaks in pipes. As the pipe filled with smoke they could walk the line and check for leaks. There is a reference to Plumber's Bombs in the Sherlock Holmes series where Watson uses one fake a fire, scaring everyone out of a house, allowing them to sneak in and search the place.

[5] For this example, pageURL is being defined in the function, however it may be more appropriate to set for the entire class using a @BeforeClass routine.

[6] Each test falls into two parts: gather data, and make an assertion about it. Generally these are done on one line, but I have a preference for doing it on two lines. This allows me to step through the code (when there is a problem) and inspect the gathered data before I make any assertions.

[7] While TestNG is the best of the Java family of test frameworks, I have to compliment Nunit for having far better reporting capabilities built into it. Not only do they have a wide variety of Assertions, they also have more states that can be returned from the system.