Testing Testers
Things to Consider when Measuring Performance
You don't wait until the day before a software release to test the product. Testing software is a complex process, involving systematic investigation and sustained observation. In this week's column, James Bach argues that evaluating testers is similarly complex. And it shouldn't be put off until the night before the tester's performance review.
I was at the Sixth Software Test Managers Roundtable meeting recently, discussing ways to measure software testers. This is a difficult problem. Count bug reports? Nearly meaningless. Even if it meant something, it would have terrible side effects, once testers suspect they are being measured in that way. It's easy to increase the number of bugs you report without increasing the quality of your work. Count test cases? Voila, you'll get more test cases, but you won't necessarily get more or better testing. What incentive would there be to do testing that isn't easily reduced to test cases, if only test cases are rewarded? Why create complex test cases, when it's easier to create a large number of simple ones?
Partway through the meeting, it dawned on me that measuring testers is like measuring software. We can test for problems or experience what the product can do, but no one knows how to quantify the quality of software, in all its many dimensions, in a meaningful way. Even so, that doesn't stop us from making a meaningful assessment of software quality. Maybe we can apply the same ideas to assessing the quality of testers.
Here are some ideas about that:
To test something, I have to know something about what it can do.
I used to think of testers in terms of a set of requirements that all testers should meet. But then I discovered I was missing out on other things testers might have to offer, while blaming them for not meeting my Apollonian ideal of a software testing professional. These days, when I watch testers and coach them, I look for any special talents or ambitions they may have, and I think about how the project could benefit from them. In other words, I don't have highly specific requirements. I use general requirements that are rooted in the mission the team must fulfill, and take into account the talents already present on the team. If I already have an automation expert, I may not need another one. If I already have a great bug writer who can review and edit the work of the others, I might not need everyone to be great at writing up bugs.
"Expected results" are not always easy to define.
Let's say two testers test the same thing, and they both find the same two bugs. One tester does that work in half the time of the other tester. Who is the better tester? Without more information, I couldn't say. Maybe the tester who took longer was doing more careful testing. Or maybe the tester who finished sooner was more productive. Even if I sit there and observe each one, it may not be easy to tell which is the better tester. I'm not sure what my expectation should be. What I do instead is to make my observations and do my best to make sense of them, weaving them into a coherent picture of how each tester performs. "Making sense of observations" is a much richer concept (and, I think, more useful) than "comparing to expected results."
When I find a problem, I suspend judgment and investigate before making a report.
When I see a product fail, especially if it's a dramatic failure, I've learned to pause and consider my data. Is it reliable? Might there be problems in the test platform or setup that could cause something that looks like a product failure, even though it isn't? When the product is a tester, this pause to consider is even more important, because the "product" is its own programmer. I may see behavior that looks like poor performance, when in fact the tester is doing what he thought I wanted him to do.
Sometimes, when one problem is fixed, more are created.
Whenever testers try to improve one aspect of their work, other aspects may temporarily suffer. For instance, doing more and better bug investigation for some problems may increase the chance that other problems will be missed entirely. This performance fluctuation is a normal part of self-improvement, but it can take a test manager by surprise. Just remember that testing, like any software program, is an interconnected set of activities. Any part of it may affect any other part. Overall improvement is an unfolding process that doesn't always unfold in a straight line.
Something may work well in one environment, and crash in another.
A tester may perform well with one technology, or with one group of people, yet flounder in others. This can lead those of us who spend a long time in one company to have an inflated view of our general expertise. Watch out for this. An antidote may be to attend a testing conference once in a while, or participate in a tester discussion group, either live or online.
Problems and capabilities are not necessarily obvious and visible.
As with testing a software product, I won't know much about it just by dabbling with the user interface or viewing a canned demonstration. I know that to test a product I must test it systematically, and the same goes when I'm evaluating a tester. This means sustained observation in a variety of circumstances. I learned long ago that I can't judge a tester from a job interview alone. All I can do is make an educated guess. Where I really learn about a tester is when I'm testing the same thing he's testing, working right next to him.
Testers are not mere software products, but I find that the parallel between complex humans and complex software helps me let go of the desire for simple measures that will tell me how good a tester is. When I manage testers, I collect information every day. I collect it from a variety of sources: bug reports, documentation, first-hand observation, or second-hand reports, to name some. About once a week, I take mental stock of what I think I know about each tester I'm working with, triage the "bugs" I think I see, and find something that's good or recently improved about each tester's work. It's a continuous process, just like real testing, not something that works as well when pushed to the last minute before writing a performance review.



Comments
#1 Submitted by Jon Bach on Fri, 07/04/2003 - 10:12pm.
As I interview testers, I find myself using the "Good Enough" software criteria to get a sense of how testers would fit in with our team. For example, I watch them test an application for 20 minutes, and I evaluate whether they have sufficient benefits, no critical weaknesses, whether the benefits meaningfully outweigh any weaknesses I saw, and whether further scrutiny of them is worth time away from my own billing work. The GE principles still apply even after they're hired. Since the company I work for does outsource testing, the criteria also help me evaluate context -- whether a tester is right for this project, at this time, and for this client.
#2 Submitted by Chris Dwan on Thu, 04/10/2003 - 12:30am.
Excellent article. I don't have anything terribly intellectual to say about it but just had to post something to counter the last comment. People are more than knowledge. There are tons of people out there who seem to know everything but can apply nothing. There has to be integrity, determination, vision and a million other things beyond knowledge. You made excellent points that are useful for opening eyes to take a look at the reality of things. You take in the human element quite nicely too, seeing that each tester is different and can't be boiled down to numbers and measures.
#3 Submitted by Rene’ Jones on Tue, 06/15/2004 - 9:19pm.
How much should we test based on a modification that was built? If the mod took 100 hours, what percentage of that time should I spend testing: 20%, 50%, 120%, etc.? If someone could shed some light on this I would appreciate it. I can be reached at info@logisticsociety.com
#4 Submitted by Owen Gerard on Fri, 06/25/2004 - 2:43pm.
James, it may be worthwhile taking your analysis a step further and analysing the different traits that appear between the various types of test divisions. By this I mean that if an employer is looking to recruit a UAT tester, they will be looking for a slightly different tester to one required for System testing.
#5 Submitted by Tanya Paschke on Wed, 06/23/2004 - 1:25pm.
I agree 100% that it is difficult to determine the value of a tester. In my experience, it is difficult to rate one tester within a team of them, but much easier to rate the team as a whole. As you stated in the article, a test team needs to have a good blend of different skills. Where one person excels, another does not have to. The most effective test teams I have worked with have been able to find the "right" blend. Within environments with a small quality team (1 - 3 people), I have found that oftentimes the most "effective" tester(s) are those who have earned the respect and trust of their developers and fellow team members. This is NOT because they necessarily find the most bugs or have the most thorough test cases; however, they usually have the most influence to get changes made. I have been in several situations where one team member has demanded something and received no response or a reluctant one, while another has requested the same thing only to receive positive responses and immediate action. Since testing requires so many different skills - technical, business, and soft - it is extremely difficult to evaluate a tester's performance. Which is more important? That all depends on the environment in which one is working.
#6 Submitted by Khula Azmi on Mon, 06/21/2004 - 8:25am.
Unfortunately, the productivity of a tester is sometimes considered directly proportional to the number of bugs he reports. A way of thinking can't be changed in a day. Anyway, thanks a lot for this great article :).
#7 Submitted by Costas Conner on Fri, 06/21/2002 - 7:27pm.
In not one sentence of the article, or the responses, was the word REQUIREMENT(S) uttered. How can we test the quality of a software product if we don't know the requirements on which it's built or based? Well my friends, same goes for a tester. There are certain job requirements that must be laid out. From there on the actual results would be measured against the expected (requirements). Anything beyond or below would be a defect, or an enhancement, and be dealt with accordingly. And who tests the manager who's testing the tester? What skills does the manager have, and who hired/reviews their skills? Really the process could go on and on.
#8 Submitted by Erik Petersen on Fri, 06/21/2002 - 12:05am.
No one has mentioned test management tools yet. They have very powerful information filtering, and you can look at how a tester has structured their tests, how they've run them, the type of bugs they found etc. This is a good indicative measure of strengths and weaknesses, if you have not worked on the project with them. I'm also amazed at some testers I have met. Brilliant testers but terrible people skills, but there are many programmers like that, so they can do good work if we keep them in a backroom, and try to boost their diplomatic skills ;)
#9 Submitted by Shazeeda Hack on Thu, 06/20/2002 - 6:10pm.
A good tester is a person who was born with the talent. I don't think anyone can really test a tester!
#10 Submitted by Joe Anziano on Thu, 06/20/2002 - 12:10am.
James - a good article and an important subject. Two tools we are using to evaluate testers/test effort, I learned from your course. The first is "pair testing": when a new tester arrives on a project, he/she is immediately paired with an experienced tester - very effective when the match is right. The second is Session-Based Testing: having implemented SBT for tracking our Exploratory testing, each test session is reviewed by the team lead. This gives an opportunity to review and guide testing of each tester on the project as well as allowing "cross-pollination" across the project. At the same time, quality of tester effectiveness and focus can be evaluated and guided in almost real-time. Any deficiencies or side-tracks are caught very early on and there are no surprises down the line. Again, a VERY EFFECTIVE technique. Test on!
#11 Submitted by amy peck on Wed, 06/19/2002 - 8:41pm.
"These days, when I watch testers and coach them, I look for any special talents or ambitions they may have, and I think about how the project could benefit from them." Hmmm ... and sometimes James, things work out very well for all of us.
#12 Submitted by Stan Greene on Wed, 06/19/2002 - 5:50pm.
I believe the best way to measure someone's performance is to evaluate them using multiple dimensions. I came up with 5 areas to measure my people. They are Test Methodology Skills, Software Development Process Knowledge, Product Knowledge, Tools/Technology Skills, and Interpersonal Skills. I evaluate their skill/knowledge level in each of these areas and I put together a development plan based on my assessment. We meet on a regular basis to discuss their plan and their progress. In addition they have specific project goals related to test planning, execution and reporting. In this way I get a clearer understanding about a person's strengths and weaknesses and then I can tailor the plan to help them improve in the areas where they are the weakest.
#13 Submitted by Gena O'Flaherty on Wed, 06/19/2002 - 5:35pm.
I think we should go one step further and start evaluating testers during the interview. When I conduct interviews, I make them prove their testing skills by giving them 1 hour to find as many bugs as possible. Additionally, I give a 1 hour SQL test to see where their level of SQL lies (for database integrity testing). This has proven to be a very effective weeding-out tool. Too many times, testers will say all the right buzz words in the interview and then attempt to hide their inadequacies on the job. Since I have implemented this interview tool, I have formed a fantastic team of 6 who all have strong testing and analytical skills. As a bonus, each person is an expert in an area where the other 5 are not (one is an automation guru, one a database guru, one an IT guru, etc.). To help motivate and keep this fabulous group producing great quality work, we have a weekly Learning Series session where each person teaches the group about something they are an expert at (SQL on one week, automation tricks on another week). By providing them with the tools and knowledge and SUPPORT they need to do a great job, it becomes very easy to evaluate them.
#14 Submitted by Alan Jorgensen on Wed, 06/19/2002 - 2:14pm.
"To test something, I have to know something about what it can do." Once an exploratory tester, always an exploratory tester. You point out the importance of noting strengths, but in testing, we look for weaknesses, so if you are going to liken software testing to testing testers (which I should suppose are the same at some abstract level), shouldn't we be looking for weaknesses in our testers? In the same way that you identify "automation" capability, wouldn't your look at your whole product (excuse me, "team") and say "We seem to be weak in automation, or recording, or planning, or ..." In testing, we bore in on these areas, so, too, in management?
#15 Submitted by Annemarie Martin on Wed, 06/19/2002 - 1:33pm.
I agree with everyone that this is a great article to get you thinking about ways to not only 'test your testers', but to analyze your OWN testing. I especially liked this point: "Whenever testers try to improve one aspect of their work, other aspects may temporarily suffer." It is important, I think, to realize that as with any other aspect of Application Development, a heavy focus in one area can put another area at risk. So if you're managing a team, and ask particular testers to improve upon certain areas or skillsets, you have to be prepared to have someone pick up the slack in other areas if it's needed. I sometimes like to have different people focusing on improving in different areas, so that everything is being covered as well as possible. In response to one poster, I don't think the article intended to provide specific means, methods, or metrics to evaluate testers, but rather some general things to keep in mind that are outside of what people usually consider - too often, we focus on how many defects people find, or their testing artifacts, and I think James was attempting to point out other things that should be considered when you're evaluating anyone - including yourself.
#16 Submitted by Mike Klein on Wed, 06/19/2002 - 1:23pm.
One thing in this that needs to be emphasized is that any sort of annual review should not contain any surprises to the tester (OK maybe a "happy" surprise would be fine). Just as it is bad to 'save' a bug until the last minute in some misguided attempt to emphasize my importance, it is equally bad to use the review process as the only time to bring up potential areas of improvement and correction. Also, (to continue the software testing analogy) it's much better to catch things earlier than later.
#17 Submitted by Paul Hepplestall on Wed, 06/19/2002 - 12:37pm.
This is a worthwhile article that brings to attention a basic, fundamental question of how to measure a tester's performance. However, while there are some useful insights, the question still remains unanswered, and it would appear that the author is suggesting that there may be no systematic approach available that will provide an adequate method to determine the effectiveness of testers.
#18 Submitted by maurice siteur on Wed, 06/19/2002 - 8:23am.
Analogy with testing itself is fun to read. The conclusion is obvious. Talking to people, looking at the way they behave, looking at their work, joining reviews; all will help.
#19 Submitted by Darryl Hurmi on Tue, 06/18/2002 - 4:26pm.
I think that this article does highlight the basic problem in objective evaluation of tester performance. The observation method never worked well for me because I had 20 testers and spent half my time looking for more. Also, as a test manager, what you are looking for in a tester may not fit with what is needed to improve the quality of the software; I know this from experience. What I judged my testers on was what kind of an impact they made on the products they were testing. I did this by talking with all of the development teams to see which testers were seen as contributors and which were not. This was tempered based on the experience of the developers: the less experienced the team, the easier the praise came. I also looked at the work they did to see that it met our standards. Good testers, though, will make an impact on their development group and a poor tester will not. If you look at why, you can see the weaknesses of the poorer tester and help them improve.
#20 Submitted by Paul Petty on Tue, 06/18/2002 - 3:41pm.
That is a great article, and it gets one thinking about individuals. The comments about having to retest other testers' bug reports are good. You really can't judge another person until you have "...walked in their shoes...". Looking at the other testers' bug reports and following their steps, etc., to reproduce the same problem in the product will definitely help in knowing a bit about that person. Testing and testers are a subjective thing, and getting to know the testers, their individual talents, strengths, and weaknesses as far as testing goes will help you as a manager to make a better and more informed decision about them. Be kind to them and don't judge them too harshly for things they may not do the way you would like them to. You have to, as a manager, help them to improve. If you don't, you might not discover that if that tester had some training in that area, they might actually be the best, or better than you know. Give them every opportunity to improve and reach for the goals that will help them to "...be the best they can be...". A manager's job is to help other testers to grow and improve in what that person would like to do. If the tester decides they don't want to improve or grow, then you can tell they really aren't interested and then you can make a better judgement based on attitude and personal experience with that person. The mechanics of testing are not as important as the individual themselves and their attitude, goals, and desire.
#21 Submitted by John Daughety on Tue, 06/18/2002 - 1:50pm.
It is nice to have someone show so clearly how hard it is to evaluate a tester's contribution - thanks James! In one job as manager of a test team, not only did I have to struggle with the typical issues, I had to put 5% of my team in the "superior" category, 5% of my team in the "inferior" category, etc. Luckily I was in Marketing before testing, so I knew the skills of "spinning" a story. It sure didn't feel good, though. On a positive note, though, I have discovered one way to judge a tester's skills that seems very effective. Retest 10 or more of his/her bugs. I had to do this recently when our company laid off 5 of its 6 testers, and I had over 300 bugs to retest. I learned so much about the other testers during this exercise: how well they document exactly what they did to cause a bug, whether they attach the correct data, and how well they describe what should have happened. Most importantly, looking at the whole bug write-up I could gain insights into how hard they dig when finding a bug. Some people would see a bug as only a symptom of some other, more serious issues, and keep digging. Others would simply write up the bug quickly and move on. It is a bit subjective, but this technique does work with concrete data (their bug reports) and provides great opportunities for feedback and training.
#22 Submitted by pradeep0 on Tue, 06/18/2002 - 12:39pm.
Thanks James. Good points. We are looking at Defect Removal Efficiency as a performance measure of the testers. It is the ratio of defects discovered before the product was delivered to the client to the total reported defects. We do not count UT defects in this.
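As a rough illustration of the ratio described in comment #22, here is a minimal Python sketch. It assumes that "total reported defects" means defects found before delivery plus defects reported after delivery; the function name and the sample counts are hypothetical and do not come from the commenter.

```python
def defect_removal_efficiency(found_before_delivery: int,
                              found_after_delivery: int) -> float:
    """Return the share of all reported defects caught before delivery.

    Assumption: total reported defects = pre-delivery + post-delivery defects.
    Unit-test (UT) defects are expected to be excluded by the caller.
    """
    total = found_before_delivery + found_after_delivery
    if total == 0:
        # With no reported defects the ratio is undefined.
        raise ValueError("no defects reported; ratio is undefined")
    return found_before_delivery / total


# Hypothetical counts, for illustration only.
dre = defect_removal_efficiency(found_before_delivery=90,
                                found_after_delivery=10)
print(f"DRE = {dre:.0%}")  # prints "DRE = 90%"
```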
#23 Submitted by Bing Fang on Tue, 06/18/2002 - 12:23pm.
Very interesting points. Yes, measuring a tester's work is as difficult as measuring software quality. But we know software quality can be "good" or "bad" in a general sense, and so can a tester. In some cases we just have a feeling that one tester is better than another, but it is not easy to give a quantifiable description, just as with software quality.
#24 Submitted by Nitin Tawde on Tue, 06/18/2002 - 5:51am.
Great Article!!! But I suppose that there are many Test Managers who are biased and who do not reward good points to the Testers who are sincere because they do not like them.
#25 Submitted by Tek Wallah on Tue, 06/18/2002 - 12:03am.
James Bach has hit the nail on the head again -- but where does it lead us? There's no substitute (yet) for intelligent, experienced and astute observation. But how many of us can do that? Even if we had superb people-empathy skills, we can't sit and look over every employee's shoulder all day long. It's the same old story over and again: the staff know who's good and who's not, but those of us who make the decisions about them don't. And we guess wrong far too often.
#26 Submitted by Dale Emery on Mon, 06/17/2002 - 6:51pm.
Hi James. Nice article, as usual. I'm wondering about the context for all of this: What reasons do managers have for wanting to measure individual testers? If I don't know that, I don't know what to measure, and I don't know how to evaluate whether any given measurement might help.
#27 Submitted by Joseph Strazzere on Mon, 06/17/2002 - 5:49pm.
As usual, James makes some great points. The search for the "Silver Bullet" of simple measures is misguided. It may take more work for you as the manager, but treat your testers as individuals, and you'll all be better off!
#28 Submitted by kalyan rao on Mon, 06/17/2002 - 1:49pm.
While reviewing performance, we should probably think about the skill set of the individual (OSs, databases, business knowledge, etc.) and the contribution to the projects. If the organization is mature enough, it is also not a bad idea to take the opinion of our partners, i.e., the development teams. In the end, testing is after all providing services to the development teams to keep the project on track with respect to time, quality, and content.
#29 Submitted by RAvi Kumar on Mon, 06/17/2002 - 1:19pm.
It's really good, but the information is not enough to test a tester.
#30 Submitted by Harsha BN on Wed, 06/26/2002 - 9:58am.
Hi James, You are talking about testers' performance in different technologies, but my opinion is that testers should have expertise in a particular technology, and managers should not assign work to testers who do not have expertise in that particular domain. Managers should think about your article.
#31 Submitted by Antonio Cardoso on Mon, 06/24/2002 - 6:51pm.
It is a very important subject. I believe that thousands of managers are looking for a way to evaluate their testing contributors. I am, too. I know that your intention was not to show ways for evaluating testers but I was disappointed with the article in this way. But, the article was excellent in revealing to us that we have to think how to solve this situation. Thanks for the article, I hope you present us with more!
#32 Submitted by on Mon, 06/24/2002 - 5:22pm.
Hi James. Thank you for another sparkling article. The sentence that struck me the most was this: "Making sense of observations" is a much richer concept (and, I think, more useful) than "comparing to expected results." It really slapped me upside the head. This is the way out of a conundrum that I have been in for years: how to describe what allows me to do effective testing in very short timeframes. (By this I mean enough time to outline a set of actions that the product is purportedly able to accomplish, but not enough time to create a set of tests so specific that the inputs and outputs are defined in advance.) I test by attempting as many actions as I can, in as many states as I can, and then Making Sense of Observations. Doing so seems to make the best use of my intellectual resources. Other testers' intellectual resources may best be leveraged using other paradigms, especially that of Rigorously Predefined Requirements, Tests, and Expected Results. As a test manager, and the creator of test methodology on projects, I must be awake to my bias. I may choose my team to serve this bias, hiring people who think like me. Or I may have to overcome my bias and find ways to lead testing that suit the mission of testing, and also suit differing styles of thinking. I will keep this in mind as I move forward.
#33 Submitted by roger dalal on Thu, 12/12/2002 - 9:24pm.
Hello James: I think the best way to judge a tester is to test him on his knowledge of testing methodologies, general information technology, and product-specific matters. That would be a practical way to do these things. Your claims about wanting to get to know the tester's ambitions etc. are vague and, in my management experience in QA, generally fraught with errors.
#34 Submitted by Alexandra Gerassi... on Wed, 07/31/2002 - 1:03am.
Wouldn't the absence of 'stop-ship' defects in the released software be a good measurement of the tester's performance?