Keeping Secrets
How Data Privacy Affects Testing
Test data has long been a challenge for testing; privacy legislation, identity theft, and the continued trend towards outsourcing have made it even worse. Just establishing and maintaining a comprehensive test environment can take half or more of all testing time and effort. In this week's column, Linda Hayes adds in the new and expanding privacy laws that inevitably limit your testing options. Yet from the quagmire of laws and company standards, better testing can emerge.
In the old days, production could provide a refresh from time to time for your test bed. Although this was not easy, it was a starting point. You still had to make sure you either had the storage to get a full copy or could extract a coherent subset, and of course you could not reuse data because it was a moving target. And even though production was arguably a good sample, there was no guarantee that all your potential test conditions existed. Finding the right conditions to satisfy a test was the proverbial needle in a haystack.
This was already hard, so what do you have to worry about now?
Your Health
For starters, when you visit the doctor you have yet more paperwork to complete, authorizing (or not) the disclosure of your medical information, thanks to the Health Insurance Portability and Accountability Act of 1996. The rules set by this act highly restrict who can access patient data, as well as when and why. Since your medical provider has to have permission to disclose your data to other health professionals or family members, it can't be provided to perfect strangers who are testing the latest release of software for a health insurance company or hospital management firm.
As a tester, if your application touches this information in any way, you can't use unconditioned production data for testing. You can scrub names or social security numbers, which is not necessarily a new technique (although it's not always done). But now you have to worry about less obvious elements that can give the patient away. For example, a policy or claim number can tie back to an actual person through another system.
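One way to picture scrubbing that covers the less obvious identifiers is a keyed, one-way replacement applied to every sensitive field, so that a policy number maps to the same token wherever it appears but no longer matches the real value in another system. The sketch below is only an illustration under stated assumptions: the field names and the helper functions are hypothetical, and the key would have to be kept out of the test environment. It does not by itself settle whether a given use of production data is permitted.

```python
import hashlib
import hmac

# Hypothetical field names; a real schema will differ.
SENSITIVE_FIELDS = ("name", "ssn", "policy_number", "claim_number")


def pseudonymize(value: str, field: str, key: bytes) -> str:
    """Return a stable keyed token for one sensitive value."""
    digest = hmac.new(key, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    # Prefix with the field name so tokens stay readable in test logs.
    return f"{field}-{digest.hexdigest()[:12]}"


def scrub_record(record: dict, key: bytes) -> dict:
    """Copy a record, replacing every sensitive field with its token."""
    return {
        k: pseudonymize(str(v), k, key) if k in SENSITIVE_FIELDS else v
        for k, v in record.items()
    }
```

Because the same key and field always produce the same token, a scrubbed claim still joins to its scrubbed policy, which keeps the test bed coherent without carrying the original numbers along.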
Your Money
Gramm-Leach-Bliley may not be as well known as Sarbanes-Oxley, but Title V of that act directly addresses the privacy of your financial information. Aside from certain exceptions like the Fair Credit Reporting Act, your financial institution has to publish its privacy policy and allow you to opt out of certain disclosures. They also can't provide your information to testers in the U.S. or elsewhere. The European Union has even more stringent financial privacy regulations.
That means the data behind any software that touches money, or the accounts where it lives, also has to be kept private. Again, this covers not just your social security number but any other data that could be traced back to you even indirectly. Bank or credit card numbers are obvious, but what if an order number could be tied back to an invoice that could identify you?
Your Identity
All of this comes back in some way or another to your identity. Identity theft is a huge problem. Consumers and credit card companies are getting smarter and Web-exposed systems are becoming more secure, but all of the software that powers these relationships has to be tested, which means testers may have access to data that they otherwise would never be authorized to see.
I recall one project where we tested upgrades to a human resource system. We had to battle with the system administrator to give us supervisory password access to the test server so we could theoretically hire and fire employees, change their salaries, etc. The network support group resisted because the information was highly confidential, but we argued that the testing region was different from production. As a practical matter, we had to exercise every available function. As a compromise, we received regular refreshes from production, which we had to scrub.
Your Location
Granted, legislation is catching up and law enforcement is waking up, so protection is becoming a priority...at least in some places. But what if testing is done in another country where these mechanisms aren't as available or enforceable? If your company sends its testing to another firm in another country, then you're entering a whole new level of disclosure. While your bank technically hasn't disclosed data to another institution if the data finds its way into an internal test region, sending it across the world to an unrelated entity is completely different.
Hidden complications can also come in the form of organizational constraints meant to provide additional protection, as usually found within domestic firms. "Chinese walls" and other internal controls limit the exchange of information between departments or companies. But what if you are using a gigantic outsourcing firm that services hundreds of systems for multiple enterprises? It is conceivable that data not meant to be commingled could be.
Silver Lining
As an eternal optimist, I do see a silver lining in all of this. It is forcing companies to examine the entire issue around test data. A customer recently described a major project designed to create a test data region from the ground up. Not only did this solve privacy concerns, it also defined exactly what data was needed and allowed the same data to be reused every time, improving coverage and saving tremendous time.
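A test bed built from the ground up can be sketched as a table of conditions expanded into one record per combination, each carrying obviously fake identifiers. The example below is a minimal illustration, not a description of the customer's project: the condition names, values, and helper functions are all hypothetical, and the placeholder social security number is simply a recognizably fake value.

```python
import csv
import io
import itertools

# Hypothetical test conditions; a real project would derive these from requirements.
CONDITIONS = {
    "account_type": ["checking", "savings"],
    "status": ["active", "closed", "frozen"],
    "balance": ["-50.00", "0.00", "1000.00"],
}


def build_test_rows(conditions: dict) -> list:
    """Create one synthetic record for every combination of condition values."""
    names = list(conditions)
    rows = []
    for i, combo in enumerate(itertools.product(*conditions.values()), start=1):
        # Identifiers are obviously fake, so no record can be mistaken for a customer.
        row = {"customer_id": f"TEST-{i:04d}", "ssn": "000-11-2222"}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


def to_csv(rows: list) -> str:
    """Render the records as delimited text that a load script could consume."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Since the rows are generated the same way on every run, the resulting file can be reloaded before each test cycle, and every listed condition is guaranteed to be present rather than hunted for in a production extract.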
If your company hasn't come to grips with privacy issues yet, I encourage you to do so and fast. It isn't easy to fix, but once the genie is out of the bottle, it may be impossible.


Comments
#1 Submitted by Kathy Van Stone on Wed, 05/04/2005 - 3:52pm.
I am admittedly a developer more than a tester, but for seed data we tend to use obviously fake data (for example 000-11-2222 for a social security number). We do have a method of scrubbing production data for when there is an error that needs debugging. I handed your article to the person in charge of that to see if he may have missed something. Thanks for the article.
#2 Submitted by Jeremy Sloan on Wed, 05/04/2005 - 10:10am.
As a software QA department for a financial company, data privacy is an issue that we have to deal with all the time. Test automation can really help, in some cases. When you have the ability to load a created test bed entirely from a set of Excel worksheets or delimited text files, privacy concerns disappear and you also have far more control over your test conditions. Building a comprehensive suite of test data from scratch is a lot of work, though.
Even with automated testing tools available, it's not always practical (or even possible) to avoid using production data for testing - especially for production support. In order to help comply with Sarbox, we have a 'firecall' process in place. If a developer or tester assigned to production support needs access to production data, that person requests a temporary user ID that allows this access. After a set period -- typically 24 hours -- the ID expires. While somewhat inconvenient, this process limits production access to those who really need it.
#3 Submitted by Frits Bos on Mon, 12/27/2004 - 11:57pm.
Hi, Linda, I like the way you put the question squarely to test managers, why on earth would you continue to rely on using production file extracts when that is a fundamentally flawed approach, let alone an approach that is likely to turn you into a law-breaker?
I remember the IT history a bit farther back than you acknowledge. Having been around for a few years, as you know, I can attest to the fact that not only was finding the right test conditions difficult, typically the data did not exist. That was actually a blessing in disguise, because we did not have an opportunity to compromise customer data. So the "old days" you are talking about clearly were not that long ago. The history is that customer data used to be regarded as company owned information. Especially banks used to think that they owned client data and, as a rule, they demanded a lot of personal information that nobody questioned. Now we have PIPEDA in Canada, which is very strict, but I don't think it is all that well enforced, since most companies that test software routinely violate the Act by using production data extracts.
It is actually surprising that this data extract business started in the first place, because if you think about it, there is almost no likelihood that the special new functionality conditions you want to test for will be reflected in older production data. Could it be that, as a result, testing is less than thorough due to trying to find a shortcut for creating test data? You mentioned that people may use scrubbed data, but in fact the selection of that data prior to scrubbing already is a violation of the Act, which clearly states that the data are not to be used for any purposes other than what the individual has consented to. The use of a customer profile for testing is usually not one of the uses customers consent to, so the extraction of that data prior to scrubbing is illegal.
Another interesting twist is testing applications as part of offshore development. Has anyone paid attention to the test data that is used? How many companies are inadvertently exporting private information? I am not suggesting that offshore developers would abuse the information, but the fact remains that people simply do not pay attention to the law when they develop and test, they simply focus on the job at hand. Sooner or later someone will pay the price of being "caught" in the act. The sad thing is that all these violations do nothing to improve the testing of software.
As you pointed out, there are many benefits from building test data from the ground up. We have shared philosophical discussions on what it takes to make people see the reality that you cannot test conditions that are not reflected in the data files. It is ironic that in the really old days that I talk about we did not use test automation because all our testing was batch file based, which (if you stop and think) is very similar to firing off a test automation session. Somewhere between then and now the data extract approach took hold, probably because of the perceived costs of creating test data. What I like about your product, Certify, is how easy it is to use that tool with a table-driven architecture to build any type of data file you want. I can generate all the conditions, reflect them in data tables, and establish scripts to build data files as well as scripts to test a GUI based application that uses those data files. Not only does this avoid any legal issues, it actually ensures that we test the conditions that really matter in the business application. I don't think this is such a difficult concept to grasp, so the question is: why would anyone still insist on using data extracts that are unlikely to provide an adequate test bed? What does it take to convince people that there are many benefits from building a testing-specific test bed from the ground up? When do people realize that test automation tools can do more than to automate test transactions?
#4 Submitted by Michael McDonald on Tue, 12/28/2004 - 6:24pm.
I don't work for a bank or a healthcare company. Where do I go to find out what rules on data privacy apply to us?