TL;DR: If AI can easily generate tests to increase code coverage, has coverage become a meaningless metric?
I used to like code coverage (the percentage of the code executed while testing) as a metric.
I was interested in whether it was very high or very low.
Either of these was a flag for further investigation.
Very low would indicate a lack of testing.
Very high would be suspicious or encouraging (if the code was written following TDD).
Neither was a deal breaker, as neither was an indication of the quality or value of the tests.
Now tests are easy. Anyone can ask an AI tool to create tests for a codebase.
This means very low code coverage now indicates that AI isn't being used as a coding tool, which probably also suggests a lack of other productivity tools and time-saving techniques.
Now, very high code coverage can mean nothing. There may well be lots of tests, or tests that cover a lot of the code, but they are likely to be unit tests only, and low-value ones at that.
There are two approaches to testing, each asking a different question:
- Are there inputs or options that cause the code to break in unexpected or unintended ways?
- Does the code do what it's supposed to? (What the person/user/business wants?)
Type 1 tests are easy, and they're the kind AI can produce, because they can be written just by looking at the code. These are tests like: "What if this function is passed an empty string?"
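To make that concrete, here's a minimal sketch of what type 1 tests look like, using pytest and a made-up `slugify` function as a stand-in (none of this comes from any real codebase):

```python
# A minimal sketch of "type 1" tests: poke the code with awkward inputs
# and see whether it copes. `slugify` is an invented stand-in function.
import re
import pytest

def slugify(text: str) -> str:
    """Turn a title into a URL-friendly slug (illustrative only)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def test_empty_string_returns_empty_slug():
    assert slugify("") == ""

def test_whitespace_only_input_does_not_crash():
    # We mostly care that the function copes; the exact result matters less.
    assert slugify("   \t\n") == ""

def test_none_input_raises():
    # What happens when the caller passes None? A type 1 test just checks
    # the failure mode is the expected one.
    with pytest.raises((TypeError, AttributeError)):
        slugify(None)
```

Notice that everything these tests check can be guessed from the function signature alone, which is exactly why an AI tool can churn them out.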
Type 2 tests verify that the code behaves as intended. They can't be written without knowledge that exists outside the codebase. These are tests like: "Are all the business rules met?"
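And here's a sketch of a type 2 test, built around an invented business rule ("orders over £100 get 10% off"). The rule, the prices, and the `order_total` function are all made up for illustration; the point is that no amount of staring at the code would tell you the rule is right:

```python
# A minimal sketch of "type 2" tests: they encode a business rule that
# cannot be inferred from the code alone. Rule and function are invented.
from decimal import Decimal

def order_total(item_prices: list[Decimal]) -> Decimal:
    """Illustrative implementation of the invented discount rule."""
    subtotal = sum(item_prices, Decimal("0"))
    if subtotal > Decimal("100"):
        subtotal *= Decimal("0.9")
    return subtotal

def test_orders_over_100_get_ten_percent_discount():
    assert order_total([Decimal("120.00")]) == Decimal("108.00")

def test_orders_of_exactly_100_pay_full_price():
    # The "over" in the rule matters, and only someone outside the code
    # could tell you whether the boundary is inclusive or not.
    assert order_total([Decimal("100.00")]) == Decimal("100.00")
```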
Type 1 tests are about the reliability of the code. Type 2 tests are about whether you have the right code.
Type 1 tests are useful and necessary. Type 2 tests require understanding the business, the app, and the people who will be using it.
Type 1 tests are generic. Type 2 tests will vary for each piece of software.
Type 1 tests are boring. Type 2 tests are where a lot of the challenge of software development lives. That's the fun bit.
Them: "We've got loads of tests."
Me: "But are they useful?"
Them: "Umm..."
I've recently started experimenting with keeping AI-generated tests separate from the ones I write myself. I'm hoping this will help me identify where value is created by AI and where it comes from me.