Wednesday, April 19, 2023

2 types of testing - but only 1 that AI can help with

[Image: a robot surrounded by text on computer screens]

There's currently a lot of discussion about using ChatGPT (or other similar AI tools) to create unit tests for code.

This has the potential to be valuable, but it's important to remember that there are two broad categories of tests when it comes to software. Each answers one of the following questions:

  1. Does the code do the right/correct thing?
  2. Does the code not do the wrong thing?

At first glance, you might mistake these for being the same. But there's a subtle difference.


Does the code do the right/correct thing? 

Does it do all that it's supposed to?

Does it correctly implement all of the requirements?

By looking at the code alone, it's impossible to know for sure. Well-written code should provide plenty of indications about what it's meant to do, and wider knowledge of the application and the domain it relates to can also provide insight, but without knowing the full requirements, you can't know for certain.

If all you have is the source code, you must assume it's correct, at least until you find something wrong or something that needs changing.
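For example, here's a minimal Python sketch of a "does it do the right thing?" test. The shipping_cost function, its 4.99 flat rate, and its 50.00 free-shipping threshold are all hypothetical; the point is that the values the tests assert come from the requirements, not from anything in the code itself:

    # Hypothetical function; the 50.00 threshold and 4.99 rate come from
    # the (imagined) requirements, not from the code.
    def shipping_cost(order_total: float) -> float:
        return 0.0 if order_total >= 50.0 else 4.99

    def test_orders_of_fifty_or_more_ship_free():
        # Only the requirements tell us 50.00 is the cut-off point.
        assert shipping_cost(50.0) == 0.0

    def test_smaller_orders_pay_the_flat_rate():
        assert shipping_cost(49.99) == 4.99

    if __name__ == "__main__":
        test_orders_of_fifty_or_more_ship_free()
        test_smaller_orders_pay_the_flat_rate()
        print("correctness tests passed")

Without the requirements, nothing in the code says whether 50.00 is the right threshold, so a tool that only sees the code can't write these assertions with confidence.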


Does the code not do the wrong thing?

Sometimes the requirements for an app might include a statement that "the code should not crash unexpectedly" or "the code should be secure." Often this is just implied. Who wants an app that crashes or is insecure? 

Fortunately, these are the kinds of things you can test for without any wider knowledge of what the code should do. That's what makes them different from the first kind of test.
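For contrast, here's a minimal Python sketch of a "does it not do the wrong thing?" test, reusing the same hypothetical shipping_cost function. It needs no requirements at all; it just checks that the call survives a range of awkward inputs:

    def shipping_cost(order_total: float) -> float:
        # Same hypothetical function as above.
        return 0.0 if order_total >= 50.0 else 4.99

    def test_does_not_crash_on_awkward_inputs():
        awkward_inputs = [0.0, -1.0, 0.009, 49.999999, 50.0, 1e12, float("inf")]
        for value in awkward_inputs:
            # Nothing here says which result is correct, only that the
            # call completes without raising an exception.
            shipping_cost(value)

    if __name__ == "__main__":
        test_does_not_crash_on_awkward_inputs()
        print("no-crash tests passed")

Notice that this test would still pass if someone accidentally changed the threshold to 500, which is exactly the limitation discussed below.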

AI (and other) tooling can be a really good way of creating tests that try to break code. If such a test finds something and you fix the issue, you get better (more reliable) code. But be wary of relying on such tests alone.

Yes, having more tests is generally better. But beware of relying on these tests to catch regressions and newly introduced bugs in the future.

One of the great benefits of having tests is that they not only help you verify that the code is correct now but also that it doesn't accidentally change in unexpected ways in the future.

Having tests that verify the code does the correct thing (produces the correct output given specific inputs) is good. Of even greater benefit is that those tests can verify the code still produces the correct output when you need to modify its internals. Tests that only make sure the app doesn't crash, given particular inputs, aren't as valuable in the long term.

It might be great to say you have lots of tests and to even have high test coverage, but if they're not helpful as the code changes, are they really all that useful?



Yes, "it depends"

It's not as black-and-white as I've laid out above. There are times when AI might be able to generate useful tests of whether the code is doing the right thing based on the names and patterns it identifies in the code. Or comments in (and around) the code may provide sufficient context to generate useful tests.

Your mileage may vary when it comes to having a tool (AI-based or otherwise) generate tests for you, but I've always found thinking about these 2 types of tests helpful.


