A system with the same core functionality is available, developed by a third party
• Back-to-back testing can be used to test the core functionality using this third-party system as a test oracle (see the sketch below).
• Existing test cases, or even randomly generated test inputs with automated test execution, can be used.
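As a minimal sketch of the idea (the `our_system` and `reference_system` functions below are hypothetical stand-ins for our implementation and the third-party oracle), back-to-back testing simply feeds the same, possibly randomly generated, inputs to both systems and reports any mismatches:

```python
import random

def our_system(x):
    """System under test (hypothetical stand-in)."""
    return round(x * 1.21, 2)   # e.g. price including tax

def reference_system(x):
    """Third-party system used as the test oracle (hypothetical stand-in)."""
    return round(x * 1.21, 2)

def back_to_back_test(num_cases=1000, seed=42):
    """Run randomly generated inputs through both systems and compare outputs."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(num_cases):
        x = rng.uniform(0, 10_000)
        expected = reference_system(x)
        actual = our_system(x)
        if actual != expected:
            mismatches.append((x, expected, actual))
    return mismatches

if __name__ == "__main__":
    failures = back_to_back_test()
    print(f"{len(failures)} mismatching test case(s)")
```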
A/B testing can be used to see if the new version achieves better sales: site visitors are split so that some visit the old version and some visit the new version, and the results from the two versions are compared using statistical analysis.
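For example, the statistical analysis could be a chi-square test on a 2x2 contingency table of sales versus non-sales for the two versions; the visitor and sales counts below are invented purely for illustration, and SciPy is assumed to be available:

```python
from scipy.stats import chi2_contingency

# Hypothetical results from splitting site visitors between the two versions.
old_visitors, old_sales = 10_000, 410    # current version
new_visitors, new_sales = 10_000, 476    # new version

# 2x2 contingency table: [sales, no-sales] for each version.
table = [
    [old_sales, old_visitors - old_sales],
    [new_sales, new_visitors - new_sales],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"old conversion rate: {old_sales / old_visitors:.2%}")
print(f"new conversion rate: {new_sales / new_visitors:.2%}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No statistically significant difference detected.")
```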
Was the latest version of a spam filtering system attacked during its training?
• A/B testing can be used to see if the new version provides results that are statistically significantly different from those of the current system for the same set of emails (see the sketch below).
• If there is a difference, it may be due to a data poisoning attack on the training data.
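One possible analysis for paired results on the same emails is an exact McNemar test on the disagreements between the two versions; the disagreement counts below are invented for illustration and SciPy is assumed:

```python
from scipy.stats import binomtest

# Hypothetical disagreement counts for the same set of emails, classified
# by the current filter and by the newly trained version.
n_current_spam_latest_ham = 18   # current says spam, latest says ham
n_current_ham_latest_spam = 52   # current says ham, latest says spam

# Exact McNemar test: under the null hypothesis of "no change in behaviour",
# the two kinds of disagreement are equally likely (p = 0.5).
n_disagree = n_current_spam_latest_ham + n_current_ham_latest_spam
result = binomtest(n_current_ham_latest_spam, n_disagree, p=0.5)

print(f"disagreements: {n_disagree}, p-value: {result.pvalue:.4f}")
if result.pvalue < 0.05:
    print("Latest version behaves significantly differently -"
          " investigate the training data for possible poisoning.")
```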
We only have a few trusted test cases, and we have inexperienced testers who are familiar with the application
• Metamorphic testing may be appropriate, as we can generate many follow-up tests from a few trusted source test cases (see the sketch below).
• Inexperienced testers who are familiar with the application should be able to generate metamorphic relations to do this, even with only a small amount of training.
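A minimal sketch of the idea, using a hypothetical `predict_price` model and an invented metamorphic relation ("increasing the floor area must not decrease the predicted price"):

```python
def predict_price(area_m2, bedrooms):
    """Hypothetical model under test (stand-in for the real predictor)."""
    return 50_000 + 2_000 * area_m2 + 15_000 * bedrooms

def relation_holds(area, bedrooms, extra_area):
    """Metamorphic relation: increasing the floor area, all else being
    equal, must not decrease the predicted price."""
    source_output = predict_price(area, bedrooms)
    follow_up_output = predict_price(area + extra_area, bedrooms)
    return follow_up_output >= source_output

# A few trusted source test cases ...
trusted_source_cases = [(60, 2), (95, 3), (140, 4)]

# ... from which many follow-up tests are generated automatically.
violations = [
    (area, bedrooms, extra)
    for area, bedrooms in trusted_source_cases
    for extra in range(1, 101)          # 100 follow-up tests per source case
    if not relation_holds(area, bedrooms, extra)
]
print(f"{len(violations)} metamorphic relation violation(s)")
```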
There is a worry that the train control system may not handle conflicting inputs from several sensors satisfactorily
Pairwise testing can be used to ensure that all pairs of values from different sensors are tested within a reasonable timescale, when testing all combinations would be infeasible.
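A sketch of how the pairwise test set could be generated, assuming the third-party `allpairspy` package is available; the sensors and their value domains are invented for illustration:

```python
from allpairspy import AllPairs   # third-party pairwise test generator

# Hypothetical value domains for each sensor feeding the train control system.
parameters = [
    ["speed_low", "speed_normal", "speed_high", "speed_fault"],   # speed sensor
    ["track_clear", "track_occupied", "track_unknown"],           # track sensor
    ["door_open", "door_closed", "door_fault"],                   # door sensor
    ["brake_ok", "brake_degraded", "brake_fault"],                # brake sensor
]

# Testing all combinations would need 4 * 3 * 3 * 3 = 108 tests; the pairwise
# set below covers every pair of values from different sensors in far fewer.
for i, combination in enumerate(AllPairs(parameters), start=1):
    print(f"test {i}: {combination}")
```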
Our confidence in the training data quality is low, but we do have testers experienced in the boat insurance business
Experience-based testing, using techniques such as exploratory data analysis (EDA), may be able to identify whether we have reason to be worried about the data quality.
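A minimal EDA sketch using pandas; the file name, column names, and sanity checks are hypothetical examples of the kind of thing an experienced boat-insurance tester might look at:

```python
import pandas as pd

# Load the boat insurance training data (hypothetical file name and columns).
df = pd.read_csv("boat_insurance_training.csv")

# Basic exploratory data analysis to flag data quality concerns.
print(df.describe(include="all"))          # ranges, means, obviously odd values
print(df.isna().mean().sort_values())      # proportion of missing values per column
print(df.duplicated().sum(), "duplicate rows")

# Domain-specific sanity checks drawing on the testers' insurance experience.
print((df["hull_length_m"] <= 0).sum(), "rows with non-positive hull length")
print((df["year_built"] > 2025).sum(), "rows with an implausible build year")
```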
We have several teams producing ML models, but they don’t all perform the same verification tasks to ensure quality
Using a checklist, such as the Google “ML Test Checklist”, would help to ensure that all ML models have gone through the same testing steps.
We are new to ML, and would like to know that our testing is aligned with the testing of experienced model developers
Using a checklist, such as the Google “ML Test Checklist”, would help show a new entrant to ML what a ‘good’ set of testing activities looks like when building an ML model.
We have a tester who has seen several AI projects that have had problems with bias and ethics – and we are worried about making the same mistakes ourselves
Error guessing using the experience of this tester may be useful in avoiding unfair bias and ethical problems in our systems.
We have a test automation framework, but checking the test results from our AI bioinformatics systems is very expensive with conventional tests
Incorporating metamorphic testing into the existing test automation framework should be possible, and this will allow many follow-up tests to be generated without the expense of checking each result against an explicitly computed expected value.
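A sketch of how metamorphic tests might sit inside an existing automation framework, here assuming pytest and a hypothetical `align_score` function standing in for the bioinformatics system; only the relations are checked, not expensive hand-computed expected results:

```python
import pytest

def align_score(seq_a, seq_b):
    """Hypothetical stand-in for the AI bioinformatics system under test."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)

TRUSTED_SOURCE_CASES = [
    ("ACGTACGT", "ACGAACGT"),
    ("GGGTTTAA", "GGCTTTAA"),
]

@pytest.mark.parametrize("seq_a,seq_b", TRUSTED_SOURCE_CASES)
def test_symmetry_relation(seq_a, seq_b):
    # MR: swapping the two sequences must not change the alignment score.
    assert align_score(seq_a, seq_b) == align_score(seq_b, seq_a)

@pytest.mark.parametrize("seq_a,seq_b", TRUSTED_SOURCE_CASES)
def test_identity_relation(seq_a, seq_b):
    # MR: a sequence aligned with itself must score at least as well as
    # an alignment of that sequence with a different sequence.
    assert align_score(seq_a, seq_a) >= align_score(seq_a, seq_b)
```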
We are testing a self-learning system and we want to be sure that the system’s updates are sensible
A/B testing could be triggered automatically by the self-learning system: whenever the system changes itself, automated A/B testing is performed, comparing the new and current versions to check that core system behaviour has not been made worse by the change.
We are worried that updates to the system may have introduced defects in the unchanged core functionality
• Back-to-back testing can be used (with the updated and previous versions) to identify defects introduced into the core functionality (assuming it is supposed to be unchanged).
• Note that A/B testing is not appropriate here as we are not comparing measurable performance statistically but are identifying defects.
We want to check that the replacement AI-based system provides the same basic functions provided by the previous conventional system
Back-to-back testing can be used (with the replacement AI-based system and the previous conventional system), using regression tests focused on the basic functions that are supposed to be the same.
We are testing an automated plant-feeding system that considers multiple factors, such as weather features, water levels, plant type, growth stage, etc.
Pairwise testing may be appropriate, as testing all combinations of values for the factors is not possible due to combinatorial explosion.
We believe that the public dataset we used for training may have been attacked by someone adding random data examples
• This sounds like a potential data poisoning attack.
• Exploratory data analysis (EDA) may be an appropriate response to identify whether there are now noticeable problems with the dataset, such as outliers.
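A simple EDA sketch for spotting injected examples via an interquartile-range (IQR) outlier screen; the file name and the use of pandas are assumptions for illustration:

```python
import pandas as pd

# Public training dataset that may have been poisoned (hypothetical file name).
df = pd.read_csv("public_training_data.csv")

numeric = df.select_dtypes(include="number")

# IQR-based outlier screen: randomly injected examples often show up as
# values far outside the bulk of each column's distribution.
q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
iqr = q3 - q1
outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

print("outliers per column:")
print(outlier_mask.sum().sort_values(ascending=False))
print(f"rows containing at least one outlier: {outlier_mask.any(axis=1).sum()}")
```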
We have received a warning that the third-party supplier of our training data may not be following the agreed data security practices
A review of the processes used by the third-party supplier may be required to confirm that the probability of a data poisoning attack on the training data is as low as required.
Our software is used to control self-driving trucks, but we are aware that our overseas rivals do not want us to succeed
Adversarial testing may be appropriate in situations where attacks against mission-critical systems may occur.
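A sketch of one common form of adversarial testing, the Fast Gradient Sign Method (FGSM), assuming a PyTorch image classifier; `load_truck_perception_model` in the usage comment is an invented name:

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_test(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method: perturb an input slightly in the direction
    that increases the loss, then check whether the prediction flips."""
    image = image.clone().detach().requires_grad_(True)
    output = model(image)                      # shape (1, num_classes)
    loss = F.cross_entropy(output, label)
    loss.backward()

    # Small, bounded perturbation in the direction of the loss gradient.
    adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)

    original_pred = output.argmax(dim=1)
    adversarial_pred = model(adversarial).argmax(dim=1)
    return original_pred.item(), adversarial_pred.item()

# Usage (hypothetical trained model and a single normalised camera frame):
# model = load_truck_perception_model()
# orig, adv = fgsm_adversarial_test(model, frame.unsqueeze(0), torch.tensor([label]))
# assert orig == adv, "model is vulnerable to this adversarial perturbation"
```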
We need something that guides our testers when they are asked to take a quick look at the data used in the ML workflow
Exploratory testing is suitable when time is short; a data tour can be defined to focus the exploratory testing sessions on areas specifically related to the data.
We want to check that the new innovative ML model that has been developed is broadly working as we would expect
• Back-to-back testing can be used, comparing the innovative model against a simple ML model that is easy to understand.
• This will give us a good idea of whether the new model is working along the right lines (results would not be expected to be exactly the same, but similar).
• Note that A/B testing is not appropriate here as we are not comparing measurable performance statistically but are identifying differences in individual test results.
Our classifier provides similar functionality to a classifier for which problems have been reported with inputs close to the classification boundary
• Inputs close to the boundary may correspond to small perturbations that are adversarial examples.
• Due to transferability of adversarial examples, this suggests that we should perform adversarial testing of our classifier.