TL;DR
This post describes a test strategy for a bad language detector.
Here is the context. You are testing an application that has a discussion feature. Every discussion is publicly visible, and only registered users can participate. Because discussions are public, a successful application will attract spammers of all types, including adult site spammers.
Developers will implement a shiny bad language detector component (they will even mention AI in the scrum meeting) and insist on going live without any testing.
Soon, you will get user reports that words that are not bad language are being marked as bad. At the next scrum, you are assigned to test this component.
As you are not Facebook and do not have a human army for bad language detection, you need to come up with a clever test strategy.
Bad language test strategy
Ask the developers how they implemented that component's API. There is a high probability that they used regular expressions [Wikipedia].
For example, the regular expression:
`\w*anus\w*`
will mark
manuscript
as bad language.
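To see this for yourself, here is a minimal Ruby sketch. It assumes the component simply applies the pattern above to each word; the names `BAD_WORD_PATTERN` and `flagged?` are made up for illustration and are not taken from any real component.

```ruby
# Hypothetical pattern copied from the example above; the real component may differ.
BAD_WORD_PATTERN = /\w*anus\w*/

# Returns true when the pattern matches anywhere in the text.
def flagged?(text)
  BAD_WORD_PATTERN.match?(text)
end

# Print the verdict for a few harmless words and one control word.
%w[manuscript Janus tetanus hello].each do |word|
  puts "#{word}: #{flagged?(word)}"
end
```

Every word that merely contains the four letters is flagged, which is exactly the kind of false positive users will report.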
Here is how you can see which words will be false positives. For the regular expression above, you can use this list:
https://www.thefreedictionary.com/words-containing-anus
I suggest the following. Write an RSpec test that feeds the bad language component this extensive list of English words:
https://github.com/dwyl/english-words
so you can detect list of false positives.
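The core of such a test could look like the plain Ruby sketch below. It rests on several assumptions: `DETECTOR` is a stand-in for the real component, `KNOWN_BAD` is a hypothetical list of words your client really does consider bad, and the small `sample` array replaces the full word list so the snippet runs on its own.

```ruby
require "set"

# Stand-in for the real component (assumption): flags any word containing "anus".
DETECTOR = ->(word) { /\w*anus\w*/.match?(word) }

# Words the client really does consider bad language (assumption).
KNOWN_BAD = Set["anus"]

# Returns every word the detector flags that is not on the known-bad list.
def false_positives(words, detector, known_bad)
  words.map(&:strip).reject(&:empty?).select do |word|
    lowered = word.downcase
    detector.call(lowered) && !known_bad.include?(lowered)
  end
end

# For a real run, replace the sample with a local copy of the word list,
# e.g. File.readlines("words_alpha.txt", chomp: true) (file name is an assumption).
sample = %w[manuscript anus hello Uranus]
puts false_positives(sample, DETECTOR, KNOWN_BAD).inspect
```

Inside an RSpec example, the same check could be written as `expect(false_positives(words, DETECTOR, KNOWN_BAD)).to be_empty`, so the failure message lists every false positive at once.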
This is an example of how test automation assists a test strategy created by a tester.
Can you name some of the risks for this test strategy?
Well, a lot depends on what you think of as “bad language”. For instance, your example would disable any application regularly used by doctors or surgeons, especially those who specialise in the lower end of the gastro-intestinal tract. Ditto biologists.
The same proviso applies to place names; there were quite a few examples in the UK from the early days of the Net where such detectors blocked whole municipal websites. (I hesitate to give examples! 🙂 )
Hi Robert!
Yes, you are correct, context is very important here. For my client, the given example is a bad word. But your comment should be taken into account if this blog post were to become teaching material.
Thanks!