The Business Solutions Series is a compilation of solutions to various business challenges that I have encountered throughout my professional journey.
Context
We were working with computer factories all over the world, and these factories were being handed rolls of certificates that they needed to include with the computers. The rolls were produced using technology similar to that used to print banknotes. Each roll had a roll ID and each certificate inside a roll had a certificate ID. These IDs were used to track the certificates' life and usage across the supply chain.
The purpose of these certificates was to let the end user reactivate the software in case they needed to perform a factory reset on their device. Computers came from the factory with their software already activated, so most of the time these certificate codes were not used. However, a common exception was random hardware glitches that would require a factory reset.
Problem
The certificate codes were very valuable on the black market, as they allowed people who hadn’t purchased the software legally to activate pirated versions of it. This gap rapidly developed into a cat-and-mouse dynamic, with some of the factory workers stealing codes using increasingly sophisticated methods and the software company trying to stop the practice.
Initially the workers were writing the codes on pieces of paper as they took each certificate out of the roll and put it into the hardware box. The software company would simply block a certificate once it started seeing more than a few activations of it. But then the workers started copying more certificate numbers, so that none of them would need to be reused more than once or twice on the black market.
Later, the company was also able to use the roll IDs to identify rolls that seemed to be fully compromised and flag the whole roll. This way the certificates on those rolls couldn’t be abused anymore. This led to even more sophisticated code copying at the factories: the workers stealing codes would no longer take every certificate from the same roll and would instead alternate between copying and skipping certificates as they processed them.
Objective & Assumptions
The objective of the project was to stop the theft of certificates for piracy.
We knew that legit factory resets do happen once in a while, but they were mostly due to “random” glitches in the hardware. We also assumed that humans stealing the codes were probably not using random number generators when choosing which certificates to steal (even if their code copy patterns had evolved to be more complex).
Solution
My manager introduced me to a statistical test called the “runs test”. It is a simple test: we marked each certificate with a binary flag indicating whether or not it had been used for reactivation. Then, keeping the certificates in the same order in which they came on each roll, we split the flag values into runs of consecutive identical values, where a run ends every time the next certificate has a different flag value.
For example: A roll with 26 certificates and an activation sequence 00001110011110001001110000 has 9 runs.
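Counting runs is easy to check by hand or in a few lines of code. The snippet below is a minimal Python sketch, not the original warehouse implementation (which was in SQL); the function name `count_runs` and the string encoding of the flags are assumptions made for illustration.

```python
from itertools import groupby


def count_runs(flags: str) -> int:
    """Count the runs in a flag sequence.

    `flags` is a string of '0'/'1' characters in roll order (a hypothetical
    encoding chosen for this sketch). A run is a maximal block of identical
    consecutive values, which is exactly what groupby yields one group for.
    """
    return sum(1 for _ in groupby(flags))


sequence = "00001110011110001001110000"
print(len(sequence), count_runs(sequence))  # 26 9
```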
The test consists of computing the expected number of runs, given how many positive and negative values were seen on the roll, under the hypothesis that each element in the sequence is independently drawn from the same distribution. If the actual number of runs was a few standard deviations away from the expected value, we flagged the roll as suspicious.
The formulas are so simple that it was very easy to implement them in the SQL data warehouse and to set up - right in the warehouse - a monitoring process to flag any suspicious roll moving forward.
If you are interested in learning more about this test, the Wikipedia article on the Wald-Wolfowitz runs test can be a good starting point.
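As a rough illustration of that calculation, here is a Python sketch using the standard Wald-Wolfowitz formulas: with n1 positives, n0 negatives and N = n1 + n0, the expected number of runs is 2·n1·n0/N + 1 and the variance is 2·n1·n0·(2·n1·n0 − N) / (N²·(N − 1)). The function names, the string encoding of the flags and the threshold of 3 standard deviations are assumptions for this example; the original implementation lived in SQL and its exact threshold is not stated here.

```python
import math


def runs_test_z(flags: str) -> float:
    """Z-score of the observed number of runs under the randomness hypothesis.

    `flags` is a string of '0'/'1' characters in roll order (hypothetical
    encoding for this sketch).
    """
    n_pos = flags.count("1")
    n_neg = flags.count("0")
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one used and one unused certificate")

    # A new run starts at every position where the flag changes.
    runs = 1 + sum(a != b for a, b in zip(flags, flags[1:]))

    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n**2 * (n - 1))
    if variance <= 0:
        raise ValueError("sequence too short for the test to be meaningful")
    return (runs - expected) / math.sqrt(variance)


def is_suspicious(flags: str, threshold: float = 3.0) -> bool:
    """Flag a roll whose run count is far from expected (threshold is assumed)."""
    return abs(runs_test_z(flags)) > threshold


print(round(runs_test_z("00001110011110001001110000"), 2))
print(is_suspicious("01" * 13))
```

For the 26-certificate example above, the expected number of runs is about 13.7 against 9 observed, roughly 1.9 standard deviations below expectation. A strictly alternating copy pattern such as `"01" * 13` lands far above expectation, which is the kind of non-random behaviour the test is meant to surface.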
Impact
The new flagging mechanism worked like a charm. The company was able to find - very quickly - thousands of rolls that they didn’t know had been compromised. They were also able to monitor new rolls and react quickly moving forward. This had a very positive impact on revenue, as piracy was massively reduced after this went to production.
Not data science, but wouldn’t it be better to have the retailer control the certificates and give them to the consumer at the time of purchase? In many cases the retailer is also the software company (i.e. Apple selling Macs or Microsoft selling Surfaces), and even when it’s not, presumably it’s easier to catch this activity at the point of sale than at the factory?