Anonymisation – The bar has been set high

February 8, 2018 Blogger

The General Data Protection Regulation (GDPR) will impose a set of very strict rules to which you will have to adhere if you wish to process personal data. Anonymous data, however, falls outside the scope of the GDPR, meaning that, in theory, you can do whatever you please with a set of anonymised data. The problem, however, is that the less data your data set contains, the less useful it is for statistical analysis and marketing purposes.

In practice, it is not all that simple. Making a data set fully anonymous is not an easy task. Under the Safe Harbor method of the US HIPAA Privacy Rule, removing 18 listed types of identifiers is, broadly speaking, enough for a data set to count as de-identified. This is, however, not enough under the GDPR: simply removing certain identifiers does not by itself render the data anonymous.

Anonymisation can be done in various ways. Using ranges of data is one example. Instead of storing dates of birth or exact ages, an age range could be used, e.g. 18-24, 25-34, and so on. Swapping data is also a possibility: if you were to swap a value of person X with that of person Y, the overall statistics for that value would stay the same, but the resulting record would no longer correspond to an existing person. Flat-out removing certain identifiers is also an option.
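As a rough illustration of the first two techniques, here is a minimal Python sketch. The function names (`age_band`, `swap_column`), the band boundaries, and the sample records are all made up for this example. Note that swapping one column keeps that column's overall distribution intact, but it can change how that column correlates with the others.

```python
import random
from collections import Counter

def age_band(age, bands=((18, 24), (25, 34), (35, 44), (45, 54), (55, 64))):
    """Generalisation: map an exact age to a non-overlapping range label."""
    for low, high in bands:
        if low <= age <= high:
            return f"{low}-{high}"
    # Ages outside the listed bands fall into an open-ended bucket.
    return "under 18" if age < bands[0][0] else f"{bands[-1][1] + 1}+"

def swap_column(records, column, seed=None):
    """Swapping: shuffle the values of one column across all records."""
    rng = random.Random(seed)
    values = [r[column] for r in records]
    rng.shuffle(values)
    # Build new records so the originals are left untouched.
    return [{**r, column: v} for r, v in zip(records, values)]

# Hypothetical sample data, for illustration only.
records = [
    {"age": 23, "postal_code": "1930"},
    {"age": 31, "postal_code": "1000"},
    {"age": 47, "postal_code": "2000"},
    {"age": 68, "postal_code": "9000"},
]

generalised = [{**r, "age": age_band(r["age"])} for r in records]
swapped = swap_column(records, "postal_code", seed=1)

# The set of postal codes (and how often each occurs) is unchanged by the swap.
same_distribution = (
    Counter(r["postal_code"] for r in swapped)
    == Counter(r["postal_code"] for r in records)
)
```

Neither step on its own guarantees anonymity under the GDPR; they only reduce how precisely a record describes one person.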

Under the GDPR, if you are able to single out an individual in a data set, that data set cannot be considered anonymous. Consider the following example: you own a large set of personal data and you remove all categories except shoe size, height, and postal code. You might think this is fully anonymous, but it is entirely possible that there is only one individual who has shoe size 50, is 2m10 tall, and lives in the area with postal code 1930. Since it would be fairly easy to identify this one person, your data set as a whole cannot simply be called “anonymous”.
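The shoe size example can be checked mechanically. The sketch below is one simple way to do it, not a legal test: it counts how often each combination of the remaining attributes occurs and reports the records whose combination is unique. The function name `singled_out` and the sample rows are assumptions made for this illustration.

```python
from collections import Counter

def singled_out(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values is unique."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

# Hypothetical data set after "removing everything except three columns".
records = [
    {"shoe_size": 42, "height_cm": 180, "postal_code": "1930"},
    {"shoe_size": 42, "height_cm": 180, "postal_code": "1930"},
    {"shoe_size": 50, "height_cm": 210, "postal_code": "1930"},
]

unique = singled_out(records, ["shoe_size", "height_cm", "postal_code"])
# `unique` contains only the record with shoe size 50 and height 210 cm:
# that person can be singled out even though no name is stored.
```

An empty result from a check like this does not prove anonymity, since someone could still combine the data with other sources, but a non-empty result is a clear warning sign.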

With this in mind, it is not hard to imagine that a lot of “anonymous” data sets in use right now are in fact not anonymous at all, but merely pseudonymised. Pseudonymisation and anonymisation are often mentioned in the same breath, but they are not the same. For one, pseudonymised data falls under the scope of the GDPR. Secondly, pseudonymised data relies on an algorithm, a linked database, or some other means of mapping the pseudonyms back to individuals, which means that whoever has access to that mapping can easily re-identify everyone in the data set.
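To make the difference concrete, the following sketch shows one common way to pseudonymise data: replacing an identifier with a keyed hash, using Python's standard `hmac` module. The key, the e-mail addresses, and the helper name `pseudonym` are invented for this example. The point is the last few lines: anyone holding the key and a list of candidate identifiers can rebuild the link to each person.

```python
import hashlib
import hmac

def pseudonym(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key and data, for illustration only.
SECRET_KEY = b"example-key-not-for-real-use"
emails = ["alice@example.com", "bob@example.com"]
shoe_sizes = [38, 44]

# The stored data set no longer contains the e-mail addresses themselves.
pseudonymised = [
    {"id": pseudonym(email, SECRET_KEY), "shoe_size": size}
    for email, size in zip(emails, shoe_sizes)
]

# Whoever holds the key can recompute the pseudonyms and re-link every row.
lookup = {pseudonym(email, SECRET_KEY): email for email in emails}
relinked = [lookup[row["id"]] for row in pseudonymised]
```

Because the link can be restored in this way, the data is still personal data and remains within the scope of the GDPR.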

Furthermore, a data set that is fully anonymous today should be re-evaluated periodically. As technology keeps getting faster and better, and as more and more data becomes available on the internet, a data set that is considered anonymous now because the time and effort involved in de-anonymising it are disproportionate could no longer be considered anonymous a year from now, once new technology or additional data sets make identifying individuals feasible.

To summarise, anonymising data is one way to stay outside the scope of the GDPR, but it is certainly not straightforward. The bar has been set very high, and data you might think is anonymous probably is not.