
3 Tips Regarding the Anonymisation of Personal Data

This blog has been adapted from: S. Stalla-Bourdillon, Trois conseils sur l’anonymisation des données personnelles, in J. Rochfeld et al (ed), Le nouveau droit français de la protection des données à caractère personnel: spécificités du droit interne au regard du RGPD et de la directive “données personnelles”, Collection décryptage, Dalloz, 2019, forthcoming in French.

Anonymous data, as defined in Recital 26 of the General Data Protection Regulation (GDPR), is excluded from the scope of the regulation. It is therefore crucial to know in which cases it is possible to use the qualification of “anonymous” data, in order to determine the legal regime applicable to the data to be processed. Hence the importance of the role of the French Data Protection Agency (CNIL), which, in accordance with Article 8.2 of Act No. 78-17 of 6 January 1978, has the power to certify, approve, or even issue norms or general methodologies in this area, and thus to determine whether the GDPR applies to specific data processing activities.

Favour a risk-based approach.

Despite the publication in 2014 of guidelines on anonymisation techniques by the Article 29 Data Protection Working Party, or WP29 (now the European Data Protection Board), the subject remains controversial. However, Recital 26 of the GDPR seems to suggest that a risk-based approach is desirable. Requiring a re-identification risk of zero is impractical – and indeed impossible – particularly when such a requirement is not imposed in many other areas. Recital 26 refers to all the means reasonably likely to be used, such as singling out, either by the controller or by another person, to determine whether an individual is identifiable. Although WP29’s comments remain confusing, “reasonably impossible identification” seems to be the standard for excluding the application of the GDPR.

Design a dynamic anonymisation process and adopt a release-and-control model.

In practice, one thing is certain: simply examining the data to be processed (for example, by focusing on the nature of the attributes listed for each entry in a database) in order to conclude that the data should be considered anonymous or ‘anonymised’ is problematic. Moreover, the term ‘anonymous data’ is misleading in that it implies that a static approach is sufficient to answer the question of the qualification of the data, and that, once the data has been considered anonymous, it must be considered anonymous forever, regardless of the environment in which it is used – an environment that is certainly likely to evolve.

It is therefore more appropriate to talk about anonymisation in terms of processes or procedures, and to consider not only the characteristics of the data to be processed but also those of its environment. This allows for a combination of anonymisation techniques applied directly to the data and control measures (technical and/or organisational) selected in view of the specificity of the context in question.

Consequently, the ‘release-and-forget’ approach – i.e., de-identifying a dataset purely through techniques applied to the dataset itself and then releasing it to third parties without additional controls – should raise suspicion. Instead, it is preferable to opt for a ‘release-and-control’ model, such that the controller who initially transformed the data is subject to an obligation to monitor technological developments, and the recipient of the data is subject to an obligation not to undermine the anonymisation procedure implemented, unless legitimate exceptions apply (for example, for research purposes). In practice, this means that static copies of data should not simply be ‘handed off’ between organisations, given the risks this type of hand-off creates. As such, the UK Data Protection Act 2018 is worth mentioning: its Section 171 makes the re-identification (knowingly or recklessly) of data previously subjected to a de-identification process an offence, along with the subsequent processing of de-identified personal data without the consent of the controller responsible for de-identifying the personal data.

Think about purposes first.

Any controller interested in pseudonymisation and/or anonymisation solutions should first ask herself the following question: What is the purpose of the processing operation? By clarifying the processing purpose, the controller should be able to identify the types of information she intends to derive from the processing or analysis of the data, and thus the types of questions or queries she intends to ‘pose’ or make to the database at her disposal.

In general, if the controller intends to obtain aggregates (averages, maxima, minima, counts, sums, etc.) corresponding to collective behaviours, she should seriously consider the method of ‘differential privacy.’ It has been shown that, among the anonymisation techniques examined by WP29, only differential privacy is capable of significantly reducing the trilogy of re-identification risks (singling out, linkability, inference) – provided, and this is important, that only the results of the queries are taken into account and that queries are monitored and restricted when needed. Surprisingly, however, WP29 seems to assume that as long as the data being queried (which must be distinguished from the results of the queries) is not destroyed, the results of the queries cannot be considered anonymised data. The technique of differential privacy is thus wrongly placed on the same level as pseudonymisation, even though its guarantees are much higher. This view has rightly been criticised, including informally by some national data protection authorities, because of the possibility of separating the initial database from the query results – which suggests it has not been unanimously accepted. Ultimately, implementing differential privacy gives data subjects the guarantee that the inclusion of their data within the database queried does not (statistically) significantly increase the risk of re-identification.
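To make the distinction between the data being queried and the results of the queries more concrete, here is a minimal Python sketch of a differentially private count query using the Laplace mechanism, with a crude privacy-budget check to show why queries need to be monitored and restricted. The dataset, epsilon values and function names are illustrative assumptions, not taken from the WP29 opinion or from any particular tool.

```python
import math
import random

EPSILON_TOTAL = 1.0  # total privacy budget for this database (assumed value)
SENSITIVITY = 1      # adding or removing one person changes a count by at most 1

budget_spent = 0.0

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Answer a count query with calibrated noise; refuse once the budget is spent."""
    global budget_spent
    if budget_spent + epsilon > EPSILON_TOTAL:
        raise RuntimeError("Privacy budget exhausted: query refused")
    budget_spent += epsilon
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(SENSITIVITY / epsilon)

# Illustrative data and query: how many records fall within a given postcode area?
records = [{"postcode": "SO17"}, {"postcode": "SO15"}, {"postcode": "SO17"}]
print(private_count(records, lambda r: r["postcode"] == "SO17", epsilon=0.5))
```

In this sketch, only the noisy result leaves the controlled environment; the underlying records and the remaining privacy budget stay with the controller, which is precisely why the results of the queries can be assessed separately from the database itself.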

If the controller intends to derive information relating to singled-out individuals, only pseudonymisation (and not anonymisation) should be possible, although it is true that, in principle, for a singled-out individual to be identifiable, there must be additional information somewhere that can be combined with the data resulting from the singling out. This consideration might explain why WP29’s 2014 opinion on anonymisation techniques is not fully aligned with its 2013 opinion on purpose limitation.

When opting for a high degree of personalisation, the controller should be aware of the heightened risk posed to the fundamental rights and freedoms of the data subjects. The number of techniques at her disposal is also more limited. One example is the masking of certain values (through, for example, encryption with a secret key, hashing, or tokenisation), which WP29 classifies as a pseudonymisation technique. Similarly, generalisation techniques play a key role, although it is unlikely that the most restrictive of them, such as k-anonymity or l-diversity, will achieve the desired goal. The controller should therefore be asked to mask as many direct and quasi-direct identifiers as possible – such as names, customer IDs, social security or national identification numbers, and sensitive data – and to properly secure access to the additional information using state-of-the-art access control techniques.
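As an illustration of value masking, the following Python sketch replaces direct identifiers with keyed hashes (HMAC-SHA256), one of the masking techniques WP29 classifies as pseudonymisation. The field names, record layout and key handling are assumptions made for illustration only; the secret key is precisely the ‘additional information’ that must be stored separately and protected with strict access controls.

```python
import hmac
import hashlib

# Assumed placeholder: in practice the key is generated, stored and rotated
# separately from the pseudonymised dataset, behind strict access controls.
SECRET_KEY = b"replace-with-a-key-stored-separately"

def pseudonymise(value: str) -> str:
    """Replace an identifier with a stable keyed token (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "customer_id": "C-12345", "city": "Southampton"}

# Mask direct and quasi-direct identifiers; keep only what the purpose requires.
masked = {
    "name": pseudonymise(record["name"]),
    "customer_id": pseudonymise(record["customer_id"]),
    "city": record["city"],  # quasi-identifier: consider generalising (e.g. to a region)
}
print(masked)
```

Because the same input always yields the same token, singled-out individuals remain linkable across records – which is exactly why this remains pseudonymisation rather than anonymisation, and why access to the key and to any auxiliary information must be tightly controlled.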

In conclusion, when applying so-called anonymisation techniques to personal data, it is essential to bear in mind that their respective strengths vary greatly from one to another, and that a query-based approach (rather than an approach based on the data to be queried) is both stronger and easier to tailor to utility requirements. In addition, in many instances a controlled data environment is a prerequisite for reducing re-identification risks. What is more, given the key role played by the criterion of the processing purpose in determining the techniques to be applied to the data, it is essential to be able to preserve purposes over time.

***

The Immuta GDPR Compliance Playbook for 2019 includes new best practices required for legal and compliant use of EU data, with a focus on Data Protection by Design. It covers purpose-based restrictions, maps GDPR data protection principles to the Immuta platform’s global policies, and provides guidance on implementing specific controls within the Immuta platform, such as masking and differential privacy. To download, visit: https://www.immuta.com/download/immuta-compliance-playbook-gdpr.

***

[1] Defined as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”

[2] Art. 29 Data Protection Working Party, Opinion 05/2014 on anonymisation techniques, WP 216, 2014 (Opinion 05/2014).

[3] See for example Khaled El Emam, Cecilia Álvarez, “A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques” (2015) 5 (1) International Data Privacy Law 73; See also before 2014 the position of the Information Commissioner’s Office, Anonymisation: managing data protection risks code of practice, 2012, https://ico.org.uk/media/1061/anonymisation-code.pdf.

[4] Opinion 05/2014, p. 9, n. 1.

[5] See Mark Elliott et al, The UK Anonymisation decision-making framework, 2016, https://ukanon.net/ukan-resources/ukan-decision-making-framework/.

[6] Runshan Han et al, “Bridging Policy, Regulation and Practice? A Techno-Legal Analysis of Three Types of Data in the GDPR”, in R. Leenes, R. van Brakel, S. Gutwirth & P. De Hert (eds), Data Protection and Privacy: The Age of Intelligent Machines, Haywards Heath: Hart, 2017.

[7] 2018 c. 12.

[8] Han, n. 6.

[9] Art. 29 Data Protection Working Party, Opinion 03/2013 on purpose limitation, WP 203, 2013.