WireColumn: Why ‘Pseudonymisation’ Can Be a Good Deal for Both Users and Businesses


Stephan Noller is CEO & Founder of nugg.ad AG

In a previous article, Nick Stringer from IAB UK explained how the EU wants to regulate the use of data on the Internet and why this might go too far when it comes to data-driven business models. My company is one of those data-driven businesses: we help many publishers and websites across Europe deliver more relevant ads to their consumers in a privacy-friendly way, because our technology has strong data minimisation built in. The concept is called ‘pseudonymisation’ and was introduced to the policy discussion at a relatively late stage – even though it embodies some of the core principles of the proposed regulation.

Pseudonymisation describes a process in which recorded data is reduced in such a way that it can no longer be linked to the individual it originated from. Unlike the more familiar process of anonymisation, however, the granularity of individual records is retained: if one person is recorded as having an interest in sports, and two people in culture, once the data has been pseudonymised it still registers as three people and three data records. Anonymisation, on the other hand, would convert that information to something like a 33% interest in sport.
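To make the distinction concrete, here is a minimal sketch in Python (purely illustrative: the identifiers, field names and values are invented, not any real system's format) contrasting the two representations of the three-person example:

```python
from collections import Counter

# Invented sample: three visitors, as in the example above.
raw_records = [
    {"cookie_id": "abc123", "interest": "sports"},
    {"cookie_id": "def456", "interest": "culture"},
    {"cookie_id": "ghi789", "interest": "culture"},
]

# Pseudonymised: the linkable identifier is replaced, but every record
# survives as a distinct, countable data point.
pseudonymised = [
    {"pseudonym": f"p{i}", "interest": r["interest"]}
    for i, r in enumerate(raw_records)
]
print(len(pseudonymised))  # 3 -> still three people, three records

# Anonymised (aggregated): individual granularity is gone; only the
# distribution remains.
totals = Counter(r["interest"] for r in raw_records)
shares = {k: round(v / len(raw_records), 2) for k, v in totals.items()}
print(shares)  # {'sports': 0.33, 'culture': 0.67}
```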

Pseudonymisation stands for privacy by design: it works by completely removing the characteristics of a data record that could be used to single out a person – such as an IP address or cookie identifier – and replacing them with a machine-generated identifier. We should be clear that most online businesses do not collect data that can identify an individual directly.
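One common way to implement such a replacement is a keyed one-way hash. The sketch below assumes that approach; the key handling and field names are invented for illustration and are not a description of any specific product:

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. Kept separate from the data store (and
# rotated or discarded), it prevents the mapping from being recomputed.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymise(record: dict) -> dict:
    """Strip direct identifiers and substitute a machine-generated one."""
    # A keyed one-way hash maps the same visitor to the same pseudonym
    # while the key lives, without storing the cookie ID itself.
    pseudonym = hmac.new(
        SECRET_KEY, record["cookie_id"].encode(), hashlib.sha256
    ).hexdigest()[:16]
    # Keep only the non-identifying payload; IP address and cookie ID
    # are dropped entirely.
    return {"pseudonym": pseudonym, "interest": record["interest"]}

print(pseudonymise({"cookie_id": "abc123", "ip": "203.0.113.7",
                    "interest": "sports"}))
```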

Of course, the reliability of this process depends significantly on how securely these characteristics are actually deleted, and on how difficult it would be to recreate the original reference to an individual. One option is to have a third party remove and delete the critical data under contract, so that the party processing the data is technically unable to store it in anything other than its pseudonymised state. In other contexts, it can be adequate to pseudonymise the data within a company, so that restoring the reference remains at least theoretically possible. This can even be a legal requirement, for example in order to pursue infringements of the law. Admittedly, the easier it is to recreate the reference to an individual, the weaker the protective function of pseudonymisation becomes. In general, pseudonymisation plays its part in data protection from the very beginning, when data is generated and stored, and prevents critical data stores from being produced in the first place. In addition, the user does not need to be ‘forgotten’ in pseudonymous data sets (the ‘right to be forgotten’ is one of the core concepts of the new data protection regulation presented by EU Commissioner Reding in January), because the forgetting already takes place at the moment the data is stored.
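The difference between one-way and theoretically reversible pseudonymisation comes down to who holds the mapping between pseudonyms and original identifiers. A minimal sketch, assuming an in-house mapping table as the reversibility mechanism (an invented example, not how any particular company does it):

```python
import secrets

class ReversiblePseudonymiser:
    """Illustrative in-house variant: a mapping table is retained, for
    example under strict access control, so that re-identification stays
    theoretically possible, e.g. to pursue infringements of the law."""

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}  # pseudonym -> original ID

    def pseudonymise(self, original_id: str) -> str:
        pseudonym = secrets.token_hex(8)  # machine-generated identifier
        self._mapping[pseudonym] = original_id
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Only possible while the table exists. Deleting it, or having
        # a contracted third party hold it, makes the process one-way.
        return self._mapping[pseudonym]
```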

However, a further dimension must be taken into account to make the protection secure: the granularity of the data stored. Especially in the era of big data and booming data volumes, the granularity itself can sometimes be so specific that references to individuals can be inferred even without direct identification. This is easy to see if you combine indirect identifiers such as employer, make of car and preferred holiday destination. With modern algorithms and the volume of data available today, individuals can be identified from their data trail even in supposedly harmless scenarios, for instance by recording their search queries over longer periods (where most people would give away their identity straight away through ego-searching).
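A toy demonstration of that risk, using entirely invented records: simply counting how many records share each combination of indirect identifiers reveals when a combination points to exactly one person:

```python
from collections import Counter

# Entirely invented records: no name or ID anywhere, yet each
# combination of indirect identifiers occurs exactly once.
records = [
    {"employer": "Acme", "car": "Volvo", "holiday": "Tyrol"},
    {"employer": "Acme", "car": "Volvo", "holiday": "Mallorca"},
    {"employer": "Beta", "car": "Ford",  "holiday": "Mallorca"},
]

combos = Counter(
    (r["employer"], r["car"], r["holiday"]) for r in records
)
singletons = [c for c, n in combos.items() if n == 1]
print(singletons)  # all three combinations are unique, so each record
                   # singles out one individual despite holding no name
```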

So this is where effective pseudonymisation has to come in – to guarantee that the volume and granularity of the stored data do not create the conditions for references to individuals to be recreated ‘via the back door’. One way of ensuring this is to have the storage dimensions checked by a third party. Technical safeguards are another possibility: for instance, ensuring that a batch size of one can never occur, however the available data is combined. Companies that use pseudonymisation will check their data on a regular basis to avoid the risk of suddenly collecting material that could be used to single someone out, and therefore act with data protection in mind while collecting.
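A simplistic stand-in for such a technical check, assuming that ‘batch size of one’ means no combination of stored attributes may isolate a single record (a real system would generalise values rather than merely suppress rows):

```python
from collections import Counter

def enforce_min_group_size(records: list[dict],
                           keys: tuple[str, ...],
                           k: int = 2) -> list[dict]:
    """Drop every record whose combination of attributes occurs fewer
    than k times, so that no query over these keys can return a group
    of one. Suppression-only; shown purely for illustration."""
    combos = Counter(tuple(r[key] for key in keys) for r in records)
    return [
        r for r in records
        if combos[tuple(r[key] for key in keys)] >= k
    ]

# Usage: with k=2, the lone "sports" profile from the earlier example
# would be suppressed until at least one more like it exists.
```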

Why go to all that effort?

If the pseudonymisation process is successful and reliable, it benefits both users’ data privacy and data-hungry companies. Users benefit from pseudonymisation in several ways. One is that they can use online services that require registration or identification – commenting on a website or blog, for example – without divulging their full identity. There is good reason for the Schleswig-Holstein Data Protection Commissioner’s recent warning to Facebook: the network does not provide its users with the legally secure option of pseudonymous use required under German law. This can be a significant protective feature if, for instance, people who need to protect themselves from stalking want to use networks of this type and cannot afford to disclose their identity.

The other benefit to the user is an indirect result of the process described above for guaranteeing pseudonymisation, because the company collecting the data has to implement numerous safeguards in order to effectively restrict the volume and accuracy of the data collected. Data privacy experts also call this ‘data minimisation’. As a company, you are obliged to consider exactly what data you really need, and for how long, and then to optimise data collection so that only the data you need is stored, right from the start. For big data companies on the Internet that store several terabytes a day because their services are used globally, this quickly becomes a considerable undertaking. Pseudonymisation is the industry’s way of actively limiting itself with regard to data. If a huge volume of data has been effectively pseudonymised in this way, even the loss or hacking of the database can be harmless – without the reference to individuals, the data is usually worthless to outsiders.

So, why go to all that effort when it is so cheap these days simply to store everything? It is true that the financial benefits of data minimisation are increasingly being eroded by plunging IT costs. There are, however, still two good reasons to opt for pseudonymisation. Firstly, it represents an active data protection strategy whose protective effect even the best security measures for critical data cannot outperform. Companies will therefore tend to opt for it in order to demonstrate to their users that they handle their data seriously and respectfully. This incentive alone, though, is unlikely to convince as many companies as possible to go down the pseudonymisation route – the pull of data flows and their monetisation is too great.

However, there is another very effective lever to reward the effort of pseudonymisation, one that has been provided for in the German Telemedia Act (TMG) for many years. This approach can prevent companies from building critical data sets and should unquestionably be included in the current debate about (and the text of) the European General Data Protection Regulation. If data is verifiably rid of any direct reference to individuals by means of reliable pseudonymisation, then the collection of that data should be made easier as an incentive. More specifically, an objection (opt-out) policy for users should suffice instead of a consent requirement, as is the case under the German Telemedia Act. With such a strong incentive it would be possible to lead the booming big data industry down the pseudonymisation route, to the great benefit of all sides.


Comments


  • Petteri Vainikka, Enreach

    Aren’t we making things unnecessarily complex for everyone here?

    Consumer privacy regarding all forms of data used for targeting, or for analyzing the effects of advertising (or anything else, for that matter, without a full-disclosure privacy policy and explicit opt-in), does not benefit from rhetorical obfuscation, but should instead be made very simple and clear — because it actually is just that simple.

    There exist two kinds of data:

    (1) Anonymous, non personally identifiable data, either by source or rendered permanently anonymous by process, where all person-related data that could allow backtracking has been purged. (This is the simple definition of anonymous, or anonymized, data.)

    (2) Personally identifiable data, or pseudonymized data. Note: “the pseudonym allows tracking back of data to its origins, which distinguishes pseudonymization from anonymization, where all person-related data that could allow backtracking has been purged.” (Wikipedia)

    For use in advertising, there is hardly any need for anyone to deploy data of the second kind, nor should any 3rd party hold such data unless the consumers whose data records they are have explicitly given permission to keep a copy and to use it in accordance with a clearly stated usage & privacy policy. There should also be – clearly marked on every ad or other vehicle deploying such data for any use case – an easy and simple opt-out mechanism, similar to what is already de facto in permission-based email marketing programs (sadly not for spam).

    Anonymous data, on the other hand, is just that: anonymous. Yes, I believe consumers equally deserve to know (i) where and (ii) by whom their browsing is being tracked, also for such anonymous profiling (which can then be further abstracted away from the anonymous unique profiles by various forms of advanced modeling if needed). However, as this type of data is 100% anonymous by both source and design, its use can obviously be governed more liberally.

    Even in the vortex of big data, applying full anonymity is best done at the design phase – so that no one is tempted to backtrack pseudonymized data to person-related data – and also to just keep things simple.

    Last, there is absolutely nothing preventing 100% anonymous data from being equally granular. To comment on the example in the article: “if one person is recorded as having an interest in sports, and two people in culture, once the data has been pseudonymised it still registers as three people and three data records. Anonymisation, on the other hand, would convert that information to something like a 33% interest in sport.” Why would this be the case?

    If one person is recorded as having an interest in sports, and two people in culture, the 100 % anonymous user level data profiles should show three separate anonymous profiles:

    (1) anonymous user A: interest in sports
    (2) anonymous user B: interest in culture
    (3) anonymous user C: interest in culture

    I.e., three separate anonymous profiles, with one recorded interest signal each.

  • http://twitter.com/TheCookieCrunch Cookie Collective

    If pseudonymisation still enables a web user to be targeted with an interest-based ad on a site, then according to the Article 29 Working Party it is still personal data, and under their recommendation it would require opt-in consent. It is not just about the data itself, but the effect it has on the individual.

    http://www.cookielaw.org/blog/2013/4/11/eu-position-on-behavioural-profiling-clarified.aspx
