Seven new draft recommendations on AI submitted for public consultation by the CNIL (1/2)

CNIL publication on its website

The CNIL has been engaged in in-depth work on artificial intelligence for several years. Back in 2017, the authority published a report on the ethical challenges of algorithms and artificial intelligence. Its work continued in 2022, notably with the publication of the first set of recommendations on the main principles of AI, which also included a guide to support professionals in their compliance. In May 2023, the CNIL launched a major project on the design of AI systems and the creation of databases for machine learning. A first series of recommendations on this topic was published in April 2024.

The CNIL is now supplementing this work by submitting seven new recommendations to public consultation until September 1, 2024. This first article presents the first three topics covered by these recommendations: the legitimate interest legal basis, the dissemination of open-source AI models and web scraping.

  1. The legitimate interest legal basis for the development of an AI system

In this recommendation, the authority details the three conditions that must be met in order to rely on this often-used legal basis for the development of an AI system: the “legitimacy” of the interest pursued, the necessity of the processing and the absence of disproportionate harm to the rights and interests of the data subjects.

With regard to the first criterion, the CNIL considers that certain interests could a priori be considered legitimate in the context of the development of an AI system, for example: conducting research, facilitating public access to certain information, offering a conversational agent or user-assistance service, or detecting fraudulent content or behavior. On the other hand, the development of an AI system to send targeted advertising to minors based on their profile could not be considered legitimate, it being recalled that this practice is prohibited for online platforms by Article 28(2) of the DSA.

As far as necessity is concerned, the CNIL points out that this requires verification that there are no less privacy-intrusive means available. The condition of necessity must therefore be assessed in the light of the principle of data minimization, under which data controllers must ensure that they do not collect more data than is strictly necessary to achieve the purposes of the processing.

The CNIL then provides detailed instructions for balancing the interests of the data controller against those of the data subjects. First, the data controller will need to list the benefits brought about by the AI system, such as improving healthcare, facilitating the exercise of fundamental rights, and so on. These benefits will then have to be weighed against the potential impact of the processing on the people concerned. In this context, the CNIL distinguishes three types of risk for data subjects: those linked to the data collection methods used to develop the AI system (e.g. unlawful collection, collection of a large volume of data), those linked to the training methods used for AI systems (e.g. difficulties in guaranteeing the exercise of fundamental rights), and finally the risks that may materialize when the system is used (e.g. the risk that personal data is memorized and then regurgitated, or even generated, when using an AI-powered conversational agent).

Lastly, the French authority insists on the need to also take into account the “reasonable expectations” of data subjects, and to provide for compensatory or additional measures whenever necessary to limit the possible impact of the processing on them. On this last point, the CNIL provides a detailed list of measures that can be contemplated, depending on the risks involved.

  2. Publishing open-source AI models

In this recommendation, the CNIL highlights the benefits of opening up AI models, while also mentioning the risks this can present. Among the many benefits of opening up models, the commission notes that this can enable the data controller to benefit from contributions from the community and facilitate the adoption of the model by certain players. It also points out that this can increase transparency and make certain checks by third parties possible (capabilities and limitations, presence of bias, vulnerabilities, etc.). The CNIL thus considers that opening up the model can, in particular, strengthen the data controller’s legitimate interest when this legal basis is relied upon.

The authority notes, however, that openness may, under certain conditions, facilitate the reuse of AI systems by third parties for malicious purposes, or the exploitation of security flaws. As a result, it recommends that additional safeguards be put in place to limit these risks, such as the use of licenses restricting certain reuses of the models, or the implementation of data security measures.

  3. CNIL advice on web scraping

The CNIL has already had the opportunity to develop its position on web scraping – i.e. the collection of personal data freely accessible on the Internet – in previous publications. In this third recommendation, it lists the minimum measures to be implemented in order to use this technique on the basis of legitimate interest.

Firstly, the commission points out that it is mandatory in all cases to put in place certain measures to ensure compliance with the principle of minimization: defining precise collection criteria and applying filters to exclude unnecessary data from collection, and ensuring that any data collected by mistake is deleted promptly.
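By way of illustration only, the sketch below shows one way such collection filters and the prompt deletion of over-collected records might be implemented in practice. The field names, the exclusion list and the "collected by mistake" criterion are hypothetical assumptions made for the example, not values prescribed by the CNIL.

```python
from dataclasses import dataclass

# Hypothetical exclusion list and list of fields deemed strictly necessary
# for the stated purpose (assumptions for this sketch, not CNIL-prescribed values).
EXCLUDED_DOMAINS = {"forum-sante.example"}
NEEDED_FIELDS = {"text", "language", "source_url"}

@dataclass
class ScrapedRecord:
    source_domain: str
    fields: dict

def filter_record(record: ScrapedRecord) -> dict | None:
    """Apply the predefined collection criteria; return None to discard the record."""
    if record.source_domain in EXCLUDED_DOMAINS:
        return None  # excluded at collection time
    # Minimization: keep only the fields defined as strictly necessary
    return {key: value for key, value in record.fields.items() if key in NEEDED_FIELDS}

def purge_overcollected(dataset: list[dict], collected_by_mistake) -> list[dict]:
    """Promptly remove any entries flagged as collected by mistake."""
    return [row for row in dataset if not collected_by_mistake(row)]
```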

In most cases, additional safeguards will have to be implemented. For example, the CNIL recommends excluding by default data collection from certain sites containing particularly intrusive data (e.g. health forums), or from those that clearly oppose web scraping. The authority also encourages disseminating information on the data collection and on individuals’ rights as widely as possible, and allowing individuals to object to the processing at their discretion. The CNIL also mentions a project it plans to launch for a “register of organizations processing data collected through web scraping for the development of AI systems.” The purpose of this register would be to make it easier to inform individuals and to enable them to exercise their rights with regard to organizations that process data collected through web scraping as part of the development of an AI system. The CNIL points out, however, that registration would be optional and would in no way prejudge the lawfulness of the processing carried out by data controllers.

We will present the other four recommendations recently published by the CNIL in a second article. These recommendations, also submitted for public consultation by the CNIL, deal with informing data subjects, exercising rights, annotating data and security.