After creating a department dedicated to the issues raised by artificial intelligence in January, the CNIL is now unveiling its action plan, which includes research work and support for professionals as well as enforcement actions. The publication of this plan follows the recent and rapid development of generative AI systems such as ChatGPT (for text) or MidJourney (for images).
The CNIL’s action plan is based on four themes:
- “Understanding AI systems and their impacts”: The CNIL will address some of the main data protection issues raised by artificial intelligence. This work should mainly take the form of internal research and analysis by the CNIL and its digital innovation laboratory (LINC).
The authority describes the questions it will study as “new”. On closer inspection, these are fairly classical questions that were already being raised in certain contexts, but that artificial intelligence now raises together and on a new scale. They include the following issues:
- Upstream of the use of AI systems: the fairness and transparency of the collection and processing of the data used to train the systems, in particular data publicly available on the Internet.
Indeed, AI systems based on machine learning, especially complex general-purpose systems, must in principle learn from huge amounts of data in order to achieve the best possible results (1). For example, the natural language processing model “GPT” developed by OpenAI appears to have been trained on massive amounts of publicly available web data, including data from the Common Crawl organization (2).
This massive use of data, including personal data, raises obvious difficulties, particularly as regards informing data subjects in advance and determining the appropriate legal basis.
- Downstream of the use of AI systems: AI systems also receive personal data from their users and, in some cases, use it both to perform the task requested of them and to refine their training. This is notably the case for natural language processing models deployed as chatbots, such as ChatGPT (OpenAI/Microsoft) or Bard (Google).
The protection of the personal data shared by the users of these systems must therefore be ensured by their designers. For example, the Italian data protection authority recently required OpenAI to give ChatGPT users the ability to object to the reuse of their data for training the language model (3).
- The CNIL will also address broader issues relating to protection against the bias and discrimination that AI systems can generate, as well as to their security.
Protection against bias and discrimination is a recurring issue for AI systems trained on real-world data, as they may pick up some or all of the biases and discrimination found there. This is one of the reasons why the training phase of these models is so important. One of the current approaches to reducing these risks is “reinforcement learning from human feedback” (RLHF), in which human testers spend many hours refining the model, including by detecting and correcting the biases and discrimination that may appear; a minimal sketch of the underlying idea follows.
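For readers curious about the mechanics, here is a minimal Python sketch of one building block of RLHF: the pairwise loss used to fit a reward model on human preference comparisons, after which the reward model guides the fine-tuning of the language model. The scores are invented toy values for illustration, not any provider’s actual implementation.

```python
import math

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: small when the reward model scores the
    answer preferred by the human tester above the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(score_rejected - score_chosen)))

# A human tester preferred answer A over answer B (toy scores):
print(pairwise_loss(2.1, 0.4))  # ~0.17: low loss, model agrees with the tester
print(pairwise_loss(0.4, 2.1))  # ~1.87: high loss, model contradicts the tester
```

Training on many such comparisons, including comparisons that penalize biased or discriminatory answers, is how human feedback is folded back into the model’s behavior.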
- “Enabling and framing the development of AI”: Under this second theme, the CNIL affirms its intention to support actors in the field through various thematic publications.
First, the CNIL recalls that it already published several guidance documents on AI in 2022 (4), as well as a position paper specifically dedicated to the use of so-called “augmented” video cameras (5).
The CNIL has also announced that it is working on new topics that should be the subject of upcoming communications. First of all, a guide on the “rules applicable to the sharing and reuse of data” should soon be submitted for public consultation. This guide should focus in particular on the reuse of data freely available on the Internet, and thus address one of the most sensitive issues for machine-learning-based AI systems (see above). The CNIL should also publish several new guidelines on specific themes: scientific research, application of the purpose limitation principle to general-purpose AI, rules and good practices for the selection of training data, management of individuals’ rights, etc.
- “Federating and supporting innovative players”: The CNIL’s support will also take the form of monitoring real-world AI-based projects.
This project support will be provided within several frameworks. First of all, the CNIL has announced a call for projects for the 2023 edition of its “sandbox” program. This initiative of the French authority has existed since 2021 and is renewed every year; the selected projects benefit from dedicated support from the CNIL’s teams for a set period. In 2021 and 2022, some of the selected projects were already based on AI. According to the CNIL, the 2023 edition will notably concern the use of AI in the public sector.
More generally, in February 2023 the CNIL launched a new scheme called “reinforced support”, aimed at digital companies with strong economic development or innovation potential. This scheme is therefore a natural fit for AI projects. Finally, the CNIL will provide specific support to providers of “augmented” video cameras for the 2024 Olympic and Paralympic Games.
- “Auditing and controlling AI systems”: The CNIL announced that some of its investigations in 2023 will focus on topics related to artificial intelligence.
First, the authority indicates that it will ensure that its 2022 position on the use of “augmented” video cameras is “respected”. The wording is worth noting, since this position was published on the CNIL’s website and is not, in principle, legally binding.
The investigations will also cover fraud-prevention systems (e.g., social insurance fraud), given the challenges linked to the use of artificial intelligence algorithms for this type of processing. Indeed, fraud-prevention processing often involves collecting a large volume of data from heterogeneous sources (purchases made, activity on social networks, etc.) and can therefore be particularly intrusive for data subjects.
Finally, the CNIL will continue to investigate complaints about AI-based tools. Among these, the authority confirms that it has received complaints against OpenAI, the company operating ChatGPT, and has opened an investigation against it. This investigation is being conducted in parallel with the work of the task force dedicated to ChatGPT created within the EDPB to ensure a coordinated approach. It confirms that the recent withdrawal of the prohibition order issued by the Italian data protection authority against OpenAI does not mean that its ChatGPT tool is considered fully compliant with the GDPR.
Sylvain NAILLAT
(1) Some researchers warn, however, of the scientific and ethical dangers of building ever-larger models: https://dl.acm.org/doi/10.1145/3442188.3445922
(2) https://medium.com/@dlaytonj2/chatgpt-show-me-the-data-sources-11e9433d57e8
(3) https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/9881490
(4) https://www.cnil.fr/fr/intelligence-artificielle-ia
(5) https://www.cnil.fr/fr/deploiement-de-cameras-augmentees-dans-les-espaces-publics-la-cnil-publie-sa-position