The issues in GenAI and machine learning contracts

Generative artificial intelligence and machine learning (ML) systems present contracting challenges that are not always seen in traditional IT contracts, writes Belyndy Rowe.

Belyndy Rowe 08 May 2024 Big Law

As with all successful tech contracts, the parties entering the contract (and their lawyers) need to understand the technology to ensure they are best protecting their position and accepting only risks they can manage.

To do this, we must understand how the artificial intelligence (AI) tool is offered to the market (usually cloud-based), the inputs and outputs for the AI, the training data, and any intended use of the tool by the parties. We also need to keep on top of the multiple new models and tools entering the market.

This article considers some issues inherent in generative AI and ML contracts and strategies for addressing them effectively.


Data ownership

AI and ML models usually “learn from” large data sets. This often includes prompts and training data provided by customers.

For customers, it is essential to know whether the provider will reuse the customer’s prompts and data for its other clients. Customers should confirm in the contract any intellectual property (IP) rights they hold in the data, and make sure they understand the scope of the provider’s intended use of the data. The sensitivity of data sets, and the presence of any customer or employee personal information that the provider could access, should be considered and addressed in the contract.

Providers of AI tech generally must obtain a licence to use the customer’s prompts and training data. A specific licence should also be sought for any intended use of the customer data by the provider that goes beyond supplying the services and tech to the customer. Providers should avoid any assignment of the IP in their existing data sets and tech.

Remember, there are various types of data that have value and must be considered: training data (used to train an algorithm or machine learning model), inputs/prompts (data fed into the AI), and outputs (data produced using the AI in response to inputs).

Data use

Some contracts, however, focus too much on data ownership, overlooking that data itself cannot always be owned (although a compilation of it can be) and that IP ownership alone may not restrict the provider’s or customer’s use of the data.

A broader set of data rights must be considered. Data exploitation rights, confidentiality obligations, and periods of use will be significant factors in negotiations. A thorough understanding of the parties’ commercial requirements enhances the likelihood of achieving a mutually beneficial and balanced agreement.

A common example is where the customer wants to protect its data, but the provider wants to offer the AI system – which has been trained on the customer data – to its other customers. This could be addressed by the provider undertaking that the training data provided by the customer will not be disclosed to new customers. The provider may also give the customer comfort via confidentiality undertakings, data protection measures, protocols for data breaches, and damages if the provider breaches the contract.


Customers will likely want to restrict which and how many of the provider’s personnel can access the customer’s prompts, training data and outputs. Treating outputs as confidential until their sensitivity is known is advisable. On the other hand, providers should aim to limit confidentiality obligations so they can make wide use of the outputs of their tech and, if required, the customer’s training data and prompts.


Generative AI and ML outputs carry the risk of errors, potentially causing harm to customers who rely on them. Providers can include broad disclaimers to mitigate this liability. Customers should seek assurances regarding the accuracy of the outputs (to the extent required) and regarding other flaws that could limit the usefulness of the tech for their purposes – for example, discrimination issues embedded in the model. These assurances can be set out with the other requirements in the service level agreements (SLAs) covering AI functionality.

Third-party risks

There is a risk that generative AI outputs could expose customers to third-party actions related to IP infringement. Customers should seek indemnification clauses. At a minimum, IP indemnities concerning software and outputs should be included – although resistance from providers should be expected. Providers, for their part, will need to include language acknowledging their limited control over AI-generated outputs.

Understanding the technology and the data involved is critical to negotiating generative AI and machine learning contracts. Parties and their lawyers must also set clear boundaries on how that data is used.

Belyndy Rowe is a senior associate at Bird & Bird.