Extracting Defined Terms from the AI Act with Juremy

We are working on a feature that automatically recognizes and extracts defined terms in 24 languages from EU legislative documents. Here is where we stand now and, as a bonus, a downloadable 24-language term sheet for the very recently published AI Act.

What are defined terms?

EU drafting guidelines

Guideline 14 of the EU Joint Practical Guide for drafting of legislation [1] states that

where the terms used in the act are not unambiguous, they should be defined together in a single article at the beginning of the act,

and further explains that

all terms should be given their meaning in everyday or specialized language. For the sake of legal clarity it may, however, be necessary for the act itself to define words it uses.

For example, the Artificial Intelligence Act [2] defines that

‘reasonably foreseeable misuse’ means the use of an AI system in a way that is not in accordance with its intended purpose, but which may result from reasonably foreseeable human behaviour or interaction with other systems, including other AI systems.

EU legislation regulates not only legal but also plenty of technical matters, so you will find a wide range of professional terms from agriculture, construction, chemistry, pharma and other industries as well.

One of our favorite examples is the definition of ‘Two lamps’ from Commission Delegated Regulation (EU) 2015/208, supplementing agricultural vehicle functional safety requirements – warning, geometry class material ahead:

two lamps: a single light-emitting surface in the shape of a band or strip if such band or strip is placed symmetrically in relation to the median longitudinal plane of the vehicle, extends on both sides to within at least 0,4 m of the extreme outer edge of the vehicle, and is not less than 0,8 m in length; the illumination of such surface shall be provided by not less than two light sources placed as close as possible to its ends; the light-emitting surface may be constituted by a number of juxtaposed elements on condition that the projections of the several individual light-emitting surfaces on a transverse plane occupy not less than 60 % of the area of the smallest rectangle circumscribing the projections of the said individual light-emitting surfaces.

Juremy and defined terms

One of our inspirations was the TermCoord TermBases page, published until recently, which listed topic-specific termbases collected from related legislative acts. Unfortunately, the TermCoord website has been in maintenance mode for a while, but the listed resources can still be browsed via an archive.org snapshot, the latest of which is from February 2024. [3]

While these termbases are highly sought-after, compiling them in 24 languages takes tremendous manual effort, and a manual compilation can't be as comprehensive as an automated one. Also, extracting the definitions from a document once is not necessarily enough: amendments and corrigenda can update existing definitions or add new ones, so the extraction needs to be repeated periodically to keep up with changes.
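To make the follow-up idea concrete, here is a minimal sketch of how two extraction runs of the same act could be compared. It is only an illustration: the function name `diff_definitions` and the plain term-to-definition dictionaries are our assumptions for this example, not a description of Juremy's internals.

```python
def diff_definitions(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Compare two term -> definition mappings extracted from two versions of an act.

    Hypothetical helper for illustration only.
    """
    added = {term: new[term] for term in new.keys() - old.keys()}
    removed = {term: old[term] for term in old.keys() - new.keys()}
    changed = {
        term: (old[term], new[term])
        for term in old.keys() & new.keys()
        if old[term] != new[term]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder definitions, not actual legal text.
before = {"deployer": "old text", "provider": "same"}
after = {"deployer": "new text", "provider": "same", "operator": "added"}
print(diff_definitions(before, after))
```

Running such a comparison after each amendment or corrigendum would show which rows of a term sheet need to be refreshed.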

Challenges in recognizing defined terms

In a legislative act, definitions are often, but not exclusively, found at the beginning of the act, such as in Article 2 or 3, or in an annex. Most of the time, it is pretty straightforward: a single term is defined, with the term clearly marked and the definition following it. However, there are some complicated cases, for example:

  • inconsistent or missing indication of what exactly the defined term is,
  • multiple terms defined together,
  • a term with synonyms or an abbreviation,
  • or a term whose definition spans multiple points.

Our approach to overcoming these challenges is to build on Juremy’s existing context-aware document processing pipeline, and to use multiple heuristics for recognizing defined terms. For each output entry, we can also signal whether the output is likely accurate as-is, or whether it needs further manual review and processing.
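As a rough illustration of what one such heuristic with a review flag could look like, consider the sketch below. It is not Juremy's actual pipeline: the pattern `TERM_RE`, the `DefinedTerm` class and the `needs_review` field are hypothetical names, and the single rule shown only covers the English "‘term’ means ..." shape.

```python
import re
from dataclasses import dataclass

# Assumed pattern: a term in typographic single quotes, followed by "means".
TERM_RE = re.compile(r"^‘(?P<term>[^’]+)’\s+means\s+(?P<definition>.+)$", re.DOTALL)


@dataclass
class DefinedTerm:
    term: str
    definition: str
    needs_review: bool


def extract_defined_term(point_text: str) -> DefinedTerm:
    """Apply one simple heuristic to the text of a single definition point."""
    text = point_text.strip()
    match = TERM_RE.match(text)
    if match is None:
        # The term is not clearly indicated: keep the text, flag it for manual review.
        return DefinedTerm(term="", definition=text, needs_review=True)
    definition = match.group("definition").rstrip(" ;,.")
    return DefinedTerm(match.group("term"), definition, needs_review=False)


entry = extract_defined_term(
    "‘reasonably foreseeable misuse’ means the use of an AI system in a way "
    "that is not in accordance with its intended purpose."
)
print(entry.term, "|", entry.needs_review)
```

A real pipeline would need many such rules per language; the point here is only that every entry carries a flag saying whether a human should take a second look.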

Examples of challenging definitions from the AI Act

Let’s see how Juremy copes with some of the above-mentioned challenges, taking definitions from the AI Act as examples.

Multiple synonyms for a term

In the English definition of “deployer” in Article 3, point (4), we see a straightforward structure (even if the definition itself may surprise people with a software background [4]):

‘deployer’ means a natural or legal person, public authority, agency or other body using an AI system under its authority except where the AI system is used in the course of a personal non-professional activity,

but the Czech version

„zavádějícím subjektem“ nebo „subjektem, který zavádí“ fyzická nebo právnická osoba, veřejný orgán, agentura nebo jiný subjekt, které v rámci své pravomoci využívá systém AI, s výjimkou případů, kdy je systém AI využíván při osobní neprofesionální činnosti

defines two alternative terms, approximately “deploying subject” and “subject that deploys”. Juremy recognizes this, and in the output sheet separates the terms with a double-slash.
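The synonym case can be sketched as follows. This is a simplified illustration under our own assumptions: the quote pairs, the connector words ("nebo" is Czech for "or") and the function name `leading_terms` are chosen for this example, and the exact spacing around the double slash is a guess.

```python
import re

# Assumed quote pairs: English ‘…’ and Czech „…“.
QUOTED_TERM_RE = re.compile(r"‘([^’]+)’|„([^“]+)“")
# Assumed connectors between alternative terms.
CONNECTOR_RE = re.compile(r"\s+(?:nebo|or)\s+")


def leading_terms(definition_text: str) -> list[str]:
    """Collect the quoted terms at the start of a definition, skipping connectors."""
    terms = []
    pos = 0
    while True:
        match = QUOTED_TERM_RE.match(definition_text, pos)
        if match is None:
            break
        terms.append(match.group(1) or match.group(2))
        pos = match.end()
        connector = CONNECTOR_RE.match(definition_text, pos)
        if connector is None:
            # No connector follows, so the definition body starts here.
            break
        pos = connector.end()
    return terms


czech = "„zavádějícím subjektem“ nebo „subjektem, který zavádí“ fyzická nebo právnická osoba"
print(" // ".join(leading_terms(czech)))
```

Note that the second "nebo" in the Czech text is part of the definition body and is never reached, because the scan stops as soon as a quoted term is not followed by a connector.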

Definition spanning multiple points

The definition of “widespread infringement” in point (61) of Article 3 spans multiple points and subpoints.

Omitting some parts of the definition might result in incorrect interpretation. Luckily, Juremy recognizes all the related parts, and includes all subpoints in the definition.
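One way to picture this is as a small tree of points that gets flattened into a single definition string. The sketch below assumes a hypothetical `Point` structure and uses placeholder wording rather than the text of the Act; how Juremy represents document structure internally may well differ.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    label: str  # e.g. "(a)" or "(i)"; empty for the introductory text
    text: str
    subpoints: list["Point"] = field(default_factory=list)


def flatten_definition(point: Point) -> str:
    """Join a point and all of its nested subpoints, in document order."""
    parts = [f"{point.label} {point.text}".strip()]
    for sub in point.subpoints:
        parts.append(flatten_definition(sub))
    return " ".join(parts)


# Placeholder wording, only to show the nesting.
definition = Point("", "‘example term’ means an act which:", [
    Point("(a)", "first condition:", [
        Point("(i)", "sub one;"),
        Point("(ii)", "sub two;"),
    ]),
    Point("(b)", "second condition."),
])
print(flatten_definition(definition))
```

Because the traversal is recursive, subpoints at any depth end up in the output, so no branch of the definition is silently dropped.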

If you are interested in upcoming Juremy developments and updates, subscribe to our newsletter below, and you will also receive the terms and definitions extracted from the AI Act.

The sheet contains all 68 defined terms of the AI Act, one per row, with the terms and their definitions in separate columns, in all 24 official languages of the EU.
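For readers who want to picture the layout, here is a small sketch that writes such a sheet as CSV. The column names (`term_en`, `definition_en`, ...) and the function `build_term_sheet` are assumptions made for this example; the actual file may use a different format and different headers.

```python
import csv
import io


def build_term_sheet(entries, languages):
    """Build a CSV sheet: one row per defined term, two columns per language.

    entries: list of dicts mapping a language code to a (term, definition) pair.
    """
    header = []
    for lang in languages:
        header += [f"term_{lang}", f"definition_{lang}"]

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for entry in entries:
        row = []
        for lang in languages:
            # A missing language leaves both cells empty instead of failing.
            term, definition = entry.get(lang, ("", ""))
            row += [term, definition]
        writer.writerow(row)
    return buffer.getvalue()


# Definitions are elided with "..." here; only the layout matters.
sheet = build_term_sheet(
    [{"en": ("deployer", "..."), "cs": ("zavádějícím subjektem // subjektem, který zavádí", "...")}],
    ["en", "cs"],
)
print(sheet)
```

Keeping terms and definitions in separate columns makes it easy to import the sheet into a termbase or CAT tool and to filter by language.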

We are also curious to learn how you would utilize the capability of extracting defined terms from EU legislation – get in touch with us!

🙐

  1. Joint Practical Guide of the European Parliament, the Council and the Commission for persons involved in the drafting of European Union legislation

  2. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 (Artificial Intelligence Act)

  3. List of TermCoord TermBases, as archived by archive.org, February 2024

  4. In software engineering, deploying usually refers to the activity of installing a service and making it available in some environment, while in the AI Act it seems rather to refer to the act of using an AI system. Well, never assume, but read the definitions!