When searching for the accurate translation of a term or expression, it is crucial to look for the equivalent term in the right context. For terms with several meanings (polysemous words or homonyms), concordance search results from the wrong topic can be misleading, or at the very least annoying. For example, researching the legal term “resolution” and getting results from chemistry-related regulations can easily feel like a waste of time.
As we are not only developers but also active users of Juremy.com, we understand first-hand how important it is to choose the right context. So we looked for a way to solve this issue by focusing our searches on specific domains.
In this article we give a concise overview of the EuroVoc thesaurus and its existing practical uses in EU legal and professional translation. We also describe a challenge we faced while using it to customize searches in Juremy, and how we solved it.
What is EuroVoc classification and why was it created?
EuroVoc is the multilingual thesaurus maintained by the Publications Office of the European Union, covering the activities of the EU [1]. EuroVoc organizes concepts into 21 domains and 127 sub-domains in all 24 official languages of the European Union and in three languages of countries which are candidates for EU accession: Albanian, Macedonian and Serbian. The covered fields relate to EU and parliamentary activities, and encompass both European Union and national points of view.
The original aim of the thesaurus was to provide the information management and dissemination services with a coherent indexing tool for the effective management of their documentary resources [2]. Today, it enables users to carry out documentary searches using a controlled vocabulary and with the benefit of semantic networks between concepts. Let’s have a closer look at EuroVoc’s semantic network structure:
The 21 domains are different fields of knowledge which are used to categorize concepts. For example, EUROPEAN UNION, LAW, TRADE and TRANSPORT are domains. The domains are further subdivided into 127 sub-domains (also called microthesauri). For example, see the listing of all 7 sub-domains of the domain TRADE below:
20 TRADE
  2006 trade policy
  2011 tariff policy
  2016 trade
  2021 international trade
  2026 consumption
  2031 marketing
  2036 distributive trades
Each such microthesaurus further expands to a tree of concepts (also called terms). See a part of the concept tree under microthesaurus “2026 Consumption” listed below:
2026 consumption
  consumer (TT)
    consumer behavior (NT1)
      consumer motivation (NT2)
      …
    consumer protection (NT1)
      consumer law (NT2)
      …
    …
  consumption
    …
  goods and services
    …
As illustrated above, top terms (TT) sit immediately below a microthesaurus, followed by narrower terms (NT). Narrower terms are indexed by their distance from a reference term. In the example above, NT1 is one step away from the top term, while NT2 is two steps away [3].
In the December 2021 release of EuroVoc [4], out of the 7301 concepts, 545 are top terms, and the bulk of the remaining terms are just one or two steps away from a top term.
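The TT/NT labelling can be sketched in a few lines of Python. This is only an illustration: the `Concept` class and `label_levels` function are hypothetical names, not part of EuroVoc or any EuroVoc tooling, and the tree is limited to the concepts shown in the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a microthesaurus tree (hypothetical structure for illustration)."""
    label: str
    narrower: list["Concept"] = field(default_factory=list)


def label_levels(top_term: Concept) -> list[tuple[str, str]]:
    """Walk the tree depth-first and tag each concept as TT or NT<distance>."""
    tagged = []

    def walk(node: Concept, depth: int) -> None:
        # Depth 0 is the top term; deeper nodes are narrower terms,
        # numbered by their distance from the top term.
        tagged.append((node.label, "TT" if depth == 0 else f"NT{depth}"))
        for child in node.narrower:
            walk(child, depth + 1)

    walk(top_term, 0)
    return tagged


# Only the concepts visible in the excerpt; the elided ones are left out.
consumer = Concept("consumer", [
    Concept("consumer behavior", [Concept("consumer motivation")]),
    Concept("consumer protection", [Concept("consumer law")]),
])

for label, level in label_levels(consumer):
    print(f"{label} ({level})")
```

Run on the excerpt, this prints the same five labelled lines as the listing, from "consumer (TT)" down to "consumer law (NT2)".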
Who uses EuroVoc and for which purposes? Why is it useful for linguists?
Indexing large document collections: usage by EU and national organizations
EuroVoc is one of the means for indexing the EUR-Lex documentary collection. It is used by:
- the European Parliament
- the Publications Office of the European Union
- the national and regional parliaments in Europe
- some national government departments, European organizations and documentation centres.
The assigned descriptors (domains, microthesauri or concepts) can then be used either to guide search and retrieval of documents in the collection, or to quickly convey a summary of the document contents to users.
EUR-Lex advanced document search filter
EuroVoc descriptors are useful for identifying the thematic content of EU legislative documents. It is possible to search EUR-Lex documents by “Theme” on the advanced search page, which lets users pick one or more of the EuroVoc descriptors used to describe the content of a document. This is more useful for document search than for terminology lookup.
However, in EUR-Lex advanced search, it is not possible to filter on a complete domain or microthesaurus with a single click. If the user wishes to retrieve only documents belonging to a given domain, she would need to tick all top terms in that category (see the image below). Consequently, this use case is better suited to very specific document research in a narrower field.
Domain filtering of IATE terms
IATE (Inter-Active Terminology for Europe) is the EU’s interinstitutional terminology database, administered by the EU Translation Centre. Currently IATE contains 934 thousand entries and around 8 million terms in the 24 official EU languages, and also Latin. The EuroVoc thesaurus serves as the basis for the IATE classification of terms.
The “Expanded search” surface on the IATE website supports domain filtering across all EuroVoc domains and descriptors.
Another use case for applying domain filters to IATE terms is the Term Recognition plugin in the Trados Studio CAT tool. When activated in the Termbase Settings, the IATE Terminology Provider plugin automatically extracts all IATE terms in the active segment of the document and displays them in the target language as well. With an additional setting, translators can filter IATE terms by choosing one or more domains on the domain filter page. However, in the case of legal language or more general expressions, it is hard to decide which domain(s) should be chosen to find the right terminology. In order not to exclude possibly accurate translations, one feels an urge to tick all boxes.
As EuroVoc is used to categorize both EUR-Lex documents and IATE terms, which are coincidentally the corpora on which Juremy offers blazing-fast bilingual concordance search, it is a natural choice for focusing search in Juremy as well.
But as seen above, the problem is that filtering reduces the number of potentially useful query hits, by not surfacing terms which don’t belong to the chosen domain, sub-domain or concept. Now let’s see how Juremy resolves this issue!
How does Juremy surface more relevant results by using the EuroVoc classification?
Focusing the topic is very important in EU institutional and legal translation. The risk of choosing a target language equivalent from the wrong context can be high, particularly for shorter query expressions.
For example, if we would like to find the German equivalent of the term “duty”, we first have to determine which field of knowledge our search term belongs to. If the document topic or the surroundings of the expression suggest a financial context, we should search for equivalents within the EuroVoc domain “Finance”. In this case, the accurate translation would be a variant of “Steuer” or “Zoll”, as in the following illustration from Juremy’s new interface:
On the other hand, if the context suggests that the term “duty” relates to an industry sector specification, we should look for the correct translation in the “Industry” domain. It is always the linguist’s understanding of the translation project which will help choose the right terminology. In this particular example, “duty” has at least a dozen different meanings depending on the sector in which it is to be interpreted, as illustrated below:
However, there are two main issues which arise when we try to filter topics, or look for the equivalent term only in a given domain or sub-domain.
- First, it is often hard for a translator to decide which EuroVoc topic to choose when trying to find the most accurate translation of a term. The classification is quite complex, and a given IATE term or EUR-Lex document often belongs to multiple domains.
- Second, it is a top priority for linguists to find as many relevant hits for a term query as possible. This allows them to choose the most accurate term from a larger pool of alternatives.
Why preferences instead of filtering?
What if we added EuroVoc topics to Juremy as filters? As users, if we didn’t get any matches for our query, we could never be sure whether relevant matches had been filtered away. That thought would prompt us to retry the search with the filters turned off. Toggling the filters on and off is inefficient at best, and it also carries the risk that we would stop using topic filtering altogether.
To resolve this contradiction, we implemented a user-friendly and efficient way to focus searches: instead of setting strict filters, we can set topic preferences! When preferences are set, Juremy returns preferred results first, but still falls back to other topics if there aren’t any preferred ones. This way we can’t miss any relevant results.
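The difference between the two approaches can be sketched in Python. This is a toy illustration, not Juremy’s actual implementation: the `Hit` class, both functions and the sample hits with their domain tags are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A search result with its EuroVoc domain tags (hypothetical structure)."""
    text: str
    domains: frozenset[str]


def filter_hits(hits, domain):
    """Strict filtering: hits outside the domain are dropped."""
    return [h for h in hits if domain in h.domains]


def prefer_hits(hits, domain):
    """Preference: hits in the domain come first, all others follow."""
    # False sorts before True, and sorted() is stable, so the original
    # order is kept within the preferred and non-preferred groups.
    return sorted(hits, key=lambda h: domain not in h.domains)


# Made-up sample hits for the query "duty".
hits = [
    Hit("duty cycle", frozenset({"INDUSTRY"})),
    Hit("customs duty", frozenset({"FINANCE", "TRADE"})),
    Hit("duty of care", frozenset({"LAW"})),
]

print(len(filter_hits(hits, "TRANSPORT")))   # filtering can leave nothing
print(len(prefer_hits(hits, "TRANSPORT")))   # preferences keep every hit
print(prefer_hits(hits, "FINANCE")[0].text)  # preferred hit is listed first
```

With the filter, a domain that has no matches returns an empty list and leaves us guessing. With the preference, the same query still returns all three hits, only reordered when a preferred one exists.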
Currently we support setting topic preferences in terms of EuroVoc domains. Preferring a domain will automatically prefer all IATE entries or EUR-Lex documents tagged with that domain or any of its constituent concepts. Clicking a domain label on any result entry (after enabling the Topic view) will let us set our preference for that domain.
Also, we found it is often easier to tell which domains we likely won’t need. So in addition to positive preferences, we can also set negative ones: domains from which (at least given better options) we would rather not get results. And once we have set up some preferences, they can be fine-tuned further with the topic preferences editor.
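One way to picture how a domain preference can cover entries tagged with narrower descriptors is to walk each tag up to its domain. The sketch below assumes a simple parent lookup table built from the TRADE excerpt earlier in this article; `PARENT`, `domain_of` and `matches_preference` are hypothetical names, and Juremy’s real data model may look quite different.

```python
# Parent links taken from the TRADE excerpt shown earlier; a descriptor
# without a parent entry is treated as a domain (assumption).
PARENT = {
    "consumer law": "consumer protection",
    "consumer protection": "consumer",
    "consumer": "2026 consumption",
    "2026 consumption": "20 TRADE",
}


def domain_of(descriptor: str) -> str:
    """Follow parent links upwards until a domain is reached."""
    while descriptor in PARENT:
        descriptor = PARENT[descriptor]
    return descriptor


def matches_preference(tags: set[str], preferred_domain: str) -> bool:
    """True if any tag is the preferred domain or lies somewhere beneath it."""
    return any(domain_of(tag) == preferred_domain for tag in tags)


print(domain_of("consumer law"))
print(matches_preference({"consumer law"}, "20 TRADE"))
```

An entry tagged only with the concept “consumer law” is thus treated as preferred when the TRADE domain is preferred, without the user having to pick the concept itself.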
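Combining positive and negative preferences amounts to sorting results into three tiers. The following Python sketch is an illustration under stated assumptions rather than a description of Juremy’s ranking: the function name and the sample hits are made up, and letting a preferred domain outweigh an avoided one on the same hit is our own simplifying choice.

```python
def rank_by_preference(hits, preferred, avoided):
    """Order (text, domains) pairs: preferred first, neutral next, avoided last."""

    def tier(hit):
        _, domains = hit
        if domains & preferred:
            return 0  # assumption: a preferred domain outweighs an avoided one
        if domains & avoided:
            return 2
        return 1

    # sorted() is stable, so hits keep their original order within a tier.
    return sorted(hits, key=tier)


# Made-up sample hits for the query "duty".
hits = [
    ("heavy-duty vehicle", {"TRANSPORT"}),
    ("duty of care", {"LAW"}),
    ("customs duty", {"FINANCE", "TRADE"}),
]

ranked = rank_by_preference(hits, preferred={"FINANCE"}, avoided={"TRANSPORT"})
print([text for text, _ in ranked])
```

Nothing is removed from the list: the avoided hit is simply moved to the bottom, where it remains available if the better options turn out not to fit.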
A bit about the visuals
As topics are treated more prominently on Juremy’s user interface, we wanted the topic labels to be quickly distinguishable and informative, but not obtrusive. To that end, we had a custom icon set designed to represent the 21 EuroVoc domains, and assigned well-differentiated colors to the domains as well.
We also found these objectives to be somewhat contradictory. So we leave the choice of markup style to our users, and provide two customization options:
- Choice of detail exposed on the label: just the domain icon, or also the domain name, or even showing the specific concept within the domain.
- Ability to turn off colors. However mesmerizing nice colors are, we wanted an option to keep the topic labels in the background as much as possible.
Here is how you can set your preferences in the new menu:
Select the Topic drop-down menu
Enable the ‘Topic’ checkbox on the right to see topic labels. You can also choose whether Color Coding should be applied to the topic labels, or keep the black and white format.
Customize your display
You can choose to show: only the domain icon; the icon with domain name; or fully expanded concepts. Then click on any topic label in the results to set your preference for that domain.
Arrange your preferred topics easily
Click the small wrench icon under the search field to open the preferences editor. Add new preferences, or rearrange the existing ones by dragging them between the +/- marked areas.
In contrast to filtering, you don’t lose any results by setting preferences. Your preferred results will just appear at the top of the hit list, but you are still able to see hits from the non-preferred topics as well.
[3] Apart from the hierarchical term relations, EuroVoc also contains various links that allow horizontal movement between related terms (RT), even across domains.