Building a skills taxonomy for the UK


By Elizabeth Gallagher, India Kerle, Cath Sleeman and George Richardson

There is no official and fully open skills taxonomy in the UK

There is a pressing need for such a taxonomy, which would enable a consistent conceptualisation of workforce skills, together with consistent terminology and language around skills for educators, careers advisers, policy makers and employers. The lack of a consistent language has multiple consequences, such as confusion over the skills required for particular roles or the training needs of employees. At the same time, the effects of COVID-19 and Brexit have triggered rapid changes in skill demands as well as new skill shortages. This shifting landscape has only increased the need for an open and up-to-date skills taxonomy for the UK, which could provide better-quality and more timely information to inform policy.

This paper develops a data-driven approach to building a taxonomy, which could in turn inform the creation of an official UK skills taxonomy. Nesta has developed a new methodology for creating a ‘data-driven’ skills taxonomy, which we use to identify core (or transversal) skills, shed new light on regional skill differences and detect COVID-19-related changes in skill requirements. ‘Data-driven’ refers to a taxonomy that has been created based on the skills mentioned in UK job adverts. The dataset of job adverts was provided by TextKernel. This work builds on the first data-driven skills taxonomy created by Djumalieva and Sleeman (2018), which you can read about here, by extending their approach in three directions.

  • First, their method relied on a proprietary list of skills that were extracted from adverts by their data provider. In this instance, proprietary meant that the method of extracting skills from adverts was opaque: they could not extract skills themselves and instead relied on a predefined list. In this ESCoE Technical Report, published today, we develop a new, up-to-date and fully open approach for extracting skills. This approach does not rely on a pre-set skill list and, as a result, we can detect previously unseen skills.
  • Second, due to data restrictions, Djumalieva and Sleeman (2018) were unable to publicly release the names of the skills in each part of their taxonomy. We do not face these restrictions as we use an alternative dataset, and can therefore publish the entire taxonomy and codebase.
  • Third, as their taxonomy was developed almost four years ago, we can offer a more up-to-date view of UK skill demands. The new data-driven approach therefore means that we can better understand: 1) the current state of the UK job market, including regional skills demand, thanks to updated data; and 2) skill mismatches, as our fully open method picks up new skills when they emerge in job adverts.

A new ‘data-driven’ approach to building a skills taxonomy

Our method uses state-of-the-art natural language processing techniques to extract skills from job adverts, to build a skills taxonomy, and to understand certain skill demand challenges. The high-level methodology is described below:

Figure 1: A high-level visual overview of the methodology for the new skills taxonomy
  1. Training a supervised machine learning classifier to predict whether or not a sentence within a job advert mentions skills required for the job.
  2. Creating sentence embeddings for the skill sentences using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018).
  3. Extracting skills by reducing and clustering the sentence embeddings using UMAP dimensionality reduction and DBSCAN clustering algorithms.
Figure 2: An overview of the pipeline for steps 2 and 3
Figure 3: An overview of the steps to building the skills taxonomy using clustering
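
To give a feel for step 3, here is a minimal sketch of density-based clustering in the style of DBSCAN, written in plain Python. It is an illustration only and not the code from the report: the sentences, their two-dimensional coordinates (standing in for embeddings that have already been reduced), the `dbscan` helper and the `eps` and `min_samples` values are all assumptions made for this example. A real pipeline would use a BERT model for the embeddings and library implementations of UMAP and DBSCAN.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Toy DBSCAN: return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Hypothetical skill sentences with made-up 2-D coordinates.
reduced = {
    "Experience of cleaning offices": (0.0, 0.0),
    "Must keep work areas clean and tidy": (0.1, 0.1),
    "Deep cleaning of kitchens": (0.2, 0.0),
    "Proficient in Python programming": (5.0, 5.0),
    "Strong coding skills in Python": (5.1, 4.9),
    "Writes well-tested Python code": (4.9, 5.1),
    "Experience of hedge laying": (10.0, 0.0),
}
sentences = list(reduced)
labels = dbscan([reduced[s] for s in sentences], eps=0.5, min_samples=3)

for sentence, label in zip(sentences, labels):
    print(label, sentence)
```

In this toy example each dense group of sentences becomes one candidate skill, while the isolated sentence is labelled as noise (-1) rather than being forced into a cluster. That behaviour is one reason a density-based algorithm suits an approach with no pre-set skill list.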

We apply this method to a sample of five million UK job adverts that were posted between 2015 and 2021. Around 7,000 skills are extracted and assembled into a taxonomy that contains 11 skill groups at the top level and 250 at the bottom level. Upon inspection, the automatic grouping is broadly logical and we find that the job titles which most frequently request skills from particular clusters are highly consistent with the names of these clusters (for example, cleaning skills are very frequently asked for in job adverts for Cleaners).

The taxonomy is applied to automatically identify a set of transversal skills and two further applications are explored: identifying regional skill differences, and examining changes in skill demands following an exogenous shock, namely the COVID-19 pandemic.

A core set of skills is relevant for many types of roles

We can automatically identify transversal skills relevant for many types of roles using the taxonomy. This is a key departure from many ‘must-have’ skill lists, most of which are assembled qualitatively. Using a ‘data-driven’ co-occurrence network approach, the most transversal skills identified relate to generic technical skills, interpersonal skills, basic skills and personal qualities. Specifically, the most transferable level C skill groups relate to being a strong communicator, being a team player and developing and managing staff. These automatically identified transversal skills could eventually be supplemented with expert judgement, allowing for an even more robust, mixed-methods approach.
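
The idea behind a co-occurrence network can be sketched in a few lines of Python. This post does not spell out the exact network measure used in the report, so the sketch below assumes one simple option, degree centrality: skills that appear alongside many different skills across adverts score highest. The adverts, skill names and helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical adverts, each reduced to the set of skills it mentions.
adverts = [
    {"communication", "teamwork", "python"},
    {"communication", "cleaning"},
    {"communication", "teamwork", "nursing"},
    {"teamwork", "welding"},
    {"python", "sql"},
]


def cooccurrence_neighbours(adverts):
    """Map each skill to the set of skills it shares at least one advert with."""
    neighbours = defaultdict(set)
    for skills in adverts:
        for a, b in combinations(sorted(skills), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def degree_centrality(neighbours):
    """Share of all other skills in the network that each skill co-occurs with."""
    n = len(neighbours)
    return {skill: len(nb) / (n - 1) for skill, nb in neighbours.items()}


centrality = degree_centrality(cooccurrence_neighbours(adverts))

# Rank from most to least transversal, breaking ties alphabetically.
ranked = sorted(centrality, key=lambda skill: (-centrality[skill], skill))
for skill in ranked:
    print(f"{skill}: {centrality[skill]:.2f}")
```

In this toy network "communication" and "teamwork" come out on top because they co-occur with skills from several unrelated roles, whereas "welding" and "sql" each connect to a single neighbour.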

There are regional and situational differences in skill demands

In addition to revealing transversal skills, the taxonomy also reveals regional differences in skill demands and changes in skill demands following exogenous shocks such as the COVID-19 pandemic.

For the former, the largest difference in regional skill demands was seen in “Manufacturing, engineering and physical skills”, which made up 2.6% of the skills mentioned in London, and 7% in the West Midlands. The demand for “Digital and technology” skills also varied widely, making up 18.3% of all skills demanded in London, but only 14.7% in Wales. The smallest regional differences were seen in demand for skills related to “Health and care”. Notable outliers from the rest of the UK include Northern Ireland, which has a particularly high demand for “Food, cleaning and safety” skills, and the North East, which has a high demand for “Childcare and education” skills.
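
The percentages above are shares of all skill mentions within a region. A minimal sketch of that arithmetic is shown below, assuming each extracted skill mention has already been assigned to a region and a top-level skill group. The region names and counts are invented for illustration and are not results from the report; `group_shares` and `regional_gap` are hypothetical helper names.

```python
# Hypothetical counts of skill mentions per region and top-level skill group.
counts = {
    "Region A": {
        "Digital and technology": 40,
        "Manufacturing, engineering and physical skills": 10,
        "Health and care": 50,
    },
    "Region B": {
        "Digital and technology": 25,
        "Manufacturing, engineering and physical skills": 30,
        "Health and care": 45,
    },
    "Region C": {
        "Digital and technology": 30,
        "Manufacturing, engineering and physical skills": 20,
        "Health and care": 50,
    },
}


def group_shares(counts):
    """Percentage of each region's skill mentions that fall in each group."""
    shares = {}
    for region, by_group in counts.items():
        total = sum(by_group.values())
        shares[region] = {g: 100 * n / total for g, n in by_group.items()}
    return shares


def regional_gap(shares):
    """Percentage-point gap between the highest and lowest regional share."""
    groups = {g for by_group in shares.values() for g in by_group}
    gaps = {}
    for group in groups:
        values = [by_group.get(group, 0.0) for by_group in shares.values()]
        gaps[group] = max(values) - min(values)
    return gaps


shares = group_shares(counts)
gaps = regional_gap(shares)

# List skill groups from the largest to the smallest regional difference.
for group in sorted(gaps, key=gaps.get, reverse=True):
    print(f"{group}: {gaps[group]:.1f} percentage points")
```

The same share calculation could be run on adverts from two time periods instead of several regions, which is one simple way to compare skill demand before and after a shock.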

For the latter application, on COVID-19, there was an increase in demand for health care skills, and a decrease in demand for service industry skills. Both of these findings are consistent with the multiple lockdowns experienced during the COVID-19 pandemic. These results demonstrate how a ‘data-driven’ skills taxonomy can dynamically capture changes in the labour market that stem from exogenous shocks such as the COVID-19 pandemic.

A starting point for an official UK skills taxonomy

The primary use case for a ‘data-driven’ skills taxonomy is to serve as a starting point for an official UK skills taxonomy. We would not recommend adopting the taxonomy without any expert refinement, as not all skills are mentioned in online job adverts, and the skills that are mentioned may not accurately reflect those required for the job. Similarly, not all vacancies are advertised online. However, a ‘data-driven’ taxonomy provides a strong base upon which domain experts could build. Specifically, they could add branches to the taxonomy and check the automatic labelling of skill clusters.

The timing of this paper is apt, as the Government has recently established a Skills and Productivity Board, which appears to be looking to develop such a taxonomy. The skills taxonomy could also complement the new Skills Accelerator Programme, created by the Department for Education (DfE). This programme aims to build partnerships between employer groups, colleges and other providers to address local skills gaps via training.

A new way to detect emerging skills and new skill combinations

Another application of this ‘data-driven’ taxonomy would be to identify emerging skills and skill combinations in the UK labour market. The updated skills taxonomy does not rely on a proprietary list of extracted skills; instead, it predicts whether or not a sentence within a job advert contains skills required for the job. This low-judgement approach to identifying skills could accelerate the information pathway from employers, who may need particular skills or skill combinations, to educators, who want to ensure their students acquire practical skills to secure employment.

Next steps

Identifying emerging skills and skill combinations is one of many potential avenues for further analysis. This information could be used to inform new types of training and qualifications. There are also several use cases for the skill sentence machine learning classifier, for example in Nesta’s Open Jobs Observatory. If you have any additional ideas or feedback please feel free to reach out to the Data Analytics team (

Read the full ESCoE Technical Report here.

Elizabeth Gallagher is a Data Scientist, Data Analytics Practice, at Nesta
India Kerle is a Junior Data Scientist, Data Analytics Practice, at Nesta
Cath Sleeman is Head of Data Discovery, Data Analytics Practice, at Nesta
George Richardson is Head of Data Science, Data Analytics Practice, at Nesta

ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.
