The first publicly available data-driven skills taxonomy for the UK


By Jyldyz Djumalieva and Cath Sleeman

In new research published by ESCoE we present the first publicly available data-driven skills taxonomy for the UK. A skills taxonomy provides a consistent way of measuring skill shortages, which are costly and can hamper growth. A taxonomy can also help workers and students learn more about the skills that they need, and the value of those skills. The taxonomy can be explored here, along with several interactive data visualisations.

The cost of skill mismatches

Skill shortages are a major issue in the UK and arise because there are not enough people with particular skills to meet demand. The Open University estimated that skill shortages cost the UK £2bn a year in higher salaries, recruitment costs and temporary staffing bills. They can also significantly hamper growth. According to OECD research, the UK could boost its productivity by 5% if it reduced the level of skill mismatch to OECD best practice levels. And skill mismatches may be set to worsen, owing to both short-term factors such as Brexit, and longer-term trends such as automation.

Building a skills taxonomy for the UK

The first step to measuring shortages is to build a skills taxonomy, which shows the skill groups needed by workers in the UK today. This taxonomy can be used as a framework by which to measure the demand for, and supply of, each skill group. The UK already has well-established taxonomies for defining groups of occupations and industries.

To build the skills taxonomy we began with a list of just over 10,500 unique skills that had been mentioned within the descriptions of 41 million UK job adverts, collected between 2012 and 2017 and provided by Burning Glass Technologies. We then used machine learning to hierarchically cluster the skills: the more frequently two skills appeared in the same advert, the more likely they were to end up in the same branch of the taxonomy.
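To make the idea of clustering by co-occurrence concrete, here is a minimal sketch in Python. It is not the method used in the research: the Jaccard similarity measure, the average-linkage merging rule, the function names and the toy adverts are all assumptions chosen for illustration. It only shows how skills that tend to appear in the same adverts can end up grouped together.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(adverts):
    """Count adverts per skill and adverts per unordered pair of skills."""
    skill_counts = Counter()
    pair_counts = Counter()
    for skills in adverts:
        unique = sorted(set(skills))  # ignore repeats within one advert
        skill_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return skill_counts, pair_counts


def similarity(a, b, skill_counts, pair_counts):
    """Jaccard similarity (an assumed measure): shared adverts / adverts with either skill."""
    key = (a, b) if a < b else (b, a)
    both = pair_counts[key]
    either = skill_counts[a] + skill_counts[b] - both
    return both / either if either else 0.0


def cluster_skills(adverts, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    skill_counts, pair_counts = cooccurrence_counts(adverts)
    clusters = [[skill] for skill in sorted(skill_counts)]

    def average_similarity(c1, c2):
        total = sum(
            similarity(a, b, skill_counts, pair_counts) for a in c1 for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Toy adverts, invented for illustration; each is a list of skills mentioned.
toy_adverts = [
    ["python", "sql", "etl"],
    ["python", "sql"],
    ["sql", "etl"],
    ["payroll", "invoicing"],
    ["payroll", "invoicing", "filing"],
    ["filing", "invoicing"],
]

toy_clusters = cluster_skills(toy_adverts, n_clusters=2)
print(toy_clusters)
```

Stopping the merging at different numbers of clusters gives coarser or finer groupings, which is one simple way to think about the layers of a hierarchical taxonomy.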

The final taxonomy has a tree-like structure with three layers. The first layer contains 6 clusters of broad skills; these split into 35 clusters, and these in turn split to give 143 clusters of specific skills in the third layer. Each of the approximately 10,500 skills lives within one of these third-layer groups.

Much more than a list of skills

The skills taxonomy provides estimates of the demand for each skill cluster (based on the number of skill mentions within adverts), the change in demand over recent years and the value of each skill cluster (based on advertised salaries). The estimates of demand get us halfway to measuring skill shortages; the other half is an estimate of the supply of each skill group. Most importantly, a user can search the taxonomy by job title, and discover the skills they need for a wide range of jobs.
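As a rough illustration of how such estimates could be computed, the sketch below counts skill mentions per cluster and takes the median advertised salary of the adverts that mention each cluster. The data layout, the function name `summarise_clusters` and the rule for attributing an advert's salary to every cluster it mentions are assumptions made for this example, not a description of the research method. Adverts without a salary still count towards demand but are left out of the salary estimate.

```python
from collections import defaultdict
from statistics import median


def summarise_clusters(adverts, skill_to_cluster):
    """Return {cluster: {"mentions": int, "median_salary": number or None}}."""
    mentions = defaultdict(int)
    salaries = defaultdict(list)
    for advert in adverts:
        clusters_seen = set()
        for skill in advert["skills"]:
            cluster = skill_to_cluster.get(skill)
            if cluster is None:
                continue  # skill is not in the taxonomy
            mentions[cluster] += 1  # demand: one count per skill mention
            clusters_seen.add(cluster)
        if advert.get("salary") is not None:
            # Assumption: the advert's salary counts once per cluster it mentions.
            for cluster in clusters_seen:
                salaries[cluster].append(advert["salary"])
    return {
        cluster: {
            "mentions": mentions[cluster],
            "median_salary": median(salaries[cluster]) if salaries[cluster] else None,
        }
        for cluster in mentions
    }


# Toy data, invented for illustration only.
toy_skill_to_cluster = {
    "python": "Data engineering",
    "sql": "Data engineering",
    "filing": "Office administration",
}
toy_salary_adverts = [
    {"skills": ["python", "sql"], "salary": 60000},
    {"skills": ["sql"], "salary": None},  # no salary advertised
    {"skills": ["sql", "filing"], "salary": 30000},
    {"skills": ["filing"], "salary": 20000},
]

toy_summary = summarise_clusters(toy_salary_adverts, toy_skill_to_cluster)
for name, stats in sorted(toy_summary.items()):
    print(name, stats)
```

Running the same summary on adverts from different years would give a simple view of how demand for each cluster changes over time.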

The value of skills

The taxonomy provides the first set of publicly available estimates for the value of skills in the UK, based on advertised salaries. To date, workers and students have had to decide which skills to develop without access to this type of information. These values are estimates rather than precise figures, as only 61% of adverts mention a salary.

The five skill clusters in the third layer with the highest annual median salaries are:

  1. Data engineering
  2. Securities trading
  3. IT security operations
  4. IT security standards
  5. Mainframe programming

The five clusters with the lowest salaries are:

  1. Premises security
  2. Medical administration
  3. Dental assistance
  4. Office administration
  5. Logistics administration

The benefits of a data-driven taxonomy

Using job adverts to create the taxonomy ensures that it is based on the same ‘skills language’ used by UK employers, rather than the language of academics or policy makers. The other benefit of a data-driven approach relates to maintenance. Several existing skill taxonomies (such as O*Net and ESCO) rely, at least in part, on expert consultation which means that updating the taxonomies can be a long and costly process. In contrast, a data-driven taxonomy is easier to update: the same methodology can be applied to a new set of job adverts.

Limitations to consider

No taxonomy will ever be truly comprehensive, whether it is derived from experts or created from job adverts, and moreover there is no single ‘right way’ to group skills. The most important limitation is that not all work is advertised online. As a result, the demand for skills used predominantly by freelancers and casual workers may be underestimated in the taxonomy.

Putting the skills taxonomy to work

Over the next year we will be showing a range of use cases for the skills taxonomy. This will include estimating skill shortages at a regional level and automatically detecting new and redundant sets of skills. The taxonomy itself will also continue to evolve, as we look to add a fourth layer.

Anyone is welcome to use the taxonomy. If you are interested, please get in touch by emailing

ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.
