Identifying drivers of job quality in online job adverts

By Rosie Oxbury

Previously, ESCoE’s work in partnership with Nesta developed methods for automatically extracting various key information from job adverts, such as skills and locations. An important area of research into worker wellbeing and productivity covers job quality, i.e. the aspects of a job that affect wellbeing. This next piece of work investigates whether any information about job quality can be gleaned from online job adverts and, if so, whether it can be extracted automatically.

The aspects of a job that contribute to worker wellbeing have been formalised under the umbrella term ‘job quality’ (also referred to as ‘good work’). Most research examining different drivers of job quality rightly addresses this from the employee’s point of view – often via surveys (such as the CIPD Good Work Index), and recently via employee reviews. We were keen to find out if indicators of job quality could be identified in job adverts, and if so, whether these could be extracted automatically.

We have created an open-source codebase that can be used to automatically identify aspects of job quality as they appear in online job adverts. We applied our methods to a random sample of 100,000 job adverts from Nesta’s Open Jobs Observatory (OJO). Our approach is designed to enable researchers to answer questions such as:

  • What dimensions of job quality are on offer in online job adverts?
  • How do different sectors and/or regions compare in terms of adverts offering different dimensions of job quality?

This blog describes how we have arrived at a working definition of ‘job quality’ for our purposes, and how we built a Python package to automatically identify different dimensions of job quality in free text. We also show two examples of our codebase in action: an overview of offered job quality in a random sample of job adverts and a case study investigating offered job quality in job adverts in the early years sector.

This work provides a timely and low-cost signal of trends in different dimensions of job quality. It can help identify regions and sectors that differ in the job quality on offer, and so potentially in worker wellbeing.

How we define “job quality”

We took CIPD’s seven dimensions of job quality as our starting point: pay and benefits; contract (elsewhere called terms of employment); work-life balance; job design and the nature of work; relationships at work; employee voice; health and wellbeing. We added a further category, “barriers to access”, to our taxonomy, so that dimensions of job quality that directly affect marginalised groups could be gathered together. We made one more addition, “atmosphere, culture and environment”, which fits under “Social support and cohesion” and which we took from a related ESCoE project.

We conducted an initial analysis to assess (a) which, if any, dimensions of job quality might manifest in job adverts, and (b) what language would be used to express these.

The resulting taxonomy has three levels:

  • The higher-level dimension of job quality, eg “Job design and nature of work”.
  • Sub-categories within that dimension, taken from the CIPD Good Work Index 2023 and Measuring Good Work, eg “career progression”, “learning and development”, “sense of purpose”.
  • The most common phrases that we saw in job adverts that related to these sub-categories. For example, for the sub-category “learning and development”, we included the phrases “CPD” (Continuous Professional Development), “learning and development” and “training”.

We designed our taxonomy to capture anything that could relate to job quality, either positively or negatively. So, for example, it includes the dimension HOURS, which is intended to identify any information about working hours and is agnostic to over- or under-employment.
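One way to picture the three levels is as a nested mapping from dimension to sub-category to key phrases. The sketch below is illustrative only: the dictionary layout and the `phrases_for` helper are assumptions made for this post, not the file format or API of our codebase. The entries are the examples quoted above.

```python
# Illustrative layout only: dimension -> sub-category -> key phrases.
# The nesting is an assumption, not the format used in the codebase.
TAXONOMY = {
    "Job design and nature of work": {
        "learning and development": ["CPD", "learning and development", "training"],
        "career progression": [],  # key phrases not listed in this post
        "sense of purpose": [],    # key phrases not listed in this post
    },
}


def phrases_for(taxonomy, dimension):
    """Return (sub_category, phrase) pairs for one dimension (hypothetical helper)."""
    return [
        (sub_category, phrase)
        for sub_category, phrases in taxonomy.get(dimension, {}).items()
        for phrase in phrases
    ]


pairs = phrases_for(TAXONOMY, "Job design and nature of work")
```

Keeping the key phrases at the lowest level means that a match on a phrase can always be traced back to its sub-category and dimension.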

Automatically extracting dimensions of job quality from job adverts

The pipeline we developed for extracting dimensions of job quality has two basic steps:

  1. Classifying sentences as relating to job quality or not. This part of the pipeline was necessary because of the difficulty of distinguishing some common job benefits from job requirements. For example, the theme of “training” comes up in both “You will be responsible for delivering training” and “You will receive opportunities for further training”, but one is a job requirement and one is a job quality measure. We built a classifier to distinguish between these kinds of sentences (a) so as to avoid false positives (like the “delivering training” example above) and (b) so that we only map the relevant parts of a job advert to the taxonomy, which improves the overall efficiency of the pipeline. You can read more about this part of the process here.
  2. Mapping the selected sentences to the taxonomy. Once sentences that relate to job quality have been identified, they need to be mapped to the taxonomy in order to identify which dimensions of job quality are present in a particular advert. To achieve this, the sentences are split up into smaller spans of text. Next, these spans are compared to the key phrases from the taxonomy using natural language processing techniques. Only the phrases with the highest-scoring matches are returned.
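The mapping step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in our package: the two key phrases, the function names and the 0.9 threshold are invented for the example, and the character-level similarity from Python’s `difflib` is a stand-in for the natural language processing comparison used in the pipeline.

```python
import re
from difflib import SequenceMatcher

# Hypothetical key phrases mapped to sub-categories; the real taxonomy is larger.
KEY_PHRASES = {
    "training": "learning and development",
    "flexible working": "flexible hours",
}


def spans(sentence, max_words=3):
    """Yield every run of 1..max_words consecutive words in the sentence."""
    words = re.findall(r"[a-z0-9&]+", sentence.lower())
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the NLP comparison."""
    return SequenceMatcher(None, a, b).ratio()


def map_sentence(sentence, key_phrases=KEY_PHRASES, threshold=0.9):
    """Return {sub_category: (best_span, score)} for matches at or above threshold."""
    best = {}
    for span in spans(sentence):
        for phrase, sub_category in key_phrases.items():
            score = similarity(span, phrase.lower())
            current_best = best.get(sub_category, ("", 0.0))[1]
            if score >= threshold and score > current_best:
                best[sub_category] = (span, score)
    return best


matches = map_sentence("You will receive opportunities for further training.")
```

On its own, this sketch would also match “You will be responsible for delivering training”, which is why the classification step has to come first.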

A worked example of how the complete pipeline works, including both the classification and the mapping steps, is represented in the graphic below. You can read more about this part of the process here.

Which dimensions can we extract?

The best performance is achieved on:

  • Hours
  • Compensation
  • Perks
  • Leave
  • L&D
  • Career progression

We encountered a relatively high number of false positives for flexible hours. When we investigated these errors, we found that phrases such as “more shifts will come” and “shifts with early finish” were mistakenly matched to “choice of shifts”. Additionally, phrases such as “flexibility to be based [in the office or ‘hybrid’ working from home]” and “working culture and flexible working [(hybrid working available)]” were matched to “flexible working”. Reading ahead (the text in square brackets) shows that these phrases belong under flexible location rather than flexible hours, but the algorithm only saw a short span of the text.

When a sentence is labelled as being about location or shifts, the label is usually correct, but the algorithm misses many such sentences (a high number of false negatives). For location, this seems to be because job locations include many place names that are not in our keyword list, such as “working in Northgate” and “within the Blackpool and surrounding areas”. Phrases to do with shifts are often classified as flexible hours instead, since “choice of shifts” and “flexible shift patterns” are key phrases for the flexible hours job quality measure.
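The distinction between false positives and false negatives corresponds to precision and recall per dimension. The sketch below shows how these could be computed from hand-labelled and predicted dimensions. It is an illustration only: the function name and the toy labels are invented, and it is not our evaluation code or data.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Per-dimension precision and recall from parallel lists of label sets."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_labels, predicted_labels in zip(gold, predicted):
        for label in predicted_labels & gold_labels:
            tp[label] += 1  # true positive: predicted and in gold
        for label in predicted_labels - gold_labels:
            fp[label] += 1  # false positive: predicted but not in gold
        for label in gold_labels - predicted_labels:
            fn[label] += 1  # false negative: in gold but missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / predicted_total if predicted_total else None,
            "recall": tp[label] / gold_total if gold_total else None,
        }
    return scores


# Invented toy labels, one set per sentence; not our evaluation data.
gold = [{"shifts"}, {"location"}, {"location"}, {"flexible hours"}]
predicted = [{"flexible hours"}, {"location"}, set(), {"flexible hours"}]
scores = precision_recall(gold, predicted)
```

In this toy data, flexible hours has lower precision (a shifts sentence was wrongly assigned to it), while location has high precision but lower recall, mirroring the pattern described above.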

Potential avenues to improve performance will be explored in the section ‘limitations and next steps’ below. You can read more about how we evaluated the pipeline here.

Offered job quality in the UK

We applied this methodology to a random sample of job adverts from Nesta’s OJO database in order to investigate offered job quality across sectors in the UK. We find that:

The proportion of adverts mentioning L&D is negatively correlated with salary across occupations (occupations with higher average pay have a lower proportion of adverts mentioning L&D).
IT, Human Resources, and Marketing. For some sectors, the rates of flexible location in the job advert are consistent regionally, but for some this differs.
Logistics & Transport, Health & Social Care and Hospitality & Catering. It’s worth noting that Hospitality & Catering and Health & Social Care are industries that use a relatively high percentage of zero hours contracts compared to other industries (EMP17: people in employment on zero hours contracts). As a result, the advertised flexibility may not correspond to experienced job quality.
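A correlation of this kind between salary and the share of adverts mentioning L&D can be computed by summarising adverts per occupation and then correlating the two columns. The sketch below assumes one record per advert and uses invented data. The record layout and function names are hypothetical, and this is not the analysis code behind the findings above.

```python
from collections import defaultdict
from math import sqrt

# Invented records for illustration: (occupation, salary, mentions_l_and_d)
adverts = [
    ("care worker", 22000, True),
    ("care worker", 24000, True),
    ("chef", 28000, True),
    ("chef", 30000, False),
    ("software developer", 55000, False),
    ("software developer", 65000, False),
]


def occupation_summary(records):
    """Return {occupation: (mean_salary, share_of_adverts_mentioning_l_and_d)}."""
    grouped = defaultdict(list)
    for occupation, salary, mentions in records:
        grouped[occupation].append((salary, mentions))
    return {
        occupation: (
            sum(salary for salary, _ in rows) / len(rows),
            sum(1 for _, mentions in rows if mentions) / len(rows),
        )
        for occupation, rows in grouped.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


summary = occupation_summary(adverts)
salaries, shares = zip(*summary.values())
r = pearson(salaries, shares)  # negative for this toy data
```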

You can find our full analysis here.

A case study: job quality in the early years sector

To aid Nesta’s A Fairer Start Mission, we applied our algorithm to the specific field of Early Years Careers. With current government plans to increase state-funded childcare places, and the sector already suffering staff shortages, investigating job quality in this sector and determining how it could be improved is an important question for policymakers. We focused our analysis on the related factors of pay, learning & development, and flexible hours.

Survey evidence suggests that Early Years Practitioners (EYP) are leaving the sector for work in retail (NDNA 2018/2019 Workforce Survey England).

Our analysis so far offers quantitative evidence that:

  • Jobs in retail and hospitality are advertised at a similar level of pay to EYP, while primary/secondary school teachers, supply teachers and special needs teachers enjoy better pay on average. This is consistent across different regions of England.
  • A similar proportion of job adverts offered flexible hours in Retail/Hospitality and in EYP (~15%).
  • 40% of adverts for Early Years Practitioners offered CPD opportunities.

You can read our analysis in more detail here.

Why does this matter?

This work provides a way to quantify offered job quality in job adverts. We believe that this provides a complementary source of data compared to other research that focuses on measurable outcomes of job quality in terms of worker wellbeing.

Applied to a wider sample of job adverts, this could give researchers and policymakers a way to monitor aspects of the quality of jobs offered and could be used to track offered job quality in target sectors, on a specific dimension, or across the market as a whole. For example, it could be used to track offered job quality in private vs. public sector jobs, compare the greenness of the job against the quality of the offer, or monitor trends in hybrid and remote work being offered.

We encourage researchers to apply these methods to their own data, or even retrain and evaluate the pipeline against their own data, and provide guidance on how to do so in the documentation. We have designed our approach to be sufficiently flexible to enable wider use. While the taxonomy that we created is suitable for giving an all-round picture of job quality, the approach could be easily modified to map job adverts to a different taxonomy if desired. This could be particularly convenient if, for a given research project, only one or two dimensions of job quality were in scope, and it was unnecessary to map text to all the different dimensions included in our taxonomy.

What’s next?

Our models generally perform well, but they were trained and validated on a relatively small sample of data. Training and validating on a larger, broader sample could improve their performance and help ensure that they perform well across as many different occupations as possible.

The pipeline in its current format could also be enhanced in various ways. Those that we would expect to be most useful would be:

  • Parsing the extracted information. In its current form, the pipeline can identify phrases that are likely to include information related to terms of employment and pay, but the ability to parse this information is not yet included. For example, the pipeline will be able to identify that a sentence like “Regular working hours 9-5 Monday-Friday” contains information about hours, but further processing would be needed to work out how many hours a week this amounts to. We have prototyped some functions to extract such information (see the code behind the analysis of job quality in the early years sector).
  • Expanding the key phrases in the taxonomy. For example, by including more phrases we could fix some of the issues distinguishing between the flexible hours and shifts job quality measures.
  • Handling negation. For example, the pipeline is currently unable to distinguish “Weekend working will be required” from “No weekend working is required”: both will be matched to the “Employment terms” dimension of job quality.
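As an illustration of the kind of further processing needed for working hours, the sketch below turns a phrase such as “9-5 Monday-Friday” into hours per week. It is not one of the prototype functions in our codebase. The pattern, the function name and the rule that an end time no later than the start time means the afternoon are all assumptions, and unpaid breaks are ignored.

```python
import re

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Matches '<start>-<end> <first day>-<last day>', e.g. '9-5 Monday-Friday'.
PATTERN = re.compile(
    r"(?P<start>\d{1,2})\s*-\s*(?P<end>\d{1,2})\s+"
    r"(?P<first>[A-Za-z]+)\s*-\s*(?P<last>[A-Za-z]+)"
)


def weekly_hours(text):
    """Return hours per week for text like '9-5 Monday-Friday', else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    first, last = match["first"].lower(), match["last"].lower()
    if first not in DAYS or last not in DAYS:
        return None
    start, end = int(match["start"]), int(match["end"])
    if end <= start:
        end += 12  # assumption: '9-5' means 09:00 to 17:00
    days = DAYS.index(last) - DAYS.index(first) + 1
    if days <= 0:
        return None  # day ranges that wrap around the week are not handled
    return (end - start) * days


hours = weekly_hours("Regular working hours 9-5 Monday-Friday")  # 40
```

A rule-based parser like this only covers one way of writing hours; adverts that use “9am to 5pm”, 24-hour times or lists of days would need further patterns.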

How to apply our methodology

You can find our open source codebase here

You can find our taxonomy of job quality-related keywords here

ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.
