Measuring job quality using ojd_daps models: Technical blog
Identifying drivers of job quality in online job adverts
By Rosie Oxbury
Previously, ESCoE’s work in partnership with Nesta developed methods for automatically extracting key information from job adverts, such as skills and locations. An important area of research into worker wellbeing and productivity is job quality, i.e. the aspects of a job that affect wellbeing. This next piece of work investigates whether any information about job quality can be gleaned from online job adverts and, if so, whether it can be extracted automatically.
The aspects of a job that contribute to worker wellbeing have been formalised under the umbrella term ‘job quality’ (also referred to as ‘good work’). Most research examining different drivers of job quality rightly addresses this from the employee’s point of view – often via surveys (such as the CIPD Good Work Index), and recently via employee reviews. We were keen to find out if indicators of job quality could be identified in job adverts, and if so, whether these could be extracted automatically.
We have created an open-source codebase that can be used to automatically identify aspects of job quality as they appear in online job adverts. We applied our methods to a random sample of 100,000 job adverts from Nesta’s Open Jobs Observatory (OJO). Our approach is designed to enable researchers to answer questions such as:
This blog describes how we have arrived at a working definition of ‘job quality’ for our purposes, and how we built a Python package to automatically identify different dimensions of job quality in free text. We also show two examples of our codebase in action: an overview of offered job quality in a random sample of job adverts and a case study investigating offered job quality in job adverts in the early years sector.
This work provides a timely and low-cost signal of trends in different dimensions of job quality. This can help identify differences in worker wellbeing across regions and sectors.
We took CIPD’s seven dimensions of job quality as our starting point: pay and benefits; contract (elsewhere called terms of employment); work-life balance; job design and the nature of work; relationships at work; employee voice; health and wellbeing. We added a further category, ‘barriers to access’, to our taxonomy, so that dimensions of job quality that directly impact marginalised groups might be gathered together. We made one more addition, ‘atmosphere, culture and environment’, which fits under ‘Social support and cohesion’ and which we took from a related ESCoE project.
We conducted an initial analysis to assess (a) which, if any, dimensions of job quality might manifest in job adverts, and (b) what language would be used to express these.
The pipeline we developed for extracting dimensions of job quality has two basic steps:
A worked example of how the complete pipeline works, including both the classification and the mapping steps, is represented in the graphic below. You can read more about this part of the process here.
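To make the two-step idea concrete, here is a minimal sketch of a classify-then-map flow. It is not the project's actual code or API: the cue words, the tiny taxonomy, and all function names are hypothetical stand-ins, and the real classification step is a trained model rather than the simple rule used here.

```python
import re

# Hypothetical mini-taxonomy: keyword phrase -> job quality dimension.
TAXONOMY = {
    "flexible working": "flexible hours",
    "choice of shifts": "flexible hours",
    "pension": "pay and benefits",
    "training": "learning and development",
}

# Hypothetical cue words standing in for a trained classifier.
CUE_WORDS = ("offer", "benefit", "receive", "enjoy")


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def is_candidate(sentence):
    """Step 1 (stand-in): flag sentences that may describe job quality."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in CUE_WORDS)


def map_to_taxonomy(sentence):
    """Step 2: map keyword phrases found in the sentence to dimensions."""
    lowered = sentence.lower()
    return [(kw, dim) for kw, dim in TAXONOMY.items() if kw in lowered]


def extract_job_quality(advert_text):
    """Run both steps over every sentence of an advert."""
    results = []
    for sentence in split_sentences(advert_text):
        if is_candidate(sentence):
            results.extend(map_to_taxonomy(sentence))
    return results


advert = (
    "We offer flexible working and a generous pension. "
    "You must hold a driving licence."
)
matches = extract_job_quality(advert)
print(matches)
```

In this toy example only the first sentence passes the classification stand-in, and its phrases are then mapped to two dimensions. Separating the steps means the mapping can be changed without retraining whatever model does the flagging.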
The best performance is achieved on:
We encountered a relatively high number of false positives for flexible hours. When we investigated these errors, we found that phrases such as “more shifts will come” and “shifts with early finish” were mistakenly matched to “choice of shifts”. Additionally, phrases such as “flexibility to be based [in the office or ‘hybrid’ working from home]” and “working culture and flexible working [(hybrid working available)]” were matched to “flexible working”. Reading ahead makes the correct category clear, but the algorithm only received a short part of the text.
When a sentence is labelled as being about location or shifts, the label is usually correct, but the algorithm misses many such sentences (that is, there are many false negatives). For locations, this seems to be because job locations include many place names that are not in our keyword list, such as “working in Northgate” and “within the Blackpool and surrounding areas”. Phrases about shifts are often classified as flexible hours instead, since “choice of shifts” and “flexible shift patterns” are flexible hours job quality measures.
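The difference between these two error patterns can be expressed with precision and recall. The sketch below uses the standard definitions on invented toy labels; it is an illustration only, not our evaluation code, and the numbers are not our results.

```python
def precision_recall(gold, predicted, label):
    """Compute precision and recall for one label from paired label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented toy data: hand labels vs. algorithm output for five sentences.
gold = ["location", "location", "location", "flexible hours", "none"]
predicted = ["location", "none", "none", "flexible hours", "flexible hours"]

location_scores = precision_recall(gold, predicted, "location")
flexible_scores = precision_recall(gold, predicted, "flexible hours")
print(location_scores, flexible_scores)
```

In this toy data, every sentence predicted as location is correct (high precision) but most location sentences are missed (low recall), while flexible hours shows the opposite pattern: nothing is missed, but one prediction is a false positive.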
Potential avenues to improve performance will be explored in the section ‘limitations and next steps’ below. You can read more about how we evaluated the pipeline here.
Offered job quality in the UK
We applied this methodology to a random sample of job adverts from Nesta’s OJO database in order to investigate offered job quality across sectors in the UK. We find that:
You can find our full analysis here.
To aid Nesta’s A Fairer Start Mission, we applied our algorithm to the specific field of Early Years Careers. With current government plans to increase state-funded childcare places, and the sector already suffering staff shortages, how job quality in this sector could be improved is an important question for policymakers. We focused our analysis on the related factors of pay, learning & development, and flexible hours.
Survey evidence suggests that Early Years Practitioners (EYP) are leaving the sector for work in retail (NDNA 2018/2019 Workforce Survey England).
Our analysis so far offers quantitative evidence that:
You can read our analysis in more detail here.
This work provides a way to quantify offered job quality in job adverts. We believe that this is a source of data that complements other research, which focuses on measurable outcomes of job quality in terms of worker wellbeing.
Applied to a wider sample of job adverts, this could give researchers and policymakers a way to monitor aspects of the quality of jobs offered and could be used to track offered job quality in target sectors, on a specific dimension, or across the market as a whole. For example, it could be used to track offered job quality in private vs. public sector jobs, compare the greenness of the job against the quality of the offer, or monitor trends in hybrid and remote work being offered.
We encourage researchers to apply these methods to their own data, or even retrain and evaluate the pipeline against their own data, and provide guidance on how to do so in the documentation. We have designed our approach to be sufficiently flexible to enable wider use. While the taxonomy that we created is suitable for giving an all-round picture of job quality, the approach could be easily modified to map job adverts to a different taxonomy if desired. This could be particularly convenient if, for a given research project, only one or two dimensions of job quality were in scope, and it was unnecessary to map text to all the different dimensions included in our taxonomy.
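As a rough illustration of narrowing the scope, the sketch below filters a keyword taxonomy down to the dimensions a project cares about before tagging text. The taxonomy contents and function names are hypothetical and do not reflect the package's actual interface; the documentation describes the supported way to do this.

```python
# Hypothetical taxonomy: keyword phrase -> job quality dimension.
FULL_TAXONOMY = {
    "pension": "pay and benefits",
    "flexible working": "flexible hours",
    "choice of shifts": "flexible hours",
    "training": "learning and development",
}


def restrict_taxonomy(taxonomy, dimensions_in_scope):
    """Keep only the keywords whose dimension is in scope."""
    wanted = set(dimensions_in_scope)
    return {kw: dim for kw, dim in taxonomy.items() if dim in wanted}


def tag_text(text, taxonomy):
    """Return the sorted dimensions whose keywords appear in the text."""
    lowered = text.lower()
    return sorted({dim for kw, dim in taxonomy.items() if kw in lowered})


flexible_only = restrict_taxonomy(FULL_TAXONOMY, ["flexible hours"])
text = "Pension scheme, paid training and a choice of shifts."

all_dimensions = tag_text(text, FULL_TAXONOMY)
in_scope = tag_text(text, flexible_only)
print(all_dimensions, in_scope)
```

Swapping in an entirely different taxonomy works the same way here: only the dictionary changes, not the tagging function.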
Our models generally perform well, but they were trained and validated on a relatively small sample of data. Training on a larger and broader sample could improve their performance and help ensure that they work well across as many different occupations as possible.
The pipeline in its current format could also be enhanced in various ways. Those that we would expect to be most useful would be:
You can find our open-source codebase here.
You can find our taxonomy of job quality-related keywords here.
ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.