Classifying STEM and Creative Occupations Using Online Job Ads



By Hasan Bakhshi and Antonio Lima

Over the past year, through the Economic Statistics Centre of Excellence (ESCoE), Nesta has developed twin classifications for the UK workforce: first, of occupations and second, of skills (forthcoming), both using detailed information contained in millions of job ads. By linking UK occupations and skills, for the first time and in a timely way, to what employers say they need, we think these taxonomies can lead to more targeted policies on workforce development.

Policymakers have prioritised STEM and creativity but struggle to define them

In our latest ESCoE Discussion Paper, we show how the job ads data can be used to classify occupations in two areas of great interest to policymakers: STEM occupations and creative occupations. The UK government’s Industrial Strategy has identified STEM and creativity as priority areas for skills policy, but identifying which occupations are STEM and creative is notoriously difficult. Approaches based on manually assessing occupation codes in the Standard Occupational Classification (SOC) are problematic in this regard, as the SOC is not well suited for understanding skills. This is because the SOC groups occupations within codes by skill level rather than by skill specialisation. As a result, occupations with skill sets similar to those judged ‘STEM’ or ‘creative’ by sponsoring government departments may be hidden in other occupation codes, while codes judged ‘STEM’ or ‘creative’ may include ‘non-STEM’ or ‘non-creative’ occupations.

We use machine learning methods and skills specified in job ads to classify occupations

In our paper, we use a supervised machine learning method to classify jobs as either ‘STEM’ or ‘non-STEM’ and ‘creative’ or ‘non-creative’, based on the list of skills and other requirements specified by employers in job ads, each of which we have also assigned a SOC code. A preliminary step is to characterise every occupation in the SOC in terms of its skills make-up: this allows us to represent each occupation as a mathematical object that can be compared with other occupations. We train the classifiers by assigning the labels ‘creative’ or ‘non-creative’ to all the job ads in our database according to the list of SOC codes deemed to be creative in the Department for Digital, Culture, Media and Sport’s official statistics, and the labels ‘STEM’ or ‘non-STEM’ according to a definition which is also used in government. That is, a job ad that belongs to a ‘creative’ SOC code is automatically labelled as ‘creative’ for the purposes of model training, regardless of its skills content, and a job ad is labelled as ‘STEM’ if it belongs to a ‘STEM’ occupation, independently of its skills content. These labelled job ads then serve as ‘ground truth’ to train the two classifiers, which independently decide whether or not a job is creative or STEM. In other words, once trained, the classifiers base their labels on the skills content of the job, not on its SOC code.
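To make the idea concrete, here is a minimal sketch of this kind of set-up. It is an illustration only, not the method used in the paper: the classifier is a simple naive Bayes model chosen for brevity, and the SOC codes (`SOC_A`, `SOC_B`), the skills and the ads themselves are all invented for the example. Training labels come from the SOC code alone, while predictions are made from skills alone.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy ads: (SOC code, set of skills named in the ad).
ADS = [
    ("SOC_A", {"illustration", "typography", "adobe photoshop"}),
    ("SOC_A", {"branding", "typography", "adobe photoshop"}),
    ("SOC_B", {"data entry", "filing", "customer service"}),
    ("SOC_B", {"data entry", "scheduling", "customer service"}),
    # A design-heavy job that happens to sit in a 'non-creative' code.
    ("SOC_B", {"illustration", "typography", "adobe photoshop"}),
]
# Stand-in for an official list of creative SOC codes (assumption).
CREATIVE_SOC_CODES = {"SOC_A"}


def train(ads, positive_codes):
    """Count skills per label, where the label comes from the SOC code only."""
    ad_counts = Counter()                # label -> number of ads
    skill_counts = defaultdict(Counter)  # label -> skill -> number of ads
    vocabulary = set()
    for soc_code, skills in ads:
        label = soc_code in positive_codes  # skills play no part in labelling
        ad_counts[label] += 1
        skill_counts[label].update(skills)
        vocabulary |= skills
    return ad_counts, skill_counts, vocabulary


def predict(model, skills):
    """Return True or False for a job using only its skills, not its SOC code."""
    ad_counts, skill_counts, vocabulary = model
    total_ads = sum(ad_counts.values())
    scores = {}
    for label, n_ads in ad_counts.items():
        total_skills = sum(skill_counts[label].values())
        score = math.log(n_ads / total_ads)  # prior share of this label
        for skill in skills & vocabulary:    # skills never seen are ignored
            # Laplace smoothing avoids log(0) for skills unseen under a label.
            score += math.log(
                (skill_counts[label][skill] + 1)
                / (total_skills + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(ADS, CREATIVE_SOC_CODES)
design_job = {"illustration", "typography", "adobe photoshop"}
admin_job = {"data entry", "customer service"}
print("design-heavy job classified as creative:", predict(model, design_job))
print("admin job classified as creative:", predict(model, admin_job))
```

On this toy data the design-heavy skill set is classified as creative even though one ad with exactly those skills was labelled ‘non-creative’ through its SOC code, which is the kind of disagreement between skills content and SOC code that the approach is meant to surface.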

Our paper adds to the growing body of evidence that online job ads are a timely and rich source of information on occupational requirements – in this case STEM-ness and creativity – for policymakers.

ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.
