By Richard Dorsett and Jessica Hug
The use of administrative data to produce labour market statistics is attractive for several reasons. Such data require no collection costs beyond those incurred for their administrative purpose. They are typically available at scale, and often at high frequency, and they are not subject to selective non-response or recall error among respondents. In our new ESCoE Discussion Paper published today, we explore the potential to use UK tax records to construct novel statistics on individuals’ payrolled employment and earnings transitions. These findings are of substantive interest in their own right. However, the primary purpose of the paper is to illustrate the potential of administrative data as a basis for producing labour market statistics and, in so doing, to argue the case for their greater use.
The Office for National Statistics (ONS) regularly publishes a joint report with HM Revenue and Customs (HMRC) providing labour market statistics derived from earnings data collected under the Pay As You Earn (PAYE) Real Time Information (RTI) system. These experimental statistics complement the more established survey-based statistics. Our paper focuses on the potential to use PAYE RTI data to produce novel statistics on labour market transitions. This is an area where administrative data have definite strengths, allowing near-costless tracking of individuals over an extended period of time.
Some new statistics
Since April 2014, all employers have been required to send information about tax and other deductions under the PAYE system to HMRC every time an employee is paid. The analysis in this paper is based on these data and covers the population of employee jobs for the tax years 2014/15 to 2017/18. The raw data required careful cleaning prior to analysis, and the business rules used to achieve this are described in detail in the paper. We focus in this blog on exemplar statistics to give a taste of what is possible with the resulting data.
Take, for example, the figure below. This shows the proportion of employees in April 2014 or April 2015 who are still in employment 1, 3, 12 or 24 months later. Although it is beyond the scope of this paper, there is no reason why longer-term retention proportions could not be considered in future. The figure shows variation in these proportions at the level of the travel-to-work area (TTWA), something that is only possible because of the large size of the data. It is important to note that, since self-employment is not observed, the reported transitions are more accurately described as movements out of payrolled employment.
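To make the retention measure concrete, here is a minimal sketch of how such proportions could be computed. It is not the paper's method: the records are invented, the function name `retention_by_area` is hypothetical, and it assumes that "still in employment" means being paid in the month exactly 1, 3, 12 or 24 months after the base month.

```python
from collections import defaultdict

# Hypothetical toy records: (person_id, ttwa, months in which the person was paid).
# Months are counted from the base month, which is month 0.
records = [
    ("p1", "A", {0, 1, 3, 12, 24}),
    ("p2", "A", {0, 1, 3}),
    ("p3", "B", {0, 1, 3, 12}),
    ("p4", "B", {0}),
    ("p5", "B", {1, 3}),  # not paid in the base month, so not in the base population
]

def retention_by_area(records, horizons=(1, 3, 12, 24), base_month=0):
    """Share of base-month employees who are paid again at each horizon, by area."""
    base_counts = defaultdict(int)
    retained = defaultdict(lambda: defaultdict(int))
    for _person, area, months_paid in records:
        if base_month not in months_paid:
            continue  # only people employed in the base month are followed
        base_counts[area] += 1
        for h in horizons:
            if base_month + h in months_paid:
                retained[area][h] += 1
    return {
        area: {h: retained[area][h] / n for h in horizons}
        for area, n in base_counts.items()
    }

rates = retention_by_area(records)
for area in sorted(rates):
    print(area, rates[area])
```

Because only payrolled employment is observed, a person missing at a given horizon may have moved into self-employment rather than out of work, which is why these are best read as rates of remaining in payrolled employment.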
Our paper includes many more examples, some of which further drill down into the data to achieve more nuanced insights. We show how this retention varies with age, gender, earnings level and industry. We can even combine dimensions to show, for example, how the TTWA retention patterns differ by gender.
The PAYE RTI data allow identification of jobs as well as employees. Not only can this support the development of new statistics on labour market dynamism as reflected in job-to-job transitions, it can also be informative about employers themselves. As an illustration, the figure below shows job duration by employer size, which is made possible by observing in the data how many employees are associated with each employer. The relationship is U-shaped – middle-sized firms (with 51 to 100 employees) have the shortest job durations, with 57% of jobs ongoing 24 months later compared to 63% and 64% among the largest and smallest categories, respectively. The paper goes further, also considering job-to-job transitions, including the tendency for individuals to move between holding different numbers of jobs.
In addition to changes in employment or jobs, the paper presents statistics showing earnings mobility, as captured by changes over time in individuals’ average payrolled weekly earnings. The figure below focuses on the top ten percentiles of April 2015 earners. One year later, the impression is of stability, with the most common outcome being to remain on the same percentile as before.
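The idea behind such mobility statistics can be sketched as follows: rank individuals by earnings in each year, assign them to bins, and count movements between bins. This is an illustrative sketch only, not the paper's procedure. The earnings figures are invented, the helper `rank_bins` is hypothetical, and four bins are used instead of percentiles so that the toy example stays readable.

```python
from collections import Counter

def rank_bins(values, n_bins):
    """Assign each value to a bin 1..n_bins by its rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values) + 1
    return bins

# Hypothetical average weekly earnings for the same eight people in two years.
earn_y0 = [200, 250, 300, 350, 400, 450, 500, 550]
earn_y1 = [210, 240, 420, 340, 410, 470, 520, 800]

bins_y0 = rank_bins(earn_y0, 4)
bins_y1 = rank_bins(earn_y1, 4)

# Count each (starting bin, ending bin) pair, i.e. a transition matrix in sparse form.
transitions = Counter(zip(bins_y0, bins_y1))

# Share of people who end in the same bin they started in.
stayers = sum(n for (start, end), n in transitions.items() if start == end)
share_same_bin = stayers / len(earn_y0)

print(dict(transitions))
print(share_same_bin)
```

With population-scale data, the same tabulation with 100 bins would give percentile-to-percentile transitions, and restricting the starting bins to the top ten would correspond to the subset described above.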
Conclusion
The HMRC PAYE RTI data offer the opportunity to deepen our understanding of the labour market through the production of new statistics. Our paper provides many examples but is far from exhaustive. Over time, the longitudinal dimension will naturally extend and will provide a unique resource for the observation and analysis of long-term trends. It is worth noting that this will tend to reduce the limitation of the data being restricted to the employed population only. Over time, the subgroup of the population that is not observed in the data will reduce to those who have never been in payrolled employment.
Linkage to other datasets is particularly exciting. In principle, since jobs are associated with employers, the data provide the basis for a linked employer-employee dataset. Further linkage with, for example, the Inter-Departmental Business Register could develop the data in this direction. Similarly, person-level data could in principle be linked with other individual-level administrative data to enrich worker information. Matching in small area information is also possible. For instance, scraped online vacancy data at the TTWA level would allow the extent of local labour market mismatch to be directly assessed.
There remains a significant role for survey data in the production of labour market statistics. Survey data offer advantages over administrative data in several regards and these are sufficient to ensure the continued relevance of survey-based statistics. Rather than being a substitute for survey data as the basis for producing labour market statistics, administrative data provide a complement.
Read the full ESCoE Discussion Paper here.
Richard Dorsett is Professor of Economic Evaluation at the University of Westminster
Jessica Hug is a Research Associate at the Economic Statistics Centre of Excellence
ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.