Matching UK Business Microdata - A Study Using ONS and CBI Business Surveys (ESCoE TR-14)


Business data linkage is a powerful tool for unlocking insights that are often not possible
using data from one source alone. However, it can be challenging and often requires a
number of decisions to be made on how the linking should be conducted. Such decisions
can affect the match rates and conclusions drawn from the linked data. To help
researchers avoid common pitfalls in data linkage, and to suggest some potential
solutions, we give an account of a business data linkage exercise. We
link three sources: a survey of businesses conducted by the Confederation of British
Industry (CBI), the FAME dataset of business financial data from Bureau van Dijk, and the
Inter-Departmental Business Register (IDBR). Linking these sources requires using business
names and addresses as linking ‘keys’, which are subject to error and imprecision, so the
matching is incomplete. We detail a novel solution to choose among ‘multiple matches’
when a propensity-score matching approach is unable to select a definitive match, which
we implemented when linking the CBI data with the IDBR. We report match rates, which
are around 50% when linking the CBI survey with FAME, and around 90% when linking the
CBI survey with the IDBR. We also report variation by geography, size and time period. We
then use the IDBR-linked CBI data to match in data from various ONS business surveys;
these matches typically achieve rates of less than 50%, and in some cases far lower. We
conclude with some recommendations for researchers when conducting data linkage.