Consumer Price Indices (CPI) have typically been compiled using manually collected data. Web-scraped and point of sale scanner data are becoming more accessible and provide a rich source of price data from which inflation can be calculated. These sources are known as “alternative” data because the information was collected for purposes outside the remit of statistical agencies and the production of official statistics. Before these data are incorporated, their nature and quality must be assessed to avoid biases in measures of inflation. One crucial issue is the detection of outliers. Outliers are data points that differ significantly from other observations and can cause price indices to over- or underestimate true inflation. This project develops methods for detecting and addressing outliers in these new data sources.
Point of sale (POS) scanner data record information for each good that can be uniquely identified by a barcode, including the quantity, value and price of a particular product sold at retailer tills. Web-scraping software collects a product description and price directly from retailer websites. As these new data sources have become readily available, national statistical institutions have increasingly used them to supplement their manually and centrally collected prices in the construction of their consumer price statistics.
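As a rough illustration of how quantity and value in scanner data relate to price, the sketch below derives a unit value (total sales value divided by total quantity) for each barcode and period. The record layout, the barcode and the figures are hypothetical, and the unit-value approach is a common convention assumed here rather than a description of this project's method.

```python
from collections import defaultdict

# Hypothetical scanner records: (barcode, period, quantity sold, sales value).
records = [
    ("5000000000017", "2021-01", 120, 180.00),
    ("5000000000017", "2021-01", 80, 116.00),
    ("5000000000017", "2021-02", 150, 240.00),
]


def unit_values(rows):
    """Return {(barcode, period): total value / total quantity}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [quantity, value]
    for barcode, period, quantity, value in rows:
        totals[(barcode, period)][0] += quantity
        totals[(barcode, period)][1] += value
    # Skip barcode-periods with no recorded sales to avoid dividing by zero.
    return {key: v / q for key, (q, v) in totals.items() if q > 0}


prices = unit_values(records)
```

Aggregating before dividing weights each transaction by the quantity sold, so a large sale at one price counts for more than a small sale at another.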
While outlier detection methodologies are well established for current data, the introduction of these alternative data sources represents a new challenge. In the case of scanner data, it can be difficult to isolate the impact of changes in pricing, features or packaging on sales quantity, while the complicated and changeable nature of web page structures can affect the reliability of web-scraped data. Furthermore, issues like seasonality, new item codes and missing item codes are more difficult to solve with these alternative data sources than with standard price data.
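To make the idea of an outlier concrete, here is a minimal sketch of one generic screening rule: Tukey fences applied to log price relatives between two periods. It is not the method developed in this project; the function name, the item codes, the prices and the conventional multiplier `k = 1.5` are all assumptions chosen for illustration. Items missing from either period are simply skipped.

```python
import math
import statistics


def flag_outliers(prev_prices, curr_prices, k=1.5):
    """Flag items whose log price relative lies outside Tukey fences.

    prev_prices, curr_prices: {item_code: price} for two periods.
    k: fence multiplier (1.5 is a common convention, assumed here).
    """
    # Only items priced in both periods have a price relative.
    common = sorted(set(prev_prices) & set(curr_prices))
    log_rel = {i: math.log(curr_prices[i] / prev_prices[i]) for i in common}
    q1, _, q3 = statistics.quantiles(list(log_rel.values()), n=4,
                                     method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {i for i, r in log_rel.items() if r < lower or r > upper}


# Hypothetical prices; item "F" jumps tenfold, item "G" has no current price.
prev = {"A": 1.00, "B": 2.00, "C": 3.00, "D": 4.00, "E": 5.00,
        "F": 1.50, "G": 2.50}
curr = {"A": 1.02, "B": 2.04, "C": 3.03, "D": 4.10, "E": 5.05,
        "F": 15.00}

flagged = flag_outliers(prev, curr)
```

Working with log relatives treats a doubling and a halving of price symmetrically. A fixed rule like this does not account for seasonality or genuine sales, which is part of why alternative data sources call for more careful methods.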
In this project we take stock of the outlier detection methodologies developed and implemented by national statistical institutions that have incorporated scanner or web-scraped data into their production of consumer price indices, and we develop methods for incorporating these data into the UK CPI.
Project Papers and Presentations
Mao, X. (2021) ‘Applying Machine Learning for Alternative Data Outlier Detection and Deletion’ ESCoE Conference on Economic Measurement 2021 Poster Exhibition, 11-13 May 2021. Poster Presentation.