By Juan Mateos-Garcia
Artificial Intelligence algorithms that can extract insights from big and unstructured datasets have been described as an “invention in the methods of invention” that could transform productivity across the economy. But how can researchers and policymakers whose job it is to understand and support invention (and innovation!) make the most of data-driven methods like these?
On 7 October 2022, ESCoE came together with Mirko Draca from the CAGE Research Centre, the Digital Catapult and Nesta for an event exploring that question. This blog summarises key themes from the presentations and discussions on that day (you can watch the videos here).
Demand pull from policy
It is hard to deny that there is strong demand for policy-relevant evidence about structural change in the economy. Policymakers want to identify and support the development and diffusion of high-impact, emerging technologies, speed up the transition to a green economy and increase economic resilience to shocks like the COVID-19 pandemic or the Ukraine war. They want to do all this while increasing inclusion in terms of who (people, communities and places) participates in production and innovation and benefits from it (an example of this is the levelling up agenda in the UK). This is linked to the emergence of new models for economic (and particularly research and innovation) policy based on “innovation missions” and transformative / sustainability agendas.
All of these policy questions require a detailed and timely understanding of the structure of the economy: what is its industrial and occupational composition, what is its geography, who is involved in different economic activities, what is the structure of production and innovation networks, how do these things change over time and how is this change linked to various factors, and in particular technological innovation?
Unfortunately, it is not easy to give good answers to these questions if one only has access to traditional indicators such as business and employment surveys organised around slow-changing industrial and occupational taxonomies, and aggregate measures of knowledge production like numbers of publications and patents trying to capture the volume of science and technology being produced.
There are three reasons for this:
First, economics is in many ways the story of a race between economic change and our ability to measure it: if change is slow, then we only need to update our measurement system – such as industrial and occupational taxonomies – infrequently. This also means that the gap between economic reality and our understanding of the economy grows slowly and most policy decisions are based on a correct perception of the economic situation. If, on the contrary, the economy is changing rapidly, with new industries and skills appearing and others dwindling, then we need to update our measurement system just as rapidly. Otherwise, we will be making policy decisions on the basis of an outdated view of the economy. Rapid development and diffusion of new technologies like AI and sudden shocks like COVID-19 accelerate change, create disruption and increase demand for faster measurements of a shifting economic structure. As we will argue, new data, data analytics, machine learning and AI can help address this demand.
Second, economic change has a direction as well as a rate. This means that technologies can follow different trajectories: we could have developed electric cars in the early 20th century but instead we focused on combustion engines. Today we are building AI systems based on data- and compute-intensive deep learning algorithms, but other approaches would be possible, and this might have implications for whether AI technologies augment or displace workers. Policymakers might want to know if different versions of a technology create different impacts on society so that they can fund and support the adoption of those that are more beneficial. But aggregate statistics about scientific and technological production do not tell us what kinds of scientific and technological knowledge are being produced – getting that requires deeper analyses of the content of science and technology.
Third, the economy is becoming more complex: production and innovation involve deep networks of trade and collaboration cutting across organisations, industries and geographies. Mercedes Delgado, one of our keynote speakers at the event, showed research about the growing importance of “supply chain” services in the US economy that supports this idea. This growing connectivity enables a deeper division of labour with allied efficiency gains, and increases the volume and variety of ideas that can be recombined innovatively (we will come back to this later). However, it also increases the risk of systems failures if a “central node” in a production or innovation network fails. COVID-19, the Ukraine war and geopolitical tensions between the US and China, for example around chip production, illustrate this situation. This creates increased policy demand for evidence about critical interdependencies in production and innovation networks and their potential impact, evidence that is generally not available (with the exception of input-output tables) from official sources.
Supply push in data sources and techniques
Recent years have seen an explosion in data sources and methods that we can use to measure and understand the economy. The bulk of the presentations at the “Modelling an Evolving Economy” event focused on cutting-edge applications of these techniques.
Starting with the data, we have seen rapid growth in the volume and types of data that are available and could provide a faster and more accurate understanding of economic change. They include web sources such as company descriptions, which Nesta has analysed, as part of an ESCoE project, to assess the limitations of official industrial taxonomies and produce alternative, bottom-up ones capturing new industries.
Another important example is online job ads, which contain information about demand for different occupations and skills – including emerging ones. Daniel Rock (Wharton) presented new work using millions of online job ads like these to create a map of the job space and analyse its evolution. This is an area where Nesta has also been very active, developing new skills taxonomies and data production systems to collect online job data and make it publicly available.
In addition to data from the “wild web”, there has also been a proliferation of deep, well-curated datasets that can be useful to understand particular domains. Anna Valero (LSE) presented work with colleagues at UCL and Warwick University that uses one of these sources, Beauhurst, to analyse the space of technology entrepreneurship in the UK and what innovation strategies tend to be more successful.
Extracting insights from these data often requires machine learning and AI techniques that transform qualitative text data into numerical representations that can be used to make predictions, cluster observations and create indicators. There were many excellent examples of this kind of analysis at the Modelling an Evolving Economy event:
Daniel Rock and colleagues turn their job data into vectors with deep learning algorithms that make it possible to quantify similarities and differences between jobs, and to cluster them. Their analysis shows that the space of jobs has continued expanding in recent years, suggesting that the economy is still creating new types of jobs despite the advent of powerful automation technologies. This also illustrates our earlier point that the economy is becoming more complex, and that we need data sources and methods that can measure, model and track this complexity.
We have adopted similar methods to cluster companies based on similarity in their text descriptions, and through those clusters, characterise sectors in a bottom-up way. In a new ESCoE Discussion Paper, we use the resulting bottom-up taxonomy, containing hundreds of sectors, to create high resolution profiles about the industrial strengths and weaknesses of local economies in the UK that could inform local economic, industrial and innovation policies.
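To make the idea of clustering companies by the similarity of their text descriptions more concrete, here is a minimal sketch. It is not the method used in the Discussion Paper or in Daniel Rock's work: it swaps deep learning embeddings for simple bag-of-words counts and uses a greedy grouping rule, and the company names, descriptions and the 0.3 threshold are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical company descriptions, invented for illustration only.
descriptions = {
    "firm_a": "software platform for machine learning in healthcare",
    "firm_b": "machine learning software for healthcare providers",
    "firm_c": "offshore wind turbine installation and maintenance",
    "firm_d": "maintenance services for offshore wind farms",
}


def vectorise(text):
    """Turn a description into a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified rule).

    Each company joins the first cluster that contains a sufficiently
    similar member; otherwise it starts a new cluster.
    """
    clusters = []
    for name, vec in vectors.items():
        for members in clusters:
            if any(cosine(vec, vectors[m]) >= threshold for m in members):
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


vectors = {name: vectorise(text) for name, text in descriptions.items()}
sectors = cluster(vectors)
print(sectors)
```

With these toy inputs the two software companies end up in one group and the two offshore wind companies in another. A real analysis would need richer text representations, a more robust clustering algorithm and careful validation of the resulting sectors.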
But measuring the economy is not enough – we also want to model it. Policymakers want causal knowledge about the drivers of economic processes (e.g. what policy will increase productivity, or spur the development of a beneficial technology). This requires models that connect key variables while removing the effect of confounders.
New data can provide valuable inputs into these models, as shown by new research about the link between technological exploration and firm-level outcomes that Mirko Draca presented at the event. In this analysis, Draca and his collaborators use topic modelling, another machine learning technique, to quantify the thematic composition of US patents based on their abstracts. They then calculate changes in this composition to create company-level measures of technological exploration: if we see a sudden shift in the types of ideas that a company is patenting, then this suggests that it is exploring a new area. They show that this measure of exploration is linked to higher sales, suggesting that this variable is economically significant.
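The logic of measuring exploration as a shift in the thematic composition of a company's patents can be sketched in a few lines. This is only an illustration under stated assumptions: the topic shares below are made up rather than estimated with a topic model, and cosine distance between consecutive years is our own choice of metric, not necessarily the one used by Draca and his collaborators.

```python
import math

# Hypothetical topic shares of one company's patents by year (each row sums
# to 1). In practice these would come from a topic model fitted to abstracts.
topic_shares = {
    2018: [0.70, 0.20, 0.10],
    2019: [0.68, 0.22, 0.10],
    2020: [0.30, 0.20, 0.50],
}


def cosine_distance(p, q):
    """1 minus the cosine similarity of two non-zero topic share vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def exploration_scores(shares_by_year):
    """Distance between each year's topic mix and the previous year's."""
    years = sorted(shares_by_year)
    return {
        curr: cosine_distance(shares_by_year[prev], shares_by_year[curr])
        for prev, curr in zip(years, years[1:])
    }


scores = exploration_scores(topic_shares)
for year, score in scores.items():
    print(year, round(score, 3))
```

In this toy example the score for 2020 is much larger than the score for 2019, because the company's patents move sharply towards the third topic. Scores like these could then enter a regression as a company-level explanatory variable.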
Matt Clancy (Institute for Progress) presented, for his part, a review of the state of the art in our understanding of innovation as a process of creative recombination of existing ideas. In general, this involves the use of network science, and increasingly, text analysis of publication and patent abstracts to identify novel combinations of ideas and build indices of novelty and disruptiveness that are then incorporated into econometric models of their drivers and impacts. Clancy discussed key insights from this research and its implications for policy.
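One simple way to picture a novelty index based on combinations of ideas is to ask, for each patent, what share of its idea pairs have never appeared together before. The sketch below does this for a handful of invented patents and labels. It is a deliberately crude stand-in for the indices in the literature Clancy reviewed, and the patent identifiers and idea labels are hypothetical.

```python
from itertools import combinations

# Hypothetical patents in chronological order, each tagged with idea labels.
patents = [
    ("p1", {"battery", "electrode"}),
    ("p2", {"battery", "electrode", "polymer"}),
    ("p3", {"neural_network", "image"}),
    ("p4", {"neural_network", "battery"}),
]


def novelty_scores(patents):
    """Share of each patent's idea pairs that no earlier patent combined."""
    seen = set()
    scores = {}
    for patent_id, ideas in patents:
        pairs = {frozenset(pair) for pair in combinations(sorted(ideas), 2)}
        new_pairs = pairs - seen
        # Patents with fewer than two ideas have no pairs; score them as 0.
        scores[patent_id] = len(new_pairs) / len(pairs) if pairs else 0.0
        seen |= pairs
    return scores


scores = novelty_scores(patents)
for patent_id, score in scores.items():
    print(patent_id, round(score, 2))
```

Note that the earliest patents look novel by construction, because nothing precedes them. A real index would need a long enough history of prior combinations before scores become meaningful.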
Some economic processes are too complicated to be modelled using the linear econometric frameworks generally used by economists. In his keynote talk, J Doyne Farmer (INET Oxford / Santa Fe Institute) talked about the potential of agent-based models, which are becoming more attractive and powerful as more computational power and detailed micro-data become available. Agent-based models create simplified representations of agents (a company, a consumer, a worker) and their interactions that can be used to simulate the evolution of complex economic processes and networks and how they are impacted by policy. Farmer provided examples of how these methods are being used to understand and regulate financial markets and the impact of COVID-19 on the economy.
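The flavour of an agent-based simulation can be conveyed with a toy example in which each firm follows a single behavioural rule and a shock to one firm propagates through a supply network. This is far simpler than the models Farmer described. The firms, the network and the production rule are all assumptions made up for illustration.

```python
# Hypothetical supply network: each firm lists the suppliers it depends on.
suppliers = {
    "chips": [],
    "steel": [],
    "electronics": ["chips"],
    "cars": ["electronics", "steel"],
    "retail": ["cars"],
}


def simulate(suppliers, shocked, periods):
    """Return, for each period, the set of firms that are producing.

    Rule (an illustrative assumption): a firm produces in period t only if
    it is not hit by the shock and all of its suppliers produced in t - 1.
    """
    active = set(suppliers)  # before the shock, every firm produces
    history = [set(active)]
    for _ in range(periods):
        active = {
            firm
            for firm, inputs in suppliers.items()
            if firm not in shocked and all(s in active for s in inputs)
        }
        history.append(active)
    return history


history = simulate(suppliers, shocked={"chips"}, periods=4)
for period, active in enumerate(history):
    print(period, sorted(active))
```

In this sketch a shock to the chip producer takes one additional firm out of production each period until only the steel producer is left, which illustrates how the failure of a central node can cascade. Richer models would add heterogeneous agents, inventories, substitution between suppliers and calibration against micro-data.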
Hyejin Youn (Northwestern/Santa Fe Institute) presented new work that takes a similar approach to model the evolution of science and technology, showing, through a network model that is now being calibrated with patent and publication data, how innovative agents create, exploit and combine ideas to expand the space of knowledge.
Researchers and entrepreneurs are using these new data and insights to build tools to inform policy. The evidence generated by new data and methods is often very rich, detailed and contextual: policymakers in different agencies and locations want to answer different questions, which requires different indicators. A policymaker in a national agency might want to see the national or international picture while someone working in a local body wants to zoom into their own region. Digital, interactive tools can help explore all this information to address a variety of use cases. One example is the cluster maps developed by Mercedes Delgado and colleagues at the US Cluster Mapping project, which provide information about the industrial and geographical distribution of economic activity and innovation in the US.
Economic complexity, an influential programme of work that analyses data about the composition of trade, industrial and inventive activity in different locations to produce indices capturing their sophistication and uniqueness, has generated many interactive tools helping policymakers make sense of their position in product, industrial and knowledge spaces, and inform policies to diversify their economies to make them more productive, competitive and innovative. Viktor Stojkoski (ANITI, University of Toulouse) talked about how these data and tools are being used to answer key economic policy questions such as “what industries to focus on”, “when to implement a policy”, “where to target policies” and “who to work with” in order to create and access the knowledge required to increase economic complexity and realise its economic benefits.
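A common starting point for indicators of this kind is the revealed comparative advantage (RCA, or Balassa) index, from which one can count how diversified each location is and how ubiquitous each product is. The sketch below computes these building blocks for invented export figures. It stops well short of a full complexity index, and the regions, products and values are hypothetical.

```python
# Hypothetical export values (region -> product -> value), for illustration.
exports = {
    "north": {"cars": 80, "software": 10, "wheat": 10},
    "south": {"cars": 10, "software": 10, "wheat": 80},
    "east": {"cars": 50, "software": 40, "wheat": 10},
}


def revealed_comparative_advantage(exports):
    """Balassa index: a product's share in a region's exports divided by
    that product's share in the exports of all regions combined."""
    total = sum(sum(basket.values()) for basket in exports.values())
    product_totals = {}
    for basket in exports.values():
        for product, value in basket.items():
            product_totals[product] = product_totals.get(product, 0) + value
    rca = {}
    for region, basket in exports.items():
        region_total = sum(basket.values())
        rca[region] = {
            product: (value / region_total) / (product_totals[product] / total)
            for product, value in basket.items()
        }
    return rca


rca = revealed_comparative_advantage(exports)

# Diversity: number of products in which a region is specialised (RCA >= 1).
diversity = {r: sum(v >= 1 for v in row.values()) for r, row in rca.items()}

# Ubiquity: number of regions specialised in each product.
products = {p for row in rca.values() for p in row}
ubiquity = {p: sum(row.get(p, 0) >= 1 for row in rca.values()) for p in products}

print(diversity)
print(ubiquity)
```

In this toy data the "east" region is specialised in two products while the others are specialised in one, and software is the least ubiquitous product that a diversified region exports. Complexity indices build on the interplay between these two quantities.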
The need for policy intangibles
Decades of research on the economics of technology have taught us that the successful adoption of new technologies – and in particular of transformative general-purpose technologies – requires investments in complementary assets, many of which are intangible. This also applies to economic and research and innovation policy-making bodies and agencies that might want to adopt new data sources and methods to improve their understanding of the economy and the impact of their policies.
We touched on these issues through a presentation by Caroline Paunov (OECD) and a closing panel debate with Cosmina Dorobantu (Alan Turing Institute), Grant Fitzner (ONS) and Stian Westlake (Royal Statistical Society) chaired by Rebecca Riley (ESCoE).
Caroline Paunov highlighted that new data and techniques are not without their limitations: they are often unrepresentative, it is hard to guarantee their temporal consistency, and they can be difficult to interpret. This increases the need for comprehensive documentation of data sources and how they are analysed so that policymakers can assess their strengths and weaknesses. Grant Fitzner pointed at the need to build human capital (analytical skills) and capabilities in government so that policymakers can address their own analytical needs and become critical customers of research coming from the outside – the ONS Data Science Campus and ONS’ collaboration with external researchers in this area through ESCoE are two examples of this.
Given their limitations, new data and methods will never be able to replace official sources which, although perhaps less detailed and timely, offer a representative, robust, consistent and interpretable view of the economy required to inform many policy decisions and to benchmark the results of innovative methods. One important finding from the literature on recombinant innovation highlighted by several speakers at the conference is that new ideas are more likely to be impactful when they integrate novelty and tradition – this could also apply to new data and analytical methods, which are more likely to be adopted when they are combined with tried, tested and trusted data sources and approaches.
Cosmina Dorobantu pointed at the potential of data science and AI to inform policies to increase societal resilience while highlighting the need for investments in organisational capital to spur interdepartmental collaboration and create spaces to explore and experiment with new techniques. Stian Westlake also picked up on the need for institutional innovation and experimentation in government, including through sustained investments in ambitious and transparent initiatives to create new data sources and methods and bring the insights into the heart of economic policymaking.
In summary, the discussions at the Modelling an Evolving Economy event made three things clear to us:
First, policymakers are thirsty for relevant and timely insights about the state and evolution of the economy which can inform more impactful and inclusive policies, and help respond to shocks faster and more effectively. Official data sources and indicators, useful as they are, only address these evidence needs partially.
Second, we are seeing an explosion of data and analytical methods based on data science, machine learning, AI and complexity science that can help address these policy needs. There is much scope to recombine these techniques in creative ways: novel data sources can be analysed using traditional methods (e.g. econometrics) and traditional sources (e.g. publications and patents) can be analysed using novel methods (machine learning and complexity science). The outputs from all this work can be disseminated innovatively through interactive visualisations and dashboards. Researchers might want to combine novelty and tradition to help their work gain traction with policy audiences.
Third, there is no silver bullet that will transform insights from new data and analytics into policy impacts: achieving this will require a host of complementary and sustained interventions to build skills, organisational capital and networks, and to run experiments with new methods and share the results openly so that everyone can learn from them.
We hope that “Modelling an Evolving Economy” provided compelling and useful evidence about the opportunities and challenges in this space, and helped strengthen the links between researchers and policymakers which are required in order to bring data science, machine learning and AI into the mainstream of economic policy.
Read Juan Mateos-Garcia and George Richardson’s Discussion Paper here.
ESCoE blogs are published to further debate. Any views expressed are solely those of the author(s) and so cannot be taken to represent those of the ESCoE, its partner institutions or the Office for National Statistics.