Welcome to the LSE events podcast by the London School of Economics and Political Science. Get ready to hear from some of the most influential international figures in the social sciences. Good afternoon everyone and welcome to LSE for today's event which is part of the LSE Festival Visions for the Future.
My name is Cosmina Dorobantu and I'm a senior advisor at the LSE Data Science Institute. I'm very pleased to be here to welcome Sara Geneletti, Laura Gilbert and Helen Margetts to both our online audience and our audience in the room with us today.
Sarah is Associate Professor in the Department of Statistics at LSE and the Programme Director for the Health Data Science MSc. Laura is a Senior Director of AI at the Tony Blair Institute for Global Change and an Expert Advisor in AI to the UK government. And Helen is Professor of Society and the Internet at the University of Oxford and a visiting professor here at the LSE Data Science Institute.
A few brief housekeeping notes for everyone here before we get started. For the social media users in the audience, the hashtag for today's event is #LSEFestival. Please, please put your phones on silent so as not to disturb the event. We will be exploring three key areas today: the opportunities that government data presents,
the very real challenges of working with it, and what the future holds for its use in policymaking and academic research. Given the focus of today's discussion, I'm particularly pleased to have these three panelists because they represent the full spectrum of experience with government data, from the systems that collect it to the teams that use it within government and to the researchers who unlock insights for the public benefit.
Helen brings decades of experience in digital government and understands better than most the legacy systems that underpin the data collection efforts across Whitehall. Laura has the rare distinction of having both built and led a data science team at the very heart of government, in 10 Downing Street, during the COVID-19 pandemic.
And Sarah represents the academic community that relies on public sector data to generate new insights for the benefit of us all. So I'm looking forward to this panel discussion, and there will be an opportunity at the end for all of you, and our online audience, to ask questions.
But look, before we dive into the very real challenges of working with government data, I want to start by exploring the remarkable opportunity that it holds. When you ask people who holds massive amounts of data, they tend to answer the big tech companies, you know, places like Google and Meta and Amazon.
But government also has massive amounts of data and in many ways it's more comprehensive than whatever the tech companies hold because it touches virtually every aspect of citizens' lives. So with that, Helen, I was wondering if you can start us off by telling us what data the government holds and collects. Thank you and it's lovely to be here.
It's especially nice to talk about this at the LSE because I wrote my PhD at the LSE a very long time ago, precisely about the kind of things that we're going to be talking about today. Government collects data really as a byproduct of its interaction with citizens. If you pay a tax or you receive a benefit or you apply for a licence or a passport or a permission, you have to give data to government.
And in that sense, government's in a kind of privileged position, because to do those basic functions of being a citizen, you have to give it your data. Government sits like a sort of watchtower, or watermill, at the intersection of society's information flows.
That means that government collects data about virtually everybody in the population that it governs. Now, that still might not seem like big data in the absolute sense. It's not big compared with the billions of people that Meta or Google hold data about,
but it is big in the big data sense, in the whole-population sense, because it's everybody who pays taxes in that country. It's everybody who receives benefits or has received benefits. It's everybody who holds a passport. And that means that it's very exciting data, because if you wanted to improve the way that you give people passports or collect people's taxes or deliver benefits,
then that data could help you to improve the way that you do that. Or it could help you to make policy about who should have passports, because it allows you to know who's got passports and how many are allocated, to take a very simplistic example. So it could help you to make better policy, and I'm sure we'll talk about that.
I'm not saying that the sort of Hegelian picture I've painted is how it actually happens,
But those are the possibilities for government data to be able to really improve everything that government does to make it better through the analysis of data. No, thank you for that. And Laura, you're one of the people who managed to use some of that data, arguably during one of the most critical periods in recent history during the pandemic. Can you tell us about some of the projects that you've run and some of the impact that they've had?
Yeah, so I joined Downing Street in September 2020, so about six months into the pandemic, and I was the first, and as it stands, last Director of Data Science. I left there in January. The data science team is still going strong, but sort of not with the same level of seniority, sadly. Government certainly does have a lot of data. Unfortunately, government isn't one entity.
There's a reasonably sound legal position by which it ought to be, but it's very much not. So that data is available in different pockets across the board, and I feel very strongly that there should be a much better strategy, and more importantly practice, of connecting it up, because at the moment we do have all this information about people: their lives, their practices, what they need and what goes wrong.
And it's a little bit like if you were a doctor and you had loads and loads of scans, you could have a look and see who's got cancer, but you just don't bother. You know, it's not a great way to sort of live. And certainly that was what we experienced. And it is sort of across the board. I used to run a medical tech company. I could get, without too much trouble, people's health information in a way that you cannot get it in Downing Street.
There's maybe good reasons for that, but it's not a connected system. And even for quite basic data about government performance, we spent a lot of the four and a half years I was there going into departments and, in some cases slightly surreptitiously, wiring APIs in so that data would flow automatically. Because before that, we'd have to ask them for the data. And if they wanted to send it to us, they'd send an Excel spreadsheet; and if they didn't, the email would just get lost.
So, you know, we went from a position where we had data that's coming in sporadically on Excel spreadsheets. In one particularly famous case, there was a regular data feed which was a screenshot of a table. You know, you sort of have to transcribe it. It was absolutely wild. And particularly when you wanted to figure out, say, if programs were being delivered well and somebody else didn't perhaps want you to know that, it could be a real battle.
So things have improved a lot, but the pandemic was a particularly interesting one. I remember a very frustrating experience. I hadn't been there very long; it was November when the second lockdown hit.
And we were concerned, and I was particularly concerned, because we'd seen statistics from the first lockdown about GPs. The GPs had closed, and there'd been a lot of illness and death as a result of people not being able to get appointments. So for the second lockdown, there was an order that the GPs had to stay open. But when you went on Twitter, you saw a lot of people complaining that their GPs were off on the golf course and they couldn't get an appointment.
And so I thought, well, what we'll do is we'll figure out if this is true or not. So I went to the NHS and said, I'd like a snapshot of, you know, for the last week,
all the GP appointments. I want the number of appointments per GP, and I want the same information from this time last year. Just a snapshot. And then we'll figure out which ones are roughly level and which ones are really different. And if they're a lot lower, maybe there's a good reason, but someone can phone them and ask what they're up to. So we could have quite a manual intervention here, and I thought it was a great idea. So, right, can I have this data? Oh, yeah, absolutely.
Yeah, well, that's no problem, we'll get you the data. And so I waited a week and I emailed them, and I waited two weeks and I emailed them again, and I called them, and then I got somebody else to call them, and one of the special advisers called them. We went around this cycle for a bit, and then they provided a table, and the table went:
in the south-east there were this many appointments. And they said, "Look, it's the same as last year." And I said, "Well, we don't know that, because it might be that some GPs are working really hard to catch up. Sure, your average is the same, but maybe others still aren't working. I want the actual GP-level data." And at that point, as on a number of other occasions, it transpired they didn't collect the data. They just hadn't told us.
So they said, "Well, we just don't get GP-level appointment data." And they said, "Anyway, it won't be useful, because a lot of GPs are doing phone appointments. Sometimes they just book one appointment slot and then phone loads of people, so it won't help you at all." Well, I looked at them and went, "Then what are you aggregating?"
How have you given me south-east data if you don't know any of the individual pieces of information that are going into it? So it's a very interesting system. You're quite right, we do have all this data, and with it the potential, of course, to do harm, and the potential to provide much, much better services and much better analysis of what would work,
and a lot of it is wasted by there not being a good settlement. It's become a lot more automated, but I would say we do spend a lot of time on the data science side using proxy information, or using publicly available data, to fill in gaps.
So I sound very pessimistic, but I think government is getting a lot better at this. And the more people become used to the data feeds, and particularly the more that being open with your data is not used as a big stick to hit people with, but is used collectively to help solve problems, which is very much what we were trying to do and what I think they continue to try to do, the more this improves.
But yes, tales from the dark side. Thank you very much for those tales; I don't think a lot of us here actually knew them. Well, how do you give me aggregated statistics when you don't have the underlying data? I mean, I can imagine. I think I know who holds GP appointment data, and it's not government, so I can see why they wouldn't give it to you. Is there an example of a project where you actually managed to get the data and were able to do something with it?
A lot, to be fair, a lot. So what the team did is, if the Prime Minister was going to make a decision, we would get all the data we could and make it available. We'd usually do a predictive model: if you chose this, then we think this; if you chose that, something else; if you did them in combination, we could expect that. And generally, if it was a particularly new policy, departments would be very helpful, because they wanted to get it right. So there were a lot of decisions, around everything from sort of the
aerated concrete through to various transport decisions and education policy. Almost every major decision has a data science dashboard behind it at the moment, so it's very, very common; it was just quite difficult to get off the ground. A lot of that was about people as well. So when I first arrived, they were actually looking at HS2, and so we went to them: we've got data, we've got models, we can really help with this. It was beautiful, a beautiful model. And it's one of the first things we did. We kind of bounced up to the policy makers and said, we've got some lovely data for you, would you like it? And they went, no, go away. LAUGHTER
So actually, what I learned very early on is that it's not even the data that's the problem. It's, and I don't mean this in an unkind way, people that are the problem. Because what you're doing is working with people who are very busy and very stressed. They're experts in their fields, and therefore they don't think they need data, because they think they already know the answer.
And a lot of what we were doing was around the behavioural science of trying to put people into a mindset where they would be receptive to information that perhaps conflicted with what they thought was going to happen. And we were very successful at that, actually. And then after that we had, of course, the Incubator for AI. We've got a lot of data on prescriptions, actually largely not from government sources, and we've been able to use that to build an AI pharmacist that tries to stop people dying. Academics tell us that in the UK about 22,000 people a year die from bad prescriptions, and it costs about a billion pounds. So there were solutions in the AI space that were very impactful.
I think we don't often think of the scale of the government's work and operations. Sometimes I like to look at the number of budget lines the government has. There are more than 50,000. When you think about that, behind every single budget line there is a policy decision. And what was the policy decision based on?
Sarah, I want to come to you as well, because you're an academic who works with public sector data, and I was wondering if you can tell us why it is so important for the government to collect this data and what makes it valuable for research. Well, I'm a social statistician, so I am interested in understanding how government works, how policies impact people, whether they're doing the right thing. And so the fact that administrative data is becoming available now, and this is actually quite recent: there's an initiative called ADR UK,
and they're an initiative of the Economic and Social Research Council, and they're bringing all of this admin data to researchers, making it available to us and also, in some cases, linking data sets, which is something you were mentioning, and I'm actually working on one of those. So for a statistician this data is really exciting. There's all this information, there are millions and millions of people, and anybody who's lived and worked in the UK is probably in one of those data sets at least once, if not multiple times, and
it spans all of the United Kingdom's countries and almost every sector: social care, education, children, health. It's really, really amazing. And so I've been working on some of these data sets, particularly with collaborators at Leeds, Edinburgh and Brunel universities, using the Ministry of Justice Crown Court trial data sets, and we've been able to uncover some really interesting things. For example, we've been able to see that
there is a difference in sentencing rates for people from less affluent neighbourhoods, for example: judges are more lenient to people from more affluent backgrounds. And on ethnic disparities, we've observed that drug offences are where disparities seem to concentrate, and there don't seem to be disparities across other kinds of offences. These, of course, are interesting points which we would like to bring to the Ministry of Justice, to ask how they are going to deal with these kinds of issues. And we're only just scratching the surface of these data, with more results coming out.
And I'm actually working with the linked Department for Education and Ministry of Justice data set. It must have been a gargantuan task, linking at the individual level people from the DfE, so school and attainment information, with the MoJ, so people who have been involved in the criminal justice system. So we can follow young offenders all the way through schooling and into their criminal activity... well, not criminal activity, but into their interactions with the criminal justice system. And this can enable us to understand maybe what leads people to these decisions. And we were particularly interested in understanding ethnic disparities, in terms of things like being in care and attainment.
And this is just my tiny little pocket of research, right? There are millions, I don't know, millions probably not, but hundreds of people now applying for and getting data from ADR UK, and they span lots of different things. Yesterday I had a quick look on their website at what kinds of things are coming out of it, and a few caught my eye. In Wales, for example, there's a data set where they have found that there are
disparities in cancer screening rates, whereby people of minority ethnicities, younger people and people from deprived areas are not being screened as much. And so this seems like an obvious place where maybe there could be a campaign, where you could improve screening rates and improve prognoses. And then another data set, it's got a bit of a name, the Annual Survey of Hours and Earnings, which is basically how much people earn for their work.
And that's shown that there is still a 30% gender pay gap when you look at total weekly earnings across all of an employee's jobs. And this information... I think everybody who works on this kind of administrative data, what we want is for government to pay attention, to take this evidence and change things for the benefit of people. Yeah.
You mentioned linking data and I think, you know, again, something that's quite particular to this country is that, you know, people don't have their own identifier and this is because the population has been quite against the use of IDs, which makes it incredibly difficult actually when you're sitting in government or as a researcher to link up those data sets. It's quite a complex task.
I find that every time I travel abroad, people are quite surprised, because in most countries you're born with a number that's assigned to you and follows you through life. Although that does happen in the UK: your NHS number is assigned before you're born. So if you're born here, you have two unique identifiers, your National Insurance number and your NHS number. So I think it is a very solvable problem, if we made the choice to unify those two things. There wouldn't be a lot of data confusion between them; the matching edge cases would be quite niche. It would be very doable. It's more a political thing, where quite often the political class would like to do that and the public is strongly against it, as you've said.
I mean, I think it maybe isn't even necessary; there are potentially statistical methods that can get around direct linking of individuals. It's just a matter of the willingness of different groups, and I think there is willingness. It seems to me that people on the ADR UK side are quite helpful, even if it takes time.
We talked a little bit about the challenges of working with this data, and Helen, your research covers the sometimes troubled relationship between government and technology. I was wondering if you can give us a brief history of how government's technology choices have evolved and, more importantly, how the decisions that were made decades ago continue to shape, and sometimes constrain, what's possible today.
Sure. Well, it is a troubled relationship, but it all started so well. Because in the 1950s and 60s, after the Second World War, when large-scale computer systems started to enter government in the US and the UK, government was really a kind of leader and innovator. Even the Post Office, which now has a certain amount of fame when it comes to large-scale computer systems, was actually regarded as at the leading edge of large-scale computing at the time. It's hard to imagine that now, but I think it's important to remember, because it makes you see that it can be done. There's nothing endemic about government that means it absolutely can't develop systems.
What happened after that, progressively, from the 1970s onwards, was that governments, particularly again the US and the UK, and particularly the UK, progressively outsourced or contracted out their systems to particular sorts of companies called systems integrators, companies that promised to sort of take it all off your hands.
And it was a very unfortunate coinciding of events: government started to struggle with these systems as the technology developed, and it was, for any public administration scholars in the audience, the era of new public management, when competition and the privatization of government functions were the watchwords.
And some people who worked in government, obviously not the ones who had made government a leader in any sense, but other policy makers and bureaucrats, kind of seized on the opportunity not to get too involved in technology of any kind, because they didn't really know anything about it and they didn't want to get their hands dirty.
So these contracts to systems integrators became bigger and bigger, until whole departments would outsource their systems wholesale. And this created... I mean, as Laura pointed out, I painted this sort of Hegelian picture of government data, but government is in all sorts of bits.
But now they were in more bits, because they were in these very large-scale relationships. And some of those contracts, the contract for the tax computer systems, for example, was at the time the largest contract in the world. Government started not having the expertise anymore to manage these contracts.
For example, when I was doing my PhD here at LSE, I went to talk to somebody in the Treasury expenditure division, as it was at the time, that was overseeing that huge tax contract. And I asked the head of that expenditure division, "What do you think of the new contract for the tax computer system?" "Well, it's very big, isn't it?" And I said, "Yes, it's really big, but what do you think about the provider?" And he said, "Well, it's very new, isn't it? It's very modern." I said, "Yes, but have you got any sort of expertise here in the expenditure division to oversee this huge £2.6 billion contract?" That's nothing now, of course, but it was a lot of money then. And he said, "Not really. I do have someone in my team who knows something about computers, but that's a coincidence." He went to a lot of trouble to tell me that he didn't really know anything about it. That was a huge problem, and that's the kind of story, in part, not of course the whole story, of how the relationship between the Post Office and their accounting systems went so horribly wrong.
Because they weren't really in control of their own data. And in fact, in the court case over the Horizon scandal, brought by postmasters and postmistresses, there was the finding that there was no gold-standard data in the Post Office's accounting system, no kind of ground truth. And the only organisation that had any control over saying what was the ground truth of the accounting system was Fujitsu, the company that had caused a lot of the issues in the first place. And that situation, you could say, well, it's completely changed now.
But you couldn't really say that. You couldn't really say that at all because first of all a lot of those systems that were built by these companies, so-called legacy systems, still remain. Some have even, I shudder to think of it, but even some of the ones I wrote my PhD about are still there. These systems do not yield usable data.
Why don't they? Because there's no tradition in government, really, of using transactional data to feed back into services or into policy, or to achieve the sort of insights that Laura was talking about. There just isn't a tradition. Because what government's famous for, of course, is bureaucracy, Weberian bureaucracy, for the social scientists among you.
And in bureaucracy, the data is held in filing systems, basically, and what Weber called the files. And a filing system, if anybody remembers when we used to use them, it's good for finding one piece of data about somebody. It's quite good, not very good.
But it doesn't yield any data. You can't analyse a filing cabinet. It just doesn't yield data for analysis. And when those first computer systems were built, they allowed sort of mass processing and mass updating. But in terms of data analysis, they really just replicated that situation. They didn't yield data for analysis.
So I suppose, and I'm not trying to make excuses for the people who couldn't, or wouldn't, provide you with any data, but it is the case that those early computer systems, some of which are still being used now, did not yield data.
And had they yielded data, government would have lacked the expertise to extract it. Do you mind if I comment on that one? You're entirely right, but I think people need to understand that to some extent that is on purpose. This hasn't happened by accident. There are really, really strong negative incentives at play here that cause these sorts of things. You see them in the way the entire institution runs, and they're the reason that procurement is done badly, and, to some extent, the reason that we don't have these technical experts. What we should have is somebody who knows how they would build it themselves, who can then hold to account the company that's been tasked to build it, and who understands why you need the data and why you need to stage the rollout, et cetera. And what we have instead is a system that is still largely populated by generalists.
And there are a couple of good examples of that. I wanted HR data, and so I went to the team that manages that. And they said, well, we can't get that. And I said, what are you talking about? Well, there's an HR system, and it's been outsourced, and the contract terms don't involve us getting the data back off the service provider. So for our own staff, we couldn't get that data, not in a usable format. You can see it on a screen. And I sort of said, well, what?
How did that happen? And they explained that, under previous decision-making, this system was procured, but really the thinking was: if we don't own the data, we can't lose it, and therefore we aren't responsible if something goes wrong with it. Which is not the way that works, but if you have no idea about everything from GDPR to data management, you might think it was: I can't be accountable for this going wrong, because I don't hold it.
Another example, very early on when I came in: I was asked to approve a system that had been built and was going to a very large consultancy company. It was a very small procurement for them, just a million pounds a year. And they wanted to take it up to three million pounds, because there was going to be a second group of people using this data, a second tenancy. And they said, well, we want three million to run this second tenancy, to increase the number of people using it from 16 to about 30.
So I went, what do you mean you need another £2 million? He said, well, you know, we need another £2 million. So I said, well, I think I might have a look at it.
And I sort of managed to bully somebody into giving me admin access. If anyone has done any kind of cloud development: it was on Amazon Web Services, and it had what we call an S3 bucket, so a place you put the data, and then a Lambda pipeline, or a few Lambda pipelines, which are just bits of Python that go in, take the data, and push some out. And that was it.
And then it had a user access management system. In AWS, generally what you do is set up security groups: you say these people are end users, these people have admin permissions, and when new people join you add them, and when they leave you take them away. And this system had hundreds of people on it, all with individual little bits of access, no groupings, no management. And there were people on there that had left 18 months ago. It wasn't a massive security issue, but it was a badly, badly managed system.
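As an aside for readers who haven't used cloud access controls: the group-based pattern Laura describes can be sketched in a few lines of plain Python. The group names, users and permissions below are entirely made up; this is only an illustration of why attaching permissions to groups, rather than to hundreds of individuals, makes offboarding a one-step operation.

```python
# A minimal sketch of group-based access management. Permissions hang off
# groups, people hang off memberships, so removing a leaver is a single
# deletion rather than a hunt through individual grants. All names are
# illustrative, not taken from any real system.

group_permissions = {
    "end-users": {"read-dashboard"},
    "admins": {"read-dashboard", "manage-users", "deploy"},
}

memberships = {
    "analyst-a": {"end-users"},
    "engineer-b": {"end-users", "admins"},
}

def permissions_for(user: str) -> set[str]:
    """Union of the permissions of every group the user belongs to."""
    return set().union(
        *(group_permissions[g] for g in memberships.get(user, set()))
    )

def offboard(user: str) -> None:
    """Remove a leaver in one step; no per-permission cleanup needed."""
    memberships.pop(user, None)

print(permissions_for("engineer-b"))  # admin plus end-user permissions
offboard("engineer-b")
print(permissions_for("engineer-b"))  # empty: access fully revoked
```

The badly managed system Laura found had, in effect, skipped the `memberships` layer entirely, granting each of hundreds of users their own ad hoc permissions, which is why stale accounts lingered for 18 months.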
And I said, well, why are we paying one million pounds a year? So we got these consultants in and went, all right, explain why you need this extra two million. And they gave us a very long presentation, saying a number of things, including explaining that it would take 16 developer days to set up the second tenancy. And I said, have you heard of version control?
And they said, oh, well, of course we have, you know, but there are a few things that need adding, and there's not really any way to alert people if the system goes down. I mean, it's AWS, you tick a box. Come on.
And then at the end, the last slide was: well, you are our most valuable customers, we don't stand on ceremony, we'll give you the most senior, most expensive person available to answer all of your calls. We don't want the most senior person; we want the person that does the work. So we binned that contract and went back to the people that had procured it. And when you looked at what was actually going on, they were responsible for, and I won't be too identifying, an IT infrastructure project.
And they weren't qualified to do that. And they weren't allowed to hire people; they didn't have headcount, they didn't have budget. But what they could do is get a budget of £3 million to give to this big consultancy company to do this kind of thing that I could have built myself in an afternoon and continued to manage for about five minutes a week.
So, you know, it's not actually that it's happened by mistake. It's the way the system incentives are set up. That person, to do their job, they cannot hire the right people. They're not getting paid much themselves. They can't go out and do it themselves. There's no way to do that. And they can't even hire anyone who can tell them exactly what they should build.
But the one thing they can do is they can get a lot of money, they can give it to a large consultancy, and then they can go, right, I'm just going to sit back now and it's not my problem, exactly as you were saying. And until that changes and the system is rationalised to incentivise people to save money and to deliver a better product, unless there's a positive rather than only a negative incentive, we're never going to change it.
I'm keeping an eye on the time and I want to give our lovely audience a chance to ask us questions as well. Look, the LSE Festival is Visions for the Future, so I want us to spend a little bit of time talking about the future as well. And the question that I have for all of you is: what are your hopes and dreams for the future when it comes to government data, and
what is your most optimistic vision of government data collection and use? Where could we get to in, say, five years' time? Do you want to go first? Well, yeah, okay, thanks. So, I mean, my dream is for the whole process of policy evaluation to be regulated by some protocol, in the same way that drug companies can't just throw any old drug onto the market. They have to have this process where you have to evaluate: does it work, does it not work,
the same thing has to happen for policy. It can't just be something, oh, this sounds good, let's do it, or this is what voters want, let's throw this policy out there. So my vision is you take information from...
you know, data, administrative data, just as it's being generated all the time now, you collate it in some way, even in some automated AI way if you want to, and then things that seem to work become candidates for some kind of policy intervention, and then some kind of prediction happens with simulations, models, something fancy and statistical, though it doesn't even have to be that fancy.
And then it gets trialled. It doesn't just get thrown out to the whole population. There's some kind of thought about, okay, where could we try this out before everybody is affected by it? On another project we've been working on understanding the impact of Universal Credit on the mental health of unemployed people, and it's just not great. It hasn't done people favours, and they're not better off than on the legacy benefits, despite all the spiel.
And so I think it's really important to have a way of testing, a way of understanding whether these things are going to work before you do that, and not be led necessarily by what is politically expedient. And then I would like the data accessing process to also be a bit faster. Yeah. If I can add one more thing. If we can add one more thing, yes. And no missing data ever. That'd be great too.
I really agree with that. You're talking about the evaluation of policy, and government has actually taken quite big strides there. I was briefly, well, for about a year and a half, the director of the Evaluation Task Force, who are amazing.
And since they were set up, I think in 2019, about 8% of major government spending had any evaluation around it. Now it's up to around 32% well evaluated, and going up to around 50% with at least an idea. And it's rising and rising. So that's hopefully realistic for the future. I think what I really want to see, which would empower everything you've talked about, is just transparency of data.
Because it's not the case, actually, that the way the system works is that anyone is obliged to put in place the policy
that the data suggests. And as the data science team, our job was not to say, the data says this and therefore you must do that. It is the job of elected politicians or elected officials to make that call. And we had to accept that sometimes we would go, here's the data showing that this is perhaps not what we would recommend. And they would feel that the political situation was such that they would do that anyway. And that is their job.
and will continue, I think, to be their job. However, if the data is transparent, if almost everything's published, then you don't need to be shoving in these small data science teams that are really struggling for a grip in government. Anyone can look at it. And you get very smart people outside government who build models, maybe a few different models, different approaches. You can have a dialogue. You can hold people to account, really. So when...
Maybe politicians have to say, "Yeah, I know the data doesn't say that, but I feel very strongly that this is the right way to go." And we surface all of that. So I think my vision or dream would be huge amounts of data transparency so that we can have an honest conversation and also because trust in government is lower than trust in Facebook marketers.
People are buying stuff on Facebook from people they trust more than government, and a lot of that is because there is no transparency around the decisions. I think if the data was transparent, we'd see something else, which, of course, is what you work on. Yeah, well, I mean, it's very difficult to get data. It's very difficult because of this whole privacy thing.
Not just personal data, but as you say, the administrative data. Let's see how everything's going, and let's be honest about it, and then we can have a better chance of changing it. Yeah, I'm sure you could aggregate data in ways that still get insights without having to have individual data published. Well, it's not just that, but what is being spent where? I'm not even really talking about personal data. I'm just interested in personal data, sorry. I think with the administrative data: what are you spending where? What do the outcomes look like? I don't know.
Well, yeah, as I said, I did my PhD here at LSE and quite a lot of the things that I said in my PhD are still true. And I would like them to be more wrong now. So I would like to, in five years' time, I'd like to sort of sit here like this. And when I say something like that, Laura doesn't say you're right. She says, no, no, it's not like that anymore.
And one of the things I did write in my thesis was that transactional data from government's interactions with citizens does not feed back into service improvement or policymaking. And when I wrote that, I thought it was extraordinary, because I'd come from the private sector before I did this PhD, and I thought, that can't be right.
I kept checking, and rather often it was right, and I want that to be wrong. I want government to be in a constant state of improvement in response to data of some kind and the insight it can provide. So that's what I want in five years' time.
No, that's a great note to end my questions on. We'll open it up to the audience. And if you're online, you can type short questions into the Q&A box, and we'll try to answer as many as possible. Please include your name and affiliation. But yeah, for those of you here, please just raise your hand. And we'll take them
three or four at a time. And then we'll sort of answer them. That requires a lot of memory. I hope somebody's taking notes. I'll take notes and I'll remind you. Hello, good afternoon. Thank you so much for this wonderful talk. I did my master's at LSE a few years ago, and now I work in finance, so you can imagine data is something that I hear about every day. I do have a question. Does it concern you that
the private sector or corporates are really good at collecting data, whereas the government, apparently based on this discussion, is sort of lacking in this area? And is there anything you can potentially learn from corporates to perhaps collect better data, more data, or run more thorough analysis, as long as, obviously, the collection and analysis is legal and ethical?
Thank you very much. My question is around whether there are any initiatives with respect to proactive service readiness or service delivery. So, as an example, we're in the midst of a heat wave. Are there initiatives to think about
taking weather data and understanding that, if we go above, say, 30 degrees, that will have an influence on how people behave, and demographically that will have impacts in various different areas? So there'll be an exodus of people, perhaps to coastal areas, and that will put pressure on trains and roads. We'll have more people with heat stroke who are elderly and perhaps young, so they'll present themselves at A&E. Are there initiatives to try and tie that together and be proactive and start to...
be more proactive, instead of just reactive when citizens present themselves at these particular government service points, so that we can be ready for those things? And it is perhaps incredibly complex, I guess, in terms of marrying up the data. And if I may ask a second question, which is to Dr. Sarah's point about particular groups where perhaps there are gaps in data collection, you mentioned the screening example. Is that a result of
lower trust or lower awareness or just a blind spot for policy makers and data collectors because perhaps they aren't from those communities and so they make generalizations from the data that they do have from the populations that do supply that data historically. - I think there was a question somewhere around here. Yeah, yeah, yeah, there we go.
Well, my question was: you know how you said there's this problem with all of this data, which is effectively fragmented, being handed off to these private firms in the hope that it never needs to be touched by government again.
Has the government actually recognized this as a serious problem, and is it taking any concrete steps to do the things we've been talking about here – centralizing the data, making it more easily accessible, and finding sufficient expertise to actually work with it properly? Thank you so much. I promise to take this one, and then we'll answer them, okay? But I have notes.
Thank you for the talk. So I graduated from LSE last year, and I'm now working as a summer intern at Nokia Bell Labs on AI safety research. And I know the government launched the AI Safety Institute, and I know the Alan Turing Institute also does some work on AI safety. I just want to know, between different organisations, whether you work together on some projects, and how you support government policy, how you work together? Thanks.
Super. Thank you. Thank you so, so much for that. All right, so we have a question about concern that private companies are very good at collecting data and what can we learn from that?
Do you want me to take that? Yeah, sure. I also think, I mean, there are very few private industries that have the breadth and scope of government. They generally have a business model, they collect information about that, and, as you say, if they're a successful company, they have the money to be able to handle it.
In terms of what we can learn from them, yeah, I mean, I think the thing that we should be learning from them, and I have said this within government on a number of occasions and from the outside, the way a company manages both its money and data is very deliberate.
If you are a private company, you may outsource, say, your HR because there's economies of scale, like in HR software, say. Probably not handing away the actual rights to the data, but when it comes to the data that is core to your business, that is your point of difference, the very important data that runs your business and enables you to succeed, you do that internally very well and you commit to that. And that's the part that I would learn from.
Great, thank you. The second question is about proactive service delivery: can we look at data and sort of understand what will come next, and be proactive rather than reactive? Well, I can give a quick response, but I hope that Laura's going to give us a really good example. I mean, that's a really good point.
Look, I mean, there are all sorts of ways that government can use data. The technology has changed and there are ways in which, you know, data science technologies and AI sort of lend themselves more to being able to tackle that kind of question. But I think one of the challenges that particularly remains is this question of
of bringing different sorts of data together from inside and outside government, which is really what you would need to do there. I think we, for example, during...
the pandemic, you know, there was so much emphasis on the health data for understandable reasons, of course, and data from the health system and so on. But there were also complex interactions going on between the economy and the kind of epidemiological situation. And
There were far, far fewer attempts to bring those kinds of data together and look at them in the round. And that's what would need to happen here, I feel, with your weather data. You'd have to bring lots of agencies together to try and model that situation where it goes to 32.6 degrees, or whatever it is today,
and think about the complex interrelationships between them.
So that is a kind of extra challenge, but I hope Laura's telling me that someone's doing that. No, it's a good point. I think you do need to consider the system incentives on this. So very hot weather is currently still quite rare, but obviously massively increasing. The rail companies, in my view, should be planning for this sort of thing.
And the fact they choose not to may suggest that the way we run and operate rail doesn't incentivise them to handle that. And also, not all corporations are getting it right. I mean, we shouldn't romanticise that. Well, they may be getting it wrong, but if they had a very serious business incentive to get it right, I think we'd find that they could, as we did indeed in the pandemic. There is certainly a lot more thought about this. You're right, AI can help, because we can automate this sort of analysis, and we are starting to see that in the health service world.
AI is starting to power a lot of the logistics and supply chain sort of work. And I saw an analysis yesterday adjacent to government which was very powerful around one of our major problems at the moment is the lack of provision of air conditioning in the UK compared to other countries, which is why we get a lot more heat-related deaths than hotter countries. So there's definitely a lot of analysis going into it.
I think as systems become more AI-powered in that management sense, we will see better outcomes, because it will be clearer to people. But again, until the incentives point along the lines of there being a penalty
for the people or corporations running, for example, the rail networks, if they are not able to handle the weather, and until that penalty outweighs the very significant costs of actually doing anything about it, I wouldn't see it changing, because that's how people are.
Sarah, I was wondering if you might want to address the question about the gaps in data collection. I think there's lots of things going on. If you look at the kinds of people that maybe aren't appearing in the data, you know, minorities or people from more deprived areas, there's definitely a lack of trust there. But also I think there's not a particular willingness, maybe, from the data collectors to follow it up. And there are some other things.
So we're looking at administrative data, this kind of record collection; it wasn't designed to be analysed. If you look at other kinds of data sets, there's a whole bunch of cohort studies in the UK, and they are designed for research, and a lot of effort goes into making sure that you capture
underrepresented populations so that you can address these kinds of things. Whereas admin data, maybe just because it hasn't been intended for research, doesn't have this. And this is something that maybe it should start doing, so that we can understand why some people are missing and why they're being missed out on these things. But also, perhaps there's not an incentive, maybe not a financial one, but
does government really want to know about people who maybe aren't going to be voting for the lot in government? So sometimes I think it's a bit malicious also, but I don't know.
I don't think I've ever come across that as a viewpoint. I think generally, when there is analysis in government, the government does have a lot of professional analysts, and I do think they do it professionally. But, you know, maybe I'm wrong. No, no. I hope that's not right, but I actually don't think it's right. Good, I'm glad. I will believe you. We have the question about the problem of all data being given to private firms, and does government realise this?
So that was the fragmentation things being handed out to private firms. Well, yes and no. So they don't necessarily think that that's a problem per se. And it only really is a problem if you lose control over the data and they gain control of the data. So government working with private firms, although I've often been quite a big opponent of outsourcing, is a very effective way to get things done if that partnership is well established.
and well managed. And where it goes wrong is when you're procuring the wrong thing from the wrong company, on the advice of that company, on what you need. And somebody will come in with something they've maybe already built, swear blind it's the thing you need, and then it's very expensive and it doesn't actually solve the problem. And I think there is starting to be a real recognition of that. I know Georgia Gould, one of the Cabinet Office ministers, is spearheading a new sort of
way of procurement that's supposed to handle tech procurement specifically, and assuming that goes well, and we certainly hope it does, then we will be looking at much more professional tech procurement, which rolls in with it better data management, we very much hope. So I'm cautiously optimistic on that one. I mean, again, the changing technology offers us some hope here, I feel.
when bits of government have had to move their data onto the cloud, in a way they've had to get more sort of up close and personal with it and kind of understand it better perhaps. You have to be able to understand data more to be able to put it onto the cloud. And I think that's one of the ways in which we're seeing more expertise in government rather than this
"let's not have anything to do with it" kind of generalist attitude that we had before. I mean,
in the UK in particular, in fact. I used to say this to my students when I worked at UCL on the MSc in Public Policy: if you're interested in computer disasters, public sector computer disasters, you've come to the number one country; you're going to have a really, really cool time. And at that period, civil servants were running from anything to do with any sort of computer or computer contract. They just weren't
willing even to have anything to do with the contracts. And the technology has changed in ways which I think work against that. It's still happening, of course. There are still big things like customer relationship management systems with a lot of money being spent on them. But there is a sense in which there is more actual willingness to take control and to manage the situation.
And I think we had one final question on the AI Security Institute and the Alan Turing Institute and do they work together?
It's very specific. It is quite specific. And I mean, the AI... or this kind of government-adjacent agency, you mean? Well, it depends on the agency; it depends on the humans involved in the agency. There's nothing to stop them. It depends on how busy they are as well. I think people that work at the Turing are often largely engaged with other
research institutions and universities anyway, so they're sort of quite integrated across the system. I wouldn't say there's a huge amount of crossover between, say, Turing and central government, where they're doing quite different things, and the Safety Institute is doing some very specific things in testing models. So we communicate across government with the Safety Institute to, say, get advice, and
But they're hiring AI engineers, and there's only really one other team in government, the Incubator for AI, that's hiring large numbers of AI engineers. And they both have different things to do, and they're busy. So I don't think there's any ill will behind the lack of collaboration; it's just that they're doing specific things, I suppose. What do you think? I mean, well, the AI Safety Institute is now called the AI Security Institute. It certainly is. For various complicated reasons. Yeah.
Anyway. I mean, it is also doing quite a bit of work on societal resilience and AI. And some of that is very much in the research domain, and it is collaborating with quite a lot of universities; the Turing Institute has done quite a bit of work with the AI Safety Institute. So, yeah.
Yeah, I agree. I think they are working together to some extent. I mean, the AI Security Institute also gives out research grants and deliberately builds collaborations with research institutions. But it also collaborates with the private sector, and I think that's quite important, because it is actually collaborating with the big tech companies and still has access to frontier models. And I...
You know, I think that's a good thing because there's always been this kind of massive tendency for a kind of us and them and a kind of twin track of academic research and Silicon Valley company research. I think we do need to think about how we can build linkages there because
you know, it's unsustainable to have these two tracks that are not working together. I think that's right, and collaboration outside government; they collaborate with other governments as well. So they're probably not collaborating a great deal within government, because there's not a natural docking point other than the advice side, but I think they're a collaborative organisation.
I am being told that we are at time, but it has been a real pleasure and an opportunity for both me and I hope for all of you to hear from our wonderful speakers. Today is the final day of the LSE Festival, where there have been a lot of interesting talks and events. You can catch up with them on YouTube in case you have missed them.
But for now, thank you so much for coming to this event. We promised Laura that she can leave at 6 o'clock sharp, so she's going to dash off. But thank you so much. Please give them a round of applause.
Thank you for listening. You can subscribe to the LSE Events podcast on your favorite podcast app and help other listeners discover us by leaving a review. Visit lse.ac.uk forward slash events to find out what's on next. We hope you join us at another LSE Events soon.