
Harnessing AI: safeguarding high-integrity data for climate action

2025/6/24

LSE: Public lectures and events

People
Amy Fisher
Carmen Nuzzo
David McNeil
Melissa Chapman
Sylvan Lutz
Topics
Carmen Nuzzo: As Executive Director of the TPI Centre, I believe artificial intelligence plays a key role in climate action. Our centre is dedicated to providing independent, rigorous data to assess the progress that companies, banks, and countries are making in the transition to a low-carbon economy. When using AI for data collection and analysis, however, data integrity must be a central concern. We are running a pilot project to automate the data collection process, but we also stress the importance of distinguishing good data from bad, so that investors receive reliable information on which to base informed investment decisions and effective engagement with the entities they invest in. AI must be applied judiciously to avoid greenwashing and misinformation and to ensure the quality and reliability of the data.

Sylvan Lutz: As a member of the LSE Data Science Institute, I believe AI brings both opportunities and challenges to the climate field. On the one hand, it can help us process large volumes of data faster and more efficiently, accelerating the net-zero transition. On the other, we must guard against the risks it can bring, such as greenwashing and environmental costs. We need to build robust AI pipelines and ensure rigorous analyst verification throughout the process to guarantee data quality and reliability. We should also pay attention to AI's energy consumption and work to develop more energy-efficient models. Most importantly, we must recognise that AI is only a tool, and the outcome depends on how we use it. It is our responsibility to work with AI to steer ourselves towards a sustainable future, rather than being driven by market incentives alone.


Chapters
This chapter introduces the LSE event, "Harnessing AI: Safeguarding High-Integrity Data for Climate Action," organized by the Transition Pathway Initiative Centre (TPI Centre) and the LSE Data Science Institute. It sets the stage by highlighting the event's focus on using AI to improve data collection for net-zero transition assessments while addressing concerns about data integrity and potential biases.
  • Event title: "Harnessing AI: Safeguarding High-Integrity Data for Climate Action"
  • Organizers: TPI Centre and LSE Data Science Institute
  • Event is part of London Climate Action Week
  • Focus on AI's role in accelerating decarbonization and addressing socioeconomic issues while mitigating risks of misinformation and greenwashing

Transcript


Welcome to the LSE Events podcast by the London School of Economics and Political Science. Get ready to hear from some of the most influential international figures in the social sciences.

I think we're ready to start. Welcome, welcome everyone here in person in the auditorium or online to today's session, which is titled "Harnessing AI: Safeguarding High-Integrity Data for Climate Action". Today's event is organized by the Transition Pathway Initiative Centre, or TPI Centre for short,

here at the LSE, in collaboration with the LSE Data Science Institute, or DSI for short, and is one of the events of London Climate Action Week. My name is Carmen Nuzzo, and I am the Executive Director of the Transition Pathway Initiative Centre.

The center provides independent, rigorous, and forward-looking research and data into the progress that companies, banks, and sovereigns are making in the transition to a low-carbon economy. And we're joining forces today with the DSI, the Data Science Institute,

which, working alongside academic departments and research centres across the School, fosters research, education and engagement in the study of data science and artificial intelligence (AI) through a social science lens.

I'll quickly run you through the agenda for today. So the program is: after a short introduction by myself, I'll hand over to my colleague Sylvan Lutz, who will explain a bit about why we're hosting today's event. The use of artificial intelligence can be examined through

various dimensions: data quality, ethics, biases, privacy, changes in the labor market and the need for new skills. Some people are also very ideologically opposed to AI, some are fearful, some are simply worried about it.

And we have a fantastic panel today to discuss all these aspects, although we'll be focusing mostly on what is at the heart of the problem that we're trying to address at the TPI Centre. In fact, the idea behind today's session came about because of the TPI Centre's pilot program to automate the collection of data that is essential to assess the progress that the entities we analyze are making towards the net zero transition.

So the mission of the TPI is to produce all this research and data to enable investors to track and monitor the progress that these entities are making. So it has to be investment decision useful. In fact, we are the academic partner of the TPI, which is an investor-led initiative

led by asset owners and supported by asset managers. And we're also the academic expert of ASCOR. ASCOR stands for Assessing Sovereign Climate-Related Opportunities and Risks, which is a project that we have taken on more recently, in 2023, compared to the TPI, which began in 2017, and it focuses more on country analysis. And it's great to see some TPI and ASCOR supporters here as well as new faces. So by the end of this year,

we will assess over 4,000 companies on their management quality practices and the pace at which they decarbonize, over 35 banks, and 85 countries. So you can imagine the sheer amount of information that we have to process and assess

in the research that we produce. And very importantly, the research that we produce is based on publicly available information. So until recently, our dedicated team of expert analysts has been manually searching all the research reports, the sustainability reports that companies produce, banks' reports and websites.

And more recently, we've actually started looking at the possibility of exploring automated searches and training large language models. And it's for this reason that about a year ago I approached Dr. John Cardozo, just to introduce myself. And since then he couldn't get rid of me anymore, because we were asking the Data Science Institute for help to do these assessments in a faster and more productive way.

And since then, I'm really proud to say that the TPI work has been embedded in some of the work that John has been leading on. In fact, his students in the DS205 Advanced Data Manipulation course actually work on the TPI Centre's assessments, collecting unstructured data and information that feeds into the course's problem sets and formative and summative assessments.

So the focus of today's session is really on data integrity. How can we be judicious about the results that we get through these automated searches, separating good from bad data, and provide our key stakeholders, the investors, with the right information that they need to assess and, most importantly, engage with the entities they invest in? So without further ado, I'm going to hand over to my colleague Sylvan.

He will give you a short presentation. Before doing this, I'll briefly introduce the panel; John will do that more in depth, and all our esteemed panelists will introduce themselves and tell you a little bit more about their roles. So we have Melissa Chapman, who is Assistant Professor of Environmental Policy at ETH Zurich, here on the stage.

She's joined by Amy Fisher, who is joining us online. She's been patiently waiting for us to get started. Thank you, Amy. She's Director of Partnerships at Muir AI, where she leads customer and partner engagements. And last but not least, we also have David McNeil, who is Vice President, Global Climate Research and Strategy at PGIM. For those of you who do not know, PGIM is the asset management arm of Prudential Financial, an asset owner, and

it is also one of the research funding partners of the TPI. I need to give you some housekeeping rules, as instructed. So first of all, in the unlikely event of a fire, please evacuate the building through the two exits that you used to access this room. There will be colleagues who will guide you, but you can see the assembly points here on the screen.

Also, this event is being recorded and it will be available on our website, if technology doesn't fail us, let alone AI. And there might be media representatives in the room, so be aware of that. Please put your mobile phones on silent. You can tweet, or rather post on social media, about this event, but do not

get distracted and stay with us. There will be an opportunity for you to ask questions at the end of the event. For those of you online, I know there are quite many, so thank you for joining us. And there is a function that you should be able to see at the bottom of your screen to post questions. So with that, I'll hand over to my colleague Sylvan. Thank you so much for being with us today, and I hope that when you leave the room, you'll be great ambassadors of the TPI Centre and of what the DSI does as well. Thank you.

Thank you very much, Carmen, and good evening, everyone. Thank you for being here, and thank you particularly to the panelists for coming however far they've come and for waking up early this morning to join us, Amy, who's on the west coast of North America. I'm going to give a brief overview of some of the work that the TPI Center is doing looking at automating its assessments, and here we'll primarily be talking about large language model pipelines, but I'd like to kind of

first point out that large language models and ChatGPT are not all that AI is, and so I think the panel will come to that after the fact. And then I'm going to walk through a little bit of what the students in the DS205 course, led by John, who's our lovely moderator tonight, have been doing. But before I get into the details of that, I want to take a little bit of a step back.

At the LSE, we're lucky to be one of the world's most prominent social science research institutions. And through that role, we have had many great speakers come to discuss the topic of AI over the past few years. Last year, we had Geoffrey Hinton, who's often called the godfather of AI,

and he brought his kind of pessimism about the existential risk of AI and some of the short-term social risks that come with proliferation of particularly generative AI tools. We've also had Gary Marcus, again moderated by John, so John's clearly a key character here at the LSE in terms of understanding AI in the social science world. And he talked about how

the hype around artificial general intelligence might be a bit too extreme, given where current models are at; LLMs are not really performing actual reasoning. And he kind of followed this up with a recent comment on the Apple paper on reasoning that some of you may be familiar with.

On the kind of pro-AI, opportunistic side, we've seen Anthropic, who are in a partnership with the LSE, come and present some of the work they're doing on AI in education. And they were very optimistic about the transformative nature AI could have, much to the...

Maybe chagrin of some of the academics in the room, but I'll take that. And last week, we were lucky enough to hear from Lila Ibrahim of DeepMind, and her argument was more that developing AI really is everyone's responsibility. It's a critical tool, and it can be used well or it can be used poorly. And we're at such a nascent stage in the development of this technology in terms of its social and political and economic impact

that it really is on us to work with it and understand it so that we can guide the futures that we want to see rather than some of the futures that more raw market incentives might be leading us towards. And so tonight we're going to kind of follow that lead a little bit and talk about...

what are we doing in terms of our work, what are the panelists doing in their work, and from there hopefully kind of talk about the futures that we can build rather than the futures that we might otherwise end up in. And so despite the title of my first slide,

peril. If we look at the details of what happens if we get it wrong and what happens if we get it right, they're actually all use cases that I think should be positioned on a scale of responsible to irresponsible development. So on the left side of your screen there, we're looking at the risks of greenwashing that AI brings forward.

In the TPI Centre context, if we're using AI to analyze corporate reports and we just ask a simple model to analyze a corporate report, likely what we're going to get back is exactly what the corporation put in that report and all the great work that it thinks it's doing. And so you have to be a little bit careful about the types of questions you're asking.
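
As a rough illustration of that point, here are two ways one might frame the same request over a corporate report; both prompts are assumptions made up for this example, not the wording the TPI Centre actually uses.

```python
# Illustrative only: two framings of the same question over a corporate report.
naive_prompt = (
    "Summarise what this sustainability report says about the company's "
    "climate progress:\n\n{report_text}"
)  # tends to echo the company's own framing of its progress

critical_prompt = (
    "Using only verbatim quotes from the report below, list every emissions "
    "target with its base year, target year and scope coverage. If a claim "
    "has no quantified target behind it, label it 'unsubstantiated'.\n\n"
    "{report_text}"
)  # forces the model to point to evidence rather than repeat claims
```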

On the bottom left side, we can see a little bit about the environmental cost of AI. Of course, this is often discussed when talking about AI and climate change, and I don't want to dwell on this, but just to note that some recent figures from a paper also from Grantham Research Institute led by Nicholas Stern highlighted that their estimates put somewhere between 0.4 and 1.6 gigatons of CO2

emissions from all data center usage in the year 2035, and that's roughly equivalent to the current emissions of the entire country of Japan. So that's not an insignificant issue, but it is also an issue that can be managed if developed responsibly, because

AI data centers don't need fossil fuels, they need energy. And so it's about building out the right types of incentives for developers to build out those data centers rather than the fossil fuel intensive pathway. On the right side of the screen, if we think about what the world might look like if we build responsibly, again, that Stern paper

highlights that, in the transport, power and agricultural sectors, they expect to see anywhere between 3.2 and 5.4 gigatons of CO2 emissions reductions per year in 2035 from

efficiencies, better uses of resources, and a wide array of applications of AI, from alternative protein development onwards. And the authors quite readily, and rightly, point out that if we take this path, this is almost double, or more than double, the amount of

expected emissions that they estimated from data centers. So it is really a trade-off potentially, but it's about working towards the path that we want to be on rather than ending up on

a pessimistic path. And then, finally on this slide, I just want to highlight some of the work that people are doing, like Google's flood forecasting, to help the world understand the risks of climate change, or to better track nature loss, or

all sorts of other innovative applications that are not necessarily LLM generated, but can lead to more accurate real-time information. And so as we go through the work that we've done at the TPI Center, thanks to some of the support from the Data Science Institute, keep in mind that we're not just talking about ChatGPT or artificial general intelligence.

AI is really a suite of tools and algorithms that can empower researchers and society, or it can be captured by special interests for commercial or political gain. And the future that we end up in is partly our responsibility to create.

To talk more specifically about the pressures in climate change: we all know that the 1.5 degree goal of the Paris Agreement, with no or limited overshoot, is slipping away from us. We're heading into a world where we will likely pass 1.5 degrees Celsius. The biodiversity crisis is accelerating. So from the stakeholders in the financial space that we deal with to politicians and other key decision makers, there's a large demand for more data.

In the TPI context, this means more entities assessed and more timely data: more frequent updates, and a shorter gap between the time a company or a country produces a document and the time we can integrate it into our assessments. But at the same time, good data isn't just

lots and fast. It has to be context and knowledge specific. There has to be a regular check in on the biases built into that data. And ideally, it's open, understandable, and reproducible. And so AI, I think, can quite obviously help us with the first two of those problems. But there's a kind of economic problem with it.

If we look at this illustrative chart, and this is for illustrative purposes only, the marginal cost of producing a volume of information is on the left side, the vertical axis, and on the x-axis we've got the volume of information produced. A human doing an assessment, the blue lines, learns a little bit over time, and the last assessment will likely be a little more efficient than the previous one, but the marginal costs don't decrease that rapidly.

If we look at AI-generated content, and I think a lot of you are familiar with AI slop, which could be anything from nice hyper-realistic cat videos popping up around Twitter

or TikTok, to just general generated content. Quite famously, the Chicago Sun-Times, I think it was, put out a book review list full of fake books recently. So it's really easy to produce information, but does that information have much quality to it? And so to kind of combat that flood of information, that timeliness and scale of information that AI is very likely to create just through pure market incentives,

We need to work towards building robust AI pipelines. And here I put this in as AI plus analyst. You can categorize that however you want. But the key point I want to make here is that these pipelines are a lot more expensive to set up. They take a lot of work to develop the right information sources that you're putting into them.

and the right structures of output that you want to get. And they cost money to maintain, because you have to actively verify a lot of that data and you have to maintain stakeholder engagement with the people that are affected by that data. So you don't want to leave that to a completely plug-and-play, AI-generated data production system. And so to bring this into the TPI Centre's remit,

this framework here that you can see on the screen broadly represents how we do our assessments. You can think about this as a human assessment, and you can also think about it as approximating what we call retrieval augmented generation. I'll walk through a little bit of how that plays out, but that's essentially a dedicated pipeline that draws on specific sources and includes an LLM to reformat that specific set of sources into more grounded

truth and information. So at the TPI Center, we have our input, which is going to be a question, and we've got kind of four projects, corporates, banks, sovereigns, and then for corporates, we do kind of a more quantitative assessment as well. And so we'll have a question, and we'll have to go out and find information to answer that question.

Then we'll have to identify where that information is in that document that we found, apply our judgment to kind of come up with a score, and then document how we came up with that score, keeping in mind that we need to source those very carefully so that they can be reproducible in the future.

To give a little bit more detail on step one: if I were to do an assessment asking what Germany's 2030 emissions target is relative to a 2019 baseline, the first thing that I would have to do is go and find the documents where I can find that answer, where I can find the 2030 target. And so for an analyst, this looks like web searching, or going to somewhere where they already know the information is. For an AI, this would be more like web scraping.

Sorry, not for an AI, for an automated pipeline, that would be more like web scraping. And then we need to find where the relevant information in those documents is. So in this case, the documents are the European NDC and the German Federal Climate Law. And within those documents, we can find a few different ways that the 2030 target is framed. We're not going to go through this in detail, but I just want to highlight that there could be multiple competing sources of evidence, and you need to be able to evaluate

where the best evidence is coming from. And then we have the next two steps, which are expert judgment and drafting the assessment. And this results in the kind of structured output that you would find on our website here, where we do the assessment. But there's a number of calculations that go into this, including rebasing the emissions.

figuring out what scope is covered, so what part of the economy is covered by that emissions reduction target, and so on. And sorry, I think I forgot to mention on this slide: in an automated pipeline, this step would be the embedding model and the vector similarity search, so that when you call the LLM at the judgment stage,

it would have a clearly specified prompt that refers into the database you created, matching the keywords in your question to the keywords that have been saved in that database, to put it rather simply.
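
To make that retrieval step a little more concrete, here is a minimal sketch of the embed, retrieve and prompt flow. The embedding model, the placeholder chunks and the prompt wording are assumptions for illustration only, not the TPI Centre's actual pipeline; in practice the chunks would come from the scraped NDCs and climate laws.

```python
# A minimal sketch of retrieval augmented generation, under the assumptions above.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # a small, task-focused embedding model

# The "database": text chunks that would come from the scraped source documents.
chunks = [
    "Excerpt from the German Federal Climate Law describing the 2030 target ...",
    "Excerpt from the EU NDC describing the collective 2030 commitment ...",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Vector similarity search: match the question against the stored chunks."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q               # cosine similarity (vectors are normalised)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

question = "What is Germany's 2030 emissions target relative to the baseline year?"
evidence = retrieve(question)
prompt = (
    "Answer using only the excerpts below, and cite which excerpt you relied on.\n\n"
    + "\n---\n".join(evidence)
    + f"\n\nQuestion: {question}"
)
# answer = call_llm(prompt)  # the judgment stage: pass the grounded prompt to whichever LLM is used
```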

In the course with John's students, we basically asked them to recreate this pipeline using the tools that they learned in the class. And so they were tasked with building modular frameworks that could be mixed and matched between the individual student projects, depending on which student group built the strongest part of the pipeline. We also tasked them with building flexible frameworks, so that you could change the prompt and still get

valid results as long as your prompt was still effective. And they worked on four key problems over the winter term in this class, or at least the three groups that worked particularly on this as their final project. They worked on information gathering from websites and databases. They worked on efficiently extracting information. They worked with multilingual workflows, as is often the case in the

the sovereign assessments, the country level assessments, and they also built some tools to check the confidence in their LLM results. The primary method of doing this was by calling multiple models to do the same evaluation and then checking for similarities between the two answers to give the human analyst, when they went to verify the results, a bit of a sense of whether this was reproducible or not across various models.
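
A crude sketch of that kind of cross-model check follows; the comparison method and the threshold are assumptions for illustration, and the students' actual implementation may well have differed.

```python
# Illustrative cross-model agreement check: compare two models' answers to the
# same question and flag low agreement for human review.
from difflib import SequenceMatcher

def agreement(answer_a: str, answer_b: str) -> float:
    """Crude textual similarity between two model answers (0.0 to 1.0)."""
    return SequenceMatcher(None, answer_a.lower(), answer_b.lower()).ratio()

answers = {
    "model_a": "The country's net zero CO2 target year is 2050.",
    "model_b": "Net zero target year: 2050",
}
score = agreement(answers["model_a"], answers["model_b"])
needs_analyst_review = score < 0.6  # arbitrary threshold: below it, route to an analyst first
print(round(score, 2), needs_analyst_review)
```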

Of course, that would then go to, in our way of setting up this work, somebody, one of our analysts, to go and verify that data and allow them to do more and faster assessments. And some of the key learnings from this part of the project were that really high quality sources matter.

It wasn't very useful to go out and search the web, as some of you might find by just typing questions into ChatGPT, if you don't explicitly state where you want the information from. So we used the UNFCCC NDCs, and that was scraped to create a document repository. We used an expert curated climate information source, Climate Action Tracker, as well as a pre-structured database from Climate Policy Radar, which contains a number of climate policies and laws.

And the students also happen to find that smaller context specific models often perform better than larger general ones in embedding and recall tasks. So this is to say that not always is the biggest, best, fanciest model going to be your best result. Those are also the more power hungry models. You might end up getting better results with downscaled models designed for specific tasks.

And that also helps with the energy problem related to AI. A really critical point is that clarity in the questions matters. You need to know what you're asking, because what you ask shapes what tools you build and how you measure success. So it's really important to think about why you want to

create an automated pipeline. What work are you actually going to get out of this? Why is this going to improve climate data, or other forms of data? That will help you structure your approach. And finally, as always, structured outputs are key, because unstructured outputs are harder for people to then take into their own decision-making processes.
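
As one illustration of what such a structured output could look like in this setting, here is a small sketch; the field names and example values are assumptions rather than the TPI Centre's actual schema.

```python
# Illustrative structured output for a single assessment question.
from dataclasses import dataclass, asdict
import json

@dataclass
class TargetAssessment:
    country: str
    question: str
    answer: str             # e.g. "2045"
    source_document: str    # where the supporting evidence came from
    quote: str              # verbatim supporting passage
    confidence: str         # e.g. "high" or "needs analyst review"

record = TargetAssessment(
    country="Germany",
    question="In what year is the net zero CO2 target set?",
    answer="2045",
    source_document="Federal Climate Change Act",
    quote="... greenhouse gas neutrality by 2045 ...",
    confidence="needs analyst review",
)
print(json.dumps(asdict(record), indent=2))
```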

So the key overall takeaway is that models can get you accurate outputs, but only with trusted sources, built-in nuance and careful validation. I'm just going to walk you through three kind of crude comparisons between human analyst assessments and an automated pipeline. So in Q1 there, where we have a 29% different-from-human rate,

we asked: in what year is the country's net zero CO2 target set? This is a really simple question. The answer should be something like 2050 or 2070, and it should be pretty clear. In this case, the model had access to the NDC repository and nothing else.

And what we could clearly see is that the model over-applied the shared framework of the EU and other things like this to all EU countries, even though that wasn't clearly stated in the EU NDC. So while the EU might have a 2050 target, that doesn't necessarily apply to the subset of countries.

And so, you know, maybe that's not the most useful comparison to the human assessment. So then we subset that a little bit more, to look at the countries, excluding the EU, where the human and the model used the same source, that being the country's NDC, or where the human found no target. And in this subset, we got a much higher same-as-human result. And here we could find that the

quote-unquote errors made by the model were that sector-specific targets, or phrases that sounded like targets, like "we intend to reach net zero" without committing to reach net zero, were often accepted by the model. And this kind of highlights that even when the model is giving you results and explanations that make sense, you have to critically evaluate how you think about what it's giving you. And then finally, under the ASCOR framework, we asked

the question: does the country have a framework climate law or equivalent? That has a number of sub-criteria, so we fed the model those sub-criteria in a structured prompt, and this time we opened up the sources. We asked it to search the internet and look for government sources, and what we found there is that, well, when it got the answer right, 66% of the time more or less, it had proper explanations and found the same sources as we were using in our assessments. But it also provided a lot of false positives, because it

would combine sources when it analyzed things. So it looked at the country's NDC, which is not a legal document.

as well as a legal document like an environmental protection law that doesn't say anything about climate change, and combined the two. So just to say, these aren't always going to be errors that you're going to get in every pipeline setup, but these are some of the things that we encountered, and we find that limiting the source bucket to places where you know you'll find valid information is important for high accuracy rates.

But depending on what you're looking for, time saving or accuracy, you might want to change how you set those up.

So I think, just to conclude and hand over to John here: what we build does matter. The tools that we use and the information that we put out into the world matter; if we were to take some of those results that we've got from our pilot studies and put them straight out, that would be slightly irresponsible, so we need to maintain the analyst verification on top of where we're at right now. I think if we work together towards a more...

oversight view of some of these pipelines, then we can build accountable and transparent data rather than a lot of noise, AI slop, greenwashing, and repeated information that isn't very critically analyzed. So over to you, John. Thank you very much. Thank you very much for this presentation, Sylvan. As Carmen introduced me before, and Sylvan also alluded to me, I'm John.

I'm with the Data Science Institute; I'm an assistant professor there. I'll be chairing the rest of our conversation, moderating our panel, and then inviting you for questions, whether you are here in the audience or online. I'm going to follow up with Sylvan first and ask you a few things. So, in this project we're doing with our students, and as part of your current role, you said something very important, which is:

your desire is perhaps to get more output, more analysis, more of what you already do well. And AI presents itself as an opportunity, as something that could be very valuable there. However, the downside, which you presented many times, is that there's a tendency, this built-in feature of LLMs, to make stuff up,

and we need to get around that. So where do you find this balance? How do you maintain quality of data? How do you maintain quality of what you're doing? Where is AI useful and where is AI not useful? What is the role of the human there? Packaged in this very long, big question. Thanks for the question, John. Can everyone hear me from over here? Fantastic. Yeah, so I think...

a big part of the answer to that question is: what question are you asking? That will determine what the role of the human is. If you're trying to speed up some basic information finding, then an LLM might be pretty good. In those climate law assessments that I talked about, it's really hard for us to review such a wide variety of documents, and we might actually be okay with some false positives and throwing them out, even if it takes more time on the verification step.

But the role of the human in this process is to make sure they're thinking about what question they've asked and what needs to be done after they've created a model, a system, a pipeline to answer this question. So it's a critical thinking role throughout the process: to make sure that anything you put out into the world, after you've processed it through your pipeline,

has been verified to the extent that was required by the question that you initially asked. And I think in the TPI Centre's case, obviously reputational risk is a big part of the work we do. We pride ourselves on detailed assessments of various entities.

And so the verification role is very strong, and that's at each step of that pipeline that I talked about: understanding what sources you put in; understanding how the embeddings are done, so that when the search goes out you can then verify the sources that were actually used when answering the question; understanding how those sources are then being pulled into the model, and what the metadata structure looks like;

and then ultimately understanding the output. All of that, I think, is key for the human to do. So it's not just set up the pipeline and then let it run; it's interact with it at each stage of that pipeline and have an understanding of what's happening, so that if there's an error, A, you can go back and understand where that came from, but also so that you're less likely to just assume that it got it right. You can kind of more clearly spot

where something went wrong. And it might be the case that prompt engineering alone might not solve all of these issues, right? You might still need someone to validate every step of the way. Yes, I think so. Prompt engineering alone is...

not going to solve these problems, and neither are more or bigger models. Better prompts alone aren't going to solve the kind of data quality problems that come from LLMs or other forms of automated assessment. There always has to be a real understanding of what's going on at each step of the process, not just a fancier prompt,

though they can help sometimes. - This is what we saw with some of our students: sometimes basic NLP techniques, traditional techniques from before language models were introduced, yielded better results. So there was some of that.
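
One lightweight way to support the analyst verification described above is to carry a provenance record through the pipeline alongside every answer. The sketch below is illustrative only; the fields and values are assumptions, not the TPI Centre's actual metadata structure.

```python
# Illustrative provenance record attached to one automated answer.
provenance = {
    "question": "Does the country have a framework climate law or equivalent?",
    "sources_indexed": ["NDC (latest update)", "Framework climate law"],
    "embedding_model": "all-MiniLM-L6-v2",   # record whichever model is actually used
    "retrieved_chunks": [
        {"source": "Framework climate law", "page": 3, "similarity": 0.82},
    ],
    "llm_output": "Yes - the law sets binding economy-wide targets ...",
    "analyst_verified": False,               # flipped to True only after human review
}
```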

So this, I want to make a connection here to the work that Millie does. So Millie, please use the opportunity to introduce yourself to our audience as well. But I wanted to connect this to a paper you co-wrote some time ago about

The title of the paper is Opening a Conversation on Responsible Environmental Data Science in the Age of Large Language Models. So can you tell us what is responsible environmental data science? How does that connect to the quality of data in this discussion we're having? Yeah.

A great question. I think responsibility can mean a lot of different things. Oh, also, I guess you said introduce myself first. I'm Millie. I'm an assistant professor at ETH in Zurich. And my work is kind of focused on environmental policies and how AI can help

or hinder our implementation of policies and our design of policies. Yeah, so like what does responsibility mean? I think it's really like a moving target and a changing thing. So a lot of AI applications in the climate domain

have not been large language models. Like to date, a lot of applications have kind of been more traditional machine learning, things like computer vision or other like predictive modeling approaches. So some of the examples that Sylvan presented, like the flood forecasting, detection of deforestation, there's a whole number of applications of AI to kind of climate relevant decision making or policy or

data curation, these sorts of things. Yeah, but for responsibility, I think in that paper specifically, we kind of are looking at what responsibility means in different parts of the research process or in different parts of the problem process, which again, I think is pretty aligned with some of the themes that you chatted about. So what does it mean to use a large language model for question generation, for deciding what sorts of

what sorts of research we're doing versus using a large language model for helping us kind of design our analysis or write our code. Those might be two very different things and have very different

issues of responsibility. So some of the things that I try to think about in my own work, I guess, are like what are the costs and benefits of using these tools? Again, going back to the emissions point, they're not like trivial amounts of emissions that are associated with these technologies. But again, a lot of the kind of best use cases that are being applied and actually having impact in the climate domain

are smaller scale models. Not all AI is created equal. There are a lot of examples of actually really pretty energy efficient models that are being used for really impactful work.

Another thing is I think a theme that has come up already in the talk is these issues of transparency and accountability. So like how transparent are the processes that are like embedded in the models themselves and in the ways in which we interact with the model. So yeah, can we understand the outcomes and who's accountable when

the outcome is wrong, or is impacting a subset of the community in a way that is bad. You cannot just blame the AI system; you cannot just do that. Yeah, what does accountability even mean in this context? I think our traditional notions of who's accountable for harm can be totally upended. And so as a community, we need to think really deeply about, yeah, about where we place

maybe blame, but also like how we interpret our own accountability for decisions. Yeah, and then I think to your point kind of in framing this problem around bias and the ways in which these models might impact different communities really differently, especially in the climate domain. So this is partially that like

the data that goes into these models and kind of is used to train them is quite biased. And so you can imagine that that can propagate through our use of these models and their downstream impacts. And then I think a point that we bring up in this paper is that it's not only

that bias; there can also be a bias of omission. So we often talk about, oh, you know, the answer being wrong that comes out from an AI model

or a large language model. And that is true, that absolutely happens; you can come up with as many examples as you want. But I think there are an equal number of examples of just bias of omission. And so it's not necessarily that it's giving you the wrong answer, but it's giving you an answer that is maybe framed in a way that leaves out the

global majority, reflecting inequities that have existed for centuries in society. And whether or not we want those biases to propagate is probably the question we as a community need to be thinking about. So, in connection to the first point you made about distinguishing AI from LLMs, because I think it bears repeating that point:

even the machine learning models that existed before large language models also propagated these kinds of biases. We have examples even from outside the climate domain: you have credit, right, the world of credit. The data that feeds these credit models, which are created not by language models but by machine learning algorithms, has also carried those biases.

So it's a thing with AI in general. Yeah, yeah, yeah. I think the biases exist across the board. I like this point about omission, and about transparency and accountability. So all of these points about responsible AI are something we need to think about in academic corners when we're doing assessments. But I also want to invite our colleague Amy, who's online with us, you see her on the screen, to reflect on

the same topics, but applied to the world of companies. So, Amy, tell us about the work you do at Muir AI and how you help companies with their sustainability initiatives and risk management, and whether responsible AI is part of your day-to-day work as well.

Yeah, absolutely. So hi everyone. Sorry to be joining you virtually, but really happy to be here. I can give just a really quick overview. So I lead our partnerships and customer engagements for my company, Muir. I'm a really passionate advocate for embedding sustainability into business operations.

And that's fundamental to what we do at Muir. To give a really quick snapshot of what Muir does: we have an AI solution that creates product carbon footprints at scale. Our system ingests common data available from companies' ERP systems, like product names, source locations and product mass. And from there, we can create a product carbon footprint.
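
In spirit, the idea is something like the sketch below; the product rows, emission factors and matching logic are made up for illustration and are far simpler than what Muir actually does.

```python
# Heavily simplified illustration: map ERP line items to emission factors
# to get a rough embodied-carbon estimate for a product.
erp_rows = [
    {"product": "aluminium housing", "mass_kg": 0.9, "origin": "CN"},
    {"product": "lithium-ion battery pack", "mass_kg": 0.3, "origin": "KR"},
]

# kg CO2e per kg of material: placeholder values, not real emission factors
emission_factors = {"aluminium housing": 12.0, "lithium-ion battery pack": 15.0}

footprint_kg_co2e = sum(
    row["mass_kg"] * emission_factors[row["product"]] for row in erp_rows
)
print(f"Estimated embodied emissions: {footprint_kg_co2e:.1f} kg CO2e")
```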

And really, for the companies that we're working with, what they're trying to do is solve a fundamental challenge: how to more accurately measure and then manage the emissions that are associated with their products and their supply chains. Managing supply chain data is notoriously complex.

I don't think that's really a surprise to anyone in this room. That challenge absolutely extends into the sustainability realm, and maybe is even exacerbated there. It's really hard to collect data day to day from a tier one supplier, who you most likely have a direct relationship with, let alone to get data from a tier three or tier four supplier.

It's hard to get that data for business operations. And if you start to ask questions about carbon emissions, which can be a really nuanced field and requires a lot of technical expertise, it just becomes more difficult to get that data.

If you use traditional methods, if you had a team of consultants or other providers who are going out to manually calculate and create a product footprint for every product in your portfolio, frankly, that would take all the time and all the resources available to simply calculate product emissions. Ultimately, they wouldn't have time to actually do anything with that data, and that's at

the end of the day why we want data: we want to take data and make intelligent decisions and take actions from it. And frankly, it's just not realistic for any company to focus all their time on that aspect. From our perspective, what we're doing at Muir, and what we're seeing others do in the market, is where AI really comes into play.

Because we can use AI solutions to rapidly assess and estimate the embodied emissions of hundreds or thousands of products in a matter of minutes, or maybe a few hours, in a much more scalable fashion than if you used human resources to do that. And that's what our customers are using Muir for: to understand the emissions of their products at scale.

And many of the companies that I work with have set really ambitious net zero goals. And ultimately that requires a comprehensive understanding of their supply chain emissions. And so with AI, what we can do is we can estimate product carbon footprint for everything that they buy or manufacture using the data that they have.

And so what that means is that sustainability teams and those really nuanced professionals are then freed up to actually create strategies to decarbonize, whether it's looking at alternative sourcing decisions, manufacturing processes, or implementing renewable energy strategies with their suppliers or for themselves.

But those teams need that data first in order to create those strategies. And so that's really one of the fundamental challenges we're using AI to solve for is to advance that data-driven insights for effective decision-making. And so I do think responsible AI, to ultimately answer the question, we absolutely find that AI can be a...

tool that can be used responsibly and effectively by sustainability and procurement teams to inform priorities and set decarbonization strategies that are really effective. A quick follow-up on that, Amy. How can a company balance the tension between scale, like you mentioned, we can do things at scale and focus on the operations,

and the tension of wanting to do more of that without devoting a lot of resources, while at the same time guaranteeing the quality of that work, and guaranteeing that this AI-created data, these AI-created assessments, are of good quality? So what are the best practices there?

Yeah, absolutely. So, you know, in our experience, some of the best practices are opening up the data that we're estimating for customer view. So it's not just about saying: here's a number, you have to take it and run with it. It's: here's the number, here's the calculations, here's all of the estimations that our model went through to create that number, which you

then, as a customer, as a company, have the opportunity to review, evaluate, and refine based on information that you may have that wasn't initially provided in the calculation. So really, we're endeavoring to create that scalability, but also that transparency, when we produce and share these insights. Thank you, Amy. We're going to go back to you shortly.

I want to bring in David, another member of our panel. You're an investor, you work in the investment world focusing on ESG, and you probably also have to battle the tension of deciding which companies to invest in or not, making decisions that are grounded in data, where it could also be useful to have an AI to assist you.

What is the role of AI in investing or in any adjacent processes that you have there?

Yeah, so for... Introduce yourself. Yes. So I'm David McNeil, I'm a Vice President within PGIM. So we are the asset management arm of Prudential Financial. And I sit within our central ESG research and strategy team. So we support our seven different investment management businesses, all of which are organized broadly along asset class lines: fixed income, private credit, real estate, quantitative equity, all of which have different needs regarding ESG data collection and reporting, and very different client bases. So...

What we do is look for commonalities in terms of research needs, tools and analytics. We are a supporter of TPI, as Carmen alluded to, but we're also a user of the data. And really interesting just to hear some of the work that's been ongoing with the pilot project here. So in terms of the use cases, I'd probably boil these down to two.

Two broad use cases. One is data aggregation, so improving the quality of reporting, trying to address gaps in coverage, particularly within certain asset classes within our assets under management. And then the second use case is data integration, so the integration of that data into investment decision making. And depending on those use cases, you can actually have very different models that are applied to this, different rules that you apply to this.

On the aggregation side of things, asset managers are familiar with using a wide range of different data sources in terms of ESG integration reporting. Historically, there's been a very strong focus on trying to improve coverage of emissions data as a proxy for how companies are managing their transition risks, how they're improving their carbon performance over time.

But I think there's also growing recognition in the industry that carbon is just one metric that you need to take a broader view of how companies are transitioning their forward-looking capex plans, their investment strategy, how their product portfolio might evolve over time. And I think the work of the TPI really attests to the fact that not all high emitters are created equal. So just looking at sector exposures doesn't tell you the full scale of your transition risk exposure.

And increasingly, we're looking to incorporate qualitative assessment of companies: a lot more information around their product strategy, their public policy objectives, some of the targets they're setting, and whether they're actually delivering against those targets. So the role of generative AI and LLMs is particularly relevant here, in terms of integrating that quantitative and qualitative data. So we think there are particular opportunities there to integrate these in a more structured way.

So thinking, for example, on the net zero alignment of companies within our holdings, we can capture a lot more qualitative information, a lot more public information, similar to what the TPI team are doing using some of these tools, albeit it needs strong analyst oversight and human judgment to validate those results. We need to apply it in quite a strategic way.

The other aspect which I've alluded to is the fact that existing disclosure is very heavily skewed towards large public companies in developed markets. We're a global manager. We invest not just in large public companies, we invest quite significantly in private credit. So there's a long tail of mid-market companies, many of which actually sit in scope three of those large public companies. And increasingly, our clients, asset owners, expect us to do that additional work to try and collect that data

even if disclosure is going to lag behind quite significantly, to try and get better proxies, better ways of estimating real-world emissions, real-world transition by those companies, and to engage with them where it's appropriate. So I think there's real opportunities there on the aggregation side to use these tools to get better quality estimates, to try and address gaps in data coverage. And then on the integration side of things,

This is obviously somewhat less developed, but again, there's ongoing discussions in the industry. And we talked about some of the other models that are maybe smaller scale that can be applied here. So machine learning, predictive AI has a lot more applications to the investment process where we can start to think about

how we model transition risk and physical risks at the macro and country-level scale, how that translates down into potential risk of market repricing or repricing of certain sectors. We've seen examples of this with wildfire risk, for example, in US utilities in the last few years, where again sentiment has shifted and you start to see that reflected in credit ratings and in markets over time, and then down into the company fundamentals.

And then the other point to include here is improving the quality of asset level data is something that's a key priority for the industry as a whole. And again, something where we think predictive AI has real applications.

Again, understanding patterns of ownership of company assets, not just looking at overall exposure to physical risks or transition risks, but actually the vulnerability of those assets, any specific characteristics of those assets which mitigate that risk, any transition planning that's going to take place on the ground, any physical risk mitigation that's being undertaken by companies on those assets.

Then in our real estate business as well, we're also thinking about how the value of real estate assets could be impacted by tightening emissions standards over time, falling demand for energy-inefficient buildings,

rising sensitivity to physical climate risks, and what the value of resilience measures like flood risk management is. Is that just applicable to those individual real estate assets, or do they need to be engaging with local municipalities on issues like flood risk management? So that's another case where predictive AI, which can be smaller in scale, less resource intensive and much more targeted, can help us check that we're asking the right questions.

And then, going back to my earlier example of private companies, we do actually collect a lot of really useful material and information from private companies when we engage with them. There is a large expansion of private credit in the asset management industry that's largely

shifting from bank lending into direct lending by asset managers. So asset managers typically have very strong access to management within these companies. Because they are mid-market companies, and because they are heavily skewed towards industrials and utilities in many cases, arguably physical and transition risk could be more severe in many cases. So we're

combining the information that we get through direct engagements with publicly available information, so some of the Google resources that we talked about, WRI, many of these. And this kind of data is very, very valuable because it's unique to whoever holds that data, right? So in your case, because it's not publicly available like with the other publicly listed companies, you have more that you can use, more nuance about the decisions you're making.

And it's a question of structuring that data, making sure we're asking the right questions, demonstrating through engagements the financial materiality of these factors to the companies so that engagements are ultimately successful. Excellent. Yeah, so again, the importance of high quality data and that's

especially important for predictive AI, like you said, because you do need data in a very numeric format, in a very structured format. But as we've discussed amongst all of us, it's also important even for large language models, because you do need some sort of organization of that data to make sense of it. Thank you all on the panel for this.

Let's pivot our discussion a little bit from the opportunities of AI, and from thinking about responsible uses of data and how to curate and analyze high-quality data, to think also about the risks of the technology itself, of using AI, and

how this could have an impact in all corners of society. So I'm going to ask Millie to offer her thoughts on that. Are there any important current gaps in research regarding the integrity of climate data used by AI systems, or the AI systems themselves, not to mention the environmental footprints of these large language models, et cetera? What are the risks that you

would look forward to having the community of academics fix in the near future? Yeah, that's a great question. I mean, I think one of the major risks really does come from the bias of input data and the propagation of those biases. Like, are we really just

kind of entrenching the environmental inequities that have existed in the past. So, for example, a lot of my work really is focused on nature-based solutions to climate change and understanding the impact of climate change on ecosystems, so on biodiversity and things like this.

And then when we look at the underlying data that goes into a lot of our assessments of these sorts of things, as well as kind of the sort of data that goes into the decision making for prioritizing like areas, because this is often like land based or ocean based interventions.

that data is incredibly biased. Like geographically, absolutely. So we can see like high income countries kind of light up in, if I was showing a map of like all the data of biodiversity, you would see a map of like macroeconomics effectively.

And so it's really, really striking. And even when we look at these high-income countries, so like the US, for example, you'll see cities and roads, which is, of course, not necessarily where nature is. And I think it's not only a geographic issue, it's also a temporal issue. So when we look at a map of a city, you can see...

biases in the underlying data that are like reflective of the legacy effects of racist policies. So you can see things like redlining, which was a policy in the United States that kind of like segregated cities effectively, you know, in the like, yeah, in the mid 1900s. And so, yeah, all of these sorts of things.

are visible in the data. And so you can imagine when we take that data and we model it-- - And make decisions out of it. - Yeah, we get all these insights from the data and that's great. And then we make decisions from those insights. And so you can get pretty far removed from the actual

biases or the observational processes that led to the data that we have when you're actually making your decision. And so I think this also comes back to some of these points about kind of understanding the processes by which you're going from sort of raw data to decisions, because you want to understand how those observational processes, how those biases or disparities in the data may or may not be propagated through your decision. In a lot of cases, actually, like we have

methods for accounting for bias, and it's not all just hopeless or something like this, but you do need to be really conscientious and really careful. And then I just want to touch on a couple of points that were made in the last comment, one of which, and this is again going back to some hope, is that a lot of these biases are not only geographic, but they're also

biases in what has been really easy to datafy in society. So there are certain things that are just easier to account for or make into a single metric. And I think large language models are maybe allowing us to actually integrate some kind of non-traditional or other modes of knowing and of information,

presumably kind of into our analyses and into our decision-making processes. And so while those technologies can actually propagate biases, and they do propagate biases in a lot of cases, I do think that they also provide some mechanisms by which we can use qualitative data

alongside some of our more traditional quantitative data. So perhaps a solution would be in increasing coverage to sort out these socioeconomic disparities, but also in having a different look at data and thinking about what else can be captured by data, things that maybe aren't exactly quantifiable but should be there in a qualitative manner and should be captured as data. Very nice. Thank you, Millie. So,

at the level of the individual, Sylvan, what can a researcher or an analyst do? What are the steps you can take to enhance transparency? Or, in other words, I'm picking up transparency and bias and all of those things, but if you see any other kinds of risks in this space that you want to mitigate, what is the role of the individual researcher or analyst there?

Yeah, thanks, John. I think some of the points have been made pretty clearly already, so I'm just going to kind of repeat a few of them off the top. I think Amy made it very clear that one of the key things is understanding the data that you're working with. Like, that's just, if you're an expert in the field and you know what the source data should look like,

and you have a sense, similar to Millie's point there, you can more easily catch when something looks amiss or awry. So as a researcher, it's great to have access to these larger and larger sets of data, but you really need to dive into the source data first, get an understanding of what that source data is, and then you can go about applying these extraction tools to get more and more structured data. But if you start with just the extraction tool,

and try and then look at the data that's been extracted, you're going to have an unclear idea of whether it's kind of gotten the information that is decision useful or whether it's just picked up some noise that exists either in misspecified source documents, too many source documents, among other things. I think

Otherwise, we've talked about how LLMs can return misleading content or irrelevant content. And I think that's also...

as long as you're aware of that and of what you're doing. So as a researcher, you just have to be aware that if you do a very broad search, trying to grab a lot of information from a lot of places and get a concise summary of it, you might not be getting only completely accurate information. You can use that to start. That is also true without involving any automation at all, without involving any sort of AI. Yes. Awareness of sources might be relevant even

as a human only endeavor, right? - Absolutely, I think it's relevant in all cases. Awareness is pretty important, but I think it is increasingly important when you're using these kind of tools. Just a simple example, if you do a literature review with

I've gotten some pretty great results from that and found a lot of Millie's papers on there. But I've also gotten some results that were from different fields or it didn't make sense and didn't fit. And just being aware that I should check each source and then go read the paper after the fact. It's an information gathering exercise, not my output. It's not the literature review I put into my end work.

So there is awareness and due diligence in that process as well. So, Amy, what is the process like for companies and businesses? What are the risks there, what are the ways to mitigate those risks, and any reactions to the panel so far?

Yeah, you know, I think I want to build off of Sylvan's point, just that awareness is key, because that resonates not only with researchers and professionals in academia, but also for the business community as well. You know, in my experience, and what we hear a lot from companies, is AI often feels like a black box, and it is often a black box. You put data in, you get data out, and it's not always clear on...

what was modeled, how did the AI system come up with this result? And so the way that we really approach it and how we build trust to mitigate the risks and to build some of that awareness is really to expand transparency into our model and how our AI solution derives results. And then also the actual kind of methodology that sits behind that process.

And so, you know, we do a lot of discussions and discovery with companies to build trust along the way. But to start, transparency is really that first step at the outset. And what we do in particular, and I touched on this earlier, is we don't just provide one single number for, say, the carbon emissions of a laptop, because I'm looking at my laptop right now. We don't say, here's the number, you have to trust it.

Instead, what we do is we provide full transparency into that product supply chain graph and how our system modeled it. So what are all of the elements and components that are going into this laptop? What are the manufacturing processes that we...

derived to create the laptop? Where are the components most likely coming from? And then what are the emissions for all of those nodes and links within that supply chain? And so a customer can actually investigate and review what our system did and the steps it took to create that number so that they can hopefully then trust the end result that we provide.
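To make the idea of an auditable supply chain graph concrete, here is a minimal sketch, assuming a tree-shaped bill of materials where each node carries the emissions attributed to that component or process; the names and figures are invented for illustration and are not the speaker's actual model or data.

```python
# Hedged illustration of a product-as-a-graph carbon roll-up: the total is the
# sum over the tree, and the per-node breakdown is what lets a reviewer audit it.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    emissions_kgco2e: float            # emissions attributed to this node itself
    children: list = field(default_factory=list)  # upstream components/processes

def total_footprint(node: Node) -> float:
    """Sum emissions over the whole supply chain tree rooted at `node`."""
    return node.emissions_kgco2e + sum(total_footprint(c) for c in node.children)

def explain(node: Node, indent: int = 0) -> None:
    """Print the node-by-node breakdown a reviewer can check against sources."""
    print(" " * indent + f"{node.name}: {node.emissions_kgco2e:.1f} kgCO2e")
    for child in node.children:
        explain(child, indent + 2)

# Hypothetical laptop, with illustrative numbers only.
laptop = Node("laptop assembly", 4.0, [
    Node("mainboard", 15.0, [Node("integrated circuits", 45.0)]),
    Node("display panel", 30.0),
    Node("aluminium chassis", 12.0),
])

explain(laptop)
print(f"Total: {total_footprint(laptop):.1f} kgCO2e")
```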

That level of transparency for us has been really fundamental in how we bring companies along to trusting an AI system. And then just the other thing that we do personally is not only do we kind of pull back the cover

and remove some of that black box feeling, but we also have our methodology for product carbon footprinting third-party certified to ISO standards. And so frankly, that's just a conversation opener, where companies can start from the data you're sharing and from the methodology that you are having annually reviewed and looked at against ISO standards. That really builds a lot of the trust and

allows companies to start drawing those insights and using the data we provide from an AI solution to inform their decision making. Very nice. Thank you, Amy.

David, does that ring a bell? Discussions of trust, transparency, due diligence and mitigation of bias, is that also part of your world when thinking about data and AI? Yeah, it definitely resonates, because we're effectively the link in the chain between the companies that Amy's working with and then us as the asset managers, and ultimately the asset owners that are the consumers of this data. So I think with

the deluge of disclosure we're likely to see over the next few years with CSRD in Europe, with the ISSB being implemented by quite a number of jurisdictions to mandate climate and ESG reporting. There's going to be realistically a lot more corporate disclosure that's dependent on AI-generated metrics. So as the link in the chain, and particularly with some of the regulations that are upcoming around AI and the EU AI Act,

Effectively, the onus is on us to develop governance processes for the use of that data, to interrogate that data in the way that we would other ESG data that we get from our vendors. So I'd say, irrespective of AI, we do spend a lot of time validating data that we get from third-party vendors, amending it, and sense-checking it to make sure it's an accurate representation of the company impacts that we're reporting to our clients.

AI can help with some aspects of that. So for our data quality checks, generative AI is obviously very good at pattern recognition; we can detect changes year on year in company impact.
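As a hedged sketch of what that kind of year-on-year check could look like in practice: the 30% threshold and the sample figures below are assumptions for illustration, not anyone's actual validation rules or vendor data.

```python
# Flag year-on-year changes in reported emissions that exceed a threshold,
# so an analyst reviews them before the data is used downstream.

def flag_yoy_changes(series_by_company, threshold=0.30):
    """Return (company, year, fractional_change) for changes beyond the threshold."""
    flags = []
    for company, series in series_by_company.items():
        years = sorted(series)
        for prev, curr in zip(years, years[1:]):
            if series[prev] == 0:
                continue  # avoid division by zero; handle zero baselines separately
            change = (series[curr] - series[prev]) / series[prev]
            if abs(change) > threshold:
                flags.append((company, curr, round(change, 2)))
    return flags

reported_tco2e = {
    "Company A": {2021: 100_000, 2022: 104_000, 2023: 61_000},   # big drop: review
    "Company B": {2021: 250_000, 2022: 255_000, 2023: 262_000},  # stable: passes
}
print(flag_yoy_changes(reported_tco2e))   # [('Company A', 2023, -0.41)]
```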

But really, I think there are three core principles that we need to be aware of when we're looking at this data and thinking about AI usage, both from companies we invest in, but also internally when we're using these tools as part of our reporting. First, again, is that point around data and information sourcing: understanding what the models are being trained on, the accuracy of that data, the UNFCCC versus smaller NGOs where you might have less confidence in the data, and understanding how that's been used.

The second point is model transparency: understanding whether this is generative AI or a smaller predictive AI model, what the assumptions underpinning those algorithms are, and whether the model is suitable for this use case, be that reporting or integration, so that we're using the right model and it's appropriate in terms of the output.

And it's also appropriate in terms of resource use from the point of view of responsible AI. And then the final point is really, again, just sense checking these outputs, having that analyst oversight. So, again, looking at any...

changes year on year in data, comparing companies against peers in different industries and sectors, and really doing that additional level of analyst scrutiny to make sure these outputs look accurate. And I think that final point really alludes to the fact that there are trade-offs here, similar to what the TPI team has been experiencing, between the opportunities to use these tools to scale up

disclosure and coverage very quickly, and the accuracy and reliability of that data. So I really do think you need to be quite strategic in how these tools are applied. You need to think about material sectors, but also asset classes that are currently underserved, and how these tools can be used more strategically there, rather than just trying to blanket increase coverage, given the challenges you'll then have around sense checking and validating that data. Very good.

It strikes me that there will be a lot of jobs for climate data analysts in the future. So AI will not take over those. We've heard from everyone in the panel the importance of having a human in the loop and having a human sense check and validate information. Because as it strikes me from this conversation, trust and transparency are very human values that we cannot ascribe to machines. Or so I gather from our conversations.

That's a sign for us to open to the floor. Please raise your hands if you have any questions, and someone will bring a microphone to you. If you are watching us online, you can also type your questions there, and I'm going to get them here on my tablet.

When you speak, please state your name and your affiliation and direct your question. If you go on into a lecture, I'm going to have to interrupt you. So yes, in the middle there first. I'm going to take two questions first, then we'll let the panel react to them, and then I'll come back for more questions. Thank you. Hi, I'm Charlene from the Climate Governance Initiative.

And I'm interested in the interaction between availability of data, policy and investment and finance. And I guess my question to the whole panel actually is how worried are we about the EU omnibus?

and that massive reduction in scope that they're proposing in terms of reporting ESG. And do you think AI can fill the gap? And if it can, what would be the sources used if it isn't the companies themselves reporting, particularly when it comes to SMEs?

So you're interested in the interaction between data availability, policy, and investment and finance, and the gap where SMEs and other private companies' reporting might not be represented. Relative to some recent regulatory changes. The recent regulation. Okay, good. Perfect, perfect. Okay. Any other questions? Yes, we have another one here.


Wonderful panel, guys. Can everyone hear me? Melissa, I really found the topic of historical inequities being embedded in data, and then how that propagates undesirable outcomes for the future, fascinating. I thought it would be wonderful to get your perspective on some of the practical challenges of those data sets and where you see negative ramifications even today in the use of AI.

Thank you very much. So the first question is about recent regulation and how we're seeing this, and then historical data and the practical challenges there. Does anyone want to react to anything in particular? I can take the regulatory points. So yeah, I mean, I'd say the omnibus proposals are still proposals. They're still under evaluation, still going through the European Union structures. If they take the form that

that they're currently proposed in. Potentially you could see a lot less real economy data flowing through from companies to investors, so less direct disclosure from companies. That being said, I think the direction of travel is pretty clear in terms of ISSB, in terms of the demand coming downwards from investors to companies to try and disclose this data. Standardization does reduce the costs both for the investors and for the corporates in collecting this data and

providing it in a comparable way to investors. So I think irrespective of what happens with the omnibus, particularly for that long tail of SMEs that could have been in scope,

I still think there will be a lot of pressure to collect this data to disclose, and I think AI will inevitably play a role in gap filling. And that will be particularly important for scope three emissions, where, again, historically, we've relied quite heavily on input-output models to approximate a company's scope three. For most corporates, that's 80% of their carbon footprint. So anything that we can do to get more accurate product level and goods and services level carbon footprints, I think, can help there in a big way, and AI can play a role.
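For readers unfamiliar with the spend-based input-output approach mentioned here, a minimal sketch follows; the spend categories, amounts and emission factors are invented for illustration and are not taken from any real EEIO table or from the panellists' models.

```python
# Spend-based (input-output) approximation: estimated scope 3 = procurement
# spend per category x an economy-wide emission factor (kgCO2e per GBP).
# All values below are hypothetical.

spend_by_category_gbp = {
    "purchased electronics": 2_000_000,
    "business travel": 350_000,
    "professional services": 1_200_000,
}

emission_factor_kg_per_gbp = {
    "purchased electronics": 0.45,
    "business travel": 0.60,
    "professional services": 0.12,
}

scope3_estimate_kg = sum(
    spend * emission_factor_kg_per_gbp[cat]
    for cat, spend in spend_by_category_gbp.items()
)
print(f"Approximate scope 3: {scope3_estimate_kg / 1000:,.0f} tCO2e")
```

More accurate product-level footprints, like the graph-based roll-up discussed earlier, would replace these broad spend factors with activity data from the actual supply chain.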

I was recently at the Reuters reporting Europe conference and basically tackling this exact subject with a number of CSOs from auto manufacturers here in the UK and in Europe.

And an electronics goods manufacturer. And I think what I really got from them was that this omnibus is actually a chance for them to breathe and collect good data, and then report good data that they can apply AI systems on top of to reach strategic insights, to help shape company strategy, to find efficiencies. What was happening under CSRD wave one, or CSDDD wave one,

is that companies were just scrambling to gather data wherever they could, and it wasn't really good inputs. They were just working hard to get something to meet the regulations. And so with this wave two, two-year reprieve for companies, I think a lot of, at least the leading companies, are now taking a step back and thinking, okay, we worked really hard to get ready for compliance. Now we've got some of the data foundation. How can we improve that foundation? And then how can we apply AI tools to take strategic insights forward?

So what I heard from the companies is that very few of them had actually started using AI for anything other than document summaries and things like that; these are mostly real economy companies. But they were excited about having a better single point of truth, the ESG data lake that they had to create in pre-compliance for the initial wave. And now that they've got a little break, they might actually be able to rearrange that data set and do something with it. So I think I'm not so pessimistic about

how this will affect the long tail. I think in the first two years, yeah, we'll have a little bit less access to data. But I think when we do get access to that data, it'll be better data. And it's too bad that some companies are dropping out of scope. But again, the major companies are still in scope and some of their tier one suppliers, which will cover a lot of additional companies. And then maybe AI can help plug in the gaps with that better source of truth that we will have relative to what we would have had this year with

the kind of rush in which companies were preparing to comply. Sure. I think Amy has a point. Yes. Thank you. Yeah, sorry, if I can jump in there too: despite what the current omnibus proposal looks like, what we're hearing from companies is that especially those Fortune 10s, Fortune 50s that have publicly set, quote,

net zero targets. They are not, you know, turning the bus around, turning the car around. And to David's point, you know, the direction of travel is clear. Not only are these companies still progressing with their net zero targets, but they are now advancing to the point where they are starting to

require data from their tier one suppliers to feed into their own strategies. So while SMEs might no longer be affected by the omnibus and regulations, that doesn't mean that corporates aren't still pushing the needle forward in terms of requesting

this type of data. And certainly in the US, that's where we've seen the greatest movement: private corporations have been leading the charge while federal regulations have lagged. And I think maybe that will be reflected in the EU as well, but it certainly doesn't seem like companies are turning around on their commitments wholesale. Rather, they're focusing on their own prioritization and engaging with their suppliers to meet their own net zero targets.

Excellent. Thanks, Amy. Millie, would you like to respond to perhaps the second question? Or the first one, reacting to the regulatory environment, if you will. No, I'll hop into the second question. I feel like those were all great responses on the first.

Yeah, I mean, that is a fantastic question and a really complicated one, I think, for a couple of reasons. The first of which is that some of these biases, or apparent biases or disparities in data, are actually reflective of the ways in which society, people, our histories and our economies change and

impact ecosystems, impact the ways in which our world works. And so we don't always want to just flatten out anywhere that there's a bias. Sometimes

biases are actually patterns that give us insights into the impacts of different processes. Take the example I gave about redlining, or residential segregation, in the US and the ways in which that impacts the information that we have about biodiversity and its changes in the context of climate change. In some ways,

Redlining impacted green space in neighborhoods which impacted biodiversity and so there may in fact be less biodiversity in those neighborhoods and therefore less data. And so it's a bit of a complicated situation. Kind of a practical response to that and maybe an obvious response is like, okay, well, we just collect more data in these places where we have data gaps.

And certainly that would help fix the issue. There are a couple of challenges to that, though. One of which is that a lot of the places where we have data gaps are gaps because it's either hard to collect data there, or because some of these communities are some of the communities that have been the most heavily surveilled and policed

communities in the world. And so it's really important to think about the ways in which collecting environmental data is also collecting human data, and to ensure community consent in the process of data collection and in the process of filling data gaps.

And I think that comes to my last point, which is that there are not only geographic and historical biases in the data, but also biases in who owns and manages data. You can see a lot of patterns of colonialism in who owns, manages and processes data from different parts of the world. So there is this point about collecting more data, but it's also an issue of shifting the ownership, management and use of data, and decentralizing some of those processes, such that

a broader set of people can engage in all of this process and have agency over the ways in which things might propagate through decision-making and policy. Yeah.

Very good. I'm going to take one question from the online audience and then I'm going to get back to the floor here. So, Ann Chin sent us this. What are the biggest challenges to individual businesses disclosing more data than required in order to accelerate benchmarking and improving the integrity of data quality and therefore create best practice advice?

If there are privacy concerns, can these be overcome to encourage collaboration and working together for the greater good? Would anyone like to pick that up? Well, I guess the assumption here is that more disclosure is better in this case.

I think there is obviously a push as the question implies to just disclose more and more in terms of ESG impacts, ESG risks. I think what ISSB does which is quite helpful is really gets companies thinking about materiality of specific ESG factors to their business model and disclosing what's specifically material to them. That's also a useful lens for investors both from a financial materiality and a double materiality lens to think about what data they're looking for from companies. So

I think more data is obviously useful for benchmarking purposes, but again, as we saw in the chart from TPI, there's diminishing returns to that data over time, and particularly if it's AI generated, there'll be trade-offs in terms of the quality. If we are using data for benchmarking purposes, we want it to be a real accurate reflection of what companies are doing on the ground, and not just reporting for the sake of reporting.

Yeah, sure. I can just add a little bit. I think it definitely depends on what more data they're reporting than what was necessary. Mandatory disclosure requirements are there for a reason, and companies should focus on complying with that.

the legal regulations. And at TPI, we love to see when companies disclose additional information, be it through SBTi, the Science Based Targets initiative, and get that information verified. I think where the opportunities lie is when they collect additional information, maybe information they had somewhere before but that hadn't been brought together, and then use it to drive strategy, find efficiencies,

and kind of optimize processes, whether that's energy use or water waste. I heard from a few companies recently that had been forced by regulation to disclose their water use in the Thames water basin, and they didn't realize how much money they were wasting on water use because no one had bothered to check in the past. They made a closed-loop system, saved a bunch of money and also wasted less grey water. So I think whether it's a mandatory requirement or a non-mandatory requirement, whether it's

analyzed by AI or by traditional techniques, collecting data that hadn't been collected before and then thinking about how you can use it to drive strategy is where the opportunities lie, rather than just the costs. Any other questions from the floor? I think we have a few hands there. The person in the suit over there, and then we have someone here in the middle in the green shirt. Please come up.

Hi, my name is Pranjal. I'm from Osborne Clark. Thank you very much for the talk. It's been really interesting and glad to hear so many people thinking about the cross-intersection here. My question is really quite selfish. Thinking about my role, I work a lot with sustainability data.

My firm reports scope one and scope two emissions. We do 11 of the 15 scope three categories. And what I found quite challenging in my role is collecting the data, getting it ready, suitable enough for an assurance process.

And I'm conscious that AI has the opportunity to help with it. But I'm also put on the back foot thinking about whether this might create almost a cottage industry where we have to think about the different sources in which AI has been used, the different LLMs, and how that might create more complexity from a verification perspective.

So really an open question to the panel, is there also a risk of AI creating more complexity in the data and therefore causing more of a verification challenge? Thank you. Perhaps more unnecessary complexity. So the person in the green t-shirt in the middle here, there was a hand. Can you raise your hand again so that the viewers can see? Yeah. And there were more hands in the back there. Yeah, the blue shirt down there.

Hi, thank you. I'm Scarlett, I'm studying a Masters here in Environment and Development, an MSc. And I have a question

slightly off of your one, but from a different angle, which is, have you found that discussions about the ethics of AI are actually encouraging people to think more deeply about the integrity of the data and also to kind of evaluate their own stance in the decision-making processes? Because it seems to me that

AI is exacerbating but also highlighting existing issues, rather than necessarily just creating new ones. And so I guess I'm wondering whether those discourses are present or whether you feel that they're still quite overlooked. And let's take one final question from back there.

Hi, hello, my name is Roberto, I'm a climate tech expert. In my experience, I noticed that in order for the data to be comparable, the best way is if industries can cross collaborate and calculate and share data

for the same processes and for the same supply chains. For example, the shipping industry, the mining industry. And I see this as the most powerful way, basically, to benchmark and to have more confidence that you are comparing apples with apples.

So far, in the last two years, I only see a big effort from the automotive industry in order to create rules about how you calculate carbon footprint, a digital product passport, how do you share the data peer-to-peer between different suppliers, but I don't see anything like that from the other industries. I only see big companies doing their own thing here, their own thing there.

What is your view? Do you think there should be more cross-industry focus and collaboration in order to encourage this accurate and high-quality benchmark? Thank you very much, Roberto. So we have very, very interesting and complementary questions here. So first question about whether AI is introducing more complexity. A second question about

the conversations about the ethics of AI, whether this is leading to people considering more seriously the integrity of data, and a final question about whether we need more cross-industry collaboration. Any thoughts from our panel from any of those? I guess all those questions actually sit quite well together because you posed the first problem, the second pointed to some of the solutions, and then the third also pointed to some solutions here where

I think AI and the complexities it introduces, particularly in terms of verification of information, some of the audit requirements that might come through the EU AI Act actually does prompt a lot more thinking within the investment management industry about underlying data integrity, governance of models, how we ingest data, how we sense check this. So I think it will hopefully lead to a lot more dialogue within the industry around common principles, if not common standards.

the investment management industry, the responsible investment industry, is very good at collaborating on trying to establish common principles within the market in advance of regulations. So we do this on financed emissions, we do this

on net zero investment frameworks, and increasingly on physical risk assessment, with a lot of work through the IIGCC. And this was actually one of the discussion points that came up at the IIGCC summit yesterday: should we have common principles for how we integrate AI-derived data into reporting, into integration? So I think this will prompt a lot more of those discussions. I hope it will lead to

common agreement on principles in the industry and I think sectorally is probably the way to go given the value chain focus of a lot of this. The auto industry, as you alluded to, has really led the way in promoting standardization, trying to avoid double counting, traceability of data throughout the value chain. So seeing that replicated in other industries I think could help reporters lower the cost of collecting and reporting this data and potentially the audit costs. - Any reactions from you?

Yeah, I think that the question around whether AI creates additional complexity in reporting is absolutely valid. And it does. I mean, at the end of the day, what we're doing is introducing more data into the forum and into the public sphere. That being said, I don't think we should shy away from using AI to

create more data and more analytics; AI can be a tool to do so. And so, you know, we should continue to push auditors and assurance firms forward so that they can effectively evaluate and look at AI-generated data; I think that is really where we need to see that industry go. And cross-industry collaboration and

systematizing data are incredibly important. We follow PACT, the Partnership for Carbon Transparency, which is really focused on standardizing that scope three value chain emissions data sharing.

And that is really ultimately to drive towards more data, more open data, in a way that is systematized and standardized so others can use it. So we are introducing more complexity, but I think that is what we need to do in order to move past the stage that we're in and start gathering better and more impactful data. Millie, any reactions?

Yeah, I mean, I guess my reaction is maybe primarily to the second question, but the first as well. It's about thinking critically about when and where more data is actually valuable. And I think this also goes back to one of Sylvan's points about how data actually gets translated into decision making, and when we are just creating more data than

human heuristics or decision-making processes have the capacity to integrate. And so I think that's a place where AI has a lot of value in translating data into decision strategies and also has a lot of risk. That's giving a lot of power to an algorithm to make a decision or

really push a decision rather than just create data that's thrown at a human to make a decision. And it's a bit of a nuanced difference, but I think it's an important difference. Yeah, so thinking critically about when more data and when to use algorithms to do kind of a second stage of processing that data into decisions.

Yeah, I think to the first question, I would completely agree that more kind of processing layers from AI may add complexity to the verification process that you need to go through for assurance. And that's where I think, you know, really critically thinking about, was it useful here? Like, did I need to take this step to process this data using AI? Could I have used a simpler process?

model, some R code, or do I need this data at all? So thinking about the question you want answered before you go out and

apply the tool to whatever data source you have is, I think, really becoming a critical process in understanding data, even more so than it has been in the past. And I think this goes to the second question as well. Data has always had challenges. Understanding data has always had challenges. There have always been biases in data. And I think as we give more and more decision-making control, or more and more control over

parts of the thinking and decision-making process from question formation to the ultimate decision to generative models or any other form of model, the discussions are only becoming more important. So I would agree that they're not new discussions, but they might be more critical now than they were in the past. Thank you very much. By the way, I want to say that the themes that we're discussing here right now

percolate through our pilot program and the course of the work that we did. And Sylvan was a very big collaborator there. I'd like to thank you very much for taking part in our pilot, and the TPI Centre also for hosting this event. We're very likely to do a write-up on that, and some of these threads, the thinking behind high-quality data, efficiency and responsible AI, will definitely be there somewhere.

I'd like to thank you all for coming. I'd like to thank our speakers. We've come to a close. And feel free to interact, engage with us. And if you want to know more about the work that the TPI Center does, visit their website, transitionpathwayinitiative.org, and check out their assessment tools and data releases. And check out the Data Science Institute website as well. The LSE Data Science Institute as well. We are good friends.

Thank you all very much. Have a good evening. Thank you for listening. You can subscribe to the LSE Events podcast on your favorite podcast app and help other listeners discover us by leaving a review. Visit lse.ac.uk forward slash events to find out what's on next. We hope you join us at another LSE Events soon.