Unlocking the Power of Generative AI in Dow Jones with Ingrid Verschuren

2024/11/19
Analyse Asia with Bernard Leong

People
Ingrid Verschuren
Topics
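Ingrid Verschuren: When using generative AI to process content from other publishers, the Dow Jones Factiva platform has always adhered to the principle of fair compensation. We maintain transparent communication with publishers to ensure they are fairly paid. Before using generative AI, we had an obligation to renegotiate with all publishers to obtain additional licensing rights. This approach stems from Dow Jones's own awareness, as a publisher, of the need to protect content, and from our principle of treating all publishers equally. Although this created challenges, it also ensures that the summaries we now provide through generative AI are fully licensed. The evolution of the Factiva platform, from manually indexing news articles, to automating that processing with AI, to today's use of generative AI for summarization, reflects our continued pursuit of technological innovation. We have always insisted on the reliability and traceability of content, so that users can trust the information Factiva provides. Generative AI improves search efficiency and the speed of information processing while keeping information transparent and auditable. We partnered with Google and adopted the Gemini models for several main reasons: first, we can better control data input and output; second, it helps us stay transparent with publishers, because we can trace every chunk of text back to its source; and finally, Google's Gemini models are multilingual, which is essential for handling content in the 32 languages on the Factiva platform. In addressing the challenges posed by generative AI, we focus on several areas: first, ensuring that information sources are reliable, with a dedicated team responsible for assessing publishers' reliability and reputation; second, continuous testing to minimize problems such as hallucinations; and finally, emphasizing the importance of human review to ensure information is accurate and reliable. We believe an important future direction for AI in risk management and decision-making lies in integrating structured and unstructured data to create a more intelligent platform. We also emphasize the importance of human-machine collaboration: handing machines the tasks they are good at, and letting humans focus on deeper analysis and value judgments. Factiva's success lies in integrating structured and unstructured data and upholding its core principles: using trustworthy information, being transparent with partners, and compensating them fairly.

Bernard Leong: As the host, Bernard Leong guides the discussion, asks questions, and summarizes and adds to Ingrid Verschuren's answers. He focuses on the challenges and opportunities facing the Factiva platform in the era of generative AI, and on how to ensure the reliability and transparency of information. He and Ingrid Verschuren discuss in depth Factiva's technological development, the partnership with Google, and the application of generative AI in risk management and decision-making.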

Deep Dive

Key Insights

Why did Dow Jones feel the need to renegotiate licensing agreements with publishers for generative AI use?

Dow Jones felt obligated to renegotiate licensing agreements because they license content from other publishers and wanted to ensure transparency and fair compensation. As a publisher themselves, they are protective of their content and wanted to extend the same principles to other publishers when using their content for generative AI.

What is the significance of Factiva's fully licensed content in generative AI summarization?

Factiva's fully licensed content ensures that all publishers have granted permission for their content to be used in generative AI summarization. This guarantees that the content is legally compliant, traceable, and citable, maintaining trust and transparency for users.

How has Factiva evolved from manual news indexing to leveraging generative AI?

Factiva started with manual news indexing, where articles were tagged with metadata. As the volume of news grew, they adopted natural language processing to automate the process. Today, they use generative AI for semantic search and summarization, making it easier for users to access and understand vast amounts of information quickly.
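In the transcript, Ingrid explains that language-independent metadata codes (for example, a code for "merger") let users find articles written in languages they do not read. A minimal sketch of that idea, assuming a hypothetical `MNA` subject code and an in-memory list of articles rather than Factiva's actual schema or taxonomy:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Article:
    article_id: str
    language: str   # ISO 639-1 code of the article text
    headline: str
    # Language-independent subject codes assigned at indexing time.
    # "MNA" (mergers and acquisitions) is a hypothetical code, not Factiva's taxonomy.
    subject_codes: frozenset = field(default_factory=frozenset)


def find_by_code(articles, code):
    """Return articles tagged with `code`, whatever language they are written in."""
    return [a for a in articles if code in a.subject_codes]


articles = [
    Article("a1", "ja", "大手2社が合併で合意", frozenset({"MNA"})),
    Article("a2", "de", "Zentralbank senkt den Leitzins", frozenset({"RATES"})),
    Article("a3", "zh", "两家公司宣布合并", frozenset({"MNA"})),
    Article("a4", "en", "Regulators clear the merger", frozenset({"MNA", "REG"})),
]

# A reader who speaks none of these languages still gets every merger story.
for article in find_by_code(articles, "MNA"):
    print(article.article_id, article.language, article.headline)
```

The same annotations later double as labelled examples, which is why, as Ingrid notes, the archive was already annotated when supervised machine learning arrived.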

What challenges did Dow Jones face when integrating generative AI into Factiva?

One major challenge was ensuring that all content used in generative AI was fully licensed. Dow Jones had to renegotiate licensing agreements with publishers to secure additional rights for AI use, ensuring transparency and fair compensation for all parties involved.

How does Factiva ensure the reliability of its content in the age of generative AI?

Factiva ensures reliability by licensing content from trusted publishers and using a dedicated team to verify the credibility of sources. They also conduct extensive and ongoing testing of AI tools to minimize hallucinations and maintain the accuracy of the information provided.
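One way to make "ongoing testing" concrete is a regression check that runs after every prompt change. The sketch below assumes summaries cite retrieved passages with a hypothetical `[chunk_id]` marker; it flags sentences that carry no citation and citations to passages that were never retrieved. It does not catch every hallucination, only these two failure modes.

```python
import re

# Hypothetical citation format: a chunk ID in square brackets, e.g. "[c1]".
CITATION = re.compile(r"\[([^\[\]]+)\]")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def check_grounding(summary: str, retrieved_ids: set) -> list:
    """Return a list of problems; an empty list means the summary passed."""
    problems = []
    for sentence in SENTENCE_END.split(summary.strip()):
        if not sentence:
            continue
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append(f"uncited sentence: {sentence!r}")
        for chunk_id in cited:
            if chunk_id not in retrieved_ids:
                problems.append(f"cites unretrieved chunk {chunk_id!r}")
    return problems


retrieved = {"c1", "c2"}
good = "Acme agreed to buy Beta [c1]. Regulators approved the deal [c2]."
bad = "Acme agreed to buy Beta [c1]. The price was $2bn [c9]. Shares rose."
print(check_grounding(good, retrieved))
print(check_grounding(bad, retrieved))
```

In practice a check like this would run over a fixed set of test questions each time the prompt is edited, since even the smallest prompt change can upset the output.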

What role does human judgment play in Factiva's use of generative AI?

Human judgment is crucial in deciding which sources are reliable, testing AI outputs, and determining what constitutes good results. Factiva emphasizes 'authentic intelligence,' where AI handles tasks efficiently, freeing humans to conduct deeper investigations and add value to clients.

Why did Dow Jones choose to partner with Google for its generative AI initiatives?

Dow Jones partnered with Google due to their existing relationship and use of Google Cloud solutions. Google's Gemini model offered multilingual capabilities and low latency, which were essential for handling Factiva's vast and diverse content in 32 languages.

What are the benefits of using a retrieval-augmented generation (RAG) model in Factiva?

The RAG model allows Dow Jones to control the input and output of information, ensuring transparency and traceability. It also enables them to remove content if a publisher withdraws permission, which would not be possible if the content were merged into a large language model.
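A minimal sketch of the two RAG properties described above: every retrieved passage keeps its publisher, and removing a publisher's content is a delete from the index rather than a model retrain. It assumes a toy bag-of-words similarity in place of a real embedding model and vector database; `Chunk`, `VectorStore`, and `build_prompt` are hypothetical names, not Dow Jones or Google APIs.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    publisher: str
    article_id: str
    text: str


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database that keeps provenance per chunk."""

    def __init__(self) -> None:
        self._rows = {}  # chunk_id -> (Chunk, vector)

    def add(self, chunk: Chunk) -> None:
        self._rows[chunk.chunk_id] = (chunk, embed(chunk.text))

    def remove_publisher(self, publisher: str) -> int:
        """Delete every chunk from one publisher, e.g. when its license ends."""
        doomed = [cid for cid, (chunk, _) in self._rows.items() if chunk.publisher == publisher]
        for cid in doomed:
            del self._rows[cid]
        return len(doomed)

    def retrieve(self, query: str, k: int = 3) -> list:
        q = embed(query)
        ranked = sorted(self._rows.values(), key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list) -> str:
    # Each passage is labelled with its chunk ID and publisher so the answer can cite it.
    context = "\n".join(f"[{c.chunk_id}] ({c.publisher}) {c.text}" for c in chunks)
    return (
        "Answer using only the passages below and cite their IDs.\n"
        f"{context}\n\nQuestion: {question}"
    )


store = VectorStore()
store.add(Chunk("c1", "Publisher A", "art-1", "Acme agrees merger with Beta"))
store.add(Chunk("c2", "Publisher B", "art-2", "Acme merger approved by regulators"))
store.add(Chunk("c3", "Publisher B", "art-3", "Weather forecast for Madrid"))
print(build_prompt("What happened with the Acme merger?", store.retrieve("acme merger", k=2)))
store.remove_publisher("Publisher B")  # license withdrawn: the content leaves the index
```

The design point is that the model only ever sees what `retrieve` returns, so both traceability and removal are properties of the index, not of the model's weights.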

How does Factiva's structured data processing enhance its generative AI capabilities?

Factiva's structured data processing, in which each article is stored as fielded information, makes it easier to slice and dice the data. Because every article carries structured fields, users can exclude specific information when compliance requirements demand it, which makes the platform more flexible and reliable.
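Fielded records turn exclusion into a simple filter rather than a text-mining problem. A minimal sketch, assuming hypothetical field names (`source`, `region`, `language`) and values rather than Factiva's actual schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_id: str
    source: str
    region: str
    language: str
    text: str


def select(records, exclude=None):
    """Keep records whose fields match none of the exclusion rules.

    `exclude` maps a field name to the set of values that must not be used,
    for example {"source": {"Source X"}} when a source is off-limits for a task.
    """
    exclude = exclude or {}
    return [
        r for r in records
        if not any(getattr(r, name) in banned for name, banned in exclude.items())
    ]


records = [
    Record("d1", "Source A", "EMEA", "nl", "..."),
    Record("d2", "Source B", "APAC", "ja", "..."),
    Record("d3", "Source A", "AMER", "en", "..."),
]

allowed = select(records, exclude={"source": {"Source B"}, "region": {"AMER"}})
print([r.doc_id for r in allowed])  # ['d1']
```

Because the rules are data, a new compliance constraint means adding an entry to `exclude`, not rewriting the pipeline.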

What does Ingrid Verschuren envision as the next big advancement in AI for risk management?

Ingrid envisions the next big advancement as the ability to combine structured and unstructured data seamlessly. By merging these data types and applying advanced search capabilities, Factiva aims to create a more intelligent platform for risk management and decision-making.

Transcript


Welcome to Analyse Asia, the premier podcast dedicated to dissecting the pulse of business, technology, and media in Asia. I'm Bernard Leong. Generative AI has transformed the way information flows globally. How does a business intelligence platform such as Factiva navigate the age of AI? With me today is Ingrid Verschuren, Executive Vice President, Data and AI, and General Manager, EMEA at Dow Jones. Ingrid, welcome to the show.

Thank you very much for having me. I'm super excited to talk about this today.

Yes, I'm very, very excited to have this conversation. Previously I spoke to Joelle on the show about AI and compliance, but I think this is even more exciting because we're talking about Factiva, a platform that a lot of businesses use in the flow of financial information. But to begin, we always like to talk about our guests' origin stories. How did you start your career, and what eventually led you to your current role at Dow Jones?

Yeah, I'm happy to talk about that. What is interesting is that I have a master's degree in Latin American studies with a minor in business management. I'm originally from the Netherlands, as you can most likely tell from my accent. But when I moved to Spain way back when, I really needed to find a job.

It was hard to find something directly related to my degree. So I ended up finding a job at what was then Reuters, which is now Dow Jones, at least that part of Reuters. I was hired to manually index news articles in German, Dutch, Spanish, and Portuguese.

What I realized over time, and the reason I'm still at Dow Jones, is that it gave me the opportunity to switch topics. Sometimes I would be talking about the format of Burmese names, the next hour about technology solutions, and then about budget. Ultimately, everything I'm interested in comes together in this career.

Given your long tenure with Dow Jones, what valuable lessons from your career journey can you share with my audience?

Three main lessons, I would say. The first one, and I think it's the most important, is to work with people you like. You spend a lot of time at work, we all do, and being surrounded by people who are nice and smart really makes it more enjoyable. So it's always the people who come first.

Secondly, and this took time over the years, is finding a job with a purpose. I think purpose is really important. If you think about the purpose of Dow Jones, which is providing the most trusted journalism, news, data, and analysis to help people make decisions, ultimately what we are doing is holding the world to account and informing people with facts. That is a great purpose, and a great way to spend your day and your career. And thirdly, my advice is never be afraid to say yes. Sometimes a challenge comes your way and you think, oh, maybe I can't do that. Maybe I can. Just say yes. I started doing that very early in my career, and it has definitely benefited me.

So let's get to the main subject of the day: Factiva in the era of generative AI. For those who may be less familiar, could you introduce Dow Jones and its Factiva business intelligence platform, and the role it plays in supporting businesses with reliable data, insights, and risk management solutions? When I was the head of AI/ML at AWS, Factiva was always part of the conversation with FSI clients.

Factiva is an amazing platform, for those of you who don't know it. It is a business intelligence platform, and the easiest way to think about it is as a huge news aggregation database. It includes over 33,000 sources in 32 languages. What is really good about those sources is that we actually license them: we go out and speak to publishers, and we get their permission to include the content in our database. Not only do we do that, but we also pay them for it. That means people who use the database know that the information they find is trusted information. The other thing that makes it really interesting is that it has a lot of information that isn't freely available on the web. If you are Googling something or finding information some other way, a lot of the actual news sits behind a paywall; Factiva allows you to access that information. It has a very large archive that includes billions of news articles, which means you can very easily do historical research. Combine that with corporate information, meaning information about companies and executives, and it allows business professionals, whether in government, in academia, in legal, or at a big consultancy, to do really in-depth research.

Before this interview, I did some extensive research on you, and I know you have had a front-row seat to the evolution of Factiva: starting out manually tagging data at Reuters as a news indexer, then through all the digitization, and today overseeing it in the age of AI. It is very rare to find someone who has seen the entire evolution. So what core values or principles have remained constant for Factiva throughout the journey, and how have those values shaped the way the platform adapts to new technology such as generative AI?

It's funny you mention that, because I very often talk about that role and what was interesting about it. Just very quickly, to explain for listeners: we were literally being paid to read news articles in different languages and tag them, that is, put metadata on them. And we did that manually. So we had to say: it's about a merger, these two companies are involved, this industry, and these two countries.

But we realized that as the volume of news articles grew, doing it manually was not scalable, because it meant constantly adding more and more people as the news grew. So very early on, we started using rudimentary forms of natural language processing that allowed us to automate it. Within four or five years after I started this job, the job was fully automated. Yet here I am, which is a good thing, because I think it helps take away the fear that so many people have when it comes to AI or GenAI.

Because of the vast amount of data that Factiva processes (we currently process between 600,000 and 700,000 news articles a day), we've always had to think about how we could do things smarter and how we could use technology to do that. Ultimately, this has led us to start using GenAI. Initially, we used it to improve our search: last year we launched Factiva Semantic Search to make searching in Factiva much more intuitive and easier to use for people who are not necessarily information professionals. What we are now launching is the next step, which is GenAI summarization. GenAI summarization helps our users make sense of all the information in Factiva more quickly. It neatly summarizes: if you ask a question, you get a summary of the search results, so in a very quick, efficient way you understand whether you're actually getting answers to the questions you asked.

And I think this answers your question about what hasn't changed, because that was the other part of the question. It also shows you where the search results came from. It's very clear: this summary came from these three sources, here are the sources, and if you want to read the full article, you can do that. So the information is traceable, it's citable, and you can use it in an audit as well, because you know exactly where the information came from.

Sticking to that transparency and ensuring that we have the right content has never changed. That was the case more than 25 years ago, and it is still the case today.

Wow, that's a very interesting point of view. So could you explain how AI, beyond simple automation, is transforming the way businesses approach risk management today?

Yeah. Ultimately, AI and GenAI help with automation; they help businesses become more efficient. But from our perspective, part of what they allow us to do is solve really complex customer problems. Our starting point when we introduce new technology is never to introduce it just for the sake of it. We want to listen to the customer, understand what their problems are, and then actually do something about them. And what I think is interesting, beyond the automation part, is that AI and GenAI are allowing us to speed up the process of making sense of large amounts of unstructured information, which in the past was always challenging.

Were there any major challenges for the platform when you started adapting to generative AI? And there may also be opportunities that emerged from these technological advancements. Can you share a little bit?

Yeah, I think one of the challenges is the fact that it is not our content. We are very conscious of the fact that we license the content from other publications, and, as I mentioned previously, we do that through licensing agreements. We are transparent towards publishers about what happens with their content, and we ensure that they are fairly compensated for the content we use. As a result, when we started talking about how we wanted to use GenAI, we felt that we had an obligation to go back to publishers and ask for additional licensing rights.

Part of that, I think, is driven by the fact that Dow Jones is a publisher. We are publishers ourselves. We are very protective of our content. We want to make sure we understand what's happening with it: Where is it going? Who is using it? And we want to be fairly compensated for it. If that's one of our core principles, then we also want to make sure that we treat other publishers exactly the same. So one of the challenges has been that we had to go back to all publishers and ask for additional GenAI licensing rights.

The positive side is that the content now available through GenAI summarization is fully licensed. All publishers have given us permission to use their content for this specific use case.

For example, how do Dow Jones's GenAI-powered tools safeguard against things like misinformation and help detect anomalies? Can you provide some examples of these technologies in action?

Yeah, I think the key is that it all starts with the information you use as input. We want to make sure that we use trusted, reliable information, which is why, when we go out and license content, we want to make sure that the content the publisher is providing is reliable. We have a team dedicated to doing that: a licensing team spread across the world that knows the media landscape really well in specific regions. They can use their human judgment to ask: Can I trust this information? Is this publisher reliable? If so, let's do a licensing agreement. That means the input is actually reliable. That doesn't mean hallucination never happens; there are some examples where it still does, but you definitely minimize it. The second piece is constant testing. You have to test constantly because, especially if you're using prompts, even the smallest change can upset your entire prompt. So we do extensive and constant testing. It's not that we test it, we launch it, and now we are done. The testing is ongoing.

So what's the one thing you know about data and AI in Factiva that very few do?

I know so many things about it. Can I share all of them?

Totally.

I think I'm going to share two. We've spoken a lot about the news aspect of Factiva. The other aspect is the company and executive information you can find in Factiva. It's well known to our clients, so it's not necessarily a secret, but I do think it makes for a very interesting combination: we cover more than 40 million companies and more than 80 million executives, combined with the volume of news that we process. So that is one.

The other thing that I think is interesting goes back to the very early days. Right from the onset, we had to deal with this vast amount of information, and we were also dealing with multiple languages. Even back then, 25 years ago, we had fewer than 32 languages, but we definitely had more than just English. So how do you allow users to find all of that information, even if they don't speak all those languages? Now it's really easy; everybody will say, well, use Google Translate. But 25 years ago, there was no Google Translate. So we came up with... going back to my initial job of putting metadata on all of these articles: there was a code for merger, for example. So even if I didn't speak Japanese or Chinese, I could still search for mergers and acquisitions and get those specific articles. That allowed us to tackle multilingual content. And because we never stopped that tagging, when we started using machine learning, specifically supervised machine learning, all of the information was already annotated.

One of the big challenges for many business owners, specifically for large enterprises, is deciding whether to build their own large language model (LLM) or instead to build retrieval-augmented generation (RAG) on top of existing enterprise LLMs. What is the mental model behind choosing to partner with Google and adopting the Gemini models for use in Factiva?

I'm going to split it into two, because on the one hand there is choosing the platform, and on the other hand there is how you actually use the platform. Starting with the second piece: using a RAG model allowed us to have really good control over the input and output. What it allowed us to do was take all of the Factiva content licensed for GenAI, put it into a vector database, and then have the RAG model use the information in there. So that was, one, a good way of controlling what information was being used when providing answers.

Two, it also allows us to be really transparent to publishers, because we can trace each chunk of text back to its publisher: this came from this publication, that came from that publication. That was really important to us. It also means that if at any point a publisher decides they are no longer interested in working with us, we can take their content out. If we had merged it into a large language model, it would stay in there forever. So those were all reasons for going with the RAG model. The reason we decided to work with Google is that we already had a really good relationship with them; we were using various solutions in Google Cloud.

Gemini is one of those solutions. They also had a multilingual model, which was very helpful considering the 32 languages we had to deal with. Overall, the journey has been really good, and it's been a really supportive, collaborative effort.

So, based on the partnership, how will Google help Dow Jones bring its AI-powered solutions to the Factiva products?

I think what it allows us to do, and this was another thing that was important and definitely a result of the collaborative relationship, goes back to the vast amount of content: when we looked at the latency, it was what we were looking for. So I think that was important. Then, going back to the multilingual model, that was very important to us as well. Ease of use was another one.

Just now, when you mentioned finding the correct trace to the data, that is essentially the provenance of where the data comes from; and if a license ends, that content is removed too, so everything stays consistent. So what specific benefits can businesses gain from the transparency features embedded in the Gemini-powered tools?

I think the main benefit goes back to auditability. I don't know whether that's a word, but...

No, no, that's fine.

But I think it really comes down to the fact that most of our clients need to prove where they got their information from. If they are doing research, whether in a consultancy or in a compliance department, they really need to document why they made a certain decision. So being able to provide that information matters: knowing, one, that it was licensed, so there are no copyright issues, which is important as well, and two, where it came from.

I advise a lot of CEOs on generative AI, and I always tell them hallucination is a feature, not a bug. So the big question on everybody's mind is hallucination. What framework or set of principles do you need to incorporate when using generative AI in your products? Factiva is a trusted resource; people trust the data you put in there and the answers they get when they query your sources.

I'm not a technologist, so I can only talk about it generically. I think the main thing goes back to testing, and testing constantly, because there are ways you can adjust the prompt to prevent hallucinations. If you see that a hallucination is happening, can you then make adjustments to the prompt to prevent it? But going back to what I was saying earlier, it's a never-ending journey: once you fix one, you might see the next one, and the next one. That is why testing is so extremely important.

So how would you advise your customers to use GenAI tools to stay agile when new requirements arrive, say, in the form of compliance rules or regulations?

What I think is important, and we are following a similar structure, is to be as flexible as possible. What I mean by that is: rather than focusing on one solution that solves one problem, can you deliver a more holistic solution that serves multiple purposes? That can be because you're building a platform or several modules. By doing that, you can adjust it based on compliance needs. The second piece is not so much about technology; I'm thinking about it more from a data or even a content perspective. The more structure you bring to your data and your content, the easier it ultimately becomes to slice and dice the data. You asked earlier about what hasn't changed in Factiva.

From the beginning, the way we process and ingest the news has been very, very structured. So even though it's unstructured text, within that, it's very structured: it's all fielded information. That allows you to say, actually, I know that I'm not allowed to use this information, so I can easily exclude it. It's really about the structure of your data as well.

Generative AI has been moving very, very fast; almost every week I'm chasing a new innovation. And Dow Jones, of course, continues to innovate in business intelligence. What do you envision as the next big advancement in AI for risk management and decision-making? Not necessarily a prediction, but what are the things you are looking out for?

I was going to say, if I could predict the future, that would be super easy. That's not a bad question, though. If I think about the more immediate future, the one thing I'm excited about is this: up until now, the way we've looked at this is very much either structured data or unstructured data. The next step is combining the two. We have this vast amount of structured data and this vast amount of unstructured data. Can we merge them and then put the same search capabilities on top? If you manage to do that, you create a very intelligent platform.

So what is the one question you wish more people would ask you about AI and data in Factiva?

I think what is interesting in those conversations is that a lot of the discussion revolves around the technology. An interesting question to ask is: what is the importance of the human in all this?

I'm going to ask you that question. What's the importance of the human in the whole conversation?

Thank you. A couple of years ago at Dow Jones, we started using the term authentic intelligence. Authentic intelligence is meant to show that artificial intelligence alone isn't the answer, and we've seen multiple examples of that. You still need human intelligence and human knowledge, for example, to decide which sources are good input. From an output perspective, if you're testing the output, you need to be able to decide what good looks like. Right now, a machine is unable to do that. So the way we look at it is: what can the machine do best? Let the machine do that, and free up the human to do deeper investigations, for example, to bring more value to our clients.

So it's about judgment, then. That brings me to my traditional closing question about what good looks like, but I'm actually going for: what does great look like for Factiva, enabled by this new age of generative AI?

I think what good looks like goes back to what I was saying earlier: the challenge of combining structured with unstructured data. But I also think that if we manage to stay true to our core principles of relying on trustworthy information, being transparent with our publishing partners, and compensating them fairly, we've done a good job.

Ingrid, many thanks for coming on the show and spending quality time with me to talk about Factiva and the use of generative AI. And of course, congratulations on the Gemini models; I'm looking forward to seeing some of those features. In closing, I have two quick questions. Any recommendations that have inspired you recently?

I'm going to go with a book. I read it a couple of years ago, but unfortunately, it's still the best book I've read since then: A Little Life. A Little Life is a very depressing book, but it's very well written; you can feel the anxiety seeping through the pages. What it made me realize is that life is short, so enjoy it. That would be my recommendation, even though that wasn't necessarily the story, because the story was the opposite of that. But I think it showed the importance of doing just that.

A pretty interesting book, given that we have just had the conclusion of the US presidential election, but we will not talk about that. So, my final question: how can my audience find you?

I am on LinkedIn, so that is one way of finding me. You will also find a couple of interviews on YouTube; I did a very famous one a couple of years ago that people still watch, which is interesting. And right now I'm really interacting with the publishing community, because we believe that by working together as a community, we stand a better chance of being fairly compensated for all of our content.

Thank you so much. And of course, you can find us on all the podcast platforms, and also on YouTube and Spotify. And subscribe to our newsletter, whether on LinkedIn or on our main site. So Ingrid, thanks for coming on the show.

Thank you very much. I look forward to continuing the conversation.

Great, thank you.
