You're listening to Data Skeptic: Graphs and Networks, the podcast exploring how the graph data structure has an impact in science, industry, and elsewhere.
Welcome to Data Skeptic: Graphs and Networks. Today on the show, we're going to cover the concept of actantial networks. If you don't know what that is, don't worry; that's a big part of the interview. The TL;DR is that an actantial network is a rather novel and interesting way of going from natural language to a network structure, from which we can study the narratives in a discourse.
Our guest today, Armin, looked at both social media data and official political discourse documents from the European Union. His techniques apply well in both areas. So without further ado, let's jump right into the interview.
My name is Armin Pournaki. I'm a joint PhD candidate at the Max Planck Institute for Mathematics in the Sciences in Leipzig and at the Laboratoire Lattice and the Sciences Po médialab in Paris. And can you share a few details on what you're studying? In my PhD thesis, which sits within the field of computational social science, I am developing methods to extract and analyze narratives in large textual corpora using a combination of natural language processing and network science.
The goal is to better understand the role that narratives play in socio-political phenomena such as polarization and issue alignment. Could we zoom in on the word "narratives"? What does that mean in the context of your research? In my context, I'm mainly interested in political narratives and their functions in political communication, and potentially also in political persuasion.
I conceptualize narratives in a way that then allows me to operationalize the concept using methods from network science. The idea is to treat narratives, and political narratives in particular, as a representation of political reality. This representation works through actors that have specific goals and motives.
These actors take part in events, which might induce changes in the state of the world, and these events can in turn be decomposed into relationships between actors. We try to recover such narrative constructs, which are essentially causal connections between events, from large textual corpora.
What we rather look for is something that my colleagues and I call narrative signals. These can be, for instance, specific actors or specific catchphrases that allude to a larger political narrative. The goal we set ourselves is to extract precisely these narrative signals and then reassemble them in the form of networks, so that we can make sense of the underlying narratives present in the corpus.
Well, in your definition, you said a political narrative is a representation of political reality. How does that work in a world where reality itself seems up for debate? This is precisely the point; I think this is exactly where narratives come in. We can see narratives here as different interpretive lenses through which this political reality is viewed.
The assumption is that in highly polarized debates, where two groups discuss the same event in totally different ways, there are conflicting narratives about the same reality: conflicting in the sense that different roles and different relationships are ascribed to the actors. Well, you mentioned doing this on large corpora. What sort of corpora are interesting to you? There's political discourse in lots of different places.
That's right. Yeah. Right now we look a lot at social media data. We analyzed a very large corpus of Twitter data: every day, we collected the top five trending topics and all the tweets associated with those specific hashtags or keywords. This gave us a corpus of roughly 20 million tweets. It also allowed us to take measurements using retweet networks, and this is probably where we will come to the concept of networks.
Well, the topic of this podcast season is graphs and networks, so I obviously want to get there. But before we jump into the network science components, can we spend a little more time on the NLP side? What are some of the methodologies that were useful for you in this process? The main thing I use natural language processing for in this framework is to extract these narrative signals from raw text.
So what we need is a framework that allows us to easily extract, from a given sentence, the actors, their roles, and their relationships. If there's only one actor, I want to know what they're doing. In computational narratology, this concept of narrativity has been summed up as: someone tells someone somewhere that someone did something to someone somewhere at some time for some reason. This is from a paper by Piper and colleagues. Great title.
Yeah, that's not actually the title of the paper; it's just the way they define narrativity. The paper is called "Narrative Theory for Computational Narrative Understanding." I can send it to you afterwards. Sure, that'd be great. So essentially what we want to extract is who does what to whom in a sentence, and for this there are many approaches we can take.
There are more traditional approaches using dependency trees, or you could use approaches based on semantic role labeling. The framework that I chose together with my colleague Tom Willaert is based on Abstract Meaning Representation, or AMR. The idea here is that you express the meaning of a sentence in a formal language defined by the authors of this framework.
It is fairly similar to syntactic parsing, but it has some benefits that make it more suitable for extracting narrative signals. AMRs are represented as rooted, directed acyclic graphs, with the predicate at the root and directed edges pointing to its different arguments. This graph structure is very powerful because it lets us easily decompose a sentence into its different subclauses.
We use it as an intermediary step to transform a large corpus into an easily queryable database. We go from this graph format (many, many graphs, one per sentence) to something like a tabular format, which then lets us easily query the corpus for the main actors, the predicates or verbs associated with them, and their roles. From that we can create higher-level representations in the form of what we call "actantial graphs," where every node is an actor in the narrative and the links between them denote their relationships.
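A minimal sketch of the graph-to-table step, under stated assumptions (this is not the authors' code): each parsed sentence is taken to be a list of PENMAN-style triples `(source, role, target)`, the shape the `penman` library's `decode()` exposes through `Graph.triples`. The helper `flatten_amr`, the `Row` layout, the predicate heuristic (any concept with a PropBank-style `-NN` sense suffix), and the example parse are all hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple

# Hypothetical heuristic: PropBank-style concepts look like "adjust-01".
PREDICATE = re.compile(r"^[a-z][a-z-]*-\d{2}$")
# Core argument roles (:ARG0 agent, :ARG1 patient, ...).
CORE_ROLE = re.compile(r"^:ARG\d$")


class Row(NamedTuple):
    sentence_id: int
    predicate: str
    role: str
    actor: str


def flatten_amr(sentence_id, triples):
    """Turn one sentence's AMR triples into (predicate, role, actor) rows."""
    # Map each variable to its concept, e.g. "p" -> "physicist".
    concept = {src: tgt for src, role, tgt in triples if role == ":instance"}
    rows = []
    for src, role, tgt in triples:
        if not CORE_ROLE.match(role):
            continue  # skip :instance and non-core roles such as :time
        pred = concept.get(src, "")
        if PREDICATE.match(pred):
            rows.append(Row(sentence_id, pred, role, concept.get(tgt, tgt)))
    return rows


# Hypothetical parse of "The physicist adjusts the experiment."
sentence_0 = [
    ("a", ":instance", "adjust-01"),
    ("a", ":ARG0", "p"),
    ("p", ":instance", "physicist"),
    ("a", ":ARG1", "e"),
    ("e", ":instance", "experiment"),
]
table = flatten_amr(0, sentence_0)

# Query the flat table: which actors appear most often as agents (ARG0)?
agents = Counter(row.actor for row in table if row.role == ":ARG0")
print(agents.most_common(1))  # [('physicist', 1)]
```

Concatenating the rows from every sentence gives the queryable table; counting agents, patients, or predicates is then a one-line aggregation.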
Let's do a little thought experiment. Suppose we got 20 experts in this formalism, you and 19 others who know how to convert a sentence into this structure, and we asked everyone to translate the same text. Would there be ambiguity amongst these experts? In a way, this is how it works, right? There are very clearly defined rules on how to build the AMR representation of a sentence.
So it's a formalized, clear language. There can be small ambiguities related to some edge cases, but in principle it is a clear mapping. The way these systems work now, and in particular the parsers I'm using, is that they are trained on large numbers of examples.
These are essentially transformer models. You give one a sentence it hasn't seen, and it can infer the AMR representation based on its training. The accuracy is fairly high; I think the benchmarks I've seen are close to 90%. So then we take a large document, let's say, and turn it into a sequence of graphs, I guess?
Exactly. We cut the document into sentences, because AMR is, in principle, limited to the sentence level. People are working on this, and I think multi-sentence AMR parsers also exist. But we first cut the corpus sentence by sentence.
Then we generate an AMR graph for each sentence and reassemble them all into this large table. I think that's the process of getting towards, and I may not say this correctly, the actantial network? That's right, yes. Can you talk about that transformation? How do we go from a sequence of graphs to this presumably more useful network representation?
So the one issue that we have, and this is in general a problem when you deal with large data, is that you have a lot of
sentences that don't necessarily carry any narrativity or, to say it more clearly, that aren't really part of the narrative. One approach to mitigate this is to try to extract only the strongest signals, in the sense of the actors that appear most often or the actions that are repeated most often.
For this, depending on the corpus, we have different options. With social media data, for instance, we could focus only on the relationships between actors that are retweeted or liked the most by the community. If a sentence like "the COVID vaccine saves many lives" is retweeted a million times, then it might be an important part of a certain group's narrative. So how do we get to the actantial graph? As I said before, in the actantial graph each node is an actor.
And the directed link between two actors needs to carry some information about the nature of their relationship. Before the actantial graph, though, we can create something like an intermediary network. (I'm using "graph" and "network" a bit interchangeably, which is maybe not very clean.)
Before that, we can just think about the simplest way to use this large sequence of AMR graphs to extract information about the actors and their relationships. One thing we can do is create a directed multigraph where each node is an actor within the narrative.
There is one edge for each action that actor A performs on actor B. So in the case of the example sentence I gave before, there would be a directed edge from "physicist" to "experiment," and this edge would carry the label "adjust."
We do this at scale, for the whole corpus, and we get a very large graph with many edges between many actors. We can see this as a flattened representation of our corpus, because we lose the temporality and the sequence of actions. Of course, in principle, we could also model this as a temporal network, but for now we keep it very simple and treat it as a static representation of our corpus.
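The intermediary directed multigraph can be sketched with the standard library. This is an illustration under assumptions, not the project's implementation: the relation records (`Action`), the `min_retweets` cutoff standing in for the "strongest signals" filter mentioned earlier, and the example concept labels are all hypothetical. Each edge keeps its sentence id, so an edge can later be traced back to the text it came from.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    # Hypothetical record: one extracted "agent does predicate to patient".
    agent: str
    predicate: str
    patient: str
    sentence_id: int
    retweets: int  # popularity signal; use 0 for non-social-media corpora


def build_multigraph(actions, min_retweets=0):
    """Directed multigraph: (agent, patient) -> list of (predicate, sentence_id).

    Every action keeps its own entry, so parallel edges between the same
    pair of actors are preserved; time order is deliberately ignored.
    """
    graph = defaultdict(list)
    for a in actions:
        if a.retweets >= min_retweets:  # keep only widely shared statements
            graph[(a.agent, a.patient)].append((a.predicate, a.sentence_id))
    return dict(graph)


# Illustrative data only; concept labels are made up for the example.
actions = [
    Action("physicist", "adjust-01", "experiment", 0, 12),
    Action("physicist", "design-01", "experiment", 1, 3),
    Action("covid vaccine", "save-01", "life", 2, 1_000_000),
]
g = build_multigraph(actions, min_retweets=10)
print(g)
```

Keeping a list of actions per ordered actor pair is what makes this a multigraph; collapsing each list into a single summary per pair is the reduction step that follows.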
Now, what we typically want, in approaches that should yield interesting hypotheses or research questions for social science, is to take all this data, reduce it, and get the most meaningful signals out of it. The approach we chose in this project of extracting narrative signals with AMR was to ask how we can summarize the relationship between two nodes in a meaningful way. Imagine I have not only "adjust" between physicist and experiment, but many other things that the physicist does to the experiment. Since we were ultimately interested in analyzing conflicting narratives and polarized debates, we also looked at the literature in narratology, specifically the structuralist literature. What we found was that one central, archetypal relationship usually found in narratives is that of the protagonist and the antagonist, or the helper and the opponent.
Usually in folk tales or other narratives from the literary world, there is one protagonist and one antagonist, and you can often summarize tales that way. But in political narratives, it might be a bit more complicated than that.
That's true, yeah. Maybe I'm also overestimating the complexity of political analysis; that might be something to discuss. But the assumption is that there might be more than that. So instead of defining these roles for the whole corpus, we define them relationally between two actors: I can be either your helper or your opponent. What we ask for each edge in this graph is whether it implies a supportive or a conflictive relationship between the two nodes. There are many ways to do this.
In this case, we used the fact that we have PropBank frames for these verbs, and these can be mapped onto another ontology called VerbAtlas, which gives us a higher-order categorization of the verbs linking the two actors. Then we manually hand-labeled each of these verbs, something like 500 of them, as either supportive, conflictive, or neutral. Some were a bit ambiguous, and in those cases we just labeled them neutral.
And then all of a sudden it becomes very straightforward: we have a label for each verb, so we know whether it is supportive or conflictive. We can then compute an edge score between two nodes from the number of supportive versus conflictive actions between them. We take the number of supportive actions minus the number of conflictive actions and divide by the total number of actions. That gives us a score between -1 and +1, where 0 is neutral, -1 is strongly conflictive, and +1 is strongly supportive. On top of this we keep a weight, which is the number of actions between the two actors.
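Written out, for s supportive, c conflictive, and n neutral actions between an ordered pair of actors, score = (s - c) / (s + c + n) and weight = s + c + n. Below is a minimal sketch of that computation. The `POLARITY` lexicon is a tiny hypothetical stand-in for the roughly 500 hand-labeled verbs, and counting neutral actions in the denominator is an assumption based on "the total number of actions."

```python
from collections import Counter

# Hypothetical polarity lexicon: +1 supportive, -1 conflictive, 0 neutral.
# (Stand-in for the hand labels; the real labels are not reproduced here.)
POLARITY = {"help-01": 1, "protect-01": 1, "attack-01": -1, "oppose-01": -1}


def edge_score(predicates):
    """Return (score, weight) for all actions from one actor to another.

    score = (supportive - conflictive) / total actions, in [-1, 1];
    weight = total number of actions. Unlisted verbs count as neutral.
    """
    labels = Counter(POLARITY.get(p, 0) for p in predicates)
    weight = sum(labels.values())
    if weight == 0:
        return 0.0, 0
    return (labels[1] - labels[-1]) / weight, weight


def score_graph(multigraph):
    """Collapse {(a, b): [predicate, ...]} into {(a, b): (score, weight)}."""
    return {pair: edge_score(preds) for pair, preds in multigraph.items()}


# Illustrative helper/opponent example with made-up actions.
edges = score_graph({
    ("helper", "hero"): ["help-01", "help-01", "protect-01", "say-01"],
    ("villain", "hero"): ["attack-01", "attack-01"],
})
print(edges)  # {('helper', 'hero'): (0.75, 4), ('villain', 'hero'): (-1.0, 2)}
```

The neutral "say-01" action pulls the helper's score from 1.0 down to 0.75 while still adding to the weight, which is the trade-off the denominator choice implies.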
This suddenly becomes a much more manageable network, because we have heavily reduced the complexity. Of course, we lose a lot of information. But as a first approach to see what is in the data, we thought it was quite useful, because it allows us to systematically find differences in narratives between two groups.
Now imagine I do this exercise separately for, say, a cluster in a retweet network that is more left-leaning and another cluster that is more right-leaning. Then I can systematically compare the sign of the edges between two actors. If the signs differ, this points to conflicting narratives, in the sense of different relationships being attributed to the actors, and possibly to a difference in the perception of political reality. In the case of COVID, for instance, the edge between the COVID vaccine and safety is not at all supportive in the right-leaning clusters.
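Comparing two groups then reduces to checking edge signs on actor pairs that appear in both graphs. A minimal sketch, assuming each group's graph is a dict from `(actor, actor)` to a score in [-1, 1]; the name `fault_lines`, the `eps` band treated as neutral, and the example scores are hypothetical (the numbers are invented, not results from the study).

```python
def sign(x, eps=0.0):
    """Return +1, -1, or 0, treating |x| <= eps as neutral."""
    if x > eps:
        return 1
    if x < -eps:
        return -1
    return 0


def fault_lines(left, right, eps=0.1):
    """Actor pairs present in both graphs whose relationship has opposite signs."""
    shared = left.keys() & right.keys()
    return sorted(
        pair for pair in shared
        if sign(left[pair], eps) * sign(right[pair], eps) == -1
    )


# Invented scores for illustration only.
left = {("covid vaccine", "safety"): 0.8, ("economy", "growth"): 0.7}
right = {("covid vaccine", "safety"): -0.6, ("economy", "growth"): 0.5}
print(fault_lines(left, right))  # [('covid vaccine', 'safety')]
```

The `eps` band keeps near-zero scores from being read as a sign flip; edges that are neutral in either group are not reported.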
So I think I spoke a bit too long, but this is how we get to the actantial graph. Yeah.
At least one of the corpora you looked at is the State of the European Union addresses spanning more than a decade. Could we talk about the structure of that graph? I don't expect you to know offhand how many nodes it has, but broadly speaking, how big is it? How sparse is it? Could you give us a verbal summary of the graph? Actually, the main idea behind it was that there is already quite an interesting body of qualitative literature analyzing narratives of European integration, including several papers that we cite. In particular, social science researchers have already produced something like a taxonomy, or categorization, of different narratives.
The idea was to extract narrative signals from the corpus inductively using this AMR method, and then see whether we can find traces of these different narratives by looking at the actantial graph and by systematically extracting the actors and so on.
The full actantial graph, if we extract it, consists of 1,572 nodes and 1,778 links. Big and robust, but manageable. It is still fairly manageable, yeah; you can still explore the full graph, and in the paper I link an interactive version. One side product of this approach is that the graph can really be used as a reading device for a corpus. Here we have something like 11 or 12 speeches, and reading all of them is doable, but it takes a while.
It's not like reading 20 million tweets, so you're not yet at big data, right? But if you just look at this graph, you immediately see the central actors, and these actors are connected to narratives and topics. If you see actors like the economy, growth, or the market in one cluster, for instance, this alludes to the neoliberal narrative.
If you see the actor solidarity connected to migration and responsibility, for instance, this might allude to the narrative of inclusion. What this graph perspective gives you is something in between close and distant reading. I'm referring here to Moretti's concept of these different ways of looking at textual corpora. Distant reading would be looking only at a very high-level overview of the text, for instance in the form of the actantial graphs.
Close reading would be actually reading these speeches. With an interactive graph, we can do both. We can use the graph to sample the dataset at exactly the points that interest us. If I want to know, for instance, what these speeches say about the relationship between the European Union and the job market, I can go to that part of the graph and click on the edges. The edge already tells me whether the relationship is positive or negative, but I can also go to the sections of the speeches that give me enough context to really understand what the different speakers express about these relationships.
Well, some of those insights you shared, like perceptions of COVID and of immigration aligning very clearly with the polarity you'd expect given the actors they're connected to, reinforce confidence that the network has the right structure and is surfacing truthful, accurate details. Were there any surprising things that emerged in that process, or things you didn't expect to find?
Yeah, for the European Union, the results neatly confirmed the findings of the qualitative analyses. We find that certain issues are emphasized differently depending on the speaker, but I think this is something that has probably been found before. On social media, it's a bit easier to find surprising edges. Take the topic of climate change on Twitter. As I said before, we compare the narratives connected to different events and issues across different opinion groups, and these opinion groups are extracted from clusters in retweet networks.
When we looked at topics related to climate change, the actantial graphs showed what you would usually expect, related to the events that were salient in Germany at the time: discussions about speed limits on highways, for instance, or about the climate summits. But one thing popped out very strongly in the right-leaning cluster, and that was the actor Lidl, the supermarket chain. This was a bit surprising, because I did not know what role they played in climate change, and it is an actor you do not see at all in the left-leaning narrative.
So we looked more closely at what it is connected to, and there is a very strong, very negative connection between Lidl and meat. When you look at the tweets, it turns out that, I think, the CEO of Lidl was invited to some summit and said that they wanted to invest more in plant-based products and sell more of them. To me, this is a bit of a marketing thing.
But to the right-wing cluster, this was terrible. They claimed that Lidl would now ban meat and that this would be the end of Lidl. Banning meat is, of course, part of a larger narrative of obligations and loss of freedom connected to climate change policies, so this was used as a kind of trigger point to spark outrage and to connect everything to a larger narrative of climate change skepticism. I found it interesting because it was not really covered in the news, yet it was a very salient connection in the right-leaning narratives.
So it's clearly a robust structure, the actantial network. You could probably share it with people in other departments who could run all types of social science studies on it. For your particular work, how does it help you study polarization and issue alignment? About your first point, this is precisely what we're doing right now. We have colleagues who are experts in pro-Russian conspiracy narratives, for instance.
We can show them these graphs and they speak to them, of course, much more than to someone who is not an expert in the field. And then they can connect certain edges that they see to other things that they know from their qualitative research, for instance. So in this sense, this is very powerful.
For our own research, one thing I use these graphs for is to better understand the origins of polarization, or at least of the polarization we observe. We take retweet networks on Twitter, where nodes are users and a directed link from user i to user j denotes that i retweets j. There's a lot of literature establishing that retweeting signals endorsement, so this is something like a robust signal. Quoting is not: when I retweet and write something on top, this usually means I have something against what has been said. But a plain retweet is usually considered endorsement. We can then assume that clusters in these retweet networks correspond to something like opinion clusters in the underlying debate.
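The retweet network can be sketched as a weighted directed edge list. The record fields (`user`, `retweeted_user`, `is_quote`) are hypothetical, not a real Twitter API schema; quote tweets are excluded because, as described above, only plain retweets are read as endorsement. Detecting the opinion clusters would be a separate community-detection step, not shown here.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Tweet(NamedTuple):
    # Hypothetical record layout, not an actual API schema.
    user: str
    retweeted_user: Optional[str]  # None for original tweets
    is_quote: bool = False


def retweet_network(tweets):
    """Weighted directed edges (i, j) -> count, meaning user i retweeted user j.

    Plain retweets are treated as endorsement; quote tweets, which often
    carry commentary against the original, are excluded.
    """
    edges = Counter()
    for t in tweets:
        if t.retweeted_user is not None and not t.is_quote:
            edges[(t.user, t.retweeted_user)] += 1
    return edges


tweets = [
    Tweet("alice", "bob"),
    Tweet("alice", "bob"),
    Tweet("carol", "bob", is_quote=True),
    Tweet("bob", None),
]
net = retweet_network(tweets)
print(dict(net))  # {('alice', 'bob'): 2}
```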
If users retweet each other a lot, this usually corresponds to an opinion group in the underlying debate. We have been working on German Twitter for quite some time now, and also building interactive interfaces to analyze retweet networks. What we usually see for political topics is two camps: something like a center-left cluster and a more right-leaning cluster. So the question we had in this project on issue alignment and polarization was: is the Twittersphere really polarized? Do all the different topics generate polarized retweet networks? And if so, are the clusters we see for each trend or topic always the same? In other words, do different topics sort people into the same clusters or not?
If these clusters persist across topics, this would be strong evidence for issue alignment, in the sense that if I know a user's stance on COVID vaccines, I can perhaps also infer their stance on highway speed limits. What we see is that the clusters are fairly persistent across topics, so we have very strong evidence for issue alignment. In particular, we see two groups of users who might play a very strong role in this. The first group is what we call influencers: the most retweeted users, the ones who generate the content that circulates.
The other important group is what we call multipliers. They are the opposite: they don't create any content themselves, but they retweet very heavily. We divided our user base into these two groups and compared how aligned each group is across topics. We find that both are very strongly aligned, but multipliers, the heavy retweeters, are even more aligned than influencers.
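The influencer/multiplier split and a per-user alignment score can be sketched on toy data. These definitions are assumptions for illustration, not the study's exact measures: influencers are the top-k users by retweets received, multipliers are users with no original posts who retweet at least a given number of times, and alignment is the share of topics in which a user sits in their most frequent camp. Camp labels are assumed to be already matched across topics, so "left" refers to the same cluster everywhere.

```python
from collections import Counter


def influencers(edges, top_k):
    """Top-k users by retweets received; edges map (retweeter, author) -> count."""
    received = Counter()
    for (_, author), n in edges.items():
        received[author] += n
    return {user for user, _ in received.most_common(top_k)}


def multipliers(edges, original_posts, min_retweets):
    """Users with no original posts who retweet at least min_retweets times."""
    given = Counter()
    for (retweeter, _), n in edges.items():
        given[retweeter] += n
    return {
        u for u, n in given.items()
        if n >= min_retweets and original_posts.get(u, 0) == 0
    }


def alignment(camps_by_topic):
    """Share of topics in which a user sits in their most frequent camp.

    1.0 means the user lands in the same camp on every topic.
    """
    counts = Counter(camps_by_topic)
    return counts.most_common(1)[0][1] / len(camps_by_topic)


# Toy data (hypothetical users and counts).
edges = {("m1", "inf"): 40, ("m2", "inf"): 30, ("inf", "x"): 1}
posts = {"inf": 25, "x": 3}
print(influencers(edges, top_k=1))  # {'inf'}
print(multipliers(edges, posts, min_retweets=20))
print(alignment(["left", "left", "left", "right"]))  # 0.75
```

Averaging `alignment` over each group would then give one number per group to compare, which is the kind of comparison described above.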
What this showed us, and we then looked at this in detail for certain topics, is that there is a strong core of users whose retweeting behavior plays a role in aligning the whole discourse into ideologically consistent bundles. Across issues, they send a very strong signal, almost as if following some kind of agenda, into one camp or the other, much more so than the influencers do. So the natural question that arose was: if we see this in the structure, what can we say about the discourse, about the tweets behind all of this? In the end, we were only using the affordance of the retweet to build this whole analysis, and the text itself had not yet been looked at in this project. So the next step was to look at the actual text.
Since we already had this division into camps (we use a measure of user alignment across topics that divides the user base into these two groups), we can ask, for each trend or subcorpus and for each tweet, whether it was retweeted more by the left or by the right. We then set a threshold and create two corpora, which we can analyze using the narrative method.
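The corpus split can be sketched as a vote among each tweet's retweeters. The threshold value, the function name, and the choice to drop mixed or unlabeled tweets are hypothetical, since the exact rule isn't spelled out here; each user's camp label is assumed to come from the alignment step.

```python
def split_corpus(retweeters_by_tweet, camp_of, threshold=0.8):
    """Assign each tweet to a 'left' or 'right' corpus by who retweeted it.

    A tweet goes to a corpus if at least `threshold` of its labeled
    retweeters belong to that camp (threshold should exceed 0.5);
    mixed or unlabeled tweets are dropped.
    """
    corpora = {"left": [], "right": []}
    for tweet_id, retweeters in retweeters_by_tweet.items():
        camps = [camp_of[u] for u in retweeters if u in camp_of]
        if not camps:
            continue
        left_share = camps.count("left") / len(camps)
        right_share = camps.count("right") / len(camps)
        if left_share >= threshold:
            corpora["left"].append(tweet_id)
        elif right_share >= threshold:
            corpora["right"].append(tweet_id)
    return corpora


# Hypothetical users, camps, and tweets.
camp_of = {"u1": "left", "u2": "left", "u3": "right", "u4": "right"}
result = split_corpus(
    {"t1": ["u1", "u2"], "t2": ["u3", "u4", "u1"], "t3": ["u1", "u3"]},
    camp_of,
    threshold=0.6,
)
print(result)  # {'left': ['t1'], 'right': ['t2']}
```

Each resulting corpus can then be run through the same AMR extraction to produce one actantial graph per camp.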
Then, for each topic, we can generate two actantial graphs, one for the left and one for the right, and systematically compare them. Take the topic of Ukraine, or the topic of COVID. First of all, are they talking about the same actors?
If they are, do they connect them in the same way or not? If we want, we can even use graph distance measures to quantify this formally. So we have these two actantial graphs.
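One simple way to make "same actors, same connections?" quantitative is with Jaccard distances. This is an illustrative choice, not necessarily the graph distance used in the project: it compares the two actor sets, the two sets of directed edges, and the share of shared edges whose signs flip. Graphs are assumed to be dicts from `(actor, actor)` to a signed score, and all names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; 0.0 for two empty sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def compare_graphs(g1, g2):
    """Three coarse distances between two signed actantial graphs.

    g1, g2 map directed (actor, actor) pairs to scores in [-1, 1].
    """
    actors1 = {x for pair in g1 for x in pair}
    actors2 = {x for pair in g2 for x in pair}
    shared = g1.keys() & g2.keys()
    flipped = sum(1 for p in shared if g1[p] * g2[p] < 0)
    return {
        "actor_distance": jaccard_distance(actors1, actors2),
        "edge_distance": jaccard_distance(set(g1), set(g2)),
        "sign_disagreement": flipped / len(shared) if shared else 0.0,
    }


# Toy graphs with made-up scores.
g1 = {("a", "b"): 0.5, ("b", "c"): -0.2}
g2 = {("a", "b"): -0.4, ("c", "d"): 0.3}
print(compare_graphs(g1, g2))
```

Here the two graphs share most actors (actor distance 0.25) but only one edge, and that shared edge flips sign, which is the "same actors, different connections" pattern.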
We can systematically compare them and look at the main actors involved. As a side product, we also observed that the groups tend to speak more about their political enemies than about their political allies.
This has been shown previously in the literature as well. We know that this happens, but let's actually look at how they speak about them in this narrative structure, the actantial graph. How do they emplot them into their narrative? Then we can systematically compare this for certain issues. This is ongoing work; we're currently working on it and I'm writing the paper.
For certain issues, the narratives are conflicting in the sense that the same actors are involved, but they are connected totally differently and play a different role in the narrative. In the example of the Russian invasion of Ukraine, in the left-leaning narrative it is very clear that NATO, for instance, is trying to prevent the war, and that Germany should send weapons so that Ukraine can defend itself, and in particular also defend our freedom.
In the right-leaning narrative, by contrast, it is completely the opposite. NATO is profiting from the war; in fact, they call it a NATO war. Ukraine is defending freedom, but not our freedom, only its own. These snippets translate into opposing signs on relationships in the actantial graphs. So this representation allows us to easily find what we call the fault lines in polarized debates.
We can easily identify the discursive markers and the conflicting narratives that might, on the one hand, lead to polarization, but that might also simply be explanations for the assumed opinion on a given topic.
What's next for you and for the project? Next for me is submitting my thesis, which will happen in two or three weeks. Then there are the papers: some are currently under review and some have been accepted. There's also the one about the ongoing work I just talked about, which will be submitted very soon. And there are a few conferences this summer where I will present all of this work. After that, I'll see where I go next.
And is there anywhere listeners can follow you online? I usually post new results on my website, but you can find me on Twitter and Bluesky as well.