You're listening to Data Skeptic: Graphs and Networks, the podcast exploring how the graph data structure has an impact in science, industry, and elsewhere. Asaf, I know you've started teaching. Can you tell listeners a little bit about what you're teaching and some of the details around the course? You know what they say, those who can't do, teach. So, and it's really hard to do things with network science.
Because, well, there are a few reasons for it. Because network science is multidisciplinary, right? And maybe too multidisciplinary. And it's so relevant to so many fields. You have to be an expert in each field to show people why network science is so good for them. I decided it was just too difficult. So I decided to teach instead.
What I'm teaching, well, not what the students expect, I guess. Because I'm a network analyst, I'm trying to teach them how to use networks in practice and how to think in networks. When I saw the different syllabi, right, plural of syllabus,
Sure. Okay, cool. I see people usually start with Euler's seven bridges of Königsberg example. Sure. It's a requirement, right? Start with the seven bridges. Yeah, right. And then start mentioning the different algorithms that there are.
The problem is that, okay, you learn about different algorithms you can use on networks or graphs, but you don't understand why you need them and how to use them. Okay, so you understand the formula, but you don't know the relevance.
Relevance in context is the secret sauce that networks add to your data. To understand the connections where you are in the network, it's really all about context. Well, can we frame that maybe from the perspective of social networks? Many people are engaged in one or more social networks. They kind of understand they're connected to people. How can they think about it from the lens of network science?
I was reading a paper. I have a train of thought here and I'll get to your question. But recently I read a paper about influence. It's called, I have it here actually, The Spread of Behavior in an Online Social Network Experiment. That's not a new paper, but it's a very interesting one. It's about the influence that passes through a network.
What the paper shows is that the structure of the network influences the influence. He constructed two networks and randomized the participants between them and actually controlled for other variables. To mitigate the effect of other variables, the only difference between the two networks was their structure. One was randomly connected and one was more clustered.
more similar to real-world networks. What he showed was that there can be more influence in a clustered network than in a random network. When you think about the structure of social networks, we usually think of it as random, right? I met this guy at a party somewhere. It was by chance, right? I
talked with this guy I haven't talked with in years because we met on the street. So many of our interactions are random or seem to be random. I think I know where you're going with this, but yes, go on. Seems to be random to the untrained eye. Seems to be, right. But I think when you understand networks, I think there's, you know, I would say a philosophical aspect to it.
When you look at network science and the discoveries that were at the base of network science, like the small-world law in networks, that networks have... It's kind of a paradox, right? The bigger the network, the more shortcuts there are between people on the network. The power-law distribution or the long-tail distribution of the degree, the number of connections,
that different nodes have in the network, right? There are a few with many and many with just a few.
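The "few with many, many with just a few" pattern can be made concrete with a minimal sketch in plain Python. It assumes a preferential-attachment growth rule (new nodes prefer already well-connected nodes), which is one common way to get a long tail and is not something described in this conversation; the function names, sizes, and seed are illustrative.

```python
import random
from collections import Counter

def preferential_attachment_edges(n, m, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    picked with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small, fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears here once per connection it has, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def top_share(edges, fraction=0.10):
    """Share of all connections held by the best-connected `fraction` of nodes."""
    degree = Counter(node for edge in edges for node in edge)
    ranked = sorted(degree.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

edges = preferential_attachment_edges(n=2000, m=2, seed=42)
print(f"Top 10% of nodes hold {top_share(edges):.0%} of all connections")
```

In a network where everyone had the same number of connections, the top 10% would hold exactly 10% of them; the gap between that and the printed figure is the long tail.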
The community structure in networks, the clustering coefficient in networks. I don't think we talked about it. No, can we go into clustering coefficient? Yeah, just in a sentence: the clustering coefficient measures the number of closed triads in a network, how many closed triangles you have relative to how many there could be. So a triad is just three people that are all connected, like mutual friends. Exactly. And all of these
phenomena we see in networks we don't see in random networks. When we compare it to... We do see them in real-world networks, okay? And we understand it's not random because when we compare real networks to random networks, only the real ones show these phenomena. It makes you think, right? If it's not random, what is it?
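The comparison against a random network can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a ring lattice stands in for a clustered network (real-world data would replace it), the random network has the same number of nodes and edges, and the clustering coefficient used is the average of each node's fraction of neighbour pairs that are themselves connected. All names and sizes are illustrative.

```python
import random
from itertools import combinations

def ring_lattice(n, k):
    """Clustered stand-in: each node is linked to its k nearest neighbours on a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def random_graph(n, num_edges, seed=0):
    """Same number of nodes and edges, but the edges are placed at random."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    added = 0
    while added < num_edges:
        a, b = rng.sample(range(n), 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj

def average_clustering(adj):
    """Average, over nodes, of the fraction of neighbour pairs that are connected."""
    total = 0.0
    for nbrs in adj.values():
        if len(nbrs) < 2:
            continue  # no possible triangle around this node; counts as 0
        pairs = list(combinations(nbrs, 2))
        closed = sum(1 for a, b in pairs if b in adj[a])
        total += closed / len(pairs)
    return total / len(adj)

clustered = ring_lattice(200, 6)        # 200 nodes, 600 edges
baseline = random_graph(200, 600, seed=1)
print("clustered:", round(average_clustering(clustered), 3))
print("random:   ", round(average_clustering(baseline), 3))
```

The two graphs have identical size and density, so any difference in the printed values comes from structure alone, which is the point of comparing against a random baseline.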
What I'm trying to pass along to the students is the feel for networks. By that, I'm using lots of real-world examples of how to look at networks, how to see networks, and less about the theory. Practical things. If I can say about... We talked a lot in this series about graph theory and graphs. You know...
I don't practice graph theory. It's funny because to me, as a computer scientist, that's the term for everything. And network science is a part of it. Yes. That's why I always say you complete me.
Graph theory doesn't explain networks. It utilizes them. It uses networks and graphs as a tool. Even the concept of graph. Graph isn't a network per se, right? It's a drawing of some kind. Well, it's a terrible vocabulary word in that sense. Yeah, people think it's a data visualization, not nodes and edges. Exactly, but I think graph theory uses networks as a tool.
A graph of some sort, right? As a tool. But it doesn't explain the phenomena. It doesn't give you insights about the data. So that's why I prefer the network science and the SNA, the social network analysis perspective. Speaking of which, we looked at the entire series and tried to think, what haven't we covered, right? Good question.
Yeah, it was yours. So I just repeat what my boss says. Always a good idea. I thought, well, I don't think we covered epidemiology, right? The spread of diseases, except for a small part in a video game, I think.
You know, it was really a very big issue for us, like not so long ago, right? It was like two long years. And network science had a lot to do with it. And I think not enough because many governments, I think, didn't, you know, I can't blame anyone because it was like, you know, one of a kind event, right?
And there were lots of stakes there. But still, I think network science could have been used more in more countries to stop COVID. And we didn't mention neuroscience. Actually, Barabasi, the lord of the networks, called neuroscience, I think, the last frontier of network science. Quite a dense network there in the brain, yeah.
Cyber networks, we just, I think, brushed, we didn't touch it. What's a cyber network? No, the aspect of cyber in networks, cyber attacks, defense, and so on. Yes, yes, yes. That's talking about network science. When I...
look at SNA, social network analysis. You know, there are many fields that use SNA. History studies use some SNA. There's some cool stuff there. And we didn't even mention blockchain. You know what? I think it's a good idea we didn't mention blockchain. Yeah. Few topics disinterest me more than blockchain.
Indeed. And I think you can use network science to show why blockchain is such a bluff. Oh, now that's interesting. I'd love to cover that. Do tell.
You know, what's the claim to fame of blockchain or crypto? Well, no, I think that let's be more of a steel man here and describe it in the kindest way: that it's a technology for an open ledger and it's a computing platform upon which clever things could be done in public. Okay. Okay. Let's, for argument's sake, let's say it's true. Sure. Well, like...
I'll give you a quick 30-second example. There was this idea to get rid of ticket scalpers, that in order to use your ticket for an event, you had to show your true identity. And if you wanted to transfer it, you could. You could sell it, but that some large percentage of that would go to the artist, not to the scalper. So in which case, it's fine. If the artist gets 50% of the money, let it be resold five times, and they cash in each time. That sounds good to me.
Yeah, I don't know if the problem is such an acute... Does it require the blockchain? That's a good question. You can say that. But
What I'm talking about is like when you talk about cryptocurrency that runs on blockchain. The claim to fame of crypto is that it's not centralized. It's a decentralized coin. That's what they say. And because it's decentralized, it belongs to the people. It belongs to the people.
It belongs to the miners. Yeah. Yeah. And what people don't know, and when you think in networks it's crystal clear, is that the crypto network is not decentralized. It's actually, I guess, the most centralized currency in the world. Right. Because as you said, it belongs to the miners. And how many miners are there?
So, and, you know, there are lots of papers about it and I guess it won't come as a shock to you that 10% of the miners control 90% of the market, right? Because it's the basic long tail distribution that we know from networks.
What people might not know is that these 10% of the miners, most of them are, of course, in China. So if your claim is it's the most centralized, we have to compare that to regular currency, right? How do we measure their level of centralization? Yes. Let's say we call it, I think, fiat, right? The currency held by governments. Usually when I hear that word, I become disinterested, but go on.
Well, I know I look like a big finance guy, but I'm not. I'm not. I'm really not. I'm looking for loose change in the couch like everyone else. No, I trust you. What I'm saying is that a currency is influenced by lots of things.
The government decides, but really it's not, right? There are markets and it's a complex system. But with cryptocurrency, the problem is that you can change the market completely, right? If a central crypto exchange or a user, okay? We know there are very few because of the long-tail distribution.
If they decide to manipulate the currency, they can. They can easily do it. And I guess they do. And I guess they do. Enough on blockchain. I actually want to go back to one of your comments about the pandemic. I don't know if this comment made the air, but made our airwaves. But you commented to me that for a time, all the network science conferences were nonstop COVID-19 papers in some sense. And we did a few, you know, little series. What would we call it?
I don't even remember the title. We did a short series on things here on the podcast about COVID and the pandemic. It just seemed to be a topic that was exhausted a little bit. So the analysis is out there. People do tons and tons of analysis, interesting things to say, but I'm more interested in looking forward. So we could say, what could we have done better? Maybe we should phrase it, what will we do better next time?
Where does the effort lie? Should it be in contact tracing or what aspects of network science can be most protective and helpful to society? Well, I think at the beginning of our talk, I said the problem with network science is you are a jack of all trades but master of none. Well, let's put a team together. I certainly qualify for a master of none. So I'm...
I don't, I'm not an epidemiologist. Okay. I don't know. But from my very narrow point of view, I could say that I think models used by network science were very popular. I think they were used and used well, at least where I saw it in my country.
But I think contact tracing was very... They didn't do contact tracing well, and I think a few countries did. And I think the problem is network thinking. I thought we were encouraging that. Yeah, no, I think...
The lack of network thinking. When you look at contact tracing, what you get, you get lists, lists upon lists of contacts. And if you think in networks, you know that most of them are, well, I won't say garbage, right? Must be false positives all over the place. No, no, no.
Let's say people say they're real contacts, but what I'm saying is most of them, garbage is not the word, they are on the long tail. They don't matter to the network. They are the least important nodes in the network, most of them. How do I know it? Because that's how networks are. Most of the contacts are not important to the network.
They don't influence the network. Let's use me as an example then. When the pandemic hit, it was very simple for me to transition into remote work. I shut down the Data Skeptic office. Everyone went remote. I've still never gotten COVID and I isolated
you know, almost perfectly for the worst of it because I was capable of that, right? My job, et cetera, et cetera, was possible. So I'm in the long tail, right? Because I didn't contribute anything to the network. And you like graph theory and statistics. So no, no, no. You know, I see a pattern here, right? Of seclusion.
But what about a DoorDash worker, as an example? How do we envision them in the network? How do we envision? Sorry, I probably used a local term. A food delivery person. Ah, DoorDash. DoorDash. I didn't hear. What you're looking for are the super spreaders, the potential super spreaders in the network. And what we know for sure is that there are just a few of them. A few can be, you know, when you look at an entire state, it's...
It's not so few, but still, you look for the, let's call it the 1%, not the other 99%. What you need to do is not go over all the lists in a FIFO manner, first in, first out manner, right? You need to look for the ones with the highest degree and then take care of them and their contacts.
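The difference between working a contact list first in, first out and working it by degree can be shown with a small sketch. The names and reports below are invented for illustration, and "degree" here simply means the number of distinct people someone is linked to in the reports.

```python
from collections import defaultdict

# Hypothetical contact reports in arrival order: (case, reported contact).
reports = [
    ("ana", "ben"), ("ana", "cy"), ("dov", "eli"),
    ("fay", "gil"), ("gil", "ana"), ("gil", "hal"),
    ("gil", "ida"), ("gil", "jon"), ("ida", "kim"),
]

def build_contacts(reports):
    """Undirected contact network: person -> set of people they are linked to."""
    contacts = defaultdict(set)
    for a, b in reports:
        contacts[a].add(b)
        contacts[b].add(a)
    return contacts

def fifo_order(reports):
    """Handle people in the order their names first show up in the reports."""
    seen = []
    for pair in reports:
        for person in pair:
            if person not in seen:
                seen.append(person)
    return seen

def degree_order(contacts):
    """Handle the best-connected people first (ties broken alphabetically)."""
    return sorted(contacts, key=lambda p: (-len(contacts[p]), p))

contacts = build_contacts(reports)
print("FIFO:     ", fifo_order(reports))
print("By degree:", degree_order(contacts))
```

In this toy data the best-connected person is only reached seventh in arrival order, while the degree ordering puts them first, followed by their highest-degree contacts.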
I don't think this message hit the target. I don't think people understood it. We saw it, I think, in the papers, right? I saw some articles about how 10% of people are responsible for 90% of the cases in COVID-19.
And people were shocked to hear about it, but people who work with networks know it by heart. And I think it didn't trickle into the minds of the people in charge. I think that's the main thing I would change in the next pandemic.
Rain time. Rain time, yeah, rain coming in. An example I used in class was a brand, a company's brand. Try to look at the brand of Ford, specifically the F-150 EV. There are lots of posts about the Ford brand and the F-150 and so on, and you get lots of posts.
But not all of them are relevant, right? You need to find the relevant ones. And what we're looking for are, let's say, influencers. So how do we do it if there's lots of noise, right? So the first thing data scientists do, right, is to clean the data, right? To look for texts that are not relevant, look for sentiment and so on. But you don't need to do it with networks, right? Because networks can...
find the signal themselves. What the network does is this: posts that are not relevant won't be in the same cluster as the other ones. What we know about networks is that when we open up a network, what we'll see is a giant connected component and lots of small connected components that are not
connected to the main component, of course, making them not relevant, right? Because they are very small, not significant on the network itself. So we look at the giant connected component and if it's there, it's relevant.
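Keeping only the giant connected component is straightforward to sketch. The example below is a minimal version in plain Python; the post IDs and the links between them (say, replies or shared hashtags) are hypothetical, and a real analysis would build the edges from actual data.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical links between posts.
links = [
    ("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p4"),
    ("p4", "p5"), ("p5", "p6"),   # the main conversation
    ("q1", "q2"),                 # a stray pair
    ("r1", "r2"), ("r2", "r3"),   # another small island
]

components = connected_components(links)
giant = components[0]
print("giant component:", sorted(giant))
print("dropped as noise:", [sorted(c) for c in components[1:]])
```

No text cleaning or sentiment step is involved; the small islands fall away purely because nothing connects them to the main conversation.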
What network science allows us to do is because we know networks are built by, are made of communities, we can find the community that interests us. So in the case of Ford, the whole giant connected component was relevant. So we can say what's outside of it,
Doesn't matter. But sometimes we need to be more specific. We want a community that maybe can be defined by a geographical location or specific interest in our product or something. Networks by themselves help us to get rid of the noise.
And I think the same goes for contact tracing, right? We'll have a giant connected component and I guess it won't be so giant because we can't follow people, as you said, with GPS and so on. But it will be the largest connected component on the network.
We need to look for the people with the highest degree, with many contacts that influence the rest of the network. I'm simplifying it. It doesn't have to be the high degree nodes.
Just keep things simple. Look for the central people in this network and there's your signal. And those are the people who are most contributing to spreading? Is that the right interpretation? Yes. Following the information you have.
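The remark that the central people need not be the high-degree ones can be illustrated with betweenness centrality, which counts how many shortest paths run through a node. Betweenness is an assumption here, one of several centrality measures that could be meant, and the toy network (two tight groups joined by a single go-between) is invented. The sketch uses Brandes' algorithm for unweighted graphs.

```python
from collections import deque
from itertools import combinations

def betweenness(adj):
    """Unnormalised betweenness centrality for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: c / 2 for v, c in score.items()}

# Two tight groups of four, joined only through the go-between "x".
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
edges += [("a1", "x"), ("x", "b1")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bt = betweenness(adj)
for node in sorted(adj, key=lambda n: -bt[n])[:3]:
    print(node, "degree", len(adj[node]), "betweenness", bt[node])
```

The go-between has the fewest connections in the whole network, yet every path between the two groups runs through it, so a degree-only ranking would put it last.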
It's better to have some information than no information at all. Well, some truthful information, it depends. It's funny you mentioned that because when I talk about social networks and especially when you actively build a network, like in the case of organizational network analysis, sometimes you build the network by asking people, who do you meet on a daily basis and so on.
Whom do you go to for advice? And people say, well, people can lie. Even when people lie and say they meet lots of people and they are popular and so on,
networks can easily detect it, right? You don't have to go far, right? The whole idea of PageRank, or even in-degree, in-degree centrality, right? It just looks at how many edges point to you. You know, you can't change it. You can't lie about it. The network puts you in the right context. And as I said, context is a very strong word because...
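The in-degree idea can be sketched with a directed survey network: what someone claims is their out-degree, while what others say about them is their in-degree. The names, the answers, and the cut-off of three claimed contacts are all invented for illustration.

```python
from collections import Counter

# Hypothetical survey answers: (respondent, person they say they meet daily).
nominations = [
    ("lee", "mia"), ("lee", "noa"), ("lee", "oz"), ("lee", "pat"), ("lee", "raz"),
    ("mia", "noa"), ("noa", "mia"), ("oz", "noa"), ("pat", "noa"),
    ("raz", "mia"), ("pat", "mia"),
]

claimed = Counter(src for src, _ in nominations)    # out-degree: what you say
confirmed = Counter(dst for _, dst in nominations)  # in-degree: what others say

people = sorted(set(claimed) | set(confirmed))
for person in people:
    print(f"{person}: claims {claimed[person]}, named by {confirmed[person]}")

# Illustrative rule: many claimed contacts, but nobody names you back.
overclaimers = [p for p in people if claimed[p] >= 3 and confirmed[p] == 0]
print("claims not backed by anyone else:", overclaimers)
```

A respondent controls only their own outgoing edges; the incoming ones come from everybody else's answers, which is why in-degree is hard to inflate.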
Network science is all about... Networks are all about context. Let's put the liars out of the pool and then just talk about...
false negatives. So maybe the true super spreader was Kyle, the checkout clerk who doesn't wear his mask and, you know, all this kind of stuff. And when you ask people who happened to have shopped at that store that day, where the checkout person coughed on them, they don't report the checkout clerk because that's not in their list of names. They list their friends, their coworkers that they saw, and so-called important people. What if the unimportant person is the spreader? Again,
I'm not an epidemiologist, okay? But the example you gave is interesting because I don't know if you can get it from, you know, just a brief encounter, right? You need to spend some time,
I guess, but nevertheless, okay, so you probably miss someone. You probably, it's not probably, you most definitely will not have the entire contact network. That's a given. But if you push enough buttons on the network, suppress enough central nodes in this network,
I think you'll have a better chance of, you know, you can't stop the disease entirely, right? But you can mitigate it. You can try to slow it down a bit. I think that's what we are aiming for, right? The funny thing is that organizations, at least in my country, kept doing contact tracing, but they didn't do it efficiently. They did it,
you know, maybe they did it efficiently, but they didn't do it effectively. People did do contact tracing. I got COVID and they asked who I had come in contact with and so on and so on. If you do it, do it properly, do it effectively. Well, let me jump way back to the question I probably should have led with some time ago regarding the course. What is the title of the course? Is there a number and where is it taught? Yes.
The course is called Complex Networks and the reason for it is that they forgot to change the title. What was it going to be? I would have called it Telling a Story with Networks. Okay.
Because I think networks are a great way to tell a story about the data. And I think when you have data, what you actually want to do is tell a story, right? Because let's say you're an analyst and you have lots of data and you're trying to get insights from it. And more importantly, you want to deliver the insights to people that can do something with it, right? To make it actionable, right?
You do it by telling a story about it. If you can't tell a story, if you just throw someone lots of data and say, well, it's obvious what you need to do. Maybe it's not so obvious to them and you need to deliver it in a way. I think networks are a great way to do it. That's the name I would have chosen.
I don't think they would have approved it, but still, it's an engineering department, right? More people like you than people like me. What are the prerequisites? What does a student need to have to be equipped to take the course?
It's a great question because there are no prerequisites, at least from my end. They put some again because they forgot to change it. But if you have a passion to learn something new, you're welcome. That's me. So I've been working a lot with Python's NetworkX library, and I like it a lot for small data that fits in memory. It's pretty good.
What is the tool set that a student would need to be successful in the course? Okay, so as you know, I believe in no-code policy. All right. Okay, because I believe it's like the movie Ratatouille. I think everyone can cook. Like the rat said, everyone can cook. I think everyone should be able to analyze the network, and not everyone can code.
So we use Gephi, which is the software for analyzing networks. It's not the easiest software there is, but it's a powerful tool. To do a deep study, you need... Yeah, it's a really powerful tool. NetworkX is great. I think 99% of people analyzing networks, I guess, use it. And...
I say, you know what, maybe 90% because the others are using igraph for R. I'm not familiar with igraph. It's an R package, you said? Yeah. Oh, cool. Probably somewhat similar to NetworkX. Maybe in typical R fashion, probably has better built-in visual things in the library.
But I think they're the most common libraries for network analysis. Good resources. And what do you see as the learning curve for students coming in without programming skills who pick up one of those tools? I guess it'd be Gephi for them. It's hard to say because it's my first course. Okay, I'll let you know.
Very cool. And at the end of it, what skill set does the student come out with if you were to summarize it in a paragraph or so?
The student can build the network and analyze it and derive insights if he or she were a good student, I hope, and use Gephi in order to do it. But more importantly, I can't quantify it, but I would like them to think in networks. And it's something that lots of people in networks say.
And it's very hard to explain, but I hope that over the last few minutes of this talk, people came to understand what it means to think in networks. Good idea, yeah.
Well, I didn't get to my list of things we haven't covered or covered yet in the season. I've got a few I'd like to get your reaction on. The first I think is, well, maybe we've covered it to some degree, but could have done more on graph neural networks. Seems to, I don't know if I should call it an emerging field, but the machine learning community has a lot of publications coming out dealing with graph neural networks for link prediction or node classification or some of these typical GNN problems.
The trouble is they were all, at least in my assessment, what I found very theoretical, which is great for this show too, but not a lot of practical use cases, which I always prefer to tackle if we've got them. So I'll have to keep watching graph neural networks evolve into the future, I think.
Machine learning is, of course, part of it. But again, in the case of machine learning, kind of like in graph theory, networks are a tool, a means, not the end. Machine learning uses network features because what's cool about networks is that you don't need more data. You can take the same data and just find more dimensions in it using networks, right? And that's what machine learning is looking for.
So it's very cool. But again, it's more machine learning, less network insight from my end. A couple of others we didn't get to cover. Transportation networks that I just couldn't find that much exciting research on. Topology, because I'm not sure I knew how to query. And then ethics, just because it seems that a lot of networks contain a lot of PII and there should be some adjacent ethical question here that I didn't really find the right person to interview on.
A lot of ethics people, but not graphs and ethics that I was able to locate. You know, I think a friend of mine was the first one to predict that before AI will come the ethics, the discussion about ethics of AI. My last one is we really haven't covered the relationship between networks and NP-complete problems.
It is very interesting how many graph theory problems are NP-complete. I mean, once one is, it makes sense. Many are. That's the nature of the class NP-complete. But the fact that they all pose unique computational challenges and cannot be efficiently solved or answered is novel and interesting, especially when we mesh this with your point that a random graph is...
Not so interesting. You know, a real graph bears all these properties of network science. Well, could we...
apply graph theory and look at NP-complete problems through the lens of only those graphs one would find interesting? Although I don't know how you define interesting. Random networks are very important because we use them to compare to real-world networks. Those are what I call the interesting networks. But what's interesting is that, you know, when you say NP-complete, some of the problems in network science...
Some of it is kind of fuzzy. One of the major discoveries, right? The power-law distribution. When people came around to it and studied it vigorously, they found out it's not so accurate, right? Actually, most networks are not power-law distributed. They are maybe long-tail distributed. That's why I use long tail. But they're not exactly power-law or scale-free networks. And communities...
When I say communities, I mean clusters. What is a cluster? What do you define as a cluster? Well, when you have a clique or a complete graph where every node is connected, of course it's a community or a cluster.
Most of the communities are not cliques or clusters. So how do you define it? So there's a loose definition that a community is a cluster where more edges are inside it than outside of it, that point outside. But is it a scientific definition? I don't know. It's kind of fuzzy.
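The loose definition, more edges inside the group than pointing out of it, is easy to state as a check. The sketch below is one literal reading of that sentence, not a standard algorithm; the toy network and function names are illustrative, and the strict "greater than" comparison is exactly the kind of threshold choice that makes the definition fuzzy.

```python
from itertools import combinations

def edge_counts(edges, group):
    """Count edges fully inside `group` and edges crossing its boundary."""
    group = set(group)
    internal = sum(1 for a, b in edges if a in group and b in group)
    boundary = sum(1 for a, b in edges if (a in group) != (b in group))
    return internal, boundary

def is_loose_community(edges, group):
    """True if the group has more edges inside it than edges pointing outside."""
    internal, boundary = edge_counts(edges, group)
    return internal > boundary

# Two cliques of four joined by a single edge.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
edges.append(("a1", "b1"))

print(edge_counts(edges, group_a), is_loose_community(edges, group_a))
print(edge_counts(edges, ["a1", "b1"]), is_loose_community(edges, ["a1", "b1"]))
```

A clique passes easily, and the pair straddling the bridge fails, but many in-between groups would flip depending on where the cut-off is set.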
Well, it's robust in that we could mathematize that more. We could say, by your algorithm or your definition, it has to be 30% or minus 60%. You know, we could kind of make a generalized statement that encompasses the domain of all possible constructions of communities.
Yeah, there are hundreds of community detection algorithms because of it, right? Because of this definition. And, you know, when you involve thresholds, that's when I say, you know, it's, yeah, exactly. So I think network science has a long way, still a long way to go. And we didn't talk, you know, about dynamic networks, right? Because most networks are dynamic and
most of the networks we talked about are static, right? We look at a snapshot of a network in time, but we didn't talk about how they evolve and what that means. You know, like the internet. There are constantly new pages being put up and taken down, links being added, links going dead. Exactly. And
all social networks, they are dynamic. What are some of the interesting problems in dynamic networks? Who studies them? Everyone in network science is studying them because they are, to paraphrase Barabasi, the last frontier. Well, I thought neuroscience was the last frontier. I said paraphrasing. Okay.
I'm with Barabasi. You can't... Oh, understood. Yeah, I'm not going to throw you under the bus there. Yeah, exactly. Thinking of transportation or network. Well, let's see. Anything else you think we should cover? I think we've covered a lot of ground here. Yes, I think we should stop now before my house collapses. Yeah. Thanks for making the time and good luck with the storm.