Organizational Networks

2025/2/25

Data Skeptic

People

Asaf

Hiroki Sayama

Topics

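Asaf: I'm very interested in organizational network analysis, especially in how an org chart reflects the way an organization actually operates and makes decisions. A traditional org chart shows only reporting relationships and ignores the actual influencers and cross-functional collaboration. Network analysis can help us find the actual influencers and reveal hidden connections within the organization.

Hiroki Sayama: My collaborators and I developed an algorithm that extracts information from traditional org charts (PDF files) and converts it into quantifiable network data. The process is complicated; we used a heuristic-based approach rather than an AI approach. Our algorithm cannot handle every type of org chart, but it successfully generated graphs for about 46% of the images. We used the NetworkX library to build the graph objects and validated the results manually. Our research goal is to explore the relationship between organizational structure and corporate performance. We found that the average distance from the CEO to the sections below has a statistically significant effect on corporate financial success, and that there is a trade-off between company size and the depth of the organizational structure. We also ran an experiment on the relationship between team performance and network structure. Sparsely connected networks produced more diverse ideas than fully connected networks, even though team members in sparse networks rated themselves lower. This suggests that too much connectivity can stifle innovation, while the right amount of connectivity can promote it.

Hiroki Sayama: Our research shows that network structure has a significant effect on team creativity and innovation. In fully connected networks, team members rate themselves highly, but creativity and diversity are actually limited, which leads to groupthink. In sparsely connected networks, members rate themselves lower but produce more diverse and more effective ideas. This runs against intuition, but the experimental results indicate that the right amount of connectivity is essential for innovation. We also found that clustering like-minded people together can promote deep exploration, although the results of the different teams eventually need to be integrated. This suggests that in an innovation process, team members can explore separately first and integrate afterwards, rather than starting with a brainstorming session.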

Chapters

This chapter explores the challenges and methods involved in automatically extracting network structures from corporate organizational charts. It details the process of converting bitmap images from PDF files into a computer-readable graph object, highlighting the use of heuristic algorithms and the Python NetworkX library.

Automated process to convert traditional organizational charts into computer-readable graph objects.
Used heuristic algorithms instead of AI/machine learning due to lack of training data.
Python NetworkX library used to construct graph objects.
46% success rate in generating graph objects from 10,000-11,000 images.

Transcript

You're listening to Data Skeptic, Graphs and Networks, the podcast exploring how the graph data structure has an impact in science, industry, and elsewhere. So welcome to another installment of Data Skeptic, Graphs and Networks. Today we're going to get into a theme that'll come back up, and that's organizational network charts. Our guest today did some very clever things, did not have to use a neural network, and was able to extract these crazy charts from PDF files and do some analysis on them.

Asaf, do you find this topic of organizational network analysis more up your alley? It's actually my favorite. My favorite networks are organizational networks.

You know, full disclosure, I don't like organizations. That's why I'm not working in one. But what can I say? I like their networks. Well, most of the time I think of a tree, that it's top-down. You know, so many people report to the CEO, and the CEO is like my grandfather or great-grandfather or whatever in the corporate tree. What makes it a network?

Actually, and that's interesting, that's what Hiroki studied, the organigram or the organizational chart. But those charts are imagined networks. It's not a real-world network. It doesn't have to follow network rules that we covered in the season because it's not a real-world network. It looks like a tree. But the thing is that this chart also doesn't tell you how the organization works or who is making the decisions. In the organization, everybody makes decisions, right?

Sometimes when you ask me on the podcast to keep my answers short, I can decide not to do that, right? You might edit it out and throw out all my smart remarks. Only about half, usually.

And you're being generous, but that might be a price I'm willing to pay. So another thing about organizational charts is that they don't show you what actually happens: who are, let's say, the actual influencers, either the formal or informal ones, and so on. On the other hand, network analysis of the formal or informal networks can help you find, let's say, those influencers.

The organigrams, straightforwardly, tell you only one thing: who reports to whom. And they don't cover those cross-functional achievements. Most good data analysts will find relationships in other departments, in other groups, where, because they have access to the data and know how to query it, they can be very useful and make friends everywhere if they know how to play their cards right.

And your typical org chart doesn't capture that. Exactly, because in the organigram, all you see is actually silos. Nobody talks to anyone. But what you can derive from them is the perception of the people who drew them. What is their perception of the organization?

Like Hiroki: he gives great examples of it using network metrics like the diameter of this network, or in other words, how far upper management is from the lowest employee. This is the perception that the people who drew this organigram have of the organization. And of course, every real organization can be represented as a real-world network.

The real diameter would be, I guess, much shorter because of the small-world effect, which I believe we will talk about in the semantic network episode. So kind of a cool project Hiroki and co-workers did.
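As a rough illustration of that point about diameter, here is a minimal sketch using NetworkX. The seven-person chart and the two informal ties are made up for the example; they are not taken from the paper or the episode.

```python
import networkx as nx

# Formal chart: a balanced tree with branching factor 2 and height 2.
# Node 0 plays the CEO, nodes 1-2 are managers, nodes 3-6 are employees.
chart = nx.balanced_tree(r=2, h=2)
formal_diameter = nx.diameter(chart)  # longest shortest path in the drawn chart

# Hypothetical informal ties between employees in different branches.
informal = chart.copy()
informal.add_edges_from([(3, 5), (4, 6)])
informal_diameter = nx.diameter(informal)

print("diameter of the drawn chart:", formal_diameter)
print("diameter with informal ties:", informal_diameter)
```

Adding edges can never lengthen a shortest path, so the diameter of the network with informal ties is at most that of the drawn chart.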

I've always felt there was a lot of good information locked up in PDFs, although the tools to get that kind of stuff out are pretty much here nowadays. Yeah, you know, full disclosure, before the episode, I downloaded an organigram and fed it to ChatGPT, and it made me a very cool edge list, I have to say. Wow, not bad. Let's jump right into the interview then. My name is Hiroki Sayama.

I'm the SUNY Distinguished Professor of Systems Science and Industrial Engineering at Binghamton University, State University of New York. And can you share some details on your role there? I am playing multiple roles. I'm a faculty member, and I'm also the director of the research center named CoCo, C-O-C-O, the Binghamton Center of Complex Systems, which has been running since 2007-ish, informally.

So it's a very long-standing research center in complex systems and network science. I'm also now taking on a little bit more of an administrative role: I'm acting as the executive assistant dean for graduate studies for the Thomas J. Watson College of Engineering and Applied Science. So multiple hats.

Well, I first got introduced to your work through the paper we're going to talk about today, Extracting Network Structures from Corporate Organization Charts Using Heuristic Image Processing. So got a lot of follow-up questions there, but before we get into it, could you share some details on how this fits into your research and your overall academic path? Is this like a focus or was it a side interest? This happened to be my side project, so to speak.

So I am, you know, a computer scientist turning into more of an interdisciplinary scientist, working in complex systems and network science. And I understand this show is about networks, and in that sense, the paper you mentioned has a connection to network science. But the original idea actually came from my collaborator back in Japan. So I wouldn't say this is my main research focus. But my past experience always tells me that your main interest never gets popular. It's always the side project, something you did together with someone, that attracts people's attention. This is yet another example like that.

Could you share some details for listeners who haven't read the paper yet? What was the overall goal of the project? So the overall goal of this project is to come up with an automated process to convert traditional organizational charts into computer-readable graph objects.

Maybe you can imagine that at the top there is a CEO or the board of trustees, et cetera, and then you have the several divisions, branches, et cetera. Usually it's a tree structure, sometimes a little bit more networked. Those kinds of organizational charts have been around for hundreds of years. People actually drew the diagrams and published them somewhere, on a website or in a book.

My collaborator, Dr. Junichi Yamanoi, is a professor at Waseda University in Tokyo, Japan. He studies organizational performance. One day, many years ago, we discussed, "Okay, is there any way to extract more quantitative information from that kind of organizational chart?" Nobody actually cared about that back then. So the charts are around, but nobody had actually tried to quantify them or model them as mathematical graphs.

And one of the reasons is that those charts are just published on paper, or maybe embedded in a corporate website as an image, a bitmap image. And as far as we know, there was absolutely no computer-readable graph database that captures many different organizations' organizational charts.

So that's the motivation. Luckily, back then, and I don't remember when we started this project, it's a long, long time ago, my collaborator, Junichi, had published a book, all as a PDF document. It comes from a Japanese organizational chart bulletin. Great, it's a computer-readable PDF. But when you look inside, it's just diagrams, bitmap images.

So even though it's computer-readable in the sense that it's a PDF file, nothing is readily useful for more quantitative research. So that's what we did. What we designed is not even AI. It's a fancy, you know, hard-coded heuristic algorithm that scans the PDF file and extracts the organizational diagram step by step to construct the graph object.

I've seen a lot of these organizational charts, and I don't think two have ever looked the same. Everyone seems to reinvent the wheel on making the style of it. How do you build an elegant approach considering the variety of org charts you'll find out there? Yeah, the simple answer, we didn't.

Yeah, we always assume that the standard version of the diagram is made of rectangular boxes, which contain the labels of each division or section, connected by line segments. And most of the charts published in the series of books we used follow the same style.

So we exploited those regular patterns, but there are always exceptions. In this case, about 30% to 35% of the diagrams are more creative ones, which we couldn't accommodate because people use very crazy, creative ways to show the organizational chart.

Unfortunately, our algorithm is not capable of handling those. But most of the diagrams follow a very simple, rigid line structure. So we wrote the code to extract that kind of information from the image.

Well, of those that sort of conform to the more standard format that we're used to seeing and didn't go too far out there, could you talk about some of the, I guess, ETL process along the way? How you had to transform that raw bitmap data into a more useful data structure? Yeah, it's a very, very messy, ugly process.

Because nowadays, in 2025, people say, oh, you just use convolutional neural networks or blah, blah, blah, deep neural networks, machine learning, AI, and that's going to do it. I doubt it, because in order to build that kind of automated AI process, you would need a large training data set, right? Here's an image; here's the correct answer.

The problem that we faced was that there was no correct answer to begin with. So we had to think everything through from the beginning on a logical basis.

Okay, if you look at the diagram, how would we identify what part is a label? Well, we look for rectangles and then we extract the text from inside each box. Those kinds of logical procedures are exactly what we programmed in the code. So basically the process goes like this. First, you identify the locations of the text labels and remove all the text from the image.

Then you have only the wire diagram. Some of the lines are actually not solid lines but dotted lines, so you need to do a little bit of processing to connect those dotted lines into solid lines, etc. For all the details, if you're interested, please read the paper. It's a messy, ugly paper.

And then you just scan from top to bottom, left to right, and identify all the horizontal and vertical line segments. You can identify where the vertices (the dots) and the edges (the lines) are, and then you can construct the whole topological structure from there. So that's basically it.
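The last step described here, going from detected boxes and line segments to a graph, might be sketched roughly as follows. This is not the authors' code. It assumes the image processing has already produced box coordinates and axis-aligned segments, and it uses a simplifying rule that is an assumption of this sketch: within each group of touching segments, the topmost box is treated as the parent of the others. The names `touches`, `inside`, and `chart_to_graph` and the toy coordinates are hypothetical.

```python
import networkx as nx

def touches(seg_a, seg_b):
    """True if two axis-aligned segments share at least one point."""
    (ax0, ay0), (ax1, ay1) = seg_a
    (bx0, by0), (bx1, by1) = seg_b
    # For axis-aligned segments, overlap on both axes means they are in contact.
    return (min(ax0, ax1) <= max(bx0, bx1) and min(bx0, bx1) <= max(ax0, ax1)
            and min(ay0, ay1) <= max(by0, by1) and min(by0, by1) <= max(ay0, ay1))

def inside(point, box):
    """True if a point lies in or on a box given as (x0, y0, x1, y1)."""
    px, py = point
    x0, y0, x1, y1 = box
    return x0 <= px <= x1 and y0 <= py <= y1

def chart_to_graph(boxes, segments):
    # Group segments that touch each other into connected pieces of wiring.
    wiring = nx.Graph()
    wiring.add_nodes_from(range(len(segments)))
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if touches(segments[i], segments[j]):
                wiring.add_edge(i, j)

    chart = nx.DiGraph()
    chart.add_nodes_from(boxes)
    for component in nx.connected_components(wiring):
        endpoints = [p for i in component for p in segments[i]]
        linked = [name for name, box in boxes.items()
                  if any(inside(p, box) for p in endpoints)]
        if len(linked) < 2:
            continue
        # Assumption: image y grows downward, so the smallest y0 is the topmost box.
        linked.sort(key=lambda name: boxes[name][1])
        parent = linked[0]
        chart.add_edges_from((parent, child) for child in linked[1:])
    return chart

# Toy input: four boxes and five segments (made-up coordinates).
boxes = {
    "CEO": (40, 0, 60, 10),
    "Sales": (0, 40, 20, 50),
    "R&D": (80, 40, 100, 50),
    "Lab": (80, 80, 100, 90),
}
segments = [
    ((50, 10), (50, 25)),  # drop from the CEO box
    ((10, 25), (90, 25)),  # horizontal bus
    ((10, 25), (10, 40)),  # drop to Sales
    ((90, 25), (90, 40)),  # drop to R&D
    ((90, 50), (90, 80)),  # R&D down to Lab
]
chart = chart_to_graph(boxes, segments)
print(sorted(chart.edges()))
```

A real pipeline would also need the text removal and dotted-line repair described above, which this sketch skips.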

It's very, very ugly, nothing fancy or advanced. Do you have any notion of ground truth you can compare against to evaluate the quality of your output? Yeah, yeah. So, of course, we sampled a couple of the diagrams and then looked at the original diagrams using our eyeballs, right? And then we made sure that, at least for those sampled examples, the results were a good match with what we would construct using paper and pencil. So those are very manual, labor-intensive validations. Obviously we couldn't do that for the entire dataset because there are thousands of such diagrams. So we just sampled a small portion to make sure this is not going too crazy.

And then we trust the rest. I'm sure there are many mistakes made by the algorithm, but sometimes quantity takes the priority over the quality. Do you have any way of measuring it? I don't know if accuracy is the right metric, but some sort of mathematical measure that you put it under? Yeah, so in this case, we just use a very crude measurement. It's really just a data acquisition success rate.

We had, I think, 10,000 to 11,000 original images in the original dataset. How many of them successfully completed the whole image diagram acquisition process?

And we achieved 46%, if I remember correctly, which is surprisingly high because we didn't do any AI or machine learning kind of stuff. Everything was based on logical steps. And yet we were able to generate a graph object for 46% of all the original images. And you can imagine all of those diagrams are a zoo of monsters. There are so many different styles, you know, crazy diagrams.

So we consider getting graphs out of 46% of the diagrams a huge success, given that we are not using any fancy AI. Or spending thousands of dollars training on GPUs. Yeah, no way. We wouldn't do that. This is very sustainable computer science. You know, we use only a couple of human brains and a laptop. That's it.

Well, I've lately been spending a lot of time using the Python NetworkX library, which I know you made some use of. Could you highlight in particular how that was useful to the process?

This project uses many libraries. NetworkX is, of course, one of the key components, but we use many other packages, like a PDF-to-image package. And sorry, I forgot all the details. There are several different packages that we use to do the image analysis. The NetworkX Python package was used to construct the graph object.

It's a very useful, versatile library in Python. It's very flexible, very transparent network modeling library so that you can store many different kinds of information into the single graph object. That's why I love using it. It's definitely not the fastest one.
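To make the point about storing many kinds of information in a single graph object concrete, here is a small sketch. The company name, year, labels, and attribute names such as `bbox` and `line_style` are invented for illustration and do not reflect the schema used in the paper.

```python
import networkx as nx

# Graph-level metadata (hypothetical values).
org = nx.DiGraph(company="Example Corp", year=2010)

# Node attributes: the extracted label text and the box position in the image.
org.add_node("CEO", label="Chief Executive Officer", bbox=(40, 0, 60, 10))
org.add_node("Sales", label="Sales Division", bbox=(0, 40, 20, 50))

# Edge attributes: for example, how the connecting line was drawn.
org.add_edge("CEO", "Sales", line_style="solid")

print(org.graph["year"])
print(org.nodes["Sales"]["label"])
print(org.edges["CEO", "Sales"]["line_style"])
```

Attributes are plain Python dictionaries under the hood, which is part of why the library is flexible and also part of why it is not the fastest option.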

It's slow, but it gives you a lot of flexibility. So there's a lot of novelty in the project and just the effort to get the process running and get things out the door. I'm curious if this was just more of a demonstration and a project or if you have bigger aspirations for next steps along these lines?

My part is just designing this kind of tool, and I'm basically done as a computer scientist. But my collaborator Junichi has, of course, a scientific motivation. He wanted to use this kind of quantitative data, specifically the graph object coming from each organization.

His ambition, which we are already working on (we are finalizing the paper), is to find any correlates of corporate performance in the organizational structure: how well or how badly each corporation was doing.

Luckily, this data is historical data. I believe it was from 2008 to 2011 or '12, more than a decade ago. So we already know how well each company performed in that range of dates, in terms of the stock prices and the reported financial performance of each company.

This is a very indirect way to predict corporate performance, but there must be something in the organizational structure that would tell you a little bit about how well, how efficiently the organization is making decisions. So that's the original research idea we had.

I'm reminded of a philosophy associated with Jeff Bezos that I've always felt was very true: his two-pizza rule, that a team should be no bigger than what could be fed with two pizzas. I'm wondering if maybe you have any interpretation of the idea and if that maybe shows up in the work at all. That may not be applicable to many of the companies we studied here because some of the companies are very large.

But similar ideas we tested in this analysis include the distance, like maybe pizza delivery distance. You should have the organization that you can deliver the pizza without getting cold, for example. That might be the indirect analogy if we were to use a pizza. And then that could be captured by the depth of the tree structure. Let's imagine you have a CEO in the middle.

The CEO can be connected upward, downward, but how deep the organization can go from the CEO? You can characterize this in many different ways. The simplest possible way would be just to calculate the average distance from every section to the CEO. How many steps you would need to go. Also, another metric would be how clustered networks are. How many triangles would exist, for example?

That's a very popular network measurement, which may or may not be relevant in this study because most of the diagrams are just trees. A tree does not have any triangles, but we measured the clustering coefficient anyway. There are several other measurements that we thought would be insightful to consider when we look at performance. And it turns out the distance from the CEO actually has a statistically significant effect on corporate financial success.
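The measurements mentioned here can be computed with a few NetworkX calls. The sketch below uses a made-up seven-section chart; the section names are hypothetical, and the paper may define its metrics differently.

```python
import networkx as nx

# A toy organizational chart, treated as an undirected tree.
org = nx.Graph()
org.add_edges_from([
    ("CEO", "Sales"), ("CEO", "R&D"), ("CEO", "Finance"),
    ("Sales", "Domestic"), ("Sales", "Overseas"),
    ("R&D", "Lab"),
])

# Number of steps from the CEO to every node.
dist = nx.single_source_shortest_path_length(org, "CEO")
sections = [n for n in org if n != "CEO"]

avg_distance = sum(dist[n] for n in sections) / len(sections)
depth = max(dist.values())
clustering = nx.average_clustering(org)  # a tree has no triangles

print("average distance to the CEO:", avg_distance)
print("depth below the CEO:", depth)
print("average clustering coefficient:", clustering)
```

For this toy chart, three sections sit one step from the CEO and three sit two steps away, so the average distance is 1.5, and the clustering coefficient is zero because the chart is a tree.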

I forgot which way. Junichi knows the data, and we already have the statistical table; I just forgot it. Is it positive or negative? I need to look back. Intuitively, I would hope it would be a short path to the CEO, meaning that even the most rudimentary employee could have some chain to the top if they had a really good insight.

I think that was the case. Although now I remember, yes. Generally speaking, the shorter the better, because that allows you to have quicker information flow and faster decision making. But now I remember it also has an interaction term with regard to the size of the corporation. If the company is too large, having a very short distance means that the CEO is overburdened, because it's going to be very flat, too wide.

So I think there is a trade-off between the corporate size and the effective strategy: how deep or how shallow the organization should be structured. I need to check back on the most recent statistical data, but that is what I recall at this point.

I could see in a way where that can trace us back to your work in complex systems, that these very dynamic large-scale graphs, what are its optimal structural characteristics to find? Yeah, yeah, exactly. Yes. You'd mentioned also doing some other work in network science. Could you share some details about other places you've explored?

Aside from the organizational chart research, we also work a lot with management scientists on the same kind of research. I have been working in collaboration with the business school faculty here at Binghamton University for almost 20 years. One of our most recent publications is in the journal called npj Complexity.

It's a Nature Portfolio journal, so the famous Nature as a journal family. It's not Nature, sadly, but it's a Nature family journal. The npj Complexity paper was just published earlier this month. We studied experimentally the performance of teams and how groups of 20 people would collaborate and come up with new ideas in a networked environment.

We tested several different conditions experimentally. This is actually a human subject experiment. We had people participate in online experimental sessions and let them work on some text-based creative tasks. And we secretly changed the network structure from very connected to very poorly connected to see what happens.

The other variables we changed include the people's similarity. We characterized who they are by requesting them to write a fairly long self-introduction essay.

So this is me, this is my academic major, this is my interest, blah, blah, blah. They write like a thousand characters of essays. And then we converted that kind of background information into the numerical vector by using machine learning. Now I'm using machine learning. And then we can manipulate it.

"Should I put you next to someone who is very similar to you? Or should I put you next to someone who is very different from you?" And so that kind of local similarity, dissimilarity among the social neighbors is yet another variable we changed. So the combination of those, and then maybe I can ask you a question. So we tested fully connected network first and a very poorly connected network second. And which one do you think performed better?

If it's binary, I have to go fully connected. Yeah, everybody says so. We thought so. And we were all proven wrong. Yeah, it turns out the fully connected network produced much less diverse ideas.

And then there is the overall performance of the final results. One of the tasks we tested was, "Please create a really effective catchphrase to sell this laptop." It's a marketing slogan creation type of task. And the teams that were structured very sparsely actually produced better ideas overall.

The more interesting result here is that if we ask them, "What do you think about your teamwork? How do you feel about your achievement?" the people who participated in the fully connected networks always said, "Yeah, we did great." Their self-evaluation is much higher. The actual objective evaluation is the opposite. The people who participated in the very sparse network always complained: "This session sucks. Nobody's here. It's very boring."

Their self-evaluation was very low, and yet the final outcome is much better. So this is kind of a puzzle we face. It's a very interesting experiment we did.
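To picture the two connectivity conditions for a group of 20, here is a rough sketch. The fully connected case is unambiguous, but the sparse topology used in the experiment is not described in this conversation, so the ring below is purely an assumption chosen for illustration.

```python
import networkx as nx

n = 20  # group size mentioned in the interview

full = nx.complete_graph(n)  # everyone sees everyone
sparse = nx.cycle_graph(n)   # assumption: each person sees only two neighbors

for name, g in [("fully connected", full), ("sparse ring", sparse)]:
    print(
        name,
        "| edges:", g.number_of_edges(),
        "| density:", round(nx.density(g), 3),
        "| average path length:", round(nx.average_shortest_path_length(g), 2),
    )
```

In the fully connected network every idea is one step from everyone, while in the ring an idea needs several hops to cross the group, which fits the interpretation that sparse groups converge more slowly and explore more.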

Absolutely. It's easy to know the results and play a guess. But to me, I wonder if it's not that criticism is required. We need negative feedback like reinforcement learning. And it's easier to get that in a small group where you're not maybe as concerned with can I offend someone I don't know very well.

That could be. Or probably the more natural interpretation here is that if everybody's connected to everybody else, people can quickly gravitate toward certain ideas, so they don't explore much. "Ah, yeah, I like that idea. Let's go with this." You see this kind of groupthink. People easily agree because we are such agreeable animals. And then they stop exploring many different directions. But if you are feeling alone and you have only a few other people to work with, then you have to push yourself. "Okay, I need to produce."

That puts you under more stress and pressure, but overall the entire network actually explores more different ideas. So that's our interpretation. Connectivity sometimes kills ideas. Do you think the research is at a stage where it would be good for human resources groups to start looking at this and letting it influence how they grow, groom, and manage their organizations?

I don't know. The result is out there. It's really up to each reader, each consumer, how they are going to interpret it. And I can say, at least, there are lots of insights and lessons everybody can learn from these kinds of results that superficially contradict our intuition. These days, everybody says connection is better. Let's connect people. Connection is a source of innovation. I would say so as well, but you have to be very careful how you are going to connect people, because if you connect people very intensively, then you are actually killing the innovation.

I actually have a blog post about this, but you can imagine this is more like maintaining the ecosystem on an island. If you are trying to promote biodiversity, then you don't want to have all the animals mixed together, right? If they are mixing together, it's going to get homogenized very quickly, and you have only one species dominating the entire island.

So you need to have different habitats. It should be patches. I think the same thing applies to the human idea generation process. We need to harbor the biodiversity of our own ideas in our social ecosystem. That makes sense. I've heard many business school analogies of, you know, like Silicon Valley, that you had a density of the right people working in similar areas in the same place that enabled, you know, growth for everyone, an all-ships-get-lifted kind of thing.

Yeah, that's another thing. It's actually better to cluster similar people together and let them go all the way. They can go in very, very crazy directions. Just bring five mathematicians, nobody else. Let them explore, and they're going to go very deep into the mathematical world. Here you have 10 artists and musicians, and you disconnect them from the rest of the world. Let them just explore all the musical directions. And here you have engineers only.

These kinds of homogenized clusters can also be considered a kind of social bubble, which would be terrible if it happened in the world of opinion formation. That happens all the time. Social media is clustered into so many tiny bubbles. But with regard to innovation and the exploration of ideas, that may not be so bad after all. You can put people in different clusters with similarly minded people and let them explore in many, many different directions. Once that process is complete, you can bring the results back to a central discussion place. That would be better than having everybody talking to each other from the beginning.

I think that's a great takeaway. Counterintuitive, but good insight. I see that many projects start with brainstorming. It's kind of, "Yeah, let's have a brainstorming meeting." It could be on Zoom, and everybody comes to the same place. This experimental study tells you that's a terrible idea.

Because you are already killing many potential ideas that could have emerged if you let the people work alone, at least at first. Hiroki, is there anywhere people can follow you online? I'm everywhere on social media. So you can look for my name, Hiroki Sayama, on Instagram, X/Twitter, Bluesky, LinkedIn, Facebook. What else? Mastodon. Very few people use Mastodon nowadays, but I'm still available there. Instagram. What else? Threads. But we'll have links to all those in the show notes for listeners who want to follow up. And I don't use TikTok, so I'm okay with the current situation.

Yeah, I don't think anyone can now for at least the immediate future. Yeah, yeah, yeah. See how that goes. So, yeah, I'm very active. Yeah, so please follow me. Yeah. Well, thank you so much for taking the time to come on and share your work. Thank you very much. It's my pleasure. Thank you.
