Graphing the Grift: Mapping the Communication Flow within Enron
I chose to perform an exploratory analysis of a dataset containing the emails of the now-infamous company Enron. Enron is perhaps the most famous example of corporate corruption and mismanagement, so I won’t dwell on the details of the company’s rise and fall. During investigations by shareholders and government regulatory agencies, the content of the emails became a matter of public record. These emails let us track the flow of information within the company in its final months. This subset of the emails, with 1,100 nodes (individuals) and 1,802 directed edges (sender-to-recipient links), suggests two interesting ideas:

1. It is obvious that the Chief of Staff at Enron, Steven Kean, was by far the most central communication figure at the company. It is less obvious that the vast majority of information passed through a narrow subset of employees.
2. Emails at Enron tended to flow “downward.” It was common for supervisors to email their subordinates, but uncommon for subordinates to email their supervisors.
The first visualization supports the claim that the vast majority of email traffic passed through a limited number of individuals. The metric best suited to show which individuals are good information brokers is “betweenness.” A node’s betweenness sums, over every pair of other nodes, the fraction of shortest paths between them that pass through the node in question. In a flat organization, we would expect this number to remain low, since each node can set up ad-hoc relationships to pass information along. In a highly tiered organization, betweenness values would vary greatly, because information must flow through managers before it is eventually disseminated. Of the 1,100 nodes, only 74 have a betweenness greater than or equal to 24. Filtering on this value eliminated about 93% of the nodes (1,026 of 1,100), leaving only the most central communication figures. This suggests that information at Enron flowed through a few key individuals, while most workers within their respective departments remained largely disconnected from other departments.
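Betweenness can be computed exactly with Brandes’ algorithm, which runs one breadth-first search per node to count shortest paths and then walks back from the farthest nodes to credit each intermediate node with its share. The sketch below is a minimal pure-Python version for an unweighted, directed graph given as a dictionary mapping each sender to the recipients they emailed. The function names `betweenness` and `brokers` are hypothetical, and it is an assumption that Gephi’s raw (unnormalized) scores, on which the threshold of 24 was applied, follow the same convention.

```python
from collections import deque


def betweenness(graph):
    """Raw (unnormalized) betweenness via Brandes' algorithm.

    `graph` maps each sender to an iterable of recipients (unweighted, directed).
    """
    # Collect every node, including recipients that never send anything.
    nodes = set(graph)
    for recipients in graph.values():
        nodes.update(recipients)
    adj = {v: set(graph.get(v, ())) for v in nodes}  # drop duplicate edges
    score = dict.fromkeys(nodes, 0.0)

    for source in nodes:
        # BFS from `source`: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


def brokers(graph, threshold):
    """Nodes whose betweenness is at least `threshold`."""
    scores = betweenness(graph)
    return {v for v, s in scores.items() if s >= threshold}


# Two employees email a manager, who emails two other employees.
emails = {"e1": ["m"], "e2": ["m"], "m": ["e3", "e4"]}
print(brokers(emails, 1))  # {'m'}
```

In this toy graph every shortest path from `e1` or `e2` to `e3` or `e4` runs through `m`, so `m` scores 4 and everyone else scores 0; calling `brokers(graph, 24)` on the real edge list would produce the kind of filtered set described above, assuming the scores match Gephi’s.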


The second visualization is similar to the first, but instead of filtering on betweenness, it uses the out-degree of each node to show which individuals sent a significant portion of the emails. Emails are, by nature, a directed form of communication, so the degree of each node can be split into an in-degree (emails received) and an out-degree (emails sent). The distribution of these metrics made it clear that the vast majority of employees received far more emails than they sent. Over 75% of the email traffic was sent from 42 nodes to the other 1,058. Of the remaining traffic (under 25%), roughly one third (about 8% of the total) took place among the 42 “managers” and roughly two thirds (about 16% of the total) among lower-level employees. The scarcity of edges running back up the hierarchy, from lower-level employees to the 42 frequent senders, illustrates that most emails were transmissions from the leadership toward the followers.
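The degree split and the traffic shares above come down to simple counting over sender-recipient pairs. A minimal sketch follows, assuming each email is one `(sender, recipient)` tuple (so repeated emails count repeatedly) and that the “managers” are the nodes whose out-degree meets a cutoff. The default cutoff of 10 echoes the filter mentioned later in the text but may not be the exact rule used, and the function names are hypothetical. The sketch also reports upward traffic (employees emailing frequent senders), which the breakdown above does not list separately.

```python
from collections import Counter


def degrees(edges):
    """In- and out-degree Counters for a list of (sender, recipient) pairs."""
    in_deg, out_deg = Counter(), Counter()
    for sender, recipient in edges:
        out_deg[sender] += 1    # emails sent
        in_deg[recipient] += 1  # emails received
    return in_deg, out_deg


def traffic_shares(edges, min_sent=10):
    """Split traffic by whether each endpoint is a frequent sender.

    `min_sent` is a hypothetical cutoff for the "manager" group.
    Assumes `edges` is non-empty.
    """
    _, out_deg = degrees(edges)
    senders = {v for v, d in out_deg.items() if d >= min_sent}
    counts = Counter()
    for sender, recipient in edges:
        if sender in senders and recipient not in senders:
            counts["down"] += 1           # manager -> employee
        elif sender in senders:
            counts["among_senders"] += 1  # manager -> manager
        elif recipient in senders:
            counts["up"] += 1             # employee -> manager
        else:
            counts["among_others"] += 1   # employee -> employee
    total = len(edges)
    return senders, {k: n / total for k, n in counts.items()}


edges = [("m", "a"), ("m", "b"), ("m", "a"), ("a", "b")]
senders, shares = traffic_shares(edges, min_sent=3)
print(senders, shares)  # {'m'} {'down': 0.75, 'among_others': 0.25}
```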


I chose a Fruchterman-Reingold layout for both visualizations because it offered a well-balanced, consistent view of the information. The symmetry and even spacing of this force-directed layout let the information be presented neatly without dense areas becoming overcrowded. Employees and management are displayed side by side to better illustrate that most employees were not a central part of the organization. It would be simple to say that removing the nodes that sent fewer than 10 emails filtered out about 95% of them, but actually seeing how sparse the graph is by comparison is much more effective.
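The Fruchterman-Reingold algorithm treats every pair of nodes as mutually repelling (force k²/d) and every edge as a spring pulling its endpoints together (force d²/k), with k = √(area / number of nodes); a shrinking “temperature” caps how far nodes move each round. The even spacing described above falls out of that balance. The sketch below is a minimal pure-Python version of this scheme; the frame size, iteration count, initial temperature, and linear cooling schedule are assumptions, and Gephi’s built-in version exposes its own tuning parameters that this sketch does not reproduce.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Minimal Fruchterman-Reingold layout; edge direction is ignored.

    Assumes every edge endpoint appears in `nodes`.
    """
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal pairwise distance
    t0 = width / 10  # initial temperature (max step length), an assumption

    for i in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair repels with force k^2 / d.
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                v, u = nodes[a], nodes[b]
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Connected nodes attract with force d^2 / k.
        for v, u in edges:
            if v == u:
                continue
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node at most `t`, clamp it to the frame, and cool linearly.
        t = t0 * (1 - i / iterations)
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, t)
                pos[v][0] = min(width, max(0.0, pos[v][0] + dx / length * step))
                pos[v][1] = min(height, max(0.0, pos[v][1] + dy / length * step))
    return {v: (x, y) for v, (x, y) in pos.items()}


layout = fruchterman_reingold(["m", "a", "b", "c"], [("m", "a"), ("m", "b"), ("m", "c")])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of nodes, which is manageable for a graph of about 1,100 nodes but is why larger tools typically approximate it.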
The primary factors driving my choice of node and edge attributes were clarity and consistency. I wanted the nodes to be large enough to communicate key information, but small enough that the visualization did not appear crowded. This was especially true for node labels, where names had to be large enough to be legible without overcrowding the image. In the left halves of the visualizations, I chose not to show name labels because the names added no context to the graph. In fact, because these individuals were all lower-level employees, omitting their names somewhat reflects their actual role in the Enron scandal, which was largely blamed on the employees whose names appear in the right halves of the visualizations.
The final enhancements I used to improve the aesthetics of the visualizations were curved edges, thicker edges, smaller nodes, and labels that scaled with node size. Together these made the visualizations more organized and clear. Curved edges reduced overlap with other edges and nearby nodes, while thicker edges stayed visible when the image was scaled down to fit on the page.
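Sizing nodes from a metric and scaling labels with node size can be expressed as a linear min-max mapping followed by a fixed ratio. The sketch below assumes that approach, with hypothetical size bounds and a hypothetical label-to-node ratio; it illustrates the idea rather than reproducing the Gephi settings used for these graphs.

```python
def scale_sizes(metric, min_size=5.0, max_size=30.0):
    """Linearly map metric values onto [min_size, max_size] (hypothetical bounds)."""
    lo, hi = min(metric.values()), max(metric.values())
    span = hi - lo
    return {
        node: min_size if span == 0 else min_size + (x - lo) / span * (max_size - min_size)
        for node, x in metric.items()
    }


def label_sizes(node_sizes, ratio=0.5):
    """Label font size proportional to node size; `ratio` is a hypothetical choice."""
    return {node: ratio * size for node, size in node_sizes.items()}


sizes = scale_sizes({"a": 0, "b": 10, "c": 5})
print(sizes)               # {'a': 5.0, 'b': 30.0, 'c': 17.5}
print(label_sizes(sizes))  # {'a': 2.5, 'b': 15.0, 'c': 8.75}
```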
Graphs created in Gephi.
Data Source: Enron Email Dataset. William W. Cohen, 08 May 2015. https://www.cs.cmu.edu/~enron/