Tim Evans' site on Complex Networks

Category: Uncategorized

The Importance of Being Close

Finding the most important pieces of information in a mass of data has always been a key task in any job. In network science the tools used to do this are known as “centrality measures“. My most recent paper (with Bingsheng Chen) looks at the relationship between two of the oldest and most important examples, degree and closeness.

Networks (for example see Coscia 2021) are a powerful way to organise the large data sets faced by modern researchers in many fields, from physics through biology to social sciences.  In their simplest form, the objects of interest are represented by nodes of a network. So a node might represdent proteins in biology or people in social science. The edges of a network then represent relationships between pairs of nodes.

Zachary's Karate club network.
Zachary’s Karate club network. The size of the node is proportional to the degree while the darker the colour of the node, the higher the closeness. The shapes indicate the two clusters in the network.

One of the most common questions we ask about our data is what are the most important pieces of information?  In a network representation, this corresponds to finding the most important nodes. Not surprisingly, this was one of the earliest problems tackled in network science where the importance of nodes is measured by centrality measures mentioned above. These have been developed in social network analysis since the 1950’s. The name centrality measure comes from an analogy with a city where most important institutions are found at the geographical centre of a city to enable the easiest access for the largest number of people. The arrival of large electronic datasets prompted a flurry of new research on centrality measures. For instance, in 1998 the founders of Google developed their own centrality measure, PageRank (Brin & Page 1998). This uses a simple diffusion process on a network of webpages (nodes) connected by hyperlinks (edges) with the idea that relevant pages in a search can be placed in order of their importance. The web page with the highest PageRank going top of the list returned by Google.

Finding the most important nodes is such a critical job that this problem has been studied many times and this has produced hundreds (if not thousands) of centrality measures. These encode different interpretations of what is the “importance” of a node and perhaps the context of the data and of the application may point a researcher to one definition over another. Nevertheless, one is left with the feeling that there are so many centrality measures that many may well be producing similar information and therefore there is a large redundancy in the centrality measures in the literature. For instance, David Schoch produced a periodic table of centrality measures, a nice visualisation of a large number of centrality measures placing similar ones close together imitating the periodic table of elements. To be more quantitative, one can measure the correlations between different values for centrality measures for the same node. What one finds from these studies is that there are clear correlations and yet no clear pattern has emerged. As Valente et al. (2008) put it, “The amount of correlation between degree, betweenness, closeness, and eigenvector indicates that these measures are distinct, yet conceptually related”. This leaves researchers uncertain what to do when choosing a centrality measure for their problem.

Our Work

In my work with Bingsheng Chen (Evans & Chen 2022) we looked at two of the oldest and most popular centrality measures: degree and closeness (Bavelas, 1950) defined closeness. The degree kv of a node v is simply the number of nearest neighbours, the number of nodes connected by a single edge to the node v. Closeness is defined in terms of the length of the shortest path between two nodes. A path in a network is just a sequence of nodes where each node is connected by an edge to the previous node in the sequence. The length is the number of edges traversed as you move along the path. You can then calculate the average distance from a node v to all other nodes. The smaller this average value of distance is, the closer node v is to the other nodes in a network.  So a low value of this average indicates which nodes are the most central and, using the analogy of a city mentioned above, we might define these to be most important. Centrality measures are defined so that the larger the value of a  centrality measure, the more important a node is. So the centrality measure closeness cv for a node v is defined to be the inverse of the average distance to all other nodes.

Example network showing shortest path and degree
One of the two shortest paths from the red square to the red circle is shown by the green edges. The red square node has three nearest neighbours so the degree of this node is 3. The number beside each node shows the distance to the red square node. The average distance from a node to the red square node is (2+2+1+1+1)/5 giving the closeness of the red square node as 5/7.

What we suggested in our Communications Physics paper (Evans & Chen, 2022) is that you can do better for degree and closeness than the qualitative statement of Valente et al. (2008) that they are “conceptually related”.

In our work, we picture the shortest paths used to calculate closeness as a tree which is a network with no loops and where there is only one path between any two nodes. In our case, our shortest-path tree for node v has every node from the original network, the edges in the tree are a subset of the edges in the original network, and the path in the tree from the root node v to any other node is always a shortest path in the full network. Such trees can always be defined for each node though they are not in general unique. Shortest-path trees are easily created using standard breadth-first search algorithms.

Our key algebraic assumption is that the branches of these trees will look statistically similar for most networks. The reason for this is that the number of nodes at distance L from the root grows exponentially in most networks.  This is the reason why we see the phenomena of six degrees of separation in most networks. That is, even in very large networks, the length of the shortest path from any node to any other node is usually small, typically of order log(N) (where N is the number of nodes in the network) rather than N1/D, you would expect if the network is in a D-dimensional Euclidean space (e.g. if our network was a regular cubic lattice so D=3). 

Example of a shortest-path tree
One example of a shortest-path tree for the red square node as the root node. The edges of the tree are shown in solid green lines while the two black dashed edges are in the original network but are left out of this tree.
Shortest-path tree for Zachary Karate club network.
One of the shortest-path trees rooted on node 1 for the Zachary Karate club network. Nodes of the same colour and shape are the same distance from the root node, 1.

Outline of Derivation

You can skip to the next section on results but I will give a brief outline of the analytical part of the work.

We used the simplest model for the branches of the tree. Namely, we assume that on average the degree of each node in the tree is z. That is one edge links every node to the parent node that is one step closer to the root node v, while the remaining (z-1) nodes are one step further away. Given that this means the number of nodes which are L steps away from the root is growing in proportion to (z-1)L i.e. exponentially fast, we have to stop the growth somewhere and we do so by demanding that the total number of nodes in the tree is N, the number of nodes in the original network Since we know that the distance in the tree from any node to the root node is along a shortest path in the full network, we can use the shortest path tree to calculate the average distance from v to all other nodes and hence its inverse, the closeness centrality measure cv for node v.

So far the degree of node v has not appeared in our calculation.  However, the first step of the shortest path tree must be to the nearest neighbours. So the shortest path tree from node v starts with kv branches and after that we assume these branches are all statistically equivalent and the degree in the rest of the tree is on average z

Analytical Result

Putting this all together gives us our central result that the inverse of closeness is a simply b + ln(kv)/ln(z-1)  and we have an expression for b  in terms of the number of nodes and the average degree of the tree z . In general, it is hard to determine the average degree of the tree z  in all but the simplest network models so in practice, we treat z as free parameter which we fit to the data. Since we usually have large numbers of nodes, hundreds and often many many more, we lose little by treating as a free parameter too.

The only reason we can do these calculations is that we have made some sweeping generalisations in order to reduce our networks to simple trees which represent branching processes used widely in physics and science in general. Given the simple pictured used here, it would not be unreasonable to assume that even if our relationship between degree and closeness works for some simple numerical network models, the relationship will fail badly on most real-world networks.  And yet, if there is one thing our experience as physicists has taught us, it is that there are situations where we can ignore the messy microscopic details and still arrive at results that are broadly true on large scales across many different systems.  The derivation of the ideal gas law using ideas from statistical physics would be a classic example. So how good is our result on real examples?

Results from Real Networks

Plot of inverse closeness against degree for artifical networks.
Plot showing the inverse of closeness against degree on a log scale for networks produced using the Erdős-Réyni (left) model or the Barabási-Albert model (right). The networks have 1000 (red, bottom), 2000 (yellow, middle) and 4000 (blue, top) nodes and an average degree of 10.0.

The answer turns out that our relationship works far better than we thought it would do. On three simple artificial models for networks, we get fits with reduced chi-square values which are between 1.0 and 1.3. The good fit is clear visually in the figure above. That is great but the real surprise is when we tried it on over one hundred networks based on real data.  One of our referees suggested that since we had the code working for an initial set of 18 real-world networks, it would be relatively easy to look at more networks. So for the published version we used a python package graph-tool to download all the networks kept in the Netzschleude repository (both are maintained by Tiago Peixoto) and we processed all of the networks there provided no further programming effort was required.  A number of these networks were too big for our code or had some technical issue that broke our code (perhaps non-ASCII characters), but we ended up looking at a further 112 networks from real-world data sets without imposing any further bias. To our surprise that we still found our relationship held well (reduced chi-square of less than 2.0) for around two-thirds of these networks, while around 80% had a reduced chi-square below 3.0. Of those that failed, a quick look at their description would tell us that they were always unlikely to work well: they were very small, or very dense (that is every node is connected to most of the other nodes) or had some other special structural feature not covered in our simple picture.

Conclusions

So what does our result do for network science? First, our relationship is a clear quantitative example of how two centrality measures are related, in this case degree and closeness, two of the oldest and best-known examples. The fact that our relationship is non-linear highlights the fact that most previous studies look at possible linear correlations between centrality measures, e.g. by using Pearson correlation coefficients. If non-linear relationships exist between centrality measures, these would not be so clear and this might explain why the previous studies have not reached any very specific conclusions. Second, our results tell researchers that, to a first approximation, we can forget closeness as most of the information is already in the degree of a node which is much quicker to calculate.  However, the corollary of this is that if you do calculate closeness, you can now remove the general dependence of closeness on degree by comparing the measured value against the predicted value made using our formula. The difference from our predicted value of closeness is genuine new information in the closeness measurement that is not given by the degree value.

The final message is that the success of our method reminds us that for most networks, the exponential growth in the number of nodes as a function of the distance L from any node is the fundamental property that makes complex networks distinct from systems constrained by our three-dimensional world. 

Note

This post is a revised version of one I wrote for the Nature Portfolio Physics Community entitled “Network Centrality“.

References

Bavelas, A. 1950.  Communication patterns in task-oriented groups. The Journal of the Acoustical Society of America 22, 725–730.

Brin, S. & Page,  L. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN systems, 30, 1–7.

Coscia, M. 2021. The Atlas for the Aspiring Network Scientist.  https://doi.org/10.48550/arXiv.2101.00863

Evans, T. S. & Chen, B., 2022. Linking the Network Centrality Measures Closeness and Degree. Communications Physics 5, 172, https://doi.org/10.1038/s42005-022-00949-5 .

Schoch, D. 2016. Periodic table of network centrality. URL http://schochastics.net/sna/periodic.html

Valente, T. W., Coronges, K., Lakon, C. & Costenbader, E. 2008. How correlated are network centrality measures? Connections 28, 16.

Recommended Books on Complex Networks

Network Books
Some of my Recommended Books on Complex Networks

These are texts and books I generally recommend at the start of my introductory course on Complex Networks given to third-year physicists at Imperial. The students are not expected to read any of these texts and the list is as much a way for me to keep track of the lecture notes and books I have found on the subject.

I give a short note on each one including my opinion of what each text offers. This often includes a price that will not be up to date but hopefully, it still gives a sense of the relative price of texts. The pricing of texts pitched for a similar audience (as defined by my interests) does not appear to be proportional to size. Interestingly, Cambridge University Press has five books in my list with a large overlap in their material. As this list was aimed at my students, there is an emphasis here on anything that has introductory material and which is available as a free legal download.

I split my list into two categories. First, those which give a more general and a less technical introduction, most of which put the work on networks into a wider context. The second section contains the texts with much more technical detail, including those I turn to when teaching these topics or when doing my own research.

General Reading on Networks

These are background material of a less technical type. These recommended books put the ideas into a much broader context.

  • Guido Caldarelli and Michele Catanzaro, Networks: A Very Short Introduction (2012) Oxford University Press ISBN: 978-0199588077.
    [Pleasingly short. I have scanned this but not yet read this but colleagues have recommended it to me. £5]
  • Philip Ball, Critical Mass (2004) ISBN: 9780099457862.
    [General popular introduction to ideas about Complexity in general, including networks. Readable background material. £7]
  • Duncan Watts, Six Degrees: The Science of a Connected Age (2004) Vintage Press, ISBN: 978-0099444961.
    [A non-technical overview focussing on networks. I found this very balanced even though written by one of the first authors to bring field to the attention of physicists. £9]
  • Tim Evans, Complex Networks, Contemporary Physics 45 (2004) 455–474
    [cond-mat/0405123].
    [My own Complex Network review aimed as an introduction to general physics researchers written some time ago. It has only the very basic technical issues making it short. Download Complex Networks free as cond-mat/0405123 from arxiv.org. Free.]
  • Network Literacy: Essential Concepts And Core Ideas
    [Free. It is a very short ‘poster’ overview to the main concepts.]
  • H.Sayama et al., What are essential concepts about networks? (2016). Journal of Complex Networks, 4, 457–474.

More Technical and Detailed Reading

Technical and more detailed sources, with the most useful for my course given first. If you look up one of these books online and check the recommendations made by the seller, you will find many more, some with an emphasis on a particular area or on one programming package.

  • M. Coscia, The Atlas for the Aspiring Network Scientist (2021).
    [This is an excellent, comprehensive, well presented and readable book. It has the right amount of technical detail for physicists starting out in this topic. Far more material than covered in this course (over six hundred pages) so you will need to pick out what you need. Don’t let the first serious chapter (chapter 2) on probability put you off, I don’t think you will need it for most of this text. Best of all it is free to download the Atlas for the Aspiring Network Scientist and there is also a printed version which when shipped from the US cost me around £50.]
  • Aaron Clauset, Network Analysis and Modeling lecture notes (2013).
    [Longer course so greater depth than mine at Imperial but with similar topics covered in a different order. Free. Clauset’s blog, Structure and Strangeness, and twitter feed, (@aaronclauset), are also informative.]
  • Mark Newman, Complex Networks, (2nd ed. 2018) Oxford University Press, ISBN: 9780198805090.
    [A very large book with 800 pages in the second edition which makes it physically less easy to use as a reference. A lot of technical detail in the style of a physicist rather than a mathematician. It has a lot of discussion so not a bad place to learn from but a beginner may get a bit lost by the comprehensive nature of the book. A good place to go when you want to go beyond the basic material in an introductory course and when you want more detailed discussions. £42.]
  • David Easley and Jon Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World (2010) Cambridge University Press, ISBN: 978-0521195331.
    [Well written book. Focus of later parts is rather different from my course but the first sections on networks in general should be very useful. Download a free copy of Networks, Crowds, and Markets or purchase a printed copy £32.]
  • Michael Gastner, Networks: Theory and Application (2011).
    [Notes from a longer and more mathematical course than mine, given in the Maths department at Imperial in Autumn 2011. General approach very similar to my course but with much more mathematical detail. Accessible to Physics students after some effort. Good place for more general proofs of topics. Unfortunately, these notes are not publically available unless the author can be persuaded to put them out there (image copyright issues need to be sorted out). Free.]
  • Robert Hanneman and Mark Riddle, Introduction to social network methods, (2005).
    [Has a social science focus making the later parts less relevant to my course. However, since it is free you might still find the early sections useful. Free download of Introduction to social network methods. Free.]
  • Filippo Menczer, Santo Fortunato, and Clayton A. Davis, A First Course in Network Science, Cambridge University Press, (2020), ISBN: 9781108471138,
    [I saw a copy of this book on the publisher’s stand and it looks good and at the right level for students on my course and will take them beyond that too. £35]
  • Vito Latora, Vincenzo Nicosia, Giovanni Russo, Complex Networks: Principles, Methods and Applications, Cambridge University Press, (2017) ISBN: 9781107103184.
    [Well written with examples and exercises. Covers pretty standard ground. I did find a bit more useful mathematical detail on one or two issues that I did not find in other texts. A very solid textbook worth a look. £55]
  • Ted G. Lewis, Network Science: Theory and Applications, Wiley, (2009) ISBN: 0470331887.
    [Has exercises. Looks comprehensive. Table 1.1 p2 for a timeline of papers as a history of network science. £70].
  • Nino Boccara, Modeling Complex Systems, Springer, (2010) ISBN: 1441965610.
    [General coverage of complex systems including chaos, recurrence equations. In particular has chapter 6 on spatial models including cellular automata and sociophysics models, chapter 7 on networks, and chapter 8 on power laws e.g. power-law vs log-normal distributions. £63]
  • Guido Caldarelli and Alessandro Chessa, Data Science and Complex Networks, (2016), Oxford University Press, ISBN: 9780199639601,
    [Yet to read this in detail. Remarkably thin for the price (144 pages in total, closer to 120 for actual text) but looks quite useful. £40]
  • Maarten van Steen, Graph Theory and Complex Networks, (2010).
    [Free download from web site. Style is rather different from this course but again general introductory sections on networks could be useful. Free download of Graph Theory and Complex Networks or printed version £15.]
  • Reuven Cohen and Shlomo Havlin, Complex Networks: Structure, Robustness and Function, Cambridge University Press, (2010) ISBN: 0521841569.
    [The text often seems a bit too bit brief. Some chapters are very short and feel like the transcript of a lecture rather than more extensive notes expanded from an hour long presentation. Good basic mathematics. See table 3.1 for nice list of properties of standard networks. Contains exercises. Smaller and lighter (238 pages) than Newman’s book (771 pages but similar text density) yet no cheaper than Newman’s book and less comprehensive. £42.]
  • A.-L. Barabási, Network Science, Cambridge University Press, (2016), ISBN: 9781107076266.
    [Very nice layout and great graphics so perhaps the most beautiful of these Networks books. Interestingly, the preface acknowledges in more detail than normal a list of other people who contributed to the development of the book. Very readable but not as comprehensive as others. For instance, there is no section on centrality measures. The online version of Network Science is (and will remain) free and has additional resources. The printed version is available for £42 hardback.]
  • Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences), Cambridge University Press, (1994), ISBN: 978-0521387071.
    [This is a standard text written from the viewpoint of social science, hence the title. It is, however, quite mathematical in places. It is a standard reference book for Social Network Analysis as well as a textbook. I’ve never found it very readable. I generally only use it to check to see the origin of some concept as many ideas in Networks go back beyond the 1990s. I don’t recommend this for beginners but it is a book I use as a reference in my work. I feel that my dislike of the text is my problem, not the book’s. £47]
  • Freeman, L. C., The Development Of Social Network Analysis: A Study In The Sociology Of Science, ΣP Empirical Press, Vancouver, (2004), ISBN: 1-59457-714-5.
    [A history of Social Network Analysis from one of the leading authors in the field. Useful background for those deeply interested in network science. Note a free copy of The Development Of Social Network Analysis is available on ResearchGate. ]

© 2024 Netplexity

Theme by Anders NorenUp ↑