I have just come across an interesting citation network analyser and visualiser – CitNetExplorer. Looks to be a very professional package that I will certainly be using.

One of the interesting things about citation networks is that their vertices have an order given by their publication date. This is a very strong constraint on the system so when you analyse or visualise such a network you should take the time ordering  into account. The simplest example is that you should not just look at the vertex degree in these networks, but at in-degree (citation count) and out-degree (length of bibliography). Perhaps the most obvious aspect of the constraint comes when you try to visualise the network. You can just put such networks into a standard network package and treat it as a directed network, which it is. However any standard visualisation will undoubtedly place the vertices all over the two-dimensional surface used for display. Standard visualisations pay no attention to the time-ordering of the vertices yet you almost certainly want to show that information when displaying a citation network as it is such a critical part of the definition. So many of the properties will depend on the age of the publication for instance. I have encountered this myself and played around with a few ad-hoc solutions but came to the conclusion I needed to write something myself, adapting a standard layout method to set one dimension of the vertex coordinates while the second dimensions is set by the vertice’s time. Since the same problem is encountered when making diagrams showing the critical paths in a set of tasks (such as Gantt charts) there are packages which will do this. However you will also want to do different types of analysis on a citation network plus they are likely to be much bigger than a normal Gantt chart.

This is where CitNetExplorer comes in. This comes from Nees Jan van Eck and Ludo Waltman at the CWTS (Centre for Science and Technology Studies) in Leiden, so comes from one of the leading institutes in bibliometric research. Its very early days and I have only had a short play but for me its good points are:

  • Free for noncommercial and teaching purposes.
  • Cross platform as written in java.
  • Stable on my Windows 7 machines.
    As it is written in java, it is likely to be stable on other platforms too.
  • Well presented with a reassuring professional feel.
  • Good graphical display.
    The publications are laid out using their publication data for the vertical coordinate and a layout algorithm to place the publications horizontally
  • Good default options.
    I got an instantly readable figure every tine I tried it
  • Good range of graphical output options.
    Vector graphics, especially postscript (eps), is essential for me. Note these are all under the Screenshot menu option.
  • Two basic network format output options.
    A pajek .net and a simple text file format (see below)
  • Various basic analysis tools.
    This includes transitive reduction  which is something I have been very interested in and can throw up some new insights into the citation counts of papers (see arXiv:1310.8224).

The forty most highly cited papers in hep-th (1992-2003) after transitive reduction as an example of output from CitNetExplorer. (click on image for better version)

So this looks to be a really nice package. Of course, I am never satisfied so what would I like to see in future versions:

  • Open source.
    It would be nice to be able to learn from their computational work and to add to this myself. Maybe some type of plug-in could be added to solve the latter problem. I have a few more tricks for citation networks in the pipeline for instance.
  • More input options.
    There are only two and one is tied to Thomson-Reuter’s WoS (Web of Science) database. In the example given by the authors you perform a search on WoS and then save the results in a text file (saverecs.txt).  Note you must select the “Web of Science Core Collection” not the “All Databases” option which the example clearly shows but I didn’t read, otherwise the output file will not include the full citation information needed to construct the citation network.  This file is a simple text file so you should be able to combine them by hand if like me you are limited to 500 records per file.
    The alternative is a pair of relatively simple text files.  These are not as yet explained in the documentation. Basically there are two files.  First is namepub.txt file lists the properties of the publications and the order in this file assigns each publication an index (the publication on line 2 is vertex 1, line 3 defines vertex 3 and so on). The second file is called namecite.txt and is an edge list written in terms of the vertex index. Look at the first few lines of the example data James Clough made from the open source KDD cup arXiv citation network data that we have been using in our recent work. Alternatively if you can produce a file from WoS open it in CitNetExplorer then save it in what is called CitNetExplorer format. These CitNetExplorer files are easy to look at, edit and prepare in a spreadsheet or a basic text editor and appear to be tab separated.
  • Visualisation editing.
    No layout is perfect so it is essential to be able to move the vertices by hand. One of my favourite visualisation packages, visone, shows what you can do in java, and even my own ariadne package built on the jung library gave that functionality automatically.

Rather less seriously, I am not sure about the name.  I would pronounce the “Cit” in “CitNetExplorer” as “sit” or perhaps “chit” so I would have kept the “e” in “cite”, CiteNetExplorer, but its not my product. As I’m getting bored typing it it, I’m sure it will become just CNE in any case.

Links

CitNetExplorer http://www.citnetexplorer.nl/

Van Eck, N.J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. [arXiv:1404.5322]

Van Eck, N.J., & Waltman, L. (2014). Systematic retrieval of scientific literature based on citation relations: Introducing the CitNetExplorer tool. In Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval (BIR 2014), pages 13-20.

James R. Clough, Jamie Gollings, Tamar V. Loach, Tim S. Evans (2013).
Transitive Reduction of Citation Networks. [arXiv:1310.8224]

Clough, James; Evans, Tim; Loach, Tamar (2013). Transitive Reduction of Citation Networks. (data set) figshare
http://dx.doi.org/10.6084/m9.figshare.834935

Clough, James; Evans, Tim (2014). KDD cup arXiv data for CitNetExplorer. figshare fileset.
http://dx.doi.org/10.6084/m9.figshare.1021647