Tim Evans' site on Complex Networks

Author: T.S.Evans

Sculplexity: sculptures of complexity using 3D printing

I have just had an unusual paper published by EPL (Europhysics Letters) (doi: 10.1209/0295-5075/104/48001). 3D printing (or additive manufacturing, to give the field its more formal name) is very much in vogue: the first 3D printed gun, Obama’s 2013 State of the Union address, and so forth. I like to know about new technological advances, even better if I can play with them. So how can a theoretical physicist get involved with 3D printing?

One of the ways I learn about a new topic is to set it up as a project for final year undergraduate students or as a masters level project. We have some of the best students in the UK here so all I have to do is come up with an outline and the students usually rise to the challenge.

What I needed was a connection between my work in complexity and 3D printing. This link came from one of those serendipitous moments. Another advantage of being at Imperial is that it is part of a Victorian complex of arts and science institutions, so we are surrounded by national museums. One is the V&A (Victoria and Albert Museum), which is dedicated to all aspects of design, from all ages and all locations. It also has some amazing tea rooms, covered in tiles designed by William Morris, great when I have a visitor and a little more time. It was on one of those trips that I happened to walk past something that caught my eye. At one level it is just a table. However, I saw this table as a branching process. To the designers, it was inspired by the tree-like structures of nature. The designers had embraced technology to do this, using Computer Aided Design (CAD) and 3D printing. On further investigation, this turned out to be the Fractal.MGX table designed by WertelOberfell and Matthias Bär, the first 3D printed object the V&A had acquired.


The Fractal.MGX table designed by WertelOberfell and Matthias Bär, photo credit Stephane Briolant (Image rights WertelOberfell).

Branching processes occur in many problems in complex systems and they have a long history of mathematical investigation. So here was the link I was looking for. The question I asked was what other physics models familiar to me could be turned into a 3D printed object? How would you actually do this conversion? Does the tool, the 3D printer, impose its own limitations on the type of mathematical model we can use? Are these new limitations interesting mathematically in their own right? Until now researchers had only seen these mathematical models visualised using two-dimensional representations, often not even using perspective to give the impression of a 3D object. Making a 3D printed object opens up uncharted territory. My project shows one can move from traditional visualisations to new “tactilisations”. So can we gain new insights by using touch rather than vision?

The approach might be useful for outreach as well as research. The same things that got my students and me interested in the project might intrigue school children, a retired judge or whoever. These objects might be particularly useful when explaining science to those whose sense of touch is better than their sight. However, we could also go back to where this project started and see if models of complexity can produce something of aesthetic value alone – hence Sculplexity: sculptures of complexity.

The basic idea is simple. A 3D printer builds up its object in layers, so the height of the object can be thought of as time. Suppose I have a model which defines a flat (two-dimensional) picture. Typically this will be a grid with some squares full and some empty. The model also has to describe how this picture evolves in time. For instance, there is a famous example known as Conway’s Game of Life for which there are many 2D visualisations. What I do is use the model at each point in time to define what the printer should print at one height. The next time step in the model then defines what to print on top of the first layer, and so forth.
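To make the layering idea concrete, here is a minimal sketch in Python. It is my own illustration, not the Sculplexity source code listed in the references. It evolves Conway’s Game of Life on a small grid and turns each time step into one layer of filled cells, giving a list of (x, y, z) voxel coordinates with z standing for time. The grid size, the wrap-around (periodic) edges and the starting pattern are assumptions chosen for brevity, and converting the voxels into a file a printer can read is not shown.

```python
def life_step(grid):
    """Return the next Game of Life generation on a periodic (toroidal) grid."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbours, wrapping around the edges.
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # Birth on exactly 3 neighbours; survival on 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return new


def stack_layers(grid, n_steps):
    """Return n_steps 2D layers: layer z is the model state at time z."""
    layers = [grid]
    for _ in range(n_steps - 1):
        layers.append(life_step(layers[-1]))
    return layers


def to_voxels(layers):
    """Convert filled cells into (x, y, z) voxel coordinates, with z = time."""
    return [(x, y, z)
            for z, layer in enumerate(layers)
            for y, row in enumerate(layer)
            for x, cell in enumerate(row) if cell]


# A "blinker": three cells in a row that flip between horizontal and vertical.
start = [[0] * 5 for _ in range(5)]
start[2][1] = start[2][2] = start[2][3] = 1
voxels = to_voxels(stack_layers(start, 3))
print(len(voxels))  # 9: three filled cells in each of three layers
```

The blinker repeats every two steps, so its printed layers alternate between a horizontal and a vertical bar of three cells.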

In fact, while the basic idea is straightforward, the implementation turned out to be much, much harder than I expected. It is to the real credit of the undergraduate students working with me on this project, Dominic Reiss and Joshua Price, that we pushed this idea through and actually got a final 3D printed object representing our modified forest fire model. OK, so our final result is a bit of a lump of black plastic compared to the inspiration for this project, the Fractal.MGX table in the V&A. But this is just the start.
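Our modifications to the forest fire model are not described here, so the Python sketch below uses only the standard Drossel-Schwabl rules, not the modified model we printed: a burning cell becomes empty, a tree catches fire if a nearest neighbour is burning or, with small probability f, from lightning, and an empty cell grows a tree with probability p. As before, each time step becomes one printed layer. The parameter values, the periodic grid and the choice to print only the trees are illustrative assumptions.

```python
import random

EMPTY, TREE, FIRE = 0, 1, 2


def forest_fire_step(grid, p_grow, f_lightning, rng):
    """One synchronous update of the standard Drossel-Schwabl forest fire model
    on a periodic grid (NOT the modified model used in the Sculplexity paper)."""
    rows, cols = len(grid), len(grid[0])
    new = [[EMPTY] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            state = grid[r][c]
            if state == FIRE:
                new[r][c] = EMPTY  # a burning tree burns out
            elif state == TREE:
                # Nearest (von Neumann) neighbours, wrapping at the edges.
                burning_nbr = any(grid[(r + dr) % rows][(c + dc) % cols] == FIRE
                                  for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
                if burning_nbr or rng.random() < f_lightning:
                    new[r][c] = FIRE
                else:
                    new[r][c] = TREE
            else:
                new[r][c] = TREE if rng.random() < p_grow else EMPTY
    return new


def printable_layers(grid, n_steps, p_grow=0.05, f_lightning=0.001, seed=1):
    """Run the model and keep, for each time step, a 0/1 layer marking trees
    (one possible choice of which cells the printer should fill)."""
    rng = random.Random(seed)
    layers = []
    for _ in range(n_steps):
        layers.append([[1 if s == TREE else 0 for s in row] for row in grid])
        grid = forest_fire_step(grid, p_grow, f_lightning, rng)
    return layers


layers = printable_layers([[TREE] * 20 for _ in range(20)], n_steps=10)
print(len(layers), sum(map(sum, layers[0])))  # 10 400
```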

Now that we have shown that this can be done, there is so much more to explore. We are adding another dimension to representations of mathematical models, and the possibilities for 3D printing are endless. All we have done is take the first step in combining 3D printing and mathematical models. We have highlighted the key problems and given at least one way to fix them. I can already see ways to extend the existing approach, new solutions to some of the problems, and different ways to create an object from a wider variety of theoretical models. Imagination and ingenuity are all that are required.


Top view of the output produced from the Sculplexity project. Yes, it’s on a bed of rice, since rice was used in an experiment to simulate another classic model of complexity: the sandpile model.

Note added

I have since found some other work using 3D printing to visualise maths which is full of useful ideas. So look for the work of Aboufadel, Krawczyk and Sherman-Bennett, as well as that by Knill and Slavkovsky, listed below.

References

  1. Reiss D.S., Price J.J. and Evans T.S., 2013. Sculplexity: Sculptures of Complexity using 3D printing, EPL (Europhysics Letters) 104 (2013) 48001, doi 10.1209/0295-5075/104/48001.
    Copy of Sculplexity: Sculptures of Complexity using 3D printing on personal web page.
    Chosen to be part of IOPselect, a collection of papers chosen by the editors for their novelty, significance and potential impact on future research.
    (altmetrics for paper).
  2. Evans T.S., 2013. Images of 3D printing output for Sculptures of Complexity – Sculplexity, http://dx.doi.org/10.6084/m9.figshare.868866
  3. Reiss D.S. and Price J.J., 2013. Source-code for Complex Processes and 3D Printing, https://github.com/doreiss/3D_Print, doi 10.6084/m9.figshare.718155.
  4. Reiss D.S., 2013. Complex Processes and 3D Printing, project report, http://dx.doi.org/10.6084/m9.figshare.718146.
  5. Price J.J., 2013. Complex Processes and 3D Printing, project report, http://dx.doi.org/10.6084/m9.figshare.718147.
  6. “3D printing used as a tool to explain theoretical physics”, Laura Gallagher, Imperial College News and Events, 9th December 2013.
  7. “3D-printed models of complex theoretical physics”, The IET Engineering and Technology Magazine, 9th December 2013.
  8. Aboufadel E., Krawczyk S.V. and Sherman-Bennett M., “3D Printing for Math Professors and Their Students”, arXiv:1308.3420, 2013.
  9. Knill O. and Slavkovsky E., “Illustrating Mathematics using 3D Printers”, arXiv:1306.5599, 2013.

Little LaTeX Lessons

I know most of the world uses Word and Microsoft Office products, so this post is for the chosen few – the small minority who do use LaTeX.
LaTeX produces beautiful documents, with equations if needed, which is a big selling point for many. LaTeX has been around since the 1980s and is based on TeX, produced by the iconic computer scientist Donald Knuth in 1978. Unlike most things in the ethereal world of information technology, it seems LaTeX is not going to die even in the age of tablets (see The revolution will be typeset, D.Steele, Physics World, Jan 2013, p35).
However, LaTeX is not simple to use. There is lots of help: many free guides, which a search for “LaTeX guide” will turn up, or one of my favourites, Tobi Oetiker’s The Not So Short Introduction to LaTeX2e.
What is missing are some of the little things. Or at least, these useful snippets may be out there, but they are buried under the important things. So I thought I would post my top tiny tips: the little bits it took me ages to discover, the things I see again and again in the reports from my students.
  1. Never leave an empty line after the end of an equation or eqnarray unless you really mean to start a new paragraph after the equation.
    An empty line in regular text always indicates the start of a new paragraph and the next line of text will be indented. Usually equations are meant to be read as being in the middle of a paragraph.
  2. Quotes in LaTeX are built from the two different types of single quote; do not use the double quote symbol.
    The right way to get quotes is to put one or two single backwards quote characters ` (grave accent, ASCII 96, on a funny key to the left of the number 1 on my UK keyboard) at the start of a phrase, and match them with the same number of normal single quotes ' (apostrophe, ASCII 39, on the key with the @ symbol on my UK keyboard). Never use the single symbol for a double quote " (ASCII 34, on the key with the number 2 on my UK keyboard). LaTeX interprets matched pairs of single quotes as a double quote and will produce a nice result. So use `single quote', ``double quote using pairs of single quotes'' but never use the "double quote characters". Note how these may well appear very similar when rendered on a web page, but LaTeX will produce very different results for each.
  3. For labels as subscripts, use \mathrm{text} to get the text in roman (normal) style, not the italic (slanted) style usually used for maths.
    Thus x_{\mathrm{max}} looks better than plain x_{max}. If you use this a few times, why not define a new command,
    e.g. \newcommand{\xmax}{x_{\mathrm{max}}}, which is usually placed before \begin{document}.
  4. Likewise, all the standard functions of mathematics have their own commands. The function is then typeset in standard roman so that it stands out from the italic style usually used for maths. That is, use \ln not ln, or \cos not cos, as the second version sometimes looks like it might be two or three variables, say l and n, multiplied together.
  5. For “much less than” and “much more than” symbols do not use double less than or double greater than signs.
    There are special commands \ll and \gg which look much better than typing << and >>.
  6. To see all the labels used in equations, figures, sections, etc. while you are writing a document, put a
    \usepackage{showkeys}

    command near the top of the LaTeX file, just after the \documentclass command.

  7. Dashes and hyphens:
    • one hyphen, -, for a hyphenated-word,
    • two hyphens, --, for a number range such as 1--2,
    • three hyphens, ---, for a punctuation dash --- like this (note the spaces either side of the three hyphens).
  8. To get the name of the file used to start LaTeX, use something like
    \texttt{\jobname.tex}

    Getting the names of all the constituent LaTeX files is harder.

  9. I may want to create my own simple symbol by placing two symbols on top of each other, or make other minor adjustments to spacing. LaTeX has some standard commands for spaces and it is useful to know some of them for such tweaks.
    • \; a thick space
    • \: a medium space
    • \, a thin space
    • \! a negative thin space
    • \qquad a large space

    So for example I might write

            \begin{eqnarray}
             A &=& B \, , \qquad B=C
             \\
             I &=& \int_{0}^{1} dx \; x \cos(2 \pi x)
             \end{eqnarray}
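Tips 3, 4, 5 and 9 can be combined in a single display. The fragment below is only an illustration with made-up symbols (x, L and N carry no particular meaning here); the comment line shows the plain-text version to avoid.

```latex
% Avoid: x_{max} << L , ln N >> 1
\begin{equation}
  x_{\mathrm{max}} \ll L \, , \qquad \ln N \gg 1 \, .
\end{equation}
```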

Can you game google scholar?

The answer appears to be yes, according to a recent paper entitled Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting by Emilio Delgado López-Cózar, Nicolás Robinson-García, and Daniel Torres-Salinas from the Universities of Granada and Navarra in Spain. I thought their experiment was illuminating and, while it is an obvious one to try, the results seemed pretty clear to me. The rest of the paper is generally informative and useful too. For instance, there is a list of other studies which have looked at how to manipulate bibliographic indices.

For the experiment, six false “papers” were created, with authorship assigned to an imaginary author. Each of the six documents cited the same set of 129 genuine papers. The cited papers all had at least one coauthor from the same EC3 research group as the authors of the study. This could generate 774 (= 129 × 6) new but artificial citations to members of the EC3 group (more if some papers had more than one EC3 group member, but this is not noted). These six fake documents were then placed on web pages in an academic domain, much as many academics can do freely. Twenty-five days later these were picked up by google scholar, and the paper shows large increases in citations to the papers of the study’s authors.

The basic conclusion does seem clear. As it stands, it is easy for many academic authors to boost their google scholar counts using false documents. In that sense, it seems one should not use these google scholar counts for any serious analysis without at least some checks on the citations themselves. Of course, google scholar makes such checks easy to do and free.

However, I do not feel we should rush to dismiss Google Scholar too quickly. Any system can be gamed. Useful references are given in the paper to other examples and studies of bibliometric manipulation, both in human edited/refereed sources and in uncontrolled electronic cases. A major point of the paper is that manipulation is possible in both cases, just much easier for web pages and google scholar. What is less clear from the paper is that the solutions may be similar to those employed by traditional indices of refereed sources. As the authors point out, manipulation of the refereed/edited literature can be and is spotted – journals are excluded from traditional bibliographic databases if they are caught manipulating indices. The easiest way to spot it is to look for sudden and unexpected increases in statistics. One should always treat statistics with care, and there needs to be some sort of assurance that the numbers are sound. Looking for unusual behaviour and studying outliers should always be done as a check whenever statistics are being used. The authors themselves present the very data that should flag a problem in their case. As they point out, their indices under google scholar went up by amazing amounts in a short time. Given this indicator of an issue, it would be easy to discover the source of the problem, as google makes it trivial to find the source of the new citations. Then of course, if such manipulation were used for an important process, e.g. promotion or getting another job, it would become fraud, and the research community and society at large already have severe sanctions to deal with such situations. It may be easy to do, but the sanctions may be enough to limit the problem.

So to my mind the main message of this paper is not so much that google can be manipulated easily, but that currently there are no simple tools to spot such issues. The timeline of citations to a set of papers, be they for a person, research group or journal, cannot be obtained easily. One can get the raw citation lists themselves, but you would have to construct the timeline yourself, which is not an easy job.
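As an illustration of how simple such a tool could be, here is a Python sketch. It assumes you already have the date on which each citation appeared; Google Scholar does not hand this over directly, so the input format is hypothetical. It bins the citations by month, fills in months with no citations, and flags any month whose count is far above the typical month. The threshold of five times the median is an arbitrary illustrative choice, not an established test.

```python
from collections import Counter
from datetime import date


def monthly_timeline(citation_dates):
    """Count citations per (year, month), including months with zero citations."""
    counts = Counter((d.year, d.month) for d in citation_dates)
    (y, m), end = min(counts), max(counts)
    timeline = []
    while (y, m) <= end:
        timeline.append(((y, m), counts.get((y, m), 0)))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return timeline


def flag_jumps(timeline, factor=5):
    """Flag months whose count exceeds `factor` times the median monthly count
    (upper median for an even number of months). Illustrative threshold only."""
    values = sorted(n for _, n in timeline)
    median = values[len(values) // 2]
    return [month for month, n in timeline if n > factor * max(median, 1)]


# Hypothetical data: three citations a month, plus a burst of 40 in May.
dates = [date(2012, m, 15) for m in range(1, 13) for _ in range(3)]
dates += [date(2012, 5, 1)] * 40
print(flag_jumps(monthly_timeline(dates)))  # [(2012, 5)]
```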

However, the same is also true of traditional paper-based citation counts. They are perhaps harder to manipulate, but it is also hard to check on a person’s performance over time. I imagine that checks like this will be done, and that the information to perform such checks will be provided in future, for all such altmetric measures based on information where there is little if any editorial control.

However, there is another approach to this problem. The authors of this paper reflect the focus of google scholar and most other bibliometric sites on the crudest of indices, citation counts and the h-index. Indeed, too many academics quote these. The UK’s REF procedure, which is used to assign research funding, will produce raw citation counts for individual papers in many fields (Panel criteria and working methods, January 2012, page 8, para 51). This will be based on Elsevier’s SCOPUS data (Process for gathering citation information for REF 2014, November 2011), except for Computer Science where, interestingly, they claim google scholar will be used in a “systematic way” (Panel criteria and working methods, January 2012, page 45, paras 57 and 61). Yet it is well known that raw citation counts and the h-index are badly flawed measures, almost useless for any comparison (of people, institutes or subjects), which is inevitably what they are used for. Indeed, where the REF document says citation data will be used, it specifically lists many of the obvious problems in interpreting the data they provide (Panel criteria and working methods, January 2012, page 8, para 51), so I am sure I can hear the sounds of hands being washed at this point.

One solution to the weakness of google scholar citation counts, or indeed counts derived from other sources, is to look for better measures.  For example in this study the six dummy papers will never gain any citations.  An index based on a weighted citation count, such as PageRank, would assign little or no value to a citation from an uncited paper.
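A toy illustration in Python of why weighting helps. The network is entirely hypothetical: paper X receives six citations from six uncited “fake” papers, while paper Y receives only two citations, but from two papers that are each cited ten times. The PageRank function is a plain power-iteration sketch with the commonly used damping factor of 0.85; papers that cite nothing have their weight spread evenly over all papers, which is one common convention. Note that in PageRank an uncited paper still carries a small baseline weight, so its citations count for little rather than nothing, and that weight is split among everything it cites (129 papers in the experiment above).

```python
def pagerank(cites, damping=0.85, n_iter=100):
    """PageRank by power iteration. `cites` maps each paper to the papers it cites.
    Papers that cite nothing spread their weight uniformly over all papers."""
    nodes = set(cites) | {p for refs in cites.values() for p in refs}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(n_iter):
        dangling = sum(rank[p] for p in nodes if not cites.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p, refs in cites.items():
            for q in refs:
                # Each citing paper shares its weight equally among its references.
                new[q] += damping * rank[p] / len(refs)
        rank = new
    return rank


# Hypothetical toy network.
cites = {f"F{i}": ["X"] for i in range(1, 7)}              # six uncited fakes cite X
cites.update({f"C{i}": ["A", "B"] for i in range(1, 11)})  # ten papers cite A and B
cites.update({"A": ["Y"], "B": ["Y"]})                     # A and B both cite Y

raw = {p: sum(p in refs for refs in cites.values()) for p in ("X", "Y")}
rank = pagerank(cites)
print(raw)                    # {'X': 6, 'Y': 2}
print(rank["Y"] > rank["X"])  # True
```

On raw counts X looks three times better than Y; the weighted index reverses the order.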

Of course any index can be gamed. PageRank was the original basis of google’s web index and people have been gaming it for as long as it has existed: google bombs, where many false web pages all point to the page being boosted, are the equivalent for web pages of the google scholar experiment performed in this paper. It is equally well known that google strives to detect this and will exclude pages from its lists if people are found to be cheating. So google has developed mechanisms to detect and counter artificial boosting of a web page’s rank. There is no reason (except perhaps a commercial one) why similar techniques could not be used on academic indexes.

My google Scholar citation count for the second part of 2012

A few other points struck me as worth noting. The authors waited 25 days for google to index their false papers, yet only allowed 17 days for google to remove them. This is slightly odd, as the data was valid up to the date on the paper, 29th May 2012, yet the arXiv submission was made 6 months later. It is a pity this information was not updated. There is a much wider debate here on who owns data and whether individuals can or should be able to delete personal data, e.g. from Facebook. What exactly does google do if documents disappear? Monitoring my own google scholar counts, I saw a massive rise then fall in my counts over a period of about a month in September/October 2012, before the count settled down to pretty much the same trend it had followed earlier in 2012. It does seem that google Scholar is monitoring and changing its inputs.

As with many experiments of this kind, the ethics are a little unclear. It is interesting to note that the authors reported that other researchers, who were not part of the team performing this experiment, noticed changes in their citation counts coming from the six fake papers. Would this have been allowed by an ethics committee? Should it have been run past an ethics committee? Am I allowed to say what kind of documents are allowed to cite my own papers? My colleagues in the bibliometrics business suggest there is no legal bar to anyone citing a paper, even if there is no automatic right to use the content of that paper.

And finally, surely the increase in citation counts reported for the authors of this paper should be divisible by 6, as the authors imply the same set of 129 papers was used as the bibliography in each of the six fake papers. Yet the results reported in their Figure 2 are not all divisible by 6.

This paper seems to be telling us that google Scholar is still at an early stage of its development.  However given the resources and experience of google at indices derived from open sources like the web, I would not be surprised if it was soon much harder to manipulate these indices.

Note added: There is a lot of other work on google scholar, including other studies of how it might be tricked. A good source of papers on google scholar, with references to other studies, is the work of Péter Jacsó.

Delgado López-Cózar E., Robinson-García N. and Torres-Salinas D., 2012. Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting, EC3 Working Papers 6, 29 May 2012, arXiv:1212.0638v1.


Plotting

Almost all scientists need to plot at some point. When you are preparing work for publication, journals can be extremely precise about their requirements, which can really stretch a plotting package (Physical Review E gets my vote for giving the author the most painful experience on this front). Yet I still struggle to find the perfect package. Recent licence changes at my home institution mean I have lost access to one of my preferred packages. So this has led me to think about some of the different plotting packages that I am aware of, though I have next to no experience with some of them.

Spreadsheets are the easiest to use. I tend to use them for instant data analysis and for playing with ideas. For serious work I find they fall very short. Nothing else is particularly easy to learn, so I suggest you just have to pick one.

Cost and licence availability are therefore the biggest influence on my choice. I tend to use packages that I can have on all my computers: home, portable, work. It is also good if my students can use these packages, as then I can help them. This means free, and preferably open source, software is what I tend to use. Even when Imperial has a licence, it can be restrictive about who can have the software and how many copies. In addition, institutions can drop licences at any time. I currently use R for statistics and general data analysis, so I have had to use its plotting anyway. It’s a bit painful to learn but it does what I want now. Gnuplot makes for a good standalone package if you don’t need the features of the other packages. Matlab has the best interface I know for changing the look of a plot, but I have never really used Matlab and I can’t have it on all my machines.

  • Gnuplot: basically a plotting programme but can do fits and knows about mathematical functions. Free though not open source. Command line driven i.e. needs scripts. Well established so many online examples. Can do very complicated plots. Mathematical formulae can be included in plots using LaTeX style notation.
  • Origin: This is a commercial statistics package. Imperial Physics Dept may have a licence. It looks much more like Excel, so this is the easiest package to use when manipulating data – the others work through the command line. Plots can also be altered using a WYSIWYG GUI, much like Matlab (though not as nice as Matlab). I have not used this.
  • R Statistics package: analysing statistics is its prime use. Produces good plots and these are easily extended with standard libraries. Command line driven, very well established so lots of help online and many books in the library. Its heritage can make it difficult to learn – it is not like C++/Java/Python. Main advantage is that it is free, open source and cross platform. Mathematical formulae can be included in plots. See my page on the R statistics package for some of the ways I get R to do my plots but also see the R Plots Gallery.
  • Matlab: Numerical analysis heritage with excellent plots. The language is similar to R, so it’s not C++/Java/Python and is tricky to learn. Lots of help in books and online. A big advantage is that it offers considerable opportunities to manipulate figures using a WYSIWYG GUI, e.g. changing normal/log axes or the fonts of characters, so it is very useful for changing a plot for publication.
  • Octave: Open source free Matlab like package but I have not used this.
  • Mathematica: primarily a symbolic manipulation programme and this heritage does not make it as easy to use for plots. It does have a very wide range of other abilities such as numerical solving, graph/network packages etc. Its graphics are considered excellent. The software is very expensive, though it may be cheap enough or free via an institutional licence. Command line driven.
  • Maple: as Mathematica. Graphics generally considered not to be as good but certainly can be high quality.
  • Spreadsheets: Excel or libre/open office. OK for quick look and for easy data manipulation but not for serious work as plot output is just not good enough for most scientific work. At least the libre/open office packages produce pdf and eps.

Myths and Networks

I have just read an intriguing paper by Carron and Kenna entitled ‘Universal properties of mythological networks’. In it they analyse the character networks in three ancient stories: Beowulf, the Iliad and the Irish story Táin Bó Cuailnge. That is, the characters form the nodes of a network and they are connected if they appear together in the same part of the story. It has caused quite a bit of activity. It has prompted two posts on The Networks Network already and has even sparked activity in the UK newspapers (see John Sutherland writing in the Guardian, Wednesday 25 July 2012, and the follow-up comment by Ralph Kenna, one of the authors). Well, summer is the traditional silly season for newspapers.
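The network construction described above is easy to sketch. The Python below assumes each “part of the story” has already been reduced to a list of the characters appearing in it; the scene lists and the one-letter character names are hypothetical placeholders, not data from the paper.

```python
from itertools import combinations


def character_network(scenes):
    """Build an undirected co-appearance network: characters are nodes and two
    characters are linked if they appear together in at least one scene."""
    adj = {}
    for cast in scenes:
        cast = set(cast)
        for name in cast:
            adj.setdefault(name, set())
        for a, b in combinations(sorted(cast), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical scene lists, purely for illustration.
scenes = [["A", "B", "C"], ["A", "D"], ["A", "E", "B"]]
net = character_network(scenes)
print(sorted(net["A"]))  # ['B', 'C', 'D', 'E']
```

Repeated co-appearances collapse into a single link here; a weighted version would count them instead.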

However, I think it is too easy to dismiss the article. I think Tom Brughmans, posting on The Networks Network, has it right that “as an exploratory exercise it would have been fine”. I disagreed with much in the paper, but it did intrigue me, and many papers fail to do even this much. So overall I think it was a useful publication. I think there are ideas there waiting to be developed further.

I like the general idea that there might be some information in the character networks which would enable one to say whether a story was based on fact or was pure fiction. That is, if the character network has the same characteristics as a social network, it would support the idea that the story was based on historical events. I was intrigued by some of the measures suggested as a way to differentiate between different types of literary work. However, like both Tom Brughmans and Marco Büchler, I was unconvinced the authors’ measures really do the job suggested. I’d really like to see a lot more evidence from many more texts before linking a particular measurement to a particular feature in character networks.

For instance, Carron and Kenna suggest that in hierarchical networks the degree times the clustering coefficient is a constant for every node, eqn (2). That is, each of your friends is always connected to the same (on average) number of your friends. By way of contrast, in a classical (Erdős-Rényi) random graph the clustering coefficient itself is a constant. However, I don’t see that as hierarchical but as an indication that everyone lives in similar size communities, some sort of Dunbar number for fictional characters. I’m sure you could have a very flat arrangement of communities and get the same result. Perhaps we mean different things by hierarchical.
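The point about flat community structure can be checked with a small Python sketch. It uses the usual local clustering coefficient C_i = 2T_i / (k_i(k_i - 1)), where T_i is the number of links among node i’s k_i neighbours, so that k_i C_i = 2T_i / (k_i - 1). The test network is a hypothetical “ring of cliques”: equal-size groups, fully connected inside, joined in a circle by single links. There is no hierarchy at all, yet k_i C_i takes only two nearby values. (In an Erdős-Rényi graph, by contrast, C_i is roughly the link probability p for every node, so k_i C_i grows with k_i.)

```python
def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def ring_of_cliques(n_cliques, m):
    """n_cliques complete graphs of m nodes each; node 0 of each clique is
    linked to node 1 of the next clique, closing a ring. No hierarchy."""
    adj = {(c, i): set() for c in range(n_cliques) for i in range(m)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for c in range(n_cliques):
        for i in range(m):
            for j in range(i + 1, m):
                link((c, i), (c, j))
        link((c, 0), ((c + 1) % n_cliques, 1))
    return adj


adj = ring_of_cliques(n_cliques=6, m=5)
print(sorted({round(len(adj[v]) * local_clustering(adj, v), 9) for v in adj}))
# [3.0, 4.0]: k*C is m-1 for most nodes and m-2 for the two bridging nodes
```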

Another claim was that in collaboration networks less than 90% of nodes are in the giant component. The Newman paper referred to is about scientific collaboration derived from coauthorships, which is very different from the actual social network of scientists (science is not done in isolation; no one is really isolated). I’m not sure the Newman paper tells us anything about character structure in fictional or non-fictional texts. I cannot see why one would introduce any set of characters in any story (fictional or not) who are disconnected from the rest. Perhaps some clever tale with two strands separated in time yet connected in terms other than social relationships (e.g. through geography or action) – David Mitchell’s “Cloud Atlas” comes to mind – but these are pretty contrived structures.
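For comparison with the 90% figure, the fraction of nodes in the giant component is simple to compute. Here is a Python sketch using breadth-first search on a dictionary of neighbour sets, the same hypothetical format one might use for a co-appearance network; the example network is made up, with nine linked characters and one isolated one.

```python
from collections import deque


def giant_component_fraction(adj):
    """Fraction of nodes in the largest connected component (breadth-first search)."""
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        largest = max(largest, size)
    return largest / len(adj) if adj else 0.0


# Hypothetical network: a chain of nine characters plus one isolated character.
adj = {i: set() for i in range(10)}
for i in range(8):
    adj[i].add(i + 1)
    adj[i + 1].add(i)
print(giant_component_fraction(adj))  # 0.9
```

In a story where every character interacts with someone linked to the rest, this fraction would simply be 1.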

I think a real problem in the detail of the paper, as Marco Büchler points out, is that these texts and their networks are just too small. There is no way one can talk rigorously about power laws, and certainly not to two decimal place accuracy. I thought Michael Stumpf and Mason Porter’s commentary (Critical Truths about Power Laws) was not needed since everyone knew the issues by now (in fact I don’t agree with some of the interpretation of mathematical results in Stumpf and Porter). Perhaps this mythological networks paper shows I was wrong. At best, power-law forms for small networks (and small to me means under a million nodes in this context) give a reasonable description or summary of the fat-tailed distributions found here, but many other functional forms will do this too. I see no useful information in the specific forms suggested by Carron and Kenna.
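To see how little a small sample constrains an exponent, the Python sketch below draws samples from an exact continuous power law with exponent 2.5 and fits them with the standard maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), whose approximate standard error is (alpha - 1) / sqrt(n). Here x_min is assumed known, which flatters the small samples; with real data it must be estimated too. Even a good fit does not show that the data follow a power law rather than some other fat-tailed form.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= x_min,
    by inverting the cumulative distribution."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(xs, x_min):
    """Maximum-likelihood exponent for a continuous power law with known x_min,
    plus its approximate standard error (alpha - 1) / sqrt(n)."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    return alpha, (alpha - 1.0) / math.sqrt(n)


rng = random.Random(42)
for n in (50, 100000):
    estimates = [mle_alpha(sample_power_law(n, 2.5, 1.0, rng), 1.0)[0] for _ in range(5)]
    print(n, [round(a, 2) for a in estimates])
```

With 50 points the standard error is around 0.2, so repeated fits typically scatter by a few tenths and quoting an exponent to two decimal places says very little; with 100,000 points the estimates agree closely.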

Another point raised in the text was the idea that you could extract subnetworks representing ‘friendly’ social networks. That is interesting, but really they are suggesting we need to do a semantic analysis of the links in the text, indicating where links are positive or negative (if they are that simple, of course), and form signed networks (e.g. see Szell et al. on how this might be done on a large scale http://arxiv.org/abs/1003.5137). I think that is a much harder job to do in these texts than the simple tricks used here suggest, but it is an important aspect of such analysis and I take the authors’ point.

Finally, I was interested that they mention other character networks derived from five other fictional sources. I always liked the Marvel comic character example, for instance (Alberich et al, http://arxiv.org/abs/cond-mat/0202174), as it showed that while networks were indeed trendy and hyped (everything became a network), there was often something useful hiding underneath and trying to get out in even the most bizarre examples. However, what caught my eye in the five extra examples mentioned by Carron and Kenna was that they treated these five as ‘fictional literature’. One, Shakespeare’s Richard III, is surely a fictionalised account of real history, written much closer to the real events and drawing on ‘historical’ accounts. I would have expected it to show the same features as they claim for their three chosen texts.

So I was intrigued, and that always makes a paper or talk worthwhile to me. However, while I was interested, I’d need to see much more work on the idea. You might try many different tests and measurements and see if they cumulatively point in one direction or another – I imagine a PCA type plot showing different types of network in tight clusters in some ‘measurement’ space. I’d still need convincing on a large number of trial texts. These do now exist though, so surely there is a digital humanities project here? Or is it already happening somewhere?

The Pools of Academic Goodwill

Much of academia is not run like a commercial business, or at least not yet. Many of the jobs I do are not paid for by the recipient of my efforts: referee reports for journals, examining of some PhDs and writing references are three examples which come to mind straight away. Rather than being directly paid for these tasks by the recipient, my home institution understands that they are paying for me to spend some of my time on external matters. Of course my university also draws from that pool as do I – journals use referees for my papers, my students need examiners, I needed references to pursue my own academic career. Overall, everything probably balances out.

For some of this work I may get paid, though anyone in the commercial world would probably find the rates at best humorous and at worst insulting. For a PhD viva in the UK I get around £150 (around US$200), which is about 24 hours’ work at the rate of the UK’s minimum wage. I reckon it takes me three working days to read a thesis (if there are no problems and if I am relatively familiar with the work), so that leaves the actual exam unpaid even at minimum wage rates. I was recently an examiner for a PhD in Vienna. The trip alone took more than 24 hours and in this case it was expenses only. Some types of external academic work may have some benefits for me. I read the Viennese PhD thesis from cover to cover: it was a pleasure that I would never have had if I had not been an examiner on this particular thesis. Such detailed reading time is a precious commodity these days. Some of this work can be used to support the case for my own career progression. Here, being an external examiner for another university’s undergraduate programme is an example of a measure of esteem that might count in my favour in a review meeting. Of course the link between such work and promotion is very tenuous, while the work itself is quite demanding and invariably underpaid. Again, we all do this work because we understand that we need examiners for our own PhD students and for our own undergraduate exams: we will draw from the pool of academic goodwill.

A more interesting case is the work done by academics refereeing for journals, whose value has been estimated at about £1.9bn per year, with £165 million for the UK alone. There is real value in this time spent commenting on academic papers, yet while journals charge others for their service and make profits, none of the fees charged by journals reach the referees. So journals also draw on academic goodwill.

In the book Whackademia, Richard Hil says that academics are no longer trusted as professionals but are to be monitored and measured. He suggests that before this, internal pressure and support from within an academic community ensured everyone made their contribution, even if this was in different ways. Richard Hil suggests that the new neo-liberal, business-like approach encourages an individualism that fails to value contributions to a shared pool of academic goodwill and so actually reduces overall returns. We have to maximise our individual measured outputs, so everything else, useful or not, gets dropped.

So if the UK government, and maybe others, want to push a more business-like approach on universities, they ought to think carefully. Perhaps they should first try to estimate what it would cost to replace academic goodwill with business-style consultancy.

 

Power Laws

Why are power laws still in the news? Michael Stumpf and Mason Porter recently published a commentary on the topic in Science (“Critical Truths About Power Laws”). Yet the nature of long-tailed distributions has been debated in physics since the work on critical phenomena in the 1970s. The debate then continued in the context of fractals in the 1980s, self-organised criticality in the 1990s and now complex networks. So the issues should be familiar to theoretical physicists and applied mathematicians working in these areas. However, the field of complexity is multidisciplinary and there is a real need to reach out to researchers from other backgrounds and explain what we have learnt in all these debates about power laws. Often they have their own experiences to add to this debate – financial mathematicians have long been interested in long-tailed distributions. Physicists may understand the subtext behind attention-grabbing headlines, but there is no reason to think others know how large a pinch of salt to add to these claims. That has certainly been my impression when hearing experts from other fields refer to some of the early claims about complex networks. Perhaps the most worrying aspect is that many of the points raised in this age-old debate are still not addressed in many modern publications. This is part of the frustration I hear when reading the Stumpf and Porter article. To me this is a good example of the repeated failure of the quality control provided by the current refereeing system (but that is a debate for another time).

One of the key points made by Stumpf and Porter, though I’ve heard it many times before, is the lack of statistical support behind many claims of a power law. Identifying power-law behaviour is not trivial. I always remember that Kim Christensen at Imperial, drawing on his own experience with power laws, recommended to me that four decades of data were needed, while Stumpf and Porter suggest two. Many examples fail to pass even this lower limit.
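As a rough check of the decades criterion, the span of a data set in decades is simply log10(x_max / x_min). The Python sketch below makes that explicit; the function names, the toy data and the default threshold of two decades (the Stumpf and Porter figure; pass 4 for the stricter rule of thumb) are my own illustrative choices, not part of either source.

```python
import math


def decades_spanned(values):
    """Number of decades (powers of ten) covered by the positive values."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.log10(max(positive) / min(positive))


def enough_decades(values, minimum=2.0):
    """True if the data cover at least `minimum` decades (2 or 4 in the text)."""
    return decades_spanned(values) >= minimum


sample = [1, 30, 500, 20_000]  # hypothetical data running from 1 to 2 x 10^4
print(round(decades_spanned(sample), 3))    # about 4.3 decades
print(enough_decades(sample, minimum=4.0))  # passes even the stricter test
```

Counting decades says nothing about whether the data actually follow a power law over that range; it is only a necessary condition.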

Aaron Clauset, Cosma Shalizi and Mark Newman provide one popular statistical recipe, with code, that addresses most of the issues (“Power-law distributions in empirical data” – and check out the blogs on the arXiv trackback). Their approach will produce much more useful results than I’ve seen in several published papers. It might not be perfect, though. I think I’ve seen warnings against the use of cumulative distributions, as problems in the early part of the data will affect many more data points than issues in later data points.
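The core of the Clauset, Shalizi and Newman recipe is a maximum-likelihood estimate of the exponent rather than a straight-line fit on a log-log plot. For a continuous power law above a lower cut-off x_min, the estimator is alpha_hat = 1 + n / Σ ln(x_i / x_min), with standard error (alpha_hat − 1) / √n, and the quality of the fit can be judged by the Kolmogorov–Smirnov (KS) distance between the data and the fitted distribution. The Python sketch below assumes x_min is already known and uses made-up function names; it is not their published code, which also chooses x_min by minimising the KS distance and tests plausibility against synthetic data sets.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law above xmin.

    Returns (alpha_hat, standard_error) using
    alpha_hat = 1 + n / sum(ln(x / xmin)) and error (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need data strictly above xmin")
    alpha_hat = 1.0 + n / log_sum
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # the empirical CDF steps from i/n to (i+1)/n at x; check both sides
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return d


# synthetic test data: inverse-transform sampling from a power law with alpha = 2.5
rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(10_000)]

alpha_hat, err = fit_alpha(sample, xmin)
print(f"alpha = {alpha_hat:.3f} +/- {err:.3f}")
print(f"KS distance = {ks_distance(sample, xmin, alpha_hat):.4f}")
```

The KS distance is itself built on the cumulative distribution, but here it is used only to judge the fit, not to estimate the exponent.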

In fact I would go further. I often reject papers with no error estimates on measured quantities and no uncertainties on data points – I mark our own physics students down for these failings. Why show points and a best fit on a log-log plot except as an illustration? A plot of residuals (differences between data and fit) is far more informative scientifically. Many refereed papers appear to let these things go, limiting the impact of their message.
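To illustrate the point about residuals, here is a minimal Python sketch that computes, for a fitted power law y = C x^(−alpha), both the raw residuals and the residuals in units of each point's uncertainty. The function names and the toy data are hypothetical, and the rule of thumb that most normalised residuals should lie within about ±2 assumes roughly Gaussian errors.

```python
def power_law(x, c, alpha):
    """Model value c * x**(-alpha)."""
    return c * x ** (-alpha)


def residuals(xs, ys, sigmas, c, alpha):
    """Return (raw, normalised) residuals of the data against c * x**(-alpha)."""
    raw = [y - power_law(x, c, alpha) for x, y in zip(xs, ys)]
    # dividing by each point's uncertainty puts all points on a common scale
    normalised = [r / s for r, s in zip(raw, sigmas)]
    return raw, normalised


# hypothetical data with a little curvature that a log-log plot can hide
xs = [1, 2, 4, 8, 16]
ys = [3.1, 0.80, 0.19, 0.044, 0.0095]
sigmas = [0.1, 0.03, 0.01, 0.003, 0.001]
raw, normalised = residuals(xs, ys, sigmas, c=3.0, alpha=2.0)
for x, z in zip(xs, normalised):
    print(f"x = {x:>2}: residual = {z:+.1f} sigma")
```

For these toy numbers the normalised residuals drift from positive at small x to negative at large x, a systematic trend that is easy to miss when points and fit are drawn on a log-log plot.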

Another part of the debate, and a key message in Stumpf and Porter, was the meaning of a power law. Most researchers I know realised in the early noughties that any hope of universality in the powers seen in network data was misplaced. For me this was one of my first questions, and as I did not see it answered in the literature it led me to think about the formation of scale-free networks in the real world. I wrote this up as a paper with Jari Saramäki from Finland (“Realistic models for the formation of scale-free networks”). Existing models just didn’t explain this at the time, which is precisely what Stumpf and Porter say is still missing in many discussions about power laws even today.

Power laws in a scale-free network formed using random walks. Even with a theoretical model I fell just short of four decades of data. Note that the right-hand plot shows non-trivial finite-size effects hiding in the tail. This was all for networks with one million vertices. Fig 2 of Evans and Saramäki, arXiv:cond-mat/0411390.

There was a technical point in the Stumpf and Porter article that did catch my eye but in the end left me disappointed. They highlighted that for random numbers drawn from any long-tailed distribution there is a generalization of the central limit theorem. It tells us that sums of many random numbers drawn from such distributions will follow distributions with a power-law tail (you have to define all this carefully, though). However, this is not as powerful as it sounds. The definition of a long-tailed distribution in this context is one with a power-law tail. That seems circular to me.
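For readers who want the statement behind this, an informal version of the generalized central limit theorem is sketched below. It glosses over slowly varying corrections and the condition on the balance between the two tails; b_n are suitable centring constants and L_α is an α-stable law, whose right tail behaves as shown provided that tail carries some weight. The hypothesis already assumes a power-law tail with exponent α < 2, and the conclusion is a tail with the same exponent, which is the circularity described above.

```latex
% X_1, X_2, \dots independent and identically distributed, S_n = \sum_{i=1}^{n} X_i
P(|X_i| > x) \sim c\, x^{-\alpha} \quad (x \to \infty), \qquad 0 < \alpha < 2
\quad \Longrightarrow \quad
\frac{S_n - b_n}{n^{1/\alpha}} \xrightarrow{\;d\;} L_\alpha ,
\qquad
P(L_\alpha > x) \sim C\, x^{-\alpha} .
```

For tails with α > 2 the variance is finite and the ordinary central limit theorem, with its Gaussian limit, applies instead.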

Less seriously, I thought the sketch provided in the Stumpf and Porter Science article set a bad example in an article about bad representations of data. One axis is a qualitative judgment while the other is an unspecified statistical measure. The ‘data points’ on the plot are mostly unspecified. The one marked Zipf’s law is presumably for the classic data on cities, though which of the many data sets on this topic it uses I’m not sure. What I think was intended was a sketch indicating the authors' subjective judgments on the nature of the power laws suggested in different fields, in terms of their two main themes: the statistical support for power laws, and the modelling and interpretation of results. In the end the sketch given didn’t convey anything useful to me.

Still, these are minor quibbles. If an article can keep the weekly meeting of Imperial's Complexity and Networks programme talking for an extra half an hour, in a discussion led by Gunnar Pruessner from Imperial, it has got to be doing something good.

Simply the best?

Social Network Analysis (SNA) dates back over fifty years. The serious side of this work is that a network representation is a great simplification of complex social interactions, but hopefully one which captures a key aspect – the bilateral relationships. Mathematical measures can then be applied to the tangle of connections in order to reveal key features. One important use of SNA is to identify the most important person. Defining what is meant by “important” is a crucial part of the question, with many different answers for different contexts.

However it also means we can have a little fun. In a frivolous mode we can just apply the various tools SNA provides (PageRank, betweenness and so forth) to produce instant answers to our favourite collection of individuals.
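As an example of one of these instant tools, PageRank scores a node by the chance that a random surfer, who follows links most of the time and occasionally jumps to a random node, is found there. The Python sketch below computes it by power iteration on a directed edge list. The damping factor 0.85 is the conventional choice, sending a dangling node's score to every node is one of several conventions, and the toy graph is hypothetical; in practice a library such as NetworkX (`networkx.pagerank`, `networkx.betweenness_centrality`) would do the job.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """PageRank scores by power iteration on a directed edge list of (u, v) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for u, v in edges:
        out_links[u].append(v)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # every node first receives its share of the random jumps
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            targets = out_links[u] or nodes  # dangling node: spread over all nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


# hypothetical toy graph: A, B and D all link to C, and C links back to A
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
scores = pagerank(edges)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Here C comes out on top, as its three in-links suggest, while A ranks a close second purely because C, an important node, links to it. That difference from simply counting links is what PageRank adds.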

A recent study of biographies on Wikipedia (“Biographical Social Networks on Wikipedia – A cross-cultural study of links that made history”) was based on publicly available data from DBpedia. There is a serious point to the paper, which contains many interesting conclusions, especially about the relationships between different languages. However, at a trivial level I couldn’t help but be drawn to a table of the “top 25 persons”, which is given below. The top 5 were (in descending order by in-degree) George W. Bush, Barack Obama, Bill Clinton, Ronald Reagan and then Adolf Hitler. Of course this tells us as much about who is using the English Wikipedia as anything else. In fact, most of the top 25 are recent US presidents. It’s those who are not who caught my eye.
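The ranking in that table is by in-degree, the number of links pointing to a person's page. A minimal Python sketch, assuming the links are available as (linking page, linked person) pairs and that repeated links between the same pair count once; the function name and toy data are hypothetical and this is not the paper's own pipeline.

```python
from collections import Counter


def top_by_in_degree(links, k=25):
    """Rank link targets by in-degree, counting each (source, target) pair once."""
    distinct = set(links)  # drop duplicate links between the same two pages
    in_degree = Counter(target for _, target in distinct)
    return in_degree.most_common(k)


# hypothetical toy data: (linking biography, linked person)
links = [
    ("Page 1", "Person A"), ("Page 2", "Person A"),
    ("Page 3", "Person B"), ("Page 1", "Person B"), ("Page 4", "Person B"),
    ("Page 2", "Person C"),
    ("Page 3", "Person B"),  # duplicate link, counted once
]
print(top_by_in_degree(links, k=2))  # [('Person B', 3), ('Person A', 2)]
```

`Counter.most_common` orders tied counts by first appearance, which here depends on set iteration order, so people with equal in-degree may be listed in either order.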

First, in this age of celebrity, there are very few modern entertainers: just Elvis Presley, Frank Sinatra, Bob Dylan and Michael Jackson (again in descending order). They are, though, all American.

At least we Brits are the second most popular nation, as we get William Shakespeare, Winston Churchill and the Queen. In fact it is probably the continuing obsession with the Second World War in America and Britain, and not their tyrannical nature, that sees two other foreigners make it into the list, with Hitler and Stalin appearing along with Churchill.

Religion makes a surprisingly small contribution given the US bias. We only have Pope John Paul II and Jesus as top 25 sort of guys, though the last Pope is higher than his boss. Still, while Jesus is near the bottom, that still puts him above any of the Beatles, ‘disproving’ John Lennon’s famous quip.

I also thought it was interesting that anything before 1900 is clearly ancient history. Lincoln is the only 19th-century representative, while Shakespeare and Jesus are the only contributions from earlier times.

Perhaps the saddest comment on our times is the near absence of women. I’m not sure if the Queen is there because of what she has done with the job or because of the fact that she is head of state. The latter is just a matter of chance, although even there her sex made it harder for her to reach the position (male heirs are preferred over female heirs). The only other woman, at number 25 in fact, is Hillary Clinton.

The top 25 persons in the English Wikipedia ranked by in-degree. Taken from Table 2 of arXiv:1204.3799.

Open access academic publishing

Open access academic publishing made it into a UK national newspaper, The Guardian, as an article (“Government welcomes calls to open up science”, 11 April 2012), as an editorial (“An open and shut case”, 11 April) and in its letters column (“Better models for open access”, 15 April). Yet arxiv.org has provided a successful working model of open and low-cost academic publishing for over twenty years. Authors post articles for free, articles are open for all to read and index, while the minimal costs are covered directly by research funding agencies. Readers decide what is worth a look; posterity decides what is good.

My experience with arXiv (the X is meant to represent the Greek letter chi, so it is pronounced like archive) is that it provides me with everything I need. Why pay for editorial staff when I provide journals with camera-ready copy? Why pay for paper copies when no one uses them? The arXiv brand is now better known in theoretical physics than any single journal. There are no referees for my arXiv articles, but I find most referee reports of limited use. Instead I give other authors feedback electronically when I think I have something constructive to say, for which I am paid the same as for my reports for journals, i.e. zilch. Google Scholar, Microsoft Academic Search and various other services, including social networking sites, exploit the open nature of arXiv to provide citation tracking and other search tools.

The only reason I use a journal for my work is that the bodies funding my research persist in using publication in a journal, and the citations to my journal articles from other journal articles, as a measure of the quality of my output. Thus it is the funders themselves who perpetuate a system in which they use scarce funds to support an old-fashioned, expensive and unnecessary means of propagating research results.

Part of a screen shot from arXiv

Richard Dimbleby Lecture 2012 – The New Enlightenment (Sir Paul Nurse)

This year’s Richard Dimbleby Lecture was given by Sir Paul Nurse, President of the Royal Society. The central tenet of Sir Paul’s lecture was that science, and investment in science, has given greatly to the society we live in today – not just to scientists but also to technology, the economy and far beyond. He makes a well-reasoned argument which politicians would be wise to listen to. Historically, scientists and researchers have been looked up to in society and have been considered to be those whose work underpins our society. I think that this perception is, to a large degree, in decline and has been chipped away or damaged over the years. Certainly, working in science is getting ever harder, and more and more time is spent on administration, teaching, form filling and justifying funding proposals, with less and less time actually spent on science itself. “The business of science” is now arguably a larger industry than “the practice of science”. A recent article in the Guardian by Professor Mike Duff of Imperial College London would appear to agree with this last point. Basic science is coming under fire and only popular science remains. Britain has always been a leader in innovation; it would be sad to lose that in favour of following the crowd.


© 2024 Netplexity
