Just read an interesting couple of blogs by Stacy Konkiel from the Impactstory team, one entitled “4 reasons why Google Scholar isn’t as great as you think it is”, nicely followed up by “7 ways to make your Google Scholar Profile better”, which keeps things constructive. My immediate response to the first blog is that the problems are not unique to Google Scholar: you could highlight the same issues with most bibliometric sources. Certainly the criticisms also apply to the two other commercial bibliometric data sources, Scopus and Web of Science. The four points were as follows.

1) Google Scholar Profiles include dirty data

What is “dirty data”? A site like Impactstory, pulling in data from a wide range of non-traditional sources, ought to be a little more careful about throwing this term around! One person’s dirty citation is another’s useful lead. Google Scholar does seem more open to gaming, but at least anomalous data is easy to spot in Google Scholar itself. Scopus and Web of Science make their decisions about what to include and what to exclude behind closed doors: how many ‘weak’ journals and obscure conference proceedings are included there, how many book citations are excluded? I’ve heard at least one story of bibliometric data being used as a pawn in a dispute between two companies over other commercial interests, and I simply have no idea how much manipulation of data goes on inside a commercial company. On the altmetrics side of the story, most departments still regard any social media counts as dirty.

2) Google Scholar Profiles may not last

Surely this is a problem with anything, commercial or not. Your institution may switch subscription and cut off your access even if the service is still out there. Google certainly has a poor reputation on this front. In so many ways we always gamble when we invest time in a computer product – I’m not sure my PL/1 programming knowledge is much use these days.

3) Google Scholar Profiles won’t allow itself to be improved upon

Scopus and Web of Science also carefully control what you can do with their data, and in any case you need a subscription before you can start to do anything. So surely this is a criticism of all closed data systems.

4) Google Scholar Profiles only measure a narrow kind of scholarly impact

Again, I don’t see Scopus and Web of Science producing much more than bare citation counts and h-indices. The UK 2012 research assessment procedure (REF) only quoted bare citation counts from Scopus. This is a problem of education: until more people understand how to use bibliometric data, nothing much will happen, and I know h-indices still get thrown about during promotion discussions at my institution (again, see an Impactstory blog on why people should stop using the h-index).

My Approach

I tend to think of all sources as data, and your interpretation should vary as you take each into account. Like all data and measurements, results derived from bibliometric information need to be checked and validated using several independent sources and alternative methods.

For instance, I have access to these three commercial sources, and they tend to give citation counts which differ. Web of Science is generally the most conservative, Scopus is in the middle and Google Scholar leads the counts. So I can use all three to give a balanced view and to weed out any problems. They also have different strengths and weaknesses. Google is ahead of the curve and shows where the other two will go a year or two later. My work on archaeology has a large component in books, which Google Scholar reflects but the other two fail to capture. Scopus is very weak on my early Quantum Field Theory work, while both Web of Science and Google Scholar are equally strong in this area and time period.

The tips discussed in “7 ways to make your Google Scholar Profile better” are very useful, but many apply to all data sources. For instance, Scopus just added two papers by another T.S.Evans, working near London, to my profile, even though they are in a completely different research field, the address is completely different (not even in London) and there is basically zero overlap between those papers and my work. It makes you worry about the quality of the automatic detection software used in commercial bibliometric firms. I can’t fix this myself; I have to email Scopus, whereas I can tidy up my Google Scholar profile whenever I want. Currently I also feel that the Google Scholar recommendations are the most useful source of targeted information on papers I should look at, but I am always looking for improvements.

Overall, I feel you need to keep a very balanced approach. Never trust a statistic until you’ve found at least two other ways to back it up independently.