Anonymous reviews and our (in progress) comments for "Altmetrics in the wild: Using social media to explore scholarly impact," submitted to the PLoS ONE altmetrics collection. Posted with permission from the academic editor.

Reviewer 1

  1. the references to the analysis section were quite sparse.  

We have added references to the analysis section.

  2. The authors provide one signal processing technique and then move on without explanation as to why they chose this method [..] more explanation is needed.

We have added explanation and references to support why this method was chosen.

  3. how much is the covariance matrix being affected by the normalization technique?  This was difficult to assess since it wasn't clear to me what the how and why of their normalization.

We have added a sentence to confirm that the results are not heavily affected by the normalization technique within a short timeframe, and have better described the how and why of the normalization.

  4. figure 1 added little to the explanation.

The weights displayed in Figure 1 are now discussed more extensively, so we believe Figure 1 is now a more effective visual touchstone for readers.

  1. Example:  "First-order exploratory factor analysis was performed with the fa function in the psych library on this overall correlation matrix, using the minimum residual (minres) solutions and a promax oblique rotation."  What is a fa function?  Why use it here?  Promax oblique rotation?  Huh?  Explain.
  2. Example:  "Cluster membership was assigned? based on Euclidian distance to these centers, and rules to assign these same cluster memberships were derived using the Weka JRip algorithm [80] through rWeka.  Evaluation? was done using the evaluate_Weka_classifier function."  Again, there is no explanation as to why this algorithm was used among the many other algorithms that could have been used.  There is no explanation on what this algorithm measures or is supposed to do.
  3. Table 3 caption:  "Loading of variables onto exploratory rotated factors."  Jargon free explanations would be more effective in places like this.
  4. Statistical descriptions need to be more specific.  Even on something as basic as explaining why log transforms were used (pg 6), the authors explain that log transforms were used to "even out the distributions."
  5. Pg 9, ln 15:  The authors are interpreting and using "long tail" incorrectly.  If I am reading Fig. 5 correctly, the long tail of the distribution consists of a small number of distinct event creators contributing a large number of events.
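As a concrete illustration of two of the steps queried above--the log transform used to even out heavily skewed count distributions, and the assignment of cluster membership by Euclidean distance to cluster centers--here is a minimal sketch in Python. (The manuscript's actual analysis uses R's psych and RWeka packages; the numbers and centers below are invented purely for illustration.)

```python
import math

def log_transform(counts):
    """log1p (log of 1 + x) evens out right-skewed count data
    while handling zero counts gracefully."""
    return [math.log1p(c) for c in counts]

def assign_cluster(point, centers):
    """Return the index of the nearest center by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

# Hypothetical cluster centers in the transformed space.
centers = [(0.0, 0.0), (3.0, 3.0)]

raw = (1, 20)                 # e.g. (tweets, bookmarks) for one article
point = log_transform(raw)    # approximately (0.693, 3.045)
cluster = assign_cluster(point, centers)  # -> 1 (nearer to (3.0, 3.0))
```

The rule-derivation step (JRip via rWeka) then learns human-readable if-then rules that reproduce these nearest-center assignments.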

  1. Don't include references with library proxies.  Example: reference 10
  2. The authors use Web of Science citations from 2010 and 2011.  Would it be possible to do the same for the other indicators?
  3. Does the negative twitter-citation correlation in Fig. 12 mean that pbio papers that are twittered are less likely to be cited?

Reviewer 2

This manuscript entitled "Altmetrics in the Wild: Using Social Media to Explore Scholarly Impact", is exactly what the title means: an explorative study. Despite presenting a lot of data, this paper presents numerous problems.

Despite a promising title, the Authors do not suggest any kind of new or innovative bibliometric index,

We are glad this reviewer finds the title promising, but fear s/he has not clearly read what it promises. The title is not “A new bibliometrics index based on...” but rather “...using social media to explore scholarly impact.” The title says we will explore. And we do. We are puzzled as to why this reviewer read the title and yet expected something else. Whatever the reason, this misunderstanding seems to have hindered the reviewer’s ability to understand the manuscript at a fairly basic level.

but simply present a global and static picture of the web exposure of some papers.

It is not clear what this reviewer means by “static,” since change in both impact and social media patterns over time is a major focus of the manuscript.

 This does not propose any kind of operative model, nor any potentially useful homogeneous index.

This is correct, but it is not a failing. The reviewer seems to have conflated “useful” with “proposes a homogeneous index.” There is indeed a rich tradition in scientometrics of proposing new indices based on established data sources. However, the reviewer seems unaware that there is also a strong tradition of exploratory research investigating the properties and distributions of emerging data sources. This was true in the early days of citation research, before citation was established as a reliable data source [cite]. It has also been true of research into usage data [cite], and of other research into altmetrics [cite].

There is no point in proposing homogeneous indices if one does not understand the basic properties of the underlying data. This paper helps advance this understanding of underlying data. We look forward to seeing homogeneous indices based on altmetrics--perhaps this reviewer may even create one. But such indices must rely on the early exploratory and descriptive efforts of this and other studies.

As mentioned by the Authors, the use of social media is highly heterogeneous ("Because different communities have different levels of social media adoption and readership levels"). The use of social media can hardly reflect any form of scholarly impact,

We respect the reviewer’s right to have an opinion on this. We are, however, concerned to see this opinion offered as fact, with no supporting evidence of any kind--indeed, in opposition to both the evidence presented in this paper and a growing body of evidence from elsewhere in the literature [cite].

 but rather reflect a general exposure at a given time of a given paper, supposedly outside of the scientific community.

This seems to be based on an incomplete reading of the manuscript. We clearly argue that sources like Mendeley (which seems to be used almost exclusively by scholars) reflect usage by the scientific community, rather than the general public.

 I don't see how any of these results could be generalized

The generalizability of this study is indeed a concern, particularly given that PLOS, as an OA publisher, is atypical in important ways. On the other hand, it is precisely PLOS’ exceptionality that makes this study possible at all--no other publisher has provided a similar open altmetrics dataset.

Early studies like this one will be important in encouraging a robust infrastructure for altmetrics data from multiple publishers. Until then, this preliminary data helps establish methods for later research, as well as uncovering interesting trends in the data we do have access to.  The alternative--simply ignoring the data until it’s exactly what we want--seems rather worse.

Our approach is far from unusual in the scientometrics literature, where there is a strong tradition of exploring early convenience samples to help promote more expansive research later [cite].

We address this and other limitations at greater length in the limitations section on page 13.

The term "altmetrics" used here is highly problematic. It would imply that there is only one principal metric (citation index). That is however not the case: science bibliometric relies on other index too, among them (but not limited to), are the journal Impact Factor, the half-life of citation of journal articles, the H-index, the various corrected H-index.

This is a deeply puzzling statement. The reviewer first makes the point that there are scientometric indicators other than citation. S/he then goes on to list three indices which are based on citation. This does not suggest a thorough understanding of the field.

What’s oddest is that there are in fact ample examples of scientometrics indicators that are not based on citation--in fact, we list many of these in our literature review (paragraph 2). We also describe why altmetrics, which has increasingly been adopted as a term of art in scientometrics, is an appropriate descriptor of the measures we describe later. It may be that the reviewer simply did not read the manuscript closely enough to notice any of this.

Furthermore, it would imply that the "altmetrics" measures proposed here are uniformed, which is clearly not the case.

This is, unfortunately, precisely the opposite of what we suggest. We argue that the value of altmetrics is in building “a nuanced, multidimensional view of multiple research impacts at multiple time scales” (p2).

Finally, "altmetrics" would imply that it is the only alternative metrics possible, which is clearly not the case either.

We are unaware of this implication. In fact, we again make the opposite claim: that it will be important to gather additional indicators of impact: “Much work to expand this research will center around...isolating and identifying different types of impacts on different audiences. Most obviously, this will involve investigating altmetrics in other contexts, with more sources.” (p14)

The introduction of this term in the text is as follow:

"So-called "alternative metrics" or "altmetrics" [4] build on information from social media use (?)"

Ref 4 is a Twitter post done by the first author of this manuscript! A new terminology for scientific measure can not be solely based on a tweet!

We appreciate that this reviewer has quite strong opinions on this matter. However, simply asserting one’s opinion (even with exclamation points!) is less convincing than the reviewer may have imagined it to be.

We do not share this reviewer’s opinion. Neither, it seems, do the MLA and APA, both of which have published citation styles for tweets.

More to the point, the reviewer seems to have misunderstood what citation is for. If the word “altmetrics” had been first used in a peer-reviewed publication, we would have cited that publication. It so happens, however, that the word was first used in a tweet. So we have cited that tweet.

Methods present numerous short-coming and conceptual and practical issues, which lead to problems in the potential of interpretation of the results. I will only mention few of these issues.

"24,331 PLoS publications between 2003-08-18 and 2010-12-23."

This study oscillates between a "longitudinal" and a "transversal" design. However, it can hardly do both of them. The Authors have to make a decision to clarify their protocol and exact goal. Social media have been used very differently over the last decade, the use made in 2003 is not comparable, in ANY WAY, to the use made of social media in 2010.

This is a very legitimate concern. The paper applies a time-based normalization to address it. We have clarified our normalization explanation in the Methods section to help readers understand our approach to solving this problem.

 In such context, the "longitudinal" aspect is very hard to analyse: there are two opposite, competing, and simultaneously occurring bias which strongly impair any possibility of doing such analysis without taking into account the evolution of users behavior over the last decade: first the oldest papers get a lower chance of exposure, due to the dramatically lesser use of social media in the beginning of the first decade of this millennium; second, the most recent papers get a "novelty effect" and are more likely to get bigger exposure.

I doubt there is an objective mathematical way to overcome these two biases, which strongly affect the whole theory of this paper, but the Authors should consider it.

This reviewer has explained the problem well. There is in fact “an objective mathematical way to overcome these two biases,” and we have used it. See the Methods section for an improved and clarified explanation of how this was done.

"Twitter" was created in March 2006! How would it be possibly possible to compare the Twitter exposure of papers published in 2003 (3 years BEFORE Twitter birth), and papers published in 2010 (4 years after Twitter birth, when the platform was popular and widely used)? It is both a major methodological and conceptual turn-off of this work. Similar consideration could be mentioned regarding the other measures proposed.

This is indeed a concern for this and future altmetrics research. We employ a normalization procedure specifically to address this concern--indeed, we believe this procedure is one of the more valuable contributions of our study, and hope it will be used in future altmetrics work. We have improved our explanation of this normalization procedure in the methods section.
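To illustrate what a time-based normalization of this general kind can look like, here is a minimal sketch in Python. The field names and the divide-by-period-mean scheme are our own illustration, not necessarily the paper's exact procedure: each article's event count is divided by the mean count among articles published in the same period, so that an article from 2004 and one from 2010 are each compared against their own era's baseline level of social media activity.

```python
from collections import defaultdict
from statistics import mean

def normalize_by_period(articles):
    """Divide each article's event count by the mean count of
    articles published in the same period, making counts comparable
    across eras with very different social media adoption."""
    by_period = defaultdict(list)
    for a in articles:
        by_period[a["period"]].append(a["events"])
    period_mean = {p: mean(v) for p, v in by_period.items()}
    return [
        {**a, "norm_events": (a["events"] / period_mean[a["period"]]
                              if period_mean[a["period"]] else 0.0)}
        for a in articles
    ]

# Invented counts: raw events differ 10x across periods, but each
# article's standing relative to its own era is the same.
articles = [
    {"id": "p1", "period": "2004", "events": 1},
    {"id": "p2", "period": "2004", "events": 3},
    {"id": "p3", "period": "2010", "events": 10},
    {"id": "p4", "period": "2010", "events": 30},
]
normalized = normalize_by_period(articles)
# p1 and p3 both normalize to 0.5; p2 and p4 both to 1.5
```

Under such a scheme, a 2003 paper is never penalized for predating Twitter: it is measured only against the (near-zero) baseline of its own cohort.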

For instance, the F1000 (Faculty of Thousand) issue ... At its beginning, the F1000 projects gathered only the 1000 top-scientists of the world, while nowadays, it gathers above 10,000 reviewers, including junior scientists (sometimes just postdoctoral fellows) working with the main F1000 people. Therefore, the potential of F1000 to review or comment a recent paper is way higher than it was, due to the much larger corpus of potential reviewers. In the other hand, the real impact on science of F1000 is now way lower than it used to be, due to the lesser influence the members have compared to the original project.

This is certainly an interesting conjecture. We would be eager readers of any empirical study presenting actual data supporting it.

The choice of mathematical normalization is arbitrary. The full theoretical rationale supporting the strategy presented (partially) in the material and methods section is fully missing, leading to the impossibility for any reader to assess whether or not the strategy and mathematical tools used by the authors are appropriate.

The manuscript proposes a new kind of "metrics", however it describes values which are highly susceptible to changes (blogs' mention, twitter posts, and so on ...). No possibility of control over the quality or impact of the cited article is existing. The proposed metrics do not account the quality nor the impact of a paper, but simply its "non-scientific" exposure. In the present context of attempts of optimisation of science evaluation performed by numerous Western countries governments and agencies, and of the needs of scientists for objective measurement of the impact of their work, the present manuscript goes backward. The proposed "metrics" value is highly questionable, since it will be impossible to evaluate, and rather complicated to replicate. The description of the "static picture" of the papers evaluated is very important... but seems to be rather useless.

This manuscript has 13 Figures. The Authors could have made an effort to summarize their results and present them after an appropriate filtration, in order to emphasise the most significant and relevant results. The paper does present a large amount of raw data, but these data fails to go further than fair description, and clusters presented here rely on a priori segregation.

Reviewer 3

The article concerns a very interesting subject with growing concerns in the scientific community. It deals with the alternative (through web 2.0 services) way to evaluate scientific impacts. They use the PLoS journals database to make a serious and cautious analysis, comparing more classical impacts like Web of Science and new ones like presence in Mendeley Librairies, CiteULike cites and other. However a longitudinal study would have been also interesting to illustrate maybe the diffusion of such new means for scientific diffusion, they take into account the dynamical part through a separate analysis of the corpus before 2010 and in 2010. Their conclusions, however very cautious on the subject, show that several classes of papers tend to appear using quite different impact modes.

I would therefore recommend to publish the current article, once the authors will have taken into account the following minor remarks:

- the main criticism concerns the packaging of your approach, it would be good to have more than a statistical analysis of the corpus (even well conducted and interesting) and you should put some effort to precise the motivation, aims and context of your study in introduction

 (avoiding name dropping without much justification like references to Kuhn and Merton)

and go further in the interpretation of your results in a discussion part (some elements from the future research section could also be put there).

minor remarks:

- Ref [4] is inappropriate

- I didn't have access to tables S1 and S2

- Fig.8: you could figure different trajectories from different profiles identified from the cluster analysis

- Table 5: you should add the associated error, corresponding to classification errors

- just before table 5, you should name the class C just as you did for other classes

- table 2: the last column should be Sum and not NA

- Fig3: you should put it as a cumulated histogram

- Fig7 is difficult to read as such, you could probably put the main graphs and put the others in appendix

-  Fig.9: you should add a legend