Introduction to Bibliometric Data Sources
Slides by: Nicolas Robinson-Garcia
What will we learn?
🤓
The last 20 years have witnessed an explosion of new scientific data sources
[Timeline figure of scientific data sources, with entries for 1964, 2000, 2004, 2009, 2011, 2012, 2013, 2015, 2017, 2018 and 2022]
But we only have 45 minutes to go through them…
… so we will focus on these four ⬆️⬆️⬆️
😥
It is not only about new bibliometric data sources, but also about how the data can be accessed and manipulated
Yet a lack of transparency and reproducibility still hinders the responsible use of metrics
In this session we will focus on just four of these new data sources
Indexing and coverage
How can we download the complete list of journals?
😋
Journal Citation Reports
The JCR is a valuable source of journal-level metrics
Let’s explore the information provided
Click on the filtering options to refine your results
Add up to 23 different indicators
Download the top 600 journals!
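The JCR's best-known indicator is the Journal Impact Factor (JIF): citations received in year Y by everything a journal published in Y-1 and Y-2, divided by the number of citable items (articles and reviews) it published in those two years. A minimal sketch of that arithmetic follows; the journal and its counts are made up, and the real JCR figures rely on Clarivate's own document-type classification:

```python
def impact_factor(cites_in_year: dict[int, int],
                  citable_items: dict[int, int],
                  year: int) -> float:
    """Two-year Journal Impact Factor for `year`.

    cites_in_year: publication year -> citations received during `year`
                   by items the journal published in that year.
    citable_items: publication year -> number of citable items published.
    """
    window = (year - 1, year - 2)
    cites = sum(cites_in_year.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items


if __name__ == "__main__":
    # Hypothetical journal: citations received in 2023 by its 2021-2022 output
    cites_2023 = {2022: 300, 2021: 450}
    citable = {2022: 120, 2021: 130}
    print(impact_factor(cites_2023, citable, 2023))  # 750 / 250 = 3.0
```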
Metadata
The bread and butter of bibliometrics
Bibliographic fields are key to:
Data retrieval process
Indexing and coverage
Journal-level metrics
Scopus includes three types of journal-level metrics: CiteScore, SJR (SCImago Journal Rank) and SNIP (Source Normalized Impact per Paper)
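CiteScore, one of the journal-level metrics shown in Scopus, is also a simple ratio. A minimal sketch, assuming the methodology Elsevier adopted in 2020: a four-year window ending in the calculation year, counting articles, reviews, conference papers, book chapters and data papers. The record fields and the sample documents are hypothetical:

```python
# Document types counted in numerator and denominator (assumed, 2020 methodology)
COUNTED_TYPES = {"article", "review", "conference paper", "book chapter", "data paper"}


def citescore(docs: list[dict], year: int) -> float:
    """CiteScore-style ratio for `year` over the window year-3 .. year.

    Each doc is a dict with hypothetical keys:
      'year'  -> publication year
      'type'  -> document type
      'cites' -> citations the doc received during year-3 .. year
    """
    window = range(year - 3, year + 1)
    counted = [d for d in docs if d["year"] in window and d["type"] in COUNTED_TYPES]
    if not counted:
        raise ValueError("no counted documents in the four-year window")
    return sum(d["cites"] for d in counted) / len(counted)


if __name__ == "__main__":
    sample = [
        {"year": 2020, "type": "article", "cites": 10},
        {"year": 2023, "type": "review", "cites": 6},
        {"year": 2023, "type": "editorial", "cites": 4},  # type not counted
        {"year": 2019, "type": "article", "cites": 50},   # outside the window
    ]
    print(citescore(sample, 2023))  # (10 + 6) / 2 = 8.0
```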
Metadata
Metadata - Author profiles
Data retrieval process
Indexing and coverage
Unlike other data sources, Google Scholar is a search engine and not a database.
This means that data in Google Scholar is dynamic and volatile.
It also means that there is no quality control over the metadata or the indexed records
Which sources does Google Scholar index?
Any PDF document that looks like an article and is hosted on the domain of a university, research centre, etc.
This includes content from predatory publishers
However, documents must fulfil certain technical criteria. For instance, Zenodo, the EU-funded repository, is not indexed in Google Scholar
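Some of those technical criteria are spelled out in Google Scholar's inclusion guidelines for webmasters, which ask sites to expose bibliographic metadata in HighWire Press `<meta>` tags such as `citation_title`, `citation_author` and `citation_publication_date`. A minimal sketch of what a repository page might emit, assuming these few tags are enough for illustration (the guidelines impose further requirements); the record and URL are invented:

```python
from html import escape
from typing import Optional


def highwire_tags(title: str, authors: list[str], year: str,
                  pdf_url: Optional[str] = None) -> str:
    """Render HighWire Press <meta> tags describing one document."""
    tags = [("citation_title", title)]
    # One citation_author tag per author, in byline order
    tags += [("citation_author", name) for name in authors]
    tags.append(("citation_publication_date", year))
    if pdf_url:
        tags.append(("citation_pdf_url", pdf_url))
    # Escape values so quotes or ampersands cannot break the HTML attribute
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


if __name__ == "__main__":
    print(highwire_tags("An invented working paper", ["Doe, Jane", "Roe, Richard"],
                        "2024", pdf_url="https://repository.example.org/paper.pdf"))
```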
Usability
Unlike other scientific data sources, Google Scholar’s interface is a simple search engine with limited ‘advanced search’ options
These options are also available as search operators, e.g., author:, allintitle:, source:
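Since the operators are just text inside the query, a Google Scholar search can be recorded and shared as a URL, which at least documents how a result set was obtained. A minimal sketch, assuming the `https://scholar.google.com/scholar?q=` pattern the web interface uses; it only builds the link and fetches nothing, and the query terms are illustrative:

```python
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(*terms: str) -> str:
    """Join plain terms and operators (author:, allintitle:, source:) into a search URL."""
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': ' '.join(terms)})}"


if __name__ == "__main__":
    # Works by an author, restricted to one publication source
    print(scholar_url('author:"Robinson-Garcia"', "source:Scientometrics"))
```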
Unlike other scientific data sources, Google Scholar offers direct access to the full text of documents when available
It also identifies and merges different versions of a document, including preprints and open access (OA) versions
Google Scholar Citations
One of the best-known services among scientists is Google Scholar Citations (GS Citations). Let’s take a closer look at how it handles documents and authors
Metadata
Authors with a GS profile have unique identifiers.
Record data is structured, which signals the use of metadata, but it is often incomplete
Records also seem to have unique identifiers, but these differ across profiles and from those used by the Google Scholar search engine
Because DOIs are not included, it is difficult to accurately link records from different profiles
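Without DOIs, the usual fallback is to match records on a normalised title plus publication year, and that is exactly where linking errors creep in. A naive sketch of such a fallback, not Google Scholar's own method: the `title`/`year` record layout is hypothetical, and real matching would also need fuzzy comparison and author checks:

```python
import re
import unicodedata


def match_key(title: str, year: int) -> tuple[str, int]:
    """Crude matching key: accents stripped, lowercase, punctuation collapsed."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return text, year


def link_records(profile_a: list[dict], profile_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair records from two profiles whose (title, year) keys coincide."""
    index = {match_key(r["title"], r["year"]): r for r in profile_b}
    pairs = []
    for record in profile_a:
        other = index.get(match_key(record["title"], record["year"]))
        if other is not None:
            pairs.append((record, other))
    return pairs


if __name__ == "__main__":
    a = [{"title": "Métricas de Google Scholar", "year": 2014},
         {"title": "A preprint", "year": 2013}]
    b = [{"title": "Metricas de google scholar.", "year": 2014},
         {"title": "A preprint", "year": 2014}]  # same work, different year: missed
    print(len(link_records(a, b)))  # 1
```

The second pair shows the weakness: the same preprint listed with a different year in each profile is silently missed.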
Data retrieval process
One of the main issues of Google Scholar is the lack of a download option or API for data collection
This does not mean that data cannot be retrieved, but doing so comes at a cost
Publish or Perish
Created in 2007 by Anne-Wil Harzing, this software allows users to download up to 1,000 records from a Google Scholar search
Origins
Microsoft Academic Graph (MAG), the direct competitor to Google Scholar, offered a fully open knowledge graph of over 225M records.
Although not as popular as Google Scholar among users, it became a promising data source for bibliometricians, as it combined the perks of both Google Scholar and traditional bibliometric databases (i.e., WoS, Scopus)
But in 2021, Microsoft decided to discontinue the project…
In response, the Arcadia Fund financed OpenAlex, a project led by the non-profit organization OurResearch, to carry the project on.
In 2022, OpenAlex was released.
It does not only draw on the MAG project, but combines information from a variety of data sources.
Source description
Metadata
Although OpenAlex is built upon MAG, it has greatly improved the quality of its metadata
But of course, there is room for improvement
The project is still in its very early stages
🤌
Since spring 2024 it has included a web interface, which is constantly changing.
Expect to find lower-quality metadata as well as incomplete bibliographic records.
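One way to quantify that incompleteness is to check, for a sample of works, how often key fields are empty. A minimal sketch, assuming works already downloaded as dictionaries that follow the OpenAlex Work object (`doi`, `title`, `publication_year`, `authorships`, `abstract_inverted_index`); the two sample records are invented:

```python
FIELDS = ("doi", "title", "publication_year", "authorships", "abstract_inverted_index")
EMPTY = (None, "", [], {})  # values treated as "missing"


def completeness(works: list[dict], fields: tuple[str, ...] = FIELDS) -> dict[str, float]:
    """Share of works in which each field is present and non-empty."""
    if not works:
        return {f: 0.0 for f in fields}
    return {f: sum(w.get(f) not in EMPTY for w in works) / len(works) for f in fields}


if __name__ == "__main__":
    sample = [
        {"doi": "https://doi.org/10.1234/invented", "title": "Invented work",
         "publication_year": 2023,
         "authorships": [{"author": {"display_name": "Jane Doe"}}],
         "abstract_inverted_index": None},
        {"doi": None, "title": "Another invented work", "publication_year": 2022,
         "authorships": [], "abstract_inverted_index": None},
    ]
    print(completeness(sample))
```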
Data retrieval
Let’s go through its different options and features…
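Besides the web interface, OpenAlex offers a free REST API. A minimal sketch of downloading every work that matches a filter via cursor paging, assuming the public endpoint `https://api.openalex.org/works` and its `filter`, `per-page` (at most 200), `cursor` and `mailto` parameters; the filter and e-mail address are placeholders. The HTTP call is passed in as a function so the paging logic can be checked offline:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WORKS_ENDPOINT = "https://api.openalex.org/works"


def fetch_json(url: str) -> dict:
    """Download one page of results and decode its JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_works(filter_expr: str, mailto: str, fetch=fetch_json, per_page: int = 200):
    """Yield every work matching `filter_expr`, following OpenAlex cursors."""
    cursor = "*"  # '*' requests the first page in cursor-paging mode
    while cursor:
        params = {"filter": filter_expr, "per-page": per_page,
                  "cursor": cursor, "mailto": mailto}
        page = fetch(f"{WORKS_ENDPOINT}?{urlencode(params)}")
        results = page.get("results", [])
        yield from results
        # Stop when the API returns no next cursor or an empty page
        cursor = page.get("meta", {}).get("next_cursor") if results else None


if __name__ == "__main__":
    # Placeholder filter and address: replace them with your own
    works = iter_works("publication_year:2023", mailto="you@example.org")
    for i, work in enumerate(works):
        print(work["id"], work.get("display_name"))
        if i == 4:  # only show the first five works
            break
```

Because `iter_works` is a generator, pages are only requested as the loop consumes them, so stopping early avoids downloading the rest of the result set.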
Which database should I choose?
Things to consider:
Final thoughts
Thank you, questions?
☝️