1 | Title | URL | I3 member author? | Description | Terms of use | Timeframe | Documentation | Performance/error metrics | Citation | Open-source code | Versioning | API or Bulk downloads | Keywords associated with this dataset | Datasets and publications using this dataset | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | PatCit | https://doi.org/10.5281/zenodo.3710993 | Yes | In-text and front page citations to non-patent literature + in-text patent citations, extracted and parsed using NLP techniques. Open source project | CC-BY 4.0 International | 1836-2018 | https://cverluise.github.io/PatCit/ | yes | Cyril Verluise, Gabriele Cristelli, Kyle Higham, Lucas Violon, & Gaétan de Rassenfosse. (2020). PatCit: A Comprehensive Dataset of Patent Citations (Version 0.3.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4391095 | https://github.com/cverluise/PatCit | Yes | Bulk | Citations, In-text, Front page, Patent, Science, Database, Wikipedia, Standard | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3754772 | ||||||||
3 | Chilean IP and firm data | https://eml.berkeley.edu//~bhhall/Chile_ipdata.html | Yes | These data are a public release from a joint WIPO-INAPI project. | ? | 1995-2005 | https://eml.berkeley.edu//~bhhall/Chile_ipdata/chile_inno_ip.txt | | Abud, M.J., Fink, C., Hall, B. and Helmers, C., 2013. The use of intellectual property in Chile (Vol. 11). WIPO. | | | Bulk | | | ||||||||
4 | Chinese Patent Data Project | https://sites.google.com/site/sipopdb/home/sipo---asie | | In this project, patents from China's State Intellectual Property Office (SIPO) are matched to various types of companies. Here, SIPO patents are matched to firms in the Annual Survey of Industrial Enterprises (ASIE) of China's National Bureau of Statistics. | | | | | | | | Bulk | | | ||||||||
5 | Reliance on Science in Patenting | https://zenodo.org/record/3575146#.XfQZMWRKiUk | Yes | This contains citations from the front pages of worldwide patents to articles in the Microsoft Academic Graph (MAG) from 1800-2018. | Open Data Commons Attribution License v1.0 | 1834-2019 | https://zenodo.org/record/4235193#.X6Fgb5CSm38 | Yes | Marx, Matt and Aaron Fuegi, "Reliance on Science: Worldwide Front-Page Patent Citations to Scientific Articles" | https://github.com/mattmarx/reliance_on_science | Yes | Bulk | error margins | |||||||||
6 | Japanese Patent Office | http://www.iip.or.jp/e/index.html | | Patent database of the IIP | Only for use by academic research institutions and other institutions for academic research purposes; cannot be used for commercial purposes. | 1964-9/2019 | | | State that you used: IIP Patent DB | | | Bulk | | | ||||||||
7 | MIT Scholarly Works 1950-2018 | https://lens-public.s3-us-west-2.amazonaws.com/sloan/scholarly/201932/mit_scholarly.zip | Yes | Scholarly works produced by MIT 1950-2018 | ||||||||||||||||||
8 | MIT Scholarly Works Cited by Patents | https://lens-public.s3-us-west-2.amazonaws.com/sloan/scholarly/201932/mit_scholarly_cited_by_patents.zip | Yes | MIT Scholarly Works Cited by Patents 1950-2018 | ||||||||||||||||||
9 | Patents Citing MIT Publications | https://www.lens.org/lens/search/patent/list?collectionId=22790&p=0&n=10 | Yes | This collection encompasses patents that cite the scholarly works of Massachusetts Institute of Technology. | ||||||||||||||||||
10 | Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/5F1RRI | | Name disambiguation of US inventors, 1975-2010 | CC0 - "Public Domain Dedication" | | | | Ronald Lai; Alexander D'Amour; Amy Yu; Ye Sun; Lee Fleming, 2011, "Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database (1975 - 2010)", https://doi.org/10.7910/DVN/5F1RRI, Harvard Dataverse, V5, UNF:5:RqsI3LsQEYLHkkg5jG/jRg== [fileUNF] | https://github.com/funginstitute/downloads | | | coauthor network | | ||||||||
11 | The careers and co-authorship networks of U.S. patent-holders, since 1975 | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YJUNUN | | The identification enables construction of social networks based on patent co-authorship. We will eventually provide descriptive statistics of individual and collaborative variables and illustrated examples of networks for an individual, an organization, a technology, and a region. The data and code will be publicly available for community use and improvement and will enable updating as frequently as new patents are issued. | CC0 - "Public Domain Dedication" | | | | Ronald Lai; Alexander D'Amour; Lee Fleming, 2010, "The careers and co-authorship networks of U.S. patent-holders, since 1975", https://doi.org/10.7910/DVN/YJUNUN, Harvard Dataverse, V3, UNF:5:daJuoNgCZlcYY8RqU+/j2Q== [fileUNF] | | | | coauthor network | | ||||||||
12 | Penn World Tables | https://doi.org/10.15141/S50T0R | | PWT version 9.1 is a database with information on relative levels of income, output, input and productivity, covering 182 countries between 1950 and 2017. | CC 4.0 | 1950-2017 | | | Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next Generation of the Penn World Table", American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net/pwt | | | | | | ||||||||
13 | Worldwide Count of Priority Patents | http://www.gder.info/download_wwc_excel.html | | The goal of the project was to produce a dataset of priority patent applications filed across the globe, allocated by inventor and applicant location. | | | | | De Rassenfosse, G., Dernis, H., Guellec, D., Picci, L., & van Pottelsberghe de la Potterie, B. (2013). The worldwide count of priority patents: A new indicator of inventive activity. Research Policy, 42(3), 720–737. doi:10.1016/j.respol.2012.11.002 | http://www.gder.info/download_wwc_mysql.html | | | | | ||||||||
14 | Geocoding of worldwide patent data | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OTTBDX | | | CC0 - "Public Domain Dedication" | | | | Seliger, Florian; Kozak, Jan; de Rassenfosse, Gaétan, 2019, "Geocoding of worldwide patent data", https://doi.org/10.7910/DVN/OTTBDX, Harvard Dataverse, V5 | https://github.com/seligerf/Imputation-of-missing-location-information-for-worldwide-patent-data | | | geography | | ||||||||
15 | On the price elasticity of demand for patents | http://www.gder.info/download_OBES_data.html | | Fees since 1980 at the European (EPO), the US and the Japanese patent offices. | | | | | Rassenfosse, G. de, & Potterie, B. van P. de la. | | | | patent demand | | ||||||||
16 | Patents arising from U.S. government funding | https://zenodo.org/record/3369582 | | Dataset of patents arising from government funding since the year 2000. | CC-BY 4.0 International | 2000-2019 | | | de Rassenfosse Gaétan, & Emilio Raiteri. (2019). 3PFL: Database of Patents and Publications with a Public-Funding Linkage (Version 1.2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3369582 | | | | | | ||||||||
17 | PATSTAT | https://www.epo.org/searching-for-patents/business/patstat.html#tab3 | | PATSTAT contains bibliographical and legal event patent data from leading industrialised and developing countries. This is extracted from the EPO’s databases and is either provided as bulk data or can be consulted online. | Requires a subscription to access | | | | | | | | PATSTAT | 'PATSTAT cookbook' by Gaétan de Rassenfosse: https://onlinelibrary.wiley.com/doi/full/10.1111/1467-8462.12073 | ||||||||
18 | Lens.org | https://lens.org/ | Yes | Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow document collections, aggregations, and analyses to be shared, annotated, and embedded to forge open mapping of the world of knowledge-directed innovation. | Cambia grants you a non-exclusive, non-transferable, revocable, limited license to access and personally use the features of the Service. The conditions by which The Lens data may be used are intended to resonate with the principles of Creative Commons Attribution licenses with a public benefit element. | | | | Please use the expression 'Enabled by The Lens' or 'Data Sourced from The Lens' and the Lens.org URL. | | | | | | ||||||||
19 | Microsoft Academic Graph | https://academic.microsoft.com/home | | The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. | ODC-BY | | | | Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839; K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045 | | | | Microsoft Academic | | ||||||||
20 | Crios‐Patstat Database | http://download.unibocconi.it/ICRIOS2018/icrios201806.rar | | Disambiguated inventor and applicant names for EPO records. | EPO License | | | | Coffano, M., & Tarasconi, G. (2014). CRIOS - Patstat Database: Sources, Contents and Access Rules. SSRN Electronic Journal. doi:10.2139/ssrn.2404344 | | | | | | ||||||||
21 | NBER US Patent Citation Datafile | https://sites.google.com/site/patentdataproject/Home/downloads | | The main dataset extends from Jan 1, 1963, through December 30, 1999, and includes all the utility patents granted during that period. The citations file includes all citations made by patents granted in 1975-1999. | | 1963-1999 | | | Bronwyn H. Hall, Jim Bessen, Grid Thoma | | | | | | ||||||||
22 | USPTO PatentsView | https://www.patentsview.org/download/ | | PatentsView provides US patent data, including raw data, disambiguations of inventors and assignees, and inventor gender. | Creative Commons Attribution 4.0 International License. | 1963-1999 | Provided at link | N/A | Attribution should be given to PatentsView for use, distribution, or derivative works. | https://github.com/CSSIP-AIR/PatentsView-Code-Snippets/ | | | | | ||||||||
23 | Microsoft Academic Knowledge Graph | http://ma-graph.org/ | | A large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed under the Open Data Commons Attribution License. Furthermore, we provide entity embeddings for all 210M represented scientific papers. | Open Data Commons Attribution License (ODC-By) v1.0 | | | | @inproceedings{DBLP:conf/semweb/Farber19, author = {Michael F{\"{a}}rber}, title = "{The Microsoft Academic Knowledge Graph: {A} Linked Data Source with 8 Billion Triples of Scholarly Data}", booktitle = "{Proceedings of the 18th International Semantic Web Conference}", series = "{ISWC'19}", location = "{Auckland, New Zealand}", pages = {113--129}, year = {2019}, url = {https://doi.org/10.1007/978-3-030-30796-7\_8}, doi = {10.1007/978-3-030-30796-7\_8} } | https://github.com/michaelfaerber/makg-linking | | | Microsoft Academic | | ||||||||
24 | IPRoduct | | Yes | The IPRoduct project seeks to link innovative goods to the patents upon which they are based. By directly linking products to patents, this project tracks innovation to the point where it meets consumers, the true commercial end point of investments in Science & Technology. The output of the project is a database of linked product-patent pairs that is made publicly available. | | | | | | | | | Products | | ||||||||
25 | Replication Data for: Government-funded research increasingly fuels innovation | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC | | This includes patent level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent meta data, CPC, geography, agencies, entity size of the patent owner etc, government support categories at patent level and finally, aggregate yearly statistics. (2019-06-02) | CC0 - "Public Domain Dedication" | 1926-1975 and 1976-2017 | | | Lee Fleming; Hillary Green; Guan-Cheng Li; Matt Marx; Dennis Yao, 2019, "Replication Data for: Government-funded research increasingly fuels innovation", https://doi.org/10.7910/DVN/DKESRC, Harvard Dataverse, V4, UNF:6:kMIqsh3DCvKiKYgMT6/H8A== [fileUNF] | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC | | | | | ||||||||
26 | Google Patents Public Datasets | https://console.cloud.google.com/marketplace/details/google_patents_public_datasets/google-patents-public-data | | Worldwide (100+ countries) bibliographic and US full-text dataset of patent publications, provided by IFI CLAIMS Patent Services and available via BigQuery. Updated quarterly. | CC BY 4.0, requires subscription to query API | 1834-present (quarterly) | https://cloud.google.com/blog/topics/public-datasets/google-patents-public-datasets-connecting-public-paid-and-private-patent-data | N/A | “Google Patents Public Data” by IFI CLAIMS Patent Services and Google, used under CC BY 4.0 | patent analysis sample code: https://github.com/google/patents-public-data, source code not accessible | Yes, quarterly | API, Bulk export | Google Patents | | ||||||||
27 | Google Patents number linking API | https://patents.google.com/api/match | | Converts an unformatted application or publication number into a DOCDB-format publication number. | | 1834-present (~weekly) | | | | | | API | Google Patents | | ||||||||
28 | Semantic Scholar Open Research Corpus | http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/ | | Semantic Scholar's records for research papers published in all fields provided as an easy-to-use JSON archive. | ODC-BY | | | | Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618 | | | | Citation affect | | ||||||||
29 | UVA Darden Global Corporate Patent Dataset (disambiguated assignees) | https://patents.darden.virginia.edu/ | | | | 1980-2017 | https://patents.darden.virginia.edu/documents/DataConstructionDetails_v01.pdf | | | | | | | | ||||||||
30 | DISCERN patent/Compustat crosswalk | https://zenodo.org/record/3709084#.XzbbVn-SlGp | | | | 1976-2015 | Provided at link | | | | | | | | ||||||||
31 | Patent Citation Similarity | https://storage.googleapis.com/jmk_public/Kuhn-Younge-Marco_Patent_Citation_Similarity_2017-10-23.csv | | from Jeff Kuhn | | 1976-2017 | Paper: https://ssrn.com/abstract=2714954 | | | | | | | | ||||||||
32 | Patent Scope and Examiner Toughness | https://storage.googleapis.com/jmk_public/Kuhn-Thompson_Patent_Scope_2017-10-23.csv | | from Jeff Kuhn | | Need to check paper https://ssrn.com/abstract=2977273 | Not unless it’s in the paper | | | | | | Examiners | | ||||||||
33 | Patent Citation Timing and Source | https://storage.googleapis.com/jmk_public/Kuhn-Younge-Marco_Patent_Citation_Source_and_Timing_2017-09-25.csv | | from Jeff Kuhn | | 2001-2004 | Not unless it’s in the paper here https://ssrn.com/abstract=2714954 | | | | | | | | ||||||||
34 | Patent Families Dataset | https://storage.googleapis.com/jmk_public/Younge-Kuhn_Patent_Families_2017-09-25.csv | | from Jeff Kuhn | | 2005-2014 | Not unless it’s in the paper (https://ssrn.com/abstract=2709238) | | | | | | | | ||||||||
35 | Geography of patents | https://www.nature.com/articles/sdata201674 | | | | 1836-1975 | https://www.nature.com/articles/sdata201674#MOESM51 | | Petralia, S., Balland, PA. & Rigby, D. Unveiling the geography of historical patents in the United States from 1836 to 1975. Sci Data 3, 160074 (2016). https://doi.org/10.1038/sdata.2016.74 | | | | | | ||||||||
36 | Inventor disambiguation | https://dataverse.harvard.edu/dataverse/patent | | | | 1975-2010 | | | Ronald Lai; Alexander D'Amour; Amy Yu; Ye Sun; Lee Fleming, 2011, "Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database (1975 - 2010)", https://doi.org/10.7910/DVN/5F1RRI, Harvard Dataverse, V5, UNF:5:RqsI3LsQEYLHkkg5jG/jRg== [fileUNF] | | | | Disambiguation | | ||||||||
37 | Patent-to-article in-text citations for 244 journals | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZEZWBX | | | | 197?-2015? | | | Bryan, Kevin, 2019, "In-Text Patent Citation Database Bryan/Ozcan/Sampat Beta version .9", https://doi.org/10.7910/DVN/ZEZWBX, Harvard Dataverse, V2, UNF:6:+28YcwvDoaxFl/9hPXQaSA== [fileUNF] | | | | | | ||||||||
38 | Patent value | https://iu.box.com/patents | | Updated Mar 19, 2014 by Noah Stoffman | | 1926-2010 | | | | | | | | | ||||||||
39 | Government-funded US patents | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC | | This includes patent level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent meta data, CPC, geography, agencies, entity size of the patent owner etc, government support categories at patent level and finally, aggregate yearly statistics. (2019-06-02) | CC0 - "Public Domain Dedication" | 1926-2017 | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC | Yes | Lee Fleming; Hillary Green; Guan-Cheng Li; Matt Marx; Dennis Yao, 2019, "Replication Data for: Government-funded research increasingly fuels innovation", https://doi.org/10.7910/DVN/DKESRC | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC | No | Bulk | | | ||||||||
40 | Geocoding worldwide patent data | https://doi.org/10.7910/DVN/OTTBDX | | | | 30 years | | | Seliger, Florian; Kozak, Jan; de Rassenfosse, Gaétan, 2019, "Geocoding of worldwide patent data", https://doi.org/10.7910/DVN/OTTBDX, Harvard Dataverse, V5 | | | | Geography | | ||||||||
41 | PatentCity | https://mailchi.mp/e0495246a573/patentcity | Yes | Data coming soon; accessible via Google BigQuery | | | | | | https://github.com/Antoberge/patent_city | coming soon | | | | ||||||||
42 | Patent text: code, data, and new measures | https://zenodo.org/record/3515985 | | Different open access data files related to the text of USPTO patent documents, including 1) for each US patent a list of processed, cleaned and stemmed keywords, 2) for each patent a list of the 1,000 most similar patents (based on cosine similarity) from the entire population of US patents, 3) for each US patent the average cosine similarity with all prior patents from the previous 5 years, and the average cosine similarity with all later patents in the following 5 years, 4) each new keyword (unigram), bigram (sequence of two adjacent keywords), trigram, and pairwise keyword combination introduced for the first time in history by a US patent, the number of the patent introducing it for the first time, and the total number of patents from the entire population using these new keywords, bigrams, trigrams, and new keyword combinations. | Open Data Commons Attribution License v1.0 | 1969-2018 | https://zenodo.org/record/3515985 | Yes | Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming Research Policy. (https://doi.org/10.1016/j.respol.2020.104144) | https://github.com/sam-arts/respol_patents_code | Yes | Bulk | patent measures, text, natural language processing, novelty, impact, USPTO, technological progress | Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming Research Policy. (https://doi.org/10.1016/j.respol.2020.104144) | ||||||||
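
Each catalog row shares the same pipe-delimited column layout, so the table can also be read programmatically. The following is a minimal sketch, assuming a row is available as a plain string with a leading row number and with empty cells kept in place as empty fields; `COLUMNS` and `parse_row` are hypothetical names introduced only for illustration, and the sample row is an abbreviated copy of the "Reliance on Science in Patenting" entry.

```python
# Column names as they appear in the catalog header (after the row number).
COLUMNS = [
    "Title", "URL", "I3 member author?", "Description", "Terms of use",
    "Timeframe", "Documentation", "Performance/error metrics", "Citation",
    "Open-source code", "Versioning", "API or Bulk downloads",
    "Keywords associated with this dataset",
    "Datasets and publications using this dataset",
]


def parse_row(line):
    """Split one pipe-delimited catalog row into a {column: value} dict.

    Assumes cell values contain no literal '|' and that empty cells are
    preserved as empty fields (hypothetical helper, not from any project
    listed in the catalog).
    """
    cells = [cell.strip() for cell in line.split("|")]
    # cells[0] is the row number; the named columns follow it.
    values = cells[1:1 + len(COLUMNS)]
    # Pad short rows so every named column is present in the result.
    values += [""] * (len(COLUMNS) - len(values))
    return {"row": cells[0], **dict(zip(COLUMNS, values))}


# Abbreviated sample row (description shortened for the example).
example = (
    "5 | Reliance on Science in Patenting | https://zenodo.org/record/3575146 "
    "| Yes | Front-page patent citations to articles "
    "| Open Data Commons Attribution License v1.0 | 1834-2019 |"
)
record = parse_row(example)
print(record["Title"], record["Timeframe"])
```

Rows whose empty cells have been dropped rather than kept as empty fields would be mis-assigned by this positional approach, so the alignment of each row should be checked before relying on the parsed columns.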