1
uuid, title, shortname, location, description, contributors, citation, cost, maintained_by, tags, terms_of_use, timeframe, documentation, error_metrics, size, code, versioning, bigquery, doi, related_publications, thumbnail_url, related_project_shortnames, relationship_description, record_superceded_by, schema_fields, salient_fields, last_edit
2
bd8a562a-ce58-4a61-925d-88f0d0695974PatCitpatcithttps://doi.org/10.5281/zenodo.3710993In-text and front page citations to non-patent literature and in-text patent citations, extracted and parsed. patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories. Then, we design and apply category specific information extraction models using spaCy. Eventually, when possible, we enrich the data using external domain specific high quality databases. Managed as an open-source, collaboratively maintained project. Cyril Verluise, Gabriele Cristelli, Kyle Higham, Lucas Violon, Gaétan de RassenfosseCyril Verluise, Gabriele Cristelli, Kyle Higham, Lucas Violon, & Gaétan de Rassenfosse. (2020). PatCit: A Comprehensive Dataset of Patent Citations (Version 0.3.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4391095NoneCyril Verluisecitation, scholarly literature, in-text, front-page, patent, science, database, WikipediaCC-BY 4.0 International1836-2018https://cverluise.github.io/PatCit/yeshttps://cverluise.github.io/notebookYeshttps://console.cloud.google.com/bigquery?project=patcit-public-data&p=patcit-public-data&page=projecthttps://doi.org/10.5281/zenodo.3710993https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3754772rons, lensDOI, npl_cat_language_flag, wg, tsg, meeting, hostname, publication_date, reference_count, patcit_id, date, acc_num, body, name, ISSN, journal_title_abbrev, hash_id, language_code, source, ref, inpadoc_family_id, PMCID, tech, docdb_family_id, PMID, language_is_reliable, URL, npl_cat, author, issue, page, volume, is_referenced_by_count, funder, ISBN, subject, cited_by, npl_cat_score, event, institution, item, version, url, type, bibref_score, pat_publn_id, npl_publn_id, md5, title, journal_title, tdoc_num, appln_id, is_cited_by_count, publication_number, abstract, reference_doi, citation04/13/2022, 12:40:04
3
e65da1db-6608-4246-98a7-c260dfc28e45Chilean IP and firm datachilean_iphttps://eml.berkeley.edu//~bhhall/Chile_ipdata.htmlBronwyn H. HallAbud, M.J., Fink, C., Hall, B. and Helmers, C., 2013. The use of intellectual property in Chile (Vol. 11). WIPO.NoneBronwyn HallChile, trademark squatting, pharmaceuticals, disambiguationnot specified1995-2005https://eml.berkeley.edu//~bhhall/Chile_ipdata/chile_inno_ip.txtMon, 28 Mar 2022 19:59:46 GMT
4
50fbdb5a-1288-46e9-b93d-27ac99cd4eb2The scientific knowledge base of low carbon energy technologies (updated and extended version)low_carbon_knowledgehttps://doi.org/10.4119/unibi/2950291This data publication offers updated data about low-carbon energy technology (LCET) patents and citation links to the scientific literature. Compared to a previous version, it also contains data on biofuels and fuels from waste technologies. The updated version also contains the code (R scripts) used to (1) compile the data and (2) reproduce the statistical analysis, including figures and tables, presented in the final paper Hötte, Pichler, Lafond (2021): "The rise of science in low-carbon energy technologies", RSER. DOI: 10.1016/j.rser.2020.110654. Hötte K, Lafond F, Pichler AHötte, Pichler, Lafond (2021): "The rise of science in low-carbon energy technologies", RSER. DOI: 10.1016/j.rser.2020.110654Nonecitation, scholarly literature, low-carbon energy technologiesCC BY 4.0 license. See: https://creativecommons.org/licenses/by/4.0/legalcode 1836-2019https://doi.org/10.4119/unibi/2950291NoIncluded in the bulk downloadNohttps://doi.org/10.4119/unibi/2950291Mon, 11 Apr 2022 15:00:10 GMT
5
2a0949bb-2f36-45a7-b4cf-109456cec21dChinese Patent Data Projectchinese_patent_datahttps://sites.google.com/site/sipopdb/cpdp-homeIn this project, patents from China's State Intellectual Property Office (SIPO) are matched to various types of companies. Matching SIPO patents to firms in the Annual Survey of Industrial Enterprises (ASIE) of China's National Bureau of Statistics.Wenlong He, Zi-lin He, Tony W. Tong, Yuchen ZhangNoneZi-lin He, Z.L.He@uvt.nl; Tony W. Tong, tony.tong@colorado.edu; Yuchen Zhang, yzhang54@tulane.edudisambiguation, China, corporate structuresipo_matchingMon, 11 Apr 2022 15:00:15 GMT
6
53f2e34b-8088-42a3-a763-f471c26b5ac6Reliance on Science in Patentingronshttps://zenodo.org/record/3575146#.XfQZMWRKiUkWe introduce an open-access dataset of references from the front pages of patents granted worldwide to scientific papers published since 1800. Each patent-paper linkage is assigned a confidence score, which is characterized in a random sample by false negatives versus false positives. All matches are available for download at http://relianceonscience.org. We outline several avenues for strategy research enabled by these new data. This contains citations from the front pages of worldwide patents to articles in the Microsoft Academic Graph (MAG) from 1800-2020. Matt Marx, Aaron FuegiMarx, Matt and Aaron Fuegi, "Reliance on Science: Worldwide Front-Page Patent Citations to Scientific Articles"NoneMatt Marx, mmarx@cornell.educitation, scholarly literature, front-page, error metricsOpen Data Commons Attribution License v1.01834-2019https://zenodo.org/record/4235193#.X6Fgb5CSm38Yeshttps://github.com/mattmarx/reliance_on_scienceYeshttps://doi.org/10.5281/zenodo.3575146patcit, lensTue, 01 Mar 2022 12:23:09 GMT
7
07ec4549-2429-4e8e-9ee3-6deefca0b075Japanese Patent Officejapanese_patent_officehttps://www.iip.or.jp/e/patentdb/index.html

IIP Patent Database (IIP Patent DB) is a database developed for statistical analysis of patents based on the Japan Patent Office (JPO) “Standardized Data.” The Institute of Intellectual Property (IIP) provides the IIP Patent DB to further promote patent statistical research.
JPOState that you used: IIP Patent DBNoneFoundation for Intellectual Property, iip-patentdb@fdn-ip.or.jpJapan, patents, patent officeOnly for use by academic research institutions and other institutions for academic research purposes, cannot be used for commercial purposes.1964-9/2019Fri, 08 Apr 2022 13:39:15 GMT
8
bfc3892d-2170-47ed-b056-a573c845efa5MIT Scholarly Works Over Timemit_scholarlyhttps://lens-public.s3-us-west-2.amazonaws.com/sloan/scholarly/201932/mit_scholarly.zipScholarly works produced by MIT 1950-2018The LensNoneThe Lensscholarly literatureCambia grants you a non-exclusive, non-transferable, revocable, limited license to access and personally use the features of the Service. The conditions by which The Lens data may be used are intended to resonate with the principles of Creative Commons Attribution licenses with a public benefit element.1950-2021https://www.lens.org/lens/search/scholar/analysis?q=&st=true&regex=false&institution.must=Massachusetts%20Institute%20of%20Technology&p=0&n=10&s=score&d=%2B&dateFilterField=publishedYear&dashboardId=189&preview=falselensTue, 12 Apr 2022 17:16:19 GMT
9
265a814e-a4a5-4302-9cc0-0f78cf1c70fcMIT Scholarly Works Cited by Patentsmit_scholarly_citationshttps://lens-public.s3-us-west-2.amazonaws.com/sloan/scholarly/201932/mit_scholarly_cited_by_patents.zipMIT Scholarly Works Cited by Patents 1950-2018The LensNoneThe Lenscitation, scholarly literatureCambia grants you a non-exclusive, non-transferable, revocable, limited license to access and personally use the features of the Service. The conditions by which The Lens data may be used are intended to resonate with the principles of Creative Commons Attribution licenses with a public benefit element.1950-2021https://www.lens.org/lens/labs/dashboardslensTue, 01 Mar 2022 12:23:26 GMT
10
6476ac03-71ee-4480-b2aa-e25871179689Patents Citing MIT Publicationspatents_citing_mithttps://www.lens.org/lens/search/patent/list?collectionId=22790&p=0&n=10This collection encompasses patents that cite the scholarly works of Massachusetts Institute of Technology. The LensNoneThe Lenscitation, scholarly literatureCambia grants you a non-exclusive, non-transferable, revocable, limited license to access and personally use the features of the Service. The conditions by which The Lens data may be used are intended to resonate with the principles of Creative Commons Attribution licenses with a public benefit element.1950-2021https://www.lens.org/lens/labs/dashboardsThu, 02 Dec 2021 13:28:47 GMT
11
a238826e-8135-4b6d-8b59-615fc9769f03Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Databaseco_authorship_disambiguationhttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/5F1RRIName disambiguation of US inventors, 1975-2010. Using a Bayesian supervised learning approach, we identify individual inventors from the U.S. utility patent database, from 1975 to the present. An interface to calculate and illustrate patent co-authorship networks and social network measures is also provided. The network representation does not require bounding the social network beforehand. We provide descriptive statistics of individual and collaborative variables and illustrate examples of networks for an individual, an organization, a technology, and a region. The paper provides an overview of the technical algorithms and pointers to the data, code, and documentation, with the hope of further open development by the research community. Ronald Lai, Alexander D'Amour, Amy Yu, Ye Sun, Lee FlemingRonald Lai; Alexander D'Amour; Amy Yu; Ye Sun; Lee Fleming, 2011, "Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database (1975 - 2010)", https://doi.org/10.7910/DVN/5F1RRI, Harvard Dataverse, V5, UNF:5:RqsI3LsQEYLHkkg5jG/jRg== [fileUNF] NoneContact maintainer through Dataversecoauthor network, disambiguation, United StatesCC0 - "Public Domain Dedication"https://github.com/funginstitute/downloadshttps://doi.org/10.7910/DVN/5F1RRIFri, 03 Dec 2021 22:57:37 GMT
12
3e2ed123-d6c0-46af-8683-e23d64b04efcThe careers and co-authorship networks of U.S. patent-holders, since 1975co_authorship_careershttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YJUNUNThe identification enables construction of social networks based on patent co-authorship. We will eventually provide descriptive statistics of individual and collaborative variables and illustrated examples of networks for an individual, an organization, a technology, and a region. The data and code will be publicly available for community use and improvement and will enable updating as frequently as new patents are issued. Ronald Lai, Alexander D'Amour, Lee FlemingRonald Lai; Alexander D'Amour; Lee Fleming, 2010, "The careers and co-authorship networks of U.S. patent-holders, since 1975", https://doi.org/10.7910/DVN/YJUNUN, Harvard Dataverse, V3, UNF:5:daJuoNgCZlcYY8RqU+/j2Q== [fileUNF] NoneContact maintainer through Dataversecoauthor network, United States, social networksCC0 - "Public Domain Dedication"https://doi.org/10.7910/DVN/YJUNUNMon, 06 Dec 2021 17:59:50 GMT
13
00c6f78f-f689-4d50-a965-812bfd528477Penn World Tablespwthttps://www.rug.nl/ggdc/productivity/pwt/?lang=enPWT version 10.0 is a database with information on relative levels of income, output, input and productivity, covering 183 countries between 1950 and 2019. Access to the data is provided in Excel, Stata and online formats.Robert C. Feenstra, Robert Inklaar, Marcel P. TimmerFeenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next Generation of the Penn World Table" American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net/pwtNoneContact pwt@rug.nlgeography, GDP, productivityCC 4.01950-2017https://www.rug.nl/ggdc/docs/pwt100-user-guide-to-data-files.pdfhttps://doi.org/10.15141/S50T0RTue, 05 Apr 2022 09:43:36 GMT
14
068fb03e-642a-4896-b61c-ff6a16251e08Worldwide Count of Priority Patentspriority_patentshttp://www.gder.info/download_wwc_excel.htmlThe goal of the project was to produce a dataset of priority patent applications filed across the globe, allocated by inventor and applicant location.Gaétan de Rassenfosse, Hélène Dernis, Dominique Guellec, Lucio Picci, Bruno van Pottelsberghe de la PotterieDe Rassenfosse, G., Dernis, H., Guellec, D., Picci, L., & van Pottelsberghe de la Potterie, B. (2013). The worldwide count of priority patents: A new indicator of inventive activity. Research Policy, 42(3), 720–737. doi:10.1016/j.respol.2012.11.002 NoneGaétan de Rassenfossepriority patents, location of inventorshttp://www.gder.info/download_wwc_mysql.htmlWed, 01 Dec 2021 19:36:05 GMT
15
6fe3b5e5-93a8-4f07-9331-d9998b9000b8Geocoding of worldwide patent datageocoding_patentshttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OTTBDXThe dataset provides geographic coordinates for inventor and applicant locations in 18.8 million patent documents spanning over more than 30 years. The geocoded data are further allocated to the corresponding countries, regions and cities. When the address information was missing in the original patent document, we imputed it by using information from subsequent filings in the patent family. The resulting database can be used to study patenting activity at a fine-grained geographic level without creating bias towards the traditional, established patent offices.Florian Seliger, Jan Kozak, Gaétan de RassenfosseSeliger, Florian; Kozak, Jan; de Rassenfosse, Gaétan, 2019, "Geocoding of worldwide patent data", https://doi.org/10.7910/DVN/OTTBDX, Harvard Dataverse, V5 NoneContact maintainer through Dataversegeography, location of inventors, PATSTATCC0 - "Public Domain Dedication" 30 yearshttps://doi.org/10.1038/s41597-019-0264-6https://github.com/seligerf/Imputation-of-missing-location-information-for-worldwide-patent-dataYhttps://doi.org/10.7910/DVN/OTTBDXhttps://doi.org/10.1038/s41597-019-0264-6Sat, 19 Mar 2022 01:31:56 GMT
16
d76b71a1-2f43-447d-b296-a1b52db6e3d7On the price elasticity of demand for patentspatent_price_elasticityhttp://www.gder.info/download_OBES_data.htmlFees since 1980 at the European (EPO), the US and the Japanese patent offices.Gaétan de Rassenfosse, Bruno van Pottelsberghe de la PotterieRassenfosse, G. de, & Potterie, B. van P. de la. NoneGaétan de Rassenfossepatent demand, United States, Europe, JapanWed, 01 Dec 2021 19:36:06 GMT
17
c66bdabd-a80c-4a7e-b9b9-f706e4ed7395Patents arising from U.S. government fundingus_gov_patentshttps://zenodo.org/record/3369582The 3PFL database links information on patented inventions and scientific publications related to a public procurement contract or a research grant awarded by the U.S. Federal Government to detailed contract-level/grant-level information (e.g., awarding agency, recipient organization, award size). We have combined data from multiple sources, including (but not limited to) the United States Patent and Trademark Office bulk database, the Federal Procurement Database System, the Award Submission Portal (ASP), and the European Patent Office's PATSTAT database. We also provide a link to the scientific publications associated with these patents. The 3PFL database provides rich and original information that opens the door to novel empirical research in the economics of innovation and science. Gaétan de Rassenfosse, Emilio Raiteride Rassenfosse Gaétan, & Emilio Raiteri. (2019). 3PFL: Database of Patents and Publications with a Public-Funding Linkage (Version 1.2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3369582NoneGaétan de Rassenfosseresearch funding, United StatesCC-BY 4.0 International2000-2019https://doi.org/10.5281/zenodo.3369582Wed, 01 Dec 2021 19:36:07 GMT
18
e390a212-3a92-4d8f-ac4d-ca2c960a36d3PATSTATpatstathttps://www.epo.org/searching-for-patents/business/patstat.html#tab3PATSTAT contains bibliographical and legal event patent data from leading industrialised and developing countries. This is extracted from the EPO’s databases and is either provided as bulk data or can be consulted online. EPOPATSTAT€975.00 - €1460.00European Patent OfficeEurope, patentsRequires a subscription to access'patstat cookbook' by Gaétan de Rassenfosse https://onlinelibrary.wiley.com/doi/full/10.1111/1467-8462.12073 Thu, 17 Mar 2022 14:04:04 GMT
19
c39f4844-5ae2-4dcb-bf2c-d6b957125704Lens.orglenshttps://lens.org/Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow documents and analyses to be shared and embedded to support open mapping of knowledge-directed innovation. CambiaPlease use the expression 'Enabled by The Lens' or 'Data Sourced from The Lens' and the Lens.org URL.NoneCambia Foundation,
https://about.lens.org/contact-us/
citation, scholarly literatureCambia grants you a non-exclusive, non-transferable, revocable, limited license to access and personally use the features of the Service. The conditions by which The Lens data may be used are intended to resonate with the principles of Creative Commons Attribution licenses with a public benefit element.Tue, 12 Apr 2022 17:05:29 GMT
20
9c4124ed-5337-4b36-a1c9-7cf256a3384bMicrosoft Academic Graphmaghttps://academic.microsoft.com/homeThe Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang.Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MA) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839 K. Wang et al., “A Review of Microsoft Academic Services for Science of Science Studies”, Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045NoneCurrently in transitioncitation, scholarly literatureODC-BYThu, 02 Dec 2021 11:54:52 GMT
21
233d7290-f32f-46bb-8a6d-8837e59d9ffbCrios‐Patstat Databasecrios_patstathttps://www.icrios.unibocconi.eu/wps/wcm/connect/Cdr/Icrios/Home/Resources/Databases/PATENTS-ICRIOS+database/Disambiguated inventor and applicant names for EPO records. A major problem with PATSTAT is that data are provided in a raw format. ICRIOS has therefore thoroughly processed the data to produce a cleaned and harmonized database: PATENTS-ICRIOS. Data processing consisted mainly of thorough cleaning and standardization of the raw information provided by the EPO.
Such work of name standardization has been carried out at the level of individual inventors and applicants.

In addition to this, each patent document also reports further information not included in Patstat, (FI concordance tables to convert IPC codes into more aggregated and manageable technological classes).

Data included in these reports are for the EPO only; the last update was released in 10/2016. The starting date for EPO applications is 1978, although in many reports by priority date you will meet earlier dates.
Coffano, M., & Tarasconi, G.Coffano, M., & Tarasconi, G. (2014). CRIOS - Patstat Database: Sources, Contents and Access Rules. SSRN Electronic Journal. doi:10.2139/ssrn.2404344 Nonecrios@unibocconi.itdisambiguation, EuropeEPO Licensehttp://ssrn.com/abstract=2404344Sat, 05 Mar 2022 18:46:31 GMT
22
d9cf4e57-a90e-4d18-8a3b-08fea43a2f49NBER US Patent Data Projectnber_citationhttps://sites.google.com/site/patentdataproject/Home/downloads?authuser=0The main dataset extends from January 1, 1963, through December 30, 2006, and includes all the utility patents granted during that period. The citations file includes all citations made by patents granted in 1975-1999.Bronwyn H. Hall, Jim Bessen, Grid ThomaHJT refers to Hall, Bronwyn, Adam Jaffe and Manuel Trajtenberg, "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools," NBER Working Paper 8498.NoneAdam JaffeUnited StatesThe data in these files are freely available to members of this community. We expect members to inform the community of errors in the data or documentation and to provide fixes/improvements. 1976-2006https://docs.google.com/document/d/1FyDsjZHhq7okHWMBOc_E7EquLUoAwwEZYtxw5M3UDTY/editSun, 01 May 2022 11:51:05 GMT
23
cf1780b1-e265-4e49-8d1d-83b9cfe0fd9aUSPTO PatentsViewpatentsviewhttps://patentsview.org/PatentsView includes US patent data including raw data (summaries, applications, pregrant applications), disambiguations of inventors and assignees, and inventor gender estimates. Also foreign priority data, # of figures and sheets, and government interest statements.USPTOAttribution should be given to PatentsView for use, distribution, or derivative works.NoneUSPTOdisambiguation, United States, genderCreative Commons Attribution 4.0 International License.1963-1999https://patentsview.org/query/builder-faqshttps://github.com/CSSIP-AIR/PatentsView-Code-Snippets/https://console.cloud.google.com/bigquery?p=patents-public-data&d=patentsview&page=datasetcitation_id, state_fips, f102_date, classification_status, name_last, lapse_of_patent, disamb_assignee_id_20181127, assignee_id, mainclass_id, rawlocation_id, attribution_status, designation, male_flag, subclass, disamb_inventor_id_20171003, disamb_assignee_id_20190312, section, disamb_inventor_id_20180528, subcategory_id, deceased, length, latitude, uuid, disclaimer_date, city, type, category, sector_title, rule_47, patent_id, disamb_inventor_id_20200331, disamb_inventor_id_20170307, action_date, disamb_assignee_id_20190820, fname, disamb_inventor_id_20181127, filename, id, num_claims, f371_date, subsection_id, reldocno, num_figures, disamb_assignee_id_20191008, disamb_assignee_id_20200331, rawinventor_id, term_grant, disamb_assignee_id_20191231, publication_number, abstract, level_one, _371_date, main_group, disamb_assignee_id_20200929, gi_statement, longitude, subgroup, disamb_inventor_id_20190820, classification_value, subclass_id, sequence, date, organization_id, doc_type, disamb_inventor_id_20170808, contract_award_number, num_sheets, disamb_assignee_id_20200630, county, ipc_version_indicator, disamb_inventor_id_20171226, symbol_position, inventor_id, number, field_title, state, level_two, series_code, subgroup_id, field_id, 
location_id, lname, disamb_inventor_id_20190312, county_fips, group_id, disamb_inventor_id_20200630, section_id, term_extension, num, relkind, organization, latlong, exemplary, rawassignee_id, country_transformed, role, country, disamb_inventor_id_20201229, male, name, classification_data_source, disamb_inventor_id_20191231, applicant_type, dependent, group, application_id, status, withdrawn, name_first, rel_id, text, kind, classification_level, term_disclaimer, disamb_inventor_id_20191008, category_id, latin_name, disamb_inventor_id_20200929, variety, lawyer_id, _102_date, ipc_class, doctype, title, level_three04/13/2022, 12:40:04
24
6f3605ad-5edb-4a73-8b3b-6d6d35064d4cMicrosoft Academic Knowledge Graphmakghttp://ma-graph.org/A large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed under the Open Data Attributions license. Furthermore, we provide entity embeddings for all 210M represented scientific papers.Michael Färber@inproceedings{DBLP:conf/semweb/Farber19, author = {Michael F{\"{a}}rber}, title = "{The Microsoft Academic Knowledge Graph: {A} Linked Data Source with 8 Billion Triples of Scholarly Data}", booktitle = "{Proceedings of the 18th International Semantic Web Conference}", series = "{ISWC'19}", location = "{Auckland, New Zealand}", pages = {113--129}, year = {2019}, url = {https://doi.org/10.1007/978-3-030-30796-7\_8}, doi = {10.1007/978-3-030-30796-7\_8} }Nonecitation, scholarly literatureOpen Data Commons Attribution License (ODC-By) v1.0https://github.com/michaelfaerber/makg-linkingWed, 01 Dec 2021 19:36:10 GMT
25
303ce18b-f411-4752-9fe6-d4fcc369f43cIPRoductiproducthttps://iproduct.io/appThe IPRoduct project seeks to link innovative goods to the patents upon which they are based. By directly linking products to patents, this project tracks innovation to the point where it meets consumers, the true commercial end point of investments in Science & Technology. The output of the project is a database of linked product-patent pairs that is made publicly available.

The data is sourced from virtual patent marking web pages. Everyone has seen the ‘patent pending’ notice on some products. Sometimes, manufacturers print the actual patent numbers on products -- ‘physical patent marking’.

The complete database is composed of 800 companies, 1447 web pages, 24463 products, 19815 U.S. patents and 151176 relationships.
Gaétan de RassenfosseNoneGaétan de Rassenfosse, Samuel Arnod-PrinProducts, disambiguation, trademarks, physical patent markingThese data are currently not available for sale. They are available in exchange of credits, which you earn by contributing to the project.https://iproduct.io/app/#/public/page/aboutMon, 07 Feb 2022 13:40:40 GMT
26
50c1e32c-d2f5-4328-be8e-b7f172772a26Replication Data for: Government-funded research increasingly fuels innovationgov_research_fuels_innovationhttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRCThis includes patent level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent meta data, CPC, geography, agencies, entity size of the patent owner etc, government support categories at patent level and finally, aggregate yearly statistics. (2019-06-02) Lee Fleming, Hillary Green, Guan-Cheng Li, Matt Marx, Dennis YaoLee Fleming; Hillary Green; Guan-Cheng Li; Matt Marx; Dennis Yao, 2019, "Replication Data for: Government-funded research increasingly fuels innovation", https://doi.org/10.7910/DVN/DKESRC, Harvard Dataverse, V4, UNF:6:kMIqsh3DCvKiKYgMT6/H8A== [fileUNF] NoneContact maintainer through Dataverseresearch funding, United StatesCC0 - "Public Domain Dedication"1926-1975 and 1975-2017https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRChttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRCYFri, 17 Dec 2021 03:03:18 GMT
27
d24e8a7e-7d27-4280-9d85-c6598a1b9b8eGoogle Patents Public Datasetsgoogle_patents_publichttps://console.cloud.google.com/marketplace/details/google_patents_public_datasets/google-patents-public-dataWorldwide (100+ countries) bibliographic and USPTO full-text, available via BigQuery. Provided by IFI CLAIMS Patent Services, a worldwide bibliographic and US full-text dataset of patent publications. Updated quarterly.Google Patents“Google Patents Public Data” by IFI CLAIMS Patent Services and Google, used under CC BY 4.0NoneGoogle Patents https://patents.google.com/Google PatentsCC BY 4.0, requires subscription to query API1834-present (quarterly)https://cloud.google.com/blog/topics/public-datasets/google-patents-public-datasets-connecting-public-paid-and-private-patent-datapatent analysis sample code: https://github.com/google/patents-public-data, source code not accessibleYes, quarterlyhttps://console.cloud.google.com/bigquery?p=patents-public-data&d=patents&page=datasetapplication_kind, family_id, fi, child, uspc, kind_code, spif_application_number, priority_claim, description_localized_html, citation, publication_date, claims_localized_html, ipc, assignee, assignee_harmonized, locarno, parent, abstract_localized, spif_publication_number, pct_number, title_localized, entity_status, art_unit, inventor_harmonized, application_number, application_number_formatted, cpc, claims_localized, country_code, grant_date, examiner, filing_date, fterm, publication_number, priority_date, description_localized, inventor04/13/2022, 12:40:04
28
ff4ffcf9-5721-4148-ac59-140b9ed4dab5Semantic Scholar Open Research Corpussem_scholar_open_researchhttps://api.semanticscholar.org/corpusSemantic Scholar's records for research papers published in all fields provided as an easy-to-use JSON archive. Waleed Ammar, Dirk Groneveld, +20 authorsWaleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618 NoneSemantic Scholar, feedback@semanticscholar.orgcitation, scholarly literatureODC-BYThu, 27 Jan 2022 16:31:50 GMT
29
e80542a8-a9bb-4205-8364-c0e9f3a2b683UVA Darden Global Corporate Patent Dataset (disambiguated assignees)uva_global_corporate_patentshttps://patents.darden.virginia.edu/The dataset has information on about 3 million USPTO patents, which were granted between 1980 and 2017, assigned to publicly listed companies worldwide, and linked to those assignee companies using the following identifiers: Unique Patent Number, as given by the USPTO, GVKEY, as the firm identifier, from the S&P Compustat Global database. Jan Bena, Miguel A. Ferreira, Pedro Matos, Pedro PiresJan Bena, Miguel A. Ferreira, Pedro Matos, and Pedro Pires. "Are foreign investors locusts? The long-term effects of foreign institutional ownership." Journal of Financial Economics 126, no. 1 (2017): 122-146NoneGCPD@darden.virginia.eduUnited States, disambiguationCC BY-NC 4.0 Attribution-NonCommercial 4.0 International1980-2017https://patents.darden.virginia.edu/documents/DataConstructionDetails_v01.pdfTue, 26 Apr 2022 18:17:17 GMT
30
f2fcc603-7883-4e18-a82a-6275ffd82e98DISCERN: Duke Innovation & SCientific Enterprises Research Network discernhttps://doi.org/10.5281/zenodo.3594642Patents (as well as scientific articles, and NPL citations at the aggregate firm-level) matched to U.S. Compustat firms over the period 1980-2015. In extending the match to Compustat up to 2015, we address two major challenges: name changes and ownership changes. Our UO and subsidiary historical standardized firm name lists, including the dynamic reassignment, are publicly available for researchers to match to their database of interest.Ashish Arora, Sharon Belenzon, Lia SheerArora Ashish, Belenzon Sharon, and Sheer Lia, 2021. "Knowledge spillovers and corporate investment in scientific research". American Economic Review, 111(3), pp.871-98.
Arora Ashish, Belenzon Sharon, and Sheer Lia, 2021. "Matching patents to Compustat firms, 1980–2015: Dynamic reassignment, name changes, and ownership structures". Research Policy, 50(5), p.104217.
NoneLia SheerCompustat, Patents, Publications, NPL, Name changes, Dynamic reassignment, GVKEY, Disambiguation1980-2015https://doi.org/10.5281/zenodo.3594642Yeshttps://doi.org/10.5281/zenodo.4320782Sun, 03 Apr 2022 10:30:08 GMT
31
f1a7dfa7-c1f0-4414-a6b9-5a0f0d0e37f1Patent Citation Similaritypatent_citation_similarityhttps://storage.googleapis.com/jmk_public/Kuhn-Younge-Marco_Patent_Citation_Similarity_2017-10-23.csvMany studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model of patent similarity to compute the technological similarity between each pair of citing-cited patents. The VSM model analyzes the full text of each document to position it as a vector in a vector space that includes more than 700,000 dimensions and then calculates the angular distance between the two vectors. The dataset includes similarity values for all citations made by patents issued between 1976 and 2017 to issued patents or published patent applications.Jeffrey Kuhn, Kenneth Younge, Alan Marco Kuhn, Jeffrey M. and Younge, Kenneth A. and Marco, Alan C., Patent Citations Reexamined (June 24, 2019). RAND Journal of Economics, Forthcoming, Available at SSRN: https://ssrn.com/abstract=2714954 or http://dx.doi.org/10.2139/ssrn.2714954 NoneJeff Kuhnsimilarity, citationThese datasets are provided to the public subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research — please just cite the supporting article.1976-2017https://ssrn.com/abstract=2714954https://ssrn.com/abstract=2714954Tue, 11 Jan 2022 08:14:01 GMT
32
b547441d-efdd-4b30-8c78-852d68c9c2acPatent Scope and Examiner Toughnesspatent_scope_toughnesshttps://storage.googleapis.com/jmk_public/Kuhn-Thompson_Patent_Scope_2017-10-23.csvThis dataset includes an easy-to-use measure of patent scope that is grounded both in patent law and in the practices of patent attorneys. Our measure counts the number of words in the patents’ first claim. The longer the first claim, the less scope a patent has. This is because a longer claim has more details – and all those details must be met for another invention to be infringing. Hence, the more details there are in the patent, the greater are the opportunities for others to invent around it. We validate our measure by showing both that patent attorneys’ subjective assessments of scope agree with our estimates, and that the behavior of patenters is consistent with it. To facilitate drawing causal inferences with our measure, we show how it can be used to create an instrumental variable, patent examiner Scope Toughness, which we also validate.Jeffrey Kuhn, Neil Thompson Kuhn, Jeffrey M. and Thompson, Neil, How to Measure and Draw Causal Inferences with Patent Scope (October 9, 2017). International Journal of the Economics of Business, 26(1) 5-38 (2019), Kenan Institute of Private Enterprise Research Paper No. 19-29, Available at SSRN: https://ssrn.com/abstract=2977273 or http://dx.doi.org/10.2139/ssrn.2977273 NoneJeff KuhnExaminers, patent scope, legal, assessmentThese datasets are provided to the public subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research — please just cite the supporting article.Need to check paper https://ssrn.com/abstract=2977273https://ssrn.com/abstract=2977273USPTO patent claims datasetThu, 10 Mar 2022 11:55:26 GMT
33
2d88904f-056b-4230-96b4-f70c178d9f88Patent Citation Timing and Sourcepatent_citation_timinghttps://storage.googleapis.com/jmk_public/Kuhn-Younge-Marco_Patent_Citation_Source_and_Timing_2017-09-25.csvInnovation studies frequently distinguish between patent citations submitted by the patent examiner and those submitted by the patent applicant. However, publicly available citations data is often misleading, for instance by attributing a patent citation to the patent examiner when it was in fact first submitted by the patent applicant. This dataset uses internal USPTO data to identify the date on which each citation was first submitted as well as the party (examiner or applicant) who first submitted it. The dataset includes observations for citations made by patents issued 2001-2014, although some level of leftward truncation is evident due to limitations in internal data availability at the USPTO.Jeffrey Kuhn, Kenneth Younge, Alan Marco Kuhn, Jeffrey M. and Younge, Kenneth A. and Marco, Alan C., Patent Citations Reexamined (June 24, 2019). RAND Journal of Economics, Forthcoming, Available at SSRN: https://ssrn.com/abstract=2714954 or http://dx.doi.org/10.2139/ssrn.2714954 NoneJeff Kuhntiming, citation, United StatesThese datasets are provided to the public subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research — please just cite the supporting article.2001-2014https://ssrn.com/abstract=2714954https://ssrn.com/abstract=2714954Thu, 02 Dec 2021 17:29:45 GMT
34
eaee5eaa-985b-4ba5-a13a-797d3cfeef1fPatent Families Datasetpatent_familieshttps://storage.googleapis.com/jmk_public/Younge-Kuhn_Patent_Families_2017-09-25.csvPatent applicants frequently file groups of patent applications linked together by priority claims. These priority claims create families of patent applications that share features such as inventors, priority dates, and technical descriptions. By analyzing these linkages, each patent can be assigned a family identifier that it shares with other patents in the same family. This data set includes two levels of family identifiers (clone for near copies, and extended for more attenuated linkages) for each patent issued 2005-2014Kenneth Younge, Jeffrey KuhnYounge, Kenneth A. and Kuhn, Jeffrey M., Patent-to-Patent Similarity: A Vector Space Model (July 30, 2016). Available at SSRN: https://ssrn.com/abstract=2709238 or http://dx.doi.org/10.2139/ssrn.2709238 NoneJeff Kuhnpatent family, similarityThese datasets are provided to the public subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research — please just cite the supporting article.2005-2014https://ssrn.com/abstract=2709238https://ssrn.com/abstract=2709238Thu, 02 Dec 2021 17:29:51 GMT
35
f1561d9b-8512-470f-abed-557d6e3e19adPatent-to-article intext citations for 244 journalspatent_to_article_intexthttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZEZWBXThe data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper. Bryan, Kevin, 2019, "In-Text Patent Citation Database Bryan/Ozcan/Sampat Beta version .9", https://doi.org/10.7910/DVN/ZEZWBX, Harvard Dataverse, V2, UNF:6:+28YcwvDoaxFl/9hPXQaSA== [fileUNF]NoneKevin Bryan, http://www.kevinbryanecon.com/in-text, scholarly literature, citation, academic science, diffusionCC0 - "Public Domain Dedication" 197?-2015?http://www.kevinbryanecon.com/UsersGuidetoIntextCitations.pdfWed, 01 Dec 2021 19:37:35 GMT
36
798f092c-3597-41bb-be5d-e5eb15c2b5d3Patent valuepatent_valuehttps://iu.box.com/patentsThe data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper. Noah StoffmanNoneNoah Stoffman, nstoffma@iu.eduscientific value, economic growth, United States1926-2010Mon, 28 Mar 2022 19:59:28 GMT
37
131e13f8-342c-4dd7-a3e6-fbf5a5ba6a5cPatentCitypatentcityhttps://mailchi.mp/e0495246a573/patentcityPatentCity is a dataset on the location of patentees since the 19th century in Germany, France, Great Britain and the United States of America. Beta available for test! Drop us a mail if you are interested in becoming a beta tester.Antonin Bergeaud, Cyril VerluiseNoneAntonin Bergeaudlocation of inventors, geography, Europe, United Stateshttps://github.com/Antoberge/patent_cityThu, 02 Dec 2021 13:36:38 GMT
38
44f33a6f-5099-4481-abed-af9aadf0bd4fPatent text: code, data, and new measurespatent_text_new_measureshttps://zenodo.org/record/3515985Different open access data files related to the text of USPTO patent documents, including 1) for each US patent a list of processed, cleaned and stemmed keywords, 2) for each patent a list of the 1,000 most similar patents (based on cosine similarity) from the entire population of US patents, 3) for each US patent the average cosine similarity with all prior patents from the previous 5 years, and the average cosine similarity with all later patents in the following 5 years, 4) each new keyword (unigram), bigram (sequence of two adjacent keywords), trigram, and pairwise keyword combination introduced for the first time in history by a US patent, the number of the patent introducing it for the first time, and the total number of patents from the entire population using these new keywords, bigrams, trigrams, and new keyword combinations.Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming Research Policy. (https://doi.org/10.1016/j.respol.2020.104144)Sam Artspatent measures, text, natural language processing, novelty, impact, USPTO, technological progressOpen Data Commons Attribution License v1.01969-2018https://zenodo.org/record/3515985Yeshttps://github.com/sam-arts/respol_patents_codeYeshttps://doi.org/10.5281/zenodo.3515985Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming Research Policy. (https://doi.org/10.1016/j.respol.2020.104144)Thu, 12 May 2022 16:43:59 GMT
39
e22dcf03-9504-48c7-9cb4-468d98ec2bb2Matched inventor ages from patents, based on web scraped sourcesmatched_inventor_ageshttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YRLSKU
We use information about U.S.-residing inventors from patents, which includes name and location, to search publicly available online web directories for age and date-of-death information, and we build a scoring system to indicate the quality of the information that we collect. After applying a variety of heuristics and robustness checks, we find 1,508,676 inventor ages. We also find the death dates of 206,589 inventors, though we are less confident in their accuracy.
Mary Kaltenberg, Adam Jaffe
@article{kaltenberg_matched_2021,
title = {Matched inventor ages from patents, based on web scraped sources},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YRLSKU},
doi = {10.7910/DVN/YRLSKU},
abstract = {We use information about U.S. residing inventors from patents which include name and location and search for age and date of death information from...},
language = {en},
urldate = {2021-08-12},
author = {Kaltenberg, Mary and Jaffe, Adam and Lachman, Margie E.},
month = may,
year = {2021},
note = {type: dataset},
}
Mary KaltenbergInventors, Ages, Gender, Death Dates, Patents, United StatesCC0 - "Public Domain Dedication" Mon, 11 Apr 2022 15:23:22 GMT
40
fddedcfc-9f4e-47c6-bc82-3e04bb3c4262Newspaper.com Indexnewspaper_comhttps://elisabethperlman.net/code.htmlIndex of newspaper.com articlesBitsy Perlman NoneBitsy Perlman Thu, 02 Dec 2021 13:36:58 GMT
41
1f556a96-61fc-4d4c-a046-ed711d9807f9Long-Term Productivity databaselong_term_productivityhttp://longtermproductivity.com/download.htmlThe Long-Term Productivity database was created as a project at the Bank of France in 2013 by Antonin Bergeaud, Gilbert Cette and Remy Lecat. Following the work of Cette, Mairesse and Kocoglu (2009), we extended the database to include 17 countries in the latest version (2016). The latest version of the database includes the following countries -- Australia, Belgium, Canada, Denmark, Germany, Finland, France, Italy, Japan, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom, United States. We offer data on Total Factor Productivity per hour worked, Labor productivity per hour worked, capital intensity and GDP per capita. These series cover at least the period 1890 to present annually. In addition, other data corresponding to each of the papers linked to this project are available. This includes age of capital stock, education attainment, electricity production per capita. NoneAntonin Bergeaudproductivity, Europe, United States, GDPYou are free to use the data for non-commercial use.1890-2020 08/16/2021, 13:46:40
42
410dd9de-2520-4f57-a409-0ade7ec11b65Collection of Historical Data on the Uses of Petroleum International Networkuses_of_petroleumhttp://www.longtermproductivity.com/chdupin/ The research project CH.DUPIN (Collection of Historical Data on the Uses of Petroleum International Network) aims at gathering historical data on oil consumption for many countries.

The current dataset contains yearly information on oil consumption, oil consumption per capita and oil consumption per unit of GDP for 16 OECD countries from 1890.
NoneAntonin Bergeaudpetroleum, oil consumptionYou are free to use the data for non-commercial use. We only ask you to cite the associated articles:
Oil data: Bergeaud and Lepetit (2020): Research program CH.DUPIN, a short note (link)
GDP data: Bergeaud, A., Cette, G. and Lecat, R. (2016): "Productivity Trends in Advanced Countries between 1890 and 2012," Review of Income and Wealth, vol. 62(3), pages 420–444.
1890-2012 Sat, 04 Dec 2021 14:13:06 GMT
43
bf073285-5243-4dc6-a990-c8a8c3f79898Classification Data for "Classifying Patents Based on their Semantic Content"classifying_patents_semantic_contenthttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZULMOYAn open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine their full abstract and extract the relevant keywords accordingly. We refer to this classification as semantic approach in contrast with the more common technological approach which consists in taking the topology when considering US Patent office technological classes.
@article{bergeaud_classification_2017,
title = {Classification {Data} for "{Classifying} {Patents} {Based} on their {Semantic} {Content}"},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZULMOY},
abstract = {Classification Data for Bergeaud, Potiron and Raimbault, 2017, Classifying Patents Based on their Semantic Content.},
language = {en},
urldate = {2021-08-17},
author = {Bergeaud, Antonin and Potiron, Yoann and Raimbault, Juste},
month = apr,
year = {2017},
note = {type: dataset},
}
NoneContact maintainer through DataverseUnited States, patents, similarity Mon, 21 Feb 2022 16:25:58 GMT
44
f61ebc77-4082-43c5-ae60-383a756ce308List of USPTO patents from US universitiesus_university_patentshttps://sites.google.com/site/abergeaudeco/data?authuser=0Using cross-state panel and cross-U.S. commuting-zone data to look at the relationship between innovation, top income inequality and social mobility. From the paper "Innovation and Top Income Inequality" (Aghion, Akcigit, Bergeaud, Blundell, Hémous). This dataset lists all USPTO patents from 1969 to 2016 whose assignee is a university and gives the name and state of this university (originally taken from USPTO and improved). NoneContact maintainer through Dataverseinequality, geography, social mobility, patents 1969-2016 https://doi.org/10.1093/restud/rdy027https://academic.oup.com/restud/article/86/1/1/5026613Fri, 03 Dec 2021 20:14:51 GMT
45
fb81106d-3933-488b-acd9-aff177f82423HistPat International Datasethistpat_internationalhttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QT4OJSHistPat International provides the geography of historical patents granted to foreign nationals by the United States Patent and Trademark Office (USPTO) from 1836 to 1975. This historical dataset is constructed using digitalized records of original patent documents that are publicly available. HistPat can be used in different disciplines ranging from geography, economics, history, network science, and science and technology studies. Additionally, it can easily be merged with post-1975 USPTO digital patent data to extend it until today.
@article{petralia_histpat_2019,
title = {{HistPat} {International} {Dataset}},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QT4OJS},
doi = {10.7910/DVN/QT4OJS},
abstract = {HistPat International provides the geography of historical patents granted to foreigns by the United States Patent and Trademark Office (USPTO) fro...},
language = {en},
urldate = {2021-08-17},
author = {Petralia, Sergio},
month = mar,
year = {2019},
note = {type: dataset},
}
NoneContact maintainer through DataverseHistorical Patents, Technological Change, Inventions, Geography, Economics Thu, 02 Dec 2021 17:15:27 GMT
46
40f30ff4-d152-4aa8-89a9-e31dddc812dcHistPat Datasethistpathttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BPC15WHistPat provides the geography of historical patents granted by the United States Patent and Trademark Office (USPTO) from 1790 to 1975. This historical dataset is constructed using digitalized records of original patent documents that are publicly available. HistPat can be used in different disciplines ranging from geography, economics, history, network science, and science and technology studies. Additionally, it can easily be merged with post-1975 USPTO digital patent data to extend it until today. (2016-05-23)
@article{petralia_histpat_2019,
title = {{HistPat} {Dataset}},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BPC15W},
doi = {10.7910/DVN/BPC15W},
abstract = {HistPat provides the geography of historical patents granted by the United States Patent and Trademark Office (USPTO) from 1790 to 1975. This histo...},
language = {en},
urldate = {2021-08-24},
author = {Petralia, Sergio and Balland, Pierre-Alexandre and Rigby, David},
month = jan,
year = {2019},
note = {type: dataset},
}
NoneContact maintainer through DataverseHistorical Patents, Technological Change, Inventions, Geography, Economics 10.7910/DVN/BPC15W Mon, 04 Apr 2022 18:08:13 GMT
47
9651d1f2-3c24-46ef-9ade-e2e31f4ffe12BACIbacihttp://www.cepii.fr/CEPII/en/bdd_modele/presentation.asp?id=37BACI provides disaggregated data on bilateral trade flows for more than 5000 products and 200 countries. Pierre Cotterlaz, baci@cepii.frtrade, globalBACI is freely available to anyone, after a quick registration. http://www.cepii.fr/DATA_DOWNLOAD/baci/doc/DescriptionBACI.html https://beta.asef.org/images/stories/partners/CEPII_logo_mappemonde.jpgThu, 02 Dec 2021 18:01:38 GMT
48
1b372a68-18ae-45e3-9a28-a6feecc3e7b8Chinese Patent Data Project Dataversesipo_matchinghttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/CF1IXO
Matching SIPO patents to Chinese listed firms ("Main Board"). Please refer to the user documentation "Chinese Patent Database User Documentation: Matching SIPO Patents to Chinese Publicly-Listed Companies and Subsidiaries" for more details about this dataset.

@article{he_matching_2019,
title = {Matching {SIPO} patents to {Chinese} listed firms ("{Main} {Board}")},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/CF1IXO},
doi = {10.7910/DVN/CF1IXO},
abstract = {Matching SIPO patents to Chinese listed firms ("Main Board"). Please refer to the user documentation "Chinese Patent Database User Documentation: M...},
language = {en},
urldate = {2021-08-17},
author = {He, Zi-Lin and Tong, Tony and Zhang, Yuchen and He, Wenlong},
month = dec,
year = {2019},
note = {type: dataset},
}
Contact maintainer through DataverseChina, SIPO, disambiguation, patents, firms through 2016?https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QUH8KT Tue, 01 Mar 2022 12:17:46 GMTTue, 01 Mar 2022 12:21:44 GMT
49
5ab54caa-f53c-4537-8dac-8bf20cab594eGPT Indicatorsgpt_indicatorshttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PQGHKAThis database contains yearly technology-level measures of Growth, Use Complementarity (UC) and Innovation Complementarity (IC) since 1920 for all technological classes in the United States Patent and Trademark Office (USPTO) classification system, as described in the article entitled "Mapping General Purpose Technologies with Patent Data". (2020-03-06) Sergio Petralia
@article{petralia_gpt_2020,
title = {{GPT} {Indicators}},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PQGHKA},
doi = {10.7910/DVN/PQGHKA},
abstract = {This database contains yearly technology-level measures of Growth, Use Complementarity (UC) and Innovation Complementarity (IC) since 1920 for all ...},
language = {en},
urldate = {2021-08-17},
author = {Petralia, Sergio},
month = mar,
year = {2020},
note = {type: dataset},
}
NoneSergio Petralia (contact maintainer through Dataverse)growth, Use Complementarity, Innovation Complementarity, technology, patents, metrics CC0 - "Public Domain Dedication" 1920-2020https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/PQGHKA/KZDEBE&version=1.0 https://ideas.repec.org/p/egu/wpaper/2027.htmlThu, 02 Dec 2021 17:15:28 GMT
50
fb46d05b-2bd9-41fc-a739-91b77a2e85d6Imputation of missing applicant country codes in worldwide patent datamissing_applicant_codeshttps://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XNTL0WWe present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent data. Complete information on patents is essential to obtain an accurate picture of technological activities across countries and over time. However, the coverage of the database is far from complete. Our data imputation method exploits detailed institutional knowledge about the international patent system, and we codify it in a SQL algorithm. We provide two datasets related to the imputation of missing country codes and missing technology classification. We also release the algorithm that can be easily adapted to impute other pieces of information that are missing in PATSTAT.
@article{seliger_imputation_2020,
title = {Imputation of missing applicant country codes in worldwide patent data},
url = {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XNTL0W},
doi = {10.7910/DVN/XNTL0W},
abstract = {The file ctry\_app\_person.txt contains identifiers for patent first filings and the applicant (corresponding to appln\_id and person\_id in PATSTAT) a...},
language = {en},
urldate = {2021-08-17},
author = {Seliger, Florian},
month = oct,
year = {2020},
note = {type: dataset},
}
NoneContact maintainer through DataversePatents, Location of applicants, PATSTAT, Imputation CC0 - "Public Domain Dedication" https://www.sciencedirect.com/science/article/pii/S2352340920314955 https://github.com/seligerf/Imputation-of-missing-location-information-for-worldwide-patent-data https://doi.org/10.7910/DVN/XNTL0W https://doi.org/10.1016/j.dib.2020.106615Thu, 02 Dec 2021 17:15:29 GMT
51
46a031fd-8827-4bab-91b3-b41ca447f152Patent Examination Data Systempedshttps://ped.uspto.gov/peds/#!/PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935. The data can be accessed by anyone using the web interface or the provided Application Programming Interface (API). PEDS is updated daily and mirrors the data available in the Patent Application Location and Monitoring system (PALM). PEDS provides access to public applications, including published patent applications and patents. PCT applications that have not been published by WIPO and any applications that have not been released by the USPTO will not be available in PEDS.NoneUSPTOpatentsterms given here: https://www.uspto.gov/sites/default/files/documents/Patent%20Electronic%20System%20Access%20Document_0.pdf1981-2021https://ped.uspto.gov/peds/#!/#%2FuserManualThu, 02 Dec 2021 17:15:29 GMT
52
fc72efb0-8b24-4415-9b50-b0b7f33dc8b4Indian Patent Advanced Search Systemindia_patent_databasehttps://ipindiaservices.gov.in/publicsearchPlatform for accessing Indian public patent dataNoneIntellectual Property IndiaIndia, patents https://ipindiaservices.gov.in/PublicSearch/PublicationSearch/HelpNoneNone Thu, 02 Dec 2021 17:15:30 GMT
53
5d387b72-6d6c-4479-8626-e9a1a9b693f7UK IPOuk_ipohttps://www.gov.uk/government/publications/ipo-patent-dataSnapshots of British patent/SPC applications received and subsequently published by the Intellectual Property Office. NoneUK Intellectual Property Office, https://www.gov.uk/government/organisations/intellectual-property-officeUnited Kingdom, patentsOpen Government License 3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ Mon, 18 Apr 2022 17:49:18 GMT
54
a16242e8-fe81-49eb-bf1d-4df0a1927738Monthly statistics -- Patents, trade marks, and designsuk_ipo_monthlyhttps://www.gov.uk/government/collections/patents-trade-marks-and-designs-monthly-statisticsThese statistics include monthly data for designs, patents, trade marks. NoneUK Intellectual Property Office, https://www.gov.uk/government/organisations/intellectual-property-officeTrademarks, United Kingdom, designOpen Government License 3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ Tue, 10 May 2022 21:47:11 GMT
55
29154d41-30ef-4539-b428-819ca4c66965Open Sourced Database for CEO Dismissal 1992-2018ceo_dismissalhttps://zenodo.org/record/5348198This is a database of qualitatively coded reasons for a CEO’s dismissal, for S&P 1500 Companies. The maintainers of this dataset run a mailing list with a signup [here](https://docs.google.com/forms/d/e/1FAIpQLSfiZZHwyeWYEZ5fOT1_RygH-ComG9ltad5IUUY60Fsw9z3hZg/viewform)
@misc{richard_j._gentry_open_2021,
title = {Open {Sourced} {Database} for {CEO} {Dismissal} 1992-2018},
url = {https://zenodo.org/record/4618103},
abstract = {There is a newer version of this database - please check the right-hand navigation for the latest version...},
urldate = {2021-09-02},
publisher = {Zenodo},
author = {{Richard J. Gentry} and {Joseph Harrison} and {Timothy Quigley} and {Steven Boivie}},
month = feb,
year = {2021},
doi = {10.5281/zenodo.4618103},
note = {type: dataset},
keywords = {CEO Dismissal, Management, Strategic Management},
}
NoneRichard GentryCEO, Dismissal Management, Strategic ManagementOpen Data Commons Attribution License v1.01992-2018Documentation included as a .docx on Zenodo 10.5281/zenodo.4618103 https://onlinelibrary.wiley.com/doi/abs/10.1002/smj.3278Execucomp, https://libguides.uml.edu/wrds/ExecuCompThu, 02 Dec 2021 17:41:04 GMT
56
1a7fc85d-38af-4fe6-83b8-0d629e85d418A large-scale COVID-19 Twitter chatter dataset for open scientific researchcovid_twitter_chatterhttps://zenodo.org/record/5595136Dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts as we filtered other data we were collecting for other research purposes, however, one can see the dramatic increase as the awareness for the virus spread. Dedicated data gathering started from March 11th yielding over 4 million tweets a day.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full dataset, and a cleaned version with no retweets. There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms, the top 1000 bigrams, and the top 1000 trigrams. Some general statistics per day are included for both datasets.

@misc{banda_large-scale_2021,
title = {A large-scale {COVID}-19 {Twitter} chatter dataset for open scientific research - an international collaboration},
url = {https://zenodo.org/record/5458943},
abstract = {Version 78 of the dataset...},
urldate = {2021-09-07},
publisher = {Zenodo},
author = {Banda, Juan M. and Tekumalla, Ramya and Wang, Guanyu and Yu, Jingyuan and Liu, Tuo and Ding, Yuning and Artemova, Katya and Tutubalina, Elena and Chowell, Gerardo},
month = sep,
year = {2021},
doi = {10.5281/zenodo.5458943},
note = {type: dataset},
keywords = {social media, twitter, nlp, covid-19, covid19},
}
NonePanacea Labs, http://www.panacealab.org/covid19/social media, twitter, nlp, covid-19, covid19, twitter, covid, open-source 2000-2018http://www.panacealab.org/covid19/https://github.com/thepanacealab/covid19_twitterYes10.5281/zenodo.5458943
https://doi.org/10.3390/epidemiologia2030024, http://doi.org/10.2196/25108, http://doi.org/10.1002/isaf.1482 Thu, 02 Dec 2021 17:15:32 GMT
57
fcf09f34-d5a8-483d-94a3-09a03c167100Biospolar Antarctic Literature and Patentsbiospolarhttps://osf.io/py6ve/Mapping the scientific and patent landscapes for biodiversity based research and innovation from Antarctica and the Southern Ocean. Created under the Biospolar Project, Research Council of Norway@article{oldham_biospolar_2019,
title = {Biospolar {Antarctic} {Literature} and {Patents}},
url = {https://osf.io/py6ve/},
doi = {10.17605/OSF.IO/PY6VE},
abstract = {Mapping the scientific and patent landscapes for biodiversity based research and innovation from Antarctica and the Southern Ocean. Created under the Biospolar Project, Research Council of Norway (RCN project number 257631/E10)
Hosted on the Open Science Framework},
language = {en},
urldate = {2021-09-10},
author = {Oldham, Paul},
month = may,
year = {2019},
}
NonePaul Oldhamantarctic, krill, biodiversityThe datasets are made available under a Creative Commons Attribution 4.0 International Licence.

When using datasets from the repository please retain the lens_id to meet with the Lens terms of use. The lens_id along with the paperid (Microsoft Academic Graph) are the main unique identifiers for table joins so you will want to keep them anyway.

Data from Microsoft Academic Graph is made available under the Open Data Commons Attribution License (ODC-By) v1.0. If using the data in a product, service or data redistribution please include this url https://aka.ms/msracad as described here. If using in a publication please cite the two articles described here.
https://osf.io/py6ve/wiki/home/ 10.17605/OSF.IO/PY6VE lens.orgThu, 02 Dec 2021 17:15:32 GMT
58
868eaad1-3c6a-4730-a70f-853996962d39US Patent Similarity Dataus_patent_similarityhttps://zenodo.org/record/3552078Pairwise semantic similarity measures for US utility patents. Includes measures for citing/cited patent pairs, 100 most-similar patents for each patent, and doc2vec vectors for each patent.@misc{whalen_us_2019,
title = {{US} {Patent} {Similarity} {Data}},
url = {https://zenodo.org/record/3552078},
abstract = {Pairwise semantic similarity measures for US utility patents. Includes measures for citing/cited patent pairs, 100 most-similar patents for each patent, and doc2vec vectors for each patent.},
urldate = {2021-09-15},
publisher = {Zenodo},
author = {Whalen, Ryan and Lungeanu, Alina and DeChurch, Leslie and Contractor, Noshir},
month = nov,
year = {2019},
doi = {10.5281/zenodo.3552078},
note = {type: dataset},
keywords = {patents, intellectual property, innovation, semantic similarity, empirical legal studies},
}
NoneRyan Whalenpatents, intellectual property, innovation, similarity, legal, patents Creative Commons Attribution 4.0 InternationalYes10.5281/zenodo.3552078
Tue, 01 Mar 2022 17:21:25 GMT
59
eb43fc38-8786-4b0f-b3b8-b9d610f456edPatstat Registerpatstat_registerhttps://www.epo.org/searching-for-patents/business/patstat.htmlThis database contains bibliographic and legal event data on published European and Euro-PCT patent applications.

Like the core PATSTAT database, it is maintained by the EPO. PATSTAT Register only contains information about patent applications at the European Patent Office (EPO), but that information is considerably deeper and more detailed.
€ 1.420,00 - € 1.460,00EPOEurope, patents, legal, citationRequires a subscription to accesshttps://www.epo.org/searching-for-patents/business/patstat.htmlhttps://github.com/gderasse/patstat_registerWed, 01 Dec 2021 15:19:19 GMT
60
3360e0a5-ee9b-47d3-91df-9348b86af0cfPATENTSCOPEpatentscopehttps://www.wipo.int/patentscope/en/The PATENTSCOPE database provides access to international Patent Cooperation Treaty (PCT) applications in full text format on the day of publication, as well as to patent documents of participating national and regional patent offices.NoneWIPOpatents, legal1978-2021https://patentscope.wipo.int/search/en/help/help.jsf10/13/2021
61
fc08c62e-5eae-4831-9eae-4a59276e29fcWIPO PATENT REGISTER PORTALpatent_registerhttps://www.wipo.int/patent_register_portal/en/index.htmlThe WIPO's Patent Register Portal gives details of the availability of online patent registers by country / jurisdiction, as well as their search functionalities and the type of information they provide.NoneWIPOgeography, index, patents10/13/2021
62
7da1dc8e-9e6c-4a53-9571-1b2f527a5dcdEPO worldwide bibliographic data (DOCDB)docdbhttps://www.epo.org/searching-for-patents/data/bulk-data-sets/docdb.html#tab-1DOCDB is the EPO's master documentation database with worldwide coverage. It contains bibliographic data, abstracts, citations and the DOCDB simple patent family, but no full text or images. € 2.700,00 (main dataset), € 9.100,00 (backfile)EPOpatents, bibliographic data, abstractsavailable through paid subscription, https://www.epo.org/service-support/ordering/raw-data-terms-and-conditions.htmlThu, 02 Dec 2021 13:22:51 GMT
63
1ba76694-1853-4721-88f9-1079418fc3d6European Business Performance Databaseeuropean_business_performancehttps://www.icrios.unibocconi.eu/wps/wcm/connect/Cdr/Icrios/Home/Resources/Databases/EUROPEAN+BUSINESS+PERFORMANCE+database/The European Business Performance database describes the performance of the largest enterprises in the twentieth century. It covers eight countries that together consistently account for above 80 per cent of western European GDP: Great Britain, Germany, France, Belgium, Italy, Spain, Sweden, and Finland. Data have been collected for five benchmark years, namely on the eve of WWI (1913), before the Great Depression (1927), at the extremes of the golden age (1954 and 1972), and in 2000.Nonecrios@unibocconi.itEurope, GDP, productivity1910-2000https://global.oup.com/academic/product/the-performance-of-european-business-in-the-twentieth-century-9780198749776?cc=it&lang=en&10/21 13:35
64
5d36b07b-b6c6-4aac-8181-c540a95dc26fPatentsView Citation datapatentsview_citationshttps://patentsview.org/download/data-download-tablesCitation to foreign patents from US patents (foreigncitation), citation to US patent applications from US patents (usapplicationcitation), citation to US patents from US patents (uspatentcitation), non-patent citations in patents (otherreference)NoneUSPTOUnited States, citationCreative Commons Attribution 4.0 International License.Wed, 01 Dec 2021 19:27:55 GMT
65
da0edeb0-caef-474c-a7f0-0910aac9b6abPatentsView Classification datapatentsview_classificationshttps://patentsview.org/download/data-download-tablesCPC classifications, NBER classifications (to 2015), USPC classifications, WIPO technology fields, lookup tables (CPC, USPC, WIPO, NBER, US gov. organizations), botanic info for plant patents.NoneUSPTOUnited States, classifications, identifiersCreative Commons Attribution 4.0 International License.10/26/2021
66
5e147b1f-3a6c-4859-acc5-781154954941Lens Labslens_labshttps://www.lens.org/lens/labs/datafacilitiesLinks to datasets, APIs, and toolsNoneLens.org (Cambia)Global, citation, identifiers, productLinks to other resources, each with its own license. Tue, 22 Feb 2022 08:23:48 GMT
67
dcff88bd-fe6b-4fdb-8159-809bf9d7bc1cDimensionsdimensionshttps://www.dimensions.ai/products/free/Dimensions contains more than 100 million publications, ranging from articles published in scholarly journals, books and book chapters, to preprints and conference proceedings. All publications are contextualized with linked data sets, funding, publications, patents, clinical trials, and policy documents. You can also view associated categories, funders, institutions, and researcher profiles.Digital Science, https://www.digital-science.com/Free for personal, non-commercial use.Digital Science, https://www.digital-science.com/scholarly literature, patents, funding, clinical trials, academic profilesUse of both the Dimensions COVID-19 dataset and full Dimensions dataset are subject to the Dimensions Terms of use: https://www.dimensions.ai/policies-terms-legal https://docs.dimensions.ai/bigquery/index.htmlhttps://console.cloud.google.com/bigquery?p=covid-19-dimensions-ai&page=table&d=data&t=publicationsfamily_id, publisher, research_orgs, start_date, granted_date, expiration_date, publication_year, assignee_countries, book_series_title, authors, journal_lists, funding_eur, category_uoa, linkout, associated_publication_doi, current_assignee, license, links, category_hrcs_rac, repository_id, issue, funding_aud, funding_details, gender, cpc, funding_currency, labels, type, conference, doi, established, family_members_ids, funding_amount, funder_countries, repository_url, priority_date, source_id, original_assignee_orgs, acknowledgements, publication_ids, start_year, funding_nzd, id, embargo_date, parent_id, granted_year, original_assignee, investigators, wikipedia_url, external_ids, research_org_state_codes, associated_publication_arxiv_id, grant_number, resulting_publication_ids, associated_publication_pmid, eisbn, category_rcdc, year, volume, date_inserted, category_icrp_cso, end_year, types, research_org_state_names, funder_org_state_codes, email_address, brief_title, 
filing_date, open_access_categories, abstract, research_org_city_names, priority_year, current_assignee_countries, funding_chf, reference_ids, journal, category_bra, end_date, altmetrics, date_modified, subtitles, mesh_headings, category_sdg, publication_date, organisation_details, jurisdiction, date, acronym, date_imported_gbq, pmid, legal_status, funder_org, date_online, pages, category_icrp_ct, registry, funding_usd, open_access_categories_v2, isbn, foa_number, researcher_ids, category_for, citation_string, resulting_publication_doi, inventor_names, aliases, expiration_year, research_org_countries, metrics, associated_grant_ids, supporting_grant_ids, interventions, funder_org_cities, funder_orgs, clinical_trial_ids, repository_name, citations, original_title, original_abstract, funder_org_acronyms, citations_count, associated_publication_id, acronyms, category_hrcs_hc, research_org_cities, legal_events, assignee_orgs, date_print, research_org_country_names, address, funding_jpy, funding_cad, conditions, funding_gbp, name, concepts, current_assignee_orgs, editors, language, description, status, original_assignee_countries, categories, active_years, funder_org_countries, relationships, category_hra, pmcid, application_number, patent_ids, phase, created_date, ipcr, family_count, arxiv_id, funding_cny, book_title, filing_year, title, filing_status, proceedings_title, date_normal, cited_by_ids, mesh_terms, kind04/13/2022, 12:40:04
68
8bb14de6-ace9-4acb-a1ca-66b6d088a574Google Patents Research Datagoogle_patents_researchhttps://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-research-dataGoogle Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com), including machine translations of titles and abstracts from Google Translate, embedding vectors, extracted top terms, similar documents, and forward references.Google Patents, IFI CLAIMS Patent ServicesGoogle Patents Research Data by Google, based on data provided by IFI CLAIMS Patent ServicesNoneGoogle Patents https://patents.google.com/terms, citation, forward references, similarityCreative Commons Attribution 4.0 International LicenseYeshttps://console.cloud.google.com/bigquery?p=bigquery-public-data&d=labeled_patents&page=datasetinvention_type, title_line_1, x_relative_min, class_us, issuer, representative_line_1_eu, inventor_line_1, number, language, application_number, gcs_path, filing_date, y_relative_min, publication_date, priority_date_eu, applicant_line_1, x_relative_max, y_relative_max, class_international04/13/2022, 12:40:04
69
0a69b187-6d79-4ee8-999c-3295571e76dbNBER Economic Indicators and Releasesnber_indicatorshttps://back.nber.org/releases/Regularly-updated and archived index of economic indicators, including interest rates, stock reserves, home sales, labour statistics and productivity. This page is updated Monday-Friday.NBERNoneNBERmetrics, economy, trade, productivity, growth, indicatorsyesSun, 15 May 2022 12:54:11 GMT
70
297f265e-eb23-48aa-b4df-54333ba779abDisclosed Standard Essential Patents Databasedsep_datahttp://ssopatents.org/The OEIDD database provides a full overview of all disclosed IPR at standard setting organizations (SSOs) world-wide. Based on the archives of thirteen major SSOs as of March 2011, the disclosure data is cleaned, harmonized, and all disclosed USPTO or EPO patents or patent applications are matched against patent identities in the PATSTAT database. Overall, the database contains 46,906 disclosed patents, patent applications or blankets, from 969 different firms, with 14,057 USPTO or EPO patents or patent applications identified in PATSTAT, belonging to 4,814 different INPADOC patent families and 5,337 different DOCDB patent families.
Rudi Bekkers, Christian Catalini, Arianna Martinelli, Timothy Simcoe, Cesare RighiBekkers, R., Catalini, C., Martinelli, A., & Simcoe, T. (2012). Intellectual Property Disclosure in Standards Development. Proceedings from NBER conference on Standards, Patents & Innovation, Tucson (AZ), January 20 and 21, 2012.

Nonedisclosure, standards, patentsAnyone is free to use this data, provided that any paper or report published that uses this data includes the following literature citation:


"Bekkers, R., Catalini, C., Martinelli, A., & Simcoe, T. (2012). Intellectual Property Disclosure in Standards Development. Proceedings from NBER conference on Standards, Patents & Innovation, Tucson (AZ), January 20 and 21, 2012."
Included with filescodebook included in excel filesyeshttps://console.cloud.google.com/bigquery?p=patents-public-data&d=dsep&page=datasetfamily_id, date, committee_project, third_party, blanket_scope, blanket_type, sso, serial_cleaned, tc_name, patent_owner_unharmonized, sc_name, record_id, copyright, wg_name, patent_owner_harmonized, pub_cleaned, reciprocity, licensing_commitment, disclosure_event, standard04/13/2022, 12:40:04
71
4342caa7-23af-420c-b2f6-6088f133df6aUSPTO OCE Patent Examination Research Data (PatEx)patexhttps://www.uspto.gov/ip-policy/economic-research/research-datasets/patent-examination-research-dataset-public-pairThe latest version of PatEx (referred to below as the 2020 release) contains detailed information on nearly 11.9 million publicly viewable provisional and non-provisional patent applications to the USPTO and over 4.6 million Patent Cooperation Treaty (PCT) applications. It is based on data that OCE downloaded from the Patent Examination Data System (PEDS) in April 2021. The PEDS data are sourced from Public PAIR. The first time that OCE used PEDS as the basis of PatEx was for the 2019 release. We took the PEDS data and organized it into the familiar PatEx data files, which are based on the organization of the Public PAIR portal. The data files include information on each application’s characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.Graham, S., Marco, A., Miller, R.Graham, S., Marco, A., and Miller, R. (2015). “The USPTO Patent Examination Research Dataset: A Window on the Process of Patent Examination.”NoneEconomicsData@uspto.govpatents, legal, historyUSPTO’s online databases are not designed or intended to be a source for bulk downloads of USPTO data when accessed through the website’s interfaces. Individuals, companies, IP addresses, or blocks of IP addresses who, in effect, deny or decrease service by generating unusually high numbers of database accesses (searches, pages, or hits), whether generated manually or in an automated fashion, may be denied access to USPTO servers without notice.

Bulk data products may be separately obtained from the USPTO, either for free or at the cost of dissemination. For details, see information on Electronic Bulk Data Products: https://www.uspto.gov/learning-and-resources/electronic-bulk-data-products
For the 2019 and later releases, new technical documentation is available https://www.uspto.gov/sites/default/files/documents/PatEx-2019-Technical-Doc.pdf

A document describing the 2014-2017 data sets is available and can be cited as: Graham, Stuart J.H. and Marco, Alan C. and Miller, Richard, The USPTO Patent Examination Research Dataset: A Window on the Process of Patent Examination (November 30, 2015). Available at SSRN: https://ssrn.com/abstract=2702637.
https://console.cloud.google.com/bigquery?p=patents-public-data&d=uspto_oce_pair&page=datasethttps://ssrn.com/abstract=29956744, https://ssrn.com/abstract=2702637inventor_name_middle, inventor_address_type, file_location, file_location_date, continuation_type, examiner_id, invention_subject_matter, child_application_number, event_code, examiner_name_first, small_entity_indicator, status_description, inventor_region_code, correspondence_country_name, parent_country, uspc_class, earliest_pgpub_number, examiner_name_last, inventor_country_code, inventor_rank, wipo_pub_date, recorded_date, status_code, appl_status_code, correspondence_country_code, correspondence_street_line_2, examiner_name_middle, child_filing_date, sequence_number, patent_issue_date, correspondence_name_line_2, uspc_subclass, foreign_parent_date, appl_status_date, disposal_type, correspondence_region_name, inventor_name_last, inventor_country_name, application_type, parent_application_number, application_number, atty_docket_number, parent_filing_date, correspondence_region_code, earliest_pgpub_date, patent_number, invention_title, event_description, customer_number, application_number_pair, correspondence_street_line_1, parent_country_code, examiner_art_unit, inventor_name_first, correspondence_city, foreign_parent_id, correspondence_postal_code, correspondence_name_line_1, filing_date, confirm_number, aia_first_to_file, abandon_date, wipo_pub_number04/13/2022, 12:40:04
72
8c2b2faf-df08-45f9-9ad1-ddf3ca722b12SureChEMBLsurechemblhttps://www.surechembl.org/search/SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus. Currently, the database contains 17 million compounds extracted from 14 million patent documents. G. Papadatos, M. Davies, N. Dedman, J. Chambers, A. Gaulton, J. Siddle, R. Koks, S. A. Irvine, J. Pettersson, N. Goncharoff, A. Hersey, J. P. Overington“SureChEMBL” by the European Bioinformatics Institute (EMBL-EBI), used under CC BY-SA 3.0. G. Papadatos, M. Davies, N. Dedman, J. Chambers, A. Gaulton, J. Siddle, R. Koks, S. A. Irvine, J. Pettersson, N. Goncharoff, A. Hersey, J. P. Overington (2016). SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Research Database Issue, 44, D1220-D1228, DOI:10.1093/nar/gkv1253, PMID:26582922.NoneEMBL-EBI, an outstation of European Molecular Biology Laboratorybiotechnology, health, chemical, bioinformatics, medicalhttps://www.surechembl.org/terms/http://chembl.blogspot.com/https://console.cloud.google.com/bigquery?p=patents-public-data&d=ebi_surechembl&page=datasethttps://doi.org/10.1093/nar/gkv1253publication_number, patent_id, inchi_key, corpus_frequency, smiles, schembl_id, publication_date, field, field_frequency04/13/2022, 12:40:04
73
5f17a3b2-ecd2-4c45-8d1a-cebd28f41a64MatrixWare Research Collectionmarechttp://www.ifs.tuwien.ac.at/imp/marec.shtmlMAREC Data is a static collection of over 19 million patent applications and granted patents in a unified file format normalized from EP, WO, US, and JP sources, spanning a range from 1976 to June 2008. In MAREC, the documents from different countries and sources are normalized to a common XML format with a uniform patent numbering scheme and citation format. The standardized fields include dates, countries, languages, references, person names, and companies as well as rich subject classifications. It is a comparable corpus, where many documents are available in similar versions in other languages. Nonemarec@fandan.netglobal, patentsCreative Commons Attribution NonCommercial ShareAlike 3.0 Unported License1976-2008https://console.cloud.google.com/bigquery?p=patents-public-data&d=marec&page=datasetpublication_number, publication_number_original, xml, truncated04/13/2022, 12:40:04
74
7d8cda0b-9ee1-47b9-9dca-8adb93206024USPTO OCE Patent Claims Research Datauspto_patent_claimshttps://www.uspto.gov/ip-policy/economic-research/research-datasets/patent-claims-research-datasetThe Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/, to which the Office of Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationship between claims. From the parsed claims text, OCE created six data files containing individually-parsed claims, claim-level statistics, and document-level statistics, including newly-developed measures of patent scope.Marco, Alan C. and Sarnoff, Joshua D. and deGrazia, Charles, Patent Claims and Patent Scope (October 2016). USPTO Economic Working Paper 2016-04. Available at: SSRN: https://ssrn.com/abstract=2844964NoneEconomicsData@uspto.govfinancial services, scope, economicsUSPTO’s online databases are not designed or intended to be a source for bulk downloads of USPTO data when accessed through the website’s interfaces. Individuals, companies, IP addresses, or blocks of IP addresses who, in effect, deny or decrease service by generating unusually high numbers of database accesses (searches, pages, or hits), whether generated manually or in an automated fashion, may be denied access to USPTO servers without notice.

Bulk data products may be separately obtained from the USPTO, either for free or at the cost of dissemination. For details, see information on Electronic Bulk Data Products: https://www.uspto.gov/learning-and-resources/electronic-bulk-data-products
1976-2014Available at source, including documentation of variableshttps://console.cloud.google.com/bigquery?p=patents-public-data&d=uspto_oce_claims&page=datasethttp://dx.doi.org/10.2139/ssrn.2844964 https://ssrn.com/abstract=2844964pat_dep_wrd_ct, appl_id, cns_ct, claim_no, ind_flg, claim_txt, pub_no, pub_dep_wrd_avg, pub_dep_wrd_ct, dependencies, pub_wrd_ct, pat_dep_wrd_min, word_ct, pub_wrd_min, or_ct, pat_wrd_min, pat_wrd_ct, pat_no, char_ct, pat_clm_ct, pat_dep_clm_ct, pat_dep_wrd_avg, sf_ct, pub_wrd_avg, pub_dep_clm_ct, pub_clm_ct, publication_number, pat_wrd_avg, pub_dep_wrd_min04/13/2022, 12:40:04
75
7c697eb3-2d99-4b44-87cb-d3c7bb0568e1USPTO OCE Patent Assignment Datauspto_patent_assignmenthttps://www.uspto.gov/ip-policy/economic-research/research-datasets/patent-assignment-datasetThe USPTO allows parties to record assignments of patents and patent applications to, as much as possible, maintain a complete history of claimed interests in a patent. The USPTO also permits recording of other documents that affect title (such as certificates of name change and mergers of businesses) or are relevant to patent ownership (such as licensing agreements, security interests, mortgages, and liens). The 2020 update to the Patent Assignment Dataset contains detailed information on 8.97 million patent assignments and other transactions recorded at the USPTO since 1970 and involving roughly 15.1 million patents and patent applications. It is derived from the recording of patent transfers by parties with the USPTO.Marco, Alan C., Graham, Stuart J.H., Myers, Amanda F., D'Agostino, Paul A, Apple, Kirsten"USPTO OCE Patent Assignment Data" by the USPTO, for public use. Marco, Alan C., Graham, Stuart J.H., Myers, Amanda F., D'Agostino, Paul A and Apple, Kirsten, "The USPTO Patent Assignment Dataset: Descriptions and Analysis" (July 27, 2015).NoneEconomicsData@uspto.govpatents, claimsUSPTO’s online databases are not designed or intended to be a source for bulk downloads of USPTO data when accessed through the website’s interfaces. Individuals, companies, IP addresses, or blocks of IP addresses who, in effect, deny or decrease service by generating unusually high numbers of database accesses (searches, pages, or hits), whether generated manually or in an automated fashion, may be denied access to USPTO servers without notice.

Bulk data products may be separately obtained from the USPTO, either for free or at the cost of dissemination. For details, see information on Electronic Bulk Data Products: https://www.uspto.gov/learning-and-resources/electronic-bulk-data-products
1970-2020https://console.cloud.google.com/bigquery?p=patents-public-data&d=uspto_oce_assignment&page=datasethttp://ssrn.com/abstract=2636461reel_no, pgpub_country, admin_pat_no_for_appno, caddress_2, grant_doc_num, ee_address_1, ee_city, ee_name, appno_date, caddress_4, exec_dt, caddress_3, error, file_id, record_dt, last_update_dt, ee_postcode, grant_country, employer_assign, caddress_1, purge_in, page_count, lang, admin_appl_id_for_grant, ack_dt, grant_date, frame_no, convey_ty, rf_id, ee_address_2, cname, ee_state, appno_doc_num, pgpub_date, title, or_name, pgpub_doc_num, appno_country, publication_number, ee_country, convey_text04/13/2022, 12:40:04
76
76d0ee06-c78e-4a5a-ba1a-f0b41378b3cdUSPTO Patent Trial and Appeal Board (PTAB) API Data ptabhttps://developer.uspto.gov/ptab-web/#/search/decisionsUSPTO Patent Trial and Appeal Board (PTAB) API Data contains data from the PTAB E2E (end-to-end) system, which makes America Invents Act (AIA) trial information and documents publicly available.

This dataset is hosted as a RESTful API with an easy-to-use search interface. You can browse USPTO PTAB public documents, search for specific content, and request a bulk download of PTAB content. The PTAB API synchronizes with the PTAB E2E (end-to-end) system in close to real time.
“USPTO PTAB API” by the USPTO, for public use.NoneUSPTOlegal, trials, appeals1997-2020https://developer.uspto.gov/ptab-api/swagger-ui.htmlhttps://console.cloud.google.com/bigquery?p=patents-public-data&d=uspto_ptab&page=datasetInstitutionDecisionDate, AccordedFilingDate, PatentOwnerName, LastModifiedDatetime, PatentNumber, ApplicationNumber, TrialNumber, application_number, Documents, PetitionerPartyName, FilingDate, ProsecutionStatus, publication_number, InventorName04/13/2022, 12:40:04
77
984374a7-16e9-4b35-9445-458daceb01bfCooperative Patent Classification Datacooperative_patent_classificationhttps://www.cooperativepatentclassification.org/indexCooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents. The CPC is the result of a partnership between the EPO and the USPTO in their joint effort to develop a common, internationally compatible classification system for technical documents, in particular patent publications, which will be used by both offices in the patent granting process.EPO, USPTO“Cooperative Patent Classification” by the EPO and USPTO, for public use. NoneUSPTO, EPOpatents, sciencehttps://www.cooperativepatentclassification.org/cpcSchemeAndDefinitionshttps://console.cloud.google.com/bigquery?p=patents-public-data&d=cpc&page=datasetbreakdownCode, date_revised, titleFull, residual_references, titlePart, symbol, limiting_references, limitingReferences, ipc_concordant, application_references, title_full, additional_only, child_groups, synonyms, notAllocatable, ipcConcordant, status, title_part, childGroups, informative_references, breakdown_code, not_allocatable, definition, level, children, applicationReferences, informativeReferences, residualReferences, glossary, sizeCache, parents, dateRevised04/13/2022, 12:40:04
78
e232a192-965c-4ec9-904c-155b6dfe56c5ChEMBLchemblhttps://console.cloud.google.com/marketplace/product/google_patents_public_datasets/chemblChEMBL Data is a manually curated database of small molecules used in drug discovery, including information about existing patented drugs.European Bioinformatics Institute"The ChEMBL database in 2017." Anna Gaulton, Anne Hersey, Michał Nowotka, A Patrícia Bento, Jon Chambers, David Mendez, Prudence Mutowo, Francis Atkinson, Louisa J Bellis, Elena Cibrián-Uhalte, Mark Davies, Nathan Dedman, Anneli Karlsson, María Paula Magariños, John P Overington, George Papadatos, Ines Smit, Andrew R Leach Nucleic acids Research (2017) 45 (Database Issue), D945-D954NoneEMBL-EBI, an outstation of European Molecular Biology Laboratorybiotechnology, health, chemical, bioinformatics, medicalCC BY-SA 3.0schema: https://www.ebi.ac.uk/chembl/db_schema

https://console.cloud.google.com/bigquery?p=patents-public-data&d=ebi_chembl&page=dataset
ChEMBL: towards direct deposition of bioassay data.

Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR.

— Nucleic Acids Res. 2019; 47(D1):D930-D940. doi: 10.1093/nar/gky1075
rgid, activity_count, le, ddd_comment, target_type, downgraded, actsm_id, ddd_value, drug_product_flag, standard_flag, black_box_warning, max_phase_for_ind, mol_hrac_id, cx_most_apka, src_id, src_short_name, oral, component_synonym, std_act_id, patent_no, molfile, bao_endpoint, warning_country, ref_id, full_mwt, label, l5, target_desc, protclasssyn_id, res_stem_id, data_validity_comment, homologue, warning_id, level2, mecref_id, ddd_admr, record_id, level2_description, qed_weighted, publication_number, abstract, withdrawn_flag, stem_class, compound_key, sequence, db_version, warning_class, as_id, selectivity_comment, accession, withdrawn_year, indication_class, targrel_id, domain_description, molecular_mechanism, prediction_method, src_description, assay_param_id, submission_date, withdrawn_class, usan_stem_id, l1, standard_units, end_position, binding_site_comment, assay_test_type, orig_description, atc_code, activity_comment, comments, component_type, first_approval, description, assay_type, substrate_record_id, cidx, l2, published_value, warning_type, hba, cx_logd, acd_most_bpka, caloha_id, component_id, standard_type, tid_fixed, updated_on, mechanism_comment, domain_type, therapeutic_flag, cx_logp, cell_source_tissue, aidx, aromatic_rings, level3_description, synonyms, research_stem, withdrawn_reason, clo_id, ridx, direct_interaction, acd_logd, type, chebi_par_id, pubmed_id, indref_id, ad_type, cx_most_bpka, isoform, irac_class_id, parent_id, approval_date, ro3_pass, tid, year, last_active, site_id, assay_category, standard_text_value, assay_tax_id, nda_type, start_position, assay_id, cell_description, major_class, subgroup, rtb, prod_pat_id, assay_cell_type, warning_description, domain_name, sitecomp_id, who_extra, product_id, set_name, mol_atc_id, metref_id, assay_strain, l7, efo_term, withdrawn_country, mc_tax_id, bao_format, warning_year, aspect, protein_class_synonym, targcomp_id, result_flag, cell_source_organism, cell_ontology_id, go_id, upper_value, 
prodrug, text_value, mc_target_type, smarts, cellosaurus_id, value, ap_id, assay_source, lle, alert_name, species_group_flag, qudt_units, curated_by, definition, chembl_id, path, hrac_class_id, active_molregno, published_units, who_name, title, variant_id, relationship_desc, mechanism_of_action, first_page, parameter_type, short_name, entity_type, assay_desc, authors, curation_comment, ref_url, alogp, chirality, natural_product, published_type, source_domain_id, ddd_units, site_residues, issue, cell_name, cell_source_tax_id, mc_target_accession, mec_id, usan_substem, mutation, level1, doi, doc_id, co_stem_id, comp_go_id, compd_id, uberon_id, updated_by, last_page, disease_efficacy, warnref_id, sei, potential_duplicate, standard_inchi, normal_range_min, mesh_id, acd_most_apka, syn_type, level1_description, assay_tissue, predbind_id, mc_organism, confidence, toid, parent_type, max_phase, parameter_value, volume, irac_code, assay_class_id, activity_id, met_id, molecular_species, delist_flag, mw_freebase, domain_id, applicant_full_name, normal_range_max, level4_description, molregno, creation_date, class_level, oc_id, dosage_form, assay_subcellular_fraction, assay_organism, mesh_heading, level4, protein_class_id, doc_type, hba_lipinski, annotation, drug_record_id, trade_name, ass_cls_map_id, site_name, metabolite_record_id, priority, log_id, mol_frac_id, met_conversion, usan_stem_definition, parent_molregno, enzyme_tid, hbd_lipinski, topical, patent_expire_date, published_relation, usan_year, met_comment, biocomp_id, availability_type, parenteral, country, standard_inchi_key, formulation_id, compsyn_id, drugind_id, name, level5, alert_id, status, full_molformula, src_compound_id, l8, strength, l3, heavy_atoms, level3, enzyme_name, helm_notation, stem, tax_id, hrac_code, patent_use_code, ref_type, num_alerts, standard_upper_value, comp_class_id, route, polymer_flag, target_mapping, frac_code, bto_id, bei, version, pref_name, patent_id, stat, class_type, psa, pathway_id, 
canonical_smiles, db_source, mw_monoisotopic, confidence_score, dosed_ingredient, sequence_md5sum, src_assay_id, first_in_class, frac_class_id, job_id, structure_type, journal, tbl, cl_lincs_id, parent_go_id, company, mol_irac_id, relationship, acd_logp, hbd, drug_substance_flag, relation, source, pathway_key, pchembl_value, tissue_id, l4, idx, efo_id, uo_units, previous_company, compound_name, alert_set_id, smid, usan_stem, protein_class_desc, inorganic_flag, cell_id, cpd_str_alert_id, standard_value, entity_id, num_lipinski_ro5_violations, action_type, related_tid, organism, molecule_type, num_ro5_violations, units, standard_relation, active_ingredient, bao_id, innovator_company, mc_target_name, ingredient, relationship_type, molsyn_id, l6, ddd_id04/13/2022, 12:40:04
79
640ed301-691a-45c6-aa9d-5f8364424044UniChemunichemhttps://www.ebi.ac.uk/unichem/beta/ UniChem is a large-scale non-redundant database of pointers between chemical structures and EMBL-EBI chemistry resources. Its purpose is to optimise the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources, and it is particularly suitable for creating such links 'on the fly' (by use of REST web services). Primarily, this service has been designed to maintain cross references between EBI chemistry resources. These include primary chemistry resources (ChEMBL, ChEBI and SureChEMBL), and other resources where the main focus is not small molecules, but which may nevertheless contain some small molecule information (e.g., Gene Expression Atlas, PDBe). European Bioinformatics Institutebiotechnology, health, chemical, bioinformatics, medicalhttps://chembl.gitbook.io/unichem/unichem-2.0/unichem-2.0-betaFri, 03 Dec 2021 11:44:45 GMT
80
b9602dde-b508-4e6a-9620-0e20e95104ffChemBL-NTDchembl_ntdhttps://chembl.gitbook.io/chembl-ntd/ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases - endemic tropical diseases of the developing regions of Africa, Asia, and the Americas. The primary purpose of ChEMBL-NTD is to provide a freely accessible and permanent archive and distribution centre for deposited data. ChEMBL-NTD is a subset of the data in the free medicinal chemistry and drug discovery database ChEMBL.EMBL-EBI at Hinxton in the United Kingdomuse the citation associated with the deposited datasetbiotechnology, health, chemical, bioinformatics, medical, neglected diseasesWe encourage all users to download, copy and redistribute these data as needed. However, in the spirit of open collaboration and to enable rapid development of new therapeutics for neglected diseases, we encourage following these basic principles:

Users who annotate, add to, or modify these data in a way that adds significant value are encouraged to release their work to the public domain, ideally by re-contributing their findings to ChEMBL-NTD.

When these data are used or cited in a paper or other scholarly work please reference the citation provided in each deposition set.

Access to the ChEMBL-NTD data is under the EMBL-EBI's standard terms: http://www.ebi.ac.uk/Information/termsofuse.html
Fri, 03 Dec 2021 11:46:01 GMT
81
b8b008d6-43ba-49e1-92cb-59a9dcffaf87World Bank Development Indicatorsworld_bank_development_indicatorshttps://datacatalog.worldbank.org/search/dataset/0037712World Development Indicators Data is the primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.World Bank“World Development Indicators” by the World BankNonedata@worldbank.orgdevelopment, growth, globalCreative Commons Attribution 4.0https://datahelpdesk.worldbank.org/knowledgebase/topics/125589https://console.cloud.google.com/bigquery?p=patents-public-data&d=worldbank_wdi&page=datasetcountry_name, indicator_code, year, indicator_value, country_code, indicator_name04/13/2022, 12:40:04
82
289055b8-4e07-4d52-9f5a-7d35fa0d942bCPA Global Technical Standards ETSI Data technical_standards_etsihttps://console.cloud.google.com/marketplace/product/google_patents_public_datasets/cpa-global-technical-standards-etsiEuropean Telecommunications Standards Institute (ETSI) IPR dataset for technical standards. These are the US assets disclosed by companies as related to technical standards in ETSI. The two major ones included are 3GPP and LTE.CPA Global (now owned by Clarivate)“CPA Global Technical Standards ETSI Data” by CPA Global (through ETSI IPR) is licensed under a Creative Commons Attribution 4.0 International License.NoneGoogle Patents Public Datastandards, technologyCreative Commons Attribution 4.0https://github.com/google/patents-public-data/blob/master/tables/dataset_CPA%20Global.mdhttps://console.cloud.google.com/bigquery?p=innography-174118&d=technical_standards&page=dataset&project=sheets-management-319211&ws=!1m4!1m3!3m2!1sinnography-174118!2stechnical_standardsPublicationNumber, StandardBody, TechnicalStandard04/13/2022, 12:40:04
83
2721f5ec-e599-4890-9265-9706719fc71e337Info - Unfair Import Investigations Information Systemunfair_import_investigationshttps://pubapps2.usitc.gov/337external/US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.US International Trade CommissionUS International Trade Commission 337Info Unfair Import Investigations Information SystemNoneUS International Trade Commissionimport, legal, trade2008-2021 (prior to 2008 downloadable as a JSON file)FAQ and tutorial available on the sitehttps://console.cloud.google.com/bigquery?p=patents-public-data&d=usitc_investigations&page=dataset&project=sheets-management-319211teoProceedingInvolved, lastUpdated, finalIdOnViolationDue, gcAttorney, aljAssigned, patentNumber, teoIdIssueDate, scheduledStartDateEvidHear, issueDateOtherNonFinal, docketNo, patentNumbers, internalRemand, finalDetNoViolation, id, copyrightNumbers, finalDetViolation, dateComplaintFiled, teoIdDueDate, currentStatus, currentActiveALJ, invUnfairAct, endDateMarkmanHearing, scheduledEndDateEvidHear, trademarkNumbers, respondent, actualStartDateEvidHear, htsNumbers, investigationType, investigationTermDate, dateCreated, actualEndDateEvidHear, investigationNo, ouiiAttorney, cafcAppeals, finalIdOnViolationIssue, startDateMarkmanHearing, markmanHearing, ouiiParticipation, title, dateOfPublicationFrNotice, publication_number, teoReliefGranted, complainant, targetDate04/13/2022, 12:40:04
84
10fc1bad-8a80-4c3c-8803-8d33246fc659IFI Claims Patent Data Enrichmentsifi_claims_enrichmentshttps://www.ificlaims.com/product/product-data-enrichments.htmIFI CLAIMS Patent Data Enrichments includes standardized assignee/applicant names and integrated legal status information.IFI CLAIMSCosts to access via IFI, Google Patents Public Datasets hosts a core public version on BigQueryIFI CLAIMSanalytics, patentsvariablehttps://www.ificlaims.com/news/view/blog-posts/public-patent-data-now.htmhttps://console.cloud.google.com/marketplace/product/google_patents_public_datasets/ifi-claims-patent-data-enrichments04/13/2022, 15:00:02
85
3f98a0ed-4f5d-43d9-9bdb-4cef4e1ae46fUSPTO OCE Cancer Moonshot Patent Datauspto_cancerhttps://www.uspto.gov/ip-policy/economic-research/research-datasets/cancer-moonshot-patent-dataThe USPTO Cancer Moonshot Patent Data contains detailed information on published patent applications and granted patents relevant to cancer research and development (R&D). We generate the dataset using USPTO examiner tools to execute a series of queries designed to identify cancer-specific patents and patent applications. We apply several approaches to ensure coverage of the various fields and subject matter that cancer-related innovations encompass. These include drugs, diagnostics, surgical devices, data analytics, and genomic-based inventions. The final dataset consists of roughly 270,000 patent documents spanning the 1976 to 2016 period. Jesse Frumkin, Amanda F. MyersFrumkin, Jesse and Myers, Amanda F., Cancer Moonshot Patent Data (August, 2016). NoneeconomicsData@uspto.govhealth, cancer, drug discovery, biotechnologyThe OCE developed these data files for public use and encourage users to identify fixes and improvements.1976-2016https://bulkdata.uspto.gov/data/patent/cancer/moonshot/2016/cancer_patent_data_doc_v15.docxhttps://console.cloud.google.com/bigquery?p=patents-public-data&d=uspto_cancer&page=dataset&project=sheets-management-31921104/13/2022, 15:00:02
86
e3d20ecd-fa26-4572-9c1f-2b26aa47e15dUCB Fung Institute Patent Data ucb_funghttps://console.cloud.google.com/marketplace/product/google_patents_public_datasets/ucb-fung-patentDrawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate and build an updated database using United States patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted US patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, a novelty measure based on the first appearance of a word in the patent corpus, and an automated co-inventor network mapping tool. Balsmeier, B., Assaf, M., Chesebro, T., Fierro, G., Johnson, K., Johnson, S., Li, G., W.S. Lueck, O’Reagan, D., Yeh, W., Zang, G., Fleming, L.Balsmeier, B., Assaf, M., Chesebro, T., Fierro, G., Johnson, K., Johnson, S., Li, G., W.S. Lueck, O’Reagan, D., Yeh, W., Zang, G., Fleming, L. 
“Machine learning and natural language processing applied to the patent corpus.” Forthcoming at Journal of Economics and Management Strategy.Nonepatents, machine learning, disambiguation, metrics, noveltyCreative Commons Attribution 4.0 International license1976-2016https://funginstitute.berkeley.edu/wp-content/uploads/2016/11/Machine_learning_and_natural_language_processing_on_the_patent_corpus.pdfhttps://console.cloud.google.com/bigquery?p=erudite-marker-539&d=JEMS16&page=dataset https://doi.org/10.1111/jems.12259Country, PatentNo, string_field_1, Geography, AssistExaminer, Type, GovernmentInterests, CPC_Layer_2, LastName, Self_Citation_Flag, sequence, id, int64_field_0, Title, ApplDate, Company, City, InventorID, Abstract, assignee_disambiguated, FirstMiddleName, CPC_Full, PrimaryExaminer, FamilyID, IssueDate, CurrentUse, string_field_2, CPC_Layer_1, ApplNo, PatentNoOrNPL_cited, LawFirm, pdpass, FullName, InventorFullname, PatentNo_citing, FutureUse, Word, Sequence, State, CountryCodeOrNPL_cited04/13/2022, 12:40:04
87
8b8da8ff-2b09-4e1f-9523-c0c549c5cfa1Patent PDF Samples with Extracted Structured Data patent_pdf_sampleshttps://console.cloud.google.com/marketplace/product/global-patents/labeled-patentsThe dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains labels for eleven patent entities (patent inventor, publication date, classification number, patent title, etc.), global properties (US/EU issued, language, invention type), and the location of any figures or schematics on the patent's first page.

The structured data is the result of a data entry operation collecting information from PDF documents, making the dataset a useful testing ground for benchmarking and developing AI/ML systems intended to perform broad document understanding tasks like extraction of structured data from unstructured documents. This dataset can be used to develop and benchmark natural language tasks such as named entity recognition and text classification, AI/ML vision tasks such as image classification and object detection, as well as more general AI/ML tasks such as automated data entry and document understanding. Google is sharing this dataset to support the AI/ML community because there is a shortage of document extraction/understanding datasets shared under an open license.
Google PatentsNoneGoogle Cloud Public Datasets Programmachine learning, OCR, document recognition, benchmarkingCC BY 4.0At sitehttps://console.cloud.google.com/bigquery?p=bigquery-public-data&d=labeled_patents&page=datasetinvention_type, title_line_1, x_relative_min, class_us, issuer, representative_line_1_eu, inventor_line_1, number, language, application_number, gcs_path, filing_date, y_relative_min, publication_date, priority_date_eu, applicant_line_1, x_relative_max, y_relative_max, class_international04/13/2022, 12:40:04
88
fbd6c408-e2b1-4581-8cdb-e1bca46146f7GRID: Global Database of Research Institutes gridhttps://www.grid.ac/GRID is a free and openly available global database of over 100,000 research-related organisations, including healthcare organizations, companies, governments, non-profits, each provided with a unique and persistent identifier. In addition to IDs and names, the data is augmented with locations, addresses, hierarchical structures and much more.

Open IDs such as GeoNames IDs, NUTS3 regions, WikiData IDs, CrossRef Open Funder Registry IDs, ISNI and link to country specific IDs like UCAS codes, UKPRN numbers, HESA codes are used.
Digital Science & Research Solutions LtdNonecontact@grid.ac, Digital Sciencedisambiguation, geography, institutionsCC0 Creative Commons licensehttps://www.grid.ac/pages/policies yeshttps://console.cloud.google.com/bigquery?p=grid-ac&page=table&d=data&t=research_orgs&project=sheets-management-31921104/13/2022, 15:00:02
89
95ed0b8b-1d47-4386-9ff1-6b09028323efTransportation Economics in the 21st Centurytransportation_economicshttps://www.nber.org/research/data/transportation-economics-21st-century-data-resourcesImproving access to data sets related to transportation economics and facilitating research with these datasets are central objectives of this project. Post-doctoral researcher Caitlin Gorback, with advice from a steering committee including Nathaniel Baum-Snow of the University of Toronto, Leah Brooks of George Washington University, Edward Glaeser, Harvard University and NBER, Stephen Redding, Princeton University and NBER, and Matthew Turner of Brown University and NBER, has collected information on a number of data sets that are available from the Department of Transportation (DOT) or that have been created by researchers who have made them available for follow-on study. These data have been organized into several major categories below. The DOT data span a wide range of transportation modes and include information about the transportation infrastructure, the delivery of transportation services, and the demand for these services. This project is supported by the U.S. Department of Transportation through an inter-agency agreement with the National Science Foundation, which has extended a grant to the NBER.NoneCaitlin Gorback, gorback@nber.orggeography, transportation, trade, logistics, infrastructureFri, 03 Dec 2021 13:12:25 GMT
90
9b37a63b-4bfd-43e9-815e-3fd84cd29301NBER Macrohistory Databasenber_macrohistoryhttps://www.nber.org/research/data/nber-macrohistory-databaseDuring the first several decades of its existence, the National Bureau of Economic Research (NBER) assembled an extensive data set that covers all aspects of the pre-WWI and interwar economies, including production, construction, employment, money, prices, asset market transactions, foreign trade, and government activity. Many series are highly disaggregated, and many exist at the monthly or quarterly frequency. The data set has some coverage of the United Kingdom, France and Germany, although it predominantly covers the United States.
Daniel Feenberg, Jeff Miron, NBERNoneDaniel Feenberg (feenberg at nber dot org)
Jeff Miron (jmiron@bu.edu)
data@nber.org
Improving the Accessibility of the NBER's Historical Data, by Daniel Feenberg and Jeff Miron (NBER Working Paper 5186). Published in the Journal of Business and Economic Statistics, Volume 15 Number 3 (July 1997) pages 293-299.Fri, 03 Dec 2021 13:28:26 GMT
91
6520861b-6600-4dcc-9ef2-2f0984283d7cAmerican Business Cycleamerican_business_cyclehttps://www.nber.org/research/data/tables-american-business-cyclePresented here are the tables of quarterly data from Appendix B of "The American Business Cycle: Continuity and Change" Edited by Robert J. Gordon. National Bureau of Economic Research Studies in Business Cycles Volume 25, University of Chicago Press 1986. For information about sources and methods please see that volume.

A feature of that volume is an extensive data appendix, compiled as a project independent of the conference in collaboration with Nathan S. Balke. The unique value of this data set is the fact that it is the only existing source for the pre-1947 quarterly data, as NIPA quarterly data series do not otherwise exist before 1947. These files include the components of GDP back from 1941 to 1919 and the quarterly real GDP back to 1875.
Robert J. Gordon, Nathan S. BalkeNoneDaniel Feenberg (feenberg at nber dot org)"The American Business Cycle: Continuity and Change" Edited by Robert J. Gordon. National Bureau of Economic Research Studies in Business Cycles Volume 25, University of Chicago Press 1986, https://www.nber.org/books-and-chapters/american-business-cycle-continuity-and-changeFri, 03 Dec 2021 13:30:02 GMT
92
5e6dc621-57a3-4374-b558-8b7c8ca3e252Census Block Distance Databasecensus_block_distancehttps://www.nber.org/research/data/block-distance-databaseCensus Block Distances are great-circle distances calculated using the Haversine formula based on internal points in the geographic area.

Census Blocks are from Census 2000 SF1 and Census 2010 SF1 files. Census Blocks "are statistical areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads."
NBERdata@nber.orgpopulation, geographyFri, 03 Dec 2021 15:25:59 GMT
93
602ecd9b-4b5d-45f6-9ee2-16c6d83aeb9fHistorical Cross-Country Technology Adoption (HCCTA) Datasethistorical_cross_countyhttps://www.nber.org/research/data/historical-cross-country-technology-adoption-hccta-datasetThis Historical Cross Country Technology Adoption Dataset is a dataset that was collected to allow for the analysis of the adoption patterns of some of the major technologies introduced in the past 250 years across the World's leading industrialized economies.NBERComin, D. and Hobijn B., "Cross-Country Technological Adoption: Making the Theories Face the Facts". Journal of Monetary Economics, January 2004, pp. 39-83.NoneDiego A. Comin, diego.comin@nyu.edu, Bart Hobijn, bart.hobijn@ny.frb.orggeography, technology, adoption, metricshttps://www.nber.org/hccta/hcctadhelp.pdfComin, D. and Hobijn B., "Cross-Country Technological Adoption: Making the Theories Face the Facts". Journal of Monetary Economics, January 2004, pp. 39-83.Fri, 03 Dec 2021 15:27:30 GMT
94
0ab62e80-2e3a-4289-8abf-0995489f5f0cComputer Retrieval of Information on Scientific Projectscrisphttps://www.nber.org/research/data/computer-retrieval-information-scientific-projectsThe NIH CRISP (Computer Retrieval of Information on Scientific Projects) is a searchable database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions. This dataset has not been updated since 2007, but is relevant to historical research.NBERdata@nber.org1972-1995Fri, 03 Dec 2021 15:37:49 GMT
95
f5c60657-0ea0-4954-8794-ea7ebadca57cCMS's SSA to FIPS CBSA and MSA County Crosswalkcms_ssa_fips_county_crosswalkhttps://data.nber.org/data/cbsa-msa-fips-ssa-county-crosswalk.htmlCMS periodically produces SSA to FIPS CBSA to county crosswalk files. They released a CBSA to MSA to FIPS county crosswalk as well. Some CMS data files have SSA state and county codes or county name rather than FIPS state and county codes. Jean Roth processed the data files below for greater ease of use. NBER, CMSgeography, crosswalk, united states2005-2017yesFri, 03 Dec 2021 23:11:07 GMT
96
ade8e030-cc95-4ea8-a52b-4063688bd02eOpenAlexopenalexhttps://docs.openalex.org/download-snapshotOpenAlex is a free and open catalog of the world's scholarly papers, researchers, journals, and institutions — along with all the ways they're connected to one another. It is maintained by the non-profit OurResearch.MAG, Crossref, OurResearch, Heather Piwowar, Jason PriemNoneinfo@ourresearch.orgcitation, scholarly literatureCC0through 2021https://docs.openalex.org/200GbyesThu, 03 Feb 2022 19:28:50 GMT