Fields: uuid, title, shortname, location, description, documentation, contributors, citation, terms_of_use, maintained_by, tags, associated_datasets, associated_papers, last_edit

uuid: 23999351-4c68-4e28-aec2-9b16e18e4d9c
title: Automated Patent Landscaping
shortname: patent_landscaping
location: https://github.com/google/patents-public-data/tree/master/models/landscaping
description: Patent landscaping is the process of finding patents related to a particular topic. It is important for companies, investors, governments, and academics seeking to gauge innovation and assess risk. However, there is no broadly recognized best approach to landscaping. Frequently, patent landscaping is a bespoke human-driven process that relies heavily on complex queries over bibliographic patent databases. This tool can be used to perform Automated Patent Landscaping, an approach that jointly leverages human domain expertise, heuristics based on patent metadata, and machine learning to generate high-quality patent landscapes with minimal effort.
documentation: https://github.com/google/patents-public-data/tree/master/models/landscaping
contributors: Google Patents, Aaron Abood, Dave Feltenberger
terms_of_use: http://www.apache.org/licenses/LICENSE-2.0
tags: machine learning, patent landscaping, citation
associated_papers: https://link.springer.com/content/pdf/10.1007%2Fs10506-018-9222-4.pdf
last_edit: Wed, 04 May 2022 11:04:06 GMT

uuid: 2af204fa-8074-44a1-8137-f0b605d97c68
title: Claim Text Extraction
shortname: claim_text_extraction
location: https://github.com/google/patents-public-data/blob/master/examples/claim-text/claim_text_extraction.ipynb
description: Imagine you're analyzing a subset of patents and want to do some text analysis of the first independent claim. To do this, you'd need to be able to join your list of patent publication numbers with a dataset containing the patent text. Additionally, you'd need a method to extract the first claim from the rest of the claims. This notebook is a demonstration of one method to perform this analysis using Python, BigQuery, and Google's new public dataset on patents.
contributors: Google Patents, Otto Stegmaier, MtDersvan
terms_of_use: http://www.apache.org/licenses/LICENSE-2.0
tags: machine learning
last_edit: Fri, 03 Dec 2021 18:45:21 GMT
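
The extraction step described above can be sketched without BigQuery access: the sample text, regex, and claim numbering convention below are illustrative assumptions, not the notebook's exact code, and the commented SQL should be checked against the current public patents dataset schema.

```python
import re

# Claim text would come from BigQuery in the notebook, e.g. (verify the schema):
#   SELECT publication_number, claims_localized[SAFE_OFFSET(0)].text AS claims
#   FROM `patents-public-data.patents.publications`
#   WHERE publication_number IN (...)

def first_independent_claim(claims_text: str) -> str:
    """Split numbered claims ("1. ...", "2. ...") and return the first one.

    A heuristic sketch: assumes claim numbers appear at line starts and
    that claim 1 is independent (usually, but not always, true).
    """
    # Split on claim numbers at the start of a line, e.g. "\n2."
    parts = re.split(r"\n\s*\d+\s*\.", "\n" + claims_text)
    # re.split leaves an empty leading element before the first match
    claims = [p.strip() for p in parts if p.strip()]
    return claims[0] if claims else ""

sample = ("1. A widget comprising a frame.\n"
          "2. The widget of claim 1, wherein the frame is metal.")
print(first_independent_claim(sample))  # -> A widget comprising a frame.
```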

uuid: 87ef4394-8339-453f-b1e8-5715f68dd0fd
title: Claim Breadth Model
shortname: claim_breadth_model
location: https://github.com/google/patents-public-data/blob/master/models/claim_breadth/README.md
description: We demonstrate a machine learning (ML) based approach to estimating claim breadth, which has the ability to capture more nuance than a simple word count model. While our approach may be an improvement over simpler methods, it is still imperfect and does not account for any semantic meaning within the text of the claim. This is not intended to be a recommendation on how to measure claim breadth; instead we aim to spark academic and corporate interest in using the large amounts of public patent data in BigQuery to further the state of the art in patent research.
documentation: https://cloud.google.com/blog/products/ai-machine-learning/measuring-patent-claim-breadth-using-google-patents-public-datasets
contributors: Google Patents, Otto Stegmaier, Vihang Mehta, Darío Hereñú
terms_of_use: http://www.apache.org/licenses/LICENSE-2.0
tags: machine learning, claim breadth, classification
last_edit: Fri, 01 Dec 2023 12:21:25 GMT

uuid: db1c19b5-a1b3-4a49-9fce-583c0b522d9f
title: Citation Chaser
shortname: citation_chaser
location: https://github.com/nealhaddaway/citationchaser
description: In systematic reviews, we often want to obtain lists of references from across studies: forward citation chasing looks for all records citing one or more articles of known relevance; backward citation chasing looks for all records referenced in one or more articles. This package contains functions to automate the process of forward and backward citation chasing by making use of the Lens.org API. An input article list can be used to return a list of all referenced records, and/or all citing records in the Lens.org database (consisting of PubMed, PubMed Central, CrossRef, Microsoft Academic Graph and CORE; 'https://www.lens.org').
contributors: Neal Haddaway
citation: Haddaway, N.R., Grainger, M.J., Gray, C.T. 2021. citationchaser: An R package and Shiny app for forward and backward citations chasing in academic searching.
terms_of_use: MIT License
tags: citation, reviews, lens
last_edit: Wed, 01 Dec 2021 19:24:38 GMT

uuid: d5e6e419-faf0-4672-bb87-0da1cb8dfa35
title: Frictionless Framework
shortname: frictionless_framework
location: https://framework.frictionlessdata.io/
description: Frictionless is a framework to describe, extract, validate, and transform tabular data, available as a Python library. It supports working with data in a standardised and reproducible way by improving data quality and consistency.
documentation: https://framework.frictionlessdata.io/docs/guides/introduction/
contributors: frictionless data
terms_of_use: MIT License
tags: reproducibility

uuid: 20d46742-3c4c-4563-90f3-ec3e5ebeb0b8
title: OpenRefine
shortname: openrefine
location: https://openrefine.org/
description: OpenRefine is a desktop application that uses your web browser as a graphical interface. It is described as “a power tool for working with messy data”. OpenRefine is most useful where you have data in a simple tabular format such as a spreadsheet, a comma separated values file (csv), or a tab delimited file (tsv), but with internal inconsistencies either in data formats, or where data appears, or in terminology used. OpenRefine can be used to standardize and clean data across your file, as well as to perform more complex operations, including entity reconciliation against external APIs.
documentation: https://openrefine.org/documentation.html
contributors: OpenRefine
tags: data cleaning

uuid: a4d92bb0-baa8-4b85-a439-52b18e6c3c6b
title: Mediawiki Citation API
shortname: citoid
location: https://en.wikipedia.org/api/rest_v1/#/Citation/getCitation
description: Citoid is an auto-filled citation generator which automatically creates a citation template from online sources based on a URL or academic reference identifiers like DOIs, PMIDs, PMCIDs, and ISBNs. Mediawiki hosts a citoid API which can be called directly; alternatively, the code is open source and can be run locally or on a server -- it uses Zotero's translation servers.
documentation: https://www.mediawiki.org/wiki/Citoid
contributors: Mediawiki, Zotero
terms_of_use: API: limit client to no more than 200 requests/sec. Code: http://www.apache.org/licenses/LICENSE-2.0
tags: citation
last_edit: Wed, 01 Dec 2021 19:24:42 GMT
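
A hedged sketch of composing a request to the citation endpoint: the `/data/citation/{format}/{query}` path shape follows the Wikimedia REST API documentation linked above, but verify it against the current spec, and keep within the 200 requests/sec limit. The DOI used is just the Automated Patent Landscaping paper from earlier in this list.

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"

def citoid_url(identifier: str, fmt: str = "mediawiki") -> str:
    """Build a citation-lookup URL for a DOI, PMID, PMCID, ISBN, or URL.

    `fmt` selects the output format (e.g. "mediawiki", "zotero", "bibtex").
    """
    # Identifiers may contain "/" (DOIs), so percent-encode everything.
    return f"{BASE}/{fmt}/{quote(identifier, safe='')}"

url = citoid_url("10.1007/s10506-018-9222-4")
print(url)
# Fetch with any HTTP client, e.g.:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
```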

uuid: c24e498a-5b03-4daa-8d36-874e00a41f08
title: PatentsView API
shortname: patentsview_api
location: https://patentsview.org/apis/api-query-language
description: The PatentsView platform is built on a newly developed database that longitudinally links inventors, organizations, locations, and patenting activity since 1976. The data visualization tool, query tool, and flexible API enable a broad spectrum of users to examine the dynamics of inventor patenting activity over time and space. These tools also permit users to explore technology categories, assignees, citation patterns, and co-inventor networks.
terms_of_use: Currently no key is necessary to access the PatentsView API. However, we reserve the right to halt excessive usage of the API. Users are free to use, share, or adapt the material for any purpose, subject to the standards of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Attribution should be given to PatentsView (www.patentsview.org) for use, distribution, or derivative works.
tags: disambiguation, entity reconciliation
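
A minimal sketch of composing a query string in the PatentsView query language (JSON operators such as `_gte` and `_text_any`, per the documentation linked above). The endpoint shown is the historical unkeyed one; PatentsView has since introduced a keyed Search API, so check the current docs before use, and treat the field names here as illustrative.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://api.patentsview.org/patents/query"  # legacy endpoint

# Patents granted since 2010 whose title mentions machine learning.
q = {"_and": [
    {"_gte": {"patent_date": "2010-01-01"}},
    {"_text_any": {"patent_title": "machine learning"}},
]}
# Fields to return for each matching patent.
f = ["patent_number", "patent_title", "inventor_last_name"]

url = ENDPOINT + "?" + urlencode({"q": json.dumps(q), "f": json.dumps(f)})
print(url)
```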

uuid: c6b61a07-2fd6-426d-99e6-2b825b98d102
title: Grobid
shortname: grobid
location: https://github.com/kermitt2/grobid
description: GROBID (or Grobid, but not GroBid nor GroBiD) means GeneRation Of BIbliographic Data. GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents such as PDF into structured XML/TEI encoded documents, with a particular focus on technical and scientific publications. GROBID should run properly "out of the box" on Linux (32 and 64 bits) and macOS.
documentation: https://grobid.readthedocs.io/en/latest/Introduction/
contributors: The main author is Patrice Lopez (patrice.lopez@science-miner.com). Core committers and maintenance: Patrice Lopez (science-miner) and Luca Foppiano (NIMS).
citation:
@misc{GROBID,
  title = {GROBID},
  howpublished = {\url{https://github.com/kermitt2/grobid}},
  publisher = {GitHub},
  year = {2008--2021},
  archivePrefix = {swh},
  eprint = {1:dir:dab86b296e3c3216e2241968f0d63b68e8209d3c}
}
terms_of_use: GROBID is distributed under the Apache 2.0 license.
maintained_by: Patrice Lopez, info@science-miner.com
last_edit: Fri, 01 Dec 2023 12:42:11 GMT

uuid: fd1340fe-fcae-4c59-9c5e-5e1bbbc19592
title: CiteSpace
shortname: citespace
location: https://citespace.podia.com
description: CiteSpace generates interactive visualizations of structural and temporal patterns and trends of a scientific field. It facilitates a systematic review of a knowledge domain through an in-depth visual analytic process. It can process citation data from popular sources such as the Web of Science, Scopus, Dimensions, and the Lens. CiteSpace also supports basic visual analytic functions for datasets without citation-related information, for example, PubMed, CNKI, ProQuest Dissertations and Theses. CiteSpace reveals how a field of research has evolved, what intellectual turning points are evident along a critical path, and what topics have attracted attention. CiteSpace can be applied repeatedly so as to track the development of a field closely and extensively.
documentation: http://cluster.ischool.drexel.edu/~cchen/citespace/tutorial/
contributors: Chaomei Chen
last_edit: Tue, 30 Nov 2021 17:24:01 GMT

uuid: 1809b659-d1e1-43db-8dbe-664a6e9a5bc0
title: Google Patents match API
shortname: google_patents_match
location: https://patents.google.com/api/match
description: Resolves messy patent publication and application numbers to the DOCDB publication number format.
documentation: https://patents.google.com/api/match
contributors: Google Patents
tags: entity reconciliation
last_edit: Fri, 01 Dec 2023 12:19:54 GMT

uuid: b8c70382-7b6f-43b2-a6c0-c788e970e99e
title: Google BERT for Patents
shortname: bert_for_patents
location: https://github.com/google/patents-public-data/blob/master/models/BERT%20for%20Patents.md
description: A BERT (bidirectional encoder representation from transformers) model pretrained on over 100 million patent publications from the U.S. and other countries using open-source tooling. The trained model can be used for a number of use cases, including how to more effectively perform prior art searching to determine the novelty of a patent application, automatically generate classification codes to assist with patent categorization, and autocomplete.
documentation: https://github.com/google/patents-public-data/blob/master/examples/BERT_For_Patents.ipynb
contributors: Google Patents, Rob Srebrovic, Jay Yonamine
citation: https://cloud.google.com/blog/products/ai-machine-learning/how-ai-improves-patent-analysis
terms_of_use: http://www.apache.org/licenses/LICENSE-2.0
tags: classification, novelty, machine learning
last_edit: Fri, 01 Dec 2023 12:20:34 GMT

uuid: 3aa314f5-20eb-4e21-96e8-d1f28e8dd51c
title: Cooperative Patent Classification Scheme
shortname: cooperative_patent_classification
location: https://www.cooperativepatentclassification.org/about
description: CPC is the outcome of an ambitious harmonization effort to bring the best practices from the EPO and USPTO together. In fact, most U.S. patent documents are already classified in ECLA. The conversion from ECLA to CPC at the EPO will ensure IPC compliance and eliminate the need for the EPO to classify U.S. patent documents. At the USPTO, the conversion will provide an up-to-date classification system that is internationally compatible.
documentation: https://www.cooperativepatentclassification.org/cpcSchemeAndDefinitions
maintained_by: Collaboration between the EPO and USPTO
tags: classification
last_edit: Fri, 01 Dec 2023 12:20:30 GMT

uuid: 6ba552a7-ec31-4710-9d8b-d8177b293a90
title: Tools for Harmonizing County Boundaries
shortname: harmonising_county_boundaries
location: https://elisabethperlman.net/code.html
description: This tool creates the csv tables that allow county boundaries to be synchronized to a base year, exported to the directory you run it from. While this code takes shapefiles of any type and performs an intersect, it was written to follow the method used in Hornbeck (2010) (see https://www.dropbox.com/s/1cygkeoo4p89vrw/BWreplication_BorderFixes.rar for those replication files); that is to say, it takes shapefiles of US counties from NHGIS for a selection of years and reapportions them by area to the boundaries as they were in a base year. The Stata code that uses these csvs was written to be used with Haines' census data (ICPSR 02896).
contributors: Bitsy Perlman
tags: geography
last_edit: Thu, 02 Dec 2021 13:37:42 GMT

uuid: 98efee7f-c66a-4e8d-bb15-d23709698dbd
title: Manual of Patent Examining Procedure
shortname: manual_patent_examining
location: https://www.uspto.gov/web/offices/pac/mpep/
description: This Manual is published to provide U.S. Patent and Trademark Office (USPTO) patent examiners, applicants, attorneys, agents, and representatives of applicants with a reference work on the practices and procedures relative to the prosecution of patent applications and other proceedings before the USPTO. For example, the Manual contains instructions to examiners, as well as other material in the nature of information and interpretation, and outlines the current procedures which the examiners are required or authorized to follow in appropriate cases in the normal examination of a patent application. The Manual does not have the force of law or the force of the rules in Title 37 of the Code of Federal Regulations.
tags: classification
last_edit: Fri, 01 Dec 2023 12:20:43 GMT

uuid: 9d6d4e5a-5c8d-486a-b9bd-dc1f0485041f
title: Wellcome Trust data tools
shortname: wellcome_trust_grants
location: https://github.com/wellcometrust
description: Machine learning tools and other scripts that the Wellcome Trust uses to analyze and visualize grant proposals and outcomes from their public data.
tags: machine learning
last_edit: Fri, 01 Dec 2023 12:20:56 GMT

uuid: 5615d902-2dfe-4f8b-8205-df0b0b33ce08
title: Octimine
shortname: octimine
location: https://www.dennemeyer.com/octimine/
description: Machine-learning based patent search and semantic analysis tool.
tags: semantic analysis
last_edit: Wed, 28 Jun 2023 18:35:05 GMT

uuid: 0a68e139-4068-4b2f-a262-ce7124c6cf73
title: Prodigy
shortname: prodigy
location: https://prodi.gy/
description: Prodigy is a scriptable annotation tool used for creating new machine learning datasets.
tags: annotation
last_edit: Wed, 28 Jun 2023 18:34:55 GMT

uuid: 9519fa86-2fa6-4600-9c10-06ceef41f423
title: Biblio-glutton
shortname: biblio-glutton
location: https://github.com/kermitt2/biblio-glutton
description: Framework dedicated to bibliographic information. It includes:
-- a bibliographical reference matching service: from an input such as a raw bibliographical reference and/or a combination of key metadata, the service returns the disambiguated bibliographical object, in particular its DOI and a set of metadata aggregated from Crossref and other sources;
-- a fast metadata look-up service: from a "strong" identifier such as a DOI, PMID, etc., the service returns a set of metadata aggregated from Crossref and other sources;
-- various mappings between DOI, PMID, PMC, ISTEX ID, and ark, integrated in the bibliographical service;
-- an Open Access resolver: integration of Open Access links via the Unpaywall dataset from Impactstory;
-- gap and daily updates for Crossref resources (via the Crossref REST API), so that your glutton data service always stays in sync with Crossref;
-- MeSH class mapping for PubMed articles.
documentation: https://github.com/kermitt2/biblio-glutton
contributors: Patrice Lopez
maintained_by: Patrice Lopez, info@science-miner.com
tags: citation, metadata, identifiers, mapping, entity reconciliation
last_edit: Fri, 01 Dec 2023 12:42:05 GMT
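
The metadata look-up service described above can be sketched as follows. The host and port are assumptions for a local deployment, and the `/service/lookup` route follows the project README; verify both against your own instance before use.

```python
from urllib.parse import urlencode

GLUTTON = "http://localhost:8080/service/lookup"  # assumed local deployment

def lookup_url(**identifiers) -> str:
    """Build a look-up URL from strong identifiers, e.g. doi=, pmid=, pmc=."""
    return GLUTTON + "?" + urlencode(identifiers)

print(lookup_url(doi="10.1007/s10506-018-9222-4"))
# -> http://localhost:8080/service/lookup?doi=10.1007%2Fs10506-018-9222-4
```

The service responds with aggregated JSON metadata for the matched record; the same route accepts a raw `biblio=` reference string for fuzzy matching.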

uuid: 4a3152a3-ec06-4aa9-b92d-eb12779dfdae
title: Logic Mill
shortname: logic-mill
location: https://logic-mill.net/
description: Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains.
documentation: https://github.com/max-planck-innovation-competition/logic-mill
contributors: Sebastian Erhardt, Mainak Ghosh, Erik Buunk, Michael E. Rose, Dietmar Harhoff
citation:
@misc{erhardt2022logic,
  title={Logic Mill - A Knowledge Navigation System},
  author={Sebastian Erhardt and Mainak Ghosh and Erik Buunk and Michael E. Rose and Dietmar Harhoff},
  year={2022},
  eprint={2301.00200},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
If you use the Logic Mill system, please cite our paper: https://doi.org/10.48550/arXiv.2301.00200
maintained_by: Max Planck Institute for Innovation and Competition, email: team@logic-mill.net
tags: semantic analysis
associated_papers: https://arxiv.org/pdf/2301.00200.pdf
last_edit: Fri, 01 Dec 2023 12:42:49 GMT

uuid: 3434d838-d220-420a-a23e-ced7492528d3
title: AIMixDetect: detect mixed authorship of a language model (LM) and humans
shortname: ai_mix_detect
location: https://github.com/idankash/AIMixDetect
description: This replication package is designed to guide you through the process of replicating the results presented in our paper. The data used in this research was generated using GPT-3.5-turbo (ChatGPT) and is organized into five distinct datasets, which are located in the dataset folder. To facilitate the reading and parsing of these datasets, a script named parse_article.py is provided.
contributors: Alon Kipnis, Idan Kashtan
citation:
@article{Kashtan2024Information,
  author = {Kashtan, Idan and Kipnis, Alon},
  journal = {Harvard Data Science Review},
  number = {Special Issue 5},
  year = {2024},
  month = {aug 15},
  note = {https://hdsr.mitpress.mit.edu/pub/f90vid3h},
  publisher = {The MIT Press},
  title = {{An} {Information}-{Theoretic} {Approach} for {Detecting} {Edits} in {AI}-{Generated} {Text}},
}
maintained_by: idankashtan11@gmail.com
last_edit: Fri, 30 Aug 2024 09:24:59 GMT