The I3 is creating an index of the datasets, code, and methods that innovation researchers use to validate new research data and code. One example is Wetherbee et al.'s U.S. Patent Phrase to Phrase Matching, a standalone validation dataset for benchmarking patent-phrase matching models. Other validation datasets are purpose-built for specific projects and released alongside the datasets they are used to verify; still others are referenced in papers and documentation but are not made public.
This project will index the validation datasets that are available so researchers can use them, point to those that have been created but are not currently public, outline validation methods and tools, and identify validation datasets that do not yet exist but would be useful to the community.