Each row below lists: Stanford, Cornell, and Harvard priority; Timing / Scope; the Requirement; its Category; Source (the "Requirement comes from" column); and Comments (the combined "Questions and comments", Stanford, Cornell, and Harvard comment columns).

Row 2 (Category A. core)
Priority: Stanford --, Cornell highest, Harvard high. Timing: 2017Q1.
Requirement: Take a MARCXML file for one bib record as input; output LD4L Ontology RDF.
Source: Cornell UC1
Comments: For LD4P cataloging/extension work. No expectation of deduping across a collection of records, or of reconciliation. Architecture to assume later extension to different inputs and outputs. Maps to Harvard UC1. Stanford requires conversion to Bibframe 2.0, not the LD4L ontology.

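A minimal sketch of this single-record flow, assuming Python with the standard-library XML parser and rdflib. The BIBFRAME-style Work/Instance shape, the example.org namespace, and the title-only mapping are illustrative assumptions, not the real LD4L Ontology mapping tables:

```python
# Sketch: one MARCXML bib record in, RDF out. The field mapping is a
# placeholder (245 $a only); a real converter applies the full mapping.
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

MARC = "{http://www.loc.gov/MARC21/slim}"
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")  # assumed target vocab
LOCAL = Namespace("http://example.org/")                  # assumed local namespace

def convert_record(record_xml: str) -> Graph:
    record = ET.fromstring(record_xml)  # a single <record> element
    g = Graph()
    g.bind("bf", BF)

    # The 001 control number seeds the locally minted URIs.
    ctrl = record.find(f"{MARC}controlfield[@tag='001']")
    rec_id = (ctrl.text or "unknown").strip() if ctrl is not None else "unknown"
    work, instance = LOCAL[f"work/{rec_id}"], LOCAL[f"instance/{rec_id}"]

    g.add((work, RDF.type, BF.Work))
    g.add((instance, RDF.type, BF.Instance))
    g.add((instance, BF.instanceOf, work))

    # 245 $a -> title label (stand-in for the field-by-field mapping).
    for df in record.iter(f"{MARC}datafield"):
        if df.get("tag") == "245":
            for sf in df.iter(f"{MARC}subfield"):
                if sf.get("code") == "a" and sf.text:
                    g.add((work, RDFS.label, Literal(sf.text.strip())))
    return g
```
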
Row 3 (Category A. core)
Priority: Stanford high, Cornell medium, Harvard medium. Timing: 2017Q2?
Requirement: Deduplicate URIs generated in conversion of a batch of records.
Source: Stanford, Cornell UC5, Harvard UC5
Comments: Understanding of context? Batch or something else? This is a high priority (Cornell) but recognize that it happens after the "high" priorities listed in column B.

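One way to read this requirement, sketched below: after converting a batch, collapse locally minted URIs that share a deduplication key (for example a normalized name plus type). The key function is the hard part and is left as a placeholder assumption here:

```python
# Sketch: rewrite every URI that shares a dedup key to the first URI seen
# for that key, leaving unkeyed URIs untouched.
from rdflib import Graph, URIRef

def dedupe(g: Graph, key_for_uri) -> Graph:
    # key_for_uri(uri) returns a normalized key, or None to leave the URI alone.
    canonical: dict[str, URIRef] = {}
    rewrite: dict[URIRef, URIRef] = {}
    for term in {t for triple in g for t in triple if isinstance(t, URIRef)}:
        key = key_for_uri(term)
        if key is not None:
            rewrite[term] = canonical.setdefault(key, term)

    out = Graph()
    for s, p, o in g:
        out.add((rewrite.get(s, s), p, rewrite.get(o, o)))
    return out
```
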
Row 4 (Category B. source data (input))
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: Convert from MARC (as MARCXML).
Source: Stanford, Cornell UC1

Row 5 (Category B. source data (input))
Priority: Stanford low, Cornell low. Timing: post LD4L Labs.
Requirement: Convert from MODS.
Source: Stanford
Comments: Convert SDR legacy data? SDR metadata was not originally created in MODS; does Stanford need this?

Row 6 (Category B. source data (input))
Priority: Stanford medium, Cornell medium-to-low, Harvard medium. Timing: ?
Requirement: Convert source data provided via POST to a web service (so that the converter can be a function of an editor).
Source: converter discussion at Stanford LD4P core mtg
Comments: SF - depends on expected workflows (whether items are edited on their way into the knowledge store, or after). Would like to hear Stanford's thoughts on this. JAK: for Cornell, seems essential moving forward for a production environment, but not needed for the current LD4P effort.

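A sketch of the web-service mode, assuming Flask and the convert_record() function from the row 2 sketch; the route and port are illustrative:

```python
# Sketch: POST MARCXML to /convert, get Turtle back, so an editor can call
# the converter over HTTP.
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/convert", methods=["POST"])
def convert_endpoint():
    marcxml = request.get_data(as_text=True)
    try:
        g = convert_record(marcxml)   # single-record function sketched earlier
    except Exception as exc:          # malformed input is a client error
        return Response(f"conversion failed: {exc}", status=400)
    return Response(g.serialize(format="turtle"), mimetype="text/turtle")

# e.g.  curl -X POST --data-binary @record.xml http://localhost:5000/convert
```
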
Row 7 (Category B. source data (input))
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Convert directly from a database (converter reads the database).
Source: converter discussion at Stanford LD4P core mtg; Harvard UC2
Comments: Is this a functional requirement? Is there a reason for this beyond "better scaling and efficiency" (a good reason, but not a functional requirement)? SF - may not be a functional requirement for every workflow, but there will be workflows where we can't devote resources to exporting files from a db to be converted periodically. Hoping to explore this with the Harvard Film Archive db.

Row 8 (Category B. source data (input))
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Convert a batch of source data (up to millions of records).
Source: Stanford Conversion Workflow Use Cases – MARC, Cornell UC4, Harvard UC4 and UC5
Comments: Cornell UC4 imagines doing this as input and output files (at first at least). Dependency for original cataloging. How is this different from the first functional requirement around converting a batch of MARCXML? Is the idea that this is non-MARC data, or the idea of scalability to the millions? This is not required for the LD4P Cornell project but is the core of the functionality of the converter, so high.

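For batches in the millions the main constraint is memory, so here is a sketch of a streaming loop (incremental XML parse in, line-oriented N-Triples out), again reusing the assumed convert_record() from the row 2 sketch:

```python
# Sketch: stream a large MARCXML collection file record by record instead of
# loading it whole, and stream N-Triples out.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def convert_batch(collection_path: str, out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        for _event, elem in ET.iterparse(collection_path, events=("end",)):
            if elem.tag == f"{MARC}record":
                g = convert_record(ET.tostring(elem, encoding="unicode"))
                out.write(g.serialize(format="nt"))
                elem.clear()  # free the finished subtree; memory stays flat
```
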
Row 9 (Category B. source data (input))
Priority: Stanford high, Cornell high (ld4l BF ext); low (BF). Timing: 2017Q1.
Requirement: Convert a single record (especially to use as the basis for original cataloging).
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: How does this relate to 7D? Same as row 2, but for BF2.

Row 10 (Category B. source data (input))
Priority: Stanford --, Cornell high. Timing: 2017Q1.
Requirement: Convert even brief records, such as an initial record from acquisitions.
Source: tracer_bullet_workflow_user_stories
Comments: From the converter's point of view, how is this different from converting a "full record"; what does the converter have to do differently? Suggest removing this requirement. How is this a different requirement than converting a full record? If anything, this is facilitated in the former.

Row 11 (Category B. source data (input))
Priority: Stanford --, Cornell low, Harvard high. Timing: 2017Q2.
Requirement: Take one FGDC file as input; output LD4L Ontology RDF.
Source: Cornell UC2
Comments: For LD4P cataloging/extension work. Expect Cornell-Harvard collaboration. Harvard UC3. While FGDC is listed as an example, this is a Harvard requirement rather than a Cornell requirement in actual practice. This is on Harvard's roadmap to work on during 2017Q2, and is dependent on the core converter handling all common MARC/FGDC fields by 4/1.

Row 12 (Category B. source data (input))
Priority: Stanford --, Cornell low, Harvard high. Timing: 2017Q3.
Requirement: Take one HFA file as input; output LD4L Ontology RDF.
Source: Cornell UC2
Comments: For LD4P cataloging/extension work. Expect Cornell-Harvard collaboration. Harvard UC2. I'm unsure why HFA data is a Cornell priority -- we do not have data for the Harvard Film Archive. This is on Harvard's roadmap to work on during 2017Q2.

Row 13 (Category B. source data (input))
Priority: Stanford high, Cornell low, Harvard high.
Requirement: When a single MARC record represents both the print and digital version of a resource, the converter must correctly interpret whether there are 2 works with 1 instance each, or 2 instances of the same work.
Source: Stanford
Comments: Is this part of converter mapping? SF - from Harvard's POV we will have to look at our single-record practice and possibly create a mapping for our local needs. See local tag use cases tab for 856. Cornell generally separates physical and digital bibs aside from known areas (e.g. dissertations).

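Purely as an illustration of where such a decision could hook in: MARC 856 second indicator 1 means "version of resource", which a local profile might read as a second Instance of the same Work. This heuristic is an assumption, not an agreed mapping:

```python
# Sketch: a profile-driven hook deciding how to model a combined
# print + digital record. Real rules would come from local practice.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def model_shape(record: ET.Element) -> str:
    has_online_version = any(
        df.get("tag") == "856" and df.get("ind2") == "1"
        for df in record.iter(f"{MARC}datafield")
    )
    # "1" = version of resource: read here as one Work, two Instances.
    return "one-work-two-instances" if has_online_version else "one-work-one-instance"
```
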
Row 14 (Category C. target data (output))
Priority: Stanford --, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: Convert to LD4L-Ontology.
Source: Stanford LD4P Converter Requirements3, Cornell UC1

Row 15 (Category C. target data (output))
Priority: Stanford high, Cornell medium-to-low, Harvard medium. Timing: 2017Q2?
Requirement: Convert to BF 2.0.
Source: Stanford, https://wiki.duraspace.org/display/ld4lPLAN/Cornell+Converter+Use+Cases#CornellConverterUseCases-UC3
Comments: At the 2016-11 Cornell meeting it was agreed to do the LD4L Ontology first; expect to be able to readily map from the internal representation to BF2 as a change in the output code.

Row 16 (Category C. target data (output))
Priority: Stanford medium.
Requirement: Output triples to include inverses where both subject and object are in the target domain.
Source: discussion in 2017-02-07 converter meeting
Comments: Could also be done as a post-process step.

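A sketch of the post-process variant mentioned in the comment, assuming rdflib; the single inverse pair shown (bf:instanceOf / bf:hasInstance) stands in for whatever list the target ontology supplies:

```python
# Sketch: add (o, inverse, s) whenever a known inverse exists and both ends
# are URIs minted in the target (local) namespace.
from rdflib import Graph, Namespace, URIRef

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
LOCAL = "http://example.org/"                      # assumed target namespace
INVERSES = {BF.instanceOf: BF.hasInstance}         # illustrative inverse table

def add_inverses(g: Graph) -> None:
    for s, p, o in list(g):                        # list(): don't mutate mid-iteration
        inv = INVERSES.get(p)
        if (inv is not None and isinstance(o, URIRef)
                and str(s).startswith(LOCAL) and str(o).startswith(LOCAL)):
            g.add((o, inv, s))
```
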
Row 17 (Category C. target data (output))
Priority: Stanford low, Cornell high-to-medium, Harvard high.
Requirement: Convert to a core ontology with an extension.
Source: Stanford
Comments: Not a tracer-bullet 1 (conversion) requirement.

Row 18 (Category C. target data (output))
Priority: Stanford low, Cornell low.
Requirement: For converter output, you should be able to choose your serialization format, or hook the converter to a serializer (in which case, what format do you want from the converter? do you care?).
Source: Stanford
Comments: SW - conversion between RDF serialization formats is relatively trivial and supported by many off-the-shelf tools. Might be best as a post-process step, although since Java libraries also support different formats it would likely be easy to build in some selection. There are other tools to reserialize RDF downstream if needed.

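Consistent with the comments that this is cheap: with rdflib the selection is a single serialize() call, so building it in or bolting it on downstream are both easy:

```python
# Sketch: serialization choice as a thin wrapper over rdflib.
def serialize(g, fmt: str = "turtle") -> str:
    if fmt not in {"turtle", "nt", "xml", "trig"}:   # formats rdflib ships with
        raise ValueError(f"unsupported serialization: {fmt}")
    return g.serialize(format=fmt)
```
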
Row 19 (Category C. target data (output))
Priority: Stanford low, Cornell low.
Requirement: The converter needs to be configurable to put or post the RDF somewhere (rather than to standard out).
Source: Stanford
Comments: We can deal with stdout.

Row 20 (Category D. pipeline)
Priority: Stanford medium, Cornell medium, Harvard high.
Requirement: Be able to call the converter in various ways (API?) at various pipeline steps, including from an editor-like tool in order to review converter output; call the converter from another process.
Source: converter discussion at Stanford LD4P core mtg
Comments: Do you want the converter to be able to launch another process? SW - could be implemented as a wrapper around this or another converter. Related to row 7.

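A sketch of exposing one conversion function several ways: importable API for an editor-like tool, stdin/stdout CLI for other processes, and HTTP via the web-service sketch at row 6. Module and function names are illustrative:

```python
# Sketch: the same convert_record() callable as a library or from another
# process via a stdin -> stdout CLI.
import sys

def main() -> int:
    g = convert_record(sys.stdin.read())   # function from the row 2 sketch
    sys.stdout.write(g.serialize(format="turtle"))
    return 0

if __name__ == "__main__":
    sys.exit(main())

# As a library:          from converter import convert_record
# From another process:  cat record.xml | python -m converter > record.ttl
```
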
Row 21 (Category D. pipeline)
Priority: Stanford low, Cornell low.
Requirement: Put the output into a triple store.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Related to row 20. Is this in scope for the converter per se, or is this something one does with converter output?

Row 22 (Category E. reconciliation)
Priority: Stanford ?, Cornell low.
Requirement: Reconcile with external identities.
Source: Stanford, Cornell UC5
Comments: Automated only; manual assist out of scope. Stanford questions whether this encompasses all entity lookup. Not sure what this means exactly: is this a post-process, or a converter process via "URI minting"? Reconciliation was a separate step according to our understanding.

Row 23 (Category G. updates)
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Convert newly received data relevant to previously converted records (a MARC record is updated post-conversion).
Source: Stanford, Harvard UC5
Comments: Ideally the converter would convert against something, tell you if it found conflicts, and let you choose what to do; the converter would ignore dupes, not put them in the triple store. If this is a situation that is going to be temporary, need to build the converter for it? Better handled downstream.

Row 24 (Category G. updates)
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Take daily dumps of changed MARCXML records, convert these to the LD4L ontology, and update a triplestore of the complete catalog data.
Source: Cornell UC6
Comments: Also has performance implications. Harvard UC6.

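One plausible shape for this update flow, assuming the triplestore speaks the standard SPARQL 1.1 Update protocol and that each record's triples live in their own named graph (so an update is "drop the graph, re-insert"). The endpoint URL and graph URIs are illustrative:

```python
# Sketch: apply one changed record from a daily dump to the catalog store.
import requests

UPDATE_ENDPOINT = "http://localhost:3030/catalog/update"   # assumed store

def apply_update(rec_id: str, ntriples: str) -> None:
    graph_uri = f"http://example.org/record/{rec_id}"      # one graph per record
    update = (
        f"DROP SILENT GRAPH <{graph_uri}> ;\n"
        f"INSERT DATA {{ GRAPH <{graph_uri}> {{ {ntriples} }} }}"
    )
    requests.post(UPDATE_ENDPOINT, data={"update": update}).raise_for_status()
```

Keeping a named graph per source record is what makes the delete cheap; without it, deciding which old triples to retract is much harder.
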
Row 25 (Category H. provenance tracking)
Priority: Stanford low, Cornell medium.
Requirement: Say what conversion process (profile) was applied.
Source: Stanford

Row 26 (Category H. provenance tracking)
Priority: Stanford low, Cornell low.
Requirement: Say where the data came from.
Source: Stanford
Comments: Before we acquired the data? Or from which catalog the source data for the converter derived?

Row 27 (Category I. error handling)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: If a record can't be converted, the converter skips the record and continues (and reports the skipped records and the reasons for skipping in a convenient format).
Source: Stanford LD4P Converter Requirements3, Cornell UC4
Comments: Based on experience with the converter used in LD4L1.

Row 28 (Category J. reporting)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Report errors with enough detail that someone can fix the problem records (which records were skipped and which fields caused the problem).
Source: Stanford, Cornell UC4

Row 29 (Category J. reporting)
Priority: Stanford low, Cornell medium.
Requirement: On completion of conversion, get performance statistics including: # of records converted; # of records skipped; time elapsed; # of local URIs created; # of dedupes; # of entities (what?); # of triples created; # of errors and what kind of errors; what kind of reconciliation was done, to what sources, how many things were reconciled, how many were queued for human intervention; what didn't convert (at record level or field level?).
Source: Stanford

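A sketch combining rows 27-29: skip bad records, keep going, and return a report carrying a subset of the statistics listed above (the dedupe and reconciliation counters would plug in the same way):

```python
# Sketch: skip-and-continue conversion loop with an end-of-run report.
import time
from dataclasses import dataclass, field

@dataclass
class ConversionReport:
    converted: int = 0
    skipped: int = 0
    triples: int = 0
    elapsed_seconds: float = 0.0
    errors: list[tuple[str, str]] = field(default_factory=list)  # (record id, reason)

def convert_all(records) -> ConversionReport:
    # records: iterable of (record id, MARCXML string) pairs
    report, start = ConversionReport(), time.monotonic()
    for rec_id, record_xml in records:
        try:
            g = convert_record(record_xml)   # single-record sketch from row 2
        except Exception as exc:
            report.skipped += 1              # skip, continue, remember why
            report.errors.append((rec_id, str(exc)))
            continue
        report.converted += 1
        report.triples += len(g)
    report.elapsed_seconds = time.monotonic() - start
    return report
```
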
Row 30 (Category K. configuration)
Priority: Stanford low, Cornell low.
Requirement: At conversion time, specify where the converted data goes (including putting it in a temporary location for review).
Source: Stanford Conversion Workflow Use Cases – MARC

Row 31 (Category K. configuration)
Priority: Stanford medium, Cornell medium.
Requirement: At conversion time, choose a target ontology and, optionally, a target ontology extension.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Question about the ld4l base conversion ontology and its ability to subsequently map to BF2 and/or other ontology extensions - jg.

Row 32 (Category K. configuration)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Specify how the converter will handle local MARC practice.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Are these tags that no one else uses, tags that are used in a particular way locally, or a local preference for how to convert tags that are used in a standard way? SF - likely both; seems to be about the flexibility/extensibility of the converter.

Row 33 (Category K. configuration)
Priority: Stanford high, Cornell high-to-medium.
Requirement: Create a profile, which includes: how to handle private data (and which data is private), how to handle local fields, what target ontology/extension to use, where to put output, ...
Source: Stanford LD4P Converter Requirements3

Row 34 (Category K. configuration)
Priority: Stanford high, Cornell medium, Harvard medium.
Requirement: Save a profile.
Source: Stanford LD4P Converter Requirements3

Row 35 (Category K. configuration)
Priority: Stanford low, Cornell medium, Harvard medium.
Requirement: Share a profile.
Source: Stanford LD4P Converter Requirements3

Row 36 (Category K. configuration)
Priority: Stanford low, Cornell medium, Harvard medium.
Requirement: Another part of a profile is to specify some constant values to be added to all records being converted (some triple statements to create for everything?); examples ...
Source: Stanford
Comments: SF - one use case we have for this is to assert an LDN Receiver for specific types of things.

Row 37 (Category K. configuration)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: URIs are minted in the institution's local namespace.
Source: Stanford
Comments: Expectation that a converter running at any institution will be configured with the local namespace to use for URI minting.

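Rows 32-37 together suggest a profile object roughly like the sketch below. Every key name and value here is an illustrative assumption about what a saved, shareable profile might hold:

```python
# Sketch: one possible shape for a conversion profile.
PROFILE = {
    "name": "stanford-marc-default",
    "target_ontology": "ld4l",                      # or "bibframe-2.0"
    "ontology_extension": None,
    "local_namespace": "https://ld.example.edu/",   # where URIs get minted
    "output": {"destination": "file:///tmp/review/", "format": "turtle"},
    "private_fields": ["541", "583"],               # which data counts as private
    "local_field_rules": {"590": "map-to-note"},    # local MARC practice hooks
    "constant_triples": [                           # asserted for every record
        # e.g. the LDN Receiver use case: point instances at an inbox
        ("<instance>", "http://www.w3.org/ns/ldp#inbox",
         "https://ld.example.edu/inbox"),
    ],
}
```
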
Row 38 (Category L. extensibility)
Priority: Stanford high, Cornell medium, Harvard high.
Requirement: Add new ontologies or ontology extensions later as conversion targets.
Source: Stanford
Comments: Yes, it is a priority (high), but not urgent for the first stage of conversion dev, aside from knowing the converter needs to be extensible.

Row 39 (Category M. non-functional)
Priority: Stanford low, Cornell --.
Requirement: Performance specs: convert 10 million records in less than one week.
Source: converter discussion at Stanford LD4P core mtg

Row 40 (Category M. non-functional)
Priority: Stanford low, Cornell --.
Requirement: Run multiple conversions in parallel (i.e. the converter is not tied up with one job).
Source: Stanford

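A sketch of the non-blocking behavior with a process pool, reusing the assumed convert_batch() from the row 8 sketch; each job is an independent conversion:

```python
# Sketch: several conversion jobs at once, so one batch can't tie up the converter.
from concurrent.futures import ProcessPoolExecutor

def _run_job(args: tuple[str, str]) -> None:
    collection_path, out_path = args
    convert_batch(collection_path, out_path)   # batch sketch from row 8

def convert_many(jobs: list[tuple[str, str]]) -> None:
    with ProcessPoolExecutor() as pool:
        list(pool.map(_run_job, jobs))         # drain to surface any exceptions
```
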
Row 41 (Category M. non-functional)
Priority: Stanford high, Cornell high. Timing: ongoing.
Requirement: Documentation for users and for developers.
Source: Stanford LD4P Converter Requirements3
Comments: SW - documentation should not be a separate task; document as features are added. Might need a specific effort around the end of the grant to tidy up, though.

Row 42 (Category N. bluesky)
Priority: Stanford low, Cornell low-to-no. Timing: post LD4L Labs.
Requirement: Nice to have: the converter chooses an ontology and extension for you (on a record-by-record basis?) based on the source data (and a set of configurable rules).
Source: converter discussion at Stanford LD4P core mtg

Row 43 (Category N. bluesky)
Priority: Stanford low, Cornell low.
Requirement: What happens to previously converted records if/when the ontology you are using gets updated?
Source: tracer_bullet_workflow_user_stories
Comments: Ontology change management (cf. Javed's work); probably out of scope for this grant. Seems like less of a converter issue and more of a long-term operations workflow, procedure, and infrastructure question.

Row 44 (Category N. bluesky)
Priority: Stanford low, Cornell low.
Requirement: GUI to set configurations (anticipating the need for non-programmer interactions).
Source: Harvard

Row 45 (Category N. bluesky)
Priority: Stanford low, Cornell --.
Requirement: Converter recognition of uploaded target ontology/extensions.
Source: Harvard
Comments: SF - related to "add new ontologies or ontology extensions later as conversion targets", but wasn't sure if that meant adding code that specified the ontology classes or properties, or having the ontology "loaded" into the converter and referenced in the profile. Confused why this asks about the ontology (in converter context) rather than the mapping between source data and output ontology... does the ontology need to be loaded?