Each row below lists: Stanford, Cornell, and Harvard priority; Timing / Scope; the Requirement; its Category; Source (the "Requirement comes from" column); and Comments (the combined "Questions and comments", Stanford, Cornell, and Harvard comment columns).

Row 2 (Category A. core)
Priority: Stanford --, Cornell highest, Harvard high. Timing: 2017Q1.
Requirement: Take a MARCXML file for one bib record as input; output LD4L Ontology RDF.
Source: Cornell UC1
Comments: For LD4P cataloging/extension work. No expectation of deduping across a collection of records, or of reconciliation. Architecture to assume later extension to different inputs and outputs. Maps to Harvard UC1. Stanford requires conversion to Bibframe 2.0, not the LD4L ontology.

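A minimal sketch of this single-record flow, assuming Python with the standard-library XML parser and rdflib. The BIBFRAME-style Work/Instance shape, the example.org namespace, and the title-only mapping are illustrative assumptions, not the real LD4L Ontology mapping tables:

```python
# Sketch: one MARCXML bib record in, RDF out. The field mapping is a
# placeholder (245 $a only); a real converter applies the full mapping.
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

MARC = "{http://www.loc.gov/MARC21/slim}"
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")  # assumed target vocab
LOCAL = Namespace("http://example.org/")                  # assumed local namespace

def convert_record(record_xml: str) -> Graph:
    record = ET.fromstring(record_xml)  # a single <record> element
    g = Graph()
    g.bind("bf", BF)

    # The 001 control number seeds the locally minted URIs.
    ctrl = record.find(f"{MARC}controlfield[@tag='001']")
    rec_id = (ctrl.text or "unknown").strip() if ctrl is not None else "unknown"
    work, instance = LOCAL[f"work/{rec_id}"], LOCAL[f"instance/{rec_id}"]

    g.add((work, RDF.type, BF.Work))
    g.add((instance, RDF.type, BF.Instance))
    g.add((instance, BF.instanceOf, work))

    # 245 $a -> title label (stand-in for the field-by-field mapping).
    for df in record.iter(f"{MARC}datafield"):
        if df.get("tag") == "245":
            for sf in df.iter(f"{MARC}subfield"):
                if sf.get("code") == "a" and sf.text:
                    g.add((work, RDFS.label, Literal(sf.text.strip())))
    return g
```
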
Row 3 (Category A. core)
Priority: Stanford high, Cornell medium, Harvard medium. Timing: 2017Q2?
Requirement: Deduplicate URIs generated in conversion of a batch of records.
Source: Stanford, Cornell UC5, Harvard UC5
Comments: Understanding of context? Batch or something else? This is a high priority (Cornell) but recognize that it happens after the "high" priorities listed in column B.

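One way to read this requirement, sketched below: after converting a batch, collapse locally minted URIs that share a deduplication key (for example a normalized name plus type). The key function is the hard part and is left as a placeholder assumption here:

```python
# Sketch: rewrite every URI that shares a dedup key to the first URI seen
# for that key, leaving unkeyed URIs untouched.
from rdflib import Graph, URIRef

def dedupe(g: Graph, key_for_uri) -> Graph:
    # key_for_uri(uri) returns a normalized key, or None to leave the URI alone.
    canonical: dict[str, URIRef] = {}
    rewrite: dict[URIRef, URIRef] = {}
    for term in {t for triple in g for t in triple if isinstance(t, URIRef)}:
        key = key_for_uri(term)
        if key is not None:
            rewrite[term] = canonical.setdefault(key, term)

    out = Graph()
    for s, p, o in g:
        out.add((rewrite.get(s, s), p, rewrite.get(o, o)))
    return out
```
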
Row 4 (Category B. source data (input))
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: Convert from MARC (as MARCXML).
Source: Stanford, Cornell UC1

Row 5 (Category B. source data (input))
Priority: Stanford low, Cornell low. Timing: post LD4L Labs.
Requirement: Convert from MODS.
Source: Stanford
Comments: Convert SDR legacy data? SDR metadata was not originally created in MODS; does Stanford need this?

Row 6 (Category B. source data (input))
Priority: Stanford medium, Cornell medium-to-low, Harvard medium. Timing: ?
Requirement: Convert source data provided via POST to a web service (so that the converter can be a function of an editor).
Source: converter discussion at Stanford LD4P core mtg
Comments: SF - depends on expected workflows (whether items are edited on their way into the knowledge store, or after). Would like to hear Stanford's thoughts on this. JAK: for Cornell, seems essential moving forward for a production environment, but not needed for the current LD4P effort.

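A sketch of the web-service mode, assuming Flask and the convert_record() function from the row 2 sketch; the route and port are illustrative:

```python
# Sketch: POST MARCXML to /convert, get Turtle back, so an editor can call
# the converter over HTTP.
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/convert", methods=["POST"])
def convert_endpoint():
    marcxml = request.get_data(as_text=True)
    try:
        g = convert_record(marcxml)   # single-record function sketched earlier
    except Exception as exc:          # malformed input is a client error
        return Response(f"conversion failed: {exc}", status=400)
    return Response(g.serialize(format="turtle"), mimetype="text/turtle")

# e.g.  curl -X POST --data-binary @record.xml http://localhost:5000/convert
```
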
Row 7 (Category B. source data (input))
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Convert directly from a database (converter reads the database).
Source: converter discussion at Stanford LD4P core mtg; Harvard UC2
Comments: Is this a functional requirement? Is there a reason for this beyond "better scaling and efficiency" (a good reason, but not a functional requirement)? SF - may not be a functional requirement for every workflow, but there will be workflows where we can't devote resources to exporting files from a db to be converted periodically. Hoping to explore this with the Harvard Film Archive db.

Row 8 (Category B. source data (input))
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Convert a batch of source data (up to millions of records).
Source: Stanford Conversion Workflow Use Cases – MARC, Cornell UC4, Harvard UC4 and UC5
Comments: Cornell UC4 imagines doing this as input and output files (at first at least). Dependency for original cataloging. How is this different from the first functional requirement around converting a batch of MARCXML? Is the idea that this is non-MARC data, or the idea of scalability to the millions? This is not required for the LD4P Cornell project but is the core of the functionality of the converter, so high.

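For batches in the millions the main constraint is memory, so here is a sketch of a streaming loop (incremental XML parse in, line-oriented N-Triples out), again reusing the assumed convert_record() from the row 2 sketch:

```python
# Sketch: stream a large MARCXML collection file record by record instead of
# loading it whole, and stream N-Triples out.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def convert_batch(collection_path: str, out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        for _event, elem in ET.iterparse(collection_path, events=("end",)):
            if elem.tag == f"{MARC}record":
                g = convert_record(ET.tostring(elem, encoding="unicode"))
                out.write(g.serialize(format="nt"))
                elem.clear()  # free the finished subtree; memory stays flat
```
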
Row 9 (Category B. source data (input))
Priority: Stanford high, Cornell high (ld4l BF ext); low (BF). Timing: 2017Q1.
Requirement: Convert a single record (especially to use as the basis for original cataloging).
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: How does this relate to 7D? Same as row 2, but for BF2.

Row 10 (Category B. source data (input))
Priority: Stanford --, Cornell high. Timing: 2017Q1.
Requirement: Convert even brief records, such as an initial record from acquisitions.
Source: tracer_bullet_workflow_user_stories
Comments: From the converter's point of view, how is this different from converting a "full record"; what does the converter have to do differently? Suggest removing this requirement. How is this a different requirement than converting a full record? If anything, this is facilitated in the former.

Row 11 (Category B. source data (input))
Priority: Stanford --, Cornell low, Harvard high. Timing: 2017Q2.
Requirement: Take one FGDC file as input; output LD4L Ontology RDF.
Source: Cornell UC2
Comments: For LD4P cataloging/extension work. Expect Cornell-Harvard collaboration. Harvard UC3. While FGDC is listed as an example, this is a Harvard requirement rather than a Cornell requirement in actual practice. This is on Harvard's roadmap to work on during 2017Q2, and is dependent on the core converter handling all common MARC/FGDC fields by 4/1.

Row 12 (Category B. source data (input))
Priority: Stanford --, Cornell low, Harvard high. Timing: 2017Q3.
Requirement: Take one HFA file as input; output LD4L Ontology RDF.
Source: Cornell UC2
Comments: For LD4P cataloging/extension work. Expect Cornell-Harvard collaboration. Harvard UC2. I'm unsure why HFA data is a Cornell priority -- we do not have data for the Harvard Film Archive. This is on Harvard's roadmap to work on during 2017Q2.

Row 13 (Category B. source data (input))
Priority: Stanford high, Cornell low, Harvard high.
Requirement: When a single MARC record represents both the print and digital version of a resource, the converter must correctly interpret whether there are 2 works with 1 instance each, or 2 instances of the same work.
Source: Stanford
Comments: Is this part of converter mapping? SF - from Harvard's POV we will have to look at our single-record practice and possibly create a mapping for our local needs. See local tag use cases tab for 856. Cornell generally separates physical and digital bibs aside from known areas (e.g. dissertations).

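Purely as an illustration of where such a decision could hook in: MARC 856 second indicator 1 means "version of resource", which a local profile might read as a second Instance of the same Work. This heuristic is an assumption, not an agreed mapping:

```python
# Sketch: a profile-driven hook deciding how to model a combined
# print + digital record. Real rules would come from local practice.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def model_shape(record: ET.Element) -> str:
    has_online_version = any(
        df.get("tag") == "856" and df.get("ind2") == "1"
        for df in record.iter(f"{MARC}datafield")
    )
    # "1" = version of resource: read here as one Work, two Instances.
    return "one-work-two-instances" if has_online_version else "one-work-one-instance"
```
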
Row 14 (Category C. target data (output))
Priority: Stanford --, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: Convert to LD4L-Ontology.
Source: Stanford LD4P Converter Requirements3, Cornell UC1

Row 15 (Category C. target data (output))
Priority: Stanford high, Cornell medium-to-low, Harvard medium. Timing: 2017Q2?
Requirement: Convert to BF 2.0.
Source: Stanford, https://wiki.duraspace.org/display/ld4lPLAN/Cornell+Converter+Use+Cases#CornellConverterUseCases-UC3
Comments: At the 2016-11 Cornell meeting it was agreed to do the LD4L Ontology first; expect to be able to readily map from the internal representation to BF2 as a change in the output code.

Row 16 (Category C. target data (output))
Priority: Stanford medium.
Requirement: Output triples to include inverses where both subject and object are in the target domain.
Source: discussion in 2017-02-07 converter meeting
Comments: Could also be done as a post-process step.

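A sketch of the post-process variant mentioned in the comment, assuming rdflib; the single inverse pair shown (bf:instanceOf / bf:hasInstance) stands in for whatever list the target ontology supplies:

```python
# Sketch: add (o, inverse, s) whenever a known inverse exists and both ends
# are URIs minted in the target (local) namespace.
from rdflib import Graph, Namespace, URIRef

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
LOCAL = "http://example.org/"                      # assumed target namespace
INVERSES = {BF.instanceOf: BF.hasInstance}         # illustrative inverse table

def add_inverses(g: Graph) -> None:
    for s, p, o in list(g):                        # list(): don't mutate mid-iteration
        inv = INVERSES.get(p)
        if (inv is not None and isinstance(o, URIRef)
                and str(s).startswith(LOCAL) and str(o).startswith(LOCAL)):
            g.add((o, inv, s))
```
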
Row 17 (Category C. target data (output))
Priority: Stanford low, Cornell high-to-medium, Harvard high.
Requirement: Convert to a core ontology with an extension.
Source: Stanford
Comments: Not a tracer-bullet 1 (conversion) requirement.

Row 18 (Category C. target data (output))
Priority: Stanford low, Cornell low.
Requirement: For converter output, you should be able to choose your serialization format, or hook the converter to a serializer (in which case, what format do you want from the converter? do you care?).
Source: Stanford
Comments: SW - conversion between RDF serialization formats is relatively trivial and supported by many off-the-shelf tools. Might be best as a post-process step, although since Java libraries also support different formats it would likely be easy to build in some selection. There are other tools to reserialize RDF downstream if needed.

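Consistent with the comments that this is cheap: with rdflib the selection is a single serialize() call, so building it in or bolting it on downstream are both easy:

```python
# Sketch: serialization choice as a thin wrapper over rdflib.
def serialize(g, fmt: str = "turtle") -> str:
    if fmt not in {"turtle", "nt", "xml", "trig"}:   # formats rdflib ships with
        raise ValueError(f"unsupported serialization: {fmt}")
    return g.serialize(format=fmt)
```
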
Row 19 (Category C. target data (output))
Priority: Stanford low, Cornell low.
Requirement: The converter needs to be configurable to put or post the RDF somewhere (rather than to standard out).
Source: Stanford
Comments: We can deal with stdout.

Row 20 (Category D. pipeline)
Priority: Stanford medium, Cornell medium, Harvard high.
Requirement: Be able to call the converter in various ways (API?) at various pipeline steps, including from an editor-like tool in order to review converter output; call the converter from another process.
Source: converter discussion at Stanford LD4P core mtg
Comments: Do you want the converter to be able to launch another process? SW - could be implemented as a wrapper around this or another converter. Related to row 7.

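A sketch of exposing one conversion function several ways: importable API for an editor-like tool, stdin/stdout CLI for other processes, and HTTP via the web-service sketch at row 6. Module and function names are illustrative:

```python
# Sketch: the same convert_record() callable as a library or from another
# process via a stdin -> stdout CLI.
import sys

def main() -> int:
    g = convert_record(sys.stdin.read())   # function from the row 2 sketch
    sys.stdout.write(g.serialize(format="turtle"))
    return 0

if __name__ == "__main__":
    sys.exit(main())

# As a library:          from converter import convert_record
# From another process:  cat record.xml | python -m converter > record.ttl
```
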
Row 21 (Category D. pipeline)
Priority: Stanford low, Cornell low.
Requirement: Put the output into a triple store.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Related to row 20. Is this in scope for the converter per se, or is this something one does with converter output?

Row 22 (Category E. reconciliation)
Priority: Stanford ?, Cornell low.
Requirement: Reconcile with external identities.
Source: Stanford, Cornell UC5
Comments: Automated only; manual assist out of scope. Stanford questions whether this encompasses all entity lookup. Not sure what this means exactly: is this a post-process, or a converter process via "URI minting"? Reconciliation was a separate step according to our understanding.

Row 23 (Category G. updates)
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Convert newly received data relevant to previously converted records (a MARC record is updated post-conversion).
Source: Stanford, Harvard UC5
Comments: Ideally the converter would convert against something, tell you if it found conflicts, and let you choose what to do; the converter would ignore dupes, not put them in the triple store. If this is a situation that is going to be temporary, need to build the converter for it? Better handled downstream.

Row 24 (Category G. updates)
Priority: Stanford low, Cornell low, Harvard medium.
Requirement: Take daily dumps of changed MARCXML records, convert these to the LD4L ontology, and update a triplestore of the complete catalog data.
Source: Cornell UC6
Comments: Also has performance implications. Harvard UC6.

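One plausible shape for this update flow, assuming the triplestore speaks the standard SPARQL 1.1 Update protocol and that each record's triples live in their own named graph (so an update is "drop the graph, re-insert"). The endpoint URL and graph URIs are illustrative:

```python
# Sketch: apply one changed record from a daily dump to the catalog store.
import requests

UPDATE_ENDPOINT = "http://localhost:3030/catalog/update"   # assumed store

def apply_update(rec_id: str, ntriples: str) -> None:
    graph_uri = f"http://example.org/record/{rec_id}"      # one graph per record
    update = (
        f"DROP SILENT GRAPH <{graph_uri}> ;\n"
        f"INSERT DATA {{ GRAPH <{graph_uri}> {{ {ntriples} }} }}"
    )
    requests.post(UPDATE_ENDPOINT, data={"update": update}).raise_for_status()
```

Keeping a named graph per source record is what makes the delete cheap; without it, deciding which old triples to retract is much harder.
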
Row 25 (Category H. provenance tracking)
Priority: Stanford low, Cornell medium.
Requirement: Say what conversion process (profile) was applied.
Source: Stanford

Row 26 (Category H. provenance tracking)
Priority: Stanford low, Cornell low.
Requirement: Say where the data came from.
Source: Stanford
Comments: Before we acquired the data? Or from which catalog the source data for the converter derived?

Row 27 (Category I. error handling)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: If a record can't be converted, the converter skips the record and continues (and reports the skipped records and the reasons for skipping in a convenient format).
Source: Stanford LD4P Converter Requirements3, Cornell UC4
Comments: Based on experience with the converter used in LD4L1.

Row 28 (Category J. reporting)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Report errors with enough detail that someone can fix the problem records (which records were skipped and which fields caused the problem).
Source: Stanford, Cornell UC4

Row 29 (Category J. reporting)
Priority: Stanford low, Cornell medium.
Requirement: On completion of conversion, get performance statistics including: # of records converted; # of records skipped; time elapsed; # of local URIs created; # of dedupes; # of entities (what?); # of triples created; # of errors and what kind of errors; what kind of reconciliation was done, to what sources, how many things were reconciled, how many were queued for human intervention; what didn't convert (at record level or field level?).
Source: Stanford

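A sketch combining rows 27-29: skip bad records, keep going, and return a report carrying a subset of the statistics listed above (the dedupe and reconciliation counters would plug in the same way):

```python
# Sketch: skip-and-continue conversion loop with an end-of-run report.
import time
from dataclasses import dataclass, field

@dataclass
class ConversionReport:
    converted: int = 0
    skipped: int = 0
    triples: int = 0
    elapsed_seconds: float = 0.0
    errors: list[tuple[str, str]] = field(default_factory=list)  # (record id, reason)

def convert_all(records) -> ConversionReport:
    # records: iterable of (record id, MARCXML string) pairs
    report, start = ConversionReport(), time.monotonic()
    for rec_id, record_xml in records:
        try:
            g = convert_record(record_xml)   # single-record sketch from row 2
        except Exception as exc:
            report.skipped += 1              # skip, continue, remember why
            report.errors.append((rec_id, str(exc)))
            continue
        report.converted += 1
        report.triples += len(g)
    report.elapsed_seconds = time.monotonic() - start
    return report
```
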
Row 30 (Category K. configuration)
Priority: Stanford low, Cornell low.
Requirement: At conversion time, specify where the converted data goes (including putting it in a temporary location for review).
Source: Stanford Conversion Workflow Use Cases – MARC

Row 31 (Category K. configuration)
Priority: Stanford medium, Cornell medium.
Requirement: At conversion time, choose a target ontology and, optionally, a target ontology extension.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Question about the ld4l base conversion ontology and its ability to subsequently map to BF2 and/or other ontology extensions - jg.

Row 32 (Category K. configuration)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q2?
Requirement: Specify how the converter will handle local MARC practice.
Source: Stanford Conversion Workflow Use Cases – MARC
Comments: Are these tags that no one else uses, tags that are used in a particular way locally, or a local preference for how to convert tags that are used in a standard way? SF - likely both; seems to be about the flexibility/extensibility of the converter.

Row 33 (Category K. configuration)
Priority: Stanford high, Cornell high-to-medium.
Requirement: Create a profile, which includes: how to handle private data (and which data is private), how to handle local fields, what target ontology/extension to use, where to put output, ...
Source: Stanford LD4P Converter Requirements3

Row 34 (Category K. configuration)
Priority: Stanford high, Cornell medium, Harvard medium.
Requirement: Save a profile.
Source: Stanford LD4P Converter Requirements3

Row 35 (Category K. configuration)
Priority: Stanford low, Cornell medium, Harvard medium.
Requirement: Share a profile.
Source: Stanford LD4P Converter Requirements3

Row 36 (Category K. configuration)
Priority: Stanford low, Cornell medium, Harvard medium.
Requirement: Another part of a profile is to specify some constant values to be added to all records being converted (some triple statements to create for everything?); examples ...
Source: Stanford
Comments: SF - one use case we have for this is to assert an LDN Receiver for specific types of things.

Row 37 (Category K. configuration)
Priority: Stanford high, Cornell high, Harvard high. Timing: 2017Q1.
Requirement: URIs are minted in the institution's local namespace.
Source: Stanford
Comments: Expectation that a converter running at any institution will be configured with the local namespace to use for URI minting.

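Rows 32-37 together suggest a profile object roughly like the sketch below. Every key name and value here is an illustrative assumption about what a saved, shareable profile might hold:

```python
# Sketch: one possible shape for a conversion profile.
PROFILE = {
    "name": "stanford-marc-default",
    "target_ontology": "ld4l",                      # or "bibframe-2.0"
    "ontology_extension": None,
    "local_namespace": "https://ld.example.edu/",   # where URIs get minted
    "output": {"destination": "file:///tmp/review/", "format": "turtle"},
    "private_fields": ["541", "583"],               # which data counts as private
    "local_field_rules": {"590": "map-to-note"},    # local MARC practice hooks
    "constant_triples": [                           # asserted for every record
        # e.g. the LDN Receiver use case: point instances at an inbox
        ("<instance>", "http://www.w3.org/ns/ldp#inbox",
         "https://ld.example.edu/inbox"),
    ],
}
```
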
Row 38 (Category L. extensibility)
Priority: Stanford high, Cornell medium, Harvard high.
Requirement: Add new ontologies or ontology extensions later as conversion targets.
Source: Stanford
Comments: Yes, it is a priority (high), but not urgent for the first stage of conversion dev, aside from knowing the converter needs to be extensible.

Row 39 (Category M. non-functional)
Priority: Stanford low, Cornell --.
Requirement: Performance specs: convert 10 million records in less than one week.
Source: converter discussion at Stanford LD4P core mtg

Row 40 (Category M. non-functional)
Priority: Stanford low, Cornell --.
Requirement: Run multiple conversions in parallel (i.e. the converter is not tied up with one job).
Source: Stanford

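A sketch of the non-blocking behavior with a process pool, reusing the assumed convert_batch() from the row 8 sketch; each job is an independent conversion:

```python
# Sketch: several conversion jobs at once, so one batch can't tie up the converter.
from concurrent.futures import ProcessPoolExecutor

def _run_job(args: tuple[str, str]) -> None:
    collection_path, out_path = args
    convert_batch(collection_path, out_path)   # batch sketch from row 8

def convert_many(jobs: list[tuple[str, str]]) -> None:
    with ProcessPoolExecutor() as pool:
        list(pool.map(_run_job, jobs))         # drain to surface any exceptions
```
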
Row 41 (Category M. non-functional)
Priority: Stanford high, Cornell high. Timing: ongoing.
Requirement: Documentation for users and for developers.
Source: Stanford LD4P Converter Requirements3
Comments: SW - documentation should not be a separate task; document as features are added. Might need a specific effort around the end of the grant to tidy up, though.

Row 42 (Category N. bluesky)
Priority: Stanford low, Cornell low-to-no. Timing: post LD4L Labs.
Requirement: Nice to have: the converter chooses an ontology and extension for you (on a record-by-record basis?) based on the source data (and a set of configurable rules).
Source: converter discussion at Stanford LD4P core mtg

Row 43 (Category N. bluesky)
Priority: Stanford low, Cornell low.
Requirement: What happens to previously converted records if/when the ontology you are using gets updated?
Source: tracer_bullet_workflow_user_stories
Comments: Ontology change management (cf. Javed's work); probably out of scope for this grant. Seems like less of a converter issue and more of a long-term operations workflow, procedure, and infrastructure question.

Row 44 (Category N. bluesky)
Priority: Stanford low, Cornell low.
Requirement: GUI to set configurations (anticipating the need for non-programmer interactions).
Source: Harvard

Row 45 (Category N. bluesky)
Priority: Stanford low, Cornell --.
Requirement: Converter recognition of uploaded target ontology/extensions.
Source: Harvard
Comments: SF - related to "add new ontologies or ontology extensions later as conversion targets", but wasn't sure if that meant adding code that specified the ontology classes or properties, or having the ontology "loaded" into the converter and referenced in the profile. Confused why this asks about the ontology (in converter context) rather than the mapping between source data and output ontology... does the ontology need to be loaded?