2024 Alignment ZBMED and INESData

	A	B	C	D	E	F	G	H	I	J	K	L
1		M4ML			INESData
2	Source	Property	Range	Description	Property	Range	Description	Comments LJ	Comments Jenifer T	Comments Nelson Q	Comments Rohitha	Comments Daniel G
3	FAIR4ML	deployedAt	Thing	Platform, website, webservice or similar where this ML model has been deployed. There could be deployments that this ML model is not aware of (e.g., done by third-parties).								While this may be interesting, it is very difficult to maintain this information (at least in a model card).
4	FAIR4ML	ethicalLegalSocial	Text	Considerations wrt ethical, legal and social aspects.					(+1)			(+1)
5	FAIR4ML	evaluationMetricValue	PropertyValue	Evaluation metric values obtained when creating this ML model. There should be a correspondence with the evaluation metrics declared by the ML software used to create this ML model.	evaluationMetrics	Text	Description of the metrics used for evaluating the ML model	Proposal: include in 0.0.1. To cover m4ml and InesData (metrics plus results) Name: evaluationResults Range: Text or PropertyValue Example text: ["Precision: 0.8", "Mean: 0.9"] Example PropertyValue: [ { minValue: 0.0, maxValue: 1.0, value: 0.8, measurementTechnique: "Precision" } ... ]	I also think that the metric and the value should both be stored.	Equivalent to ind:evaluationResults. But I think that having ind:evaluationMetrics and ind:evaluationResults can result in repeated information. Connecting this property to an evaluation dataset like the ind:evaluatedOn would be nice.	I also think that the metric and the value should be stored together using evaluationMetric.	This may need further discussion. On the one hand, having a text value to add a textual description is easy to do. On the other hand, it may be worth discussing a property-value pair <metric, result> to be able to rank results.
6	FAIR4ML	externalValidation	m4ml:MLModelValidationAction	A validation action using this ML model with an external validation dataset (e.g., when new datasets are produced from experiments and were not used as part of the learning process of this ML model). There could be external validations that this ML model is not aware of (e.g., done by third-parties).						I think here we could use the m4ml:testedOn. I don't think making an explicit separation is necessary.		Pointing to an action may bring in additional complexity that may not be easy to capture, in my opinion. I am ok pointing to the validation dataset directly.
7	FAIR4ML	fineTunedBy	SoftwareApplication or SoftwareSourceCode	ML Software fine-tuning this ML model.				Proposal: Leave it out for now. The model is created by a software that does all the training. That training can include some fine-tuning. Still, we are talking about the same software. What would be more interesting is knowing what existing MLModel was used for fine-tuning, so the name would be fineTuneOf and the range MLModel... To be honest, I am not so sure on what and how to model here. Fine-tuning and re-training are important and should be captured but not sure how...			(+1)
8	FAIR4ML	generatedBy	m4ml:MLOptimizationAction	Optimization action on an ML software used to create this ML model.				Proposal: An MLModel is created/generated by a specific run of a software. To simplify it we can go for MLModel - generatedBy - SoftwareSourceCode or SoftwareApplication Name: generatedBy Range: SoftwareSourceCode or SoftwareApplication Description: Software used to generate this model	I don't understand this one	Inesdata schema has the ind:ModelTraining Class to represent that event.	What does optimization action mean here? What does a particular execution of an MLSoftware imply? If it implies the training of the model, then a trainedBy property could be added to reference the dataset. Or could be linked to the optimizedFor property.	I think this may be too granular a level, at least in the first iteration of the model
9	FAIR4ML	hyperparameterValue	PropertyValue	Hyperparameter values used to create this ML model. There should be a correspondence with the hyperparameters declared by the ML software used to create this ML model.				Proposal: Leave it out for now. For 7B weights we could use a URL to the file. But, I do not really see how to get this one from the model cards. It is considered in DOME but not sure how to get them easily so we can leave it for the next round.	This only works for regular ML models but not neural networks, right? Or if so, storing 7B weights would be a bit expensive
10	FAIR4ML	intendedUse	Text or DefinedTerm or URL	Purpose and intended use stated to enable users to make a decision as to the suitability of this creative work (e.g., lab protocol, machine learning model, software) to their experimental problem or own use case.
11	FAIR4ML	mlAlgorithm	Text or DefinedTerm	ML algorithm used to solve the task. For instance logistic regression or random forests.	modelCategory	Text	Category of the model (e.g., SVM, Transformer, Supervised, etc.)	SVM is supervised. I suggest splitting into algorithm (SVM, CNN) and category (supervised, reinforcement, unsupervised)			(+1)
12	FAIR4ML	mlTask	Text or DefinedTerm	ML task addressed by this ML software or model. For instance binary classification.	task	Text	Task for which the model was trained or fine tuned. E.g., image classification, sentiment analysis, etc.				(+1)
13	FAIR4ML	optimizedFor	Dataset	AI-ready dataset (after pre-processing) used by the ML software for the training and optimization of this ML model.	ind:trainedOn	ind:MLModel	Link to the dataset(s) used for training the model.			Equivalent to ind:trainedOn
14	FAIR4ML	retrainedBy	SoftwareApplication or SoftwareSourceCode	ML software used to re-train this ML model.				Will an MLModel know if it is retrained by a software? I guess yes when it was created by a retraining process. But if it is the "big" model that will be used by others to retrain their own models, then not necessarily. Do we need an inverse property for the second case?		I think there should be a retrained entity, that specifies if the model was retrained completely or partially (fine-tuned), that gives information about the base model used (Or multiple models with techniques like Mixture of Experts), which algorithm/technique was used for the retraining (Reinforcement-learning, MoE, Low-rank adaptation, IA3, etc), what dataset was used for the retraining, what task was optimized in the retraining, etc.
15	FAIR4ML				developmentLibrary	Text or URL		In the diagram is inesdata but in the JSON-LD is codemeta. It is not described in any of them
16	FAIR4ML				evaluationResults	Text	Description of the evaluation results obtained from the model (comparison, metric tables, etc.)	See proposal for evaluationMetrics. Not sure what it means. Are these the actual values corresponding to the evaluation metrics? Or e.g., comparison tables across results from different models? If the former, it will be messy to do the one to one. If the latter, would you not need a link to the other models you are comparing this one to? It sounds then more complex than a property with only text, maybe a Dataset (table) would work in that case but still it would miss the info on the compared models	It could be combined in the evaluationMetric if we apply the changes mentioned there		Could be combined with evaluationMetric.	We need to figure out how to combine with evaluation metric
17	FAIR4ML				GPURequirements	Text	Description of the GPU requirements needed to run the model	Why is this different from schema:ProcessorRequirements? A GPU is a processor, isn't it?	This makes sense to me because some models (e.g., scikit-learn) run on CPU		Better to keep schema:ProcessorRequirements separate.	The rationale for having it separate is that GPU is a special type of processor.
18	FAIR4ML				hasCO2eEmissions	Text	Amount of CO2 equivalent emissions produced by the model. The unit should be included in the field (e.g., 10 tonnes)	(+1) There is also schema:emissionsCO2. To keep it compatible I suggest adding Number to range. Or, use the same as you do with schema:distribution where you have changed domain and range	(+1)			we can merge it with schema.org +1 to Jael's suggestion
19	FAIR4ML				modelRisks	Text	Description of the risks and biases of the model, in a human-readable manner	(+1)			I do not understand this. Does it refer to malware and any viruses that could be caused by the model?	May need to clarify the definition
20	FAIR4ML				parameterSize	Text	Brief description on the parameter size used to train the model (e.g., 7B). The unit (e.g., billions) must be included in the description	Is this about the number of data points used to train the model? I think the name is confusing. For me parameters would be those used in the algorithm, e.g., window size in embeddings	Does this refer to the weights (also called parameters in terms of NNs)? The word 'size' is a bit misleading to me, 'Number' maybe?		I do not understand what parameterSize refers to. Is it the number of parameters used to train the model? Or all the permutations of hyperparameters possible?
21	FAIR4ML				schema:distribution	ind:MLModelDownload		(+1) for property and ind:MLModelType
22	FAIR4ML				usageInstructions	Text	Description of the instructions needed to run the model (e.g., to do inference on a task). Code snippets may be used for illustration	(+1) schema:usageInfo is similar, I would make the range compatible or name it the same so easy to integrate later	Agreed
23	codemeta	buildInstructions	URL	Link to installation instructions/documentation.						Is this property not inside ind:usageInstructions?
24	codemeta	contIntegration	URL	Link to continuous integration service.						I don't think that knowing the continuous integration approach relates too much to the Model itself.	(+1 to Nelson's comment)
25	codemeta	developmentStatus	Text	Description of development status, e.g. Active, inactive, suspended. See repostatus.org (http://www.repostatus.org/)	SAME					It is difficult to keep track of this property.	Very much needed.
26	codemeta	embargoDate	Date	Software may be embargoed from public access until a specified date (e.g. pending publication, 1 year from publication).						I think this property is too specific.
27	codemeta	issueTracker	URL	Link to software bug reporting or issue tracking system.							Needed to track bugs in different versions of the same model (if different versions exist). But this could also be conveyed by releaseNotes.
28	codemeta	readme	URL	Link to software Readme file.				Proposal: keep it as it is. For clarity MLModel (domain) - readme - URL (range) Description: Link to the readme file of this creative work (e.g., software or MLModel) Ideally, all the elements from the readme would go to different properties but that also happens with software, right? So, do we need to keep it?	(+1)	Exactly to what software are we referring? The software used to train the model? To run the model? Of the model itself?	(+1 to Nelson's comment)
29	codemeta	referencePublication	ScholarlyArticle	An academic publication related to the software.	SAME			But we will need to change the domain to CreativeWork or to "SoftwareSourceCode or SoftwareApplication or MLModel" to cover what codemeta has and the new case we are introducing		(+1)	(+1)
30	schema:CreativeWork	archivedAt	URL or WebPage	Indicates a page or other link involved in archival of a [[CreativeWork]]. In the case of [[MediaReview]], the items in a [[MediaReviewItem]] may often become inaccessible, but be archived by archival, journalistic, activist, or law enforcement organizations. In such cases, the referenced page may not directly publish the content.				Proposal: keep it as it is, it is very important for ZB MED case. The idea is having here to which registry (MLentory for ZB MED case) the model has been added				I am not sure what this means. Model archival is not a thing yet.
31	schema:CreativeWork	author	Organization or Person	The author of this content or rating. Please note that author is special in that HTML 5 provides a special mechanism for indicating authorship via the rel tag. That is equivalent to this and may be used interchangeably.					(+1)
32	schema:CreativeWork	citation	CreativeWork or Text	A citation or reference to another creative work, such as another publication, web page, scholarly article, etc.	SAME
33	schema:CreativeWork	conditionsOfAccess	Text	Conditions that affect the availability of, or method(s) of access to, an item. Typically used for real world items such as an [[ArchiveComponent]] held by an [[ArchiveOrganization]]. This property is not suitable for use as a general Web access control mechanism. It is expressed only in natural language. For example "Available by appointment from the Reading Room" or "Accessible only from logged-in accounts".
34	schema:CreativeWork	contributor	Organization or Person	A secondary contributor to the CreativeWork or Event.					What is the difference between this and the author?
35	schema:CreativeWork	copyrightHolder	Organization or Person	The party holding the legal copyright to the CreativeWork.
36	schema:CreativeWork				dateCreated	Date or DateTime	The date on which the CreativeWork was created or the item was added to a DataFeed.	(+1)
37	schema:CreativeWork	dateModified	Date or DateTime	The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed.	SAME
38	schema:CreativeWork	datePublished	Date or DateTime	Date of first broadcast/publication.
39	schema:CreativeWork	discussionUrl	URL	A link to the page containing the comments of the CreativeWork.
40	schema:CreativeWork	funding	Grant	A Grant that directly or indirectly provide funding or sponsorship for this item. See also ownershipFundingInfo. Inverse property: fundedItem
41	schema:CreativeWork	isAccessibleForFree	Boolean	A flag to signal that the item, event, or place is accessible for free.					(+1)
42	schema:CreativeWork	keywords	DefinedTerm or Text or URL	Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas.	SAME
43	schema:CreativeWork	license	CreativeWork or URL	A license document that applies to this content, typically indicated by URL.	SAME
44	schema:CreativeWork	maintainer	Organization or Person	A maintainer of a [[Dataset]], software package ([[SoftwareApplication]]), or other [[Project]]. A maintainer is a [[Person]] or [[Organization]] that manages contributions to, and/or publication of, some (typically complex) artifact. It is common for distributions of software and data to be based on "upstream" sources. When [[maintainer]] is applied to a specific version of something e.g. a particular version or packaging of a [[Dataset]], it is always possible that the upstream source has a different maintainer. The [[isBasedOn]] property can be used to indicate such relationships between datasets to make the different maintenance roles clear. Similarly in the case of software, a package may have dedicated maintainers working on integration into software distributions such as Ubuntu, as well as upstream maintainers of the underlying work.
45	schema:CreativeWork				headline	Text	Headline of the article.	(+1) What would be the purpose wrt ML models? Is the name not enough? Would this be an alternate name? Or a subtitle/motto phrase?		Should it not be enough to have references to the articles related to the model in referencePublication? What if there are multiple articles mentioning the model?	Should it not be enough with citation?	Headline has nothing to do with citation. It's a short description of the model
46	schema:CreativeWork				inLanguage	Language or Text	The language of the content or performance or used in an action. Please use one of the language codes from the IETF BCP 47 standard. See also availableLanguage.	(+1) Just to make sure, this is the natural language and not the programming one, right?
47	schema:SoftwareApplication	installUrl	URL	URL at which the app may be installed, if different from the URL of the item.								Isn't the installUrl the same as the readme for many models? I wonder if this is needed
48	schema:SoftwareApplication	memoryRequirements	Text or URL	Minimum memory requirements.	SAME						(+1)
49	schema:SoftwareApplication	operatingSystem	Text	Operating systems supported (Windows 7, OSX 10.6, Android 1.6).	SAME						(+1)
50	schema:SoftwareApplication	processorRequirements	Text	Processor architecture required to run the application (e.g. IA64).	SAME					I am not sure if the definition of "processor architecture required to run the application" is good enough. When I read the name "processorRequirement" I think about what processor hardware is required to run the model, a GPU, CPU or TPU and a particular model associated with it. It could be nice to have different attributes for the hardware needed to run and to train the model.
51	schema:SoftwareApplication	releaseNotes	Text or URL	Description of what changed in this version.
52	schema:SoftwareApplication	softwareHelp	CreativeWork	Software application help.
53	schema:SoftwareApplication	softwareRequirements	Text or URL	Component dependency requirements for application. This includes runtime environments and shared libraries that are not included in the application distribution package, but required to run the application (Examples: DirectX, Java or .NET runtime).	SAME
54	schema:SoftwareApplication	softwareVersion	Text	Version of the software instance.	schema:version			Proposal: use schema:version Why: it is better to extend from CreativeWork so that one is the one available there. It could be that versioning models is not much of a thing now but it should be. For instance, if only the dataset changes and nothing else, yes, it is a new model but it would be more correct to say that it is a new version... Not sure, I like versioning but we can keep it or leave it for later		Is the idea to keep track of the version of the model? Usually when a new version of the model appears it is launched as a new model.
55	schema:SoftwareApplication	storageRequirements	Text or URL	Storage requirements (free space required).	SAME
56	schema:SoftwareSourceCode				codeRepository	URL	Link to the repository where the un-compiled, human readable code and related code is located (SVN, GitHub, CodePlex).	Would this be the code repository for the software that created the model? If not, what is the role wrt ML models?		It is not clear what related code means in regards to the model.
57	schema:Thing	description	Text or TextObject	A description of the item.	SAME
58	schema:Thing	identifier	PropertyValue or Text or URL	The identifier property represents any kind of identifier for any kind of [[Thing]], such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See [background notes](/docs/datamodel.html#identifierBg) for more details.	SAME
59	schema:Thing	name	Text	The name of the item.	SAME
60	schema:Thing	sameAs	URL	URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website.							Why do we need same as here?
61	schema:Thing	url	URL	URL of the item.						I don't understand in regards to what are we storing this url.
62
63
64					Legend:
65						Agreement (part of FAIR4ML core)
66						Initial agreement, but more discussion is needed to be in core FAIR4ML vocab
67						Disagreement (not part of FAIR4ML core at this time, but may be introduced in a future version)
68
69						In column H	Revise if it is possible to keep it, justification and proposal on how to keep it included in column H
70						In column H	Undecided so we can leave it for next time
71						In column H	Undecided so we maybe leave it for next time
72						In column H	Indeed, better to leave it for next time, more discussion needed
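
Illustration for row 5 (column H proposal). The proposal there gives evaluationResults a range of Text or PropertyValue. A minimal Python sketch of how the two forms could relate is given below, using the example values from that cell. The namespace IRI, the model name "example-model", the fair4ml: prefix on the properties and the helper names evaluation_result and as_text are assumptions made for illustration only; none of them is agreed vocabulary.

```python
import json

# Placeholder namespace, NOT the official FAIR4ML IRI (assumption for illustration).
FAIR4ML = "https://example.org/fair4ml#"


def evaluation_result(metric, value, min_value=0.0, max_value=1.0):
    """Build one schema.org PropertyValue pairing a metric name with its result."""
    if not (min_value <= value <= max_value):
        raise ValueError(f"{metric}={value} is outside [{min_value}, {max_value}]")
    return {
        "@type": "PropertyValue",
        "measurementTechnique": metric,
        "value": value,
        "minValue": min_value,
        "maxValue": max_value,
    }


def as_text(result):
    """Render a PropertyValue result in the plain Text form of the proposal."""
    return f"{result['measurementTechnique']}: {result['value']}"


# Hypothetical model record; only the property names come from the table above.
model = {
    "@context": {"@vocab": "https://schema.org/", "fair4ml": FAIR4ML},
    "@type": "fair4ml:MLModel",
    "name": "example-model",
    "fair4ml:mlTask": "binary classification",
    "fair4ml:evaluationResults": [
        evaluation_result("Precision", 0.8),
        evaluation_result("Mean", 0.9),
    ],
}

texts = [as_text(r) for r in model["fair4ml:evaluationResults"]]

print(json.dumps(model, indent=2))
print(texts)
```

Keeping the metric and its value in one PropertyValue is what would allow ranking results across models, as raised in column L of the same row; the Text form can be derived from it, but not the other way round without parsing.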