A | B | C | D | E | F | G | H | I | J | K | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Component | Legal Framework | Legal Document | Arctic Document Analysis | Freedoms defined: OSAID v. 0.0.8 | Notes | ||||||
2 | Component definitions: Model Openness Framework | For each component (source: OSAID v. 0.0.8) | Paste link to each component's legal document below | Use for any purpose and without having to ask for permission | Study how the system works and inspect its components | Modification for any purpose, including to change its output | Sharing for others to use, with or without modifications, for any purpose | Challenges in analyzing this component? Need for clarification? Let us know! (Use the comment function to annotate specific cells.) | ||||
3 | Version reviewed: snowflake-arctic-instruct and snowflake-arctic-base (both seem to have the same characteristics) https://huggingface.co/Snowflake/snowflake-arctic-instruct https://huggingface.co/Snowflake/snowflake-arctic-base | |||||||||||
4 | Required | |||||||||||
5 | Data Information | |||||||||||
6 | Training methodologies and techniques | Available under OSD-compliant license | https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd | Allowed | Allowed | | | All of the information for the "Data Information" rows is provided in a set of Medium posts with no specific license attached. Therefore, I understand the information can be used and studied, but not modified or shared. However, this could be hindered (at least with respect to use) by patents, which are not mentioned. The collection of posts is available at https://www.snowflake.com/en/data-cloud/arctic/cookbook/ By the way, not all posts have been published yet, only the first ones (but they seem to include most, if not all, of the information about the data). In any case, the final dataset used for training is also not available (or, to be more precise, I could not find it). | ||||
7 | Training data scope and characteristics | Available under OSD-compliant license | https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd | Allowed | Allowed | | | Idem | ||||
8 | Training data provenance (including how data was obtained and selected) | Available under OSD-compliant license | https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd | Allowed | Allowed | | | Idem | ||||
9 | Training data labeling procedures, if used | Available under OSD-compliant license | https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd | Allowed | Allowed | | | Idem | ||||
10 | Training data cleaning methodology | Available under OSD-compliant license | https://medium.com/snowflake/snowflake-arctic-cookbook-series-arctics-approach-to-data-b81a8a0958bd | Allowed | Allowed | | | Idem | ||||
11 | Code | |||||||||||
12 | Data pre-processing | Available under OSI-approved license | Something Snowflake is willing to share, but they haven't published this anywhere yet because no one has asked so far. | | | | | I could not find it. Maybe it is disclosed in some of the pending Medium posts? | ||||
13 | Training, validation and testing | Available under OSI-approved license | https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/training | | | | | I could not find it. Maybe it is disclosed in some of the pending Medium posts? | ||||
14 | Inference | Available under OSI-approved license | https://github.com/Snowflake-Labs/snowflake-arctic/blob/main/LICENSE | Allowed | Allowed | Allowed | Allowed | The repository with the inference code (really, very short examples of how to use it) is under Apache-2.0. But that is barely a few dozen lines of Python. The real action is in the libraries: transformers and torch for one of the examples, vllm for the other. | ||||
15 | Supporting libraries and tools | Available under OSI-approved license | Allowed | Allowed | Allowed | Allowed | Many components are involved. There are two different examples, based on two different modules (vllm and transformers). All of them seem to use OSI-compliant licenses, anyway (see comment in legal document cell) | |||||
16 | Model | |||||||||||
17 | Model architecture | Available under OSI-approved license | https://huggingface.co/Snowflake/snowflake-arctic-instruct | Allowed | Allowed | Allowed | Allowed | Described in the blog posts, and in code | ||||
18 | Model parameters | Available under OSD-conformant terms | https://huggingface.co/Snowflake/snowflake-arctic-instruct | Allowed | Allowed | Allowed | Allowed | Apache 2.0 license, as stated in the model card | ||||
19 | Optional | |||||||||||
20 | Data Information All data sets, including: | |||||||||||
21 | Training data sets | Available under OSD-compliant license | | | | | Datasets used are referenced in the Medium post about the matter. There are detailed descriptions of how they are processed to produce the training dataset, but I couldn't find the training dataset itself. | |||||
22 | Testing data sets | Available under OSD-compliant license | | | | | ||||||
23 | Validation data sets | Available under OSD-compliant license | | | | | ||||||
24 | Benchmarking data sets | Available under OSD-compliant license | | | | | ||||||
25 | Data card | Available under OSD-compliant license | | | | | ||||||
26 | Evaluation data | Available under OSD-compliant license | Allowed | Allowed | Allowed | Allowed | ||||||
27 | Evaluation results | Available under OSD-compliant license | | | | | ||||||
28 | Other data documentation | Available under OSD-compliant license | | | | | ||||||
29 | Code | |||||||||||
30 | Code used to perform inference for benchmark tests | Available under OSI-approved license | https://github.com/Snowflake-Labs/snowflake-arctic/blob/main/LICENSE | | | | | At least some scripts for benchmarking seem to be available under Apache-2.0 https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/inference/vllm/benchmarks | ||||
31 | Evaluation code | Available under OSI-approved license | | | | | ||||||
32 | Model All model elements, including: | |||||||||||
33 | Model card | Available under OSD-compliant license | https://huggingface.co/Snowflake/snowflake-arctic-instruct | Allowed | Allowed | Allowed | Allowed | I'm linking the model card as the "legal document", since it states that the license is Apache-2.0. However, it is not completely clear whether the license applies to the model card itself: I could say it does, but I see no specific reference, since all that is available is the usual "License:" field in the model card. In any case, the freedoms analysis is based on what Apache-2.0 allows. | ||||
34 | Sample model outputs | Available under OSD-compliant license | | | | | ||||||
35 | Model metadata | Available under OSD-compliant license | | | | | ||||||
36 | Other | |||||||||||
37 | Research papers | Available under OSD-compliant license | | | | | ||||||
38 | Technical report | Available under OSD-compliant license | Allowed | Allowed | | | I guess the collection of cookbooks could be considered as a detailed technical report for Arctic (see reference above) |