We will be continually pretraining open-source LLMs to extend their capabilities to new domains and use cases with pressing needs.
We would love to hear from you about your needs: which domains and use cases would you like us to extend these models to? Is there a domain where you find tokenization to be more inefficient than it needs to be?
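If you are not sure whether tokenization is inefficient for your domain, a quick check is to count how many tokens a model's tokenizer produces per word of your text. Below is a minimal sketch using the Hugging Face `transformers` library; the model name and sample sentence are just placeholders, so swap in the tokenizer and domain text you care about.

```python
# Quick check of tokenizer efficiency on domain text:
# a high tokens-per-word ratio suggests the tokenizer is
# fragmenting domain terms into many small pieces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder; use any open tokenizer

# Placeholder sample; replace with text from your own domain.
sample = "The patient presented with paroxysmal supraventricular tachycardia."

tokens = tokenizer.tokenize(sample)
words = sample.split()

print(f"{len(tokens)} tokens for {len(words)} words "
      f"({len(tokens) / len(words):.2f} tokens/word)")
```

Comparing this ratio on your domain text against ordinary English text gives a rough sense of how much the current tokenizer is costing you in context length and compute.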
We will try our best to accommodate your requests, but they are subject to compute and data constraints. For exceptional cases, we may even consider generating synthetic data to boost performance on such domains and use cases.
All NOLIN models will be open source under the Apache 2.0 license.