"Open" gets thrown around a lot in the AI community. This week, Nomic and Allen AI reminded us what it takes to build truly open-source AI models: they shared the training data and methods along with the models themselves. This is a big deal, because scrutiny of the training process is valuable for many reasons.
Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - https://arxiv.org/pdf/2402.00159.pdf
Nomic Embed: https://blog.nomic.ai/posts/nomic-embed-text-v1
Open Language Models (OLMos) and the LLM landscape: https://www.interconnects.ai/p/olmo
━━━━━━━━━━━━━━━━━━━━━━━━━
★ Rajistics Social Media »
● Home Page: http://www.rajivshah.com
● LinkedIn: https://www.linkedin.com/in/rajistics/
━━━━━━━━━━━━━━━━━━━━━━━━━
Category: Artificial Intelligence & Business