Open Artificial Knowledge

A dataset of more than 600M tokens generated by state-of-the-art LLMs (Mixtral, Llama3, Llama3.1, Gemma, and Gemma2), designed for diverse NLP tasks and ethical AI development.