CulturaX: A High-Quality, Multilingual Dataset for LLMs - Conclusion and References

28 Aug 2024

Introducing CulturaX: a 6.3 trillion-token multilingual dataset in 167 languages, meticulously cleaned and deduplicated for training high-performing LLMs.

CulturaX: A High-Quality, Multilingual Dataset for LLMs - Related Work

28 Aug 2024

Introducing CulturaX: a 6.3 trillion-token multilingual dataset in 167 languages, meticulously cleaned and deduplicated for training high-performing LLMs.

CulturaX: A High-Quality, Multilingual Dataset for LLMs - Data Analysis and Experiments

28 Aug 2024

Introducing CulturaX: a 6.3 trillion-token multilingual dataset in 167 languages, meticulously cleaned and deduplicated for training high-performing LLMs.

CulturaX: A High-Quality, Multilingual Dataset for LLMs - Multilingual Dataset Creation

28 Aug 2024

Introducing CulturaX: a 6.3 trillion-token multilingual dataset in 167 languages, meticulously cleaned and deduplicated for training high-performing LLMs.

CulturaX: A High-Quality, Multilingual Dataset for LLMs - Abstract and Introduction

28 Aug 2024

Introducing CulturaX: a 6.3 trillion-token multilingual dataset in 167 languages, meticulously cleaned and deduplicated for training high-performing LLMs.

NExT-GPT: Any-to-Any Multimodal LLM: Overall Architecture

31 Jul 2024

In this study, researchers present an end-to-end general-purpose any-to-any MM-LLM system called NExT-GPT.

NExT-GPT: Any-to-Any Multimodal LLM: Lightweight Multimodal Alignment Learning

31 Jul 2024

In this study, researchers present an end-to-end general-purpose any-to-any MM-LLM system called NExT-GPT.

NExT-GPT: Any-to-Any Multimodal LLM: Instruction Dataset

31 Jul 2024

In this study, researchers present an end-to-end general-purpose any-to-any MM-LLM system called NExT-GPT.

NExT-GPT: Any-to-Any Multimodal LLM: Related Work

31 Jul 2024

In this study, researchers present an end-to-end general-purpose any-to-any MM-LLM system called NExT-GPT.