Xiaohongshu Launches First Open-Source Large Model dots.llm1: 11.2 Trillion Non-Synthetic Tokens Enhance Chinese Language Performance
Introduction to Dots.llm1: A Breakthrough in AI Modeling
Recently, Xiaohongshu (Little Red Book) unveiled its first open-source large-scale model, dots.llm1. The model has 142 billion parameters and is built as a Mixture of Experts (MoE) model. A key feature of dots.llm1 is that it activates only 14 billion parameters per token during inference, which maintains strong performance while significantly reducing both training and inference costs.
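As a rough illustration of why sparse activation lowers cost, the snippet below compares the parameters touched per token against the full model size, using the publicly reported figures. It is a back-of-the-envelope sketch, not Xiaohongshu's own accounting.

```python
# Back-of-the-envelope comparison of total vs. activated parameters for an MoE model.
# Per-token inference compute scales roughly with the parameters actually activated.

TOTAL_PARAMS = 142e9   # all routed and shared experts plus the rest of the network
ACTIVE_PARAMS = 14e9   # parameters touched per token (selected experts + shared weights)

fraction_active = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Activated per token: {fraction_active:.1%} of the full model")
# -> roughly 10%, so inference cost is closer to a 14B dense model
#    while total capacity remains 142B.
```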
Unprecedented Data Utilization
dots.llm1 is trained on 11.2 trillion high-quality, non-synthetic tokens. A dataset of this scale is a rarity among current open-source large models and showcases Xiaohongshu's robust resources in language processing. The model has demonstrated exceptional performance on Chinese language tests, achieving an average score of 91.3 and surpassing competitors such as DeepSeek's V2 and V3, as well as Alibaba's Qwen2.5 series.
Technical Architecture of Dots.llm1
The architecture of dots.llm1 is a unidirectional (decoder-only) Transformer in which the traditional dense feedforward networks are replaced with MoE layers. Unlike a conventional dense model, MoE splits the feedforward computation across multiple expert networks, allowing each expert to focus on different features of the input. For every token, the model activates only a small subset of these experts, which yields substantial savings in computational resources.
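The sketch below shows the general shape of a decoder block whose dense feedforward network has been swapped for a sparse MoE layer. It is a minimal PyTorch-style illustration under assumed names and dimensions, not dots.llm1's actual code.

```python
# Minimal sketch of a decoder-only Transformer block with the dense FFN replaced
# by a sparse MoE layer. Module names and internals are illustrative placeholders.
import torch
import torch.nn as nn

class DecoderBlockWithMoE(nn.Module):
    def __init__(self, d_model: int, n_heads: int, moe_layer: nn.Module):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)  # dots.llm1 reportedly uses RMSNorm; see the section below
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = moe_layer                # replaces the usual dense feedforward network

    def forward(self, x: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        # Causal (unidirectional) self-attention, then the sparse expert layer,
        # each wrapped in a residual connection.
        h, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.norm1(x + h)
        x = self.norm2(x + self.moe(x))
        return x
```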
Expert Configuration
- The model includes 128 routed experts and 2 shared experts.
- Each expert is a two-layer feedforward network with a SwiGLU activation function, which helps capture complex relationships within the data.
- For each input token, the router dynamically selects the 6 most relevant routed experts, which are combined with the 2 shared experts for computation (see the sketch after this list).
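The sketch below ties the listed numbers together: a two-layer SwiGLU expert, a router that picks the top 6 routed experts per token, and 2 shared experts that always run. The wiring and hyperparameters are illustrative assumptions, not the released implementation.

```python
# Sketch of the expert configuration described above (assumed wiring, reference-quality only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x W_gate) * (x W_up), then project back to d_model.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int,
                 n_routed: int = 128, n_shared: int = 2, top_k: int = 6):
        super().__init__()
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.routed = nn.ModuleList(SwiGLUExpert(d_model, d_hidden) for _ in range(n_routed))
        self.shared = nn.ModuleList(SwiGLUExpert(d_model, d_hidden) for _ in range(n_shared))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Shared experts see every token.
        out = sum(expert(x) for expert in self.shared)
        # Routed experts: keep only the top-k scoring experts per token.
        scores = F.softmax(self.router(x), dim=-1)        # (batch, seq, n_routed)
        topv, topi = scores.topk(self.top_k, dim=-1)      # (batch, seq, top_k)
        for k in range(self.top_k):
            idx = topi[..., k]                            # expert chosen in slot k
            weight = topv[..., k].unsqueeze(-1)           # its routing weight
            # Simple (slow) reference loop over the experts used in this slot.
            for e_id in idx.unique().tolist():
                mask = (idx == e_id).unsqueeze(-1)
                out = out + mask * weight * self.routed[e_id](x)
        return out
```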
Enhancements in Training Stability
To further stabilize training and output, dots.llm1 incorporates an improved RMSNorm normalization operation. Additionally, a load-balancing strategy within the MoE module ensures equitable utilization of all expert networks, mitigating the risk of over-reliance on specific experts.
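For reference, a minimal RMSNorm and a common form of auxiliary load-balancing loss for MoE routers are sketched below. Both are generic formulations assumed for illustration; the article does not detail the exact variants used in dots.llm1.

```python
# Generic RMSNorm and a standard MoE load-balancing auxiliary loss; illustrative
# formulations only, not necessarily identical to dots.llm1's implementation.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features (no mean subtraction,
        # unlike LayerNorm), then apply a learned per-feature scale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def load_balance_loss(router_probs: torch.Tensor, expert_ids: torch.Tensor,
                      n_experts: int) -> torch.Tensor:
    # Encourages an even spread of tokens across experts: penalizes the product of
    # each expert's routed-token fraction and its mean router probability.
    # router_probs: (tokens, n_experts); expert_ids: (tokens, top_k) chosen experts.
    token_fraction = torch.bincount(expert_ids.flatten(), minlength=n_experts).float()
    token_fraction = token_fraction / expert_ids.numel()
    prob_fraction = router_probs.mean(dim=0)
    return n_experts * torch.sum(token_fraction * prob_fraction)
```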
Optimizing Training Efficiency
To enhance training efficiency, dots.llm1 employs the AdamW optimizer, which helps curb overfitting and keep gradients under control. This choice is important for maintaining stable training over such an extensive run.
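A minimal PyTorch setup of this kind of training step is shown below, pairing AdamW's decoupled weight decay with gradient-norm clipping. The model and all hyperparameter values are placeholders, not dots.llm1's published recipe.

```python
# Typical AdamW + gradient-clipping training step; values are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)  # toy model standing in for the full LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)

def training_step(batch_x: torch.Tensor, batch_y: torch.Tensor,
                  max_grad_norm: float = 1.0) -> float:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch_x), batch_y)
    loss.backward()
    # Clip the global gradient norm so a single bad batch cannot blow up the update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```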
Rigorous Data Processing Pipeline
Data processing is a pivotal part of training large models. The dots.llm1 corpus went through a stringent three-tier data processing pipeline to ensure the quality of the training data: a series of filtering and processing steps distills the raw corpus down to the 11.2 trillion high-quality tokens used for training.
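The article does not spell out what each of the three tiers does, so the sketch below is purely hypothetical (rule-based filtering, deduplication, then a quality check) and is meant only to show how staged document filtering can be wired together.

```python
# Hypothetical three-stage filtering pipeline; the actual stages used for dots.llm1
# are not described here, so treat this as an illustration, not the real recipe.
from typing import Callable, Iterable, Iterator

Filter = Callable[[str], bool]

def rule_filter(doc: str) -> bool:
    # Stage 1 (assumed): drop documents that are too short.
    return len(doc) > 200

def make_dedup_filter() -> Filter:
    # Stage 2 (assumed): exact deduplication by document hash.
    seen: set[int] = set()
    def dedup(doc: str) -> bool:
        h = hash(doc)
        if h in seen:
            return False
        seen.add(h)
        return True
    return dedup

def quality_filter(doc: str) -> bool:
    # Stage 3 (assumed): stand-in for a learned quality / fluency scorer.
    return sum(c.isalpha() for c in doc) / max(len(doc), 1) > 0.5

def run_pipeline(docs: Iterable[str], stages: list[Filter]) -> Iterator[str]:
    for doc in docs:
        if all(stage(doc) for stage in stages):
            yield doc

if __name__ == "__main__":
    raw_docs = ["a" * 300, "a" * 300, "short"]  # second doc is an exact duplicate
    stages = [rule_filter, make_dedup_filter(), quality_filter]
    print(len(list(run_pipeline(raw_docs, stages))))  # -> 1
```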
Open Source Contributions
In a move to foster academic research and collaboration, Xiaohongshu has also released intermediate training checkpoints saved every 1 trillion tokens. This initiative aims to encourage further exploration and development within the AI research community.
For those interested in exploring the dots.llm1 model, the weights are available on Hugging Face.
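For anyone who wants to try the released weights, a standard Hugging Face `transformers` loading pattern is sketched below. The repository id is an assumed placeholder; check the model's Hugging Face page for the exact name.

```python
# Standard transformers loading pattern; the repository id below is an assumed
# placeholder -- verify the exact name on the dots.llm1 Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "rednote-hilab/dots.llm1.inst"  # assumed repo id, confirm before use

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True,
                                             torch_dtype="auto", device_map="auto")

inputs = tokenizer("用一句话介绍小红书。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```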
Key Highlights
- dots.llm1 is Xiaohongshu's first open-source large model, a Mixture of Experts architecture with 142 billion total parameters.
- The model is trained on 11.2 trillion non-synthetic tokens and achieves superior performance on Chinese language assessments.
- A rigorous data processing pipeline ensures the effectiveness and reliability of the training data.
Explore the future of AI with Xiaohongshu's groundbreaking dots.llm1 model, a significant advancement in the realm of language processing and machine learning.