What is BigScience BLOOM?
Overview
BigScience BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it can output coherent text in 46 natural languages and 13 programming languages that is hard to distinguish from text written by humans. BLOOM can also be instructed to perform text tasks it has not been explicitly trained for, by casting them as text generation tasks.
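A minimal usage sketch, assuming the Hugging Face `transformers` library and the small public `bigscience/bloom-560m` checkpoint (the full 176B-parameter model requires multi-GPU hardware); the prompt is an arbitrary illustration:

```python
# Minimal autoregressive generation with a small public BLOOM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# The model extends the prompt one token at a time (greedy decoding here).
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```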
Technical Specifications
- Model Architecture and Objective: Decoder-only Transformer architecture with layer normalization applied to the word embeddings layer, ALiBi positional encodings, and GELU activation functions (a sketch of the ALiBi bias follows this list).
- Compute Infrastructure: Trained on the Jean Zay public supercomputer, provided by the French government, using 384 A100 80GB GPUs, with an additional 32 A100 80GB GPUs in reserve.
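For illustration, here is a sketch of ALiBi's per-head linear bias, following the ALiBi paper's construction for power-of-two head counts; this is an independent reimplementation, not BLOOM's actual code:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n),
    # ..., 2^(-8) (valid as written only for power-of-two head counts).
    start = 2 ** (-8 / n_heads)
    return torch.tensor([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Bias added to attention scores instead of positional embeddings:
    # slope * (key_pos - query_pos), so more distant past keys are
    # penalized linearly. Result shape: (n_heads, seq_len, seq_len).
    distance = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
    return alibi_slopes(n_heads)[:, None, None] * distance[None, :, :]
```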
Training
- Training Data: 1.6TB of pre-processed text, converted into 350B unique tokens, including 46 natural languages and 13 programming languages.
- Training Speed and Size: Training throughput of about 150 TFLOPS per GPU, with a checkpoint size of 329GB for the bf16 weights and 2.3TB for a full checkpoint including optimizer states (a rough size check follows this list).
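A back-of-the-envelope check of those sizes, assuming BLOOM's roughly 176B parameters (a model-card figure) and a typical mixed-precision Adam optimizer layout; the per-parameter byte counts are assumptions, not a published breakdown:

```python
params = 176e9  # approximate parameter count from the model card

# 2 bytes per bf16 value.
bf16_weights = params * 2
# Mixed-precision Adam commonly stores fp32 master weights plus two fp32
# moment tensors alongside the bf16 copy: 2 + 4 + 4 + 4 bytes per parameter.
full_checkpoint = params * (2 + 4 + 4 + 4)

print(f"bf16 weights:    {bf16_weights / 2**30:.0f} GiB")    # ~328 GiB
print(f"full checkpoint: {full_checkpoint / 2**40:.2f} TiB")  # ~2.24 TiB
```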
Environmental Impact
- Estimated carbon emissions: Forthcoming.
- Estimated electricity usage: Forthcoming.
Uses
- Intended Use: Enable public research on large language models (LLMs) for language generation or as a pretrained base model that can be further fine-tuned for specific tasks.
- Direct Use: Text generation, exploring characteristics of language generated by a language model.
- Downstream Use: Tasks that leverage language models, such as Information Extraction, Question Answering, and Summarization (see the prompting sketch after this list).
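A hedged sketch of how such a task can be cast as text generation, using the `transformers` pipeline API with the small public checkpoint; the prompt format is an illustrative choice, not a prescribed one:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

# Question answering framed as plain continuation: the answer is whatever
# the model generates after "Answer:".
prompt = (
    "Context: BLOOM was trained on the Jean Zay supercomputer in France.\n"
    "Question: Where was BLOOM trained?\n"
    "Answer:"
)
result = generator(prompt, max_new_tokens=15, do_sample=False)
print(result[0]["generated_text"])
```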
Risks and Limitations
- The model may overrepresent some viewpoints and underrepresent others, reproduce stereotypes, contain personal information, generate hateful or discriminatory language, and make errors.
- Some outputs, such as sexual content, may not be appropriate for all settings.
Evaluation
- Metrics: Perplexity, cross-entropy loss, and task-specific metrics (the relation between the first two is sketched after this list).
- Factors: Language, domain, demographic characteristics.
- Results: Reported as zero-shot evaluations and train-time evaluations.
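Perplexity is the exponential of the mean cross-entropy loss, so the two headline metrics are two views of the same quantity. A minimal sketch, using the small public checkpoint as an illustrative stand-in for the card's evaluation setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

text = "Language models assign probabilities to sequences of tokens."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy of its
    # next-token predictions over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"cross entropy: {loss.item():.3f}  perplexity: {torch.exp(loss).item():.1f}")
```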
Recommendations
- Indirect users should be made aware when the content they're working with is created by the LLM.
- Users should be aware of Risks and Limitations and include an appropriate age disclaimer or blocking interface as necessary.
- Models trained or fine-tuned downstream of BLOOM LM should include an updated Model Card.