Fine-Tuning Bloom AI Model

Introduction

In this blog post, I will explore how to fine-tune a large language model using LoRA (Low-Rank Adaptation). I will use the bloom-3b model from Hugging Face and perform fine-tuning on the SQuAD v2 dataset.
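
As a quick intuition before diving in: LoRA keeps the pretrained weights frozen and learns, for each targeted weight matrix W, a pair of small matrices whose product acts as a low-rank update, so only a tiny fraction of parameters is trained. The toy snippet below only illustrates that idea with made-up sizes; it is not part of the fine-tuning pipeline.

import torch

# Toy illustration of the LoRA idea (sizes are made up, not Bloom's)
d, k, r = 1024, 1024, 8        # weight shape (d x k) and LoRA rank r
W = torch.randn(d, k)          # frozen pretrained weight
A = torch.randn(r, k) * 0.01   # trainable low-rank factor
B = torch.zeros(d, r)          # trainable factor, initialized to zero
x = torch.randn(k)

# Forward pass: frozen projection plus the low-rank correction
y = W @ x + B @ (A @ x)

# Only A and B are trained: far fewer parameters than the full matrix
print(f"full: {d * k:,} params  |  LoRA: {d * r + r * k:,} params")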

Setup and Installation

First, I need to install the necessary libraries: bitsandbytes, datasets, accelerate, and loralib, plus the development versions of peft and transformers straight from GitHub.

# Install quantization, dataset, and training utilities
!pip install -q bitsandbytes datasets accelerate loralib
# Install peft and transformers from source to get the latest LoRA support
!pip install -q git+https://github.com/huggingface/peft.git
!pip install -q git+https://github.com/huggingface/transformers.git

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # train on the first GPU only

Model Preparation

Load the bloom-3b model and tokenizer, and prepare the model for fine-tuning.

import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

# Load the base model in half precision, placed automatically across available devices
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-3b",
    torch_dtype=torch.float16,
    device_map='auto'
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")

# Freeze every base-model parameter; only the LoRA adapters will be trained.
# Keep small 1-D parameters (layer norms, biases) in fp32 for numerical stability.
for param in model.parameters():
    param.requires_grad = False
    if param.ndim == 1:
        param.data = param.data.to(torch.float32)

# Trade compute for memory, and make checkpointed inputs carry gradients
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# Cast the language-model head's output to fp32 so the loss is computed in fp32
class CastOutputFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputFloat(model.lm_head)

LoRA Configuration

Configure LoRA by defining a LoraConfig (rank, scaling factor, target modules) and wrapping the model with get_peft_model.

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)

def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"trainable parameters: {trainable_params} || all parameters: {all_param} || percentage: {trainable_params/all_param*100:.2f}%")

print_trainable_parameters(model)
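
For reference, recent versions of peft also expose an equivalent helper directly on the wrapped model, so the custom function above is mainly for illustration:

model.print_trainable_parameters()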

Dataset Preparation

Load the SQuAD v2 dataset and preprocess it for training.

from datasets import load_dataset

qa_dataset = load_dataset("squad_v2")

def create_prompt(context, question, answer):
    # SQuAD v2 includes unanswerable questions; give those an explicit label
    if len(answer["text"]) < 1:
        answer_text = "Cannot answer"
    else:
        answer_text = answer["text"][0]
    prompt_template = f"### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n{answer_text}</s>"
    return prompt_template

# Tokenize each example as a single prompt string (per-example, non-batched map)
mapped_qa_dataset = qa_dataset.map(
    lambda samples: tokenizer(
        create_prompt(samples['context'], samples['question'], samples['answers'])))
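
Before training, it is worth printing one formatted prompt to confirm the template looks as intended. This quick check is optional and simply reuses the helper above on the first training example:

# Sanity check: show how one training example is rendered into a prompt
sample = qa_dataset["train"][0]
print(create_prompt(sample["context"], sample["question"], sample["answers"]))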

Training the Model

Set up the training arguments and train the model using the Trainer class from Hugging Face Transformers.

import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=mapped_qa_dataset["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # effective batch size of 16
        warmup_steps=100,
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        output_dir='outputs',
    ),
    # Causal LM objective: the collator pads the batch and copies input_ids to labels
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False  # caching conflicts with gradient checkpointing; re-enable for inference
trainer.train()
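
After training, the adapter can also be saved to a local directory (only the small LoRA weights and their config are written, not the 3B base model). The directory name here is just an example:

# Save only the LoRA adapter weights and config locally (a few MB)
model.save_pretrained("outputs/lora-adapter")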

Model Deployment

Log in to Hugging Face and push the fine-tuned LoRA adapter to the Hugging Face Hub.

HUGGING_FACE_USER_NAME = "Mohammedxo51"

from huggingface_hub import notebook_login
notebook_login()

model_name = "squad-bloom-3b"

model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/{model_name}", use_auth_token=True)

Inference

Load the fine-tuned model and tokenizer, and perform inference to answer questions based on provided context.

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=False,
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

qa_model = PeftModel.from_pretrained(model, peft_model_id)
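
Before generating, it is good practice to switch the model to evaluation mode so that dropout in the LoRA layers is disabled:

qa_model.eval()  # disable dropout for deterministic inference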

from IPython.display import display, Markdown

def make_inference(context, question):
    prompt = f"### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(qa_model.device) for k, v in inputs.items()}

    with torch.cuda.amp.autocast():
        output_tokens = qa_model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)

    answer = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
    display(Markdown(answer))

context = "Some context"
question = "A question about the context?"
make_inference(context, question)
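
If a standalone checkpoint is preferred for serving (so the adapter no longer has to be attached at load time), the LoRA weights can be merged back into the base model. A minimal sketch; the output directory name is illustrative:

# Fold the LoRA weights into the base model and save a standalone copy
merged_model = qa_model.merge_and_unload()
merged_model.save_pretrained("bloom-3b-squad-merged")
tokenizer.save_pretrained("bloom-3b-squad-merged")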

Conclusion

In this blog post, I covered the steps to fine-tune a large language model using LoRA. I demonstrated how to set up the environment, prepare the model, configure LoRA, preprocess the dataset, train the model, deploy it to the Hugging Face Hub, and perform inference. This approach allows for efficient fine-tuning with a significantly reduced number of trainable parameters.