In this blog post, I will explore how to fine-tune a large language model using LoRA (Low-Rank Adaptation). I will use the bigscience/bloom-3b model from Hugging Face and fine-tune it on the SQuAD v2 question-answering dataset.
Setup and Installation
First, I need to install the necessary libraries: bitsandbytes, datasets, accelerate, loralib, and peft.
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/peft.git
!pip install -q git+https://github.com/huggingface/transformers.git
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
Model Preparation
Load the bloom-3b model and tokenizer, then prepare the model for fine-tuning: freeze the base weights, cast the layer norms to float32 for numerical stability, enable gradient checkpointing to save memory, and cast the output of the language-model head to float32.
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-3b",
    torch_dtype=torch.float16,
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
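Note that bitsandbytes is installed and imported above but is not otherwise used in this float16 setup. If GPU memory is tight, it can instead be used to load the base model with 8-bit weights. A hedged alternative to the from_pretrained call above (model_8bit is just an illustrative name; the rest of this post sticks with the float16 model):

# Optional: 8-bit loading via bitsandbytes to reduce GPU memory usage
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-3b",
    load_in_8bit=True,
    device_map='auto'
)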
# Freeze all base-model parameters; the LoRA adapters added later will be the only trainable weights
for param in model.parameters():
    param.requires_grad = False
    if param.ndim == 1:
        # Cast 1-D parameters (layer norms, biases) to float32 for numerical stability
        param.data = param.data.to(torch.float32)

# Reduce activation memory during training and make inputs require grads for checkpointing
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# Cast the output of the language-model head to float32 for a stable loss computation
class CastOutputFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputFloat(model.lm_head)
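As an aside, recent peft versions bundle most of these preparation steps into a single helper. Depending on your peft version the function is called prepare_model_for_kbit_training (older releases name it prepare_model_for_int8_training); a minimal sketch of using it instead of the manual steps above:

from peft import prepare_model_for_kbit_training

# Roughly equivalent to the manual freezing / casting / gradient-checkpointing setup above
model = prepare_model_for_kbit_training(model)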
LoRA Configuration
Configure the model to use LoRA for fine-tuning. This involves setting up a LoraConfig and wrapping the model with get_peft_model.
from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"trainable parameters: {trainable_params} || all parameters: {all_param} || percentage: {trainable_params/all_param*100:.2f}%")
print_trainable_parameters(model)
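If you prefer not to write this helper yourself, the PEFT-wrapped model also exposes an equivalent method in recent peft releases, which prints the same kind of summary:

model.print_trainable_parameters()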
Dataset Preparation
Load the SQuAD v2 dataset and preprocess it for training. Each example is turned into a single prompt string containing the context, question, and answer; unanswerable questions (SQuAD v2 includes these) map to "Cannot answer".
from datasets import load_dataset
qa_dataset = load_dataset("squad_v2")
def create_prompt(context, question, answer):
    if len(answer["text"]) < 1:
        answer_text = "Cannot answer"
    else:
        answer_text = answer["text"][0]
    prompt_template = f"### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n{answer_text}</s>"
    return prompt_template

mapped_qa_dataset = qa_dataset.map(
    lambda samples: tokenizer(
        create_prompt(samples['context'], samples['question'], samples['answers'])
    )
)
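To sanity-check the prompt formatting before training, it can help to print one example. A quick sketch using the first training row (field names as provided by squad_v2):

sample = qa_dataset["train"][0]
print(create_prompt(sample["context"], sample["question"], sample["answers"]))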
Training the Model
Set up the training arguments and train the model using the Trainer class from Hugging Face Transformers.
import transformers
trainer = transformers.Trainer(
    model=model,
    train_dataset=mapped_qa_dataset["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,  # equals max_steps, so the LR is still warming up for this entire short run
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        output_dir='outputs',
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # avoids warnings with gradient checkpointing; re-enable for inference
trainer.train()
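After training, the LoRA adapter can also be saved locally. With peft, save_pretrained writes only the small adapter files rather than a full model checkpoint (the directory name below is just an example):

model.save_pretrained("squad-bloom-3b-lora")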
Model Deployment
Log in to Hugging Face and push the fine-tuned adapter to the Hugging Face Hub. Because the PEFT model saves only the LoRA adapter weights, the upload is on the order of megabytes rather than the full 3B-parameter checkpoint.
HUGGING_FACE_USER_NAME = "Mohammedxo51"
from huggingface_hub import notebook_login
notebook_login()
model_name = "squad-bloom-3b"
model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/{model_name}", use_auth_token=True)
Inference
Load the base model and tokenizer, attach the fine-tuned LoRA adapter from the Hub, and run inference to answer questions based on a provided context.
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=False, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
qa_model = PeftModel.from_pretrained(model, peft_model_id)
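Depending on the peft version, the adapter can also be folded into the base weights so inference runs without the PEFT wrapper; a hedged sketch:

qa_model = qa_model.merge_and_unload()  # merges the LoRA weights into the base model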
from IPython.display import display, Markdown
def make_inference(context, question):
    prompt = f"### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(qa_model.device) for k, v in inputs.items()}
    with torch.cuda.amp.autocast():
        output_tokens = qa_model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)
    answer = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
    display(Markdown(answer))
context = "Some context"
question = "A question about the context?"
make_inference(context, question)
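For a concrete run, here is a hypothetical example (the passage and question below are invented purely for illustration):

context = "The Eiffel Tower was completed in 1889 and stands in Paris, France."
question = "When was the Eiffel Tower completed?"
make_inference(context, question)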
Conclusion
In this blog post, I covered the steps to fine-tune a large language model using LoRA. I demonstrated how to set up the environment, prepare the model, configure LoRA, preprocess the dataset, train the model, deploy it to the Hugging Face Hub, and perform inference. This approach allows for efficient fine-tuning with a significantly reduced number of trainable parameters.