Fine-Tuning Google's Gemma

Introduction

In this blog post, I will explore the process of fine-tuning a language model using Low-Rank Adaptation (LoRA). I will cover everything from setting up the environment to training and evaluating the model on a dataset of quotes.

Setting Up the Environment

First, I need to install the necessary libraries.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

Loading the Model and Tokenizer

Next, I will load the tokenizer and set up the model. I am using AutoTokenizer and AutoModelForCausalLM from the Hugging Face transformers library; the model itself is loaded in the next section, once the 4-bit quantization config it depends on has been defined.

import os
import transformers
import torch
from datasets import load_dataset
from google.colab import userdata
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Assumes the Hugging Face token is stored as a Colab secret named HF_TOKEN.
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])

Quantization Configuration

I will configure the model to use 4-bit quantization via bitsandbytes, which makes it possible to run larger models on smaller hardware, and then load the model with that configuration.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=os.environ['HF_TOKEN']
)
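
As an optional sanity check, we can print the quantized model's approximate memory footprint using the get_memory_footprint() method that transformers exposes on loaded models:

# Rough sanity check: report the quantized model's memory footprint in GB.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")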

Generating Text with the Model

Before we start training, let's generate some text to see how the model performs out of the box.

text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Configuring LoRA

LoRA (Low-Rank Adaptation) makes fine-tuning efficient by freezing the pretrained weights and training only small, low-rank adapter matrices added to selected layers; here, the attention and MLP projection layers.

# Disable Weights & Biases logging for this run.
os.environ["WANDB_DISABLED"] = "true"

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
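
To get a feel for how few parameters this trains, here is a back-of-the-envelope sketch for a single linear layer; the dimensions below are illustrative placeholders, not Gemma's actual shapes:

# For a weight of shape (d_out, d_in), LoRA with rank r adds r * (d_in + d_out)
# trainable parameters (matrix A: r x d_in, matrix B: d_out x r).
d_in, d_out, r = 2048, 2048, 8   # hypothetical dimensions for illustration
full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"Full layer: {full_params:,} params, LoRA adapters: {lora_params:,} params")
print(f"Fraction trained: {lora_params / full_params:.2%}")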

Preparing the Dataset

I will use the datasets library to load the dataset of quotes, and define a formatting function that turns each example into a single training string of the form "Quote: <quote> Author: <author>".

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

def formatting_func(example):
    text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"
    return [text]
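
As a quick check, we can run the formatting function on the first training example; the slice below is just one way to pass a one-element batch:

# Preview how a single training example is formatted.
print(formatting_func(data["train"][:1]))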

Training the Model

I use the SFTTrainer from the trl library to train the model with the LoRA configuration.

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=10,  # keep warmup shorter than max_steps so the learning rate reaches its peak
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
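
Because SFTTrainer applies the LoRA config itself, the wrapped model exposes PEFT's parameter summary; printing it is a quick way to confirm that only the adapter weights are trainable:

# Confirm that only the LoRA adapter weights are trainable.
trainer.model.print_trainable_parameters()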

trainer.train()
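
If you want to keep the result, a minimal option is to save the LoRA adapter weights after training; the output path below is just a placeholder:

# Save only the LoRA adapter weights (a few MB), not the full base model.
trainer.model.save_pretrained("outputs/gemma-2b-quotes-lora")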

Evaluating the Model

After training, I generate text from the same prompt again to see how the model's output has changed.

text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Conclusion

I walked through the process of fine-tuning a language model using LoRA. This method allows for efficient training by updating only small adapter matrices in specific layers of the model. I demonstrated how to set up the environment, configure the model, prepare the dataset, and train the model. With this approach, you can adapt large language models to specific tasks with limited computational resources.