Fine-Tuning Google's Gemma

Introduction

In this blog post, I will explore the process of fine-tuning a language model using Low-Rank Adaptation (LoRA). I will cover everything from setting up the environment to training and evaluating the model on a dataset of quotes.

Setting Up the Environment

First, I need to install the necessary libraries.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

Loading the Model and Tokenizer

Next, I will load the tokenizer and set up the model. I am using AutoTokenizer and AutoModelForCausalLM from the Hugging Face transformers library; the model itself is loaded in the next section, once the 4-bit quantization config it depends on has been defined.

import os
import transformers
import torch
from datasets import load_dataset
from google.colab import userdata
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Assumes the Hugging Face token is stored as a Colab secret named HF_TOKEN.
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])

Quantization Configuration

I will configure the model to use 4-bit quantization via bitsandbytes, which makes it possible to run larger models on smaller hardware, and then load the model with that configuration.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=os.environ['HF_TOKEN']
)
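
As an optional sanity check, we can print the quantized model's approximate memory footprint using the get_memory_footprint() method that transformers exposes on loaded models:

# Rough sanity check: report the quantized model's memory footprint in GB.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")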

Generating Text with the Model

Before we start training, let's generate some text to see how the model performs out of the box.

text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Configuring LoRA

LoRA (Low-Rank Adaptation) makes fine-tuning efficient by freezing the pretrained weights and training only small, low-rank adapter matrices added to selected layers; here, the attention and MLP projection layers.

# Disable Weights & Biases logging for this run.
os.environ["WANDB_DISABLED"] = "true"

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
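
To get a feel for how few parameters this trains, here is a back-of-the-envelope sketch for a single linear layer; the dimensions below are illustrative placeholders, not Gemma's actual shapes:

# For a weight of shape (d_out, d_in), LoRA with rank r adds r * (d_in + d_out)
# trainable parameters (matrix A: r x d_in, matrix B: d_out x r).
d_in, d_out, r = 2048, 2048, 8   # hypothetical dimensions for illustration
full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"Full layer: {full_params:,} params, LoRA adapters: {lora_params:,} params")
print(f"Fraction trained: {lora_params / full_params:.2%}")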

Preparing the Dataset

I will use the datasets library to load the dataset of quotes, and define a formatting function that turns each example into a single training string of the form "Quote: <quote> Author: <author>".

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

def formatting_func(example):
    text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"
    return [text]
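
As a quick check, we can run the formatting function on the first training example; the slice below is just one way to pass a one-element batch:

# Preview how a single training example is formatted.
print(formatting_func(data["train"][:1]))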

Training the Model

I use the SFTTrainer from the trl library to train the model with the LoRA configuration.

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=10,  # keep warmup shorter than max_steps so the learning rate reaches its peak
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
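
Because SFTTrainer applies the LoRA config itself, the wrapped model exposes PEFT's parameter summary; printing it is a quick way to confirm that only the adapter weights are trainable:

# Confirm that only the LoRA adapter weights are trainable.
trainer.model.print_trainable_parameters()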

trainer.train()
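
If you want to keep the result, a minimal option is to save the LoRA adapter weights after training; the output path below is just a placeholder:

# Save only the LoRA adapter weights (a few MB), not the full base model.
trainer.model.save_pretrained("outputs/gemma-2b-quotes-lora")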

Evaluating the Model

After training, I generate text from the same prompt again to see how the model's output has changed.

text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Conclusion

I walked through the process of fine-tuning a language model using LoRA. This method allows for efficient training by updating only small adapter matrices in specific layers of the model. I demonstrated how to set up the environment, configure the model, prepare the dataset, and train the model. With this approach, you can adapt large language models to specific tasks with limited computational resources.