In this blog post, I will walk through the process of fine-tuning a language model using Low-Rank Adaptation (LoRA). I will cover everything from setting up the environment to training and evaluating the model on a dataset of quotes.
Setting Up the Environment
First, I need to install the necessary libraries.
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
Loading the Model and Tokenizer
Next, I will set up the tokenizer and model. I am using the AutoTokenizer and AutoModelForCausalLM classes from the Hugging Face transformers library. The model itself is loaded right after the quantization configuration in the next section, since from_pretrained needs that configuration.
import os
import transformers
import torch
from datasets import load_dataset
from google.colab import userdata
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
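The calls below read a Hugging Face access token from the HF_TOKEN environment variable (Gemma is a gated model). One way to set it in Colab, and the reason for the userdata import above, is to pull it from the notebook's Secrets panel. This is a sketch that assumes you have stored a secret named HF_TOKEN there:
# Assumes a Colab secret named "HF_TOKEN" exists; userdata.get() reads it at runtime.
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')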
model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
Quantization Configuration
I will configure the model to use 4-bit quantization via bitsandbytes, which allows us to run larger models on smaller hardware. Because from_pretrained needs this configuration, I define bnb_config first and then load the model with it.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"": 0}, token=os.environ['HF_TOKEN'])
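As a quick sanity check that the 4-bit load worked, you can print the model's approximate weight footprint. get_memory_footprint() is a standard transformers helper; the exact number will depend on your runtime, and this check is my own addition rather than part of the original walkthrough.
# Rough size of the quantized weights in GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")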
Generating Text with the Model
Before we start training, let's generate some text to see how the model performs out of the box.
text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Configuring LoRA
LoRA (Low-Rank Adaptation) makes fine-tuning efficient by freezing the original weights and injecting small, trainable low-rank matrices into selected layers (the attention and MLP projections listed in target_modules below), so only a tiny fraction of the parameters is updated.
os.environ["WANDB_DISABLED"] = "false"  # "false" leaves Weights & Biases reporting enabled; set it to "true" to turn logging off
lora_config = LoraConfig(
    r=8,  # rank of the adaptation matrices
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
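To get a feel for why this is cheap, here is a quick back-of-the-envelope calculation; the 2048x2048 matrix is just an illustrative size, not tied to Gemma's exact layer shapes.
# A full fine-tune of one 2048x2048 projection updates ~4.2M weights;
# a rank-8 LoRA update W + B @ A only trains B (2048x8) and A (8x2048).
d_out, d_in, r = 2048, 2048, 8
full_params = d_out * d_in
lora_params = r * (d_out + d_in)
print(full_params, lora_params, f"{100 * lora_params / full_params:.2f}% of the full matrix")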
Preparing the Dataset
I will use the datasets library to load and preprocess the dataset of quotes.
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
def formatting_func(example):
    # Builds a single "Quote: ...\nAuthor: ..." string from the first example in each batch.
    text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"
    return [text]
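It can help to eyeball what this produces before training. This quick check is my own addition and simply prints the formatted string for the first training example:
sample = data["train"][:1]           # a batch containing one example
print(formatting_func(sample)[0])    # e.g. Quote: ...\nAuthor: ...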
Training the Model
I use the SFTTrainer from the trl library to train the model with the LoRA configuration.
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,   # effective batch size of 4
        warmup_steps=200,                # note: exceeds max_steps, so the LR is still warming up when training stops
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"         # memory-efficient optimizer suited to quantized training
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
trainer.train()
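The post does not persist anything to disk. If you want to reuse the fine-tuned adapter later, a minimal sketch (the directory name is my own choice) is to save the LoRA weights and tokenizer:
# Saves only the small LoRA adapter (adapter config + weights), not the full base model.
trainer.model.save_pretrained("gemma-2b-quotes-lora")
tokenizer.save_pretrained("gemma-2b-quotes-lora")
The saved adapter can later be attached to a freshly loaded base model with the peft library.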
Evaluating the Model
After training, I generate text from the same prompt again to see how the model's output has changed.
text = "Quote: add quote,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
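Because the training examples follow the "Quote: ...\nAuthor: ..." template from formatting_func, prompts in that same shape tend to give the clearest results. As a quick illustration (this particular quote is simply a well-known one from the dataset's domain, not a prompt from the original post):
text = "Quote: Be yourself; everyone else is already taken.\nAuthor:"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))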
Conclusion
I walked through the process of fine-tuning a language model using LoRA. This method keeps training efficient by updating only small low-rank adapter matrices attached to specific layers of the model. I demonstrated how to set up the environment, configure the model, prepare the dataset, and train the model. With this approach, you can adapt large language models to specific tasks with limited computational resources.