In this tutorial, I will walk through the steps to fine-tune Llama 2 using the Hugging Face Transformers library, along with LoRA (Low-Rank Adaptation) to make the process more memory- and compute-efficient.
Setting Up the Environment
Start by setting up the environment and importing the necessary libraries.
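If the libraries are not installed yet, something along these lines should cover the dependencies used in this tutorial (the list is deliberately unpinned; pick versions that match your CUDA and GPU setup):
pip install -q accelerate bitsandbytes datasets peft transformers trl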
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
HfArgumentParser,
TrainingArguments,
pipeline,
logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
Loading the Dataset
Load the dataset that will be used for fine-tuning.
dataset_name = "mlabonne/guanaco-llama2-1k"
dataset = load_dataset(dataset_name, split="train")
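It is worth taking a quick look at the data. The dataset exposes a single "text" column containing examples already formatted as Llama 2 chat prompts, which is exactly what the trainer will consume later:
# Quick sanity check: dataset size and one formatted training example
print(dataset)
print(dataset[0]["text"][:500])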
Configuring the Model for Fine-Tuning
BitsAndBytes Configuration
Configure bitsandbytes to load the base model with 4-bit NF4 quantization. Combining a 4-bit base model with LoRA adapters (the QLoRA recipe) is what keeps fine-tuning a 7B model within the memory budget of a single GPU.
model_name = "NousResearch/Llama-2-7b-hf"
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
load_in_4bit=use_4bit,
bnb_4bit_quant_type=bnb_4bit_quant_type,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=use_nested_quant,
)
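Optionally, check whether your GPU can use bfloat16. Ampere-class cards and newer (compute capability 8.0 and above) support it, which is relevant for the bf16 flag set in the training arguments further below:
# Optional: Ampere and newer GPUs (compute capability >= 8) support bfloat16
if use_4bit and torch.cuda.is_available():
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("Your GPU supports bfloat16: consider bf16=True to speed up training")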
Loading the Base Model and Tokenizer
Load the 4-bit quantized base model (mapped entirely to GPU 0 via device_map) and its tokenizer.
device_map = {"": 0}
# Load base model
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
device_map=device_map
)
model.config.use_cache = False       # the KV cache only helps generation, not training
model.config.pretraining_tp = 1      # use the standard (non tensor-parallel) forward pass
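To confirm how much memory the quantized model actually occupies, Transformers models expose a rough footprint estimate:
# Rough memory footprint of the 4-bit model, in GB
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")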
# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token   # LLaMA has no pad token, so reuse EOS for padding
tokenizer.padding_side = "right"            # right-padding avoids overflow issues with fp16 training
Applying LoRA Configuration
Configure LoRA for parameter-efficient fine-tuning: r is the rank of the low-rank update matrices, lora_alpha scales their contribution to the adapted layers, and lora_dropout regularizes the adapter weights.
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
peft_config = LoraConfig(
lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
r=lora_r,
bias="none",
task_type="CAUSAL_LM",
)
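No target_modules are listed here, so PEFT falls back to its default mapping for LLaMA-style models, which adapts only the attention query and value projections. If you want LoRA applied more broadly, you could instead build the config with the projection layers named explicitly (standard LLaMA module names shown below):
# Optional: apply LoRA to all attention and MLP projection layers
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)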
Defining Training Arguments
Set up the training arguments for the fine-tuning run. Note that bf16=True assumes an Ampere-class or newer GPU; on older cards, set bf16=False and fp16=True instead, and consider matching bnb_4bit_compute_dtype to whichever precision you pick.
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = True
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
save_steps = 0
logging_steps = 25
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "cosine"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
training_arguments = TrainingArguments(
output_dir=output_dir,
num_train_epochs=num_train_epochs,
per_device_train_batch_size=per_device_train_batch_size,
gradient_accumulation_steps=gradient_accumulation_steps,
optim=optim,
save_steps=save_steps,
logging_steps=logging_steps,
learning_rate=learning_rate,
weight_decay=weight_decay,
fp16=fp16,
bf16=bf16,
max_grad_norm=0.3,
max_steps=max_steps,
warmup_ratio=warmup_ratio,
group_by_length=group_by_length,
lr_scheduler_type=lr_scheduler_type,
report_to="tensorboard"
)
Fine-Tuning the Model
Set up the trainer and start the fine-tuning process. (The keyword arguments below follow the older SFTTrainer API; more recent trl releases expect options such as dataset_text_field, max_seq_length, and packing to be passed through an SFTConfig instead.)
max_seq_length = None
packing = False
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_config,
dataset_text_field="text",
max_seq_length=max_seq_length,
tokenizer=tokenizer,
args=training_arguments,
packing=packing,
)
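Before calling trainer.train(), you can check how small the trainable portion of the model actually is; at this point SFTTrainer has already wrapped the base model with the LoRA adapters:
# Optional: report how many parameters LoRA actually trains
trainer.model.print_trainable_parameters()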
trainer.train()
Saving the Fine-Tuned Model
Save the result for future use. Since training only updated the LoRA adapter, save_pretrained writes just the adapter weights rather than a full copy of the model.
new_model = "Llama-2-7b-chat-finetune"
trainer.model.save_pretrained(new_model)
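Because only the adapter was saved, the result is not a standalone checkpoint. To produce one, reload the base model in half precision and merge the adapter into it, which is what the PeftModel import is for. A minimal sketch, assuming you have freed the VRAM used during training (for example by restarting the session) and have enough memory to hold the 7B model in fp16:
# Reload the base model in fp16 and merge the LoRA adapter weights into it
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
merged_model = PeftModel.from_pretrained(base_model, new_model)
merged_model = merged_model.merge_and_unload()

# Save the merged model together with the tokenizer
# (the output directory name is just an example)
merged_model.save_pretrained(new_model + "-merged")
tokenizer.save_pretrained(new_model + "-merged")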
Testing the Fine-Tuned Model
Generate text with the fine-tuned model as a quick qualitative check. The prompt is wrapped in the same [INST] ... [/INST] format used by the training data.
logging.set_verbosity(logging.CRITICAL)
prompt = "How to fly a plane?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
Conclusion
I have now fine-tuned Llama 2 using LoRA together with the Hugging Face Transformers, PEFT, and TRL libraries. Thanks to 4-bit quantization and low-rank adapters, this kind of adaptation remains practical even on a single GPU with modest memory.