Fine-Tuning Meta's LLaMA 3.1 8B with Unsloth
In this post, I will walk through how to fine-tune Meta's LLaMA 3.1 8B model using Unsloth, a library optimized for efficient LLM training. I will cover everything from installing dependencies to training and saving the fine-tuned model.
1. Setting Up the Environment
Before we begin fine-tuning, we need to install the required packages:
%%capture
!pip install unsloth
!pip install datasets
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
- unsloth is the primary library used for loading and fine-tuning the LLaMA model efficiently.
- datasets helps us handle and preprocess text datasets.
- We uninstall and reinstall unsloth from its latest GitHub version (with --no-deps, so the dependencies pulled in by the first install are kept) to ensure we have the newest features and bug fixes. An optional environment check is sketched below.
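Before loading an 8B model, it can be worth confirming that a GPU is actually visible. This is a minimal, optional sketch using only standard PyTorch calls; it is not part of the original setup:

import torch

# Confirm a CUDA GPU is visible; fine-tuning an 8B model on CPU is impractical.
print(torch.__version__, torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))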
2. Model Configuration and Loading
from unsloth import FastLanguageModel
import torch
# Configuration
max_seq_length = 8192 # Setting the context length to 8192
dtype = torch.bfloat16
load_in_4bit = False
model_name = "unsloth/Meta-Llama-3.1-8B"
# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
- We define a sequence length of 8192, which means the model can process long-context data.
- dtype = torch.bfloat16 sets bfloat16 as the precision type, reducing memory usage relative to float32; a sketch for choosing this based on GPU support follows this list.
- load_in_4bit = False keeps the weights in 16-bit precision; setting it to True would load a 4-bit quantized model to save further memory.
- The model and tokenizer are loaded using FastLanguageModel.from_pretrained(), which fetches Meta LLaMA 3.1 8B.
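Hardware support for bfloat16 varies (it requires an Ampere-class GPU or newer), so instead of hard-coding the dtype, an optional one-line sketch can pick it automatically; this is a convenience I am assuming here, not part of the original configuration:

# Fall back to float16 on GPUs without bfloat16 support.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16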
3. Applying LoRA for Efficient Fine-Tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
We apply LoRA (Low-Rank Adaptation) to reduce the number of trainable parameters:
- r = 16: the rank of the LoRA update matrices (a trade-off between memory and adaptability).
- lora_alpha = 16: a scaling factor for the LoRA layers.
- target_modules: the attention and MLP projection layers that receive LoRA adapters.
- use_gradient_checkpointing = "unsloth": reduces memory usage during training by recomputing activations in the backward pass.
- This allows efficient fine-tuning without modifying the entire model; the sketch below verifies how few parameters remain trainable.
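To see the effect of LoRA concretely, we can count trainable versus total parameters with plain PyTorch (a minimal sketch; with r = 16 the trainable share is only a small fraction of the 8B weights):

# Count parameters that require gradients (the LoRA adapters) vs. the total.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")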
4. Loading and Preprocessing the Dataset
from datasets import Dataset
# Load dataset from JSON file
dataset = Dataset.from_json("dataset.json")
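The exact contents of dataset.json depend on your data; the formatting step in the next section assumes each record has "prompt" and "response" fields. Here is a hypothetical example of that structure (the records are illustrative only, not from the original post):

import json

# Illustrative records only; a real dataset.json would contain your own data.
sample = [
    {"prompt": "What is the capital of France?",
     "response": "The capital of France is Paris."},
    {"prompt": "Explain gradient checkpointing in one sentence.",
     "response": "Gradient checkpointing saves memory by recomputing activations during the backward pass instead of storing them."},
]
with open("dataset.json", "w") as f:
    json.dump(sample, f, indent=2)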
5. Formatting the Dataset
custom_prompt = """Below is a prompt and its corresponding response. Write a completion that adheres to the response.
### Prompt:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    responses = examples["response"]
    texts = []
    for prompt, response in zip(prompts, responses):
        text = custom_prompt.format(prompt, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True)
- We format the dataset into a prompt-response structure, appending the EOS token at the end of each example so the model learns where a completion ends.
- This function ensures that our training data follows a consistent structure for fine-tuning; a quick way to verify this is shown below.
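An optional sanity check is to print the first formatted example and confirm the template and EOS token were applied as expected:

# Inspect one formatted training example.
print(dataset[0]["text"])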
6. Tokenizing the Dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=8192)
tokenized_dataset = dataset.map(tokenize_function, batched=True)
# Save tokenized dataset for fine-tuning
tokenized_dataset.save_to_disk("tokenized_dataset")
print("Dataset preprocessing complete. Ready for fine-tuning!")
- The function tokenizes the formatted dataset, truncating longer inputs to the 8192-token limit and padding shorter ones to a uniform length.
- Note that SFTTrainer below also receives dataset_text_field="text" and tokenizes that field itself, so this step mainly serves to validate lengths and keep a preprocessed copy on disk.
- The dataset is then saved for training. A sketch for checking raw token lengths follows this list.
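To know whether the 8192-token limit actually truncates anything, a short optional sketch can report token lengths before padding (this re-tokenizes the text, so it may take a moment on large datasets):

# Token lengths without padding, to spot examples longer than the limit.
lengths = [len(tokenizer(t)["input_ids"]) for t in dataset["text"]]
print(f"Longest example: {max(lengths)} tokens; over 8192: {sum(l > 8192 for l in lengths)}")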
7. Fine-Tuning the Model
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=2,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)
trainer_stats = trainer.train()
- We use SFTTrainer (supervised fine-tuning) from trl to manage training.
- gradient_accumulation_steps=4 accumulates gradients over four forward passes, so the effective batch size is 2 × 4 = 8 while per-step memory stays low.
- learning_rate=2e-4 sets the learning rate for gradual updates.
- The 8-bit optimizer adamw_8bit is used for memory efficiency.
- The model is trained for 2 epochs with a batch size of 2 per GPU; a sketch for checking peak GPU memory after the run follows this list.
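After training, it can be useful to see how much GPU memory the run actually needed, for example when sizing future runs. This optional sketch uses standard PyTorch memory statistics:

# Peak GPU memory reserved during training, in GiB.
print(f"Peak reserved memory: {torch.cuda.max_memory_reserved() / 1024**3:.2f} GiB")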
8. Saving the Fine-Tuned Model
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")
- save_method="merged_16bit" merges the LoRA adapters back into the base weights and saves the result in 16-bit format to optimize storage.
- The model is now ready for inference and further deployment; a quick smoke test is sketched below.
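As a quick smoke test, we can generate a completion with the same prompt template used for training. This is a minimal sketch: the prompt text is illustrative, and FastLanguageModel.for_inference enables Unsloth's faster generation path on the in-memory model:

from unsloth import FastLanguageModel

# Switch the model into Unsloth's inference mode and generate one completion.
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    custom_prompt.format("What is the capital of France?", ""),
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))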
Conclusion
Fine-tuning LLaMA 3.1 with Unsloth offers a powerful and memory-efficient way to adapt LLMs for custom use cases. By using LoRA, structured dataset preparation, and an optimized training approach, we can achieve high-quality results with limited resources.