Starting with Large Language Models (LLMs) using Hugging Face and PyTorch
Getting started with AI can seem overwhelming, but Hugging Face makes it way easier. With their library, you can quickly load powerful language models, like GPT, and fine-tune them for your own projects—even if you’re new to the field. In this post, I’ll walk you through the basics of using Hugging Face so you can dive into AI with confidence.
An LLM (Large Language Model) is an advanced AI model that understands and generates human-like text. Models like GPT and BERT are trained on extensive text data, enabling them to perform various language tasks, such as answering questions, translating languages, and writing code. Hugging Face makes it easy to work with LLMs by offering pre-trained models that you can use right away or fine-tune for your specific projects.
Set up a Hugging Face account:
Hugging Face is a great starting point because it simplifies working with large language models. It offers thousands of ready-to-use models, an easy-to-install library, and tools that streamline the process of loading, fine-tuning, and deploying models—all without needing deep AI knowledge.
Go to the website at https://huggingface.co/ and create an account. You will get access to their Model Hub, Datasets, and Spaces.
Install the Hugging Face Transformers Library and PyTorch:
The Hugging Face Transformers library makes it simple to load, use, and fine-tune powerful language models like GPT and BERT. It connects you with thousands of pre-trained models directly from the Hugging Face Model Hub, ready for quick integration into your projects. With built-in support for PyTorch and TensorFlow, it’s an essential toolkit for efficient, streamlined AI development.
PyTorch and TensorFlow are the two most popular deep-learning frameworks. They help developers build, train, and deploy machine learning and AI models. PyTorch is great because it is very ‘Pythonic’: it feels like working with regular Python code, which makes debugging and getting started easy. TensorFlow scales better (it suits larger applications and more complex models) but has a steeper learning curve than PyTorch.
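To give a feel for what ‘Pythonic’ means, here are a few lines of plain PyTorch you can try once it’s installed in the next step; they read like ordinary Python:

import torch

x = torch.tensor([1.0, 2.0, 3.0])   # build a tensor straight from a Python list
y = x * 2 + 1                        # ordinary arithmetic, no graph or session setup
print(y)                             # tensor([3., 5., 7.])
print(y.sum().item())                # pull a plain Python float back out: 15.0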
Now we want to install the transformers library and PyTorch:

sudo pip install transformers
sudo pip install torch
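If you want a quick sanity check that both installs worked, printing the library versions is enough (the exact version numbers will vary):

python3 -c "import transformers, torch; print(transformers.__version__, torch.__version__)"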
Selecting a Model:
Now head to the Model Hub: https://huggingface.co/models and choose a pre-trained model such as GPT-2 for text generation or BERT for understanding-based tasks (I will be choosing GPT-2). Copy the model name or link for use in your code: openai-community/gpt2
https://huggingface.co/openai-community/gpt2
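If you’d like to confirm the model loads before writing any real code, the transformers pipeline helper is the quickest test. This is optional, since the rest of this post loads the model explicitly, and note the first call downloads the model weights:

from transformers import pipeline

generator = pipeline("text-generation", model="openai-community/gpt2")
print(generator("Hello, world!", max_length=30)[0]["generated_text"])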
Load and Use The Model:
Now we can create a basic code snippet to get started with a text generation model (GPT-2). The script begins by importing the necessary classes, GPT2LMHeadModel and GPT2Tokenizer, to handle text generation and tokenization. It then specifies the pre-trained model name (gpt2) and loads both the tokenizer and the model associated with that name. A function called generate_text encodes an input prompt into token IDs, generates a continuation of that text up to a specified maximum length, and decodes the output back into readable text. The script includes a main guard that sets a sample prompt (“Once upon a time”), calls the text generation function, and prints the result. This simple setup lets you experiment with text generation by modifying the prompt and model parameters as needed.
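Here is a minimal sketch of what that first version can look like, saved as text.py (the exact prompt and length are up to you):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

def generate_text(prompt, max_length=50):
    # Encode the prompt into token IDs the model understands
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    # Generate a continuation up to max_length tokens
    outputs = model.generate(inputs, max_length=max_length, pad_token_id=tokenizer.eos_token_id)
    # Decode the generated token IDs back into readable text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    prompt = "Once upon a time"
    print(generate_text(prompt))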
To make this more convenient, we can add #!/usr/bin/env python3 at the top of the program to specify the interpreter. Then, after making the script executable with chmod +x text.py, it runs with Python 3 by default, which makes things easier for us and for anyone else testing it. At this point we can already see the random text generation in action.
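Assuming the script is saved as text.py, the workflow looks like this:

chmod +x text.py    # make the script executable
./text.py           # runs with Python 3 thanks to the shebang line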
So let’s update our code to take in user input by importing the argparse library and adding a way to catch some errors and fall back to a ‘help’ menu of sorts. This way we can start asking the AI random things and see what it says back to us. Below is the updated code:
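This is a sketch of how mine is put together; the argument names and help text are just one way to do it:

#!/usr/bin/env python3
import argparse
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

def generate_text(prompt, max_length=50):
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_length=max_length, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def main():
    parser = argparse.ArgumentParser(description="Generate text with GPT-2.")
    parser.add_argument("prompt", type=str, help="The prompt the model should continue.")
    parser.add_argument("--max_length", type=int, default=50, help="Maximum length of the output.")
    args = parser.parse_args()
    try:
        print(generate_text(args.prompt, args.max_length))
    except Exception as err:
        # Crude error handling: report the problem, then show the help menu
        print(f"Error: {err}\n")
        parser.print_help()

if __name__ == "__main__":
    main()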
Now we can feed it more varied inputs in a more interactive way, with some basic “error handling” built in.
Building a Simple Text Generator with GPT-2 and PyTorch
The interesting thing about Hugging Face models like GPT-2 is that they already run on PyTorch under the hood, so everything we’ve done so far has been using PyTorch. We can make that more obvious by having our script interact more directly with PyTorch features, such as modifying the model structure or adjusting the processing pipeline.
With this script, we’ll show how to use GPT-2, one of the most popular language models, to generate text based on a prompt. Hugging Face’s transformers library makes it simple, and we’ll use PyTorch under the hood for processing. In this version, we also tune the generation parameters to improve the quality of the generated text and use the GPU if available.
Step 1: Import Libraries and Set Up Logging
#!/usr/bin/env python3
import argparse
import torch
import logging
from transformers import GPT2LMHeadModel, GPT2Tokenizer

logging.basicConfig(level=logging.ERROR)
Here, we import:
argparse: To handle command-line arguments (like setting the prompt text and length).
torch: PyTorch's core library, which powers many AI models and allows us to move models to the GPU.
logging: To suppress all but error-level messages from Hugging Face, making our script output cleaner.
Step 2: Load the Model and Tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
The model_name specifies which version of GPT-2 we want to use. We load two components from Hugging Face:
tokenizer: Converts text into tokens (IDs) that the model understands, and back to text (see the quick round trip after this list).
model: The actual GPT-2 model that generates text.
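To make the tokenizer step concrete, here is a quick round trip using the objects we just loaded (the exact IDs depend on the tokenizer, so treat them as illustrative):

ids = tokenizer.encode("Hello, world!")
print(ids)                    # a short list of integer IDs, e.g. [15496, 11, 995, 0]
print(tokenizer.decode(ids))  # back to "Hello, world!"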
Step 3: Move the Model to GPU or CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
This checks whether a GPU is available with torch.cuda.is_available(). If it is, the model is moved to the GPU for faster processing; otherwise it stays on the CPU.
Step 4: Define the Text Generation Function
def generate_text(tokenizer, prompt, max_length=100):
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs,
        max_length=max_length,
        num_return_sequences=1,
        temperature=0.9,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text
This function is where the magic happens:
Encode the Prompt: tokenizer.encode() converts the input text (our prompt) into tokens, which are then turned into PyTorch tensors and moved to the device (GPU or CPU).
Generate Text: model.generate() generates the continuation of our input text. Let’s break down the parameters we set:
max_length: Limits the output length.
temperature: Adds randomness to the choices the model makes (higher value = more random).
top_k and top_p: Control the diversity of the generated text by limiting which words are considered at each step.
repetition_penalty: Reduces repeated phrases.
do_sample: Enables sampling to make the generation non-deterministic.
pad_token_id: Ensures the model uses a padding token, which can prevent some warnings.
Decode the Output: tokenizer.decode() converts the generated tokens back into human-readable text.
Step 5: Main Function for Command-line Arguments
def main():
    parser = argparse.ArgumentParser(description="Generate text using a pre-trained GPT-2 model.")
    parser.add_argument("prompt", type=str, help="Enter the prompt you want the model to generate text from.")
    parser.add_argument("--max_length", type=int, default=100, help="Maximum length of the generated text (default: 100 tokens).")
    args = parser.parse_args()

    generated_text = generate_text(tokenizer, args.prompt, args.max_length)
    print("\nGenerated Text:\n", generated_text)

if __name__ == "__main__":
    main()
This part lets us use command-line arguments to set the prompt text and the max length for generation:
prompt: The text the model will complete.
max_length: Optional; controls the length of the generated text.
Running the Script
To run the script, navigate to the folder where it’s saved and use this command:

./text2.py "Hello, world!" --max_length 150
This generates a response based on the prompt "Hello, world!" and prints the generated text in the terminal. If you have a GPU, PyTorch will automatically use it, making the process much faster!
We’re definitely getting some interesting outputs now, and this time we’re more explicitly taking advantage of PyTorch’s functionality. Keep in mind this is not the most sophisticated model and I have never professionally worked with AI. This is the first time I’ve started experimenting with LLMs, and starting with Hugging Face was a great way to do so. You can access the source code for this whacky AI production here: https://github.com/CommodoreAlex/Python/blob/master/Hugs.py
Now you’ve got the basics down for building a simple text generator with GPT-2, PyTorch, and Hugging Face. From loading the model to customizing the way it responds to your prompts, you’re set to start exploring what AI text generation can really do. Try playing around with settings like temperature or top_k; you’ll see firsthand how small changes can make your model’s responses more creative or more focused.
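For example, with the model, tokenizer, and device from the script above, you can compare a focused run against a more creative one (the specific numbers here are just illustrative, not recommended values):

inputs = tokenizer.encode("The future of AI is", return_tensors="pt").to(device)

# Lower temperature and a small top_k keep the output conservative
focused = model.generate(inputs, max_length=60, do_sample=True,
                         temperature=0.3, top_k=20,
                         pad_token_id=tokenizer.eos_token_id)

# Higher temperature and a larger top_k let the model wander more
creative = model.generate(inputs, max_length=60, do_sample=True,
                          temperature=1.2, top_k=200,
                          pad_token_id=tokenizer.eos_token_id)

print("Focused :", tokenizer.decode(focused[0], skip_special_tokens=True))
print("Creative:", tokenizer.decode(creative[0], skip_special_tokens=True))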
Whether you’re into chatbots, content creation, or just curious to see what AI can do, this is a great starting point. The world of AI is massive, and there’s always something cool to learn or build, so don’t be afraid to experiment! Dive in, tweak those prompts, and let your creativity run wild. You’re just scratching the surface of what’s possible here. Happy experimenting, and enjoy the journey!