Natural Language Processing (NLP) has revolutionized how we interact with technology.
Do you remember when chatbots first appeared and sounded like robots? Thankfully, that’s in the past!
Transformer models have waved their magic wand and reshaped NLP tasks. But before you drop this post thinking “Geez, transformers are way too dense to learn”, bear with me. This is not another technical article trying to teach you the math behind this amazing technology. Instead, we’ll learn in practice what it can do for us.
With the Transformers Pipeline from Hugging Face, NLP tasks are easier than ever.
Let’s explore!
The Only Explanation About What a Transformer Is
Think of transformer models as the elite of the NLP world.
Transformers are powerful because of a mechanism called “self-attention,” which lets them decide which parts of an input sequence are the most important to focus on at any given time.
Ever heard of BERT, GPT, or RoBERTa? That’s them! BERT (Bidirectional Encoder Representations from Transformers) is a revolutionary Google AI language model from 2018 that understands text context by reading words from both left-to-right and right-to-left simultaneously.
Enough talk, let’s start diving into the transformers package [1].
Introduction to the Transformers Pipeline
The Transformers library offers a complete toolkit for training and running state-of-the-art pretrained models. The Pipeline class, which is our main subject, provides an easy-to-use interface for diverse tasks, e.g.:
- Text generation
- Image segmentation
- Speech recognition
- Document QA.
Preparation
Before starting, let’s run the basics and gather our tools. We’ll need Python, the transformers library, and either PyTorch or TensorFlow as the backend. Installation is business as usual: pip install transformers.
Platforms like Google Colab already ship these as part of the standard installation, and Python distributions like Anaconda make them easy to add. No trouble.
The Pipeline class allows you to execute many machine learning tasks using any model available on the Hugging Face Hub. It is as simple as plugging and playing.
While every task comes with a pre-configured default model and preprocessor, you can easily customize this by using the model parameter to swap in a different model of your choice.
Code
Let’s begin with the transformers 101 and see how it works before we get any deeper. The first task we will perform is a simple sentiment analysis on any given news headline.
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("Instagram wants to limit hashtag spam.")
The response is the following.
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
[{'label': 'NEGATIVE', 'score': 0.988932728767395}]
Since we did not supply a model parameter, the pipeline went with the default option. The model classified this headline as NEGATIVE with a score of about 0.989, i.e. roughly 99% confidence. We could also pass a list of sentences to classify, not just one.
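To make the list case concrete, here is a minimal sketch. The helper names format_predictions and classify_headlines are made up for this post and are not part of the library. The formatting helper runs on the prediction shown above, so nothing is downloaded unless you call classify_headlines yourself.

```python
def format_predictions(texts, predictions):
    """Pair each input text with its predicted label and a confidence percentage."""
    return [
        f"{text} -> {pred['label']} ({100 * pred['score']:.1f}%)"
        for text, pred in zip(texts, predictions)
    ]


def classify_headlines(headlines):
    """Run the default sentiment pipeline on a list of strings (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helper above works on its own

    classifier = pipeline("sentiment-analysis")
    return classifier(headlines)  # one {'label': ..., 'score': ...} dict per input string


# Formatting the single prediction from the example above, without re-running the model
headline = ["Instagram wants to limit hashtag spam."]
prediction = [{"label": "NEGATIVE", "score": 0.988932728767395}]
print(format_predictions(headline, prediction))
```

With a real list, the two helpers chain together as format_predictions(headlines, classify_headlines(headlines)).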
Super easy, right? But that’s not all. We can keep exploring other cool functionalities.
Zero-Shot Classification
Zero-shot classification means labelling text with categories the model was never explicitly trained on, so there are no labelled examples to learn the pattern from. All we need to do is pass a few candidate classes, and the model scores each one. This can be very useful when creating training datasets for machine learning.
This time, we pass the model argument explicitly, along with a list of sentences to classify.
classifier = pipeline("zero-shot-classification", model = 'facebook/bart-large-mnli')
classifier(
["Inter Miami wins the MLS", "Match tonight betwee Chiefs vs. Patriots", "Michael Jordan plans to sell Charlotte Hornets"],
candidate_labels=["soccer", "football", "basketball"]
)
[{'sequence': 'Inter Miami wins the MLS',
'labels': ['soccer', 'football', 'basketball'],
'scores': [0.9162040948867798, 0.07244189083576202, 0.011354007758200169]},
{'sequence': 'Match tonight betwee Chiefs vs. Patriots',
'labels': ['football', 'basketball', 'soccer'],
'scores': [0.9281435608863831, 0.0391676239669323, 0.032688744366168976]},
{'sequence': 'Michael Jordan plans to sell Charlotte Hornets',
'labels': ['basketball', 'football', 'soccer'],
'scores': [0.9859175682067871, 0.009983371943235397, 0.004099058918654919]}]
It looks like the model did a great job labelling these sentences!
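If the goal is to build a labelled dataset, we usually want just the winning label per sentence. The sketch below is one possible way to do that. The helper top_labels and the min_score cutoff are assumptions of mine, and the scores are the ones printed above, rounded to four decimals.

```python
def top_labels(results, min_score=0.5):
    """Return (sequence, best_label, best_score) per result; the label is None below min_score."""
    rows = []
    for item in results:
        # In the output above, labels come sorted by descending score, so index 0 is the best guess
        label, score = item["labels"][0], item["scores"][0]
        rows.append((item["sequence"], label if score >= min_score else None, score))
    return rows


# Zero-shot output copied from the example above (scores rounded)
results = [
    {"sequence": "Inter Miami wins the MLS",
     "labels": ["soccer", "football", "basketball"],
     "scores": [0.9162, 0.0724, 0.0114]},
    {"sequence": "Match tonight betwee Chiefs vs. Patriots",
     "labels": ["football", "basketball", "soccer"],
     "scores": [0.9281, 0.0392, 0.0327]},
    {"sequence": "Michael Jordan plans to sell Charlotte Hornets",
     "labels": ["basketball", "football", "soccer"],
     "scores": [0.9859, 0.0100, 0.0041]},
]

for sequence, label, score in top_labels(results):
    print(f"{label}: {sequence} ({score:.2f})")
```

Raising min_score is a simple way to leave uncertain sentences unlabelled for manual review.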
Text Generation
The package can also generate text. This is a good way of creating a nice little story generator to tell our kids before bedtime. We also set the temperature parameter, which controls how random the sampling is: higher values make the model more creative, lower values more conservative.
generator = pipeline("text-generation", temperature=0.8)
generator("Once upon a time, in a land where the King Pineapple was")
[{'generated_text':
"Once upon a time, in a land where the King Pineapple was a common
crop, the Queen of the North had lived in a small village. The Queen had always
lived in a small village, and her daughter, who was also the daughter of the Queen,
had lived in a larger village. The royal family would come to the Queen's village,
and then the Queen would return to her castle and live there with her daughters.
In the middle of the night, she would lay down on the royal bed and kiss the princess
at least once, and then she would return to her castle to live there with her men.
In the daytime, however, the Queen would be gone forever, and her mother would be alone.
The reason for this disappearance, in the form of the Great Northern Passage
and the Great Northern Passage, was the royal family had always wanted to take
the place of the Queen. In the end, they took the place of the Queen, and went
with their daughter to meet the King. At that time, the King was the only person
on the island who had ever heard of the Great Northern Passage, and his return was
in the past.
After Queen Elizabeth's death, the royal family went to the
Great Northern Passage, to seek out the Princess of England and put her there.
The Princess of England had been in"}]
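A quick note on the temperature setting used in this example. In sampling-based generation, the model’s raw next-token scores are divided by the temperature before being turned into probabilities. The sketch below is a model-free illustration of that idea, not the library’s code, and the three scores are made up.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate next tokens
for t in (0.5, 0.8, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, while higher ones flatten the distribution, so less likely words get picked more often.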
Named Entity Recognition
This task can recognize entities such as a person (PER), a location (LOC), or an organization (ORG) in a given text. That is great for creating quick marketing lists of lead names, for example.
ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("The man landed on the moon in 1969. Neil Armstrong was the first man to step on the Moon's surface. He was a NASA Astronaut.")
[{'entity_group': 'PER', 'score': np.float32(0.99960065),'word': 'Neil Armstrong',
'start': 36, 'end': 50},
{'entity_group': 'LOC', 'score': np.float32(0.82190216), 'word': 'Moon',
'start': 84, 'end': 88},
{'entity_group': 'ORG', 'score': np.float32(0.9842771), 'word': 'NASA',
'start': 109, 'end': 113},
{'entity_group': 'MISC', 'score': np.float32(0.8394754), 'word': 'As',
'start': 114, 'end': 116}]
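Notice the stray 'As' entity, a piece of “Astronaut” tagged as MISC with a lower score. For something like a list of names, a small filter on entity type and score helps. This is only a sketch: the helper names_of and the 0.9 threshold are my own assumptions, and the entities are copied from the output above with the np.float32 scores written as plain floats.

```python
def names_of(entities, group, min_score=0.9):
    """Return the words of all entities of one type whose score reaches min_score."""
    return [
        e["word"]
        for e in entities
        if e["entity_group"] == group and float(e["score"]) >= min_score
    ]


# NER output copied from the example above
entities = [
    {"entity_group": "PER", "score": 0.99960065, "word": "Neil Armstrong", "start": 36, "end": 50},
    {"entity_group": "LOC", "score": 0.82190216, "word": "Moon", "start": 84, "end": 88},
    {"entity_group": "ORG", "score": 0.9842771, "word": "NASA", "start": 109, "end": 113},
    {"entity_group": "MISC", "score": 0.8394754, "word": "As", "start": 114, "end": 116},
]

print(names_of(entities, "PER"))   # people found with high confidence
print(names_of(entities, "MISC"))  # the partial 'As' match falls below the threshold
```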
Summarization
Possibly one of the most used tasks, summarization lets us shorten a text while keeping its essence and important pieces. Let’s summarize an excerpt from the Wikipedia page about Transformers.
summarizer = pipeline("summarization")
summarizer("""
In deep learning, the transformer is an artificial neural network architecture based
on the multi-head attention mechanism, in which text is converted to numerical
representations called tokens, and each token is converted into a vector via lookup
from a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.
Transformers have the advantage of having no recurrent units, therefore requiring
less training time than earlier recurrent neural architectures (RNNs) such as long
short-term memory (LSTM).[2] Later variations have been widely adopted for training
large language models (LLMs) on large (language) datasets.[3]
""")
[{'summary_text':
' In deep learning, the transformer is an artificial neural network architecture
based on the multi-head attention mechanism . Transformerers have the advantage of
having no recurrent units, therefore requiring less training time than earlier
recurrent neural architectures (RNNs) such as long short-term memory (LSTM)'}]
Excellent!
Image Recognition
There are other, more complex tasks, such as image recognition, and they are just as easy to use as the previous ones.
image_classifier = pipeline(
task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
"https://images.unsplash.com/photo-1689009480504-6420452a7e8e?q=80&w=687&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
)
print(result)

[{'label': 'Yorkshire terrier', 'score': 0.9792122840881348},
{'label': 'Australian terrier', 'score': 0.00648861238732934},
{'label': 'silky terrier, Sydney silky', 'score': 0.00571345305070281},
{'label': 'Norfolk terrier', 'score': 0.0013639888493344188},
{'label': 'Norwich terrier', 'score': 0.0010306559270247817}]
With these few examples, it is easy to see how little code the Transformers library needs to perform very different tasks.
Wrapping Up
What if we wrap up our knowledge by applying it in a practical, small project?
Let us create a simple Streamlit app that reads a resumé, runs sentiment analysis on it, and classifies the tone of the text as one of ["Senior", "Junior", "Trainee", "Blue-collar", "White-collar", "Self-employed"].
The next piece of code does the following:
- Import the packages
- Create the title and subtitle of the page
- Add a text input area
- Tokenize the text and split it into chunks for the transformer task. See the list of models [4].
import streamlit as st
from transformers import pipeline
from transformers import AutoTokenizer
from langchain.text_splitter import RecursiveCharacterTextSplitter
st.title("Resumé Sentiment Analysis")
st.caption("Checking the sentiment and language tone of your resume")
# Add input text area
text = st.text_area("Enter your resume text here")
# 1. Load your desired tokenizer
model_checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# 2. Instantiate the text splitter so that chunk length is measured in tokens:
#    chunks of up to 500 tokens with an overlap of up to 100 tokens, so that context is not lost
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer, chunk_size=500, chunk_overlap=100
)
# 3. Split the raw text into chunks that fit the models' input limit.
#    split_text takes a plain string and returns a list of strings
#    (split_documents expects Document objects, not a string)
decoded_chunks = text_splitter.split_text(text)
st.write(f"Created {len(decoded_chunks)} chunks.")
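To see what chunk size and overlap mean in practice, here is a toy sliding window over a list of words. It is only an illustration of the overlap idea under simplified assumptions: the real splitter looks for natural separators such as paragraphs and sentences, and the helper chunk_with_overlap is a name I made up.

```python
def chunk_with_overlap(items, chunk_size, overlap):
    """Split a sequence into windows of chunk_size items that share `overlap` items with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # this window already reaches the end of the sequence
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_with_overlap(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
```

Each chunk starts with the last word of the previous one, so a sentence cut at a boundary still appears with some of its context in the next chunk.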
Next, we will initialize the Transformers pipelines to:
- Perform the sentiment analysis and return the confidence %.
- Classify the text tone and return the confidence %.
# Initialize sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
# Perform the analysis on click (skipped while the text area is still empty)
if st.button("Analyze") and decoded_chunks:
    col1, col2 = st.columns(2)
    with col1:
        # Sentiment analysis ([0] shows the result for the first chunk only)
        sentiment = sentiment_pipeline(decoded_chunks)[0]
        st.write(f"Sentiment: {sentiment['label']}")
        st.write(f"Confidence: {100*sentiment['score']:.1f}%")
    with col2:
        # Categorize tone ([0] shows the result for the first chunk only)
        tone_pipeline = pipeline("zero-shot-classification", model='facebook/bart-large-mnli')
        tone = tone_pipeline(
            decoded_chunks,
            candidate_labels=["Senior", "Junior", "Trainee", "Blue-collar", "White-collar", "Self-employed"],
        )[0]
        st.write(f"Tone: {tone['labels'][0]}")
        st.write(f"Confidence: {100*tone['scores'][0]:.1f}%")
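Note that the app reports the result of the first chunk only (the [0] index). For longer resumés, one option is to combine the per-chunk predictions. The sketch below uses a simple score-weighted vote. Both the helper aggregate_sentiment and the example scores are hypothetical, and other aggregation rules would be just as reasonable.

```python
from collections import defaultdict


def aggregate_sentiment(chunk_results):
    """Sum the scores per label across chunks and return the winning label with its share of the chunks."""
    totals = defaultdict(float)
    for result in chunk_results:
        totals[result["label"]] += result["score"]
    if not totals:
        return None, 0.0  # nothing to aggregate
    label = max(totals, key=totals.get)
    return label, totals[label] / len(chunk_results)


# Hypothetical per-chunk outputs of the sentiment pipeline
chunk_results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "POSITIVE", "score": 0.8},
    {"label": "NEGATIVE", "score": 0.6},
]
label, confidence = aggregate_sentiment(chunk_results)
print(f"Sentiment: {label}")
print(f"Confidence: {100 * confidence:.1f}%")
```

In the app, the list returned by sentiment_pipeline(decoded_chunks) could be passed to this helper in place of taking the first element.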
Here’s the screenshot.

Before You Go
Hugging Face (HF) Transformers Pipelines are truly a game-changer for data practitioners. They provide an incredibly streamlined way to tackle complex machine learning tasks, like text generation or image segmentation, using just a few lines of code.
HF has already done the heavy lifting by wrapping sophisticated model logic into simple, intuitive methods.
This shifts the focus away from low-level coding and allows us to focus on what really matters: using our creativity to build impactful, real-world applications.
If you liked this content, find more about me on my website.
GitHub Repository
https://github.com/gurezende/Resume-Sentiment-Analysis
References
[1. Transformers package] https://huggingface.co/docs/transformers/index
[2. Transformers Pipelines] https://huggingface.co/docs/transformers/pipeline_tutorial
[3. Pipelines Examples] https://huggingface.co/learn/llm-course/chapter1/3#summarization
[4. HF Models] https://huggingface.co/models