How to summarize a text using Chain of Density prompting?
This article explains how to summarize a text step by step with the Chain of Density prompt, using Python and large language models.
You'll find all the code in the py-llm-core Jupyter notebook here.
What is Chain of Density prompting?
Chain of Density (CoD) prompting generates summaries that start off entity-sparse and become increasingly entity-dense.
Summaries are generated by iteratively incorporating missing entities from the source text without increasing the length.
The chain of density prompt enables the generation of summaries with varying levels of information density.
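CoD summaries show more fusion and less lead bias than GPT-4 summaries produced with a vanilla prompt. In the study, human evaluators preferred summaries that were denser than the vanilla GPT-4 ones and almost as dense as human-written summaries. This suggests a trade-off between informativeness and readability rather than "denser is always better".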
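The paper measures density as entities per token. Here's a rough way to compute the same kind of number on your own outputs. It is a minimal sketch with a few assumptions: you supply the entity list yourself (the sketch does no named-entity recognition), matching is a case-insensitive substring test, and tokens are approximated by whitespace-separated words. The function name is hypothetical.

```python
def entity_density(summary: str, entities: list[str]) -> float:
    """Approximate entity density: distinct entities found per word.

    Assumptions (not the paper's exact metric): entities are given by the
    caller, matching is case-insensitive substring search, and tokens are
    approximated by whitespace-separated words.
    """
    words = summary.split()
    if not words:
        return 0.0
    lowered = summary.lower()
    found = {e.lower() for e in entities if e.lower() in lowered}
    return len(found) / len(words)


sparse = "This article discusses a paper about autonomous agents"
dense = "LeCun's paper proposes JEPA for autonomous agents"
entities = ["autonomous agents", "JEPA", "LeCun"]
print(entity_density(sparse, entities))  # 1 entity / 8 words = 0.125
print(entity_density(dense, entities))   # 3 entities / 7 words
```

Because CoD keeps the length roughly constant while adding entities, this ratio should rise from one step to the next.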
Implement Chain of Density with minimal dependencies
For this tutorial, we'll apply this technique to a fairly complex text: A Path Towards Autonomous Machine Intelligence, a position paper by Yann LeCun (2022).
We chose this paper for the following reasons:
- The paper topic is fairly complex
- The paper is filled with rich information
- The length (60 pages / 38k tokens) makes it hard to process in a single shot
Prerequisites
To implement the chain of density prompting, we'll use the py-llm-core Python library.
Create a new virtual environment and install the library (you'll need an OpenAI API key to use GPT models).
python3 -m venv venv
source venv/bin/activate
pip3 install py-llm-core
pip3 install PyPDF2
# If you use OpenAI models
export OPENAI_API_KEY=sk-<replace with your actual api key>
If you need another model, you can choose any model supported by the llama.cpp library. Download the GGUF file as described in the py-llm-core documentation.
Overview
We'll cover the following steps:
- Convert the PDF into a text string
- Create chunks that GPT-4 can digest
- Perform the CoD summarization on each chunk, then on the combined results
Converting the PDF file into text
import PyPDF2

# Open the PDF file and extract the text of each page
with open('data/A Path Towards Autonomous Machine Intelligence.pdf', 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    pages = []
    for page in pdf_reader.pages:
        pages.append(page.extract_text())

# Separate pages with a newline so words at page boundaries don't merge
text = '\n'.join(pages)
Clean up Unicode characters extracted from the PDF
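import unicodedata

def cleanup_unicode(text):
    # NFKC normalization replaces compatibility characters (ligatures such as
    # "ﬁ", no-break spaces, full-width letters) with their standard equivalents.
    # Normalizing the whole string, rather than one character at a time,
    # also lets combining marks compose with their base characters.
    return unicodedata.normalize("NFKC", text)

text = cleanup_unicode(text)
print(text[:1000])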
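To see what NFKC normalization does to typical PDF extraction artifacts, here's a small standalone check. The sample strings are made up for illustration.

```python
import unicodedata

# Hypothetical samples of characters that often show up in text extracted from PDFs
samples = {
    "\ufb01nance": "finance",                   # "ﬁ" ligature (U+FB01) -> "fi"
    "non\u00a0generative": "non generative",   # no-break space -> regular space
    "\uff2a\uff25\uff30\uff21": "JEPA",        # full-width letters -> ASCII
}

for raw, expected in samples.items():
    normalized = unicodedata.normalize("NFKC", raw)
    print(repr(raw), "->", repr(normalized))
    assert normalized == expected
```

NFKC is lossy for some content. Superscripts such as "x²" become "x2", for example, which can matter if the document contains formulas.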
Split the text to fit the window size of the model
The whole document may not fit in the context window of the model.
We'll use GPT-4 and Llama-3-8B, which both have a context window of 8,192 tokens. This means we'll have to summarize several chunks and then combine the results.
Note: We cannot create chunks of 8,192 tokens, because the context window is shared between the prompt (input) tokens and the completion (output) tokens.
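Here's a quick back-of-the-envelope budget. It assumes an 8,192-token window, the 6,000-token chunk size used below and the ~38k-token document size mentioned earlier. The actual chunk count depends on how the extracted text tokenizes.

```python
import math

CONTEXT_WINDOW = 8_192    # assumed context size (GPT-4 8k, Llama-3-8B), in tokens
CHUNK_SIZE = 6_000        # tokens of article text sent per request
DOCUMENT_TOKENS = 38_000  # approximate size of the paper

# Whatever the chunk doesn't use is shared by the prompt instructions
# and the JSON completion (5 summaries plus their missing entities).
remaining_budget = CONTEXT_WINDOW - CHUNK_SIZE
chunk_share = CHUNK_SIZE / CONTEXT_WINDOW
n_chunks = math.ceil(DOCUMENT_TOKENS / CHUNK_SIZE)

print(f"Left for instructions + output: {remaining_budget} tokens")  # 2192
print(f"Chunk share of the window: {chunk_share:.0%}")               # 73%
print(f"Number of chunks: {n_chunks}")                               # 7
```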
from llm_core.splitters import TokenSplitter
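# ~6,000 tokens of article text per chunk (about 73% of an 8,192-token window),
# leaving the rest for the prompt instructions and the JSON completion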
splitter = TokenSplitter(chunk_size=6_000, chunk_overlap=0)
chunks = list(splitter.chunkify(text))
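To show what chunk_size and chunk_overlap mean, here's a minimal sketch that works on an already-tokenized sequence. It is not py-llm-core's TokenSplitter implementation, and tokenization itself is left out. The function name is hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int,
                 chunk_overlap: int = 0) -> list[Sequence[T]]:
    """Split `tokens` into windows of at most `chunk_size` items.

    Consecutive windows share `chunk_overlap` items, so each new window
    starts `chunk_size - chunk_overlap` items after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


print(chunk_tokens(list(range(10)), chunk_size=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With chunk_overlap=0, as in the article, every token lands in exactly one chunk. A non-zero overlap repeats some context at chunk boundaries, at the cost of more requests.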
Implementing the Chain of Density prompting
The py-llm-core library supports structured outputs (i.e. JSON mode; see Generate JSON with Mistral AI Instruct model).
We define data classes with type annotations that help the large language model infer the expected data structure.
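To illustrate how type annotations can describe the expected JSON, here's a minimal sketch that derives a JSON-schema-like dict from dataclasses. It is an illustration only, and py-llm-core's actual mechanism may differ. Only annotated attributes become dataclass fields, so a class attribute such as prompt stays out of the schema.

```python
import dataclasses
import json
import typing
from dataclasses import dataclass


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


@dataclass
class DenserSummaryCollection:
    prompt = "Article: {article} ..."  # no annotation, so not a dataclass field
    summaries: list[DenseSummary]


PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(tp) -> dict:
    """Map a (possibly nested) dataclass type to a JSON-schema-like dict."""
    if typing.get_origin(tp) is list:
        (item_type,) = typing.get_args(tp)
        return {"type": "array", "items": type_to_schema(item_type)}
    if dataclasses.is_dataclass(tp):
        hints = typing.get_type_hints(tp)
        properties = {
            field.name: type_to_schema(hints[field.name])
            for field in dataclasses.fields(tp)
        }
        return {"type": "object", "properties": properties,
                "required": list(properties)}
    if tp in PRIMITIVES:
        return {"type": PRIMITIVES[tp]}
    raise TypeError(f"unsupported type: {tp!r}")


print(json.dumps(type_to_schema(DenserSummaryCollection), indent=2))
```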
from dataclasses import dataclass
from llm_core.assistants import OpenAIAssistant, LLaMACPPAssistant


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


@dataclass
class DenserSummaryCollection:
    system_prompt = """
    You are an expert in writing rich and dense summaries in broad domains.
    """

    prompt = """
    Article:
    {article}
    ----
    You will generate increasingly concise, entity-dense summaries of the
    above Article.
    Repeat the following 2 steps 5 times.
    - Step 1: Identify 1-3 informative Entities from the Article
    which are missing from the previously generated summary and are the most
    relevant.
    - Step 2: Write a new, denser summary of identical length which covers
    every entity and detail from the previous summary plus the missing
    entities.
    A Missing Entity is:
    - Relevant: to the main story
    - Specific: descriptive yet concise (5 words or fewer)
    - Novel: not in the previous summary
    - Faithful: present in the Article
    - Anywhere: located anywhere in the Article
    Guidelines:
    - The first summary should be long (4-5 sentences, approx. 80 words) yet
    highly non-specific, containing little information beyond the entities
    marked as missing.
    - Use overly verbose language and fillers (e.g. "this article discusses")
    to reach approx. 80 words.
    - Make every word count: re-write the previous summary to improve flow and
    make space for additional entities.
    - Make space with fusion, compression, and removal of uninformative
    phrases like "the article discusses"
    - The summaries should become highly dense and concise yet
    self-contained, e.g., easily understood without the Article.
    - Missing entities can appear anywhere in the new summary.
    - Never drop entities from the previous summary. If space cannot be made,
    add fewer new entities.
    > Remember to use the exact same number of words for each summary.
    Answer in JSON.
    > The JSON in `summaries` should be a list (length 5) of
    dictionaries whose keys are "missing_entities" and "denser_summary".
    """

    # The only dataclass field: the JSON key named in the prompt must match it
    summaries: list[DenseSummary]

    @classmethod
    def summarize(cls, article):
        # The article is injected into the {article} placeholder of the prompt
        with OpenAIAssistant(cls, model='gpt-4') as assistant:
            return assistant.process(article=article)
Use Llama 3 or any other llama.cpp-compatible model
For an open-weight model, replace the summarize method with:
    # Download https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true
    # Move it to "~/.cache/py-llm-core/models/"
    # Rename it to "llama-3-q4"
    @classmethod
    def summarize(cls, article):
        with LLaMACPPAssistant(cls, model='llama-3-q4') as assistant:
            return assistant.process(article=article)
This code defines two Python dataclasses, DenseSummary and DenserSummaryCollection. Together they structure the data for generating increasingly concise, entity-dense summaries of a given article.
Here's a breakdown of the code:
- DenseSummary: stores one denser summary, along with the entities that were missing from the previous summary and have been added to this one.
- DenserSummaryCollection: stores the collection of DenseSummary objects, one per iteration.
- summarize: uses OpenAIAssistant, initialized with the class itself, to process the article and generate the summaries. It returns a populated DenserSummaryCollection instance.
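py-llm-core performs the JSON-to-dataclass conversion for you. The sketch below is a hypothetical, hand-rolled equivalent. It shows the JSON shape the prompt asks for and how that shape maps onto the dataclasses, which is why the key named in the prompt has to match the summaries field.

```python
import json
from dataclasses import dataclass


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


@dataclass
class DenserSummaryCollection:
    summaries: list[DenseSummary]


def parse_collection(raw_json: str) -> DenserSummaryCollection:
    """Build a DenserSummaryCollection from the model's JSON answer.

    Expects {"summaries": [{"missing_entities": [...], "denser_summary": "..."}, ...]}
    and raises KeyError if a required key is missing.
    """
    data = json.loads(raw_json)
    return DenserSummaryCollection(
        summaries=[
            DenseSummary(
                denser_summary=item["denser_summary"],
                missing_entities=list(item["missing_entities"]),
            )
            for item in data["summaries"]
        ]
    )


raw = ('{"summaries": [{"missing_entities": ["Yann LeCun"], '
       '"denser_summary": "A position paper by Yann LeCun."}]}')
collection = parse_collection(raw)
print(collection.summaries[0].missing_entities)  # ['Yann LeCun']
```

If the model answers with a different top-level key, for example summaries_per_step, parsing fails with a KeyError instead of silently returning an empty result.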
Now we can print the results of each iteration. Note that the iterations happen in-context within a single request; they are not separate requests.
first_chunk = chunks[0]
summary_collection = DenserSummaryCollection.summarize(first_chunk)
print(len(summary_collection.summaries))
# Outputs: 5
print(summary_collection.summaries[0].missing_entities)
# Outputs: ['Yann LeCun', 'autonomous intelligent agents', 'self-supervised learning']
print(summary_collection.summaries[0].denser_summary)
This article discusses a position paper by Yann LeCun, which proposes an
architecture for autonomous intelligent agents.
The paper combines concepts such as configurable predictive world models,
behavior driven through intrinsic motivation, and hierarchical joint embedding
architectures trained with self-supervised learning.
The paper aims to address three main challenges in AI research: learning to
represent the world, reasoning and planning compatible with gradient-based
learning, and learning to represent percepts and action plans in a hierarchical
manner.
print(summary_collection.summaries[1].missing_entities)
# Outputs: ['JEPA', 'Hierarchical JEPA', 'non-generative architectures']
print(summary_collection.summaries[1].denser_summary)
Yann LeCun's position paper proposes an architecture for autonomous
intelligent agents, combining concepts like configurable predictive world
models and behavior driven by intrinsic motivation.
It addresses three AI research challenges: learning world representation,
reasoning and planning compatible with gradient-based learning, and
hierarchical representation of percepts and action plans.
The paper also introduces JEPA and Hierarchical JEPA,
non-generative architectures for predictive world models.
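The prompt imposes two rules that a script can check: entities are never dropped, and every summary keeps roughly the same length. Here's a hypothetical helper (not part of py-llm-core) that flags violations. It assumes exact, case-insensitive substring matching, so paraphrased entities will produce false alarms. The 20% length tolerance is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


def check_chain(summaries: list[DenseSummary], tolerance: float = 0.2) -> list[str]:
    """Return warnings for CoD rule violations across the steps.

    1. Every entity added so far must still appear in the current summary
       (case-insensitive substring match).
    2. Each summary's word count must stay within `tolerance` of step 1's.
    """
    problems: list[str] = []
    if not summaries:
        return problems
    base_len = len(summaries[0].denser_summary.split())
    seen: list[str] = []
    for step, item in enumerate(summaries, start=1):
        seen.extend(item.missing_entities)
        text = item.denser_summary.lower()
        dropped = [e for e in seen if e.lower() not in text]
        if dropped:
            problems.append(f"step {step}: dropped {dropped}")
        n_words = len(item.denser_summary.split())
        if abs(n_words - base_len) > tolerance * base_len:
            problems.append(f"step {step}: {n_words} words vs {base_len} in step 1")
    return problems
```

Applied to the two steps printed above, check_chain would flag 'self-supervised learning', which the second summary no longer mentions.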
Perform the summarization of the whole document
Now that we have processed the first chunk, we can do the same for all chunks.
def single_pass_summary_generator(chunks):
    for chunk in chunks:
        chunk_summaries = DenserSummaryCollection.summarize(chunk)
        # Keep only the densest (last) summary of each chunk
        yield chunk_summaries.summaries[-1].denser_summary

one_pass_summary = '\n'.join(single_pass_summary_generator(chunks))
Let's check the resulting length:
splitter.compute_token_count(one_pass_summary)
# outputs: 722
The combined summaries now fit easily within a single context window, so we can run CoD once more to condense them into a single final summary.
summary_collection = DenserSummaryCollection.summarize(one_pass_summary)
final_summary = summary_collection.summaries[-1].denser_summary  # densest (last) step
print(final_summary)
The paper proposes a vision for autonomous agents, integrating predictive
world models, intrinsic motivation, hierarchical joint embedding architectures,
including JEPA-1 and JEPA-2, Energy-Based Models (EBMs), Joint Embedding
Architectures (JEAs), and Hierarchical JEPA.
It scrutinizes a Model-Predictive Control process, emphasizing optimization
strategies, and advocates for a multi-scale world model.
It investigates machine learning, emphasizing architectural diagrams, and
discusses the limitations of Large Language Models (LLMs) and reinforcement
learning.
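The two-stage process above follows a map-reduce pattern: summarize each chunk, join the partial summaries, then summarize the result. Here's a minimal generalization that repeats the reduce step until the text fits in one request. The function and its parameters are hypothetical. summarize, split and count_tokens are passed in as callables so the sketch runs without an API key.

```python
from typing import Callable


def recursive_summarize(
    text: str,
    summarize: Callable[[str], str],
    split: Callable[[str], list[str]],
    count_tokens: Callable[[str], int],
    max_tokens: int,
    max_rounds: int = 5,
) -> str:
    """Map-reduce summarization: shrink `text` until it fits, then summarize it."""
    for _ in range(max_rounds):
        if count_tokens(text) <= max_tokens:
            return summarize(text)
        # Map: summarize each chunk. Reduce: join the partial summaries.
        text = "\n".join(summarize(chunk) for chunk in split(text))
    raise RuntimeError("text is still too long after max_rounds reductions")


# Hypothetical wiring with the objects defined earlier in this article:
# final = recursive_summarize(
#     text,
#     summarize=lambda t: DenserSummaryCollection.summarize(t).summaries[-1].denser_summary,
#     split=lambda t: list(splitter.chunkify(t)),
#     count_tokens=splitter.compute_token_count,
#     max_tokens=6_000,
# )
```

For this paper, the first round reduces ~38k tokens to the 722-token combined summary, and the second round produces the final summary, matching the steps above. max_rounds guards against summaries that fail to shrink.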
Further notes and performance of CoD with smaller models
Testing OpenAI's gpt-3.5-turbo, gpt-3.5-turbo-16k and gpt-4 models led to interesting results:
The GPT-3.5 models produce a good first summary but quickly lose track of entities. They also clearly struggle to identify what's missing. In further tests, I found that GPT-3.5 models were unable to compute intersections between sets.
I wouldn't recommend smaller models like GPT-3.5 for this technique.
GPT-4, on the other hand, performs very well with Chain of Density prompting.
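Identifying missing entities is exactly this kind of set operation. The missing entities are the article's entities minus those already covered, which is the complement of the intersection between the article's and the summary's entities. Here's a deterministic sketch that assumes both entity lists are already available (an entity extraction step is not shown). The function name is hypothetical.

```python
def split_entities(article_entities: list[str],
                   summary_entities: list[str]) -> tuple[list[str], list[str]]:
    """Partition the article's entities into (already covered, still missing).

    Comparison is case-insensitive; the article's order is preserved.
    """
    covered = {e.casefold() for e in summary_entities}
    common = [e for e in article_entities if e.casefold() in covered]       # intersection
    missing = [e for e in article_entities if e.casefold() not in covered]  # difference
    return common, missing


article = ["Yann LeCun", "JEPA", "Hierarchical JEPA", "self-supervised learning"]
summary = ["Yann LeCun", "self-supervised learning"]
common, missing = split_entities(article, summary)
print(common)   # ['Yann LeCun', 'self-supervised learning']
print(missing)  # ['JEPA', 'Hierarchical JEPA']
```

In CoD, the model performs this step implicitly at every iteration, with no explicit entity lists to work from.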
Example result from llama-3-8B
Running the exact same process with the Llama 3 8B Instruct model (Q4_K_M quantization) gave me the following summary:
An autonomous machine intelligence architecture is proposed combining
predictive world models with hierarchical joint embedding architectures.
JEPA combines perception, prediction, and action modules for hierarchical
planning. The system learns from observation and predicts future outcomes
using energy-based models.
The results are similar to GPT-3.5: the model struggles to keep track of missing entities.
Conclusion and next steps
The paper From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting takes an interesting approach: it iterates within a single context window to produce better results in a zero-shot setting.
You might be interested in reading Mistral AI Instruct v0.1 model evaluation in real world use cases.