How to summarize a text using Chain of Density prompting?
This article explains, step by step, how to summarize a text with the Chain of Density prompt using Python and large language models.
This article has been updated to run with the latest models - especially those with larger context windows (128,000 tokens) - such as GPT-4o and Llama 3.1.
You'll find all the code in the py-llm-core Jupyter notebook here.
What is Chain of Density prompting?
Chain of Density (CoD) prompting generates summaries that start off entity-sparse and become increasingly entity-dense.
Summaries are generated by iteratively incorporating missing entities from the source text without increasing the length.
The chain of density prompt enables the generation of summaries with varying levels of information density.
CoD summaries exhibit more fusion and less lead bias than GPT-4 summaries produced with a vanilla prompt. The research also showed that human readers prefer the denser summaries.
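To make the idea concrete, here is a hypothetical sketch (illustrative values only, not taken from the paper) of the structure a CoD run produces: each iteration records the newly added entities and a rewritten summary of constant length.
# Hypothetical illustration of a CoD run - the values are made up.
# Each iteration adds 1-3 new entities and rewrites the summary
# without increasing its length.
cod_iterations = [
    {
        "missing_entities": ["Entity A"],
        "denser_summary": "This article discusses ... (approx. 80 words)",
    },
    {
        "missing_entities": ["Entity B", "Entity C"],
        "denser_summary": "Entity A ... B ... C ... (still approx. 80 words)",
    },
    # ... three more, increasingly dense iterations
]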
Implement Chain of Density with minimal dependencies
For this tutorial, we'll apply the technique to a fairly complex document: A Path Towards Autonomous Machine Intelligence, a position paper by Yann LeCun (2022).
We chose this paper for the following reasons:
- The paper topic is fairly complex
- The paper is filled with rich information
- The length (60 pages / 38k tokens) makes it hard to process in a single shot
Prerequisites
To implement the chain of density prompting, we'll use the py-llm-core Python library.
Create a new virtual environment and install the library (you'll need an OpenAI API key to use GPT models).
python3 -m venv venv
source venv/bin/activate
pip3 install -U py-llm-core
pip3 install PyPDF2
# if you use OpenAI models
export OPENAI_API_KEY=sk-<replace with your actual api key>
If you need to use another model, you can choose any model supported by the llama.cpp library. Download the GGUF file as described in the py-llm-core documentation.
Overview
We'll cover the following steps:
- Convert the PDF into a text string
- Perform the CoD
Converting the PDF file into text
# First do the boring stuff of converting PDF to text
# pip3 install PyPDF2
import PyPDF2

# Open the PDF file
with open('../assets/a-path-towards-autonomous-machines.pdf', 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)

    # Extract the text from each page of the PDF
    pages = []
    for page in pdf_reader.pages:
        pages.append(page.extract_text())

text = ''.join(pages)
Clean up Unicode characters from the PDF
import unicodedata


def cleanup_unicode(text):
    corrected_chars = []
    for char in text:
        corrected_char = unicodedata.normalize("NFKC", char)
        corrected_chars.append(corrected_char)
    return "".join(corrected_chars)


article = cleanup_unicode(text)
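As a quick illustration (this check is not part of the original notebook), NFKC normalization expands the typographic ligatures that often survive PDF extraction:
# Illustrative check: NFKC expands compatibility characters such as
# the "ﬃ" and "ﬁ" ligatures commonly found in extracted PDF text.
print(cleanup_unicode("eﬃcient ﬁne-tuning"))
# -> efficient fine-tuning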
Execute the Chain of Density
We no longer need to chunk the document into smaller pieces: we can apply the Chain of Density directly to the whole document. Here we have a 38k-token document - that's fairly long, but it fits within the 128k context window of the most recent models.
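If you want to verify this yourself, here is a minimal sketch (assuming the tiktoken package, which is not used elsewhere in this tutorial) that counts the tokens of the cleaned-up article:
# Optional sanity check - requires `pip3 install tiktoken`.
# cl100k_base gives a close enough approximation for sizing purposes.
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")
token_count = len(encoder.encode(article))
print(f"Document size: {token_count} tokens")

# Make sure the article fits in a 128k context window
assert token_count < 128_000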
Implementing the Chain of Density prompting
Writing the actual prompts
The following is a slightly adapted version of the prompt from the paper:
chain_of_density_system_prompt = "You are an expert in writing rich and dense summaries in broad domains."
chain_of_density_prompt = """
Article:
{article}
----
You will generate increasingly concise, entity-dense summaries of the
above Article.
Repeat the following 2 steps 5 times.
- Step 1: Identify 1-3 informative Entities from the Article
which are missing from the previously generated summary and are the most
relevant.
- Step 2: Write a new, denser summary of identical length which covers
every entity and detail from the previous summary plus the missing
entities.
A Missing Entity is:
- Relevant: to the main story
- Specific: descriptive yet concise (5 words or fewer)
- Novel: not in the previous summary
- Faithful: present in the Article
- Anywhere: located anywhere in the Article
Guidelines:
- The first summary should be long (4-5 sentences, approx. 80 words) yet
highly non-specific, containing little information beyond the entities
marked as missing.
- Use overly verbose language and fillers (e.g. "this article discusses")
to reach approx. {length_in_words} words.
- Make every word count: re-write the previous summary to improve flow and
make space for additional entities.
- Make space with fusion, compression, and removal of uninformative
phrases like "the article discusses"
- The summaries should become highly dense and concise yet
self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made,
add fewer new entities.
> Remember to use the exact same number of words for each summary.
> Write the missing entities in missing_entities
> Write the summary in denser_summary
"""
Writing the Python code
The py-llm-core library makes the generation of structured outputs very easy (i.e. JSON mode - see Generate JSON with Mistral AI Instruct model).
Data classes are created with annotations to help large language models infer the data structure.
# Let's define our target structure:
from dataclasses import dataclass
from typing import List
from llm_core.assistants import OpenAIAssistant, OpenWeightsAssistant


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: List[str]


@dataclass
class DenserSummaryCollection:
    system_prompt = chain_of_density_system_prompt
    prompt = chain_of_density_prompt

    summaries: List[DenseSummary]
Now we can write the assistant code:
# Generate the summaries iteratively (all iterations happen within a single request)
with OpenAIAssistant(DenserSummaryCollection, model='gpt-4o') as assistant:
    collection = assistant.process(article=article, length_in_words=80)
Use Llama 3.1 or any other llama.cpp-compatible model
For an open-weight model, replace the assistant code with:
# See https://github.com/advanced-stack/py-llm-core for downloading llama-8b-3.1-q4
with OpenWeightsAssistant(DenserSummaryCollection, model='llama-8b-3.1-q4', loader_kwargs={"n_ctx": 60_000}) as assistant:
    collection = assistant.process(article=article, length_in_words=80)
This code defines the Python dataclasses DenseSummary and DenserSummaryCollection to structure the data for a task that involves generating increasingly concise, entity-dense summaries of a given article.
Here's a breakdown of the code:
- DenseSummary: stores one denser summary together with the missing entities that were added to it.
- DenserSummaryCollection: stores the collection of DenseSummary objects, one per iteration, and carries the system prompt and the prompt template.
- assistant.process: the assistant (OpenAIAssistant or OpenWeightsAssistant) is initialized with the target dataclass; calling process fills the prompt placeholders, runs the model and returns a populated DenserSummaryCollection instance.
Perform the summarization of the whole document
Now we can print the results for each iteration; note that the iterations happen in-context - these are not separate requests.
for summary in collection.summaries:
    print(summary.missing_entities)
['Yann LeCun', 'self-supervised learning', 'Meta']
['Courant Institute', 'New York University', 'energy-based models']
['cognitive architecture', 'deep learning', 'machine common sense']
['JEPA', 'H-JEPA', 'critic module']
['intrinsic cost', 'actor module', 'short-term memory']
[]
print(collection.summaries[-1].denser_summary)
Yann LeCun's article discusses the path towards autonomous machine intelligence, focusing on how machines can learn efficiently like humans and animals. It explores the architecture and training paradigms necessary for constructing intelligent agents.
The paper combines concepts such as predictive world models, intrinsic motivation, and hierarchical joint embedding architectures. The goal is to enable machines to reason, predict, and plan at multiple levels of abstraction and time horizons.
The text is written to appeal to readers from various backgrounds, including neuroscience, cognitive science, and philosophy. The paper emphasizes self-supervised learning and is affiliated with Meta, Courant Institute, and New York University.
It also discusses energy-based models, cognitive architecture, deep learning, and machine common sense.
The paper introduces JEPA, H-JEPA, and the critic module.
It also covers intrinsic cost, actor module, and short-term memory.
It is funny to note that, because we tried to add as many entities as possible, the summary contains "It also covers" several times.
We can read a less dense summary by taking a look at the 3rd iteration:
print(collection.summaries[2].denser_summary)
Yann LeCun's article from the Courant Institute and Meta discusses the path towards autonomous machine intelligence, focusing on how machines can learn efficiently like humans and animals.
It explores the architecture and training paradigms necessary for constructing intelligent agents.
The paper combines concepts such as predictive world models, intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.
The goal is to enable machines to reason, predict, and plan at multiple levels of abstraction and time horizons.
The document is written to appeal to readers from various backgrounds, including neuroscience, cognitive science, and philosophy.
This looks better to my human eyes!
Further notes and performance of CoD with smaller models
Updated version with GPT-4o vs GPT-4o-mini
GPT-4o-mini does a decent job at summarizing a 38k-token document - way better than GPT-3.5. However, looking at the missing_entities variable throughout the iterations, we can see that it still struggles to keep track of them.
The overall style is good (whereas GPT-4o's is excellent).
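To check this more systematically, here is a rough, hypothetical sketch (not part of the original experiment) that verifies whether the entities reported as missing in one iteration actually show up in the next, denser summary:
# Rough check: each entity added at iteration i should still appear in
# the next summary (a plain substring match is crude but gives a signal).
for i, summary in enumerate(collection.summaries[:-1]):
    next_summary = collection.summaries[i + 1].denser_summary.lower()
    dropped = [
        entity
        for entity in summary.missing_entities
        if entity.lower() not in next_summary
    ]
    if dropped:
        print(f"Iteration {i + 1} -> {i + 2}: dropped {dropped}")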
The following was written before the release of GPT-4o
Performing several tests with the OpenAI models gpt-3.5-turbo, gpt-3.5-turbo-16k and gpt-4 led to interesting results:
The GPT-3.5 models provide good results in the first summary but quickly lose track of entities. Furthermore, they clearly lack the ability to identify what's missing. Doing more tests, I realized that GPT-3.5 models are unable to produce intersections between sets.
I wouldn't recommend using smaller models like GPT-3.5.
On the contrary, the GPT-4 model performs very well with this chain of density prompting technique.
Conclusion and next steps
The GPT-4 Summarization with Chain of Density Prompting paper is quite interesting in its approach: iterating within the context window to produce a better result in zero-shot operations.
You might be interested in reading Mistral AI Instruct v0.1 model evaluation in real world use cases.
I am currently evaluating Chain of Density to build a document quality-assessment tool. If you want to be notified when I have the first results, leave your email: