
How to summarize a text using Chain of Density prompting?

This article explains, step by step, how to summarize a text with the Chain of Density prompt using Python and large language models.

You'll find all the code in the py-llm-core Jupyter notebook here.

What is Chain of Density prompting?

Chain of Density (CoD) prompting generates summaries that start out entity-sparse and become increasingly entity-dense.

Summaries are generated by iteratively incorporating missing entities from the source text without increasing the length.

The Chain of Density prompt thus enables generating summaries at varying levels of information density.
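
To make the process concrete, here is a minimal Python sketch of the iteration. The two helper functions are hypothetical placeholders; in the actual technique, both steps are performed by the model itself, in-context, within a single prompt:

python
def chain_of_density(article, iterations=5):
    # identify_missing_entities and rewrite_denser_summary are hypothetical
    # placeholders: in practice the model performs both steps in-context.
    summary = ""
    for _ in range(iterations):
        # Step 1: find 1-3 relevant entities absent from the current summary
        missing_entities = identify_missing_entities(article, summary)
        # Step 2: rewrite the summary at identical length to include them
        summary = rewrite_denser_summary(article, summary, missing_entities)
    return summary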

CoD summaries exhibit more fusion and less lead bias than GPT-4 summaries produced with a vanilla prompt. The original research also showed that human preferences favor denser summaries.

Implement Chain of Density with minimum dependencies

For this tutorial, we'll apply this technique to a fairly complex topic: A Path Towards Autonomous Machine Intelligence, a position paper by Yann LeCun (2022).

We chose this paper for the following reasons:

  • The paper topic is fairly complex
  • The paper is filled with rich information
  • Its length (60 pages / ~38k tokens) makes it hard to process in a single shot

Prerequisites

To implement the chain of density prompting, we'll use the py-llm-core Python library.

Create a new virtual environment and install the library (you'll need an OpenAI API key to use GPT models).

shell
python3 -m venv venv
source venv/bin/activate
pip3 install py-llm-core
pip3 install PyPDF2

# if you use OpenAI models
export OPENAI_API_KEY=sk-<replace with your actual api key>

If you need another model, you can choose any model supported by the llama.cpp library. Download the GGUF file as described in the py-llm-core documentation.
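
For example, here's one way to fetch the quantized Llama 3 8B model used later in this tutorial. The cache directory and file name follow py-llm-core's defaults as used further below; adjust them to your setup:

shell
# Assumed default cache directory for py-llm-core models
mkdir -p ~/.cache/py-llm-core/models

# Download the quantized Llama 3 8B Instruct model and name it "llama-3-q4"
curl -L -o ~/.cache/py-llm-core/models/llama-3-q4 \
  "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true"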

Overview

We'll cover the following steps:

  1. Convert the PDF into a text string
  2. Create chunks that GPT-4 can digest
  3. Perform the Chain of Density summarization

Converting the PDF file into text

python
import PyPDF2

# Open the PDF file
with open('data/A Path Towards Autonomous Machine Intelligence.pdf', 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)

    # Extract the text from the PDF
    pages = []
    for page in pdf_reader.pages:
        pages.append(page.extract_text())

    text = ''.join(pages)

Clean up Unicode characters from the PDF

python
import unicodedata


def cleanup_unicode(text):
    # Normalize each character to its NFKC form to fix ligatures,
    # width variants, and other PDF extraction artifacts
    corrected_chars = []
    for char in text:
        corrected_char = unicodedata.normalize("NFKC", char)
        corrected_chars.append(corrected_char)
    return "".join(corrected_chars)


text = cleanup_unicode(text)

print(text[:1000])

Split the text to fit the model's context window

The whole document may not fit in the context window of the model.

We'll use GPT-4 and Llama 3 8B: both have a context window of roughly 8,000 tokens. This means we'll have to generate several partial summaries and then combine the results.

Note: We cannot create chunks of 8,000 tokens, because the context window is shared between the prompt (input) and the completion (output) tokens. Reserving 75% of the window for input gives 8,000 × 0.75 = 6,000 tokens per chunk.

python
from llm_core.splitters import TokenSplitter

# 75% input tokens / max 25% output tokens
splitter = TokenSplitter(chunk_size=6_000, chunk_overlap=0)

chunks = list(splitter.chunkify(text))
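
You can optionally sanity-check the chunking before calling the model; compute_token_count is the same TokenSplitter helper we use later to measure the summary length:

python
# Optional sanity check: each chunk must stay within the 6,000-token budget
print(len(chunks))

for chunk in chunks:
    print(splitter.compute_token_count(chunk))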

Implementing the Chain of Density prompting

The py-llm-core library allows the creation of structured outputs (i.e. JSON mode; see Generate JSON with Mistral AI Instruct model).

Data classes are created with annotations to help large language models infer the data structure.

python
from dataclasses import dataclass
from llm_core.assistants import OpenAIAssistant, LLaMACPPAssistant


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


@dataclass
class DenserSummaryCollection:
    system_prompt = """
    You are an expert in writing rich and dense summaries in broad domains.
    """

    prompt = """
    Article:
    {article}
    ----

    You will generate increasingly concise, entity-dense summaries of the
    above Article.

    Repeat the following 2 steps 5 times.

    - Step 1: Identify 1-3 informative Entities from the Article
    which are missing from the previously generated summary and are the most
    relevant.

    - Step 2: Write a new, denser summary of identical length which covers
    every entity and detail from the previous summary plus the missing
    entities.

    A Missing Entity is:

    - Relevant: to the main story
    - Specific: descriptive yet concise (5 words or fewer)
    - Novel: not in the previous summary
    - Faithful: present in the Article
    - Anywhere: located anywhere in the Article

    Guidelines:
    - The first summary should be long (4-5 sentences, approx. 80 words) yet
    highly non-specific, containing little information beyond the entities
    marked as missing.

    - Use overly verbose language and fillers (e.g. "this article discusses")
    to reach approx. 80 words.

    - Make every word count: re-write the previous summary to improve flow and
    make space for additional entities.

    - Make space with fusion, compression, and removal of uninformative
    phrases like "the article discusses"

    - The summaries should become highly dense and concise yet
    self-contained, e.g., easily understood without the Article.

    - Missing entities can appear anywhere in the new summary.

    - Never drop entities from the previous summary. If space cannot be made,
    add fewer new entities.

    > Remember to use the exact same number of words for each summary.
    Answer in JSON.

    > The JSON in `summaries` should be a list (length 5) of
    dictionaries whose keys are "missing_entities" and "denser_summary".
    """

    summaries: list[DenseSummary]

    @classmethod
    def summarize(cls, article):
        with OpenAIAssistant(cls, model='gpt-4') as assistant:
            return assistant.process(article=article)

Use Llama 3 or any other llama.cpp-compatible model

To use an open-weights model, replace the summarize method as follows:

python

# Download file https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true
# move it to "~/.cache/py-llm-core/models/"
# rename to "llama-3-q4"

@classmethod
def summarize(cls, article):
    with LLaMACPPAssistant(cls, model='llama-3-q4') as assistant:
        return assistant.process(article=article)

This code defines two Python dataclasses, DenseSummary and DenserSummaryCollection, to structure the data for a task that involves generating increasingly concise, entity-dense summaries of a given article.

Here's a breakdown of the code:

  • DenseSummary: This dataclass stores a denser summary together with the entities that were missing from the previous summary.
  • DenserSummaryCollection: This dataclass stores the collection of DenseSummary objects, one per iteration.
  • summarize: This method uses the OpenAIAssistant, initialized with the class itself, to process the article and generate the summaries. It returns a populated DenserSummaryCollection instance.

Now, we can print the results for each iteration. Note that the iterations happen in-context: these are not separate requests.

python

first_chunk = chunks[0]

summary_collection = DenserSummaryCollection.summarize(first_chunk)
print(len(summary_collection.summaries))
# Outputs "5"

print(summary_collection.summaries[0].missing_entities)
# ['Yann LeCun', 'autonomous intelligent agents', 'self-supervised learning']

print(summary_collection.summaries[0].denser_summary)
markdown
This article discusses a position paper by Yann LeCun, which proposes an
architecture for autonomous intelligent agents.

The paper combines concepts such as configurable predictive world models, 
behavior driven through intrinsic motivation, and hierarchical joint embedding
architectures trained with self-supervised learning.

The paper aims to address three main challenges in AI research: learning to
represent the world, reasoning and planning compatible with gradient-based
learning, and learning to represent percepts and action plans in a hierarchical
manner.
python
print(summary_collection.summaries[1].missing_entities)
# ['JEPA', 'Hierarchical JEPA', 'non-generative architectures']

print(summary_collection.summaries[1].denser_summary)
markdown
Yann LeCun's position paper proposes an architecture for autonomous
intelligent agents, combining concepts like configurable predictive world
models and behavior driven by intrinsic motivation.

It addresses three AI research challenges: learning world representation,
reasoning and planning compatible with gradient-based learning, and
hierarchical representation of percepts and action plans.

The paper also introduces JEPA and Hierarchical JEPA,
non-generative architectures for predictive world models.

Perform the summarization of the whole document

Now that we've processed the first chunk, we can do the same for all the chunks.

python

def single_pass_summary_generator(chunks):
    for chunk in chunks:
        chunk_summaries = DenserSummaryCollection.summarize(chunk)
        # Keep only the densest (5th) summary of each chunk
        yield chunk_summaries.summaries[4].denser_summary


one_pass_summary = '\n'.join(single_pass_summary_generator(chunks))

Let's check the length of the result:

python
splitter.compute_token_count(one_pass_summary)

# Outputs: 722

We can "concatenate" summaries into a single summary.

python
summary_collection = DenserSummaryCollection.summarize(one_pass_summary)
final_summary = summary_collection.summaries[4].denser_summary

print(final_summary)
markdown
The paper proposes a vision for autonomous agents, integrating predictive
world models, intrinsic motivation, hierarchical joint embedding architectures,
including JEPA-1 and JEPA-2, Energy-Based Models (EBMs), Joint Embedding
Architectures (JEAs), and Hierarchical JEPA.

It scrutinizes a Model-Predictive Control process, emphasizing optimization
strategies, and advocates for a multi-scale world model.

It investigates machine learning, emphasizing architectural diagrams, and
discusses the limitations of Large Language Models (LLMs) and reinforcement
learning.

Further notes and performance of CoD with smaller models

Performing several tests with the OpenAI models gpt-3.5-turbo, gpt-3.5-turbo-16k, and gpt-4 led to interesting results:

The GPT-3.5 models produce good results for the first summary but quickly lose track of entities. Furthermore, they clearly lack the ability to identify what's missing. Running more tests, I realized that GPT-3.5 models are unable to produce intersections between sets.

I wouldn't recommend using smaller models like GPT-3.5.

In contrast, the GPT-4 model performs very well with this Chain of Density prompting technique.

Example result from llama-3-8B

Running the exact same process with the Llama 3 8B model (q4 quantized version) gave me the following summary:

markdown
An autonomous machine intelligence architecture is proposed combining
predictive world models with hierarchical joint embedding architectures.

JEPA combines perception, prediction, and action modules for hierarchical
planning. The system learns from observation and predicts future outcomes
using energy-based models.

The results are similar to GPT-3.5's: the model struggles to keep track of missing entities.

Conclusion and next steps

The GPT-4 Summarization with Chain of Density Prompting paper is quite interesting in its approach: iterating within the context window to produce a better result in a zero-shot setting.

You might be interested in reading Mistral AI Instruct v0.1 model evaluation in real world use cases.
