
How to summarize a text using Chain of Density prompting?

This article explains, step by step, how to summarize a text with the Chain of Density prompt using Python and large language models.

You'll find all the code in the py-llm-core Jupyter notebook here.

What is Chain of Density prompting?

Chain of Density (CoD) prompting generates summaries that start out entity-sparse and become increasingly entity-dense.

Summaries are generated by iteratively incorporating missing entities from the source text without increasing the length.

The Chain of Density prompt thus enables generating summaries at varying levels of information density.
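
To make the process concrete, here is a minimal Python sketch of the iteration. The two helper functions are hypothetical placeholders; in the actual technique, both steps are performed by the model itself, in-context, within a single prompt:

python
def chain_of_density(article, iterations=5):
    # identify_missing_entities and rewrite_denser_summary are hypothetical
    # placeholders: in practice the model performs both steps in-context.
    summary = ""
    for _ in range(iterations):
        # Step 1: find 1-3 relevant entities absent from the current summary
        missing_entities = identify_missing_entities(article, summary)
        # Step 2: rewrite the summary at identical length to include them
        summary = rewrite_denser_summary(article, summary, missing_entities)
    return summary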

CoD summaries exhibit more fusion and less lead bias than GPT-4 summaries produced with a vanilla prompt. The original research also showed that human preferences favor denser summaries.

Implement Chain of Density with minimum dependencies

For this tutorial, we'll apply this technique to a fairly complex topic: A Path Towards Autonomous Machine Intelligence, a position paper by Yann LeCun (2022).

We chose this paper for the following reasons:

  • The paper topic is fairly complex
  • The paper is filled with rich information
  • Its length (60 pages / ~38k tokens) makes it hard to process in a single shot

Prerequisites

To implement the chain of density prompting, we'll use the py-llm-core Python library.

Create a new virtual environment and install the library (you'll need an OpenAI API key to use GPT models).

shell
python3 -m venv venv
source venv/bin/activate
pip3 install py-llm-core
pip3 install PyPDF2

# if you use OpenAI models
export OPENAI_API_KEY=sk-<replace with your actual api key>

If you need another model, you can choose any model supported by the llama.cpp library. Download the GGUF file as described in the py-llm-core documentation.
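
For example, here's one way to fetch the quantized Llama 3 8B model used later in this tutorial. The cache directory and file name follow py-llm-core's defaults as used further below; adjust them to your setup:

shell
# Assumed default cache directory for py-llm-core models
mkdir -p ~/.cache/py-llm-core/models

# Download the quantized Llama 3 8B Instruct model and name it "llama-3-q4"
curl -L -o ~/.cache/py-llm-core/models/llama-3-q4 \
  "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true"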

Overview

We'll cover the following steps:

  1. Convert the PDF into a text string
  2. Create chunks that GPT-4 can digest
  3. Perform the Chain of Density summarization

Converting the PDF file into text

python
import PyPDF2

# Open the PDF file
with open('data/A Path Towards Autonomous Machine Intelligence.pdf', 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)

    # Extract the text from the PDF
    pages = []
    for page in pdf_reader.pages:
        pages.append(page.extract_text())

    text = ''.join(pages)

Clean up Unicode characters from the PDF

python
import unicodedata


def cleanup_unicode(text):
    # Normalize each character to its NFKC form to fix ligatures,
    # width variants, and other PDF extraction artifacts
    corrected_chars = []
    for char in text:
        corrected_char = unicodedata.normalize("NFKC", char)
        corrected_chars.append(corrected_char)
    return "".join(corrected_chars)


text = cleanup_unicode(text)

print(text[:1000])

Split the text to fit the model's context window

The whole document may not fit in the context window of the model.

We'll use GPT-4 and Llama 3 8B: both have a context window of roughly 8,000 tokens. This means we'll have to generate several partial summaries and then combine the results.

Note: We cannot create chunks of 8,000 tokens, because the context window is shared between the prompt (input) and the completion (output) tokens. Reserving 75% of the window for input gives 8,000 × 0.75 = 6,000 tokens per chunk.

python
from llm_core.splitters import TokenSplitter

# 75% input tokens / max 25% output tokens
splitter = TokenSplitter(chunk_size=6_000, chunk_overlap=0)

chunks = list(splitter.chunkify(text))
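
You can optionally sanity-check the chunking before calling the model; compute_token_count is the same TokenSplitter helper we use later to measure the summary length:

python
# Optional sanity check: each chunk must stay within the 6,000-token budget
print(len(chunks))

for chunk in chunks:
    print(splitter.compute_token_count(chunk))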

Implementing the Chain of Density prompting

The py-llm-core library allows the creation of structured outputs (i.e. JSON mode; see Generate JSON with Mistral AI Instruct model).

Data classes are created with annotations to help large language models infer the data structure.

python
from dataclasses import dataclass
from llm_core.assistants import OpenAIAssistant, LLaMACPPAssistant


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: list[str]


@dataclass
class DenserSummaryCollection:
    system_prompt = """
    You are an expert in writing rich and dense summaries in broad domains.
    """

    prompt = """
    Article:
    {article}
    ----

    You will generate increasingly concise, entity-dense summaries of the
    above Article.

    Repeat the following 2 steps 5 times.

    - Step 1: Identify 1-3 informative Entities from the Article
    which are missing from the previously generated summary and are the most
    relevant.

    - Step 2: Write a new, denser summary of identical length which covers
    every entity and detail from the previous summary plus the missing
    entities.

    A Missing Entity is:

    - Relevant: to the main story
    - Specific: descriptive yet concise (5 words or fewer)
    - Novel: not in the previous summary
    - Faithful: present in the Article
    - Anywhere: located anywhere in the Article

    Guidelines:
    - The first summary should be long (4-5 sentences, approx. 80 words) yet
    highly non-specific, containing little information beyond the entities
    marked as missing.

    - Use overly verbose language and fillers (e.g. "this article discusses")
    to reach approx. 80 words.

    - Make every word count: re-write the previous summary to improve flow and
    make space for additional entities.

    - Make space with fusion, compression, and removal of uninformative
    phrases like "the article discusses"

    - The summaries should become highly dense and concise yet
    self-contained, e.g., easily understood without the Article.

    - Missing entities can appear anywhere in the new summary.

    - Never drop entities from the previous summary. If space cannot be made,
    add fewer new entities.

    > Remember to use the exact same number of words for each summary.
    Answer in JSON.

    > The JSON in `summaries` should be a list (length 5) of
    dictionaries whose keys are "missing_entities" and "denser_summary".
    """

    summaries: list[DenseSummary]

    @classmethod
    def summarize(cls, article):
        with OpenAIAssistant(cls, model='gpt-4') as assistant:
            return assistant.process(article=article)

Use Llama 3 or any other llama.cpp-compatible model

To use an open-weights model, replace the summarize method as follows:

python

# Download file https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true
# move it to "~/.cache/py-llm-core/models/"
# rename to "llama-3-q4"

@classmethod
def summarize(cls, article):
    with LLaMACPPAssistant(cls, model='llama-3-q4') as assistant:
        return assistant.process(article=article)

This code defines two Python dataclasses, DenseSummary and DenserSummaryCollection, to structure the data for a task that involves generating increasingly concise, entity-dense summaries of a given article.

Here's a breakdown of the code:

  • DenseSummary: This dataclass stores a denser summary together with the entities that were missing from the previous summary.
  • DenserSummaryCollection: This dataclass stores the collection of DenseSummary objects, one per iteration.
  • summarize: This method uses the OpenAIAssistant, initialized with the class itself, to process the article and generate the summaries. It returns a populated DenserSummaryCollection instance.

Now, we can print the results for each iteration. Note that the iterations happen in-context: these are not separate requests.

python

first_chunk = chunks[0]

summary_collection = DenserSummaryCollection.summarize(first_chunk)
print(len(summary_collection.summaries))
# Outputs "5"

print(summary_collection.summaries[0].missing_entities)
# ['Yann LeCun', 'autonomous intelligent agents', 'self-supervised learning']

print(summary_collection.summaries[0].denser_summary)
markdown
This article discusses a position paper by Yann LeCun, which proposes an
architecture for autonomous intelligent agents.

The paper combines concepts such as configurable predictive world models, 
behavior driven through intrinsic motivation, and hierarchical joint embedding
architectures trained with self-supervised learning.

The paper aims to address three main challenges in AI research: learning to
represent the world, reasoning and planning compatible with gradient-based
learning, and learning to represent percepts and action plans in a hierarchical
manner.
python
print(summary_collection.summaries[1].missing_entities)
# ['JEPA', 'Hierarchical JEPA', 'non-generative architectures']

print(summary_collection.summaries[1].denser_summary)
markdown
Yann LeCun's position paper proposes an architecture for autonomous
intelligent agents, combining concepts like configurable predictive world
models and behavior driven by intrinsic motivation.

It addresses three AI research challenges: learning world representation,
reasoning and planning compatible with gradient-based learning, and
hierarchical representation of percepts and action plans.

The paper also introduces JEPA and Hierarchical JEPA,
non-generative architectures for predictive world models.

Perform the summarization of the whole document

Now that we've processed the first chunk, we can do the same for all the chunks.

python

def single_pass_summary_generator(chunks):
    for chunk in chunks:
        chunk_summaries = DenserSummaryCollection.summarize(chunk)
        # Keep only the densest (5th) summary of each chunk
        yield chunk_summaries.summaries[4].denser_summary


one_pass_summary = '\n'.join(single_pass_summary_generator(chunks))

Let's check the length of the result:

python
splitter.compute_token_count(one_pass_summary)

# Outputs: 722

We can "concatenate" summaries into a single summary.

python
summary_collection = DenserSummaryCollection.summarize(one_pass_summary)
final_summary = summary_collection.summaries[4].denser_summary

print(final_summary)
markdown
The paper proposes a vision for autonomous agents, integrating predictive
world models, intrinsic motivation, hierarchical joint embedding architectures,
including JEPA-1 and JEPA-2, Energy-Based Models (EBMs), Joint Embedding
Architectures (JEAs), and Hierarchical JEPA.

It scrutinizes a Model-Predictive Control process, emphasizing optimization
strategies, and advocates for a multi-scale world model.

It investigates machine learning, emphasizing architectural diagrams, and
discusses the limitations of Large Language Models (LLMs) and reinforcement
learning.

Further notes and performance of CoD with smaller models

Performing several tests with the OpenAI models gpt-3.5-turbo, gpt-3.5-turbo-16k, and gpt-4 led to interesting results:

The GPT-3.5 models produce good results for the first summary but quickly lose track of entities. Furthermore, they clearly lack the ability to identify what's missing. Running more tests, I realized that GPT-3.5 models are unable to produce intersections between sets.

I wouldn't recommend using smaller models like GPT-3.5.

In contrast, the GPT-4 model performs very well with this Chain of Density prompting technique.

Example result from llama-3-8B

Running the exact same process with the Llama 3 8B model (q4 quantized version) gave me the following summary:

markdown
An autonomous machine intelligence architecture is proposed combining
predictive world models with hierarchical joint embedding architectures.

JEPA combines perception, prediction, and action modules for hierarchical
planning. The system learns from observation and predicts future outcomes
using energy-based models.

The results are similar to GPT-3.5's: the model struggles to keep track of missing entities.

Conclusion and next steps

The GPT-4 Summarization with Chain of Density Prompting paper is quite interesting in its approach: iterating within the context window to produce a better result in a zero-shot setting.

You might be interested in reading Mistral AI Instruct v0.1 model evaluation in real world use cases.
