Generate Structured Output With Open Source LLMs

Published March 20, 2024

Introduction

When using Large Language Models (LLMs) for enterprise needs, it's often desirable to have these models return structured outputs.
This structured output facilitates parsing the generations for further processing or exposure through APIs.
Currently, a common workaround is to put formatting instructions in the "System Prompt" and then parse the generation with regular expressions (regex).
However, the LLM may still produce output that ignores the provided instructions, and writing robust regex patterns is error-prone.
A better technique has emerged to achieve this desired behavior, which we will explore in this article.
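To see why the prompt-and-regex approach is fragile, here is a minimal sketch (the model outputs below are made up for the example): a pattern that works for one phrasing silently fails as soon as the model rephrases its answer.

```python
import re

# Pattern expecting output of the form: Name: Dune, Year: 2021
pattern = re.compile(r"Name:\s*(?P<name>.+?),\s*Year:\s*(?P<year>\d{4})")

well_behaved = "Name: Dune, Year: 2021"
rephrased = "The movie is called Dune and it came out in 2021."

match = pattern.search(well_behaved)
print(match.group("name"), match.group("year"))  # Dune 2021

# The same pattern fails when the model rephrases its answer.
print(pattern.search(rephrased))  # None
```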

OpenAI JSON mode

OpenAI has introduced JSON mode, which constrains the model to return syntactically valid JSON; the desired schema is described in the prompt, and the output is guaranteed to parse as JSON.
Recognizing the demand for structured outputs, the community has actively pursued solutions to integrate this functionality into open-source LLMs.

Outlines: Structured Text Generation

One notable solution is Outlines, a tool designed specifically for structured text generation. Outlines employs a combination of regex partial matching and Finite-State Machines (FSM) to ensure that the LLM generates tokens conforming to the provided regex patterns. Additionally, users can provide JSON schemas, which are then converted into regex expressions by the library.
For a more comprehensive understanding of Outlines' capabilities, refer to this article.
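As a toy illustration of the FSM idea (this is not Outlines' actual implementation), the sketch below hand-builds a tiny finite-state machine for the pattern "[0-9]+" wrapped in quotes, and uses it to restrict which characters a stand-in "model" may emit at each step:

```python
# Toy FSM for the pattern '"[0-9]+"': a quoted run of digits.
# States: 0 = start, 1 = inside digits, 2 = done (accepting).
DIGITS = set("0123456789")

def allowed_chars(state):
    """Characters the FSM permits from the given state."""
    if state == 0:
        return {'"'}
    if state == 1:
        return DIGITS | {'"'}
    return set()  # state 2: generation is finished

def step(state, char):
    """Advance the FSM by one character."""
    if state == 0 and char == '"':
        return 1
    if state == 1 and char in DIGITS:
        return 1
    if state == 1 and char == '"':
        return 2
    raise ValueError(f"char {char!r} not allowed in state {state}")

def constrained_generate(propose):
    """Greedy loop: the 'model' proposes a char, but only allowed ones pass."""
    state, out = 0, []
    while state != 2:
        char = propose(out, allowed_chars(state))
        state = step(state, char)
        out.append(char)
    return "".join(out)

def fake_model(generated, allowed):
    """Stand-in model: prefers digits until three are emitted, then closes."""
    digits_so_far = sum(c in DIGITS for c in generated)
    prefs = ("123" + '"') if digits_so_far < 3 else ('"' + "123")
    for c in prefs:
        if c in allowed:
            return c

print(constrained_generate(fake_model))  # "111"
```

Every string this loop can produce matches the pattern by construction; Outlines applies the same principle at the token level, with FSMs compiled from arbitrary regex patterns.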

Using Hugging Face Text Generation Inference

Starting from version 1.4.3 of Hugging Face Text Generation Inference (TGI), the project incorporates the Outlines library.
This integration enables the generation of structured outputs from LLMs.
For those unfamiliar with Hugging Face Text Generation Inference, I've previously written an article detailing its deployment on a SageMaker endpoint.
As of this writing, version 1.4.3 has not yet been made available as part of the SageMaker DLC. Consequently, I opted to deploy it as a Docker container on an ECS cluster for experimentation.
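For reference, the container can be launched the same way on any Docker host with a GPU; this is a sketch following the TGI README (the port mapping, volume path, and hardware flags are assumptions that depend on your instance):

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:1.4.3 \
  --model-id NousResearch/Hermes-2-Pro-Mistral-7B
```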

Implementation

We will implement an application that extracts essential information about movies from textual data sourced from Wikipedia.
I will use "NousResearch/Hermes-2-Pro-Mistral-7B", but feel free to use any model you like.
We begin by defining a response schema using Pydantic:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    name: str = Field(description="Name of the movie")
    release_year: str = Field(description="Release year of the movie")
    director: str = Field(description="Director name of the movie")

json_schema = Movie.schema()  # on Pydantic v2, use Movie.model_json_schema()

Next, we invoke the TGI "/generate" endpoint to process our textual input, passing the schema for the output in the "grammar" field:

import requests

# Replace "[NAME OF THE DOMAIN FOR THE TGI SERVER]" with the appropriate domain name
domain_name = "[NAME OF THE DOMAIN FOR THE TGI SERVER]"

# Sample prompt for extracting movie data
prompt = """<|im_start|>user
       Extract the movie data from the description between the tags <description></description>
       <description>
       Dune (titled onscreen as Dune: Part One) is a 2021 American epic science fiction film directed and co-produced by Denis Villeneuve, who co-wrote the screenplay with Jon Spaihts and Eric Roth
       </description>
       <|im_end|>
       <|im_start|>assistant"""

# Prepare data for the request
data = {
    "inputs": prompt,
    "parameters": {
        "grammar": {"type": "json", "value": json_schema},
    },
}

headers = {
    "Content-Type": "application/json",
}

response = requests.post(
    f"https://{domain_name}/generate",
    headers=headers,
    json=data,
)
print(response.json()["generated_text"])

Upon executing the request, we receive a structured response containing the movie details extracted from the provided text, aligned with the schema we defined.
Below is an example response:

{
  "director": "Denis Villeneuve",
  "name": "Dune",
  "release_year": "2021"
}
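Because the output is guaranteed to be valid JSON conforming to the schema, it can be consumed directly with a standard JSON parser, with no regex post-processing (you could equally validate it back into the Movie model). Here the generated_text string is the example response above:

```python
import json

# Example generated_text returned by TGI (the response shown above)
generated_text = '{"director": "Denis Villeneuve", "name": "Dune", "release_year": "2021"}'

movie = json.loads(generated_text)  # parses directly, no regex needed
print(movie["name"], movie["release_year"])  # Dune 2021
```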

Conclusion

By leveraging tools like Outlines and Hugging Face Text Generation Inference, companies can harness the power of LLMs while ensuring structured, reliable outputs suited to their enterprise needs.
Structured output also enables LLMs to interact with external sources through "function_calls", which we will explore in another article.

Disclaimer

The information presented in this article is intended for informational purposes only. I assume no responsibility or liability for any use, misuse, or interpretation of the information contained herein.