LangChain: the framework for building applications powered by large language models

If you want to build applications that use large language models (LLMs) like ChatGPT, you have two options: you can use the API of an LLM provider (like the OpenAI API) directly, or you can use a framework that abstracts the LLM API and provides additional features.

LangChain is a powerful open-source framework for developing applications powered by language models. It seamlessly integrates with the language models of your choice, including OpenAI or Hugging Face models, and combines them with external sources like your local file system, Google Drive, Notion, or Wikipedia. By connecting these components, LangChain empowers you to create advanced use cases around language models by chaining them together in a flexible manner.

Still sounds complicated and too abstract? Let’s break it down a little bit.

Overview

What makes LangChain particularly appealing is how easy it is to get started. If you prefer Python, you can use LangChain, while for those who work with JavaScript/TypeScript, LangChain.js is the suitable choice.

The development framework offers several key features and functionalities that facilitate working with large language models (LLMs) - but don’t worry, for starters you can get by with just a few of them:

  • LLM Utilities: This includes a comprehensive set of tools designed for seamless interaction with LLMs. These utilities encompass prompt management, prompt optimization, a unified interface for all LLMs, and various common utilities like document loaders and parsers.
  • Chains: LangChain introduces a concept called “chains” which enables developers to create sequences of calls to LLMs or other tools such as data sources, APIs, or libraries. LangChain provides a standardized interface for chains, numerous integrations with other tools, and ready-to-use end-to-end chains for common applications.
  • Data Augmented Generation: This feature allows developers to create chains that interact with external data sources to retrieve relevant data for use in the generation process. For instance, it can be used for tasks like summarizing lengthy text or performing question-answering over specific data sources.
  • Agents: LangChain provides a mechanism called “agents” for modeling LLMs that make decisions, take actions, observe the outcomes, and repeat the process until completion. It offers a standardized interface for agents, a selection of available agent options, and practical examples of end-to-end agents.
  • Memory: LangChain supports the concept of memory, allowing developers to persist state information between calls of a chain or agent. It provides a standardized interface for memory, a collection of memory implementations, and illustrative examples of chains or agents that utilize memory (see the sketch after this list).
  • Generative Model Evaluation (currently in beta): LangChain introduces an innovative approach to evaluate generative models by utilizing language models themselves for evaluation purposes. It includes specific prompts and chains designed to assist with this evaluation process.
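
To give you a small taste of the memory concept mentioned above, here is a minimal sketch of a conversation chain that remembers previous turns (assuming an OpenAI API key is configured; the user inputs are just illustrative):

from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# ConversationBufferMemory keeps the full chat history and injects it
# into each subsequent prompt, so the model can refer back to earlier turns
conversation = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    memory=ConversationBufferMemory()
)

conversation.run("Hi, my name is Alice.")
print(conversation.run("What is my name?"))  # the model can now recall "Alice"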

LangChain is continuously evolving, ensuring constant growth and improvement. This enables developers to build increasingly advanced use cases and applications.

Should I use it?

Still unsure whether you should give LangChain a try? In my opinion it is definitely worth it, even if you are just getting started with the OpenAI API and only want to execute some prompts - and especially if you want to work with the responses programmatically.

Why is that? LangChain abstracts the LLM and provides standardized error handling and resilience features, including automatic retries in cases such as timeouts or request rate limits, ensuring higher success rates and improved reliability in language processing workflows. It also assists you with format instructions that tell the LLM which output format you expect.
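
For example, the retry behavior can be tuned directly on the model wrapper - a minimal sketch, where the concrete values are just assumptions for illustration:

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    request_timeout=30,  # seconds until a request is considered timed out
    max_retries=3        # retry failed requests (e.g. rate limits) up to 3 times
)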

But there is so much more. Let’s try it out!

Install

Please create a new Python project in a new directory, then create a virtual environment (venv) and activate it.
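
On Linux or macOS, this could look like this:

python3 -m venv .venv
source .venv/bin/activate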

Then, install LangChain and the modules needed for the common LLM providers:

pip install 'langchain[llms]==0.0.216'

Alternatively, you can install the components separately:

pip install langchain
pip install openai

After the installation is finished, you should be good to go, and you can start building your first application.

Simple Example: “Explain a Topic”

Let’s take a look at a simple example. We want to build an application that explains topics in simple words, using the OpenAI API to generate the responses.

Our program should support the command line arguments topic and words to explain the specified topic in the given number of words: python3 simple-explainer.py --topic "Large Language Models" --words 100

First, we create an instance of the ChatOpenAI class, which is a wrapper around the OpenAI API and abstracts the language model (LLM) we want to use, providing us with a standardized interface for interacting with LLMs.

Then we utilize LLMChain, which is the LLM representation of the central chain concept that inspired LangChain’s name: the Chain interface enables more complex applications that require chaining of LLMs, either with each other or with other components. A chain is defined very generically as a sequence of calls to components, which can include other chains. Usually the chains are executed in sequence, but you can also use a router chain to dynamically select the next chain to execute. We will look into these concepts in more detail in a later post.
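
As a first impression of how chains compose, here is a minimal sketch of a sequential chain (explain_chain and summarize_chain are hypothetical single-input LLMChain instances, not part of the example below):

from langchain.chains import SimpleSequentialChain

# The output of the first chain is passed as input to the second chain;
# explain_chain and summarize_chain are hypothetical LLMChain instances
overall_chain = SimpleSequentialChain(chains=[explain_chain, summarize_chain])
result = overall_chain.run("Large Language Models")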

In our simple case, we just want to execute a prompt and then handle the response. The LLMChain ties the model and the prompt together: it facilitates the execution of prompts and streamlines the handling of responses from the LLM.

Additionally, a PromptTemplate is instantiated to separate the prompt from the code and conveniently provides the mechanics to define and fill placeholders in the prompt.
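
In isolation, a PromptTemplate works like this (a minimal standalone sketch):

from langchain import PromptTemplate

# Define the prompt with a placeholder...
template = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in simple words."
)

# ...and fill the placeholder when needed
print(template.format(topic="Quantum Computing"))
# -> Explain Quantum Computing in simple words.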

simple-explainer.py

from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
import argparse

# Read command line arguments
parser = argparse.ArgumentParser(description='Run a simple LLM chain.')
parser.add_argument('--topic', type=str, default="Quantum Computing", help='the topic to explain')
parser.add_argument('--words', type=int, default=300, help='the number of words to use')
args = parser.parse_args()

# Create and configure a new OpenAI-API instance
openai = ChatOpenAI(
    # see https://platform.openai.com/docs/api-reference/chat/create#chat/create-temperature
    temperature=0.3,
    # select model that should be used
    model_name="gpt-3.5-turbo"
)

# Create a new LLMChain instance with the llm and prompt-template
chain = LLMChain(
    llm=openai,
    prompt=PromptTemplate(
        input_variables=["topic", "words"],
        template="""
            You are an expert teacher. Explain the concept of {topic} in {words} words or less
            in a way that a 12-year-old can understand.
        """
    ),
    verbose=False   # set to True to see the prompt that is sent to the LLM
)

# Run the chain
chain_result = chain.run(
    topic=args.topic,
    words=args.words
)

# Print the result
print(chain_result)

To run this program, you’ll have to generate an OpenAI API key first and set it as the environment variable OPENAI_API_KEY.

export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
python3 simple-explainer.py --topic "Large Language Models" --words 100

After executing the script, the output could look like this:

Large Language Models are like super smart computer programs that can understand and
use human language. They can read and analyze lots of text and learn how to write
and speak like humans do. This helps us communicate better with computers and also
helps computers understand us better. Think of them like a really smart language
robot that can talk to us and help us out!

We found that it’s really easy to get started with LangChain. The code is concise and readable. However, there are more helpful features that we can use to improve our application. Let’s take a look at the next example.

Format the Response: “Quiz Generator”

While printing the output of a language model (LLM) to the console can be helpful, there are many cases where you might want to format the LLM’s response for better readability or to use it programmatically. For instance, you may need to display specific parts of the response in a frontend application or make decisions based on the outcomes generated by the LLM. LangChain provides a simple way to do that: an output parser generates format instructions that are appended to the prompt and later parses the LLM’s response.

And even better: you define a schema by providing a Pydantic model, which is used to generate the format instructions.

Let’s take a look at an example that generates a quiz based on a topic and the number of questions that should be generated. We use the PydanticOutputParser to parse the response and generate the formatting instructions.

quiz-generator.py

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
import argparse

# read command line arguments
parser = argparse.ArgumentParser(description='Generate Quiz')
parser.add_argument('--topic', type=str, default="Vulcan", help='the topic to generate the quiz for')
parser.add_argument('--count', type=int, default=3, help='the number of questions to generate')
args = parser.parse_args()


# Define the model for the llm response
class Quiz(BaseModel):
    # Describe the intended usage of the fields, so that the llm
    # can match the response-content with the fields
    questions: List[str] = Field(description="Quiz-Questions")
    answers: List[str] = Field(description="Quiz-Answers")

    def to_dict(self):
        return {
            "questions": self.questions,
            "answers": self.answers,
        }


quiz_parser: PydanticOutputParser = PydanticOutputParser(
    pydantic_object=Quiz
)

# Create and configure a new OpenAI-API instance
openai = ChatOpenAI(
    # See https://platform.openai.com/docs/api-reference/chat/create#chat/create-temperature
    temperature=0.3,
    # Select model that should be used
    model_name="gpt-3.5-turbo"
)

# Create a new LLMChain instance with the llm and prompt-template
chain = LLMChain(
    llm=openai,
    prompt=PromptTemplate(
        input_variables=["topic", "count"],
        # Prompt template with placeholders.
        # Please note: the generated format instructions for the result are sent to the model
        # via the placeholder 'format_instructions' on a separate line at the end of the prompt
        template="""
                    Ask me {count} questions about {topic}. Provide the answers.
                    \n{format_instructions}
                """,
        # Provide the format_instructions derived from our model immediately
        # as 'static' part of the PromptTemplate, the rest of the variables
        # will be provided dynamically on the chain.run(...) below
        partial_variables={
            "format_instructions": quiz_parser.get_format_instructions()
        },
    ),
    verbose=False  # Set to True to see the prompt with the generated format-instructions
)

# Run the chain
chain_result = chain.run(
    topic=args.topic,
    count=args.count
)

# Print the result
print(chain_result)

You can run it with command line arguments topic and count to generate a quiz with the given number of questions: python3 quiz-generator.py --topic "Solar Eclipse" --count 3

The output will look like this:

{
  "questions": [
    "What is a solar eclipse?",
    "How often do solar eclipses occur?",
    "What precautions should be taken during a solar eclipse?"
  ],
  "answers": [
    "A solar eclipse occurs when the moon passes between the sun and the earth, blocking the sun's light and casting a shadow on the earth.",
    "Solar eclipses occur about every 18 months, but they are not visible from all parts of the earth each time.",
    "Looking directly at the sun during a solar eclipse can cause permanent eye damage. It is important to wear special eclipse glasses or use other safe viewing methods."
  ]
}

How cool is that? We generated the desired number of quiz questions and answers in JSON format, defined by a Pydantic model. The JSON output is a simple data structure that can easily be used programmatically, for example as the response of a quiz API or in another application.
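
If you want to work with the result as a typed Python object instead of a raw JSON string, you can let the quiz_parser from above do the parsing - a minimal sketch appended to the quiz-generator.py script:

# Parse the raw JSON response into a Quiz instance
quiz: Quiz = quiz_parser.parse(chain_result)

print(quiz.questions[0])
print(quiz.answers[0])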

Conclusion

Overall, LangChain brings a significant advantage by simplifying the integration of large language models into applications and enabling advanced use cases. Its user-friendly interface facilitates seamless interaction with various language models and allows developers to create complex applications through component chaining. With LangChain, developers can optimize queries through prompt engineering and build agents that act autonomously. Ultimately, LangChain streamlines the integration and utilization of language models, enhancing application functionality.

In this blog post, we have only scratched the surface of the powerful capabilities provided by LangChain. We have explored how this framework empowers the development of language model-driven applications. However, LangChain’s potential goes far beyond what we have covered here. In future posts, we will delve deeper into specific components, their functionalities, and how they can be customized to address unique use cases. Stay tuned for more in-depth discussions and practical insights to fully harness the remarkable potential of this framework.
