
NvidiaGenerator

This Generator enables text generation using models hosted on the NVIDIA API Catalog or self-hosted with NVIDIA NIM.

Most common position in a pipeline: After a PromptBuilder
Mandatory init variables: api_key (API key for NVIDIA NIM; can be set with the NVIDIA_API_KEY env var)
Mandatory run variables: prompt (a string containing the prompt for the LLM)
Output variables: replies (a list of strings with all the replies generated by the LLM); meta (a list of dictionaries with the metadata associated with each reply, such as token count)
API reference: NVIDIA
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/nvidia

Overview

NvidiaGenerator provides an interface for generating text using LLMs self-hosted with NVIDIA NIM or models hosted on the NVIDIA API Catalog.

Usage

To start using NvidiaGenerator, install the nvidia-haystack package:

shell
pip install nvidia-haystack

You can use NvidiaGenerator with all the LLMs available in the NVIDIA API Catalog or with a model deployed using NVIDIA NIM. For more information, refer to the NVIDIA NIM for LLMs Playbook.

On its own

To use LLMs from the NVIDIA API Catalog, specify the api_url and your API key. You can get your API key from the NVIDIA API Catalog.

NvidiaGenerator uses the NVIDIA_API_KEY environment variable by default. Otherwise, you can pass an API key at initialization with the api_key parameter:

python
from haystack.utils.auth import Secret
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama-3.1-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    # Pass the key directly; omit this to fall back to the NVIDIA_API_KEY env var.
    api_key=Secret.from_token("<your-api-key>"),
    # Arguments forwarded to the model, such as sampling parameters.
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
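
Since NvidiaGenerator falls back to the NVIDIA_API_KEY environment variable, you can also resolve the key explicitly with Secret.from_env_var instead of hardcoding a token. A minimal sketch, assuming the variable is exported in your shell:

python
from haystack.utils.auth import Secret
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

# Assumes you ran: export NVIDIA_API_KEY="<your-api-key>"
generator = NvidiaGenerator(
    model="meta/llama-3.1-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    # Read the key from the environment at runtime instead of embedding it.
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)
generator.warm_up()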

To use a locally deployed model, set api_url to your local NIM endpoint and api_key to None:

python
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama-3.1-8b-instruct",
    # Point to your local NIM endpoint instead of the hosted API Catalog.
    api_url="http://localhost:9999/v1",
    # No key is needed for a local deployment.
    api_key=None,
    model_arguments={
        "temperature": 0.2,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
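
NIM serves an OpenAI-compatible API, so you can sanity-check which model your local endpoint exposes before wiring it into a pipeline. A quick sketch, assuming the standard /v1/models route and the endpoint from the example above:

python
import requests

# List the models served by the local NIM endpoint (OpenAI-compatible route).
response = requests.get("http://localhost:9999/v1/models")
response.raise_for_status()
print([m["id"] for m in response.json()["data"]])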

In a pipeline

The following example shows a RAG pipeline:

python
from haystack import Pipeline, Document
from haystack.utils.auth import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France"),
])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{ query }}
"""

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "llm",
    NvidiaGenerator(
        model="meta/llama-3.1-70b-instruct",
        api_url="https://integrate.api.nvidia.com/v1",
        api_key=Secret.from_token("<your-api-key>"),
        model_arguments={
            "temperature": 0.2,
            "top_p": 0.7,
            "max_tokens": 1024,
        },
    ),
)
# Feed retrieved documents into the prompt, then the prompt into the LLM.
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {"query": query},
    "retriever": {"query": query},
})

print(res)
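
The pipeline output is keyed by component name, so the generated answer is available under the llm key:

python
print(res["llm"]["replies"][0])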