
Responses

OpenResponses API

The Responses API in any-llm implements the OpenResponses specification—an open-source standard for building multi-provider, interoperable LLM interfaces for agentic AI systems.

Return Types

The responses() and aresponses() functions return different types depending on the provider's level of OpenResponses compliance:

openresponses_types.ResponseResource
    Returned by providers fully compliant with the OpenResponses specification.

openai.types.responses.Response
    Returned by providers using OpenAI's native Responses API (not yet fully OpenResponses-compliant).

Iterator[ResponseStreamEvent] / AsyncIterator[ResponseStreamEvent]
    Returned when stream=True is set.

Both ResponseResource and Response share a similar structure, so in many cases you can access common fields like output without type checking.
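A minimal sketch of this duck-typed access (the model id is illustrative, and an OpenAI API key is assumed to be configured in the environment):

from any_llm.api import responses

# Sketch only: both return types expose `output`, so no isinstance check is
# needed to read the common fields.
response = responses(
    model="openai/gpt-4o",  # illustrative model id
    input_data="Summarize the OpenResponses specification in one sentence.",
)

for item in response.output:
    print(type(item).__name__, item)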

any_llm.api.responses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, presence_penalty=None, frequency_penalty=None, truncation=None, store=None, service_tier=None, user=None, metadata=None, previous_response_id=None, include=None, background=None, safety_identifier=None, prompt_cache_key=None, prompt_cache_retention=None, conversation=None, client_args=None, **kwargs)

Create a response using the OpenResponses API.

This implements the OpenResponses specification and returns either openresponses_types.ResponseResource (for OpenResponses-compliant providers) or openai.types.responses.Response (for providers using OpenAI's native API). If stream=True, an iterator of any_llm.types.responses.ResponseStreamEvent items is returned.

Parameters:

model (str, required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o').
    If provider is given, model is treated as the bare model name; otherwise
    it must include the provider prefix, as in 'openai/gpt-4o'.

provider (str | LLMProvider | None, default None)
    Provider to use for the request. If given, model is treated as the bare
    model name; otherwise the provider is parsed from the model string, as in
    'openai/gpt-4o'.

input_data (str | ResponseInputParam, required)
    The input payload accepted by the provider's Responses API. For
    OpenAI-compatible providers, this is typically a list mixing text, images,
    and tool instructions, or a dict per the OpenAI spec.

tools (list[dict[str, Any] | Callable[..., Any]] | None, default None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts).

tool_choice (str | dict[str, Any] | None, default None)
    Controls which tools the model can call.

max_output_tokens (int | None, default None)
    Maximum number of output tokens to generate.

temperature (float | None, default None)
    Controls randomness in the response (0.0 to 2.0).

top_p (float | None, default None)
    Controls diversity via nucleus sampling (0.0 to 1.0).

stream (bool | None, default None)
    Whether to stream response events.

api_key (str | None, default None)
    API key for the provider.

api_base (str | None, default None)
    Base URL for the provider API.

instructions (str | None, default None)
    A system (or developer) message inserted into the model's context.

max_tool_calls (int | None, default None)
    The maximum number of total calls to built-in tools that can be processed
    in a response, counted across all built-in tools rather than per tool.
    Further tool calls attempted by the model are ignored.

parallel_tool_calls (int | None, default None)
    Whether to allow the model to run tool calls in parallel.

reasoning (Any | None, default None)
    Configuration options for reasoning models.

text (Any | None, default None)
    Configuration options for a text response from the model. Can be plain
    text or structured JSON data.

presence_penalty (float | None, default None)
    Penalizes new tokens based on whether they appear in the text so far.

frequency_penalty (float | None, default None)
    Penalizes new tokens based on their frequency in the text so far.

truncation (str | None, default None)
    Controls how the service truncates input when it exceeds the model
    context window.

store (bool | None, default None)
    Whether to store the response so it can be retrieved later.

service_tier (str | None, default None)
    The service tier to use for this request.

user (str | None, default None)
    A unique identifier representing your end user.

metadata (dict[str, str] | None, default None)
    Key-value pairs for custom metadata (up to 16 pairs).

previous_response_id (str | None, default None)
    The ID of the response to use as the prior turn for this request.

include (list[str] | None, default None)
    Items to include in the response (e.g., 'reasoning.encrypted_content').

background (bool | None, default None)
    Whether to run the request in the background and return immediately.

safety_identifier (str | None, default None)
    A stable identifier used for safety monitoring and abuse detection.

prompt_cache_key (str | None, default None)
    A key to use when reading from or writing to the prompt cache.

prompt_cache_retention (str | None, default None)
    How long to retain a prompt cache entry created by this request.

conversation (str | dict[str, Any] | None, default None)
    The conversation to associate this response with (an ID string or a
    ConversationParam object).

client_args (dict[str, Any] | None, default None)
    Additional provider-specific arguments passed to the provider's client
    instantiation.

**kwargs (Any, default {})
    Additional provider-specific arguments passed to the provider's API call.
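Since tools accepts plain Python callables, a simple function can serve as a tool definition. A hedged sketch (the function, model id, and prompt are illustrative; how tool calls surface in the output depends on the provider):

from any_llm.api import responses

def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"Sunny in {city}"

# Sketch only: any-llm derives a tool schema from the callable; the model may
# respond with tool-call items in `output` rather than executing the function.
response = responses(
    model="openai/gpt-4o",
    input_data="What's the weather in Paris?",
    tools=[get_weather],
    tool_choice="auto",
)

for item in response.output:
    print(item)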

Returns:

ResponseResource | Response | Iterator[ResponseStreamEvent]
    Either a ResponseResource object (OpenResponses-compliant providers), a
    Response object (non-compliant providers), or an iterator of
    ResponseStreamEvent (streaming).

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.
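With stream=True, the call returns an iterator of ResponseStreamEvent items instead of a single object. A sketch, assuming OpenAI-style event types such as 'response.output_text.delta' (the exact event shapes vary by provider):

from any_llm.api import responses

events = responses(
    model="openai/gpt-4o",
    input_data="Write a haiku about interoperability.",
    stream=True,
)

for event in events:
    # `type` discriminates events in the OpenAI-style stream shape assumed here.
    if getattr(event, "type", None) == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()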

Source code in src/any_llm/api.py
def responses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: int | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    truncation: str | None = None,
    store: bool | None = None,
    service_tier: str | None = None,
    user: str | None = None,
    metadata: dict[str, str] | None = None,
    previous_response_id: str | None = None,
    include: list[str] | None = None,
    background: bool | None = None,
    safety_identifier: str | None = None,
    prompt_cache_key: str | None = None,
    prompt_cache_retention: str | None = None,
    conversation: str | dict[str, Any] | None = None,
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ResponseResource | Response | Iterator[ResponseStreamEvent]:
    """Create a response using the OpenResponses API.

    This implements the OpenResponses specification and returns either
    `openresponses_types.ResponseResource` (for OpenResponses-compliant providers)
    or `openai.types.responses.Response` (for providers using OpenAI's native API).
    If `stream=True`, an iterator of `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        provider: Provider name to use for the request. If provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        presence_penalty: Penalizes new tokens based on whether they appear in the text so far.
        frequency_penalty: Penalizes new tokens based on their frequency in the text so far.
        truncation: Controls how the service truncates input when it exceeds the model context window.
        store: Whether to store the response so it can be retrieved later.
        service_tier: The service tier to use for this request.
        user: A unique identifier representing your end user.
        metadata: Key-value pairs for custom metadata (up to 16 pairs).
        previous_response_id: The ID of the response to use as the prior turn for this request.
        include: Items to include in the response (e.g., 'reasoning.encrypted_content').
        background: Whether to run the request in the background and return immediately.
        safety_identifier: A stable identifier used for safety monitoring and abuse detection.
        prompt_cache_key: A key to use when reading from or writing to the prompt cache.
        prompt_cache_retention: How long to retain a prompt cache entry created by this request.
        conversation: The conversation to associate this response with (ID string or ConversationParam object).
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        Either a `ResponseResource` object (OpenResponses-compliant providers),
        a `Response` object (non-compliant providers), or an iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model

    llm = AnyLLM.create(
        provider_key,
        api_key=api_key,
        api_base=api_base,
        **client_args or {},
    )
    return llm.responses(
        model=model_id,
        input_data=input_data,
        tools=tools,
        tool_choice=tool_choice,
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=stream,
        instructions=instructions,
        max_tool_calls=max_tool_calls,
        parallel_tool_calls=parallel_tool_calls,
        reasoning=reasoning,
        text=text,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        truncation=truncation,
        store=store,
        service_tier=service_tier,
        user=user,
        metadata=metadata,
        previous_response_id=previous_response_id,
        include=include,
        background=background,
        safety_identifier=safety_identifier,
        prompt_cache_key=prompt_cache_key,
        prompt_cache_retention=prompt_cache_retention,
        conversation=conversation,
        **kwargs,
    )
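As the source above shows, the provider is either parsed from the model string via split_model_provider or taken from the provider argument, in which case model is passed through as the bare model id. The two calls below are therefore equivalent (model id illustrative):

from any_llm.api import responses

# Provider parsed from the 'provider/model' prefix:
r1 = responses(model="openai/gpt-4o", input_data="Hello")

# Provider given explicitly; model is the bare model id:
r2 = responses(model="gpt-4o", provider="openai", input_data="Hello")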

any_llm.api.aresponses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, presence_penalty=None, frequency_penalty=None, truncation=None, store=None, service_tier=None, user=None, metadata=None, previous_response_id=None, include=None, background=None, safety_identifier=None, prompt_cache_key=None, prompt_cache_retention=None, conversation=None, client_args=None, **kwargs) async

Create a response using the OpenResponses API.

This implements the OpenResponses specification and returns either openresponses_types.ResponseResource (for OpenResponses-compliant providers) or openai.types.responses.Response (for providers using OpenAI's native API). If stream=True, an async iterator of any_llm.types.responses.ResponseStreamEvent items is returned.
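A minimal async usage sketch (model id illustrative; provider credentials assumed to be configured in the environment):

import asyncio

from any_llm.api import aresponses

async def main() -> None:
    response = await aresponses(
        model="openai/gpt-4o",
        input_data="Name one benefit of a provider-agnostic LLM API.",
    )
    for item in response.output:
        print(item)

asyncio.run(main())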

Parameters:

model (str, required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o').
    If provider is given, model is treated as the bare model name; otherwise
    it must include the provider prefix, as in 'openai/gpt-4o'.

provider (str | LLMProvider | None, default None)
    Provider to use for the request. If given, model is treated as the bare
    model name; otherwise the provider is parsed from the model string, as in
    'openai/gpt-4o'.

input_data (str | ResponseInputParam, required)
    The input payload accepted by the provider's Responses API. For
    OpenAI-compatible providers, this is typically a list mixing text, images,
    and tool instructions, or a dict per the OpenAI spec.

tools (list[dict[str, Any] | Callable[..., Any]] | None, default None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts).

tool_choice (str | dict[str, Any] | None, default None)
    Controls which tools the model can call.

max_output_tokens (int | None, default None)
    Maximum number of output tokens to generate.

temperature (float | None, default None)
    Controls randomness in the response (0.0 to 2.0).

top_p (float | None, default None)
    Controls diversity via nucleus sampling (0.0 to 1.0).

stream (bool | None, default None)
    Whether to stream response events.

api_key (str | None, default None)
    API key for the provider.

api_base (str | None, default None)
    Base URL for the provider API.

instructions (str | None, default None)
    A system (or developer) message inserted into the model's context.

max_tool_calls (int | None, default None)
    The maximum number of total calls to built-in tools that can be processed
    in a response, counted across all built-in tools rather than per tool.
    Further tool calls attempted by the model are ignored.

parallel_tool_calls (int | None, default None)
    Whether to allow the model to run tool calls in parallel.

reasoning (Any | None, default None)
    Configuration options for reasoning models.

text (Any | None, default None)
    Configuration options for a text response from the model. Can be plain
    text or structured JSON data.

presence_penalty (float | None, default None)
    Penalizes new tokens based on whether they appear in the text so far.

frequency_penalty (float | None, default None)
    Penalizes new tokens based on their frequency in the text so far.

truncation (str | None, default None)
    Controls how the service truncates input when it exceeds the model
    context window.

store (bool | None, default None)
    Whether to store the response so it can be retrieved later.

service_tier (str | None, default None)
    The service tier to use for this request.

user (str | None, default None)
    A unique identifier representing your end user.

metadata (dict[str, str] | None, default None)
    Key-value pairs for custom metadata (up to 16 pairs).

previous_response_id (str | None, default None)
    The ID of the response to use as the prior turn for this request.

include (list[str] | None, default None)
    Items to include in the response (e.g., 'reasoning.encrypted_content').

background (bool | None, default None)
    Whether to run the request in the background and return immediately.

safety_identifier (str | None, default None)
    A stable identifier used for safety monitoring and abuse detection.

prompt_cache_key (str | None, default None)
    A key to use when reading from or writing to the prompt cache.

prompt_cache_retention (str | None, default None)
    How long to retain a prompt cache entry created by this request.

conversation (str | dict[str, Any] | None, default None)
    The conversation to associate this response with (an ID string or a
    ConversationParam object).

client_args (dict[str, Any] | None, default None)
    Additional provider-specific arguments passed to the provider's client
    instantiation.

**kwargs (Any, default {})
    Additional provider-specific arguments passed to the provider's API call.

Returns:

ResponseResource | Response | AsyncIterator[ResponseStreamEvent]
    Either a ResponseResource object (OpenResponses-compliant providers), a
    Response object (non-compliant providers), or an async iterator of
    ResponseStreamEvent (streaming).

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.
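When stream=True, the awaited call yields an async iterator, consumed with async for. A sketch under the same OpenAI-style event-shape assumption as the synchronous example above:

import asyncio

from any_llm.api import aresponses

async def main() -> None:
    events = await aresponses(
        model="openai/gpt-4o",
        input_data="Stream a two-line poem.",
        stream=True,
    )
    async for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    print()

asyncio.run(main())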

Source code in src/any_llm/api.py
async def aresponses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: int | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    truncation: str | None = None,
    store: bool | None = None,
    service_tier: str | None = None,
    user: str | None = None,
    metadata: dict[str, str] | None = None,
    previous_response_id: str | None = None,
    include: list[str] | None = None,
    background: bool | None = None,
    safety_identifier: str | None = None,
    prompt_cache_key: str | None = None,
    prompt_cache_retention: str | None = None,
    conversation: str | dict[str, Any] | None = None,
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ResponseResource | Response | AsyncIterator[ResponseStreamEvent]:
    """Create a response using the OpenResponses API.

    This implements the OpenResponses specification and returns either
    `openresponses_types.ResponseResource` (for OpenResponses-compliant providers)
    or `openai.types.responses.Response` (for providers using OpenAI's native API).
    If `stream=True`, an async iterator of `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        provider: Provider name to use for the request. If provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        presence_penalty: Penalizes new tokens based on whether they appear in the text so far.
        frequency_penalty: Penalizes new tokens based on their frequency in the text so far.
        truncation: Controls how the service truncates input when it exceeds the model context window.
        store: Whether to store the response so it can be retrieved later.
        service_tier: The service tier to use for this request.
        user: A unique identifier representing your end user.
        metadata: Key-value pairs for custom metadata (up to 16 pairs).
        previous_response_id: The ID of the response to use as the prior turn for this request.
        include: Items to include in the response (e.g., 'reasoning.encrypted_content').
        background: Whether to run the request in the background and return immediately.
        safety_identifier: A stable identifier used for safety monitoring and abuse detection.
        prompt_cache_key: A key to use when reading from or writing to the prompt cache.
        prompt_cache_retention: How long to retain a prompt cache entry created by this request.
        conversation: The conversation to associate this response with (ID string or ConversationParam object).
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        Either a `ResponseResource` object (OpenResponses-compliant providers),
        a `Response` object (non-compliant providers), or an async iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model

    llm = AnyLLM.create(
        provider_key,
        api_key=api_key,
        api_base=api_base,
        **client_args or {},
    )
    return await llm.aresponses(
        model=model_id,
        input_data=input_data,
        tools=tools,
        tool_choice=tool_choice,
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=stream,
        instructions=instructions,
        max_tool_calls=max_tool_calls,
        parallel_tool_calls=parallel_tool_calls,
        reasoning=reasoning,
        text=text,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        truncation=truncation,
        store=store,
        service_tier=service_tier,
        user=user,
        metadata=metadata,
        previous_response_id=previous_response_id,
        include=include,
        background=background,
        safety_identifier=safety_identifier,
        prompt_cache_key=prompt_cache_key,
        prompt_cache_retention=prompt_cache_retention,
        conversation=conversation,
        **kwargs,
    )
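A common reason to reach for aresponses over responses is issuing several requests concurrently. A sketch with asyncio.gather (model ids illustrative; the corresponding provider credentials are assumed to be configured):

import asyncio

from any_llm.api import aresponses

PROMPT = "Define 'agentic AI' in one sentence."

async def main() -> None:
    # Fan the same prompt out to two providers concurrently.
    results = await asyncio.gather(
        aresponses(model="openai/gpt-4o", input_data=PROMPT),
        aresponses(model="mistral/mistral-small-latest", input_data=PROMPT),
    )
    for response in results:
        print(response.output)

asyncio.run(main())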