
Responses

OpenResponses API

The Responses API in any-llm implements the OpenResponses specification—an open-source standard for building multi-provider, interoperable LLM interfaces for agentic AI systems.

Return Types

The responses() and aresponses() functions return different types depending on the provider's level of OpenResponses compliance:

openresponses_types.ResponseResource
    Returned by providers fully compliant with the OpenResponses specification.

openai.types.responses.Response
    Returned by providers using OpenAI's native Responses API (not yet fully OpenResponses-compliant).

Iterator[ResponseStreamEvent] / AsyncIterator[ResponseStreamEvent]
    Returned when stream=True is set.

Both ResponseResource and Response share a similar structure, so in many cases you can access common fields like output without type checking.
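A minimal sketch of this duck-typed access (the model id is illustrative, and an OpenAI API key is assumed to be configured in the environment):

from any_llm.api import responses

# Sketch only: both return types expose `output`, so no isinstance check is
# needed to read the common fields.
response = responses(
    model="openai/gpt-4o",  # illustrative model id
    input_data="Summarize the OpenResponses specification in one sentence.",
)

for item in response.output:
    print(type(item).__name__, item)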

any_llm.api.responses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, presence_penalty=None, frequency_penalty=None, truncation=None, store=None, service_tier=None, user=None, metadata=None, previous_response_id=None, include=None, background=None, safety_identifier=None, prompt_cache_key=None, prompt_cache_retention=None, conversation=None, client_args=None, **kwargs)

Create a response using the OpenResponses API.

This implements the OpenResponses specification and returns either openresponses_types.ResponseResource (for OpenResponses-compliant providers) or openai.types.responses.Response (for providers using OpenAI's native API). If stream=True, an iterator of any_llm.types.responses.ResponseStreamEvent items is returned.

Parameters:

model (str, required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o').
    If provider is given, model is treated as the bare model name; otherwise
    it must include the provider prefix, as in 'openai/gpt-4o'.

provider (str | LLMProvider | None, default None)
    Provider to use for the request. If given, model is treated as the bare
    model name; otherwise the provider is parsed from the model string, as in
    'openai/gpt-4o'.

input_data (str | ResponseInputParam, required)
    The input payload accepted by the provider's Responses API. For
    OpenAI-compatible providers, this is typically a list mixing text, images,
    and tool instructions, or a dict per the OpenAI spec.

tools (list[dict[str, Any] | Callable[..., Any]] | None, default None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts).

tool_choice (str | dict[str, Any] | None, default None)
    Controls which tools the model can call.

max_output_tokens (int | None, default None)
    Maximum number of output tokens to generate.

temperature (float | None, default None)
    Controls randomness in the response (0.0 to 2.0).

top_p (float | None, default None)
    Controls diversity via nucleus sampling (0.0 to 1.0).

stream (bool | None, default None)
    Whether to stream response events.

api_key (str | None, default None)
    API key for the provider.

api_base (str | None, default None)
    Base URL for the provider API.

instructions (str | None, default None)
    A system (or developer) message inserted into the model's context.

max_tool_calls (int | None, default None)
    The maximum number of total calls to built-in tools that can be processed
    in a response, counted across all built-in tools rather than per tool.
    Further tool calls attempted by the model are ignored.

parallel_tool_calls (int | None, default None)
    Whether to allow the model to run tool calls in parallel.

reasoning (Any | None, default None)
    Configuration options for reasoning models.

text (Any | None, default None)
    Configuration options for a text response from the model. Can be plain
    text or structured JSON data.

presence_penalty (float | None, default None)
    Penalizes new tokens based on whether they appear in the text so far.

frequency_penalty (float | None, default None)
    Penalizes new tokens based on their frequency in the text so far.

truncation (str | None, default None)
    Controls how the service truncates input when it exceeds the model
    context window.

store (bool | None, default None)
    Whether to store the response so it can be retrieved later.

service_tier (str | None, default None)
    The service tier to use for this request.

user (str | None, default None)
    A unique identifier representing your end user.

metadata (dict[str, str] | None, default None)
    Key-value pairs for custom metadata (up to 16 pairs).

previous_response_id (str | None, default None)
    The ID of the response to use as the prior turn for this request.

include (list[str] | None, default None)
    Items to include in the response (e.g., 'reasoning.encrypted_content').

background (bool | None, default None)
    Whether to run the request in the background and return immediately.

safety_identifier (str | None, default None)
    A stable identifier used for safety monitoring and abuse detection.

prompt_cache_key (str | None, default None)
    A key to use when reading from or writing to the prompt cache.

prompt_cache_retention (str | None, default None)
    How long to retain a prompt cache entry created by this request.

conversation (str | dict[str, Any] | None, default None)
    The conversation to associate this response with (an ID string or a
    ConversationParam object).

client_args (dict[str, Any] | None, default None)
    Additional provider-specific arguments passed to the provider's client
    instantiation.

**kwargs (Any, default {})
    Additional provider-specific arguments passed to the provider's API call.
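Since tools accepts plain Python callables, a simple function can serve as a tool definition. A hedged sketch (the function, model id, and prompt are illustrative; how tool calls surface in the output depends on the provider):

from any_llm.api import responses

def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"Sunny in {city}"

# Sketch only: any-llm derives a tool schema from the callable; the model may
# respond with tool-call items in `output` rather than executing the function.
response = responses(
    model="openai/gpt-4o",
    input_data="What's the weather in Paris?",
    tools=[get_weather],
    tool_choice="auto",
)

for item in response.output:
    print(item)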

Returns:

ResponseResource | Response | Iterator[ResponseStreamEvent]
    Either a ResponseResource object (OpenResponses-compliant providers), a
    Response object (non-compliant providers), or an iterator of
    ResponseStreamEvent (streaming).

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.
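With stream=True, the call returns an iterator of ResponseStreamEvent items instead of a single object. A sketch, assuming OpenAI-style event types such as 'response.output_text.delta' (the exact event shapes vary by provider):

from any_llm.api import responses

events = responses(
    model="openai/gpt-4o",
    input_data="Write a haiku about interoperability.",
    stream=True,
)

for event in events:
    # `type` discriminates events in the OpenAI-style stream shape assumed here.
    if getattr(event, "type", None) == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()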

Source code in src/any_llm/api.py
def responses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: int | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    truncation: str | None = None,
    store: bool | None = None,
    service_tier: str | None = None,
    user: str | None = None,
    metadata: dict[str, str] | None = None,
    previous_response_id: str | None = None,
    include: list[str] | None = None,
    background: bool | None = None,
    safety_identifier: str | None = None,
    prompt_cache_key: str | None = None,
    prompt_cache_retention: str | None = None,
    conversation: str | dict[str, Any] | None = None,
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ResponseResource | Response | Iterator[ResponseStreamEvent]:
    """Create a response using the OpenResponses API.

    This implements the OpenResponses specification and returns either
    `openresponses_types.ResponseResource` (for OpenResponses-compliant providers)
    or `openai.types.responses.Response` (for providers using OpenAI's native API).
    If `stream=True`, an iterator of `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        provider: Provider name to use for the request. If provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        presence_penalty: Penalizes new tokens based on whether they appear in the text so far.
        frequency_penalty: Penalizes new tokens based on their frequency in the text so far.
        truncation: Controls how the service truncates input when it exceeds the model context window.
        store: Whether to store the response so it can be retrieved later.
        service_tier: The service tier to use for this request.
        user: A unique identifier representing your end user.
        metadata: Key-value pairs for custom metadata (up to 16 pairs).
        previous_response_id: The ID of the response to use as the prior turn for this request.
        include: Items to include in the response (e.g., 'reasoning.encrypted_content').
        background: Whether to run the request in the background and return immediately.
        safety_identifier: A stable identifier used for safety monitoring and abuse detection.
        prompt_cache_key: A key to use when reading from or writing to the prompt cache.
        prompt_cache_retention: How long to retain a prompt cache entry created by this request.
        conversation: The conversation to associate this response with (ID string or ConversationParam object).
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        Either a `ResponseResource` object (OpenResponses-compliant providers),
        a `Response` object (non-compliant providers), or an iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model

    llm = AnyLLM.create(
        provider_key,
        api_key=api_key,
        api_base=api_base,
        **client_args or {},
    )
    return llm.responses(
        model=model_id,
        input_data=input_data,
        tools=tools,
        tool_choice=tool_choice,
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=stream,
        instructions=instructions,
        max_tool_calls=max_tool_calls,
        parallel_tool_calls=parallel_tool_calls,
        reasoning=reasoning,
        text=text,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        truncation=truncation,
        store=store,
        service_tier=service_tier,
        user=user,
        metadata=metadata,
        previous_response_id=previous_response_id,
        include=include,
        background=background,
        safety_identifier=safety_identifier,
        prompt_cache_key=prompt_cache_key,
        prompt_cache_retention=prompt_cache_retention,
        conversation=conversation,
        **kwargs,
    )
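As the source above shows, the provider is either parsed from the model string via split_model_provider or taken from the provider argument, in which case model is passed through as the bare model id. The two calls below are therefore equivalent (model id illustrative):

from any_llm.api import responses

# Provider parsed from the 'provider/model' prefix:
r1 = responses(model="openai/gpt-4o", input_data="Hello")

# Provider given explicitly; model is the bare model id:
r2 = responses(model="gpt-4o", provider="openai", input_data="Hello")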

any_llm.api.aresponses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, presence_penalty=None, frequency_penalty=None, truncation=None, store=None, service_tier=None, user=None, metadata=None, previous_response_id=None, include=None, background=None, safety_identifier=None, prompt_cache_key=None, prompt_cache_retention=None, conversation=None, client_args=None, **kwargs) async

Create a response using the OpenResponses API.

This implements the OpenResponses specification and returns either openresponses_types.ResponseResource (for OpenResponses-compliant providers) or openai.types.responses.Response (for providers using OpenAI's native API). If stream=True, an async iterator of any_llm.types.responses.ResponseStreamEvent items is returned.
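A minimal async usage sketch (model id illustrative; provider credentials assumed to be configured in the environment):

import asyncio

from any_llm.api import aresponses

async def main() -> None:
    response = await aresponses(
        model="openai/gpt-4o",
        input_data="Name one benefit of a provider-agnostic LLM API.",
    )
    for item in response.output:
        print(item)

asyncio.run(main())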

Parameters:

model (str, required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o').
    If provider is given, model is treated as the bare model name; otherwise
    it must include the provider prefix, as in 'openai/gpt-4o'.

provider (str | LLMProvider | None, default None)
    Provider to use for the request. If given, model is treated as the bare
    model name; otherwise the provider is parsed from the model string, as in
    'openai/gpt-4o'.

input_data (str | ResponseInputParam, required)
    The input payload accepted by the provider's Responses API. For
    OpenAI-compatible providers, this is typically a list mixing text, images,
    and tool instructions, or a dict per the OpenAI spec.

tools (list[dict[str, Any] | Callable[..., Any]] | None, default None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts).

tool_choice (str | dict[str, Any] | None, default None)
    Controls which tools the model can call.

max_output_tokens (int | None, default None)
    Maximum number of output tokens to generate.

temperature (float | None, default None)
    Controls randomness in the response (0.0 to 2.0).

top_p (float | None, default None)
    Controls diversity via nucleus sampling (0.0 to 1.0).

stream (bool | None, default None)
    Whether to stream response events.

api_key (str | None, default None)
    API key for the provider.

api_base (str | None, default None)
    Base URL for the provider API.

instructions (str | None, default None)
    A system (or developer) message inserted into the model's context.

max_tool_calls (int | None, default None)
    The maximum number of total calls to built-in tools that can be processed
    in a response, counted across all built-in tools rather than per tool.
    Further tool calls attempted by the model are ignored.

parallel_tool_calls (int | None, default None)
    Whether to allow the model to run tool calls in parallel.

reasoning (Any | None, default None)
    Configuration options for reasoning models.

text (Any | None, default None)
    Configuration options for a text response from the model. Can be plain
    text or structured JSON data.

presence_penalty (float | None, default None)
    Penalizes new tokens based on whether they appear in the text so far.

frequency_penalty (float | None, default None)
    Penalizes new tokens based on their frequency in the text so far.

truncation (str | None, default None)
    Controls how the service truncates input when it exceeds the model
    context window.

store (bool | None, default None)
    Whether to store the response so it can be retrieved later.

service_tier (str | None, default None)
    The service tier to use for this request.

user (str | None, default None)
    A unique identifier representing your end user.

metadata (dict[str, str] | None, default None)
    Key-value pairs for custom metadata (up to 16 pairs).

previous_response_id (str | None, default None)
    The ID of the response to use as the prior turn for this request.

include (list[str] | None, default None)
    Items to include in the response (e.g., 'reasoning.encrypted_content').

background (bool | None, default None)
    Whether to run the request in the background and return immediately.

safety_identifier (str | None, default None)
    A stable identifier used for safety monitoring and abuse detection.

prompt_cache_key (str | None, default None)
    A key to use when reading from or writing to the prompt cache.

prompt_cache_retention (str | None, default None)
    How long to retain a prompt cache entry created by this request.

conversation (str | dict[str, Any] | None, default None)
    The conversation to associate this response with (an ID string or a
    ConversationParam object).

client_args (dict[str, Any] | None, default None)
    Additional provider-specific arguments passed to the provider's client
    instantiation.

**kwargs (Any, default {})
    Additional provider-specific arguments passed to the provider's API call.

Returns:

ResponseResource | Response | AsyncIterator[ResponseStreamEvent]
    Either a ResponseResource object (OpenResponses-compliant providers), a
    Response object (non-compliant providers), or an async iterator of
    ResponseStreamEvent (streaming).

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.
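When stream=True, the awaited call yields an async iterator, consumed with async for. A sketch under the same OpenAI-style event-shape assumption as the synchronous example above:

import asyncio

from any_llm.api import aresponses

async def main() -> None:
    events = await aresponses(
        model="openai/gpt-4o",
        input_data="Stream a two-line poem.",
        stream=True,
    )
    async for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    print()

asyncio.run(main())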

Source code in src/any_llm/api.py
async def aresponses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: int | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    truncation: str | None = None,
    store: bool | None = None,
    service_tier: str | None = None,
    user: str | None = None,
    metadata: dict[str, str] | None = None,
    previous_response_id: str | None = None,
    include: list[str] | None = None,
    background: bool | None = None,
    safety_identifier: str | None = None,
    prompt_cache_key: str | None = None,
    prompt_cache_retention: str | None = None,
    conversation: str | dict[str, Any] | None = None,
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ResponseResource | Response | AsyncIterator[ResponseStreamEvent]:
    """Create a response using the OpenResponses API.

    This implements the OpenResponses specification and returns either
    `openresponses_types.ResponseResource` (for OpenResponses-compliant providers)
    or `openai.types.responses.Response` (for providers using OpenAI's native API).
    If `stream=True`, an async iterator of `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        provider: Provider name to use for the request. If provided, we assume that the model does not contain the provider name. Otherwise, we assume that the model contains the provider name, like 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        presence_penalty: Penalizes new tokens based on whether they appear in the text so far.
        frequency_penalty: Penalizes new tokens based on their frequency in the text so far.
        truncation: Controls how the service truncates input when it exceeds the model context window.
        store: Whether to store the response so it can be retrieved later.
        service_tier: The service tier to use for this request.
        user: A unique identifier representing your end user.
        metadata: Key-value pairs for custom metadata (up to 16 pairs).
        previous_response_id: The ID of the response to use as the prior turn for this request.
        include: Items to include in the response (e.g., 'reasoning.encrypted_content').
        background: Whether to run the request in the background and return immediately.
        safety_identifier: A stable identifier used for safety monitoring and abuse detection.
        prompt_cache_key: A key to use when reading from or writing to the prompt cache.
        prompt_cache_retention: How long to retain a prompt cache entry created by this request.
        conversation: The conversation to associate this response with (ID string or ConversationParam object).
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        Either a `ResponseResource` object (OpenResponses-compliant providers),
        a `Response` object (non-compliant providers), or an async iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model

    llm = AnyLLM.create(
        provider_key,
        api_key=api_key,
        api_base=api_base,
        **client_args or {},
    )
    return await llm.aresponses(
        model=model_id,
        input_data=input_data,
        tools=tools,
        tool_choice=tool_choice,
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=stream,
        instructions=instructions,
        max_tool_calls=max_tool_calls,
        parallel_tool_calls=parallel_tool_calls,
        reasoning=reasoning,
        text=text,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        truncation=truncation,
        store=store,
        service_tier=service_tier,
        user=user,
        metadata=metadata,
        previous_response_id=previous_response_id,
        include=include,
        background=background,
        safety_identifier=safety_identifier,
        prompt_cache_key=prompt_cache_key,
        prompt_cache_retention=prompt_cache_retention,
        conversation=conversation,
        **kwargs,
    )
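A common reason to reach for aresponses over responses is issuing several requests concurrently. A sketch with asyncio.gather (model ids illustrative; the corresponding provider credentials are assumed to be configured):

import asyncio

from any_llm.api import aresponses

PROMPT = "Define 'agentic AI' in one sentence."

async def main() -> None:
    # Fan the same prompt out to two providers concurrently.
    results = await asyncio.gather(
        aresponses(model="openai/gpt-4o", input_data=PROMPT),
        aresponses(model="mistral/mistral-small-latest", input_data=PROMPT),
    )
    for response in results:
        print(response.output)

asyncio.run(main())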