Responses

Warning

This API is experimental and subject to change as we gain experience integrating additional providers. Use with caution.

any_llm.responses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, api_timeout=None, user=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, **kwargs)

Create a response using the OpenAI-style Responses API.

This follows the OpenAI Responses API shape and returns the aliased any_llm.types.responses.Response type. If stream=True, an iterator of any_llm.types.responses.ResponseStreamEvent items is returned.
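For orientation, here is a minimal sketch of a non-streaming call. The model name is illustrative, and the example assumes the provider's API key is available (e.g., via an environment variable such as OPENAI_API_KEY) or passed via api_key.

from any_llm import responses

# A minimal sketch; the model name is illustrative and assumes the
# provider's API key is available in the environment.
response = responses(
    model="openai/gpt-4o",
    input_data="Summarize the plot of Hamlet in two sentences.",
    max_output_tokens=200,
)
# The aliased OpenAI-style Response exposes the output_text convenience accessor.
print(response.output_text)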

Parameters:

model : str (required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
provider : str | ProviderName | None (default: None)
    Provider name to use for the request. If given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
input_data : str | ResponseInputParam (required)
    The input payload accepted by the provider's Responses API. For OpenAI-compatible providers, this is typically a list mixing text, images, and tool instructions, or a dict per the OpenAI spec.
tools : list[dict[str, Any] | Callable[..., Any]] | None (default: None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts); see the sketch after this parameter list.
tool_choice : str | dict[str, Any] | None (default: None)
    Controls which tools the model can call.
max_output_tokens : int | None (default: None)
    Maximum number of output tokens to generate.
temperature : float | None (default: None)
    Controls randomness in the response (0.0 to 2.0).
top_p : float | None (default: None)
    Controls diversity via nucleus sampling (0.0 to 1.0).
stream : bool | None (default: None)
    Whether to stream response events.
api_key : str | None (default: None)
    API key for the provider.
api_base : str | None (default: None)
    Base URL for the provider API.
api_timeout : float | None (default: None)
    Request timeout in seconds.
user : str | None (default: None)
    Unique identifier for the end user.
instructions : str | None (default: None)
    A system (or developer) message inserted into the model's context.
max_tool_calls : int | None (default: None)
    The maximum number of total calls to built-in tools that can be processed in a response. This maximum applies across all built-in tool calls, not per individual tool. Any further attempts by the model to call a tool will be ignored.
parallel_tool_calls : bool | None (default: None)
    Whether to allow the model to run tool calls in parallel.
reasoning : Any | None (default: None)
    Configuration options for reasoning models.
text : Any | None (default: None)
    Configuration options for a text response from the model. Can be plain text or structured JSON data.
**kwargs : Any (default: {})
    Additional provider-specific parameters.
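Because the implementation converts Python callables into tool schemas (via prepare_tools in the source below), a plain typed function can be passed directly in tools. A sketch, assuming a model that supports tool calling; the function, model name, and prompt are illustrative:

from any_llm import responses

def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    # In a real tool this would call a weather service.
    return f"Sunny in {city}"

# The callable is converted to an OpenAI-style tool schema internally.
response = responses(
    model="openai/gpt-4o",
    input_data="What is the weather in Paris?",
    tools=[get_weather],
    tool_choice="auto",
)
# Note: the model proposes tool calls in the response output; executing
# get_weather and feeding the result back is up to the caller.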

Returns:

Response | Iterator[ResponseStreamEvent]
    Either a Response object (non-streaming) or an iterator of ResponseStreamEvent items (streaming).
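When stream=True, the plain iterator can be consumed with a regular for loop. A sketch, assuming an OpenAI-compatible provider that emits text-delta events (event type names follow the OpenAI Responses API):

from any_llm import responses

# stream=True yields ResponseStreamEvent items instead of a single Response.
for event in responses(
    model="openai/gpt-4o",
    input_data="Write a haiku about the sea.",
    stream=True,
):
    # Print incremental text as it arrives; other event types are skipped.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)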

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.

Source code in src/any_llm/api.py
def responses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | ProviderName | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    api_timeout: float | None = None,
    user: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: bool | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    **kwargs: Any,
) -> Response | Iterator[ResponseStreamEvent]:
    """Create a response using the OpenAI-style Responses API.

    This follows the OpenAI Responses API shape and returns the aliased
    `any_llm.types.responses.Response` type. If `stream=True`, an iterator of
    `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
        provider: Provider name to use for the request. If given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        api_timeout: Request timeout in seconds
        user: Unique identifier for the end user
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        **kwargs: Additional provider-specific parameters

    Returns:
        Either a `Response` object (non-streaming) or an iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_name = ProviderFactory.split_model_provider(model)
    else:
        provider_key = ProviderName.from_string(provider)
        model_name = model

    config: dict[str, str] = {}
    if api_key:
        config["api_key"] = str(api_key)
    if api_base:
        config["api_base"] = str(api_base)
    api_config = ApiConfig(**config)

    provider_instance = ProviderFactory.create_provider(provider_key, api_config)

    responses_kwargs = kwargs.copy()
    if tools is not None:
        responses_kwargs["tools"] = prepare_tools(tools)
    if tool_choice is not None:
        responses_kwargs["tool_choice"] = tool_choice
    if max_output_tokens is not None:
        responses_kwargs["max_output_tokens"] = max_output_tokens
    if temperature is not None:
        responses_kwargs["temperature"] = temperature
    if top_p is not None:
        responses_kwargs["top_p"] = top_p
    if stream is not None:
        responses_kwargs["stream"] = stream
    if api_timeout is not None:
        responses_kwargs["timeout"] = api_timeout
    if user is not None:
        responses_kwargs["user"] = user
    if instructions is not None:
        responses_kwargs["instructions"] = instructions
    if max_tool_calls is not None:
        responses_kwargs["max_tool_calls"] = max_tool_calls
    if parallel_tool_calls is not None:
        responses_kwargs["parallel_tool_calls"] = parallel_tool_calls
    if reasoning is not None:
        responses_kwargs["reasoning"] = reasoning
    if text is not None:
        responses_kwargs["text"] = text

    return provider_instance.responses(model_name, input_data, **responses_kwargs)

any_llm.aresponses(model, input_data, *, provider=None, tools=None, tool_choice=None, max_output_tokens=None, temperature=None, top_p=None, stream=None, api_key=None, api_base=None, api_timeout=None, user=None, instructions=None, max_tool_calls=None, parallel_tool_calls=None, reasoning=None, text=None, **kwargs) async

Create a response using the OpenAI-style Responses API.

This follows the OpenAI Responses API shape and returns the aliased any_llm.types.responses.Response type. If stream=True, an async iterator of any_llm.types.responses.ResponseStreamEvent items is returned.
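aresponses mirrors responses but must be awaited, which makes it suitable for async applications. A minimal sketch with an illustrative model name, assuming the provider's API key is available in the environment:

import asyncio

from any_llm import aresponses

async def main() -> None:
    # Same arguments as the synchronous responses(); only the call style differs.
    response = await aresponses(
        model="openai/gpt-4o",
        input_data="Name three uses for a paperclip.",
    )
    print(response.output_text)

asyncio.run(main())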

Parameters:

model : str (required)
    Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
provider : str | ProviderName | None (default: None)
    Provider name to use for the request. If given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
input_data : str | ResponseInputParam (required)
    The input payload accepted by the provider's Responses API. For OpenAI-compatible providers, this is typically a list mixing text, images, and tool instructions, or a dict per the OpenAI spec.
tools : list[dict[str, Any] | Callable[..., Any]] | None (default: None)
    Optional tools for tool calling (Python callables or OpenAI tool dicts).
tool_choice : str | dict[str, Any] | None (default: None)
    Controls which tools the model can call.
max_output_tokens : int | None (default: None)
    Maximum number of output tokens to generate.
temperature : float | None (default: None)
    Controls randomness in the response (0.0 to 2.0).
top_p : float | None (default: None)
    Controls diversity via nucleus sampling (0.0 to 1.0).
stream : bool | None (default: None)
    Whether to stream response events.
api_key : str | None (default: None)
    API key for the provider.
api_base : str | None (default: None)
    Base URL for the provider API.
api_timeout : float | None (default: None)
    Request timeout in seconds.
user : str | None (default: None)
    Unique identifier for the end user.
instructions : str | None (default: None)
    A system (or developer) message inserted into the model's context.
max_tool_calls : int | None (default: None)
    The maximum number of total calls to built-in tools that can be processed in a response. This maximum applies across all built-in tool calls, not per individual tool. Any further attempts by the model to call a tool will be ignored.
parallel_tool_calls : bool | None (default: None)
    Whether to allow the model to run tool calls in parallel.
reasoning : Any | None (default: None)
    Configuration options for reasoning models.
text : Any | None (default: None)
    Configuration options for a text response from the model. Can be plain text or structured JSON data.
**kwargs : Any (default: {})
    Additional provider-specific parameters.

Returns:

Response | AsyncIterator[ResponseStreamEvent]
    Either a Response object (non-streaming) or an async iterator of ResponseStreamEvent items (streaming).
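With stream=True, the awaited result is an async iterator, so it is consumed with async for. A sketch under the same assumptions as the synchronous streaming example:

import asyncio

from any_llm import aresponses

async def main() -> None:
    stream = await aresponses(
        model="openai/gpt-4o",
        input_data="Stream a short limerick.",
        stream=True,
    )
    # stream is an AsyncIterator[ResponseStreamEvent].
    async for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

asyncio.run(main())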

Raises:

NotImplementedError
    If the selected provider does not support the Responses API.

Source code in src/any_llm/api.py
async def aresponses(
    model: str,
    input_data: str | ResponseInputParam,
    *,
    provider: str | ProviderName | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    max_output_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stream: bool | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    api_timeout: float | None = None,
    user: str | None = None,
    instructions: str | None = None,
    max_tool_calls: int | None = None,
    parallel_tool_calls: bool | None = None,
    reasoning: Any | None = None,
    text: Any | None = None,
    **kwargs: Any,
) -> Response | AsyncIterator[ResponseStreamEvent]:
    """Create a response using the OpenAI-style Responses API.

    This follows the OpenAI Responses API shape and returns the aliased
    `any_llm.types.responses.Response` type. If `stream=True`, an async iterator of
    `any_llm.types.responses.ResponseStreamEvent` items is returned.

    Args:
        model: Model identifier in the format 'provider/model' (e.g., 'openai/gpt-4o'). If provider is given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
        provider: Provider name to use for the request. If given, the model is assumed not to include a provider prefix; otherwise it must include one, as in 'openai/gpt-4o'.
        input_data: The input payload accepted by the provider's Responses API.
            For OpenAI-compatible providers, this is typically a list mixing
            text, images, and tool instructions, or a dict per OpenAI spec.
        tools: Optional tools for tool calling (Python callables or OpenAI tool dicts)
        tool_choice: Controls which tools the model can call
        max_output_tokens: Maximum number of output tokens to generate
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        stream: Whether to stream response events
        api_key: API key for the provider
        api_base: Base URL for the provider API
        api_timeout: Request timeout in seconds
        user: Unique identifier for the end user
        instructions: A system (or developer) message inserted into the model's context.
        max_tool_calls: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
        parallel_tool_calls: Whether to allow the model to run tool calls in parallel.
        reasoning: Configuration options for reasoning models.
        text: Configuration options for a text response from the model. Can be plain text or structured JSON data.
        **kwargs: Additional provider-specific parameters

    Returns:
        Either a `Response` object (non-streaming) or an async iterator of
        `ResponseStreamEvent` (streaming).

    Raises:
        NotImplementedError: If the selected provider does not support the Responses API.

    """
    if provider is None:
        provider_key, model_name = ProviderFactory.split_model_provider(model)
    else:
        provider_key = ProviderName.from_string(provider)
        model_name = model

    config: dict[str, str] = {}
    if api_key:
        config["api_key"] = str(api_key)
    if api_base:
        config["api_base"] = str(api_base)
    api_config = ApiConfig(**config)

    provider_instance = ProviderFactory.create_provider(provider_key, api_config)

    responses_kwargs = kwargs.copy()
    if tools is not None:
        responses_kwargs["tools"] = prepare_tools(tools)
    if tool_choice is not None:
        responses_kwargs["tool_choice"] = tool_choice
    if max_output_tokens is not None:
        responses_kwargs["max_output_tokens"] = max_output_tokens
    if temperature is not None:
        responses_kwargs["temperature"] = temperature
    if top_p is not None:
        responses_kwargs["top_p"] = top_p
    if stream is not None:
        responses_kwargs["stream"] = stream
    if api_timeout is not None:
        responses_kwargs["timeout"] = api_timeout
    if user is not None:
        responses_kwargs["user"] = user
    if instructions is not None:
        responses_kwargs["instructions"] = instructions
    if max_tool_calls is not None:
        responses_kwargs["max_tool_calls"] = max_tool_calls
    if parallel_tool_calls is not None:
        responses_kwargs["parallel_tool_calls"] = parallel_tool_calls
    if reasoning is not None:
        responses_kwargs["reasoning"] = reasoning
    if text is not None:
        responses_kwargs["text"] = text

    return await provider_instance.aresponses(model_name, input_data, **responses_kwargs)