Completion

`any_llm.api.completion(model, messages, *, provider=None, tools=None, tool_choice=None, temperature=None, top_p=None, max_tokens=None, response_format=None, stream=None, n=None, stop=None, presence_penalty=None, frequency_penalty=None, seed=None, api_key=None, api_base=None, user=None, parallel_tool_calls=None, logprobs=None, top_logprobs=None, logit_bias=None, stream_options=None, max_completion_tokens=None, reasoning_effort='auto', client_args=None, **kwargs)`

Create a chat completion.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model identifier. Recommended: Use with separate `provider` parameter (e.g., model='gpt-4', provider='openai'). Alternative: Combined format 'provider:model' (e.g., 'openai:gpt-4'). Legacy format 'provider/model' is also supported but deprecated.	required
`provider`	`str \| LLMProvider \| None`	Recommended: Provider name to use for the request (e.g., 'openai', 'mistral'). When provided, the model parameter should contain only the model name.	`None`
`messages`	`list[dict[str, Any] \| ChatCompletionMessage]`	List of messages for the conversation	required
`tools`	`list[dict[str, Any] \| Callable[..., Any]] \| None`	List of tools for tool calling. Can be Python callables or OpenAI tool format dicts	`None`
`tool_choice`	`str \| dict[str, Any] \| None`	Controls which tools the model can call	`None`
`temperature`	`float \| None`	Controls randomness in the response (0.0 to 2.0)	`None`
`top_p`	`float \| None`	Controls diversity via nucleus sampling (0.0 to 1.0)	`None`
`max_tokens`	`int \| None`	Maximum number of tokens to generate	`None`
`response_format`	`dict[str, Any] \| type[BaseModel] \| None`	Format specification for the response	`None`
`stream`	`bool \| None`	Whether to stream the response	`None`
`n`	`int \| None`	Number of completions to generate	`None`
`stop`	`str \| list[str] \| None`	Stop sequences for generation	`None`
`presence_penalty`	`float \| None`	Penalize new tokens based on presence in text	`None`
`frequency_penalty`	`float \| None`	Penalize new tokens based on frequency in text	`None`
`seed`	`int \| None`	Random seed for reproducible results	`None`
`api_key`	`str \| None`	API key for the provider	`None`
`api_base`	`str \| None`	Base URL for the provider API	`None`
`user`	`str \| None`	Unique identifier for the end user	`None`
`parallel_tool_calls`	`bool \| None`	Whether to allow parallel tool calls	`None`
`logprobs`	`bool \| None`	Include token-level log probabilities in the response	`None`
`top_logprobs`	`int \| None`	Number of alternatives to return when logprobs are requested	`None`
`logit_bias`	`dict[str, float] \| None`	Bias the likelihood of specified tokens during generation	`None`
`stream_options`	`dict[str, Any] \| None`	Additional options controlling streaming behavior	`None`
`max_completion_tokens`	`int \| None`	Maximum number of tokens for the completion	`None`
`reasoning_effort`	`Literal['minimal', 'low', 'medium', 'high', 'auto'] \| None`	Reasoning effort level for models that support it. "auto" will map to each provider's default.	`'auto'`
`client_args`	`dict[str, Any] \| None`	Additional provider-specific arguments that will be passed to the provider's client instantiation.	`None`
`**kwargs`	`Any`	Additional provider-specific arguments that will be passed to the provider's API call.	`{}`

Returns:

Type	Description
`ChatCompletion \| Iterator[ChatCompletionChunk]`	The completion response from the provider

Source code in src/any_llm/api.py

def completion(
    model: str,
    messages: list[dict[str, Any] | ChatCompletionMessage],
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    max_tokens: int | None = None,
    response_format: dict[str, Any] | type[BaseModel] | None = None,
    stream: bool | None = None,
    n: int | None = None,
    stop: str | list[str] | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    seed: int | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    user: str | None = None,
    parallel_tool_calls: bool | None = None,
    logprobs: bool | None = None,
    top_logprobs: int | None = None,
    logit_bias: dict[str, float] | None = None,
    stream_options: dict[str, Any] | None = None,
    max_completion_tokens: int | None = None,
    reasoning_effort: Literal["minimal", "low", "medium", "high", "auto"] | None = "auto",
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ChatCompletion | Iterator[ChatCompletionChunk]:
    """Create a chat completion.

    Args:
        model: Model identifier. **Recommended**: Use with separate `provider` parameter (e.g., model='gpt-4', provider='openai').
            **Alternative**: Combined format 'provider:model' (e.g., 'openai:gpt-4').
            Legacy format 'provider/model' is also supported but deprecated.
        provider: **Recommended**: Provider name to use for the request (e.g., 'openai', 'mistral').
            When provided, the model parameter should contain only the model name.
        messages: List of messages for the conversation
        tools: List of tools for tool calling. Can be Python callables or OpenAI tool format dicts
        tool_choice: Controls which tools the model can call
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        max_tokens: Maximum number of tokens to generate
        response_format: Format specification for the response
        stream: Whether to stream the response
        n: Number of completions to generate
        stop: Stop sequences for generation
        presence_penalty: Penalize new tokens based on presence in text
        frequency_penalty: Penalize new tokens based on frequency in text
        seed: Random seed for reproducible results
        api_key: API key for the provider
        api_base: Base URL for the provider API
        user: Unique identifier for the end user
        parallel_tool_calls: Whether to allow parallel tool calls
        logprobs: Include token-level log probabilities in the response
        top_logprobs: Number of alternatives to return when logprobs are requested
        logit_bias: Bias the likelihood of specified tokens during generation
        stream_options: Additional options controlling streaming behavior
        max_completion_tokens: Maximum number of tokens for the completion
        reasoning_effort: Reasoning effort level for models that support it. "auto" will map to each provider's default.
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        The completion response from the provider

    """
    all_args = locals()
    all_args.pop("provider")
    kwargs = all_args.pop("kwargs")

    model = all_args.pop("model")
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model
    all_args["model"] = model_id

    llm = AnyLLM.create(
        provider_key,
        api_key=all_args.pop("api_key"),
        api_base=all_args.pop("api_base"),
        **all_args.pop("client_args") or {},
    )
    return llm.completion(**all_args, **kwargs)

`any_llm.api.acompletion(model, messages, *, provider=None, tools=None, tool_choice=None, temperature=None, top_p=None, max_tokens=None, response_format=None, stream=None, n=None, stop=None, presence_penalty=None, frequency_penalty=None, seed=None, api_key=None, api_base=None, user=None, parallel_tool_calls=None, logprobs=None, top_logprobs=None, logit_bias=None, stream_options=None, max_completion_tokens=None, reasoning_effort='auto', client_args=None, **kwargs)` `async`

Create a chat completion asynchronously.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model identifier. Recommended: Use with separate `provider` parameter (e.g., model='gpt-4', provider='openai'). Alternative: Combined format 'provider:model' (e.g., 'openai:gpt-4'). Legacy format 'provider/model' is also supported but deprecated.	required
`provider`	`str \| LLMProvider \| None`	Recommended: Provider name to use for the request (e.g., 'openai', 'mistral'). When provided, the model parameter should contain only the model name.	`None`
`messages`	`list[dict[str, Any] \| ChatCompletionMessage]`	List of messages for the conversation	required
`tools`	`list[dict[str, Any] \| Callable[..., Any]] \| None`	List of tools for tool calling. Can be Python callables or OpenAI tool format dicts	`None`
`tool_choice`	`str \| dict[str, Any] \| None`	Controls which tools the model can call	`None`
`temperature`	`float \| None`	Controls randomness in the response (0.0 to 2.0)	`None`
`top_p`	`float \| None`	Controls diversity via nucleus sampling (0.0 to 1.0)	`None`
`max_tokens`	`int \| None`	Maximum number of tokens to generate	`None`
`response_format`	`dict[str, Any] \| type[BaseModel] \| None`	Format specification for the response	`None`
`stream`	`bool \| None`	Whether to stream the response	`None`
`n`	`int \| None`	Number of completions to generate	`None`
`stop`	`str \| list[str] \| None`	Stop sequences for generation	`None`
`presence_penalty`	`float \| None`	Penalize new tokens based on presence in text	`None`
`frequency_penalty`	`float \| None`	Penalize new tokens based on frequency in text	`None`
`seed`	`int \| None`	Random seed for reproducible results	`None`
`api_key`	`str \| None`	API key for the provider	`None`
`api_base`	`str \| None`	Base URL for the provider API	`None`
`user`	`str \| None`	Unique identifier for the end user	`None`
`parallel_tool_calls`	`bool \| None`	Whether to allow parallel tool calls	`None`
`logprobs`	`bool \| None`	Include token-level log probabilities in the response	`None`
`top_logprobs`	`int \| None`	Number of alternatives to return when logprobs are requested	`None`
`logit_bias`	`dict[str, float] \| None`	Bias the likelihood of specified tokens during generation	`None`
`stream_options`	`dict[str, Any] \| None`	Additional options controlling streaming behavior	`None`
`max_completion_tokens`	`int \| None`	Maximum number of tokens for the completion	`None`
`reasoning_effort`	`Literal['minimal', 'low', 'medium', 'high', 'auto'] \| None`	Reasoning effort level for models that support it. "auto" will map to each provider's default.	`'auto'`
`client_args`	`dict[str, Any] \| None`	Additional provider-specific arguments that will be passed to the provider's client instantiation.	`None`
`**kwargs`	`Any`	Additional provider-specific arguments that will be passed to the provider's API call.	`{}`

Returns:

Type	Description
`ChatCompletion \| AsyncIterator[ChatCompletionChunk]`	The completion response from the provider

Source code in src/any_llm/api.py

async def acompletion(
    model: str,
    messages: list[dict[str, Any] | ChatCompletionMessage],
    *,
    provider: str | LLMProvider | None = None,
    tools: list[dict[str, Any] | Callable[..., Any]] | None = None,
    tool_choice: str | dict[str, Any] | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    max_tokens: int | None = None,
    response_format: dict[str, Any] | type[BaseModel] | None = None,
    stream: bool | None = None,
    n: int | None = None,
    stop: str | list[str] | None = None,
    presence_penalty: float | None = None,
    frequency_penalty: float | None = None,
    seed: int | None = None,
    api_key: str | None = None,
    api_base: str | None = None,
    user: str | None = None,
    parallel_tool_calls: bool | None = None,
    logprobs: bool | None = None,
    top_logprobs: int | None = None,
    logit_bias: dict[str, float] | None = None,
    stream_options: dict[str, Any] | None = None,
    max_completion_tokens: int | None = None,
    reasoning_effort: Literal["minimal", "low", "medium", "high", "auto"] | None = "auto",
    client_args: dict[str, Any] | None = None,
    **kwargs: Any,
) -> ChatCompletion | AsyncIterator[ChatCompletionChunk]:
    """Create a chat completion asynchronously.

    Args:
        model: Model identifier. **Recommended**: Use with separate `provider` parameter (e.g., model='gpt-4', provider='openai').
            **Alternative**: Combined format 'provider:model' (e.g., 'openai:gpt-4').
            Legacy format 'provider/model' is also supported but deprecated.
        provider: **Recommended**: Provider name to use for the request (e.g., 'openai', 'mistral').
            When provided, the model parameter should contain only the model name.
        messages: List of messages for the conversation
        tools: List of tools for tool calling. Can be Python callables or OpenAI tool format dicts
        tool_choice: Controls which tools the model can call
        temperature: Controls randomness in the response (0.0 to 2.0)
        top_p: Controls diversity via nucleus sampling (0.0 to 1.0)
        max_tokens: Maximum number of tokens to generate
        response_format: Format specification for the response
        stream: Whether to stream the response
        n: Number of completions to generate
        stop: Stop sequences for generation
        presence_penalty: Penalize new tokens based on presence in text
        frequency_penalty: Penalize new tokens based on frequency in text
        seed: Random seed for reproducible results
        api_key: API key for the provider
        api_base: Base URL for the provider API
        user: Unique identifier for the end user
        parallel_tool_calls: Whether to allow parallel tool calls
        logprobs: Include token-level log probabilities in the response
        top_logprobs: Number of alternatives to return when logprobs are requested
        logit_bias: Bias the likelihood of specified tokens during generation
        stream_options: Additional options controlling streaming behavior
        max_completion_tokens: Maximum number of tokens for the completion
        reasoning_effort: Reasoning effort level for models that support it. "auto" will map to each provider's default.
        client_args: Additional provider-specific arguments that will be passed to the provider's client instantiation.
        **kwargs: Additional provider-specific arguments that will be passed to the provider's API call.

    Returns:
        The completion response from the provider

    """
    all_args = locals()
    all_args.pop("provider")
    kwargs = all_args.pop("kwargs")

    model = all_args.pop("model")
    if provider is None:
        provider_key, model_id = AnyLLM.split_model_provider(model)
    else:
        provider_key = LLMProvider.from_string(provider)
        model_id = model
    all_args["model"] = model_id

    llm = AnyLLM.create(
        provider_key,
        api_key=all_args.pop("api_key"),
        api_base=all_args.pop("api_base"),
        **all_args.pop("client_args") or {},
    )
    return await llm.acompletion(**all_args, **kwargs)