
# Completion Types

The completion types used by `any_llm.completion()` and `any_llm.acompletion()` are re-exports from the OpenAI Python SDK, extended where needed to support additional fields such as reasoning content.

## ChatCompletion

The response object for a non-streaming completion request. Extends `openai.types.chat.ChatCompletion` with support for reasoning content in the message choices.

Import: `from any_llm.types.completion import ChatCompletion`

Key fields:

| Field | Type | Description |
| --- | --- | --- |
| `choices` | `list[Choice]` | List of completion choices; each choice's message may include reasoning content. |

## ChatCompletionChunk

A single chunk in a streaming completion response. Extends `openai.types.chat.ChatCompletionChunk`.

Import: `from any_llm.types.completion import ChatCompletionChunk`

Key fields:

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | Completion identifier (same across all chunks). |
| `choices` | `list[ChunkChoice]` | Each chunk choice has a `delta` with incremental content, role, and optionally reasoning. |
| `model` | `str` | The model used. |

## ChatCompletionMessage

A message within a completion response. Extends `openai.types.chat.ChatCompletionMessage` with a `reasoning` field.

Import: `from any_llm.types.completion import ChatCompletionMessage`

| Field | Type | Description |
| --- | --- | --- |
| `role` | `str` | Message role (e.g. `"assistant"`). |
| `content` | `str \| None` | Text content of the message. |
| `reasoning` | `Reasoning \| None` | Reasoning/thinking content (when the model supports it). |
| `tool_calls` | `list[ChatCompletionMessageToolCall] \| None` | Tool calls requested by the model. |
| `annotations` | `list[dict] \| None` | Annotations attached to the message. |

## ParsedChatCompletion

Returned when `response_format` is a Pydantic `BaseModel` subclass or a dataclass type. Extends `ChatCompletion` with a generic type parameter.

Import: `from any_llm import ParsedChatCompletion`

Access the parsed object via `response.choices[0].message.parsed`, which will be an instance of the type passed as `response_format`.
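For example, a dataclass can serve as the `response_format` type. The schema below is hypothetical, and the provider call is shown as a comment only (it assumes a configured provider and API key); the last lines demonstrate how a structured payload maps onto the type:

```python
from dataclasses import dataclass

# Hypothetical schema passed as response_format.
@dataclass
class Weather:
    city: str
    temperature_c: float

# Hypothetical call, not executed here:
# from any_llm import completion
# response = completion(model="...", messages=[...], response_format=Weather)
# weather = response.choices[0].message.parsed  # an instance of Weather

# A structured payload maps directly onto the declared type:
payload = {"city": "Oslo", "temperature_c": 3.5}
weather = Weather(**payload)
print(weather.city, weather.temperature_c)  # Oslo 3.5
```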

## CreateEmbeddingResponse

Response object for embedding requests. Re-exported directly from `openai.types.CreateEmbeddingResponse`.

Import: `from any_llm.types.completion import CreateEmbeddingResponse`

| Field | Type | Description |
| --- | --- | --- |
| `data` | `list[Embedding]` | List of embedding objects, each with an embedding vector and index. |
| `model` | `str` | The model used. |
| `usage` | `Usage` | Token usage with `prompt_tokens` and `total_tokens`. |
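The `index` field maps each vector back to its input position, which is useful if the entries in `data` ever need reordering. A sketch with a hypothetical stdlib stand-in (not the real `Embedding` class):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the embedding objects in response.data.
@dataclass
class EmbeddingSketch:
    index: int
    embedding: list[float]

# Sort by index to recover input order, should the list arrive unordered.
data = [
    EmbeddingSketch(1, [0.3, 0.4]),
    EmbeddingSketch(0, [0.1, 0.2]),
]
vectors = [e.embedding for e in sorted(data, key=lambda e: e.index)]
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
```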

## ReasoningEffort

A literal type controlling reasoning depth for models that support it.

Import: `from any_llm.types.completion import ReasoningEffort`

```python
ReasoningEffort = Literal["none", "minimal", "low", "medium", "high", "xhigh", "auto"]
```

The value `"auto"` (the default) maps to each provider's own default reasoning level.
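Because it is a plain `Literal`, the allowed values can be introspected at runtime with `typing.get_args`. The `check_effort` helper below is hypothetical, shown only to illustrate the pattern:

```python
from typing import Literal, get_args

ReasoningEffort = Literal["none", "minimal", "low", "medium", "high", "xhigh", "auto"]

def check_effort(value: str) -> str:
    """Hypothetical helper: validate a string against the literal's options."""
    allowed = get_args(ReasoningEffort)
    if value not in allowed:
        raise ValueError(f"reasoning_effort must be one of {allowed}, got {value!r}")
    return value

print(check_effort("medium"))  # medium
```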

## CompletionParams

Normalized parameters for chat completions, used internally to pass structured parameters from the public API to provider implementations.

Import: `from any_llm.types.completion import CompletionParams`

| Field | Type | Description |
| --- | --- | --- |
| `model_id` | `str` | Model identifier (e.g. `mistral-small-latest`). |
| `messages` | `list[dict[str, Any]]` | List of messages for the conversation. |
| `tools` | `list[dict[str, Any] \| Any] \| None` | List of tools for tool calling; converted to OpenAI tool-format dicts. |
| `tool_choice` | `str \| dict[str, Any] \| None` | Controls which tools the model can call. |
| `temperature` | `float \| None` | Controls randomness in the response (0.0 to 2.0). |
| `top_p` | `float \| None` | Controls diversity via nucleus sampling (0.0 to 1.0). |
| `max_tokens` | `int \| None` | Maximum number of tokens to generate. |
| `response_format` | `dict[str, Any] \| type \| None` | Format specification for the response; accepts Pydantic `BaseModel` subclasses, dataclass types, or dicts. |
| `stream` | `bool \| None` | Whether to stream the response. |
| `n` | `int \| None` | Number of completions to generate. |
| `stop` | `str \| list[str] \| None` | Stop sequences for generation. |
| `presence_penalty` | `float \| None` | Penalizes new tokens based on their presence in the text so far. |
| `frequency_penalty` | `float \| None` | Penalizes new tokens based on their frequency in the text so far. |
| `seed` | `int \| None` | Random seed for reproducible results. |
| `user` | `str \| None` | Unique identifier for the end user. |
| `parallel_tool_calls` | `bool \| None` | Whether to allow parallel tool calls. |
| `logprobs` | `bool \| None` | Include token-level log probabilities in the response. |
| `top_logprobs` | `int \| None` | Number of top alternatives to return when log probabilities are requested. |
| `logit_bias` | `dict[str, float] \| None` | Biases the likelihood of specified tokens during generation. |
| `stream_options` | `dict[str, Any] \| None` | Additional options controlling streaming behavior. |
| `max_completion_tokens` | `int \| None` | Maximum number of tokens for the completion (provider-dependent). |
| `reasoning_effort` | `ReasoningEffort \| None` | Reasoning depth for models that support it; `"auto"` uses the provider's default. |

## Other re-exports

The following types are also available from `any_llm.types.completion`:

| Type | Origin | Description |
| --- | --- | --- |
| `CompletionUsage` | `openai.types.CompletionUsage` | Token usage counts. |
| `Function` | `openai.types.chat` | Function definition within a tool call. |
| `Embedding` | `openai.types.Embedding` | Single embedding vector with index. |
| `ChoiceDeltaToolCall` | `openai.types.chat` | Tool call delta in streaming chunks. |

For full field-level documentation of the base OpenAI types, see the OpenAI Python SDK reference.