Completion
Completion
any_llm.completion(model, messages, *, provider=None, tools=None, tool_choice=None, temperature=None, top_p=None, max_tokens=None, response_format=None, stream=None, n=None, stop=None, presence_penalty=None, frequency_penalty=None, seed=None, api_key=None, api_base=None, api_timeout=None, user=None, parallel_tool_calls=None, logprobs=None, top_logprobs=None, logit_bias=None, stream_options=None, max_completion_tokens=None, reasoning_effort='auto', **kwargs)
Create a chat completion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
str
|
Model identifier. Recommended: Use with separate |
required |
provider
|
str | ProviderName | None
|
Recommended: Provider name to use for the request (e.g., 'openai', 'mistral'). When provided, the model parameter should contain only the model name. |
None
|
messages
|
list[dict[str, Any] | ChatCompletionMessage]
|
List of messages for the conversation |
required |
tools
|
list[dict[str, Any] | Callable[..., Any]] | None
|
List of tools for tool calling. Can be Python callables or OpenAI tool format dicts |
None
|
tool_choice
|
str | dict[str, Any] | None
|
Controls which tools the model can call |
None
|
temperature
|
float | None
|
Controls randomness in the response (0.0 to 2.0) |
None
|
top_p
|
float | None
|
Controls diversity via nucleus sampling (0.0 to 1.0) |
None
|
max_tokens
|
int | None
|
Maximum number of tokens to generate |
None
|
response_format
|
dict[str, Any] | type[BaseModel] | None
|
Format specification for the response |
None
|
stream
|
bool | None
|
Whether to stream the response |
None
|
n
|
int | None
|
Number of completions to generate |
None
|
stop
|
str | list[str] | None
|
Stop sequences for generation |
None
|
presence_penalty
|
float | None
|
Penalize new tokens based on presence in text |
None
|
frequency_penalty
|
float | None
|
Penalize new tokens based on frequency in text |
None
|
seed
|
int | None
|
Random seed for reproducible results |
None
|
api_key
|
str | None
|
API key for the provider |
None
|
api_base
|
str | None
|
Base URL for the provider API |
None
|
api_timeout
|
float | None
|
Request timeout in seconds |
None
|
user
|
str | None
|
Unique identifier for the end user |
None
|
parallel_tool_calls
|
bool | None
|
Whether to allow parallel tool calls |
None
|
logprobs
|
bool | None
|
Include token-level log probabilities in the response |
None
|
top_logprobs
|
int | None
|
Number of alternatives to return when logprobs are requested |
None
|
logit_bias
|
dict[str, float] | None
|
Bias the likelihood of specified tokens during generation |
None
|
stream_options
|
dict[str, Any] | None
|
Additional options controlling streaming behavior |
None
|
max_completion_tokens
|
int | None
|
Maximum number of tokens for the completion |
None
|
reasoning_effort
|
Literal['minimal', 'low', 'medium', 'high', 'auto'] | None
|
Reasoning effort level for models that support it. "auto" will map to each provider's default. |
'auto'
|
**kwargs
|
Any
|
Additional provider-specific parameters |
{}
|
Returns:
Type | Description |
---|---|
ChatCompletion | Iterator[ChatCompletionChunk]
|
The completion response from the provider |
Source code in src/any_llm/api.py
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
|
any_llm.acompletion(model, messages, *, provider=None, tools=None, tool_choice=None, temperature=None, top_p=None, max_tokens=None, response_format=None, stream=None, n=None, stop=None, presence_penalty=None, frequency_penalty=None, seed=None, api_key=None, api_base=None, api_timeout=None, user=None, parallel_tool_calls=None, logprobs=None, top_logprobs=None, logit_bias=None, stream_options=None, max_completion_tokens=None, reasoning_effort='auto', **kwargs)
async
Create a chat completion asynchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
str
|
Model identifier. Recommended: Use with separate |
required |
provider
|
str | ProviderName | None
|
Recommended: Provider name to use for the request (e.g., 'openai', 'mistral'). When provided, the model parameter should contain only the model name. |
None
|
messages
|
list[dict[str, Any] | ChatCompletionMessage]
|
List of messages for the conversation |
required |
tools
|
list[dict[str, Any] | Callable[..., Any]] | None
|
List of tools for tool calling. Can be Python callables or OpenAI tool format dicts |
None
|
tool_choice
|
str | dict[str, Any] | None
|
Controls which tools the model can call |
None
|
temperature
|
float | None
|
Controls randomness in the response (0.0 to 2.0) |
None
|
top_p
|
float | None
|
Controls diversity via nucleus sampling (0.0 to 1.0) |
None
|
max_tokens
|
int | None
|
Maximum number of tokens to generate |
None
|
response_format
|
dict[str, Any] | type[BaseModel] | None
|
Format specification for the response |
None
|
stream
|
bool | None
|
Whether to stream the response |
None
|
n
|
int | None
|
Number of completions to generate |
None
|
stop
|
str | list[str] | None
|
Stop sequences for generation |
None
|
presence_penalty
|
float | None
|
Penalize new tokens based on presence in text |
None
|
frequency_penalty
|
float | None
|
Penalize new tokens based on frequency in text |
None
|
seed
|
int | None
|
Random seed for reproducible results |
None
|
api_key
|
str | None
|
API key for the provider |
None
|
api_base
|
str | None
|
Base URL for the provider API |
None
|
api_timeout
|
float | None
|
Request timeout in seconds |
None
|
user
|
str | None
|
Unique identifier for the end user |
None
|
parallel_tool_calls
|
bool | None
|
Whether to allow parallel tool calls |
None
|
logprobs
|
bool | None
|
Include token-level log probabilities in the response |
None
|
top_logprobs
|
int | None
|
Number of alternatives to return when logprobs are requested |
None
|
logit_bias
|
dict[str, float] | None
|
Bias the likelihood of specified tokens during generation |
None
|
stream_options
|
dict[str, Any] | None
|
Additional options controlling streaming behavior |
None
|
max_completion_tokens
|
int | None
|
Maximum number of tokens for the completion |
None
|
reasoning_effort
|
Literal['minimal', 'low', 'medium', 'high', 'auto'] | None
|
Reasoning effort level for models that support it. "auto" will map to each provider's default. |
'auto'
|
**kwargs
|
Any
|
Additional provider-specific parameters |
{}
|
Returns:
Type | Description |
---|---|
ChatCompletion | AsyncIterator[ChatCompletionChunk]
|
The completion response from the provider |
Source code in src/any_llm/api.py
115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 |
|