API Reference
document_to_podcast.preprocessing.data_cleaners
clean_html(text)
Clean HTML text.
This function removes
- scripts
- styles
- links
- meta tags
In addition, it calls clean_with_regex.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The HTML text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_markdown(text)
Clean Markdown text.
This function removes
- markdown images
In addition, it calls clean_with_regex.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The Markdown text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_with_regex(text)
Clean text using regular expressions.
This function removes
- URLs
- emails
- special characters
- extra spaces
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
document_to_podcast.inference.model_loaders
TTSModel
dataclass
The purpose of this class is to provide a unified interface for all the TTS models supported. Specifically, different TTS model families have different peculiarities, for example, the bark models need a BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that the user doesn't have to deal with it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
InterfaceGGUF
|
A TTS model that has a .generate() method or similar that takes text as input, and returns an audio in the form of a numpy array. |
required |
model_id
|
str
|
The model's identifier string. |
required |
sample_rate
|
int
|
The sample rate of the audio, required for properly saving the audio to a file. |
required |
custom_args
|
dict
|
Any model-specific arguments that a TTS model might require, e.g. tokenizer. |
required |
Source code in src/document_to_podcast/inference/model_loaders.py
load_llama_cpp_model(model_id)
Loads the given model_id using Llama.from_pretrained.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
str
|
The model id to load.
Format is expected to be |
required |
Returns:
Name | Type | Description |
---|---|---|
Llama |
Llama
|
The loaded model. |
Source code in src/document_to_podcast/inference/model_loaders.py
document_to_podcast.inference.model_loaders.TTS_LOADERS = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _load_oute_tts, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _load_oute_tts, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _load_kokoro_tts}
module-attribute
document_to_podcast.inference.text_to_text
text_to_text(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to be transformed. |
required |
model
|
Llama
|
The model to use for conversion. |
required |
system_prompt
|
str
|
The system prompt to use for conversion. |
required |
return_json
|
bool
|
Whether to return the response as JSON. Defaults to True. |
True
|
stop
|
str | list[str] | None
|
The stop token(s). |
None
|
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The full transformed text. |
Source code in src/document_to_podcast/inference/text_to_text.py
text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to be transformed. |
required |
model
|
Llama
|
The model to use for conversion. |
required |
system_prompt
|
str
|
The system prompt to use for conversion. |
required |
return_json
|
bool
|
Whether to return the response as JSON. Defaults to True. |
True
|
stop
|
str | list[str] | None
|
The stop token(s). |
None
|
Yields:
Name | Type | Description |
---|---|---|
str |
str
|
Chunks of the transformed text as they are available. |
Source code in src/document_to_podcast/inference/text_to_text.py
document_to_podcast.inference.text_to_speech
text_to_speech(input_text, model, voice_profile)
Generate speech from text using a TTS model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to convert to speech. |
required |
model
|
TTSModel
|
The TTS model to use. |
required |
voice_profile
|
str
|
The voice profile to use for the speech. The format depends on the TTSModel used. For OuteTTS (the default), it should be a pre-defined ID like |
required |
Returns:
Type | Description |
---|---|
ndarray
|
np.ndarray: The waveform of the speech as a 2D numpy array |