API Reference
document_to_podcast.preprocessing.data_loaders
document_to_podcast.preprocessing.data_cleaners
clean_html(text)
Clean HTML text.
This function removes
- scripts
- styles
- links
- meta tags
In addition, it calls clean_with_regex.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The HTML text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_markdown(text)
Clean Markdown text.
This function removes
- markdown images
In addition, it calls clean_with_regex.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The Markdown text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_with_regex(text)
Clean text using regular expressions.
This function removes
- URLs
- emails
- special characters
- extra spaces
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The text to clean. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
document_to_podcast.inference.model_loaders
TTSModel
dataclass
The purpose of this class is to provide a unified interface for all the TTS models supported. Specifically, different TTS model families have different peculiarities, for example, the bark models need a BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that the user doesn't have to deal with it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
KPipeline
|
A TTS model that has a .generate() method or similar that takes text as input, and returns an audio in the form of a numpy array. |
required |
model_id
|
str
|
The model's identifier string. |
required |
sample_rate
|
int
|
The sample rate of the audio, required for properly saving the audio to a file. |
required |
custom_args
|
dict
|
Any model-specific arguments that a TTS model might require, e.g. tokenizer. |
required |
Source code in src/document_to_podcast/inference/model_loaders.py
_load_kokoro_tts(model_id, **kwargs)
Loads the kokoro model using the KPipeline from the package https://github.com/hexgrad/kokoro
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
str
|
Identifier for a specific model. Kokoro currently only supports one model. |
required |
kwargs
|
str
|
Needs to include 'lang_code' necessary to set the language used for generation. For example: ๐ช๐ธ 'e' => Spanish es ๐ซ๐ท 'f' => French fr-fr ๐ฎ๐ณ 'h' => Hindi hi ๐ฎ๐น 'i' => Italian it ๐ง๐ท 'p' => Brazilian Portuguese pt-br ๐บ๐ธ 'a' => American English ๐ฌ๐ง 'b' => British English ๐ฏ๐ต 'j' => Japanese: you will need to also pip install misaki[ja] ๐จ๐ณ 'z' => Mandarin Chinese: you will need to also pip install misaki[zh] |
{}
|
Returns: TTSModel: The loaded model using the TTSModel wrapper.
Source code in src/document_to_podcast/inference/model_loaders.py
load_llama_cpp_model(model_id)
Loads the given model_id using Llama.from_pretrained.
Examples:
>>> model = load_llama_cpp_model("bartowski/Qwen2.5-7B-Instruct-GGUF/Qwen2.5-7B-Instruct-Q8_0.gguf")
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
str
|
The model id to load.
Format is expected to be |
required |
Returns:
Name | Type | Description |
---|---|---|
Llama |
Llama
|
The loaded model. |
Source code in src/document_to_podcast/inference/model_loaders.py
document_to_podcast.inference.model_loaders.TTS_LOADERS = {'hexgrad/Kokoro-82M': _load_kokoro_tts}
module-attribute
document_to_podcast.inference.text_to_text
text_to_text(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to be transformed. |
required |
model
|
Llama
|
The model to use for conversion. |
required |
system_prompt
|
str
|
The system prompt to use for conversion. |
required |
return_json
|
bool
|
Whether to return the response as JSON. Defaults to True. |
True
|
stop
|
str | list[str] | None
|
The stop token(s). |
None
|
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
The full transformed text. |
Source code in src/document_to_podcast/inference/text_to_text.py
text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to be transformed. |
required |
model
|
Llama
|
The model to use for conversion. |
required |
system_prompt
|
str
|
The system prompt to use for conversion. |
required |
return_json
|
bool
|
Whether to return the response as JSON. Defaults to True. |
True
|
stop
|
str | list[str] | None
|
The stop token(s). |
None
|
Yields:
Name | Type | Description |
---|---|---|
str |
str
|
Chunks of the transformed text as they are available. |
Source code in src/document_to_podcast/inference/text_to_text.py
document_to_podcast.inference.text_to_speech
_text_to_speech_kokoro(input_text, model, voice_profile)
TTS generation function for the Kokoro model Args: input_text (str): The text to convert to speech. model (KPipeline): The kokoro pipeline as defined in https://github.com/hexgrad/kokoro voice_profile (str) : a pre-defined ID for the Kokoro models (e.g. "af_bella") more info here https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
Returns:
Type | Description |
---|---|
ndarray
|
numpy array: The waveform of the speech as a 2D numpy array |
Source code in src/document_to_podcast/inference/text_to_speech.py
text_to_speech(input_text, model, voice_profile)
Generate speech from text using a TTS model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_text
|
str
|
The text to convert to speech. |
required |
model
|
TTSModel
|
The TTS model to use. |
required |
voice_profile
|
str
|
The voice profile to use for the speech. The format depends on the TTSModel used. |
required |
Returns: np.ndarray: The waveform of the speech as a 2D numpy array