API Reference
document_to_podcast.preprocessing.data_cleaners
clean_html(text)
Clean HTML text.
This function removes
- scripts
- styles
- links
- meta tags
In addition, it calls clean_with_regex.
Examples:
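An illustrative call (the input snippet is made up for this example; the exact output depends on clean_with_regex):

>>> clean_html("<html><body><p>Hello, world!</p><script>alert('hi');</script></body></html>")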
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`text` | `str` | The HTML text to clean. | required |
Returns:

Name | Type | Description |
---|---|---|
`str` | `str` | The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_markdown(text)
Clean Markdown text.
This function removes
- markdown images
In addition, it calls clean_with_regex.
Examples:
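An illustrative call (the input is made up for this example; the image reference is removed and the remaining text is passed through clean_with_regex):

>>> clean_markdown("# Title ![logo](images/logo.png) Some introductory text.")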
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`text` | `str` | The Markdown text to clean. | required |
Returns:

Name | Type | Description |
---|---|---|
`str` | `str` | The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
clean_with_regex(text)
Clean text using regular expressions.
This function removes
- URLs
- emails
- special characters
- extra spaces
Examples:
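An illustrative call (the input is made up for this example; the URL, email, and extra spaces should be stripped):

>>> clean_with_regex("Visit   https://example.com  or write to user@example.com!")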
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`text` | `str` | The text to clean. | required |
Returns:

Name | Type | Description |
---|---|---|
`str` | `str` | The cleaned text. |
Source code in src/document_to_podcast/preprocessing/data_cleaners.py
document_to_podcast.inference.model_loaders
load_llama_cpp_model(model_id)
Loads the given model_id using Llama.from_pretrained.
Examples:
>>> model = load_llama_cpp_model(
"allenai/OLMoE-1B-7B-0924-Instruct-GGUF/olmoe-1b-7b-0924-instruct-q8_0.gguf")
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_id` | `str` | The model id to load. Format is expected to be `{org}/{repo}/{filename}`. | required |
Returns:

Name | Type | Description |
---|---|---|
`Llama` | `Llama` | The loaded model. |
Source code in src/document_to_podcast/inference/model_loaders.py
load_parler_tts_model_and_tokenizer(model_id, device='cpu')
Loads the given model_id using parler_tts.from_pretrained.
Examples:
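An illustrative call (the model id below is an assumption for this example, not a value prescribed by the library):

>>> model, tokenizer = load_parler_tts_model_and_tokenizer(
        "parler-tts/parler-tts-mini-v1", device="cpu")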
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_id` | `str` | The model id to load. Format is expected to be a Hugging Face model id. | required |
`device` | `str` | The device to load the model on, such as "cuda:0" or "cpu". | 'cpu' |
Returns:

Name | Type | Description |
---|---|---|
`PreTrainedModel` | `Tuple[PreTrainedModel, PreTrainedTokenizerBase]` | The loaded model and its tokenizer. |
Source code in src/document_to_podcast/inference/model_loaders.py
document_to_podcast.inference.text_to_text
text_to_text(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
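Examples:

An illustrative call (the model is assumed to come from load_llama_cpp_model; the input text and system prompt are placeholders, not values prescribed by the library):

>>> response = text_to_text(
        input_text="Here is the full text of the document...",
        model=model,
        system_prompt="Rewrite the following text as a two-host podcast script.",
        return_json=False)
>>> print(response)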
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`input_text` | `str` | The text to be transformed. | required |
`model` | `Llama` | The model to use for conversion. | required |
`system_prompt` | `str` | The system prompt to use for conversion. | required |
`return_json` | `bool` | Whether to return the response as JSON. Defaults to True. | True |
`stop` | `str \| list[str] \| None` | The stop token(s). | None |
Returns:

Name | Type | Description |
---|---|---|
`str` | `str` | The full transformed text. |
Source code in src/document_to_podcast/inference/text_to_text.py
text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)
Transforms input_text using the given model and system prompt.
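Examples:

An illustrative call (same placeholder inputs as in text_to_text; chunks are printed as they are generated):

>>> stream = text_to_text_stream(
        input_text="Here is the full text of the document...",
        model=model,
        system_prompt="Rewrite the following text as a two-host podcast script.",
        return_json=False)
>>> for chunk in stream:
...     print(chunk, end="")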
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`input_text` | `str` | The text to be transformed. | required |
`model` | `Llama` | The model to use for conversion. | required |
`system_prompt` | `str` | The system prompt to use for conversion. | required |
`return_json` | `bool` | Whether to return the response as JSON. Defaults to True. | True |
`stop` | `str \| list[str] \| None` | The stop token(s). | None |
Yields:

Name | Type | Description |
---|---|---|
`str` | `str` | Chunks of the transformed text as they are available. |
Source code in src/document_to_podcast/inference/text_to_text.py
document_to_podcast.inference.text_to_speech
text_to_speech(input_text, model, tokenizer, speaker_profile)
Generates a speech waveform using the input_text, a model and a speaker profile to define a distinct voice pattern.
Examples:
>>> waveform = text_to_speech(
        input_text="Welcome to our amazing podcast",
        model=model,
        tokenizer=tokenizer,
        speaker_profile="Laura's voice is exciting and fast in delivery with very clear audio and no background noise.")
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`input_text` | `str` | The text to convert to speech. | required |
`model` | `PreTrainedModel` | The model used for generating the waveform. | required |
`tokenizer` | `PreTrainedTokenizerBase` | The tokenizer used to tokenize the text before sending it to the model. | required |
`speaker_profile` | `str` | A description used by the ParlerTTS model to configure the speaker profile. | required |
Returns: numpy array: The waveform of the speech as a 2D numpy array
Source code in src/document_to_podcast/inference/text_to_speech.py
document_to_podcast.podcast_maker.script_to_audio
parse_script_to_waveform(script, podcast_config)
Given a script with speaker identifiers (such as "Speaker 1"), parse it so that each speaker has its own unique voice, and concatenate all the voices in sequence to form the complete podcast.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`script` | `str` | The script to convert, with speaker identifiers marking each turn. | required |
`podcast_config` | `PodcastConfig` | The configuration describing each speaker's voice and TTS model. | required |
Returns: A 2D numpy array containing the whole podcast in waveform format.
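Examples:

An illustrative call (the script format shown here, one "Speaker N" turn after another, and the podcast_config value, a PodcastConfig instance not constructed here, are assumptions for this example):

>>> script = "Speaker 1: Welcome to the show! Speaker 2: Thanks, great to be here."
>>> waveform = parse_script_to_waveform(script, podcast_config)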
Source code in src/document_to_podcast/podcast_maker/script_to_audio.py
save_waveform_as_file(waveform, sampling_rate, filename)
Save the output of the TTS (a numpy waveform) to a .wav file using the soundfile library.
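Examples:

An illustrative call (the sampling rate and filename are placeholder values; use the sampling rate of your TTS model):

>>> save_waveform_as_file(waveform, sampling_rate=44100, filename="podcast.wav")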
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`waveform` | `ndarray` | 2D numpy array of a waveform. | required |
`sampling_rate` | `int` | Usually 44100, but check the specifications of the TTS model you are using. | required |
`filename` | `str` | The destination filename to save the audio to. | required |
Source code in src/document_to_podcast/podcast_maker/script_to_audio.py
document_to_podcast.podcast_maker.config
PodcastConfig
Bases: BaseModel
Pydantic model that stores configuration of all the speakers for the TTS model. This allows different speakers to use different models and configurations.
Source code in src/document_to_podcast/podcast_maker/config.py
SpeakerConfig
Bases: BaseModel
Pydantic model that stores configuration of an individual speaker for the TTS model.
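Examples:

A rough sketch of how the two models might be combined. The field names used here (model_id, speaker_description, speakers) are assumptions for illustration only; check src/document_to_podcast/podcast_maker/config.py for the actual schema.

>>> speaker = SpeakerConfig(
        model_id="parler-tts/parler-tts-mini-v1",
        speaker_description="Laura's voice is calm, clear and moderately paced.")
>>> podcast_config = PodcastConfig(speakers={"Speaker 1": speaker})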