API Reference

`document_to_podcast.preprocessing.data_cleaners`

`clean_html(text)`

Clean HTML text.

This function removes

scripts
styles
links
meta tags

Examples:

>>> clean_html("<html><body><p>Hello,  world!  </p></body></html>"")
"Hello, world!"

Parameters:

Name	Type	Description	Default
`text`	`str`	The HTML text to clean.	required

Returns:

Name	Type	Description
`str`	`str`	The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py

def clean_html(text: str) -> str:
    """Clean HTML text.

    This function removes:
        - scripts
        - styles
        - links
        - meta tags

    In addition, it calls [clean_with_regex][document_to_podcast.preprocessing.data_cleaners.clean_with_regex].

    Examples:
        >>> clean_html("<html><body><p>Hello,  world!  </p></body></html>"")
        "Hello, world!"

    Args:
        text (str): The HTML text to clean.

    Returns:
        str: The cleaned text.
    """
    soup = BeautifulSoup(text, "html.parser")
    for tag in soup(["script", "style", "link", "meta"]):
        tag.decompose()
    text = soup.get_text()
    return clean_with_regex(text)

`clean_markdown(text)`

Clean Markdown text.

This function removes

markdown images

In addition, it calls clean_with_regex.

Examples:

>>> clean_markdown('# Title   with image ![alt text](image.jpg "Image Title")')
"Title with image"

Parameters:

Name	Type	Description	Default
`text`	`str`	The Markdown text to clean.	required

Returns:

Name	Type	Description
`str`	`str`	The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py

def clean_markdown(text: str) -> str:
    """Clean Markdown text.

    This function removes:
        - markdown images

    In addition, it calls [clean_with_regex][document_to_podcast.preprocessing.data_cleaners.clean_with_regex].

    Examples:
        >>> clean_markdown('# Title   with image ![alt text](image.jpg "Image Title")')
        "Title with image"

    Args:
        text (str): The Markdown text to clean.

    Returns:
        str: The cleaned text.
    """
    text = re.sub(r'!\[.*?\]\(.*?(".*?")?\)', "", text)

    return clean_with_regex(text)

`clean_with_regex(text)`

Clean text using regular expressions.

This function removes

URLs
emails
special characters
extra spaces

Examples:

>>> clean_with_regex(" Hello,   world! http://example.com")
"Hello, world!"

Parameters:

Name	Type	Description	Default
`text`	`str`	The text to clean.	required

Returns:

Name	Type	Description
`str`	`str`	The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py

def clean_with_regex(text: str) -> str:
    """
    Clean text using regular expressions.

    This function removes:
        - URLs
        - emails
        - special characters
        - extra spaces

    Examples:
        >>> clean_with_regex("\xa0Hello,   world! http://example.com")
        "Hello, world!"

    Args:
        text (str): The text to clean.

    Returns:
        str: The cleaned text.
    """
    text = re.sub(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+",
        "",
        text,
    )
    text = re.sub(r"[\w\.-]+@[\w\.-]+\.[\w]+", "", text)
    text = re.sub(r'[^a-zA-Z0-9\s.,!?;:"\']', "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

`document_to_podcast.inference.model_loaders`

`TTSModel` `dataclass`

The purpose of this class is to provide a unified interface for all the TTS models supported. Specifically, different TTS model families have different peculiarities, for example, the bark models need a BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that the user doesn't have to deal with it.

Parameters:

Name	Type	Description	Default
`model`	`InterfaceGGUF`	A TTS model that has a .generate() method or similar that takes text as input, and returns an audio in the form of a numpy array.	required
`model_id`	`str`	The model's identifier string.	required
`sample_rate`	`int`	The sample rate of the audio, required for properly saving the audio to a file.	required
`custom_args`	`dict`	Any model-specific arguments that a TTS model might require, e.g. tokenizer.	required

Source code in src/document_to_podcast/inference/model_loaders.py

@dataclass
class TTSModel:
    """
    The purpose of this class is to provide a unified interface for all the TTS models supported.
    Specifically, different TTS model families have different peculiarities, for example, the bark models need a
    BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that
    the user doesn't have to deal with it.

    Args:
        model (InterfaceGGUF): A TTS model that has a .generate() method or similar
            that takes text as input, and returns an audio in the form of a numpy array.
        model_id (str): The model's identifier string.
        sample_rate (int): The sample rate of the audio, required for properly saving the audio to a file.
        custom_args (dict): Any model-specific arguments that a TTS model might require, e.g. tokenizer.
    """

    model: InterfaceGGUF
    model_id: str
    sample_rate: int
    custom_args: field(default_factory=dict)

`load_llama_cpp_model(model_id)`

Loads the given model_id using Llama.from_pretrained.

Examples:

>>> model = load_llama_cpp_model("bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf")

Parameters:

Name	Type	Description	Default
`model_id`	`str`	The model id to load. Format is expected to be `{org}/{repo}/{filename}`.	required

Returns:

Name	Type	Description
`Llama`	`Llama`	The loaded model.

Source code in src/document_to_podcast/inference/model_loaders.py

def load_llama_cpp_model(model_id: str) -> Llama:
    """
    Loads the given model_id using Llama.from_pretrained.

    Examples:
        >>> model = load_llama_cpp_model("bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf")

    Args:
        model_id (str): The model id to load.
            Format is expected to be `{org}/{repo}/{filename}`.

    Returns:
        Llama: The loaded model.
    """
    org, repo, filename = model_id.split("/")
    model = Llama.from_pretrained(
        repo_id=f"{org}/{repo}",
        filename=filename,
        n_ctx=0,  # 0 means that the model limit will be used, instead of the default (512) or other hardcoded value
        verbose=False,
        n_gpu_layers=-1 if torch.cuda.is_available() else 0,
    )
    return model

`document_to_podcast.inference.model_loaders.TTS_LOADERS = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _load_oute_tts, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _load_oute_tts, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _load_kokoro_tts}` `module-attribute`

`document_to_podcast.inference.text_to_text`

`text_to_text(input_text, model, system_prompt, return_json=True, stop=None)`

Transforms input_text using the given model and system prompt.

Parameters:

Name	Type	Description	Default
`input_text`	`str`	The text to be transformed.	required
`model`	`Llama`	The model to use for conversion.	required
`system_prompt`	`str`	The system prompt to use for conversion.	required
`return_json`	`bool`	Whether to return the response as JSON. Defaults to True.	`True`
`stop`	`str \| list[str] \| None`	The stop token(s).	`None`

Returns:

Name	Type	Description
`str`	`str`	The full transformed text.

Source code in src/document_to_podcast/inference/text_to_text.py

def text_to_text(
    input_text: str,
    model: Llama,
    system_prompt: str,
    return_json: bool = True,
    stop: str | list[str] | None = None,
) -> str:
    """
    Transforms input_text using the given model and system prompt.

    Args:
        input_text (str): The text to be transformed.
        model (Llama): The model to use for conversion.
        system_prompt (str): The system prompt to use for conversion.
        return_json (bool, optional): Whether to return the response as JSON.
            Defaults to True.
        stop (str | list[str] | None, optional): The stop token(s).

    Returns:
        str: The full transformed text.
    """
    response = chat_completion(
        input_text, model, system_prompt, return_json, stop=stop, stream=False
    )
    return response["choices"][0]["message"]["content"]

`text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)`

Transforms input_text using the given model and system prompt.

Parameters:

Name	Type	Description	Default
`input_text`	`str`	The text to be transformed.	required
`model`	`Llama`	The model to use for conversion.	required
`system_prompt`	`str`	The system prompt to use for conversion.	required
`return_json`	`bool`	Whether to return the response as JSON. Defaults to True.	`True`
`stop`	`str \| list[str] \| None`	The stop token(s).	`None`

Yields:

Name	Type	Description
`str`	`str`	Chunks of the transformed text as they are available.

Source code in src/document_to_podcast/inference/text_to_text.py

def text_to_text_stream(
    input_text: str,
    model: Llama,
    system_prompt: str,
    return_json: bool = True,
    stop: str | list[str] | None = None,
) -> Iterator[str]:
    """
    Transforms input_text using the given model and system prompt.

    Args:
        input_text (str): The text to be transformed.
        model (Llama): The model to use for conversion.
        system_prompt (str): The system prompt to use for conversion.
        return_json (bool, optional): Whether to return the response as JSON.
            Defaults to True.
        stop (str | list[str] | None, optional): The stop token(s).

    Yields:
        str: Chunks of the transformed text as they are available.
    """
    response = chat_completion(
        input_text, model, system_prompt, return_json, stop=stop, stream=True
    )
    for item in response:
        if item["choices"][0].get("delta", {}).get("content", None):
            yield item["choices"][0].get("delta", {}).get("content", None)

`document_to_podcast.inference.text_to_speech`

`text_to_speech(input_text, model, voice_profile)`

Generate speech from text using a TTS model.

Parameters:

Name	Type	Description	Default
`input_text`	`str`	The text to convert to speech.	required
`model`	`TTSModel`	The TTS model to use.	required
`voice_profile`	`str`	The voice profile to use for the speech. The format depends on the TTSModel used. For OuteTTS (the default), it should be a pre-defined ID like `female_1`. You can find all the IDs at this link	required

Returns:

Type	Description
`ndarray`	np.ndarray: The waveform of the speech as a 2D numpy array

Source code in src/document_to_podcast/inference/text_to_speech.py

def text_to_speech(input_text: str, model: TTSModel, voice_profile: str) -> np.ndarray:
    """
    Generate speech from text using a TTS model.

    Args:
        input_text (str): The text to convert to speech.
        model (TTSModel): The TTS model to use.
        voice_profile (str): The voice profile to use for the speech.
            The format depends on the TTSModel used.

            For OuteTTS (the default), it should be a pre-defined ID like `female_1`.
            You can find all the IDs [at this link](https://github.com/edwko/OuteTTS/tree/main/outetts/version/v1/default_speakers)

    Returns:
        np.ndarray: The waveform of the speech as a 2D numpy array
    """
    return TTS_INFERENCE[model.model_id](
        input_text, model.model, voice_profile, **model.custom_args
    )

`document_to_podcast.inference.text_to_speech.TTS_INFERENCE = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _text_to_speech_oute, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _text_to_speech_oute, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _text_to_speech_kokoro}` `module-attribute`

API Reference

document_to_podcast.preprocessing.data_cleaners

clean_html(text)

clean_markdown(text)

clean_with_regex(text)

document_to_podcast.inference.model_loaders

TTSModel dataclass

load_llama_cpp_model(model_id)

document_to_podcast.inference.model_loaders.TTS_LOADERS = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _load_oute_tts, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _load_oute_tts, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _load_kokoro_tts} module-attribute

document_to_podcast.inference.text_to_text

text_to_text(input_text, model, system_prompt, return_json=True, stop=None)

text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)

document_to_podcast.inference.text_to_speech

text_to_speech(input_text, model, voice_profile)

`document_to_podcast.preprocessing.data_cleaners`

`clean_html(text)`

`clean_markdown(text)`

`clean_with_regex(text)`

`document_to_podcast.inference.model_loaders`

`TTSModel` `dataclass`

`load_llama_cpp_model(model_id)`

`document_to_podcast.inference.model_loaders.TTS_LOADERS = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _load_oute_tts, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _load_oute_tts, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _load_kokoro_tts}` `module-attribute`

`document_to_podcast.inference.text_to_text`

`text_to_text(input_text, model, system_prompt, return_json=True, stop=None)`

`text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)`

`document_to_podcast.inference.text_to_speech`

`text_to_speech(input_text, model, voice_profile)`