Skip to content

API Reference

document_to_podcast.preprocessing.data_cleaners

clean_html(text)

Clean HTML text.

This function removes
  • scripts
  • styles
  • links
  • meta tags

In addition, it calls clean_with_regex.

Examples:

>>> clean_html("<html><body><p>Hello,  world!  </p></body></html>"")
"Hello, world!"

Parameters:

Name Type Description Default
text str

The HTML text to clean.

required

Returns:

Name Type Description
str str

The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py
def clean_html(text: str) -> str:
    """Clean HTML text.

    This function removes:
        - scripts
        - styles
        - links
        - meta tags

    In addition, it calls [clean_with_regex][document_to_podcast.preprocessing.data_cleaners.clean_with_regex].

    Examples:
        >>> clean_html("<html><body><p>Hello,  world!  </p></body></html>"")
        "Hello, world!"

    Args:
        text (str): The HTML text to clean.

    Returns:
        str: The cleaned text.
    """
    soup = BeautifulSoup(text, "html.parser")
    for tag in soup(["script", "style", "link", "meta"]):
        tag.decompose()
    text = soup.get_text()
    return clean_with_regex(text)

clean_markdown(text)

Clean Markdown text.

This function removes
  • markdown images

In addition, it calls clean_with_regex.

Examples:

>>> clean_markdown('# Title   with image ![alt text](image.jpg "Image Title")')
"Title with image"

Parameters:

Name Type Description Default
text str

The Markdown text to clean.

required

Returns:

Name Type Description
str str

The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py
def clean_markdown(text: str) -> str:
    """Clean Markdown text.

    This function removes:
        - markdown images

    In addition, it calls [clean_with_regex][document_to_podcast.preprocessing.data_cleaners.clean_with_regex].

    Examples:
        >>> clean_markdown('# Title   with image ![alt text](image.jpg "Image Title")')
        "Title with image"

    Args:
        text (str): The Markdown text to clean.

    Returns:
        str: The cleaned text.
    """
    text = re.sub(r'!\[.*?\]\(.*?(".*?")?\)', "", text)

    return clean_with_regex(text)

clean_with_regex(text)

Clean text using regular expressions.

This function removes
  • URLs
  • emails
  • special characters
  • extra spaces

Examples:

>>> clean_with_regex(" Hello,   world! http://example.com")
"Hello, world!"

Parameters:

Name Type Description Default
text str

The text to clean.

required

Returns:

Name Type Description
str str

The cleaned text.

Source code in src/document_to_podcast/preprocessing/data_cleaners.py
def clean_with_regex(text: str) -> str:
    """
    Clean text using regular expressions.

    This function removes:
        - URLs
        - emails
        - special characters
        - extra spaces

    Examples:
        >>> clean_with_regex("\xa0Hello,   world! http://example.com")
        "Hello, world!"

    Args:
        text (str): The text to clean.

    Returns:
        str: The cleaned text.
    """
    text = re.sub(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+",
        "",
        text,
    )
    text = re.sub(r"[\w\.-]+@[\w\.-]+\.[\w]+", "", text)
    text = re.sub(r'[^a-zA-Z0-9\s.,!?;:"\']', "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

document_to_podcast.inference.model_loaders

TTSModel dataclass

The purpose of this class is to provide a unified interface for all the TTS models supported. Specifically, different TTS model families have different peculiarities, for example, the bark models need a BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that the user doesn't have to deal with it.

Parameters:

Name Type Description Default
model InterfaceGGUF

A TTS model that has a .generate() method or similar that takes text as input, and returns an audio in the form of a numpy array.

required
model_id str

The model's identifier string.

required
sample_rate int

The sample rate of the audio, required for properly saving the audio to a file.

required
custom_args dict

Any model-specific arguments that a TTS model might require, e.g. tokenizer.

required
Source code in src/document_to_podcast/inference/model_loaders.py
@dataclass
class TTSModel:
    """
    The purpose of this class is to provide a unified interface for all the TTS models supported.
    Specifically, different TTS model families have different peculiarities, for example, the bark models need a
    BarkProcessor, the parler models need their own tokenizer, etc. This wrapper takes care of this complexity so that
    the user doesn't have to deal with it.

    Args:
        model (InterfaceGGUF): A TTS model that has a .generate() method or similar
            that takes text as input, and returns an audio in the form of a numpy array.
        model_id (str): The model's identifier string.
        sample_rate (int): The sample rate of the audio, required for properly saving the audio to a file.
        custom_args (dict): Any model-specific arguments that a TTS model might require, e.g. tokenizer.
    """

    model: InterfaceGGUF
    model_id: str
    sample_rate: int
    custom_args: field(default_factory=dict)

load_llama_cpp_model(model_id)

Loads the given model_id using Llama.from_pretrained.

Examples:

>>> model = load_llama_cpp_model("bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf")

Parameters:

Name Type Description Default
model_id str

The model id to load. Format is expected to be {org}/{repo}/{filename}.

required

Returns:

Name Type Description
Llama Llama

The loaded model.

Source code in src/document_to_podcast/inference/model_loaders.py
def load_llama_cpp_model(model_id: str) -> Llama:
    """
    Loads the given model_id using Llama.from_pretrained.

    Examples:
        >>> model = load_llama_cpp_model("bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf")

    Args:
        model_id (str): The model id to load.
            Format is expected to be `{org}/{repo}/{filename}`.

    Returns:
        Llama: The loaded model.
    """
    org, repo, filename = model_id.split("/")
    model = Llama.from_pretrained(
        repo_id=f"{org}/{repo}",
        filename=filename,
        n_ctx=0,  # 0 means that the model limit will be used, instead of the default (512) or other hardcoded value
        verbose=False,
        n_gpu_layers=-1 if torch.cuda.is_available() else 0,
    )
    return model

document_to_podcast.inference.model_loaders.TTS_LOADERS = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _load_oute_tts, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _load_oute_tts, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _load_kokoro_tts} module-attribute

document_to_podcast.inference.text_to_text

text_to_text(input_text, model, system_prompt, return_json=True, stop=None)

Transforms input_text using the given model and system prompt.

Parameters:

Name Type Description Default
input_text str

The text to be transformed.

required
model Llama

The model to use for conversion.

required
system_prompt str

The system prompt to use for conversion.

required
return_json bool

Whether to return the response as JSON. Defaults to True.

True
stop str | list[str] | None

The stop token(s).

None

Returns:

Name Type Description
str str

The full transformed text.

Source code in src/document_to_podcast/inference/text_to_text.py
def text_to_text(
    input_text: str,
    model: Llama,
    system_prompt: str,
    return_json: bool = True,
    stop: str | list[str] | None = None,
) -> str:
    """
    Transforms input_text using the given model and system prompt.

    Args:
        input_text (str): The text to be transformed.
        model (Llama): The model to use for conversion.
        system_prompt (str): The system prompt to use for conversion.
        return_json (bool, optional): Whether to return the response as JSON.
            Defaults to True.
        stop (str | list[str] | None, optional): The stop token(s).

    Returns:
        str: The full transformed text.
    """
    response = chat_completion(
        input_text, model, system_prompt, return_json, stop=stop, stream=False
    )
    return response["choices"][0]["message"]["content"]

text_to_text_stream(input_text, model, system_prompt, return_json=True, stop=None)

Transforms input_text using the given model and system prompt.

Parameters:

Name Type Description Default
input_text str

The text to be transformed.

required
model Llama

The model to use for conversion.

required
system_prompt str

The system prompt to use for conversion.

required
return_json bool

Whether to return the response as JSON. Defaults to True.

True
stop str | list[str] | None

The stop token(s).

None

Yields:

Name Type Description
str str

Chunks of the transformed text as they are available.

Source code in src/document_to_podcast/inference/text_to_text.py
def text_to_text_stream(
    input_text: str,
    model: Llama,
    system_prompt: str,
    return_json: bool = True,
    stop: str | list[str] | None = None,
) -> Iterator[str]:
    """
    Transforms input_text using the given model and system prompt.

    Args:
        input_text (str): The text to be transformed.
        model (Llama): The model to use for conversion.
        system_prompt (str): The system prompt to use for conversion.
        return_json (bool, optional): Whether to return the response as JSON.
            Defaults to True.
        stop (str | list[str] | None, optional): The stop token(s).

    Yields:
        str: Chunks of the transformed text as they are available.
    """
    response = chat_completion(
        input_text, model, system_prompt, return_json, stop=stop, stream=True
    )
    for item in response:
        if item["choices"][0].get("delta", {}).get("content", None):
            yield item["choices"][0].get("delta", {}).get("content", None)

document_to_podcast.inference.text_to_speech

text_to_speech(input_text, model, voice_profile)

Generate speech from text using a TTS model.

Parameters:

Name Type Description Default
input_text str

The text to convert to speech.

required
model TTSModel

The TTS model to use.

required
voice_profile str

The voice profile to use for the speech. The format depends on the TTSModel used.

For OuteTTS (the default), it should be a pre-defined ID like female_1. You can find all the IDs at this link

required

Returns:

Type Description
ndarray

np.ndarray: The waveform of the speech as a 2D numpy array

Source code in src/document_to_podcast/inference/text_to_speech.py
def text_to_speech(input_text: str, model: TTSModel, voice_profile: str) -> np.ndarray:
    """
    Generate speech from text using a TTS model.

    Args:
        input_text (str): The text to convert to speech.
        model (TTSModel): The TTS model to use.
        voice_profile (str): The voice profile to use for the speech.
            The format depends on the TTSModel used.

            For OuteTTS (the default), it should be a pre-defined ID like `female_1`.
            You can find all the IDs [at this link](https://github.com/edwko/OuteTTS/tree/main/outetts/version/v1/default_speakers)

    Returns:
        np.ndarray: The waveform of the speech as a 2D numpy array
    """
    return TTS_INFERENCE[model.model_id](
        input_text, model.model, voice_profile, **model.custom_args
    )

document_to_podcast.inference.text_to_speech.TTS_INFERENCE = {'OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf': _text_to_speech_oute, 'OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf': _text_to_speech_oute, 'hexgrad/Kokoro-82M/kokoro-v0_19.pth': _text_to_speech_kokoro} module-attribute