# API Reference

## structured_qa.preprocessing

### document_to_sections_dir(input_file, output_dir)

Convert a document to a directory of sections.

Uses pymupdf4llm to convert `input_file` to markdown, then uses
`split_markdown_by_headings` to split the markdown into sections based on its
headings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_file` | `str` | Path to the input document. | *required* |
| `output_dir` | `str` | Path to the output directory where the section files are written. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[str]` | List of section names. |
Source code in src/structured_qa/preprocessing.py
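The write-out step that this function performs can be sketched as follows. This is a minimal illustration, not the library's actual implementation; the one-file-per-section layout is implied by the docs, but the `.txt` extension and the use of the section name as the filename are assumptions:

```python
from pathlib import Path


def write_sections(sections: dict[str, str], output_dir: str) -> list[str]:
    """Write each section to its own file and return the section names.

    Hypothetical helper illustrating the output of document_to_sections_dir;
    the real implementation lives in src/structured_qa/preprocessing.py.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in sections.items():
        # Assumption: one plain-text file per section, named after the section.
        (out / f"{name}.txt").write_text(content, encoding="utf-8")
    return list(sections)
```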
### split_markdown_by_headings(markdown_text, heading_patterns=None)
Splits a markdown document into sections based on specified heading patterns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `markdown_text` | `str` | The markdown document as a single string. | *required* |
| `heading_patterns` | `list[str] \| None` | A list of regex patterns representing heading markers in the markdown document. If None, the default patterns are used. | `None` |
Returns:

| Type | Description |
|---|---|
| `dict[str, str]` | A dictionary where the keys are the section names and the values are the section contents. |
Source code in src/structured_qa/preprocessing.py
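The splitting logic can be sketched in pure Python. This is a simplified stand-in for `split_markdown_by_headings`, not the library's code; the default patterns (level 1–2 ATX headings) and the assumption that each pattern captures the heading text in a group are mine:

```python
import re


def split_by_headings(markdown_text: str, heading_patterns=None) -> dict[str, str]:
    """Split a markdown string into {section name: section body}.

    Sketch only: the real function is split_markdown_by_headings in
    src/structured_qa/preprocessing.py. Each pattern is assumed to capture
    the heading title in a regex group.
    """
    if heading_patterns is None:
        # Assumed defaults: level 1 and level 2 ATX headings.
        heading_patterns = [r"^# (.+)$", r"^## (.+)$"]
    combined = re.compile("|".join(heading_patterns))
    sections: dict[str, str] = {}
    current = None
    body: list[str] = []
    for line in markdown_text.splitlines():
        match = combined.match(line)
        if match:
            if current is not None:
                sections[current] = "\n".join(body).strip()
            # The first non-None group holds the captured heading text.
            current = next(g for g in match.groups() if g is not None)
            body = []
        elif current is not None:
            body.append(line)
    if current is not None:
        sections[current] = "\n".join(body).strip()
    return sections
```

Text before the first matching heading is discarded here, which is one possible design choice; the real function may handle it differently.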
## structured_qa.model_loaders

### load_llama_cpp_model(model_id)
Loads the given `model_id` using `Llama.from_pretrained`.
Examples:
>>> model = load_llama_cpp_model("allenai/OLMoE-1B-7B-0924-Instruct-GGUF/olmoe-1b-7b-0924-instruct-q8_0.gguf")
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model id to load. Format is expected to be | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `Llama` | `LlamaModel` | The loaded model. |
Source code in src/structured_qa/model_loaders.py
## structured_qa.workflow

### find_retrieve_answer(question, model, sections_dir, find_prompt, answer_prompt, max_sections_to_check=None)
Workflow to find the relevant section, retrieve the information, and answer the question.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The question to answer. | *required* |
| `model` | `LlamaModel` | The model to use for generating completions. | *required* |
| `sections_dir` | `str` | The directory containing the sections. See | *required* |
| `find_prompt` | `str` | The prompt for finding the section. See | *required* |
| `answer_prompt` | `str` | The prompt for answering the question. See | *required* |
| `max_sections_to_check` | `int \| None` | The maximum number of sections to check before giving up. If None, it will check up to a maximum of 20 sections until it finds the answer. | `None` |
Returns:

| Type | Description |
|---|---|
| `tuple[str, list[str]] \| tuple[None, list[str]]` | If the answer is found, the tuple contains the answer and the sections checked. If the answer is not found, the tuple contains None and the sections checked. |
Source code in src/structured_qa/workflow.py