# API Reference

## `structured_qa.preprocessing`
### `document_to_sections_dir(input_file, output_dir)`

Convert a document to a directory of sections.

Uses `pymupdf4llm` to convert `input_file` to markdown, then uses `langchain_text_splitters` to split the markdown into sections based on its headers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_file` | `str` | Path to the input document. | required |
| `output_dir` | `str` | Path to the directory where the section files are written. | required |
Returns:

| Type | Description |
|---|---|
| `list[str]` | List of section names. |
Source code in `src/structured_qa/preprocessing.py`
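The header-based splitting step can be sketched with a plain regex. This is a minimal illustration only: the real implementation uses `pymupdf4llm` for the document-to-markdown conversion and `langchain_text_splitters` for the header split, and the helper name `split_markdown_into_sections` below is hypothetical.

```python
import re
from pathlib import Path


def split_markdown_into_sections(markdown: str, output_dir: str) -> list[str]:
    """Toy stand-in for the header-based split: one file per markdown header."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    # Split on markdown headers (lines starting with one or more '#').
    # With a capture group, re.split keeps the headers in the result list:
    # [preamble, header1, body1, header2, body2, ...]
    parts = re.split(r"^(#+ .+)$", markdown, flags=re.MULTILINE)
    for header, body in zip(parts[1::2], parts[2::2]):
        name = header.lstrip("# ").strip().lower().replace(" ", "-")
        (out / f"{name}.txt").write_text(header + body, encoding="utf-8")
        names.append(name)
    return names


sections = split_markdown_into_sections(
    "# Intro\nHello.\n# Usage\nRun it.", "sections"
)
print(sections)  # ['intro', 'usage']
```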
## `structured_qa.model_loaders`

### `load_llama_cpp_model(model_id)`
Loads the given `model_id` using `Llama.from_pretrained`.

Examples:

```python
>>> model = load_llama_cpp_model("allenai/OLMoE-1B-7B-0924-Instruct-GGUF/olmoe-1b-7b-0924-instruct-q8_0.gguf")
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model id to load. Format is expected to be | required |
Returns:

| Name | Type | Description |
|---|---|---|
| `Llama` | `Llama` | The loaded model. |
Source code in `src/structured_qa/model_loaders.py`
## `structured_qa.workflow`

### `find_retrieve_answer(question, model, sections_dir, find_prompt, answer_prompt)`

Workflow to find the relevant section, retrieve the information, and answer the question.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The question to answer. | required |
| `model` | `Llama` | The `Llama` model to use for generating completions. | required |
| `sections_dir` | `str` | The directory containing the sections. See | required |
| `find_prompt` | `str` | The prompt for finding the section. See | required |
| `answer_prompt` | `str` | The prompt for answering the question. See | required |
Returns:

| Type | Description |
|---|---|
| `tuple[str, list[str]] \| tuple[None, list[str]]` | If the answer is found, the tuple contains the answer and the sections checked. If the answer is not found, the tuple contains `None` and the sections checked. |
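The find-then-answer loop described above can be sketched as follows. This is an illustrative sketch, not the library's actual implementation: the function name `find_retrieve_answer_sketch`, the stub model, the prompt templates, and the "I need more info" sentinel are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def find_retrieve_answer_sketch(question, model, sections_dir, find_prompt, answer_prompt):
    """Illustrative loop: ask the model for a section, read it, try to answer."""
    sections = {
        p.stem: p.read_text(encoding="utf-8")
        for p in Path(sections_dir).glob("*.txt")
    }
    checked = []
    while len(checked) < len(sections):
        # Step 1: ask the model which section is most relevant to the question.
        name = model(find_prompt.format(sections=sorted(sections), question=question)).strip()
        if name not in sections:
            return None, checked  # model picked something that does not exist
        checked.append(name)
        # Step 2: ask the model to answer using the retrieved section's content.
        answer = model(answer_prompt.format(context=sections[name], question=question))
        if answer != "I need more info":  # hypothetical "not answerable" sentinel
            return answer, checked
    return None, checked


# Stub standing in for the Llama model: picks a section, then answers from it.
def stub_model(prompt: str) -> str:
    if prompt.startswith("FIND"):
        return "usage"
    return "Run the tool."


tmp = tempfile.mkdtemp()
Path(tmp, "intro.txt").write_text("About the tool.", encoding="utf-8")
Path(tmp, "usage.txt").write_text("Run the tool.", encoding="utf-8")

answer, checked = find_retrieve_answer_sketch(
    "How do I run it?",
    stub_model,
    tmp,
    find_prompt="FIND a section for {question} among {sections}",
    answer_prompt="ANSWER {question} using {context}",
)
print(answer, checked)  # Run the tool. ['usage']
```

The `tuple[None, list[str]]` branch in the real signature corresponds to the two failure paths here: the model naming a nonexistent section, or exhausting every section without producing an answer.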