Project: Local LLM Entity Extractor¶
This project is a lightweight Python utility that transforms unstructured text into structured JSON data using a local Large Language Model (LLM) via Ollama.
🏗 System Architecture¶
1. Ollama Integration¶
The script interacts with the Ollama REST API (/api/generate). Ollama handles the heavy lifting of loading the model (Llama 3.1 8B) and performing inference locally on your hardware.
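Before running an extraction, it can help to confirm the server is actually reachable and the model has been pulled. The snippet below is a minimal sketch, not part of extract.py: it assumes the default port 11434 and uses Ollama's /api/tags endpoint, which lists locally available models.

import requests

def ollama_ready(model="llama3.1:8b", base_url="http://localhost:11434"):
    """Hypothetical helper: True if Ollama is up and the model is pulled."""
    try:
        # /api/tags lists the models available on this machine
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
        names = [m["name"] for m in resp.json().get("models", [])]
        return model in names
    except requests.exceptions.RequestException:
        return False

print(ollama_ready())  # False -> server down or model not pulled yet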
2. Key API Concepts¶
- JSON Mode: By setting `format: "json"`, the LLM is constrained to output only valid JSON, preventing conversational filler (a short parsing sketch follows this list).
- Synchronous Requests (`stream: False`): The script waits for the full extraction to complete before receiving the data. This ensures the output is a complete, parseable object.
- Prompt Engineering: Uses a "Strict Schema" injection to guide the AI on the exact key names (`people`, `dates`, `locations`) required.
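Note that the `response` field returned by the API (and therefore by `extract_entities()` below) is a JSON string, not a Python dict, so callers still need to parse it. A minimal sketch, assuming the model honoured the schema and that `extract_entities` from the script below is in scope; the sample sentence is purely illustrative.

import json

raw = extract_entities("Grace Hopper visited New York on June 9, 1952.")
try:
    data = json.loads(raw)            # the model's output is a JSON string
    print(data.get("people", []))     # -> list of extracted names
    print(data.get("dates", []))
    print(data.get("locations", []))
except json.JSONDecodeError:
    print("Model returned malformed JSON:", raw)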
3. Timeout & Error Resilience¶
Local LLM inference can be slow depending on system resources. The script includes:

- `timeout=30`: A network-level limit to prevent the script from hanging (an alternative timeout sketch follows this list).
- Exception Handling: Specifically catches `requests.exceptions.Timeout` to provide a user-friendly error message if the hardware is overloaded.
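If 30 seconds is too tight for slower hardware, `requests` also accepts a `(connect, read)` timeout tuple, so a dead server still fails fast while inference gets more headroom. This is a sketch of an alternative call, reusing `OLLAMA_URL` and `payload` from the script below; the 5/120 second split is an assumption, not a project default.

import requests

try:
    # 5 s to establish the connection, up to 120 s for generation to finish
    response = requests.post(OLLAMA_URL, json=payload, timeout=(5, 120))
    response.raise_for_status()
except requests.exceptions.ConnectTimeout:
    print("Error: Could not reach the Ollama server.")
except requests.exceptions.ReadTimeout:
    print("Error: Ollama is running, but inference exceeded the read timeout.")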
🚀 Quick Start¶
Prerequisites¶
- Ollama installed and running.
- Llama 3.1 8B model downloaded (`ollama pull llama3.1:8b`).
- Python `requests` library.
Execution¶
python extract.py "Your messy text here"
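With the server running, a call like the one below should print a single JSON object. The output shown is illustrative only; the exact values depend on the model, but the keys follow the schema injected into the prompt.

python extract.py "Ada Lovelace met Charles Babbage in London in June 1833."
{"people": ["Ada Lovelace", "Charles Babbage"], "dates": ["June 1833"], "locations": ["London"]}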
import requests
import json
import sys

# Global configuration
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.1:8b"

def extract_entities(text):
    # Define the strict schema
    schema = {
        "people": ["list of names"],
        "dates": ["list of dates"],
        "locations": ["list of locations"]
    }

    prompt = f"""
Extract names of people, dates, and locations from the following text.
Return the result strictly as a JSON object following this format: {json.dumps(schema)}
Do not include any preamble, explanations, or markdown formatting.
Only return valid JSON.

TEXT:
{text}
"""

    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        "format": "json"
    }

    try:
        response = requests.post(OLLAMA_URL, json=payload, timeout=30)
        response.raise_for_status()
        result = response.json()
        return result["response"]
    except requests.exceptions.Timeout:
        return "Error: Request timed out. Ollama server may be slow or unresponsive."
    except Exception as e:
        return f"Error: {e}"

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python extract.py 'your messy text here'")
    else:
        input_text = sys.argv[1]
        structured_data = extract_entities(input_text)
        print(structured_data)