A Python script that crawls conferences from <https://www.dm.unipi.it/research/past-conferences/> and processes them using a local-run LLM (we used [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)) to translate the natural language info to a more structured format (json).
## Installation
Download the LLM model to use, specifically we use [`Mistral-7B-Instruct-v0.2-GGUF`](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF):
The following command will crawl all the conferences from `https://www.dm.unipi.it/research/past-conferences/` (all pages) and save the results in `conferences.json` as a list of json objects, one per line.
And the LLM will complete this conversation with a json representation of the `{{ conference_html }}`. The first example is needed to show the model how to convert information from the input html data.