initial commit, reworked @BachoSeven's code
commit
8c6364277e
@ -0,0 +1,11 @@
|
||||
# Local files
|
||||
*.local*
|
||||
|
||||
# Python
|
||||
venv/
|
||||
|
||||
# Editors
|
||||
.vscode/
|
||||
|
||||
# LLM Models
|
||||
*.gguf
|
@ -0,0 +1,40 @@
|
||||
# Past Conferences Crawler (with LLM)
|
||||
|
||||
A Python script that crawls conferences from <https://www.dm.unipi.it/research/past-conferences/> and processes them using a local-run LLM (we used [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)) to translate the natural language info to a more structured format (json).
|
||||
|
||||
## Installation
|
||||
|
||||
Download the LLM model to use, specifically we use [`Mistral-7B-Instruct-v0.2-GGUF`](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF):
|
||||
|
||||
```bash
|
||||
# download the model, ~4GB
|
||||
$ wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
|
||||
```
|
||||
|
||||
Install python the requirements:
|
||||
|
||||
```bash
|
||||
# if you want create a venv
|
||||
$ python -m venv venv
|
||||
$ source venv/bin/activate
|
||||
|
||||
# enable gpu support for llama-cpp
|
||||
$ export CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
|
||||
|
||||
# install requirements
|
||||
$ pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Launch
|
||||
|
||||
The following command will crawl the conferences from `https://www.dm.unipi.it/research/past-conferences/` (pages 1 to 5) and save the results in `conferences.json`:
|
||||
|
||||
```bash
|
||||
$ python main.py
|
||||
```
|
||||
|
||||
The output is a list of json objects, one per line. To display the results with `jq`:
|
||||
|
||||
```bash
|
||||
$ jq -s '.' conferences.json
|
||||
```
|
@ -0,0 +1,14 @@
|
||||
beautifulsoup4==4.12.3
|
||||
bs4==0.0.2
|
||||
certifi==2023.11.17
|
||||
charset-normalizer==3.3.2
|
||||
diskcache==5.6.3
|
||||
idna==3.6
|
||||
Jinja2==3.1.3
|
||||
llama_cpp_python==0.2.36
|
||||
MarkupSafe==2.1.4
|
||||
numpy==1.26.3
|
||||
requests==2.31.0
|
||||
soupsieve==2.5
|
||||
typing_extensions==4.9.0
|
||||
urllib3==2.1.0
|
Loading…
Reference in New Issue