chore: updated readme, minor changes to the code

main
Antonio De Lucreziis 9 months ago
parent 2a29c9e09f
commit bb6af6cf11

@ -38,3 +38,47 @@ The output is a list of json objects, one per line. To display the results with
```bash
$ jq -s '.' conferences.json
```
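For readers without `jq`, the same "one JSON object per line" output can be collected into a single list in Python. This is only a minimal sketch: the helper name `parse_json_lines` is hypothetical and is not part of the repository; the file name `conferences.json` is the one used above.

```python
import json
from typing import Iterable


def parse_json_lines(lines: Iterable[str]) -> list:
    """Collect one JSON value per non-empty line into a list (like `jq -s '.'`)."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


# Possible usage, assuming conferences.json exists in the current directory:
#
#     with open("conferences.json", encoding="utf-8") as f:
#         print(json.dumps(parse_json_lines(f), indent=2, ensure_ascii=False))
```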
## Main idea / Explanation
We need to parse strings like the following:
```html
<p><a href="http://www.crm.sns.it/event/507/" target="_blank" rel="noreferrer
noopener">Statistical and Computational Aspects of Dynamics<br></a>Organized by Buddhima
Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi &#8211; SNS),
Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Centro De
Giorgi &#8211; SNS, Pisa. December 13 &#8211; 16, 2022.</p>
<p><a href="https://events.dm.unipi.it/event/126/" target="_blank" rel="noreferrer
noopener">Weekend di lavoro su Calcolo delle Variazioni<br></a>Organized by Giuseppe
Buttazzo, Maria Stella Gelli, and Aldo Pratelli. Grand Hotel Tettuccio Montecatini.
November 25 &#8211; 27, 2022.</p>
<p><a href="https://events.dm.unipi.it/event/109/" target="_blank" rel="noreferrer
noopener">Incontri di geometria algebrica ed aritmetica Milano Pisa<br></a>Department
of Mathematics, Pisa. November 16 &#8211; 17, 2022.</p>
```
We compose the following conversation with an LLM. Anything between `{{ ... }}` is a template placeholder that is replaced before generation starts.
```
INPUT:
<p><a href="http://www.crm.sns.it/event/507/" target="_blank" rel="noreferrer noopener">Statistical and Computational Aspects of Dynamics<br></a>Organized by Buddhima Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi &#8211; SNS), Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Centro De Giorgi &#8211; SNS, Pisa. December 13 &#8211; 16, 2022.</p>
OUTPUT:
{
"title": "Statistical and Computational Aspects of Dynamics",
"url": "http://www.crm.sns.it/event/507/",
"description": "Organized by Buddhima Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi SNS), Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Location: Centro De Giorgi - SNS, Pisa.",
"startDate": "2022-12-13",
"endDate": "2022-12-16"
}
INPUT:
{{ conference_html }}
OUTPUT:
```
The LLM then completes this conversation with a JSON representation of `{{ conference_html }}`. The first example shows the model how to map the information in the input HTML to the output fields.
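The template substitution described above could be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: `PROMPT_TEMPLATE`, `build_prompt` and `parse_completion` are hypothetical names, and the example input and output are treated as two extra placeholders so that the whole prompt comes from one template.

```python
import json
import re

# Hypothetical template mirroring the conversation shown above.
PROMPT_TEMPLATE = """INPUT:
{{ html_example }}
OUTPUT:
{{ output_example }}
INPUT:
{{ conference_html }}
OUTPUT:
"""


def build_prompt(html_example: str, output_example: str, conference_html: str) -> str:
    """Fill every `{{ name }}` placeholder in a single pass."""
    values = {
        "html_example": html_example,
        "output_example": output_example,
        "conference_html": conference_html,
    }
    # A replacement function avoids re-scanning inserted text for placeholders
    # and keeps backslashes in the HTML from being treated as escapes.
    return re.sub(r"\{\{ (\w+) \}\}", lambda m: values[m.group(1)], PROMPT_TEMPLATE)


def parse_completion(completion: str) -> dict:
    """Parse the model's completion, which is expected to be one JSON object."""
    data = json.loads(completion.strip())  # raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Validating the completion with `json.loads` is a design choice of this sketch: a model can return malformed JSON, so failing loudly on one entry is usually preferable to writing a broken line to the output file.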

@ -5,13 +5,21 @@ from bs4 import BeautifulSoup
 import requests
 import json
 OUTPUT_FILE = "conferences.json"
 HTML_EXAMPLE = r"""<p><a href="http://www.crm.sns.it/event/507/" target="_blank" rel="noreferrer noopener">Statistical and Computational Aspects of Dynamics<br></a>Organized by Buddhima Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi &#8211; SNS), Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Centro De Giorgi &#8211; SNS, Pisa. December 13 &#8211; 16, 2022.</p>"""
-OUTPUT_EXAMPLE = json.dumps(
-    { "title": "Statistical and Computational Aspects of Dynamics", "url": "http://www.crm.sns.it/event/507/", "description": "Organized by Buddhima Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi SNS), Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Location: Centro De Giorgi - SNS, Pisa.", "startDate": "2022-12-13", "endDate": "2022-12-16" }
-)
+OUTPUT_EXAMPLE = json.dumps({
+    "title": "Statistical and Computational Aspects of Dynamics",
+    "url": "http://www.crm.sns.it/event/507/",
+    "description": "Organized by Buddhima Kasun Fernando Akurugodage (Centro di ricerca matematica Ennio De Giorgi SNS), Paolo Giulietti, and Tanja Isabelle Schindler (Universität Wien, Austria). Location: Centro De Giorgi - SNS, Pisa.",
+    "startDate": "2022-12-13",
+    "endDate": "2022-12-16"
+})
 def translate_to_json(conference_html: str) -> str:
     llm_answer = llm.create_chat_completion(
