# Weather Web-Scraper

A backend that scrapes Italian weather websites to collect and compare weather
information for Pisa, and serves it as an API for a frontend that doesn't exist
yet.

## Usage

<details>
<summary>NixOS-specific setup</summary>

If you're using NixOS, you might have some trouble making `puppeteer` work,
as `puppeteer` ships with its own copy of chromium, which is a dynamically
linked binary.

Dynamically linked binaries built for other distributions do not run out of the
box on NixOS, and we must work around this to make `puppeteer` work. To do this,
we first install chromium separately, and then tell `puppeteer` to use this
chromium instead of its own.

You can either install chromium globally, and then get its path with
`which chromium`, OR...
you can temporarily install it in a `nix-shell`, which will keep chromium in the
store until the next garbage collection. Of course this is a dirty way of solving
this problem, because you'd be using a chromium which isn't technically installed
on your system and which will disappear once you garbage collect, but it's also
true that I don't really want chromium installed on my system sooooo... choose
your poison.

Once you have chromium installed (or at least present in the nix store), get its
path (with `which chromium` if it's installed, or in some hacky way otherwise)
and edit the `.env` file accordingly (remember to set `ON_NIX` to true as well).

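How the backend reads these settings is not shown here. As a minimal sketch, assuming a hypothetical `CHROMIUM_PATH` variable next to `ON_NIX`, the values could be turned into launch options for `puppeteer` (whose `launch` accepts an `executablePath` option) roughly like this:

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch().
// ON_NIX comes from the `.env` file described above; CHROMIUM_PATH is an
// assumed name for the variable that holds the chromium path.
function buildLaunchOptions(env) {
  const onNix = String(env.ON_NIX).toLowerCase() === "true";
  if (!onNix) {
    // Not on NixOS: let puppeteer use its bundled chromium.
    return {};
  }
  if (!env.CHROMIUM_PATH) {
    throw new Error("ON_NIX is true but CHROMIUM_PATH is not set");
  }
  // executablePath tells puppeteer which browser binary to start.
  return { executablePath: env.CHROMIUM_PATH };
}

// Example (the path is a placeholder, not a real store path).
const options = buildLaunchOptions({
  ON_NIX: "true",
  CHROMIUM_PATH: "/path/to/chromium",
});
console.log(options);
```

The resulting object would then be passed along as `puppeteer.launch(options)`.
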
</details>

Install the dependencies with

```bash
npm install
```

and then start the backend with

```bash
npm run serve
```

The backend will listen on the default port (3000) unless specified otherwise in
the `.env` file.

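The name of the variable is not stated above. Assuming it is called `PORT` (a hypothetical name), the fallback to 3000 could be sketched as:

```javascript
const DEFAULT_PORT = 3000;

// Returns the port from the environment, or the default when the value is
// missing or not a valid TCP port. `PORT` is an assumed variable name.
function resolvePort(env) {
  const parsed = Number.parseInt(env.PORT ?? "", 10);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return valid ? parsed : DEFAULT_PORT;
}

console.log(resolvePort({})); // 3000
console.log(resolvePort({ PORT: "8080" })); // 8080
```
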
The only available endpoint is `/`, which returns the latest update of the
scraped data in an object containing the data from all the sources, as well as
the timestamp of the update.

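A client could read the endpoint along these lines. This is only a sketch: it assumes the backend runs locally on the default port and that Node 18+ (with a global `fetch`) is available, and it assumes nothing about the response beyond it being JSON.

```javascript
// Fetches the latest scraped data from the backend's only endpoint.
// `fetchImpl` is injectable so the function can be tried without a server.
async function fetchLatest(baseUrl = "http://localhost:3000", fetchImpl = fetch) {
  const response = await fetchImpl(new URL("/", baseUrl));
  if (!response.ok) {
    throw new Error(`Backend answered with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage: print whatever top-level keys the backend returned.
// fetchLatest().then((data) => console.log(Object.keys(data)));
```
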
The script scrapes the weather forecast from the implemented sources (see the
table below) and returns an object with the following fields:

- **today:** an object with elements from the hour after the current one to
  23, of type `hourly` (see below)
- **tomorrow:** an object with elements from hours 0 to 23, of type `hourly`
  (see below)
- **dayAfterTomorrow:** an object with elements from hours 0 to 23, of type
  `hourly` (see below)
- **week:** an object with days 0 to 6, of type `daily`
- **format:** an object specifying the meaning of the `hourly` and `daily` types
  for the objects above

The keys of `today`, `tomorrow` and `dayAfterTomorrow` are hours of the day,
where `0` refers to the time from 0:00 to 0:59, and `23` refers to the time from
23:00 to 23:59.

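To make the convention concrete, here is a sketch (the function name is hypothetical) of which hour keys `today` would contain at a given moment, following the "hour after the current one to 23" rule:

```javascript
// Lists the hour keys expected in `today` for the given local time.
function todayHourKeys(now = new Date()) {
  const keys = [];
  for (let hour = now.getHours() + 1; hour <= 23; hour++) {
    keys.push(hour);
  }
  return keys;
}

// At 20:15 local time the remaining keys are 21, 22 and 23.
console.log(todayHourKeys(new Date(2024, 0, 1, 20, 15))); // [ 21, 22, 23 ]
```

Under this rule, `today` would be empty during the last hour of the day.
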
The keys of the `week` entry are offsets from today. That is, the object at `0`
holds the results for today, the object at `1` holds the results for tomorrow,
and the object at `6` holds the results for 6 days from now.

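As an illustration of the offset convention, a sketch (with a hypothetical function name) that maps a `week` key to the calendar date it refers to:

```javascript
// Returns the calendar date (local time, midnight) that a `week` key refers to.
function dateForWeekOffset(offset, today = new Date()) {
  if (!Number.isInteger(offset) || offset < 0 || offset > 6) {
    throw new RangeError("offset must be an integer from 0 to 6");
  }
  // The Date constructor normalizes overflowing days into the next month.
  return new Date(today.getFullYear(), today.getMonth(), today.getDate() + offset);
}

const start = new Date(2024, 0, 30); // 30 January 2024
console.log(dateForWeekOffset(0, start).toDateString()); // Tue Jan 30 2024
console.log(dateForWeekOffset(6, start).toDateString()); // Mon Feb 05 2024
```
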
Each source has its own format; the formats are specified below in the Sources
section (as well as in the object returned by the scraper).

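Because the formats differ, a consumer comparing sources can only rely on the fields they share. Here is a sketch of computing that intersection from `format` objects (the function name is hypothetical, and the two sample formats are abbreviated to field names only):

```javascript
// Returns the field names present in every given format for one kind
// of entry ("hourly" or "daily").
function commonFields(formats, kind) {
  const [first, ...rest] = formats.map((format) => Object.keys(format[kind] ?? {}));
  if (first === undefined) return [];
  return first.filter((field) => rest.every((fields) => fields.includes(field)));
}

const openMeteo = {
  hourly: { temperature: {}, precipitation: {}, precipitationProbability: {} },
};
const aeronautica = {
  hourly: { temperature: {}, precipitationProbability: {} },
};

console.log(commonFields([openMeteo, aeronautica], "hourly"));
// [ 'temperature', 'precipitationProbability' ]
```
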
### Sources

These are the sources that are currently implemented or will be implemented
eventually, together with their current level of implementation:

```
✅ = implemented
🚧 = partially implemented
⛔️ = not implemented
```

| Source                                         | Status | Comments                                        |
| ---------------------------------------------- | ------ | ----------------------------------------------- |
| [iLMeteo](https://www.ilmeteo.it)              | ✅     |                                                 |
| [3Bmeteo](https://www.3bmeteo.com/)            | 🚧     | Precipitation might not work as intended        |
| [OpenMeteo](https://open-meteo.com/)           | ✅     |                                                 |
| [Aeronautica Militare](http://www.meteoam.it/) | 🚧     | Week only has `[0..4]` instead of `[0..6]` days |

<details>
<summary>iLMeteo</summary>

Format:

```json
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    }
  }
}
```

</details>

<details>
<summary>3Bmeteo</summary>

Format:

```json
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "weatherCode": {
      "type": "string"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    }
  }
}
```

</details>

<details>
<summary>OpenMeteo</summary>

Format:

```json
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "precipitationProbability": {
      "type": "number",
      "unit": "%"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "weatherCode": {
      "type": "number",
      "unit": "WMO code"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    },
    "weatherCode": {
      "type": "number",
      "unit": "WMO code"
    }
  }
}
```

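The numeric `weatherCode` follows the WMO weather interpretation codes as documented by Open-Meteo. Here is a sketch of turning a few of them into text (the table is deliberately partial, and the names are hypothetical):

```javascript
// Partial lookup table of WMO weather interpretation codes.
const WMO_DESCRIPTIONS = new Map([
  [0, "Clear sky"],
  [1, "Mainly clear"],
  [2, "Partly cloudy"],
  [3, "Overcast"],
  [45, "Fog"],
  [61, "Slight rain"],
  [63, "Moderate rain"],
  [65, "Heavy rain"],
  [95, "Thunderstorm"],
]);

// Returns a human-readable label, or a fallback for codes not in the table.
function describeWeatherCode(code) {
  return WMO_DESCRIPTIONS.get(code) ?? `Unknown code ${code}`;
}

console.log(describeWeatherCode(3)); // Overcast
console.log(describeWeatherCode(999)); // Unknown code 999
```
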
</details>

<details>
<summary>Aeronautica Militare</summary>

Format:

```json
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationProbability": {
      "type": "number",
      "unit": "%"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    }
  }
}
```

</details>

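Since every source publishes its `format`, a consumer can check an `hourly` or `daily` entry against it. Here is a minimal sketch (the function name and the sample entry are hypothetical) that reports the fields whose JavaScript type does not match the declared `type`:

```javascript
// Returns the names of fields in `entry` whose runtime type differs from
// the `type` declared for them in `format`.
function findTypeMismatches(entry, format) {
  const mismatches = [];
  for (const [field, spec] of Object.entries(format)) {
    if (!(field in entry)) continue; // missing fields are not checked here
    if (typeof entry[field] !== spec.type) {
      mismatches.push(field);
    }
  }
  return mismatches;
}

const hourlyFormat = {
  temperature: { type: "number", unit: "°C" },
  precipitationProbability: { type: "number", unit: "%" },
};

console.log(
  findTypeMismatches({ temperature: 18.5, precipitationProbability: "40" }, hourlyFormat)
);
// [ 'precipitationProbability' ]
```
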