# Weather Web-Scraper
A backend that scrapes Italian weather websites to collect and compare weather
information for Pisa, and serves it as an API for a frontend that doesn't yet
exist.
## Usage
<details>
<summary>NixOS-specific setup</summary>
If you're using NixOS, you might have some trouble making `puppeteer` work,
as `puppeteer` ships with its own copy of chromium, which is dynamically linked.
Prebuilt dynamically linked binaries do not run out of the box on NixOS, and we
must work around this to make `puppeteer` work. To do this, we first install
chromium separately, and then tell `puppeteer` to use this chromium instead of
its own.
You can either install chromium globally, and then get its path with
`which chromium`, OR...
You can temporarily install it in a nix-shell, which will keep chromium in the
store until the next garbage collection. Of course this is a dirty way of solving
this problem because you'd be using a chromium which isn't technically installed
on your system, and it will disappear once you garbage collect, but it's also
true that I don't really want chromium installed on my system sooooo... choose
your poison.
Once you have chromium installed (or at least present in the nix store), get its
path (with `which chromium` if it's installed, or in some hacky way otherwise),
and edit the `.env` file accordingly (remember to set `ON_NIX` to true as well).
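As a rough sketch of how these settings might be consumed, the helper below builds the options object that would be passed to `puppeteer.launch`. `ON_NIX` is the flag mentioned above; the variable name `CHROMIUM_PATH` and the function `buildLaunchOptions` are assumptions made for illustration, so check the `.env` file for the real key.

```javascript
// Hypothetical helper: derive puppeteer launch options from the environment.
// ON_NIX is the flag mentioned above; CHROMIUM_PATH is an assumed variable name.
function buildLaunchOptions(env = process.env) {
  const options = {};
  if (env.ON_NIX === "true") {
    if (!env.CHROMIUM_PATH) {
      throw new Error("ON_NIX is set but no chromium path was provided");
    }
    // executablePath makes puppeteer run this binary instead of its bundled one.
    options.executablePath = env.CHROMIUM_PATH;
  }
  return options;
}

// Usage (requires puppeteer to be installed):
// const puppeteer = require("puppeteer");
// const browser = await puppeteer.launch(buildLaunchOptions());
```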
</details>
Install the dependencies with
```bash
npm install
```
and then start the backend with
```bash
npm run serve
```
The backend will listen on the default port (3000) unless specified otherwise in
the `.env` file.
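The port fallback described above could look roughly like this. The variable name `PORT` is an assumption (the `.env` file defines the actual key), and `resolvePort` is a hypothetical helper, not necessarily how the backend does it.

```javascript
// Default port stated in this README.
const DEFAULT_PORT = 3000;

// Hypothetical helper: use PORT (assumed name) when it is a valid TCP port,
// otherwise fall back to the default.
function resolvePort(env = process.env) {
  const parsed = Number.parseInt(env.PORT ?? "", 10);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return valid ? parsed : DEFAULT_PORT;
}
```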
The only available endpoint is `/`, which returns the latest update of the
scraped data as an object containing the data from all the sources, as well as
the timestamp of the update.
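A minimal client sketch for this endpoint, assuming the backend runs locally on the default port and answers with JSON. The helper name `fetchLatest` is hypothetical, and the exact key names inside the response are not assumed here.

```javascript
// Hypothetical client helper: request the latest scraped data from the backend.
// fetchImpl defaults to the global fetch available in Node 18+ and browsers.
async function fetchLatest(baseUrl = "http://localhost:3000", fetchImpl = fetch) {
  const response = await fetchImpl(new URL("/", baseUrl));
  if (!response.ok) {
    throw new Error(`Backend responded with HTTP ${response.status}`);
  }
  // The parsed body holds the data from all sources plus the update timestamp.
  return response.json();
}

// Usage:
// fetchLatest().then((data) => console.log(Object.keys(data)));
```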
The script scrapes the weather forecast from the implemented sources (see the
table below) and returns an object with the following fields:
- **today:** an object with elements from the hour after the current one to
23, of type `hourly` (see below)
- **tomorrow:** an object with elements from hours 0 to 23, of type `hourly`
(see below)
- **dayAfterTomorrow:** an object with elements from hours 0 to 23, of type
`hourly` (see below)
- **week:** an object with days 0 to 6, of type `daily`
- **format:** an object specifying the meaning of the `hourly` and `daily` types
for the objects above
The keys for `today`, `tomorrow` and `dayAfterTomorrow` are intended as hours
where `0` refers to the time from 0:00 to 0:59, and `23` refers to the time from
23:00 to 23:59.
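To make the hour keys concrete, here is a small sketch that lists the keys one would expect in `today` and turns a key into the time range it covers. Both function names are hypothetical, and the sketch assumes the scraper uses the same local time as the machine running this code.

```javascript
// Hours expected as keys of `today`: from the hour after `now` up to 23.
function expectedTodayKeys(now = new Date()) {
  const keys = [];
  for (let hour = now.getHours() + 1; hour <= 23; hour++) {
    keys.push(hour);
  }
  return keys;
}

// Time range covered by an hour key, e.g. 0 covers "0:00 to 0:59".
function hourKeyToRange(key) {
  const hour = Number(key);
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`Invalid hour key: ${key}`);
  }
  return `${hour}:00 to ${hour}:59`;
}
```

Note that at 23:xx the list of expected keys for `today` is empty, since no later hour is left in the day.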
The keys for the `week` entry are intended as an offset from today. That is, the
object at `0` will be the results for today, the object at `1` will be the
results for tomorrow, and the object at `6` will be the results for 6 days from
now.
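The offset convention can be sketched as a conversion from a `week` key to a calendar date. The helper `dateForWeekOffset` is hypothetical and works in the local time zone of the machine running it.

```javascript
// Convert a `week` key (0..6, as a number or numeric string) to a calendar date.
function dateForWeekOffset(key, today = new Date()) {
  const offset = Number(key);
  if (!Number.isInteger(offset) || offset < 0 || offset > 6) {
    throw new RangeError(`Invalid week offset: ${key}`);
  }
  // Start from local midnight so the result carries no time-of-day component.
  const date = new Date(today.getFullYear(), today.getMonth(), today.getDate());
  // setDate rolls over month and year boundaries automatically.
  date.setDate(date.getDate() + offset);
  return date;
}
```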
Each source has its own format, and they are specified below in the sources
section (as well as in the object returned by the scraper).
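One way a consumer might use the `format` object is to label values with their units. The sketch below assumes `format` has the shape documented in the sources section (`hourly`/`daily`, each field with a `type` and an optional `unit`) and that `type` matches JavaScript's `typeof` names; `describeValue` is a hypothetical helper.

```javascript
// Hypothetical helper: render a value using the source's own format descriptor.
function describeValue(format, kind, field, value) {
  const spec = format?.[kind]?.[field];
  if (!spec) {
    throw new Error(`Unknown field ${kind}.${field}`);
  }
  // Assumption: "number" and "string" in the descriptor map to typeof results.
  if (typeof value !== spec.type) {
    throw new TypeError(`Expected ${spec.type} for ${kind}.${field}`);
  }
  return spec.unit ? `${value} ${spec.unit}` : String(value);
}

// Example with a fragment of a format descriptor.
const exampleFormat = {
  hourly: {
    temperature: { type: "number", unit: "°C" },
    weatherCode: { type: "string" },
  },
};
console.log(describeValue(exampleFormat, "hourly", "temperature", 21.5));
```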
### Sources
These are the sources that are currently implemented or will be implemented
eventually, together with their current level of implementation:
```
✅ = implemented
🚧 = partially implemented
⛔️ = not implemented
```
| Source | Status | Comments |
| ---------------------------------------------- | ------ | ----------------------------------------------- |
| [iLMeteo](https://www.ilmeteo.it) | ✅ | |
| [3Bmeteo](https://www.3bmeteo.com/) | 🚧 | Precipitation might not work as intended |
| [OpenMeteo](https://open-meteo.com/) | ✅ | |
| [Aeronautica Militare](http://www.meteoam.it/) | 🚧 | Week only has `[0..4]` instead of `[0..6]` days |
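Since a partially implemented source can return a shorter `week` (Aeronautica Militare only covers days `0` to `4`), a consumer may want to check which offsets are missing before comparing sources. A minimal sketch, with the hypothetical helper `missingWeekOffsets`:

```javascript
// List the week offsets (0..expectedDays-1) that a source did not provide.
function missingWeekOffsets(week, expectedDays = 7) {
  const missing = [];
  for (let offset = 0; offset < expectedDays; offset++) {
    if (week?.[offset] === undefined) {
      missing.push(offset);
    }
  }
  return missing;
}

// Example: a week object that only covers offsets 0 to 4.
const shortWeek = { 0: {}, 1: {}, 2: {}, 3: {}, 4: {} };
console.log(missingWeekOffsets(shortWeek));
```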
<details>
<summary>iLMeteo</summary>
Format:
```json
{
"hourly": {
"temperature": {
"type": "number",
"unit": "°C"
},
"precipitation": {
"type": "number",
"unit": "mm"
},
"apparentTemperature": {
"type": "number",
"unit": "°C"
}
},
"daily": {
"minimumTemperature": {
"type": "number",
"unit": "°C"
},
"maximumTemperature": {
"type": "number",
"unit": "°C"
},
"minimumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"maximumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"precipitationSum": {
"type": "number",
"unit": "mm"
}
}
}
```
</details>
<details>
<summary>3Bmeteo</summary>
Format:
```json
{
"hourly": {
"temperature": {
"type": "number",
"unit": "°C"
},
"precipitation": {
"type": "number",
"unit": "mm"
},
"apparentTemperature": {
"type": "number",
"unit": "°C"
},
"weatherCode": {
"type": "string"
}
},
"daily": {
"minimumTemperature": {
"type": "number",
"unit": "°C"
},
"maximumTemperature": {
"type": "number",
"unit": "°C"
},
"minimumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"maximumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"precipitationSum": {
"type": "number",
"unit": "mm"
}
}
}
```
</details>
<details>
<summary>OpenMeteo</summary>
Format:
```json
{
"hourly": {
"temperature": {
"type": "number",
"unit": "°C"
},
"precipitation": {
"type": "number",
"unit": "mm"
},
"precipitationProbability": {
"type": "number",
"unit": "%"
},
"apparentTemperature": {
"type": "number",
"unit": "°C"
},
"weatherCode": {
"type": "number",
"unit": "WMO code"
}
},
"daily": {
"minimumTemperature": {
"type": "number",
"unit": "°C"
},
"maximumTemperature": {
"type": "number",
"unit": "°C"
},
"minimumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"maximumApparentTemperature": {
"type": "number",
"unit": "°C"
},
"precipitationSum": {
"type": "number",
"unit": "mm"
},
"weatherCode": {
"type": "number",
"unit": "WMO code"
}
}
}
```
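The numeric `weatherCode` follows the WMO weather interpretation codes. As an illustration, a partial lookup for a few common codes could look like the sketch below; the table is deliberately incomplete, and `describeWeatherCode` is a hypothetical helper.

```javascript
// Partial table of WMO weather interpretation codes (not exhaustive).
const WMO_DESCRIPTIONS = {
  0: "Clear sky",
  1: "Mainly clear",
  2: "Partly cloudy",
  3: "Overcast",
  45: "Fog",
  61: "Slight rain",
  63: "Moderate rain",
  65: "Heavy rain",
  95: "Thunderstorm",
};

// Return a description for a code, or a fallback for codes not in the table.
function describeWeatherCode(code) {
  return WMO_DESCRIPTIONS[code] ?? `Unknown WMO code ${code}`;
}
```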
</details>
<details>
<summary>Aeronautica Militare</summary>
Format:
```json
{
"hourly": {
"temperature": {
"type": "number",
"unit": "°C"
},
"precipitationProbability": {
"type": "number",
"unit": "%"
}
},
"daily": {
"minimumTemperature": {
"type": "number",
"unit": "°C"
},
"maximumTemperature": {
"type": "number",
"unit": "°C"
}
}
}
```
</details>
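Because the formats differ, comparing sources only makes sense on the fields they share. The sketch below computes that intersection from the format descriptors; the helper `commonFields` and the source keys used in the example object are hypothetical, and the field specs are left empty for brevity.

```javascript
// Fields of the given kind ("hourly" or "daily") present in every source format.
function commonFields(formats, kind) {
  const fieldLists = Object.values(formats).map((format) =>
    Object.keys(format[kind] ?? {})
  );
  if (fieldLists.length === 0) {
    return [];
  }
  return fieldLists.reduce((shared, fields) =>
    shared.filter((name) => fields.includes(name))
  );
}

// Hourly field names taken from the four formats documented above.
const hourlyFormats = {
  iLMeteo: { hourly: { temperature: {}, precipitation: {}, apparentTemperature: {} } },
  "3Bmeteo": { hourly: { temperature: {}, precipitation: {}, apparentTemperature: {}, weatherCode: {} } },
  OpenMeteo: { hourly: { temperature: {}, precipitation: {}, precipitationProbability: {}, apparentTemperature: {}, weatherCode: {} } },
  AeronauticaMilitare: { hourly: { temperature: {}, precipitationProbability: {} } },
};
console.log(commonFields(hourlyFormats, "hourly"));
```

With the formats listed above, `temperature` is the only hourly field available from all four sources.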