Weather Web-Scraper
A backend that scrapes Italian weather websites to collect and compare weather information for Pisa, and serves it as an API for a frontend that doesn't exist yet.
Usage
NixOS-specific setup
If you're using NixOS, you might have some trouble making puppeteer work, as puppeteer ships with its own copy of chromium, which is dynamically linked. Prebuilt dynamically linked binaries do not work out of the box on NixOS, because the libraries they expect are not at the standard paths, and we must work around this to make puppeteer work. To do this, we first install chromium separately, and then tell puppeteer to use this chromium instead of its own.
You can either install chromium globally, and then get its path with `which chromium`, OR...
You can temporarily install it in a nix-shell, which will put chromium in the store until the next garbage collection. Of course this is a dirty way of solving this problem, because you'd be using a chromium which isn't technically installed on your system and which will disappear once you garbage collect, but it's also true that I don't really want chromium installed on my system sooooo... choose your poison.
Once you have chromium installed (or at least present in the nix store), get its path (with `which chromium` if it's installed, or in some hacky way otherwise), and edit the `.env` file accordingly (remember to set `ON_NIX` to `true` as well).
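As a rough sketch of how these two settings could be turned into puppeteer launch options: `ON_NIX` is the variable named above, while `CHROMIUM_PATH` is a hypothetical name for the variable holding the path (check `.env.example` for the real one). The helper itself is illustrative and not taken from the project's code.

```javascript
// Sketch only: CHROMIUM_PATH is a hypothetical variable name, not taken from the project.
function buildLaunchOptions(env) {
  const options = { headless: true };
  if (env.ON_NIX === 'true') {
    if (!env.CHROMIUM_PATH) {
      throw new Error('ON_NIX is true but CHROMIUM_PATH is not set');
    }
    // executablePath makes puppeteer run this binary instead of its bundled chromium.
    options.executablePath = env.CHROMIUM_PATH;
  }
  return options;
}

// Usage (requires puppeteer to be installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(process.env));
```

Failing early when the path is missing gives a clearer error than letting puppeteer fall back to its bundled chromium and crash on launch.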
Install the dependencies with
npm install
and then start the backend with
npm run serve
The backend will listen on the default port (3000) unless specified otherwise in the `.env` file.
The only available endpoint is `/`, which returns the latest update of the scraped data, in an object containing the data from all the sources, as well as the timestamp of the update.
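A minimal sketch of a client for this endpoint, assuming Node 18 or later (for the global `fetch`) and assuming the port variable in `.env` is called `PORT`, which is a guess. The function names are illustrative.

```javascript
// Sketch only: PORT is an assumed variable name; 3000 is the default stated above.
function endpointUrl(env) {
  const port = Number(env.PORT) || 3000;
  return `http://localhost:${port}/`;
}

// fetchImpl is injectable so the function can be exercised without a running server.
async function fetchLatest(env, fetchImpl = fetch) {
  const response = await fetchImpl(endpointUrl(env));
  if (!response.ok) {
    throw new Error(`Backend responded with status ${response.status}`);
  }
  return response.json();
}

// Usage, with the backend running:
//   fetchLatest(process.env).then((data) => console.log(Object.keys(data)));
```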
The script scrapes the weather forecast from the implemented sources (see the table below) and returns an object with the following fields:
- `today`: an object with elements from the hour after the current one to 23, of type `hourly` (see below)
- `tomorrow`: an object with elements from hours 0 to 23, of type `hourly` (see below)
- `dayAfterTomorrow`: an object with elements from hours 0 to 23, of type `hourly` (see below)
- `week`: an object with days 0 to 6, of type `daily`
- `format`: an object specifying the meaning of the `hourly` and `daily` types for the objects above
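To illustrate how the `format` entry can be used together with the data entries, here is a small sketch that attaches units to values. `sampleSource` stands in for one source's object and its numbers are made up; the helper name is hypothetical.

```javascript
// Sketch only: `sampleSource` mimics one source's object; the values are invented.
function formatValue(sourceData, kind, field, value) {
  const spec = sourceData.format[kind][field];
  if (!spec) {
    throw new Error(`Unknown ${kind} field: ${field}`);
  }
  // Some fields (for example a string weatherCode) carry no unit.
  return spec.unit ? `${value} ${spec.unit}` : String(value);
}

const sampleSource = {
  today: { 22: { temperature: 14 }, 23: { temperature: 13 } },
  week: { 0: { maximumTemperature: 18 } },
  format: {
    hourly: { temperature: { type: 'number', unit: '°C' } },
    daily: { maximumTemperature: { type: 'number', unit: '°C' } },
  },
};

for (const [hour, entry] of Object.entries(sampleSource.today)) {
  console.log(`${hour}:00`, formatValue(sampleSource, 'hourly', 'temperature', entry.temperature));
}
```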
The keys for `today`, `tomorrow` and `dayAfterTomorrow` are intended as hours, where `0` refers to the time from 0:00 to 0:59, and `23` refers to the time from 23:00 to 23:59.
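The hour-key convention can be sketched as below. Both helpers are hypothetical, and they use the local time of the machine they run on, which may differ from the time zone the backend uses.

```javascript
// Keys expected under `today`: the hour after the current one, up to 23.
function expectedTodayKeys(now = new Date()) {
  const keys = [];
  for (let hour = now.getHours() + 1; hour <= 23; hour += 1) {
    keys.push(hour);
  }
  return keys;
}

// Interval covered by an hour key: the full hour starting at that key.
function hourKeyToInterval(key) {
  const hour = Number(key);
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`Invalid hour key: ${key}`);
  }
  return `${hour}:00 to ${hour}:59`;
}
```

Note that at 23:xx `expectedTodayKeys` returns an empty list, so a client should be prepared for `today` to have no entries late in the evening.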
The keys for the `week` entry are intended as an offset from today. That is, the object at `0` will be the results for today, the object at `1` will be the results for tomorrow, and the object at `6` will be the results for 6 days from now.
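A short sketch of turning a `week` offset back into a calendar date. The helper name is made up, and it works in the local time zone of the machine running it.

```javascript
// Convert a `week` key (offset in days from today) to a calendar date at local midnight.
function dateForWeekOffset(offset, today = new Date()) {
  if (!Number.isInteger(offset) || offset < 0 || offset > 6) {
    throw new RangeError(`Invalid week offset: ${offset}`);
  }
  // The Date constructor normalises day overflow, so the 30th + 6 rolls into the next month.
  return new Date(today.getFullYear(), today.getMonth(), today.getDate() + offset);
}

// Usage:
//   dateForWeekOffset(1).toDateString(); // tomorrow's date
```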
Each source has its own format. The formats are specified below in the Sources section (as well as in the object returned by the scraper).
Sources
These are the sources that are currently implemented or will be implemented eventually, together with their current level of implementation:
✅ = implemented
🚧 = partially implemented
⛔️ = not implemented
Source | Status | Comments |
---|---|---|
iLMeteo | ✅ | |
3Bmeteo | 🚧 | Precipitation might not work as intended |
OpenMeteo | ✅ | |
Aeronautica Militare | 🚧 | Week only has [0..4] instead of [0..6] days |
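Since sources differ in coverage (Aeronautica Militare only provides days 0 to 4), a comparison across sources has to tolerate missing days. The sketch below shows one way to do that. The source keys and numbers in `sampleSources` are illustrative, not the backend's real output.

```javascript
// Sketch only: the source keys and values in `sampleSources` are invented for illustration.
function compareDaily(sources, offset, field) {
  const result = {};
  for (const [name, data] of Object.entries(sources)) {
    const day = data.week ? data.week[offset] : undefined;
    // Skip sources that do not cover this day or do not report this field.
    if (day && typeof day[field] === 'number') {
      result[name] = day[field];
    }
  }
  return result;
}

const sampleSources = {
  openMeteo: { week: { 0: { maximumTemperature: 18 }, 5: { maximumTemperature: 21 } } },
  aeronauticaMilitare: { week: { 0: { maximumTemperature: 17 } } },
};

console.log(compareDaily(sampleSources, 5, 'maximumTemperature'));
```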
iLMeteo
Format:
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    }
  }
}
3Bmeteo
Format:
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "weatherCode": {
      "type": "string"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    }
  }
}
OpenMeteo
Format:
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitation": {
      "type": "number",
      "unit": "mm"
    },
    "precipitationProbability": {
      "type": "number",
      "unit": "%"
    },
    "apparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "weatherCode": {
      "type": "number",
      "unit": "WMO code"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "minimumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumApparentTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationSum": {
      "type": "number",
      "unit": "mm"
    },
    "weatherCode": {
      "type": "number",
      "unit": "WMO code"
    }
  }
}
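The numeric `weatherCode` is a WMO weather interpretation code. A client could translate it to text along these lines. The table below is deliberately partial and follows the descriptions in the Open-Meteo documentation; the helper name is hypothetical and nothing here comes from this project's code.

```javascript
// Partial table of WMO weather interpretation codes (descriptions as listed by Open-Meteo).
const WMO_DESCRIPTIONS = new Map([
  [0, 'Clear sky'],
  [1, 'Mainly clear'],
  [2, 'Partly cloudy'],
  [3, 'Overcast'],
  [45, 'Fog'],
  [61, 'Slight rain'],
  [63, 'Moderate rain'],
  [65, 'Heavy rain'],
  [95, 'Thunderstorm'],
]);

// Falls back to a generic label for codes missing from the partial table.
function describeWeatherCode(code) {
  return WMO_DESCRIPTIONS.get(Number(code)) ?? `Unknown code ${code}`;
}
```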
Aeronautica Militare
Format:
{
  "hourly": {
    "temperature": {
      "type": "number",
      "unit": "°C"
    },
    "precipitationProbability": {
      "type": "number",
      "unit": "%"
    }
  },
  "daily": {
    "minimumTemperature": {
      "type": "number",
      "unit": "°C"
    },
    "maximumTemperature": {
      "type": "number",
      "unit": "°C"
    }
  }
}
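Because every source publishes its own format, a client can use it to sanity-check the entries it receives. The sketch below compares one entry against a format section using `typeof`. The helper name and the sample entry are made up, and whether the real scrapers ever omit fields or return `null` is not something this README specifies.

```javascript
// Sketch only: reports fields that are missing or whose JS type differs from the format's "type".
function findFormatViolations(entry, spec) {
  const problems = [];
  for (const [field, { type }] of Object.entries(spec)) {
    if (!(field in entry)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof entry[field] !== type) {
      problems.push(`${field}: expected ${type}, got ${typeof entry[field]}`);
    }
  }
  return problems;
}

// The hourly section of the Aeronautica Militare format shown above.
const aeronauticaHourly = {
  temperature: { type: 'number', unit: '°C' },
  precipitationProbability: { type: 'number', unit: '%' },
};

// Invented entry with a string where a number is expected.
console.log(findFormatViolations({ temperature: 12, precipitationProbability: '30' }, aeronauticaHourly));
```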