mirror of https://github.com/tiyn/stud.ip-crawler.git synced 2026-02-22 06:34:48 +01:00

database: file ids and chdates are stored

- mysql: creates the database and tables on the given MySQL server if they do not already exist
- mysql: reads the last change values from the database
- mysql: saves the ch_date after downloading
- run: now takes care of the variables for mysql and studip
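The id/chdate bookkeeping described in these bullets can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: it uses Python's built-in `sqlite3` in place of the MySQL server the crawler talks to, so the example is self-contained, and the table and column names (`files`, `id`, `chdate`) are assumptions based on the commit message.

```python
import sqlite3

# In-memory stand-in for the MySQL database the crawler uses.
conn = sqlite3.connect(":memory:")

def ensure_table(conn):
    # Mirrors the "creates database and tables if not existent" step.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (id TEXT PRIMARY KEY, chdate INTEGER)"
    )

def get_chdate(conn, file_id):
    # Read the last stored change date for a file, or None if unseen.
    row = conn.execute(
        "SELECT chdate FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None

def save_chdate(conn, file_id, chdate):
    # Store the new chdate after a successful download.
    conn.execute(
        "INSERT OR REPLACE INTO files (id, chdate) VALUES (?, ?)",
        (file_id, chdate),
    )

ensure_table(conn)
save_chdate(conn, "abc123", 1591526948)
print(get_chdate(conn, "abc123"))  # 1591526948
```

On a subsequent run, `get_chdate` returns the stored value, which the crawler can compare against the server-side chdate to decide whether a file needs re-downloading.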
This commit is contained in:
TiynGER
2020-06-07 12:49:08 +02:00
parent 6d18baa8b6
commit fa36e0f29e
8 changed files with 286 additions and 211 deletions

@@ -13,28 +13,30 @@ If you run the program again it only downloads files that have changed since the
 - [x] Specify Stud.IP-URL
 - [x] Specify output directory
 - [x] Specify chunk size to download big files
+- [x] Specify all important database variables
 - [x] Only download files after given date
 - [x] Save and read download date
 - [x] Possible reset of download date
-- [ ] Incremental file download
-- [ ] Indexing downloaded files and folders
+- [x] Incremental file download
+- [x] Store id and chdate of downloaded files
 - [ ] Logging
 - [x] Console log
 - [ ] Log file
 ## Installation
 - create an instance of
 - `git clone https://github.com/tiyn/studip-crawler`
-- `cd studip-crawler`
+- `cd studip-crawler/src/`
 - `pip3 install -r requirements` - install dependencies
 ## Usage
-Just run the file via `python3 crawler.py [options]`.
-Alternatively to `python3 crawler.py` you can give yourself permissions using `chmod +x crawler.py [options]` and
-run it with `./crawler.py [options]`.
+Just run the file via `python3 run.py [options]`.
+Alternatively to `python3 run.py` you can give yourself permissions using `chmod +x run.py` and
+run it with `./run.py [options]`.
 Several options are required for the crawler to work.
-Run `python3 crawler.py -h` for a help menu and see which ones are important for you.
+Run `python3 run.py -h` for a help menu and see which ones are important for you.
 ## Tested StudIP instances
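The incremental behaviour noted in the hunk above (re-download only files whose change date differs from the stored one) reduces to a small comparison. A minimal sketch, assuming the crawler tracks one integer chdate per file id; the function and variable names here are hypothetical, not taken from the repository:

```python
from typing import Optional

def needs_download(stored_chdate: Optional[int], remote_chdate: int) -> bool:
    # Fetch a file when it was never seen before, or when the
    # server-side change date is newer than the stored one.
    return stored_chdate is None or remote_chdate > stored_chdate

# Example: "b" is new, "a" is unchanged since the last run.
remote_files = [
    {"id": "a", "chdate": 100},
    {"id": "b", "chdate": 200},
]
stored = {"a": 100}
to_fetch = [
    f["id"] for f in remote_files
    if needs_download(stored.get(f["id"]), f["chdate"])
]
print(to_fetch)  # ['b']
```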