bash - Mirror a multi-page site with lftp
I need to mirror data hosted on a web site on a regular basis, and I'm trying to use lftp (version 4.0.9), which does a great job at this task. The site I'm downloading from has multiple pages (I intend to loop over the most recent n pages in a bash script that runs several times a day). I can't work out how to get lftp to accept the page parameter. I've had no luck searching for a solution online, and everything I have tried has failed so far.
This works perfectly:
lftp -c 'mirror -v -i "s1a" -P 4 https://qc.sentinel1.eo.esa.int/aux_resorb/'
This does not:
lftp -c 'mirror -v -i "s1a" -P 4 https://qc.sentinel1.eo.esa.int/aux_resorb/?page=2'
It gives the error:
mirror: access failed: 404 Not Found (/aux_resorb/?page=2)
I tried passing the new URL in a variable, but that didn't work either. I'd be grateful for any suggestions on how to solve this issue.
Before it is suggested: I know wget is an option, and pagination works with it (I tested it), but I don't want to use it because it is less appropriate. It wastes a lot of time fetching "index.html?param=value" files and then removing them, which isn't feasible given the number of pages.
The problem is that lftp's mirror command adds a slash to the given URL when requesting the page (see below). So it boils down to how the remote end handles URLs and whether it gets upset by the trailing slash. In my tests, Drupal sites, for example, did not like the trailing slash and returned 404, while some other sites worked fine. Unfortunately, I was not able to figure out a workaround if you insist on using lftp.
Tests
I tried the following requests against a web server:
1. lftp -c 'mirror -v http://example/path'
2. lftp -c 'mirror -v http://example/path/?page=2'
3. lftp -c 'mirror -v http://example/path/file'
4. lftp -c 'mirror -v http://example/path/file?page=2'
These commands resulted in the following HEAD requests, as seen by the web server:
1. HEAD /path/
2. HEAD /path/%3fpage=2/
3. HEAD /path/file/
4. HEAD /path/file%3fpage=2/
Note that there's a trailing slash in every request. %3f is the URL-encoded form of the character ?.
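The rewriting seen in the four test requests can be mimicked with a few lines of bash, which makes it easier to predict what a given server will receive. This is only a sketch of the observed behaviour, not lftp's actual implementation. The function name observed_request_path is hypothetical, and the rule "append a slash unless the path already ends in one" is an assumption inferred from those four cases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reproduces the path rewriting observed in the tests.
# It is inferred from four sample requests and is not taken from lftp itself.
observed_request_path() {
    local path=$1
    # The '?' showed up percent-encoded as %3f in the request path.
    path=${path//\?/%3f}
    # A trailing slash was present in every request (assumed: only added
    # when the path does not already end in one).
    [[ $path == */ ]] || path+=/
    printf '%s\n' "$path"
}

# Quote the arguments so the shell does not treat '?' as a glob character.
for p in '/path' '/path/?page=2' '/path/file' '/path/file?page=2'; do
    printf 'HEAD %s\n' "$(observed_request_path "$p")"
done
```

Because the query string ends up inside the path, the server sees a request for a directory literally named "?page=2" rather than page 2 of the listing, which is consistent with the 404 reported in the question.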