bash - Mirror multiple page site with lftp


I need to mirror data hosted on a web site on a regular basis, and I'm trying to use lftp (version 4.0.9), which does a great job at this task. The site I'm downloading from has multiple pages (I intend to loop over the most recent n pages in a bash script that runs several times a day). I can't work out how to get lftp to accept the page parameter. I've had no luck searching for a solution online, and everything I have tried has failed so far.

This works perfectly:

lftp -c 'mirror -v -i "s1a" -p 4 https://qc.sentinel1.eo.esa.int/aux_resorb/' 

This does not:

lftp -c 'mirror -v -i "s1a" -p 4 https://qc.sentinel1.eo.esa.int/aux_resorb/?page=2' 

It gives this error:

mirror: access failed: 404 not found (/aux_resorb/?page=2) 

I tried passing the new URL in a variable, but that didn't work either. I'd be grateful for any suggestions to solve this issue.

Before it is suggested: I know wget is an option and that pagination works with it (I tested it), but I don't want to use it because it is less appropriate here. It wastes a lot of time fetching the "index.html?param=value" files and then removing them, which, given the number of pages, isn't feasible.

The problem is that lftp's mirror command adds a slash to the given URL when requesting the page (see the tests below). So it boils down to how the remote end handles URLs and whether it gets upset by the trailing slash. In my tests, Drupal sites, for example, did not like the trailing slash and returned a 404, while some other sites worked fine. Unfortunately, I was not able to figure out a workaround if you insist on using lftp.
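This does not fix mirror itself, but one possible way to sidestep it is to fetch each listing page with a tool that sends the query string unchanged and hand only the file URLs to lftp. The following is a minimal sketch under several assumptions: curl is available, the listing exposes files as plain `<a href="...">` links, and page numbers start at 1. `DEST` and the function names `extract_links`, `resolve_url`, and `mirror_pages` are hypothetical; only the base URL and the "s1a" pattern come from the question.

```bash
#!/usr/bin/env bash
# Sketch only: curl fetches the paginated listings, lftp downloads the files.
# BASE_URL and PATTERN come from the question; DEST is a hypothetical target.

BASE_URL='https://qc.sentinel1.eo.esa.int/aux_resorb/'
PATTERN='s1a'
DEST='./aux_resorb'

# Read HTML on stdin; print the href targets that contain the pattern in $1.
# Assumes files are linked as plain <a href="..."> elements.
extract_links() {
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//' | grep -i -- "$1" || true
}

# Turn a link taken from the listing into an absolute URL.
resolve_url() {
    local origin
    origin=$(printf '%s\n' "$BASE_URL" | sed -E 's#^([a-z]+://[^/]+).*#\1#')
    case $1 in
        *://*) printf '%s\n' "$1" ;;             # already absolute
        /*)    printf '%s\n' "${origin}$1" ;;    # relative to the host
        *)     printf '%s\n' "${BASE_URL}$1" ;;  # relative to the listing
    esac
}

# Download matching files from the first $1 listing pages.
# Assumes page numbering starts at 1; adjust if the site starts at 0.
mirror_pages() {
    local pages=$1 page link
    mkdir -p "$DEST"
    for ((page = 1; page <= pages; page++)); do
        curl -fsS "${BASE_URL}?page=${page}" | extract_links "$PATTERN" |
        while IFS= read -r link; do
            # Skip anything already present (this includes partial files
            # left behind by an interrupted run).
            [ -e "$DEST/${link##*/}" ] && continue
            lftp -c "get -O \"$DEST\" \"$(resolve_url "$link")\""
        done
    done
}
```

After sourcing the file, `mirror_pages 3` would walk the three most recent pages. The parallelism that `mirror -p 4` provides is lost here, since each file is fetched by its own lftp invocation.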

Tests

I tried the following requests against a web server:

1. lftp -c 'mirror -v http://example/path'
2. lftp -c 'mirror -v http://example/path/?page=2'
3. lftp -c 'mirror -v http://example/path/file'
4. lftp -c 'mirror -v http://example/path/file?page=2'

These commands resulted in the following HEAD requests as seen by the web server:

1. HEAD /path/
2. HEAD /path/%3fpage=2/
3. HEAD /path/file/
4. HEAD /path/file%3fpage=2/

Note that there's a trailing slash in every request. %3f is the URL-encoded form of the character ?.
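The rewriting seen in these four cases can be written down as a small bash function. This is only a model of the observations above, not lftp's actual implementation, and the function name `observed_head_path` is hypothetical: it percent-encodes every ? as %3f and appends a slash when the path does not already end in one.

```bash
#!/usr/bin/env bash
# Model of the four observed requests only; not lftp's real logic.
# Expects a URL of the form scheme://host/path (a path must be present).
observed_head_path() {
    local url=$1
    local path="/${url#*://*/}"     # drop scheme and host, keep the path
    path=${path//\?/%3f}            # "?" shows up percent-encoded
    [[ $path == */ ]] || path+=/    # a trailing slash is appended if missing
    printf '%s\n' "$path"
}

observed_head_path 'http://example/path/?page=2'
```

Seen this way, the query string never reaches the server as a query: it becomes part of a directory name, which is why a site that resolves /aux_resorb/?page=2 has nothing to serve for the rewritten path.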

