I cannot download PDF files from links generated by a JavaScript function with Python 3.6.0 + Selenium 3.4.3


The URL is: site

I am using Selenium with the Firefox 47.0.2 binary and Python 3.6.0. On the page I click the “Pesquisar” button, on the next page I fill in the date range in the form (format d/m/y), and then I click the new “Pesquisar” button. This produces a list of PDF documents, which I want to download.

When I print page_source, I can see the generated links, so I don’t understand why Selenium can’t locate them.

The simplified code follows:

    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from datetime import datetime, date, timedelta
    from calendar import monthrange
    import time

    # profile, binary and capabilities are set up elsewhere (omitted from this simplified code)
    driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)
    driver.maximize_window()
    wait = WebDriverWait(driver, 10)

    months = range(1, 13)
    # monthrange returns (weekday of the 1st, number of days in the month)
    limits = monthrange(2017, 8)

    #num_docs = limites[1]-limites[0]

    date_input_begin = '{num:0{width}}'.format(num=limits[0], width=2) + '08' + '2017'
    date_input_end = '{num:0{width}}'.format(num=limits[1], width=2) + '08' + '2017'

    today = datetime.now().date()
    date = today

    date = date - timedelta(24)

    driver.get("http://dje.trf2.jus.br/dje/paginas/externas/inicial.aspx")

    # First "Pesquisar" button
    driver.find_element_by_id("ctl00_contentplaceholder_ctrinicial_btnpesquisar").click()

    wait.until(EC.presence_of_element_located(
        (By.XPATH, '//*[@id="ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar"]')))

    select1 = Select(driver.find_element_by_id("ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_ddlareajudicial"))
    select1.select_by_index(3)

    select2 = Select(driver.find_element_by_id("ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_ddlregistrospaginas"))
    select2.select_by_index(6)

    # Fill in the date range
    element_date_begin = driver.find_element_by_id(
        'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_tbxdatainicial')
    element_date_begin.clear()
    element_date_begin.send_keys(date_input_begin)

    element_date_end = driver.find_element_by_id(
        'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_tbxdatafinal')
    element_date_end.clear()
    element_date_end.send_keys(date_input_end)

    driver.find_element_by_id('ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar').submit()

    wait.until(EC.presence_of_element_located(
        (By.ID, 'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar')))
    wait.until(EC.element_to_be_clickable(
        (By.ID, 'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar')))

    time.sleep(5)
    # Second "Pesquisar" button
    driver.find_element_by_id('ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar').click()

    wait.until(EC.presence_of_element_located(
        (By.XPATH, '//*[@id="ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_lblnomecaderno"]')))

    # This is the call that raises NoSuchElementException
    driver.find_element_by_xpath(
        '//*[@id="ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_grvcadernos_ct102_lnkdata"]').click()
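A side note on the date strings in the code above: `monthrange` returns the weekday of the first day of the month and the number of days in the month, so `limits[0]` is a weekday rather than a day of the month. For August 2017 it happens to be 1, so the begin date comes out right, but other months may not. A minimal sketch of an alternative, assuming the form accepts the digits without separators as the original code sends them (the helper name `month_bounds` is hypothetical):

```python
from calendar import monthrange
from datetime import date


def month_bounds(year, month, fmt="%d%m%Y"):
    """Return the first and last day of a month as formatted strings."""
    # monthrange returns (weekday of the 1st, number of days in the month),
    # so only the second value is usable as a day of the month.
    _, days_in_month = monthrange(year, month)
    first = date(year, month, 1)
    last = date(year, month, days_in_month)
    return first.strftime(fmt), last.strftime(fmt)


date_input_begin, date_input_end = month_bounds(2017, 8)
print(date_input_begin, date_input_end)
```

If the form expects the separators to be typed, passing `fmt="%d/%m/%Y"` would produce the d/m/y form instead.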

But when I try to locate the links by id or XPath, I get the following error:

    File "c:\users\b2002032064079\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
      raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_grvcadernos_ct102_lnkdata\"]"}

I’m a newbie at scraping, and I’d be thankful for any help. Thank you!

First of all: which browser are you using? Second: the site is slow, so maybe try giving it more waiting time. Third: is the XPath correct? I think the problem is the XPath; try using XPath Helper on Chrome to check it.
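Since the links are visible in page_source, one way to check the selector without a browser extension is to list the ids that are really in the page and compare them with the id used in the XPath. Ids that differ by a single character (for instance a digit 1 versus a lowercase l) are easy to miss by eye. The following is only a sketch using the standard library; the HTML fragment and the ids in it are made up for illustration and stand in for `driver.page_source`, and the names `IdCollector` and `ids_ending_with` are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def ids_ending_with(page_source, suffix):
    """Return the ids in page_source that end with suffix (case-insensitive)."""
    collector = IdCollector()
    collector.feed(page_source)
    return [i for i in collector.ids if i.lower().endswith(suffix.lower())]


# Made-up fragment standing in for driver.page_source.
sample_source = """
<table>
  <tr><td><a id="grid_row02_lnkdata" href="javascript:void(0)">01/08/2017</a></td></tr>
  <tr><td><a id="grid_row03_lnkdata" href="javascript:void(0)">02/08/2017</a></td></tr>
  <tr><td><span id="grid_row03_lbltitle">Title</span></td></tr>
</table>
"""

wanted_id = "grid_row2_lnkdata"  # hypothetical id used in the selector
candidates = ids_ending_with(sample_source, "_lnkdata")

# If the wanted id is not among the candidates, the selector is the problem,
# not the waiting time.
print(wanted_id in candidates)
print(candidates)
```

In the real script this would be called as `ids_ending_with(driver.page_source, "_lnkdata")` right before the failing `find_element_by_xpath` call. If the id from the selector is in the list, the next thing to try is a longer explicit wait.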

