I cannot download PDF files from links generated by a JavaScript function with Python 3.6.0 + Selenium 3.4.3
The URL is: site
Using Selenium with the Firefox 47.0.2 binary and Python 3.6.0, I click the “Pesquisar” button on the page. On the next page I fill in the date range in the form (format d/m/y) and click the new “Pesquisar” button. That gives a list of PDF documents, and I want to download them.
When I print page_source, I can see the generated links, so I don’t understand why Selenium can’t locate them.
The simplified code follows:
    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from datetime import datetime, date, timedelta
    from calendar import monthrange
    import time

    # profile, binary and capabilities are not defined in this simplified code
    driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)
    driver.maximize_window()
    wait = WebDriverWait(driver, 10)

    months = range(1, 13)
    # monthrange returns (weekday of the 1st, number of days in the month);
    # limits[0] equals 1 here only because 1 August 2017 was a Tuesday
    limits = monthrange(2017, 8)
    #num_docs = limites[1]-limites[0]
    date_input_begin = '{num:0{width}}'.format(num=limits[0], width=2) + '08' + '2017'
    date_input_end = '{num:0{width}}'.format(num=limits[1], width=2) + '08' + '2017'
    today = datetime.now().date()
    date = today
    date = date - timedelta(24)

    # Open the start page and click the first "Pesquisar" button
    driver.get("http://dje.trf2.jus.br/dje/paginas/externas/inicial.aspx")
    driver.find_element_by_id("ctl00_contentplaceholder_ctrinicial_btnpesquisar").click()
    wait.until(EC.presence_of_element_located(
        (By.XPATH, '//*[@id="ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar"]')))

    # Fill in the search form: two dropdowns and the date range
    select1 = Select(driver.find_element_by_id("ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_ddlareajudicial"))
    select1.select_by_index(3)
    select2 = Select(driver.find_element_by_id("ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_ddlregistrospaginas"))
    select2.select_by_index(6)
    element_date_begin = driver.find_element_by_id(
        'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_tbxdatainicial')
    element_date_begin.clear()
    element_date_begin.send_keys(date_input_begin)
    element_date_end = driver.find_element_by_id(
        'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_tbxdatafinal')
    element_date_end.clear()
    element_date_end.send_keys(date_input_end)

    # Submit the form, wait for the button to be usable again, then click it
    driver.find_element_by_id('ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar').submit()
    wait.until(EC.presence_of_element_located(
        (By.ID, 'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar')))
    wait.until(EC.element_to_be_clickable(
        (By.ID, 'ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar')))
    time.sleep(5)
    driver.find_element_by_id('ctl00_contentplaceholder_ctrfiltrapesquisadocumentos_btnfiltrar').click()

    # Wait for the result list, then click the first document link (this is the call that fails)
    wait.until(EC.presence_of_element_located(
        (By.XPATH, '//*[@id="ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_lblnomecaderno"]')))
    driver.find_element_by_xpath(
        '//*[@id="ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_grvcadernos_ct102_lnkdata"]').click()
But when I try to locate the links by id or XPath, I get the following error:
      File "c:\users\b2002032064079\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_contentplaceholder_ctrlistadiarios_udtvisualizaadmrj_grvcadernos_ct102_lnkdata\"]"}
I’m a newbie at scraping, and I’d be thankful for any help! Thank you!
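A side note on the date strings in the code above: calendar.monthrange returns (weekday of the first day, number of days in the month), so using its first value as the starting day only works by coincidence for August 2017 (for September 2017 it would give 04). A minimal sketch of a safer way to build both strings follows. The helper name month_date_inputs is made up for illustration, and the ddmmyyyy format is assumed from the strings the original code sends to the form.

```python
from calendar import monthrange
from datetime import date


def month_date_inputs(year, month):
    """Return (first_day, last_day) of a month as ddmmyyyy strings.

    Hypothetical helper, not part of the original script.
    """
    # monthrange returns (weekday of the 1st, number of days in the month);
    # only the second value is needed here.
    _, days_in_month = monthrange(year, month)
    first = date(year, month, 1)
    last = date(year, month, days_in_month)
    return first.strftime('%d%m%Y'), last.strftime('%d%m%Y')


begin, end = month_date_inputs(2017, 8)
print(begin, end)  # 01082017 31082017
```

The two values could then be passed to send_keys in place of date_input_begin and date_input_end.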
First of all: which browser are you using? Second: the site is slow, so maybe try giving it more waiting time. Third: is the XPath correct? I think the problem is the XPath. Try using XPath Helper on Chrome to check it.
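Since the links do show up in page_source, one way to check the locator without a browser extension is to list the ids of the anchors in that HTML and compare them, character by character, with the id used in the XPath. This is only a rough sketch using the standard library. The names LinkIdCollector and link_ids and the sample markup are made up for illustration; in the real script the input would be driver.page_source.

```python
from html.parser import HTMLParser


class LinkIdCollector(HTMLParser):
    """Collect the id attribute of every <a> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase
        if tag == 'a':
            attr_map = dict(attrs)
            if attr_map.get('id'):
                self.ids.append(attr_map['id'])


def link_ids(html, suffix=''):
    """Return the ids of all <a> tags in html whose id ends with suffix."""
    collector = LinkIdCollector()
    collector.feed(html)
    collector.close()
    return [link_id for link_id in collector.ids if link_id.endswith(suffix)]


# Hypothetical sample markup; in the real script pass driver.page_source
sample = (
    '<table>'
    '<tr><td><a id="grid_row1_lnkdata" href="#">01/08/2017</a></td></tr>'
    '<tr><td><a id="grid_row2_lnkdata" href="#">02/08/2017</a></td></tr>'
    '</table>'
)
print(link_ids(sample, suffix='_lnkdata'))
```

If the ids printed for the real page differ from the one in the XPath, even by a single character, find_element_by_xpath will raise NoSuchElementException no matter how long the wait is.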