scrapy - xpath one line doesn't get me the link

July 15, 2012

What I want is for the spider to recognize the link to the next page.

This is the page: http://quotes.toscrape.com/

I have two variants. The first one, based on CSS syntax, works; the second one (which I want to be the XPath version) doesn't.

next_page_url = response.css('li.next > a::attr(href)').extract_first()

# this one below does not work

next_page_url = response.xpath('/a[contains(@href,"next")]/@href').extract_first()

So while I can get along with CSS, I'm still curious to know what is incorrect in the given XPath syntax that keeps it from giving the same result as the CSS equivalent.

Thank you.

It goes here:

# follow pagination link
next_page_url = response.css('li.next > a::attr(href)').extract_first()
if next_page_url:
    next_page_url = response.urljoin(next_page_url)
    yield scrapy.Request(url=next_page_url, callback=self.parse)

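Considering the provided HTML, the target link doesn't contain "next" in its @href. Also note that a path starting with a single / is evaluated from the document root, so /a can only match a root <a> element; use // to search the whole document. Try the expression below, which matches on the link text instead (XPath string matching is case-sensitive):

next_page_url = response.xpath('//a[contains(text(), "Next")]/@href').extract_first()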

If you want the XPath analogue of your CSS selector:

next_page_url = response.xpath('//li[contains(@class, "next")]/a/@href').extract_first()
