python - ValueError: Missing scheme in request url: h when using media pipeline


I am trying to download PDFs from a website. I followed the instructions provided on the Scrapy website, but got this error:

File "/home/joseph/env/lib/python3.5/site-packages/scrapy/http/request/__init__.py", line 58, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
2017-09-12 17:47:40 [scrapy.core.scraper] ERROR: Error processing {'file_urls': 'https://www.sec.gov/divisions/corpfin/cf-noaction/2008/jpmorgan080409-405.pdf', 'title': ('jpmorgan chase & co.',)}

settings.py

ITEM_PIPELINES = {
    'sec_scrape.pipelines.SecScrapePipeline': 300,
    'sec_scrape.pipelines.JsonWriterPipeline': 800,
    'scrapy.pipelines.files.FilesPipeline': 1,
}

FILES_STORE = '/home/joseph/pdf'

items.py

import scrapy

class LetterItem(scrapy.Item):
    title = scrapy.Field()
    file_urls = scrapy.Field()
    files = scrapy.Field()

spider.py

import scrapy
from sec_scrape.items import LetterItem

class QuotesSpider(scrapy.Spider):
    name = "corporate_finance"
    allowed_domains = ["sec.gov"]
    start_urls = ['https://www.sec.gov/divisions/corpfin/cf-noaction.shtml']

    def parse(self, response):
        for letter in response.xpath('//table[2]/tr/td[3]/ul[74]/li/a'):
            item = LetterItem()
            item['title'] = letter.xpath('text()').extract_first(),
            item['file_urls'] = response.urljoin(letter.xpath('@href').extract_first())
            yield item

Any idea why I am getting this error?

Thank you.

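The file_urls item attribute has to be a list, while you set it to a string (the URL of the file to download). The files pipeline iterates over file_urls, so a string is consumed one character at a time and the first "URL" it requests is h, which is exactly what the error message shows. Change the line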

item['file_urls'] = response.urljoin(letter.xpath('@href').extract_first()) 

to

item['file_urls'] = [response.urljoin(letter.xpath('@href').extract_first())] 
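
To see why the message ends in "h", here is a minimal sketch that does not need Scrapy. It assumes the files pipeline simply loops over file_urls and builds one request per element. The helper names first_request_url and has_scheme are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse


def first_request_url(file_urls):
    """Return the first element a loop over file_urls would yield.

    Hypothetical stand-in for a pipeline that creates one request
    per element of file_urls.
    """
    for url in file_urls:
        return url
    return None


def has_scheme(url):
    """True if the URL starts with a scheme such as http or https."""
    return bool(urlparse(url).scheme)


pdf_url = "https://www.sec.gov/divisions/corpfin/cf-noaction/2008/jpmorgan080409-405.pdf"

# A plain string is iterated character by character.
as_string = first_request_url(pdf_url)

# A one-element list yields the whole URL.
as_list = first_request_url([pdf_url])

print(repr(as_string), has_scheme(as_string))
print(repr(as_list), has_scheme(as_list))
```

With the string, the first element is the single character h, which has no scheme. With the list, the first element is the full URL.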
