How do I get text out of an image in Python with Scrapy after combining the image path with the base URL?
I tried this code:
    import urlparse
    import pytesseract
    from PIL import Image

    src1 = "https://hms.harvard.edu/"
    src = response.css('div.person-line > div > img::attr("src")').extract_first()
    # src == "sites/default/files/hms-faculty-emails/bx0uvxkp.jpg"
    urlparse.urljoin(src1, src)
    # -> 'https://hms.harvard.edu/sites/default/files/hms-faculty-emails/bx0uvxkp.jpg'
    src2 = urlparse.urljoin(src1, src)
    email = pytesseract.image_to_string(Image.open(src2))
I'm getting this error:
    IOError: [Errno 22] invalid mode ('rb') or filename
How can I get the email text out of this image? Can anyone please help?
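You should use an `io.BytesIO` buffer, because you are passing an HTTP URL to `Image.open` (and from there to `image_to_string`), while `Image.open` expects a local file path or a file-like object. Download the image first and wrap its bytes in a buffer. You need to write code like this: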
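    import io
    from urllib2 import urlopen  # Python 3: from urllib.request import urlopen
    import pytesseract
    from PIL import Image

    def get_text(src):
        response = urlopen(src)  # download the image over HTTP
        buffer = io.BytesIO(response.read())  # wrap the bytes in a file-like object
        return pytesseract.image_to_string(Image.open(buffer))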
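The question uses `urlparse`, which only exists in Python 2. Below is a minimal sketch of the same fix for Python 3, where `urlparse.urljoin` lives in `urllib.parse` and `urllib2.urlopen` lives in `urllib.request`. It also folds in the base-URL joining from the question. The names `BASE_URL`, `resolve_image_url`, `fetch_image_buffer`, `get_email_text` and the `opener` parameter are hypothetical, chosen for illustration. The OCR step assumes Pillow, pytesseract and the Tesseract binary are installed.

```python
import io
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the site root from the question is the base for relative src values.
BASE_URL = "https://hms.harvard.edu/"


def resolve_image_url(relative_src, base=BASE_URL):
    """Combine the relative <img src> value with the site's base URL."""
    return urljoin(base, relative_src)


def fetch_image_buffer(url, opener=urlopen):
    """Download the image and wrap its bytes in an in-memory file object.

    Image.open() wants a file path or a file-like object, not an HTTP URL,
    so the bytes are downloaded first and handed over as io.BytesIO.
    `opener` is a hypothetical hook so the download can be swapped out
    (for example in tests); it defaults to urllib's urlopen.
    """
    with opener(url) as response:
        return io.BytesIO(response.read())


def get_email_text(relative_src):
    """Resolve, download and OCR the email image."""
    # Imported lazily so the helpers above work without the OCR dependencies.
    import pytesseract
    from PIL import Image

    buffer = fetch_image_buffer(resolve_image_url(relative_src))
    return pytesseract.image_to_string(Image.open(buffer))


if __name__ == "__main__":
    src = "sites/default/files/hms-faculty-emails/bx0uvxkp.jpg"
    print(resolve_image_url(src))
    # https://hms.harvard.edu/sites/default/files/hms-faculty-emails/bx0uvxkp.jpg
```

Inside a Scrapy callback, `response.urljoin(src)` does the same joining against the page's own URL, so a hard-coded base is not strictly needed there.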