Python BeautifulSoup HTML Scraping Issue
So, I've been playing around with Python, and I've been trying to learn a bit of the newer things by mixing code I find and making something I could possibly end up using in the future. Today I've been working on a project, although when I print out the links, it says:
https://v3rmillion.net/showthread.php
instead of what I'd prefer it to be:
https://v3rmillion.net/showthread.php?tid=393794
    # Python 2 code (raw_input and the print statement)
    import requests, os, urllib, sys, webbrowser, bs4
    from bs4 import BeautifulSoup

    def startup():
        os.system('cls')
        print('discord profile')
        user = raw_input('discord tag: ')
        r = requests.get('https://www.google.ca/search?source=hp&q=' + user + ' site:v3rmillion.net')
        soup = BeautifulSoup(r.text, "html.parser")
        print soup.find('div', {'id': 'resultStats'}).text
        content = r.content.decode('utf-8', 'replace')
        # attempting to scrape the links, although I'd like the full length instead of just .php
        links = []
        while '<h3 class="r">' in content:
            content = content.split('<h3 class="r">', 1)[1]
            split_content = content.split('</h3>', 1)
            # splitting on '%' cuts the URL at its first percent-encoded character
            link = 'http' + split_content[1].split(':http', 1)[1].split('%', 1)[0]
            links.append(link)
            content = split_content[1]
        for link in links[:5]:  # max number of links is 5
            print(link)

    startup()
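A likely reason for the shortened link is that the URL embedded in the results page is percent-encoded, so `?tid=393794` appears as `%3Ftid%3D393794`, and `.split('%', 1)[0]` throws away everything from the `?` onward. The following is a minimal Python 3 sketch of decoding the link instead of cutting it. It assumes the result link has the shape `/url?q=<target>&sa=U`, which is a guess about the page markup and may not match what Google returns today; `truncated_link`, `full_link`, and `sample_href` are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse, parse_qs, unquote


def truncated_link(encoded_url):
    """Mimic the approach in the question: cut at the first '%'."""
    return encoded_url.split('%', 1)[0]


def full_link(href):
    """Return the decoded target of a redirect-style href.

    Assumes the href looks like '/url?q=<target>&sa=U' (hypothetical shape).
    parse_qs percent-decodes the values it returns.
    """
    query = urlparse(href).query
    targets = parse_qs(query).get('q', [])
    return targets[0] if targets else None


# Hypothetical href, built from the URL shown in the question.
sample_href = '/url?q=https://v3rmillion.net/showthread.php%3Ftid%3D393794&sa=U'
encoded = 'https://v3rmillion.net/showthread.php%3Ftid%3D393794'

print(truncated_link(encoded))  # the shortened form the question complains about
print(unquote(encoded))         # decoding instead of cutting keeps the query string
print(full_link(sample_href))   # same result, taken from the href's q parameter
```

If the links are read from the parsed page with BeautifulSoup (for example from each anchor's `href` attribute) rather than by splitting the raw HTML, the same decoding step can be applied to each `href` value.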