Python BeautifulSoup HTML Scraping Issue

September 15, 2012

So, I've been playing around with Python, and I've been trying to learn a bit about the newer things by mixing code I find and making something I could possibly end up using in the future. Today I've got a project, although when I print out the links, it says

https://v3rmillion.net/showthread.php

instead of what I'd prefer, which is:

https://v3rmillion.net/showthread.php?tid=393794

import requests, os, urllib, sys, webbrowser, bs4
from bs4 import BeautifulSoup

def startup():
    os.system('cls')
    print('discord profile')
    user = raw_input('discord tag: ')
    r = requests.get('https://www.google.ca/search?source=hp&q=' + user + ' site:v3rmillion.net')
    soup = BeautifulSoup(r.text, "html.parser")
    print soup.find('div', {'id': 'resultStats'}).text
    content = r.content.decode('utf-8', 'replace')

    # Attempting to scrape the links, although I'd like the full length instead of just .php
    links = []
    while '<h3 class="r">' in content:
        content = content.split('<h3 class="r">', 1)[1]
        split_content = content.split('</h3>', 1)
        link = 'http' + split_content[1].split(':http', 1)[1].split('%', 1)[0]
        links.append(link)
        content = split_content[1]
    for link in links[:5]:  # max number of links 5
        print(link)

startup()
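A likely cause, judging only from the code above: the link is cut at the first '%' by .split('%', 1)[0], and in the result markup the '?' of the thread URL is probably percent-encoded as %3F, so everything after showthread.php is thrown away. Below is a minimal sketch of decoding the link instead of truncating it. It is written for Python 3, and it assumes the hrefs have the form /url?q=<encoded target>&sa=..., which is not confirmed by the post. The function name extract_target and the sample href are made up for illustration.

```python
from urllib.parse import parse_qs, unquote, urlparse


def extract_target(href):
    """Return the destination URL from a redirect-style href.

    Assumption: hrefs look like '/url?q=<percent-encoded target>&sa=...'.
    If there is no 'q' parameter, the href itself is percent-decoded.
    """
    query = urlparse(href).query
    targets = parse_qs(query).get('q')
    # parse_qs percent-decodes values, so '%3F' becomes '?' and '%3D' becomes '='
    return targets[0] if targets else unquote(href)


# Hypothetical href, shaped like the links the question is trying to scrape
sample = '/url?q=https://v3rmillion.net/showthread.php%3Ftid%3D393794&sa=U'
print(extract_target(sample))
```

If the raw string splitting is kept, the same idea applies: pass the captured link through unquote rather than splitting on '%'.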
