python - Scraping box scores with BeautifulSoup and using pandas to export to Excel -


I've been trying to figure out how to scrape baseball box scores from FanGraphs with Python 3.6, using the BeautifulSoup and pandas modules. My final goal is to save the different sections of a webpage to different sheets in Excel.

In order to do this, I think I have to pull each table separately by its respective id tag. This is the code for the 4 tables (below the graph on the page) that would make up the first Excel sheet. Running the code results in this error:

Traceback (most recent call last):
  File "fangraphs box score scraper.py", line 14, in <module>
    df1 = pd.read_html(soup,attrs={'id': ['winsbox1_dghb','winsbox1_dghp','winsbox1_dgab','winsbox1_dgap']})
  File "c:\python36\lib\site-packages\pandas\io\html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "c:\python36\lib\site-packages\pandas\io\html.py", line 743, in _parse
    raise_with_traceback(retained)
  File "c:\python36\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
    raise exc.with_traceback(traceback)
TypeError: 'NoneType' object is not callable

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=red%20sox&dh=0&season=2017'
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")

df1 = pd.read_html(soup, attrs={'id': ['winsbox1_dghb', 'winsbox1_dghp', 'winsbox1_dgab', 'winsbox1_dgap']})

writer = pd.ExcelWriter('box scores.xlsx')
df1.to_excel(writer, 'traditional box scores')

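You are using the wrong ids: you took them from the <div> tags, but the attrs argument of read_html needs the ids of the <table> tags. I also think you do not need BeautifulSoup here, since read_html can take the URL directly.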
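To check which ids sit on <table> tags rather than on the wrapping <div> tags, a small standard-library sketch like the one below may help. The HTML snippet is made up for illustration and only mimics the div-around-table layout described above; on the real page you would feed it the downloaded page source instead. The class name TableIdCollector is hypothetical.

```python
from html.parser import HTMLParser

# Made-up snippet (an assumption, not the real page): each <div> wrapper
# and the <table> inside it carry different ids.
SAMPLE_HTML = """
<div id="winsbox1_dghb">
  <table id="winsbox1_dghb_ctl00"><tr><td>1</td></tr></table>
</div>
<div id="winsbox1_dgab">
  <table id="winsbox1_dgab_ctl00"><tr><td>2</td></tr></table>
</div>
"""


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id is not None:
                self.table_ids.append(table_id)


collector = TableIdCollector()
collector.feed(SAMPLE_HTML)
print(collector.table_ids)
```

The ids printed this way are the ones to pass in attrs; the <div> ids are ignored because only <table> start tags are inspected.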

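import pandas as pd

url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=red%20sox&dh=0&season=2017'

# Read the tables straight from the URL, selecting them by their <table> ids.
df1 = pd.read_html(
    url,
    attrs={'id': ['winsbox1_dghb_ctl00', 'winsbox1_dgab_ctl00']}
)

# df1 is a list of DataFrames; stack them on one sheet, one below the other.
# The with block saves the workbook on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('box scores.xlsx') as writer:
    row = 0
    for df in df1:
        df.to_excel(writer, sheet_name='tables', startrow=row, startcol=0)
        row = row + len(df.index) + 3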
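The row arithmetic in the loop above can be checked on its own. The sketch below uses made-up table lengths and a hypothetical helper name, start_rows. Since to_excel writes a header row by default, each table takes len(df.index) + 1 rows, so an offset of len(df.index) + 3 leaves two blank rows between tables.

```python
def start_rows(table_lengths, gap=3):
    """Return the startrow of each table when stacking them on one sheet.

    table_lengths: number of data rows in each table (header not counted).
    gap: rows added on top of the data rows, as in the loop above.
    """
    rows = []
    row = 0
    for n in table_lengths:
        rows.append(row)
        row += n + gap
    return rows


# Made-up lengths for three tables.
offsets = start_rows([10, 5, 12])
print(offsets)
```

To get one sheet per table instead, which is what the question asks for, the same loop could pass a different sheet_name on each iteration and leave startrow at 0.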
