python - Scraping box scores with BeautifulSoup and using pandas to export to Excel
I've been trying to figure out how to scrape baseball box scores from FanGraphs with Python 3.6, using the BeautifulSoup and pandas modules. My final goal is to save different sections of the webpage to different sheets in Excel.
In order to do this, I think I have to pull each table separately by its respective id tag. This is the code for the 4 tables (below the graph on the page) that would make up the first Excel sheet. Running the code results in this error:
    Traceback (most recent call last):
      File "fangraphs box score scraper.py", line 14, in <module>
        df1 = pd.read_html(soup,attrs={'id': ['winsbox1_dghb','winsbox1_dghp','winsbox1_dgab','winsbox1_dgap']})
      File "c:\python36\lib\site-packages\pandas\io\html.py", line 906, in read_html
        keep_default_na=keep_default_na)
      File "c:\python36\lib\site-packages\pandas\io\html.py", line 743, in _parse
        raise_with_traceback(retained)
      File "c:\python36\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
        raise exc.with_traceback(traceback)
    TypeError: 'NoneType' object is not callable
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=red%20sox&dh=0&season=2017'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")

    df1 = pd.read_html(soup, attrs={'id': ['winsbox1_dghb', 'winsbox1_dghp', 'winsbox1_dgab', 'winsbox1_dgap']})

    writer = pd.ExcelWriter('box scores.xlsx')
    df1.to_excel(writer, 'traditional box scores')
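You are using the wrong ids: you took them from the <div> tags, but for the attrs argument of read_html you need to take them from the <table> tags. I also think you do not need BeautifulSoup here, since read_html can take the URL directly. Try this: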
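    import pandas as pd

    url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=red%20sox&dh=0&season=2017'

    # Match on the ids of the <table> elements, not the wrapping <div> elements
    df1 = pd.read_html(
        url,
        attrs={'id': ['winsbox1_dghb_ctl00', 'winsbox1_dgab_ctl00']}
    )
    # df1 is a list of DataFrames, one per matched table

    writer = pd.ExcelWriter('box scores.xlsx')
    row = 0
    for df in df1:
        # Stack the tables on one sheet, leaving a gap between them
        df.to_excel(writer, sheet_name='tables', startrow=row, startcol=0)
        row = row + len(df.index) + 3
    writer.close()  # writer.save() on older pandas; save() was removed in pandas 2.0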
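If you are not sure which ids sit on the <table> elements and which on the wrapping <div> elements, one way to check is to list every id grouped by tag name. The snippet below is only a rough sketch using the standard library's html.parser. The SAMPLE_HTML markup is made up for illustration and only mirrors the div/table nesting described above, and IdCollector and collect_ids are hypothetical helper names, not part of pandas or BeautifulSoup. For the real page you would feed it the downloaded HTML instead.

```python
from html.parser import HTMLParser

# Hypothetical markup (an assumption, not the real page source): each wrapper
# <div> carries an id that is a prefix of the id on the <table> inside it.
SAMPLE_HTML = """
<div id="winsbox1_dghb">
  <table id="winsbox1_dghb_ctl00"><tr><td>1</td></tr></table>
</div>
<div id="winsbox1_dgab">
  <table id="winsbox1_dgab_ctl00"><tr><td>2</td></tr></table>
</div>
"""


class IdCollector(HTMLParser):
    """Record the id attribute of every element, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.ids_by_tag = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        element_id = dict(attrs).get("id")
        if element_id is not None:
            self.ids_by_tag.setdefault(tag, []).append(element_id)


def collect_ids(html_text):
    """Return a dict mapping tag name to the list of ids found on that tag."""
    parser = IdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.ids_by_tag


ids = collect_ids(SAMPLE_HTML)
print("div ids:  ", ids.get("div", []))
print("table ids:", ids.get("table", []))
```

The ids listed under "table" are the ones to pass in attrs, because read_html only looks at <table> elements when matching attributes.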