In this blog post, we’ll see how you can do some web scraping of HTML data tables that you see on the inter-webs!
Before we begin, If you haven’t downloaded and installed the data explorer add-in for Excel 2010 & 2013, you can find Information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx
First, let’s try doing copy-pasting of tables found on websites into excel without data explorer add-in.
So I found some very interesting tables here: http://powerpivot-info.com/post/16-powerpivot-dax-function-list-with-samples
And here’s my copy-pasting efforts:
Now, that requires formatting! I don’t want to do that especially if I am doing that for few more tables – let’s see an elegant way of going about web scraping tables using Data Explorer add-in:
Keep the URL handy.
Now, Open Excel 2010/2013 > switch to Data Explorer tab > click on From Web
Paste the URL that has the tables you need:
The dialog box would list all the tables from that HTML page and so you’ll need to select the table that you want.
(optional) if your tables have headers as first rows. Make sure to mark them as headers: Right Click a Column > use First Row as headers
Click DONE and your excel sheet will populate itself w/ the data from the table.
That’s about it for the steps!
1) Data Explorer add-in will let you “explore” external open datasets that’s out there on the internet.
2) Please make sure that you’re not violating any copyrights before you go about web scraping and sharing your work.
And here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in
That’s about it for this post, your comments are very welcome!