Web Scraping Tables using Excel add-in Data Explorer preview:

Standard

In this blog post, we’ll see how you can do some web scraping of HTML data tables that you see on the inter-webs!

Before we begin, If you haven’t downloaded and installed the data explorer add-in for Excel 2010 & 2013, you can find Information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx 

First, let’s try doing copy-pasting of tables found on websites into excel without data explorer add-in.

So I found some very interesting tables here: http://powerpivot-info.com/post/16-powerpivot-dax-function-list-with-samples

And here’s my copy-pasting efforts:

excel copy pasting html tablesNow, that requires formatting! I don’t want to do that especially if I am doing that for few more tables – let’s see an elegant way of going about web scraping tables using Data Explorer add-in:

Step 1:

Keep the URL handy.

Now, Open Excel 2010/2013 > switch to Data Explorer tab > click on From Web

Step 2:

Paste the URL that has the tables you need:

excel web scraping html data tables

Step 3:

The dialog box would list all the tables from that HTML page and so you’ll need to select the table that you want.

(optional) if your tables have headers as first rows. Make sure to mark them as headers: Right Click a Column > use First Row as headers

excel data explorer query editorStep 4:

Click DONE and your excel sheet will populate itself w/ the data from the table.

excel data copied from website data explorerThat’s about it for the steps!

Notes:

1) Data Explorer add-in will let you “explore” external open datasets that’s out there on the internet.

2) Please make sure that you’re not violating any copyrights before you go about web scraping and sharing your work.

And here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in

That’s about it for this post, your comments are very welcome!

New Azure portal is ALL HTML 5!

Standard

New Azure portal is HTML 5 – so what? it just means that portal would be accessible from all devices! Do not get me wrong, I am not against Silver-light but it’s just it was little limiting because the portal was not accessible from say iPad. So from the accessibility stand-point, I am happy!

Let me share a conversation I had with @krisherpi few months back where he was not able to access Azure portal from a tablet that he had just bought – At that time, I had commented that I wish the portal was build using HTML 5 so that we could have more device options to connect to Azure portal – well, seems like Azure team was already working on that!

So I just wanted to point this out. And this is just one of the many awesome features that were discussed at Meet Windows Azure event (7 June 2012)

HTML 5 Powered Azure portal and it’s metro-styled !

new azure portal html 5

Getting started with creating Java app on Google App engine – Guest Post by Dhwaneet Bhatt

Standard

This is a Guest Post by Dhwaneet Bhatt. He’s one awesome programmer and technologist with an innate ability to master any technical topic very quickly. Here’s his twitter profile: https://twitter.com/#!/dhwaneetbhatt for you to able to connect with him.

If you searching for tutorial to get you started with python app on Google App engine. click here. And Here’s his post which would help you to get started with creating a java app on Google App engine:

1. We will be using Eclipse IDE to code a simple Java App for Google App Engine because it is very simple, the other option is building the application using Ant script, which involves a lot of code writing, and I prefer to keep things simple.

2. Download Eclipse from the following URL:

http://www.eclipse.org/downloads/

The App Engine has support for all the versions but we prefer to use the latest version, so download Eclipse Indigo (3.7.x). On the web page, download from the first link that reads “Eclipse IDE for Java EE Developers”. That will download Eclipse 3.7.x.

3. Extract eclipse and start, you will be prompted to create a workspace, create it anywhere you like and proceed.

image

4. Once in eclipse, we need to install the Google App Engine plugin. Go to Help->Install New Software, once there, enter the following URL in the “Work With” text box: http://dl.google.com/eclipse/plugin/3.7

Note: 3.7 in the URL corresponds to the Eclipse version, if you are using some other version, change the last number according to that. Or you can visit this link that gives all the URLs:

https://developers.google.com/eclipse/docs/download

Click on the “Add” button and this repository will be added to Eclipse. Next time you can directly use this repository name for downloading any Google related plugin. It will then download a list of plugins associated with this URL.

image

Select the options “Google Plugin for Eclipse 3.7” under Google Plugin for Eclipse and Google App Engine Java SDKunder SDKs. These are the basic tools required for deploying a simple app to App Engine.

Click Next (2 times), and Accept the Terms of Agreement (after reading of course) and then it will take a couple of minutes for the plugins to get installed. Go grab a cup of coffee.

5. After it is installed, Eclipse will prompt for restart. Restart the Eclipse.

6. Click on New -> Other… and from the list select “Web Application Project” under Google”, give a name for your project and a package structure, and untick Use Google Web Toolkit and click Finish.

image

7. Now comes the surprise, you don’t need to do anything now. Eclipse has already created all the files required for the Hello World App, but still, I will be taking them one by one so that you can understand the necessary steps for writing a custom app.

8. First comes the deployment descriptor (for people who are new to J2EE, I recommend reading about J2EE first), it is the file web.xml located in the location war/WEB-INF/web.xml.

[sourcecode language=”xml”]
<?xml version="1.0" encoding="utf-8"?>

<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns="http://java.sun.com/xml/ns/javaee"

xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"

xsi:schemaLocation="http://java.sun.com/xml/ns/javaee

http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" version="2.5">

<servlet>

<servlet-name>Dhwaneetbhattjavaapp</servlet-name>

<servlet-class>com.dhwaneetbhatt.helloworldjavaapp.DhwaneetbhattjavaappServlet</servlet-class>

</servlet>

<servlet-mapping>

<servlet-name>Dhwaneetbhattjavaapp</servlet-name>

<url-pattern>/dhwaneetbhattjavaapp</url-pattern>

</servlet-mapping>

<welcome-file-list>

<welcome-file>index.html</welcome-file>

</welcome-file-list>

</web-app>
[/sourcecode]

The web.xml contains mapping for the default servlet created with the name DhwaneetbhattjavaappServlet, and its URL mapping. This is the URL which will allow the servlet to give a response. And index.html located under war/ is the first file to be loaded when the application is started.

9. The next file is a pretty simple HTML file that contains nothing but a simple link to call the servlet.

[sourcecode language=”html”]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

<title>Hello App Engine</title>

</head>

<body>

<h3>Hello App Engine!</h3>

<a href="dhwaneetbhattjavaapp">Servlet</a>

</body>

</html>
[/sourcecode]

10. The next important file is the servlet itself – which responds to the request by the html file. The doGet method means that the request coming from the html file is a GET request. You can repond to a POST request by implementing the doPost method and changing the type of request coming from html form to POST.

[sourcecode language=”java”]
package com.dhwaneetbhatt.helloworldjavaapp;

import java.io.IOException;
import javax.servlet.http.*;

@SuppressWarnings("serial")
public class DhwaneetbhattjavaappServlet extends HttpServlet {
public void doGet(HttpServletRequest req, HttpServletResponse resp)
throws IOException {
resp.setContentType("text/plain");
resp.getWriter().println("Hello, world");
}
}
[/sourcecode]

11. The last file is the appengine-web.xml file which is located in the WEB-INF/ directory. App Engine needs one additional configuration file to figure out how to deploy and run the application. This file includes the registered ID of your application (Eclipse creates this with an empty ID for you to fill in later), the version number of your application, and lists of files that ought to be treated as static files (such as images and CSS) and resource files (such as JSPs and other application data). In the tag, enter the name of your App Id that you had chosen while registering for the app previously.

[sourcecode language=”xml”]
<?xml version="1.0" encoding="utf-8"?>

<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">

<application>dhwaneetbhattjavaapp</application>

<version>1</version>

<system-properties>

<property name="java.util.logging.config.file" value="WEB-INF/logging.properties"/>

</system-properties>

</appengine-web-app>
[/sourcecode]

12. That’s about all the files we’ve covered. There are a couple of other files – configuration files in META-INF, logging.properties file and a favicon file that are unimportant at the moment. As I said Keep it Simple.

13. Now we are ready for a test run. Right click on the project from the Project Explorer, Run As -> Web Application, it will start the server. The default port is 8888. Open any web browser, type http://localhost:8888, you web application will run.

image

image

14. Before you Deploy it to Google App Engine – you’ll need to create an application on Google App Engine and Get the App Id. Scroll to the bottom of the blog post for a step by step guide.

Now that we have tested the code on our machine, it is time to deploy it go Google App Engine. Right click on the project, Google -> Deploy to App Engine, login with your credentials and on the next screen that comes, click on Deploy.

image

15. Your app will be deployed on the App Engine. Open any browser and enter the URL: http://dhwaneetbhattjavaapp.appspot.com and you will have the “Hello, world” app ready for the world to access.

– Dhwaneet Bhatt

 

*Here’s a step by step Tutorial to create an application on Google App Engine and getting the app id: [from Source]

Now, let’s sign up for a Google App Engine Account with a Google account you may have. Part of the process includes verification via a code sent to your mobile device – so be ready to provide your mobile number. And every Google Account gets to deploy 10 applications with a Google App Engine admin account.

Now, To sign up for Google APP engine. Go to: https://appengine.google.com/ and if you have a Google Apps account go to: https://appengine.google.com/a/<DOMAIN.COM>

Now, sign in with your Google account and you would be asked to verify your account:

image

FAQ for this process is here: http://code.google.com/appengine/kb/sms.html

Now, once you successfully verify your account, you would see something like:

image

And click on create application and fill in the following details. For now, fill in the App ID and the App Title. Leave other options as default for now. And yes, please check the availability of your app id and this will also be your URL. The URL will take the form.appspot.com

image

Scroll down and you will find a “create application”. please click on it. You will also see a message: “Application registered successfully”.