If data is not available in some kind of XML format (such as RSS or a web service), you sometimes need to work with the HTML output of a web page. HTML is usually much harder to analyze than XML because you need to write your own parser, and that parser is different for every web site. But the first step, reading the HTML output of a web page, is pretty simple. To get the HTML of a web page you need only a few lines of code.
To start, place two TextBox controls named txtURL and txtPageHTML, and one Button control named btnGetHTML, on a web form, like in the image below:
Image 1: Web form for getting page HTML at design time
Now place this code in the button's Click event handler:
[ C# ]
// We need these namespaces
using System;
using System.Text;
using System.Net;

public partial class DefaultCS : System.Web.UI.Page
{
    protected void btnGetHTML_Click(object sender, EventArgs e)
    {
        if (txtURL.Text != "")
        {
            // We'll use the WebClient class to read the HTML of the web page;
            // the using block disposes it when we are done
            using (WebClient MyWebClient = new WebClient())
            {
                // Read the web page HTML into a byte array
                Byte[] PageHTMLBytes = MyWebClient.DownloadData(txtURL.Text);
                // Convert the result from a byte array to a string
                // (this assumes the page is UTF-8 encoded)
                // and display it in TextBox txtPageHTML
                UTF8Encoding oUTF8 = new UTF8Encoding();
                txtPageHTML.Text = oUTF8.GetString(PageHTMLBytes);
            }
        }
    }
}
[ VB.NET ]
' We need these namespaces
Imports System
Imports System.Text
Imports System.Net

Partial Class _Default
    Inherits System.Web.UI.Page

    Protected Sub btnGetHTML_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnGetHTML.Click
        If txtURL.Text <> "" Then
            ' We'll use the WebClient class to read the HTML of the web page;
            ' the Using block disposes it when we are done
            Using MyWebClient As New WebClient()
                ' Read the web page HTML into a byte array
                Dim PageHTMLBytes() As Byte = MyWebClient.DownloadData(txtURL.Text)
                ' Convert the result from a byte array to a string
                ' (this assumes the page is UTF-8 encoded)
                ' and display it in TextBox txtPageHTML
                Dim oUTF8 As UTF8Encoding = New UTF8Encoding()
                txtPageHTML.Text = oUTF8.GetString(PageHTMLBytes)
            End Using
        End If
    End Sub
End Class
Now you can start the sample project, type a valid URL in the first TextBox control and click the "btnGetHTML" button. The code listed above will retrieve the HTML code of the requested URL and display it in the second text box, like in the image below:
Image 2: HTML Code is read and shown in text box
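One thing to keep in mind: the code above always decodes the downloaded bytes as UTF-8, so a page served in another encoding may show garbled characters. A minimal sketch of one way around this, assuming the server announces the encoding in its Content-Type response header, could look like the helper below. The class and method names (HtmlDecoding, EncodingFromContentType, Decode) are made up for this illustration and are not part of the sample project.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper, not part of the original sample
public static class HtmlDecoding
{
    // Picks an encoding from a Content-Type header value such as
    // "text/html; charset=iso-8859-1"; falls back to UTF-8 when the
    // header is missing, unparsable, or names an unknown charset
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                {
                    // Trim is harmless if the value carries no quotes
                    return Encoding.GetEncoding(charset.Trim('"'));
                }
            }
            catch (FormatException) { }   // header could not be parsed
            catch (ArgumentException) { } // charset name not recognized
        }
        return Encoding.UTF8;
    }

    // Decodes the downloaded bytes with the encoding chosen above
    public static string Decode(byte[] bytes, string contentType)
    {
        return EncodingFromContentType(contentType).GetString(bytes);
    }
}
```

In the click handler you could then read the header with MyWebClient.ResponseHeaders["Content-Type"] after the DownloadData call and pass it, together with PageHTMLBytes, to HtmlDecoding.Decode. Pages that declare their encoding only in a meta tag are not covered by this sketch.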
As you can see, loading the HTML code of a web page is relatively easy. Analyzing this data is much harder and depends on the page structure.
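To give a feeling for what the analysis step involves, here is a small sketch that pulls the page title out of the downloaded HTML with a regular expression. It is only an illustration under simple assumptions (a single, well-formed title element); the HtmlTitle class and ExtractTitle method are hypothetical names, and a regular expression like this will not cope with arbitrary real-world markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original sample
public static class HtmlTitle
{
    // Returns the text of the first <title> element, or null if none is found
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        // Singleline lets "." match line breaks inside the title
        Match match = Regex.Match(
            html,
            @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!match.Success)
        {
            return null;
        }

        // Decode entities such as &amp; and collapse runs of whitespace
        string text = WebUtility.HtmlDecode(match.Groups[1].Value);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With the sample form, calling HtmlTitle.ExtractTitle(txtPageHTML.Text) after the download would return the title of the requested page. Anything more involved than this, such as reading tables or links from a specific site, needs code written for that site's page structure.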