Storing scanned web-pages and files to SQL Server

Wednesday Feb 25th 2004 by Muthusamy Anantha Kumar aka The MAK
Share:

Learn how to explore any Internet web page and store partial or complete information to either an HTML file or Microsoft SQL server table.

Learn how to explore any Internet web page and store partial or complete information to either an HTML file or Microsoft SQL server table.

This article provides a guide on how to explore any Internet web page and store partial or complete information to either an HTML file or Microsoft SQL server table. In addition, the method presented here will aid you to scan a webpage for a particular string and then send that string and related content as an email.

Method 1:

This method explains how to read many web pages listed in a text file and store those web pages as HTML files.

Step1:

Create a folder C:\ScanWebPage and list all of the web sites that you would like to scan in a text file as shown below (C:\ScanWebPage\WebSiteList.txt).

http://www.microsoft.com
http://www.oracle.com
http://www.databasejournal.com
http://www.dell.com

Step2:

Copy and paste the code below, and save the file as C:\ScanWebPage\ScanWebPagefromText.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan many websites and store it as HTML files
Set iFSO = CreateObject("Scripting.FilesyStemObject")
InputFile="C:\ScanWebPage\WebSiteList.txt"
Set ifile = iFSO.OpenTextFile(inputfile)  
Do until ifile.AtEndOfLine
webpage = ifile.ReadLine
Outputfile=replace(webpage,"http://","")
Outputfile="c:\Scanwebpage\"+Outputfile+".html"
'msgbox webpage
'msgbox outputfile
Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", webpage, False 
	On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & " " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
    Set objFSO = CreateObject("Scripting.FileSystemObject") 
    Set objTS = objFSO.CreateTextFile(outputfile, True) 
    objTS.Write Replace(.responseText, vbLf, vbNewLine) 
    objTS.Close 
    Set objTS = Nothing 
    Set objFSO = Nothing 
  End If 
end with
Set objXMLhttp =nothing
Loop
MsgBox "Completed writing all Web Pages!"

Step 3:

Execute the above VB script to create one HTML file for every website listed in the WebSiteList.txt. Upon completion, you will be prompted with the following message box:

Three files will have been created under c:\scanwebpage:

Method 2

This method explains how to read many web pages listed in a SQL Server table and store those web pages as html files.

Step1:

The following script creates the database and tables for Scanning Web Pages:

Create Database WebPageScanner
go
use WebPageScanner
go
Create Table WebPages(Id int identity(1,1), WebPage varchar(500))
go
Insert into WebPages(WebPage)  select 'http://www.microsoft.com'
Insert into WebPages(WebPage)  select 'http://www.oracle.com'
Insert into WebPages(WebPage)  select 'http://www.databasejournal.com'
Insert into WebPages(WebPage)  select 'http://www.dell.com'
Go
use master
go 
sp_addlogin 'WebPageUser','Web','WebPageScanner'
go
use WebPageScanner
go
sp_adduser 'WebPageUser'
go
sp_addrolemember 'db_datawriter', 'WebPageUser'
go
sp_addrolemember 'db_datareader', 'WebPageUser'
go

Step2:

Copy and paste the code below into a text file and save as C:\ScanWebPage\ScanWebPagefromSQL.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan many websites from SQL Table and store it as HTML files
Set AdCn = CreateObject("ADODB.Connection")
AdCn.commandtimeout =36000
Set AdRec1 = CreateObject("ADODB.Recordset")
AdCn.Open = "Provider=SQLOLEDB.1;Data Source=SQL2k\instance1;
Initial Catalog=WebPageScanner;user id = 'WebPageuser';password='Web' "
SQL1 = "Select WebPage from WebPages order by [ID]"
AdRec1.Open SQL1, AdCn,1,1
while not Adrec1.EOF 
WebPage= Adrec1("WebPage")
Outputfile=replace(webpage,"http://","")
Outputfile="c:\Scanwebpage\"+Outputfile+".html"
'msgbox webpage
'msgbox outputfile

Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", webpage, False 
	On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & " " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
    Set objFSO = CreateObject("Scripting.FileSystemObject") 
    Set objTS = objFSO.CreateTextFile(outputfile, True) 
    objTS.Write Replace(.responseText, vbLf, vbNewLine) 
    objTS.Close 
    Set objTS = Nothing 
    Set objFSO = Nothing 
  End If 
end with
Set objXMLhttp =nothing

Adrec1.movenext
Wend

MsgBox "Completed writing all Web Pages!"

Step 3:

Executing the above VB script will create one HTML file for each website listed in the SQL table Webpages. Upon completion, you will be prompted with the following message box:

Three files will have been created under c:\scanwebpage:



Method 3

This method explains how to read multiple web pages listed in a SQL Server table and store those web pages to a SQL Table.

Step1:

Create a table in SQL server database to store the web pages.

Use WebPageScanner
Go
Create table WebPageStorage (ID int identity(1,1), 
  Webpage varchar(500), Content Varchar(7000))
Go

Step 2 :

Copy and paste the code below to a text file and save as C:\ScanWebPage\ScanWebPagefromandToSQL.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan many websites from SQL Table 
'and store it in a SQL Table

Set AdCn = CreateObject("ADODB.Connection")
AdCn.commandtimeout =36000
Set AdRec = CreateObject("ADODB.Recordset")
Set AdRec1 = CreateObject("ADODB.Recordset")
AdCn.Open = "Provider=SQLOLEDB.1;Data Source=
  SQL2k\instance1;Initial Catalog=WebPageScanner;
  user id = 'WebPageuser';password='Web' "
SQL1 = "Select WebPage from WebPages order by [ID]"
AdRec1.Open SQL1, AdCn,1,1
while not Adrec1.EOF 
WebPage= Adrec1("WebPage")

Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", webpage, False 
'	On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & 
	" " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
	Query=""
	Query = "insert into WebPageStorage (WebPage,Content) Values ('"+ WebPage + "',
	  '"+ replace(.responseText,"'","")+"')"
	AdRec.Open Query, AdCn,1,1
  End If 
end with
Set objXMLhttp =nothing
Adrec1.movenext
Wend

MsgBox "Completed writing all Web Pages!"

Step 3:

Above VB script stores the web page in a TEXT format in a SQL Server table as shown below.

Note: Replace SQL2k\Instance1 with your SQL Server box name.

Upon completion, you will be presented with the following message box:

Method 4:

This method will scan the webpage http://forums.databasejournal.com and find the string "Members" and send email to mak_999@yahoo.com

Step 1:

Copy and paste the following code to a text file and save as C:\ScanWebPage\ScanWebPage.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan http://forums.databasejournal.com and find # of members and 
'send email to mak_999@yahoo.com with latest information.

'Option Explicit 
Const ForReading = 1 
Const ForWriting = 2 
Dim objXMLhttp, objFSO, objTS 
Set fso = CreateObject("Scripting.FileSystemObject") 
Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", "http://forums.databasejournal.com", False 
  On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & " " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
    Set objFSO = CreateObject("Scripting.FileSystemObject") 
    Set objTS = objFSO.CreateTextFile("findtext.txt", True) 
    objTS.Write Replace(.responseText, vbLf, vbNewLine) 
    objTS.Close 
    Set objTS = Nothing 
    Set objFSO = Nothing 
'    MsgBox "Completed writing file!" 
  End If 
End With 

Set ReadFile = fso.OpenTextFile("findtext.txt", ForReading, TristateFalse) 
thisTXT = ReadFile.ReadAll 

Chaine = "Members: " 
Chaine2 = "<br>" 
ValueSearch = InStr(1,thisTXT,Chaine,1) 
ValueSearch2 = InStr(valuesearch,thisTXT,Chaine2,1) 
'msgbox ValueSearch 
'msgbox ValueSearch2 
If ValueSearch>=1 Then 
y=  mid(thisTXT,ValueSearch,(ValueSearch2-ValueSearch)) 
'msgbox y 
Else 
y="Wrong Data" 
End If 
Set objXMLhttp = Nothing 
x=Weekdayname(weekday(now())) +", "+monthname(month(now()))+" "+right("00"+cstr(month(now())),2)+", "+cstr(year(now())) 
'if left(mid(thisTXT,ValueSearch,(ValueSearch2-ValueSearch)),26)=x then 
'msgbox "True" 
'end if 
'msgbox x 

'Sending a text email using a remote server  

Set objMessage = CreateObject("CDO.Message") 
objMessage.Subject = "Database Journal # of Members" 
objMessage.Sender = "mak_999@yahoo.com" 
objMessage.To = "mak_999@yahoo.com" 
objMessage.TextBody = y 
'msgbox y
'The line below shows how to send a webpage from a remote site 
objMessage.Configuration.Fields.Item _ 
("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2 
'Name or IP of Remote SMTP Server 
objMessage.Configuration.Fields.Item _ 
("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "121.111.222.11" 
'Server port (typically 25) 
objMessage.Configuration.Fields.Item _ 
("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25 
objMessage.Configuration.Fields.Update 

objMessage.Send

Step 2:

Executing the above VB Script will find the number of members, threads and posts as shown below, and send an email.

Note: Replace 121.111.222.11 with your SMTP server IP address. Replace mak_999@yahoo.com with latest # of members, threads and posts.

Conclusion:

By scheduling the VB scripts listed in method 1, method 2 and method 3 you can store the listed web pages in an HTML file or as an SQL Server table. By scheduling the VB script in method 4, you will get e-mail with latest information from a web site. You can also update the script in such a way that it can compare the previous values and send email only when the value was changed.

» See All Articles by Columnist MAK

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved