Data Scraping using Excel VBA and Selenium
Selenium is also used as the automation tool for the data scraping of information from the HTML web pages.
Steps to prepare Excel Macro
Step 1: Open an Excel-based Macro.
Step 2: Insert a new module.
Step 3: Initialize a new subroutine and name it.
Step 4: Under Tools > References. Add Selenium Type Library. These libraries help in opening google Chrome and developing a macro script.
Now the excel file is ready to interact with the browser.
How to open Google Chrome using VBA?
Step 1: Declare and initialize the variable in the subroutine.
Sub test()
Dim driver As New WebDriver
Dim rc, cc, colC As Integer
Step 2: For opening Google Chrome using VBA and Selenium, you need to write command a driver.start ”chrome” and press F5.
Sub test()
Dim driver As New WebDriver
Dim rc, cc, colC As Integer
driver.Start “Chrome”
Application.Wait Now + TimeValue(“00:00:20”)
End Sub
How to open a Website in Chrome Browser using VBA?
Now we need to pass the URL to access the website on the browser.
Sub test()
Dim driver As New WebDriver
Dim rc, cc, colC As Integer
driver.Start “Chrome”
driver.get “https://www.google.com”
Application.Wait Now + TimeValue(“00:00:20”)
End Sub
How to scrape information from the website using VBA?
Suppose someone wants data scraping from a website daily. For this, he wants that as soon as he presses the click button, the data should be auto pulled in the excel file.
The first step is to Inspect Element and check how the data is structured in the source content of the website.
Let us assume that the source code looks like this:
<table class=”datatable”>
<thead>
<tr>
<th>Company</th>
<th>Group</th>
<th>Pre Close (Rs)</th>
<th>Current Price (Rs)</th>
<th>% Change</th> </tr>
As it is clear that data is structured in the single HTML table. So we would design a macro which will pull the header information and corresponding data from the HTML table.
Step 1: Create a for loop that runs as a collection using FindElementByTask() and FindElementByTag()Sub test() Dim driver As New WebDriver Dim rowc, cc, columnC As Integer rowc = 2 Application.ScreenUpdating = False driver.Start “chrome” driver.get “http://demo.guru99.com/test/web-table-element.php” For Each th In driver.FindElementByClass(“dataTable”).FindElementByTag(“thead”).FindElementsByTag(“tr”) cc = 1 For Each t In th.FindElementsByTag(“th”) Sheet2.Cells(1, cc).Value = t.Text cc = cc + 1 Next t Next th End Sub
Step 2: Selenium will now locate the data using the above similar approach.Sub test() Dim driver As New WebDriver Dim rowc, cc, columnC As Integer rowc = 2 Application.ScreenUpdating = False driver.Start “chrome” driver.get “http://demo.guru99.com/test/web-table-element.php” For Each th In driver.FindElementByClass(“dataTable”).FindElementByTag(“thead”).FindElementsByTag(“tr”) cc = 1 For Each t In th.FindElementsByTag(“th”) Sheet2.Cells(1, cc).Value = t.Text cc = cc + 1 Next t Next th For Each tr In driver.FindElementByClass(“dataTable”).FindElementByTag(“tbody”).FindElementsByTag(“tr”) columnC = 1 For Each td In tr.FindElementsByTag(“td”) Sheet2.Cells(rowc, columnC).Value = td.Text columnC = columnC + 1 Next td rowc = rowc + 1 Next tr Application.Wait Now + TimeValue(“00:00:20”) End Sub
Step 3: Once the macro is ready, pass, and assign the subroutine to excel button and exit the module. Label the button as any name.
Step 4: Press the labeled button to get the desired output.
Step 5: Compare the scraped results of Excel file with the results of the Chrome Site.