Saturday, March 28, 2015

Data Scraper in Python

Hello All,

We all know that data is among the most valuable things in the world today: whoever has more data has more power over the market. The market is entirely data driven, and I'm sure that in the next couple of decades data will decide the future too, just kidding :)
But trust me, with enough data we can power our recommendation systems to predict far more accurate results. The more data you have, the more value you can get from it.

Since data is important, collecting it is important too. There are a number of data sources available on the net; one just needs to find them and fetch the required information.

So in this post we are going to learn one very popular data collection method: data scraping from the World Wide Web. Today we are going to write a data scraper in Python (3.4.3).

#Import the required libraries
import urllib.request
import re

#Stock symbol list; you could also load it from a file
symbolslist = ["suzlon.bo","unitech.bo","spicejet.bo","idfc6.bo","powergrid6.bo"]

for symbol in symbolslist:
    #URL of the page to scrape
    urlstr = "https://in.finance.yahoo.com/q?s="+symbol
    htmfile = urllib.request.urlopen(urlstr)
    htmtext = htmfile.read().decode('utf-8')
    #Escape the symbol so the dot in it is matched literally
    regex = '<span id="yfs_l84_'+re.escape(symbol)+'">(.+?)</span>'
    pattern = re.compile(regex)
    price = pattern.findall(htmtext)
    #Print the scraped data
    print("The price of",symbol,"is",price)
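
If you want to try the regular expression part on its own, without requesting any page, here is a minimal sketch. The helper name extract_price and the sample_html fragment are assumptions made up for illustration; only the span id pattern comes from the program above.

```python
import re

def extract_price(html, symbol):
    """Return the first price found for symbol, or None if the span is missing."""
    # re.escape keeps the dot in symbols such as "suzlon.bo" literal.
    pattern = re.compile('<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>')
    match = pattern.search(html)
    return match.group(1) if match else None

# Hypothetical page fragment, invented for this example only.
sample_html = '<div><span id="yfs_l84_suzlon.bo">27.45</span></div>'

print(extract_price(sample_html, "suzlon.bo"))   # the text inside the span
print(extract_price(sample_html, "unitech.bo"))  # None, no such span in the sample
```

Keeping the extraction in its own function makes it easy to check the pattern against a saved copy of a page before running it on live data.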

This is just a basic program; you can modify and extend it as per your requirements.
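
As one possible extension, the symbols could come from a text file, and a failed request could be skipped instead of stopping the whole run. The sketch below is only a suggestion: the function names load_symbols and fetch_page, the one-symbol-per-line file layout, and the 10 second timeout are all assumptions of mine.

```python
import os
import tempfile
import urllib.request

def load_symbols(path):
    """Read one stock symbol per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]

def fetch_page(url, timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError) as error:
        # urllib.error.URLError and socket timeouts are OSError subclasses;
        # ValueError covers a malformed URL.
        print("Could not fetch", url, ":", error)
        return None

# Demo with a throwaway file so the sketch runs without any setup.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("# my watchlist\nsuzlon.bo\n\nunitech.bo\n")
symbols = load_symbols(tmp.name)
os.remove(tmp.name)
print(symbols)
```

In the main loop you would then call fetch_page for each symbol and simply continue to the next one whenever it returns None.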

Thanks for visiting, stay tuned for more!!!
