Web scraping any Facebook (URL | page | group) information with the Python BeautifulSoup library

14 June 2017 at 11:33

Science & Technology

kavin sharma

So folks, when it comes to scraping information from Facebook, the first thought that comes to mind is that we can do it through the Facebook API. That's not quite the case: with the Facebook API we can only pull data for analysis and statistics, such as the number of Likes, Shares and Comments. If you need the actual content or text from Facebook pages, that's a real headache.
You may think the reason is that Facebook is one of the biggest sites around, but that's not it.
The real reason is that most of the content or text on a Facebook page is commented out in the HTML, which means BeautifulSoup cannot see that content as tags when it parses the source code.

So now the question is: how do we get around it?
Here is the answer: you simply need to parse the data with Beautiful Soup two times, once for the page itself and once for the commented-out HTML inside it.
Here is the example URL of a Facebook page: https://www.facebook.com/events/1407771472571452/
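
Before going to the live page, the problem can be seen on a tiny offline snippet. This is only a sketch: the markup in SAMPLE_HTML is made up for illustration (it is not copied from Facebook), and it uses the built-in "html.parser" so it runs without lxml installed.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup, modelled on the pattern described above:
# the real content sits inside an HTML comment within a <code> tag.
SAMPLE_HTML = """
<div class="hidden_elem">
  <code><!-- <div class="_2qgs">Event description text</div> --></code>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Single pass: the div lives inside a comment, so it is not a tag in the tree.
first_pass = soup.find_all("div", {"class": "_2qgs"})
print(first_pass)  # []

# The comment is present, but only as one string node, not as parsed tags.
comment_node = soup.find("code").contents[0]
print(isinstance(comment_node, Comment))  # True
```

The single pass finds nothing, which is why the comment has to be parsed a second time.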
Code:

from urllib.request import urlopen

from bs4 import BeautifulSoup

facebook = "https://www.facebook.com/events/1407771472571452/"

page = urlopen(facebook)
soup = BeautifulSoup(page, 'lxml')

# First pass: collect the hidden containers that hold the commented-out HTML.
data = soup.find_all("div", {"class": "hidden_elem"})

# Second pass: parse each comment as HTML and look for div class="_2qgs".
for item in data:
    code_tag = item.find('code')
    if code_tag is None or not code_tag.contents:
        continue  # this container has no commented-out HTML

    commentedHTML = code_tag.contents[0]
    more_soup = BeautifulSoup(commentedHTML, 'lxml')

    wanted_text = more_soup.find_all('div', {'class': '_2qgs'})
    if wanted_text:
        print(wanted_text[0].text)

Output: the text is now available as wanted_text[0].text
And that's it, this is how we can scrape information from commented-out code :)
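
The loop above only looks at the first child of each code tag inside a hidden_elem div. As a variation (not the method from this post), the same two-pass idea can be wrapped in a small helper that searches every comment in the document. This is a sketch under assumptions: find_in_comments is a hypothetical helper name, the sample markup is invented, and "html.parser" is used so it runs without lxml.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(html, tag, css_class):
    """Return the text of matching tags hidden inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Pass 1: collect every comment node anywhere in the document.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Pass 2: parse the comment body as HTML in its own right.
        inner = BeautifulSoup(str(comment), "html.parser")
        for match in inner.find_all(tag, class_=css_class):
            results.append(match.get_text(strip=True))
    return results


# Made-up sample: one container with hidden content, one empty container.
sample = """
<div class="hidden_elem">
  <code><!-- <div class="_2qgs">First hidden block</div> --></code>
</div>
<div class="hidden_elem"><code></code></div>
"""

print(find_in_comments(sample, "div", "_2qgs"))  # ['First hidden block']
```

Because it filters on the Comment type instead of a fixed position, an empty code tag is simply skipped rather than raising an error.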