Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites.
There are several modules that make it easy to scrape web pages in Python.
- webbrowser: Comes with Python and opens a browser to a specific page.
- Requests: Downloads files and web pages from the Internet.
- Beautiful Soup: Parses HTML, the format that web pages are written in.
- Selenium: Launches and controls a web browser. It can fill in forms, simulate mouse clicks, and do other interesting things.
See the official Beautiful Soup website and the Selenium Python docs for details.
I wrote a small script for a friend of mine last month. His team builds small planes. They collect the data for their models (battery type, motor manufacturer, propeller type, etc.) from a German website.
The site lists about 60 battery types, 172 controller types, 22 propeller types, and many different motor manufacturers, each with its own motor types. All of this data appears in drop-down menus. His objective is to get the maximum thrust for an input power <= 1000 W.
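To give a feel for the first step, here is a minimal sketch of reading the choices out of drop-down menus. The HTML fragment, the menu names and the option labels are all made up for illustration, since the real site's markup is not shown here. The sketch uses Python's built-in html.parser so it runs without installing anything; Beautiful Soup does the same job in fewer lines.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the real site's markup will differ.
SAMPLE_HTML = """
<select name="battery">
  <option value="1">LiPo 3S 2200mAh</option>
  <option value="2">LiPo 4S 5000mAh</option>
</select>
<select name="propeller">
  <option value="10">APC 10x5</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect the visible text of every <option>, grouped by <select> name."""

    def __init__(self):
        super().__init__()
        self.menus = {}           # select name -> list of option labels
        self._current = None      # name of the <select> we are inside
        self._in_option = False   # True while between <option> and </option>

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._current = dict(attrs).get("name")
            self.menus[self._current] = []
        elif tag == "option" and self._current is not None:
            self._in_option = True
            self.menus[self._current].append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag == "select":
            self._current = None

    def handle_data(self, data):
        # Only text inside an <option> is part of a menu label.
        if self._in_option:
            self.menus[self._current][-1] += data.strip()


def extract_menus(html):
    """Return a dict mapping each menu name to its list of option labels."""
    parser = OptionCollector()
    parser.feed(html)
    parser.close()
    return parser.menus


menus = extract_menus(SAMPLE_HTML)
print(menus)
```

In a real run the HTML string would come from a downloaded page (for example via Requests) instead of a hard-coded sample.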
Trying every possible combination by hand is not humanly feasible. This is where web scraping comes in: extracting and manipulating the content of a website to obtain the desired result.
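To see why a script helps: 60 batteries x 172 controllers x 22 propellers is already 227,040 combinations before motors are counted. Below is a rough sketch of the search itself, not the actual script. The component names, the power and thrust numbers, and the lookup() function are all hypothetical; in practice lookup() would submit one combination to the site and read back the result.

```python
from itertools import product

# Hypothetical component lists; the real ones would be scraped from the site.
BATTERIES = ["LiPo 3S", "LiPo 4S"]
MOTORS = ["Motor A", "Motor B"]
PROPELLERS = ["10x5", "12x6"]

MAX_POWER_W = 1000

# Made-up results: (input power in watts, thrust in grams) per combination.
RESULTS = {
    ("LiPo 3S", "Motor A", "10x5"): (450, 1500),
    ("LiPo 3S", "Motor A", "12x6"): (600, 1900),
    ("LiPo 3S", "Motor B", "10x5"): (500, 1700),
    ("LiPo 3S", "Motor B", "12x6"): (700, 2300),
    ("LiPo 4S", "Motor A", "10x5"): (800, 2600),
    ("LiPo 4S", "Motor A", "12x6"): (1050, 3300),
    ("LiPo 4S", "Motor B", "10x5"): (950, 3000),
    ("LiPo 4S", "Motor B", "12x6"): (1200, 3800),
}


def lookup(battery, motor, propeller):
    """Stand-in for querying the site with one combination."""
    return RESULTS[(battery, motor, propeller)]


def best_combination(max_power=MAX_POWER_W):
    """Return (combo, thrust, power) with the highest thrust within max_power.

    Returns None if no combination stays within the power limit.
    """
    best = None
    for combo in product(BATTERIES, MOTORS, PROPELLERS):
        power, thrust = lookup(*combo)
        if power > max_power:
            continue  # over the power budget, skip it
        if best is None or thrust > best[1]:
            best = (combo, thrust, power)
    return best


print(best_combination())
```

itertools.product generates every combination lazily, so the loop never has to hold the full list in memory.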
The code is in my web scraping repository on git.