SCRAPING WEBSITES | |
pip install beautifulsoup4 (if not done yet) | installs BeautifulSoup; import it with: from bs4 import BeautifulSoup |
HTML (HyperText Markup Language) | mostly describes how the content is laid out (position and shape) rather than being just the information itself |
tags | |
<name key=value> ….content…. | opening tag; key = value → "attributes" |
</name> | closing tag |
example: <b> hello! </b> | |
<ul> <li> one </li> <li> two </li> <li> three </li> </ul> | <ul> is the parent tag; the <li> tags are its children and siblings of one another |
<div> <div> <a> hi! </a> <b> wow </b> </div> </div> | the inner div, a, and b are all descendants of the first div |
<a> </a> | anchor tag, for links |
html.parser | name of Python's built-in HTML parser; pass it to BeautifulSoup(html, "html.parser") to turn the page into a tree of tags |
methods to investigate tags | |
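.string | attribute, not a method (no parentheses); gives the text content of the tag, or None if the tag has more than one child |
.find() | finds the first descendant of the tag that meets the requirement given in (); returns None if nothing matches. AttributeError: 'NoneType' object has no attribute 'find' → an earlier .find() returned None because the element you were looking for was not present in every loop iteration |
.find_all() | finds all descendants matching the requirement; returns a result set (list of tags) |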
From website to SQL | |
1) Set up a database and connect to it 2) Create tables 3) Insert rows | for all of these, see CheatSheet psql |
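
The three steps above can be sketched in Python as follows. This is only a minimal illustration, not the setup from the psql cheat sheet: it swaps in Python's built-in sqlite3 module for PostgreSQL so that it runs without a database server. The table name items, the helper save_rows, and the example.com rows are hypothetical stand-ins for whatever .find() / .find_all() returned from a real page.

```python
import sqlite3

# Hypothetical stand-in for values scraped with .find() / .find_all().
scraped_rows = [
    ("one", "https://example.com/one"),
    ("two", "https://example.com/two"),
    ("three", "https://example.com/three"),
]


def save_rows(rows, db_path=":memory:"):
    """Store (title, url) pairs and return the open connection."""
    # 1) Set up a database and connect to it (":memory:" = temporary database)
    conn = sqlite3.connect(db_path)
    # 2) Create tables
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT)"
    )
    # 3) Insert; the "?" placeholders keep scraped text from being read as SQL
    conn.executemany("INSERT INTO items (title, url) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = save_rows(scraped_rows)
for title, url in conn.execute("SELECT title, url FROM items ORDER BY id"):
    print(title, url)
conn.close()
```

With PostgreSQL the same three steps apply, but the connection would go through a PostgreSQL driver and the placeholder syntax may differ, so treat the SQL above as an assumption to adapt.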