Changes

Listing Page Classifier (view source)

Revision as of 14:12, 17 April 2019


====URL Extraction from HTML====

The goal here is to identify the URLs in the HTML source of a web page. Hyperlinks are defined by the anchor tag <a>, so we first locate every anchor tag and then read its href attribute, which holds the URL we are looking for (see the example below).

<a href="/wiki/Listing_Page_Classifier_Progress" title="Listing Page Classifier Progress"> Progress Log (updated on 4/15/2019)</a>

'''Note:''' The [https://www.crummy.com/software/BeautifulSoup/bs4/doc/ BeautifulSoup] package is used to pull data out of the HTML.
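
The extraction step described above might look roughly like the following. This is only a minimal sketch, assuming the <code>bs4</code> package is installed; the function name <code>extract_links</code> and the <code>sample</code> string are hypothetical and are not taken from the project's code.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href value of every anchor tag that has one.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency beyond bs4 is needed.
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that carry no href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


sample = (
    '<a href="/wiki/Listing_Page_Classifier_Progress" '
    'title="Listing Page Classifier Progress"> Progress Log</a>'
    '<a name="top">anchor without href</a>'
)
print(extract_links(sample))
```

Note that the returned values are the raw href strings, so relative paths such as the one in the example are not resolved against the page's base URL at this stage.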

====Distinguish Internal Links====

NancyYu

