034. Web Scraping and Data Science

Update: 2022-01-21
Description

Data collection is a crucial step in any data-related project. It matters so much that you have probably encountered the “GIGO” (garbage in, garbage out) concept. Some might even say that having the right data is more important than having tons of data that can’t be used.


Since web scraping is one of the ways to collect data, for this episode we invited Cliff, a data consultant, back to discuss his personal experience with it. He covered topics such as the basics of web scraping, web scraping tools, the challenges he faced while scraping web content, the ethics of web scraping, learning materials, and more!


Resources:



  1. Cliff's Medium post 1: https://medium.com/codex/scraping-singapore-libraries-f74c541f1f94

  2. Cliff's Medium post 2: https://cliffy-gardens.medium.com/iterations-for-my-nlb-scraper-github-code-provided-b4e1f1bd422e

  3. Selenium: https://www.selenium.dev/

  4. BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

  5. TagUI: https://github.com/kelaberetiv/TagUI

  6. Web Scraping with Python: https://www.oreilly.com/library/view/web-scraping-with/9781491985564/

Comments (1)

Denial Brown

Accurate data collection is vital: quality over quantity matters. GIGO reminds us that poor input leads to flawed output, making reliable, usable data essential for meaningful results in any data project. I also read the post "How to scrape google news" (https://groupbwt.com/blog/how-to-scrape-google-news/) to understand how you can use information to obtain data.

Jun 16th

Thu Ya Kyaw & Koo Ping Shung