GitHub - adbar/trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Discovered: Aug 4, 2022 15:07 GitHub - adbar/trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments <– QUOTE: Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. <— via hrbrmstr’s newsletter