# scrapework

A Python framework for scraping web pages.

### Requirements

* Python 3.4+
* [`requests`](https://pypi.org/project/requests/) module
* [`beautifulsoup4`](https://pypi.org/project/beautifulsoup4/) module

### Getting Started

To run:

```
$ cd git/scrapework
$ python
Python 3.6.2 |Anaconda custom (x86_64)| (default, Sep 21 2017, 18:29:43)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapework
>>> s = scrapework.Scrape(vars)
>>> s.get_pages()
>>> s.parse_files()
>>>
```

The `Scrape` class (called with the placeholder `vars` above) takes the following arguments:

* Base URL of the website to scrape - e.g. `'https://archivesgig.com'`
* Path to the output directory - e.g. `'/Users/username/path/to/folder'` or `'folder'`
* Desired filename for output files - e.g. `'archivesgig'`
* Pagination data (optional): URL structure for paginated pages (e.g. `'/pages/'`), begin page number (e.g. 1), end page number (e.g. 200), and step (e.g. 1)
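
To make the pagination arguments concrete, here is a minimal sketch of how a base URL, a pagination path, and begin/end/step values could expand into a list of page URLs. The helper `build_page_urls` is hypothetical and is not part of scrapework. Treating the end page as inclusive is also an assumption, so check the `Scrape` source for the actual behavior.

```python
def build_page_urls(base_url, page_path=None, begin=1, end=1, step=1):
    """Hypothetical helper (not scrapework's API): list the URLs to fetch."""
    # Without pagination data, only the base URL itself is scraped.
    if page_path is None:
        return [base_url]
    # Normalize slashes so 'https://site.com' + '/pages/' joins cleanly.
    prefix = base_url.rstrip("/") + "/" + page_path.strip("/") + "/"
    # Assumption: the end page number is inclusive.
    return [prefix + str(n) for n in range(begin, end + 1, step)]


if __name__ == "__main__":
    for url in build_page_urls("https://archivesgig.com", "/pages/", 1, 3, 1):
        print(url)
```

With the example values above, this prints `https://archivesgig.com/pages/1` through `https://archivesgig.com/pages/3`.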