The Python Crawler Journey: Scrapy

2023-09-06

How do you use yield in Python?
What does yield from mean?
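
As a rough illustration of the two questions above: yield turns a function into a generator that hands back one value at a time and pauses in between, while yield from delegates to another generator or iterable and re-yields its values. The function names count_up and chained are made up for this sketch. Scrapy spiders rely on the same mechanism when start_requests yields requests one by one.

```python
def count_up(limit):
    """Yield 1..limit one value at a time; execution pauses at each yield."""
    n = 1
    while n <= limit:
        yield n
        n += 1


def chained(limit):
    """Delegate to other iterables with yield from."""
    yield from count_up(limit)  # re-yields every value produced by count_up
    yield from ["a", "b"]       # yield from accepts any iterable, not only generators


numbers = list(count_up(3))
combined = list(chained(2))
print(numbers)   # [1, 2, 3]
print(combined)  # [1, 2, 'a', 'b']
```

Without yield from, chained would need an explicit loop of the form "for x in count_up(limit): yield x" for each source.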

Scrapy

The best way to learn is with examples, and Scrapy is no exception.

Common commands
scrapy startproject
scrapy crawl hello
scrapy shell http://www.qq.com

Open a page
Extract data
Save the parameters

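Scrapy overview

Just as web development has frameworks such as Flask and Spring, crawling has frameworks too. Scrapy is one of them: a crawler framework, or, put another way, a crawler engine.

Installation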

pip3 install scrapy

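Usage

We will crawl quotes.toscrape.com as an example.

Step 1: create a crawler project.
scrapy startproject tutorial
This creates a tutorial directory containing scrapy.cfg and a tutorial Python package with items.py, middlewares.py, pipelines.py, settings.py, and a spiders directory.
Step 2: write a spider that crawls the site and extracts data.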


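import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"  # spider name, used as: scrapy crawl quotes

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            # Schedule each URL; its response is handed to parse()
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # URLs end in .../page/<n>/, so element [-2] is the page number
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        # response.body is bytes, hence binary mode
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)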
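
The filename logic in parse can be checked without running a crawl. The sketch below uses only the standard library; the helper name page_filename is hypothetical and simply mirrors the two lines of parse that derive the filename from the URL.

```python
def page_filename(url):
    """Mirror of the filename logic in parse(): take the page number from the URL."""
    parts = url.split("/")
    # The trailing slash leaves an empty last element, so the page number
    # is the second-to-last element.
    page = parts[-2]
    return "quotes-%s.html" % page


url = "http://quotes.toscrape.com/page/1/"
parts = url.split("/")
print(parts)               # ['http:', '', 'quotes.toscrape.com', 'page', '1', '']
print(page_filename(url))  # quotes-1.html
```

Note that this depends on the trailing slash: for a URL without it, element [-2] would be 'page' rather than the page number.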