Scrapy shell for spiders: invoking the shell from a spider to inspect responses (27), latest Python Scrapy tutorial for versions 1.51 and above

Posted on: September 7, 2020 December 8, 2022
Categories: Python, scrapy
Tags: core, Crawled, DEBUG, engine, GET, inspect, inspect_response, org, python, referer, Scrapy, scrapy shell, scrapy.core.engine, Scrapy tutorial, shell, crawler, spider

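Sometimes you want to inspect the response being processed at a particular point in your spider, if only to check that the response you expect is actually arriving there.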

This can be done with the scrapy.shell.inspect_response function.

Here is an example of how to call it from a spider:

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)

        # Rest of parsing code.

When you run the spider, you will get something similar to this:

2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
...

>>> response.url
'http://example.org'

You can then check whether the extraction code is working:

>>> response.xpath('//h1[@class="fn"]')
[]

No, it is not. So you can open the response in your web browser and see whether it is the response you were expecting:

>>> view(response)
True
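
One rough way to picture what view(response) does is that the response body is written to a temporary file, which the default browser is then asked to open. The sketch below illustrates that idea with the standard library only. The helpers save_body_for_browser and open_saved_body are hypothetical names introduced for illustration, and this is an assumption-laden approximation, not Scrapy's own implementation.

```python
import tempfile
import webbrowser
from pathlib import Path


def save_body_for_browser(body: bytes, suffix: str = ".html") -> Path:
    """Write raw response bytes to a temporary file and return its path.

    Hypothetical helper: the file is kept on disk (delete=False) so that
    a browser can still read it after this function returns.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(body)
    return Path(handle.name)


def open_saved_body(path: Path) -> bool:
    """Ask the default browser to open the saved file via a file:// URI."""
    return webbrowser.open(path.as_uri())
```

Inside the shell, something like open_saved_body(save_body_for_browser(response.body)) would then show the page roughly as the spider received it, which helps explain why an XPath that works in your own browser session may return nothing here.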

Finally, press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume crawling:

>>> ^D
2014-01-23 17:50:03-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
...

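Note that you cannot use the fetch shortcut here, because the Scrapy engine is blocked by the shell. However, once you leave the shell, the spider resumes crawling from where it stopped, as shown above.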
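
The spider example above picks the response to inspect with a plain substring test, ".org" in response.url, which would also match a URL such as http://example.com/about.org.html. If you want the shell to open only for a specific host, a stricter check on the hostname may help. The following is a minimal sketch using only the standard library; should_inspect is a hypothetical helper name, not part of Scrapy.

```python
from urllib.parse import urlsplit


def should_inspect(url: str, suffix: str = ".org") -> bool:
    """Return True only when the URL's hostname ends with the given suffix.

    Hypothetical helper: it looks at the hostname alone, so a path or
    query string that happens to contain the suffix does not match.
    """
    host = urlsplit(url).hostname or ""
    return host.endswith(suffix)
```

In the spider's parse method, the condition would then read if should_inspect(response.url): followed by the same inspect_response(response, self) call as before.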
