Scrapy Item Loader Class in Detail: Nested Loaders (22) - Latest Python Scrapy Tutorial for Version 1.51 and Later

Posted on: September 4, 2020; December 8, 2022
Categories: Python, Scrapy
Tags: add, add_xpath, class, email, footer, href, item, ItemLoader, load, Loader, python, Scrapy, Scrapy tutorial, stuff, loading, nesting, crawler, spider, selector

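Nested loaders are useful when parsing related values from a subsection of a document. Imagine you are extracting details from the footer of a page that looks like this:

Example: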

<footer>
    <a class="social" href="https://facebook.com/whatever">Like Us</a>
    <a class="social" href="https://twitter.com/whatever">Follow Us</a>
    <a class="email" href="mailto:whatever@example.com">Email Us</a>
</footer>

Without nested loaders, you need to specify the full XPath (or CSS) expression for every value you want to extract.

Example:

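from scrapy.loader import ItemLoader

# Item stands for your own Item subclass with 'social' and 'email' fields;
# response is the page being parsed, e.g. inside a spider callback
loader = ItemLoader(item=Item(), response=response)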
# load stuff not in the footer
loader.add_xpath('social', '//footer/a[@class = "social"]/@href')
loader.add_xpath('email', '//footer/a[@class = "email"]/@href')
loader.load_item()

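Instead, you can create a nested loader with the footer selector and add values relative to the footer. The behavior is the same, but you avoid repeating the footer selector.

Example: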

# response is the page being parsed, e.g. inside a spider callback
loader = ItemLoader(item=Item(), response=response)
# load stuff not in the footer
footer_loader = loader.nested_xpath('//footer')
footer_loader.add_xpath('social', 'a[@class = "social"]/@href')
footer_loader.add_xpath('email', 'a[@class = "email"]/@href')
# no need to call footer_loader.load_item()
loader.load_item()

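You can nest loaders arbitrarily, and they work with either XPath or CSS selectors. As a general guideline, use nested loaders when they make your code simpler, but do not go overboard with nesting or your parser can become difficult to read.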
