1
15874103329 OP I wrote this following the tutorial, but my code only scrapes one page and then stops. Could someone take a look and help me figure out why?
|
2
15874103329 OP Any help would be appreciated
|
3
Leigg 2019-01-10 00:00:46 +08:00 via iPhone
Print out `next` and see what it actually contains
|
4
carry110 2019-01-10 04:48:30 +08:00 via iPhone
On the line that extracts `next`, try it without `extract_first()`.
|
5
carry110 2019-01-10 10:55:22 +08:00
Remove the `if next:` check and it works, I tested it myself!
|
6
15874103329 OP
|
7
15874103329 OP @carry110
Mine still only prints one page, I don't know what's going on
|
8
Leigg 2019-01-10 12:09:33 +08:00 via iPhone
Post the log from when scrapy shuts down
|
9
15874103329 OP @Leigg
2019-01-10 11:35:18 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'http': <GET http://http//quotes.toscrape.com/page/2>
2019-01-10 11:35:18 [scrapy.core.engine] INFO: Closing spider (finished)
2019-01-10 11:35:18 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 446,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2701,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 1, 10, 3, 35, 18, 314550),
 'item_scraped_count': 10,
 'log_count/DEBUG': 14,
 'log_count/INFO': 7,
 'offsite/domains': 1,
 'offsite/filtered': 9,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2019, 1, 10, 3, 35, 14, 371325)}
2019-01-10 11:35:18 [scrapy.core.engine] INFO: Spider closed (finished)
|
10
15874103329 OP Solved it. Changed the code to yield scrapy.http.Request(url, callback=self.parse, dont_filter=True)
|
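A note on the fix above: the malformed URL in the floor-9 log (`http://http//quotes.toscrape.com/page/2`) suggests the next-page URL was built by prepending a scheme onto the extracted href, so the offsite middleware parsed the "domain" as `http` and filtered the request. `dont_filter=True` makes the middleware let the request through, but the cleaner fix is to resolve the href against the current page with `response.urljoin()`. A minimal sketch of the difference, using the standard library's `urllib.parse.urljoin` (which `response.urljoin` wraps); the `base` and `href` values are assumptions modeled on the quotes.toscrape.com tutorial, not the OP's actual code:

```python
from urllib.parse import urljoin

# URL of the page the spider just fetched (assumed).
base = "http://quotes.toscrape.com/page/1/"

# Relative next-page href as it typically appears in the page markup (assumed).
href = "/page/2/"

# Naive concatenation turns the first path segment into the "host",
# which is how a filter can end up seeing a bogus domain.
naive = "http://" + href.lstrip("/")
print(naive)   # http://page/2/ -- wrong host

# urljoin resolves the href against the current page's URL; this is
# what Scrapy's response.urljoin() does internally.
fixed = urljoin(base, href)
print(fixed)   # http://quotes.toscrape.com/page/2/
```

With the URL built this way, the request stays on the allowed domain and no `dont_filter=True` is needed.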