Connecting to a local Redis service works without any problems, but as soon as I connect to the Redis service on a remote server, I get a flood of "Connection closed by server" errors, crawling becomes extremely slow (roughly one percent of the local speed), and a large number of pages get crawled repeatedly. If I stop using the Bloom filter and switch back to scrapy-redis's original dupefilter implementation, the problem disappears. I have two questions I would like to ask the teacher:
Error message:
2020-05-04 15:44:20 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\twisted\internet\task.py", line 517, in _oneWorkUnit
    result = next(self._iterator)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\scrapy\utils\defer.py", line 74, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\scrapy\core\scraper.py", line 193, in _process_spidermw_output
    self.crawler.engine.crawl(request=output, spider=spider)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\scrapy\core\engine.py", line 216, in crawl
    self.schedule(request, spider)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\scrapy\core\engine.py", line 222, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "C:\Users\airan\Desktop\中间件\spider-distributed\scrapy_APP\scrapy_redis\scheduler.py", line 163, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
  File "C:\Users\airan\Desktop\中间件\spider-distributed\scrapy_APP\scrapy_redis\dupefilter.py", line 107, in request_seen
    self.bf.add(fp)
  File "C:\Users\airan\Desktop\中间件\spider-distributed\scrapy_APP\utils\bloomfilter.py", line 35, in add
    self.redis.setbit(name, hash, 1)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\client.py", line 1777, in setbit
    return self.execute_command('SETBIT', name, offset, value)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\client.py", line 878, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\client.py", line 892, in parse_response
    response = connection.read_response()
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\connection.py", line 734, in read_response
    response = self._parser.read_response()
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\connection.py", line 316, in read_response
    response = self._buffer.readline()
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\connection.py", line 248, in readline
    self._read_from_socket()
  File "C:\Users\airan\Anaconda3\envs\scrapy-redis\lib\site-packages\redis\connection.py", line 193, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
2020-05-04 15:44:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://localhost/4> (referer: http://localhost/2)
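For context, judging only from the frames in the traceback (utils\bloomfilter.py, add(), and the self.redis.setbit(name, hash, 1) call), the Bloom filter presumably looks something like the sketch below. This is a hypothetical reconstruction, not the actual code: the class name, the seed list, the bit size, and the mmh3 hash are all assumptions. It illustrates why a remote Redis behaves so differently: every add() and exists() issues one SETBIT/GETBIT network round trip per seed.

# Hypothetical reconstruction of utils/bloomfilter.py, inferred only from
# the traceback above; names and parameters here are assumptions.
import mmh3
import redis


class BloomFilter:
    def __init__(self, client, key="bloomfilter", bit_size=1 << 30,
                 seeds=(5, 7, 11, 13, 31, 37)):
        self.redis = client          # redis.Redis instance
        self.key = key               # Redis key holding the bit array
        self.bit_size = bit_size     # number of bits in the filter
        self.seeds = seeds           # one hash function per seed

    def add(self, value):
        for seed in self.seeds:
            offset = mmh3.hash(value, seed) % self.bit_size
            # One SETBIT == one network round trip. Against 127.0.0.1 this is
            # sub-millisecond; against a remote server each call pays a full RTT.
            self.redis.setbit(self.key, offset, 1)

    def exists(self, value):
        # Same pattern: one GETBIT round trip per seed.
        return all(
            self.redis.getbit(self.key, mmh3.hash(value, seed) % self.bit_size)
            for seed in self.seeds
        )

    def add_pipelined(self, value):
        # Batched variant: all SETBITs are sent in a single round trip.
        with self.redis.pipeline(transaction=False) as pipe:
            for seed in self.seeds:
                pipe.setbit(self.key, mmh3.hash(value, seed) % self.bit_size, 1)
            pipe.execute()

If the real implementation is similar, the add_pipelined() variant shows how redis-py's standard pipeline() can batch the per-seed commands into a single round trip per fingerprint, which would remove most of the latency gap. For comparison, scrapy-redis's original dupefilter issues a single SADD per fingerprint, which is consistent with the problem disappearing after reverting. Whether pipelining also explains the "Connection closed by server" errors depends on the server's timeout and maxclients settings.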