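I recently decided on a whim to test the Bloom filter. Once the number of URLs gets large, storing them becomes really slow.
It also raises this error: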
Traceback (most recent call last):
  File "D:/pyworkSpace/ENV/scrapy/Scripts/article_spider/article_spider/utils/bloomfilter.py", line 87, in <module>
    if not bf.is_exist(test_url):
  File "D:/pyworkSpace/ENV/scrapy/Scripts/article_spider/article_spider/utils/bloomfilter.py", line 47, in is_exist
    name = self.key + "_" + str(ord(value[0]) % self.blocknum)
IndexError: string index out of range
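
For what it's worth, `value[0]` can only raise this IndexError when `value` is an empty string, so one likely cause (an assumption, since the rest of bloomfilter.py isn't shown) is that an empty `test_url` reached `is_exist`, for example from a blank line in the test data. Below is a minimal sketch of a guard. The function name `block_name` and its parameters are hypothetical; it only mirrors the expression from line 47 of the traceback and is not the course's actual code.

```python
def block_name(key: str, value: str, blocknum: int) -> str:
    """Hypothetical stand-alone version of the expression on line 47.

    Picks a block name from the first character of `value`.
    """
    if not value:
        # Indexing value[0] on an empty string raises IndexError,
        # so reject empty input with a clearer message instead.
        raise ValueError("value must be a non-empty string")
    return key + "_" + str(ord(value[0]) % blocknum)


# Reproduce the original failure: indexing an empty string.
try:
    ""[0]
except IndexError as exc:
    print(exc)  # string index out of range
```

With a guard like this, the caller could skip empty URLs before checking the filter (e.g. `if test_url and not bf.is_exist(test_url):`), assuming empty input really is the trigger here.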