Hi teacher, I have a Scrapy project that I wrote and ran successfully on my local Windows machine, but after moving it to AWS EC2 I am hitting all sorts of errors. The one I have spent the most time on without solving involves undetected_chromedriver. I have already installed chromedriver correctly on Linux and confirmed it launches.
Package versions:
chromedriver版本: 93.0.4577.63
undetected_chromedriver版本: 3.4.6
The error:
builtins.TypeError: expected str, bytes or os.PathLike object, not NoneType
What I found while searching:
The issue may be that uc.Chrome() needs to be told the driver path on Linux, so I added chrome_driver_binary to the relevant code:
def build_driver():
    options = ChromeOptions()
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--disable-infobars')
    options.add_argument('--disable-popup-blocking')
    options.add_argument("--headless")
    chrome_driver_binary = '/usr/local/bin/chromedriver'
    return uc.Chrome(chrome_options=options, driver_executable_path=chrome_driver_binary)
Question:
I have pointed the driver path at the correct location and opened up permissions on chromedriver with chmod 777, so why does build_driver still end up with a None?
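Since chmod 777 rules out a permissions problem, one check worth doing before anything else is confirming what is actually discoverable on the machine: not just chromedriver, but a Chrome browser binary as well, since undetected_chromedriver has to launch both. A small import-free probe (the candidate names and directories below are assumptions for a typical Linux install, adjust to yours):

```python
import os
import shutil

def find_binary(candidates, extra_dirs=()):
    """Return the first existing executable among the candidate names, or None.

    Looks on PATH first, then in any extra directories supplied.
    """
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    for d in extra_dirs:
        for name in candidates:
            path = os.path.join(d, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
    return None

# Common names for the Chrome browser on Linux (assumed list, not exhaustive):
browser = find_binary(
    ["google-chrome", "google-chrome-stable", "chromium-browser", "chromium"],
    extra_dirs=["/opt/google/chrome"],
)
driver = find_binary(["chromedriver"], extra_dirs=["/usr/local/bin"])
print("browser binary:", browser)
print("chromedriver:  ", driver)
```

If the browser line prints None, the machine has chromedriver but no Chrome, which would explain the failure regardless of how the driver path is configured.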
Besides this undetected_chromedriver issue, what other problems should I watch out for when porting a Scrapy project to Linux?
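On the broader porting question, one recurring difference is that headless Chrome on a Linux server usually needs launch flags that Windows never required. A sketch of such a checklist (the comments give the usual rationale; whether each flag is strictly needed depends on the Chrome build and instance setup):

```python
# Flags commonly needed for headless Chrome on a Linux server such as EC2.
# Treat this as a checklist of candidates, not a definitive list.
LINUX_HEADLESS_FLAGS = [
    "--headless",
    "--no-sandbox",             # Chrome's sandbox often fails on minimal server images
    "--disable-dev-shm-usage",  # /dev/shm is small on EC2 by default; avoids renderer crashes
    "--disable-gpu",            # typical EC2 instances have no GPU
]

def build_arguments(extra=()):
    """Combine the Linux-specific flags with a project's existing ones."""
    return LINUX_HEADLESS_FLAGS + list(extra)

args = build_arguments([
    "--disable-blink-features=AutomationControlled",
    "--disable-infobars",
    "--disable-popup-blocking",
])
```

Each entry would still go through options.add_argument(...) as in build_driver above. Separately, note that chrome_options= is the deprecated Selenium spelling; current undetected_chromedriver releases expect the keyword options=.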
Full error output:
(scrapy-venv) [ec2-user@ip-172-31-10-226 ~]$ python3.9 housecrawler/main.py
/home/ec2-user/housecrawler/items.py:34: ScrapyDeprecationWarning: scrapy.loader.processors.MapCompose is deprecated, instantiate itemloaders.processors.MapCompose instead.
default_input_processor = MapCompose(str.strip)
/home/ec2-user/housecrawler/items.py:36: ScrapyDeprecationWarning: scrapy.loader.processors.MapCompose is deprecated, instantiate itemloaders.processors.MapCompose instead.
price_in = MapCompose(lambda x: x.replace(',', ''))
/home/ec2-user/housecrawler/items.py:38: ScrapyDeprecationWarning: scrapy.loader.processors.Join is deprecated, instantiate itemloaders.processors.Join instead.
default_output_processor = Join()
2023-03-28 02:45:41 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: housecrawler)
2023-03-28 02:45:41 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.7.0, w3lib 2.1.1, Twisted 22.10.0, Python 3.9.13 (main, Mar 27 2023, 06:53:04) - [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)], pyOpenSSL 23.0.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 39.0.0, Platform Linux-5.10.167-147.601.amzn2.x86_64-x86_64-with-glibc2.26
2023-03-28 02:45:41 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'housecrawler',
'COOKIES_ENABLED': False,
'DUPEFILTER_CLASS': 'scrapy_redis_bloomfilter.RFPDupeFilter',
'LOG_FILE': 'rent.log',
'LOG_LEVEL': 'ERROR',
'NEWSPIDER_MODULE': 'housecrawler.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'ROBOTSTXT_OBEY': True,
'SCHEDULER': 'scrapy_redis.scheduler.Scheduler',
'SPIDER_MODULES': ['housecrawler.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
Unhandled error in Deferred:
Traceback (most recent call last):
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 220, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 224, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1947, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1857, in _cancellableInlineCallbacks
_inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
result = context.run(gen.send, result)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 115, in crawl
self.spider = self._create_spider(*args, **kwargs)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 127, in _create_spider
return self.spidercls.from_crawler(self, *args, **kwargs)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy_redis/spiders.py", line 244, in from_crawler
obj = super(RedisSpider, cls).from_crawler(crawler, *args, **kwargs)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/spiders/__init__.py", line 48, in from_crawler
spider = cls(*args, **kwargs)
File "/home/ec2-user/housecrawler/spiders/rent.py", line 44, in __init__
self.page_driver = build_driver()
File "/home/ec2-user/housecrawler/spiders/rent.py", line 43, in build_driver
return uc.Chrome(chrome_options = options, driver_executable_path = chrome_driver_binary)
File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/undetected_chromedriver/__init__.py", line 421, in __init__
browser = subprocess.Popen(
File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.9/subprocess.py", line 1696, in _execute_child
and os.path.dirname(executable)
File "/usr/local/lib/python3.9/posixpath.py", line 152, in dirname
p = os.fspath(p)
builtins.TypeError: expected str, bytes or os.PathLike object, not NoneType
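The last frames of the traceback are telling: the Popen call inside undetected_chromedriver is launching the Chrome browser itself (not chromedriver), and the executable it was handed is None, suggesting the library could not locate a Chrome browser binary on the system. The final TypeError is just os.path.dirname choking on that None, which is easy to reproduce in isolation:

```python
import os

# subprocess._execute_child calls os.path.dirname(executable); passing None
# reproduces the exact error message seen in the log above.
try:
    os.path.dirname(None)
except TypeError as exc:
    message = str(exc)
    print(message)  # expected str, bytes or os.PathLike object, not NoneType
```

So the missing piece is likely the Chrome browser on the EC2 instance rather than anything about the chromedriver path.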