Scrapy fails with many errors after moving to AWS EC2 (undetected_chromedriver)

Hi teacher, I moved a Scrapy project that was written and running correctly on my local Windows machine to AWS EC2, and it now fails with all kinds of errors. The one I have spent the most time on without success involves undetected_chromedriver. I have already installed chromedriver correctly on Linux and it launches.

Package versions:
chromedriver: 93.0.4577.63
undetected_chromedriver: 3.4.6

The error is:
builtins.TypeError: expected str, bytes or os.PathLike object, not NoneType

What I found and tried:
The problem may be that uc.Chrome() needs the path to my driver on Linux, so I added chrome_driver_binary to the relevant code:

import undetected_chromedriver as uc
from selenium.webdriver import ChromeOptions

def build_driver():
    options = ChromeOptions()
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--disable-infobars')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--headless')
    # Point undetected_chromedriver at the manually installed driver.
    chrome_driver_binary = '/usr/local/bin/chromedriver'
    return uc.Chrome(chrome_options=options,
                     driver_executable_path=chrome_driver_binary)

My question:
I have pointed the driver path at the correct location and opened up permissions on chromedriver with chmod 777, so why does build_driver still end up with a None path?
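One hedged reading of the traceback below: the driver_executable_path you passed is being used, but at undetected_chromedriver/__init__.py line 421 the library launches the Chrome *browser* itself via subprocess.Popen, and that is where os.fspath receives None. That would mean chromedriver alone is not enough; a Chrome/Chromium browser binary must also be installed on the instance, and the library could not auto-detect one. A quick sketch to check what is actually on PATH (find_browser_binary is a hypothetical helper, not part of undetected_chromedriver):

```python
import shutil

# Names Chrome/Chromium commonly installs under on Linux.
CANDIDATES = ("google-chrome", "google-chrome-stable",
              "chromium-browser", "chromium")

def find_browser_binary(candidates=CANDIDATES, which=shutil.which):
    """Return the first Chrome/Chromium binary found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

If this returns None on the EC2 instance, installing Chrome itself (or passing the browser's location explicitly, e.g. via uc.Chrome's browser_executable_path argument in the 3.x series) would be the next thing to try.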

Besides the current undetected_chromedriver issue, what other problems should I watch out for when porting Scrapy to Linux?
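On the porting question: one frequent stumbling block is that Chrome on a headless Linux server usually needs extra flags beyond what works on Windows, notably --no-sandbox (Chrome refuses to start as root without it) and --disable-dev-shm-usage (the small /dev/shm on EC2 instances can crash the renderer). A minimal sketch, assuming a Selenium-style options object with an add_argument method:

```python
# Flags commonly required when running Chrome on a headless Linux server.
LINUX_CHROME_ARGS = [
    "--headless",
    "--no-sandbox",             # Chrome won't start as root without this
    "--disable-dev-shm-usage",  # small /dev/shm on EC2 can crash the renderer
    "--disable-blink-features=AutomationControlled",
]

def apply_linux_args(options, args=LINUX_CHROME_ARGS):
    """Add each headless-Linux flag to a Selenium-style options object."""
    for arg in args:
        options.add_argument(arg)
    return options
```

For example, options = apply_linux_args(ChromeOptions()) before handing the options to uc.Chrome.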

Detailed error output:

(scrapy-venv) [ec2-user@ip-172-31-10-226 ~]$ python3.9 housecrawler/main.py
/home/ec2-user/housecrawler/items.py:34: ScrapyDeprecationWarning: scrapy.loader.processors.MapCompose is deprecated, instantiate itemloaders.processors.MapCompose instead.
  default_input_processor = MapCompose(str.strip)
/home/ec2-user/housecrawler/items.py:36: ScrapyDeprecationWarning: scrapy.loader.processors.MapCompose is deprecated, instantiate itemloaders.processors.MapCompose instead.
  price_in = MapCompose(lambda x: x.replace(',', ''))
/home/ec2-user/housecrawler/items.py:38: ScrapyDeprecationWarning: scrapy.loader.processors.Join is deprecated, instantiate itemloaders.processors.Join instead.
  default_output_processor = Join()
2023-03-28 02:45:41 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: housecrawler)
2023-03-28 02:45:41 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.7.0, w3lib 2.1.1, Twisted 22.10.0, Python 3.9.13 (main, Mar 27 2023, 06:53:04) - [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)], pyOpenSSL 23.0.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 39.0.0, Platform Linux-5.10.167-147.601.amzn2.x86_64-x86_64-with-glibc2.26
2023-03-28 02:45:41 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'housecrawler',
 'COOKIES_ENABLED': False,
 'DUPEFILTER_CLASS': 'scrapy_redis_bloomfilter.RFPDupeFilter',
 'LOG_FILE': 'rent.log',
 'LOG_LEVEL': 'ERROR',
 'NEWSPIDER_MODULE': 'housecrawler.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler',
 'SPIDER_MODULES': ['housecrawler.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
Unhandled error in Deferred:

Traceback (most recent call last):
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 220, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 224, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 115, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/crawler.py", line 127, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy_redis/spiders.py", line 244, in from_crawler
    obj = super(RedisSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/scrapy/spiders/__init__.py", line 48, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/ec2-user/housecrawler/spiders/rent.py", line 44, in __init__
    self.page_driver = build_driver()
  File "/home/ec2-user/housecrawler/spiders/rent.py", line 43, in build_driver
    return uc.Chrome(chrome_options = options, driver_executable_path = chrome_driver_binary)
  File "/home/ec2-user/scrapy-venv/lib/python3.9/site-packages/undetected_chromedriver/__init__.py", line 421, in __init__
    browser = subprocess.Popen(
  File "/usr/local/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.9/subprocess.py", line 1696, in _execute_child
    and os.path.dirname(executable)
  File "/usr/local/lib/python3.9/posixpath.py", line 152, in dirname
    p = os.fspath(p)
builtins.TypeError: expected str, bytes or os.PathLike object, not NoneType

1 Answer

bobby 2023-03-29 21:12:52

Are the Python versions on your working local machine and on the server the same?

  • OP weixin_慕仙7241916 #1
    The versions match, and the requirements are identical too.
    2023-03-29 23:29:22
  • bobby → OP weixin_慕仙7241916 #2
    Take a look at this: https://ask.replit.com/t/undetected-chromedriver-path/14905
    2023-03-31 14:23:01