(Solved) Teacher!! browser = uc.Chrome() is throwing an error. I added

if __name__ == '__main__':
    freeze_support()

but it still fails. Here is the error output:
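For reference, the guard must use the double-underscore names __name__ and '__main__' (forum formatting often eats the underscores). A minimal, self-contained sketch of the idiom, unrelated to the Scrapy project itself, showing why it matters on Windows where child processes are started with spawn and the main module gets re-imported:

```python
import multiprocessing as mp

def child(q):
    # Runs in the spawned child process. On Windows, spawn re-imports
    # the parent's main module, so any top-level code that is NOT under
    # the __main__ guard would execute again here.
    q.put("ok")

def main():
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    result = q.get()   # receive the value produced by the child
    p.join()
    return result

if __name__ == "__main__":
    # freeze_support() is harmless when the program is not frozen;
    # the guard itself is what prevents the recursive re-import.
    mp.freeze_support()
    print(main())
```

Without the guard, starting the process at module top level on Windows raises exactly the RuntimeError shown below.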
D:\Anaconda3\python.exe "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py" --cmd-line --multiproc --qt-support=auto --client 127.0.0.1 --port 14261 --file D:/eswork-master/articles/main.py
pydev debugger: process 28624 is connecting
Connected to pydev debugger (build 202.6397.98)
2022-02-08 01:55:14 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: articles)
2022-02-08 01:55:14 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 2.8, Platform Windows-10-10.0.17134-SP0
2022-02-08 01:55:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2022-02-08 01:55:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'articles',
 'NEWSPIDER_MODULE': 'articles.spiders',
 'SPIDER_MODULES': ['articles.spiders']}
2022-02-08 01:55:14 [scrapy.extensions.telnet] INFO: Telnet Password: 0597b55467723803
2022-02-08 01:55:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2022-02-08 01:55:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-02-08 01:55:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-02-08 01:55:14 [scrapy.middleware] INFO: Enabled item pipelines:
['articles.pipelines.ElasticsearchPipeline']
2022-02-08 01:55:14 [scrapy.core.engine] INFO: Spider opened
2022-02-08 01:55:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-02-08 01:55:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-02-08 01:55:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com/n/page/1> (referer: None)
2022-02-08 01:55:14 [undetected_chromedriver.patcher] DEBUG: getting release number from /LATEST_RELEASE
2022-02-08 01:55:15 [undetected_chromedriver.patcher] DEBUG: downloading from https://chromedriver.storage.googleapis.com/98.0.4758.80/chromedriver_win32.zip
2022-02-08 01:55:16 [undetected_chromedriver.patcher] DEBUG: unzipping C:\Users\Lenovo\AppData\Local\Temp\tmpzgwnf88f
2022-02-08 01:55:16 [undetected_chromedriver.patcher] INFO: patching driver executable C:\Users\Lenovo\appdata\roaming\undetected_chromedriver\chromedriver.exe
2022-02-08 01:55:21 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: articles)
2022-02-08 01:55:21 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 2.8, Platform Windows-10-10.0.17134-SP0
2022-02-08 01:55:21 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2022-02-08 01:55:21 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'articles',
 'NEWSPIDER_MODULE': 'articles.spiders',
 'SPIDER_MODULES': ['articles.spiders']}
2022-02-08 01:55:21 [scrapy.extensions.telnet] INFO: Telnet Password: 03f6f96c1e07c8f7
2022-02-08 01:55:21 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2022-02-08 01:55:21 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-02-08 01:55:21 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-02-08 01:55:21 [scrapy.middleware] INFO: Enabled item pipelines:
['articles.pipelines.ElasticsearchPipeline']
2022-02-08 01:55:21 [scrapy.core.engine] INFO: Spider opened
2022-02-08 01:55:21 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-02-08 01:55:21 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2022-02-08 01:55:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com/n/page/1> (referer: None)
2022-02-08 01:55:22 [undetected_chromedriver.patcher] DEBUG: getting release number from /LATEST_RELEASE
2022-02-08 01:55:22 [undetected_chromedriver.patcher] DEBUG: downloading from https://chromedriver.storage.googleapis.com/98.0.4758.80/chromedriver_win32.zip
2022-02-08 01:55:23 [undetected_chromedriver.patcher] DEBUG: unzipping C:\Users\Lenovo\AppData\Local\Temp\tmpp_33ux_j
2022-02-08 01:55:24 [undetected_chromedriver.patcher] INFO: patching driver executable C:\Users\Lenovo\appdata\roaming\undetected_chromedriver\chromedriver.exe
2022-02-08 01:55:24 [scrapy.core.scraper] ERROR: Spider error processing <GET https://news.cnblogs.com/n/page/1> (referer: None)
Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "D:\Anaconda3\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\Anaconda3\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "D:\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "D:\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\eswork-master\articles\articles\spiders\pm_spider.py", line 49, in parse
    browser = uc.Chrome()
  File "D:\Anaconda3\lib\site-packages\undetected_chromedriver\__init__.py", line 357, in __init__
    options.binary_location, *options.arguments
  File "D:\Anaconda3\lib\site-packages\undetected_chromedriver\dprocess.py", line 34, in start_detached
    daemon=True,
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
2022-02-08 01:55:24 [scrapy.core.engine] INFO: Closing spider (finished)
2022-02-08 01:55:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 224,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 16567,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 2.543997,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 2, 7, 17, 55, 24, 205104),
 'httpcompression/response_bytes': 80409,
 'httpcompression/response_count': 1,
 'log_count/DEBUG': 4,
 'log_count/ERROR': 1,
 'log_count/INFO': 11,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/RuntimeError': 1,
 'start_time': datetime.datetime(2022, 2, 7, 17, 55, 21, 661107)}
2022-02-08 01:55:24 [scrapy.core.engine] INFO: Spider closed (finished)
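Note the duplicated Scrapy startup banner in the log (01:55:14 and again at 01:55:21): uc.Chrome() launches its chromedriver helper through multiprocessing, and on Windows the spawned child re-imports main.py, so any top-level crawl code runs a second time. The guard therefore has to wrap everything that starts the crawl, not just the freeze_support() call. A hypothetical sketch of main.py (the question does not show its actual contents; the scrapy.cmdline entry point and the spider name 'pm_spider' are assumptions taken from the paths in the log):

```python
# Hypothetical main.py -- entry point and spider name are assumptions.
from multiprocessing import freeze_support

def start_crawl():
    # Import lazily so the spawned helper process can re-import this
    # module without pulling in the crawl machinery at import time.
    from scrapy.cmdline import execute
    execute(['scrapy', 'crawl', 'pm_spider'])

if __name__ == '__main__':
    # Everything that launches the crawl must sit under this guard;
    # otherwise the process spawned by uc.Chrome() re-imports main.py
    # and starts a second crawler, as the duplicated log output shows.
    freeze_support()
    start_crawl()
```

With top-level code out of the way, the child's re-import of main.py becomes harmless and the RuntimeError disappears.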