python main.py
The output after running is as follows:
2019-04-13 23:07:56 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: ArticleSpider)
2019-04-13 23:07:56 [scrapy.utils.log] INFO: Versions: lxml 4.3.1.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1a 20 Nov 2018), cryptography 2.5, Platform Windows-10-10.0.10586-SP0
2019-04-13 23:07:56 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'ArticleSpider', 'NEWSPIDER_MODULE': 'ArticleSpider.spiders', 'SPIDER_MODULES': ['ArticleSpider.spiders']}
2019-04-13 23:07:56 [scrapy.extensions.telnet] INFO: Telnet Password: a971135b73a32091
2019-04-13 23:07:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-04-13 23:07:57 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-13 23:07:57 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-13 23:07:57 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-04-13 23:07:57 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:53584/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"extensions": [], "args": []}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"extensions": [], "args": []}}}
2019-04-13 23:07:57 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 127.0.0.1:53584
DevTools listening on ws://127.0.0.1:53591/devtools/browser/0c0bdaf5-a82c-49ec-901d-fa3ee5822547
2019-04-13 23:07:59 [urllib3.connectionpool] DEBUG: http://127.0.0.1:53584 "POST /session HTTP/1.1" 200 996
2019-04-13 23:07:59 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-13 23:07:59 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:53584/session/b3cc85c75d6d1d37e96c0732f35b89e8/url {"url": "https://www.zhihu.com/signin", "sessionId": "b3cc85c75d6d1d37e96c0732f35b89e8"}
2019-04-13 23:08:00 [urllib3.connectionpool] DEBUG: http://127.0.0.1:53584 "POST /session/b3cc85c75d6d1d37e96c0732f35b89e8/url HTTP/1.1" 200 72
2019-04-13 23:08:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-13 23:08:00 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:53584/session/b3cc85c75d6d1d37e96c0732f35b89e8/elements {"using": "css selector", "value": ".SignFlow-accountInput. Input-wrapper input", "sessionId": "b3cc85c75d6d1d37e96c0732f35b89e8"}
2019-04-13 23:08:00 [urllib3.connectionpool] DEBUG: http://127.0.0.1:53584 "POST /session/b3cc85c75d6d1d37e96c0732f35b89e8/elements HTTP/1.1" 200 310
2019-04-13 23:08:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
Unhandled error in Deferred:
2019-04-13 23:08:00 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "C:\Users\01\Envs\article\lib\site-packages\scrapy\crawler.py", line 172, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\01\Envs\article\lib\site-packages\scrapy\crawler.py", line 176, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\01\Envs\article\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\01\Envs\article\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "C:\Users\01\Envs\article\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\01\Envs\article\lib\site-packages\scrapy\crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "F:\spider_pj\ArticleSpider\ArticleSpider\spiders\zhihu.py", line 16, in start_requests
    browser.find_elements_by_css_selector(".SignFlow-accountInput. Input-wrapper input").send_keys("18073696767")
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 614, in find_elements_by_css_selector
    return self.find_elements(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=73.0.3683.103)
(Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.10586 x86_64)
2019-04-13 23:08:00 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\01\Envs\article\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\01\Envs\article\lib\site-packages\scrapy\crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "F:\spider_pj\ArticleSpider\ArticleSpider\spiders\zhihu.py", line 16, in start_requests
    browser.find_elements_by_css_selector(".SignFlow-accountInput. Input-wrapper input").send_keys("18073696767")
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 614, in find_elements_by_css_selector
    return self.find_elements(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\01\Envs\article\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=73.0.3683.103)
(Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.10586 x86_64)
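
Reading the traceback, the exception is raised from line 16 of zhihu.py, and the selector sent to the driver is ".SignFlow-accountInput. Input-wrapper input". The space after the second dot leaves a "." with no class name behind it, which is probably what Chrome rejects as an invalid selector. A second problem would likely show up right after that one is fixed: find_elements_by_css_selector returns a list, and a list has no send_keys method. Below is a minimal sketch of one possible correction. It assumes Input-wrapper is a class on an element nested inside .SignFlow-accountInput (if both classes sit on the same element, the selector would be ".SignFlow-accountInput.Input-wrapper input" instead). The names fill_account, has_dangling_dot, and ACCOUNT_INPUT are hypothetical helpers made up for this illustration, not part of Selenium or Scrapy.

```python
import re

# WebDriver locator strategy name, as seen in the "using" field of the log above.
# Selenium's By.CSS_SELECTOR constant has this same string value.
CSS_SELECTOR = "css selector"

# Assumption: Input-wrapper is a class on an element nested inside
# .SignFlow-accountInput, so a descendant combinator (a space) separates them.
ACCOUNT_INPUT = ".SignFlow-accountInput .Input-wrapper input"


def has_dangling_dot(selector):
    """Heuristic check: True if a '.' is followed by whitespace or ends the selector."""
    return re.search(r"\.(\s|$)", selector) is not None


def fill_account(browser, account):
    """Type `account` into the first element matching ACCOUNT_INPUT."""
    # find_element returns a single element; find_elements returns a list,
    # and a list has no send_keys method.
    field = browser.find_element(CSS_SELECTOR, ACCOUNT_INPUT)
    field.send_keys(account)
    return field
```

Inside start_requests this would be called as fill_account(browser, "your-account"), where browser is the existing Chrome driver instance. The has_dangling_dot check is only a rough guard for this particular typo; it does not validate CSS selectors in general.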