4-9 Callback functions are not executed

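Following the course, I built my own spider to crawl IMDb data. When I debug it using the steps from the video, Step Into takes the program into the trackerf.py module, and Step Over makes the program end right away. When I put breakpoints inside the two parse_detail functions, the program never stops at them and simply runs to completion. What could be going wrong?
Here is my code: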
import re

import scrapy
from scrapy.http import Request


class ImdbSpider(scrapy.Spider):
    name = 'imdb'
    allowed_domains = ['www.imdb,com']
    start_urls = ['https://www.imdb.com/title/tt7605074/?ref_=fn_al_tt_1']

    def parse(self, response):
        faq_url = "https://www.imdb.com/title/tt7605074/faq?ref_=tt_ql_op_2"
        user_review_url = "https://www.imdb.com/title/tt7605074/reviews?ref_=tt_ql_3"
        yield Request(url=faq_url, callback=self.parse_detail_faq)
        yield Request(url=user_review_url, callback=self.parse_detail_Reviews)

    # Extract the individual fields of each detail page

    def parse_detail_faq(self, response):
        # Match the FAQ questions
        q_list = response.xpath('//div[@class="faq-question-text"]/text()').extract()
        # This extracts the FAQ answers, but they are mixed with a lot of HTML and need cleaning
        f_list = response.css("div.ipl-hideable-container p").extract()
        # Extract the field holding the total number of FAQs
        num = response.css("div.ipl-itemcount-header::text").extract()
        # Reduce the FAQ total to a plain number
        re_match = re.match(r".*?(\d+).*", num[0].strip())
        faq_num = re_match.group(1)

    def parse_detail_Reviews(self, response):
        # Match the user-review dates
        date_list = response.xpath('//span[@class="review-date"]/text()').extract()
        # Use a regular expression to split each date into day, month and year
        # The data here is incomplete; paging through the results in a loop is probably needed
        for date in date_list:
            result = re.match(r'^(\d+)\s*(.*?)\s*(\d+)', date)
            if result:
                day = result.group(1)
                month = result.group(2)
                year = result.group(3)
        # Review texts for the limited set of dates above; incomplete, contains HTML and needs cleaning
        text_list = response.xpath("//div[@class='text show-more__control']").extract()
        # The rest has to be loaded dynamically via AJAX

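I also tried debugging in VS. Step Into fails immediately with the error: "The current stack frame was not found in a loaded module. Source cannot be shown for this location."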
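
One thing that may be worth checking: the allowed_domains entry in the code above contains a comma instead of a dot ('www.imdb,com'). As far as I understand, Scrapy's offsite filtering drops requests whose host does not match any entry in allowed_domains, while the initial start_urls requests are not filtered. That would explain parse running and both callbacks never being reached. The snippet below is only a simplified sketch of such a host check, not Scrapy's actual implementation; is_offsite is a hypothetical helper written for illustration.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Hypothetical, simplified allowed-domains check (not Scrapy's real code).

    A URL counts as on-site when its host equals an allowed domain
    or is a subdomain of one.
    """
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        if host == domain or host.endswith("." + domain):
            return False
    return True


faq_url = "https://www.imdb.com/title/tt7605074/faq?ref_=tt_ql_op_2"

# With the comma typo the host never matches, so the request would be dropped.
print(is_offsite(faq_url, ["www.imdb,com"]))  # True

# With the dot restored, the same request passes the check.
print(is_offsite(faq_url, ["www.imdb.com"]))  # False
print(is_offsite(faq_url, ["imdb.com"]))      # False
```

If this is the cause, running the spider with debug logging enabled should show a message about a filtered offsite request, and changing the entry to 'www.imdb.com' (or 'imdb.com') should let the callbacks run.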

慕粉9212132 2019-02-23 14:21:01
