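Taobao has not only added anti-crawler measures but now also requires the user to be logged in. How can this be solved? I hope the Git repository gets updated as well.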
For now, the most direct approach is to add a cookie to the request headers. Reference code:
```python
import json
import re

import requests


def spider_tb(sn, book_list=None):
    # Avoid a mutable default argument: a shared [] would keep
    # accumulating results across calls
    if book_list is None:
        book_list = []

    url = 'https://s.taobao.com/search?q={0}'.format(sn)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        # Paste the cookie string copied from a browser session logged in to Taobao
        'cookie': '<your Taobao cookie>',
    }
    # Fetch the HTML of the search results page
    text = requests.get(url, headers=headers, timeout=10).text
    # Use a regular expression to find the embedded JSON object
    p = re.compile(r'g_page_config = (\{.+\});\s*', re.M)
    rest = p.search(text)
    if rest:
        data = json.loads(rest.group(1))
        bk_list = data['mods']['itemlist']['data']['auctions']
        print(len(bk_list))
        for bk in bk_list:
            title = bk['raw_title']    # title
            price = bk['view_price']   # price
            link = bk['detail_url']    # purchase link
            store = bk['nick']         # seller
            book_list.append({
                'title': title,
                'price': price,
                'link': link,
                'store': store,
            })
            print('{title}:{price}:{link}:{store}'.format(
                title=title, price=price, link=link, store=store))
    else:
        # No g_page_config usually means a login or anti-crawler page
        print('g_page_config not found; check that the cookie is still valid')
    return book_list


if __name__ == '__main__':
    spider_tb('9787115428028')
```
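The parsing step in `spider_tb` can be separated from the network request, which makes it testable without a Taobao session and gives a clear signal when the cookie has stopped working. This is a minimal sketch, assuming the page still embeds `g_page_config` with the `mods.itemlist.data.auctions` layout used above; `parse_search_page` is a hypothetical helper name, not part of the original project.

```python
import json
import re

# Same pattern as in spider_tb: the search page embeds its results
# as a JavaScript assignment "g_page_config = {...};"
PAGE_CONFIG_RE = re.compile(r'g_page_config = (\{.+\});\s*', re.M)


def parse_search_page(html):
    """Return a list of item dicts parsed from a Taobao search page.

    Returns [] when g_page_config is missing or the assumed
    mods -> itemlist -> data -> auctions path is absent, which is
    what a login or anti-crawler page looks like.
    """
    match = PAGE_CONFIG_RE.search(html)
    if not match:
        return []
    node = json.loads(match.group(1))
    # Walk the nested keys defensively instead of chaining [] lookups
    for key in ('mods', 'itemlist', 'data', 'auctions'):
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return []
    return [
        {
            'title': item['raw_title'],
            'price': item['view_price'],
            'link': item['detail_url'],
            'store': item['nick'],
        }
        for item in node
    ]
```

Inside `spider_tb`, the body after the request could then shrink to `book_list.extend(parse_search_page(text))`, and an empty list is a hint to refresh the cookie.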
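Of course, where the cookie comes from is a separate problem. There are several possible solutions:
1. Log in to Taobao in a browser, press F12 to open the developer tools, copy the `cookie` value from the request headers, and paste it into the code.
2. Prepare several Taobao accounts, log in to each one programmatically (simulated login), and collect their cookies into your own cookie pool for the crawler to use.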
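A cookie pool can be as simple as a rotating queue of cookie strings. The sketch below assumes the cookies have already been obtained, either copied by hand from the browser as in option 1 or produced by a login script that is not shown here. `cookie_header_to_dict` and `CookiePool` are hypothetical helpers for illustration, not an existing library API.

```python
from collections import deque


def cookie_header_to_dict(raw):
    """Turn a 'k1=v1; k2=v2' string copied from the browser's dev tools
    into a dict, e.g. for the cookies= argument of requests.get."""
    cookies = {}
    for part in raw.split(';'):
        # Split on the first '=' only; values may themselves contain '='
        name, sep, value = part.strip().partition('=')
        if sep:  # skip fragments without '='
            cookies[name] = value
    return cookies


class CookiePool:
    """Round-robin pool of raw cookie strings, one per (hypothetical) account."""

    def __init__(self, raw_cookies):
        self._pool = deque(raw_cookies)

    def __len__(self):
        return len(self._pool)

    def get(self):
        """Return the next cookie and move it to the back of the queue."""
        if not self._pool:
            raise RuntimeError('cookie pool is empty; log in again to refill it')
        cookie = self._pool[0]
        self._pool.rotate(-1)
        return cookie

    def discard(self, cookie):
        """Drop a cookie that no longer works (e.g. its session expired)."""
        try:
            self._pool.remove(cookie)
        except ValueError:
            pass  # already removed
```

In `spider_tb`, `pool.get()` would replace the hard-coded `'cookie'` header value, and when a response no longer contains `g_page_config`, that cookie can be passed to `pool.discard` before retrying with the next one. Rotating between accounts spreads requests out, but it does not by itself guarantee the anti-crawler checks will pass.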