使用请求维护 Web 抓取会话
保持网络抓取会话以保持 cookie 和其他参数是个好主意。此外,由于 requests.Session
将底层 TCP 连接重用于主机,因此可以提高性能 :
import requests
with requests.Session() as session:
# all requests through session now have User-Agent header set
session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
# set cookies
session.get('http://httpbin.org/cookies/set?key=value')
# get cookies
response = session.get('http://httpbin.org/cookies')
print(response.text)