1. Basic usage of requests
1.1 Introduction to requests
requests is a third-party Python library widely used for sending HTTP requests; it greatly simplifies interaction with web services. In the tongue-in-cheek words of its own documentation, it is "the only Non-GMO HTTP library for Python, safe for human consumption".
1.2 Installing requests
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
1.3 Basic syntax
import requests

url = 'http://www.baidu.com'
response = requests.get(url)
1.4 Type and attributes of the response
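(1) One type
print(type(response))  # <class 'requests.models.Response'>
(2) Six attributes
# Set the encoding used to decode the response body
response.encoding = 'utf-8'
# Return the page source as a string
print(response.text)
# Get the URL of the request
print(response.url)
# Return the body as binary data (bytes)
print(response.content)
# Return the HTTP status code
print(response.status_code)
# Get the response headers
print(response.headers)
2. GET requests with requests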
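Here we fetch the Baidu search results page for 郑州 (Zhengzhou). The process is much the same as with urllib, so if you understand urllib, GET requests with requests should pose little difficulty.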
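As a quick offline illustration of what the `params` argument does, the sketch below builds a GET request without sending it and prints the URL that would be requested. It assumes the search path `https://www.baidu.com/s` and the keyword `郑州` from this section; `build_search_url` is a hypothetical helper name, not part of requests.

```python
from urllib.parse import urlencode

import requests


def build_search_url(base_url, params):
    """Return the URL requests would use for a GET with these params.

    Request(...).prepare() only builds the request; nothing is sent.
    """
    prepared = requests.Request('GET', base_url, params=params).prepare()
    return prepared.url


# Inputs assumed from this section's example.
search_url = build_search_url('https://www.baidu.com/s', {'wd': '郑州'})
print(search_url)

# The same query string, built by hand the way urllib code would do it.
manual_query = urlencode({'wd': '郑州'})
print(manual_query)
```

The two printed query strings should match, which is why no manual urlencode step is needed when the parameters are handed to requests as a dict.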
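import requests

url = 'https://www.baidu.com/s?'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36'
}
data = {
    'wd': '郑州'
}
# url: the resource path; params: the query parameters; kwargs: further keyword arguments such as headers
response = requests.get(url=url, params=data, headers=headers)
content = response.text
print(content)
Differences from a GET request with urllib:
(1) Parameters are passed with params.
(2) Parameters do not need to be urlencoded.
(3) No customized request object is needed.
(4) The ? in the resource path can be omitted.
3. POST requests with requests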
We again use the Baidu Translate example from the earlier urllib POST request.
import json

import requests

url = 'https://fanyi.baidu.com/sug'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36',
    # Replace with the cookie string copied from your own browser session
    'cookie': '<your cookie string>',
}
data = {
    'kw': 'eye'
}
response = requests.post(url=url, headers=headers, data=data)
content = response.text
# Parse the JSON text into a Python object
content = json.loads(content)
print(content)
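The following is a minimal sketch of what the `data` argument turns into, assuming the same `https://fanyi.baidu.com/sug` endpoint and `kw` field as above. `build_sug_request` and `fetch_suggestions` are hypothetical helper names; the first only builds the request offline, the second needs network access and is defined but not called here.

```python
import requests

SUG_URL = 'https://fanyi.baidu.com/sug'


def build_sug_request(word):
    """Build (without sending) the form POST used in this section."""
    return requests.Request('POST', SUG_URL, data={'kw': word}).prepare()


def fetch_suggestions(word, headers=None, timeout=10):
    """Send the POST and decode the JSON body in one step.

    response.json() does the same job as json.loads(response.text).
    """
    response = requests.post(SUG_URL, data={'kw': word}, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.json()


prepared = build_sug_request('eye')
print(prepared.method)                   # POST
print(prepared.body)                     # kw=eye
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

requests form-encodes the dict itself, so the manual encode step from the urllib version is not needed.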
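Differences from a POST request with urllib:
(1) The POST data needs no manual encoding or decoding.
(2) The POST parameters are passed with data.
(3) No customized request object is needed.
4. Proxies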
import requests

url = 'http://www.baidu.com/s?'
headers = {
    # 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36',
    # 'cookie': '<your cookie string>',
}
data = {
    'wd': 'ip'
}
# Proxy taken from a proxy pool, keyed by URL scheme
proxy = {
    'http': '23.247.137.142:80'
}
response = requests.get(url=url, params=data, headers=headers, proxies=proxy)
content = response.text
# Save the page so the reported IP can be inspected in a browser
with open('ip.html', 'w', encoding='utf-8') as file:
    file.write(content)
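One detail worth checking offline is which URLs a given `proxies` dict applies to. The dict is keyed by URL scheme, so an entry under `http` covers only `http://` URLs. The sketch below uses `select_proxy` from `requests.utils`, a helper that requests uses internally; calling it directly here is only for illustration, and the second proxy dict is a made-up variation of the one above.

```python
from requests.utils import select_proxy

# The proxy mapping from the example above.
proxies = {'http': '23.247.137.142:80'}

http_choice = select_proxy('http://www.baidu.com/s', proxies)
https_choice = select_proxy('https://www.baidu.com/s', proxies)
print(http_choice)   # 23.247.137.142:80
print(https_choice)  # None: no 'https' key, so this URL would not be proxied

# Hypothetical variation: cover both schemes explicitly.
both = {'http': '23.247.137.142:80', 'https': '23.247.137.142:80'}
print(select_proxy('https://www.baidu.com/s', both))  # 23.247.137.142:80
```

This is why the example in this section requests `http://www.baidu.com/s?`: with only an `http` entry, an `https://` URL would go out directly and show your own IP.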
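5. Logging in with cookies
We use the personal home page of the Gushiwen (古诗文) site as the example; its login form includes a captcha. First open the login page, enter any password, then open the browser's developer tools and look at the login request. Its payload contains several fields, including:
__VIEWSTATE: MnTNH2SbI9isHX8zdfu1NvmByZXoSVf8Vxj5QIeJ5C8EmgWhaBFQRNjQYMe47EqOOss1LSDNdjYeNRy/bdvD7wktgbMm73Cku21k7NhLMYo79CC54kuz//cZ9kSLKKFvkpppzOssnyET3GX789uH1DMUM
__VIEWSTATEGENERATOR: C93BE1AE
These two values are not fixed; they are variables, and code (the captcha) is a variable as well. Handling these three variables is the hard part of this example.
Difficulty 1: __VIEWSTATE and __VIEWSTATEGENERATOR
Back on the login page, inspecting the page source shows that both values are present there, in input elements of type hidden. These are called hidden fields.
Fetch the login page source: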
import requests

url = 'https://www.gushiwen.cn/user/login.aspx?from=http://www.gushiwen.cn/user/collect.aspx'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
content = response.text
Parse the value attributes of __VIEWSTATE and __VIEWSTATEGENERATOR. This can be done with BeautifulSoup or with XPath; here we use XPath:
from lxml import etree

tree = etree.HTML(content)
# xpath() returns a list, so take the first match
viewstate = tree.xpath('//input[@name="__VIEWSTATE"]/@value')[0]
viewstategenerator = tree.xpath('//input[@name="__VIEWSTATEGENERATOR"]/@value')[0]
print(viewstate)
print(viewstategenerator)
Difficulty 2: code (the captcha)
Get the URL of the captcha image:
code = tree.xpath('//img[@id="imgCode"]/@src')[0]
code_url = 'https://so.gushiwen.cn' + code
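String concatenation works only while the `src` attribute is a path starting with `/`. As a hedged alternative, `urllib.parse.urljoin` from the standard library also copes with a `src` that is already a full URL. The `src` values below are made-up examples, not values read from the real page.

```python
from urllib.parse import urljoin

BASE_URL = 'https://so.gushiwen.cn'

# Assumed example of a root-relative src attribute.
rooted_src = '/RandCode.ashx'
joined_rooted = urljoin(BASE_URL, rooted_src)
print(joined_rooted)  # https://so.gushiwen.cn/RandCode.ashx

# Hypothetical case: the src is already an absolute URL.
absolute_src = 'https://cdn.example.com/c.jpg'
joined_absolute = urljoin(BASE_URL, absolute_src)
print(joined_absolute)  # https://cdn.example.com/c.jpg

# Plain concatenation would glue the two URLs together in that case.
print(BASE_URL + absolute_src)
```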
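With the captcha image URL in hand, we download the image, look at it, and type the captcha into the console. pytesseract could also be used to recognize the digits automatically.
import urllib.request

urllib.request.urlretrieve(url=code_url, filename='code.jpg')
code_name = input('Enter the captcha: ')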
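This approach has an obvious problem. The image is downloaded in a request that is unrelated to the later login request, so the server generates a new captcha for the login and the one we typed is already stale. To fix this, use requests.session(): it returns a session object, and every request sent through that object shares the same cookies, so the captcha download and the login submission belong to the same session.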
session = requests.session()
response_code = session.get(code_url)
content_code = response_code.content  # use the binary data, because we are downloading an image
with open('code.jpg', 'wb') as f:  # 'wb' mode writes binary data to the file
    f.write(content_code)
code_name = input('Enter the captcha: ')
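To make the role of the session concrete, here is a small offline sketch. It assumes the server identifies the captcha through a cookie; the cookie name `session_id`, its value, and the captcha text `ABCD` are placeholders invented for the demonstration. Nothing is sent over the network; the requests are only prepared.

```python
import requests

session = requests.session()
# Stand-in for a cookie the server would set while the captcha is downloaded.
session.cookies.set('session_id', 'demo123')

login_request = requests.Request(
    'POST',
    'https://www.gushiwen.cn/user/login.aspx',
    data={'code': 'ABCD'},
)

# Prepared through the session: the stored cookie is attached automatically.
via_session = session.prepare_request(login_request)
# Prepared on its own: no cookie, so the server could not link it to the captcha.
standalone = login_request.prepare()

print(via_session.headers.get('Cookie'))  # session_id=demo123
print(standalone.headers.get('Cookie'))   # None
```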
Send the request that the login button submits:
url_post = 'https://www.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fwww.gushiwen.cn%2fuser%2fcollect.aspx'
data_post = {
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    'from': 'http://www.gushiwen.cn/user/collect.aspx',
    'email': 'your_account',    # replace with your own account
    'pwd': 'your_password',     # replace with your own password
    'code': code_name,
    'denglu': '登录'
}
# Post through the same session that downloaded the captcha
response_post = session.post(url=url_post, headers=headers, data=data_post)
content_post = response_post.text
with open('古诗文.html', 'w', encoding='utf-8') as f:
    f.write(content_post)
The complete code is as follows:
import requests
from lxml import etree

# Step 1: fetch the login page source
url = 'https://www.gushiwen.cn/user/login.aspx?from=http://www.gushiwen.cn/user/collect.aspx'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
content = response.text

# Step 2: extract the hidden fields and the captcha image path
tree = etree.HTML(content)
viewstate = tree.xpath('//input[@name="__VIEWSTATE"]/@value')[0]
viewstategenerator = tree.xpath('//input[@name="__VIEWSTATEGENERATOR"]/@value')[0]
code = tree.xpath('//img[@id="imgCode"]/@src')[0]
code_url = 'https://so.gushiwen.cn' + code

# Step 3: download the captcha through a session
session = requests.session()
response_code = session.get(code_url)
content_code = response_code.content  # use the binary data, because we are downloading an image
with open('code.jpg', 'wb') as f:  # 'wb' mode writes binary data to the file
    f.write(content_code)
code_name = input('Enter the captcha: ')

# Step 4: log in through the same session
url_post = 'https://www.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fwww.gushiwen.cn%2fuser%2fcollect.aspx'
data_post = {
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    'from': 'http://www.gushiwen.cn/user/collect.aspx',
    'email': 'your_account',    # replace with your own account
    'pwd': 'your_password',     # replace with your own password
    'code': code_name,
    'denglu': '登录'
}
response_post = session.post(url=url_post, headers=headers, data=data_post)
content_post = response_post.text
with open('古诗文.html', 'w', encoding='utf-8') as f:
    f.write(content_post)
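The parsing step can be checked without touching the site. The sketch below swaps XPath for the standard library's `html.parser` so that it runs with no extra dependencies, and it feeds in a made-up HTML fragment instead of the real login page. `HiddenFieldParser`, `build_login_payload`, and the sample values are all hypothetical names and data introduced for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'input' and attributes.get('type') == 'hidden':
            name = attributes.get('name')
            if name:
                self.fields[name] = attributes.get('value') or ''


def build_login_payload(page_html, account, password, captcha):
    """Merge the page's hidden fields with the values the user supplies."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload.update({'email': account, 'pwd': password, 'code': captcha, 'denglu': '登录'})
    return payload


# Made-up fragment that imitates the structure of a login form.
SAMPLE_HTML = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <input type="text" name="email" />
</form>
'''

payload = build_login_payload(SAMPLE_HTML, 'your_account', 'your_password', 'ABCD')
print(payload)
```

Collecting every hidden field, rather than naming two of them, means the payload would keep working if the form gained another hidden input; whether the real page behaves that way is an assumption.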