

news 2025/11/28 4:16:15

Douban Movie Top 250

About the Douban list

The Douban Movie Top 250 is a list on the Douban website of the highest-rated, audience-favorite films. It contains a range of outstanding films covering many genres, countries, and periods.

Requirements

Use Python to scrape the Top 250 films, collect each film's rank, title, star rating, score, and number of ratings, and write the information to an Excel spreadsheet.

Python implementation

Fetching the pages

    def download_all_htmls(index=list(range(0, 250, 25))):
        # The list is paginated 25 films per page: start=0, 25, ..., 225
        htmls = []
        for idx in index:
            url = f"https://movie.douban.com/top250?start={idx}&filter="
            print("craw html:", url)
            # Douban has anti-scraping measures, so send a browser User-Agent header
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
            }
            r = requests.get(url, headers=headers)
            if r.status_code != 200:
                raise Exception("error")
            htmls.append(r.text)
        return htmls

Parsing a single page

    def parse_single_heml(html):
        soup = BeautifulSoup(html, "html.parser")
        # Each film is a <div class="item"> inside <ol class="grid_view">
        article_items = soup.find("div", class_="article")\
            .find("ol", class_="grid_view")\
            .find_all("div", class_="item")
        datas = []
        for article_item in article_items:
            rank = article_item.find("div", class_="pic").find("em").get_text()
            info = article_item.find("div", class_="info")
            title = info.find("div", class_="hd").find("span", class_="title").get_text()
            stars = info.find("div", class_="bd").find("div", class_="star").find_all("span")
            # stars[0] carries the star rating in its CSS class,
            # stars[1] holds the score, stars[3] the number of ratings
            rating_star = stars[0]["class"][0]
            rating_num = stars[1].get_text()
            comments = stars[3].get_text()
            datas.append({
                "rank": rank,
                "title": title,
                "rating_star": rating_star.replace("rating", "").replace("-t", ""),
                "rating_num": rating_num,
                "comments": comments.replace("人评价", "")
            })
        return datas

Scraping everything and writing the result to Excel

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    htmls = download_all_htmls()
    all_datas = []
    for html in htmls:
        all_datas.extend(parse_single_heml(html))
    df = pd.DataFrame(all_datas)
    # Writing .xlsx files with pandas requires the openpyxl package
    df.to_excel("practice03_豆瓣电影top250.xlsx", index=False)

Results

Running the script writes practice03_豆瓣电影top250.xlsx, with one row per film and the columns rank, title, rating_star, rating_num, and comments.
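Every field returned by parse_single_heml is a string, so the Excel columns end up as text. The following is a minimal sketch of one way to convert a row to numeric types before building the DataFrame. The helper name clean_row and the sample row are hypothetical and not part of the original post. The sketch also assumes that a two-digit rating_star code such as "45" stands for 4.5 stars, which should be checked against the live page.

```python
def clean_row(row):
    """Convert one scraped row (all string values) to numeric types."""
    star = row["rating_star"].strip()
    # Assumption: a two-digit code such as "45" means 4.5 stars,
    # while a single digit such as "5" means 5 stars.
    stars = int(star) / 10 if len(star) == 2 else float(star)
    return {
        "rank": int(row["rank"]),
        "title": row["title"],
        "rating_star": stars,
        "rating_num": float(row["rating_num"]),
        "comments": int(row["comments"].strip()),
    }


# Hypothetical sample row in the shape parse_single_heml returns;
# the values are illustrative, not scraped data.
sample = {
    "rank": "1",
    "title": "Example Film",
    "rating_star": "45",
    "rating_num": "9.0",
    "comments": "12345",
}
print(clean_row(sample))
```

In the script above, this could be applied with all_datas = [clean_row(r) for r in all_datas] just before pd.DataFrame(all_datas), so that the spreadsheet columns sort and filter as numbers.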


http://www.dnsts.com.cn/news/231647.html