Contents: Book recommendation · Scraping Tencent Comics data with regular expressions · Displaying the data with Flask

Book recommendation
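If you are interested in Python web scraping, I strongly recommend reading 《Python网络爬虫入门到实战》 (Python Web Crawling: From Beginner to Practice). The book covers both the fundamentals and the advanced techniques of writing crawlers in Python, and I consider it a must-read for crawler developers. For details, see: 《Python网络爬虫入门到实战》 book introduction.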
Scraping Tencent Comics data with regular expressions
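import csv
import re
import threading
from queue import Queue

import requests


def format_html(html):
    # The HTML delimiters in these patterns (angle brackets, quotes, "=") were
    # reconstructed; verify them against the live page markup before relying on them.
    li_pattern = re.compile(r'<li class="ret-search-item clearfix">[\s\S]+?</li>')
    title_pattern = re.compile(r'title="(.*?)"')
    img_src_pattern = re.compile(r'data-original="(.*?)"')
    update_pattern = re.compile(r'<span class="mod-cover-list-text">(.*?)</span>')
    tags_pattern = re.compile(r'<span href="/Comic/all/theme/.*?" target="_blank">(.*?)</span>')
    # "人气" is the page's "popularity" label; the colon after it is optional here.
    popularity_pattern = re.compile(r'<span>人气[：:]?<em>(.*?)</em></span>')

    # Each comic is one <li> element; extract the fields from each in turn.
    items = li_pattern.findall(html)
    for item in items:
        title = title_pattern.search(item).group(1)
        img_src = img_src_pattern.search(item).group(1)
        update_info = update_pattern.search(item).group(1)
        tags = tags_pattern.findall(item)
        popularity = popularity_pattern.search(item).group(1)
        # Queue a list rather than a comma-joined string so that csv.writer
        # can quote fields that themselves contain commas.
        data_queue.put([title, img_src, update_info, "#".join(tags), popularity])


def run(index):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
        }
        response = requests.get(f"https://ac.qq.com/Comic/index/page/{index}", headers=headers)
        html = response.text
        format_html(html)
    except Exception as e:
        print(f"Error occurred while processing page {index}: {e}")
    finally:
        # Free the slot acquired by the main thread, even if the page failed.
        semaphore.release()


if __name__ == "__main__":
    data_queue = Queue()
    semaphore = threading.BoundedSemaphore(5)  # at most 5 pages in flight
    lst_record_threads = []
    for index in range(1, 3):  # pages 1 and 2
        print(f"Scraping page {index}")
        semaphore.acquire()
        t = threading.Thread(target=run, args=(index,))
        t.start()
        lst_record_threads.append(t)
    for rt in lst_record_threads:
        rt.join()
    # Write UTF-8 with a header row, because main.py reads the file as UTF-8
    # and skips the first line. newline="" is what the csv module expects.
    with open("./qq_comic_data.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "img_src", "update_info", "tags", "popularity"])
        while not data_queue.empty():
            writer.writerow(data_queue.get())
    print("Data scraping finished")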
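To check the extraction logic without hitting the network, the patterns can be run against a small hand-written fragment. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup that mimics the structure the patterns expect (the real page may differ), and `first_group` and `parse_items` are hypothetical helper names, not part of the script above. Unlike the script, it falls back to an empty string when a field is missing instead of raising an exception.

```python
import re

# Hypothetical, simplified markup for one comic entry; the live page may differ.
SAMPLE_HTML = '''
<li class="ret-search-item clearfix">
  <a href="/Comic/comicInfo/id/1" title="Sample Comic"><img data-original="https://example.com/cover.jpg"></a>
  <span class="mod-cover-list-text">更新至100话</span>
  <span href="/Comic/all/theme/1" target="_blank">热血</span>
  <span href="/Comic/all/theme/2" target="_blank">冒险</span>
  <span>人气：<em>1.5亿</em></span>
</li>
'''

LI_PATTERN = re.compile(r'<li class="ret-search-item clearfix">[\s\S]+?</li>')
FIELD_PATTERNS = {
    "title": re.compile(r'title="(.*?)"'),
    "img_src": re.compile(r'data-original="(.*?)"'),
    "update_info": re.compile(r'<span class="mod-cover-list-text">(.*?)</span>'),
    "popularity": re.compile(r'<span>人气[：:]?<em>(.*?)</em></span>'),
}
TAGS_PATTERN = re.compile(r'<span href="/Comic/all/theme/.*?" target="_blank">(.*?)</span>')


def first_group(pattern, text, default=""):
    """Return the first capture group, or `default` when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else default


def parse_items(html):
    """Extract one dict per <li> entry found in `html`."""
    rows = []
    for item in LI_PATTERN.findall(html):
        row = {name: first_group(pattern, item) for name, pattern in FIELD_PATTERNS.items()}
        row["tags"] = "#".join(TAGS_PATTERN.findall(item))
        rows.append(row)
    return rows


rows = parse_items(SAMPLE_HTML)
print(rows)
```

Returning a default for a missing field means one malformed entry does not discard the whole page, which is what happens in the script when `search()` returns `None` and the resulting exception is caught in `run`.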
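Displaying the data with Flask
The script above scrapes the data, but I also want to display it in a front end.
The code for main.py is as follows: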
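# coding: utf-8
import csv

from flask import Flask, render_template

app = Flask(__name__)


def read_data_from_csv():
    with open("qq_comic_data.csv", "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        data = list(reader)[1:]  # skip the header row
    # Normalize popularity to a float measured in 亿 (hundreds of millions).
    for row in data:
        popularity = row[4]
        if "亿" in popularity:
            row[4] = float(popularity.replace("亿", ""))
        elif "万" in popularity:
            row[4] = float(popularity.replace("万", "")) / 10000  # convert 万 to 亿
        else:
            # No unit suffix: assume a plain count, so that every row holds a
            # float and the sort below never compares str with float.
            row[4] = float(popularity) / 100_000_000
    # Sort by popularity and keep the top 10 records.
    data.sort(key=lambda x: x[4], reverse=True)
    return data[:10]


@app.route("/")
def index():
    comics = read_data_from_csv()
    return render_template("index.html", comics=comics)


if __name__ == "__main__":
    app.run(debug=True)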
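The unit conversion in `read_data_from_csv` can also be pulled out into a small function that is easy to test on its own. The following is a minimal sketch under stated assumptions: `popularity_in_yi` and `DIVISORS` are hypothetical names, and the handling of values without a unit (treated as a plain count) or of unparseable values (treated as 0) is my assumption about how such rows should rank, not something the site guarantees.

```python
# Divisor that converts a value carrying the given unit suffix into 亿 (10^8).
DIVISORS = {"亿": 1, "万": 10_000}


def popularity_in_yi(text):
    """Convert strings such as '1.5亿' or '8000万' to a float measured in 亿."""
    text = text.strip().replace(",", "")
    divisor = 100_000_000  # assumption: no suffix means a plain count
    if text and text[-1] in DIVISORS:
        divisor = DIVISORS[text[-1]]
        text = text[:-1]
    try:
        return float(text) / divisor
    except ValueError:
        return 0.0  # assumption: unparseable values sort last


# Rows shaped like the CSV: title, img_src, update_info, tags, popularity.
sample_rows = [
    ["C", "", "", "", "12345"],
    ["A", "", "", "", "1.5亿"],
    ["D", "", "", "", "n/a"],
    ["B", "", "", "", "8000万"],
]
ranked = sorted(sample_rows, key=lambda row: popularity_in_yi(row[4]), reverse=True)
print([row[0] for row in ranked])
```

Using the function as a sort key leaves the original strings in the rows untouched, so the template could still show the popularity exactly as it appeared on the site.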
templates/index.html is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Comic Information</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #f4f4f4;
            color: #333;
            line-height: 1.6;
            padding: 20px;
        }
        .container {
            width: 80%;
            margin: auto;
            overflow: hidden;
        }
        h1 {
            text-align: center;
            color: #333;
        }
        .comic {
            background: #fff;
            margin-bottom: 20px;
            padding: 15px;
            border-radius: 10px;
            box-shadow: 0 5px 10px rgba(0,0,0,0.1);
        }
        .comic h2 {
            margin-top: 0;
        }
        .comic p {
            line-height: 1.25;
        }
        .comic:nth-child(even) {
            background: #f9f9f9;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Top 10 Comics by Popularity</h1>
        {% for comic in comics %}
        <div class="comic">
            <h2>{{ comic[0] }}</h2>
            <p><strong>Latest update:</strong> {{ comic[2] }}</p>
            <p><strong>Genre:</strong> {{ comic[3] }}</p>
            <p><strong>Popularity:</strong> {{ comic[4] }}</p>
        </div>
        {% endfor %}
    </div>
</body>
</html>
The result looks like this: