Using Python to Scrape CSDN Hot-List Articles and Analyze the Latest Technology Trends

ImAlex, posted on 2024/10/29 18:41:13
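[Abstract] This article analyzes the list of articles on CSDN's hot list to identify which technologies are developing quickly and which problems are drawing wide discussion among developers. The results are a useful reference for learning and research.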

A small web scraper collects the hot-list data that this analysis is based on.

I. Scraping Target

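In a fast-changing technology landscape, it matters to keep up with industry news and technology trends. CSDN is one of the largest IT technology communities in China and hosts a large number of technical articles and experience reports written by developers. Our goal is to scrape the articles on CSDN's hot list to see which content and topics currently draw the most attention in the community. From this data we can tell which technologies are growing fast and which problems developers are actively discussing.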

The approach and code in this article are for learning and research purposes only and follow CSDN's terms of use.

II. Code Environment

The dependencies and configuration used in this article are listed below.

2.1 Programming Language

  • Python: chosen for its rich library ecosystem and concise syntax, which make it well suited to scraping and data processing.

2.2 Third-Party Libraries

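  • Requests: sends HTTP requests and retrieves page content. It is simple to use, supports all common HTTP methods, and is the de facto standard HTTP client in Python.
  • BeautifulSoup: parses HTML and XML documents to extract data. It works with several underlying parsers and copes well with complex HTML structures.
  • lxml: a high-performance XML/HTML parsing library built on libxml2 and libxslt. It processes large documents efficiently and suits complex extraction tasks.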

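Install the libraries listed above with:

pip install requests beautifulsoup4 lxml

The scripts in Part III only import requests, because the hot-list endpoint returns JSON. BeautifulSoup and lxml are needed only if you also parse HTML pages.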

2.3 Environment Setup

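  • Python version: Python 3.7 or later is recommended for compatibility and performance.
  • IDE: an integrated development environment such as PyCharm or VS Code is recommended. Both offer good support for editing, debugging, and running code.
    • PyCharm: strong code completion, debugging, and project management, well suited to large projects.
    • VS Code: a lightweight editor with a rich plugin ecosystem that you can configure as needed.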

III. Code in Practice

3.1 Interface Analysis

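The CSDN site-wide hot list page is at https://blog.csdn.net/rank/list. Open it in Chrome and press F12 to open the developer tools. Then switch to the Network tab as shown below and select the Fetch/XHR filter.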
[Figure: Chrome DevTools, Network tab with the Fetch/XHR filter selected]

With that set up, refresh the page. As shown below, a request named hot-rank appears.
[Figure: the hot-rank request in the Network request list]

3.2 Interface Parameter Analysis

Click the hot-rank request to view its details.
[Figure: details of the hot-rank request]

As shown above, the full request URL is https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type=
[Figure: the Response tab of the hot-rank request]

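Switching to the Response tab, as shown above, displays the endpoint's result, reproduced below. It contains 25 hot-list articles. Each entry includes:

  • the current hot-list period
  • the hot score
  • the author's nickname and username
  • the article URL
  • the view count (viewCount), like count (favorCount), and comment count (commentCount)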

{
    "code": 200,
    "message": "success",
    "traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
    "data": [
        {
            "period": "2024-10-29-12",
            "hotRankScore": "21673",
            "pcHotRankScore": "2.2w",
            "loginUserIsFollow": false,
            "nickName": "平凡程序猿~",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2ee6469786394599b70d5357c55e4e4a_2302_81410974.jpg!1",
            "userName": "2302_81410974",
            "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
            "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
            "commentCount": "75",
            "favorCount": "90",
            "viewCount": "1151",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/b229b47116c7421b9487dc4d9640b8d8.jpeg"
            ],
            "isNew": null,
            "productId": "143285977",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "19924",
            "pcHotRankScore": "2.0w",
            "loginUserIsFollow": true,
            "nickName": "诡异森林。",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/1071f5d6f89d42d6ae79f0b04ab9ac4d_m0_74068921.jpg!1",
            "userName": "m0_74068921",
            "articleTitle": "Docker:技术架构的演进之路",
            "articleDetailUrl": "https://blog.csdn.net/m0_74068921/article/details/143252466",
            "commentCount": "64",
            "favorCount": "92",
            "viewCount": "1914",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/a473a9ccc3444e92b4bd76e3ab238c54.png"
            ],
            "isNew": null,
            "productId": "143252466",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "19525",
            "pcHotRankScore": "2.0w",
            "loginUserIsFollow": true,
            "nickName": "程序猿进阶",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/6f2e9637836e45a884cab077791f1478_zhengzhaoyang122.jpg!1",
            "userName": "zhengzhaoyang122",
            "articleTitle": "web3.0 开发实践",
            "articleDetailUrl": "https://blog.csdn.net/zhengzhaoyang122/article/details/143272424",
            "commentCount": "67",
            "favorCount": "71",
            "viewCount": "2063",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/2e02bd7403b643048df326467be6dc30.png"
            ],
            "isNew": null,
            "productId": "143272424",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "18719",
            "pcHotRankScore": "1.9w",
            "loginUserIsFollow": false,
            "nickName": "nantangyuxi",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/bcce2794bbfb446180c3734639950b78_xiaoxingkongyuxi.jpg!1",
            "userName": "xiaoxingkongyuxi",
            "articleTitle": "MATLAB实现基于CNN-BiLSTM卷积双向长短期记忆神经网络的时间序列预测",
            "articleDetailUrl": "https://blog.csdn.net/xiaoxingkongyuxi/article/details/143263563",
            "commentCount": "1",
            "favorCount": "22",
            "viewCount": "801",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/5f55baeb85aa4c2b93c19fdd7ec1ea7d.jpeg"
            ],
            "isNew": null,
            "productId": "143263563",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "17018",
            "pcHotRankScore": "1.7w",
            "loginUserIsFollow": false,
            "nickName": "虞书欣的6",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/8f9e3e31879a4c1f93bc7caca17f3ffc_cxh666888_.jpg!1",
            "userName": "cxh666888_",
            "articleTitle": "Python小游戏14——雷霆战机",
            "articleDetailUrl": "https://blog.csdn.net/cxh666888_/article/details/143270011",
            "commentCount": "2",
            "favorCount": "32",
            "viewCount": "1959",
            "hotComment": null,
            "picList": [
                "https://img-blog.csdnimg.cn/adc77680bd174e358868931dd9720f49.jpg"
            ],
            "isNew": null,
            "productId": "143270011",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "14736",
            "pcHotRankScore": "1.5w",
            "loginUserIsFollow": true,
            "nickName": "小ᶻZ࿆",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/fcce75c8147f48338d288437f2ce7f8e_2201_75539691.jpg!1",
            "userName": "2201_75539691",
            "articleTitle": "【AIGC】从CoT到BoT:AGI推理能力提升24%的技术变革如何驱动ChatGPT未来发展",
            "articleDetailUrl": "https://blog.csdn.net/2201_75539691/article/details/143277081",
            "commentCount": "110",
            "favorCount": "100",
            "viewCount": "1286",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/392caf61274b4c2b87aec87b7cec5b09.jpeg"
            ],
            "isNew": null,
            "productId": "143277081",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12787",
            "pcHotRankScore": "1.3w",
            "loginUserIsFollow": true,
            "nickName": "2的n次方_",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/06acfd433ab0467f918983afeee463dd_2202_76097976.jpg!1",
            "userName": "2202_76097976",
            "articleTitle": "【Spring MVC】请求参数的传递",
            "articleDetailUrl": "https://blog.csdn.net/2202_76097976/article/details/143088438",
            "commentCount": "84",
            "favorCount": "80",
            "viewCount": "1353",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/ec823407c72f4d1eba176532fa523a60.png"
            ],
            "isNew": null,
            "productId": "143088438",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12517",
            "pcHotRankScore": "1.3w",
            "loginUserIsFollow": true,
            "nickName": "程序边界",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/ea275d892df44c5fb722c5756f8ba98b_qq_32682301.jpg!1",
            "userName": "qq_32682301",
            "articleTitle": "AIGC时代的数据盛宴:R语言引领数据分析新风尚",
            "articleDetailUrl": "https://blog.csdn.net/qq_32682301/article/details/143290891",
            "commentCount": "40",
            "favorCount": "56",
            "viewCount": "738",
            "hotComment": null,
            "picList": [
                "https://img-blog.csdnimg.cn/img_convert/1556a4ddabef4ac5b908fba2a47d3eee.png"
            ],
            "isNew": null,
            "productId": "143290891",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12246",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": true,
            "nickName": "小林熬夜学编程",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/54f236b0001841bd8615c85217b2ec93_2201_75584283.jpg!1",
            "userName": "2201_75584283",
            "articleTitle": "【Linux系统编程】第三十八弹---信号世界探索:从生活到技术的全面解析",
            "articleDetailUrl": "https://blog.csdn.net/2201_75584283/article/details/142580264",
            "commentCount": "91",
            "favorCount": "104",
            "viewCount": "592",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/9f6ddbb1af0440a09d928581b47a29f2.png"
            ],
            "isNew": null,
            "productId": "142580264",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12180",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": false,
            "nickName": "小码农叔叔",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2a38fa4448434cf9be6818697cdaf229_zhangcongyi420.jpg!1",
            "userName": "zhangcongyi420",
            "articleTitle": "【大数据】Flink + Kafka 实现通用流式数据处理详解",
            "articleDetailUrl": "https://blog.csdn.net/zhangcongyi420/article/details/143027849",
            "commentCount": "146",
            "favorCount": "88",
            "viewCount": "2092",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/10919f87476e4db59d559260198a1925.png"
            ],
            "isNew": null,
            "productId": "143027849",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12115",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": true,
            "nickName": "Kwan的解忧杂货铺@新空间代码工作室",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/7b22c422c2df41c9aa22ff208e9cb96d_qyj19920704.jpg!1",
            "userName": "qyj19920704",
            "articleTitle": "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
            "articleDetailUrl": "https://blog.csdn.net/qyj19920704/article/details/143309113",
            "commentCount": "85",
            "favorCount": "64",
            "viewCount": "931",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/78b08acbdf86436f868d0b827fc038ca.jpeg"
            ],
            "isNew": null,
            "productId": "143309113",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "11432",
            "pcHotRankScore": "1.1w",
            "loginUserIsFollow": false,
            "nickName": "Mr.Winter`",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/055065db1f8b4d4bb2b90751efcd4e07_frigidwinter.jpg!1",
            "userName": "FRIGIDWINTER",
            "articleTitle": "轨迹规划 | 基于差速运动学的有模型PID算法(附ROS C++仿真)",
            "articleDetailUrl": "https://blog.csdn.net/FRIGIDWINTER/article/details/143258906",
            "commentCount": "28",
            "favorCount": "45",
            "viewCount": "1122",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/8baf090b363a46b982678d5dee8fd3c5.gif"
            ],
            "isNew": null,
            "productId": "143258906",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "10661",
            "pcHotRankScore": "1.1w",
            "loginUserIsFollow": false,
            "nickName": "凯子坚持C",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/125c0a68206144a6a3869aeddf9a9cbc_2301_80863610.jpg!1",
            "userName": "2301_80863610",
            "articleTitle": "AI创作者与人类创作者的协作模式",
            "articleDetailUrl": "https://blog.csdn.net/2301_80863610/article/details/143305703",
            "commentCount": "83",
            "favorCount": "78",
            "viewCount": "1109",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/ce0c12726e154ac08a26f62929d6c837.gif"
            ],
            "isNew": null,
            "productId": "143305703",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "9709",
            "pcHotRankScore": "1.0w",
            "loginUserIsFollow": false,
            "nickName": "CoderJia_",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/3de6e5c240924736b5dda23e0d74cbd3_u014390502.jpg!1",
            "userName": "u014390502",
            "articleTitle": "重学SpringBoot3-Spring WebFlux之SSE服务器发送事件",
            "articleDetailUrl": "https://blog.csdn.net/u014390502/article/details/143275309",
            "commentCount": "42",
            "favorCount": "60",
            "viewCount": "1601",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/b41deddffd8d416fbb67e296f0143932.webp"
            ],
            "isNew": null,
            "productId": "143275309",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "9559",
            "pcHotRankScore": "1.0w",
            "loginUserIsFollow": false,
            "nickName": "颜淡慕潇",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2febcd43c21442509a2bf41fa63d7b67_weixin_36755535.jpg!1",
            "userName": "weixin_36755535",
            "articleTitle": "【K8S系列】Kubernetes 中 Service IP 地址和端口不匹配问题及解决方案【已解决】",
            "articleDetailUrl": "https://blog.csdn.net/weixin_36755535/article/details/143271989",
            "commentCount": "39",
            "favorCount": "30",
            "viewCount": "1898",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/d49227ce7645436aad653e460cfbdd86.png"
            ],
            "isNew": null,
            "productId": "143271989",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8973",
            "pcHotRankScore": "8973",
            "loginUserIsFollow": false,
            "nickName": "DARLING Zero two♡",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/00d1491d64c44cdbbd15500178786292_zero_vpn.jpg!1",
            "userName": "Zero_VPN",
            "articleTitle": "关于我、重生到500年前凭借C语言改变世界科技vlog.11——深入理解指针(1)",
            "articleDetailUrl": "https://blog.csdn.net/Zero_VPN/article/details/143288823",
            "commentCount": "94",
            "favorCount": "55",
            "viewCount": "1246",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/3d6ea5738c17443e9969873cf82c5e8b.png"
            ],
            "isNew": null,
            "productId": "143288823",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8843",
            "pcHotRankScore": "8843",
            "loginUserIsFollow": true,
            "nickName": "Jupiter·",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/036f6cff17d74f91abb8a599504fe014_2301_77509762.jpg!1",
            "userName": "2301_77509762",
            "articleTitle": "深入理解数据链路层:以太网帧格式、MAC地址、交换机、MTU及ARP协议详解与ARP欺骗探究",
            "articleDetailUrl": "https://blog.csdn.net/2301_77509762/article/details/142486419",
            "commentCount": "28",
            "favorCount": "41",
            "viewCount": "588",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/e0c2599816294c55a3aca8018727b542.png"
            ],
            "isNew": null,
            "productId": "142486419",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8700",
            "pcHotRankScore": "8700",
            "loginUserIsFollow": true,
            "nickName": "小强在此",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/b1d9c6e6380a41898d74591dc3406e7c_zcy_c.jpg!1",
            "userName": "ZCY_c",
            "articleTitle": "机器学习【学校智慧食堂及其应用】",
            "articleDetailUrl": "https://blog.csdn.net/ZCY_c/article/details/143310696",
            "commentCount": "45",
            "favorCount": "47",
            "viewCount": "681",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/5301d509f3154664944736e615200d97.jpeg"
            ],
            "isNew": null,
            "productId": "143310696",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8690",
            "pcHotRankScore": "8690",
            "loginUserIsFollow": false,
            "nickName": "deephub",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/0054ba55863d44d98d9ba189a2401027_m0_46510245.jpg!1",
            "userName": "m0_46510245",
            "articleTitle": "深度学习中的学习率调度:循环学习率、SGDR、1cycle 等方法介绍及实践策略研究",
            "articleDetailUrl": "https://blog.csdn.net/m0_46510245/article/details/143280635",
            "commentCount": "1",
            "favorCount": "23",
            "viewCount": "3668",
            "hotComment": null,
            "picList": [
                "https://img-blog.csdnimg.cn/img_convert/037199363e3dd7cfa9764901ee3734e8.jpeg"
            ],
            "isNew": null,
            "productId": "143280635",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8681",
            "pcHotRankScore": "8681",
            "loginUserIsFollow": false,
            "nickName": "景天科技苑",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/89baaea9472b4fd7a2058d0e79ee2bc9_littlefun591.jpg!1",
            "userName": "littlefun591",
            "articleTitle": "【Golang】Go语言中如何进行包管理",
            "articleDetailUrl": "https://blog.csdn.net/littlefun591/article/details/143300406",
            "commentCount": "50",
            "favorCount": "46",
            "viewCount": "1234",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/947ad3e9d18b4e66a55dc31ecd7c1109.jpeg"
            ],
            "isNew": null,
            "productId": "143300406",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8412",
            "pcHotRankScore": "8412",
            "loginUserIsFollow": true,
            "nickName": "盼小辉丶",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/40d49e89aeed4d50b8b32680f49e3ac1_lovemy134611.jpg!1",
            "userName": "LOVEmy134611",
            "articleTitle": "遗传算法与深度学习实战(20)——使用进化策略自动超参数优化",
            "articleDetailUrl": "https://blog.csdn.net/LOVEmy134611/article/details/143283052",
            "commentCount": "19",
            "favorCount": "34",
            "viewCount": "431",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/412d30ce3bda4f70845f524b724c35e3.png"
            ],
            "isNew": null,
            "productId": "143283052",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8255",
            "pcHotRankScore": "8255",
            "loginUserIsFollow": true,
            "nickName": "中草药z",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/0e3cd82612e7400c9dfcef891e713698_2302_79806056.jpg!1",
            "userName": "2302_79806056",
            "articleTitle": "【Spring】Ioc&DI",
            "articleDetailUrl": "https://blog.csdn.net/2302_79806056/article/details/143271383",
            "commentCount": "39",
            "favorCount": "61",
            "viewCount": "675",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/752811eac90a465fa9b2c7b5f94b3b3d.png"
            ],
            "isNew": null,
            "productId": "143271383",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8057",
            "pcHotRankScore": "8057",
            "loginUserIsFollow": false,
            "nickName": "川川菜鸟",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/171da61cd5e74ebfa9ab08e29de51894_weixin_46211269.jpg!1",
            "userName": "weixin_46211269",
            "articleTitle": "数学建模学习(131):使用Python基于VIKOR算法的多准则决策分析",
            "articleDetailUrl": "https://blog.csdn.net/weixin_46211269/article/details/143268220",
            "commentCount": "0",
            "favorCount": "12",
            "viewCount": "816",
            "hotComment": null,
            "picList": [
                "https://img-blog.csdnimg.cn/img_convert/fad536e972e14ce4b37803185dc3b00c.png"
            ],
            "isNew": null,
            "productId": "143268220",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "7904",
            "pcHotRankScore": "7904",
            "loginUserIsFollow": true,
            "nickName": "java李杨勇",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/ec7f24ace4574ddf9f7859b8d43ec482_weixin_39709134.jpg!1",
            "userName": "weixin_39709134",
            "articleTitle": "基于大数据爬虫+Hive+SpringBoot+的歌曲筛选推荐与可视化大屏平台设计和实现(源码+论文+部署讲解等)",
            "articleDetailUrl": "https://blog.csdn.net/weixin_39709134/article/details/143314219",
            "commentCount": "25",
            "favorCount": "28",
            "viewCount": "1927",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/43134b0303c1466383ec0a6b8e19fd1d.png"
            ],
            "isNew": null,
            "productId": "143314219",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "7143",
            "pcHotRankScore": "7143",
            "loginUserIsFollow": false,
            "nickName": "小码农<^_^>",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/3480896bab63498188fad036782a48b3_2401_87257864.jpg!1",
            "userName": "2401_87257864",
            "articleTitle": "优选算法精品课--双指针算法(2)",
            "articleDetailUrl": "https://blog.csdn.net/2401_87257864/article/details/143302883",
            "commentCount": "55",
            "favorCount": "67",
            "viewCount": "991",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/758a49d491dc40c297a510d18e15849a.png"
            ],
            "isNew": null,
            "productId": "143302883",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        }
    ]
}
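The numeric fields in the response above come back as strings. In addition, pcHotRankScore is a display value: larger scores are abbreviated with a "w" suffix (万, i.e. 10,000), so hotRankScore "21673" is shown as "2.2w" and "9709" as "1.0w". Smaller scores such as "8973" are shown unabbreviated. Before any analysis it is convenient to normalize these fields.

The sketch below assumes this format holds for every entry. `HotArticle`, `parse_display_score`, and `parse_entry` are illustrative names, not part of any CSDN SDK. The code prefers the exact hotRankScore and parses the rounded pcHotRankScore only as a fallback.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class HotArticle:
    """One hot-list entry with its numeric fields converted to int (illustrative type)."""
    title: str
    nick_name: str
    url: str
    hot_score: int
    views: int
    comments: int
    favors: int


def parse_display_score(text: str) -> int:
    """Convert a display score such as "2.2w" (w = 10,000) or "8973" to an int."""
    text = text.strip().lower()
    if text.endswith("w"):
        return round(float(text[:-1]) * 10_000)
    return int(text)


def parse_entry(entry: Dict[str, Any]) -> HotArticle:
    # Prefer the exact hotRankScore; fall back to the rounded pcHotRankScore.
    raw_score = entry.get("hotRankScore")
    if raw_score:
        hot_score = int(raw_score)
    else:
        hot_score = parse_display_score(entry.get("pcHotRankScore", "0"))
    return HotArticle(
        title=entry.get("articleTitle", ""),
        nick_name=entry.get("nickName", ""),
        url=entry.get("articleDetailUrl", ""),
        hot_score=hot_score,
        # Counts are strings in the response; missing or null values become 0.
        views=int(entry.get("viewCount") or 0),
        comments=int(entry.get("commentCount") or 0),
        favors=int(entry.get("favorCount") or 0),
    )
```

Given the parsed JSON body `data`, the expression `[parse_entry(e) for e in data["data"]]` yields typed records that can be sorted by `hot_score` or `views`.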

Based on the information above, the hot-list API can be summarized as follows:

Endpoint

  • https://blog.csdn.net/phoenix/web/blog/hot-rank

Request method

  • GET

Description

This endpoint returns CSDN's list of hot articles. The caller can specify pagination and the ranking type.

Request parameters

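  • page (integer, optional)

    • Description: the page number of results to return.
    • Value in the captured request: 0
    • Example: page=1
  • pageSize (integer, optional)

    • Description: the number of articles per page.
    • Value in the captured request: 25
    • Example: pageSize=50 (not verified; the server may cap or ignore other values)
  • type (string, optional)

    • Description: the ranking type (for example, a daily or weekly list). Valid values depend on CSDN's internal configuration and are not documented.
    • Value in the captured request: empty (type=)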

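Response

  • Content type: application/json

  • Response structure (field names as observed in the captured response):

    {
        "code": 200,
        "message": "success",
        "traceId": "request trace ID",
        "data": [
            {
                "period": "hot-list period, e.g. 2024-10-29-12",
                "hotRankScore": "hot score, as a string",
                "pcHotRankScore": "display score; larger values abbreviated, e.g. 2.2w",
                "nickName": "author nickname",
                "userName": "author username",
                "articleTitle": "article title",
                "articleDetailUrl": "article URL",
                "commentCount": "comment count",
                "favorCount": "like count",
                "viewCount": "view count",
                ...
            },
            ...
        ]
    }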
    

Example request

GET /phoenix/web/blog/hot-rank?page=0&pageSize=25&type= HTTP/1.1
Host: blog.csdn.net

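Example response (the first entry of the captured response, with some fields omitted)

{
    "code": 200,
    "message": "success",
    "traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
    "data": [
        {
            "period": "2024-10-29-12",
            "hotRankScore": "21673",
            "pcHotRankScore": "2.2w",
            "nickName": "平凡程序猿~",
            "userName": "2302_81410974",
            "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
            "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
            "commentCount": "75",
            "favorCount": "90",
            "viewCount": "1151"
        }
    ]
}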

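Error handling

  • HTTP status codes (generic meanings, not specifically verified against this endpoint):
    • 200 OK: the request succeeded.
    • 400 Bad Request: the request was invalid, usually because of bad parameters.
    • 500 Internal Server Error: a server-side error occurred.
  • The JSON body also carries its own code and message fields (code is 200 on success), so check code in addition to the HTTP status.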
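The article only captured page=0. The sketch below shows how the documented parameters could be used to request several pages while applying both checks above: it stops when the body's code is not 200 or a page comes back empty. Whether the endpoint actually serves pages beyond 0 is an untested assumption.

`build_hot_rank_url` and `collect_pages` are hypothetical helpers. The fetch function is injected so that the paging logic stays independent of the HTTP client.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

HOT_RANK_ENDPOINT = "https://blog.csdn.net/phoenix/web/blog/hot-rank"


def build_hot_rank_url(page: int = 0, page_size: int = 25, rank_type: str = "") -> str:
    """Rebuild the query string seen in DevTools: page, pageSize, type."""
    query = urlencode({"page": page, "pageSize": page_size, "type": rank_type})
    return f"{HOT_RANK_ENDPOINT}?{query}"


def collect_pages(fetch_json: Callable[[str], Dict[str, Any]], max_pages: int = 3) -> List[Dict[str, Any]]:
    """Fetch pages 0..max_pages-1, stopping early on an error code or an empty page.

    fetch_json takes a URL and returns the decoded JSON body, e.g. a thin
    wrapper around requests.get(url, timeout=10).json().
    """
    entries: List[Dict[str, Any]] = []
    for page in range(max_pages):
        body = fetch_json(build_hot_rank_url(page=page))
        # The HTTP status may be 200 while the body still reports a failure.
        if body.get("code") != 200:
            break
        page_entries = body.get("data") or []
        if not page_entries:
            break
        entries.extend(page_entries)
    return entries
```

With the default arguments, `build_hot_rank_url()` reproduces the exact URL captured in DevTools, including the empty `type=`.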

3.3 Writing the Scraper

Using the API notes from the previous section, we can reproduce the hot-list request in Python:

import requests

# Hot-list endpoint and the query parameters captured in DevTools
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank"
params = {"page": 0, "pageSize": 25, "type": ""}

# Fetch the hot list and print the author, title, and hot score of each entry
def fetch_hot_rank():
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
            "Accept-Encoding": "gzip, deflate",
        }
        # Send the request; raise_for_status() raises on 4xx/5xx responses
        response = requests.get(csdn_hot_rank_url, params=params, headers=headers, timeout=10)
        response.raise_for_status()

        # Parse the JSON body
        data = response.json()

        # The body carries its own status code: 200 means success
        if data.get("code") == 200:
            for entry in data.get("data") or []:
                nick_name = entry.get("nickName", "N/A")
                article_title = entry.get("articleTitle", "N/A")
                pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
                print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
        else:
            print(f"Failed to fetch data. Server responded with: {data.get('message')}")

    except requests.RequestException as e:
        # Covers connection errors, timeouts, HTTP errors, and (requests >= 2.27) invalid JSON
        print(f"Request failed: {e}")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()

Running the script prints the output below, with 25 entries in total.
[Figure: console output listing the 25 hot-list entries]

The first 8 articles on the hot list page match the output one-to-one.
[Figure: the first 8 articles on the hot list page]
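The stated goal is spotting technology trends, but the script above only prints the list. A simple first step toward that goal is to count how many titles mention each keyword of interest.

The sketch below does a case-insensitive substring match and counts each keyword at most once per title. The keyword list is an assumption you choose yourself, and `count_keywords` is a hypothetical helper. Keep in mind that one snapshot of 25 entries is a small sample.

```python
from collections import Counter
from typing import Iterable, List


def count_keywords(titles: Iterable[str], keywords: List[str]) -> Counter:
    """Count how many titles mention each keyword (case-insensitive, once per title)."""
    # Start every keyword at 0 so that absent topics still show up in the result.
    counts: Counter = Counter({kw: 0 for kw in keywords})
    for title in titles:
        lowered = title.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1
    return counts


if __name__ == "__main__":
    # A few titles taken from the captured response above
    titles = [
        "Docker:技术架构的演进之路",
        "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
        "【Spring MVC】请求参数的传递",
        "重学SpringBoot3-Spring WebFlux之SSE服务器发送事件",
        "【Spring】Ioc&DI",
        "数学建模学习(131):使用Python基于VIKOR算法的多准则决策分析",
    ]
    for keyword, n in count_keywords(titles, ["Docker", "Spring", "Python"]).most_common():
        print(keyword, n)
```

Substring matching is crude. For example, a short keyword like "Go" would also match "algorithm", so short or ambiguous keywords need word-boundary handling or a tokenizer.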

3.4 Scraping Through Proxy IPs

Routing scraper requests through proxy IPs is an effective way to improve a scraper's stability and throughput, mainly for the following reasons:

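  • Avoiding IP bans: frequent requests to the same site can get an IP temporarily or permanently blocked. Switching between different proxy IPs lowers that risk.
  • Higher throughput: multiple proxy IPs allow requests to run in parallel, which can speed up data collection considerably.
  • Accessing restricted resources: some sites restrict access from IPs in certain regions. A proxy IP can get around such restrictions.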
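Switching IPs, as described in the first point above, can be sketched as picking proxies from a pool in random order and moving on to the next one when a request fails.

The pool contents, `proxies_for`, and `fetch_with_rotation` are hypothetical. The fetch function is injected, so any HTTP client can be plugged in. Only the `{"http": ..., "https": ...}` dict shape follows what requests expects for its `proxies` argument.

```python
import random
from typing import Any, Callable, Dict, List, Optional

# A fetch function takes a requests-style proxies dict and returns parsed JSON.
Fetcher = Callable[[Dict[str, str]], Dict[str, Any]]


def proxies_for(proxy_url: str) -> Dict[str, str]:
    """Route both http and https traffic through the same proxy URL."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_with_rotation(fetch: Fetcher, pool: List[str], max_attempts: int = 3) -> Optional[Dict[str, Any]]:
    """Try up to max_attempts distinct proxies from pool, in random order.

    Returns the first successful result, or None if every attempt fails.
    """
    candidates = random.sample(pool, k=min(max_attempts, len(pool)))
    for attempt, proxy_url in enumerate(candidates, start=1):
        try:
            return fetch(proxies_for(proxy_url))
        except OSError as exc:
            # requests.RequestException subclasses IOError (an alias of OSError),
            # so network errors raised by requests are caught here too.
            # The proxy URL is not printed because it may contain credentials.
            print(f"Attempt {attempt} failed: {exc}")
    return None
```

In practice, `fetch` would wrap `requests.get(url, proxies=proxies, timeout=10)` followed by `raise_for_status()` and `.json()`, so that HTTP errors also trigger a retry with the next proxy.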

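A couple of days ago, a reader who is learning web scraping asked me to recommend a good proxy IP service, because he wants to scrape data from overseas e-commerce platforms. Based on my experience, I recommended the Qingguo Network (青果网络) proxy IPs that I use myself: https://www.qg.net/. The service covers my day-to-day needs: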

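  1. A free trial of up to 6 hours with thousands of proxy IPs, which is well suited to testing and validation.
  2. Coverage both in China and abroad, suitable for global applications.
  3. Low prices and good value for money.
  4. Ready-to-use SDKs and sample code for mainstream languages, so integration takes almost no effort.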

Below is the scraper code rewritten to use a proxy IP:

import requests

# A proxy address I extracted manually. In practice you can request addresses
# from the provider's API in code, which is more flexible.
proxyAddr = "125.77.162.121:20062"

# Replace with your own credentials
authKey = "Authkey from your key"
password = "Authpwd from your key"

# Username/password mode: no IP whitelist configuration is needed
proxyUrl = f"http://{authKey}:{password}@{proxyAddr}"

proxies = {
    "http": proxyUrl,
    "https": proxyUrl,
}

# Hot-list endpoint and the query parameters captured in DevTools
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank"
params = {"page": 0, "pageSize": 25, "type": ""}

# Fetch the hot list through the proxy and print each entry
def fetch_hot_rank():
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
            "Accept-Encoding": "gzip, deflate",
        }
        # Send the request; the proxies argument routes it through the proxy IP
        response = requests.get(csdn_hot_rank_url, params=params, headers=headers, proxies=proxies, timeout=10)
        response.raise_for_status()

        # Parse the JSON body
        data = response.json()

        # The body carries its own status code: 200 means success
        if data.get("code") == 200:
            for entry in data.get("data") or []:
                nick_name = entry.get("nickName", "N/A")
                article_title = entry.get("articleTitle", "N/A")
                pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
                print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
        else:
            print(f"Failed to fetch data. Server responded with: {data.get('message')}")

    except requests.RequestException as e:
        # Proxy failures (e.g. an expired proxy address) also end up here
        print(f"Request failed: {e}")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()

As shown below, the request still returns results:
[Figure: console output of the request made through the proxy]
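The proxy script hardcodes the Authkey and Authpwd in the source. It also inserts them into the proxy URL unescaped, which breaks if the password contains characters such as "@", ":", or "/".

A minimal sketch, assuming you keep the settings in environment variables, is shown below. The variable names PROXY_ADDR, PROXY_AUTHKEY, and PROXY_AUTHPWD and both helper functions are hypothetical. requests percent-decodes the user and password it finds in a proxy URL, so encoding them here is safe.

```python
import os
from typing import Dict, Optional
from urllib.parse import quote


def build_proxy_url(server: str, user: str, password: str) -> str:
    """Build an http://user:password@host:port proxy URL with percent-encoded credentials."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{server}"


def proxies_from_env() -> Optional[Dict[str, str]]:
    """Read proxy settings from (hypothetical) environment variables.

    Returns None if any variable is missing, so the caller can fall back
    to a direct request.
    """
    server = os.environ.get("PROXY_ADDR")
    user = os.environ.get("PROXY_AUTHKEY")
    password = os.environ.get("PROXY_AUTHPWD")
    if not (server and user and password):
        return None
    url = build_proxy_url(server, user, password)
    return {"http": url, "https": url}
```

The result can be passed straight to `requests.get(..., proxies=proxies_from_env(), timeout=10)`. With `proxies=None`, requests makes the call without an explicit proxy.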

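[Copyright notice] This article is original content by a Huawei Cloud Community user. Reproduction without permission is prohibited; to republish it, contact the original author for authorization. If you find suspected plagiarism in this community, report it by email with supporting evidence. Once verified, the community will immediately remove the infringing content. Report email: cloudbbs@huaweicloud.com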