FishC Forum


Scraper: what should I do when the HTML returned by requests.get differs from the page source shown in the browser?

Posted on 2021-7-27 22:47:49

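Could someone please suggest a fix? I have searched online for a long time without finding one. I have only just started learning and already feel buried; my confidence was gone at the very first step.
If the page source is wrong, none of the later bs4 steps can work.

This is what the browser shows when I view the page source: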

view-source:https://www.pexels.com/zh-cn/

This is my scraper code:
# coding: utf-8
import requests

url = "https://www.pexels.com/zh-cn/"
# Send a browser-like User-Agent instead of the default one used by requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36 Edg/86.0.622.56'
}
# A timeout keeps the script from hanging if the server never answers.
response = requests.get(url=url, headers=headers, timeout=10)
response.encoding = 'utf-8'
print(response.text)


The result that comes back looks like this:
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>

<title>Attention Required! | Cloudflare</title>


<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>


<!--[if gte IE 10]><!-->
<script>
  if (!navigator.cookieEnabled) {
    window.addEventListener('DOMContentLoaded', function () {
      var cookieEl = document.getElementById('cookie-alert');
      cookieEl.style.display = 'block';
    })
  }
</script>
<!--<![endif]-->


  
    <script type="text/javascript">
    //<![CDATA[
    (function(){
      window._cf_chl_opt={
        cvId: "2",
        cType: "interactive",
        cNounce: "8283",
        cRay: "67569e957c6031f1",
        cHash: "654cba96faf1022",
        cFPWv: "g",
        cTTimeMs: "4000",
        cLt: "n",
        cRq: {
          ru: "aHR0cHM6Ly93d3cucGV4ZWxzLmNvbS96aC1jbi8=",
          ra: "TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzg2LjAuNDI0MC4xMTEgU2FmYXJpLzUzNy4zNiBFZGcvODYuMC42MjIuNTY=",
          rm: "R0VU",
          d: "fCJJvpAatJA8dqA9l5xM826kc8DftVxEbE89a2PsceXLeCFUOGxB6u2W+6mImrtTCu/vixa1YVnM+6QlN/vy4EUtOhq3VVoVROcMlnZPxpPKwhj3Bo1vynK4IuLoVQbeoHxGDIFUSRDT1xbp115ZOU+FHJvAmnepCH0g+IQNS1slxtxYe3HvOZOQ4oM0QZTfa8z4HOwo/Cak0/MLCJ323rCD+2fq5pqCtB4NfdaTxU/5qrNcF6jwsfTOA9LJY5fI+wHiHNJYhIO2IBcyxb+gIQvrrDtmfzleQdtNhhb9mmqXs4qJeK55R2Ec5saA6i1PhMqZOBgRFLb8alyy9lSDfy9AAV5ogodlyc1z6ryfZgBU/oUv33K+E3m7pLq92tc/nPMRGWkh3vfE4b/R8h2f/wyqxxi/pseCWgWGJJEZwNSblZx4qqKA3RnULIvPGY1OQFMB3wMqZTX+iIfnr1Lj3DEytOKjQByBM6hOGr9NPTNsrLlH+yXfOBBPqVszoP2TTrybdobViVOdMt4bIp9twmfU0LIdGunLUEupWO5Xsn79Rj/q3xtXq7/okdEHKEFUgmgxlY6BvJ/efcwvZSlnVQOa1DB9PgVKB7tnmznuByaies8d6v8GYpSfNXOQzW/Ih/R8mIwrjugSzfXTav1VDmoPHqQYH3tELnAqMgR1iXZgC7THO2Xws9R3DuvEuKgIwamxIx+My8N2yvDiF2Z+P1pJLCShw0/TIRtRRFJ/GDEuiOOEZmxU9hNq4LrxA1nsCb1OhfsoNtzdn490RWAv8ctZ/eO2/8RAn4avCyy61dqF8v4z/OUaxDh4FyS4ptLx7+96Gv0PALBNtng+qdXe3PkiiEnV9yEORpMICj8KC7s=",
          t: "MTYyNzM5Njc5MC42MzgwMDA=",
          m: "isxzX0OKwv0iPyjFzrH/+IqmqybJwIjrSvrPiUXb+Ng=",
          i1: "KrjQc/ks/37mEGMxG2MnLg==",
          i2: "FuzPMnppfKNpjxTpD+WNrQ==",
          zh: "NNvtYPqUSp5OZQlBARs/elbgaYo5sB6TI6GfvNyxORY=",
          uh: "/NJGqcG0XAQsHK/RA4n1triAIEfidm9X4UEB3cfV7L8=",
          hh: "Q3T1VSV1376IoXShdoD7ceOIGJMapzzf5okB22fCjaY=",
        }
      };
    }());
    //]]>
    </script>
  

<style type="text/css">
  #cf-wrapper #spinner {width:69px; margin:  auto;}
  #cf-wrapper #cf-please-wait{text-align:center}
  .attribution {margin-top: 32px;}
  .bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
  #cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
  #cf-hcaptcha-container { text-align:center;}
  #cf-hcaptcha-container iframe { display: inline-block;}
  @keyframes fader     { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
  #cf-wrapper #cf-bubbles { width:69px; }
  @-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
  #cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
  #cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
  #cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
</style>
</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
      
        <h1 data-translate="challenge_headline">One more step</h1>
        <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> www.pexels.com</h2>
      
      </div>
      
      <div class="cf-section cf-highlight cf-captcha-container">
        <div class="cf-wrapper">
          <div class="cf-columns two">
            <div class="cf-column">
            
              <div class="cf-highlight-inverse cf-form-stacked">
                <form class="challenge-form interactive-form" id="challenge-form" action="/zh-cn/?__cf_chl_captcha_tk__=pmd_3c0f648e0cc8308372e265f4632a9e4691d8245f-1627396790-0-gqNtZGzNAuKjcnBszQe6" method="POST" enctype="application/x-www-form-urlencoded">
  
    <div id='cf-please-wait'>
      <div id='spinner'>
        <div id="cf-bubbles">
            <div class="bubbles"></div>
            <div class="bubbles"></div>
            <div class="bubbles"></div>
        </div>
      </div>
      <p data-translate="please_wait" id="cf-spinner-please-wait">Please stand by, while we are checking your browser...</p>
      <p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting...</p>
      </div>
  
  <input type="hidden" name="md" value="6628804fd7f36aff991674053cccaff88fef44b2-1627396790-0-AQiAGYS9iOa2R9TYAvN9fcLv6-si4116lrsOUVZV56SBMXbGD6IHC1YyNq4CJpPzl1FMjXMu8qHuWm36Epf6v0MQ12Qo-aA9JF7NEvTm3sDF5UU3i5JAoPdWl7Md_wDw87cGHY7BJlQ5rVoIYEmRJtk_1lupsUhbrPQ2CKkP7hJgz9H5-ZJzdE94h0ydCtrrakGbH7dzTCbtIp0cYwa1t4lb2l4qB5M3w7d7-EDwBloAUuql71tdzxv0livg6Z3UzKCHRVu1i920LVQOnYN-c8FEbdBvdX3Q3i9ajfpN0j--EdjxaSFtq7qAnQqn9xlCU_fKHYQ1lq774jk5EPI6Xj-SC0fc29yOusCBFspAFvVCC58TU7X0jrFmoDIwz5tUuFFAppSrJTJAm7xcZj6vOTAN5XRXSuuMEWmAs4viDjfb9nvPXZl08DSX8e2fK2aolOfthzAONoT-lGeMV1Cz2OXzCe-BTGK0fJ1dW4eb99iu2EO5ZGFAAt-UZa5qNmJd_HJlkSWuIJY32GaPqDRd-9BvXQ1ll9UfZjDWuYmeVvXp0_7RYwgKQKWUa_3zxJYopE044ZXfytuCqUU5YpK0C0Wb1t2wFVzsKqJZTEZ5fyMUJs6VqXLMZ9HmvFugpFoJfIfEshbwn-mKDJoGk0BN7oFwdHSVsmYetGqYKA9XQlii" />
  <input type="hidden" name="r" value="c349cfeee38aa8436787def6fc2d9079a56c02da-1627396790-0-AfI8U68bupiyIQnRomkLkSm+bo6kXrquv6fUGz+0MXz/CK+wZ1s+M9mSAZPTJhLHSb20U2FxLuA32+DQX8pdQr0kLAF263sRHNpy//Xb3HEoAfCTZKhRz098uzZmCTz/IiIPmiOQ9NwnIHHUz6navdctkBQlQLa3eUeSDd8vm+3QwWGjHjmtcU4fOk2mOYrBYllLwTg+Ayqd7c6rE/ToqFLdor0AOjgzt4WLSkmsr6bqDjQcG7F20qkasG4DfXriY+UVcSmQ7mbsWLvWKNtsb+iv9yEVTnXK0Mkpq5f2WMpNx4PYeyJbKnf9rCyz4PKAJ0cHrv04sKykT44bFyzqLS3fZdDQdcqGCjxglVUZtUx1HOE26lYZ2x0Ng7GmuKjKzNdtsw/qagywNf98m4NwtWgGtZgwecjmhmoZPdABGwl8y37qqRyS7/e7ikscotKyUEHHWLCJuzItH7y2ab/vJGzBWaXp6fdJVCCKy5Qqq5LtIiAXp3cj06aL9yYlTq3jNJYWHM5zT9dPVIeZXTveBbrYPaUddvkUNVoaRJoyObXtlZSGX09Z8lHVj9ZJ/VQDmvX2W45nOi1nD5r/lGL8Vf24iPel4gpdL+2kzD1jubZg9RrzSMy8Bv7Wa9Rt5RdNk9SnVGe5eXlAoJy7HEccJLNN8xDroSEXR8VTa/S1cTMeLjTyvfz2+rOjZwVXRxCqEhPxuKA8y7Bghnoa9qNl/dk1/mE0d7dZBalc8vignksmI18v/Sw/Jdcppj0AP8/z6z86gIrFKeC6MufCtQq1Brd6i/se47LdgnxXpUkOxwUiWtYdoN7P1OJk7iXuBRBUIUnZsz0NGIXYzrnwcGl80jTpw+P0cLyRCwiLFjV2cr57I7kDk8hc1zt8yGdMX/Nx2UNRGUcGPgcNkbdWeRKonKmMb61NfO9AgMEbzag2apoYN4bqAHoQnAphwdbV1SOJDyfaKqsEQbrDW7t/2oopyDabhvVeTzjoM7Eqg7C3e6RXB/mVaHvTN7KXqjkVGVF/QiyrMi37sYcpIjnTBxqqyXHX2BXuZXrafLVS0MKwUMRDpcrYkwv+pxDPwfzqDW5D1e7+vuT0xAb6c75Nf35Up4T+5XsPyWVRtASFrEV2q47NMsoMhGbaCjB8pvR+7sxePzIRvb9QkLAZ5M69bMb0EBmSHTfzRnV9JwjLEc9P5smVN0lvbTinuAqUUcQ/0KZ15zQZO4IXbNuBQg3eEN8ff5djpS/mEmKYKWm+Bncrz+YJkuxY5mC3oa4Qvhl3KTo48DDRX1s4bsSAFcZ2qLJa5tllRmYiUSN/hphWExgc2D+QkUt76zNNK34RBSih+R5kzOsWm8sNt9josYuCQMyXNlne0rwdHpfvim3hxwWTBpYvCDDcxhLs2KJQugcZJOg4qabT4zv4JV7ChAB05jLQWxdQbyhHt+8DszrR2rh265HFbdmu4GojEAPNMn0wysxqJe5Va7LhhqiDiGhdgwohZuPPO+Xfq1t3ZA9U157m4VPd9q0Bi2IMvkrDo+79+9pzDl3w5i0oPQOHIAVbuxNYZnKa0RqZqFN5gF+gEzroAJRZ92S2Gsa5Bi0dpCjTlPQlb/qzrUA08/vx6QU7LIwi8G9q9U0m/XxkloPKXcffTU8mB2V5GwED2KYn5bbEdtWVd34tVgO8MMh5PpRZOEQ++/55Qwc8SmqR3Y55VdBpa6q7qWz8kaTfQxV4QvSWC5lblzStghNp6olToKUmt01hFZWK88LhyiS9qxCdGrQW3cM8ORZr0anWr+eRYmO3Hyp2sbvQskBSk46Rlig8X2qHw8ydraEI45JLSlcCjjqQ7U5bGmcEdlpY9/tHfrfTRKZc++cT0PLQSnQXwkl3HVCa9in1ms3B9bQRlQGoMvp6tMAgT9iGWbq
8oe5XmTa3o/3zeg==">
  <input type="hidden" name="cf_captcha_kind" value="h">
  <input type="hidden" name="vc" value="d5ee13262074f0af7453aaf36ff0de4c">
  
  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">
  <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
    <div id="no-cookie-warning" class="cookie-warning" data-translate="turn_on_cookies" style="display:none">
      <p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
    </div>
  <script type="text/javascript">
  //<![CDATA[
    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
      b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
      b(function(){
        var cookiesEnabled=(navigator.cookieEnabled)? true : false;
        if(!cookiesEnabled){
          var q = document.getElementById('no-cookie-warning');q.style.display = 'block';
        }
      });
  //]]>
  </script>
  <div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=67569e957c6031f1')"></div>
</form>
  
  <script type="text/javascript">
    //<![CDATA[
    (function(){
        var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
        var trkjs = isIE ? new Image() : document.createElement('img');
        trkjs.setAttribute("src", "/cdn-cgi/images/trace/captcha/js/transparent.gif?ray=67569e957c6031f1");
        trkjs.id = "trk_captcha_js";
        trkjs.setAttribute("alt", "");
        document.body.appendChild(trkjs);
        var cpo=document.createElement('script');
        cpo.type='text/javascript';
        cpo.src="/cdn-cgi/challenge-platform/h/g/orchestrate/captcha/v1?ray=67569e957c6031f1";
        document.getElementsByTagName('head')[0].appendChild(cpo);
    }());
    //]]>
    </script>
  


              </div>
            </div>

            <div class="cf-column">
              <div class="cf-screenshot-container">
              
                <span class="cf-no-screenshot"></span>
              
              </div>
            </div>
          </div>
        </div>
      </div>

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
            
            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
            

            <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>

            <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>
            
          </div>
        </div>
      </div>
      

      <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">67569e957c6031f1</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Your IP</span>: 49.71.20.210</span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" target="_blank">Cloudflare</a></span>
   
  </p>
</div><!-- /.error-footer -->


    </div>
  </div>

  <script type="text/javascript">
  window._cf_translation = {};
  
  
</script>


</body>
</html>

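Posted on 2021-7-28 06:46:32
What you see on the page is content that your browser requested from the site's server. Open the browser's developer tools and capture the network traffic to see those requests.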
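
One check that can sit alongside the browser's network panel is to look at what the server saw from the script. The response quoted in the question carries base64 values under `window._cf_chl_opt.cRq`; the sketch below decodes two of them, copied from that output. Reading `ru` as "request URL" and `rm` as "request method" is an assumption based on what the values decode to, not on any documented Cloudflare behaviour.

```python
import base64

# Values copied from window._cf_chl_opt.cRq in the response quoted in the question.
# The field meanings (URL, method) are an assumption inferred from the decoded text.
ECHOED = {
    "ru": "aHR0cHM6Ly93d3cucGV4ZWxzLmNvbS96aC1jbi8=",
    "rm": "R0VU",
}


def decode_field(value):
    """Decode one base64-encoded field into text."""
    return base64.b64decode(value).decode("utf-8")


decoded = {name: decode_field(value) for name, value in ECHOED.items()}
print(decoded)
```

Both values match the URL and the GET call in the script, which suggests the request itself arrived as written and the different HTML comes from how the server chose to answer it.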

Posted on 2021-7-28 06:49:56

[Attached image; visible only to logged-in users]

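Posted on 2021-7-28 10:28:11
A web page is usually made up of HTML (the skeleton), CSS (the skin), and JS (the behaviour).
A browser loads all three and renders them, so the source you see there can change. urllib only sends the GET request and does not render anything (the same applies to requests), so the two sources look different. This is normal.
Tell me what you want to scrape and I will take a look.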
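
As a minimal sketch of the point about rendering, the snippet below lists the `<script>` tags in a piece of raw HTML using only the standard library. `SAMPLE_HTML`, the `/static/app.js` path, and the `ScriptCollector` name are made up for illustration. A browser would download and run these scripts, while an HTTP client only receives the text that contains them.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Records external script URLs and counts inline scripts in raw HTML."""

    def __init__(self):
        super().__init__()
        self.external = []  # values of <script src="...">
        self.inline = 0     # <script> tags without a src attribute

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.external.append(src)
        else:
            self.inline += 1


# Hypothetical page: the gallery <div> is empty until the scripts run in a browser.
SAMPLE_HTML = """<html><head><title>demo</title>
<script src="/static/app.js"></script>
<script>document.title = "changed by JavaScript";</script>
</head><body><div id="gallery"></div></body></html>"""

collector = ScriptCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.external, collector.inline)
```

If the data you want only appears after such scripts run, the raw response will not contain it, and the network panel is the place to find the request that actually delivers it.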

Posted on 2021-7-28 14:06:08 from FishC Mobile
All I can say is that you picked the wrong site to start with. Its anti-scraping protection is very strong.
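
A scraper can at least notice when it has been given a challenge page like the one quoted in the question, instead of passing it on to bs4. The sketch below checks for three strings that all appear in that quoted response. The marker list and the function name are illustrative assumptions, and Cloudflare can change its pages, so treat this as a heuristic.

```python
# Strings taken from the response quoted in the question; this list is an
# illustrative assumption, not an official or complete set of markers.
CHALLENGE_MARKERS = (
    "Attention Required! | Cloudflare",
    "_cf_chl_opt",
    'id="challenge-form"',
)


def looks_like_cloudflare_challenge(html):
    """Return True if the HTML contains any of the known challenge markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


blocked = "<title>Attention Required! | Cloudflare</title>"
normal = "<title>Free stock photos</title>"
print(looks_like_cloudflare_challenge(blocked))
print(looks_like_cloudflare_challenge(normal))
```

In the original script this would be called as `looks_like_cloudflare_challenge(response.text)` before any parsing, so the script can stop early with a clear message.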

Posted on 2021-7-28 20:37:47

My main question is where to get this code from.

Posted on 2021-7-28 21:36:30
江南野外的狸 posted on 2021-7-28 20:37
My main question is where to get this code from.

Press F12 to open the browser's developer tools.

Posted on 2021-7-28 21:37:45
小甲鱼 has taught these methods too.

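Posted on 2021-7-28 21:38:49
suchocolate posted on 2021-7-28 10:28
A web page is usually made up of HTML (the skeleton), CSS (the skin), and JS (the behaviour).
A browser loads all three and renders them, so the source you see there can change, whereas u ...

The nice-looking pictures, of course.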