python3 selenium chromedriver: get every URL requested by the current page (code example demo)

Category: python, selenium    1769 views    IT小君    2021-08-22 21:46

What is BrowserMob Proxy

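Official description: BrowserMob Proxy builds on technology developed in the Selenium open-source project and in a commercial load-testing and monitoring service. It can capture performance data for web applications (in HAR format) and manipulate browser behavior and traffic, for example whitelisting and blacklisting content, simulating network traffic and latency, and rewriting HTTP requests and responses.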
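The traffic-shaping features in that description are available through the browsermob-proxy Python client. Below is a minimal sketch, assuming `proxy` is the client object returned by `server.create_proxy()`. The helper name `shape_traffic`, the regex, and the latency value are illustrative assumptions, not part of the original script.

```python
def shape_traffic(proxy, blocked_pattern=r".*\.(png|jpg|gif)$", latency_ms=200):
    """Block matching URLs and add artificial latency on a BrowserMob client.

    `proxy` is assumed to be a browsermobproxy client (server.create_proxy()).
    """
    # Requests whose URL matches the regex are answered with HTTP 404
    proxy.blacklist(blocked_pattern, 404)
    # Add per-request latency (assumed to be milliseconds) to mimic a slow network
    proxy.limits({"latency": latency_ms})
```

Calling `shape_traffic(proxy)` before `driver.get(...)` would then apply both rules to the page load.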

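BrowserMob Proxy lets you manipulate HTTP requests and responses, capture HTTP content, and export performance data as a HAR file. BMP works well as a standalone proxy server and is especially useful when embedded in Selenium tests. Official site: http://bmp.lightbody.net/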
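A HAR document is plain JSON: a top-level `log` object holds a list of `entries`, and each entry records one request and its response. The sketch below uses a hand-written sample (not captured data) to show how those fields can be read; the function name `summarize` is an assumption made for illustration.

```python
# Hand-written sample shaped like a HAR 1.2 document (not captured data)
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"method": "GET", "url": "http://example.com/"},
             "response": {"status": 200}},
            {"request": {"method": "GET", "url": "http://example.com/missing.png"},
             "response": {"status": 404}},
        ],
    }
}

def summarize(har):
    """Return (method, status, url) for each entry in a HAR dict."""
    return [
        (e["request"]["method"], e["response"]["status"], e["request"]["url"])
        for e in har["log"]["entries"]
    ]

for method, status, url in summarize(sample_har):
    print(method, status, url)
```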

How to do it

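Step 1: Download the browsermob-proxy server package from the official site and unzip it (it is a Java application, so a Java runtime must be installed). In the bin directory you will find browsermob-proxy (the launcher for Linux or macOS) and browsermob-proxy.bat (the launcher for Windows). The script needs the path to this file.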
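Because the launcher name differs by operating system, the path can be chosen at runtime. This is a small sketch; the helper name `browsermob_binary` and the install directory passed to it are assumptions for illustration.

```python
import os
import platform

def browsermob_binary(install_dir):
    """Return the launcher path inside `install_dir`/bin for the current OS."""
    # Windows uses the .bat launcher; Linux and macOS use the shell script
    name = "browsermob-proxy.bat" if platform.system() == "Windows" else "browsermob-proxy"
    return os.path.join(install_dir, "bin", name)

print(browsermob_binary("browsermob-proxy-2.1.4"))
```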


Step 2: Install the Python 3 client library for browsermob-proxy:

pip3 install browsermob-proxy

Step 3: Write the code:

from browsermobproxy import Server
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Purpose of this script: List all resources (URLs) that
# Chrome downloads when visiting some page.

### OPTIONS ###
url = "http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464"
chromedriver_location = r'F:\Program Files (x86)\webdriver\chromedriver.exe'
browsermobproxy_location = r"F:\browsermob-proxy-2.1.4\bin\browsermob-proxy" # location of the browsermob-proxy binary file (that starts a server)
###############

# Start browsermob proxy
server = Server(browsermobproxy_location)
server.start()
proxy = server.create_proxy()

# Set up the Chrome webdriver
options = webdriver.ChromeOptions()
# options.binary_location = chrome_location
# Point Chrome at our browsermob proxy so that it can track requests
options.add_argument('--proxy-server=%s' % proxy.proxy)
# Headless mode may not route traffic through the proxy in every setup;
# remove this flag if the HAR comes back empty
options.add_argument('--headless')
# Selenium 4 style; on Selenium 3 use webdriver.Chrome(chromedriver_location, options=options)
driver = webdriver.Chrome(service=Service(chromedriver_location), options=options)

# Now load some page
proxy.new_har("Example")
driver.get(url)

# Print all URLs that were requested
entries = proxy.har['log']["entries"]
for entry in entries:
    if 'request' in entry:
        print(entry['request']['url'])

# Close the browser first, then shut down the proxy server
driver.quit()
server.stop()
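
The captured HAR can also be kept for later analysis instead of only being printed. The sketch below works on any HAR-shaped dict; the helper names `unique_urls` and `save_har` and the output file name are assumptions, not part of the original script.

```python
import json

def unique_urls(har):
    """Return request URLs from a HAR dict, de-duplicated, first-seen order kept."""
    urls = (e["request"]["url"] for e in har["log"]["entries"] if "request" in e)
    return list(dict.fromkeys(urls))

def save_har(har, path):
    """Write the HAR dict to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(har, f, ensure_ascii=False, indent=2)
```

In the script above this would look like `save_har(proxy.har, "example.har")`, called while the proxy server is still running, since `proxy.har` is fetched from the server.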


Output:

http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/css/index.css
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/js/rotation.js
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/header.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/focus1.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/focus2.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/focus3.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still1.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/focus4.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still12.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still5.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still7.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still10.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still9.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still8.jpg
http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464/images/still11.jpg
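
A list like this is often easier to read when grouped by resource type. The following is a rough sketch that counts URLs by file extension; the function name `count_by_extension` is an assumption, and the sample list reuses a few of the URLs printed above.

```python
from collections import Counter
from os.path import splitext
from urllib.parse import urlparse

def count_by_extension(urls):
    """Count URLs by the file extension of their path ('(none)' if absent)."""
    counts = Counter()
    for u in urls:
        ext = splitext(urlparse(u).path)[1].lower() or "(none)"
        counts[ext] += 1
    return counts

base = "http://view.jqueryfuns.com/2021/8/13/01cf9937ab5612f2fd1f3379b8d0e464"
sample = [
    base,
    base + "/css/index.css",
    base + "/js/rotation.js",
    base + "/images/header.jpg",
    base + "/images/focus1.jpg",
]
for ext, n in count_by_extension(sample).items():
    print(ext, n)
```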