
Help: I want to scrape the images from this web page. Any good approaches?

    androidBrant · 2014-08-01 14:03:00 +08:00 · 4213 views
    http://www.szeros-wedding.com/html/service/804.html#1

    I want the images on this page. Any help is appreciated, thanks!
    26 replies · last reply 2014-08-03 00:24:55 +08:00
    #1 xiandao7997 · 2014-08-01 14:04:29 +08:00 via Android
    wget
    #2 faceair · 2014-08-01 14:14:42 +08:00
    http://www.szeros-wedding.com/UpFile/editor/2014032002455418.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002455652.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002456340.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002456480.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002457027.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002457496.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002457996.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002458527.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002458652.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002459152.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002460184.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002460340.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002460512.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002461262.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002461902.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002462480.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002463027.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002463746.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002464809.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002464934.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002465652.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002466230.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002466730.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002466918.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002467590.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002467746.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002468449.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002469090.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002469230.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002469902.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002470699.jpg
    http://www.szeros-wedding.com/UpFile/editor/2014032002470840.jpg

    Paste these into Xunlei (Thunder) and it should be able to batch-download them.
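If you'd rather not use a download manager, a list of URLs like the one above can also be batch-downloaded with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `filename_from_url` and `download_all`, the `images` destination folder and the `delay` pause are hypothetical choices for illustration, not anything from the thread.

```python
import os
import time
import urllib.request
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. '2014032002455418.jpg'."""
    return os.path.basename(urlsplit(url).path)


def download_all(urls, dest_dir="images", delay=0.5):
    """Hypothetical helper: download each URL into dest_dir using only the stdlib.

    `delay` is a pause (in seconds) between requests so the server isn't hammered.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        if os.path.exists(target):
            continue  # skip files that were already fetched on a previous run
        urllib.request.urlretrieve(url, target)
        time.sleep(delay)
```

Usage would be something like `download_all(urls)` with `urls` holding the list above. Saving under the original file names keeps the images in the same order as their timestamp-based names.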
    #3 zzetao · 2014-08-01 14:16:56 +08:00
    Actually, some browser extensions can do this.
    #4 androidBrant (OP) · 2014-08-01 14:17:03 +08:00
    @faceair How did you grab these addresses so quickly?
    #5 nealv2ex · 2014-08-01 14:23:55 +08:00 · ❤️ 1
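    // Run in the browser console on that page (assumes jQuery is loaded there)
    var list = $('.pic img').map(function (i, item) {
        // Let an <a> element resolve the (possibly relative) URL in the "original" attribute
        var a = document.createElement('a');
        a.href = $(item).attr('original');
        return a.href;  // absolute URL
    }).get();  // .get() turns the jQuery result into a plain array
    console.log(list.join('\n'));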
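The `<a>` trick above works because assigning a relative path to `a.href` makes the browser resolve it against the page's URL. Outside the browser, Python's standard `urllib.parse.urljoin` does the same resolution, including collapsing `..` segments (Python 3.5+). A minimal sketch; the `PAGE_URL` constant and the `resolve` name are illustrative, and the sample path is one of the relative `src` values that #15 gets back from the page:

```python
from urllib.parse import urljoin

# The page the images were found on
PAGE_URL = "http://www.szeros-wedding.com/html/service/804.html"


def resolve(src, base=PAGE_URL):
    """Turn a relative image path into an absolute URL, like a.href does in the browser."""
    return urljoin(base, src)


print(resolve("/ueditor/asp/../../UpFile/editor/2014032002455418.jpg"))
# → http://www.szeros-wedding.com/UpFile/editor/2014032002455418.jpg
```

Paths without a leading slash are resolved relative to the page's directory, so `resolve("images/a.jpg")` gives `http://www.szeros-wedding.com/html/service/images/a.jpg`.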
    #6 androidBrant (OP) · 2014-08-01 14:25:02 +08:00
    @xiandao7997

    jiaqiqunaerdeiMac:pic jiaqiqunaer$ wget -r http://www.szeros-wedding.com/UpFile/editor/

    --2014-08-01 14:20:58-- http://www.szeros-wedding.com/UpFile/editor/
    Resolving www.szeros-wedding.com... 211.154.142.215
    Connecting to www.szeros-wedding.com|211.154.142.215|:80... connected.
    HTTP request sent, awaiting response... 403 Forbidden
    2014-08-01 14:20:59 ERROR 403: Forbidden.
    #7 NemoAlex · 2014-08-01 14:27:23 +08:00
    #8 faceair · 2014-08-01 14:29:01 +08:00
    #9 imn1 · 2014-08-01 14:29:58 +08:00 · ❤️ 2
    Save As... → "Web Page, Complete" (full HTML)
    #10 Roboo · 2014-08-01 14:32:35 +08:00
    IDM (Internet Download Manager)
    #11 xiandao7997 · 2014-08-01 14:36:02 +08:00 via Android
    wget -r --level=2 --accept=jpg [the URL from the topic]
    When it finishes, look in the UpFile/editor subdirectory.
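By default, `wget -r` saves each file under a directory named after the host, followed by the URL's path, which is why the photos end up in an `UpFile/editor` subdirectory even though the page links them as `/ueditor/asp/../../UpFile/editor/...`. The function below is a rough, hypothetical approximation of that default layout for URLs ending in a file name. It ignores options like `-nH` or `--cut-dirs` and wget's many other naming rules, and is only meant to show where to look:

```python
import posixpath
from urllib.parse import urlsplit


def wget_local_path(url):
    """Approximate where `wget -r` saves a file URL: <host>/<normalized path>.

    Hypothetical helper for illustration; real wget applies many more rules.
    """
    parts = urlsplit(url)
    path = posixpath.normpath(parts.path)  # collapses "a/../b"-style segments
    return parts.hostname + path


print(wget_local_path("http://www.szeros-wedding.com/ueditor/asp/../../UpFile/editor/2014032002455418.jpg"))
# → www.szeros-wedding.com/UpFile/editor/2014032002455418.jpg
```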
    #12 xiandao7997 · 2014-08-01 14:36:51 +08:00 via Android
    @imn1 Makes me feel like I watched The Social Network for nothing.
    #13 wesley · 2014-08-01 15:24:05 +08:00
    Clear the browser cache first, then open that page, then look in the browser's cache folder.
    #14 androidBrant (OP) · 2014-08-01 15:25:19 +08:00
    @faceair How do I find these addresses with XPath? What's the expression? Thanks.
    #15 mengzhuo · 2014-08-01 15:36:09 +08:00 · ❤️ 1
    Here's a Python version as well:

    import requests
    from lxml import html

    URL = 'http://www.szeros-wedding.com/html/service/804.html'
    # Fetch the page, parse it, and list the src of every <img> on it
    page = html.fromstring(requests.get(URL).content)
    [img.get('src') for img in page.xpath('//img')]

    -------

    ['/skins/20140425/images/bg74.gif',
    '/skins/20140425/images/t0.gif',
    '/skins/20140425/images/t3.gif',
    '/skins/20140425/images/t01.gif',
    '/skins/20140425/images/t01.gif',
    '/skins/20140425/images/bg75.gif',
    '/skins/20140425/images/logo.jpg',
    '/skins/20140425/images/bg6.jpg',
    '/skins/20140425/images/bg7.jpg',
    '/skins/20140425/images/bg8.jpg',
    '/skins/20140425/images/bg9.jpg',
    '/skins/20140425/images/bg10.jpg',
    '/skins/20140425/images/f.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002455418.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002455652.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002456340.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002456480.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002457027.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002457496.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002457996.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002458527.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002458652.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002459152.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002460184.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002460340.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002460512.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002461262.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002461902.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002462480.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002463027.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002463746.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002464809.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002464934.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002465652.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002466230.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002466730.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002466918.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002467590.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002467746.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002468449.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002469090.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002469230.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002469902.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002470699.jpg',
    '/ueditor/asp/../../UpFile/editor/2014032002470840.jpg',
    '/skins/20140425/images/f.jpg',
    '/skins/20140425/images/jd.jpg',
    '/skins/20140425/hzjd/1.jpg',
    '/skins/20140425/hzjd/2.jpg',
    '/skins/20140425/hzjd/3.jpg',
    '/skins/20140425/hzjd/4.jpg',
    '/skins/20140425/hzjd/5.jpg',
    '/skins/20140425/hzjd/6.jpg',
    '/skins/20140425/hzjd/7.jpg',
    '/skins/20140425/hzjd/8.jpg',
    '/skins/20140425/hzjd/9.jpg',
    '/skins/20140425/hzjd/10.jpg',
    '/skins/20140425/images/link.jpg',
    '/skins/20140425/images/logo1.jpg']
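Most entries in that output are layout images under `/skins/`; only the `UpFile/editor` ones are the photos from the post, and they come back as relative paths. One way to narrow the list down is to keep only sources containing `/UpFile/editor/`, resolve them against the page URL with `urljoin`, and drop duplicates. A sketch assuming the list is stored in a variable named `srcs` (a hypothetical name; only a few entries are reproduced here):

```python
from urllib.parse import urljoin

BASE = "http://www.szeros-wedding.com/html/service/804.html"

# A few entries from the output above (the real list is longer)
srcs = [
    "/skins/20140425/images/bg74.gif",
    "/ueditor/asp/../../UpFile/editor/2014032002455418.jpg",
    "/ueditor/asp/../../UpFile/editor/2014032002455652.jpg",
    "/skins/20140425/images/f.jpg",
    "/skins/20140425/images/f.jpg",
]


def editor_images(srcs, base=BASE):
    """Keep only the uploaded photos, as absolute URLs, in page order without duplicates."""
    seen = set()
    result = []
    for src in srcs:
        if "/UpFile/editor/" not in src:
            continue  # skip skin/layout images
        url = urljoin(base, src)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


for url in editor_images(srcs):
    print(url)
```

Applied to the full output, this should reproduce the absolute URL list posted in #2.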
    #16 zoudm · 2014-08-01 16:34:16 +08:00
    @androidBrant

    XPath:

    /html/body/table/tbody/tr[1]/td/table/tbody/tr[5]/td/div/p[1]/img
    ...
    ...
    /html/body/table/tbody/tr[1]/td/table/tbody/tr[5]/td/div/p[5]/img
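Absolute paths like these are typically copied from the browser's developer tools. One caveat: browsers insert `<tbody>` into tables when building the DOM, so a path containing `tbody` can match nothing when run against the raw HTML (for example with lxml, as in #15) if the source has no `<tbody>`. A shorter relative path is more robust. The sketch below uses the standard library's `xml.etree.ElementTree` on a small, made-up well-formed snippet (hypothetical markup, not the real page) to show the difference:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: a table WITHOUT <tbody>, as raw HTML often is
SAMPLE = """<html><body>
<table><tr><td>
  <div>
    <p><img src="/ueditor/asp/../../UpFile/editor/2014032002455418.jpg"/></p>
    <p><img src="/ueditor/asp/../../UpFile/editor/2014032002455652.jpg"/></p>
  </div>
</td></tr></table>
</body></html>"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Browser-style path with tbody: finds nothing, since the source has no <tbody>
with_tbody = root.findall("./body/table/tbody/tr/td/div/p/img")

# Relative path: every <img> directly inside a <p> inside a <div>
relative = root.findall(".//div/p/img")

print(len(with_tbody), len(relative))  # → 0 2
print([img.get("src") for img in relative])
```

With lxml (as in #15), the equivalent relative expression would be `//div/p/img/@src`, which returns the `src` strings directly.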
    #17 muziyue · 2014-08-01 17:05:27 +08:00
    If there aren't too many pages, I usually just press Ctrl+S to save the page and then look in the saved folder.
    #18 decken · 2014-08-01 17:54:01 +08:00
    #19 decken · 2014-08-01 17:55:25 +08:00
    #20 mopvhs · 2014-08-02 10:20:49 +08:00
    Here's the method I usually use:

    http://gist.github.com/4b93757c88b5fe558846
    #21 mopvhs · 2014-08-02 10:37:14 +08:00 · ❤️ 1
    #22 jprovim · 2014-08-02 11:53:51 +08:00
    #23 BGLL · 2014-08-02 17:30:11 +08:00
    Chrome extension: Fatkun
    #24 androidBrant (OP) · 2014-08-02 21:10:31 +08:00
    V2EX really does have a lot of talented people. One wget command and everything downloaded.

    @xiandao7997 Awesome!
    #25 xiandao7997 · 2014-08-02 21:16:37 +08:00 via Android
    Saving the page and then looking in the folder is the simplest; the method in #21 is pretty cool too.
    #26 sobigfish · 2014-08-03 00:24:55 +08:00
    Firefox + DownThemAll