Scrapy reports the following error when crawling a site through a proxy

  Ewig · 2019-01-04 16:51:00 +08:00 · 4546 views
    2019-01-04 16:26:57 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
    2019-01-04 16:27:04 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_1.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
    2019-01-04 16:27:09 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_2.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
    2019-01-04 16:27:16 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_3.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
    2019-01-04 16:27:21 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_4.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]


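    I searched foreign sites but could not find the cause. Other sites work fine; only this one reports the error, and I don't know why.

    The site is http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index.html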
    6 replies · 2019-01-05 10:54:42 +08:00
    Ewig (OP) · #2 · 2019-01-04 17:13:58 +08:00
    @houzhimeng

    import base64


    class proxy_middleware(object):

        def __init__(self):
            proxy_host = "w.t.16yn"
            proxy_port = "***"
            self.username = "***"
            self.password = "**"
            self.proxies = {"http": "http://{}:{}/".format(proxy_host, proxy_port)}
            self.proxy_server = 'https://w5.t.16yun.cn:6469'
            self.proxy_authorization = 'Basic ' + base64.urlsafe_b64encode(
                bytes((self.username + ':' + self.password), 'ascii')).decode('utf8')

        def process_request(self, request, spider):
            request.meta['proxy'] = self.proxy_server
            request.headers['Proxy-Authorization'] = self.proxy_authorization

    I changed it to this and it still doesn't work.
    15399905591 · #3 · 2019-01-04 17:28:04 +08:00
    Change
    self.proxy_server = 'https://w5.t.16yun.cn:6469'
    to
    self.proxy_server = 'http://w5.t.16yun.cn:6469'
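
A minimal sketch of the middleware with that change applied, assuming the proxy port accepts plain HTTP. The class name `ProxyMiddleware`, the host `proxy.example.com`, and the credentials are placeholders, not values from this thread. The sketch also swaps `base64.urlsafe_b64encode` for `base64.b64encode`, because HTTP Basic authentication uses the standard Base64 alphabet; the two only differ when the encoded output contains `+` or `/`.

```python
import base64


class ProxyMiddleware:
    """Downloader middleware sketch: send requests through an authenticated HTTP proxy."""

    def __init__(self, host="proxy.example.com", port=6469,
                 username="user", password="pass"):
        # Placeholder defaults (assumptions); substitute the provider's real values.
        # The scheme is http:// so the client talks plain HTTP to the proxy port.
        self.proxy_server = "http://{}:{}".format(host, port)
        credentials = "{}:{}".format(username, password).encode("ascii")
        # Standard Base64, as expected by HTTP Basic authentication.
        self.proxy_authorization = "Basic " + base64.b64encode(credentials).decode("ascii")

    def process_request(self, request, spider):
        # Returning None lets the request continue through the middleware chain.
        request.meta["proxy"] = self.proxy_server
        request.headers["Proxy-Authorization"] = self.proxy_authorization
```

As with any downloader middleware, the class would still need to be listed in the project's `DOWNLOADER_MIDDLEWARES` setting to take effect.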
    huaerxiela · #4 · 2019-01-04 17:39:44 +08:00
    houzhimeng · #5 · 2019-01-04 18:29:05 +08:00
    https://github.com/scrapy/scrapy/issues/1855 Take a look and see whether this is the same situation as yours.
    Ewig (OP) · #6 · 2019-01-05 10:54:42 +08:00
    @15399905591 Why is that the cause?
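
One likely explanation, offered as a sketch rather than a confirmed diagnosis: with an `https://` proxy URL the client starts a TLS handshake with the proxy itself, and if that port only speaks plain HTTP, its plaintext reply cannot be parsed as a TLS record. OpenSSL commonly reports this as `wrong version number`. The snippet below reproduces the situation using only the standard library; the local server and both function names are hypothetical, and the exact error text depends on the OpenSSL build.

```python
import socket
import ssl
import threading


def serve_plain_http_once(server_sock):
    """Accept one connection and answer in plaintext, as an HTTP-only port would."""
    conn, _ = server_sock.accept()
    with conn:
        conn.recv(4096)  # read and ignore the client's TLS ClientHello bytes
        conn.sendall(b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n")


def tls_handshake_against_plaintext():
    """Return the ssl.SSLError raised when TLS is attempted on a plaintext port."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_plain_http_once,
                              args=(server_sock,), daemon=True)
    worker.start()

    # Certificate checks are disabled only because this is a local demo.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as raw:
            try:
                with context.wrap_socket(raw):
                    return None  # handshake unexpectedly succeeded
            except ssl.SSLError as exc:
                return exc
    finally:
        worker.join(timeout=5)
        server_sock.close()


if __name__ == "__main__":
    print(repr(tls_handshake_against_plaintext()))
```

If this is what happens here, switching the proxy URL to `http://` avoids the TLS handshake with the proxy, which matches the change suggested in reply #3.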