1a1a11a's recent timeline updates
1a1a11a
V2EX member #209800, joined on 2017-01-09 06:42:21 +08:00
Crawler deduplication (爬虫判重)
Programming • 1a1a11a • Mar 12, 2017 • Last replied by 1a1a11a
28 replies
1a1a11a's recent replies
Mar 12, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@Lax Thanks.
Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@jiangzhuo Thanks.
Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@v2pro Good stuff, I learned something.
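Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@Lax What is HLL? A three-letter abbreviation is hard to search for. Could you give me the full name? Thanks!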
Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@bjlbeyond That seems a bit off-topic? Or am I missing the point?
Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@wmttom That's a good idea, thanks a lot.
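Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@binux I write both the deduplicated URLs and the queue of URLs waiting to be crawled to disk; otherwise memory fills up in no time. The server has gigabit bandwidth, and I am crawling with the link saturated...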
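For readers wondering what keeping both the seen-URL set and the pending queue on disk might look like, here is a minimal sketch. It assumes SQLite as the on-disk store, which the reply does not specify; the class name `DiskFrontier` and its methods are hypothetical and only illustrate the idea, not the poster's actual setup.

```python
import sqlite3


class DiskFrontier:
    """Hypothetical disk-backed crawl frontier (assumption: SQLite as the store).

    Both the set of already-seen URLs and the queue of pending URLs live in
    the database file, so neither has to fit in RAM.
    """

    def __init__(self, path=":memory:"):
        # ":memory:" is only for quick experiments; pass a file path for real use.
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)"
            )
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS pending ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
            )

    def add(self, url):
        """Queue the URL unless it was seen before. Return True if it was new."""
        with self.db:  # one transaction for the dedup check and the enqueue
            cur = self.db.execute(
                "INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,)
            )
            if cur.rowcount == 0:  # primary key already present: duplicate
                return False
            self.db.execute("INSERT INTO pending (url) VALUES (?)", (url,))
            return True

    def pop(self):
        """Remove and return the oldest pending URL, or None if the queue is empty."""
        with self.db:
            row = self.db.execute(
                "SELECT id, url FROM pending ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM pending WHERE id = ?", (row[0],))
            return row[1]
```

Committing once per URL is slow at gigabit crawl rates; batching many inserts into one transaction would be the first thing to tune in a sketch like this.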
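Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@jiangzhuo Oh, right, your figure is off: it is 47 billion. The unit is billions, not hundreds of millions, so your calculation needs to be multiplied by 10 again.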
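Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@jiangzhuo Then why does my 72 GB of memory fill up so quickly? :( I guess there are a lot of junk URLs, and Python is probably memory-hungry on top of that. The site you posted is quite interesting. I don't know why the count suddenly drops in the middle; I always thought the total number of web pages increased monotonically. Judging from this site, though, it looks fairly stable.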
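To get a rough feel for how much memory Python spends per URL in an in-memory set, one could measure it directly. The sketch below compares storing full URL strings with storing 8-byte fingerprints. Hashing URLs to fixed-size digests is an assumption added here for comparison, not something stated in the thread; the example URLs, the helper names `fingerprint` and `approx_set_bytes`, and the digest size are all made up for illustration.

```python
import hashlib
import sys


def fingerprint(url: str) -> bytes:
    """Return an 8-byte digest of the URL (collisions are possible but rare)."""
    return hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()


def approx_set_bytes(items) -> int:
    """Rough size of a set: the hash table itself plus each stored object."""
    s = set(items)
    return sys.getsizeof(s) + sum(sys.getsizeof(x) for x in s)


# Synthetic URLs; real crawl data will have a different length distribution.
urls = [
    f"http://example.com/some/fairly/long/path/page-{i}.html"
    for i in range(100_000)
]

as_strings = approx_set_bytes(urls)
as_digests = approx_set_bytes(fingerprint(u) for u in urls)

print(f"strings: {as_strings / len(urls):.0f} bytes/URL")
print(f"digests: {as_digests / len(urls):.0f} bytes/URL")
```

The exact numbers depend on the Python build and on URL length, but per-object overhead is visible in both cases, which is consistent with the suspicion that Python itself accounts for a good part of the memory use.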
Mar 9, 2017
Replied to a topic by 1a1a11a › Programming › Crawler deduplication
@samcode That's worth considering, thanks.