1
oIMOo 2019-03-12 20:50:28 +08:00
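Hello,
I haven't written many Python requests scripts for pages that rely on JS. Could you take a look at how to detect whether the enroll button returned by that JS shows an open course? (Right now it is in a state where enrollment is not possible.) Thanks |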
2
jenlors 2019-03-13 11:14:14 +08:00
Showing my support.
|
3
tonywangcn 2019-03-13 12:23:35 +08:00
|
4
my8100 OP @tonywangcn I just confirmed that both links open fine. Please first check that your network can reach https://medium.com/ normally.
|
5
tonywangcn 2019-03-13 17:25:18 +08:00
$ https_proxy=localhost:6152 curl -vvv https://medium.com/@my8100/https-medium-com-my8100-how-to-efficiently-manage-your-distributed-web-scraping-projects-55ab13309820
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 6152 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 6152 (#0)
* Establish HTTP proxy tunnel to medium.com:443
> CONNECT medium.com:443 HTTP/1.1
> Host: medium.com:443
> User-Agent: curl/7.54.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
* subject: businessCategory=Private Organization; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; serialNumber=5010624; street=760 Market Street; postalCode=94102; C=US; ST=California; L=San Francisco; O=A Medium Corporation; CN=medium.com
* start date: Jun 1 00:00:00 2017 GMT
* expire date: Aug 30 12:00:00 2019 GMT
* subjectAltName: host "medium.com" matched cert's "medium.com"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fa7b5806600)
> GET /@my8100/https-medium-com-my8100-how-to-efficiently-manage-your-distributed-web-scraping-projects-55ab13309820 HTTP/2
> Host: medium.com
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 302
< date: Wed, 13 Mar 2019 09:23:05 GMT
< content-type: application/octet-stream
< set-cookie: __cfduid=d800d3f4d7ffa024ead64e91a29e1ebb41552468985; expires=Thu, 12-Mar-20 09:23:05 GMT; path=/; domain=.medium.com; HttpOnly
< set-cookie: uid=lo_rj0lT6mjVKUE; Expires=Thu, 12-Mar-20 09:23:05 GMT; Domain=.medium.com; Path=/; Secure; HttpOnly
< content-security-policy: default-src 'self'; connect-src https://localhost https://*.instapaper.com https://*.stripe.com https://glyph.medium.com https://*.paypal.com https://getpocket.com https://medium.com:443 https://*.medium.com:443 https://*.medium.com https://medium.com https://*.medium.com https://*.algolia.net https://cdn-static-1.medium.com https://dnqgz544uhbo8.cloudfront.net https://cdn-videos-1.medium.com https://cdn-audio-1.medium.com https://*.lightstep.com https://*.branch.io https://app.zencoder.com 'self'; font-src data: https://*.amazonaws.com https://*.medium.com https://glyph.medium.com https://medium.com https://*.gstatic.com https://dnqgz544uhbo8.cloudfront.net https://use.typekit.net https://cdn-static-1.medium.com 'self'; frame-src chromenull: https: webviewprogressproxy: medium: 'self'; img-src blob: data: https: 'self'; media-src https://*.cdn.vine.co https://d1fcbxp97j4nb2.cloudfront.net https://d262ilb51hltx0.cloudfront.net https://*.medium.com https://gomiro.medium.com https://miro.medium.com https://pbs.twimg.com 'self' blob:; object-src 'self'; script-src 'unsafe-eval' 'unsafe-inline' about: https: 'self'; style-src 'unsafe-inline' data: https: 'self'; report-uri https://csp.medium.com
< x-frame-options: sameorigin
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< x-ua-compatible: IE=edge, Chrome=1
< x-powered-by: Medium
< x-obvious-tid: 1552468985229:d47a5d7da221
< x-obvious-info: 36855-3d9334e,3d9334ed6db
< link: <https://medium.com/humans.txt>; rel="humans"
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< expires: Thu, 09 Sep 1999 09:09:09 GMT
< pragma: no-cache
< set-cookie: sid=1:1Yj8mG1saeQMx1r5h/kFLMw3J77PPMa784rb2HRk3z9J8bnuZYy18oGwoDGakmHV; path=/; expires=Thu, 12 Mar 2020 09:23:05 GMT; domain=.medium.com; secure; httponly
< tk: T
< location: /suspended
< strict-transport-security: max-age=15552000; includeSubDomains; preload
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 4b6cf274ef5132fb-HKG
<
* Connection #0 to host localhost left intact
Look here: location: /suspended |
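For anyone who wants to spot this kind of redirect without reading the whole log by eye, here is a minimal Python sketch. It assumes the `curl -v` output has been saved as text with one header per line; the function name `find_redirect` and the `sample` string are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re
from typing import Optional


def find_redirect(verbose_log: str) -> Optional[str]:
    """Return the Location target of the first 3xx response in a curl -v log."""
    status = None
    for raw in verbose_log.splitlines():
        line = raw.strip()
        # curl -v prefixes response header lines with "<".
        if not line.startswith("<"):
            continue
        header = line[1:].strip()
        # A status line such as "HTTP/2 302" or "HTTP/1.1 200 ..." starts a new response.
        match = re.match(r"HTTP/\d(?:\.\d)?\s+(\d{3})", header)
        if match:
            status = int(match.group(1))
            continue
        # Only report a Location header that belongs to a 3xx response.
        if status is not None and 300 <= status < 400:
            name, sep, value = header.partition(":")
            if sep and name.strip().lower() == "location":
                return value.strip()
    return None


# Hypothetical excerpt shaped like the response headers in the log above.
sample = """\
< HTTP/2 302
< content-type: application/octet-stream
< location: /suspended
< server: cloudflare
"""
print(find_redirect(sample))
```

A 302 whose target is `/suspended` means the server answered normally and sent the client to a suspension page, so the network path itself is not what is failing.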
6
my8100 OP @tonywangcn All I can say is that this is not very "scientific". I suggest reading the Chinese version hosted inside China: https://juejin.im/post/5bebc5fd6fb9a04a053f3a0e
|
7
my8100 OP @tonywangcn I'll have to read through your articles properly when I have the time: https://medium.com/@tonywangcn
|
8
tonywangcn 2019-03-13 20:10:28 +08:00
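@my8100 Hahaha, mine are nowhere near as good as yours. I've recently been planning to integrate Scrapy into k8s and need exactly this kind of control panel. If it's convenient, let's connect on WeChat so I can learn from you: NTMyNDcyODQx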
|
9
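my8100 OP @tonywangcn Today I found that, after logging out of my Medium account, I could no longer find myself in search: the account had been suspended for no stated reason. So I simply moved the article to https://github.com/my8100/files/blob/master/scrapydweb/README.md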
|