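I've spent the past couple of days trying to write a Go crawler for the BYR forum (北邮人论坛). The goal is to log in, save the cookie, and send that cookie with every later request. From what I've read, net/http/cookiejar is the recommended way to do this.
Login currently succeeds and I get back the login-success JSON. However, no post-login cookie seems to be captured, so a subsequent GET for a post's body fails with **"您未登录,请登录后继续操作" ("You are not logged in; please log in to continue")**.
Can anyone tell me what is going wrong here?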
package main

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"strings"
)

func main() {
	// init cookiejar
	var cookieJar *cookiejar.Jar
	cookieJar, _ = cookiejar.New(nil)
	// init client with cookiejar
	httpClient := &http.Client{
		Jar: cookieJar,
	}
	// login param
	postValues := url.Values{}
	postValues.Set("id", "ID")
	postValues.Set("passwd", "PWD")
	postValues.Set("s-mode", "0")
	postValues.Set("CookieDate", "3")
	// request for login
	httpReq, _ := http.NewRequest("POST", "https://bbs.byr.cn/user/ajax_login.json", strings.NewReader(postValues.Encode()))
	httpReq.Header.Set("Content-Type", "application/x-www-form-urlencoded; param=value")
	httpReq.Header.Add("X-Requested-With", "XMLHttpRequest")
	httpReq.Header.Add("Connection", "keep-alive")
	httpReq.Header.Add("User-Agent", "Mozilla/5.0")
	httpReq.Header.Add("Referer", "https://bbs.byr.cn")
	httpReq.Header.Add("Accept", "application/json, text/javascript, */*; q=0.01")
	httpReq.Header.Add("authority", "bbs.byr.cn")
	// for nginx/1.10
	httpClient.Transport = &http.Transport{
		TLSNextProto: make(map[string]func(authority string, c *tls.Conn) http.RoundTripper),
	}
	// login
	httpResp, _ := httpClient.Do(httpReq)
	fmt.Printf("req cookies: %s \n", httpReq.Cookies())
	fmt.Printf("resp cookies: %s \n", httpResp.Cookies())
	// request to get article content
	httpReq1, _ := http.NewRequest("GET", "https://bbs.byr.cn/article/Golang/842", nil)
	httpReq1.Header.Add("X-Requested-With", "XMLHttpRequest")
	httpResp1, _ := httpClient.Do(httpReq1)
	body, _ := ioutil.ReadAll(httpResp1.Body)
	fmt.Println(string(body))
}
Output (note that the cookies are empty):
req cookies: []
resp cookies: []
(... omitted ...)
<h5>产生错误的可能原因:</h5><ul><li><samp class="ico-pos-dot"></samp>您未登录,请登录后继续操作</li>
(... omitted ...)
This has been bugging me for a while, so any pointers would be much appreciated.
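For reference, here is a minimal sketch of how a client with a cookiejar is expected to behave when the server does send a Set-Cookie header on login. It runs against a local httptest server rather than the real forum, so the `/login` and `/article` paths, the `session` cookie name, and the helper names `newFakeForum` and `fetchAfterLogin` are all assumptions made up for illustration, not anything bbs.byr.cn is known to use.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"net/url"
)

// newFakeForum starts a local stand-in server (hypothetical, not the real site).
// /login sets a "session" cookie; /article requires that cookie.
func newFakeForum() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc123", Path: "/"})
		fmt.Fprint(w, `{"ok":true}`)
	})
	mux.HandleFunc("/article", func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("session"); err != nil || c.Value != "abc123" {
			http.Error(w, "not logged in", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "article body")
	})
	return httptest.NewServer(mux)
}

// fetchAfterLogin logs in, then fetches the article with the same client.
// It returns the article body and the cookies the jar holds for baseURL.
func fetchAfterLogin(baseURL string) (string, []*http.Cookie, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return "", nil, err
	}
	client := &http.Client{Jar: jar}

	// The jar records any Set-Cookie headers from this response.
	resp, err := client.PostForm(baseURL+"/login", url.Values{"id": {"ID"}, "passwd": {"PWD"}})
	if err != nil {
		return "", nil, err
	}
	resp.Body.Close()

	u, err := url.Parse(baseURL)
	if err != nil {
		return "", nil, err
	}
	stored := jar.Cookies(u)

	// The same client attaches the stored cookies to this request.
	resp, err = client.Get(baseURL + "/article")
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), stored, err
}

func main() {
	srv := newFakeForum()
	defer srv.Close()

	body, cookies, err := fetchAfterLogin(srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cookies in jar:", cookies)
	fmt.Println("article:", body)
}
```

If the jar stays empty after a login request in a setup like this, one thing worth checking is whether the login response carried any Set-Cookie header at all, since the jar can only store what the server sends.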
1
lzhr 2017-08-08 20:37:30 +08:00
|
2
pqee 2017-08-08 20:52:43 +08:00 via Android
The first reply has it right.
|
3
golmic 2017-08-08 20:56:47 +08:00
I don't know Go. If it were Python, I could take a look for you.
|
4
jarlyyn 2017-08-08 23:59:35 +08:00 via Android
For this kind of thing, I'd generally run the whole flow through curl first.
This login method doesn't look very reliable.
|
5
ovear 2017-08-09 00:06:30 +08:00
You need to look at what your resp actually returned.
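One way to do that, as a rough sketch: dump the login response and count its Set-Cookie headers before assuming the login worked. The stub server below is a made-up stand-in that answers 200 with an error JSON and no cookie; the JSON fields and the helper names `newStubLogin` and `dumpLogin` are assumptions for illustration only, not the real forum's behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newStubLogin starts a hypothetical login endpoint that returns HTTP 200
// with an error payload and never sets a cookie.
func newStubLogin() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"ok":false,"msg":"bad password"}`)
	}))
}

// dumpLogin posts the credentials and returns the status code, the raw
// Set-Cookie header values, and a dump of the full response (headers + body).
func dumpLogin(client *http.Client, loginURL, id, passwd string) (int, []string, string, error) {
	form := url.Values{"id": {id}, "passwd": {passwd}}
	resp, err := client.PostForm(loginURL, form)
	if err != nil {
		return 0, nil, "", err
	}
	defer resp.Body.Close()

	dump, err := httputil.DumpResponse(resp, true)
	if err != nil {
		return 0, nil, "", err
	}
	return resp.StatusCode, resp.Header.Values("Set-Cookie"), string(dump), nil
}

func main() {
	srv := newStubLogin()
	defer srv.Close()

	status, setCookies, dump, err := dumpLogin(srv.Client(), srv.URL, "ID", "PWD")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("status:", status)
	fmt.Println("Set-Cookie headers:", len(setCookies))
	fmt.Println(strings.TrimSpace(dump))
}
```

A 200 status alone does not prove the login succeeded. The body and the Set-Cookie count together give a clearer picture, and checking the errors returned by each call (instead of discarding them with `_`) helps too.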
|
6
mantianyu 2017-08-09 00:28:33 +08:00
Go's syntax is painful just to look at...
|