A recent hobby project: Blogbar, an aggregator for personal blogs (Alpha release)


This topic was created 3634 days ago; the information in it may have developed or changed since.

http://www.blogbar.cc

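I enjoy blogging, so over the past month or so I've put my spare time into a blog-related hobby project, Blogbar. In one sentence: it aggregates personal blogs.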

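I already announced it in an earlier thread ( http://www.v2ex.com/t/147969 ), but back then it was only an early Alpha. I've recently tidied up the UI and fixed some bugs, so this version counts as the proper Alpha release and is basically usable.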

Quick guide

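Recommend a blog: you can recommend personal blogs to Blogbar (non-personal blogs and individually run commercial blogs are not accepted)
Blog square: blogs are tagged, and you can browse all kinds of blogs in the blog square
Editor's picks: every day, some of the posts updated that day are recommended on the home page (RSS feed: http://www.blogbar.cc/feed/posts.xml )
Wiki: http://www.blogbar.cc/wiki sums up the various ways to run a blog; worth a look if you're planning to start one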

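If a personal blog is worthwhile but doesn't offer an RSS/Atom feed, you can write a Spider and put it under https://github.com/blogbar/blogbar/tree/master/spiders : subclass BaseSpider and override a few attributes and methods, and that's it. For example, here is one for Wang Yin's blog (http://www.yinwang.org, currently offline…):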

# coding: utf-8
import datetime
from .base import BaseSpider, get_inner_html, remove_element


class WangYinSpider(BaseSpider):
    url = 'http://www.yinwang.org'
    title = '当然我在扯淡'
    author = '王垠'

    @staticmethod
    def get_posts(tree):
        posts = []
        for item in tree.cssselect('.list-group-item a'):
            title = item.text_content()
            url = item.get('href')
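            # Get the date; this assumes the URL ends in .../YYYY/MM/DD/slug.
            # A list comprehension is used instead of filter(), whose result
            # cannot be indexed on Python 3.
            date_list = [part for part in url.split('/') if part]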
            day = int(date_list[-2])
            month = int(date_list[-3])
            year = int(date_list[-4])
            published_at = datetime.datetime(year=year, month=month, day=day)
            posts.append({
                'title': title,
                'url': url,
                'published_at': published_at
            })
        return posts

    @staticmethod
    def get_post(tree):
        content_element = tree.cssselect('body')[0]
        remove_element(content_element.cssselect('h2')[0])  # drop the h2 title
        remove_element(content_element.cssselect('p')[0])  # drop the first paragraph
        return get_inner_html(content_element)
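
The date handling in get_posts is the part most likely to break on another site, so it may help to try it in isolation. Below is a minimal sketch of that step as a standalone function. The name date_from_post_url and the sample URL are made up for illustration and are not part of Blogbar; the sketch assumes post URLs end in .../YYYY/MM/DD/slug, which is what the index arithmetic in the spider above implies.

```python
import datetime


def date_from_post_url(url):
    """Return the date encoded in a URL shaped like .../YYYY/MM/DD/slug.

    Hypothetical helper, not part of Blogbar's BaseSpider.
    """
    # Drop the empty segments produced by '//' and by a trailing slash.
    parts = [part for part in url.split('/') if part]
    # The three segments before the slug are year, month and day.
    year, month, day = (int(p) for p in parts[-4:-1])
    return datetime.datetime(year=year, month=month, day=day)


# Illustrative URL only; the real post paths are not confirmed here.
print(date_from_post_url('http://www.yinwang.org/blog-cn/2014/12/10/example-post'))
```

If a blog uses a different URL layout, int() or the unpacking will raise ValueError, which is a reasonable signal that the spider needs its own parsing rule.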