Django annotation: a powerful tool for cutting down the number of I/O operations

An annotation is, literally, a note attached to something. As the name suggests, every argument passed to annotate() is attached as an "annotation" to each object returned by the model's queryset.

annotate() is most often used together with aggregation functions.

An example

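The Entry model has a foreign key pointing to the Blog model, so from the Blog side the relation can be referenced in queries by the name entry. We want to count how many entries each blog has: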

>>> from django.db.models import Count
>>> q = Blog.objects.annotate(Count('entry'))
# The name of the first blog
>>> q[0].name
'Blogasaurus'
# The number of entries on the first blog
>>> q[0].entry__count
42

Let's break down the code above:

q = Blog.objects.annotate(Count('entry'))

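This uses the aggregation function Count, which, for a given Blog object, counts how many Entry objects are related to it. Blog.objects.annotate(Count('entry')) therefore computes, for every Blog object, the number of entries that belong to it. The return value is a queryset. Compared with

Blog.objects.all()

the difference is that every item in Blog.objects.annotate(Count('entry')) carries an extra attribute, entry__count, which is exactly the number we are after.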
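To make "the database does the counting" concrete, here is a minimal sketch using only the standard-library sqlite3 module. The table names blog and entry, the blog_id column, and the sample rows are all hypothetical, and the SQL is only roughly the kind of JOIN plus GROUP BY statement that an annotate(Count('entry')) call translates to, not the exact text Django generates.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the Blog and Entry models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        blog_id INTEGER REFERENCES blog(id),
        headline TEXT
    );
    INSERT INTO blog (id, name) VALUES (1, 'Blogasaurus'), (2, 'Empty Blog');
    INSERT INTO entry (blog_id, headline) VALUES
        (1, 'first'), (1, 'second'), (1, 'third');
""")

# One statement: the database joins, groups and counts. Every returned row
# carries an extra entry__count column next to the blog's own columns.
rows = conn.execute("""
    SELECT blog.id, blog.name, COUNT(entry.id) AS entry__count
    FROM blog
    LEFT OUTER JOIN entry ON entry.blog_id = blog.id
    GROUP BY blog.id, blog.name
    ORDER BY blog.id
""").fetchall()

for blog_id, name, entry_count in rows:
    print(name, entry_count)
```

The LEFT OUTER JOIN is what keeps blogs without any entries in the result, with a count of 0 instead of being dropped.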

q[0].name
q[0].entry__count

q is a queryset, and q[0] fetches its first object, which carries the extra entry__count attribute.

A counterexample

If you had never heard of annotate, you would almost certainly reach for a more "pythonic" approach:

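q = Blog.objects.all()
for blog in q:
    # entry_set is the default reverse accessor (assuming no related_name is set);
    # every count() call sends its own query to the database
    entry_count = blog.entry_set.count()
    print(blog.name)
    print(entry_count)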

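This version is easier to understand, but it will kill your performance. Suppose you have 100,000 blogs: iterating over Blog.objects.all() costs one query, and then the body of the for loop runs one more query for every single blog, so the total is 100,000 + 1 queries, which means that many I/O round trips. The annotate version needs only one query and one round trip, because the entries are counted inside the database, so it is naturally far more efficient!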
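The difference in query counts can be checked without Django at all. The sketch below is an illustration under assumptions: it uses the standard-library sqlite3 module, hypothetical blog and entry tables, and two helper functions (count_in_loop and count_in_database) invented for this example. It records every SELECT the connection executes and compares the loop style with the single grouped query.

```python
import sqlite3

# Hypothetical in-memory tables standing in for the Blog and Entry models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER);
    INSERT INTO blog (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO entry (blog_id) VALUES (1), (1), (2);
""")

# Record every SQL statement the connection runs from here on.
statements = []
conn.set_trace_callback(statements.append)


def select_count():
    """Number of recorded statements that are SELECT queries."""
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


def count_in_loop(conn):
    """Loop style: one query for the blogs, then one more per blog."""
    blogs = conn.execute("SELECT id, name FROM blog ORDER BY id").fetchall()
    results = []
    for blog_id, name in blogs:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM entry WHERE blog_id = ?", (blog_id,)
        ).fetchone()
        results.append((name, n))
    return results


def count_in_database(conn):
    """Annotate style: a single query, with the counting done by the database."""
    return conn.execute("""
        SELECT blog.name, COUNT(entry.id)
        FROM blog LEFT OUTER JOIN entry ON entry.blog_id = blog.id
        GROUP BY blog.id, blog.name
        ORDER BY blog.id
    """).fetchall()


statements.clear()
loop_result = count_in_loop(conn)
loop_queries = select_count()

statements.clear()
db_result = count_in_database(conn)
db_queries = select_count()

print("loop style:", loop_queries, "queries")
print("single query style:", db_queries, "query")
```

Both functions return the same counts; only the number of round trips differs, and in the loop style it grows with the number of blogs.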

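There is one golden rule for database queries: keep the number of I/O round trips as low as possible. A Python for loop that runs a query on every iteration naturally multiplies them, so please embrace annotation.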

Source: https://juejin.im/post/5c58f10b51882562276c1be3
