Rails DB query optimization: how to optimize queries to solve common scalability bottlenecks in Rails

2023-09-06

by Usama Ashraf

How to optimize your queries to solve common scalability bottlenecks in Rails

The (perfect) solution for the N+1 problem

The n+1 query problem is one of the most common scalability bottlenecks. It arises when you fetch a list of resources from a database and each resource embeds other associated resources. The associated resources may then have to be queried separately, so a list of n parent objects costs another n queries on top of the one that loaded the list. Let’s try to get rid of this O(n) conundrum.
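
To make the query counts concrete, here is a minimal plain-Ruby sketch. It does not touch a real database: FakeTable, USER_ROWS and the sample posts are hypothetical stand-ins invented for illustration, and each method call on the table counts as one query.

```ruby
# Hypothetical in-memory "table" that counts how many queries it serves.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => record }
    @query_count = 0
  end

  # One query per call, returning a single record.
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end

  # One query for many ids, like a SQL IN clause.
  def where_in(ids)
    @query_count += 1
    @rows.values_at(*ids.uniq)
  end
end

USER_ROWS = {
  10 => { id: 10, name: "Ann" },
  11 => { id: 11, name: "Bob" }
}.freeze

# Pretend these came back from the single query that loads the list.
posts = [
  { id: 1, author_id: 10 },
  { id: 2, author_id: 11 },
  { id: 3, author_id: 10 }
]

# Naive: one author lookup per post, so n extra queries.
naive_users = FakeTable.new(USER_ROWS)
naive = posts.map { |post| post.merge(author: naive_users.find(post[:author_id])) }

# Batched: a single lookup for all authors, then an in-memory match.
batched_users = FakeTable.new(USER_ROWS)
author_ids = posts.map { |post| post[:author_id] }
authors_by_id = batched_users.where_in(author_ids).to_h { |user| [user[:id], user] }
batched = posts.map { |post| post.merge(author: authors_by_id.fetch(post[:author_id])) }

puts "naive: #{naive_users.query_count} author queries"
puts "batched: #{batched_users.query_count} author query"
```

Both paths build the same result, but for three posts the naive path makes three author lookups while the batched path makes one. The rest of this article is about getting the batched behaviour without hand-writing that plumbing.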

If you’re comfortable with Rails and Active Model Serializers, and already have a good idea of what our problem is going to be, then maybe you can jump straight into the code here.

A Concrete Example

Say you’re fetching an array of Post objects at a GET endpoint. You also want to load the respective authors of the posts, embedding an author object within each of the post objects. Here’s a naive way of doing it:

class PostsController < ApplicationController
  def index
    posts = Post.all
    render json: posts
  end
end

class Post < ApplicationRecord
  belongs_to :author, class_name: 'User'
end

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details

  belongs_to :author
end

For each of the n Post objects being rendered, a query will run to fetch the corresponding User object. Hence we’ll run a total of n+1 queries. This is disastrous. And here’s how you fix it by eager loading the User object:

class PostsController < ApplicationController
  def index
    # Eager-loads the authors up front: by default one extra query with an
    # IN clause, or a JOIN if the query also filters on the users table.
    posts = Post.includes(:author)
    render json: posts
  end
end

When A Simple Join Is Not Possible

Until now there’s been absolutely nothing new for veterans.

But let’s complicate this. Let’s assume that the site’s users are not stored in the same RDBMS as the posts. Rather, the users are documents stored in MongoDB (for whatever reason). How do we modify our Post serializer to fetch the user now, optimally? This would be going back to square one:

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  # Will run n Mongo queries for n posts being rendered.
  def author
    User.find(object.author_id)
  end
end

# This is now a Mongoid document, not an ActiveRecord model.
class User
  include Mongoid::Document
  include Mongoid::Timestamps
  # ...
end

The predicament that our users now reside in a Mongo database could just as well be that we call a third-party HTTP service to fetch the users, or store them in a completely different RDBMS. Our essential problem remains that there’s no way to ‘join’ the users datastore with the posts table and get the response we want in a single query.

Of course, we can do better. We can fetch the entire response in two queries:

1. Fetch all the posts without the author attribute (1 SQL query).
2. Fetch all the corresponding authors by running a where-in query with the user IDs plucked from the array of posts (1 Mongo query with an IN clause).

posts      = Post.all
# Mapping over the relation loads the posts once; calling pluck here
# would issue a second SQL query just for the IDs.
author_ids = posts.map(&:author_id).uniq
authors    = User.where(:_id.in => author_ids)

# Somehow pass the author objects to the post serializer and
# map them to the correct post objects. Can't imagine what
# exactly that would look like, but probably not pretty.
render json: posts, pass_some_parameter_maybe: authors
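
For what it’s worth, here is one way that hand-rolled mapping might look, as a plain-Ruby sketch rather than real Active Model Serializers code. PostRecord, UserDoc and PlainPostSerializer are hypothetical names made up for this example, and the arrays stand in for the results of the two queries above.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord model and the Mongoid document.
PostRecord = Struct.new(:id, :title, :author_id, keyword_init: true)
UserDoc    = Struct.new(:id, :name, keyword_init: true)

# Hypothetical serializer that must be handed the preloaded authors.
class PlainPostSerializer
  def initialize(post, authors_by_id:)
    @post = post
    @authors_by_id = authors_by_id
  end

  def as_json
    author = @authors_by_id[@post.author_id]
    {
      id: @post.id,
      title: @post.title,
      # nil when the author could not be found in the other datastore.
      author: author && { id: author.id, name: author.name }
    }
  end
end

# Stands in for the single SQL query that loads the posts.
posts = [
  PostRecord.new(id: 1, title: "Hello", author_id: "a1"),
  PostRecord.new(id: 2, title: "World", author_id: "a2"),
  PostRecord.new(id: 3, title: "Again", author_id: "a1")
]

# Stands in for the single Mongo where-in query.
authors = [UserDoc.new(id: "a1", name: "Ann"), UserDoc.new(id: "a2", name: "Bob")]

# Index once so each post finds its author without scanning the list.
authors_by_id = authors.to_h { |user| [user.id, user] }

payload = posts.map do |post|
  PlainPostSerializer.new(post, authors_by_id: authors_by_id).as_json
end
```

It works, but the caller now has to know which associations the serializer needs and preload them itself, which is exactly the coupling we would like to avoid.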

Enter Batch Loader

So our original optimization problem has been reduced to “how do we make this code readable and maintainable”. The folks at Universe have come up with an absolute gem (too obvious?). Batch Loader has been incredibly helpful to me recently.

gem 'batch-loader'

bundle install

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily
  end
end

class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.map(&:author_id).uniq
      User.where(:_id.in => author_ids).each do |user|
        # 'Assign' the user object to every post written by this author
        # (several posts in the batch may share one author).
        posts.select { |p| p.author_id == user._id.to_s }.each do |post|
          batch_loader.call(post, user)
        end
      end
    end
  end
end

If you’re familiar with JavaScript Promises, think of the get_author_lazily method as returning a Promise that is evaluated later. That’s a decent analogy, I think, since BatchLoader uses lazy Ruby objects. By default, BatchLoader caches the loaded values, so to keep the responses up to date you should add its middleware, which clears that cache between requests, to your config/application.rb:

config.middleware.use BatchLoader::Middleware

That’s it! We’ve solved an advanced version of the n+1 queries problem while keeping our code clean and using Active Model Serializers the right way.
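
If the Promise analogy feels abstract, the idea of lazy batching can be sketched in a few lines of plain Ruby. To be clear, this is a toy written for illustration and not how the batch-loader gem is implemented: TinyBatch, load_later, LazyValue and USERS are all hypothetical names, and unlike the gem’s lazy objects this placeholder needs an explicit value call.

```ruby
# A toy lazy batch loader (NOT the batch-loader gem's implementation).
class TinyBatch
  attr_reader :resolver_calls

  def initialize(&resolver)
    @resolver = resolver # receives all queued keys, returns { key => value }
    @pending = []
    @results = {}
    @resolver_calls = 0
  end

  # Queue a key and hand back a placeholder; nothing is loaded yet.
  def load_later(key)
    @pending << key
    LazyValue.new(self, key)
  end

  # Called by a placeholder on first use: resolve everything queued so far.
  def fetch(key)
    unless @results.key?(key)
      @resolver_calls += 1
      @results.merge!(@resolver.call(@pending.uniq))
      @pending.clear
    end
    @results[key]
  end

  # Placeholder that triggers the batch the first time its value is read.
  class LazyValue
    def initialize(batch, key)
      @batch = batch
      @key = key
    end

    def value
      @batch.fetch(@key)
    end
  end
end

USERS = { "a1" => "Ann", "a2" => "Bob" }.freeze

batch = TinyBatch.new do |author_ids|
  # Stands in for one where-in query against the users datastore.
  USERS.slice(*author_ids)
end

# Each post asks for its author, but only placeholders come back.
lazy_authors = %w[a1 a2 a1].map { |author_id| batch.load_later(author_id) }
calls_before = batch.resolver_calls

# Reading the first value resolves the whole batch in a single call.
names = lazy_authors.map(&:value)
```

Nothing is resolved while the placeholders are being collected. The first read triggers one resolver call for every queued ID, and later reads are served from the cached results, which is also why a per-request cache reset like the middleware above matters.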

而已！我们已经解决了n + 1查询问题的高级版本，同时保持代码整洁并以正确的方式使用Active Model Serializer。

Using AMS for Nested Resources

One problem though. If you have a User serializer (Active Model Serializers work with Mongoid as well), that won’t be called for the lazily loaded author objects, unlike before. To fix this, we can use a Ruby block and serialize the author objects before they’re ‘assigned’ to the posts.

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily do |author|
      # Serialize the author after it has been loaded.
      # The [:user] lookup assumes an adapter that nests the result
      # under a :user root key (such as the :json adapter).
      ActiveModelSerializers::SerializableResource
        .new(author)
        .as_json[:user]
    end
  end
end

class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.map(&:author_id).uniq
      User.where(:_id.in => author_ids).each do |user|
        modified_user = block_given? ? yield(user) : user
        # 'Assign' the (serialized) user to every post by this author.
        posts.select { |p| p.author_id == user._id.to_s }.each do |post|
          batch_loader.call(post, modified_user)
        end
      end
    end
  end
end

Here’s the entire code. Enjoy!

这是完整的代码。请享用！