Rails foreign keys and associations: when to denormalize?

I’ve been wrestling with database normalization and the “Rails Way” and
I
wanted to get peoples opinions as to if there’s a best practices guide
to
how to do it. Here’s some background:

Let’s assume we have a model/schema similar to the following:

class Student < ActiveRecord::Base
has_many :student_exams
has_many :exams, through: :student exams
end

class StudentExam < ActiveRecord::Base
belongs_to :student
belongs_to :exam
end

class Exam < ActiveRecord::Base
has_many :questions
belongs_to :professor
has_one :college, through: :professor
end

class Question < ActiveRecord::Base
belongs_to :exam
has_many :responses

question_text: text

end

class Response < ActiveRecord::Base
belongs_to :question
belongs_to :student

response_text: text

end

class Evaluation
belongs_to :response
belongs_to :professor

grade: integer

end

The idea is that a Professor creates an Exam and* Questions*, and
then
assigns students through StudentExam to take that exam. The students
then
generate Responses to the exam questions, and then those Responses
are
graded by Evaluations done by the Professor.
Now, given the above background, I wanted to ask a few questions to get
peoples opinions on how to structure this app:

Questions

  1. In general, are there any changes you guys would make with the
    given
    associations?
  2. For the Response model, does it make sense to do “belongs_to
    :student”, or should it instead be “belongs_to :student_exam”? It
    feels
    more logical to have the former, because we can get the exam through
    the
    question if need be (response.question.exam), and its really a
    student
    responding to it, not a student_exam. But if we used student_exam, we
    could
    use response.student_exam.exam to get back to the exam, and
    response.student_exam.student to get back to the student.
  3. If we were on an Evaluation and we wanted to back to a exam, it
    would seem quite cumbersome to do evaluation.response.question.exam
    just
    to get back to the exam.
    • Now we could always create a method on the model that just
      shortened it for us, but is there a point when you are chaining
      associations that somewhere along the line you add in another
      foreign_key
      to a table for easy lookup
      ?
  4. If we wanted to see all Evaluations for an Exam, what would be
    the best way to do that?
    • Only way I could see to do it easily would be to break it up,
      something like:
      • @questions = @exam.questions
      • @responses = Response.where(question_id:
        @questions.pluck(:id)
      • @evaluations = Evaluation.where(response_id:
        @responses.pluck(:id)
    • This seems kind of painful

Let me know your thoughts – or if you have any great books on the best
strategies for normalizing/denormalizing within or without the scope of
a
rails app, I’d love to hear them.

Before I respond, I think that it would be helpful if you provided your
database entity–relationship model. Also, I don’t see that you have a
Classes, the school variety, table.
Liz

Hey Liz,

Thanks for responding. Here’s some initial responses:

  1. While I understand the need in an actual application for things
    like
    a Classroom model (owned by a Professor), and then something like
    a
    StudentClassroom model (setting up Students to a Classroom) to
    better help organize the data, what I’m looking for really is how
    associations fit together when deeply nested outside of the context
    of the
    actual application. I set up this example simply as a way of
    exploring some
    of the problems it poses.
    • Ditto for a School model – we can assume that a *School
      *has_many
      :professors, etc.
  2. The idea is that as we get deeper into our models, it becomes
    harder
    and harder to get back to the beginning.
    • In my example, From an Evaluation, we would need 5 hops to get
      back a School model for example (imagine
      evaluation.response.question.exam.professor.school)
    • Similarly, from the reverse side, it would be hard to query for
      all
      Evaluations for a School:
      • @school = School.first
      • @all_professors = @school.professors
      • @all_exams = Exam.where(professor_id:
        @all_professors.pluck(:id))
      • @all_exam_questions = Question.where(exam_id:
        @all_exams.pluck(:id))
      • @all_exam_responses = Response.where(question_id:
        @all_exam_questions.pluck(:id))
      • @all_exam_evaluations = Evaluation.where(response_id:
        @all_exam_responses.pluck(:id))
    • While the above works, it took 6 lines to get there! Not to
      mention
      the expense of doing a crazy amount of lookups
  3. So instead, what’s the solution?
    • We could write an “all_evaluations” method in what would be our
      School model using the above code
      • Solves the multiple lines problem, but not the query
        complexity
        problem
    • Perhaps using delegate? Little more elegant than perhaps a
      method,
      but again the complexity problem
    • Adding an extra foreign key to a model somewhere and
      denormalizing
      the database a bit
      • Eg for Evaluation, maybe add exam_id to the class
        • To get back to a School now, it’s 2 less hops (
          evaluation.exam.professor.school)
        • To get all Evaluations for a School, we can now do the
          following, which is two less lines of code (and less joins):
          • @school = School.first
          • @all_professors = @school.professors
          • @all_exams = Exam.where(professor_id:
            @all_professors.pluck(:id))
          • @all_exam_evaluations = Evaluation.where(exam_id:
            @all_exams.pluck(:id))
        • This feels arbitrary though
        • if we are going to do this, why not just add a school_id
          to
          an evaluation?
        • Or a bunch of other foreign keys to all the tables to make
          all hops 1-2 at most

In the end, I’m mostly trying to understand which strategy to use under
which scenarios – I see a lot of blog posts talking about the law of
demeter and simplifying your associations (usually using delegate), but
most of them have pretty simple connections like
business.location.city_name, nothing as complex as chaining through 4-5
associations to get back to the original object. It also feels dangerous
to
chain that much – if at any point in the
evaluation.response.question.exam.professor.school the associated object
ends up being deleted somehow, then you break your ability to get back
to
the original object.

Hope that helps explain my question more.

On Mon, Aug 17, 2015 at 1:18 AM, Scott G. [email protected] wrote:

I’ve been wrestling with database normalization and the “Rails Way” and I
wanted to get peoples opinions as to if there’s a best practices guide to
how to do it.

IMHO best practice is to keep it properly normalized until you’ve
identified some need to denormalize. Usually this is due to
performance… and identified through benchmarking with a profiler, as
any human’s guess as to why an app is slow is most likely wrong. If
you do have a performance problem, and you think denormalization would
help, consider other approaches first, or possibly whether your app
doesn’t need it to be relational in the first place and might benefit
from a non-relational approach. (That’s not quite the same thing as
“NoSQL”.)

class StudentExam < ActiveRecord::Base

This name tells us nothing. I’d recommend something like
“ExamTaking”. (Might also have this include the date or some such
differentiator, in case the student takes the same exam multiple
times.)

@questions = @exam.questions
@responses = Response.where(question_id: @questions.pluck(:id)
@evaluations = Evaluation.where(response_id: @responses.pluck(:id)

This seems kind of painful

Right. I think you need to look at the “includes” method.


Dave A., consulting software developer of Codosaur.us,
PullRequestRoulette.com, Blog.Codosaur.us, and Dare2XL.com.

Was my offering helpful?

Oh!

  1. You could just have a ‘Participant’ table, containing students,
    professors, teachers, evaluators as distinguished by a role, is_active,
    etc…
  2. You could just have a ‘Evaluation_Detail’ table, containing at a row
    level student_id, class_id, teacher_id, professor_id, evaluator_id for
    each
    question, its response, then its evaluation as a comment and/or look-up
    value from a grading table, etc…

Queries would obviously be more ‘where’ intensive, but your associations
wouldn’t be as deep.

Liz