Rails foreign keys and associations: when to denormalize?

Rafa_F · August 17, 2015, 7:19am

I’ve been wrestling with database normalization and the “Rails Way” and
I
wanted to get peoples opinions as to if there’s a best practices guide
to
how to do it. Here’s some background:

Let’s assume we have a model/schema similar to the following:

class Student < ActiveRecord::Base
has_many :student_exams
has_many :exams, through: :student exams
end

class StudentExam < ActiveRecord::Base
belongs_to :student
belongs_to :exam
end

class Exam < ActiveRecord::Base
has_many :questions
belongs_to :professor
has_one :college, through: :professor
end

class Question < ActiveRecord::Base
belongs_to :exam
has_many :responses

question_text: text

end

class Response < ActiveRecord::Base
belongs_to :question
belongs_to :student

response_text: text

end

class Evaluation
belongs_to :response
belongs_to :professor

grade: integer

end

The idea is that a Professor creates an Exam and* Questions*, and
then
assigns students through StudentExam to take that exam. The students
then
generate Responses to the exam questions, and then those Responses
are
graded by Evaluations done by the Professor.
Now, given the above background, I wanted to ask a few questions to get
peoples opinions on how to structure this app:

Questions

In general, are there any changes you guys would make with the
given
associations?
For the Response model, does it make sense to do “belongs_to
:student”, or should it instead be “belongs_to :student_exam”? It
feels
more logical to have the former, because we can get the exam through
the
question if need be (response.question.exam), and its really a
student
responding to it, not a student_exam. But if we used student_exam, we
could
use response.student_exam.exam to get back to the exam, and
response.student_exam.student to get back to the student.
If we were on an Evaluation and we wanted to back to a exam, it
would seem quite cumbersome to do evaluation.response.question.exam
just
to get back to the exam.
- Now we could always create a method on the model that just
  shortened it for us, but is there a point when you are chaining
  associations that somewhere along the line you add in another
  foreign_key
  to a table for easy lookup?
If we wanted to see all Evaluations for an Exam, what would be
the best way to do that?
- Only way I could see to do it easily would be to break it up,
  something like:
  - @questions = @exam.questions
  - @responses = Response.where(question_id:
    @questions.pluck(:id)
  - @evaluations = Evaluation.where(response_id:
    @responses.pluck(:id)
- This seems kind of painful

Let me know your thoughts – or if you have any great books on the best
strategies for normalizing/denormalizing within or without the scope of
a
rails app, I’d love to hear them.

zenlinuxnh · August 17, 2015, 2:48pm

Before I respond, I think that it would be helpful if you provided your
database entity–relationship model. Also, I don’t see that you have a
Classes, the school variety, table.
Liz

zenlinuxnh · August 17, 2015, 4:54pm

Hey Liz,

Thanks for responding. Here’s some initial responses:

While I understand the need in an actual application for things
like
a Classroom model (owned by a Professor), and then something like
a
StudentClassroom model (setting up Students to a Classroom) to
better help organize the data, what I’m looking for really is how
associations fit together when deeply nested outside of the context
of the
actual application. I set up this example simply as a way of
exploring some
of the problems it poses.
- Ditto for a School model – we can assume that a *School
  *has_many
  :professors, etc.
The idea is that as we get deeper into our models, it becomes
harder
and harder to get back to the beginning.
- In my example, From an Evaluation, we would need 5 hops to get
  back a School model for example (imagine
  evaluation.response.question.exam.professor.school)
- Similarly, from the reverse side, it would be hard to query for
  all
  Evaluations for a School:
  - @school = School.first
  - @all_professors = @school.professors
  - @all_exams = Exam.where(professor_id:
    @all_professors.pluck(:id))
  - @all_exam_questions = Question.where(exam_id:
    @all_exams.pluck(:id))
  - @all_exam_responses = Response.where(question_id:
    @all_exam_questions.pluck(:id))
  - @all_exam_evaluations = Evaluation.where(response_id:
    @all_exam_responses.pluck(:id))
- While the above works, it took 6 lines to get there! Not to
  mention
  the expense of doing a crazy amount of lookups
So instead, what’s the solution?
- We could write an “all_evaluations” method in what would be our
  School model using the above code
  - Solves the multiple lines problem, but not the query
    complexity
    problem
- Perhaps using delegate? Little more elegant than perhaps a
  method,
  but again the complexity problem
- Adding an extra foreign key to a model somewhere and
  denormalizing
  the database a bit
  - Eg for Evaluation, maybe add exam_id to the class
    - To get back to a School now, it’s 2 less hops (
      evaluation.exam.professor.school)
    - To get all Evaluations for a School, we can now do the
      following, which is two less lines of code (and less joins):
      - @school = School.first
      - @all_professors = @school.professors
      - @all_exams = Exam.where(professor_id:
        @all_professors.pluck(:id))
      - @all_exam_evaluations = Evaluation.where(exam_id:
        @all_exams.pluck(:id))
    - This feels arbitrary though
    - if we are going to do this, why not just add a school_id
      to
      an evaluation?
    - Or a bunch of other foreign keys to all the tables to make
      all hops 1-2 at most

In the end, I’m mostly trying to understand which strategy to use under
which scenarios – I see a lot of blog posts talking about the law of
demeter and simplifying your associations (usually using delegate), but
most of them have pretty simple connections like
business.location.city_name, nothing as complex as chaining through 4-5
associations to get back to the original object. It also feels dangerous
to
chain that much – if at any point in the
evaluation.response.question.exam.professor.school the associated object
ends up being deleted somehow, then you break your ability to get back
to
the original object.

Hope that helps explain my question more.

zenlinuxnh · August 17, 2015, 5:56pm

On Mon, Aug 17, 2015 at 1:18 AM, Scott G. [email protected] wrote:

I’ve been wrestling with database normalization and the “Rails Way” and I
wanted to get peoples opinions as to if there’s a best practices guide to
how to do it.

IMHO best practice is to keep it properly normalized until you’ve
identified some need to denormalize. Usually this is due to
performance… and identified through benchmarking with a profiler, as
any human’s guess as to why an app is slow is most likely wrong. If
you do have a performance problem, and you think denormalization would
help, consider other approaches first, or possibly whether your app
doesn’t need it to be relational in the first place and might benefit
from a non-relational approach. (That’s not quite the same thing as
“NoSQL”.)

class StudentExam < ActiveRecord::Base

This name tells us nothing. I’d recommend something like
“ExamTaking”. (Might also have this include the date or some such
differentiator, in case the student takes the same exam multiple
times.)

@questions = @exam.questions
@responses = Response.where(question_id: @questions.pluck(:id)
@evaluations = Evaluation.where(response_id: @responses.pluck(:id)

This seems kind of painful

Right. I think you need to look at the “includes” method.

–
Dave A., consulting software developer of Codosaur.us,
PullRequestRoulette.com, Blog.Codosaur.us, and Dare2XL.com.

zenlinuxnh · August 19, 2015, 5:59pm

Was my offering helpful?

zenlinuxnh · August 18, 2015, 1:32am

Oh!

You could just have a ‘Participant’ table, containing students,
professors, teachers, evaluators as distinguished by a role, is_active,
etc…
You could just have a ‘Evaluation_Detail’ table, containing at a row
level student_id, class_id, teacher_id, professor_id, evaluator_id for
each
question, its response, then its evaluation as a comment and/or look-up
value from a grading table, etc…

Queries would obviously be more ‘where’ intensive, but your associations
wouldn’t be as deep.

Liz