I’ve to design a system in which different datasets are added
periodically
to the system and user should be able to perform several predefined set
of
operations on the data. Datasets are large enough so it makes sense to
keep
them in separate tables/databases for performance reasons.
The problem I’m facing is that how does it fit into rails ActiveRecord
architecture. Do I generate new Models for newly generated tables every
time a new dataset is added? Suppose even if i’m able to somehow do
that, I
still need to provide a generic interface for all datasets. I’m not sure
what the architecture should look like while following the rails
conventions? Also it is preferable to add new datasets without having to
worry about changing the code.
I’ve to design a system in which different datasets are added periodically to
the system and user should be able to perform several predefined set of operations
on the data. Datasets are large enough so it makes sense to keep them in separate
tables/databases for performance reasons.
The problem I’m facing is that how does it fit into rails ActiveRecord
architecture. Do I generate new Models for newly generated tables every time a new
dataset is added? Suppose even if i’m able to somehow do that, I still need to
provide a generic interface for all datasets. I’m not sure what the architecture
should look like while following the rails conventions? Also it is preferable to
add new datasets without having to worry about changing the code.
Any kind of help will be appreciated
I would take a serious look at views or materialized views, if these are
reporting data rather than something your users will be adding rows to.
That would let you abstract away the implementation details of the
database layer from your application. You have a single stable model,
and its “table” is a SQL view of one or more actual tables in the DB. We
use this all the time for read-only data.
It’s not a performance issue. the problem is that i don’t have one
database, but multiple databases added at run-time. Each database is
independent and I don’t know what they look like however, i need to
perform
similar operation on each of them like taking averages, min, max, count,
selections and other read operations. I want to be able to do this
without
making any changes to code when a new database is added. I’m thinking of
going with a schema-less JSON based approach.
Ok, i’ll explain this with a hypothetical scenario. Suppose the
requirement
is that we need to create a portal to provide stats of different
products
available in the market. And suppose you get data periodically from
different places in spreadsheets.Each spreadsheet contains data for one
kind of product, e.g cellphones, television etc and each kind of product
has different fields. There can be hundreds of thousands of records in
one
spreadsheet so it must structured for efficient retrieval, however data
is
read-only, there’ll be no write operations.On each data set we need to
be
able to perform generic, predefined read operations. e.g. selection of
subsets fulfilling certain criteria, sorting, grouping, aggregates. For
example, I should be able to write a generic functions, which gets table
name, field name and returns count of unique values in that field. this
can
be used to count televisions with different resolutions and also to
count
books by different authors. here resolution and author is unique field
of
respective datasets.
The problem I’m facing is that I’ve to create a different tables and
respective models every time a new product type is added. Even though
the
models are essentially performing the same operations.
It’s not a performance issue. the problem is that i don’t have one database, but
multiple databases added at run-time. Each database is independent and I don’t
know what they look like however, i need to perform similar operation on each of
them like taking averages, min, max, count, selections and other read operations.
I want to be able to do this without making any changes to code when a new
database is added. I’m thinking of going with a schema-less JSON based approach.
How does that work? How are the databases added while the app is
running? What does the adding? There’s nothing in Rails itself that is
going to help you do this (as far as I know). If the databases are being
added by another process while the Rails app is running (remember –
production Rails apps only scan the database for structure once, at
startup) then you’re going to need that other process to kick off some
notification to your Rails app so that it can be forced to reload and
get the new structure. Otherwise, what it sounds like you need is some
abstraction layer above the database layer so that you can talk to that
through a single API, rather than relying on ActiveRecord for this.
There’s nothing magical about JSON or NoSQL storage engines for this,
either. If it quacks like ActiveRecord, then it’s designed to export its
structure during app startup, and stick to that structure while it is
running.
I would just make an importer to put all the data in a normalized set of
fields in a single table. Then you can have one set of fields and one
model to wrap around them. Parsing Excel is pretty easy these days,
because it’s just XML.
Parsing isn’t the issue. The problem is that I don’t want to rewrite
code
every time a new dataset is added. I want to provide an abstract
interface
as well under which I can plug any data-set.
This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.