Object/Relational Mapping is the Vietnam of Computer Science

On 3/21/07, Chad P. wrote:

On Thu, Mar 22, 2007 at 02:27:27AM +0900, David M. wrote:

Austin Z. wrote:

Data is king. Applications are pawns.
You know, that is a profound statement.
It’s not a new idea to me, but it’s an excellent formulation of the
idea.

Austin – is that original? Should we call it Ziegler’s Law?

As you said, the idea isn’t original, but as far as I know, the
formulation is unique to me and this particular conversation. I’m
somewhat flattered to have the suggestion, but something like “the
Rule of Data” is probably better :wink:

-austin

On Thu, Mar 22, 2007 at 04:43:23AM +0900, Austin Z. wrote:

As you said, the idea isn’t original, but as far as I know, the
formulation is unique to me and this particular conversation. I’m
somewhat flattered to have the suggestion, but something like “the
Rule of Data” is probably better :wink:

I’m inclined, then, to go with Ziegler’s Rule of Data, or something
along those lines. It’s a good’un.

On Thu, Mar 22, 2007 at 04:46:51AM +0900, Austin Z. wrote:

That’s academic nitpicking.
If you want a “real programming language” version of SQL, just use
PL/SQL with Oracle. Ew.

Which is a better language than most people think. What’s interesting
is that it isn’t a version of SQL, but a version of Ada (or Modula 2?)
with SQL cursors as a native data type and built-in recognition of
existing database data types and SQL statements. It’s closer to Pro*C
(C/C++ with embedded SQL) than a programming language version of SQL.

It’s still saddled with the limitations of SQL.

That’s really the major problem I have with it – the limitations of
SQL, thanks to including SQL.

Another way of looking at it is that it’s just SQL with Ada-inspired
sugar. While I haven’t until now run across the description of it from
the other direction (that it’s Ada with embedded SQL), I still think
that calling it SQL with Ada-inspired sugar better encompasses my
distaste for it.

On Mar 21, 1:43 pm, “Austin Z.” wrote:

As you said, the idea isn’t original, but as far as I know, the
formulation is unique to me and this particular conversation. I’m
somewhat flattered to have the suggestion, but something like “the
Rule of Data” is probably better :wink:

Given that “Data is King”, I particularly like the (non-sexual) double-
entendre of “Rule of Data”. :slight_smile:

On Thu, 22 Mar 2007, Phrogz wrote:

Given that “Data is King”, I particularly like the (non-sexual) double-
entendre of “Rule of Data”. :slight_smile:

one of the great things about this

-a

On 3/20/07, Austin Z. wrote:

Data is king. Applications are pawns.

Data is a dead fish. Applications are knowing how to fish. There are minor
edge cases where data is more important (“feed me now or I die of
starvation”), but in the grand scheme of things data is insignificant
compared to the applications that produce and transform it.

We use relational databases as object stores because they’re cheap and
easily available, not because they’re good for the task.

  • James M.

On 3/21/07, James M. wrote:

On 3/20/07, Austin Z. wrote:

Data is king. Applications are pawns.
Data is a dead fish. Applications are knowing how to fish. There are
minor edge cases where data is more important (“feed me now or I die
of starvation”), but in the grand scheme of things data is
insignificant compared to the applications that produce and transform
it.

Thank ghu I don’t have to do business with you, because I wouldn’t trust
your programs to work with my most important assets. I assure you that
my data is far more important than the applications which do something
with the data. The applications increase value, but they NEVER provide
value. It’s the data.

  • What’s the most valuable thing that Amazon has? It isn’t the programs;
    those are constantly updated and occasionally replaced. It’s the
    customer DATA that they’ve amassed.
  • What’s the biggest worry intelligent people have about Google? It
    isn’t the programs, it’s the amount of DATA that Google contains about
    people.
  • What have a number of commercial firms found themselves in trouble for
    in the last eighteen months? Losing personal DATA about their
    customers.
  • What do hackers and phishers want from you? Your personal DATA. They
    don’t really give a damn about your programs.

Scientists are worried about losing data from older sources, not the
programs. Data, once available, can be squished and manipulated and
dealt with in many dozens of different ways – and often MUST be.

I can work with my pictures in iPhoto or LightRoom with no problems. The
pictures are more important than which program I use to edit them. I can
play my MP3s with any of a dozen different programs; the songs are more
important than which program I use.

I reiterate: Data is king. Applications are pawns. You can squawk all
kinds of ways to next Tuesday that this isn’t true or that it’s “minor
edge cases”, but in reality it’s just the opposite – and ALWAYS WILL
BE. The application is more important than the data only in the rarest of
cases. This is where a lot of OO-heads screw up. They think that the
application is far more important than the data. This is never true. The
application is, for the most part, a footnote to the data. Businesses
don’t care that much when they lose an application. They care
significantly when they lose data.

We use relational databases as object stores because they’re cheap and
easily available, not because they’re good for the task.

No, that’s why we use SQL databases. The reason that we don’t use object
databases is that they’re not cheap, they’re not easily available, and
they’re disastrous for your DATA because they completely lock you into a
single view of that data. Which matters a LOT more than any pissant
little program ever will.

Please. Try a little harder next time before you try analogies that
don’t hold up to even the barest of comparisons.

-austin

We use relational databases as object stores because they’re cheap and
easily available, not because they’re good for the task.

Really? What’s better?

On Thu, 22 Mar 2007, James M. wrote:

We use relational databases as object stores because they’re cheap and
easily available, not because they’re good for the task.

here at the national geophysical data center

http://ngdc.noaa.gov/

we say that data is useless, only the combination of applications and human
reasoning can turn it into information. so, with that in mind, i’d say
that data and applications are useless and that it’s only by combining the two
using logic (aka business rules) that anything meaningful arises.

case in point : we’ve 260tb of ‘data’ sitting in our mass storage device.
less than 0.01% ever comes back out. that small percentage is massaged into
meaningful information via complex application and human logic though and
it’s those kernels we’re interested in.

2 cts.

-a

On 3/21/07, [email protected] [email protected] wrote:

http://ngdc.noaa.gov/

we say that data is useless, only the combination of applications and human
reasoning can turn it into information. so, with that in mind, i’d say
that data and applications are useless and that it’s only by combining the two
using logic (aka business rules) that anything meaningful arises.

I have to disagree with you, Ara. If you start with a set of data and
your business rules, you can reformulate the applications to derive
value from the set of data. On the other hand, if you have a set of
applications that implement your business rules and no data … you
can’t derive value at all.

Without data, you absolutely cannot do anything. If customers have to,
they can buy or write new programs to work with their data. They can
almost never recover lost data.

Case in point: Alaska Revenue just had to spend $200,000 in overtime
to rescan paper data that had been lost from their online system and
the backup was unreadable.

case in point : we’ve 260tb of ‘data’ sitting in our mass storage device.
less than 0.01% ever comes back out. that small percentage is massaged into
meaningful information via complex application and human logic though and
it’s those kernels we’re interested in.

Right, but if you needed to, you could easily (fsvo easily) rewrite
the complex application; it would be far harder to try to recreate the
data – especially the historical data you have from satellite
imagery. It’s not as if you can rewind the clock seven days to get a
satellite image you lost a week ago.

You are right that programs help you derive value from the data, but
programs are far more easily replaced than data.

-austin

On 3/21/07, Michael Bevilacqua-Linn wrote:

On 3/21/07, James M. wrote:

We use relational databases as object stores because they’re cheap
and easily available, not because they’re good for the task.
Really? What’s better?

I’ve debunked Mr Moore’s base premise in a separate post, but as far as
simply storing data – not ensuring transactional integrity or any
number of other things that database management systems provide you –
absolutely nothing beats flat files on the filesystem where the
filesystem provides your indexing and can perform amazingly quickly as
long as you’re working with fixed data.

The problem, of course, is that filesystems are hierarchical in nature
and if your data – or at least your indexing scheme(s) – can’t be
represented hierarchically, you’re toast.
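
To make the trade-off concrete, here is a minimal Ruby sketch of a flat-file store in which the directory tree is the only index. The class name `FlatFileStore` and its methods are hypothetical, invented for illustration; it makes no attempt at transactional integrity or concurrent access.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical flat-file store: the directory hierarchy acts as the index.
class FlatFileStore
  def initialize(root)
    @root = root
  end

  # Write a record under a hierarchical key such as ["customers", "42"].
  def put(key_parts, data)
    path = path_for(key_parts)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, data)
  end

  # Read a record back; returns nil when the key does not exist.
  def get(key_parts)
    path = path_for(key_parts)
    File.exist?(path) ? File.read(path) : nil
  end

  # List the entries directly under a prefix. This is the one "query"
  # the hierarchy gives you for free.
  def children(key_parts)
    dir = File.join(@root, *key_parts)
    Dir.exist?(dir) ? Dir.children(dir).sort : []
  end

  private

  def path_for(key_parts)
    File.join(@root, *key_parts)
  end
end

# Usage: lookups by path are direct; anything else means scanning files.
Dir.mktmpdir do |root|
  store = FlatFileStore.new(root)
  store.put(%w[customers 42], "Ada")
  puts store.get(%w[customers 42])
  puts store.children(%w[customers]).inspect
end
```

Finding a customer by name rather than by id would mean reading every file under `customers`, which is the "you're toast" case: the lookup does not follow the hierarchy.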

-austin

Chad P. wrote:

On Thu, Mar 22, 2007 at 01:29:18AM +0900, Olivier R. wrote:

And I thought SQL could be classified as a functional programming language.
But yes, “describing language” seems to be an appropriate definition.
Technically, it’s a “query language”. Why come up with more names for
it?

It’s not a name, but a description. “Describing language” is
not the standard term; Olivier means SQL is a “declarative”
language. But even that’s only true of the standard; the actual
implementations have procedural features as well.

On Thu, 22 Mar 2007, Austin Z. wrote:

rescan paper data that had been lost from their online system and the backup
was unreadable.

don’t get me wrong - i understand your point. still, it’s not quite so clear
cut imho though. for instance, we store both raw and derived satellite
products in our mass store. people tend to consider the raw in just those
terms you are describing - the foundation of it all. however, as the
developer that manages the system which manages that data i can say that there
are literally dozens of small but critical pieces of software which touch
the data before it hits disk. and this doesn’t even take into account the
fact that the data has been stored and replayed from a crappy magnetic tape
which then relayed the stuff to a downlink and then bounces a few hops around
the world to get to us. in reality the ‘raw’ data is only as good as the
weakest link in all those applications and hardware bits.

that might seem far fetched, but my experience is that this sort of thinking -
that data is something hard and real - is pervasive in science and nearly
always wrong. not long ago i got a bunch of data dumped on my lap: hundreds of
cds of ionosonde data from stations all around the world. in theory the data
should all carry a unique signature and all the code this group used made this
assumption. of course they were wrong: i wrote a script to scour the data
looking for ‘impossible’ contradictions. the results? thousands of dups and
logical contradictions that they didn’t even know about.
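
A check of this kind can be sketched in a few lines of Ruby. This is not the script described above, only an illustration of the idea: the `Sounding` record and its `station`, `time`, and `value` fields are assumptions, as is the choice of station plus time as the "unique signature".

```ruby
# Hypothetical record layout: each sounding has a station, a time and a value.
Sounding = Struct.new(:station, :time, :value)

# Group records by the signature that is supposed to be unique, then report
# exact repeats (duplicates) and same-signature records whose values
# disagree (contradictions).
def audit(records)
  report = { duplicates: [], contradictions: [] }
  records.group_by { |r| [r.station, r.time] }.each do |signature, group|
    next if group.size == 1

    if group.map(&:value).uniq.size == 1
      report[:duplicates] << signature
    else
      report[:contradictions] << signature
    end
  end
  report
end
```

Any code that assumes the signature is unique would silently pick one record from each flagged group, which is how bad data ends up looking solid.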

i could tell 20 more stories like this. people think the word ‘data’ is holy
and that it’s somehow different from the data collection software and hardware
which collected it. maybe this is obvious to most people, but it’s worth
stating for posterity that the ‘data’ is often ‘crap’ because not enough
attention was paid to the software and hardware which collected and verified
it and, in that sense, this whole commentary is a bit circular.

nevertheless i do agree that ‘data’ is more important when a person is using
the normal divisions we use when thinking about it. it’s just that those
divisions can be artificial sometimes without people being aware of it.

Right, but if you needed to, you could easily (fsvo easily) rewrite the
complex application; it would be far harder to try to recreate the data –
especially the historical data you have from satellite imagery. It’s not as
if you can rewind the clock seven days to get a satellite image you lost a
week ago.

this is true of course.

You are right that programs help you derive value from the data, but
programs are far more easily replaced than data.

it depends on the collection method - but that is nearly always true too.

cheers.

-a

Chad P. wrote:

That’s really the major problem I have with it – the limitations of
SQL, thanks to including SQL.

The “limitations” of SQL stem from it being a true child
of the 1970’s, but also from the relational model it adopts,
and from the requirements of transactional processing. This
last one is the most intractable and the least-commonly
understood. It’s easy to pile criticisms on SQL, but it’s
funny how the people who do it almost never seem to have
a deep understanding of the amazingly complex field of
transactional processing.

Object databases are even more fraught with problems arising
from the model they espouse than relational ones. I could go
on for a week about this, but suffice it to say that generally,
by trying to force persistence into an object model, they’ve
lost the plot regarding transactional behaviour, despite some
well-meaning attempts and even partial successes.

The right answer is more subtle and simpler than either, and
it’s fact-oriented databases. I’ll be having more to say about
that as my ActiveFacts project progresses.

Clifford H…

On Mar 20, 2007, at 17:53 , Austin Z. wrote:

relational databases are evil."
And what about those of us who don’t speak out of ignorance and
STILL don’t like relational DBs??? Or would you just assume we’re
ignorant too???

And I thought I wouldn’t touch this topic with a 10 foot pole… I
generally won’t touch a thread that is one of your hot topics because
it just isn’t worth it (see your comment about Pascal above). You
entered this thread as abusively as you could, pretty much on par
with all your other hot topic threads. I think you do a lot of good
work, but this regrettably makes pretty much most of it unapproachable.

On 3/21/07, Chad P. wrote:

or be aware you could be wrong. It doesn’t matter whether it’s Google,
Wikipedia, Britannica, the OED for etymology, or Ask Chad – general
resources are not authoritative primary sources on specifics (generally
speaking, har har).

Sorry. I just tend to get my back up a little when someone singles out
a specific general resource, as though the problem isn’t endemic to
general resources in general, almost tautologically.

I’m in total agreement. But it does seem like the Wikipedia is a
favorite whipping-boy these days.

A week or so ago a columnist in the local paper* wrote a piece about
his experience with the accuracy of wikipedia. It seems that he
anonymously inserted** a bunch of imaginative junk in the article
about himself in wikipedia***.

He went on and on about how long such stuff can live on in WP, but the
upshot of the column was that someone discovered his fanciful ‘spam’
and deleted it.

All in all the self-policing of the wikipedia seems to work a lot
better than its press would have one believe, and there’s some
evidence that the wikipedia is just as, if not more, reliable on
average than more ‘respectable’ and often more dated sources, like the
Britannica which has been on an intermittent vendetta against it.

Not to belittle the point that it’s certainly not a primary source and
shouldn’t be considered as such, any more than the other references
cited.

* I can’t recall his name, and I don’t know if he’s syndicated or how
widely.

** He wasn’t clear if the article in question even existed before he got
there.

*** Which of course violated the wikipedia policy of not posting
directly about yourself.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Gary W. wrote:

I’m curious as to why query language development got hung up on SQL.
I’ve read a little bit about Tutorial D. Is SQL simply
another example of premature standardization?

There’s a different kind of standardization?

What would a Ruby interface to the underlying database engine (indexed
tables) look like? Could it get closer to Tutorial D by bypassing the
standard technique of ‘marshaling’ requests into SQL statements? Is
the impedance mismatch between Ruby (or any other OO language) and
Codd’s relational algebra too great to cross smoothly?

Not with a fact-based model. ConQuer, though it’s not
yet been realized commercially, is the absolute bee’s
knees - a raw beginner can compose queries that would
make a seasoned DBA quake. See www.orm.net for more…
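
As a rough illustration of the question being asked (what a Ruby interface might look like if it expressed relational operations directly instead of marshaling SQL strings), here is a toy in-memory sketch. The `Relation` class and its methods are hypothetical; this is neither Tutorial D nor ConQuer, and it talks to no real database engine.

```ruby
# Hypothetical in-memory relation: a set of tuples, each a Hash mapping
# attribute names to values.
class Relation
  attr_reader :tuples

  def initialize(tuples)
    @tuples = tuples.uniq # relations are sets, so drop duplicate tuples
  end

  # Restriction: keep the tuples for which the block is true.
  def where(&block)
    Relation.new(tuples.select(&block))
  end

  # Projection: keep only the named attributes.
  def project(*attrs)
    Relation.new(tuples.map { |t| t.slice(*attrs) })
  end

  # Natural join: combine tuples that agree on every shared attribute.
  def join(other)
    joined = tuples.flat_map do |a|
      other.tuples
           .select { |b| (a.keys & b.keys).all? { |k| a[k] == b[k] } }
           .map { |b| a.merge(b) }
    end
    Relation.new(joined)
  end
end

people = Relation.new([{ id: 1, name: "Ann", dept: 10 },
                       { id: 2, name: "Bob", dept: 20 }])
depts  = Relation.new([{ dept: 10, title: "Ops" },
                       { dept: 20, title: "Dev" }])

# Each call returns a new Relation, so operations compose like algebra.
devs = people.join(depts).where { |t| t[:title] == "Dev" }.project(:name)
p devs.tuples
```

Because every operation returns another `Relation`, queries compose without string building. Whether that composes as smoothly against a real storage engine, with indexes and transactions, is the open part of the question.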

On Thu, Mar 22, 2007 at 07:05:07AM +0900, Rick DeNatale wrote:

All in all the self-policing of the wikipedia seems to work a lot
better than its press would have one believe, and there’s some
evidence that the wikipedia is just as, if not more, reliable on
average than more ‘respectable’ and often more dated sources, like the
Britannica which has been on an intermittent vendetta against it.

Not to belittle the point that it’s certainly not a primary source and
shouldn’t be considered as such, any more than the other references
cited.

I agree with that assessment 100% – and not just because I was the
Wikimedia Foundation’s first-ever paid employee.

I guess maybe I should have mentioned that disclaimer earlier.

On Thu, Mar 22, 2007 at 07:10:05AM +0900, Clifford H. wrote:

Gary W. wrote:

I’m curious as to why query language development got hung up on SQL.
I’ve read a little bit about Tutorial D. Is SQL simply
another example of premature standardization?

There’s a different kind of standardization?

There are at least four types of standardization:

  1. premature standardization
  2. post-obsolescence standardization
  3. theoretically optimal standardization, which may or may not be real
  4. Microsoft standardization, which is anti-standardization with a bow

On Thu, Mar 22, 2007 at 05:40:05AM +0900, James M. wrote:

We use relational databases as object stores because they’re cheap and
easily available, not because they’re good for the task.

Except in cases of catch-and-release sport fishing, fishing is about the
fish. If data’s the fish and applications are fishing, data is still
king.

Now . . . having someone hand you munged data is not as valuable as
being able to do it yourself, so applications are important.
Ultimately, however, it’s the data that matters, and applications should
be designed accordingly. What good is fishing in a lake with no fish?