If you want to avoid big and complicated SQL statements, why not implement views instead of denormalizing? A good RDBMS should offer this mechanism, especially for performance and usage optimizations.
You could have a view called users and still say SELECT * FROM users, but what exactly lies behind users can be changed at any time without changing the interface to the application (the view users, in this example).
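A minimal sketch of that idea, using SQLite and hypothetical normalized tables (user_account and user_profile are invented names, not from the post):

```python
import sqlite3

# Hypothetical schema: two normalized tables hidden behind a "users" view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_account (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_profile (user_id INTEGER, bio TEXT);
    -- The view presents a stable "users" interface to the application;
    -- the tables behind it can be reorganized without touching callers.
    CREATE VIEW users AS
        SELECT a.user_id, a.name, p.bio
        FROM user_account a
        JOIN user_profile p ON p.user_id = a.user_id;
""")
conn.execute("INSERT INTO user_account VALUES (1, 'alice')")
conn.execute("INSERT INTO user_profile VALUES (1, 'hello')")

# The application only ever says SELECT * FROM users.
rows = conn.execute("SELECT * FROM users").fetchall()
print(rows)  # [(1, 'alice', 'hello')]
```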
One thing you're not taking into account is a system like memcached that avoids hitting the database at all. When using such a system, it might be better to actually store normalized data, so that your hits on the database for cache misses are small and involve as little table locking as possible.
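A minimal cache-aside sketch of that pattern, with a plain dict standing in for memcached (table and function names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'alice')")

cache = {}  # stands in for memcached

def get_user(user_id):
    # Cache hit: no database work at all.
    if user_id in cache:
        return cache[user_id]
    # Cache miss: a small, narrow query against the normalized table.
    row = conn.execute(
        "SELECT user_id, name FROM user WHERE user_id = ?", (user_id,)
    ).fetchone()
    cache[user_id] = row
    return row

print(get_user(1))  # hits the database
print(get_user(1))  # served from the cache
```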
Before normalizing or denormalizing, think about what you need to do with the database and where you are going to do it. One example: we have a quite big database of test results, and most of the work is gathering data for reports. Normalizing versus denormalizing is an issue, and you only know which is faster if you know how you access the data.
I'm not sure I agree with the total denormalisation in the example, Jeff. I'm (relatively) recently out of university, so I've still got the normalisation-is-everything rules in my mind. Being pragmatic, though, and as a couple of other people have noted, denormalisation is an optimisation of a fully-normalised design. I'd argue that denormalisation should be done in stages, in just the same way that normalisation is.
For example, I don't think I've ever taken a production database past 3NF. Certainly the only time I've taken a design to 5NF was for my university database assignment. And aren't stored procedures supposed to improve performance for this sort of query? Yes, you're always going to take a performance hit for a query utilising six joins (there are some crazily-sized Cartesian products going on in there), but wouldn't a multi-layered approach help? I.e., for common or computation-intensive operations (such as viewing a user's profile in the example above), use multiple stored procedures to extract sections of data and then another stored procedure to put them together. It could also be accomplished by using the user_id primary key to extract the relevant record from each table and then combining the records at the application level.
Having said all that, I haven't done any database programming in a good couple of years now, so I could be blowing a lot of hot air.
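The application-level combination described above can be sketched like this (a rough sketch assuming table names along the lines of the post's example; get_profile is a hypothetical helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_phone_number (user_id INTEGER, phone_number TEXT);
    CREATE TABLE user_screen_name (user_id INTEGER, screen_name TEXT);
""")
conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.execute("INSERT INTO user_phone_number VALUES (1, '555-1234')")
conn.execute("INSERT INTO user_screen_name VALUES (1, 'alice_im')")

def get_profile(user_id):
    # One narrow primary-key lookup per table, no multi-way join...
    user = conn.execute(
        "SELECT name FROM user WHERE user_id = ?", (user_id,)).fetchone()
    phones = [r[0] for r in conn.execute(
        "SELECT phone_number FROM user_phone_number WHERE user_id = ?",
        (user_id,))]
    screens = [r[0] for r in conn.execute(
        "SELECT screen_name FROM user_screen_name WHERE user_id = ?",
        (user_id,))]
    # ...then combine the pieces in application code.
    return {"name": user[0], "phone_numbers": phones, "screen_names": screens}

print(get_profile(1))
```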
Databases are one of those things that you just can't take a cavalier approach to. If you screw that design up, and worse still, launch it, you're in for a world of hurt when the problems with the data come.
Normalisation means the only problem you're going to get is scalability: your data is going to be good (presuming you were also clever/paranoid enough to get all the integrity checks in there too).
As the quote says, you can always denormalise later for speed.
Addendum: And as others have said, there are much better solutions than denormalising your database, such as materialized views for reads, or memcached.
Quite a horrible job naming the tables. Why the underscores in field names and not in table names? Why user_id and not user_sex? Why table names in singular? Of course this is beside the point, but lol…
nub on July 15, 2008 02:33 AM
because when you're selecting multiple id fields you want to be able to distinguish between them, for example user_id and affiliation_id. Instead of
select user.id user_id, affiliation.id affiliation_id from …
you can just have
select user_id, affiliation_id from …
That way you don't always have to prefix the fields with the table name, and you also don't have to alias the fields every time.
and table names are singular because each row in the table designates one specific person; one row = one user
That 6 join query is wrong, and the rest of your article follows from that incorrect join.
Why would you ever want a query that returned the user data multiple times, once for every combination of screen_name and phone_number?!? No real site would ever want to do that.
If you need phone numbers, you get phone numbers; if you need screen names, you get those. But why would you need every possible combination of phone_number and screen_name?
You're simply wrong. Speeding up joins is never a reason to denormalize, for the simple reason that it doesn't! Test it; you'll see I'm right. But do me a favor and write real joins, not ones that give you tremendous numbers of duplicated rows.
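The row duplication this comment is objecting to is easy to demonstrate: joining two independent one-to-many tables in one query multiplies their rows. A sketch with hypothetical data (table names modeled on the post's example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_phone_number (user_id INTEGER, phone_number TEXT);
    CREATE TABLE user_screen_name (user_id INTEGER, screen_name TEXT);
""")
conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.executemany("INSERT INTO user_phone_number VALUES (1, ?)",
                 [("555-1234",), ("555-5678",)])
conn.executemany("INSERT INTO user_screen_name VALUES (1, ?)",
                 [("alice_im",), ("alice_aim",), ("alice_icq",)])

# Joining two independent one-to-many tables yields the cross product
# of their matching rows: 2 phone numbers x 3 screen names = 6 rows,
# with the user's data repeated in every one of them.
rows = conn.execute("""
    SELECT u.name, p.phone_number, s.screen_name
    FROM user u
    JOIN user_phone_number p ON p.user_id = u.user_id
    JOIN user_screen_name s ON s.user_id = u.user_id
""").fetchall()
print(len(rows))  # 6
```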
And if you don't even understand why your query is wrong, you have no business designing databases.
A valid reason to denormalize is to precalculate data, which you touch on very briefly. But you always write that twice: once normalized, and that's the primary data, and then again as the cached/precalculated version. You should always be able to regenerate it from the normalized data.
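A small sketch of that write-twice pattern, under assumed names (post_count as the cached column, rebuild_post_counts as a hypothetical regeneration routine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT,
                       post_count INTEGER DEFAULT 0);  -- cached value
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, user_id INTEGER);
""")
conn.execute("INSERT INTO user (user_id, name) VALUES (1, 'alice')")
conn.executemany("INSERT INTO post (user_id) VALUES (?)", [(1,), (1,), (1,)])

def rebuild_post_counts():
    # The cached column is always regenerable from the normalized data,
    # so the posts remain the primary source of truth.
    conn.execute("""
        UPDATE user SET post_count =
            (SELECT COUNT(*) FROM post WHERE post.user_id = user.user_id)
    """)

rebuild_post_counts()
count = conn.execute(
    "SELECT post_count FROM user WHERE user_id = 1").fetchone()[0]
print(count)  # 3
```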
OK, one final reason to denormalize which you didn't even mention: if you need a where clause on one table but the order by on a different table, you need to denormalize, because you cannot create a combined index across both tables. (Databases with function-based indexes might manage it, but that's pretty complicated.)
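A sketch of that case, with invented tables: the filter column lives in author, the sort column in book, so a copied author_country column lets one multi-column index cover both the WHERE and the ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized, the filter column (country) and the sort column
    -- (published) would live in different tables.
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, author_id INTEGER,
                       published INTEGER,
                       author_country TEXT);  -- denormalized copy
    -- With the copy in place, one index serves WHERE + ORDER BY.
    CREATE INDEX idx_book_country_published
        ON book (author_country, published);
""")
conn.execute("INSERT INTO author VALUES (1, 'NL')")
conn.executemany("INSERT INTO book VALUES (?, 1, ?, 'NL')",
                 [(1, 1999), (2, 1987), (3, 2005)])

rows = conn.execute("""
    SELECT book_id FROM book
    WHERE author_country = 'NL'
    ORDER BY published
""").fetchall()
print([r[0] for r in rows])  # [2, 1, 3]
```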
I'm sorry to bash you so much, but you shouldn't write about what you have no experience with.
The approach we've taken at work is to have data normalised (well, all the new tables; we have some truly hideous legacy tables that we haven't tidied up yet!) and then use either a memory cache or search tables to speed up access.
In fact our search table is one honking great big denormalised table, so there can't be any joins at all. It is constantly regenerated by an offline task.
I'm always wary about denormalising the 'canonical' tables, as this leads you into a maintenance nightmare, especially when it comes to adding or removing columns.
and table names are singular because each row in the table designates one specific person; one row = one user
james: are you serious? Tables don't have one row in them; they have many. So you make them plural (one row singular, multiple rows plural). Rule: table names are plural, column names are singular.
Also, you do not want to prefix every single column with the name of the table; what does that save you? Just type tablename.columnname instead of tablename_columnname; no need for aliasing. Sometimes you violate that rule where it might be confusing, but for the most part don't double-prefix stuff.
Your technique of describing a denormalized database model is one of the basic principles of dimensional modelling, primarily used for data warehousing.
It's pretty much the opposite of normalizing your model (it contains redundant data, uses lots of space, etc.). Especially if end users need to query this data, e.g. for reporting purposes, then dimensional modelling is absolutely the way to go.
However, if you're designing a normalized database, I would suggest keeping it normalized and not denormalizing at all, unless you're designing a dimensional database structure from the beginning. Keep the two separate and don't mix the two modelling techniques, even though it might be more efficient in some cases. Making things unnecessarily complex is not something you want to do very often.
The DB nuts come out in force. Views as a performance improvement, LOL.
Even on a very small DB those 6 joins will cost you. Oracle uses huge amounts of CPU for joins, whether or not you have lots of data. With any kind of load at all it will cost you.
Just forget about using a relational DB and use Berkeley.
I'm starting to study object-relational databases, and I wonder if they wouldn't be a nicer solution for you, since they wipe away the necessity of writing long, complicated joins.
My studies haven't yet gone too far on that topic, but it seems ORDBs are easier to deal with than relational ones.
Which software did you use to create those diagrams?