The Diablog - it’s Diabloglical™

From sorting beads to a left-join ... exploring parent-child dynamics in content management

Brett Barron

26 March 2010

My daughter loves to sort beads and stack cups. It's human nature to want to organize multiples of things, and my daughter will find that her organizational skills will evolve to be applied to increasingly more complex structures. One day she may find herself an information architect, doing this for a living, dealing daily with one of the most common, simple ways of organizing information: by categorization.

Web developers quite frequently present information organized by category for easy assimilation. Products, for example are almost always categorized when displayed in an on-line catalog, but in a broader sense, so are news releases when they are arranged by month or year, likewise blogs when they are tagged. The developer of a content-managed website must designate data structures to categorize items so that the information may be easily navigated by the site visitor and precisely displayed the way the designer defines. This article explores a common information presentation scenario, describes different approaches to accomplishing the task and explores factors that impact implementation strategies.

Scenario: Display a list of stores in a shopping center by category, suppressing categories that do not (yet) have any stores, allowing stores to be displayed in one or more categories.

Simple enough. On the back-end, a table of Categories and a table for Stores. Each Store must identify one or more related categories. On the front end, a page with H2 headers for each category (parent) followed by an unordered list of stores (children) in each category, suppressing categories without stores. Two non-trivial parts: how to associate the store with one or more categories, and how to suppress, at render, any category without stores.

In general, developers should consider many implications when evaluating a specific approach to solving a data management task, to name a few:
  • security
  • performance/scalability 
  • efficiency of implementation
  • efficiency of maintenance (the original developer may not be available when it breaks)
First, the many-many relationship between stores and categories can be managed in a number of ways. 
  • A common (and clumsy) way is to stipulate an upper limit of possible categories a store could be assigned and add that number of category fields to the store table. This approach fails if the guess of high end categories turns out wrong, and makes SQL statements both inefficient and awkward. 
  • The third-normal (strictly proper) way is to create a join table between the category table and the stores table with one row for each combination of category and store. Scales well but requires a solid understanding of SQL.
  • Pack into one store field, the categories that store belongs to. This approach makes for simpler SQL, simpler template substitutions, and scales sufficiently for the data sets used in most dynamic web situations. In situations where real-world performance is not negatively impacted, this approach results in code that is the simplest to implement and easiest to debug/maintain.

The code logic for displaying the listing of stores by category is either category centric or store centric.

You can first cycle through the list of categories that have stores, and for each display the list of stores.
  • Multiple SQL calls, one for the categories, and additional calls to display stores for each category
  • rows do not have to be evaluated to manage category headers
  • requires some mechanism of elimitating categories without stores
    • SQL join when building the category list (requires a good understanding of SQL)
    • a field in category table that identifies categories with stores (requires scripting to keep this field up-to-date)
Alternately, you can sort the stores by category and cycle through them directly, detecting and accommodating changes in category header or cycle though the list of categories and for each look for associated stores. Let's evaluate working with the stores table directly:
  • one SQL call sorted by category returns the list of stores (a good strategy if the SQL server is not on local host)
  • server-side scripting (PHP, ASP, etc.) evaluates each row in the result and tests for a change in category, setting the category header when the category changes
  • only works if the HTML defined by the front-end-coder fits with this structure
  • requires developer understands server-side scripting sufficient to code the conditional placement of the header

Most dynamic website projects - the interesting ones, anyway - have a number of such little puzzles to solve. Often there is no clear "best" approach as several options will provide an acceptable balance of time to develop (cost) and performance at page render. Developers each have their own development style. Developers with strong SQL skills tend to seek answer in elegant SQL statements while PHP wizards rely more on scripting logic. Additionally, in small, single-server implementations with small data sets and modest traffic, render performance may not be as critical as swift development cycles and code that is inexpensive to build and maintain. In the end, as long as the developer is aware of the trade-offs of a particular approach (and has considered options to try if the first strategy fails during testing and QA), it probably doesn't matter exactly how the puzzles are solved as long as the project is delivered on time, is under budget, and performs to expectation.

Dialogs is a flexible dynamic website development platform that accommodates virtually all developer styles and approaches including the ones discussed here. In Dialogs, the web developer easily configures Lists (data tables) to hold data the way he or she sees fit. Lists can be configured to be related to other Lists (one-to-many or many-to-many). List templates may be cascaded to allow the category-centric approach; List templates can also be configured with complex SQL definitions to accommodate the store-centric approach. Many common elements of contemporary dynamic websites are as easy to configure in Dialogs as sorting beads or stacking cups. If you're tired of environments that constrain the way you develop, request a Dialogs.com login to take a closer look at the Dialogs sandbox. We think you'll find we work the way you do.

Sometimes it's OK to break the rules.

Charlie Brown

12 January 2010

Website content in Dialogs is defined by information structures we call “Lists”. Each List corresponds to a MySQL table within the Dialogs database. Dialogs presents a simple interface to the developer for List creation and maintenance so developers don’t have to directly interface with MySQL unless they want to. For those more technical developers wanting to understand how to optimize content maintenance and website performance, some basic SQL skills and some insight into how Dialogs List Templates are used to map database fields into rendered pages goes a long way.

It is quite common for data in one Dialogs List to be related to another. Imagine a List of Artists and a companion List of the art they’ve created. Relational database theory defines a certain way to manage that interrelation according to something called third-normal-form (3NF). Among other things, 3NF states that no information should be duplicated in both tables. Is that really a rule that shouldn't be broken? In the days of expensive storage and crude data handling tools, it sure made sense. Adhering to 3NF, however, makes for unnecessarily complex SQL within Dialogs List Templates, something that for efficient development and ongoing maintenance we'd like to avoid. Here's an example to illustrate the problem. Lets say you have these two tables:

Artist
item_iditem_name
1Vermeer
2Rubens
3Michelangelo
Art
item_iditem_nameartist_item_id
1The Milkmaid1
2Hippopotamus Hunt2
3David3
4Sistine Chapel ceiling3
5Pieta3

The relationship between the tables is obvious. The 'artist_item_id' field 'links' to the item_id of the 'artist' table.

If you want to display the art that's easy:

SELECT * FROM art
... but if you want to include the name of the artist it's this mess:
SELECT art.*,artist.item_name AS artist_item_name 
FROM art LEFT JOIN artist ON art.artist_item_id=artist.item_id 
WHERE artist.item_id IS NOT NULL
Yuck!! And it's worse than that. If you want to specify a sort you can't just say
ORDER BY item_name
you have to remember to prefix the table name and the same goes for anything in the 'WHERE' clause as well.

It's time to go off the reservation. If we break from the 3NF rules and add the artist's name to the art table we would get this:

Art
item_iditem_nameartist_item_idartist_item_name
1The Milkmaid1Vermeer
2Hippopotamus Hunt2Rubens
3David3Michelangelo
4Sistine Chapel ceiling3Michelangelo
5Pieta3Michelangelo
Super! Now we're back to
SELECT * FROM art
... and we have the artist's name! The only problem is, if Michelangelo's name is misspelled, and someone changes it in the 'artist' table, it will still be wrong in the 'art' table. This is where Dialogs shines. All we have to do is add one line to the list automation script which gets run for every edit to a list item. This one line will update the art table and keep the artist's names in sync with the 'artist' table:
$this->query("UPDATE art LEFT JOIN artist ON art.artist_item_id=artist.item_id
  SET art.artist_item_name=artist.item_name");
Cons:
  1. Causes baldness in database purists.
  2. Wastes hard drive space.
Pros:
  1. Easy to implement: add one more field to the list, add one line to the automation script.
  2. Dead simple list templates because all the fields you need are there.
  3. CSV and XML exports have all the fields you need too.
  4. Hard drive space is beyond cheap.

Dialogs is a real-world development tool for the professional web developer. We developed Dialogs with the belief that successful deployments either follow best-practices or define them. So go ahead, break the rules. We won't tell your database theory professor. For more detailed description of how to do this in Dialogs see the Knowledge Base Article.

You really should wrap that rascal.

Brett Barron

17 November 2009

For years, the closed padlock on our web browsers has identified when our data transmissions over the Internet are "secure." Every time you enter a login and password on the internet you should look for the same security.

By now, we all know the “safe” way to transmit sensitive data over the internet is via such secure pages encrypted using the HTTPS protocol (Wikipedia definition of HTTPS protocol). Cautious on-line shoppers check to be sure a page is secure before entering ecommerce data into a form. It's a good thing, too. Encryption of this nature is done at the "transport" layer which means that data is "wrapped" within an envelope into which others on the same network cannot see. Just as you wouldn't want to receive your credit card statement each month on a big post card, you've learned you should not post sensitive information without HTTPS because it is the equivilant of mailing a message without an envelope.

Most internet users have login credentials to many, many websites today. Many of these websites have little or nothing to do with ecommerce, and may include webmail, blog tools, forums, wikis, group sites, and content management systems. Every time you enter your login and password, you should first check to see that you are doing so over an HTTPS (secure) transaction. Why?

Consider the many places where we work today. Perhaps you check your mail or update your blog using a laptop at Starbucks. Maybe you login to a web application using your iPhone or other smart phone. And even our home broadband networks place us on a common network. If you are authenticating with remote services on such open networks without encryption, you're essentially putting your login and password on a post card and passing it around the room. There may not be financial data at risk, but do you really want someone elso knowing how to gain access to your email, blog, or website?

There's really no need for the risk. All it takes is a dedicated IP address and a cert. A certificate from a Certificate Authority (such as GeoTrust or Verisign) is appropriate for websites that require authentication for public visitors as such certificates provide reassurance that the server is who it presents itself to be. However, if the only users logging into a website are internal users, low-cost certificates or self-signed certificates are quite sufficient.

Developers should take the initiative to spread the word that credentialed access to any website should be secure. Additionally developers should ensure that all communication with a site authenticate securely, including ftp and telnet over ssh. Play it safe, wrap that rascal in HTTPS or the equivalent.

We're firm believers in security. Dialogs ships with automatic login redirects to an HTTPS connection for users trying to authenticate from a non-encripted page, and every Dialogs website we've provisioned in our own data center for years has had a dedicated IP address and a certificate to permit HTTPS communications.

1 2

“Dialogs has been a breeze to work with and I’m dreading ever having to build a site without it.”

— Jason B., agency partner