Suppose we have a table of widgets, with various descriptive attributes:
widget | ||||
---|---|---|---|---|
id | size | shape | color | alignment |
1 | Big | Round | Purple | Chaotic |
2 | Tiny | Star | Purple | Lawful |
3 | Tiny | Linear | Red | Neutral |
And another table whose rows correspond to sets of widgets:
widget_set | |
---|---|
id | description |
1 | Widgets of sizes other than Tiny |
2 | Chaotic-aligned or Star-shaped widgets |
3 | Big and Red widgets |
Now we'd like to set things up so we can look up the widgets that belong to a given set (or look up the sets that contain a given widget). The most natural way would be to define a view:
create view widget_set_member as select 1 as widget_set_id , w.id as widget_id from widget w where w.size!='Tiny' union select 2 as widget_set_id , w.id as widget_id from widget w where w.alignment='Chaotic' or w.shape='Star' union select 3 as widget_set_id , w.id as widget_id from widget w where w.size='Big' and w.color='Red'
One limitation of this method is that new sets of widgets can't be defined
with only data-manipulation
(insert
/update
/delete
) statements; it
requires redefining the view. We could avoid this problem by using a table
instead of a view:
widget_set_member | |
---|---|
widget_set_id | widget_id |
1 | 1 |
2 | 1 |
2 | 2 |
This method has its own limitation, though: changes to the
widget
table won't be automatically reflected in
widget_set_member
. Luckily, there is a way to have the best of
both worlds. We'll start with an attribute_type
table whose
rows represent the columns of descriptive attributes of widgets:
attribute_type | |
---|---|
id | name |
1 | size |
2 | shape |
3 | color |
4 | alignment |
Then we'll use that table to add an attribute_member
view, which
turns the structure of the widget
table into
data:
create view attribute_member as select w.id as widget_id , at.id as attribute_type_id , at.name as attribute_type_name , case at.name when 'size' then w.size when 'shape' then w.shape when 'color' then w.color when 'alignment' then w.alignment end as attribute_value from widget w , attribute_type at
The criteria that define our sets will be boolean predicates, based on the
descriptive attributes of widgets. The primitive conditions will test that a
particular attribute has a particular value—say, that the
size
is Big
. Then several of those primitive
predicates can be combined into one compound predicate with and
,
or
, and not
. We'll represent those compound
predicates in
disjunctive
normal form. attribute_member
gives us the primitive
predicates, so the next layer up will be a widget_subset
table
corresponding to conjunctions, a widget_subset_pred
table
showing which (possibly negated) primitive criteria make up a conjunction,
and a widget_subset_member
view showing which widgets satisfy
the criteria for which subsets.
widget_subset | |
---|---|
id | |
1 | |
2 | |
3 | |
4 |
widget_subset_pred | |||
---|---|---|---|
widget_subset_id | attribute_type_id | attribute_value | negated |
1 | 1 (size) | Tiny | 1 (negated) |
2 | 4 (alignment) | Chaotic | 0 (not negated) |
3 | 2 (shape) | Star | 0 (not negated) |
4 | 1 (size) | Big | 0 (not negated) |
4 | 3 (color) | Red | 0 (not negated) |
create view widget_subset_member as select wss.id as widget_subset_id , w.id as widget_id from widget_subset wss , widget w where (select count(*) from widget_subset_pred wssp , attribute_member am where wss.id=wssp.widget_subset_id and wssp.negated=1 and wssp.attribute_type_id=am.attribute_type_id and wssp.attribute_value=am.attribute_value and am.widget_id=w.id)=0 and (select count(*) from widget_subset_pred wssp , attribute_member am where wss.id=wssp.widget_subset_id and wssp.negated=0 and wssp.attribute_type_id=am.attribute_type_id and wssp.attribute_value=am.attribute_value and am.widget_id=w.id)= (select count(*) from widget_subset_pred wssp where wss.id=wssp.widget_subset_id and wssp.negated=0)
Next, we'll add a widget_set_pred
table showing which subsets
(conjunctions) make up a set (disjunction) and a
widget_set_member
view showing which widgets satisfy the
criteria for which sets.
widget_set_pred | |||
---|---|---|---|
widget_set_id | widget_subset_id | ||
1 | 1 | ||
2 | 2 | ||
2 | 3 | ||
3 | 4 |
create view widget_set_member as select distinct wsp.widget_set_id as widget_set_id , wssm.widget_id as widget_id from widget_set_pred wsp , widget_subset_member wssm where wsp.widget_subset_id=wssm.widget_subset_id
That's it. With these tables and views, defining a new set can now be done
by inserting rows for its criteria into the
widget_subset
, widget_subset_pred
,
widget_set
, and widget_set_pred
tables. Members of
the set will then be immediately visible in the
widget_set_member
view. We don't need to create or modify any
views, and we don't need to keep any separate tables in sync when the
widget
table is changed. However, as you might guess from
looking at the widget_subset_member
view, the query performance
may be rather slow, depending on the amount of data there is to work
through.
We can improve performance by creating another table to enumerate all
possible attribute values, and another to associate widgets with the
attributes that apply to them (replacing the attribute_member
view):
attribute | ||
---|---|---|
id | attribute_type_id | value |
1 | 1 | Big |
2 | 1 | Tiny |
3 | 2 | Round |
4 | 2 | Star |
5 | 2 | Linear |
6 | 3 | Purple |
7 | 3 | Red |
8 | 4 | Chaotic |
9 | 4 | Lawful |
10 | 4 | Neutral |
We can set a uniqueness constraint on the
(attribute_type_id , value )
combination. |
attribute_member | ||
---|---|---|
attribute_type_id | attribute_id | widget_id |
1 | 1 | 1 |
2 | 3 | 1 |
3 | 6 | 1 |
4 | 4 | 1 |
1 | 2 | 2 |
2 | 4 | 2 |
3 | 6 | 2 |
4 | 9 | 2 |
1 | 2 | 3 |
2 | 5 | 3 |
3 | 7 | 3 |
4 | 10 | 3 |
attribute_type_id is not semantically
necessary, but we can create a uniqueness constraint on
(attribute_type_id , widget_id ) to ensure we
don't accidentally give one widget two different colors, etc. |
Then we would modify the widget_subset_pred
table to have an
attribute_id
column instead of attribute_type_id
and attribute_value
, and redefine the
widget_subset_member
view:
create view widget_subset_member as select wss.id as widget_subset_id , w.id as widget_id from widget_subset wss , widget w where (select count(*) from widget_subset_pred wssp , attribute a , attribute_member am where wss.id=wssp.widget_subset_id and wssp.negated=1 and wssp.attribute_id=am.attribute_id and am.widget_id=w.id)=0 and (select count(*) from widget_subset_pred wssp , attribute_member am where wss.id=wssp.widget_subset_id and wssp.negated=0 and wssp.attribute_id=am.attribute_id and am.widget_id=w.id)= (select count(*) from widget_subset_pred wssp where wss.id=wssp.widget_subset_id and wssp.negated=0)
The disadvantage with these extra tables is that now we have to maintain the
data in the attribute
and attribute_member
tables,
which is more complicated than maintaining widget
alone. On the
other hand, the descriptive columns could be dropped from
the widget
table, so the data would still only need to be
maintained in one place. The structure of the original widget
table could be reproduced as a view.
Note that this table structure can't ensure that every widget has one
attribute of each type. Some widgets might be missing a size, or an
alignment, etc. This would be somewhat like having a null
value
in the corresponding column in the original widget
table, but
the semantics in the set membership views would be different.