Main Page Content
Beginning Database Design Part I
Creating and Naming Database Tables and Fields
This series of articles is based on an outline I made while reading "Database Design for Mere Mortals" by Michael J. Hernandez. Mike graciously gave permission for its use in this derived work. I highly recommend the book to anyone who wants a good grasp of database fundamentals. The series is intended to familiarize the uninitiated with the concepts of database design. I'll include an extensive bibliography for those interested in more technical or in-depth coverage.
Why Use A Database?
Data overload is a common problem in business today. Corporations and individuals have plenty of raw data, but can't always find it or aren't aware that they even have it. Raw data must be filtered and organized to become useful information. Databases are a primary tool for the task; a tool which takes advantage of the speed and power of modern computers.
Goals
Creating a database can be a daunting task to a beginner. As with most complex projects, good database design can be broken down into discrete steps which are easier to manage.
In a future article, we'll discuss the information-gathering process necessary before database design even begins. For now, let's talk about something less abstract — tables and fields. For the examples in this article I'll try to provide samples which reflect real-world situations.
Terminology
First, some terminology. We won't be using the highly technical terms "relations" and "tuples"; even in the manuals they introduce the terms and then go right back to "tables" and "rows" so we'll skip the interim step. However, the technical term "attributes" works well to describe the concept embodied in "columns" so we'll use whichever term seems more understandable in context. And each "record" makes up one "row," another familiar term. Again, we'll use whichever fits best conceptually. Finally, for each row there will be an "attribute" in each row. This is a "field."
A database is made up of one or more tables. Each table is made up of one or more rows of information. Each row contains various attributes, one in each column.
From Flat Spreadsheet to Relational Database
If you're familiar with spreadsheet software, you're familiar with tables of rows and columns. For instance, you might have columns called "Name," "Company," "Address," "Specialty," and "Phone." This works fine for a while, but you soon realize that if you have multiple entries from the same company, the same specialty, or perhaps even the same name, you're duplicating entries. One of the geeky things database administrators get excited about is efficiency. It's not efficient to store the company name "Spinhead Web Design" twelve times; it's the same every time, right? And if you have two contacts at Spinhead who answer the main phone number, you've entered that twice, haven't you? And look; you have "Database Administrator" listed for three different contacts. This obviously isn't a big issue with small databases, but in your plans for world domination, small databases will rapidly be replaced by huge databases. Plan ahead. Build efficiently. [1]
How can we make the spreadsheet more efficient?
If there were a way to use multiple spreadsheets, one for each type of data, and have the spreadsheets share information, you could have the "People" spreadsheet linked to the "Skills" and "Companies" spreadsheets. Each spreadsheet would only contain unique records, not duplicates [2]. If two people had the same skill or worked for the same company, they could both refer to the same record the appropriate spreadsheet.
Congratulations. You now understand the basic concept behind a relational database.
Tables and Fields
Each of our spreadsheets compares to a table in our "Contacts" database. But how exactly do we decide what each field should contain? And which fields go in which table?
In the real world, a certain amount of research precedes any real database design work. We'll discuss it further in a future article, but essentially, through interviews with the intended users and analysis of the existing processes, we have collected a long and probably disorganized list of possible candidates. Through two processes, we'll decide which will become fields, which will become tables, and which fields go in which tables.
First, look for nouns which identify an object or event but do not modify some other object or event. This list is our preliminary list of Tables. At the same time, we're doing the opposite; looking for nouns which do modify another object or event. This list becomes our preliminary list of Fields. For example, "vendor" would go on the "Tables" list, but "phone number" is likely a characteristic of "vendor" and belongs on the "Fields" list. "Tables" usually refer to collections of things; something that can be broken down into smaller components. "Fields" would be those components. Don't worry about precise names yet. We'll stick to simple descriptive words or phrases for now.
So far we haven't done anything to organize our fields into tables.
Assigning Fields to Tables
Once we're certain the our "Tables" and "Fields" lists are complete, it's time to assign each field to the appropriate table. For each entry on the preliminary field list, choose the table which it describes or to which it applies. If some fields don't seem to fit any table, set them aside for the moment. We'll be altering our lists as we proceed, so don't worry about orphaned tables or fields.
Now, let's take a closer look to ensure that our tables and fields are as efficient as possible at this stage. A short checklist for an ideal table:
- Represents only a single subject
- Has a field (or pair of fields) which uniquely identify each row (the Primary Key; more on that later)
- Does not contain unnecessary duplicate fields
- No repetition of the same type of value
- No fields which belong in other tables (we'll address links between tables later)
And for an ideal field:
- Represents a characteristic of a table subject
- Contains a single value
- Is atomic; that is, not multi-part [3]
- Is not calculated or concatenated
- Is unique throughout the database structure
Naming Tables and Fields
Once we've reached this point, let's establish our table and field names. Names for all database elements should be:
- Unique
- Clear, meaningful, unambiguous
- Short
Some restrictions for naming tables:
- No acronyms or abbreviations
- No proper names or words which unduly limit the data which can be entered
- Should not imply more than one subject
- Should be plural
And for naming fields:
- No acronyms
- Use abbreviations only if clear and meaningful
- Should not imply more than one subject
- Should be singular
You'll probably see some duplicate field names, such as "Name" in both the "Companies" and "People" tables. Let's make them unique across the database. You might choose "ContactName" and "CompanyName" but whatever you use, stick to the guidelines above, and be consistent.
Review
Database tables and fields take the simple spreadsheet concept to another dimension. Establishing tables which refer to a single subject, populated with fields which refer specifically to that subject, will provide a sound foundation for future growth.
You'll be using table and field names frequently in your database work. Make sure they're succinct, but not arcane.
Give it a try. On paper, make up some sample lists of subjects and characteristics. Group the characteristics by subject, and try your hand at meaningful names.
Next Steps
We still don't have our tables talking to each other. Where's that efficient information sharing we talked about? And what prevents us from entering "Database Administrator" in the "Skills" table twice, detracting from our neat and tidy structure?
Up Next: The Keys to Data Validation, Integrity, and Efficiency
Notes
[1] From a practical perspective, effiency of this depth adds little value to small, lightly queried databases. It is presented with the hope that if we strive for perfection our results will surpass those achieved by striving for mediocrity. (return)
[2] Well, if you want to split hairs, of course there's duplication every time you create a reference to the same data. But the repetition of the simple reference is more efficient than the repetition of the potentially complex data itself. (return)
[3] This is, to a degree, subjective. Is a phone number atomic, or is it composed of area code, exchange, and suffix? General usage may suggest an answer, but how the data will be used should be the determining factor; not our preconceived ideas about what a phone number is. (return)