The popularity of Access seems to go up and down on a regular basis. People tend to use Excel (albeit incorrectly) as a database tool because it is a relatively easy program to use and most people are familiar with it, so it is easy to create lists of information. Unfortunately, that is all it is…a list, not a fully functioning relational database. Excel uses formulas to create “links”, which normally means VLOOKUPs galore. This is inefficient because Excel has to recalculate those formulas whenever the data they depend on changes, and as your data gets bigger this can take a long time to process.
Access, on the other hand, is designed for the very purpose of building databases. The downside is that it is not particularly intuitive: you can’t just open it and go unless you know what you are doing. In this series of blogs about Access I will cover:
- Database design & data normalisation
- Creating tables
- Linking tables & relationship types
- Creating queries
- Creating forms
- Creating reports
That’s not a definitive list, but it covers the majority of what you will need to create and run a database.
First things first…designing your database.
Before you even switch on your PC or laptop, get a big sheet of paper and design your database there. This may seem like a backwards step, but until you can visualise a database layout with all the necessary tables and links in your head…draw it on paper first!
But how do you go about designing a database?
First, think about how many tables you will need. Chances are, if you are thinking in Excel terms, you will create one massive table that contains everything. The biggest problem with this is duplication of data. You should be looking to create tables where each one contains a unique set of related data, and each row in a table represents a unique record.
Let’s look at an example: I want to build a database to track my customers asking for training. At any customer site there will be one or more contacts that I deal with, perhaps in different departments, managers and staff etc. The problem is that if I use a single table, each time I add a new contact I also have to add all the details about the customer (name, address etc.), which results in a lot of data duplication. I could just enter the customer name and leave everything else blank, adding only the new contact information, but then how do I apply filters or run any lookups? It’s simply not going to work.
Wherever you see data duplication, this is a hint to create a separate table. This process is called DATA NORMALISATION. Continue doing this until you have removed all forms of duplication from your lists.
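To make the splitting step concrete, here is a minimal sketch in Python rather than Access. The customer names, addresses and field names are made up for illustration only. It takes a flat, Excel-style list in which the customer details repeat for every contact, and splits it into a customer table and a contacts table that carries only an ID.

```python
# Hypothetical flat, Excel-style list: the customer details repeat for every contact.
flat_rows = [
    {"customer": "Acme Ltd", "address": "1 High Street", "contact": "Ann", "department": "Finance"},
    {"customer": "Acme Ltd", "address": "1 High Street", "contact": "Bob", "department": "Sales"},
    {"customer": "Bolt plc", "address": "2 Mill Lane", "contact": "Cara", "department": "HR"},
]

customers = {}  # (name, address) -> customer_id
contacts = []   # one row per contact, holding only the customer's ID

for row in flat_rows:
    key = (row["customer"], row["address"])
    # Give a customer a new ID the first time it is seen; reuse that ID afterwards.
    customer_id = customers.setdefault(key, len(customers) + 1)
    contacts.append(
        {"customer_id": customer_id, "contact": row["contact"], "department": row["department"]}
    )

# Each customer now appears exactly once, with its own ID.
customer_table = [
    {"customer_id": cid, "name": name, "address": address}
    for (name, address), cid in customers.items()
]

print(customer_table)
print(contacts)
```

The address is now stored once per customer, and the contacts list repeats nothing but the ID number.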
Now you’re thinking…I’ve created a bunch of separate tables, each containing unique data with nothing connecting them. At this point this is true, but this is where you need to create common fields between tables that are related to each other.
Continuing our example, we know that the contacts are related to the customers so we need to create a common field between the two tables. At this point think about which comes first – the customer or the contact? Can you have a contact without having a customer or vice versa? It is possible to know random people, but where did you meet them? At the pub, at work, at the gym? You both need something in common however brief to create the connection.
Bearing this in mind, the customer comes first in our example and contacts can share a customer name in common. We therefore need to add a new field in our contacts table that contains the customer ID. Each customer will have a unique ID reference or PRIMARY KEY field and this is what we will use.
This seems to go against the idea of avoiding duplication in a table, but all you are duplicating (assuming there are multiple contacts at a customer site) is the ID number. No address information, or any other details relating to the customer, are present in the contacts table. Unlike Excel, there are no VLOOKUPs to refresh; there is simply a link between the two tables. We create this link using the RELATIONSHIPS screen in our DATABASE TOOLS, which I will describe in a later blog.
Once the relationship is created it will produce what is called a one-to-many relationship, where the customer can only appear once in the customer table (the one side of the relationship), and each customer can have multiple contacts (the many side of the relationship).
So our database, in this case using only two tables, should look like this once in Access:
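Outside Access, the same two-table layout can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not how Access stores things, and the table and column names are my own assumptions. The customer ID is the primary key in one table and a linking field in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces links between tables when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- unique ID: the "one" side
    name        TEXT NOT NULL,
    address     TEXT
);
CREATE TABLE contacts (
    contact_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),  -- the "many" side
    contact_name TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO customers (customer_id, name, address) VALUES (1, 'Acme Ltd', '1 High Street')"
)
conn.executemany(
    "INSERT INTO contacts (customer_id, contact_name) VALUES (?, ?)",
    [(1, "Ann"), (1, "Bob")],
)

# Follow the link: list each contact alongside the customer it belongs to.
rows = conn.execute("""
    SELECT cu.name, co.contact_name
    FROM customers AS cu
    JOIN contacts AS co ON co.customer_id = cu.customer_id
    ORDER BY co.contact_name
""").fetchall()
print(rows)

# A contact pointing at a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO contacts (customer_id, contact_name) VALUES (99, 'Zed')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The last step reflects the "customer comes first" rule: a contact cannot be added unless the customer it refers to already exists.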
As you continue to add tables, think about how each one is related to other tables in the database. Don’t become over-zealous though with your linking, trying to link every table to every other table. As long as you can access all tables following a “route”, no matter how convoluted, around your database, you can extract pretty much any data from any part of it.
Looking at a possible database design above, you can see that the TOPICS table is not directly linked to INVOICES, but there is a link via TRAINING JOBS, so I could check the number and value of invoices per topic covered. Likewise, there is no direct link between the EXPENSE TYPE and CUSTOMER tables, but because there is a link via EXPENSES > TRAINING JOBS > CUSTOMERS, I can extract a report that shows me what type of expenses I tend to have with each customer. You can see how this works in my blog about building queries.
Now imagine the number of VLOOKUPs or INDEX and MATCH or INDIRECT functions I would need to create to try and connect all this data if it was in Excel!
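As a rough illustration of following a “route”, the sketch below uses Python's sqlite3 module to walk from EXPENSE TYPE through EXPENSES and TRAINING JOBS to CUSTOMERS. The table names follow the design described above, but the column names and sample figures are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE training_jobs (job_id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(customer_id));
CREATE TABLE expense_types (expense_type_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE expenses      (expense_id INTEGER PRIMARY KEY,
                            job_id INTEGER REFERENCES training_jobs(job_id),
                            expense_type_id INTEGER REFERENCES expense_types(expense_type_id),
                            amount REAL);

-- Hypothetical sample data
INSERT INTO customers     VALUES (1, 'Acme Ltd'), (2, 'Bolt plc');
INSERT INTO training_jobs VALUES (10, 1), (11, 2);
INSERT INTO expense_types VALUES (1, 'Travel'), (2, 'Hotel');
INSERT INTO expenses      VALUES (100, 10, 1, 40.0), (101, 10, 2, 90.0),
                                 (102, 11, 1, 25.0), (103, 10, 1, 10.0);
""")

# No direct link between expense_types and customers, so follow the route:
# expense_types > expenses > training_jobs > customers.
report = conn.execute("""
    SELECT c.name, t.description, SUM(e.amount) AS total
    FROM expense_types AS t
    JOIN expenses      AS e ON e.expense_type_id = t.expense_type_id
    JOIN training_jobs AS j ON j.job_id = e.job_id
    JOIN customers     AS c ON c.customer_id = j.customer_id
    GROUP BY c.name, t.description
    ORDER BY c.name, t.description
""").fetchall()

for name, description, total in report:
    print(name, description, total)
```

Each JOIN is one step along the route, which is why a single query can replace a long chain of lookup formulas.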
One more thing you need to consider when thinking about the number of tables you require…sticking with our training database, how can I connect the TOPICS table with the CUSTOMER table? Should I even bother? After all, that’s what the customer wants – me to teach a particular topic or range of topics.
Let’s think about it…any customer could want training in a variety of topics. Each customer appears only once in the CUSTOMER table, which suggests a one-to-many relationship (one customer, many possible topics). How about the topics I teach? Each topic could be requested by several customers, and each topic appears only once in the TOPICS table, which suggests a one-to-many relationship in the other direction (one topic requested by many customers). Both cannot be true of a single direct link: a direct link needs one side where values can repeat, and here both the topics and the customer names appear only once in their respective tables. The type of relationship we are trying to create is a many-to-many type, and it is not possible to create one by directly linking two tables that each contain a primary key field or list of unique values.
If you come across this type of relationship you will need an extra, intermediate table that sits between the two tables. In practice this is usually an orders table, or something we refer to as a TRANSACTIONAL table.
If you work with an ERP system, just think about the screens you go into. You will have supplier lists, item masters etc., but when you place an order with a supplier, it is on a completely different screen or form and therefore in a separate table in the database. So this is a very common setup in databases…however big or small.
In the table above we can see that each customer appears several times, and the topics also appear several times, and in one case, Excel appears twice for the same customer. What is unique though, is the order number and that will provide me with a PRIMARY KEY in this table.
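The intermediate table can be sketched the same way. The example below, again using Python's sqlite3 module with made-up names and order numbers, puts an orders table between customers and topics. Both customers and topics may repeat in it; only the order number is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE topics    (topic_id INTEGER PRIMARY KEY, title TEXT);

-- Transactional table: one row per order, linking one customer to one topic.
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,   -- the only value that must be unique here
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    topic_id    INTEGER NOT NULL REFERENCES topics(topic_id)
);

-- Hypothetical sample data: Acme Ltd orders Excel training twice.
INSERT INTO customers VALUES (1, 'Acme Ltd'), (2, 'Bolt plc');
INSERT INTO topics    VALUES (1, 'Excel'), (2, 'Access');
INSERT INTO orders    VALUES (1001, 1, 1), (1002, 1, 2), (1003, 2, 1), (1004, 1, 1);
""")

# Count the orders for each customer and topic combination.
orders_per_pair = conn.execute("""
    SELECT c.name, t.title, COUNT(*) AS orders
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN topics    AS t ON t.topic_id = o.topic_id
    GROUP BY c.name, t.title
    ORDER BY c.name, t.title
""").fetchall()
print(orders_per_pair)
```

The many-to-many relationship has become two one-to-many relationships: one customer to many orders, and one topic to many orders.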
So now you have the basic principles of database design: thinking about the number of tables you might need, and how to split the data into tables containing unique record sets, removing all possible duplication through the process of data normalisation. At first this process is not obvious, but with practice you soon see patterns emerging in your data and from there can quickly identify the tables you need to create in your database.
In my next blog I will show you how to create your tables in Access.