Process evidence: Client:
Why do some argue is most important thing:
Static structure of system - like a map
Advantages of database approach Program-data independence Minimal data redundancy Improved data consistency Improved data sharing Improved data quality Reduced program maintenance
Difference between data model and database
Data modelling is DESIGN Creative solutions needed Data model is one solution to req’s
Data modelling is IMPORTANT Need a blueprint to build a house
Data modelling is a discipline Professional judgement required
Entity relationship model: Detailed graphical representation of entities and associations
Focus is purely data Not what happens to it Depicts the form of the data Not the data values Specific to the needs of the business
Person, place, object, event, concept that is of interest to the organisation single, unambiguous names Student, book, product
Time Space Fuzzy
Association between the instances of one or more entity types name strongly make a sentence to describe the relationship may have attributes – an associative entity
A book is borrowed by a patron A patron may borrow a book
Conceptual Schema A detailed, technology independent specification of the overall structure of a database Physical Schema Specifications for how data from a conceptual schema are stored in a computer’s secondary memory
Each entity is described by attributes. These are what we use to describe the instances of each entity. There are strict rules about attributes, they need to be independent and not repeating information. They also need to be atomic (can’t be broken down), and not be dependent on other values.
In the example below, StudentName is unsuitable as it can be broken down into first and surname; StudentAge isn’t necessary as it can be calculated (putting it in by hand is asking for trouble), and StudentGrades is not atomic - ot contains multiple values - which will not work (we’ll come back to this one).
We also need some way of uniquely identfying each instance, this becomes the Primary key for the entity. This has to be not change be unique. We could use the name but, despite us all feeling special, our names aren’t. We could combine the name with date of birth and make a concatenated key. Unfortunately, dates are very difficult to deal with (what is the significance of the 12th of September?) and it doesn’t resolve our non-unique problem (Sam went to school with two Jacqueline Clare Robertsons, both born on 30/7/69). So we have to use a new value: a STUDENT_ID.
We can get the same information from these attributes and the data is structured much more powerfully. The courses a student is enrolled in is more difficult to solve, but the solution is the heart of the power of a data based (and hence database) approach to understanding the business and implementing a solution.
In the example above Bob is taking only Geography (wise choice), so that is easy to deal with. Jane, however is taking French, Geography and History. If we try and store this information in an attribute called Courses it is very difficult to extract this information - while it would be easy to get a list of Jane’s courses, a list of who is taking Geography would be very hard to extract (and involve string manipulation etc).
The real strength of data modelling, is in the identification of relationships. A relationship describes the association between entities.
We can see that there are clear relationships between STUDENT and COURSE, and COURSE and TEACHER. These assocations can be expressed as sentences: “a STUDENT takes this COURSE”, “a COURSE is taught by a TEACHER” (we’ll come back to cardinality).
The STUDENT - TEACHER association, however, is not so clear. The relationship (at least the one we are considering here) is already expresssed by the model. We can extract which teacher is teaching which students via the COURSE they are teaching and enrolled in. We don’t need a STUDENT-TEACHER relationship.
Even that this level of consideration, we a being forced to carefully think about the business we are dealing with. We could have described a STUDENT-TEACHER-COURSE model. Why have we chosen not to have the STUDENT-TEACHER relationship rather than not having STUDENT-COURSE?
We can also express the numbers of instances that might be involved in the relationship. Here we have “many STUDENTS may take many COURSES”, or conversely “many COURSES may be taken by many STUDENTS”. We express the “many” on our diagram with a “crows foot” (hanging on to the many end). Unfortunately, this gives us a none-specific relationship, we know we have lots of students and lots of courses, but not who is taking what.
One solution to these unconnected lists, is to include attributes for each of the courses someone is taking (below)..
Unfortunately this doesn’t work either. What happens when someone enrols in more courses than you have previously thought of, where do the results actually go, and we still don’t have a way of finding out who is enrolled in Geography.
What we need is a new entity to represent this relationship. This is called an associative entity. We might have identified it earlier as it is really quite strong in terms of ‘thingness’. The relationship betweem STUDENT and COURSE also has several parameters in its own right: date, result, perhaps internal assessment grades and so on. The fact that the relationship has attributes indicates that it is really an entity.
Again, we come back to the value of this process in understanding the business. Is it just one entity? We can think of sveral possible names for the entity - are they the same thing.
There are cases where multiple names indicate multiple associations - the Human resources manager authorises payment of salaries (one relationship), but is also on the payroll herself.
This middle entity shows the association between STUDENT and COURSE, ENROLMENT.
When we implement the model, the ENROLMENT entity (table) does not contain the student’s names, nor the courses - the computer can quite efficiently go and get them. Instead we just use the primary keys from the other entities - they become foreign keys.
There is though, a flaw in our model. Jane Smith seems to be enrolled twice in History. This is not a mistake (except for Jane) - she failed it the first time. In order to allow this to happen we need to represent more information, the year of enrolment. Our model can generate the information required, it will be useful as the development progresses as we work to make it more tigtly defined.
The primary focus in functional requirements is the process itself, by developing such a model, we are forced to think carefully about the business we are working with.