Description
- entity sets with
- weak entity sets and
- subtype entity sets as needed,
- relationship sets with appropriate
- multiplicity (e.g., many-one’s & many-many’s),
- specifying appropriate attributes and keys, and
- stating any additional constraints that should be considered as part of the schema
using the notations and diagramming rules as described in the textbook and in class.
E/R Dialects
The textbook — and our examples in the lectures — uses the “Stanford” style of E/R modelling. Note that there are many dialects of E/R, however. For consistency, for this project, you must adhere to the Stanford (the textbook’s) dialect.
A different textbook was used previously in this course at York a while back which used a different dialect of E/R (the “Wisconsin” dialect). So there is “3421” material around with E/R examples written in the Wisconsin dialect. The two dialects are not significantly different, but there are notable syntactic differences. Feel free to use these for studying and for reference, of course. But be aware of the differences.
There are many additional dialects. Some studentsa may have experience with them. However, for purposes of this class, we insist that you do your work in the Stanford dialect.
Congratulations! You have been hired by Health Canada as a database-design expert. Your group is busy working to design and field a contact-tracing database to be able to trace rapidly people who may have been in close proximity (contact) to someone diagnosed with a highly infectious disease. This can be used to help contain an epidemic.
For your first assignment, you have been requested to design a schema (as an E/R diagram) for the contact-tracing database.
The Contact Tracing Domain
To be able to trace with whom a person has been in contact, we need to know where that person has been and when. We have to know this about everyone else too. Then we could figure out the potential contacts by seeing who was in places at the same time as the person. During an epidemic with a highly infectious disease, by tracing the recent contacts of a person who has become ill, these people can be warned to take appropriate action. This can greatly help to stem the epidemic.
Thus Person is fundamental in our database. We are tracing people, after all. Information we should keep for people is a name, address, and phone#. We can keep sin, a person’s social insurance number to identify the person. (Assume that the government will issue anyone in Canada who does not have a sin a temporary sin for this very purpose. Also note that, in building such a database for real, in truth, using sin for this would likely not be a good choice. But, for the project, let us assume it is.)
We should also record Places. This will include public building and other indoor places where people can meet, and thus come into contact. (We assume for now that the disease does not transmit outdoors, so we are not keeping track of all possible locations.) We can use a place’s name to identify it, and we want to keep about a place its gps coordinates, address, and a description.
Central to the whole database’s operation is to collect “observations” about which places a person has been, and when they were there. Thus, we are designing a mass-surveillance database! But we are assuming that this is for a good cause, and that the database, once up and running, will not be abused for other purposes.
Let us call an observation that such a person was in such a place at such a time a Recon — a shortened form of the military term reconnaissance — to have an easier way to organize our thoughts. A Recon names a Person (who) as being at a Place at a given time (when). Let us employ the notion of Time Slot for handling times. There will be effectively an entry in Time Slot for every fifteen-minute period; e.g., 3:00pm 25 September 2020, 3:15pm 25 September 2020, 3:30pm 25 September 2020, and so forth. For any given time slot in which a person was observed somehow to be at a place, we would have a Recon entry. Thus each Recon is associated with a Time Slot, telling us when the person was there. (A better way to handle time in such a scenario is to work with time intervals. This is significantly more involved design-wise, however. Therefore, for our first cut of a schema for our contact-tracing database — that is, this project — let us employ Time Slot to handle the “when” aspect.) In fact, we can use when to identify any given Time Slot. (Say we assume the when value identifies the beginning of the time slot; e.g., 3:00pm stands for 3:00pm up to 3:15pm.)
A Recon then identifies — and therefore, is identified, at least in part, by — the person, the place, and the time slot (when), which is “saying” that Person (who) was at Place during this Time-slot time (when). For any Recon, we also must record the Method, the way, that we know that the person was at the place at that time. Examples of methods might be, for example, contact-tracing phone app, surveillance camera with facial recognition, and a registry entry (the person had to sign a registry book on entering and exiting the building). Note that there could be more than one recons telling us a person was at a place at a given time, each recon supported by different evidence (Method).
In our database, we also need to track when a person is tested, a Test, for the disease. Such a diagnosis Test is administered to a given Person at (“upon”) a given time (Time Slot). This information identifies any given Test. A Test is administered in a Test Centre — which is a Place — and is of a given Test Type. (There are different types of test for the disease, which may differ in efficacy and cost.) Thus, we want to record where the Test was administered, and what type of test it was.
A Test may result in an Action. Let Action be identified by an action name. (An action might be taken given the result of the test; for instance, if it came back positive for the disease, the person might be placed in quarantine.) Assume that a given Test results in at most one Action. Of course, a Test might not result in any action. (Say, the test came back negative and no action was necessary.) We want to track actions resulting from tests.
Not all Test Centres are equiped to administer all types of test (Test Types). Thus, we want to record which Test Types are offered at which Test Centres. A Test Centre may offer different types of test; and a type of test may be available at a number of test centres.
Lastly, the Government is to require that each Person identifies their bubble; that is family and friends — other people (Persons) — with whom they are regularly in contact. We need to record this “bubble” information in the database.
Go forth and design! And keep us all safe.
Considerations
This design exercise is meant to simulate a real life system-analysis situation; hence, the domain description above is unstructured, and sometimes fuzzy. It is your responsibility as an analyst / designer to translate what you read into a proper E/R diagram that models the domain correctly.
Keep the following things in mind for your design.
- In real life, many more data elements than described in the “requirements” above would be needed to build a useful database. We do not, however, want to turn this project into something huge. So keep in mind that this is a highly simplified case.
- As a general approach, if a particular constraint is not explicitly given, then assume the least restricted situation. For example, “many” in the E/R Model is less restricted than “at most one”.
- You are not required to specify the domains of the attributes here in your diagram (step A).
- Make certain that you clearly indicate any constraints that should be considered as belonging to your design but that are additional to those captured in the logic of your E/R diagram.
That is, state any constraints that are indicated by the requirements, but that you know are not enforced by your design in brief documentation attached with your design. For each such case, explain whether the unenforced constraint
- is beyond the scope of E/R,
- seems to be beyond the scope of E/R (but you do not know for certain),
- would greatly complicate the design beyond value, or
- is simply not accommodated in your design.
Submit an electronic copy in PDF of your project using EECS’s submit
utility.
Due: by 11:59pm October 9 2020.
by 11:59pm Saturday 10 October 2020.
(The due date has been extended a day by common request.)
Your submitted PDF document should include the following.
- E/R Diagram
Your full E/R diagram for the Episodes database. - Documentation
Paragraphs explaining details about the design.- Any clarifications about your E/R diagram that are not evident in the model itself.
- Any assumptions that you had to make with respect to the requirements. (In the real world, these would then be resolved in a follow-up requirements meeting.)
- Any constraints (business rules) apparent from the requirements that you are unable to model via your E/R diagram.
The documentation may be minimal — or even absent — if your model has no such needed clarifications.
Your project must be typeset; that is, no hand-drawn diagrams. Have a cover page for submitting your work, filling out your student number, name, and section — A for Parke Godfrey’s section); or B for Wenxiao Fu’s section — something as follows.
Student#: | |
Sur (Family) Name: | |
Given Name: | |
Section: |
Use the EECS submit
utility to submit online your PDF as follows.
% submit 3421A er your_file.pdf
or
% submit 3421B er your_file.pdf
again, according to the section of the course that you are in, 3421A (with Parke Godfrey) or 3421B (with Wenxiao Fu). The “er” in the submit
command above is a label (representing Project #1) for what is being submitted.
Note: Do ensure that your PDF file is readable by standard tools such as Acrobat Reader before submitting it.
glossary
If the arrowheads or other pictoral notation that you use in your diagram differs significantly from that of the textbook or of that used in our slides, put a glossary of your elements paired with what they are so we can understand your E/R schema.
late submission
If you still need more time, you may turn in the project late by 11:59pm Tuesday 13 October 2020 but with a 10% penalty.
If you do turn it in after 11:59pm Saturday 10 October 2020 but before 11:59pm Tuesday 13 October 2020, use the following submission coordinates:
% submit 3421A er-late your_file.pdf
or
% submit 3421B er-late your_file.pdf
(again, according to the section of the course that you are in.)
Note that if you submit a project to both er
and to er-late
, we shall mark the latter one (with a 10% penalty).
Work Strategy
We suggest working on your design iteratively, little by little. E/R is a refinement process. Get a rough sketch of your design first. Then improve on it to fix issues as you find them.
Diagramming Software
There are a number of professional tools companies use for E/R, but many are rather expensive and specialized. They often provide lots of additional functionality, such as verifiers and automated tools to help translate to relational.
Many generic drawing packages that include diagram / semantic support work nicely, however. Some suitable applications good for drawing E/R are
- LibreOffice (its Draw facility) of The Document Foundation,
- The Dia Diagram Editor,
- Google Docs (the Drawing tool)
- Microsoft Visio, and
- OmniGraffle (The Omni Group),
Google Docs surely suffices; this requires an account, but all York students have access to Google’s suite of tools. Visio and OmniGraffle are great, but somewhat expensive proprietary drawing and generic diagramming applications. Dia is not bad, and is free and open source software under the GNU General Public License version 2.0 (GPLv2), which we used to use.
We quite recommend LibreOffice, which is excellent free and open source software under the Mozilla Public License (MPLv2.0). It is what we are using for the E/R in the class slides. This is on PRISM (EECS) machines, is in the OS image that EECS distributes, and is freely downloadable.
There are any number of other drawing applications that could work, of course.
Some of the diagramming packages may be missing one or two things kind of needed for E/R. For example, there seems to no easy way to underline text in Dia, but key attributes should be underlined. In this case, you could use bold for key attributes (and partial key attributes in week entities), instead. Just make note in your E/R design document any such notational changes that you make due to drawing-application limitations. Pick some reasonable, obvious convention for your arrowheads, following the textbook’s style, the class’s lecture style, or something that is obvious and similar.
Practice
The best way to learn E/R modelling well is lots of practice! Look through the examples in the textbook and the slides.
Look through the examples in the exercises linked here below, and previous E/R assignments in this course.
- E/R project from winter 2020:
- E/R project from winter 2019:
- E/R project from winter 2018:
- E/R Project: Episodes
- A solution appears on the description of that class’s schema project, Relational Schema: Episodes.
- E/R project from fall 2016:
Warning: The E/R below are in the Wisconsin dialect, not the Stanford dialect.
- E/R project from fall 2015:
- E/R project from winter 2014:
- E/R project from winter 2013:
- Assorted exercises covering the relational model, E/R design, relational design, relational algebra & calculus, and normalization:
Typesetting E/R is painful! Can’t I just hand-draw this thing?
No. You can hand-draw your sketches while working on the project, of course. But the E/R diagram that you turn in must be typeset.
OK. I’m using LibreOffice then for my final E/R. Do you have a template that you use for E/R in the lecture slides that you can share?
Sure! LibreOffice-ER-template.odg.
You can select and cut and paste elements in Libre. Use Arrange > Bring Forward to hide connecting lines under the boxes.
I have “Program X” that does drawings for E/R diagrams, but the E/R dialect is quite different than the Stanford dialect used in class. Can’t I just use “Program X”? Please, please, please?
No, you need to use the Stanford dialect for the project. Apologies to be so strict. But it would be chaos if everyone is drawing their E/R’s in different dialects. (We do care about keeping our and our TAs’ sanity intact!) And we want to ensure that people are understanding the logic of the E/R model.
— Parke Godfrey (Section A)
— Wenxiao Fu (Section B)
— Nasim Razavi (TA)