This page is to work out a list of main Citation Elements. As a starting point, I took all the CEs, i.e. the left column, from GeneJ's spreadsheet Zotero Fields_alpha_97-04v.xls. There are exactly 100 CEs in that file. Surely, some will be missing, and others will be redundant.
Please add to the list the Elements that you think we need. Don't remove the redundant ones; add a line under them instead to state your opinion. To indent a line, use the > character one or more times (using wikitext editing mode). Elements can also be discussed in separate discussions, analogous to the requirement catalog.
I've added some comments already, signed with 4 tildes (~)





































































































Comments

ttwetmore 2011-12-21T06:55:50-08:00
Redundancy
Note in the list that there are a number of elements named xxxxxTitle, e.g., see encyclopediaTitle and blogTitle and forumTitle.

One title element will suffice. Scanning through the list there are numerous cases like this. I think we can reduce the number of elements. In addition, I expect a number of other elements can be combined.
ttwetmore 2011-12-22T14:06:23-08:00
@GeneJ,

That makes sense (about a generic title). I guess I don't see a problem then.
GeneJ 2011-12-22T14:11:31-08:00
@ Tom,

Well, I guess somewhere along the way, more than a few of us hope we'll focus on supporting features that drive folks to actually buy and use software.--GJ
louiskessler 2011-12-22T14:39:26-08:00
Tom said:

"If RootsMagic did not embrace Better GEDCOM, it would not be able to use BG data at all, let alone round trip it. If RootsMagic did embrace Better GEDCOM, it would first have to write a great deal of new software, and if at the end of that, it could not round trip BG data, then it would still have bugs in its BG implementation that it would have to fix. If BG decides to allow source chaining in the model, then vendors would have to support it. This means it would have to be able to recognize chained records on import, but it wouldn't have to allow its own users to create them from within the program."

And that in effect is the reason for the chicken-and-egg problem that BetterGEDCOM faces.

We want BetterGEDCOM to be able to transfer data perfectly between programs. To do so, we need the programs all to adopt and implement BetterGEDCOM and to do so PERFECTLY.

The more details, capabilities and complexity that's added to BetterGEDCOM, the more resistant the developers will be to adopting it, and even if they do, the more likely they are to implement it incorrectly - thus causing the exact incompatibilities between programs that we don't want.

More capability ==> Less compatibility

But of course, also ...

Less capability means some of the necessary constructs will not be handled by BetterGEDCOM. Meaning programs that implement them will be unable to transfer their data.

So Less Capability is no good either.

The near-to-impossible task is to find the happy medium for BetterGEDCOM, that is neither too simple nor too complex.

Louis
GeneJ 2011-12-22T14:43:26-08:00
@Tom,

Don't get me wrong, I like the concept of higher and lower sources. (Even wrote about it, early on.)

The credit line involves a little different genealogical logic, but I think we can work through that, too.

What I'm concerned about is unlimited hierarchical schemes in terms of higher and higher and higher references. --GJ
AdrianB38 2011-12-22T15:19:00-08:00
Tom
Re "I wouldn't expect email to have a title. Email has a subject..."

D'uh - how dumb am I that I didn't look? So scrub that as an example but the possibility still holds that _something_ has a "Title" that doesn't work on its own. For instance, WordPress posts have Titles (I checked this time!) but I wouldn't imagine that the contents of the Title field would suffice as a Title for a source, due to duplication. For instance, I have a post "The Mystery of the Old White Bear Inn" and that, on its own, is surely most unsatisfactory as a Title for a Source due to the lack of clarity about what it means. Surely it should be extended by the name of the WordPress site, etc.?

Having said that.... I wonder if I am confusing two aspects here? The source record needs to have a unique title in the database and for a long time I used the GEDCOM TITLE item for this purpose. That would certainly need to have the title of the post and of the blog. Plus I'd prefix it with "Posting " because I like all my census sources, my baptism sources, etc, altogether. But that's the source record (I now use the ABBREV item for this) - nothing to do with the detail to be cited.

In which case, maybe we don't need the full title that we cite to appear on the source record after all.... In which case maybe we don't have a clash between the actual title and a descriptive one.

Even so we certainly, as Gene says, concoct titles that aren't quotes from bits - such as the "1910 U.S. census" instead of that phrase, and doing this does carry the risk that at some point our concocted title might include the thing marked as the actual title.

Maybe I now go back to thinking this is a risk but in most cases, as you originally said, the subtle differences don't matter because the values never "cross" between source types. It's just that somewhere there may be a material issue.... But until we find it....
AdrianB38 2011-12-22T15:27:27-08:00
Gene - "What I'm concerned about is unlimited hierarchical schemes in terms of higher and higher and higher references"

Me too - however, I think it may not be a risk in practice. Since someone's having to enter the hierarchy, they're not going to create long chains unless there's validity. For instance, while the Ancestry censuses may be digitised from microfilms, do we need to record that stage really? Why not go straight to the original? (Especially as we may not know if there's an intermediate m/f and the world doesn't stop if we don't). (This is less of a hierarchy than a chain but I don't think it matters.)

Providing there is intelligent design of the templates (e.g. reference to "Highest Title" so we don't need to know how many elements in the chain) it may not be an issue.

But I shall steer clear of template design because I seriously don't think BG should be doing it.
GeneJ 2011-12-22T15:40:39-08:00
@ Adrian,

It's pretty popular among programs to export the title of a "Master Source" to ABBR. See the RM input and export at
http://bettergedcom.wikispaces.com/Software+Citations
GeneJ 2011-12-22T16:00:46-08:00
Golly, that went too fast.

I'd create a "master source" for each blog article (most likely), so I'd have two titles in that master source--the title of the blog and the title of the blog article. (Zotero does too!! They call the source type "blogPost" and have elements for both the "blogTitle" (the name of the blog) and the "Title" (the name of the blog article).)

That makes sense to me. The item type is "blogPost" so the "Title" is the blogPost title.
GeneJ 2011-12-22T16:02:22-08:00
Shoot.

PS I posted a graphic example showing how Zotero captures data into fields/keys for its item type, "blogPost."

http://bettergedcom.wikispaces.com/Zotero+blogPost+graphic-example
ttwetmore 2011-12-22T17:30:41-08:00
@Adrian,

You said this: "The source record needs to have a unique title in the database"

I know I am taking this out of context, but I don't understand what you mean by this. I would say that every record needs to have a unique id in the database, regardless of the record type, but that is the only thing that must be unique.

I probably misunderstand your context.
ttwetmore 2011-12-22T17:42:22-08:00
@GeneJ's example:

I'd create a "master source" for each blog article (most likely), so I'd have two titles in that master source--the title of the blog and the title of the blog article. (Zotero does too!! They call the source type "blogPost" and have elements for both the "blogTitle" (the name of the blog) and the "Title" (the name of the blog article).)

For me this is best treated as a two-level source, one for the article and one for the blog. This would allow many articles to refer to the same blog. Otherwise every article from the blog must contain duplicate information about the blog. If only one article would ever be referred to, no great shakes, but if ten articles were referred to, save yourself some typing. And of course, both titles can be just "title", exactly as I believe they should be.

If we are to put complex sources inside a single source then we are going to need much more complexity in how we handle citation templates. Imagine the combinatoric problem involved here.
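As a rough illustration of the two-level arrangement Tom describes, here is a minimal Python sketch. The `Source` class, its field names, and the blog name are assumptions made for the example, not part of any BetterGEDCOM or DeadEnds specification; the only point is that several article records can share one blog record, and that both levels use a plain "title".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical source record; `parent` optionally links to a higher-level source."""
    source_type: str
    title: str
    parent: Optional["Source"] = None


def title_chain(source: Source) -> list[str]:
    """Collect titles from this source up to the top of its chain."""
    titles = []
    current: Optional[Source] = source
    while current is not None:
        titles.append(current.title)
        current = current.parent
    return titles


# One blog record shared by two article records. The blog name and the
# second post title are invented for illustration.
blog = Source("blog", "Example Family History Blog")
post_a = Source("blogPost", "The Mystery of the Old White Bear Inn", parent=blog)
post_b = Source("blogPost", "A Second Post", parent=blog)

print(title_chain(post_a))
```

With this shape the blog's details are typed once, and a template that needs both titles can walk the chain rather than expect two differently named title elements in one record.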
ACProctor 2011-12-23T09:44:28-08:00
Wow! This thread grew quite a bit since I last looked.

Re: "Separately, though, don't unlimited hierarchical schemes conflict a bit with the premise of 80-20 source types?

At some point, perhaps we need a presentation about the compatibility of unlimited hierarchical schemes with source types and programmed citations. Would data from an unlimited hierarchical scheme transfer information to say RootsMagic so that a programmed citation could be honored?"

I'm afraid this is very rigid thinking. My hierarchy is a structural hierarchy rather than a semantic hierarchy, which basically means there would be recommendations for how to apply it but it would be up to the software being used. I prefer the particular ordering I quoted but there are more levels that can usefully be inserted - I will be writing this up soon.

As for transportability: a hierarchy is simply a set of elements with a "parent" property. Once a hierarchy of elements is generated by one product, there is no choice in any other product but to honour the stored hierarchy. That is part of your data and it should not be reorganised in any way by other software.
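A small Python sketch of that idea, under stated assumptions: the record layout (`id`, `name`, `value`, `parent`) and the level names are hypothetical, and JSON merely stands in for whatever transfer format is used. It shows a hierarchy stored as a flat set of elements with a "parent" property, and that a program which stores and re-exports the records unchanged preserves the hierarchy.

```python
import json

# Hypothetical element records: each has an id, a name, a value, and a
# "parent" id (None marks the top of the hierarchy).
elements = [
    {"id": "e1", "name": "repository", "value": "Example Archive", "parent": None},
    {"id": "e2", "name": "collection", "value": "Example Collection", "parent": "e1"},
    {"id": "e3", "name": "item", "value": "Example Item", "parent": "e2"},
]


def path_to_root(records, element_id):
    """Return element ids from the given element up to the root."""
    by_id = {record["id"]: record for record in records}
    path = []
    current = element_id
    while current is not None:
        path.append(current)
        current = by_id[current]["parent"]
    return path


# Simulate export and re-import without any reorganisation of the data.
round_tripped = json.loads(json.dumps(elements))

print(path_to_root(round_tripped, "e3"))
```

Nothing in the receiving side needs to know how many levels the sender chose or what they mean; it only has to keep the parent links intact.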

In summary, it's a mechanism that can accommodate individual preferences. It would be wrong to restrict this generality to just one way of working. 'Best practices' can be generated for it but it eventually comes down to how good the software is and whether it provides the full flexibility to you - which is a nice differentiating commercial factor for them. [I think one of Tom's posts was suggesting something similar, unless I misread it]

Tony
GeneJ 2011-12-21T07:35:16-08:00
There will be a need for more than one title element (Title, subTitle, [article]Title ...), and we will need to resolve what is otherwise a conflict between proponents of hierarchical and more linear systems. For example, Zotero has what's called a "flat" model.
http://forums.zotero.org/discussion/391/1/hierarchical-item-relationships/

The same is true for "place" elements--jurisdictions vs civil districts, even street address; event place vs publication place ....

It's hard to take a list of elements very far without placing elements/attributes in the context of that item type.
ACProctor 2011-12-21T11:30:50-08:00
I'm one of those people that believe schemes like Dublin Core are the wrong approach [I had to tone down my original description here]. In summary, I believe the elements of each source-type have semantics that can only be understood in the context of that source-type.

The mere fact that two source-types have an element called, say, "Title" should not be given any weight. A blog title is not the same as a publication title, or a dictionary title, etc.

Trying to whittle down all possibilities to a set of shared names with fixed semantics will not work. Dublin Core already proves that by virtue of all the extensions.

I described this to Gene, the other day, as analogous to column names in a relational database. It would be suicide to design such a database with a fixed set of shared column names, and to try and predict all possible column names used in real-life. The mere fact that table T1 has a column called TITLE and so does table T2 is mere coincidence.

Tony
ttwetmore 2011-12-21T12:33:27-08:00
@Tony,

Not intending any great argument here, but I see the semantics of the citation keys as being based on the source types. The title in a book source is the title of the book, and so on, and these would be defined by the specifications of each source type. Thus no problem in reusing the same key for different meanings in each source type. And obviously the differences in meaning aren't going to be all that much.

I guess I'm at a compromise point between Louis and you. Louis believes that the three or so citation elements of GEDCOM can be made to work in more general terms for BG. You believe in a rich set where each source type can define its own set of citation elements. I don't think either extreme is best, but I'm enough burned out arguing about these things that I'll go wherever the consensus leads.
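Tom's point that each source type's specification gives a key its meaning can be sketched in Python as follows. The table contents are assumptions for illustration only ("blogTitle" and "title" for blogPost follow the Zotero example discussed in this thread; the book entries are invented), not a proposed element list.

```python
# Hypothetical per-source-type specifications: the same key name ("title")
# is reused, and its meaning comes from the source type that declares it.
SOURCE_TYPE_SPECS = {
    "book": {
        "title": "title of the book",
        "creator": "author of the book",
    },
    "blogPost": {
        "title": "title of the blog article",
        "blogTitle": "name of the blog",
    },
}


def describe_key(source_type: str, key: str) -> str:
    """Look up what a key means within one source type."""
    spec = SOURCE_TYPE_SPECS[source_type]
    if key not in spec:
        raise KeyError(f"{key!r} is not defined for source type {source_type!r}")
    return spec[key]


print(describe_key("book", "title"))
print(describe_key("blogPost", "title"))
```

Under this arrangement a shared key name implies nothing across source types, which is also compatible with Tony's position that the coincidence of names should carry no weight.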
GeneJ 2011-12-21T17:42:23-08:00
@ Tony,

You wrote, "I believe the elements of each source-type have semantics that can only be understood in the context of that source-type."

I agree.

Separately, though, don't unlimited hierarchical schemes conflict a bit with the premise of 80-20 source types?

At some point, perhaps we need a presentation about the compatibility of unlimited hierarchical schemes with source types and programmed citations. Would data from an unlimited hierarchical scheme transfer information to say RootsMagic so that a programmed citation could be honored?

We talked about this a little yesterday.
GeneJ 2011-12-21T17:43:42-08:00
P.S. .... and then how could RootsMagic hope to round trip that data back to the unlimited hierarchical scheme?
AdrianB38 2011-12-22T10:00:07-08:00
Re Title - Having thought about this, I realise Tony is right. GEDCOM source records use the TITLE item in two subtly different ways - plus variations in between. One way is to record here what I could refer to as the "actual title" - the title that is actually printed on the book. The other end of the spectrum is to record a "descriptive title" - just a phrase that describes the source. Both items end up in TITLE.

To expand a bit -
- a book has a title printed on the front (usually!). This is an example of an "actual title" and (often but not always) there will therefore be no debate about what the title is - just read it. (See below for "But...")
- an unpublished document such as a census form or an employee staff record will not usually have an "actual title". There'll be enough data _somewhere_ in the source's paper to describe it but it may be a bit here, a bit there, a bit from the front page, a bit from three pages back. Put together, these create a "descriptive title" that describes the source in a useful manner - though even then it may need the archives' "call number" to be absolutely precise. One issue with descriptive titles is that two genealogists may disagree on what the descriptive title will be - "How many page numbers from the American census am I supposed to record?" "How many place names from the English census am I supposed to record?" Huge parts of ESM's books appear to be spent defining these descriptive titles.

(Before I go further, these are actually 2 opposite ends of a spectrum and real documents may need a position part way - for instance, is a sub-title part of an "actual title"? If the book is part of a series, how much of the series gets into the TITLE?)

Now, all this analysis might be just me rambling on - surely the TITLE is a TITLE is a TITLE and how it's got derived doesn't matter to the database? Which is kind of Tom's viewpoint when he said "Thus no problem in reusing the same key for different meanings in each source type..."

But there are several source types where the distinction is actually a potential pain. For instance, an email has an "actual title" - but it's an "actual title" that is pretty useless as the TITLE of a source record. What everyone does (sensibly) is concoct a "descriptive title" that consists of something like the "actual title" + "sender" + "recipient" + "date-time-sent". Oh, that's interesting - the 2 titles are being used in different fashions so you can't pretend they're really the same item.

Even here, it doesn't matter, you could just store what is actually the descriptive title in TITLE _provided_ you never want to retain the components of the descriptive title. IF, however, you do want for an email to record and keep the "actual title", "sender", "recipient", "date-time-sent" permanently as attributes of the source, as well as the full descriptive title - because it contains other text to make it read sensibly, then you do need to store the two title items differently.

Or in summary, the TITLE of an email is not the same as the TITLE of a (non-series) book.

(Non-IT-geeks look away now - There are other items raised by this - for instance, the descriptive title of an email should not duplicate the separately stored "actual title", "sender", "recipient", "date-time-sent" - so there's probably some sort of macro-language style text substitution to go on - e.g. "Email &ActualTitle from &Sender to &Recipient sent &DateTimeSent")
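The text substitution Adrian sketches could look something like the following in Python. This is only one possible reading of his "&Name" notation; the `fill_template` helper and the sample field values are invented for the example.

```python
import re


def fill_template(template: str, fields: dict[str, str]) -> str:
    """Replace each &Name placeholder with the matching stored field value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return fields[name]  # raises KeyError if a component is missing
    return re.sub(r"&([A-Za-z]+)", substitute, template)


# The template string is Adrian's; the field values are made up.
template = "Email &ActualTitle from &Sender to &Recipient sent &DateTimeSent"
email = {
    "ActualTitle": "Re: census question",
    "Sender": "A. Sender",
    "Recipient": "B. Recipient",
    "DateTimeSent": "2011-12-22 10:00",
}

descriptive_title = fill_template(template, email)
print(descriptive_title)
```

The components stay stored separately, and the descriptive title is derived on demand, so nothing is duplicated in the record.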
ttwetmore 2011-12-22T13:39:37-08:00
@Adrian,

I wouldn't expect email to have a title. Email has a subject, a sender, a recipient, and a date and the body. A few other things maybe, but no title. If you are constructing a title by bringing different values together into a longer string, then you are not really building a title, you are building a citation, and the rules you use for bringing the values together are the "template."
ttwetmore 2011-12-22T13:55:22-08:00
@GeneJ,

"Unlimited hierarchical scheme?" By this do you mean the ability of one source to refer to another source?

The mechanism in DeadEnds of allowing every source to refer to another source is simply there to enable the "chaining" or the "tree-structuring" of sources in complex cases. There is no recommendation (from me at any rate) that such a mechanism, because it is available, be used inappropriately.

From the examples already given people have come across a number of situations where the concept would help them out of a representational bind.

One of my principles in making models is to add trivial capabilities if it seems like they could be made good use of. The adding of a source reference to a source record is one of those trivial things that enables many possibilities that would not exist otherwise. If the extra source reference were not used, we have the source model of current genealogical systems. With one trivial addition we get a model that can handle sources of any conceivable complexity. In fact, if you go back to the original DeadEnds specification, you will see that DeadEnds actually allows any number of source references in a source, so a source, like a census record, could point to higher level source records in different dimensions of specificity, say the microfilm chain, say the enumeration district chain, and say the location on a website chain. I didn't mention that in the doc that started this topic, because I didn't want to completely blow you away at the start.

In exactly the same manner, the feature that the DeadEnds person record can hold any number of person references is the enabler that allows DeadEnds to give full support to the evidence and conclusion models, aka the records and person models, of genealogical research. A trivial addition to the model enables an entirely new generation of genealogy software to be contemplated. If some genealogical vendor doesn't want to support personas, no sweat, he can just pretend these references don't exist. It ain't use 'em or lose 'em, it's use 'em if you need 'em.
ttwetmore 2011-12-22T14:01:53-08:00
@GeneJ writes:

P.S. .... and then how could RootsMagic hope to round trip that data back to the unlimited hierarchical scheme?

Is this a meaningful question? If RootsMagic did not embrace Better GEDCOM, it would not be able to use BG data at all, let alone round trip it. If RootsMagic did embrace Better GEDCOM, it would first have to write a great deal of new software, and if at the end of that, it could not round trip BG data, then it would still have bugs in its BG implementation that it would have to fix. If BG decides to allow source chaining in the model, then vendors would have to support it. This means it would have to be able to recognize chained records on import, but it wouldn't have to allow its own users to create them from within the program.
GeneJ 2011-12-22T14:02:35-08:00
@Tom,

Believe what Adrian is referring to is the frequent use of generic titles in genealogy.

Heck, most of us even use a generic title for the U.S census. (Who wants to write out _Thirteenth Census.... United States_ and how meaningful is that when you can say 1910 U.S. census.)

Generic titles come in different flavors, with various descriptions "built in." An untitled letter might be entered with the title "John Smith to Tomas Jones," and a photograph might be "Nellie Smith's wedding c1898."

Since the "subject line" in an email might be irrelevant (or even inappropriate), I often see generic title entries.
gthorud 2011-12-22T06:13:19-08:00
A structured approach
I think we need to discuss how this work should be done before it starts.

- I think we need to do the work in combination with development of source types.

- We also need to consider:
--the use of data types to handle culture dependencies. Please read my data model 0.4 document.

--implications of "hierarchical sources" and "source of the source" and their implications for templates.

--how to handle multiple "creators" of different types

--Can a CE repeat?

And there are more issues.

The list should be maintained in a spreadsheet rather than a page, by one or two persons. We also need spreadsheet(s) for sources.

Please do not use comments directly on the page, it will be impossible to preserve the discussion (it might perhaps be possible to back up discussions, I don't see any way to back up comments on pages).
GeneJ 2011-12-22T06:34:52-08:00
Hi Geir,

SourceTypes / Spreadsheet.
If we work on a spreadsheet that will build generally into something that looks like the posted Zotero spreadsheet, I recommend we have a main page here on the wiki that links to the spreadsheet and also links to
(a) a wiki page for each "source type"--where we can post either examples or links to same and have organized discussions about that source type.
(b) a page for particular CE/Key/Field for which we know advanced solutions are needed (creators and roles) and advanced CE issues (duplicate or extend).

The spreadsheet can be added to the BetterGEDCOM GoogleDocs page, but then folks will have to request access.