Sources and Citations

Introduction


As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data with fields that are more specific than those in GEDCOM, or use standard GEDCOM with custom formatting conventions, and/or export data that do not conform to GEDCOM. Below we attempt to catalog a host of BetterGEDCOM wiki pages and discussions about sources and citations. The pages discuss alternatives for extensions of GEDCOM that can solve this problem, additional functionality related to sources and citations, the implications of various citation practices and also solutions for automatic download of data about sources (metadata) from internet services for use in citations. The data content in the sources is outside the scope of this page.

Please note that a single page can contain many discussions; only a few discussions are listed separately below.

Please help us complete this page: send page links with summaries to gthorud or GeneJ.

Subject categories
  1. Most recent work
  2. Previous major discussions and working documents
  3. User Requirements
  4. Current programs, GEDCOM and problems
  5. Citation styles and guides
  6. Citation methods, practices and examples
  7. Other solutions for Biographic metadata and Citations (not genealogy specific)
  8. Terminology
  9. Other, not categorized

The same wiki page or discussion thread may appear in several subject categories.

1. Most recent work

A Data Model for Sources and Citations. This page proposes a model for recording Sources and Citation data supporting international use. It defines entities for transfer of instances of S&C data and definitions of Master Source Types and Citation Element Types. Template definitions control the output of citations in reports and conversion of data. The entities may be combined in several ways in a file format or other transfer solution, e.g. web services.
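The relationship between Master Source Types, Citation Element Types and templates can be illustrated with a minimal sketch. It is not taken from the Data Model page: the class names, the `render_citation` function and the template pattern are hypothetical, and the only values used are the title and page that appear in the discussion further down this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceType:
    """Hypothetical Master Source Type: a name plus the element types it allows."""
    name: str
    element_types: tuple


@dataclass(frozen=True)
class Template:
    """Hypothetical template: a format string over Citation Element Type names."""
    source_type: str
    pattern: str


def render_citation(source_type, template, elements):
    """Check the elements against the source type, then fill in the template."""
    unknown = set(elements) - set(source_type.element_types)
    if unknown:
        raise ValueError(f"not defined for {source_type.name}: {sorted(unknown)}")
    return template.pattern.format(**elements)


# Assumed definitions, for illustration only.
BOOK = SourceType("book", ("title", "page"))
BOOK_FOOTNOTE = Template("book", "{title}, p. {page}.")

citation = render_citation(
    BOOK, BOOK_FOOTNOTE, {"title": "Trasks of Nova Scotia", "page": "34"}
)
print(citation)
```

Keeping the template separate from the element values is what would let a receiving program reformat the same data in its own citation style.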

2. Previous major discussions and working documents


3. User Requirements


4. Current programs, GEDCOM and problems


5. Citation styles and guides

6. Citation methods, practices and examples


7. Other solutions for Biographic metadata and Citations (not genealogy specific)


8. Terminology

Unfortunately the terminology used on the various pages is not consistent. Some terms are defined in the Definitions page (link below and on the BetterGEDCOM wiki left side navigation bar); some of the documents, pages or discussions define their own terminology. More work is needed to come up with a consistent terminology.

9. Other, not categorized


a. GENTECH Genealogical Data Model (2000). Gentech, among other things, has a multilevel source model.

b. Robert Raymond, "Interoperable Citation Exchange 2009-03-11.pdf" (2009). Presentation sometimes referred to as "I.C.E." BetterGEDCOM discussions below.

c. RootsTech 2011 Wikipage, "Open Interactive Sources."

d. References to John H. Yates's work to develop a standardized implementation of Evidence Style.

e. GeneJ's personal blog for articles:

f. Mark Tucker, ThinkGenealogy, "Better Online Citations (series)."

g. Randy Seaver's blog, Geneamusings, articles about his attempts to create and sync sources and citations. [will need to catalog and post these... ]

Comments

louiskessler 2011-12-12T08:13:11-08:00
Incorrect and Misleading Assumption
You start off this Sources and Citations article with:

"it is not possible to transfer the source and citation data recorded in many genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process."

That is both incorrect and misleading. You make it appear that GEDCOM itself is the problem. GEDCOM is not perfect, but is adequate to allow the transfer of almost all of the data correctly.

The problem is with the program developers who do not export their data correctly to GEDCOM, and who do not import their data correctly from GEDCOM, and do not use the same data structures internally to represent their source and citation data.

Developers want to export their data their way. They do not want to be forced to fit their data into some standard. So they squeeze their data into GEDCOM in whatever manner they see works for them that is most compatible with their data structure.

Doing so, the data does not become lost or distorted. Example: RootsMagic exports their source and citation data to an (almost legal) GEDCOM file with user-defined tags used for their template definitions. They can then read that information in again PERFECTLY!

When I look at a RootsMagic4/5 GEDCOM with Behold, I can see all the source and citation data. It is all there. I have to do work if I want to interpret it, though, since they are not true to GEDCOM but are refitting their own data structures into a GEDCOM template.
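To make the point concrete, here is a rough sketch of how a reader might split GEDCOM lines and pick out the user-defined tags, which GEDCOM requires to begin with an underscore. The sample lines are made up for illustration and are not actual RootsMagic output: `_TMPLT` is the structure name mentioned elsewhere on this page, while `_ELEM` and the function name `parse_line` are purely hypothetical.

```python
# Made-up sample, not real RootsMagic output; _ELEM is a hypothetical tag.
SAMPLE = """\
0 @S1@ SOUR
1 ABBR Trasks of Nova Scotia
1 _TMPLT
2 _ELEM Title: Trasks of Nova Scotia
"""


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    tokens = line.strip().split(" ")
    level = int(tokens[0])
    rest = tokens[1:]
    xref = None
    # A record line carries its cross-reference id before the tag.
    if rest[0].startswith("@") and rest[0].endswith("@"):
        xref, rest = rest[0], rest[1:]
    return level, xref, rest[0], " ".join(rest[1:])


records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]

# Tags with a leading underscore are user-defined extensions.
user_defined = [tag for _, _, tag, _ in records if tag.startswith("_")]
print(user_defined)
```

A reader can always see such tags; interpreting them still requires knowing the exporting program's conventions.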

We can't tell developers how to design their data structures. Each will do it their own way. So they will always be different.

The only way a BetterGEDCOM will work to improve this transfer process is if it is designed so well that all developers decide to do the work to transfer their own data structures properly and accurately to the BetterGEDCOM format. And then they must read in that format properly and accurately and convert it to their own internal structure.

Convincing developers to do this is the Herculean task you are addressing. If they don't, it doesn't matter whether you use GEDCOM, BetterGEDCOM, or PerfectGEDCOM. None will transfer the data if the developers don't follow it.

Louis
ttwetmore 2011-12-16T08:52:38-08:00
@Adrian,

I agree with you. The reason I’m asking is because I’m wondering whether BG would need a special notation for assertions. I don’t believe this is necessary.

I believe that an evidence record (e.g., a persona) is an overall assertion about the existence of a person, and that the source record that it refers to provides the justification or “proof” of the assertion. Say the evidence provides the person’s name and birth date, so the persona record might be:
<person id="i1">
  <name> Hannah Trask </name>
  <birth> <date> 23 March 1797 </date></birth>
  <sourceref id="s1" page="34"/>
</person>
Ignoring the source reference I would say this persona is an assertion, and the source reference provides its “proof.” If you want to say that the persona is a complex assertion, made up of a name assertion and a birth date assertion, that makes sense too. The source record the reference points to would be something like:
<source id="s1">
  <title> Trasks of Nova Scotia </title>
  ...
</source>
The source record is also an assertion, about the existence of a book in this case, but I don’t feel any compulsion to provide a proof of the book’s existence, so the buck stops here.

Do we need a more complex mechanism for assertions or is this enough? I mean, is it sufficient just to be able to look at the data in a BG file and say, “this is an assertion,” or “that isn’t an assertion” and that’s all?
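As a rough illustration of how software could follow the source reference from the persona to its source, here is a minimal sketch using Python's standard XML parser. The `<records>` wrapper element and the `describe` function are assumptions added so the two records can sit in one well-formed document; the elided part of the source record is simply left out.

```python
import xml.etree.ElementTree as ET

# The <records> wrapper is an assumption, added only to make one document.
DOC = """
<records>
  <person id="i1">
    <name> Hannah Trask </name>
    <birth> <date> 23 March 1797 </date></birth>
    <sourceref id="s1" page="34"/>
  </person>
  <source id="s1">
    <title> Trasks of Nova Scotia </title>
  </source>
</records>
"""

root = ET.fromstring(DOC)

# Index the source records by id so references can be resolved.
sources = {s.get("id"): s for s in root.iter("source")}


def describe(person):
    """Return the persona's name with the source that justifies it."""
    name = person.findtext("name").strip()
    ref = person.find("sourceref")
    title = sources[ref.get("id")].findtext("title").strip()
    return f"{name}: {title}, p. {ref.get('page')}"


summary = describe(root.find("person"))
print(summary)
```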
GeneJ 2011-12-16T09:06:46-08:00
I suggest that if you don't understand my use of the term "assertion level" then you may not have been actively engaged with any number of genealogy software programs with expanded source and citation systems--groups of different fields are entered at the level of a master source/source, and other fields are accessed at the time that master source/source is cited. We tried on the term "citation level" to describe the entry process relative to that latter group. "Citation level" doesn't fit well--it conflicts with the now established work in the field of genealogy. Tada ... "Assertion level."

I don't care whether you are entering a "persona" a "conclusion" (or the pfact or a turkey-baster for that matter). --GJ


P.S. The Evidence Explained definition for assertion is "assertion: a claim or statement of 'fact.'"

I believe Tom opted to use a different definition in the entry to the Definitions, but the meaning of the word "assertion," was previously discussed. http://bettergedcom.wikispaces.com/message/view/Pending+Definitions/35150918
AdrianB38 2011-12-16T14:42:51-08:00
"The reason I’m asking is because I’m wondering whether BG would need a special notation for assertions. I don’t believe this is necessary."

Neither do I. "Assertion" is synonymous with PFACT, relationship, etc., so needs no special notation beyond what already exists. It's just another name as far as I'm concerned, albeit one that sweeps up PFACT, entity-existence, entity-relationship, etc., so has attractions as a _term_ to me.

Gene's post above seems to indicate she means the same as me by assertion, so again assertion-level simply refers to the stuff used to justify a PFACT etc.
louiskessler 2011-12-16T15:50:25-08:00
Yes, I agree with Tom's restatement:

"It is not possible to transfer the source and citation data recorded in several genealogy programs to other programs. Data become lost or changed during the transfer process. The reason is that these programs have either extended their source and citation data beyond standard GEDCOM, or use standard GEDCOM with custom formatting conventions, or export data that do not conform to GEDCOM."

If you remove the extra words in Geir's statement, he originally said:

"It is not possible to transfer ... data ... using GEDCOM."

By taking off the "using GEDCOM", the blame comes off GEDCOM (which it should) and is placed on the programs that have extended GEDCOM their own way, which they shouldn't have.

Louis
ttwetmore 2011-12-16T16:32:37-08:00
Adrian, you said:

"Assertion" is synonymous with PFACT, relationship, etc., so needs no special notation beyond what already exists. It's just another name as far as I'm concerned, albeit one that sweeps up PFACT, entity-existence, entity-relationship, etc., so has attractions as a _term_ to me.

Thanks. I believe the same. I was confused by the term "assertion level". I think I now understand it as the citation elements that locate evidence within sources, rather than the citation elements that describe sources. I make the same distinction in the DeadEnds model, where the citation elements that describe the sources are in the source records, and the citation elements that describe where the evidence comes from in the sources are in the source reference.

I don't think this is exactly what GeneJ means, because she puts non-source information into her source records (e.g., reference notes, conclusions, bits of evidence), and she may mean this information as the assertion level.
gthorud 2011-12-17T19:14:44-08:00
First of all, I wonder why the whole discussion has moved from the Data Model page to this page, and I wonder why we are discussing all these issues in a topic with the title "Incorrect and misleading assumptions" – this has led to just another mess. Please create new topics for new issues.

Louis suggests to have a list of Citation Elements ready before Rootstech. I don't think that is realistic, such a job will take many months and may even take years if you want to have an 80% COMPLETE international solution.

Tom (or should I write Tmo, he clearly makes a point out of spelling a foreign name wrong) writes that the only job of the source reference – I assume he means reference note – is to identify where information was found. I do not agree with this limitation, current practice allows for inclusion of summaries, extracts and reasoning in a reference note – and there is no reason to prevent that. What I see Tom doing is to tailor user requirements to his technical E&C solution – it should be the other way around.

It is interesting to note that Tom accuses those who do not agree with him of "trivializing BG", whatever he means by that. I am not able to take such arguments seriously.

I will write about splitting and lumping elsewhere.
ttwetmore 2011-12-17T21:39:54-08:00
I don't think there is any way to control how discussions occur on this wiki.

I didn't mean reference note when I said source reference, though they are similar.

Misspelling your name was not intentional; I apologize.

I did not tailor those user requirements to fit my model. I designed my model to fit the user requirements as I have determined them to be during twenty years of using genealogical software and imagining how I would want more advanced genealogical software to work. Possibly my imagination is too limited, however, to see the full set of user requirements.

Summaries of information from sources, if that information has not been directly extracted and placed in evidence records, belong in notes. My source reference structure allows these notes, as I'm sure does the reference note.

Extracts belong in evidence records that partition the information into units that describe persons and events. In models without evidence records, extracts, if they are to be put into the database, would have to be put in the source records or the source references. This could be done as notes, but then the information is not structured well enough for the software to deal with it. Louis has suggested that the extracts could be put into person or event based structures that are kept within the source records. The only real difference between my recommendation for evidence records and Louis's recommendation for evidence structures in source records, is that in my case the information is in separate records, and in Louis's case the information, basically identical in content, is found in the source records. I would say that this one difference is the only major disagreement between Louis and me. It boils down to the fact that Louis doesn't see any purpose for those independent records, whereas I see the whole evidence and conclusion research process as needing them.

Reasoning, if it has to do with figuring out what specific evidence means, belongs as notes in the source references, which I guess, would be how it is done in reference notes.

Reasoning, if it has to do with concluding which sets of evidence records refer to the same person, belongs in the conclusion references in the conclusion records.

The real problem with reasoning is worrying about whether it can appear in a structured form that software can recognize and work with. I am stumped by this problem. Currently I can only imagine that reasoning in a conclusion reference be just text that describes why the researcher has made their decision about which evidence applies to which persons, how they have resolved discrepancies and gaps in the combined evidence, and how well they believe they have answered their research questions.
louiskessler 2011-12-18T09:01:11-08:00

Geir said:

"Louis suggests to have a list of Citation Elements ready before Rootstech. I don't think that is realistic, such a job will take many months and may even take years if you want to have an 80% COMPLETE international solution"

Geir. We've been hacking around for over a year. The discussion is wonderful, but we can discuss until the cows come home.

We've got to make the attempt to formalize something. It need only be a 0.1 draft version, but it has to be something. It must be simple and understandable and shouldn't be expected to include everything.

Once we've got something, it can be changed or expanded. But it will give everyone a focus and a basis for what the potential must be.

And believe me, we'll all feel a sense of accomplishment when BetterGEDCOM can announce that it has done something.

Otherwise, we'll just be like the GEDCOM mailing list that has discussed GEDCOM ad infinitum for 20 years and produced nothing. That's not what I want to spend my time doing.

My proposal for getting started is simple. The initial template is GeneJ's Zotero spreadsheet. For my recommended plan of attack, see: http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48362656

Louis
gthorud 2011-12-18T17:46:16-08:00
Tom,

This is just a repetition of a discussion we had in the spring.

The practice of referring to sources with extracts, summaries and arguments in reference notes must be more than a century old. I do not see any reason to prevent that, and no one has presented any objective argument against it, other than "Because I think ….".

Summaries – just have a look in e.g. Evidence Explained for use in reference notes, and there are plenty of examples in the genealogy literature.

In your paragraph about Extracts you write "This could be done as notes, but then the information is not structured well enough for the software to deal with it." What do you mean by that? An extract that you would present in a reference note is text, why can it not be handled (this goes without me saying they should go in notes). You already have TEXT_FROM_SOURCE at the source_record and source_citation level in Gedcom – what is your problem?

When a solution for evidence (information representing the content in sources) has been developed, I would like to see a way to import parts of an e.g. a transcribed source as an extract into a reference note, but it has to be done in a way that does not require a program to implement the evidence solution internally (and this does not mean that I do not want to see a solution for evidence in transcripts, image, tabular or codified form) unless you are saying that BetterGEDCOM cannot have a solution for Sources and Citations before it has a solution for evidence.

Where have you found all these rules that you state about whether the various things should be put in inline text (or wherever your notes end up) or in reference notes?

Reasoning in structured form that software can recognize… well, I leave that problem to you.
ttwetmore 2011-12-18T21:07:52-08:00
In your paragraph about Extracts you write "This could be done as notes, but then the information is not structured well enough for the software to deal with it." What do you mean by that?

Say you have an item of evidence that says “Hannah Trask was born on 18 October 1789.” You create a source record or a reference note where you describe the source. How are you going to extract that fact about Hannah Trask? Are you going to add that sentence to a reference note? If you do that software cannot give you any support for finding that fact later. It’s just text in a record, and the best you can do is some kind of text search.

But if you were to create a persona record from that sentence, and have that persona record refer to its source, then you have an object in your database that your software can use. You can ask your software to please find all the evidence recorded about persons named Hannah Trask, and this Hannah Trask persona will show up in the list of all Hannah Trask personas. Your software can give you a table of all the Hannah Trasks with all the key info you have found about them. You can see cleanly in front of you in that table, all the Hannah Trasks that seem to be the same person, and those that don’t seem to be. You can immediately see patterns that build into conclusions, and you can immediately form hypotheses about the different Hannah Trasks, and your software can give you complete support in grouping together different Hannah Trask personas into conclusion persons. How are you going to do that if all the evidence information you have extracted about Hannah Trasks are stuck in notes in reference notes? How can your software help you? It’s no better than shuffling through a deck of real 3x5 cards by hand.
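A minimal sketch of that kind of support, assuming personas are held as simple structured records. The `Persona` class, the function names, the ids and the source ids other than `s1` are hypothetical; the two Hannah Trask birth dates are the ones used as examples in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record: one person as described by one source."""
    id: str
    name: str
    birth: str
    source_id: str


# Illustrative data; ids and source ids (apart from s1) are made up.
personas = [
    Persona("i1", "Hannah Trask", "23 March 1797", "s1"),
    Persona("i2", "Hannah Trask", "18 October 1789", "s2"),
    Persona("i3", "Mary Roe", "", "s3"),
]


def find_personas(records, name):
    """Return every persona whose name matches, ignoring case."""
    wanted = name.casefold()
    return [p for p in records if p.name.casefold() == wanted]


def tabulate(records):
    """Build table rows of the key facts, ready to compare side by side."""
    return [(p.id, p.name, p.birth or "?", p.source_id) for p in records]


hits = find_personas(personas, "hannah trask")
for row in tabulate(hits):
    print(row)
```

The same query against free text in reference notes would have to fall back on plain text search, which is the contrast being drawn here.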

An extract that you would present in a reference note is text, why can it not be handled (this goes without me saying they should go in notes). You already have TEXT_FROM_SOURCE at the source_record and source_citation level in Gedcom – what is your problem?

I just explained that.

When a solution for evidence (information representing the content in sources) has been developed, I would like to see a way to import parts of an e.g. a transcribed source as an extract into a reference note, but it has to be done in a way that does not require a program to implement the evidence solution internally (and this does not mean that I do not want to see a solution for evidence in transcripts, image, tabular or codified form) unless you are saying that BetterGEDCOM cannot have a solution for Sources and Citations before it has a solution for evidence.

I have been arguing from the start of BG that the use of persona records and evidence event records is the best possible solution for handling evidence and records-based genealogy. The solution exists. I am not waiting more of your years when an obvious and well-understood solution is staring us in the face. Do you remember all the references I have made to the nominal record linking work that has been done for the past 40 years? There is a long and well-established scientific tradition of using persona records for family reconstructions and many other tasks that require linking together records from different types of sources (which is what genealogical research is). There is a trail of academic papers in existence about this. I mentioned web sites where classical papers of the field can be found and read. Papers going back at least to the 70’s. The one common thread of every one of these papers, of every one of these efforts to find ways of linking persons, is to extract the evidence into persona form so that the data can be processed. I am talking about “records-based” processes here, processes that have been around a long time. The fact that vendors of today’s genealogical systems only seem to understand the conclusion nature of genealogy, is no excuse for BG to ignore the entire body of work that has been done applying software to the records-based area.
louiskessler 2011-12-18T22:38:36-08:00

... and I'm sorry Tom. Every time you make a point that the Persona is the best way to do it, I have to point out that I prefer including the names and event data in the Source Detail (without interpretation).

These can be searched the same way as your Persona can.

As we've discussed earlier ( http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32728224 ), our methods are very nearly the same, except I prefer not to use the Persona.

Louis
ttwetmore 2011-12-19T01:44:52-08:00
@Louis,

Yes, I know, and I continue to be amazed that you don't see the value of the Persona concept. I wonder whether you have ever considered all the successful work done in nominal record linking. I keep hoping you will think about the problems that must be addressed by records-based genealogy, and come around to the right side!! I continue to maintain that the Persona is the one key concept needed to elevate genealogical software into the research-quality domain. I suppose there are worse things to worry about, however!
ttwetmore 2011-12-14T21:38:48-08:00
@GeneJ,

I read that. It didn't help. In fact it stimulated my comment. You don't define "assertion level" there. You state that something that I don't understand is an example of one.
GeneJ 2011-12-14T21:54:24-08:00
@ Tom,

We had a thread back in August that discussed the term "Assertion Level." See the thread below and the graphics on the attached page.

http://bettergedcom.wikispaces.com/message/view/Software+Citations/41147303

If you feel you need to throw your hands up in the air, feel free. --GJ
ttwetmore 2011-12-14T22:32:34-08:00
Thanks. I'm not heading off on this tangent. It has nothing to do with the problem of structuring sources, referencing sources, citation elements, and generating citations. The issue of representing conclusions is an entirely different area. At the rate we're going here I doubt we'll ever get there.
louiskessler 2011-12-14T22:45:48-08:00

AdrianB's definition:

"assertion level data (i.e. that which does NOT come from the source record)"

Well from my way of thinking, everything in the Source and in the PAGE Where-within-source and for that matter the entire Source Reference ONLY should contain information from the source record. There should be no interpretation. It should only be the raw data.

Any information NOT in the source record has to be notes or conclusions that are placed with the conclusion data. None of my posts about Keys/Values Source Types and Citations had anything to do with any of that.

Which is why I too am confused by this term.

Louis
louiskessler 2011-12-14T22:54:59-08:00

... The one exception to this in GEDCOM 5.5.1 is the QUAY, which is an assessment of the Quality of the source record. I feel that should not be there, but moved out to be with the conclusions.

But this is a completely different matter that will get us way, way, way off topic if anyone chooses to continue in this thread. :-(

Louis
ttwetmore 2011-12-14T23:22:14-08:00
@Louis,

I agree with you 100%. There seem to be some misunderstandings about where conclusions belong.

I know exactly what's going on. In the software systems of today, there is no good place to put actual evidence data, and there is no good place to put conclusions. Therefore, some people have figured out how to use source records in some programs to do quadruple duty, holding source information, reference notes, evidence and conclusions. Those who do it this way do not understand that by adding evidence records and a proper approach to handling conclusions, we can finally let sources be sources, and let the other three types of information go to where they belong. They are legitimately concerned that by changing what goes in source records they might lose some of the advantages they have gained by essentially redefining the purpose of the source record to fit their needs. And what is very unfortunate in my opinion is that some cherish this approach so much, and are so sure that it is the perfect solution, that they can never agree to a change. This is too bad, and I believe it can only lead to the trivialization of BG. And I am tired of people criticizing the ideas without presenting any solid alternatives. All I get are references to old discussions that barely apply to the subject at hand. You and I have given full-bodied examples of how this approach to sources and citations works. There have been no alternative approaches presented, and certainly no other complete examples of how to hold source and citation data in an archive file. Anyone who would take the time to read the DeadEnds model, carefully enough that they reach an understanding of it, will see the proper places for evidence, for source info, for notes, and for conclusions.
AdrianB38 2011-12-15T14:17:24-08:00
Tom
Re (1) "two level source .... the accident report you are interested in would be a source and it would have a source reference to the annual report, and that source reference would hold the page"

That's absolutely fine. In fact, after I'd originally posted, I worked that very idea out and it was going to be my counter proposal if you still didn't want Page in a Source's details. Clearly I'd misinterpreted you - though my excuse is that in GEDCOM the Page would be ultimately subsidiary to the level 0 Source even in your clarified view.

Re (2) "I think it is important to realize that this approach, with sources and source references, is wholly predicated on the idea that we will have evidence records in the database." OK - I wasn't making that assumption.
AdrianB38 2011-12-15T14:37:36-08:00
Re "assertion level data":
My understanding of the term is that it refers to data that is an attribute of, or a relationship to / from, an assertion, where assertion means any property, fact, attribute, characteristic, trait, of an entity, relationship to / from an entity or even existence of an entity.

Thus, if I've got this right, the source reference for a date of baptism tells us (say) which source is used for the date, where within that source, etc. That's an assertion level source reference in MY understanding.

If we have a 2 level source, e.g. a series of baptismal entries, each with their own source record, beneath a parish register with its own source record, then the source records for the baptismal entries each have a source reference that says which parish register the entry is in, where in it, etc. That's I guess a source-level source reference. In MY book.
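The two-level arrangement can be sketched as source records that optionally carry their own source reference. This is only an illustration of the idea under assumed names: `Source`, `SourceRef`, `citation_chain`, the ids and the location "entry 12" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical source reference: which source, and where within it."""
    source_id: str
    where: str = ""


@dataclass(frozen=True)
class Source:
    """Hypothetical source record; 'parent' is a source-level source reference."""
    id: str
    title: str
    parent: Optional[SourceRef] = None


# Made-up records: a baptismal entry located inside a parish register.
sources = {
    "s1": Source("s1", "Parish register"),
    "s2": Source("s2", "Baptismal entry", SourceRef("s1", "entry 12")),
}


def citation_chain(ref, records):
    """Walk from an assertion-level reference up through the parent sources."""
    parts, seen = [], set()
    while ref is not None:
        if ref.source_id in seen:
            raise ValueError("circular source reference")
        seen.add(ref.source_id)
        source = records[ref.source_id]
        parts.append(f"{source.title} ({ref.where})" if ref.where else source.title)
        ref = source.parent
    return " in ".join(parts)


# The assertion-level reference, e.g. for a date of baptism.
chain = citation_chain(SourceRef("s2"), sources)
print(chain)
```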

One issue is that we still don't have a name for the concept of any property, fact, attribute, characteristic, trait, of an entity, relationship to / from an entity or even existence of an entity. PFACT only covers part of it (not relations and not existence of entities) and besides, it'll never fly as a term to be used by normal people. Much as I look with concern at such a GENTECH-like term, assertion seems closest. When / if we need such a term.
ttwetmore 2011-12-15T17:34:08-08:00
@Adrian,

Thanks. Is the title of a book an assertion? I thought we had settled on the terms citation element or metadata (though I objected to that) for properties of sources. I prefer the term attribute which can be applied to all properties of all records, but I gave up on that when citation element seemed to be the consensus. Based on your and GeneJ’s comments I assume that assertion is another synonym for citation element in the source context.
AdrianB38 2011-12-16T05:03:38-08:00
"Is the title of a book an assertion?"
If I put my mathematician's hat on, then any statement that I deem to be true is an assertion but that's probably not helpful.

I was interpreting (and I emphasise this is _me_) "assertion" as just relating to properties / relationships / existence of individuals, families, etc - the external, real-world stuff, not that within the study of genealogy (i.e. not source records).

Thus, the existence of someone who is married to Mary Roe is an assertion, that their name is John Doe is another assertion, that they are married to one another is an assertion. I wasn't envisaging the existence of a baptism certificate to be an assertion, nor that the title of a book is XYZ to be an assertion.

If it helps, an assertion is something that I assert to be true and that I therefore need to demonstrate (a.k.a. "prove") is true. The act of asserting something seems key to the existence of an assertion. We don't normally go to the trouble of proving that the title of a book is XYZ - we just take it as read that it is and say so. So I wouldn't take the title of a book as an assertion. (Of course, anyone sufficiently anally retentive to want _everything_ proving might, but I'm not going there.)

I guess also that any assertion needs to be either:
1. common knowledge
or
2. justified by a source reference (or citation if you're writing it directly) with any extra proof-statement as required to interpret the source reference.

That's what I have understood assertion to be - other people's mileage may vary. And that means that in my view, assertion is not a synonym for a citation element. Rather the collection of citation elements in the source context JUSTIFY an assertion.

This may or may not help but I think we got into the arena of assertion level by discussing properties of sources and whether some properties (citation elements) could appear both at the source record level and in the context of a source reference justifying an assertion about a person or family or place etc.
GeneJ 2011-12-16T05:56:34-08:00
It would be good to have reasonable agreement about the description of the problem on the front page for this section of the wiki.

Geir suggested,

"As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data beyond the limited capabilities of GEDCOM, and/or export data that do not conform to GEDCOM."

Louis responded, "Wrong … GEDCOM has one really nice construct in their sources that unfortunately few programs have decided to use. Do they not bother to read the GEDCOM specs? It is full extendibility in the source information."

Louis referred to his analysis of a RootsMagic export to GEDCOM (http://www.beholdgenealogy.com/blog/?p=874) and commented that "RootsMagic decided NOT to include a Title field. I guess they figured it was better for them to not allow the person to make up their own title. So what they do is generate the title by using the "Collection", "Repository", "Repository Location", "Format" and "URL" fields together, and separating any non-blank fields by semicolons."
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48033200#48078834

I disagree with Louis' assessment.

In about March, we described the RootsMagic export on the wiki this way, "RM exports some Citation Elements in standard GEDCOM tags, not in a very standard way. The source name goes in ABBR, and a few other elements are placed in standard tags. The text of a full footnote reference goes into the TITL. A short footnote and a bibliographical note are placed in proprietary fields. Two proprietary _TMPLT structures, one at the source and one at the citation level (where in source), contain source-value pairs for each element (field) for the source type, with the full name of the element as the type. An ID for the source type is stored in the _TMPLT structure in the SOUR record."
http://bettergedcom.wikispaces.com/Application+Data#RootsMagic

Choose your terminology, but RootsMagic's user interface recognizes an extended group of source types and fields, all of which are tied to a programmed citation (whether RootsMagic default or custom). On export to GEDCOM, the field values from the master source are written in a sort of freeform style to "title" and the values from RootsMagic's "Source Detail" fields (assertion level input) are exported to GEDCOM's field "PAGE."

--Create a master source in RootsMagic using one of the published source types (book, journal, etc.--a source type that includes fields "Author" and "Title"), cite the source (so that you can access the assertion level fields), and export same to GEDCOM (excluding the Extra details that are RM specific). You'll find the values for the master source fields (Author, Title, etc.) all export to the single GEDCOM field "TITL."

0 @I2@ INDI
1 NAME Nicholas /Firestone/
2 GIVN Nicholas
2 SURN Firestone
1 SEX M
1 _UID 69C542210EFF4CC0AB21DFC994833A9100FD
1 CHAN
2 DATE 1 DEC 2011
1 BIRT
2 DATE 25 MAR 1712
2 PLAC Berg, Alsace
2 SOUR @S5@
3 PAGE 241


0 @S5@ SOUR
1 ABBR PER* QTR Russell 1964
1 TITL George Ely Russell, "Founders of the American Firestone Family," <i>The N
2 CONC ational Genealogical Society Quarterly</i>, 52 (December 1964): .


--Create a master source in RootsMagic for a census and again cite that source (so that you can access and complete the assertion level fields); export same to GEDCOM (excluding the Extra details that are RM specific). You'll find the values for the master source fields exported to GEDCOM's TITL, and the values for the assertion level fields exported to GEDCOM's PAGE.

0 @I1@ INDI
1 NAME Asa Ruggles /Thomas/
2 GIVN Asa Ruggles
2 SURN Thomas
1 SEX M
1 _UID A42147C706A04C7EA850C1871BB1CC274291
1 CHAN
2 DATE 17 NOV 2011
1 BIRT
2 DATE CA 1799
2 PLAC Maine
2 SOUR @S2@
3 PAGE Madison Township||Madison twp.; 334; p. 410A (stamped); dwelling 121, family 121; Asa Thomas household; 27 December 2006

0 @S2@ SOUR
1 ABBR CEN IA Jones 1880 T9, roll 348
1 TITL 1880 U.S. census, Jones County, Iowa, population schedule, , ; digital i
2 CONC mages, <i>Ancestry.com</i> (http://www.ancestry.com : accessed ); citin
2 CONC g NARA microfilm publication T9, roll 348.
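
To make the point concrete, here is a minimal sketch (in Python, not part of any vendor's tooling) of what a receiving program actually gets from the census SOUR record above. The function name `parse_gedcom_values` is hypothetical, and the parser is deliberately flat: it keeps one value per tag and only folds CONC lines back into the preceding value, which is enough for this two-tag record but not for general GEDCOM.

```python
def parse_gedcom_values(lines):
    """Collect one value per tag for each level-0 record, folding CONC lines in."""
    records = {}
    current = None
    last_tag = None
    for raw in lines:
        parts = raw.lstrip().rstrip("\n").split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            # Level-0 lines look like "0 @S2@ SOUR": the xref id comes second.
            current = records.setdefault(parts[1], {})
            last_tag = None
        elif current is not None:
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
            if tag == "CONC" and last_tag is not None:
                # CONC continues the previous value with no separator added.
                current[last_tag] += value
            else:
                current[tag] = value
                last_tag = tag
    return records


SAMPLE = [
    "0 @S2@ SOUR",
    "1 ABBR CEN IA Jones 1880 T9, roll 348",
    "1 TITL 1880 U.S. census, Jones County, Iowa, population schedule, , ; digital i",
    "2 CONC mages, <i>Ancestry.com</i> (http://www.ancestry.com : accessed ); citin",
    "2 CONC g NARA microfilm publication T9, roll 348.",
]

source = parse_gedcom_values(SAMPLE)["@S2@"]
print(sorted(source))   # the only tags available to the importing program
print(source["TITL"])   # every master source field, fused into one string
```

The importing program sees only ABBR and TITL; the individual master source fields cannot be recovered without guessing at the exporter's punctuation conventions.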

Our blog, Randy Seaver's blog and the various BetterGEDCOM wiki pages (see the attached page) document earlier research about how the various vendors with expanded systems struggle to pass information via GEDCOM. Terry's TMG Tips does a nice job describing the TMG export (http://tmg.reigelridge.com/Sources-exporting.htm). I also wrote a post about programmed citations, see "Programmed Citations, a general overview."
http://bettergedcom.wikispaces.com/message/view/A+Data+Model+for+Sources+and+Citations/48011898

If Geir's description of the problem is "incorrect and misleading," then how do others feel the problem should be described? --GJ
ttwetmore 2011-12-16T08:23:21-08:00
Geir's statement seems very close to me. I would modify it slightly to:

It is not possible to transfer the source and citation data recorded in several genealogy programs to other programs. Data become lost or changed during the transfer process. The reason is that these programs have either extended their source and citation data beyond standard GEDCOM, or use standard GEDCOM with custom formatting conventions, or export data that do not conform to GEDCOM.
louiskessler 2011-12-14T18:05:42-08:00

GeneJ:

I was under the impression that you think Zotero is a wonderful system for citations. Do you use it in practise? Does it meet your needs? Or do you feel it lacking, because you keep talking about complications in structuring sources that Zotero doesn't handle?

If Zotero is an excellent citer, then why isn't its model and level of detail and simplicity of structure good enough for BetterGEDCOM?

The problem I have with making anything too flexible and/or too complicated is that it leaves both the program and the user doing what they want. That in itself prevents the one thing we all want - the data to transfer over precisely and unambiguously - and that is the current problem with GEDCOM.

I am trying to jump start this effort again.

Why can't we just start with a simple set of Source Types versus Keys and create Citation templates using them? It could be kept within manageable terms by using your wonderful Zotero spreadsheet as the initial structure.

Once we have that, then we will have our first tangible result. Then BetterGEDCOM will have produced its Version 0.1 and the organization will finally become significant.

Doing this one task will prove if this general idea is possible and how many of the basic citation templates can be built.

We can only expand and correct a starting model if we have a starting model.

But if you want to keep discussing all the exceptions and cases why any model cannot handle everything, then we'll keep on talking for another year without getting any further.

Louis
louiskessler 2011-12-14T18:20:20-08:00

... and re ambiguities, e.g. one genealogist thinks their high level item is a source and the other wants a low level one.

I say forget that right now. Just make a decision and define which Keys are most often Source and which are most often for defining the location in the source and go for it.

I don't want to describe how many times I've had to write and rewrite and rewrite an algorithm and the code for it in Behold. You can't get everything you want the first time. You have to start with something simple and then you can see how it works and correct it and build on it.

Louis
GeneJ 2011-12-14T18:20:44-08:00
@ Louis.

Gulp. Maybe I've missed something.

I'm saying that it's an unnecessary complication at this stage to force some set of fields to the master source and other fields to the assertion level. --GJ
louiskessler 2011-12-14T18:23:57-08:00

... and G-d I hate not being able to edit my typos in my posts after entering it too quickly.
louiskessler 2011-12-14T18:31:06-08:00

GeneJ:

Please explain again what "assertion level" means. Because, as Tom said and I agree, the Source Reference should not be considered evidence and should not have anything to do with asserting conclusions. It is simply a declaration that there is some material in existence that may be of use to someone.

The Source is the Item. The PAGE tag is the location in the item. This is simply done so that you can conveniently cite a source, and IBID it again with only the changed info.

e.g. in footnotes:

1. Book, Title, Author, Publisher, Page 14
2. IBID, Page 18
3. IBID, Page 42
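
As a rough sketch of that split (Python; the function name `footnotes` and the tuple layout are made up for illustration), the source is stated once and each later note only carries the location that changed:

```python
def footnotes(citations):
    """Number (source, page) pairs, shortening consecutive repeats of a source to IBID."""
    notes = []
    previous = None
    for number, (source, page) in enumerate(citations, start=1):
        lead = "IBID" if source == previous else source
        notes.append(f"{number}. {lead}, Page {page}")
        previous = source
    return notes


book = "Book, Title, Author, Publisher"
notes = footnotes([(book, 14), (book, 18), (book, 42)])
for note in notes:
    print(note)
```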

The evidence/conclusion process is a completely different matter. We should only be discussing Sources and Citations here.

Louis
ttwetmore 2011-12-14T18:42:01-08:00
@Adrian, responding to your comments:

1. The Annual Report for Railway Accidents in the UK. I have an interest in one accident report from the 1903 edition. I'd create a source record for the single report with a title of format something like that used for a article in a journal - but it would be nice (if not essential) to add the page number as its own item on the source record.

If the annual report is a collection of accident reports, then you can treat it as a two level source, which means the accident report you are interested in would be a source and it would have a source reference to the annual report, and that source reference would hold the page. At least that’s how I would do it.

2. An English parish register contains lots of baptisms, marriages and burials. I'd create a source record for each baptism, marriage or burial (splitting in action here....) in order to associate a note containing the identification logic with the specific data, rather than have a hundred identification statements in the source record for a whole register. Again, as part of identifying where in the register the baptism, marriage or burial is, I'd like to associate a page number with that source record. Again, currently it can be done without it - I usually put the page number as part of the transcribed text, but aren't we supposed to be codifying stuff properly?

I believe you aren’t seeing the value of the source reference idea. You can go either way on whether the register should be the bottom of the source tree or whether the individual items in the register are the bottom of a two-level tree. However you make that decision, the page number goes in the source reference to the source at the bottom level of the tree. Even if you had just the register level for a source, you still wouldn’t have 100 identification statements in the source; that information would be in the 100 different source references that point to the source. And those 100 source references would be located in the 100 different evidence records that you extracted from the source.

I think it is important to realize that this approach, with sources and source references, is wholly predicated on the idea that we will have evidence records in the database. If BG decides that it will just clean up the conclusion only model of GEDCOM into something a little more complete, and eschew the evidence level of data, then everything I’ve been promoting about these source and source reference ideas is out the window. I think the biggest problem that people are having understanding the ideas is that they are having difficulty understanding the paradigm shift between systems that only hold conclusion data, to systems that hold both evidence and conclusion data. With no evidence records, the source reference doesn’t make any sense, so if you want to keep any information at all from the source or about the source you have to stick it in the source record, so you are forced to create a separate source record for each item of evidence. Some of us seem so comfortable with this idea, because it is forced upon us, that we can’t see the problems that it creates, nor can we see how well we can clean up the problem with evidence records and source references.
louiskessler 2011-12-14T18:43:43-08:00

... and the Evidence/Conclusion process is where Tom and I disagree.

I state that the Source Reference can include the events as stated in the source - not as interpreted. Whereas Tom's concept is to create multiple levels of Personas to contain those events.

That has been discussed in detail in other threads dealing with Evidence and Conclusions.
GeneJ 2011-12-14T20:43:34-08:00

Hi Louis,

You wrote: Please explain again what "assertion level" means.

For this purpose and in GEDCOM terms, there are three fields at the assertion level: Page, Text and Media, but we're really just focused on Page and Text. In practice, I believe there really isn't much consistency to how those remaining two fields are used. We might as well call them mystery field a and mystery field b, since programs have renamed them and effectively split each into a varying number of sub-fields. It's easier for me to just refer to them as fields at the assertion level.

And then you wrote, "Because, as Tom said and I agree, the Source Reference should not be considered evidence and should not have anything to do with asserting conclusions. It is simply a declaration that there is some material in existence that may be of use to someone."

I'm thinking we are all getting a little tired ... because of course *every single field* in the master source/source/source_record and at the assertion level (how ever you want to number and name them) has something to do with asserting conclusions.

I've already said that the differences being discussed are unlikely to have anything to do with what you or Tom consider "evidence."

I see the _ibid._ reference, but I don't follow how that relates to defining rules for "fields" that must be applied at one level or another, beyond the one to many relationship. In Adrian's example, all the information he is going to refer to comes from one page, so there is not one "source" to many "pages" relationship.
ttwetmore 2011-12-14T21:02:21-08:00
@GeneJ, responses to your question, etc.

So, help me understand why you care where I commit a field? If a "source type" is a "birth certificate" then there will be a certain number of fields associated with it. (Ala, the Zotero spreadsheet.)

I’m guessing that to “commit a field” means to place a citation element in a record somewhere. So it sounds like you’re suggesting that different users should be able to put their citation elements in different places based on their preferences as to how sources should be structured. I believe such freedom would throw the whole notion of source templates for constructing citation strings out the door. I continue to strongly suggest that BG do my tasks 3 and 4, which is to determine a set of source types that we officially support, and a set of citation elements we officially support, with specifications as to which citation elements are recommended in which source types.

Let's pick a nice controversial field that is not on the Zotero spreadsheet--"Household ID" (US Census), which has a value "Asa Thomas household." Say I include that field in my "master source" and you record the field in "Source Details."

By “source details” I am guessing you mean a source reference.

If we both recognize the field, please help me understand why you care that I have entered that field/data in the master source, but you have entered the same field/data at the assertion level.

If you want to think of the record of the Asa Thomas household as a source unto itself, fine, then “Asa Thomas household” is probably the title citation element for that source. If you want to think of the census as the source, then “Asa Thomas household” belongs in the source reference from the evidence to the source.

If you don’t care about software support in generating your citation strings, then it makes very little difference how a user chooses to break up their “source tree,” and software could simply allow users to build their source records with any source type they enter in with any citation element types and values they enter in, with the source records structured to any number of levels they desire. But it will make a difference if BG embraces a fixed set of source types in order to solve the problem of generating citations with templates. I frankly don’t care how you would like to structure the source and citation information, but I hope you realize that, by allowing the flexibility you seem to be recommending, we put the whole citation generation scheme in jeopardy. Which leads me to wonder how important you believe the citation generation feature is.
ttwetmore 2011-12-14T21:22:02-08:00
@GeneJ, @Louis,

I also don’t understand what “assertion level” means. By context my guess is that citation elements are the things that are existing at the “assertion level”. If this is true then we have FOUR terms we are now using for the exact same concept (the concept being "information about a source"); those four being citation element, metadata, assertion, and attribute. I am throwing up my hands in despair.
GeneJ 2011-12-14T21:24:39-08:00
ttwetmore 2011-12-14T21:33:50-08:00
Why limit it to four? Why don't we add fact, property, characteristic and trait? Then I can coin another term: pfmaacet, which we could pronounce to rhyme with facet, which is also another synonym of the whole bunch.
GeneJ 2011-12-13T19:11:49-08:00
Hi Louis and Tom,

"...defining a set of source keys, source reference keys."

In the abstract, is there a reason to assume any particular elements would be defined for the "source" and then some different group for the "source reference"?

Or, does that mean that those most common/standardized templates (the 80-20 group) would have elements defined at your "source" and "source reference?"

While there are clearly some logical parameters for the kind of information that most frequently appears at one level or the other, for circumstances beyond that 80-20 rule it won't be hard to find some exception to many of the rules.*

--GJ

*Even if we don't consider lumper-splitter preferences.
ttwetmore 2011-12-13T21:38:03-08:00
In general terms ... my opinions on this are ...

A source record describes a source of evidence, and that source can contain a little to a lot of evidence.

But when we wish to cite where our evidence comes from we need to include the source, obviously, but we often need to be more specific in specifying where in the source, the particular evidence we are citing came from.

So, a source record, in my opinion, should describe an overall source, and all the citation elements we find in the source record do the job of describing that source. So if the source is a book, the citation elements in the source record for it are the obvious things like, title, author, publisher, publication year, ISBN.

Now, a source reference is something that connects an evidence record (e.g., a persona) to a source record, but not only must it connect to the source record, it must also specify where in the source the evidence came from. Specifying where is also done by citation elements, and in this case the obvious example is the page number.

We don't want the page number in the source record, because we're probably going to want to extract other evidence from the same book and that other evidence is going to come from different pages.

But page is an important citation element. It doesn't belong in the source record itself. Ergo, it belongs somewhere else, and the obvious place to put it is either in the evidence record itself or in the connection that the evidence record has to the source record. See this example (and since the last examples I did were using XML, this time I'll use GEDCOM):

0 @I1@ INDI
  1 NAME Daniel Van Cott /Wetmore/
  1 SEX M
  1 BIRT
    2 DATE 18 November 1791
    2 PLAC Hammond River, Kings County, New Brunswick, Canada
  1 SOUR @S1@
    2 PAGE 345
 
0 @S1@ SOUR
  1 TYPE BOOK
  1 TITL Wetmore Loyalists of the American Revolutionary War
  1 AUTH Daniel Hancock Wetmore
  1 PUBL Vermont Free Press, Saint Albans, Vermont
  1 DATE 2014
  1 ISBN 10-64987632

In this example the INDI record is a persona that includes only the information about a single person that was extracted from a single item of evidence in a book (which is what the definition of a persona is).

And the SOUR record is the source record for a book. So we have here two records, an evidence record and a source record. We need to connect the evidence record to its source. So in the INDI record there is a 1 SOUR line with a PAGE sub tag. This is the source reference I keep mentioning. It both points to the SOUR record by including its ID, but it also contains the citation element needed to locate the evidence in the source.

This is just one way to represent this connecting concept in a model, but this is a useful and easy to understand way to do it. Or the source reference could be modeled as a separate record type, and in a relational database it would have to be represented as a separate table (with person id, source id, and other columns for page and other citation elements). I much prefer simply treating the source reference as an "attributed pointer" inside one record that points to another. But as I said this concept can be represented in different ways.
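
For the relational variant, a minimal sketch using Python's built-in sqlite3 module might look like the following. The table and column names are assumptions chosen for this illustration, not a proposed schema, and only a page column is shown for the locating citation elements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE source (id TEXT PRIMARY KEY, title TEXT);
    -- The source reference as its own table: one row per
    -- (evidence record, source) link, carrying the locating elements.
    CREATE TABLE source_reference (
        person_id TEXT REFERENCES person(id),
        source_id TEXT REFERENCES source(id),
        page TEXT
    );
    """
)
conn.execute("INSERT INTO person VALUES (?, ?)", ("I1", "Daniel Van Cott /Wetmore/"))
conn.execute(
    "INSERT INTO source VALUES (?, ?)",
    ("S1", "Wetmore Loyalists of the American Revolutionary War"),
)
conn.execute("INSERT INTO source_reference VALUES (?, ?, ?)", ("I1", "S1", "345"))

rows = conn.execute(
    """
    SELECT p.name, s.title, r.page
    FROM source_reference AS r
    JOIN person AS p ON p.id = r.person_id
    JOIN source AS s ON s.id = r.source_id
    """
).fetchall()
print(rows)
```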

Remember, the whole job of the source reference is to connect an item of evidence with the source it came from, while also giving the details of where that specific item of evidence came from in the source.

So, in this example, we have five citation elements in the SOUR record (TITL, AUTH, PUBL, DATE, ISBN), and we have one citation element in the source reference (PAGE). They are all citation elements, but the first five apply to a source as a whole, while the last refers to a specific location in the source. I think all citation elements can be broken into these two main varieties, and the source and source reference breakdown is the best way to separate them.

Note especially, as I mentioned above, that by keeping the page number in the source reference, the SOUR record for the book can be used by all the personas we extract from the book, not just one.
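
The same idea as an in-memory sketch, treating the source reference as an "attributed pointer". This is Python with hypothetical class names; the second persona and its page number are invented purely to show the sharing.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    source_type: str
    elements: dict  # citation elements that describe the source as a whole


@dataclass
class SourceReference:
    source: Source  # pointer to the shared source record
    elements: dict  # citation elements that locate the evidence in the source


@dataclass
class Persona:
    name: str
    references: list = field(default_factory=list)


book = Source(
    "BOOK",
    {
        "TITL": "Wetmore Loyalists of the American Revolutionary War",
        "AUTH": "Daniel Hancock Wetmore",
    },
)
daniel = Persona("Daniel Van Cott /Wetmore/", [SourceReference(book, {"PAGE": "345"})])
# Invented second persona: same book, different page.
other = Persona("Second persona (made up)", [SourceReference(book, {"PAGE": "12"})])

shared = daniel.references[0].source is other.references[0].source
print(shared)
```

Both personas point at the one `book` object; only the PAGE element differs between the two references.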
louiskessler 2011-12-13T22:51:03-08:00

GeneJ:

Tom explained that very well. (Except I still shudder at his calling the Individual a "persona" - but then I also hate the word "repository" which makes me think of a garbage can.)

Let me describe in more detail what I think would work well specifically with regards to key/value pairs that I talked about earlier.

Looking first at the source, e.g.: the particular census, the book, the letter, the family bible, the personal interview.

One way I might do the keys for above examples (subject to you experts negotiating me into a better set of keys) are:

Source Type: Census; Year: yyyy; Place: pppp

Source Type: Book; Title: tttt; Author: aaaa; Publisher: pppp; Place: pppp; Year: yyyy;

Source Type: Letter; Date: dddd; Sender: ssss; Receiver: rrrr;

Source Type: Family bible; Title: tttt; Owner: oooo; (help me here...)

Source Type: Personal interview; Date: dddd; Place: pppp; Interviewee: iiii; Interviewer: iiii;

So here I've got a number of different keys:
- Source Type
- Title
- Author
- Publisher
- Place
- Year
- Date
- Sender
- Receiver
- Owner
- Interviewee
- Interviewer

These together with hopefully not too many more, should be able to completely define any source.

The main key is the Source Type which is always defined for every source. All the other keys will depend on what the Source Type is and are specific to a Source Type.

What GEDCOM tried to do was stuff these into three GEDCOM tags: AUTH, TITL and PUBL, with Author under AUTH; Publisher, Place and Year under PUBL; and almost everything else, including the Source Type, under the TITL tag. That's why after all I didn't think RootsMagic was that wrong in stuffing their Source keys under the TITL tag.

But I don't think 3 keys are all that's needed. In my example above, for the 5 sources, I've used 12 different keys already without even trying. I think we need to be careful and define the minimum number of keys here that will define every source accurately. Keys meaning the same thing can be used in different source types. We would hopefully end at no more than 50.



Now let's look at the PAGE tag, which I'll prefer to call the Where-within-source. Here we need to make up a new set of keys just for the Where-within-source. The particular keys are again defined by Source Type. Here we can again try to reuse keys when possible if they have the same meaning in different source types. But hopefully these keys for the Where-within-source are different from the keys in the Source (above).

Source Type: Census; Enumerating District: eeee; Page number: pppp; Line number: llll; Dwelling number: dddd; Family number: ffff; (maybe only some are applicable)

Source Type: Book; Page number: pppp;

Source Type: Letter; Page number: pppp;

Source Type: Family bible; Page number: pppp;

Source Type: Personal interview; Time from start: tttt;

So here, I just have these keys:
- Enumerating district
- Page number
- Line number
- Dwelling number
- Family number
- Time from start

but maybe there's 50 here as well.

You can think of the Where-within-source keys as the stuff you still have to specify when you IBID something. (Hope that helps!)
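
A rough sketch of how the two key sets might be written down and checked, in Python. The key names are the ones listed above; the dictionary layout and the helper name `unknown_keys` are assumptions made up for this illustration.

```python
# Keys that describe the source itself, per Source Type.
SOURCE_KEYS = {
    "Census": {"Year", "Place"},
    "Book": {"Title", "Author", "Publisher", "Place", "Year"},
    "Letter": {"Date", "Sender", "Receiver"},
    "Family bible": {"Title", "Owner"},
    "Personal interview": {"Date", "Place", "Interviewee", "Interviewer"},
}

# Keys that describe the Where-within-source, per Source Type.
WHERE_KEYS = {
    "Census": {
        "Enumerating district",
        "Page number",
        "Line number",
        "Dwelling number",
        "Family number",
    },
    "Book": {"Page number"},
    "Letter": {"Page number"},
    "Family bible": {"Page number"},
    "Personal interview": {"Time from start"},
}


def unknown_keys(source_type, source_values, where_values):
    """Return keys that are not defined for this Source Type at the level used."""
    bad_source = set(source_values) - SOURCE_KEYS[source_type]
    bad_where = set(where_values) - WHERE_KEYS[source_type]
    return bad_source | bad_where


all_source = set().union(*SOURCE_KEYS.values())
all_where = set().union(*WHERE_KEYS.values())
print(all_source & all_where)  # keys used at both levels, if any
print(unknown_keys("Book", {"Title": "tttt", "Sender": "ssss"}, {"Page number": "14"}))
```

With the keys listed so far, the two sets do not overlap, which is the property hoped for above.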



Now this is how I think this can work with your citations (and I might get some of it wrong because I'm not an expert at citations):

You have multiple citation templates for each Source Type. You have different ones for the various formats, e.g. primary citation, subsequent citation, endnote, footnote, bibliographic entry, etc. (you know what they are, I don't).

A template for one might look like:

Book (Primary Citation): $Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page

If we define these templates in terms of the keys in the Sources and the Where-Within-Sources in a simple programming-like definition such as the above (including variables beginning in $ for the keys and HTML-like markup for style), then EVERY PROGRAMMER will know perfectly and unambiguously how to program this exactly the same way!!!!!!!! (as many exclamation marks as you want here)
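
To show how little code that takes, here is a minimal sketch using Python's standard string.Template, which happens to use the same $name syntax. The sample values are borrowed from Tom's book example, with PUBL split into Publisher and Place; that split, the `render` helper, and using "Page" as the key name for the page number are all assumptions for illustration.

```python
from string import Template

# Book (Primary Citation), written exactly as in the template above.
BOOK_PRIMARY = Template("$Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page")


def render(template, source, where):
    """Fill a citation template from Source keys plus Where-within-source keys."""
    # substitute() raises KeyError if the template needs a key that is missing.
    return template.substitute({**source, **where})


source = {
    "Author": "Daniel Hancock Wetmore",
    "Title": "Wetmore Loyalists of the American Revolutionary War",
    "Place": "Saint Albans, Vermont",
    "Publisher": "Vermont Free Press",
    "Year": "2014",
}
citation = render(BOOK_PRIMARY, source, {"Page": "345"})
print(citation)
```

A missing key fails loudly here rather than producing a citation with a hole in it, which seems the safer default for data transfer.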

Louis
louiskessler 2011-12-13T23:02:36-08:00

And GeneJ,

As far as I'm concerned, your Zotero spreadsheet right now is just about perfect. It's at: http://bettergedcom.wikispaces.com/file/view/Zotero+Fields_alpha_97-04v.xls/243233631/Zotero%20Fields_alpha_97-04v.xls

In that spreadsheet, you've got the Source Types on the top row, and you've got the Keys on the left.

All I think that needs to be done is to separate the Source Types into those that describe the Source, and those that describe the Where-within-source.

And we may be close to done ... other than the inevitable arguing on every single entry as to whether it should or should not be there.

Louis
louiskessler 2011-12-14T07:06:40-08:00

I'm sorry. I meant to say:

"All I think that needs to be done is to separate the KEYS into those that describe the Source, and those that describe the Where-within-source."

You know, this will both meet the goals of defining sources for BetterGEDCOM, and (if the templates can be refined and developed) will also satisfy the needs of BetterGEDCOM's commitment to SourceTemplates.org.

It sure would be nice if we could commit to developing these, maybe to have a draft ready prior to RootsTech.

Louis
GeneJ 2011-12-14T09:04:19-08:00
Hi Louis and Tom,

I'm thinking that all of this will only seem simple when we are looking back on it in the rear view mirror. Can't tell you how much I appreciate that you're hanging with me.

Wanting to standardize the elements in the source/master source/source_record is a valiant effort. In theory, you can write such a standard but it likely won't work across the standard 80-20 source types and it definitely won't hold in practice.

Tom somewhat described the problem during the last Developers Meeting when he described the thought process to answer the question, "what is my source."

See the Zotero item type (source type/master source type) "blogPost." (The item type name probably gives this one away.) Zotero captures the minimum bibliographic identity (the blogTitle and related data), but it drills down to the level of the article "title" (blogPost/blog article).

Zotero's "blogPost" item type captures information at the level of the article, but folks who are going to "cite" a whole series of articles from the same "blog" will prefer one bibliographic entry for the blog. Some of those folks will set the master source/source/source_record at that high level, and then enter specific blogArticles at the assertion level.

Other folks have no interest in a series of articles from the same blog--they will likely consider the blogPost (article) to be their "master source" and it will identify one specific article.

It's not hard to find examples of a source that could be even more specific than that "blogArticle." Say someone posts the digital image of a letter in a blog article. That level of detail is beyond the scope of Zotero (and would be beyond the scope of our 80-20), but there will be folks who declare that letter/digital image to be their "Source."

So, same blog, same "information," different users = different requirements/fields applied to the "master source/source/source_record" which means that different information is recorded at the "assertion level."

Blogs are considered a form of publication by default, albeit a rather modern form. What's interesting is that for traditional published materials--where the so-to-speak bibliographic metadata has long been standardized--these same differences exist in the approach to information at the "master source." (Even in software that implemented the identical published source example from Mills' Evidence Explained.)

The implications of these "what is the source" mechanics* are more unruly when the focus shifts to archival materials. I have more examples that I think will help describe the differences. Will continue to post those examples with more detail as time permits.

In haste here ... as I wrote above, "you can write such a standard but it likely won't work across the standard 80-20 source types and it definitely won't hold in practice."

(a) When I say it won't work across the standard 80-20, that is because the named fields (that communicate well understood information) might appear in the master source for one item, but at the assertion level in another. For example, a "photograph" vs a "photographic album" vs a "collection" (that includes a photographic album that includes photographs).

(b) As for not working in practice: it probably wouldn't matter what the group defines as the 80-20 master source fields for a US census. I am a splitter and I'll find some way, by golly, to create a master source for each household in the census. Likewise, there are folks who have a master source "Census" or "1850 Census." Well, those folks too will find some way to force their high level information fields down to the assertion level.

Sorry to have rushed this. --GJ

*I used the term "mechanics" to describe assigning information (fields/keys/elements) to the "master source" (source, source_record) and then determining what other information (or fields...) is declared in the "assertion" level fields (now PAGE <WHERE_IN_SOURCE> and TEXT...).
ttwetmore 2011-12-14T10:58:59-08:00
@GeneJ,

I take the underlying message of your latest to mean that people will want to do things in very different ways in deciding on what their sources are. I'm not sure, but I get the impression you believe that BG should be very accommodating to these desires.

I have two responses.

First, the hierarchical nature of sources that I have introduced in DeadEnds (it's not original to me, I'm claiming no credit) gives users quite a bit of flexibility. I could imagine (though I would never do it myself) making each household from a census a separate source record, but it would then have a source reference to the census it came from, so we get those two preferences you mention taken care of by simply structuring sources. My solution to this census issue is to create a separate evidence record for the household (it is what I have been calling an evidence event record), and then to have that evidence record have a source reference point to the census. That source reference could have the house number, the family number, the enumeration district, or whatever.
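
A small sketch of the splitter's two-level arrangement, in Python. The class names and the `citation_chain` helper are invented for this illustration and are not taken from DeadEnds; the census values come from the RootsMagic export quoted elsewhere in this discussion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    title: str
    # A lower-level source points up to the source that contains it.
    parent: Optional["SourceReference"] = None


@dataclass
class SourceReference:
    source: Source
    where: dict = field(default_factory=dict)  # locating citation elements


def citation_chain(ref):
    """Walk from the cited source up through its containing sources."""
    chain = []
    while ref is not None:
        chain.append((ref.source.title, ref.where))
        ref = ref.source.parent
    return chain


census = Source("1880 U.S. census, Jones County, Iowa")
household = Source(
    "Asa Thomas household",
    parent=SourceReference(census, {"dwelling": "121", "family": "121"}),
)

# Splitter: the evidence cites the household-level source.
chain = citation_chain(SourceReference(household))
for title, where in chain:
    print(title, where)
```

A lumper would skip the household source and put the dwelling and family numbers directly in the reference from the evidence record to `census`; the chain then has one step instead of two, but the same elements are still available to a citation template.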

Second, I'm harder hearted than you. I don't believe in going to extremes on flexibility. If we believe that there should be a serious attempt to use templates as a way to support citation generation, we must take a stand on the issue of how to define sources. We cannot allow willy-nilly flexibility on the part of the users, allowing them to decide exactly what they want to call a source and what not to. We define a list, we give a mechanism for extension in the rare cases where a source doesn't fall in our list, and support no other flexibility. There has to be a reasonable compromise between structure and flexibility.
ttwetmore 2011-12-14T11:26:11-08:00
@GeneJ,

Possibly some of the differences in these views may stem from the fact that I am a strong proponent of evidence records, that is, persona and evidence-event records in which information extracted directly from evidence is placed. With evidence records there is no need for low level source records, because the info you otherwise need to place in source records is more appropriately placed in the evidence records. With evidence based records the need for these low-level source records is eliminated.

The point is this. Evidence should not be placed in source records. In current, conclusion only systems, there is no handy place to put evidence, however, so careful genealogists like yourself need to find a reasonable place to put it. So you add it as reference notes to source records, and this leads you to need very low level sources since each of your sources ends up holding specific information about specific evidence.

But when we have a system that allows the evidence to exist as records in their own right, then we have the right mechanism available, and we do what we should, store the evidence in non-source, evidence records. And this removes any need for low level sources.
testuser42 2011-12-14T12:37:46-08:00
Tom, with this last post I think you have touched one of the main reasons for lots of the misunderstandings and disagreements that happened!
testuser42 2011-12-14T12:49:52-08:00
...and I agree about the tree-like structure to sources. I think it should solve the problem of "lumpers and splitters" as GeneJ described.

In the Blog-Post-Image case, a "lumper" might have one source only, with the type of "Image" with a title like "Will of Z, on the Blog X, Article Y, as retrieved on Date d" and his where-within-source might be "bottom right".

A "splitter" might have a source of type "Image" and title of "Will of Z" which refers to a higher source of type "Article" called "Article Y" which refers to a source of type "Webpage" called "Blog X". The where-within-source could still be "bottom right". The Date of Access should probably go with the where-within, too.
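As a rough illustration of the lumper and splitter arrangements just described, here is a minimal Python sketch of a tree-like source structure. The `Source` class, its `parent` field, and the `chain` method are hypothetical names made up for this example; nothing here is taken from DeadEnds or any BetterGEDCOM draft.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """One node in a source hierarchy; `parent` points to the containing source."""
    type: str
    title: str
    parent: Optional["Source"] = None

    def chain(self) -> List["Source"]:
        """Return this source followed by each containing source, lowest level first."""
        node, result = self, []
        while node is not None:
            result.append(node)
            node = node.parent
        return result


# The "splitter" arrangement: Image -> Article -> Webpage.
blog = Source("Webpage", "Blog X")
article = Source("Article", "Article Y", parent=blog)
image = Source("Image", "Will of Z", parent=article)

# The "lumper" arrangement: one source carrying everything in its title.
lumped = Source("Image", "Will of Z, on the Blog X, Article Y")

splitter_titles = [s.title for s in image.chain()]
lumper_titles = [s.title for s in lumped.chain()]
```

Both arrangements fit the same structure: the lumper simply has a chain of length one, and the where-within-source ("bottom right") would be attached to the reference in either case.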
GeneJ 2011-12-14T13:44:46-08:00
Hi Tom:

I'm quite certain differences really don't have anything to do with what you or Louis would consider the details that would be entered into an "evidence" record .. but I'll not even try to convince you otherwise, okay?

I've said before that I could see a benefit to a "lower level source" (I call it the third level), but hierarchical structures would require more time to learn and manage; I believe they would be even harder to standardize. It would be nice to hear Bob Velke's take on hierarchical structures compared to the TMG structure for sources and citations. The good folks at GENBOX implemented the "lower level source" concept.

Tom wrote, "There has to be a reasonable compromise between structure and flexibility." Indeed, so let me turn this around ...

So, help me understand why you care where I commit a field? If a "source type" is a "birth certificate" then there will be a certain number of fields associated with it. (Ala, the Zotero spreadsheet.)

Let's pick a nice controversial field that is not on the Zotero spreadsheet--"Household ID" (US Census), which has a value "Asa Thomas household." Say I include that field in my "master source" and you record the field in "Source Details."

If we both recognize the field, please help me understand why you care that I have entered that field/data in the master source, but you have entered the same field/data at the assertion level. --GJ
AdrianB38 2011-12-14T15:23:05-08:00
"page is an important citation element. It doesn't belong in the source record itself"

Unfortunately, I can say that I would rather like to put "page(s)" in the source record. Two examples:

1. The Annual Report for Railway Accidents in the UK. I have an interest in one accident report from the 1903 edition. I'd create a source record for the single report with a title of format something like that used for an article in a journal - but it would be nice (if not essential) to add the page number as its own item on the source record.

2. An English parish register contains lots of baptisms, marriages and burials. I'd create a source record for each baptism, marriage or burial (splitting in action here....) in order to associate a note containing the identification logic with the specific data, rather than have a hundred identification statements in the source record for a whole register. Again, as part of identifying where in the register the baptism, marriage or burial is, I'd like to associate a page number with that source record. Again, currently it can be done without it - I usually put the page number as part of the transcribed text, but aren't we supposed to be codifying stuff properly?
ttwetmore 2011-12-12T09:46:58-08:00
Louis,

I agree with you. It would be straightforward to add to the current GEDCOM5.5 standard a number of new tags for citation elements, and we would have a near perfect solution. We would also need to allow the 1 SOUR tag to have level 2 tags for citation elements for things like pages. And we would have to allow 0 SOUR records to contain 1 SOUR source references to handle highly structured sources (e.g., my standard example of a journal article). It's a simple, solid solution that would be as good as any other we could come up with.

I've tried to make the point that GEDCOM syntax, XML syntax, JSON syntax, Google protocol buffers, etc., are all isomorphic to one another, and this point you are making is a consequence.

But you will have to admit, however, that GEDCOM, as it is defined today, and as it is implemented today, does not allow full sharing of data between any pair of programs.

As you point out none of this works without developers implementing good solutions. If our standard specifies how to represent all sources and how to represent all citation elements, we have done our bit of the puzzle; it is then up to the developers.
theKiwi 2011-12-12T10:37:29-08:00
I can move almost all of my genealogy data from Reunion to TNG by GEDCOM file.

In brief, this is because Reunion allows me to assign a GEDCOM tag to every different field, so as far as I can tell it ALL gets exported.

TNG ( http://tngsitebuilding.com/ ) is quite forgiving in its import and lets me decide what to do with each field type on import. So this page on my TNG site

http://roger.lisaandroger.com/getperson.php?personID=I16&tree=Roger

is almost entirely derived from a Reunion GEDCOM file - the exception is the mapping section which currently is done within TNG since Reunion doesn't support mapping or exporting the LAT/LONG to GEDCOM file.

The sources listed at the bottom of the page are of course not "EE Perfect" but I think that they contain enough information in general to allow someone else to find the same information.

Now of course trying to move this same GEDCOM file to almost any other software is nowhere near as successful!!
gthorud 2011-12-12T14:43:15-08:00
Louis,

I do not read this as saying that GEDCOM is the problem; I am not saying anything about who should be blamed. I think you have to look closer at RootsMagic's GEDCOM export; it is far from GEDCOM compliant - a reference note is not the same as the title of the document.

But the text can be improved. What about:

"As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data beyond the limited capabilities of GEDCOM, and/or export data that do not conform to GEDCOM."

Roger,

Maybe you could review Reunion, and make an entry on this page?

http://bettergedcom.wikispaces.com/Application+Data
louiskessler 2011-12-12T17:49:34-08:00

Geir:

You said: "these programs have extended their source and citation data beyond the limited capabilities of GEDCOM".

Wrong! Wrong! Wrong! Wrong! Wrong!

GEDCOM has one really nice construct in their sources that unfortunately few programs have decided to use. Do they not bother to read the GEDCOM specs? It is full extendibility in the source information.

It is the: +1 PAGE <WHERE_WITHIN_SOURCE> construct, and is defined as:

WHERE_WITHIN_SOURCE:= {Size=1:248}
Specific location with in the information referenced. For a published work, this could include the volume of a multi-volume work and the page number(s). For a periodical, it could include volume, issue, and page numbers. For a newspaper, it could include a column number and page number. For an unpublished source or microfilmed works, this could be a film or sheet number, page number, frame number, etc. A census record might have an enumerating district, page number, line number, dwelling number, and family number. The data in this field should be in the form of a label and value pair, such as Label1: value, Label2: value, with each pair being separated by a comma. For example, Film: 1234567, Frame: 344, Line: 28.

By using the Label/Value pairs, the reading program will understand what each of the values means and the whole source CAN be interpreted. Each program would still need to "understand" the meaning of each of the labels if they wanted to be smart and translate things to their own convoluted format, but there's really no need to. If the labels define the interpretation of the values, then a reading program only needs to display the labels beside the values and full meaning is understood by the user. This can also easily be searched by label, e.g. "IF film = 1234567 and Frame = 344"
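To make the label/value idea concrete, here is a minimal Python sketch that reads a PAGE value written in the "Label: value, Label: value" form from the GEDCOM definition and then searches it by label. The function name `parse_where_within_source` is invented for this illustration and is not part of GEDCOM or of any program mentioned in this thread.

```python
def parse_where_within_source(page_value: str) -> dict:
    """Split 'Label1: value, Label2: value' into a dict keyed by label."""
    pairs = {}
    for chunk in page_value.split(","):
        label, sep, value = chunk.partition(":")
        if sep:  # skip chunks that carry no label
            pairs[label.strip()] = value.strip()
    return pairs


# The example value given in the GEDCOM definition above.
page = parse_where_within_source("Film: 1234567, Frame: 344, Line: 28")

# The search "IF film = 1234567 and Frame = 344" becomes a lookup by label.
matches = page.get("Film") == "1234567" and page.get("Frame") == "344"
```

A reader that does not recognize a label can still display it next to its value. Note that this simple split assumes no value contains a comma or colon; the GEDCOM text does not say how such values should be escaped.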

GEDCOM does NOT have limited capabilities on this front. It has very extendable capabilities. Maybe we at BetterGEDCOM could perform one great feat and inform genealogy developers of this PAGE tag in GEDCOM, promote its use, and come up with a standard set of labels to be used.

Louis
GeneJ 2011-12-12T18:02:33-08:00
What about at the level of the Source_Record, Louis?
GeneJ 2011-12-12T18:23:19-08:00
Louis,

Below is a simple journal entry. By simple, I mean that it has only one author (no editors or other contributors), I'm not reporting any credentials for the author, and the article was not serialized (it appears in just one issue). If we were to develop a benchmark case, it would not be so simple.

Here is the bibliographic entry:

Russell, George Ely, “Founders of the American Firestone Family.” _National Genealogical Society Quarterly_ 52 (December 1964): 241-44.

There are additional requirements at the full reference note level, but all of the information in that simple journal bibliographic entry needs to be supported/exported to the Source_Record and none of it should be entered at the assertion level.

--GJ
louiskessler 2011-12-12T18:25:21-08:00

With regards to RootsMagic compliance to GEDCOM. I don't think they've done anything really wrong.

Take a look at the example I posted at http://www.beholdgenealogy.com/blog/?p=874

RootsMagic decided NOT to include a Title field. I guess they figured it was better for them to not allow the person to make up their own title. So what they do is generate the title by using the "Collection", "Repository", "Repository Location", "Format" and "URL" fields together, and separating any non-blank fields by semicolons.

If they decide to generate the source title that way, there's nothing wrong with it. You can argue you don't like how RootsMagic does it, but as far as they are concerned that is the title and they export it that way. That doesn't make them non-compliant.

It gives them the advantage of being able to parse it when it comes back in, so they can fill their fields up again.

It's not so bad for other programs, since at least all the data is there. And it displays well in Behold, and I don't have to touch it (other than remove that extra colon and semicolon they add at the front). It is the most important data and will get loaded into other programs' title fields.

This is how RootsMagic set up its internal data structure. They don't have a Title field, but have those 5 fields instead, which to them represent the title. I totally disagree with them implementing the Repository and Repository Location as fields on the source, since they should be in the Repository information that is available from the Repository button on the screen.

Also RootsMagic has scores of templates. That example in my post was only for one particular template. Each template has different fields. But those "Master Source" templates all get exported to GEDCOM into the TITL tag as semicolon-separated values.

Now there is no reason, if they were doing this already, to not have just made them label/value pairs. Then they wouldn't even have to worry about which template they were using, because they'll be able to figure it out again by the unique combination of pairs read in.

What I'm saying is that maybe GEDCOM needs an "official" TAG that can accept label/value pairs in the SOUR record, just like the PAGE <WHERE_WITHIN_SOURCE> construct that I mentioned in my last post.

RootsMagic was on the right track. They just didn't take it far enough.

Get them to hire me. I can tell them what to do to fix it all. :-)
louiskessler 2011-12-12T18:36:21-08:00

GeneJ:

I am not a citation expert, so I leave the proper development of those up to you.

But for your example, using the RootsMagic method, and label/value pairs, this is what I would do:

0 @S1@ SOUR
1 TITL Author; Russell, George Ely; Title; “Founders of the American Firestone Family.”; Journal; _National Genealogical Society Quarterly_; Volume; 52; Date; (December 1964)

and in the Source Reference, I would have:

1 SOUR @S1@
2 PAGE Page; 241-44

Now, there are some GEDCOM source Tags that could be used instead (such as AUTH, PUBL, etc.), but I wanted to illustrate this label/value pair methodology to the extreme here.
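For anyone wanting to try this, here is a minimal Python sketch of how a reading program might turn the semicolon-separated TITL and PAGE values above back into labelled fields. It assumes items strictly alternate label, value, label, value; the function name `parse_alternating` is made up for the example and does not describe what RootsMagic or Behold actually do.

```python
def parse_alternating(value: str) -> dict:
    """Read 'Label; value; Label; value' into a dict, pairing items two at a time."""
    items = [item.strip() for item in value.split(";")]
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of semicolon-separated items")
    return dict(zip(items[0::2], items[1::2]))


# The TITL and PAGE values from the GEDCOM example above.
titl = ("Author; Russell, George Ely; "
        "Title; “Founders of the American Firestone Family.”; "
        "Journal; _National Genealogical Society Quarterly_; "
        "Volume; 52; Date; (December 1964)")
source_fields = parse_alternating(titl)
page_fields = parse_alternating("Page; 241-44")
```

Using semicolons as the separator lets a value such as "Russell, George Ely" keep its comma. A value that itself contains a semicolon would break this sketch, since no escaping rule has been proposed.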

Louis
louiskessler 2011-12-12T18:47:34-08:00

One more comment.

I believe GEDCOM did NOT include a label/value pair ability for the source record, because they thought they had all the fields they needed to define the source record in their:

+1 AUTH <SOURCE_ORIGINATOR>
+1 TITL <SOURCE_DESCRIPTIVE_TITLE>
+1 PUBL <SOURCE_PUBLICATION_FACTS>

and they were trying to get everyone to standardize on using those.

If there are a few others that should be added, then let the BetterGEDCOM team determine that.

Or if we need a label/value pair because there are too many, then let the BetterGEDCOM team decide on that.

Take a look at all your source templates and extract what's needed to define just the source reference, and see how many of them will fit into the structure of the above three tags, and what other tags might be needed.

Louis
louiskessler 2011-12-12T18:52:33-08:00

Here's the definitions:

SOURCE_ORIGINATOR:= {Size=1:248}
The person, agency, or entity who created the record. For a published work, this could be the author, compiler, transcriber, abstractor, or editor. For an unpublished source, this may be an individual, a government agency, church organization, or private organization, etc.

SOURCE_DESCRIPTIVE_TITLE:= {Size=1:248}
The title of the work, record, or item and, when appropriate, the title of the larger work or series of which it is a part. For a published work, a book for example, might have a title plus the title of the series of which the book is a part. A magazine article would have a title plus the title of the magazine that published the article.
For an unpublished work, such as:
- A letter might include the date, the sender, and the receiver.
- A transaction between a buyer and seller might have their names and the transaction date.
- A family Bible containing genealogical information might have past and present owners and a physical description of the book.
- A personal interview would cite the informant and interviewer.

SOURCE_PUBLICATION_FACTS:= {Size=1:248}
When and where the record was created. For published works, this includes information such as the city of publication, name of the publisher, and year of publication. For an unpublished work, it includes the date the record was created and the place where it was created. For example, the county and state of residence of a person making a declaration for a pension or the city and state of residence of the writer of a letter.
ttwetmore 2011-12-13T18:02:17-08:00
@Louis,

But I hope you would concede that the necessity to use a sequence of key/value pairs as the values of GEDCOM lines is because GEDCOM does not provide the necessary citation element tags. And I assume you would also concede that if every vendor chose to support a different set of keys, sharing would be almost impossible.

Why not simply decide on a standard set of keys and then create a new GEDCOM tag for each one, and write those keys into the new standard? This is what I am certainly advocating as the proper course for BG, and I think others agree.

I agree with you, that this extension could all be done within the context of good ole GEDCOM simply by creating some additional citation element tags to add to the three you have described. I am not against that idea, as I think it would be wonderful to be able to express BG data in GEDCOM format. But you and I are dinosaurs on this issue. XML is the writing on the wall.

Can you take some consolation in the fact that a simple plug-in would allow you to convert BG data to GEDCOM? I'm happy with that. My DeadEnds software, though using a much fuller model than that of standard GEDCOM 5.5 (that is, the DeadEnds model), can export in GEDCOM with the flick of a switch.
louiskessler 2011-12-13T18:54:05-08:00

Tom:

Yes, I thought that's what I was trying to say. I'd be very happy if BetterGEDCOM could come up with the "standard" set of keys for both the source and the source reference, which then could be used as BetterGEDCOM tags and/or XML and/or whatever. The format is irrelevant. The definition of the keys is what's important.

Then, those two sets of keys can be used as the variables in various (or even a multitude of) citation templates (Shown Mills and whoever else) that will standardize these as well. The problem is that these are not defined in a precise way, and each programmer interprets them differently.


Important Point:

By defining a set of source keys, source reference keys, and citation templates using the keys as variables - this work by BetterGEDCOM would be do-able.



I don't like plugins or APIs. In my experience, they are slow and limiting for large GEDCOMs. Writing the code to do what they do directly is 10 to 100 times faster, and can be customized and can do error detection, etc.

My plan is once BetterGEDCOM is defined, I'll write into Behold an input from BetterGEDCOM and an output to BetterGEDCOM. Since Behold inputs and soon will output legal GEDCOM 5.5.1, Behold will be its own converter.

Louis
louiskessler 2011-12-16T15:23:13-08:00
Separation of Sources from Conclusions
The other thread I started on "Incorrect and Misleading Assumptions" provided lots of good discussion.

It seems everyone has differing ideas of what is in the source information. Does it include evidence? Does it include assertions? Does it include conclusions?

As Tom points out, this is because some programs use the source records "to do quadruple duty, holding source information, reference notes, evidence and conclusions"

To me this is maybe even the biggest problem with GEDCOM today - the lack of full separation of sources from conclusions. If we could do this one thing as a first cut and include our source type / key / citation templates to complement it, then we can produce a BetterGEDCOM that will give the genealogy community a reason to adopt.

I'm stating that Sources and Source Records must be "just the facts". There should be no assumptions, assertions, interpretations or conclusions (AAIC) in them.

The AAIC (if I may make yet another acronym we can all groan over) must be in with the Conclusion persons, families, places and events. Or if you adopt Tom's model, in the persona or evidence information.

But Sources, Source Records, and Citations must be free of any interpretation.

Why?

Because one thing GEDCOM can't be as it is today is a freestanding format that can be used for storing source records.

If we can enable it to do that, then every repository (archive, library, genealogy society, online website, etc.) can store all their source/source record/citation information in a BetterGEDCOM structured file.

Genealogy Programs will be able to read in this file and interpret it. Genealogy programs could easily scan through these files and locate information.

What is needed in these files are the following records:

REPO - repository (similar to what's in GEDCOM today)

SOUR - source (similar to what's in GEDCOM today)

SREC - source record (which is a specific piece of information from a source and doesn't have a record in GEDCOM). Tom and I used to refer to it as an EVID (Evidence) record. But the source record is prior to anyone using it. Once it is used in an assertion, only then may the source record be called evidence.

That's all that I see is needed right now. Let's leave the discussion of assertions and conclusions to later and finish off the "just the facts" part.

This is exactly what I was trying to propose several months ago, but no one clued into it at the time:
http://bettergedcom.wikispaces.com/Vision
http://bettergedcom.wikispaces.com/message/view/Vision/41369239

So if we could leave evidence/conclusions to Version 2 of BetterGEDCOM, and now try to just concentrate on doing Sources, Source References and Citations, including our source type / key / citation templates to complement it (which also will give what we need for SourceTemplates.org), then we'll have accomplished something significant that the genealogy world will take notice of.

Louis
louiskessler 2011-12-16T15:35:17-08:00

To try to make it clear, a simple GEDCOM example is:

0 @I1@ INDI
1 BIRT
2 SOUR @S1@
3 PAGE where-within-source
3 TEXT text of source record
3 OBJE sourcepicture.jpg
3 NOTE information about the source detail
3 NOTE interpretation and conclusion stuff

0 @S1@ SOUR
1 TITL Title of source


Needs to become in GEDCOMish:

0 @I1@ INDI
1 BIRT
2 SDET @SD1@
3 NOTE interpretation and conclusion stuff

0 @S1@ SOUR
1 TITL Title of source

0 @SD1@ SDET
1 SOUR @S1@
1 PAGE where-within-source
1 TEXT text of source record
1 OBJE sourcepicture.jpg
1 NOTE information about the source detail
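The restructuring above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SDET is the tag proposed in this post, not part of GEDCOM 5.5; the function name `split_source_details` and the `@SD1@`-style identifiers are invented here; and NOTE lines are left where they are, because a program cannot tell a note about the source detail from a note holding interpretation.

```python
DETAIL_TAGS = {"PAGE", "TEXT", "OBJE"}  # assumption: tags that move to the SDET record


def split_source_details(lines):
    """Move PAGE/TEXT/OBJE under each SOUR citation into a new top-level SDET record."""
    out, details = [], []
    citation_level = None  # level of the SOUR citation currently being rewritten
    for line in lines:
        if not line.strip():
            continue
        level_text, _, rest = line.strip().partition(" ")
        level = int(level_text)
        tag, _, value = rest.partition(" ")
        if citation_level is not None and level <= citation_level:
            citation_level = None  # we have left the citation's substructure
        if level > 0 and tag == "SOUR" and value.startswith("@"):
            citation_level = level
            sdet_id = f"@SD{len(details) + 1}@"  # hypothetical identifier scheme
            out.append(f"{level} SDET {sdet_id}")
            details.append([f"0 {sdet_id} SDET", f"1 SOUR {value}"])
        elif (citation_level is not None and level == citation_level + 1
              and tag in DETAIL_TAGS):
            details[-1].append(f"1 {tag} {value}")
        else:
            out.append(line.strip())
    for record in details:
        out.extend(record)
    return out


original = [
    "0 @I1@ INDI",
    "1 BIRT",
    "2 SOUR @S1@",
    "3 PAGE where-within-source",
    "3 TEXT text of source record",
    "3 OBJE sourcepicture.jpg",
    "3 NOTE interpretation and conclusion stuff",
    "0 @S1@ SOUR",
    "1 TITL Title of source",
]
converted = split_source_details(original)
```

The result keeps the interpretation NOTE under the BIRT event's SDET pointer and collects the "just the facts" lines in a separate SDET record that points back to the source.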
louiskessler 2011-12-18T08:49:15-08:00

I'm going to go one step further and say we should start off and do this first part of SourceTemplates.org

SourceTemplates.org currently has more than this (see: http://sourcetemplates.org/SourceTemplates.jpg ) but I think this would be an excellent start for us.

We should start with these structures: "Master Source" (rename it to "Source"); and "Source Details" (rename it to "Source Detail"); and "Source Type" (rename it to "Citation"). Please don't argue with me about the naming right now. It is for simplicity and understanding purposes. Let's do the work first.

Next we only really need to do the following to get started:

Source:
- Source Type: name; // The Zotero column headers
- dataValues: key/value pairs; // The keys are the Zotero rows

One "Source" record points to one or more "Source Detail" records.

Source Detail:
- dataValues: key/value pairs; // These are the where-within-source keys
- This is where all the other text, media, structured data would be, but we don't have to define it now.

Citation:
- Name: string; // The name of the citation type
- Originator: string; // Who developed it: ShownMills, Lackey, Zotero, which publication or web reference
- PrimaryCitation: string;
- SubsequentCitation: string;
- BibliographicEntry: string;

The Source and Source Details are effectively done already as a first cut by GeneJ's Zotero spreadsheet.

It will be up to you citation experts to see if you can develop the Citation text that incorporates the keys properly.

e.g. The PrimaryCitation for a Book might be:

"$Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page"

For more about this, see: http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48156020
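The book PrimaryCitation pattern given above happens to use the same `$name` placeholder syntax as Python's standard `string.Template`, so it can be tried out directly. The sketch below is only an illustration: the key values are placeholders invented to exercise the template, not data from any real source, and the split into source keys and source detail keys simply follows the outline in this post.

```python
from string import Template

# The PrimaryCitation pattern for a Book, copied from the post above.
primary_citation = Template("$Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page")

# Hypothetical key values, invented only to exercise the template.
source_keys = {"Author": "A. Author", "Title": "A Book Title", "Place": "Anytown",
               "Publisher": "Example Press", "Year": "1900"}
detail_keys = {"Page": "12"}  # the where-within-source key from the Source Detail

# Merge the two key sets and fill in the template.
citation = primary_citation.substitute({**source_keys, **detail_keys})
```

`substitute` raises a `KeyError` when a key named in the template is missing, which is one simple way for a program to notice that a source record lacks a field its citation template needs.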
GeneJ 2011-12-17T11:58:22-08:00
Record-based model user requirements for S&C
In terms of Sources & Citations, do record-based models have different user requirements than conclusion/GPS* models?


P.S. If Louis' _Behold_ and Tom's _Dead Ends_ models have different user requirements for Sources & Citations, hopefully one or both will set up a new wiki page about those user requirements. We can then link that page/those pages back to the main BetterGEDCOM "sources and citations" wiki page.
ttwetmore 2011-12-17T12:21:59-08:00
@GeneJ,

Off the top of my head, user requirements for sources and citations include the following.

Recording the details of sources referenced so that provenance and citations are possible and so that the records that hold the evidence taken from the sources can refer to the sources of the evidence.

Recording the details of locations within sources where specific items of genealogical evidence were found so that this information is also available for citations.

The ability to generate properly formatted citation strings based on patterns prescribed by different authorities.

These are the major requirements that apply directly to sources and citations. There are closely related requirements that move into the evidence and conclusion areas. For example.

The ability to extract genealogical data from evidence found in sources into records and to link those records to the source records.

Personally, I think the duality of evidence and source is so close that it's hard to talk about sources and citations without talking about the information we find in the sources and how we want that information to link to the sources.

One could also extend the source and citation world into the conclusion realm, but that might best be left to another day.
louiskessler 2011-12-17T13:51:48-08:00

GeneJ:

This stuff is so abstract the way everyone is talking about it. I have no idea what "user requirements" for "record-based models" might be.

Users want to be able to enter sources and develop citations from them. They want to be able to use their sources as evidence and draw conclusions from them. And they want all this information to be transferred correctly from program to BetterGEDCOM file to program.

Users have no interest or care in the underlying model as long as it works for them. And I don't see any reason why different programs can't use different models as long as they can correctly export to and import from a BetterGEDCOM so that all the data can be transferred correctly.

Louis
GeneJ 2011-12-17T14:25:58-08:00
@ Tom and Louis,

Thank you for your comments.

I used the term "record-based" in the hope we would get beyond the difference in the meaning of evidence. Sigh.

(1) Tom wrote, "Recording the details of locations within sources where specific items of genealogical evidence were found so that this information is also available for citations."

Does Dead Ends require the user to make an "assertion level" entry? Some common sources in existing models have no distinct citation elements at the assertion level. (For example, a letter, where the author or generic title leads the bibliographic entry.)

(2) Louis wrote, "...different models as long as they can correctly export to and import from a BetterGEDCOM."

I agree.

Thanks again.
ttwetmore 2011-12-17T15:53:10-08:00
The DeadEnds model is records-based (actually the DeadEnds model encompasses other systems as well -- it is records-based if a user chooses to create records-based evidence records, and it is not records-based if the user chooses to stick with standard conclusion records). Where we decide to store evidence is key to the BG model. In a conclusion-based system we don't have to store evidence at all. I have written about this issue extensively, and one of the longest threads at soc.genealogy.computing was based on a thread I started when I asked the community where people believed their evidence should be stored in a genealogical database. It makes good reading, and somewhere on the BG threads I summarized the conclusions there.

The most common answer was the idea of persona records to hold the evidence, as well as event records to glue role-players together. That was the answer I had hoped for, for that was the answer that seems clearly the best to me, and the evidence records of the DeadEnds model (persona and event) implement that model. Hope another mention of evidence records is not making you sigh once again. Having evidence records is key to supporting the research process fully in software. The persona and evidence events are where the records-based paradigm manifests itself in the DeadEnds and any other evidence and conclusion-based model.

I believe you put your evidence in your source records, which was another of the answers, though I'm not sure if that is what you do. Louis and I are strongly suggesting that that is not the right place to put it. I believe you put it there because TMG doesn't allow you to put it anywhere else, that is, TMG (and just about every other program) is not records-based, which, I hope you can understand, means there is no appropriate place to put the evidence (if you aren't records-based, you don't store info from records, so you don't need evidence records -- is that clear?). Because you are an expert and careful genealogist, you know you must keep your evidence, so you do the only thing you can with it -- put it in the source records. I also believe that you have come to believe that, since that is how you do it, that must be the right way to do it, and you seem to object to every idea that does not leave your evidence and your reference notes and your conclusions in those TMG source citation records. I believe this is the one and key fundamental misunderstanding you have about the points that Louis and I are making. Of course, if I don't understand how you handle your evidence, and the way you handle it is very different, I apologize for making these assumptions. And I've never used TMG so I might be all wrong about TMG not being records-based, and if so, I apologize for that also.

I will not comment on any point with the term "assertion level" in it since I don't know what you mean by it. I would be happy if you actually took the time to define it, and then I might be able to comment. You have pointed to a few discussions that have used the term, but I am too ignorant to be able to decipher a good definition from those discussions. I would think you at least would understand that terms must be carefully defined, and that this is one that has never been defined. And since you are the user of the term you are under some obligation to fully define it.

And in case it isn't obvious, where we choose to store our evidence is key to the source and citation system, since it is the evidence that we must provide the provenance for. So I am surprised that you seem to object to the fact that Louis and I must bring the concept of evidence into the picture in any discussion of the source and citation world. You can't have one without the other.
louiskessler 2011-12-18T08:08:28-08:00
Well, part of the confusion might be that a source becomes evidence from the perspective of the researcher when they use that source to support a conclusion.

However, as Tom states and I agree, all assumptions about that evidence (the source from the user's perspective) should not be stored with source but should be stored somewhere else. Where that should be is where Tom and my views differ.

Louis
ttwetmore 2011-12-18T04:30:37-08:00
Timeliness
@Geir says: Louis suggests to have a list of Citation Elements ready before Rootstech. I don't think that is realistic, such a job will take many months and may even take years if you want to have a 80% COMPLETE international solution.

I don't believe this. And if it would take years to formulate an 80% solution, I believe we must accept that Better Gedcom has failed. We do know that Family Search is planning to release a new model at Rootstech. Even with Family Search's long history of never making good on any of its promises to modernize and support GEDCOM, can we afford years before we offer an alternative model? We know that most vendors, without any guidance from any new standards, are all making extensions to their databases to include features not envisioned by the GEDCOM semantic model. If they continue to export these features with extended GEDCOM additions, they will continually get less and less interested in supporting a new standard. Time is of the essence. Frankly, with Family Search's activities, whatever they prove to be, the window is closing fast.
AdrianB38 2011-12-21T13:01:04-08:00
Louis - thank goodness we are actually in agreement after all.

I still feel a concern over how easy it will be to handle the digital-image of an original case compared to the transcription of an original case. The latter seems pretty obvious to me - in absurdly vague terms, cite the transcript, stick in the word "transcribing", then follow it up by suitable bits that describe the original. And, as we agree, it's all one citation note in the final report. (Not thought about how we handle bibliographies - thanks Tony!) But the digital-image of an original _seems_ at first glance to follow a different pattern because I'm automatically thinking of effectively citing the original and then using the publication details to point to the digital image. Maybe that's not the best method but it stems from ESM's(?) advice to cite as if it were the original. Maybe there are other ways (e.g. cite the digital image but title it as if it were the original), which is why I'd suggest we add this to the list to look at.

Tony, I _think_ your ordering is reverse of my thoughts but it's a bit hard for me to be sure - terminology is tricky. It's probably not an issue for you, if it is the reverse, until we transfer stuff between us.

And Gene - yes, I seem to remember a number of posts from you in other places commenting on the accuracy (or not) of this source-of-the-source stuff in places like FS. That probably has some implications for the optimum ways to construct these chained sources and the citation templates they need.
AdrianB38 2011-12-21T13:03:13-08:00
Tom - re "I see the "source chaining," which is what I called hierarchical sources or source trees, as the solution to so many potential issues for representing resources"

Despite my concerns, that's my gut reaction too.
AdrianB38 2011-12-21T13:04:42-08:00
Well, I've never managed to double post before....
GeneJ 2011-12-21T14:52:20-08:00
@ Adrian. All fixed. --GJ
GeneJ 2011-12-21T14:55:41-08:00
@Adrian,

Linking the source of the source in a chain will be controversial.

Someone will have to explain to me how a source of the source (credit line/reference) has anything to do with "hierarchical sources" --it's not a lower level of something or a higher level of something, it's a reference to something external.
AdrianB38 2011-12-21T15:30:44-08:00
"Linking the source of the source in a chain will be controversial."
Which is why we need examples.

"source of the source (credit line/reference) has anything to do with "hierarchical sources" "
It doesn't. But we could use chains of sources pointing to other sources to record BOTH "source of the source" AND source within a source - providing we brand the link with the right relationship.

E.g. "Census of John Doe, Rootsweb posting" "is transcribed from" "Census of John Doe Kansas City (original)"
- is a source of a source type

"Census of John Doe Kansas City (original)" "is within" "1850 census of USA"
- is a different type of relationship entirely. Not quite sure what I'd use it for yet...

At least - that's what I think we're talking about. One solution in a model that, by recording a type, can solve 2 different problems.
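
As a rough illustration of the typed-link idea above, here is a minimal Python sketch. The `Source` class, its `link` and `related` methods, and the way relation names are stored as plain strings are all assumptions made for this example; none of it comes from an agreed BetterGEDCOM model.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str
    # Each link is a (relation, target) pair. The relation names are
    # illustrative strings taken from the example in the post above.
    links: list = field(default_factory=list)

    def link(self, relation, target):
        """Record a typed link from this source to another source."""
        self.links.append((relation, target))

    def related(self, relation):
        """Return the sources reached through links of the given type."""
        return [target for rel, target in self.links if rel == relation]


census_1850 = Source("1850 census of USA")
original = Source("Census of John Doe Kansas City (original)")
posting = Source("Census of John Doe, Rootsweb posting")

# "Source of the source" link.
posting.link("is transcribed from", original)
# "Source within a source" link.
original.link("is within", census_1850)

print([s.title for s in posting.related("is transcribed from")])
print([s.title for s in original.related("is within")])
```

Because the relation is stored on the link, one chaining mechanism can carry both kinds of relationship, and a program can choose which kinds to follow.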
GeneJ 2011-12-21T16:33:08-08:00
@ Adrian,

The concept of linking to a source of the source is rather cool, but ...

There is a substantial logic difference between representing a source and the source of the source.

When the user authors a citation, that user is making a representation about the source.

When that user includes a credit line/source of the source/reference, however, they are referring to a representation made by someone else (the author of the database, book, content provider etc.)--not unlike quoting someone. Indeed, if the reference is not known to you or if you suspect there may have been a mistake, the rule is to quote the reference. If you confirm a mistake, then you enter corrections in editorial brackets.

It would take a lot of work to slice and dice that third party representation into the fields for another chain of sources, and still be able to retain the specific third party representation.

I have good examples. Perhaps folks could decide where they want the cases/examples to be posted. Then we can work together to build the references. Do you want testuser's page to be control central? --GJ
gthorud 2011-12-21T19:10:25-08:00
Hierarchical sources and "source of the source" are not the same thing, but they could both be relations between sources. So maybe we replace what has previously been called a hierarchy with "relations" - Dublin core has a number of them, see e.g. http://dublincore.org/documents/dcmi-terms/index.shtml

About "source of the source", the big question is if the CEs for the "source of the source" are the same independent of the next step in the chain. There would be great advantages if that was the case, then you could create a "sub template" for the source of the source CEs to be part of the citation for the next step. Adrian's microfilm and transcription example suggests that this is problematic - I wonder what Chicago says about the two cases.

If there are only two/three ways to do it (e.g. two main types/classes of a chain relation) - that might also be an advantage - it would reduce the number of sources (as in EE and programs) substantially. I think it is worth trying to identify the alternatives.
gthorud 2011-12-21T19:20:14-08:00
See also slide 60 in I.C.E https://devnet.familysearch.org/static-files/presentations/2009/Interoperable%20Citation%20Exchange%202009-03-11.pdf

BTW, should we start a new topic on this, and refer back to this discussion?
AdrianB38 2011-12-22T13:26:35-08:00
Gene
Re "There is a substantial logic difference between representing a source and the source of the source". Absolutely. So far I've got as far as thinking that there's a link to the source-of-the-source that generates a phrase like "summarised from" or "transcribed from" or "citing"... No idea where in the output citation it should go. Then it needs something in the output citation to describe the source-of-the-source - no real idea what it needs.

So there's a lot of hand-waving going on at this point in my suggestions. However, the interesting thing is that whether we store the data as 2 separate, linked sources, or whether we simply store the source-of-the-source inside an attribute of the source (as now), if we sit down to resolve what items we need, they are, of course, the same thing.

"if the reference is not known to you or if you suspect there may have been a mistake, the rule is to quote the reference." This is interesting - I'd not thought of errors or unknown / unclear references as all(?) my transcribed / summarised sources have "obvious" sources-of-sources and I've probably generated the titles for them myself e.g. if it's a transcript of a parish register, I generate the name of the appropriate PR working from (say) the catalogue for the appropriate Record Office.

Plainly the possibility of error / lack of clarity has implications for what we collect / store for the source-of-the-source under some circumstances.
GeneJ 2011-12-22T14:34:29-08:00
Hi Adrian,

Thank you for your posting.

Users don't often know the credit line/reference/source of the source is in error until they search for it (or unless they have previously searched for it).

It was about this time last year that I was trying to track down the credit line in the FamilySearch Historical Collection, "New Hampshire Deaths," only to become mired in various problems.
http://theycamebefore.blogspot.com/2010/12/closer-look-at-familysearch-historical.html
Things are changing, at least for the indexed items, but last I checked, it is still not possible to know the underlying source of the collections that are images only.

Ancestry, too, is getting better, but they have a long way to go. This week, Tony and I took a look at the way Ancestry.com presents its credit lines. As a general rule, their references for U.S. census are a loose combination of apples and oranges. I can usually figure out what that reference _should_ be, but it's a different story when I'm working with a record group for the first time.

When we figure out where to post examples (see Geir's posting about structured approach), I'll post examples of some "broken" references.
GeneJ 2011-12-22T15:03:57-08:00
@ Adrian,

You wrote about hand waving.

The solution may involve allowing either referenced (citing XXsometextXX) or linked (citing @S51 ...) entries into that field.

I'll go hide now, cuz I'm sure the virtual pea shooters are taking aim. --GJ
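
A field that accepts either quoted text or a link could be handled roughly as follows. This is only a sketch: the `@S51`-style pointer pattern, the `resolve_reference` function and the `sources` dictionary are hypothetical names chosen for illustration, and quoting unresolved text verbatim is one possible policy, not a settled rule.

```python
import re

# Hypothetical pointer syntax: "@S51" or "@S51@".
LINK = re.compile(r"^@(S\d+)@?$")


def resolve_reference(value, sources):
    """Return the text to print after 'citing' for a source-of-the-source field.

    A value that looks like a pointer and resolves to a known source is
    replaced by that source's description. Anything else is treated as the
    provider's own wording and is quoted unchanged.
    """
    text = value.strip()
    match = LINK.match(text)
    if match and match.group(1) in sources:
        return sources[match.group(1)]
    return '"' + text + '"'


sources = {"S51": "Register of Marriages for the Parish of Davenham July 1754 - June 1779"}

print("citing " + resolve_reference("@S51@", sources))
print("citing " + resolve_reference("XXsometextXX", sources))
```

The quoted branch keeps the third party's representation intact, which matters when the reference is unknown or suspected to be wrong.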
testuser42 2011-12-19T13:12:41-08:00
A good list of Citation Elements is already written isn't it? Just take the Zotero-fields spreadsheet as Louis suggested, and there we are with a very good starting point for CEs. Then we can start to discuss which CEs are missing and which could be scrapped or combined. There's also the page Citation Specific Fields as a start. I'll make a new page to have a list to work on: List of main Citation Elements
GeneJ 2011-12-19T13:33:34-08:00
Hi testuser,

Yes, err... yes.

Geir's model somewhat keys off Dublin Core, an international metadata standard. There are different extensions of Dublin Core, and Geir's BG Core is an extension for genealogy and family history.

You mentioned Zotero; it reads/maps metadata such as Dublin Core into the Zotero fields.

Zotero "item types" aren't extended for many of the historical item types BetterGEDCOM needs to accommodate.

Dublin Core maps to other standardized metadata too, and other extensions of Dublin Core exist.
louiskessler 2011-12-19T17:24:16-08:00

Testuser:

Great idea! Putting those main Citation Elements up in that Wiki Page will allow everyone to start thinking about them.

GeneJ:

Would you be able to give us examples (or even better: a list) of "the Zotero item types (that) aren't extended for many of the historical item types BetterGEDCOM needs to accommodate." - I can't imagine what you're referring to here.

Louis
GeneJ 2011-12-19T17:43:20-08:00
@ Louis,

If you look over the Zotero item types, you'll see most lean toward the published and born digital types.

Everything else includes archival records, government documents, artifacts, etc. This would include item types we use frequently--most of our vital records, census and family files (generally).
AdrianB38 2011-12-20T10:13:15-08:00
"If you look over the Zotero item types, you'll see most lean toward the published and born digital types"

I also don't see anything that looks like a "digitisation of a microfilm of a paper document". (Or any similar multi-level derivative). That always seems to me to be one way that genealogical citations get complex - some are clearly "I'm citing document X, which in turn is citing Y" chains, while others are "I'm citing document X, originally published blah-blah and which is now published in digital form as Y" formats. These are 2 distinct styles in my view - the first would be admirably suited to chaining of sources (i.e. source X has a source reference in its data pointing to source Y) while the other format seems not to gain from chaining of sources but is better as multiple publications.

To distinguish those, consider:
- I cite abstract X which in turn has cited original document Y;
- I cite census page X which was unpublished originally then published on microfilm M123 then digitally published on Ancestry....
In the first case I cite the latest document, in the second I cite the original but explain how it's been copied.

This is the complexity we need to illustrate, otherwise we are wasting our time.
louiskessler 2011-12-20T10:22:05-08:00
No, Adrian. We need to keep things simple and figure out simple ways to handle complexities. Complicating citations is not the correct way.

As Tom pointed out before and I agree, the best way to do what you want is to have the source refer to its source.

And I don't see why that needs to be part of the citation.

In your example, your source is your digitization, and that's what your citation should be for.

The source details for your source will refer to the microfilm. That can have its own citation.

The microfilm can refer to the paper copy. And it should have its own citation.

Three separate citations. Not one chained citation.

But three chained sources.

See the difference?

Louis
AdrianB38 2011-12-20T14:59:59-08:00
Louis - sure I see the difference, but it simply isn't what people want as the _end_ result. (How we get to that end result is a different matter).

If I cite an image of a marriage record in a parish register off the Cheshire collection in FindMyPast, I want the final report to cite the >>parish register<< and say something like:

St. Mary & All Saints, Great Budworth, Cheshire, "Richard Booth Doe & Eliza Roe, 24 May 1847; Register of Marriages for the Parish of Great Budworth (June 1845 - May 1852)" (digital image of original published in "The Cheshire Collection, Church of England Parish Registers 1538-1910", FindMyPast).

Let's not argue whether that's quite the right format, the key is that the emphasis on the citation is that it's as if I've seen the actual PR itself. That's because the means of copying are microfilm and digital image.

Conversely, if I cite a transcript of a marriage record in a parish register off the Cheshire Parish Register Database, I want the final report to cite the transcript first and say something like:

Cheshire Parish Register Database Project, "Isaac Doe & Ann Roe, 15 May 1769; 'Davenham, St. Wilfrid, Marriage records database'" (http://www.csc.liv.ac.uk/~cprdb/ accessed 23 Aug 2011; this data abstracted from Register of Marriages for the Parish of Davenham July 1754 - June 1779).

I repeat, those are my final target formats and how we get there is almost irrelevant.

If we follow the chaining idea (which has _lots_ to recommend it), in both cases we surely start with the source we actually use - the image in the first case and the transcript in the second, and in both cases finish up with a source record pointing to a source record for the original paper.

BUT in the first case the resulting citation format is supposed to start with the original paper, and in the second it's supposed to demote that original down the line. Try as I might, I can't think how it's feasible to set that up. If it is, then fine, that's exactly what I want to see demonstrated and I'd be happy to be proven wrong. Otherwise, my suggestion is to utilise the publication slot in the citation in some fashion for one direction.

But in whatever way we do it, the end result simply has to be printed as one citation. If we go out saying that we need 2 or 3 separate printed citations where one currently is used, we'll get laughed out of court. I seriously hope I'm misunderstanding what you propose. Two or three separate source records on the file, yes; each chained, yes. But it's got to end up with a single printed citation. That means a template has to exist that combines the 2 or 3 chained things into one. That's the challenge that I see and one that needs to be faced to gain credibility.
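
To make the challenge concrete, here is one way a single printed citation might be built from two chained source records, with the ordering chosen by the type of link. It is a sketch under several assumptions: the `Source` class, the `cite` function and the relation strings are invented for this example, each source carries a pre-formatted `text`, only one step of the chain is followed, and the URL and access date from the second target format are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    text: str                        # pre-formatted description of this source
    relation: Optional[str] = None   # how this source derives from `origin`
    origin: Optional["Source"] = None


def cite(viewed: Source) -> str:
    """Build one printed citation for the source that was actually viewed."""
    if viewed.origin is None:
        return viewed.text + "."
    if viewed.relation == "digital image of":
        # Image copies: lead with the original, demote the copy that was viewed.
        return (f"{viewed.origin.text} "
                f"(digital image of original published in {viewed.text}).")
    # Transcripts and abstracts: lead with what was viewed, demote the original.
    return f"{viewed.text} (this data {viewed.relation} {viewed.origin.text})."


register = Source(
    'St. Mary & All Saints, Great Budworth, Cheshire, "Richard Booth Doe & '
    'Eliza Roe, 24 May 1847; Register of Marriages for the Parish of Great '
    'Budworth (June 1845 - May 1852)"'
)
image = Source(
    '"The Cheshire Collection, Church of England Parish Registers 1538-1910", '
    'FindMyPast',
    relation="digital image of",
    origin=register,
)

davenham = Source(
    "Register of Marriages for the Parish of Davenham July 1754 - June 1779"
)
transcript = Source(
    'Cheshire Parish Register Database Project, "Isaac Doe & Ann Roe, 15 May 1769"',
    relation="abstracted from",
    origin=davenham,
)

print(cite(image))
print(cite(transcript))
```

In both cases the file holds two chained source records, the entry point is the source actually used, and the report still prints a single citation. Whether a rule keyed on the relation type is enough for real templates is exactly what the examples would need to test.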
louiskessler 2011-12-20T19:14:52-08:00
Adrian,

Sorry, yes you are right. The chained citation can be part of the citation.

But I don't see it as a major imposition. It just adds one field, the source of the source and that can be a link to the other source. Then, using your 2nd example, it can be included in the citation as:

Citation 1 = $title, "$project" ($url $accessed; this data abstracted from $source).

Citation 2 = $title $daterange

Then Citation 2 becomes:
Register of Marriages for the Parish of Davenham July 1754 - June 1779

Citation 1 becomes:
Cheshire Parish Register Database Project, "Isaac Doe & Ann Roe, 15 May 1769; 'Davenham, St. Wilfrid, Marriage records database'" (http://www.csc.liv.ac.uk/~cprdb/ accessed 23 Aug 2011; this data abstracted from $source).

where $source = the link to Citation 2

And the program will replace $source with Citation 2

and Tada! you've got what you want.

Louis
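
The substitution described above can be tried out with Python's standard `string.Template`, which happens to use the same `$name` placeholder style. This is only a sketch: the record ids `S1` and `S2`, the dictionary layout and the `render` function are assumptions for illustration, and nothing here guards against a chain that loops back on itself.

```python
from string import Template

records = {
    "S1": {
        "template": '$title, "$project" ($url $accessed; '
                    'this data abstracted from $source).',
        "fields": {
            "title": "Cheshire Parish Register Database Project",
            "project": "Isaac Doe & Ann Roe, 15 May 1769; "
                       "'Davenham, St. Wilfrid, Marriage records database'",
            "url": "http://www.csc.liv.ac.uk/~cprdb/",
            "accessed": "accessed 23 Aug 2011",
            "source": "S2",  # link to the source of this source
        },
    },
    "S2": {
        "template": "$title $daterange",
        "fields": {
            "title": "Register of Marriages for the Parish of Davenham",
            "daterange": "July 1754 - June 1779",
        },
    },
}


def render(record_id, records):
    """Render one citation, replacing $source with the linked citation."""
    record = records[record_id]
    fields = dict(record["fields"])
    linked = fields.get("source")
    if linked is not None:
        # Follow the link and substitute the rendered text of that source.
        fields["source"] = render(linked, records)
    return Template(record["template"]).substitute(fields)


print(render("S2", records))
print(render("S1", records))
```

Each record keeps its own template and can still be cited on its own. The link in the `source` field is the only thing that joins them.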
louiskessler 2011-12-20T19:18:39-08:00

... but the point is that the citations are still separate.

In this case, the one $source field can magically link them together.
GeneJ 2011-12-21T07:59:49-08:00
Hi Louis,

The "reference" or "source of the source" ... will take a little work. I like your linked references, but we will need to consider that "element"/field/key in context.

IMO, what's most important right now is that we all seem to agree on the importance of that reference/source of the source field.

I'm having a hard time thinking of a source type for which that field wouldn't be at least conditionally relevant, and it is probably so at all "levels," but determining further requirements for source of the source can come as a next step and it will take a little work.* -GJ



*Repositories and content providers won't always accurately or fully describe the origin of the item and descriptions suffer from oversight/mistake (ala, FS Historical Records, many Ancestry.com databases).
ACProctor 2011-12-21T10:42:39-08:00
In my independent work, I use Citation chains of indefinite length. My Citation elements are used to "cite" the actual information I used, the work or publication it came from, the collection it's a part of, and the repository where it's stored, etc.

In this specific case of what you cite and what you examined being different, this appears to work well because I cite the abstract work at the lowest level, and then link that to the physical work, then its location, etc.

For a book, for instance, the abstract work could be the title+author+publisher+edition. The physical source could have been a book from the library or an online copy, but that would be one level further up the chain and not normally cited in the bibliography.

For a digital image of a baptism, say, then the abstract work would be the entry in the parish register. The next level might be the scanned copy at findmypast or FamilySearch.

The same applies to a census. I only want to see the unique census-page reference, and so that is my abstract work. Whether I saw the image at the National Archives, online at findmypast or Ancestry, or on some published CDs would be at the next level.

Does that help?
Tony
ttwetmore 2011-12-21T12:37:17-08:00
@Tony,

I see the "source chaining," which is what I called hierarchical sources or source trees, as the solution to so many potential issues for representing resources, that BG would be almost crazy not to include it in its model. And as mine and other examples have shown, it is nearly trivial to include. So I agree with you.
GeneJ 2011-12-18T09:09:03-08:00
Trying to agree on a timeline for progress of the work seems premature. We are allowing time for others to comment on the current proposal or, as we discussed in the last Developer's meeting, make independent proposals.

Much research and thought went into the current proposal ...
http://bettergedcom.wikispaces.com/file/view/Sources%20and%20Citation%20data%20model%20DRAFT%20v%200.4%2027nov2011.pdf

In the last Developers Meeting, folks with alternate proposals thought their work would be in process, not necessarily completed in time for the meeting tomorrow.

Once we have a proposal focus, we'll have better insight into both the work that needs to be done and the resources available to accomplish that work, which will hopefully give us the kind of information we need to develop a project plan.

My take at least. --GJ

P.S. Howdy, Russ! Hope the Holiday spirits are taking good care of you and yours.
louiskessler 2011-12-18T09:13:49-08:00

Andy,

I have to work on Monday so can't attend the Developer's meeting.

But could you add to the agenda my proposal that we try to define a first cut set of Source / SourceDetail / Citation tables in time for RootsTech.

Because of the piecemeal nature of this Wiki, you'll find my descriptions of this here:
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48324558#48362656
and here: http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48033200#48156020

I don't think it will be too hard since we'll really only need the citation experts here at BG to try to make up the citation templates for the several dozen source types.

It need not be complete, but it needs to be substantial.

Louis
louiskessler 2011-12-18T09:30:15-08:00

GeneJ:

That draft was one on which I started weeks of discussion. It was so complicated and abstract that I couldn't see how I would even begin, as a programmer, to implement it. It seems like an attempt at perfection, which is definitely at least two or three steps ahead of where we are now.

Okay. So let's say that model (even though it's called version 0.4) is our ultimate goal.

Then what would you start doing NOW to start to move towards that goal. Continuing to discuss and refine the model will not produce anything tangible. We must now do something tangible and get some specs on paper.

We've got the support of SourceTemplates.org and they are patiently waiting for us to do something. We'll lose another supporter if we make them wait forever for this.

I'm proposing the first step, and it should be a non-controversial step, because it will be an initial formalization of a small part of my thoughts, Tom's DeadEnds model, Geir's 0.4 model and the SourceTemplates.org model. It's something we all need. It also addresses first the Citation Templates which you are most interested in.

With RootsTech coming, this is our one chance to show the genealogy world that BG is serious and has started something. This would probably attract more volunteers and support and bode well for BG's future.

Louis
GeneJ 2011-12-18T09:58:55-08:00
@ Louis,

Actually, we'd planned for the folks managing SourceTemplates to comment on the wiki about our 0.4 work. (See the Developers Meeting notes https://bettergedcom.wikispaces.com/DevelopersMeetingNotes+2011-11-21+%28Nov+21%29 )

I'm still hopeful Dovy and others will comment. It would be nice to have a full team.

Based on the effort to date, it's fair to say that much work will be necessary to support any international proposal on this topic. Without an overall project plan, we'll likely have our own version of "brick walls" and it will be more work (much more for those engaged in the continuing research).

I agree that the Zotero work gives us a "leg up" (especially because Geir has BG Core in there). I'll go further and say that traditional published sources would be a good place to organize an initial effort around an agreed-upon proposal.

I'm working on some "DTO" documents. Will try to comment more on the page attached to Geir's proposal. --GJ
Andy_Hatchett 2011-12-18T10:40:40-08:00
Hi Russ,

No, I'm not affirming that BetterGEDCOM has failed. What I am saying is that if we don't make some commitment soon and get something on paper- however small a beginning it may be- that we will be perceived by the genealogical community to have failed.

Discussions are fine but they can't continue on and on without bearing any actual fruit, and by that I mean something that shows what we are actually doing vs. showing research on what we hope to do.

The research is important but only if it leads to actual product - and so far, in 14 months, it hasn't.

I do agree with Tom that our window is closing and that something needs to be put before the public in the relatively near future.

I hope that clarifies where I'm coming from.

Andy
ttwetmore 2011-12-18T11:10:07-08:00
Regardless of notices of its demise, I too did not state that BG is a failure. Though I did say that if it takes years for BG to sort out source types and their citation elements, as Geir intimated it might, it will be a failure.

It seems to me that the problem with BG moving ahead is there is no means to come to decisions in the midst of disagreement. Consensus seems impossible in the current organization. I will go out on a limb and say that if this issue is not resolved soon, BG will be a failure.

There is a large elephant in the room.

I will not be attending the meeting on Monday as I'll be tending to an aged parent. I have written a short introduction to the DeadEnds approach to sources and citations that I will try to finalize this evening.
louiskessler 2011-12-18T11:25:22-08:00

GeneJ:

Geir's 0.4 model has nothing specific in it for Internationalization (other than a few country codes here and there). It is primarily a Sources and Citation data model.

It is to me just another alternative data model for Source and Citations, and a very complicated one at that.

I can tell you what Dovy would say. He'd say it looks very much like his model at: http://sourcetemplates.org/SourceTemplates.jpg except that all the terminology is different and his model is simpler. He would ask why is all the extra complication needed?

A model for data is different than a standard for transferring data. Dovy has his model he is using. He is not asking us to redo that. He is asking us to make up the citation templates he needs.

Louis
GeneJ 2011-12-18T15:14:31-08:00
If we have questions about Geir's model, including how it compares to Dovy's model, might we attach them to the page, "A data model for sources and citations?"

It would be good to get some discussion going there. --GJ
GeneJ 2011-12-18T15:19:38-08:00
Sorry.
The link to the current work page, "A data model for sources and citations."
http://bettergedcom.wikispaces.com/A+Data+Model+for+Sources+and+Citations
gthorud 2011-12-18T18:38:55-08:00
I have stated that it will take months and possibly years to COMPLETE a standard for master source types (MSTs) and citation elements that will cover 80% of the most used MSTs around the world. Keep in mind that what I commented on was to have this done before Rootstech. I have not said that I am against starting work on it, and I have not said that everything has to be finished before something is published.

In the last Developers meeting I also stated that we should do such work for a limited number of sources, starting e.g. with censuses, but I want to capture my view on the discussion about my model so we for once can try to work in a structured way.

Reference has been made to SourceTemplates. Please note that, at least in my mind, my model is closer to what SourceTemplates proposed, than the alternative presented by Tom is. Also, note that what SourceTemplates wants is a solution for the current major programs, that means among other things a solution for Evidence Explained.

I agree with Louis that initial work will be beneficial independent of the data model, but there are dependencies on the data model, especially since Tom's model does not have Data Types - that will have to be sorted out. And you also have the possibility of multilevel sources where there is no precedence wrt citation styles/guides.

And, there is other work going on in BG wrt DTO - it will require a lot of work. And there is something called Christmas, and at least I have certain other things to attend to that will take a lot of time at least one week into the new year.

The fact that someone thinks the window is closing, does not make the job that should be done smaller.
Dovy 2011-12-18T22:24:45-08:00
BG:
I represent the company behind AncestorSync. We are committed to releasing something that works best for the community. We created our model from a combination of what's already in the industry. Our goal is to keep SourceTemplates extremely simple, but flexible enough to work for all applications.

We exposed this model in early October and really didn't get any negative comments for nearly 3 weeks. Granted I do not actively visit the wiki or message boards. At that time we began re-working the Legacy templates to work for SourceTemplates on our own. We asked for interaction, but with leadership changes and the hint of a model forthcoming we did not receive as much as we had hoped. It wasn't until November 21st that we had a chance to see a completed model from Geir. It has been 1 month since, but given the holiday season (as well as our company's priorities), we have not had time to adequately examine Geir's model. For this I personally apologize. I must keep my company focused so that we can persist in this industry.

Correct me if I am wrong but the only plans for BG, as per various developer meetings, is to have a document prepared to submit to the government for a non-profit status by RootsTech. Our CEO Earl Mott even gave some specific help in this area.

Given this knowledge I have instructed my team to spend all time from now until RootsTech in the development of AncestorSync to production. We will continue SourceTemplates as we can, but our company's future is dependent on AncestorSync. We are still planning to work and release whatever we come up with to fit our growing genealogy formats.

I appreciate the work of the various parties. I hope we can all adopt the mentality of "who cares" in reference to which model is used and instead just develop something that works. That's what I want.

On a personal note, I am not interested in a few select members discussing model problems as much as I am BG as a whole deciding on a better solution than we have proposed. As pointed out, if the model is not simple enough it will not be supported by developers. I think we can find a healthy middle ground.

I am very confident that we can work together towards something great if the structure of leadership in BG becomes solidified and if decisions begin to lead to concrete consensus not further discussion.
GeneJ 2011-12-18T22:55:59-08:00
Hi Dovy,

Thank you for the update.

Hope the Holiday spirits are being good to you and yours. --GJ
Andy_Hatchett 2011-12-18T07:37:34-08:00
In the developers meetings, various vendors have indicated in so many words that they simply don't have the time to wait around for a standard to be developed. If their users are wanting something and they feel they can provide it then they are going to - if a standard is available they will use it, and if not they will solve it their own way, but they won't wait to do it.
hrworth 2011-12-18T08:56:41-08:00
Andy,

So what does your comment have to say about this project? It sounds like you are affirming Tom's "Better Gedcom has failed".

I think we have already seen "they will solve it their own way" as far as sharing of our research.

What is the "next step"? Continue as you have, face reality, or find a solution?

Russ
ttwetmore 2011-12-19T02:00:40-08:00
Sources in DeadEnds
I have written a short intro to the DeadEnds approach to sources and citations. It is available here:

http://bartonstreet.com/deadends/DeadEndsSources.pdf

I would have written more, but I have to go take care of my father for a few days and will have little access to the web and no access to my own work.
ACProctor 2011-12-19T07:45:11-08:00
Thanks Tom.

I think my biggest comment has to be the use of UUIDs to identify records. I understand about them being designed to be spatially and temporally unique (i.e. not clashing with ones generated elsewhere in the world, and at any time) but they are amorphous and convey nothing on their own.

On the surface of it, it sounds like a matter of personal preference whether one uses UUIDs or some local symbolic name generated from the corresponding Person or Place name (or whatever) since they would both work. In the latter case, uniqueness can still be guaranteed by interpreting those names as local to the current "dataset", and prepending the dataset name where necessary (e.g. dataset:localid). The latter is my preference and I've based my own work on this scheme. In effect, it references the entities through a hierarchical namespace.

I think one small advantage of using a name with some visible semantics is when diagnosing a problem such as a broken link. It's far easier if the data is just about humanly readable.
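
A minimal sketch of the dataset-scoped naming scheme described above, set beside a UUID for comparison. The function names, the dataset names and the local id are hypothetical and chosen only for illustration; nothing here comes from a BG or DeadEnds specification.

```python
from __future__ import annotations

import uuid


def qualified_id(dataset: str, local_id: str) -> str:
    """Build a dataset-scoped identifier of the form 'dataset:localid'."""
    if ":" in dataset or ":" in local_id:
        raise ValueError("':' is reserved as the namespace separator")
    return f"{dataset}:{local_id}"


def split_id(qualified: str) -> tuple[str, str]:
    """Split 'dataset:localid' back into its two parts."""
    dataset, _, local_id = qualified.partition(":")
    if not dataset or not local_id:
        raise ValueError(f"not a qualified id: {qualified!r}")
    return dataset, local_id


# The same local name in two (hypothetical) datasets stays distinct once qualified.
a = qualified_id("smithtree", "JohnSmith1820")
b = qualified_id("jonestree", "JohnSmith1820")

# A UUID is unique without any scoping, but conveys nothing about the record.
opaque = str(uuid.uuid4())
```

Either form works as a key; the readable one is simply easier to follow when chasing a broken link by eye.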

Tony
testuser42 2011-12-19T14:51:26-08:00
I like it. Clear and clean and very flexible.

We should also show where the elements of a complex citation will be found in this model. GeneJ has posted a lot of these citations, e.g. here About Citations and here Citation Graphics. GeneJ, could you pick one or two from these and let's try to see how they'll fit? It might be necessary to identify or explain some or all parts/elements of your citation first (I've never seen a book about citations, I'm baffled how complex they can get...)
GeneJ 2011-12-19T16:00:16-08:00
Hi there testuser!

In the meeting this morning, we talked about setting up some examples to profile different model features. Some focus right now on getting a draft of the organizational documents out. (Having a bit of egg nog, too.)

Other than the source documents and related style requirements, I haven't had time to think about how to incorporate application requirements of the larger and smaller source systems.

Geir mentioned using a census in one of his recent postings. The EE and Register style differ a bit on citations for census, another reason to consider census. They are both based on CMOS, which gives us another reference.

We might use Asa Thomas' 1880 census, since some application work has been done with that.
http://bettergedcom.wikispaces.com/Citation+Graphics#112%201880%20US%20Census
http://bettergedcom.wikispaces.com/Software+Citations

Other than the source documents and related style requirements, I haven't had time to think about how to incorporate more application requirements (ala, larger/larger and larger/smaller source systems). The gang may have other ideas on how to represent those existing requirements. Thoughts welcome. --GJ

P.S. Only somewhat along the same lines, I've been noodling about posts by Tom and Louis here.
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48343192#48347520
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48343192#48357670
That concept probably needs an example "with moving parts." When we worked on the model for E&C, you might recall the different working file requirements and effect of "alt" pfacts (multiple record-based entries for the same event). We need to solve the need for an abstract/extract in the source data when the various different record data became/rolled up (choose your term) into a conclusion tag/pfact. Louis and I touched on the circumstance in our comments to his blog post, "Ordering Events by Date." http://www.beholdgenealogy.com/blog/?p=883 (BTW, read Robert B's comment about events that only occur once.) We might be able to use the various references to Asa Thomas' birth from "Citation Graphics" as an example.
GeneJ 2011-12-19T16:08:20-08:00
Err... in above post, "We had to address the need for an abstract/extract in the source data when the various different record-data entries became/rolled up (choose your term) into a conclusion tag/pfact."
ttwetmore 2011-12-20T04:40:55-08:00
@Tony,

I don't feel a need for record id's to convey meaning, as I view them as strictly "internal" values. I wouldn't want to have to create an id by hand for every person, though I guess I could conceive of an algorithm that would try to make something meaningful from pieces of names and dates. But imagining a database of ten thousand individuals, for example, how meaningful could an id really be?

The GEDCOM standard has the REFN tag, which allows you to give a person a user key. I take advantage of this on a very few key persons so I can get right to them with a couple keystrokes, and then navigate from them to find just about anyone quickly. Since DeadEnds allows pretty much arbitrary attributes, the REFN concept is supported.

In many databases the records have many id values for all kinds of purposes. I try to avoid a plethora of ids in my file formats.
ttwetmore 2011-12-20T04:46:00-08:00
@Testuser,

I agree that there should be a section in the DeadEnds specs that defines all the source type tags and all the "source attributes" that are appropriate for each one. Also which "source attributes" are appropriate in the source references that refer to each type of source type. This is what I have called tasks 3 and 4, but here as applied to DeadEnds rather than to Better GEDCOM.

I have chosen to call the attributes "source attributes" rather than "citation elements" because there will be source attributes that are not citation elements. For example, attributes for things like primary or secondary, direct or indirect, surety and believability, and so on, for each item of evidence.
AdrianB38 2011-12-20T09:51:46-08:00
"there will be source attributes that are not citation elements"
Another example might be the URL of a site. As I recollect, someone (possibly even ESM) pointed out that if you'd got FamilySearch down as (say) the publisher of a source record on a web-site, it is over-egging the pudding to go on and say in a printed citation that the URL is www.familysearch.org. On the other hand, you might very well want to keep the URL internally for your own purposes.

There will also be source attributes that some people will use as citation elements while others, with a "thinner" scheme, will not use - even for the same source record. An example of that might be the use of an enumeration district (ED) for a census source record. They seem to be quite prevalent outside the UK, so one _might_ want to capture them as source attributes. However, English & Welsh citations for censuses don't generally include the ED as we substitute the (UK) National Archives document reference instead. (In fact, our English & Welsh citations for censuses have little relation in style terms to (say) US censuses and are much more like a generic (UK) National Archives archival document instead.)

As a result, someone in the US might collect English census records with the ED and might generate printed citations with the ED in. Especially when you consider those Brits can't make their minds up as the Scots references _do_ include EDs. But if you send me your source records in a BG file with EDs throughout, I might want to tweak the printouts to exclude EDs for English censuses and the safest way is to alter the type so that the ED data is retained, just not printed.
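
As a rough illustration of "retained, just not printed", the sketch below keeps the ED as a source attribute and lets each output style decide whether to print it. The attribute names, style names and the archive reference are made-up placeholders, not real census references or anything defined by BG.

```python
# Hypothetical source attributes for one English census record.
record = {
    "archive": "The National Archives",
    "reference": "XX 00/0000",  # placeholder, not a real document reference
    "folio": "12",
    "page": "3",
    "enumeration_district": "5",
}

# Each (hypothetical) style lists the attributes it prints, in order.
STYLES = {
    "with_ed": ["archive", "reference", "enumeration_district", "folio", "page"],
    "without_ed": ["archive", "reference", "folio", "page"],
}

LABELS = {"enumeration_district": "ED ", "folio": "folio ", "page": "page "}


def render(record: dict, style: str) -> str:
    """Join the attributes a style selects; unselected data stays in the record."""
    parts = []
    for key in STYLES[style]:
        if key in record:
            parts.append(LABELS.get(key, "") + record[key])
    return ", ".join(parts)


print(render(record, "with_ed"))
print(render(record, "without_ed"))
```

Switching style changes only the printout; the record itself still carries the ED for anyone who wants it.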

(I think ESM advocates using the style for the country originating the source but I'm not sure I really see the point of forcing everyone to learn those styles for only a few records. Especially if your style is your comfort blanket.)
ACProctor 2011-12-20T10:04:08-08:00
It's a good point that some data in a citation reference is optional, or rather 'superfluous', since the need is merely to uniquely identify the source and enable it to be re-examined if necessary.

Given that census returns in different countries (& even different years) cannot be treated in the same way, I don't see why a UK/Wales one cannot specify only the NA census reference and leave the ED as a mere property.

Incidentally, the UK 1911 census does have the ED in the full reference :-)
AdrianB38 2011-12-20T15:02:54-08:00
"Incidentally, the UK 1911 census does have the ED in the full reference :-)"

Yeah - but that's only because FindMyPast concocted a reference that included everything apart from the kitchen sink! I'm following previous formats and omitting it because it's not necessary.
ttwetmore 2011-12-27T05:08:43-08:00
Report Writing
How should Better GEDCOM support report writing? How should Better GEDCOM support footnote generation?

I ask these questions because of the complexity of many of GeneJ’s “citation” examples, with “citation” in quotes because many of her “citations” read like free-format footnotes rather than standard bibliographic entries.

Where is the boundary between what a genealogical program should be able to produce as an automated, boiler plate report, and what a genealogical professional must be able to produce as a publishable document or a report for a client? Should a genealogical program be able to automatically generate a professional quality document, suitable for publishing, and if this is true, should the Better GEDCOM model support this capability?

If a genealogical program should be able to generate professional quality reports, does that mean that genealogists must provide much of the report as “notes” that are added to the program’s database, or does that mean the structure of the database must be complex enough that the program can gather together information and then “compose” a professional quality report? As far as Better GEDCOM is concerned, should a professional quality report be “transferrable” as Better GEDCOM data, so that reports can be exported by one program, imported by another, and generated with identical formats by the importing program? Or should the ability to generate professional quality reports be a “value-added” feature that distinguishes one genealogical program from the others, with no support from Better GEDCOM?

Should a genealogist be able to use their genealogical software as their word processor, or should they use a standard word processor and compose their reports using information extracted from their genealogical databases?

There seems to be quite a gap between simple citations that can be defined by templates, and the long, informative reference notes that GeneJ provides as examples. Looking at GeneJ’s examples it seems that she expects that her genealogical program, and by extension Better GEDCOM, be able to produce publishable quality footnotes. Does this extend to her wanting her genealogical program, and by extension Better GEDCOM, to be able to produce final, professional reports?

I can see how to allow Better GEDCOM to support high-quality footnotes of the type GeneJ uses by allowing users to add free-format text fields to their citation elements. I’m not sure how to extend this into the realm of allowing Better GEDCOM to support the generation of professional reports.

Any opinions on these questions?
ACProctor 2011-12-28T16:30:39-08:00
Re: "Those who claim that data and presentation should be kept separate, and want to prevent the transfer of presentation templates, either has not understood the complexity of citations, are trying to sweep the problems under the carpet, or ignore important user requirements"

I'm one of the people being cited here but I strongly disagree with the assessment. I have requested Gene to "take me to task" and present me with some specific examples. I will then try and illustrate the principle and how it might work from end-to-end.

Tony
ttwetmore 2011-12-28T16:33:17-08:00
Geir says,

I have no problem with transferring the templates outside the BG file, but we still have to define how it shall look like - the template language, and the linking to the source type. And in my view, there is an exception, the overruling templates - they should go in the BG file.

I believe that BG should have no direct concern with the template language. Templates obviously need to know about BG source types and citation element types, but BG needs to know absolutely nothing about templates. BG data should never contain or transfer templates of any kind.

I am not against the idea of a standard template language that many vendors could use. BG people might even be interested in participating in developing such a language, and their joint knowledge of the BG model could be nothing but a help in working on that project. But templates are part of the presentation layer, fully disjoint from the data layer.
ACProctor 2011-12-29T04:09:52-08:00
Again, I'm in total agreement with someone - Things are looking up :-)

The definition of a template language and its acceptance as a standard is - as Tom says - separate from BG. It could be done as a completely separate project and hooked into any number of other data models.

Whether we're a part of that design process is a matter for everyone to decide on. I personally believe we should work closely with the likes of Zotero since CSL fits the bill nicely. Also, it cements our position as being involved in genealogy standards in general.

For the task I've just accepted, to illustrate how such a scheme could handle Gene's citations, I will pluck a language out of thin air that will hopefully be accessible to everyone. The overall concepts are not that difficult but I hope that by "joining the dots" it will be better understood.

Tony
ttwetmore 2011-12-29T06:14:45-08:00
Geir says,

Those who claim that data and presentation should be kept separate, and want to prevent the transfer of presentation templates, either has not understood the complexity of citations, are trying to sweep the problems under the carpet, or ignore important user requirements.

Geir, that is absolute, total and complete rubbish.
ttwetmore 2011-12-29T06:56:54-08:00
I wrote this in response to Geir's question as to why I started this thread on this page:

I posted to this page because I was concerned how to fit GeneJ’s reference notes into a consistent BG model, and what the implications were. GeneJ has always presented her reference notes in the form of how they would appear in a final report, and never as how they would be encoded in Better Gedcom data or dealt with in a consistent Better Gedcom model.

For Geir's further elucidation I feel I should expand a bit. Because GeneJ has always presented her reference notes in the form that they would appear in on a final report, I have to wonder whether she believes that her genealogical program should produce those report-format citations, which leads me to wonder whether she believes her genealogical program should be able to produce her final reports, which then leads me to wonder whether she believes that Better GEDCOM should provide the narrative capabilities necessary for her genealogical program to produce her final reports. I was hoping to open up this topic for discussion, because I think we are now starting to go crazy in the citation area. That GeneJ finds this all irrelevant is rather telling, though not surprising.

I think it would be ironic if Better GEDCOM gave itself the goal of supporting report quality citations (that could not be generated by simple templates), if it did not address the problem of generating professional quality reports. And since I am an obvious adherent to the separation of data layer from presentation layer, for me the irony would border on stupidity.

Am I the only one who believes that this long, and apparently no-end-in-sight tangent that we are on to deal with sources and citations, which seems based on keeping GeneJ assured that BG will be able to support her reference notes, is one more over-the-top experiment in complexity, that seems to me to have no common sense being applied to it? All we have to do to get GeneJ the capabilities that she needs, in converting citations to nice reference notes, is to allow BG source records and source citations to have free-format notes. Can't we just say this and get on with something interesting, like how to properly support evidence in our model? If we have a simple solution, if most of us agree with that simple solution, at what point do we declare that it has been properly discussed, and we make the decision to proceed?
AdrianB38 2011-12-29T10:02:33-08:00
Geir says "Those who claim that data and presentation should be kept separate, and want to prevent the transfer of presentation templates, either has not understood the complexity of citations, are trying to sweep the problems under the carpet, or ignore important user requirements."

Like others, I'm afraid I need to see specific examples. The principle of separation of data and the presentation of it, is so deeply enshrined in IT thought, that deliberately breaking it would lose BG any credibility among those genealogy software suppliers who actually understand IT.

Now, "separation of data and its presentation" is one of those principles that doesn't bear too much investigation if you want to apply mathematical rigour. How many CSS-mavens use <em> tags in their HTML and how many set up specific CSS schemes to do exactly the same thing but with a lot more code? Quite... If it's a one-liner then to heck with separation (as Tony suggested earlier). But to break that principle for things that ARE complex needs major justification with examples.

I actually have concerns whether we are even talking about the same thing. The sort of templates that I am dismissing as outside the scope of BG are those that produce the OUTPUT citation in the form of a reference foot/end-note, bibliography entry, whatever. These are exclusively concerned with output presentation and if we are to take those within the scope of BG (Geir suggests the overruling templates should be within the BG file, by which I interpret him to mean within the BG "language" and standard) then we will be forever stuck in trying to decide whether ESM formats or Chicago formats apply to a book and whether the default date format should be dd/mmm/yyyy or mmm/dd/yyyy etc. And that's just for simple stuff.

HOWEVER - I am worried that confusion has arisen because what little I've seen of software that uses "citation" templates, uses the templates for TWO functions - definition of what data is to be collected and definition of how that data is to be presented (on screen or in a report).

The use of a template to define how that data is to be presented is, guess what, outside the scope of BG in my view.

The use of a "template" (or whatever you want to call it) to define what items are to be collected for a type of source record is about defining data and is so very much within the scope of BG. For the avoidance of doubt about my thoughts, such a "template" should simply be a list (possibly structured with subordinate items) with no relation to how the stuff should be printed.

So - is anyone under the impression that rejection of a template within BG defining how "citation" data is to be presented on a report or screen, also means that BG is also rejecting use of a "template" to define what data items are to be collected for a type of source record??? Because I certainly reject the first (presentation) from BG but definitely accept the 2nd (data definition).
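
A small sketch of the second kind of "template", the data definition: a structured list of the items to collect for a source type, with no print layout in it. The element names and the `unknown_elements` helper are hypothetical and are not taken from any BG draft.

```python
# Hypothetical definition for a "book" source type: element names only,
# with subordinate items nested, and nothing about how they are printed.
BOOK_DEFINITION = {
    "title": {},
    "author": {"given_name": {}, "surname": {}},
    "publication": {"place": {}, "publisher": {}, "year": {}},
}


def unknown_elements(data: dict, definition: dict, prefix: str = "") -> list:
    """Return dotted paths of items in `data` that the definition does not list."""
    problems = []
    for key, value in data.items():
        path = prefix + key
        if key not in definition:
            problems.append(path)
        elif isinstance(value, dict):
            problems.extend(unknown_elements(value, definition[key], path + "."))
    return problems


book = {
    "title": "An Example Title",
    "author": {"given_name": "A.", "surname": "Writer"},
    "publication": {"year": "1900", "font": "italic"},  # "font" is presentation
}

# Reports the one item that the data definition does not cover.
print(unknown_elements(book, BOOK_DEFINITION))
```

How the collected items are then laid out in a footnote or bibliography entry would be the job of a separate, presentation-only template.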
AdrianB38 2011-12-29T10:11:26-08:00
Tom re "All we have to do to get GeneJ the capabilities that she needs, in converting citations to nice reference notes, is to allow BG source records and source citations to have free-format notes."

I think this is a part of the solution - the notes could be (perhaps) defined as those appearing before the actual pure source-identification bit and those appearing after. I suspect though that there are more items to consider - but apart from multi-reference citations, nothing of dramatic complexity. Unfortunately, because of the limitations of my own software I can only give instances from real history books written by word-processors so can't talk myself with authority on the sort of constructs created by TMG, Genbox, etc.

In my view you were very right to raise the topic - unfortunately we seem short on specifics for you.
hrworth 2011-12-29T10:59:57-08:00
Question:

Isn't a "Reference Note" a combination of Source data and citation data?

In the program I use, the Reference Note includes both Source Information and Citation Information, AND I have the option of including another field from my Citation data.

Russ
ttwetmore 2011-12-29T11:31:09-08:00
Russ,

One of the issues I see in straightening out this area is how overloaded the term "citation" is. You use the term "citation data," and you know exactly what you mean by it, but I don't. I can guess that you mean things like title, author, page number, and so on, but you might also mean free-format additions like a summary of the evidence, a statement about conflicts between the evidence and other evidence, and so on.

In my opinion, a citation is a string of text that only locates evidence within a source, and a reference note is something that includes more than a citation. However, there do seem to be some more extended definitions of citation that would include more than just an "evidence locator." I guess BG can feel free to define a citation in our own way.

One source of confusion might be that some of us think of a citation and a reference note as being synonymous, while others of us think of citations as just evidence locators, and reference notes as more general thingies.

I think of the purpose of a source record and the reference to it as providing all the "evidence locator information" in citation elements. But I think that we can easily extend this "pure citation" idea into a more general "reference note" by allowing free format notes to be added in the source reference. I believe this is what GeneJ does with TMG now, and I think it gets us the same power that TMG provides.
ttwetmore 2011-12-29T12:00:53-08:00
Adrian,

I agree with all your points.

I would say that the template for generating output, and the template for collecting citation elements in the user interface, though they are closely related, are not the same thing, so that's how I would avoid the slight dilemma in your final question.
hrworth 2011-12-29T12:23:28-08:00
Tom,

You only said Citation, but I also had Source, that make up what I see, in the program that I use, the Reference Note.

What am I looking at, like a Book would be the Source.

Where in the book, or where in a record, did I find that information that I am Citing.

The formatting or the way this is transported is up to you all, but in the Application I see.

< Source Data > < Citation Data > is presented in the Reference Note.

You said:

"In my opinion, a citation is a string of text that only locates evidence within a source

Agree with that.

and a reference note is something that includes more than a citation"

I am just adding, again from within the application that I use, the Reference Note also includes some Source information. (Book, Author), as well.

Or, several strings of data elements, a source string, a citation string would make up the Reference Note. If the Data Element strings are understood and defined, which you ALL have been working on for over a year, then the Application could Present the Reference Note, if it so chooses, to the end user.

To take this one step further, the Data Element Strings (Citation as you defined it) could then be presented by that Application however it sees fit, like Footnotes, or EndNotes.

I think that what I am trying to say, is the Reference Note might belong to the Application for presentation (or not) to the end user. The Data Elements are most important for BetterGEDCOM.

Only one end user's opinion.

Thanks for listening.

Russ
ttwetmore 2011-12-29T15:22:23-08:00
Russ,

Thanks. I agree with what you are saying.

For me a source record has the elements that describe the source as a whole (title, author).

Then a source reference, which refers to a source record, contains additional elements that locate specific evidence within the source.

Then extra notes could be added to the source reference to describe other aspects of the evidence, the things that I think we are saying would belong in a reference note, like a summary of the evidence, or a comment on the quality of the evidence, or a comment on the conclusions we feel we can derive from the evidence. That is, anything we might want to add that we feel needs to be understood before the evidence can be fully understood. I think that this handles most of GeneJ's needs.

A citation string could be generated by the location information in a source reference combined with the source elements in the source record.

A reference note could be generated from the same elements in the source reference and the source record, with the addition of the extra notes added to the source reference.
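
The arrangement above can be sketched roughly as follows. The class and function names, the output layout and the sample book are all hypothetical; this is one possible reading of the description, not the DeadEnds or BG model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceRecord:
    """Elements that describe the source as a whole."""
    title: str
    author: str


@dataclass
class SourceReference:
    """Points at a source record and locates the evidence within it."""
    source: SourceRecord
    locator: str                                     # e.g. a page number
    notes: List[str] = field(default_factory=list)   # free-format extras


def citation_string(ref: SourceReference) -> str:
    """Source elements plus the locator: a pure 'evidence locator'."""
    return f"{ref.source.author}, {ref.source.title}, {ref.locator}."


def reference_note(ref: SourceReference) -> str:
    """The same citation with the free-format notes appended."""
    return " ".join([citation_string(ref), *ref.notes])


book = SourceRecord(title="An Example History", author="A. Writer")
ref = SourceReference(book, "p. 42", ["The entry gives a birth year only."])

print(citation_string(ref))
print(reference_note(ref))
```

Both outputs draw on the same stored elements; the reference note differs only in that it also pulls in the notes attached to the source reference.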

I believe that this is just my way of saying the same thing that you are saying.
ACProctor 2011-12-28T10:29:43-08:00
I entirely agree with Andy. BG should be a computer-readable representation that is used for exchange and long-term storage of genealogical and family-history data.

All presentation of data to users - whether on-screen, in a paper report, or in a chart - should be under the control of individual software products. Reports should not be exchanged via BG. User-readable data and computer-readable data are separate things.

As for citation references, the stored electronic format should only contain the data to enable a report-writer module (possibly using some library layer) to construct an in-text reference, a reference note, or a bibliography entry, as appropriate. These concepts themselves should have no representation inside BG.

Tony
ttwetmore 2011-12-28T10:53:17-08:00
There seems to be clear agreement that the Better GEDCOM format is for holding data only, and that professional reports are either the function of a richly featured genealogical program, or, more prosaically, of a word processor.

I specifically brought up the subject of GeneJ's reference notes, because they seem to exist at a kind of intersection point between data and presentation. Her reference notes are good examples of rich statements that seem (to me) to be useful, as is, in a professional report.

Because of that I was wondering exactly what is the relationship between GeneJ's rich reference notes, and the more usual standard citation or bibliographic entry that follows some standard form to just describe a source.

I think it would be possible to get those rich reference notes handled in BG simply by adding note fields to source records and source references. In fact, source templates could be used, by allowing them to specify where in the template the extra notes added by the user should appear in the presentation. Recalling the fact that the whole purpose of templates is to "convert" data to presentation this seems the right way to do it.

Likewise, automatically generated reports are the outputs from a more complex type of template. For example, in my LifeLines program I have templates for Register reports, Henry reports, all kinds of ancestor and descendent charts, and so on. Each of these is created by a report template that LifeLines interprets against the GEDCOM data in the database. In the Register report, the template defines how to convert DATE and PLAC values into sentences, it describes where the values of NOTE lines should be inserted in the report output, and so on. The templates can be used to generate pure text, or postscript, or PDF, or TeX, or gruff, or RTF, or CSV files, or whatever format for converting into a typeset or word processor file.

So I end with a question. Are GeneJ's example reference notes data or are they presentation?
ttwetmore 2011-12-28T10:58:28-08:00
Sorry, the automatic spell corrector built-in on my updated Mac Lion system converted groff to gruff.
ACProctor 2011-12-28T11:07:46-08:00
Not quite sure I understand which way you're thinking Tom.

I firmly believe that BG should have no presentation information in it at all. That means no fonts, italics, bold, colo(u)rs. However, there would be a place for the generic XHTML concepts such as <em> and <strong>.

Gene's case is probably not that unusual. If there is a source-type definition for each of her distinct referenced sources, and those definitions had corresponding elements for each item of data in her references, then her requirements would be entirely catered for by a private set of formatting templates that generated the exact reference notes from the stored electronic form.

I'm sorry for banging on about this but I believe the enumeration of all possible source-types, and the enumeration of all associated elements, and the presentation styles for the user-readable references, are outside the scope of BG as a data standard. We should focus on a generic electronic representation and indicate (through POCs or documentation) how that gets into a report. That may require other associated standards to be defined for it but maybe that's something we need to swallow now rather than later. There are distinct parts of the problem we're trying to solve and BG is just one precise part of it.

Tony
GeneJ 2011-12-28T14:02:25-08:00
I wish you all well as you move toward being, err.... irrelevant.
ACProctor 2011-12-28T14:10:52-08:00
Eh? Sorry Gene, Zen was not something I studied :-)

Tony
AdrianB38 2011-12-28T14:30:43-08:00
Let me try and summarise in perhaps slightly more IT-speak where we're at.
1. Most of us seem to agree that BG should be about transporting data content and not about transporting the presentation of that data (analogy - BG equates to HTML content, something else equates to CSS and that something else is not part of BG);
2. In order to achieve a desired presentation of the transferred data, the data needs to be marked up in some fashion.
3. The question therefore has to be - do we have mark-up available for all the bits of data that we might want to print? I think this is the nub of what Tom's asking.

I suggest the answer to q3 is - no, we don't.

If we had, then we could use templates (outside the BG standard, but possibly, I suggest, transferred as part of the BG package as an attached file) to define the presentation of said data. Exactly as Tom suggests.

What items - or mark-up for them - might we be missing?

Let me step right outside genealogy, then we can avoid any arguments about whether the format is correct or not. I've just opened a history book on the Brus family 1100-1295 to review what footnotes look like in a professional historian's book. Some things that spring out at me are:
- each footnote (and these are footnotes not end-notes) can refer to a number of sources for the fact in question. On the page I'm looking at, 6 footnotes refer to a total of 10 different source documents;
- a footnote can contain text that expands on the meaning of the fact being footnoted, as well as the reference to the source justifying the fact;
- some of the source references are prefixed by words like "See also" suggesting relevance but not necessarily straight justification;
- explanatory or expansionary notes in the footnote can appear before or after the reference to the source;
- some of the footnotes contain no source references at all but simply expand on the text above - e.g. text explaining measures of area such as carucates, hides, bovates and acres;

How many of these forms can be marked up in GEDCOM now?
- "each footnote can refer to a number of sources" - no - a source reference, at least in my software, generates a separate footnote. We cannot, in GEDCOM, collect source references into one footnote. I'm entirely unclear whether we should - to answer Tom's question we need to consider that;
- "a footnote can contain text that expands ... as well as the reference to the source justifying the fact". As the GEDCOM structure allows a source-reference to be accompanied by a note, it seems on first glance that we can do this. Except that - at least in my software - if I switch on printing of the accompanying notes, then all such notes get printed, whereas some of them are designed for printing, some are not. Tom referred to different sorts of notes a while ago. What we clearly(?) therefore need is the ability to mark up notes by type and designate some note-types for printing and some not. The mark-up aspect is clearly within the scope of BG and is not present in GEDCOM.
- "some of the source references are prefixed by words like "See also" ". Not currently part of GEDCOM as there is no means of qualifying the relevance of a source? Should there be? We need to know to answer Tom's question and that relevance is surely part of the mark-up and therefore in BG's scope;
- "notes in the footnote can appear before or after the reference to the source" - again, not currently possible in GEDCOM but (AFAIK) possible in native apps. If this is desired, we need to have at least 2 notes linked to any source reference held on the BG file, not just one. One to be marked up to precede the source reference in the footnote, one to follow;
- "some of the footnotes contain no source references at all but simply expand on the text above" Not currently possible in GEDCOM. We need to consider whether to allow the provision of such - that would result in the footnote text being marked up, without a source reference. That might well be done by adapting the current source reference structure to have an optional source.
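To make the list above concrete, here is a minimal sketch of how such marked-up notes might look as data. Everything in it is an assumption for illustration only: the names `Note`, `SourceReference`, `position`, `printable` and `relevance` are hypothetical and come from neither GEDCOM nor any BG draft, and the source title is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    text: str
    position: str = "after"   # hypothetical: "before" or "after" the citation
    printable: bool = True    # hypothetical: False keeps the note out of reports


@dataclass
class SourceReference:
    source_title: Optional[str] = None   # None models a footnote with no source
    relevance: str = ""                  # hypothetical qualifier, e.g. "See also"
    notes: List[Note] = field(default_factory=list)


def render_footnote(ref: SourceReference) -> str:
    """Build one footnote: printable notes before, the citation, notes after."""
    before = [n.text for n in ref.notes if n.printable and n.position == "before"]
    after = [n.text for n in ref.notes if n.printable and n.position == "after"]
    citation = []
    if ref.source_title is not None:
        citation.append(f"{ref.relevance} {ref.source_title}".strip())
    return " ".join(before + citation + after)


ref = SourceReference(
    source_title="Example Source",
    relevance="See also",
    notes=[
        Note("A carucate is a measure of area.", position="before"),
        Note("Private research reminder.", printable=False),
    ],
)
print(render_footnote(ref))
print(render_footnote(SourceReference(notes=[Note("A plain explanatory footnote.")])))
```

The data file carries only the mark-up (note type, position, relevance); how the footnote is finally laid out stays with the report-writer.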

I suggest that all of these - with the possible exception of the first - are clear candidates for appearing in the genealogical datafile, in a marked up form, and are therefore within the scope of BG. The first ("each footnote can refer to a number of sources") is something we have discussed elsewhere and I believe a lot of its purpose can be done by programming the report-writer part of the app - i.e. no need for mark-up. EXCEPT I'm unclear how multiple references with a single note phrase all in one footnote might work.

So - Tony - you mention "Gene's case is probably not that unusual. If there is a source-type definition for each of her distinct referenced sources, and those definitions had corresponding elements for each item of data in her references" - in essence, absolutely so. However, I think some of the missing elements that she needs are not just properties of sources but - like the 2 sets of notes marked up for printing or not - are either not quite in the existing GEDCOM and could be, or are in it but need to be marked up.

Tom - you suggest "it would be possible to get those rich reference notes handled in BG simply by adding note fields to source records and source references" - again, I totally agree that this is one part of the solution.

So, in my postings on this thread I have indicated several possibilities for recording the data in a marked up manner, which is definitely in the scope of BG. I also have one aspect that I'd rather not handle in BG! These are just my ideas for enriching the data model to produce enriched reports.
gthorud 2011-12-28T15:30:29-08:00
I do not understand why this topic has been posted on this page. Tom’s logic seems to be that if we can/should not do one thing, that is a reason for not doing another thing. I don’t see this dependency. I do not agree with that approach.

Generating professional looking reports (I take that to mean, try to get rid of the robot language) should not be considered outside the scope of BetterGEDCOM – at least not before it has been properly discussed, and there are some ideas around that could be discussed on a separate page – I have been playing with ideas that I guess are similar to what Tom is describing. The outcome of that discussion has no bearing on sources and citations. I would not be surprised if the author of Behold has a view on this.

Some are promoting the view that data and presentation should be kept separate, so BG should not have anything to do with presentation. I do not see why that should be a general principle that must apply in all cases. I think each case must be considered separately. And, talking generally, you don’t always have to make that choice – if you allow the presentation to be transferred, you could also allow a program to ignore it, or some aspects of presentation should be chosen by the importing program and others could be transferred. So, please let us discuss each case individually. Well, I choose to simply ignore such general arguments. Those who claim that data and presentation should be kept separate, and want to prevent the transfer of presentation templates, either have not understood the complexity of citations, are trying to sweep the problems under the carpet, or ignore important user requirements.

In my document “A Data model for sources and citations 0.4” there is a feature called “overruling templates” that use the current template functionality found in several programs. It allows the construction of a free form reference note containing references to Citation Elements transferred as separate data – just as any other template (although there should be limitations wrt what functions of the template language can be used). As I have stated in discussions about the document, this feature should only be used after bilateral agreement, so it is not a green light for all the variants in Gene’s citations – you cannot use this feature when transferring to anybody and expect them to output it, but you should be able to transfer them between your own programs and good friends.

The other feature I envisage is to have a Citation Prefix and a Citation Suffix as Citation Elements to be rendered before and after the citation template for the reference note (or explicitly mentioned in the template). The current NOTE in Source Citations in GEDCOM is by most programs rendered at the end of the reference note, so the Suffix is already there. This feature requires no bilateral agreement. One could, as it seems Tom is proposing, allow the Prefix and Suffix (or whatever you call them) to appear anywhere in the template – that is an option that could be explored in the process of standardizing the “80%” source types and templates, but I fear it will be a complicating factor – it is not easy to predict where these would occur in the standard template - remains to be seen.
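A minimal sketch of the Prefix/Suffix idea, under stated assumptions: the template syntax (Python `str.format` placeholders), the element names `prefix` and `suffix`, and the function `render_reference_note` are all hypothetical and are not taken from the data model document. The prefix and suffix are placed around the rendered template unless the template mentions them explicitly.

```python
def render_reference_note(template: str, elements: dict) -> str:
    """Render a citation template, wrapping it with optional prefix/suffix elements."""
    # Default the optional elements so a template may mention them safely.
    values = {"prefix": "", "suffix": "", **elements}
    body = template.format(**values)
    # Only add prefix/suffix automatically when the template does not place them itself.
    prefix = "" if "{prefix}" in template else values["prefix"]
    suffix = "" if "{suffix}" in template else values["suffix"]
    return " ".join(part for part in (prefix, body, suffix) if part)


elements = {
    "author": "A. Author",          # placeholder values, not real data
    "title": "Example Title",
    "year": "1900",
    "prefix": "See also",
    "suffix": "Pages are unnumbered.",
}
print(render_reference_note("{author}, {title} ({year}).", elements))
print(render_reference_note("{author}, {title} ({year}); {suffix}", elements))
```

An importing program that does not support the prefix could simply drop that element and still render the rest of the template.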

The ability to reference a citation in a NOTE, as mentioned by Adrian, should be included.

But, again, I disagree that we have to make a choice between data and presentation wrt sources and citations.

I will go on to read Adrian's last posting.
gthorud 2011-12-28T16:07:34-08:00
Adrian,

I have no problem with transferring the templates outside the BG file, but we still have to define what it shall look like - the template language, and the linking to the source type. And in my view, there is an exception, the overruling templates - they should go in the BG file.

Several programs can merge citations into one reference note, simply by concatenating them with semicolons between, but they are kept separate in the "data".

I have not seen a program that can reference several sources in a (non-merged) citation, but if you do all your reasoning in a reference note, it is not hard to envisage an argument based on several sources. If all it takes is to indicate that two citations should be concatenated - without a semicolon - maybe that is a solution - it is not far from the functionality in several programs.

I have no problem with a "non-printing" CE for comments.

About prefix (See also) ... see my last posting above.

A plain "Reference note" - there is a complicating case here, if I choose to have inline citations, I still want to have footnotes/endnotes for other purposes than citations. But, you could do it by having the program treat "footnote/endnotes referring no source" modeled as citations differently from other citations. Perhaps by having a "footnote/endnote" dummy source type.
gthorud 2011-12-28T16:13:20-08:00
See last paragraph - it should start "A plain footnote/endnote ..."
ACProctor 2011-12-28T16:23:09-08:00
Re: "I still want to have footnotes/endnotes for other purposes"

In my independent work, which I'll introduce next Monday, my Narrative text elements can have a Key (just like my top-level entities - Person, Place, etc) and these can be referenced from other Narrative text sections.

For a computer interface these would generate hyperlinks, but for a printed output they would generate some appropriate mode of reference note (e.g. footnote or in-text).

These Narrative text elements already allow embedded references to Persons, Places, Events, Citations, etc., by their Keys.

They also have attributes that differentiate evidence from conclusion, provide privacy controls, and surety assessments. This can all be used to provide a primitive but flexible E&C linkage to depict a sort of decision tree.

Tony
ttwetmore 2011-12-28T16:24:50-08:00
@Geir,

I do not understand why this topic has been posted on this page.

I posted to this page because I was concerned how to fit GeneJ’s reference notes into a consistent BG model, and what the implications were. GeneJ has always presented her reference notes in the form of how they would appear in a final report, and never as how they would be encoded in Better Gedcom data or dealt with in a consistent Better Gedcom model.

Tom’s logic seems to be that if we can/should not do one thing, that is a reason for not doing another thing. I don’t see this dependency. I do not agree with that approach.

What are the two things that you think that I think we should not do? I know that I don’t think that the Better Gedcom team should fly to the moon. So therefore I guess that means that I don’t think that BG data should include presentation information.

Generating professional looking reports (I take that to mean, try to get rid of the robot language) should not be considered outside the scope of BetterGEDCOM – at least not before it has been properly discussed, and there are some ideas around that could be discussed on a separate page

It seems to me that when you say “not properly discussed” it really means that you don't agree with it.

Some are promoting the view that data and presentation should be kept separate, so BG should not have anything to do with presentation. I do not see why that should be a general principle that must apply in all cases. I think each case must be considered separately. And, talking generally, you don’t always have to make that choice – if you allow the presentation to be transferred, you could also allow a program to ignore it, or some aspects of presentation should be chosen by the importing program and others could be transferred. So, please let us discuss each case individually. Well, I choose to simply ignore such general arguments. Those who claim that data and presentation should be kept separate, and want to prevent the transfer of presentation templates, either have not understood the complexity of citations, are trying to sweep the problems under the carpet, or ignore important user requirements.

How many years is it going to take to consider separately all those cases? Who is going to make a list of all those cases? How long are we going to argue those cases and what new cases should be on the list?
Andy_Hatchett 2011-12-27T05:46:50-08:00
Tom asks...
"Should a genealogist be able to use their genealogical software as their word processor, or should they use a standard word processor and compose their reports using information extracted from their genealogical databases?"

While I differ with ESM on a great many things, on this particular subject we are in full agreement. I don't believe that any genealogy program is capable of writing a truly professional report- nor should they be expected to.

I see BetterGEDCOM's role as transferring data. The assembling of that data into a professional report is best left to the individual genealogist and their word processor of choice.
ttwetmore 2011-12-27T06:30:37-08:00
@Andy,

I agree with you. When I am writing reports I have my word processor (pages) and my genealogical program (LifeLines) open at the same time, and I go back and forth between the two. Of course LifeLines can generate lots of boiler plate reports, but they are more like glorified notes than a real report. However, I could imagine, in the best of all worlds, a program that combines the features of both, but I don't know how realistic it is to hope for such a program.

Next question then. Are the citations and reference notes that GeneJ gives as examples something that Better GEDCOM should directly support, or are they part of a report handled by a word processor, where those footnotes should be composed by hand from information in the database?

I think that GeneJ composes her footnotes directly using TMG source citations by using note elements. I'm not sure where this falls in the data versus report spectrum. I have generally thought that GeneJ expects too much out of Better GEDCOM in this area, but I'm not sure. Her reference note examples are very nice, exactly what one would want to see in reports, but I don't know whether Better GEDCOM should support their automatic generation.

I guess I see the support that TMG seems to give for composing report-quality footnotes more of a "loophole" that GeneJ takes advantage of, than as a feature that it is important for Better GEDCOM to provide. Though, of course, if Better GEDCOM allows notes in source records and source references, it also provides the same loophole.

Frankly I think the question of whether BG should somehow support publication quality report generation, is a core issue that has so far remained an elephant in the living room.
ttwetmore 2011-12-27T06:41:50-08:00
GeneJ 2011-12-27T08:31:18-08:00
@Tom,

Cutting to the chase (we have had these same discussions before), I really don't see how the sources and citations discussion has much to do with report writing.

BetterGEDCOM is either being developed to support an outdated concept that combined record data produces a genealogy, or we recognize what apparently for some are more modern data concepts supporting conclusions, ala, the GPS.

Perhaps it is "expecting too much" for all BetterGEDCOM wiki-ites to support the GPS. Those who practice the GPS find the combined record data approach falls far short of the mark.

P.S. The references to Mills are out of context. As above, however, I don't believe that is the issue here.
ttwetmore 2011-12-27T08:49:14-08:00
What have been the results of those discussions? Do we want our genealogy programs to generate reports or not?

I don't know what you mean by "an outdated concept that combined record data produces a genealogy." Do you mean GEDCOM? Do you have specific models in mind? What do you mean by "record data"?

How would you describe the "more modern data concepts supporting conclusions, ala, the GPS"? What are the more modern data concepts you have in mind? The DeadEnds approach of adding evidence records and conclusions was my way of supporting GPS. It seems to me, by your frequent posts, that you see the way of supporting GPS is primarily to allow users to create complex reference notes. Is that correct? You've never been in favor of evidence records. Is that a fair thing to say?

What is the "combined record data approach" that "falls far short of the mark"? Are you referring to the ideas around persona records when you say this? I would like to understand what you mean, but these terms leave me confused.
GeneJ 2011-12-27T09:32:30-08:00
(1) "What is the 'combined record data approach' that 'falls far short of the mark'?"

In part, quoting below from my earlier posting, "Combining evidence personas versus conclusions."
https://bettergedcom.wikispaces.com/message/view/Goal+Oriented+Research/40711171

Evidence Personas ... I assume we all know our sources (and record groups) are often incomplete or flawed, at least for our purpose. As well, our interpretation of a source may be flawed as to some of the details we seek to record. All told, I assume we know that in a body of work, various sources will contain information that conflicts with information, as we interpret same, from some other source--which may or may not have been discovered in a timely or convenient manner.

Combining Evidence Personas ... As I understand the description of your process, Tom's "conclusion person" is formed by collecting documents that refer to generally same- or similarly-named persons and grouping the detailed data therein with the logic, "I think this is the same person because ..." Even if all records could be readily/simplistically codified (a separate issue) and if no genealogist made transcription, translation or typographical errors (we just know they don't), the process of grouping various records under one person with that "same person" logic only accomplishes a compilation of record data. Ala, a combination of flawed and conflicting records (whether we realize it at the time or not) forming a larger group of oft' duplicative, flawed and conflicting data--dramatically, a frankenstein.

I see a gap between that "combined" record and the "conclusion person" record in modern software. In the combined record, no doubt even when every record truly is about the same person, very different identities will become conflated and it might include enough parents or children to form a small community.

(2) "What have been the results of those discussions? Do we want our genealogy programs to generate reports or not?"

I don't follow the relevancy. Many programs on the market support narratives and other genealogical reports such as family group sheets and charts. I'm failing to see how programs' support for such output has much to do with whether users record conclusions and the related reference notes and bibliographic citations.
ttwetmore 2011-12-27T12:42:00-08:00
@GeneJ says,

Combining Evidence Personas ... As I understand the description of your process, Tom's "conclusion person" is formed by collecting documents that refer to generally same- or similarly-named persons and grouping the detailed data therein with the logic, "I think this is the same person because ..." Even if all records could be readily/simplistically codified (a separate issue) and if no genealogist made transcription, translation or typographical errors (we just know they don't), the process of grouping various records under one person with that "same person" logic only accomplishes a compilation of record data. Ala, a combination of flawed and conflicting records (whether we realize it at the time or not) forming a larger group of oft' duplicative, flawed and conflicting data--dramatically, a frankenstein.

Other than the FUD words (“simplistically”, “only accomplishes a compilation”, “a frankenstein”) this is an accurate description of the process. I am surprised you don’t recognize this process as precisely what you do in your head as you examine evidence and decide what evidence applies to the same person and what does not. Having personas gives you a great representational leap ahead. Of course there can be errors in the evidence, but how is that more of a problem in a system with evidence based records than in a system that is not? At least in an evidence based system you can correct the errors in an obvious and traceable way, whereas those corrections are hidden away or not even recognizable in a system that does not represent the evidence.

I see a gap between that "combined" record and the "conclusion person" record in modern software. In the combined record, no doubt even when every record truly is about the same person, very different identities will become conflated and it might include enough parents or children to form a small community.

I don’t understand. When mistakes are made (“different identities will become conflated”) in “my” process or in your “head process” there are no differences. I have to adjust the combination of evidence records to correct the mistake, and you have to rearrange your reference notes and conclusion persons into a different set to correct the mistake.

(Sorry that this has nothing to do with the report writing question I started off with.)
AdrianB38 2011-12-27T16:32:06-08:00
OK - trying to return to Tom's original question...

Let me point out that there is very little software that records output conclusions and keeps them separate from input evidence, which is in turn separate from source texts. Also very little software that links sources by various relationships such as "transcribed from" or "part of". Yet many of us are quite prepared to say that the BG data should facilitate such aspects, even if most of us are hazy over the details of the functions involved. Seems to me therefore that we should attempt to be equally open minded over facilitating the production of high quality reports and attempt to accommodate that within the BG data model.

Of course, there is the big question - is professional quality report production feasible? I think it's a lot more feasible than you may suspect. Just think of the literary (in)capability of some of the postings that you've read on some boards - not this, of course - face it, it would be difficult for a computer program to write text as bad as some of those posts. The definition of success has to be some sort of literary Turing test - can the reader tell whether it's a computer generating the report or whether it's a human writing it? While any software generated report that I've seen is repetitious in format, I might add that some of the US genealogical society reports are indeed repetitious in format when compared to a "real" biography.

All we need are the basic facts and some natural language algorithms that combine these facts in varying manners and we have something that - while it might not pass a literary Turing test if you're expecting a Pulitzer prize winning author, might just fool someone expecting a report of defined format written by a normal human being. And I wouldn't care to say how far we are off those algorithms - if Siri (e.g.) can understand natural language, what is required to go in the opposite direction? (I bet Apple have a good idea...)

Even if expert quality report generation is not possible (yet), it seems to me that a mediocre quality report, run off quickly, has advantages. In all honesty, I'm more likely to send one of them than a GEDCOM.

So, what can the BG format contribute? Well, not, I suggest natural language forms, alternative constructs etc. I believe this to be yet another instance where BG should contribute the stored data and the presentation of that stored data should be kept wholly separate.

But there are aspects where I think the data needs to be stored in a manner that facilitates the output presentation if only because that's how humans think - in several directions at once. Some things that spring to mind are:
- alternative descriptions for dates (e.g. "During the Second World War", to be interpreted as "Between 1939 and 1945" - or similar) - already possible in GEDCOM as date phrases though not greatly used?
- text notes (i.e. NOTE in GEDCOM) to be constructed in a manner that allow source references to appear next to the phrases within the note that they justify, instead of being applied to the NOTE as a whole.
- differentiation of notes between those intended for printing and those not. I suspect a lot of people don't intend their notes on facts to be printed, keeping them for personal use only, whereas notes for me are totally literary.
- ability to create footnotes that are footnotes not source reference notes. (OK - this is where I get annoyed with those serious historians who use the same format for sourcing-reference notes and for asides that illuminate. One I take as read, the other is useful to be read in context - how do I tell which is which???)
- ability to create complex source reference notes that contain more than just the source reference but also explanatory text etc. (before or after the citation bit) and maybe even a 2nd reference giving the source of the source (oooh - I think we might have a data model there!)
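
As a rough sketch of the third point in that list (source references sitting next to the phrases they justify), the note text could carry inline markers that the report-writer resolves to numbered reference notes. The `[[cite:KEY]]` marker syntax, the function name and the source titles are all hypothetical; nothing like this exists in GEDCOM.

```python
import re


def number_inline_citations(note_text: str, sources: dict):
    """Replace hypothetical [[cite:KEY]] markers with [n] and collect the footnotes."""
    footnotes = []

    def replace(match):
        key = match.group(1)
        footnotes.append(sources[key])      # look up the cited source by its key
        return f"[{len(footnotes)}]"        # footnote numbers follow order of appearance

    body = re.sub(r"\[\[cite:(\w+)\]\]", replace, note_text)
    return body, footnotes


sources = {"S1": "Example Source One", "S2": "Example Source Two"}  # placeholder titles
text = "He farmed two bovates[[cite:S1]] and later moved north[[cite:S2]]."
body, notes = number_inline_citations(text, sources)
print(body)
for number, note in enumerate(notes, start=1):
    print(f"{number}. {note}")
```

The data file would only hold the note text and the keys; whether the markers become footnotes, endnotes or inline citations is left to the presenting program.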
GeneJ 2011-12-28T07:49:38-08:00
@Tom, Adrian, all

Tom writes, " I have generally thought that GeneJ expects too much out of Better GEDCOM in this area, but I'm not sure. Her reference note examples are very nice, exactly what one would want to see in reports, but I don't know whether Better GEDCOM should support their automatic generation."

Some programs include narrative features and/or are capable of producing different reports. Tom's question suggests to me some believe the requirements for reference notes are different, indeed lower, if the information appears in a database rather than a narrative.

For example, if an author develops a narrative about say, the 1962 Chevrolet Corvette--I assume the work would include different assertions, and the author would cite various authorities along the way. Is there any precedence that lower citation requirements apply if the author chose to publish the work as a database rather than a narrative?

If there is no precedence, then what is the rationale for BetterGEDCOM setting a precedence that reference note requirements are lower in a published database than in a published narrative? --GJ
ttwetmore 2011-12-28T08:24:30-08:00

GeneJ,

Tom's question suggests to me some believe the requirements for reference notes are different, indeed lower, if the information appears in a database rather than a narrative.

That thought has not crossed my mind. Genealogical programs are glorified databases, but they usually have enough note facilities that one can add free-format information in one way or another. The point of this thread was to gather opinions on whether a glorified database with notes was sufficient for generating professional quality reports, with the implied question, that, if not, what might need to be added?

Is there any precedence that lower citation requirements apply if the author chose to publish the work as a database rather than a narrative?

I’ve never heard of an author publishing a database as a serious genealogical report, so I get no sense out of your question.

If there is no precedence, then what is the rationale for BetterGEDCOM setting a precedence that reference note requirements are lower in a published database than in a published narrative?

Where has anyone, at any time, ever said anything about there being a precedence, or about “setting precedence” or about “lowering requirements” about anything? More FUD.
AdrianB38 2011-12-28T08:31:52-08:00
"Tom's question suggests to me some believe the requirements for reference notes are different, indeed lower, if the information appears in a database rather than a narrative."

That's not how I read Tom's suggestions at all. Let's be clear: we (or at least, I believe it's "we") are quite clear that all the information for the particular reference note should be there SOMEWHERE in the database. What may or may not be possible is that all that information be held in such a manner that, with one hit of the "Print narrative report" button, it comes out with the degree of sophistication that a human being could give it when writing in a word-processor.

Take an example that we talked about some time ago - the single footnote, referring to one fact / assertion / whatever, that has two independent sources cited within it. Perfectly standard piece of writing. There are (at least) 3 ways that a genealogy app could deal with this:
- do not provide any facility to produce a single footnote with 2 citations in it (i.e. don't deal with it);
- automatically merge multiple citations for a single fact / assertion / whatever into a single footnote (possible disadvantage is that this then applies to all such);
- allow the user to mark up two or more links from the fact / assertion / whatever to source records (source-references in Tom's terms) to say "merge these into one reference note" and by extension - do not merge others.

The last of those 3 would require extra items in the database to say which source-references should be merged and so would affect BG. But is this what we want in BG given that it impacts only the printed report and not the underlying data, plus it starts impinging on how reports are concocted?
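
A minimal sketch of what that third option might need in the data, assuming a hypothetical "merge group" label on each source-reference. The label, the function name and the semicolon separator are assumptions for illustration, not part of GEDCOM or any BG proposal.

```python
def build_footnotes(source_refs):
    """Turn (citation_text, merge_group) pairs into footnote strings.

    References sharing a non-None merge_group end up in one footnote,
    joined with semicolons; all others get a footnote of their own.
    """
    footnotes = []          # each entry is a list of citation texts
    index_by_group = {}     # merge_group label -> position in footnotes
    for text, group in source_refs:
        if group is not None and group in index_by_group:
            footnotes[index_by_group[group]].append(text)
        else:
            if group is not None:
                index_by_group[group] = len(footnotes)
            footnotes.append([text])
    return ["; ".join(parts) for parts in footnotes]


refs = [
    ("Example Source A", "g1"),   # placeholder citations
    ("Example Source B", None),
    ("Example Source C", "g1"),
]
for number, footnote in enumerate(build_footnotes(refs), start=1):
    print(f"{number}. {footnote}")
```

The only extra item stored per source-reference is the group label, which an importing program is free to ignore and still print every citation separately.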

Note please this - the underlying genealogical data is exactly what you want - the issue is only over the presentation.

Or to give another analogy - does anyone think it a good idea to give BG the facility to say what typeface at what point size a report should be created in? I hope not.

What I would like to see is - what is it that we might put into BG that would allow sophisticated reports that look a bit less like something written by a speak-your-weight machine? My previous post mentioned some thoughts. Another idea is that GEDCOM allows a note to appear with a "source reference" pointing from the fact / assertion / whatever to a source-record. This note can be switched on and off for printing - well, it can in my program. From memory at least one of your apps will allow TWO notes to be stored with a "source reference" pointing from the fact / assertion / whatever to a source-record. One - if printed - would go before the printed source citation inside a reference note, the other after the printed source citation.

Now - am I remembering right? Would this help? Is this standard or just your tweak through output templates? What else is there?
Andy_Hatchett 2011-12-28T09:09:15-08:00
ok- I'm going to weigh in on this...

Imho, we need to decide one thing. Are we trying to develop a standard for precise and accurate transfer of data or are we trying to develop a standard for the presentation of that data?

To my mind the presentation (reports, charts, etc.) should be left strictly to the vendors; all we should be worried with is that all data is transferred accurately so that they can do with it as they like when presenting it.
ttwetmore 2011-12-27T05:24:05-08:00
There do not seem to be any requirements in the catalog that apply to the idea of report generation. Should there be?
ACProctor 2012-01-21T11:16:12-08:00
DOI
Has anyone got any strong feelings on the effect of DOIs (Digital Object Identifiers) in citations?

I'm finding references to citations involving DOIs, e.g.

Schiraldi, G.R. (2001). The post-traumatic stress disorder sourcebook: A guide to healing, recovery and growth [Adobe Digital Editions version]. doi:10.1036/0071393722

DOIs apply to anything, be it a man-made "creation" or a Person, Place, Event, etc. So, in principle, any source type could involve a DOI.

For a simple book, say, the citation could include an ISBN, a URL, a DOI, and probably other alternatives too.

The point I want to make at the start of this thread is that the DOI is supposed to locate meta-data describing the referenced item - this being for interoperability and the semantic Web. However, I've seen nothing that indicates how a citation (either a fully-formed one or the "essence" of one) might be derived from the DOI alone. I am wondering whether someone will eventually say "all we need is a DOI".

Tony
ACProctor 2012-01-21T11:31:45-08:00
...just adding a bit more as that sounded a bit brief on reading it back.

In principle, given the DOI above, it would be possible to return all the information needed for a normal printed citation reference.

Now, that reference could be simply stored as a fully-formed citation in the DOI meta-data... but in what style? Obviously, that would be the wrong approach.

Ideally, the meta-data should be capable of providing the separate elements that the citation needs, e.g. publisher, author, etc., for a book.

From this point of view, it has a strong relevance to our Source+Citation discussions. I could be wrong but I see nothing in the "DOI Handbook" that suggests this should be a function of the meta-data.

Tony
ttwetmore 2012-01-21T14:21:45-08:00
Allowing DOI as a citation element should take care of it.
ACProctor 2012-01-22T03:42:49-08:00
That's not really my point Tom.

The DOI uniquely identifies an item, just like an ISBN number but a lot more general in terms of source types. However, all those other reference elements that we would expect to see in a printed citation reference are - allegedly - available from the meta-data at the end of a URL built from the DOI.

This raises a couple of important issues.

From an electronic point of view, all you need is the DOI. The other information is secondary and could be retrieved using the DOI. However, the DOI is a computer-readable citation reference, not a humanly-readable citation reference, and so is not a direct replacement. I gave an example of the DOI appearing as just another citation element in a citation reference but what is the relative merit of it in a normal printed form? A similar question could be posed for an ISBN number.

The DOI meta-data will contain the values that we would need for a printed citation reference. If we define elements that are not aligned with them then we're two separate worlds. For a simple book reference, I expect the alignment would be close because the source type is well understood. For the other types, though, then we're unlikely to be aligned because we have no connection with the DOI standard, and I can't see that the DOI standard prescribes any need for such a group of meta-data values.

In effect, the DOI standard doesn't seem to see the need for supporting humanly-readable citation references - possibly because it was conceived by computer people and only for the electronic world of the Semantic Web etc. If the standard had been concerned with this requirement then they would be treading exactly the same ground as we have been, and similarly quoting Shown Mills.

I just wondered if anyone else had thought about this topic much.

Tony
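
As a rough illustration of "a URL built from the DOI", here is a minimal Python sketch. It assumes the public doi.org proxy as the resolver base; the function name `doi_to_url` is made up for this example. It only builds the URL and says nothing about what meta-data comes back, which is the open question in this thread.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolver base
DOI_RESOLVER = "https://doi.org/"

def doi_to_url(doi: str) -> str:
    """Build a resolver URL from '10.xxxx/suffix' or 'doi:10.xxxx/suffix'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the '10.' directory code, then prefix/suffix
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' literal; percent-encode characters that are unsafe in a URL path
    return DOI_RESOLVER + quote(doi, safe="/")

print(doi_to_url("doi:10.1036/0071393722"))
```

The URL is computer-readable only; a printed citation would still need author, title and so on as separate elements.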
ttwetmore 2012-01-22T04:52:06-08:00
Tony,

Thanks. No, I haven't thought about this topic much. I have thought that BG should make available ISBN and ISSN citation elements for books and journals. And as you say, from these values it would be theoretically possible to retrieve other citation elements (I refuse to say metadata in this context) for the book or journal from some computer database. This would not stop me from supplying author and title myself as citation elements, and I would continue to expect that my chosen templates would be used in the generation of citation strings.

It was in that spirit that I made my short comment about DOIs. It seems like user preference to me, not a BG policy. If a user has a DOI for some object and chooses to use it and to eschew all other properties of the object, assuming they will be supplied from an external service, that is their choice. If a user has a DOI and chooses to use it and add theoretically redundant properties for the object, that is also their choice. If such a user-supplied property does not agree with a computer retrievable property that is too bad, but not a deep BG concern.

Tom
ACProctor 2012-01-22T10:09:36-08:00
Thanks Tom. I'm with you on the use of the traditional citation elements. I believe they're essential and we should not be distracted by the availability of a DOI and/or ISBN and/or URL etc. These are all secondary from the point of view of a human reader.

However, in the specific area of DOI, my recent thoughts are leading me to believe that there's a lot of depth that's worth discussing. The most important issue being the relationship between a DOI item's meta-data (that's the official DOI term) and the citation elements that we need to form the "essence" of a citation reference (i.e. independently of printed style or regional settings).

I may trawl for a DOI forum and try raising the issue to see if anyone has thought about ring-fencing a subset of the DOI meta-data specifically for the purposes of supporting a humanly-readable citation reference.

I believe you understand my line of thinking Tom, even if you don't see the same relevance to our work.

Tony
gthorud 2012-01-22T16:33:25-08:00
Last spring I asked our National Archives about any guidance in citing their sources. One of the answers I got was: "Use an URI – all our cataloged sources have an assigned URI". Well, that and DOIs may be the solution sometime in the future, but as you have stated above, we should stick with our old approach of transferring the relevant citation elements in addition to the DOI.

One thing I note is: Yet another metadata scheme – "indecs". It is dangerous for us to choose to align with only one of them, they will continue to change. There will be a need to convert between various schemes if you want to fetch the metadata from a database pointed to by a DOI, ISBN or any of the other 10 or more identification number schemes. In my data model I have assumed that conversion templates can ease such conversion since you are likely to have changing sets of source types defined by BG, and changing external metadata schemes. But, the download and some parsing of that data must be done by a genealogy program or service, and should not concern BG. (For a functional model see http://bettergedcom.wikispaces.com/file/detail/From+repository+meta+data+to+BetterGEDCOM+and+reports.pdf )

I note that a DOI, at least in theory, can identify a figure, table or whatever inside a document.

Regarding storage of a DOI in a citation element: in my data model I have proposed a data type for this purpose, the Type Value pair. So rather than having a citation element for each identifier type (there are at least 10 of them, see e.g. the "indecs" specification), I suggest that you have one citation element type that could be called e.g. "Document identifier". That CE type would have a code list for the Type part; the list would contain words such as "ISBN", "ISSN", "URI", "DOI" and many more, which could be presented in a dropdown in the program where the data is recorded. The purpose is to reduce the number of fields that a user would be presented with, and it would simplify templates (and even a CSL-like solution if anyone still believes in that). Having one CE type also makes it easier for a program to implement support for special processing of this CE, for example for using the contained identifier to access a database service – programs could have user configurable support for each identifier type.
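
To make the Type Value pair idea concrete, here is a minimal Python sketch. The class and function names are invented for illustration, and the code list holds only the four types named above; none of it is taken from the data model document itself.

```python
from dataclasses import dataclass

# Hypothetical code list for the Type part (only the four types named in the post)
IDENTIFIER_TYPES = {"ISBN", "ISSN", "URI", "DOI"}

@dataclass(frozen=True)
class TypeValuePair:
    type: str   # one entry from the code list
    value: str  # the identifier itself

    def __post_init__(self):
        if self.type not in IDENTIFIER_TYPES:
            raise ValueError(f"unknown identifier type: {self.type}")

@dataclass
class CitationElement:
    element_type: str        # e.g. "Document identifier"
    content: TypeValuePair

def find_identifiers(elements, wanted_type):
    """Return the values of all 'Document identifier' elements of one type."""
    return [e.content.value for e in elements
            if e.element_type == "Document identifier"
            and e.content.type == wanted_type]

elements = [
    CitationElement("Document identifier", TypeValuePair("DOI", "10.1036/0071393722")),
    CitationElement("Document identifier", TypeValuePair("URI", "http://example.org/item/1")),
]
print(find_identifiers(elements, "DOI"))
```

With a single element type, a program needs one lookup routine like `find_identifiers` rather than one per identifier scheme, and a new scheme only extends the code list.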

Re. Tony's last posting, as with Dublin Core, I see little in DOI about how citations should be created based on the metadata referenced by the DOI. I see the term "persistent citation", but assume that is referring to the persistence of the DOI.
ACProctor 2012-01-23T05:48:27-08:00
Thanks for the long post Geir.

Apologies for being blunt but that person at your National Archives was talking through their ***! URLs, URIs, DOIs, are computer-readable citations. They are in no way a replacement for humanly-readable ones. You can't guarantee their persistence for one thing.

I agree that these might be optional elements in a citation but irrespective of whether you found the publication on the Web or in some digital library, there was still a title, and an author, and quite likely a publisher. They are the meaningful items we want to see.

Re: "I see little in DOI about how citations should be created based on the metadata referenced by the DOI".

This is part of my point. I don't think the DOI standard addresses this need at all. It seems to be concerned only with digital storage and access.

When I say "aligned with DOI's meta-data", it was only in terms of matching the element names/types in a subset of an entity's meta-data that had been reserved specifically for supporting printed citation references.

DOI is fairly unique amongst identifier schemes as it is not limited to documents, and not even digitised data. There is talk of using it to reference physical things such as Persons, Places, Events, e.g. a gravestone perhaps.

From that perspective, the provision of data 'supporting' a traditional citation reference becomes much more important. I'm not suggesting that they become involved in the displayed style (which is where CMOS, Shown Mills, etc., are relevant) but in the names and types of the citation elements. In that area, they would be right in the same predicament as we are although for slightly different reasons.

Tony
ACProctor 2012-01-24T06:48:13-08:00
I'm currently communicating with Dr Norman Paskin of the International DOI Foundation on this subject.

He's very interested but acknowledged that it's a huge area. He confirmed that DOI is a digital identifier but the objects being identified may not be digital ones at all.

After identifying our potential source types as a "large heterogeneous set", I placed our S+C goals between Shown Mills (which details the many printed forms but not the computer representation) and ontological schemes like the DOI one (which categorise objects but don't deal with printed citation references).

He gave me a link to the CIDOC ontology (http://www.cidoc-crm.org/index.html) which is compatible with DOI but covers many more object types - "gravestones" was one example I had provided.

I'm not expecting any solutions here but if it gets people thinking about the requirements then it's useful. If, for instance, there was a standard ontology that incorporated whatever citation elements as was needed for each source type, then it would simplify our problems enormously :-)

Tony
AdrianB38 2012-08-07T14:23:03-07:00
Shifting Repositories
Just been battering my head against an issue that has clearly been lurking around but I've never hit it before - and I wondered whether it has implications for BG's user requirements. The issue is - what happens if sources are moved from one repository to another?

The background is that in the UK, our local civil registration records are held at registrars' offices that, in the natural way, get reorganised every so often. The precious books then get shipped off to their new home - and it's not always the case that all books in the old office get sent to the same new one.

So, I have a certificate, copied from the original by Cheshire X office (in practice, we are not allowed to see the originals), under their reference number A/B/C, say. I'd duly entered this into my database with repository Cheshire X, call number A/B/C - let's not worry too much about whether that's the way you'd do it or not, the important thing is that those items will be in there somewhere.

Now, for various reasons, I was updating the source-record (i.e., what some call the master-source - I _think_) and I happened to recheck the details of this certificate on the site holding the indices. And guess what, the office has changed from Cheshire X to Cheshire Y, and the reference number from A/B/C to D/A/C (say). What to do about this?

Option 1 is ignore it - after all, if I'd never updated it, I'd have never known it had changed. And the source I used was indeed accessed from Cheshire X under reference number A/B/C.

But .... if the primary purpose of a citation is to tell you, the reader, how to get the info, then giving you an obsolete office and reference is as much use as the proverbial chocolate tea-pot. Therefore...

Option 2 is to amend the details for the source to reflect the new values. But firstly these new values aren't what I used and secondly I have better things to do than update umpteen references.

So I suspect that option 3 is the best thing - leave the source records alone (as per option 1) but ensure when I create the source record that the descriptive "title" contains enough description to enable the reader to find the source again without the reference number.

But I think I'd still like to link the Repositories somehow to say that Cheshire X has been closed and see Cheshire Y and Cheshire Z.

Any comments on the best way of doing this sort of thing and whether this brings in new requirements - e.g. relationships between repositories?
AdrianB38 2012-08-09T14:10:47-07:00
Tom - I did say the requirement for two repositories for one source record was "a low priority one, as it would seem pretty adequate to record one as the formal repository but add a note to say that another copy exists at the other place and that was also consulted."

I am aware of the convention that you don't bother to record a repository for a published book because it could be anywhere. But it's always given with the caveat that the older and rarer a book is, the more sense it makes to put that repository in, as you might have consulted the only copy so why not tell the world where it was? Especially if its location is NOT recorded in any on-line catalogue accessible to the general public?

The question is - is it worth formally recording 2 repositories? Or is 1 repository and a note sufficient? Like I said, 2 repositories is low priority and I'd only bother with it if we were working in that area for other reasons, otherwise I'd be happy to keep with one and a note. As I do now.
GeneJ 2012-08-09T15:13:08-07:00
@ Adrian,

You don't use bibliographies, as I recall. Do I remember that correctly?

For those who do, it's possible there would be other considerations on the examples about title. (I presume the warning about duplicate repositories wouldn't affect the bibliographic entry).

Unless I didn't understand the example correctly, a title like, "1901 Census Entry: Pickstock, Thomas & Ada, Haslington" seems it would largely defeat some of the bibliographic benefit.
louiskessler 2012-08-09T15:21:22-07:00
GEDCOM allows any number of repositories for a given source:

SOURCE_RECORD :=
...
+1 <<SOURCE_REPOSITORY_CITATION>> {0:M}

where:

SOURCE_REPOSITORY_CITATION :=
n REPO [ @XREF:REPO@ | <NULL>] {1:1}
+1 <<NOTE_STRUCTURE>> {0:M}
+1 CALN <SOURCE_CALL_NUMBER> {0:M}
+2 MEDI <SOURCE_MEDIA_TYPE> {0:1}

So you can put notes on your use of that repository for this source, can include the call number (because it may be different at different repositories) and can include separate media, if you took some images from one and some from the other.

Louis
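
A small Python sketch of how a program might read that structure. The sample record, its title and its call numbers are made up, and the parser handles only the REPO, NOTE and CALN lines shown in the grammar above; it is not a general GEDCOM reader.

```python
# Made-up source record with two repository citations (5.5.1-style)
SAMPLE = """\
0 @S1@ SOUR
1 TITL Example certificate book
1 REPO @R1@
2 NOTE First consulted here
2 CALN A/B/C
3 MEDI book
1 REPO @R2@
2 CALN D/A/C
"""

def repository_citations(record):
    """Collect each level-1 REPO pointer with its subordinate NOTE and CALN values."""
    repos, current = [], None
    for line in record.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level <= 1:
            # Any new level-0 or level-1 line ends the previous REPO structure
            current = None
            if level == 1 and tag == "REPO":
                current = {"repo": value, "notes": [], "call_numbers": []}
                repos.append(current)
        elif level == 2 and current is not None:
            if tag == "NOTE":
                current["notes"].append(value)
            elif tag == "CALN":
                current["call_numbers"].append(value)
    return repos

for citation in repository_citations(SAMPLE):
    print(citation)
```

Each repository keeps its own notes and call numbers, which is what lets the same source carry a different reference at each place.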
ttwetmore 2012-08-09T17:06:51-07:00
Adrian,

I hope I didn't offend. I was amused by some of the extremes I was led to think about based on the discussion.

You make a good point about the value of repositories for older and rarer books.

For me, and I believe for Louis, sources can have sources can have sources ad infinitum. A repository is "just" a kind of source. (A "source" is ANYTHING that contains or can provide genealogically significant information). In my models any source can refer to ANY NUMBER of MORE INCLUSIVE sources through source references.

So if you choose to have one source refer to two repositories I think it is solely up to your discretion to do so. Any model worth its salt will allow it to happen.

One ancillary point that I think is important. It is not the job of the model to force or require the user to adopt any particular methodology for handling their data. The job of the model is one of enabling the user to work within a well designed structure to handle their data in the way they see most fit.
louiskessler 2012-08-09T21:37:23-07:00

Like Tom said, I also say that sources can have sources and should be allowed to have sources. Tom heard about this from his mom who heard from Uncle Ben who saw it in the family bible which was written there by their great-grandfather who read about it on my website.

But I don't agree that a repository is a "source". To me, a repository is a place where sources are stored and is a place you can visit to find sources. That may be a library, a website, or Uncle Ben's brain.

GEDCOM defines:
SOUR {SOURCE}:=
The initial or original material from which information was obtained.
(Note: I believe "initial or original" should be removed from the definition.)
REPO {REPOSITORY}:=
An institution or person that has the specified item as part of their collection(s).

Also, in my earlier GEDCOM snippet for SOURCE_REPOSITORY_CITATION, I omitted including the very interesting commentary below it. GEDCOM says:

"This structure is used within a source record to point to a name and address record of the holder of the source document. Formal and informal repository name and addresses are stored in the REPOSITORY_RECORD. Informal repositories include owner's of an unpublished work or of a rare published source, or a keeper of personal collections. An example would be the owner of a family Bible containing unpublished family genealogical entries. More formal repositories, such as the Family History Library, should show a call number of the source at that repository. The call number of that source should be recorded using a subordinate CALN tag. Systems which do not use repository name and address record, should describe where the information cited is stored in the <<NOTE_STRUCTURE>> of the REPOsitory source citation structure."

It specifically mentions personal collections, which I don't think people reference often enough.

In my own research, I almost always make a copy of every source, and file them in a set of binders in my genealogy bookshelf. In my genealogy work, I first refer to my own item (source) in my own personal collection (repository). I then indicate the source of that source and the repository where I originally got the source from. If that source is a derived work, then I will document the source of the source of the source but without a repository, because if I would have directly looked that up, I wouldn't have to refer to the derived work.

The whole procedure of sourcing may sound convoluted due to my strange examples, but it really isn't. It is mostly just common sense.

Concerns like Adrian had in the first post of this topic are not an issue if the source recording methods are generalized as they are in GEDCOM. The sourcing system in GEDCOM has some problems, but it really isn't as bad as most people think. The true problem is that most developers ignored the GEDCOM spec for sources and went their own way - causing the problems we all have observed of source data not transferring properly between programs.

Louis
AdrianB38 2012-08-10T02:32:06-07:00
Louis said....
"GEDCOM allows any number of repositories for a given source:
SOURCE_RECORD :=
...
+1 <<SOURCE_REPOSITORY_CITATION>> {0:M}"

To which I said a rude word (about myself) - and then thought to check...

That's a quote from GEDCOM 5.5.1. It's {0:1} (i.e. one repository only) in GEDCOM 5.5. And of course, it's _not_ highlighted as a change between 5.5 and 5.5.1. And since 5.5.1 was only ever a draft, my software (Family Historian) sticks to 5.5, which is why I started down that route of multiple repositories!

GEDCOM version SNAFU!!!!
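
A minimal Python sketch of the version difference, using only the two cardinalities quoted in this thread ({0:1} for 5.5, {0:M} for the 5.5.1 draft). The function name and sample record are invented for illustration.

```python
# Maximum repository citations per source record, as quoted in the thread.
# None means unbounded ({0:M}).
MAX_REPOS = {"5.5": 1, "5.5.1": None}

def repo_count_ok(source_record: str, version: str) -> bool:
    """Check the number of level-1 REPO lines against the limit for a version."""
    count = sum(
        1 for line in source_record.splitlines()
        if line.split(" ", 2)[:2] == ["1", "REPO"]
    )
    limit = MAX_REPOS[version]
    return limit is None or count <= limit

# Made-up record pointing at two repositories
record = "0 @S1@ SOUR\n1 REPO @R1@\n1 REPO @R2@\n"
print(repo_count_ok(record, "5.5"))
print(repo_count_ok(record, "5.5.1"))
```

A file written by a 5.5.1 program with two repositories per source would fail this check in a program that sticks strictly to 5.5.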
AdrianB38 2012-08-10T03:57:17-07:00
Gene - re Bibliographies. No, Family Historian doesn't produce them - I'm thinking in terms of a manually produced attempt at one. Yuk - sounds incredibly boring and easy to mess up. Anyway, the warning about certificates moving seems best placed to go there, rather than amongst individual footnotes.

Re contents of title / description for auto generated bibliographies - yes, that's actually another consideration for what would go into a title. (Let's not mention splitters and joiners, please?)

As for a title like "1901 Census Entry: Pickstock, Thomas & Ada, Haslington". Well, I didn't say it was a _good_ title. In fact I deliberately chose a title from earlier in my genealogical career for effect.
ACProctor 2012-08-10T05:30:45-07:00
Re: "I'm thinking in terms of a manually produced attempt at one"

Good word-processor packages can automatically generate a source list or bibliography from the set of citation references employed in that work.

A good genealogical tool should be no less capable.

Tony
AdrianB38 2012-08-10T06:56:34-07:00
"A good genealogical tool should be no less capable" - unfortunately, as I say, Family Historian doesn't and every other reporting package that I've tried is a shambles at producing reports - e.g. the notes come after _all_ of the events to which they are attached; or they don't tackle custom attributes; or they do all the standard events first and then all the custom events, making a disaster of chronology!

(Thinking aloud - I wonder if I can write some Visual Basic for Applications that could analyse the footnotes?)
GeneJ 2012-08-10T07:58:03-07:00
Hi Adrian,

Most of the programs that I've tested that produce bibliographies do a respectable job. I quickly pulled up one that came from a 2008 family group sheet.

http://bettergedcom.wikispaces.com/Bib2008

Maybe there should be a separate discussion about narratives. Here are two different examples of narrative reports that have been computer generated. The first example is output in chronological order; the second is more like Register style (which I suspect you refer to as "a disaster of chronology" :-), but it's probably my favorite).

"Edward W. Gapsch" in "Terry & Nancy's Family History"
http://reigelridge.com/roots/gapsch.htm

"Winfield Scott Sackett1" in "The Sackett Family Association."
http://www.sackettfamily.info/g126/p126872.htm
GeneJ 2012-08-10T08:03:04-07:00
P.S. I should have said, "most ... do a respectable job, given the conflicts with sources/citations between GEDCOM and other non-GEDCOM standards and practices that we have oft discussed."
GeneJ 2012-08-10T08:12:32-07:00
To actually view the narrative section in the following example, please click on any of the names in the navigation links to the left on the screen.

"Edward W. Gapsch" in "Terry & Nancy's Family History"
http://reigelridge.com/roots/gapsch.htm
AdrianB38 2012-08-08T12:57:47-07:00
Louis - I agree with the usefulness of a source to source linkage.

Re your disagreeing with me entirely.... Actually I totally agree with your First / Second / Third points. I'd no idea the reference numbers had changed until yesterday and I'm not about to go through updating them, for those very 3 reasons, not the least of which is that the old references were the ones I used. Let's just say that we both agree that _a_ purpose of the reference is "to allow the reader to be able to check the writer's source" and frankly, yes, I agree that "If their source moved, it is up to the reader to find where it went to".

It's just that, out of the kindness of my heart, if I do realise that something has changed, somehow sticking a minimal effort suggestion in somewhere seems the least I could do.

I'd only add a 2nd pointer from the source-record to a 2nd repository if I were certain it's the same source, just in a different building, and I'd used it in both places. That might be in a case where I've used a few pages out of a book and then when I come to use it again (e.g. a City Directory), the book has moved across town (e.g. formerly separate Library and Archives have merged onto a new site.)
louiskessler 2012-08-08T16:22:03-07:00
Adrian,

If "out of the kindness of my heart" I donated $1 to every poor person in the world, then I'd be broke 10,000 times over.

The problem I have with Mills and other rules-based citation documentation systems is that you are spending most of your time figuring out how to format your citation "properly" in order for it to be "acceptable" to others. It adds overhead to what you are trying to do, which really is to document your source, the reference of how you found it, and the repository you found it in. Only people publishing a formal genealogy paper really need the meticulous structure of formal citations.

By forcing so much more on people, it dissuades them from documenting their sources. We need to do the opposite: make it as easy as possible. Think S-I-M-P-L-E.

e.g., in your case, the reference numbers changed. Yes, I'd make a note of it. As simple as possible. On the source I found that had changed, I'd add a comment with the new number. Then I'd add a note on the Registrar's Office repository stating that "from time to time, some of their records get shipped out to xxxx, so check there as well."

Yes, sources can be in multiple repositories, especially books. You need only refer to the repositories you actually got your sources from. The 2nd pointer as in your example is fine, if you used the second one. But obviously you need not mention every library in the world containing that book.

Louis
GeneJ 2012-08-08T17:23:00-07:00
@ Louis,

Might I ask, who is trying to "force" Mills or Lackey or any style on you or anyone?

If you don't want to follow Mills or Lackey, you shouldn't have to, but why should your choice defeat my option?

What your statement doesn't seem to recognize is:

(1) a great number of folks who cite their sources want to do so consistently--and because sources are what they are, the guides and, if you will, templates are just .... err, helpful. (I realize you have your own consistent approach, but you'll understand if I choose something a little more authoritative as the backbone for the "consistent" approach I choose.)

(2) you think the time being spent is on "formatting," but that's not what takes my time --- I'm spending the time working to learn "about" the source.* (You might have this tucked into "document your source.")

(3) you suggest I want my work to be "acceptable" to others, as though it's about the length of my skirt. A lot of folks, me included, just want their consistent approach to be understood by others, even long after they are gone.

As you know, I didn't always cite my sources or cite my sources the way I do now. It's true that my cousin "encouraged" me to become a convert (If we are going to work on this together, then we have to ....); but I am, nonetheless, a convert.

Just a technicality, but repositories are usually not cited for published works.

As to Adrian's question. Things change. The internet arrived. Archives consolidate, people die. Right? In my work, the example I point to is the hundreds of pages of probate files I received from Columbiana County. Those files are no longer housed there, but that is where I got them. I'm not too worried about the fact that those files are now housed at a different archive as the modern movement of significant archives is generally discoverable. (That wouldn't prevent me from adding a couple of editorial brackets into which I wrote, "These materials are known as XXX and housed at XXX.")

It's the less public and more private repositories that, at least in my thinking, are more complicated. Materials are handed down or inherited by others ... or OMG, lost (but you know what I'm really thinking). If I learn what happened to things, the good old editorial brackets or just an additional note works there, too. Like an addendum.

*Ha! Like fingernails on the chalkboard. Remember the conversation on Randy's blog recently--about 10 seconds or 10 minutes? I spent 2-1/2 hours this weekend on a citation. Now, most of that time was trying to figure out if the informant/author, A. A. Geauque, was most likely Adolphius A. Geauque, and if A. A. was also most likely the brother of Augustus L. Geauque, but still, it was 2-1/2 hours.
louiskessler 2012-08-08T19:53:04-07:00

GeneJ:

I have felt that there's been a push here at BetterGEDCOM for formal citations to be included in the new standard. There are dozens, if not hundreds of discussion posts on the Wiki about it.

I feel it should just be sourcing that's included in BG, with citation templates being specified separately for those who want it, such as what was initially proposed with sourcetemplates.org.

All the new GEDCOM standard should have to define are the data elements that can compose a source. This is the data that needs to be defined. Assuming all source data can be transferred from one program to another using the new standard, then and only then can and should the various genealogy programs apply Mills or Lackey or anyone else's templates to them to format them properly. And if there is a sourcetemplates.org or equivalent available for those programs, then the citations can be produced in a standard way and look the same no matter what program you're in.

But the two, IMHO, should remain separate. The new GEDCOM standard should be for data transfer - not for formatting. Programs can and should be allowed to display data differently. Citation templates are just one way of displaying data (well, actually a multitude of ways, i.e. Mills, Lackey, Primary Reference, Footnote, etc.)

And you can cite your sources consistently without using a template. If the elements are defined, e.g. Page, Roll, Author, Publisher, etc., a simple delineated list will be both sufficient and complete.
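
A minimal Python sketch of that separation: the elements are the transferred data, and a delimited list or a template is just one way of displaying them. The element values are borrowed from the DOI example elsewhere on this page; the template string is a made-up placeholder, not Mills or Lackey.

```python
# Source elements as they might be transferred (data only, no formatting)
elements = {
    "Author": "Schiraldi, G.R.",
    "Year": "2001",
    "Title": "The post-traumatic stress disorder sourcebook",
}

def as_delimited_list(elems, sep="; "):
    """Render every element as 'Name: value', in the order recorded."""
    return sep.join(f"{name}: {value}" for name, value in elems.items())

def apply_template(elems, template):
    """Render the same elements through a separately supplied template."""
    return template.format(**elems)

print(as_delimited_list(elements))
# Hypothetical template, chosen by the receiving program rather than the file
print(apply_template(elements, "{Author} ({Year}). {Title}."))
```

Both outputs come from the same stored elements, so a receiving program can pick either display without any change to the transferred data.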

I stated in the comments of my blog post http://www.beholdgenealogy.com/blog/?p=874#comment-378 that adding a source should take no longer than 20 seconds. You commented back stating that was basically impossible.

Yes it is if you force rigid structure, so it's got to be simpler. It's got to be set up so that the data can be entered quickly and easily, without the necessity of forms, templates, dropdowns, selection boxes and other distractions that make the process very cumbersome.

It doesn't have to be simplified this way for you or for the experienced genealogists who know the importance of proper sourcing. You're going to do proper sourcing, and include the complete citation, no matter how long it takes.

The ones who need it are the other 99%, the people who don't source now. It's got to be so easy, it would be stupid for them not to do it.

What's most important for the 99% is that they record all their sources so that, first of all, they can find them again, and second of all, someone else might be able to find them.

But citations, when blended into the sourcing, are an extra complication that makes the task more daunting.

Louis
ttwetmore 2012-08-08T22:25:37-07:00
Louis,

Couldn't agree more.
ACProctor 2012-08-09T02:27:45-07:00
Louis & Tom,

I totally agree here too. Nothing more to add from myself.

Tony
AdrianB38 2012-08-09T09:34:54-07:00
OK - trying to get back to the point at issue....

Firstly, the consensus is that I should document the stuff I used - and that's the only mandate that's on me. Agreed.

Secondly, things change and it's a case of "caveat emptor" (which I think means "buyer beware" but I don't know the Latin for "reader beware") and the reader is just going to have to deal with change. Agreed.

As for helping them along... IF for various reasons I discover things have changed, I consider I am duty bound to warn my future readers - but that's all it's going to be - a simple warning. Nor am I going to look for change. So against the record for the changed Repository, I'll add or update the Note item to say that (e.g.) "Books of certificates from Cheshire Registry Offices may have been moved from one office to another as a result of departmental and local government reorganisations, so repositories and reference numbers may change". If you get the full GEDCOM, you'll be able to see that note. If you get a report from the GEDCOM then I should remember to put that note into the Bibliography(?). Once is enough - I certainly wouldn't want it to appear against every reference note in a report referring to a certificate from a Cheshire office.

Earlier I talked about a relationship between Repository records - on reflection, while it certainly exists in real life, I can't see any value in modelling it - the note is sufficient.

If I happen to work out the new reference number, then yes Louis, as you suggest I'd just update the note against the source record.

Re sources possibly needing more than 1 repository. Yes, I'm talking only about documenting sources I actually used. If the source moved from 1 repository to another but I only used it at the first, then the previous paragraph applies. I'd just record the repository used, with an optional note if I find it's moved.

If the source moved from 1 repository to another AND I used it at BOTH repositories, I might as well replace the pointer from the source record to the repository with the new one. I'd also update the note to the source record to say "Book moved from X to Y, consulted at both places."

So far this can all be done without any change to GEDCOM.

The only scope I can see for a change is where I consult 2 different physical copies of the _same_ edition of a book at 2 different libraries. For instance, if I consult the 1901 edition of the Dundee Directory at both Dundee Central Library and the National Library of Scotland. City Directories are, I suggest, rare enough for it to make sense to record the Repository in the database and in any report. This is a potential requirement on BetterGEDCOM but, I suggest, a low priority one, as it would seem pretty adequate to record one as the formal repository but add a note to say that another copy exists at the other place and that was also consulted.

There is another flavour of 2 different physical copies of the _same_ edition of a book, and that's what do I do if I first of all consult the physical copy at Dundee and _then_ consult the digital images on Archive.org that I discovered later? I think in that instance, I'd probably just download all the pages that I previously transcribed and convert the source to a digital image of a book. So again, no change.
AdrianB38 2012-08-09T10:21:16-07:00
OK - but there is one further point to take away from this. I agreed that it's a case of caveat emptor and if you, the reader of my stuff, discover that the source has moved repositories and changed references, and I never warned you, then that's life and you're going to have to find the new location yourself.

But how?

The answer is "obvious" and "simple". The description I provide of the source has got to be enough to locate it again. Without the repository and their reference. Perhaps not necessarily first go. Suppose I describe a source as "1901 Census Entry: Pickstock, Thomas & Ada, Haslington" and give a reference but the references change. You can't then find the census by the reference but you could (say) search for Thomas Pickstock in the 1901 - you'd get about 3 or 4 in the UK, maybe 2 in Haslington because they're cousins but only 1 would be married to Ada. That's acceptable to me. (All theoretical examples).

But describing a source as "1901 Census Entry: Smith, John, London" is not what I'd call helpful. Clearly, I need to supply some more detail in that description, which goes into what GEDCOM calls the TITL (Source Descriptive Title in full). And do you know what? I have NEVER seen a definition of what should go into the TITL for an unpublished work such as a census. ESM does say that it's a description but I've not found a reference yet from her to say how comprehensive that description should be (though I can't claim to have read that much of her work). But this does suggest a measure, if you like, of how comprehensive that description should be.

I wanted to mention this because, even though it does not indicate any requirements for changes to GEDCOM, it does impinge on previous exchanges above. If I need to provide a description to go into the TITL that's comprehensive enough to allow someone to find it again, WITHOUT the repository's call reference, AND I've never seen this sort of document before, it's going to take me more than 20 seconds to understand the source enough to decide which are the important bits to put into the TITL. If I had an equivalent to ESM for UK practice, why wouldn't I want to use it to give me an immediate template of the crucial items? That's using ESM and similar stuff to understand sources. I don't care about which bits go in italics or whether the archivist's order is big to little or little to big. Only which bits are the right bits. We are in danger of missing ESM's value there and creating presentational issues where there are none.
GeneJ 2012-08-09T10:57:20-07:00
I think most of us would agree that a title is pretty important.

When a style is consistently applied, the actual citation elements (title, for example) take on the "language" of the style. Please let me call it "shorthand."

Although formatting seems oft suggested as "presentational issues" (something important to only some), it is actually part of that shorthand, especially for "titles."

That formatting helps to distinguish the generic titles we put on things (especially important because so many things we work with have no title) from a quoted or published title.

On the other end of the spectrum is the published title. Well, that appears in italics in Mills and several other styles. It's shorthand for check the publication date, then head to archive.org or WorldCat to track down that puppy.

While I know your question, Adrian, is a little different, many archival "files" containing unbound materials are good examples of stylistic things. Take a military pension file--I have several that are nearly 100 pages, all loose in the file and unnumbered. While most of the pages I work with have a date someplace, not all the pages are dated and only some have any kind of a title.

P.S. I already know the definition of insanity, so I'll let others respond more with their approaches.
ttwetmore 2012-08-09T11:54:10-07:00
This is getting so Platonic!! Is a book an ISBN number or is it a physical set of pages WITH an ISBN number? Clearly it's the latter. But if you really refer to two copies of the same book are you REALLY going to be so anal as to record it either once with two repositories (which strictly isn't correct) or record it twice with two repositories? Frankly when I refer to "just" a book, even if I refer to many copies over the years, I record it once and don't assign it any repository at all. If anybody else wants to refer to the book it's best to let them find out where the nearest copy to them is located. Most important books for genealogists are slowly making it to google books and ancestry.com anyway.

Back when I was a junior genealogist I referred to a number of copies of the standard, 1861 volume on the Wetmore family. The one in the Harvard library had a pertinent hand written note in the margins that provided the catalyst clue that led to an incredibly rich discovery. So I have a note that mentions that hand written note. I don't feel any need to do anything else.
AdrianB38 2012-08-09T13:45:25-07:00
Gene - you say " formatting seems oft suggested as 'presentational issues' (something important to only some)". Your bit in brackets is quite wrong. When it comes to printing out the reference note in a report or displaying it on screen, the presentation issue is utterly crucial. I may think the italic convention is lunatic (I do - why create a convention for words that you can't reproduce by hand?) but it's there, it's meaningful, and if you decide to follow Chicago or any similar form, if you get it wrong - or your software gets it wrong - then your report or display, however fragmentary or comprehensive, is plain simple wrong and so misleading. That's hardly "important to only some".

We are, therefore, in violent agreement over the importance of that convention - if we have committed to following that style.

What I mean by "presentational" is that if you look at the contents of your database or your GEDCOM - directly look at it, without any intervention from your application program - you won't see a single italic character. What happens on the GEDCOM is that the data-contents of the so-called title (which is actually sometimes a title and sometimes a description) are in a tag with a tag value of TITL. As I said, no italics. That's the data side of things. (Same thing in a database - no italics). The presentational side of things comes when the application takes the contents of the TITL, interrogates other bits of data to ask how this TITL of this source should be formatted, and displays / prints the data in italics (or not) as the >data< and the software dictate.

I can separate the 2 aspects - storage of data and presentation of data to screen or paper - because they happen at two different times. They are two sides of the same coin and the presentation depends on the data, no-one is saying otherwise. But they are 2 separate stages so can be separated because the software is actually separate lines of code. And if the software is separate lines of code, then us IT-types talk about the 2 stages as separate, err, because they are separate. But, to labour the point, one feeds into the other and both _are_ crucial.
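
The two stages described above can be sketched roughly as follows, in Python. This is only an illustration: the `published` flag, the `render_title` function, the style names and the HTML-like markup are all assumptions made for the example, not anything GEDCOM or a particular program defines. The point is just that the stored TITL text carries no formatting, and the formatting is decided at output time from the data.

```python
from dataclasses import dataclass


@dataclass
class Source:
    titl: str        # plain text, as it would sit in the TITL tag (no italics)
    published: bool  # hypothetical extra data the application consults


def render_title(source: Source, style: str = "italic") -> str:
    """Presentation stage: choose the formatting when printing or displaying."""
    if not source.published:
        # A descriptive title for untitled material is left plain.
        return source.titl
    if style == "italic":
        return f"<i>{source.titl}</i>"
    if style == "underline":
        return f"<u>{source.titl}</u>"
    raise ValueError(f"unknown style: {style}")


# Data stage: nothing but plain text is stored.
directory = Source(titl="Dundee Directory", published=True)
census = Source(
    titl="1901 Census Entry: Pickstock, Thomas & Ada, Haslington",
    published=False,
)
```

Calling `render_title(directory)` wraps the published title in italic markup, while `render_title(census)` returns the description unchanged; in both cases the stored `titl` is untouched.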
GeneJ 2012-08-09T14:09:54-07:00
Thank you Adrian,

Ha! You might get a kick out of this.

There are still quite a few styles that underline published titles (rather than use italics).

I was under the mistaken impression that the change somehow related to hyperlinks. After reviewing some articles about this online, the change in say, Chicago, from underlining to italics, seems more that computers became commonplace.* Here's a link to one article, see the section "A Note About Underlining Titles."
http://www.writingsimplified.com/2010/03/titles-when-to-italicize-underline-or.html

*Which I suggest might be better described as word processing became commonplace.
ttwetmore 2012-08-08T01:09:50-07:00
Adrian,

Certainly a conundrum. How would this issue impact the design of a genealogical data model? Would you want to record the history of changes in repositories in your database, or just the one place where the source "was" or "is"? The former could be solved (in the model) by allowing sources to refer to more than one repository, maybe giving one of the repository references an "active" indication.
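
A minimal sketch of the first option, in Python, assuming a model in which a source holds a list of repository references and at most one of them is flagged active. All class and field names here are hypothetical, and the office names and reference number are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Repository:
    name: str


@dataclass
class RepositoryRef:
    repository: Repository
    call_number: Optional[str] = None
    active: bool = False  # hypothetical "active" indication
    note: str = ""        # free-text explanation of what is going on


@dataclass
class Source:
    title: str
    repositories: List[RepositoryRef] = field(default_factory=list)

    def active_repository(self) -> Optional[RepositoryRef]:
        """Return the first reference flagged active, or None if there is none."""
        for ref in self.repositories:
            if ref.active:
                return ref
        return None


# Placeholder data: one source, consulted at one office, now held at another.
certificate_book = Source(
    title="Book of certificates",
    repositories=[
        RepositoryRef(Repository("Office X"), call_number="old reference",
                      note="Where the source was consulted"),
        RepositoryRef(Repository("Office Y"), active=True,
                      note="Where the source is believed to be held now"),
    ],
)
```

Keeping the older reference in the list, rather than overwriting it, preserves the record of where the source was actually used.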
ACProctor 2012-08-08T01:28:31-07:00
I tackled this by distinguishing the "definite source" from the "indefinite source" (analogous to the 'article' in English grammar) and placing them on a chain with the indefinite at the head. It doesn't remove the issue that the repository may have shifted but it separates the reference data for the work (e.g. ISBN, author, title, census page identifier, etc) from the physical copy of the work or the physical location where you obtained it.
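
As a rough illustration only, that chain might be sketched in Python like this. The class and field names are assumptions made for the example and are not taken from any existing model; the Dundee Directory case is borrowed from elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndefiniteSource:
    """The work itself: reference data shared by every copy."""
    title: str
    edition: Optional[str] = None
    isbn: Optional[str] = None
    copies: List["DefiniteSource"] = field(default_factory=list)

    def add_copy(self, held_at: str, note: str = "") -> "DefiniteSource":
        """Chain a physical copy onto this work and return it."""
        copy = DefiniteSource(work=self, held_at=held_at, note=note)
        self.copies.append(copy)
        return copy


@dataclass
class DefiniteSource:
    """One physical copy, or the place where the work was obtained."""
    work: IndefiniteSource
    held_at: str
    note: str = ""


directory = IndefiniteSource(title="Dundee Directory", edition="1901")
dundee_copy = directory.add_copy("Dundee Central Library")
nls_copy = directory.add_copy("National Library of Scotland")
```

If a copy moves, only its `held_at` changes; the title, edition and other reference data at the head of the chain stay as they were.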

You might want to look at the DublinCore OpenURL usage Adrian. They use it to similarly encode a citation reference for a work but without indicating a physical source, and leave it "open" in that any participating library or archive can service the URL to retrieve a copy of the work when required.

Tony
AdrianB38 2012-08-08T03:26:41-07:00
Tony - I personally like the idea of an indefinite source containing the details such as author, ISBN, etc, linked to the physical / definite sources found in libraries, etc. It's actually the difference between an "edition" of a book, and a physical copy of a book. I fear, however, that it is somewhat of an abstraction for most people, especially when moving away from books where the distinction has clear meaning. In any case, as you say, this doesn't really tackle the point here that the repository has changed - it's still the same physical source, so we should only have the one source-record, but it's now at a different repository.

I will try to follow up on the OpenURL usage - might help with some ideas.

Adrian
AdrianB38 2012-08-08T03:44:16-07:00
Tom - I think your suggestion of "allowing sources to refer to more than one repository, maybe giving one of the repository references an 'active' indication" is the appropriate way to tackle this.

It's one physical source all the time, so should be one source-record (a.k.a. master-source?) throughout. But we want to be able to point to multiple repositories - one for the one where I found / consulted the physical source, and another for the one where the physical source now resides (apparently). Arguably you only need two pointers to multiple repositories for this purpose ("the one I used" and "the one you're going to use") but the data model might as well allow multiple pointers, e.g. in cases where a digital image is supposedly on Ancestry, FamilySearch and FindMyPast, though arguably that might take things a bit too far. In all cases, you really need a status phrase - your "indication" - to explain what's going on.

I think I'd also like to have relationships at the repository level to act as a back-up, as it were, e.g. to record against the original Cheshire East repository that "The original Cheshire East Registers were split between Cheshire East and Cheshire Central in 2010". That would be appropriate to appear as a note in reference-notes when referring to the original Cheshire East repository.

Adrian
louiskessler 2012-08-08T10:28:51-07:00

Adrian said:

"if the primary purpose of a citation is to tell you, the reader, how to get the info, then giving you an obsolete office and reference is as much use as the proverbial chocolate tea-pot. Therefore..."

I disagree with this entirely. To me, the primary purpose of a citation is to tell you where the writer found the info.

This is to allow the reader to be able to check the writer's source. If their source moved, it is up to the reader to find where it went to and the reader can document that there is a new location.

First, you can't expect all writers to constantly be rechecking their sources to see if they moved. They'd have no time left to do any more research.

Second, if the writer updates his source location without rechecking it, he is doing you a disservice, because maybe the source changed in the process. Maybe it was transcribed or digitized or "corrected" or whatever.

The writer must leave his source as his original location he found it, even if the repository and source burns to the ground in a fire.

And just because the repository no longer exists, doesn't mean you need to change it. If Aunt Martha passes away, or loses her memory, you don't wipe her out.

Louis
louiskessler 2012-08-08T10:35:56-07:00

"But I think I'd still like to link the Repositories somehow to say that Cheshire X has been closed and see Cheshire Y and Cheshire Z."

I see no problem with a source being allowed to have its own sources. Doing so is especially useful for derivative works.

e.g.:

0 @S23@ SOUR
1 TITL Compilation of Deaths...
1 SOUR @S46@
1 SOUR @S47@
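
To show how a reading program might follow those pointers, here is a small Python sketch. It assumes the record is written exactly as in the example above, and the `source_links` function is a made-up name for illustration; it does not check whether any particular program would accept a SOUR line inside a source record.

```python
from collections import defaultdict
from typing import Dict, List, Optional

GEDCOM = """\
0 @S23@ SOUR
1 TITL Compilation of Deaths...
1 SOUR @S46@
1 SOUR @S47@
"""


def source_links(text: str) -> Dict[str, List[str]]:
    """Map each source record id to the source ids it points at."""
    links: Dict[str, List[str]] = defaultdict(list)
    current: Optional[str] = None
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if parts[0] == "0":
            # Record header such as "0 @S23@ SOUR"; ignore non-source records.
            is_source = len(parts) == 3 and parts[2] == "SOUR"
            current = parts[1] if is_source else None
        elif parts[0] == "1" and current and parts[1] == "SOUR" and len(parts) == 3:
            # Level-1 pointer from this source to another source.
            links[current].append(parts[2])
    return dict(links)


links = source_links(GEDCOM)
```

For the record above, `links` maps `@S23@` to the two sources it was compiled from, `@S46@` and `@S47@`.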

Louis