What's Wrong With GEDCOM?


A GEDCOM file is a generic genealogy file created by genealogy software specifically for sharing files with:

The problem is that GEDCOM hasn't been updated in 14+ years, yet mainstream genealogy software has added more useful bells and whistles. So a GEDCOM file now wouldn't handle those "newly invented" options very well. Examples of GEDCOM file sharing problems are well documented in:


What does all this mean for the average researcher?

It seems like we both need to get more detail on specific problems with GEDCOM and define what it is genealogists want to see for the future. This page is for detailing horror stories from working with GEDCOM files and detailing what sorts of things users want to see in their genealogy programs in the future.

One issue not necessarily apparent to lots of users is how strong an influence the GEDCOM standard has been in the development of genealogy software. The data model that GEDCOM uses is still the basis for almost every genealogy program available today. One major reason genealogy software has not become more flexible is that GEDCOM development simply stopped, making data transfer of new data types difficult if not impossible. For this reason, BetterGEDCOM discussions often revolve around data concepts that aren't likely to be in common usage for years. Nevertheless, given the kind of influence this standard can have, it is imperative we shoot for the moon.

So, what's wrong with GEDCOM? What won't it do that you want it to? What hasn't worked for you in the past? Please edit this page to add your ideas or contribute by posting your story using the DISCUSSION tab directly above.

GEDCOM Messes This Up

GEDCOM Won't Transfer This

I Want My Genealogy Software And BetterGEDCOM To Do This

Extensions from Applications


Comments

dsblank 2010-11-30T04:38:54-08:00
Stupid wiki
The wiki ate the page. I was making some additions here, and the wiki saved an empty page. Trying to get it back to its previous form.

And the advertising on this wiki is really annoying.
gthorud 2010-11-30T07:36:49-08:00
Do you remember what you did?

Maybe we could extend the content of the "Guidelines For Posting & Editing" page with tips about how to use the wiki.
dsblank 2010-11-30T07:40:04-08:00
I think I clicked on the save button twice.
gthorud 2010-11-30T07:43:47-08:00
I was just trying to edit the page and got a message saying "We have recovered an unsaved draft from 15 minutes ago" - I did not edit the page 15 minutes ago, someone else probably did.

Do I edit the recovered draft or do I ignore it?

Well, I just closed the window and will try to edit later.
dsblank 2010-11-30T08:00:46-08:00
You can abandon that. Thanks!
gthorud 2010-11-30T08:04:22-08:00
Is this our most important page?
I am really glad this page was created. It could become one of our most important marketing efforts.

Thus I think we should stuff it with as much functionality as we can think of - with a clear statement that everything may not make it into a standard.

I really appreciate all the info about Gramps, but I think - in addition to the links - we should try to describe all the main features on this page in a way that a non-tech person would understand, so the reader does not have to follow the links. Perhaps the page could summarize the features near the top, and further details at the bottom.

I will try to have a look at a couple of other programs to see what they have to offer.

Any thoughts?
dsblank 2010-11-30T08:23:06-08:00
I agree that this was a page that was missing, and should serve BetterGEDCOM well.

You can remove, move, change, or summarize the Gramps data in whatever way you want. I thought I would point out these discussions and extensions at Gramps, because we discuss related issues every week over there.

But from a different direction: we create the behavior that we want in the application, and then figure out where to put that information in the various exporters (GEDCOM, Gramps XML, Geneweb, Spreadsheet, SQL, etc).

If GEDCOM doesn't support a major feature we would like (like shared Notes and Events), we proceed very carefully. Sometimes we can work around the issue with our importer (for example, by looking for identical notes in GEDCOM and sharing them). Sometimes, we just have to extend, knowing that we will lose data through GEDCOM.
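
As a rough illustration of the importer workaround just described (finding identical notes and sharing them), here is a minimal sketch. It is not Gramps code: the function name `share_identical_notes`, the generated note ids, and the input layout (a dict from person id to a list of note texts) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def share_identical_notes(
    notes_by_person: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Collapse notes with identical text into shared note records.

    Returns (shared_notes, refs_by_person):
      shared_notes maps a generated note id to its text,
      refs_by_person maps each person id to the note ids it refers to.
    """
    id_by_text: Dict[str, str] = {}
    shared_notes: Dict[str, str] = {}
    refs_by_person: Dict[str, List[str]] = {}

    for person_id, texts in notes_by_person.items():
        refs: List[str] = []
        for text in texts:
            # Assumption of this sketch: notes are "identical" when their
            # whitespace-trimmed text matches exactly.
            key = text.strip()
            if key not in id_by_text:
                note_id = f"N{len(id_by_text) + 1}"
                id_by_text[key] = note_id
                shared_notes[note_id] = key
            refs.append(id_by_text[key])
        refs_by_person[person_id] = refs
    return shared_notes, refs_by_person
```

Two people carrying the same note text end up pointing at one shared note record. A real importer would probably need a looser or stricter notion of "identical" than this one.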

I'm not an expert on these differences, but merely point people to these sections so that they can better understand them.

-Doug
AdrianB38 2010-11-30T08:24:47-08:00
Is the GEDCOM Standard ambiguous?
Quote from a discussion elsewhere: "Root cause for GEDCOM failure was mainly because it was so loosely written"

I remain to be convinced about that. The format of the GEDCOM definition is very close (if I recall correctly) to the Backus-Naur form used to define ALGOL 60 and no-one ever accused that of being loose.

There may be some possible recursion, e.g. in the way that a source record can refer to a note record, which can in turn refer to a source record. But that's probably very legitimate unless you end up back at the same source record, which makes it a problem "between chair and keyboard" (user error!)
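
To make the recursion case concrete, here is a minimal sketch of how an application might detect a chain of references that leads back to the record it started from. The record ids and the `links` mapping (record id to the ids it refers to) are hypothetical; nothing here comes from the GEDCOM specification itself.

```python
from typing import Dict, List, Optional


def find_reference_cycle(links: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of record ids (start id repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for target in links.get(node, []):
            target_state = state.get(target, WHITE)
            if target_state == GREY:
                # The target is already on the current path: a cycle.
                return path[path.index(target):] + [target]
            if target_state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in links:
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

A source that refers to a note that refers to a different source yields no cycle; a note that points back at the originating source does.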

The loose bits are more likely to lie with the family history side of things and that may be down to the English language not GEDCOM (e.g. anyone care to explain the differences between baptism and christening (of a child)? And why some parish registers say one thing and then the other when apparently referring to the same ceremony? Last time I saw that discussion, we had a clergyman in the Church of England saying he couldn't say what was going on!)

So - anyone care to indicate where GEDCOM is ambiguous?

It makes a difference - if it's in the external definitions of the events or properties referred to by the tags, then we could be in as big a problem as GEDCOM.
greglamberson 2010-12-02T13:47:10-08:00
Adrian,

You are right to question the idea that GEDCOM failed due to the fact that it was loosely written. There are lots of things in GEDCOM that simply were badly implemented or even ignored by software developers. This doesn't necessarily mean that GEDCOM was loosely written. I think a lot of people fault GEDCOM for allowing notes and custom tags without defining their use. This might or might not be a particular problem in regard to sources and citation references. I think you'll hear those who say it's a huge problem and others who say, "What?" (I bet most say, "What?") On the other hand, the word network engineers use for something that works 99.999% of the time is "broken."

I do think any possibility of inappropriate recursion is a fault with the model. A good model defines the data in such a way as to disallow impossible situations. In such a case, I would hope a developer would provide rules in the application that would prevent such linkage. If a user were to enter two sources linked by a note in the way you're describing it, I would still fault the app developer AND the underlying data model (i.e., GEDCOM). If a user entered the data, it can't be wrong. There's no such thing. Data is data. I get nervous when anyone involved with technology blames something the user did or entered. They can only do what you let them!

Bob Velke's quote (where'd you get that, Andy?) is particularly interesting, particularly when he mentions, "... the exporting researcher doesn't know that his data has been mangled and the importing researcher doesn't either!" That seems a little bit important. But how does one deal with such issues? Solving problems like that involves follow-up and communication with developers during and after their adoption of BetterGEDCOM. However, we're simply nowhere near this point in the process.

Problems with GEDCOM manifest themselves as a combination of user data, a specific application that data is entered into, GEDCOM, and finally an application that the GEDCOM file is imported into. Even providing examples (the gathering of which is really the idea behind this page) is difficult. To provide proper examples, you've got to have each of the programs involved and the user on hand to explain to you what they expect and eventually the developers to explain how they are mapping the data transfers. In the face of such obstacles, we naturally throw up our hands and decide to fix the problem as we perceive it, resorting to simple trial and error. I hope we'll start to see some examples of some sort so we have better information to work with. Right now we're sort of hashing over all the work technology folks have done on this for the last ten years without any data to define the problems. Anyone with a technology background gravitates pretty quickly to certain concepts like the evidence vs. conclusion issue which have been discussed ad nauseam when trying to solve this problem. If we don't start seeking some other input and examine hard data, we can still plow forward and produce results. However, what does that do for the end user? They're likely to end up being as frustrated and unsatisfied as ever.
Andy_Hatchett 2010-12-02T15:02:15-08:00
Greg,

I knew Bob Velke had written quite a bit about GEDCOM and its shortcomings so I Googled. *grin*

That particular quote comes from The Wholly Genes Newsletter, Issue 2009 Number 9 dated 15 Oct 2009.

Here is a link:

http://www.whollygenes.com/forums201/index.php?s=c1a844647c1cddadf6a952ba7f10630e&showtopic=11968&pid=46439&st=0&#entry46439
VAURES 2010-12-11T23:09:29-08:00
Hi Greg
You wrote:
If a user entered the data, it can't be wrong. There's no such thing. Data is data. I get nervous when anyone involved with technology blames something the user did or entered. They can only do what you let them!

Why do you want to completely avoid misuse? Do you really think that’s possible?

Sometimes I like to compare the various genealogy programs that use GEDCOM as a basis with different languages using the same alphabet.
There is not necessarily the same understanding of a tag with different (human) developers.
If you listen carefully to a conversation between two folks, you might find that they have a more or less different understanding of the meaning of single words. More so if people from northern Germany talk with people from Bavaria and even more so if they communicate in a language which is not their mother tongue.
You know part of the problem by using the Google translator (so do I once in a while). However I'm sure that the Google translator results need further interpretation.
My impression is that you're heading towards a car that should be appropriate to take your lady to the opera or a dinner in a five star restaurant and the next day transport cattle.
I have about 15 years of loose experience with exporting data from one program to another. My experience is that a great share of errors and problems with information transport is due to users. Some are caused by programmers.
Take one example: Marriage: GEDCOM uses MARR but does not state whether this is meant for the civil act of getting married or the church ceremony. For the LDS temple ceremony they use ORDI.
In Germany before ~1880 there was no civil marriage possible, all marriages were contracted by church ceremonies. Nowadays you may use one or the other or both.
The programmer of my program used ORDI for the church ceremonies and MARR for the civil marriages. This is a peculiarity of his program, which I like very much but few other programs are able to translate (= import) this correctly.
Another example: I got data from a friend who uses the same program, thus import should be easy. However he used the field for source to enter notes. How do you think you can avoid this? He also used the tag MARR for all marriages (church only before 1880) and civil marriage thereafter not caring for the church ceremony. I admit that very often civil and church marriages are on the same day, but increasingly often (esp. in GE) they are not.

Kind regards (and excuse my misuse of your language)

VAURES
GeneJ 2010-12-11T23:52:00-08:00
Hi VAURES

Your example is great.

We need to find the best thinking about the sometimes fine balance between consistency and accuracy.

In my main file (uses "tags"), I have 478 tag types. Now, this is a file I've been working with for many years, and I couldn't possibly keep all those lesser used tag types straight.

I know some folks who like to keep census tag type by year. Here in the US, that means they have separate tags for 1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860 ..... and so on. Many of those folks use sophisticated census entry/display systems, which are dependent on the tag being delineated by year.

I have read the GENTECH materials on this topic, and certainly understand the logic of having a set that is agreed upon. On the other hand....

Might not this be an area where innovation could play a role? Would it be possible to have plug-ins or modules to accommodate particular* needs or user customization? (Did someone say apps?)

Thinking out loud. --GJ

*Dog pedigree, House genealogy, etc.
GeneJ 2010-12-12T00:29:16-08:00
P.S.

In the program I use, each "tag" is associated with a default narrative sentence. I'm assuming the tags at issue here are those that would create the life story section of a genealogical narrative, where there are few constructs.

I'll be very interested to see how Louis Kessler's word-processing-in-the-program goes. It sounds like a dream come true. Perhaps he'll chime in on how he'd approach this issue.

Separate from tags, this same issue is apparent in dealing with sources.

Even if we can't solve the 478 tag type issue in this go round with BetterGEDCOM 1, we might be able to tackle this issue with sources.
mstransky 2010-12-12T07:54:07-08:00
478 tag type issue

I suggest "soft text tags" over "hard encoded tags". The only person to respond to me about it was testuser42. I am not an English grammar major, but I think testuser might put it in better words than myself.

Also, soft tags have way more benefits over hard tags when it comes down to app coding, searches, filters and 100% data transfers to other platforms regardless of the tag name.
AdrianB38 2010-12-12T09:32:20-08:00
Re VAURES and his German marriages:
This sounds intriguing, especially the bit about "very often civil and church marriages are on the same day, but increasingly often (esp. in GE) they are not". Am I right in taking this to read that some couples in Germany go through _both_ ceremonies?

If so, this is a great example of cultural differences leading to potential GEDCOM issues (and highlighting it is a vindication of the Wiki approach here!).

In England, any marriage in the Church of England (from 1837 onward) combines the religious and the legal aspect. There is a part that is religious only but the core of the ceremony is actually dual-purpose. The distinction between the religious and the legal can be seen in some non-conformist religious wedding, especially in the early and mid-1800s, when it impacted on who had to attend the ceremony and where it could take place, but it was still one ceremony. But in addition, England has had civil (logically, legal-only) marriage ceremonies since 1837.

Now, if some German genealogists wished to split civil and church ceremonies over 2 different marriage tags, because they take place on 2 different days, and have this replicated in the BG standard, for the purposes of greater clarity, this would be great for them but cause an issue for English genealogists since their civil marriages should be under one tag and their church marriages under another. And if the English succeed in restricting it to one tag, this causes issues for those German genealogists!

It may be that we can get round this specific example by giving tags a type and sub-type, e.g. <event type="census" subtype="1850"> ... </event>
or
<event type="marriage" subtype="civil"> ... </event>
but I think this will only solve some issues as we simply cannot expect to codify in the BG standard (however volatile and reactive it might be) the requisite events in all cultures. Somewhere, someone is going to use an event in a way that makes perfect sense to them and no sense to someone else. I think we simply have to learn to live with this and not imagine we can standardise these problems away. The key, in my view, is to identify those "tags" (i.e. types / subtypes(?) ) where the program potentially needs to understand the meaning rather than simply reformat the contents of the data to put it onto the screen or in a report and standardise those. The German marriage example (_if_ I got the implications right) may be very useful in testing any ideas for how to handle types of events as it seems to show a specific, not theoretical, cultural difference.
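
A minimal sketch of how a program could read events carrying a type and sub-type follows. The `event` element and its `type` and `subtype` attributes are taken from the examples above; the `events` wrapper element, the sample values and the function name are assumptions made purely for illustration, not part of any agreed BG format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the <event type=... subtype=...> shape
# comes from the discussion.
SAMPLE = """
<events>
  <event type="census" subtype="1850">Household of J. Smith</event>
  <event type="marriage" subtype="civil">Standesamt</event>
  <event type="marriage" subtype="church">Parish church</event>
  <event type="marriage">Unspecified ceremony</event>
</events>
""".strip()


def events_of_type(xml_text, event_type, subtype=None):
    """Return (subtype, text) pairs for events matching type and, optionally, subtype."""
    root = ET.fromstring(xml_text)
    matches = []
    for event in root.iter("event"):
        if event.get("type") != event_type:
            continue
        if subtype is not None and event.get("subtype") != subtype:
            continue
        matches.append((event.get("subtype"), (event.text or "").strip()))
    return matches
```

A program that only understands "marriage" can ask for every marriage regardless of sub-type, while one that cares about the German civil/church distinction can ask for `subtype="civil"` alone. An event without a sub-type is still returned for the broader query, so nothing is dropped.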
testuser42 2010-12-13T13:18:54-08:00
Adrian asked "Am I right in taking this to read that some couples in Germany go through _both_ ceremonies?"
Yes. IIRC, the civil ceremony is the one that you have to have, in order to be married legally from the government's perspective. Most people also get married by their church, but not as many as used to. Sometimes these double ceremonies are quite useful: one can be held in the home town of the bride, the other in the groom's home town. :)
testuser42 2010-12-13T13:43:03-08:00
The idea of tags with (sub)types or classes is a great one.
It has been mentioned a few times by different people already. I think Christoffer was the first one here ( http://bettergedcom.wikispaces.com/message/view/Approaches+To+Standardization/30704913#30732699 ) and I think this is similar to what Mike means by "soft tags".
I'm all for it!

If a user still decides to misuse a tag, then we can't do anything about it. Except maybe, to make sure that the data gets recorded exactly as entered. I.e. it may not be discarded or changed/interpreted. The software on the receiving end then ideally should present a list of problematic tags, and ask for a decision what to do with them.
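
The behaviour described above (record everything exactly as entered, then list the problematic tags for the user to decide on) might look something like this minimal sketch. The `KNOWN_TAGS` set is a tiny illustrative subset, `_KIRCHE` in the usage note is an invented custom tag, and the line parsing handles only the simple "level [xref] tag [value]" shape; all of these are assumptions of the example.

```python
from typing import List, Tuple

# Tiny illustrative subset of tags the importing program understands.
KNOWN_TAGS = {"INDI", "NAME", "BIRT", "MARR", "DATE", "PLAC"}


def parse_line(raw: str) -> Tuple[int, str, str]:
    """Split a line into (level, tag, value), skipping an optional @xref@ id."""
    tokens = raw.strip().split(" ")
    level = int(tokens[0])
    rest = tokens[1:]
    if rest and rest[0].startswith("@"):
        rest = rest[1:]  # skip the cross-reference id
    tag = rest[0]
    value = " ".join(rest[1:])
    return level, tag, value


def import_lines(lines: List[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Keep every line unchanged and report (line number, tag) for unknown tags."""
    kept: List[str] = []
    problems: List[Tuple[int, str]] = []
    for number, raw in enumerate(lines, start=1):
        _level, tag, _value = parse_line(raw)
        kept.append(raw)  # never discard or reinterpret what was entered
        if tag not in KNOWN_TAGS:
            problems.append((number, tag))
    return kept, problems
```

Given a file containing a line such as `1 _KIRCHE St. Marien`, the importer keeps the line verbatim and adds its line number and tag to the list it would show the user.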
AdrianB38 2010-12-13T14:04:48-08:00
OK '42 - I've added a bit in the section "I Want My Genealogy Software And BetterGEDCOM To Do This", trying to explain why type and sub-type are a good idea. Does this start to approach what you want?

(I'd get criticised at work for writing both a requirement and a solution at once so I apologise if anyone objects, but maybe an illustration helps)
testuser42 2010-12-13T15:07:01-08:00
Adrian, yes, thanks!
ttwetmore 2010-12-17T04:55:24-08:00
First, as to whether the GEDCOM standard is ambiguous. There are 14 responses in this thread and no answer to the question.

The GEDCOM standard would be ambiguous if one could create a GEDCOM file that could have two or more different valid interpretations. Has anyone come across such a thing?

Tom Wetmore
Andy_Hatchett 2010-11-30T10:43:46-08:00
Here is just one of many quotes- this particular one by Bob Velke...

[QUOTE]
For instance, photos, sound, and video are technically supported by GEDCOM but those specifications are so ambiguous and impractical that we don't know of any developer who uses them (as if embedding dozens of JPG files in a GEDCOM file was practical on its face).

Even more "standard" data types like the precedence of a person's multiple names or the relative birth order among his siblings (especially when some birth dates are unknown) often don't survive a GEDCOM transfer intact. For that matter, GEDCOM transfers frequently mangle incomplete/ambiguous dates, obscure the researcher's knowledge about a place (is "Washington" a city or state?), insert extraneous carriage returns into long notes, and disaggregate sources (is this "Johnson Family Bible" the same as that "Johnson Family Bible"?), among many other problems.

But come on -- how important are little things like names, dates, places, notes, and sources to genealogists, anyway?? Given the ongoing widespread acceptance of GEDCOM, even as genealogical software has long outgrown it, this is not a rhetorical question. What hope do we have that we will ever be able to use GEDCOM to faithfully transfer more advanced data types like the researcher's theories and rationale and new technologies like DNA?

But evaluations of GEDCOM usually miss the most insidious of its problems: the exporting researcher doesn't know that his data has been mangled and the importing researcher doesn't either! If you send a GEDCOM file to Cousin Mary, will she know which names, dates, places, and sources, have been mangled by the transfer? Will she know that, for a certain father and mother in your data, her importing program inferred a marriage even though you didn't record one? Will she realize how much other important data (like photos) you have collected? Probably not. Worse, she may publish your data, even though neither of you realize how badly it has been corrupted. And if she sends the data to another researcher, it will get mangled yet again!

Are you now reconsidering whether GEDCOM is the best way to archive your data for future generations?

Here again the argument usually shifts to who is to blame: the exporting program, the importing program, or GEDCOM itself -- but does it really matter? 20 years later, the problems are persistent and the LDS Church has indicated that the GEDCOM specifications have been updated for the last time. So who really expects GEDCOM transfers to improve in the next 20 years?
[END QUOTE]

My summation is that it was too loosely written and allowed wayyyyy too much leeway in interpreting the standard- otherwise we wouldn't have each software developer producing their own flavor.

In the world I come from a standard is tightly written and enforced and compliance to the full standard is mandatory- interpretation plays no part in it.
AdrianB38 2010-11-30T11:12:26-08:00
"who is to blame: the exporting program, the importing program, or GEDCOM itself - but does it really matter?"

Yes it does - unless we can analyse the cause of issues, we don't learn from the past and we are doomed to repeat the mistakes of the past. (We'll make a few of our own of course - that's called life)

If "compliance to the full standard is mandatory" - how can we assure that? Is there XML validation software that can be used?
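
On the validation question: schema validators do exist for DTD, RELAX NG and XML Schema (the lxml library for Python, for instance, supports all three). The sketch below is deliberately much simpler and uses only the Python standard library: it checks well-formedness and then applies two hand-written rules. It is a hand-rolled structural check, not schema validation, and the `event` element, its `type` attribute and the `ALLOWED_EVENT_TYPES` set are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary for the example.
ALLOWED_EVENT_TYPES = {"birth", "marriage", "census"}


def check_document(xml_text):
    """Return a list of readable problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not even well-formed XML, so no further checks are possible.
        return [f"not well-formed: {exc}"]

    problems = []
    for index, event in enumerate(root.iter("event"), start=1):
        event_type = event.get("type")
        if event_type is None:
            problems.append(f"event {index}: missing required 'type' attribute")
        elif event_type not in ALLOWED_EVENT_TYPES:
            problems.append(f"event {index}: unknown type '{event_type}'")
    return problems
```

A formal schema would express the same rules declaratively, so that exporting and importing programs can both be tested against one published definition.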
AdrianB38 2010-11-30T08:32:57-08:00
Or is it that software developers can't read?
Or were GEDCOM issues because software developers can't read? (NB I can say this - I wrote software!!)

Frankly if the GEDCOM tag ends up in the wrong place (and assuming that the standard didn't move it over time) I find it difficult to believe the fault lies anywhere other than with the developer. HAVING SAID THAT... we need to make the new BG "Standard" as clear as possible - that means a clear and comprehensive data model to roughly industry standards for BG, because this describes (a) the logical structure of the files to be exported and imported and (b) the most obvious logical structure of the internal application files - if they mismatch the structure of the import / export file, then it'll be difficult to create a BG file in the application that hasn't lost something.

Presumably, it also means this project concocts the formal XML definition (which is where my memory and understanding gets lossy)
greglamberson 2010-11-30T11:02:18-08:00
Adrian,

I think the bottom line is, end users don't care. GEDCOM doesn't work as a way to export and import data for them in lots of cases.

Yes, of course we'll develop a DTD or an RNG spec or an XSD (XML Schema) spec, or some or all of these. I think about it constantly, with every single idea that is mentioned.

We must be precise in our definitions and specifications, because any minor discrepancy or ambiguity will be distorted horribly once it becomes something end users deal with. What the end user ends up with is all that matters.
mstransky 2010-11-30T11:13:23-08:00
Well said, Greg. I hope for a precise term to convey my thoughts to a "same level understanding" without writing something from left field.

I have a way to transfer all user input data without trying to define each data set for compliance. For now the only way I can say it is using soft text tags over the norm of hard encoded tags that keep and lose data in the translation from one platform to another.
AdrianB38 2010-11-30T12:35:58-08:00
"the bottom line is, end users don't care"
I totally agree, Greg.

But, painful though it is for those who want to rush to a solution (and yes, I started writing a data model myself), I think some time thinking about where the issues really are, is necessary. After all, in a Post Incident Review, users don't care whether it's a memory leak or degrading Oracle indices (nor should they), but the company does need to establish the root causes. (Don't run away with the idea that I'm a stickler for Root Cause Analysis - sometimes I feel the accepted Root Cause is a bit too arbitrary and I often joked that the true Root Cause was The Big Bang.)

I'm glad to see that you're thinking about things like DTDs, etc. Something like that probably needs to go into the Goals as desired outputs.
AdrianB38 2010-12-09T12:34:39-08:00
Certainty Assessment
Certainty Assessment: Current GEDCOM values 0 to 3 stand for
- unreliable or estimated (0);
- questionable (1);
- secondary (2);
- primary (3).
This confuses two concepts - we can have unreliable primary sources, so BG needs to split this into 2 items.
- primary or secondary;
- current(?) assessment of reliability;

Further possible concepts might be
- if this is an original or derivative
- if derivative, is it a transcript, abstract, extract, etc?

Values 2 and 3 from GEDCOM Certainty Assessment could update BG's primary or secondary code.
Values 0 and 1 from GEDCOM Certainty Assessment could update BG's assessment of reliability.
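
The mapping in the last two lines can be written out as a small sketch. The names `SplitAssessment`, `information_class`, `reliability` and `split_quay` are invented for the example; only the four GEDCOM values and their meanings come from the list above. A field left as `None` means the old value says nothing about that concept.

```python
from typing import NamedTuple, Optional


class SplitAssessment(NamedTuple):
    information_class: Optional[str]  # "primary", "secondary", or None if unknown
    reliability: Optional[str]        # reliability wording, or None if unknown


# GEDCOM certainty assessment value -> the two separate concepts.
_QUAY_MAP = {
    0: SplitAssessment(None, "unreliable or estimated"),
    1: SplitAssessment(None, "questionable"),
    2: SplitAssessment("secondary", None),
    3: SplitAssessment("primary", None),
}


def split_quay(quay: int) -> SplitAssessment:
    """Split one GEDCOM certainty value (0 to 3) into two separate fields."""
    if quay not in _QUAY_MAP:
        raise ValueError(f"certainty assessment must be 0..3, got {quay!r}")
    return _QUAY_MAP[quay]
```

Note that each old value fills in only one of the two new fields, which is exactly the information loss caused by GEDCOM combining the two concepts in one number.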
EssyGreen 2011-11-12T02:31:44-08:00
GEDCOM does specify that QUAY is only intended as a rough guide and shouldn't prevent researchers from using their own judgement. I think the concept of a quick visual indicator is a good one but I agree with you that there could be extra flags for primary /secondary, reliability etc

Perhaps more importantly tho' there is no ability to have more complex evidence assessment i.e. hypotheses which can span multiple sources/citations and people/events.
WesleyJohnston 2011-11-12T08:21:59-08:00
I have been using Ancestry.com's online trees as my master copies -- not because of their software but because it is shared in the cloud and will be there after I am not.

I have reflected uncertainty as text in an event. But often the text limit does not allow a full description of my analysis. And sometimes -- often -- my analysis really depends on multiple events and multiple people, such as untangling the Nicholas Brokenshires of the same places in Cornwall and Canada.

So I have really not used the certainty assessment values and have left the assessment entirely in text.

I suppose there are folks who do assign a value to every one of their sources, but I suspect most do not.
GeneJ 2011-11-12T09:43:56-08:00
If you search the wiki for "QUAY," you'll find various discussions. QUAY has been considered in the BetterGEDCOM requirements catalog.

A few links follow. (You'll want to search these pages for the term, as often once the topic is raised, it will be commented on in subsequent postings, too.)

See the Req. Catalog discussion, "Source 02-Certainty Assessment (QUAY)"
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/35761560

Page, "Shortcomings of GEDCOM," for discussion, "GEDCOM's QUAY; comments/feedback"
http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/32262084

Page, "Application Data" for brief mention of discussion, "... comments or feedback"
http://bettergedcom.wikispaces.com/message/view/Application+Data/37008670#37064574

Page, "Glossary of Terms" for discussion, "Surety."
http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/35192272

Page, "Glossary of Terms," for "QUAY."
http://bettergedcom.wikispaces.com/Glossary+Of+Terms

Page, "Goals," for the discussion, "Negative Evidence."
http://bettergedcom.wikispaces.com/message/view/GOALS/30536663#30753513

Page, "Shortcomings of GEDCOM," discussion, "What's wrong with Sources?"
http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/31633551?o=40#35791432

Page, "Best Practices."
http://bettergedcom.wikispaces.com/Best+Practices

There were a few more I didn't post above. Hope this helps.-GJ
EssyGreen 2011-11-13T23:11:44-08:00
Hiya GeneJ

I'm a bit confused by all the links and (being a newbie) getting lost in these wikis ... is there one simple node I can use as the base for reviewing/participating in BetterGEDCOM topics?
ACProctor 2011-12-03T14:44:03-08:00
On one of the newsgroups I suggested using a rough percentage as a surety value, e.g. Surety=70%.

My rationale was that it allowed 'containment of relative merits', which was a waffly way of saying that it allowed one assessment to be compared with the sum total, as opposed to simply allowing one to be compared with another one. I don't think this was really understood on that newsgroup.

However, I mention it here because it presupposes that there may be multiple competing assessments for something. I believe GEDCOM has been abused in some quarters to provide multiple values for a datum with differing assessments. Are we allowing for this in BG?

I think it would be very hard to process, unless we select a primary one and relegate the others to a related element name [I'll try and contrive an example if that sounds a bit too vague]
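
One possible reading of "select a primary one and relegate the others" is sketched below. The `Assessment` type, the rule of picking the highest surety as primary, and the first-wins tie-break are all assumptions made for the illustration; nothing here is an agreed BG design.

```python
from typing import List, NamedTuple, Tuple


class Assessment(NamedTuple):
    value: str   # the competing value for the datum, e.g. a birth year
    surety: int  # rough percentage, 0 to 100


def split_primary(
    assessments: List[Assessment],
) -> Tuple[Assessment, List[Assessment]]:
    """Pick the assessment with the highest surety; return it and the rest."""
    if not assessments:
        raise ValueError("at least one assessment is required")
    for item in assessments:
        if not 0 <= item.surety <= 100:
            raise ValueError(f"surety must be 0..100, got {item.surety}")
    # max() returns the first index with the highest surety, so ties go
    # to the assessment that was listed first.
    primary_index = max(range(len(assessments)), key=lambda i: assessments[i].surety)
    alternates = assessments[:primary_index] + assessments[primary_index + 1:]
    return assessments[primary_index], alternates
```

The alternates keep their own surety values, so a receiving program could still display or re-rank them rather than losing the competing assessments.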
louiskessler 2010-12-13T07:27:40-08:00
What's Wrong With Sources?
On the What's Wrong With GEDCOM page, under "GEDCOM messes this up" is listed simply "Sources" with no explanation.

Then under "GEDCOM Won't Transfer This" is listed simply "Sources", again with no explanation.

In my experience, sources and citations are very well defined in GEDCOM. They are saved by most programs to GEDCOM and are read correctly by most programs from GEDCOM.

I have seen thousands of GEDCOMs, and many of them have very extensive sourcing in them.

Sources and citations are defined very openly by GEDCOM, and as such are able to handle proper sourcing techniques such as those developed by Richard Lackey (Cite Your Sources) or Elizabeth Shown Mills (Evidence - Citation & Analysis for the Family Historian).

I don't feel it is up to GEDCOM to try to incorporate sourcing techniques. That is up to the software. GEDCOM just needs the capability to transfer the info.

So my question is, why is it felt that:

(1) GEDCOM messes up Sources, and

(2) GEDCOM won't transfer Sources?
GeneJ 2010-12-15T13:51:14-08:00
@ Mike:

Unless we are prepared to write a series of 800 page tomes, I suggest we do not want BetterGEDCOM to be characterizing otherwise undefined "groups" of elements.

The thread above discusses identifying those elements Mills uses in bibliographic entries (populates a "source list") as such entries are thought to be more standardized worldwide (ala, WorldCat.org).
gthorud 2010-12-15T15:18:23-08:00
Thank you Adrian.

When separated into at least 3 groups (source, where in source and repository), the list is not that big. I will have to have further definitions in order to sort it out in detail.

About 10 years ago I started to implement an "all singing and dancing" source database - I gave up because it became too complex. One thing I would not recommend is to "link" different editions of a source, eg. on different media. But a 2 level model may be useful in order to handle eg. an article in a book or journal, but I don't know how that fits with Mills etc.

Also, I see a need for at least a short title.
mstransky 2010-12-15T15:34:30-08:00
"But a 2 level model may be useful in order to handle eg. an article in a book or journal"

It is great that we ref an article or journal from an actual source. But I still see a source as, say, a newspaper. If you excerpt data from pg3, "Tony's Life on the Rocks", the article is still a part of "Newsday Paper".
Just like a census image 3 of 19 is still part of a collection of a whole.

I hope to get a few examples from others and show how I break it down for better db storage, and hopefully cover a universal way to group them.
GeneJ 2010-12-15T15:53:29-08:00
In addition to the 1997 Evidence (124 pages), I have _Evidence Explained ..._ (2007) in electronic form; it's 885 pages.

I don't have the second edition of Evidence Explained; nor do I have her most current "Quicksheet" publications.
GeneJ 2011-03-12T13:08:28-08:00
I'd like to see the fields available for recording reference note and source list information expanded in BetterGEDCOM.

For example, the "online source" has come of age. Rather than have to share a field with some other component in the GEDCOM's source or citation groups, online sources should have their own group of fields, just as "repositories" have historically had their own fields.

Also separate from the concept of repositories, can someone help me build a more complete list of the "fields" GEDCOM does recognize in creating the "Source_Record," described as, "used to provide a bibliographic description of the source cited."
http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#S4

We also need a field for recording the source-type (digital image; image copy, etc.) when something other than an original record is reviewed.

GEDCOM provides one field for "short title" (SOURCE_FILED_BY_ENTRY, aka ABBR). GEDCOM defines the field as, "This entry is to provide a short title used for sorting, filing, and retrieving source records." Unfortunately, many genealogists need to use a "short title" for the second reference note and still want a separate field to organize their sources.

More to come. --GJ
AdrianB38 2011-03-13T10:56:35-07:00
Gene - "can someone help me build a more complete list of the "fields" GEDCOM does recognize in creating the "Source_Record," described as, "used to provide a bibliographic description of the source cited.""

I'm not wholly certain here whether you're talking about the fields used to create a Source record in GEDCOM, or the fields used to create the bibliographic description in a printed report so if I get it wrong and tell you something that's obvious, then apologies.

In the GEDCOM files, there are essentially 3 things - the Repository Record, the Source Record and the, ahem, excuse me, Source-Citation structure that hangs off the event, individual or whatever, and points to the source record.

The Source record contains (summarising the text from GEDCOM 5.5):
- an optional title of format judged to be appropriate - thus GEDCOM says that a magazine article title might have the article title plus the magazine title;

- an optional short title intended to be used inside the program

- optionally, 1 or more event types, listing the types of events recorded in the source, each with a possible date and / or place;

- optionally the RESPONSIBLE_AGENCY - the body in control at the events(?) described in the source;

- an optional SOURCE_ORIGINATOR. Theoretically not to be confused with the above - the author / creator of the source;

- an optional SOURCE_PUBLICATION_FACTS - when and where published or date and place created if not published;

- optional text from the source, intended to be verbatim;

- an optional note;

- an optional pointer to a Repository record including an optional set of Source Call Numbers used to retrieve the item there, and against each of those, an optional SOURCE_MEDIA_TYPE;

- optionally 1 or more USER_REFERENCE_NUMBERS, each with an optional USER_REFERENCE_TYPE. These are intended to be for the personal use of the user.

In addition there are links pointing to multimedia and internal record numbers.

The Source Citation structure consists of:
- a pointer to the source record;

- an optional page number within the source, or volume and page or issue and page, or census ED and page...;

- optionally, the event type that's being used here;

- optionally, the role of the individual (that this fact is recorded against) in the event;

- optionally, the date the details were entered into the source;

- optionally, the relevant bit of text;

- an optional note;

- an optional certainty assessment

In addition, an optional link to multimedia.
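
One way to picture the two lists above is as a pair of record types. This is only a rough sketch: the Python field names are my own labels, not anything from the GEDCOM manual, and the tag in each comment is the GEDCOM 5.5 tag I believe corresponds to the item. The sample values at the bottom are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceRecord:
    """Roughly the items on a GEDCOM 5.5 source record (names are illustrative)."""
    title: Optional[str] = None               # TITL
    short_title: Optional[str] = None         # ABBR
    events_recorded: List[str] = field(default_factory=list)  # DATA / EVEN
    responsible_agency: Optional[str] = None  # DATA / AGNC
    originator: Optional[str] = None          # AUTH
    publication_facts: Optional[str] = None   # PUBL
    text: Optional[str] = None                # TEXT
    note: Optional[str] = None                # NOTE
    repository_xref: Optional[str] = None     # REPO pointer
    call_numbers: List[str] = field(default_factory=list)            # CALN
    user_reference_numbers: List[str] = field(default_factory=list)  # REFN


@dataclass
class SourceCitation:
    """Roughly the items in the citation structure that points at a SourceRecord."""
    source: SourceRecord
    page: Optional[str] = None        # PAGE (where within the source)
    event_type: Optional[str] = None  # EVEN
    role: Optional[str] = None        # ROLE
    entry_date: Optional[str] = None  # DATA / DATE
    text: Optional[str] = None        # DATA / TEXT
    note: Optional[str] = None        # NOTE
    certainty: Optional[int] = None   # QUAY


# Hypothetical example data, for illustration only.
census = SourceRecord(title="1851 Census", short_title="CEN-1851")
cite = SourceCitation(source=census, page="folio 12, page 3", certainty=3)
```

Everything is optional apart from the pointer back to the source, which matches how loosely the standard defines these structures.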

It is possible that you knew all this from the manual, of course.

As for the fields used to create the bibliographic description in a printed report, then my guess is that each and every GEDCOM program could use a different set.

The software that I use (FamilyHistorian) concatenates various of the items for its footnotes, viz:
- the author,
- the title in quotes followed by
- the publication data in (),
- (optionally) the text from the source,
- (optional) note;
then from the Source Citation structure,
- the page number within the source, or volume and page or issue and page (etc),
- date,
- assessment,
- (optional) note,
and finally from the source to repository details,
- the call number, the repository name, repository address, repository web-site.
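
As a rough illustration of that kind of concatenation, here is a minimal sketch. The dictionary keys, the ", " separator and the sample values are all my assumptions; it only follows the ordering listed above and is not how the program itself is written.

```python
def build_footnote(source, citation, repository):
    """Concatenate whichever items are present, in the order listed above."""
    parts = []
    if source.get("author"):
        parts.append(source["author"])
    if source.get("title"):
        parts.append('"{}"'.format(source["title"]))    # title in quotes
    if source.get("publication"):
        parts.append("({})".format(source["publication"]))  # publication data in ()
    for key in ("text", "note"):
        if source.get(key):
            parts.append(source[key])
    for key in ("page", "date", "assessment", "note"):
        if citation.get(key):
            parts.append(citation[key])
    for key in ("call_number", "name", "address", "website"):
        if repository.get(key):
            parts.append(repository[key])
    # The ", " separator is an assumption; real punctuation may differ.
    return ", ".join(parts)


# Made-up sample values, for illustration only.
footnote = build_footnote(
    {"author": "A. Author", "title": "A Parish History", "publication": "London, 1900"},
    {"page": "p. 12"},
    {"name": "County Record Office"},
)
print(footnote)
```

Note that once the pieces are joined like this, the recipient of the printed footnote has no reliable way to tell which piece was which.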

FH doesn't have a bibliography in its reports so far as I know but it does use "IBID" in 2nd citations in some fashion that I can't find an example of.
AdrianB38 2011-03-13T11:05:11-07:00
Gene - I'm not sure if that lot is what you wanted to know. Curiously when I read the GEDCOM manual, one or two of the items weren't as I expected, specifically the title and short title. I'd thought one was simply an abbreviation for the other. Instead, one is what's written down the spine of the book (if it were a book) and the short title is that of the source-RECORD. Two quite different concepts - especially if anyone, like me, has collected all sources of one type together by prefixing their title with "Book:" or "Marriage entry:" or... And I prefixed the title, when I see now that I should have filled the short title in for them all and prefixed that....

Frankly, I see now why you guys write the bibliography, first and subsequent footnotes out manually... You'd never predict what comes out the other end of the application.
GeneJ 2011-03-13T12:35:33-07:00
HI Adrian:

Thank you.

To my knowledge, most applications allow users to fashion a reasonable citation--but it's not uncommon to have to learn creative approaches.

As an example of the problem, using GEDCOM itself-- I note the source/record type (digital image, image copy, etc.) consistently. GEDCOM doesn't have a corresponding field, so my information might go in GEDCOM's "PAGE," or it might share space in "PAGE" or "TEXT." Ditto, "Enumeration District," "Source of the Source," analytical comments, collection name, URL, access date and access year, keywords....

I'm at least trying to develop a better list of fields for use in reference notes and source lists, drawing from information in GEDCOM 5.5, Evidence_Explained (2007) and the software programs I am able to access.

Over the last week, I've been extracting "source" specific parts of GEDCOM 5.5 to a personal spreadsheet. (The structure of the Source, Repository, etc., as well as the individual component entries.)

Some time ago, I started working to normalize information from Evidence_Explained onto a different personal spreadsheet. That spreadsheet is being developed from a rather extensive third party worksheet (based more on "templates").

To the extent I have access, have also been looking at the sourcing systems/UI of some genealogical software. I'm most familiar with TMG, have had some experience with GENBOX and can peek at a few others.

My work with Evidence Explained may take a while. I'll share more on GEDCOM as it gets polished up.

P.S. While having nice clean field descriptions would be an application hit for me, those aren't manual reference notes I'm usually sharing on the Wiki--they are printing from TMG.
GeneJ 2011-03-13T12:52:30-07:00
Adrian wrote, ""Marriage entry:" or... And I prefixed the title, when I see now that I should have filled the short title in for them all and prefixed that...."

:)

BetterGEDCOM shouldn't ask users to choose between organizing their sources and managing their subsequent reference notes.

It's a little like QUAY, though--we want to make improvement without breaking existing content.

My ABBR prints out on WorldConnect. So we find, "MAR-NH," "DTH-MA," "BK-OH," "CEN-1850-NH-GRA," etc., etc.
GeneJ 2011-03-13T15:03:11-07:00
@Adrian,

Believe I saw the ibid selection on the narratives / options (In Family Historian).
AdrianB38 2011-03-13T16:06:31-07:00
Yes, its ability to alter the footnote item is mostly confined to including items in or out with no conditions based on a type or anything. Well, not that I've seen.

"Ibid" is included as one of the other options.

Because it builds up the on-screen details of a source and "citation" from the individual GEDCOM items, I don't see the text of a footnote (actually - end-notes) until I do a Print Preview on a report.
GeneJ 2011-03-14T06:52:23-07:00
For some programs, user sees the tag sentence and output of the reference note(s) when the tag is selected--user inputs the reference note values on another or separate screen.

As I recall, GENBOX's UI included that feature early on.
GeneJ 2010-12-14T12:04:53-08:00
@Adrian:
You wrote, "This is important - if this were a project, this page would form a MAJOR input to the requirements specification"

Doesn't this information fall into the category of "Why BetterGedcom" and also, "how do we keep BetterGedcom better?" If so, isn't it a major part of our work?

You wrote, "So - maybe users propose possible issues and the more technical of us, try and highlight where the issues are."

Excellent. We have some information on the Build a BetterGedcom blog, using a FTM Test project as the basis of those comparisons. We've talked about finding one or more projects (genealogy files) that all of us feel is right sized with the right characteristics (family circumstance, differing points in the research cycle, range of source techniques, range of different program features) ... Humm... did I just say the perfect test file. hahahaha

As always, thoughts and suggestions welcome.

P.S. BTW, google "TMG v7 Sample Project Margo Fariss Brewer" (without quotes), she created these very cool charts about the tag types in the TMG sample project. Snippet below (which you probably can't read from your cell), I haven't corresponded with her to confirm, but it appears this work was done on the TMG Sample Project.
AdrianB38 2010-12-14T14:32:14-08:00
Louis re "Building Mills methodology into GEDCOM is wrong ... "

Totally agree with you - we must be able to store the "citation" data at its elemental level for reconstruction into whatever format is desired. That's why I wrote "if you pass me a BetterGEDCOM of your data, I should be able to totally restructure your citation data to conform to my personal template"

The valuable thing you say is "The 21 elements you talk about must be user-definable". I guess some will be so common that they come with the BG standard (e.g. publisher's name, location, date, etc), but there will always be oddities - e.g. enumeration district may be useful for an American census, but not an English, where the accepted method is to use The National Archives' referencing system. I'm not expecting BG to trawl the archives of the world to add all their referencing systems!
AdrianB38 2010-12-14T14:56:02-08:00
Louis - re "Family Historian does is wrong. They should not be using the _FILE custom tag"

Ah, the perils of illustrating something when you haven't checked it's just plain vanilla. Yes, I'm not sure why FH don't use the FILE tag. Anyway, the only point of that clip was to illustrate how one _might_ handle an image.
AdrianB38 2010-12-14T15:00:51-08:00
GeneJ
I wrote "This is important - if this were a project, this page would form a MAJOR input to the requirements specification"

And you replied "Doesn't this information fall into the category of "Why BetterGedcom" and also, "how do we keep BetterGedcom better?" If so, isn't it a major part of our work?"

I wholly agree - it is. I'm just trying to emphasise its major nature (because it's a bit of a thin Wiki page at the moment!). The reason for the "if" was just to say that if we were following formal methods (as distinct from effectively brain-storming at the moment), then that's where we would be using the output from this page - and the ones you quote.
gthorud 2010-12-14T17:07:05-08:00
Louiskessler wrote

”The 21 elements you talk about must be user-definable” …. and …. “Those 21 elements should not be defined.”

I must admit I don’t know much about Mills or Lackey – we have managed without making citation a science here – but I think that BetterGedcom has to define as many as possible of the 21 elements (and maybe others from other schemes) - if not, we will create chaos.

User defined elements may be useful, but they are likely to be a source of incompatibility. We need to define how we expect a program to handle an unknown user defined element.
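
To make the question concrete, one possible rule is "keep what you don't recognise and write it back out unchanged". The sketch below assumes citation elements arrive as simple name/value pairs; the set of known names and the sample values are made up, not a proposal for the standard list.

```python
# Hypothetical subset of "standard" element names; the real list is undecided.
KNOWN_ELEMENTS = {"author", "title", "publisher", "page"}


def import_elements(incoming):
    """Separate recognised elements from ones this program does not know."""
    known = {k: v for k, v in incoming.items() if k in KNOWN_ELEMENTS}
    unknown = {k: v for k, v in incoming.items() if k not in KNOWN_ELEMENTS}
    return known, unknown


def export_elements(known, unknown):
    """Write everything back out, so unknown elements survive a round trip."""
    outgoing = dict(unknown)
    outgoing.update(known)
    return outgoing


# Made-up sample data, for illustration only.
incoming = {"author": "A. Author", "page": "12", "enumeration district": "5b"}
known, unknown = import_elements(incoming)
roundtrip = export_elements(known, unknown)
```

A program following this rule may not be able to display or edit the unknown element, but at least it does not silently drop it when the file is passed on.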

I don’t expect to see a standard the first year, so we could spend a year on this – if needed.
GeneJ 2010-12-14T17:32:15-08:00
Are history-like discipline citations somewhat standardized worldwide (I'm thinking WorldCat.org, for example)?

If so, might we consider providing more definition for an array of elements needed for source list/bibliographic entries (sort of like old Gedcom set out to do)?

Would be nice to move beyond terminology issues some see with existing GEDCOM. For a more humorous take on this issue, see Ancestry Insider's May 2010 post, written when Adrian Bruce :) asked for a clarification: "Of Sources and Citations: All Bets are Off."
http://ancestryinsider.blogspot.com/2010/05/of-sources-and-citations-all-bets-are.html

In the posting, Ancestry Insider traces the mixed terminology back to PAF. From the blog (quoting):

Let me summarize. PAF calls a bibliography citation a source. And it calls locator information—a portion of a reference note citation—a citation.
Let me say it another way. PAF uses source for something that is not a source and citation for something that is not a citation.
Yes, they meant well. But this error has propagated to subsequent genealogy programs (which also faced the lack of a term for locator information).
Non-genealogists use the terms source and citation and they understand one another. Genealogists use the terms and all bets are off.
GeneJ 2010-12-14T21:34:06-08:00
Errr... I meant, "Are history-like discipline bibliographic entries somewhat standardized world-wide (I'm thinking WorldCat.org, for example)?"
Andy_Hatchett 2010-12-14T21:54:56-08:00
History-like bibliographic entries do tend to be somewhat standardized, but... they rarely contain "locator" information- that is left to the citation; and when it comes to standardized citations- "all bets are off!"

:)
GeneJ 2010-12-14T22:42:14-08:00
LOL

Yes. I don't use the term "locator," and usually have many more elements in my 1st reference note (citation) than I do in my bibliographic entries.

@Andy, when you counted the elements in EE, did you include elements for abbreviated concepts used in the second reference note, for things like title (ala, short title)?
gthorud 2010-12-15T07:37:36-08:00
It would be nice if someone with access to the relevant books could make a list of the 21 - or so - elements. There are few in eg. Europe that know these books.

Are there any programs that currently support all or most of the 21?
AdrianB38 2010-12-15T11:57:24-08:00
The basic elements that _might_ go to make up an ESM style citation / bibliographic entry, etc are (roughly speaking):

1. Name of author / compiler / editor, etc
2. Title of book / document / film, etc
3. Publication place
4. Publisher's name
5. Publication year
6. Publication date (month & year)
7. Title of article or item etc
8. Volume number
9. Publication number (film or disk)
10. Page / frame / folio
11. Document / file name
12. Document / file number
13. Collection name
14. Collection number
15. Repository name
16. Repository location
17. Enumeration or document date
18. Enumeration district name / number
19. Dwelling / family or line number
20. Date of record creation / filing

Add the type of source for the 21st.

In no way is this meant to be a full description of each item - lots of caveats, clarifications, etc., are omitted. This list is from my copy of Elizabeth Shown Mills' "Evidence! Citation & Analysis for the Family Historian" (1997) - _not_ Evidence Explained which I don't possess. That may have more - or less - items.

I've no idea whether any of these are present in short form (e.g. for use in short citations) as well as the full long form (for use in the first, full citation).

Nor would I care to say how a digitisation of a book might end up - seems you might have 2 each of everything (1 for the book, 2 for the digitisation) - and as for a digitisation of a microfilm of an original, I don't know how many titles it would be appropriate to give that.
mstransky 2010-12-15T12:10:42-08:00
Do you see that there is a way to break this down even further?

I ran into that problem; look at breaking it into 3 separate data areas,

1) the physical doc source
2) the observed data from within the source.
3) the repository data where the source is stored or held.

This would help store data and help not duplicate the same data too many times.

this is just a suggestion and thought.
AdrianB38 2010-12-13T09:14:32-08:00
"Some people mean that it won't handle the image of the source attached to the source-citation"

Andy - what you've written is fine as far as it goes, but given that GEDCOM has a MULTIMEDIA_LINK from the SOURCE_CITATION structure, I still have to be boringly pedantic and say that we need to understand whether the issue is with GEDCOM itself or with the application's interpretation of it.

So, in the case of "it won't handle the image...", we need to know what "it" is.

I really need to apologise to anyone who's seen me write this before, but it is utterly crucial in these sort of discussions to distinguish between issues with GEDCOM and issues with the application's interpretation of GEDCOM. We can fix the first with BetterGEDCOM - we can't fix the second.

Russ also highlights a 3rd issue - is the standard clear? However, be warned - sometimes things are complex because they need to be. Whether they are as clear as they can be is a different issue.
Andy_Hatchett 2010-12-13T09:33:00-08:00
Adrian,

Let's say I have a source citation to a page in a family bible and in my software I have an image of that page attached to the source citation.

GEDCOM doesn't handle images so the only possible multimedia link would be to that image on my hard drive- which isn't going to help whoever receives the GEDCOM my program generates see the actual image.
louiskessler 2010-12-13T09:50:34-08:00
Yes I agree. Images are not properly handled by GEDCOM. But that does not mean Sources are not handled.
Andy_Hatchett 2010-12-13T10:11:03-08:00
Louis- If I consider an image as part of the source-citation (and I'm sure that some people besides me do consider it so) then, for me, GEDCOM does *not* handle that source correctly.

What happens when some bright developer lets the image be the actual source-citation (which I would dearly love!) rather than a mere attachment to it?
GeneJ 2010-12-13T12:28:17-08:00
Adrian:

You wrote, "... utterly crucial in these sort of discussions to distinguish between issues with GEDCOM and issues with the application's interpretation of GEDCOM."

How can users help do just that?

Is it advisable to set up a wiki page, "Sources to GEDCOM," on to which we work to define "our" understanding of that interpretation? We could post test results, each to a separate discussion to that page.
AdrianB38 2010-12-13T13:07:11-08:00
"the only possible multimedia link would be to that image on my hard drive"

Andy - excellent point now it's clear. FYI this is how FamilyHistorian from Calico Pie handles it:
0 @O1272@ OBJE
1 FORM jpeg
1 TITL Census Entry: Charity, Charles & Hannah, 1851, Haslington
1 _FILE Media\Sources2 InFH\Census\1851\CharityChasHannah-Census1851.jpg
1 _DATE 1851
1 _KEYS Picture
1 NOTE
2 _ASID 1
1 CHAN
2 DATE 22 JUN 2010
3 TIME 17:17:31

As you can see, it uses an extension tag (_FILE) meaning that while it understands it, it's pot luck if anyone else does. So this specific issue is about linked multimedia objects, which can pop up anywhere, not just in sources. I'll see if I can concoct a specific form of words, now we've sorted this.
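
For what it's worth, a receiving program could soften the pot luck by keeping a small table of extension tags it knows how to treat as standard ones. The sketch below is only an illustration: the alias table is hypothetical (I am assuming _FILE carries the same thing as the standard FILE tag), and the line parser handles just the simple "level, optional xref, tag, value" shape shown in the clip.

```python
import re

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

# Hypothetical alias table: extension tag -> standard tag it seems to mirror.
EXTENSION_ALIASES = {"_FILE": "FILE"}


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a GEDCOM line: {!r}".format(line))
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def normalise(lines):
    """Parse lines, replacing known extension tags with their standard alias."""
    parsed = []
    for line in lines:
        level, xref, tag, value = parse_line(line)
        parsed.append((level, xref, EXTENSION_ALIASES.get(tag, tag), value))
    return parsed


sample = [
    "0 @O1272@ OBJE",
    "1 FORM jpeg",
    r"1 _FILE Media\Sources2 InFH\Census\1851\CharityChasHannah-Census1851.jpg",
    "1 _KEYS Picture",
]
for row in normalise(sample):
    print(row)
```

Tags that are not in the table (like _KEYS here) pass through untouched, so nothing is lost even when the reader has no idea what they mean.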
AdrianB38 2010-12-13T13:19:29-08:00
OK - see page. I've edited the page(mostly) to read thus:
- Supporting documents/images (partly an issue of poor implementation). Where multimedia documents, images and other objects are linked, there is no overall protocol for how they and the GEDCOM that refers to them, shall be transferred. As a result, any user receiving them may need to edit the file references in the GEDCOM on receipt. The overall protocol must facilitate the transfer of GEDCOM and linked objects without manual editing.
- Sources (specifically see "Supporting documents/images")

All comments gratefully accepted.
AdrianB38 2010-12-13T13:42:51-08:00
Gene re "... utterly crucial in these sort of discussions to distinguish between issues with GEDCOM and issues with the application's interpretation of GEDCOM."

You asked (very sensibly) "How can users help do just that?"

I think we'll be here for ever if we try to refine our understanding of GEDCOM. We need to hit the major areas where GEDCOM has issues then we'll know what to fix in BetterGEDCOM (where we can create our own issues!)

To be practical about this - it would probably help us all if people posting potential GEDCOM issues would bear in mind that the issues might not be with the standard but with the interpretation of the standard or programming of the application. Because one we can fix, the other we can't.

I'm going to sound really arrogant here, but there's then a next step - if you can't read the GEDCOM standard or get confused by it, then you're in no position to decide whether it's the standard or the application. There's no shame in not being able to make that decision - we each of us have things we can do and things we can't. I can't hear themes in music, for instance. What I can do is try and help interpret that standard - not because I'm a PC programmer (COBOL yes, C++ no) but I can read logical stuff and I would hope there are others in this Wiki who can do similar.

So - maybe users propose possible issues and the more technical of us, try and highlight where the issues are.

(This is important - if this were a project, this page would form a MAJOR input to the requirements specification).
Andy_Hatchett 2010-12-13T14:13:45-08:00
[re: What's Wrong With Sources?
AdrianB38 50 minutes ago
OK - see page. I've edited the page(mostly) to read thus:
- Supporting documents/images (partly an issue of poor implementation). Where multimedia documents, images and other objects are linked, there is no overall protocol for how they and the GEDCOM that refers to them, shall be transferred. As a result, any user receiving them may need to edit the file references in the GEDCOM on receipt. The overall protocol must facilitate the transfer of GEDCOM and linked objects without manual editing.
- Sources (specifically see "Supporting documents/images")]

I think that covers it quite nicely!
Thanks
louiskessler 2010-12-13T16:53:46-08:00
Adrian:

Building Mills methodology into GEDCOM is wrong. Building Lackeys or anyone else's is wrong. Methodologies change, get improved, new ones come out, etc.

BetterGEDCOM is hopefully going to be a place this information can be permanently stored. It must be general to accept any methodology. The 21 elements you talk about must be user-definable, and it should be up to the programs to define them and adapt a methodology if you want. Don't force one methodology into BetterGEDCOM.

Those 21 elements should not be defined. But an element item should be defined so that the specific element can be entered as data. Then attributes for that element and their data values can be entered as well. That way new elements can be added at any time and their custom attributes can be defined as well.
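
A minimal sketch of what such a generic element item might look like, assuming nothing more than a name, a value and free-form attributes. The class names, the "form" attribute and the sample values are all made up for illustration; this is not a proposed BetterGEDCOM structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    name: str    # free text chosen by the program or user, not a fixed tag
    value: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Citation:
    elements: List[Element] = field(default_factory=list)

    def get(self, name):
        """Return the values of every element with this name."""
        return [e.value for e in self.elements if e.name == name]


# Made-up values; "form" is a hypothetical attribute name.
cite = Citation([
    Element("title", "A Parish History", {"form": "long"}),
    Element("title", "Parish Hist.", {"form": "short"}),
    Element("enumeration district", "5b"),
])
print(cite.get("title"))
```

Adding a new kind of element is then just a matter of writing a new name; nothing in the structure itself has to change.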

I guarantee it would take a year for a group like ours to just hammer out the specs to incorporate the different methods of citation and analysis. We should not get into the trap of doing this, because there are 100 other things like this that can be defined in detail ad nauseam: Tags, Date types, Place specs (some people want us to add a gazetteer - egads), groups, different family types, you name it.
GeneJ 2010-12-13T17:01:11-08:00
Experienced users are likely to have two or three different systems in use in a given database, even if that only reflects changes in industry practice over time.
louiskessler 2010-12-13T17:03:34-08:00
Adrian:

What Family Historian does is wrong. They should not be using the _FILE custom tag, but should be using the standard and known FILE tag. However, smart programs would know to use FILE for _FILE.

Here is how it is defined in GEDCOM:

MULTIMEDIA_RECORD:=
n @XREF:OBJE@ OBJE {1:1}
+1 FILE <MULTIMEDIA_FILE_REFN> {1:M} p.54
+2 FORM <MULTIMEDIA_FORMAT> {1:1} p.54
+3 TYPE <SOURCE_MEDIA_TYPE> {0:1} p.62
+2 TITL <DESCRIPTIVE_TITLE> {0:1} p.48
+1 REFN <USER_REFERENCE_NUMBER> {0:M} p.63, 64
+2 TYPE <USER_REFERENCE_TYPE> {0:1} p.64
+1 RIN <AUTOMATED_RECORD_ID> {0:1} p.43
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 <<SOURCE_CITATION>> {0:M} p.39
+1 <<CHANGE_DATE>> {0:1} p.31

Andy: Sources should simply refer to the multimedia file of the image, e.g.:

1 SOUR @S15@
2 OBJE @O95@

That is not a weakness of the Source or the Citation. In fact, I see that as a good feature, rather than a problem. Objects are entities and should not be embedded within any other entities. They should simply be referred to via links.
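
To illustrate the linking idea, here is a small sketch in which the citation holds only pointers and the records live in one lookup table. The dictionary layout, the titles and the file name are hypothetical; only the pointer ids come from the two lines above.

```python
# Hypothetical in-memory store: top-level records keyed by their xref id.
records = {
    "@S15@": {"tag": "SOUR", "TITL": "Family Bible"},
    "@O95@": {"tag": "OBJE", "FILE": "bible-page.jpg"},
}

# A citation that only holds pointers, mirroring "1 SOUR @S15@" / "2 OBJE @O95@".
citation = {"SOUR": "@S15@", "OBJE": "@O95@"}


def resolve(pointer, store):
    """Follow a pointer to its record, failing loudly if the target is missing."""
    try:
        return store[pointer]
    except KeyError:
        raise LookupError("dangling pointer {}".format(pointer)) from None


image = resolve(citation["OBJE"], records)
print(image["FILE"])
```

The same object record can be pointed at from any number of citations without being copied into each one.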
hrworth 2010-12-13T07:31:39-08:00
Louis,

Great questions.

I have a couple of examples on the BetterGEDCOM Blog.

I am still investigating a couple of them, and I think I know what they are, but need to prove it.

There is at least one of them that has been documented on the Blog, where the SOUR information was at the wrong 'level' 3 vs 2.

Hope that helps,

Russ
louiskessler 2010-12-13T07:35:51-08:00
Russ,

To me, if the program is not capable of adding sources at the right level, then it is the program's fault.

But GEDCOM is fully capable of adding the source at any level you want.
hrworth 2010-12-13T07:41:13-08:00
Louis,

I don't disagree with you at all. However, in looking at the GEDCOM 5.5, it wasn't clear either way, which is, in this user's opinion, part of the problem.

Thank you,

Russ
AdrianB38 2010-12-13T08:41:38-08:00
Louis
Let me illuminate what I see as one source problem, in which I include citations, a.k.a. footnotes / endnotes / whatever.

If I look in Elizabeth Shown Mills' Evidence! (which is the only one of her books I have), the table on "Basic patterns of citation" includes 20 elements that may, or may not, go to make up a citation. Now, you can argue about whether it should be 20 or a similar number, but that's not the point.

The type of source determines which of the elements go into the citation (so source-type is a 21st element to track). If I were designing a genealogy application, I'd want to make the data entry as simple as possible, and that means capturing each of the elements separately on the input screen. What then?

Well, I could concatenate them all together into one line of text using a template and just stick the single resultant line of text into the "database". Then, when I produce a GEDCOM file to export to someone, that resultant line of text could be somewhere in the SOUR record or the SOURCE_CITATION structure or whatever. Trouble is, my recipient then only sees one long line of text and if they want to see which bit of the line of text refers to (say) the page / frame / folio number as distinct from which refers to the document number - they can't, unless they're using exactly the same template as I am, in exactly the same way.

Therefore, in future, I want to see all those (roughly) 21 potential items captured individually, kept individually in the application's database and exported individually in the (Better)GEDCOM file.

In the current GEDCOM, I don't see, between the SOUR record and the SOURCE_CITATION structure, the (roughly) 21 different and SEPARATE items that compile into an authentic footnote / citation / whatever.

Let me propose a test for the BetterGEDCOM sourcing and citation structure. If you enter your sourcing and citation data correctly, according to your (actual or mental) templates, then if you pass me a BetterGEDCOM of your data, I should be able to totally restructure your citation data to conform to my personal template - always providing of course that I don't ask for anything you've not entered. I'd say convert from Lackey to Mills and vice versa if I knew whether the original elements were the same in both.
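
A toy version of that test, assuming the elements arrive as separate named values. The element values and both templates are made up; neither template is meant to reproduce the actual Mills or Lackey formats.

```python
# Made-up element values, for illustration only.
elements = {
    "author": "A. Author",
    "title": "A Parish History",
    "year": "1900",
    "page": "12",
}

# Hypothetical templates: one for the sender, one for the recipient.
SENDER_TEMPLATE = "{author}, {title} ({year}), p. {page}."
RECEIVER_TEMPLATE = "{title} / {author}. {year}. Page {page}"


def render(template, elems):
    """Fill a template from separate elements; a missing element raises KeyError."""
    return template.format(**elems)


print(render(SENDER_TEMPLATE, elements))
print(render(RECEIVER_TEMPLATE, elements))
```

Because the elements travel separately, the recipient can apply their own template without ever having to pick apart the sender's finished line of text. Asking for an element that was never entered fails, which is the "always providing" caveat above.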

(Please people, don't get lost in whether I should be using the creator's original citations or just citing a file supplied by the creator - that's not the point. The point is being able to understand and analyse the individual bits at their lowest level.)
Andy_Hatchett 2010-12-13T08:42:32-08:00
Some people mean that it won't handle the image of the source attached to the source-citation.
GeneJ 2010-12-13T09:02:42-08:00
Here's a place where users, developers and GEDCOM likely all contribute to a problem.

To add to the dialog ... I haven't counted them, so I'm going to assume Andy's comment that we find about 21 distinct elements in Evidence Explained. *

Mark Tucker did some work a while back on the differences in how three programs implemented the template element concepts from _Evidence Explained_. See his series on the ThinkGenealogy blog, "Better Online Citations."
http://www.thinkgenealogy.com/2009/05/03/better-online-citations-details-part-2-gedcom/


*Akin ... even though the same element might be present in the first/second reference note and the source list or bibliographic entry, the latter might require application of different capitalization and punctuation requirements. Ala, the element in a reference note for "digital image" has different characteristics in the bibliography, "Digital image."
GeneJ 2010-12-13T09:12:32-08:00
PS... Ala, the element in a reference note for "digital image" has different characteristics in the bibliography, "Digital image."

I need two elements to handle that in TMG, but GenBox handles it with one.

GenBox also uses higher/lower sources.
mstransky 2010-12-15T06:21:55-08:00
Looking at gedcom from a distance
This is the best way I can say this. I have been out for a while but am back home.

Gedcom originally was for the basic home user to display people. The basic captured data was names, birth and death dates and places. That is why Gedcom apps could simply look for BIRT, DEAT, DATE, PLAC, etc... since that was the only data collected at the time.

Then as time went on the basic APP structure could be easily coded to look for hard encoded tags such as BIRT, DEAT, etc...
As time went on and more types of sources could be collected, more hard encoded tags needed to be programmed. Not just a divorce (DIVO), but settlements, child support, doctor papers, causes of death, DNA strings, and hair color and shoe sizes.

I am of the opinion that GEDCOM should not use hard encoded tags to ID each type and or class of document data.

Such like

1 BIRT
2 DATE 29 MAY 1917
2 PLAC Brookline, MA, USA
-------or--------
<BIRT>
<DATE>29 MAY 1917</DATE>
<PLAC>Brookline, MA, USA</PLAC>
</BIRT>


Every app code would need to find a hard coded tag to look for "BIRT" in gedcom or xml just to identify it. if there is no app loop or xpath command to go and find them the entire line set of data is skipped there for ignoring collected data resulting in data not displayed, exported or imported.

However, if we give up that simplified way for a more modern db with soft text tags, we can find all line sets of data, even see those that may not be standard, and rename the Type + Class of collected/observed data.

Say if a class of data can be Death, Birth, Marital, Religious, etc....

Say your cousin sends you some info from a gedcom type with a hard encoded tag for a death by drowning. Even old gedcom structures are not made to look for "DROW" (drowning), so that set of data would be ignored. However, if a new BG model looked at all "Death" class records and then just read the text "Drowning", the data would never get lost.

The data would pass the Death / Drowning as text. If you later find out that he was hit by another car and pushed off a bridge into the river, you have the option to change the death class record type to Hit and run, or leave it as is.

I will be trying to finish off my model in the next few days. I would not really say it is a model to use as is. I want the devs and tech guys to understand that the old structure backs any gedcom look-alike flat file into a corner, keeping it from ever becoming a flexible storage platform that is easy to move to another platform or program.

2011 is now around the corner and we are still looking at what to call what, and how to add all these new tags into a basic hard coded gedcom file.
I believe a better approach, in gedcom's favor, would be:

1 CLAS Birth
2 TYPE *I will discuss this later*
3 DATE 29 MAY 1917
3 PLAC Brookline, MA, USA
-------or--------
<CLAS>Birth</CLAS>
<TYPE></TYPE>
<DATE>29 MAY 1917</DATE>
<PLAC>Brookline, MA, USA</PLAC>

This way, if a gedcom or xml is ever parsed and a list of class IDENTIFIERS is written to capture the standard Birth, Death, Marital, a simple loop can find all those not on the list. For example, if Drowning is not part of a later model trying to follow a BG standard, it would look like this.

Your cousin sends you a block of data, which is not 100% correct, since you found a coroner's medical document.

1 CLAS Death
2 TYPE Motor Vehicle
3 DATE 29 MAY 1980
3 PLAC Pine Bridge, MA, USA

You see, the app or xml will incorporate ALL classifications of documents and the sub type that they are.
On an import, a Death class type that does not match known BG standards such as "Burial", "Will", "Murder", etc. can be flagged for the user or researcher. The researcher can see the user did their best with what information they had to work with. The researcher sees he/she can change the type of death because of the known CAUSE of death.

1 CLAS Death
2 TYPE Drowning
3 DATE 29 MAY 1980
3 PLAC Pine Bridge, MA, USA

So you can see that data was never lost, never overlooked or ignored. It also allows the user to update or correct misspellings, or to translate documents from one language to another.
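
A rough sketch of the soft-tag reading described above, assuming the 1 CLAS / 2 TYPE / 3 ... layout from this post. The STANDARD_TYPES table is an assumption for illustration only (BG has defined no such list), and the function and field names are hypothetical.

```python
# Assumed list of "standard" types per class; not an actual BG standard.
STANDARD_TYPES = {"Death": {"Burial", "Will", "Murder"}}

def read_class_records(lines):
    """Group CLAS/TYPE lines into records; flag unknown types, never drop them."""
    records = []
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if tag == "CLAS":
            records.append({"class": value, "type": "", "details": {}})
        elif records and tag == "TYPE":
            records[-1]["type"] = value
        elif records:
            records[-1]["details"][tag] = value
    for record in records:
        known = STANDARD_TYPES.get(record["class"], set())
        # A non-standard type is kept as plain text and only marked for review.
        record["needs_review"] = record["type"] not in known
    return records

records = read_class_records([
    "1 CLAS Death",
    "2 TYPE Drowning",
    "3 DATE 29 MAY 1980",
    "3 PLAC Pine Bridge, MA, USA",
])
```

The Drowning record comes through with its date and place intact and `needs_review` set, so the researcher can rename the type or leave it as is.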

This is only one example of about 20 reasons why these simple changes can benefit either a gedcom flat file or a new xml standard.

EXAMPLE of failure: A true DB of names does not build its structure out of people's first names, like this:

1 MIKE
2 name Smith

1 JOHN
2 name Collins

1 MARY
2 name Jones


These are just names. We do not class people by their names; that is ridiculous. We all know we could never capture every FIRST name in the world,
and the FACT is that new names come up that a gedcom would ignore.

SO WHY ARE WE DOING IT TO CRITICAL DATA THAT WE COLLECT?
We should not be doing that to sources and/or events. New evidence and new types of documents become available all the time.

This is the road I am heading down with my thoughts on a BG. If anything comes out with HARD encoded tags in the structure file, your designs will capture the spotlight but then find themselves in a corner like every other attempt so far.

A few people have been invited to look at my very abstract thought process on mock pages. The feedback a few have given helps me boil these many critical points down into small steps toward something better.

This I call "Soft Tags vs. Hard Tags": soft text tags versus hard encoded tags. Maybe it is not the best name, but it is finally said.
mstransky 2010-12-18T05:06:06-08:00
"There is no reason on earth that when Downloading an Ancestry Member Tree the applications couldn't also copy the actual image and include it in the download rather than have the end-user do it later."

"If the BetterGEDCOM can define how to include an image, that would be great."

This is just a suggestion of how, from a techie.

I am just taking a step back about that UIID. I could see that being a unique ID for a tree file, like Ancestry and others create per tree, not per record or per person. Say FTM or Ancestry calls a tree 456789123.ged.
That is how a server side app knows that file is unique. Imagine that both a user computer and a server root create a default location, like MS Office or any other program on the market does.
Path:c:/appGENstndrd/trees/456789123/*images.jpgs
WEBROOTS
Path:root\appGENstndrd\trees\456789123\*images.jpgs

So if you import a tree ged, you also must import a separate folder called 456789123 with images in it.

This way, say your computer has more than one tree UIID on the default HD, or a web site has more than one tree: your app would identify the DB file # and automatically know the folder path where its images are.

If you use FTM, gentech, or others, the next version that follows the standard would include a default archive command.
If a source pointed to image <>7yt6yt6.jpg<>
The computer side app would by default look at
c:/appGENstndrd/trees/456789123/7yt6yt6.jpg

For a web based the same
root\appGENstndrd\trees\456789123\7yt6yt6.jpg

Say you export a few records with images attached. The sender/exporter sends a text/xml file and image attachments.
The receiver gets the text/xml file, which is recalculated to be absorbed into the receiving DB. The next prompt would be "place images in default folder 456789123?" Since your work UIID would be different from your friend's, you can select your stnrdAPP folder location for your file 9988776655.
You get a drop down and select from the 4 image folders on your computer, one for each of the 4 trees you are researching.
1. 1856829755
2. 9988776655
3. 7892837832
4. 0187532481
You select #2, and the images get dropped into that folder. Any of your apps, be it FTM, Gramps, or others, that incorporate this BG standard of an external image folder location will support external local image folder paths and storage.
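
The folder lookup could be sketched like this. It is only an illustration: the function names are hypothetical, and treating appGENstndrd/trees as a relative root (rather than c:/ or a web root) is an assumption.

```python
from pathlib import PurePosixPath

# Folder name taken from the post; using it as a relative root is an assumption.
ARCHIVE_ROOT = PurePosixPath("appGENstndrd/trees")

def image_path(tree_id, image_name, root=ARCHIVE_ROOT):
    """Build <root>/<tree id>/<image file name> for a tree's image folder."""
    # Keep only the file name so a stored link cannot point outside the folder.
    return root / tree_id / PurePosixPath(image_name).name

def pick_folder(local_tree_ids, choice):
    """Return the tree ID chosen from a 1-based drop-down list."""
    return local_tree_ids[choice - 1]

local_trees = ["1856829755", "9988776655", "7892837832", "0187532481"]
target = pick_folder(local_trees, 2)
incoming = image_path(target, "7yt6yt6.jpg")
```

With choice #2 the incoming image lands under the 9988776655 folder, whatever tree ID the sender used on their side.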


"Of course if Ancestry.com and the Member Family Trees "buy into" the BetterGEDCOM,"

I see it this way: if even 33%-50% of major software supported a level playing field where users and researchers could migrate info back and forth, it would force the rest to follow.
A person might like the research tools better in another application, but like FTM's print-outs better.
Once people start jumping back and forth, I am sure those applications will realize that if they do not create the tools and options that users demand, users will migrate to other companies that did change and did fix the lagging parts of their applications. And maybe never return?

It is the same as when your computer is connected to the web and you wish to upload a file: you browse your folder selections on your HD, then click upload. Servers can do the same thing. You download a record(s) into your tree 12341234.ged, and to the side are the correlating images; you select all and download to the folder */12341234/(incoming images here)

I am just trying to use/suggest what is available right now and can be done today with minimal effort.
ttwetmore 2010-12-18T05:24:39-08:00
Think of a link to external info as a URL. If it is an http URL no sweat, just send it as is. If it is a local file URL let the exporting program have the option to export or not, have the importing program have the option to import or not. If they are exported put the files in a "virtual file system" (a "bundle" in Mac OS X terminology; or as a ZIP file container as has been discussed elsewhere in this wiki; or in other ways). Specify the rules on the structure of the virtual file system so that the exporter and the importer know what's going on. Doesn't this cover the problem?
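
A minimal sketch of the ZIP-container variant, using Python's standard zipfile module. The media/ folder and the file names are assumptions; no layout rules have been agreed on in this wiki.

```python
import io
import zipfile

def build_bundle(data_name, data_text, media, include_media=True):
    """Pack the data file, and optionally local media, into one ZIP container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(data_name, data_text)
        if include_media:  # the exporter's choice to export or not
            for name, content in media.items():
                bundle.writestr(f"media/{name}", content)
    return buffer.getvalue()

def list_bundle(bundle_bytes):
    """Let an importer see what is inside before deciding what to import."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return sorted(bundle.namelist())

with_media = build_bundle("tree.ged", "0 HEAD\n", {"7yt6yt6.jpg": b"image bytes"})
without_media = build_bundle("tree.ged", "0 HEAD\n", {"7yt6yt6.jpg": b"image bytes"},
                             include_media=False)
```

An http URL would simply stay inside the data file as text; only local files need a place in the container.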

Tom Wetmore
hrworth 2010-12-18T05:37:32-08:00
Mike,

The topic Andy brought up is NOT downloading a TREE. We were talking about a Citation Image: a link from a Member Family Tree to an image, such as a Census Image.

Today, anyone can download their Member Family Tree from Ancestry.com.

With Family Tree Maker Version 2010 or 2011, if the Member had uploaded images to that tree, the images and the data can be downloaded. The exception, right now, is any citation images. With Family Tree Maker, there is a link that is downloaded to that online citation image which can be downloaded.

Oh, does this look like what a BetterGEDCOM might look like? That is sharing data and images between two platforms?

As a User, can a BetterGEDCOM be done, yes.

Russ
mstransky 2010-12-18T05:47:46-08:00
"Think of a link to external info as a URL."

Tom, you write as an app dev. Do you see it simply, like an app having a simple input field <>774837.jpg<> and knowing the UIID of the gen file is 763457523, so that

from the app side, knowing its own location on a c: or webroot/, the app can default to a local path
"C:/genarchive/+763457523+/+774837.jpg+"

Do you see it simply controlled in such a manner?

Also, if an image is not controlled locally, would a second field <>?<> be needed to point outside to a remote location? It would need the FULL http path as an input as well.

I like locally controlled images, but remote locations must be included too, just in case of things like copyright laws.
mstransky 2010-12-18T05:58:07-08:00
"With Family Tree Maker Version 2010 or 2011, if the Member had uploaded images to that tree, the images and the data can be downloaded. The exception, right now, is any citation images. With Family Tree Maker, there is a link that is downloaded to that online citation image which can be downloaded."


Ok, maybe I did not make my model clear. My EID holds the citations and will accept new ones or export them. That is my goal for the EID.xml. The user can import/export snips of "citations" as a text record snip, or also opt to export/import the associated images that link/hold SID source docs and images along with a record of a citation.
Once the user pulls it into their DB, they can attach the EID(s) snip to the person in the PID area as a collection of records gathered for an individual.

I agree with you: you have your own tree and you just want to absorb particular records to add to your tree. You should be able to download snips of citations and images if they are available, like the HEAD of household record for John Smith PLUS the image that shows it, into your own collection, and not the whole tree.

That is how I see it. Sorry if I did not make it clear with all the techie mumbo jumbo I go into. I guess it can look misleading.
gthorud 2010-12-18T06:00:25-08:00
There is a page called Container and File Issues - why not discuss this topic there. We (incl. me) must become more structured, otherwise nobody will want to follow this wiki.
Andy_Hatchett 2010-12-18T07:30:45-08:00
Russ,

You asked: "Would you agree, that there is a link at Ancestry to a citation image and that the citation image is NOT part of the Member Family Tree?".

I'll agree that there is a link and that Ancestry doesn't consider the actual image to be part of the Member Family Tree (and I think they are wrong in thinking this).

If I have an image attached to my source in my genealogy program then both the program and I consider it to actually be part of that source.
hrworth 2010-12-18T07:33:44-08:00
Andy,

I want that Census Image in my file. In fact I do download that image. I do think that is the point we are trying to make. The BetterGEDCOM project needs to include this topic.

Thank you,

Russ
Andy_Hatchett 2010-12-18T07:51:53-08:00
Tom,

I almost agree with you.

My belief is that if a source has an image associated with it and the user chooses to export that source then the image *must* also be exported with it. There should never be a choice where you can export a source and not export the associated image.
GeneJ 2010-12-18T08:31:52-08:00
Andy, Tom, Russ ...

Just above, "There should never be a choice where you can export a source and not export the associated image."

If the associated image is subject to copyright, even though I might attach it to my file, I should have the ability to exclude the image from being transferred to a third party or "published" on the internet.
--GJ

PS. I agree with gthorud ... do we think folks are going to be able to find this important topic?
GeneJ 2010-12-18T09:08:10-08:00
BTW ... I added a page "Multi Media Specific" to the navigation ... right under the navigation for container issues.

That gives us a PAGE that can be edited and it gives us a place to post related discussion topic.
AdrianB38 2010-12-17T12:28:26-08:00
Excuse my pedantic analytical mathematical mind but can I suggest that this thread has begun to split into 2 topics, which is not ideal?

Hence I suggest we create a new thread for the security / confidentiality / privacy aspect. I've clipped the starting threads for that aspect into a new topic "Security / confidentiality / privacy" (except the last one above) and suggest comments on that topic go there.
hrworth 2010-12-17T13:15:32-08:00
Mike,

Please provide some more information about what this means:

""Not your version of Sharing but what the BetterGEDCOM is trying to do with Sharing.""

What is "BetterGEDCOM Sharing"?

Russ
mstransky 2010-12-17T14:19:43-08:00
"Let's get to the Sharing issue. Not your version of Sharing but what the BetterGEDCOM is trying to do with Sharing."-russ

If we all understand how almost any app or software can export to a single gedcom, or a close xml format:

GenTech export to universal "bg.xml"
Gramps export to universal "bg.xml"
SFT.xml export to universal "bg.xml"
DeadEnds export to universal "bg.xml"

Suppose we agree on a transfer platform that is not app driven and that can hold all exported data in a single file or multiple files.

From that file of records, xslt's can parse this universal "bg.xml" and reconstruct that data back into the other versions.

"bg.xml" > a.xslt(s) > GenTech
"bg.xml" > b.xslt(s) > Gramps
"bg.xml" > c.xslt(s) > SFT.xml
"bg.xml" > d.xslt(s) > DeadEnds
"bg.xml" > e.xslt(s) > Others

I have just that xslt solution, but I have no DB views of these other styles. They talk about how they do it, but there is no actual file, like a gedcom file or gedxml file.

It would be great for me to put my hands on some.
hrworth 2010-12-17T14:57:10-08:00
Mike,

Sorry. What is a "transfer platform"?

That sounds to me like there is a server somewhere, or some software sitting somewhere, in order to share information.

So far, I didn't know that the BetterGEDCOM was building / creating any application or "platform" for this sharing.

I am asking as an ordinary End User of a PC based program who wants to share my information with another user. That user may be using a PC or the sharing may be done on a Website.

Russ
mstransky 2010-12-17T17:31:06-08:00
? What I have been under the impression of is that BG is making a guideline of what needs to be called what for "x, y and z", that it must use xml technology, and that the purpose is to share information.

Imagine how FTM imports and exports gedcom files. Many others do also. However FTM does NOT export images, shoe size and hair color records.

IF BG says everyone must strive to include hair color, shoe size, and genetic string,

then for FTM to join the movement it MUST export, in some xml fashion, the GEDCOM data in xml format AND the additional hair color and shoe size data.

<BGxml>
<INDI>98753</INDI>
<NAME>98753</NAME>
<SURN>98753</SURN>
......more data....
<shoesize>7</shoesize>
<haircolor>Brown</haircolor>
<genestring></genestring>
</BGxml>
FTM would have to export all that it can. It does not have, say, this genestring, so that would be blank.

Now that this attempt at a universal xml is output, say you wish to use Gramps or DeadEnds or others.

Each app or software would have its own method to parse this stepping stone of an xml file, absorbing what it can.

Say you wish to use the "MyFavoriteSoftware" application and it does not have shoesize but does have haircolor and genestring.

That app would absorb all data into its own storage structure, capturing ALL basic gedcom data, the hair color and the genestring.

What do we do with the shoesize data? Well, just as any app has a note per record, or how gedcom has <Cont>, a simple thing is
<note>Shoesize = 7, eyecolor = green, favoritefood = apples,</note>

Just for the sake of not losing data, an importing app that does not store such node tags MUST still parse all UNcaptured data and store it per record as a NOTE, or <TRXF></TRXF>, so that the user can view it app side and still see that data, maybe find a new field to put it in, or delete it if they will never ever use it.
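
A small sketch of that import rule, using Python's standard xml.etree.ElementTree. The SUPPORTED set stands in for whatever fields a hypothetical importing app happens to store, and the element names follow the example in this post rather than any agreed BG schema.

```python
import xml.etree.ElementTree as ET

# Fields the hypothetical importing app can store natively (an assumption).
SUPPORTED = {"INDI", "NAME", "SURN", "haircolor"}

def import_person(xml_text):
    """Import known fields; park every other non-empty field in a note."""
    root = ET.fromstring(xml_text)
    record, leftovers = {}, []
    for child in root:
        value = (child.text or "").strip()
        if child.tag in SUPPORTED:
            record[child.tag] = value
        elif value:
            leftovers.append(f"{child.tag} = {value}")
    if leftovers:
        record["note"] = ", ".join(leftovers)
    return record

sample = """<BGxml>
<INDI>98753</INDI>
<haircolor>Brown</haircolor>
<shoesize>7</shoesize>
<favoritefood>apples</favoritefood>
</BGxml>"""
person = import_person(sample)
```

Nothing is thrown away: the unsupported shoesize and favoritefood values survive in the note, where the user can later move or delete them.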

My thought is to achieve 100% transfer, but there will need to be some kind of stepping stone that all platforms can reach and step off of: a common ground that can expand with more OFFICIAL or STANDARD tags that data exports to and from.

If that is not the goal of BG, then I read too much into the BG model?
hrworth 2010-12-17T18:34:15-08:00
Mike,

When you say FTM, are you talking about a specific program? I am guessing that you are, but am not sure what the reason is for talking about any specific program.

Secondly, FTM CAN deal with Shoesize, Hair color, or other attributes.

As to the Image issue, what Version of GEDCOM allows for that? Also, do other applications include an image IN a GEDCOM file?

Just trying to clarify your statements.

Russ
GeneJ 2010-12-17T18:37:35-08:00
Hi Mike:

Speaking only for myself ...

There are software users right now, today, who can't move information faithfully from its mothership. We need to understand the issues facing those users. What objectives meet these needs?

Secondly, aside from (possibly partially related to) the problems associated with user-user exchange, there are believed to be other structural issues with GEDCOM. We need to understand what those issues are. What are these objectives?

Finally, we need to look at a blank slate and ask ourselves what role genealogical software should play in advancing best practices and how this project can help us realize that opportunity. What objectives help us meet these needs?

There may be other considerations, too.

We have to consider all the opportunities/objectives, short and long term, from the perspective of users, software vendors/developers, and our own limitations and together find the balance and means by which we can achieve all the objectives.

My personal two cents. Hope this helps.--GJ
mstransky 2010-12-17T18:43:52-08:00
Take a look at
http://bettergedcom.wikispaces.com/message/view/Mike%27s+Model/31916039

When you say FTM, are you talking about a specific program? a=no was just using it as an example

Secondly, FTM CAN deal with Shoesize, Hair color, or other attributes.
a= I stand corrected, I last used ver 11 a few years back.

As to the Image issue, what Version of GEDCOM allows for that? a= none that I know of.

Also, do other applications include an image IN a GEDCOM file? a= none that I know of, but it could be done very easily.

I was trying to clarify whether BG's ultimate goal was 100% data transfer and capture of data, plus the how-to methods.

Quote myself "If that is not the goal of BG, then I read too much into the BG model?"
hrworth 2010-12-17T18:53:09-08:00
Mike,

Actually, GEDCOM 5.5.1 does allow for Images, where it allows for a link to an image that the original application had.

I posted a blog entry on this.

http://bettergedcom.blogspot.com/2010/12/media-files-with-links-in-gedcom-file.html

Russ
mstransky 2010-12-17T19:03:42-08:00
Sorry I posted this again, first one went to the wrong thread....

1 OBJE
2 FILE C:\Users\(user information and folder name) Media\(media filename).jpg
2 FORM jpg
2 TITL Test Media
2 _TYPE PHOTO
2 _SCBK Y
2 _PRIM Y
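
For anyone reading that block in code, here is a rough sketch that separates the standard lines from the underscore-prefixed ones, which GEDCOM reserves for user-defined (vendor) tags. The function name is hypothetical, and the FILE line is copied from the example above with its placeholders left as they are.

```python
def parse_obje(lines):
    """Collect level-2 lines under '1 OBJE', split into standard and extension tags."""
    media = {"standard": {}, "extensions": {}}
    inside = False
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if level == "1":
            inside = tag == "OBJE"
        elif inside and level == "2":
            # Tags starting with "_" are user-defined, so other apps may not know them.
            bucket = "extensions" if tag.startswith("_") else "standard"
            media[bucket][tag] = value
    return media

media = parse_obje([
    "1 OBJE",
    r"2 FILE C:\Users\(user information and folder name) Media\(media filename).jpg",
    "2 FORM jpg",
    "2 TITL Test Media",
    "2 _TYPE PHOTO",
    "2 _SCBK Y",
    "2 _PRIM Y",
])
```

Note that FILE holds a path on the sender's own disk, which is why the link alone does not help the person receiving the file.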

Neat, I missed that because of the older FTM I had. I had like 1000+ people in my tree; when I tried to export it away from FTM, all I got was a stripped down gedcom output.

Question: GEDCOM and FTM may support an image file. I had many shoebox photos but never achieved a FULL DATA export of any kind. That was with FTM ver. 11. What version of FTM started to export FULL data files, not stripped down ones? It crushed me not to be able to share with family back then.
Andy_Hatchett 2010-12-17T22:44:53-08:00
Russ,

I would argue that allowing for images and allowing for links to images are two separate and distinctly different things.

Allowing a link to an image is no good unless the end-user can actually use that link to see the image; and unless that image is on the web somewhere that can't happen.

What needs to happen (either within BG or within the application itself) is that when a link is included then a copy of the actual image itself *MUST* also be included.

A perfect example of this is how FTM handles downloads from Ancestry member trees. If an image is linked to an event in a Member Tree the link is included but... If I send that download to another FTM user and they use FTM in offline mode then that link is useless- unless they take the extra step to also use that link while online to actually download that image to their own computer.

There is no reason on earth that when Downloading an Ancestry Member Tree the applications couldn't also copy the actual image and include it in the download rather than have the end-user do it later.

I'm just using FTM as an example because I know how it works but this would apply to all applications.

"Hitch not the Chariot of State to the twin steeds of Government and Religion, for down that path lies chaos"
Leto II
hrworth 2010-12-18T02:48:44-08:00
Andy,

I would agree with you.

The concern then becomes 1) the size of the file being exchanged, and 2) if all applications can handle images.

The Links are OK IF the sending end user sends the image in a separate email message and the receiving user knows what a path is, or knows where the image should go.

If the BetterGEDCOM can define how to include an image, that would be great.

Would you agree, that there is a link at Ancestry to a citation image and that the citation image is NOT part of the Member Family Tree?

Of course if Ancestry.com and the Member Family Trees "buy into" the BetterGEDCOM, then this would become a non-issue. In my mind the Member Family Tree is just a web based version of a program on a computer on my desktop. So the Citation Image would then have to become part of the Member Family Tree, not a link to it, as it is now.

Russ
mstransky 2010-12-16T08:19:15-08:00
"I hoping the pages under the "Data Models" will advance so that we start to have some comparisons. Hope you will set a page up for Mike's Model."

I will, as the "SFT.xml" model. I just want it to be a bit more professional looking before I post very abstract stuff all at once. I am not trying to make the end-all way to do it, but many past app writers might see some benefits and maybe change their work a little, so they don't repeat the way every past model has trapped itself in how it stores data.
GeneJ 2010-12-16T08:25:03-08:00
Hi Mike.

I set up a page about "Mike's Model."
hrworth 2010-12-16T14:40:01-08:00
Mike,

You said:

"@russ, from here say you want to say this person with a relative but as you said not the extra mess you collected. I think it would be easy to say export all Confirmed, Known, ok, flagged records and could skip Unresolved, disputed, not reviewed yet records line items."

My point was that the User should have the option on what is to be shared. I might want to share Everything, each entry with a Citation, or I might want to share only my preferred information with Citations.

Russ
hrworth 2010-12-16T14:48:17-08:00
Mike,

I am a little confused here. Sorry.

What I think I am seeing is an application that you are working on.

I am trying to understand how that fits into BetterGEDCOM. Perhaps the data model will help with that.

Russ
ttwetmore 2010-12-16T15:29:03-08:00
The topic of "deciding what to share" is an excellent one, and it would bear on all models, not just GEDCOM. And it should probably be a consideration in the design of BG, since the model would have to support the feature.

I have tackled the problem a few times in the LifeLines program and found that there are a few dimensions to the problem:

1. Deciding which persons to share.
2. Deciding how much information about the persons to share.
3. Deciding how to handle privacy/living issues.
4. Deciding various closure issues (e.g., if a person is to be output, and the person is in families, under what conditions should those families also be output).

The ability to make these decisions might be considered requirements on applications more than on the BG model itself, but I think it is pretty clear that the model must support objects with enough characteristics that these decisions can be made.
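
To make three of those four dimensions concrete (which persons, privacy/living, and closure), here is a minimal sketch. The Person and Family classes are hypothetical and are not LifeLines structures, and the closure rule (output a family only when at least two of its members are output) is just one assumed policy among many.

```python
from dataclasses import dataclass

@dataclass
class Person:
    id: str
    name: str
    living: bool = False

@dataclass
class Family:
    id: str
    member_ids: list

def select_for_export(persons, families, chosen_ids, hide_living=True):
    """Apply the person, privacy, and closure decisions to one export."""
    exported = {
        p.id: p
        for p in persons
        if p.id in chosen_ids and not (hide_living and p.living)
    }
    # Assumed closure rule: a family is output only if two or more members are.
    kept = [
        f for f in families
        if sum(member in exported for member in f.member_ids) >= 2
    ]
    return list(exported.values()), kept

persons = [
    Person("I1", "John Smith"),
    Person("I2", "Mary Jones"),
    Person("I3", "Mike Smith", living=True),
]
families = [Family("F1", ["I1", "I2", "I3"]), Family("F2", ["I3"])]
people_out, families_out = select_for_export(persons, families, {"I1", "I2", "I3"})
```

The model only has to carry the characteristics being tested here (a living flag, family membership); the policy itself stays in the application.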

Tom Wetmore
hrworth 2010-12-16T15:51:12-08:00
Tom,

I totally agree that this is an application issue (sharing). I bring it up for this project so that we can deal with it.

I didn't even get into the "living" issue, which has been mentioned here.

Thank you,

Russ
mstransky 2010-12-16T18:00:09-08:00
Russ, sorry, I got caught up with another thread.

On the model sft.xml that I am slowly bringing over to BG: I do have a way of displaying public view information and private information in the same templates, and also access levels for who can update, view, or see all the data on one person.

With that said, I think it is not hard at all to take that same access level, say on a computer side app, and flag a record as
- Public > public view, shareable
- Private > nonpublic view but shareable
- InResearch > viewable only by owner

Right now all I have functionally working is public view and admin view. I think it is very easy to incorporate a need (option) like you posted.

Tom, it looks like you will be able to do the same with your DeadEnds? I think so.
mstransky 2010-12-16T18:53:33-08:00
"What I think I am seeing is an application that you are working on."-Russ

"I am trying to understand how that fits into BetterGEDCOM. Perhaps the data model will help with that." -Russ

I got fed up back in 2003 with how sadly software did not give me the flexibility I wanted to control my data. From 2001-2006 I messed around with xml coding for a long time while working on other web xml databases, and used the concept to control my own genealogy data. I figured on making web based genealogy xml databases. So by 2009 I had handled many of the requests that you and others want from a BetterGEDCOM. Since I have been in this sandbox, so to speak, I am looking to change my labels to BG's terms for better understanding. I am not trying to say this is how BG should do it, just that if one wanted to, one could do it this way.
My functional database does work; the templates are not pretty. But if anyone gets ideas for how to take anything that I have done in xml and make it better and more flexible, then good for the whole.
Mine, as a web based xml, is no better than a computer side app.
Example: Tom is working on DeadEnds, a computer side app; I have a web based xml DB; FTM has their gedcom. But all of us will still have to export to some universal xml structure that other platforms can import from and export to.
Does that make sense? There is no competition, just an open sharing of ideas, how-tos, and concepts that others might take from or improve on with better ways to store and control the data.
gthorud 2010-12-16T19:48:17-08:00
Re. Privacy - non-sharing

There is another dimension to this. Sharing may depend on who you are sharing with and how - via a gedcom file, on the web or a report. For example, Norwegian (and EU) law prohibits certain info from being published on the web, but if I print it in a report and publish it on paper there are fewer restrictions - and even fewer restrictions (none?) if the paper is distributed in the family only.

For this purpose (and others) it might be a good idea to be able to attach user definable types of "flags" to - in principle - ANY piece of info (person, place, media, event, source, relation, name etc.) - and have the program select sharable info based on the value of a user selectable flag. (Such flags have a much wider application - eg. in selecting/organizing info for many purposes.)
hrworth 2010-12-17T01:18:42-08:00
Mike,

Thank you.

I appreciate what you just posted, as it helps me understand what you are doing. I think that is great.

Let's get to the Sharing issue. Not your version of Sharing but what the BetterGEDCOM is trying to do with Sharing.

You have a web based application, I am using another program on my PC but I want to Share my research with you, or you want to share your information with me. HOW will that be done with your project? I hope that you and I can share our research using a BetterGEDCOM "file".

Can you export your research to a GEDCOM now? Can you import a GEDCOM?

I am only trying to understand how we share data across the platforms we use.

Russ
mstransky 2010-12-17T05:40:33-08:00
I also thought about the GEDCOM structure. I know there is no way you will get a big software company to change their product. So if we can modify what they use in small steps that make big changes in flexibility for everyone, they might jump at it.
I need to bounce it off a person who wrote an app for a gedcom style and see whether, with these small 1-3 changes, a gedcom reader can handle the couple of subroutine commands.

Anyway, yes, we will all need to know how to export to a gedcom flat file and to a universal file exchange. This I have thought about for years. I hope that once I get all my cookies on the table it might give some others ideas also.

Do we have a gedcom app writer on the boards?
mstransky 2010-12-17T12:17:20-08:00
"Not your version of Sharing but what the BetterGEDCOM is trying to do with Sharing."

If all platforms can include a node level which the user and researcher can mark or flag, to such an extent:

[recordID] [record data] + (access/display lvl) (shareable lvl)

What would be some suggestions for marking levels for this? Also maybe consider a third node for (evidence lvl).

This could be a number like 1, 2, 3. Each can be read by an app or xslt for display: if x=2 then display "Hypothesis", or 4 = "Follow up".

The same can be done for displayable items and shareable items.

If we consider the three nodes to mark, what would the labels be?
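
One way the numbered levels could be read by an app, as a sketch only. The two evidence labels are the ones named above; the share-level numbering is an assumption that reuses the Public / Private / InResearch labels from elsewhere in this thread, and the record field names are hypothetical.

```python
# Only 2 = "Hypothesis" and 4 = "Follow up" come from the post; the rest is assumed.
EVIDENCE_LABELS = {2: "Hypothesis", 4: "Follow up"}
SHARE_LEVELS = {1: "Public", 2: "Private", 3: "InResearch"}

def label_record(record):
    """Turn the numeric level nodes of one record into display text."""
    evidence = record["evidence_lvl"]
    return {
        "id": record["id"],
        "share": SHARE_LEVELS[record["share_lvl"]],
        # Levels without an agreed label fall back to showing the number.
        "evidence": EVIDENCE_LABELS.get(evidence, f"Level {evidence}"),
    }

def shareable(records):
    """Keep records marked Public or Private, the two shareable levels."""
    return [r for r in records if r["share_lvl"] in (1, 2)]

records = [
    {"id": "R1", "share_lvl": 1, "evidence_lvl": 2},
    {"id": "R2", "share_lvl": 3, "evidence_lvl": 4},
]
labels = [label_record(r) for r in records]
to_share = shareable(records)
```

Because the file stores only numbers, each app or xslt is free to show its own wording (or language) for the same level.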
hrworth 2010-12-15T17:27:59-08:00
mstransky,

Could you clarify one thing here? That is, how do you view your data?

To me, there are at least two ways. One is as a user looking at a screen presented by my software, and the second is how an output would look when generating a report, or even presenting information in a chart.

From my database, and this is only an example (I have also posted more detail in the Blog), please consider:

Birth
Date, year only
Citation

Birth
Location only (a state)
Citation

Both of these may be typically seen in a Census Record

Birth
Complete Birth Date
No location
Citation

Birth
No Birth Date
Full birth location
Citation

Birth
Complete Birth Date
Full birth location
Citation

In my program, and I am guessing in others, I can enter multiple BIRTH events / facts.

When I look at my file, I may see each of these entries.

Since I can enter multiple Fact / Events, the program assigns one of them as the Preferred Fact, but that preference is also user selectable.

So far, that in what I see (or the input side) of the program.

As for the output, I have many options, but in each option, I have the choice to select the Preferred Only fact or All of the entries for that specific fact or event.

I have control over output. I usually select the most complete entry, after a review of the data I have collected. But if, for example, the "evidence" for the complete entry is suspect, I may not choose that one as preferred.

Now to Sharing using a BetterGEDCOM: similar issues need to be taken into account. Assuming the User has the same type of control over what information to share, that data needs to be packaged, transported, and unpacked at the other end.

The user at the other end may want the same type of control on input and what they see in the file that was just imported.

There would probably need to be an indication that the sending user has multiple Event items, for that Event, but is only sharing one of them.

I think you may see the logic there. That is, what if I choose one of my less complete Facts or Events (using the same fact or event name) as my preferred? The receiving person might want to see that I have more information but am only sharing my Preferred Fact.
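
To make the idea concrete, here is a minimal Python sketch of a "Preferred only" export that also tells the receiver how many entries were held back. The `Fact` fields, the `export_facts` function and the sample values are all assumptions made for illustration, not any program's actual format.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    kind: str
    date: str
    place: str
    citation: str
    preferred: bool = False

def export_facts(facts, preferred_only=True):
    """Return (exported facts, {fact kind: number of entries held back})."""
    exported, held_back = [], {}
    for fact in facts:
        if preferred_only and not fact.preferred:
            held_back[fact.kind] = held_back.get(fact.kind, 0) + 1
        else:
            exported.append(fact)
    return exported, held_back

# Made-up sample entries in the spirit of the list above.
births = [
    Fact("Birth", "1917", "", "census citation"),
    Fact("Birth", "", "Massachusetts", "census citation"),
    Fact("Birth", "29 MAY 1917", "Brookline, MA, USA", "birth record citation", preferred=True),
]
shared, held_back = export_facts(births)
print(len(shared), held_back)
```

The `held_back` count is the "indication" mentioned above: the receiving program could show that two more Birth entries exist without seeing their contents.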

I am using Preferred vs some of the other terms that are on this wiki, mostly because those definitions may not be completely understood or accepted. Preferred to me, is the best data I have available at this point in time based on my research. It may or may not change in the future.

So, just trying to clarify or expand on "HOW data can be universally captured and stored needs to be addressed."

Russ
mstransky 2010-12-15T17:40:23-08:00
Haaaa! Please hold that thought. I want to answer all the above questions you are addressing.

I am going to answer each one! This point you're asking me about is the second biggest thing that made all the software drift so far apart on compatibility. Please wait a moment...
mstransky 2010-12-15T18:01:30-08:00
"Since I can enter multiple Fact / Events, the program assigns one of them as the Preferred Fact, but that preference is also user selectable"-Russ

This is how it should be. The problem is storing the ACTUAL data inside what you call a "preferred person display", along with which record is the default to display.

Say we both have a John Neal, and say we BOTH have the same source records, 18 of them, with various surname spellings like O'Neal and Neel. We both agree these records are true: O'Neal got changed when he came from Ireland, and one had to be a misspelling. Say that all the family first names and addresses are correct and just the surname was misspelt.

OK, the preferred display name and date. This should be just THAT: the preferred display chosen by the user. It is not a record, JUST a placemarker.

In my model this placemarker is in the PID.xml,
with 14 evidence records associated with this particular person marker.

In your tree you would like to display him as John Neal, birth Date place & Death date place

Me, I like to display people with the names they were born with, so my person marker is edited like so:
John O'Neal, birth Date place & Death date place

Now the reports will display ALL records in chronological order from an EID.xml (observed data or cited data from sources).

The PID.xml (another's tree outline and preferred person displays) does not have to be imported if it is not wanted, but all the evidence and source records can be. They get incorporated into your source DB, linked to the evidence (events) DB, which is tied to a PID#.

Let me send you an email. then maybe it may answer more questions in one shot.

If the devs and software learn to keep the preferred user tree outline separate from the gathered FACTS and sources, then when they import and export data they will not duplicate or overwrite data records trapped in the INDI set.

Look in your inbox.
hrworth 2010-12-15T18:29:35-08:00
mstransky,

(I'll check my email shortly)

But, I see ALL of the Facts / Events on my screen now. So, I am cool with my display.

But what if I am sharing my information with a cousin who is only interested in high-level information and not the "mess" we might find during our research?

The issue that you bring up, I think, is the problem that we have when WE choose to merge your data and my data together.

As a User, not a developer, I am not sure how a PID in my file (today INDI) would relate to that same person in your file. We would not have a common reference point.

Perhaps if we made one match between the two files, the rest would fall into place.

It's bad enough, sometimes, when a User merges two of their own files. I have seen that mess too many times.

But merging two files from two different programs will be messy. At least that's my thought.

Reading the Wiki entries so far, I haven't seen a good solution to addressing this merging of file issue.

Russ
GeneJ 2010-12-15T19:53:44-08:00
@ Mike ...

While Russ might enter multiple alternate births and have one marked as preferred; I work differently. I prefer to work from one birth entry and include conflict or nuanced date notes in my citations.

See the BetterGEDCOM blog entry, "What is research: ... the Kaleidoscope." http://bettergedcom.blogspot.com/2010/12/what-is-research-having-fun-with-body.html

When information is located and confirmed as relative to my research, I want to look into the kaleidoscope to see the change that new information makes. As well, I need to interpret that new information through the lens of the existing body of evidence.

New information very often opens a new research file and yet other research opportunities.

I do have some alts in my database, especially names for sorting in my project view.

Hope this helps. --GJ
mstransky 2010-12-16T05:44:59-08:00
In my model I have a generic person record which holds the person as a place-marker. That PID xml holds all the navigation between people, parent to child. This area holds no records whatsoever. This area is the preferred display chosen by a user or researcher.

Then the EID xml is the observed data: citations, events, and observed data listed per item, each linking the person marker to the actual source documents. A user can grade each item as Good, fact, disputed, hypothesis, or whatever a user or researcher would like to flag the gathered data record set with.

If one views, say, a name list or navigation tree, one selects a person (who has a preferred display name and dates). Once clicked, a new window opens with the preferred person name at the top. Just below that name, an XSLT/XPath query parses the EID xml and lists all records pointing at this person place-marker, including all records of names, marriages, draft, military, emigration, duplicates, and even similar records, like one marriage in 1930 with a later record request of the same marriage in 1941.
All records will show in (selectable) chronological order. You can see all the events for this one person laid out in timeline order, every event captured on them.
That (selectable) default just lists records in chronological event order as is. I think it is a quick code thing to (select), say, all Death class records, filter the large list down to areas, and sort by your preference. Because of the way the DB is set up, you can manipulate the data like an Excel sheet. Only the commands to select, filter, and match need to be written, and that only takes a few minutes to do in XSLT or even in app-side code.

Say they lived in NJ then moved to FL. How can a record pop up for NY on them after they have been living in FL for 5 years? That is a researcher's view for quickly spotting disputes. Was the date wrong? Was it a relative they visited? A researcher can flag this, saying "I will return to review it" or "I will track down this source image for myself".

@russ, from here, say you want to share this person with a relative but, as you said, not the extra mess you collected. I think it would be easy to, say, export all Confirmed, Known, OK flagged records and skip Unresolved, disputed, or not-yet-reviewed record line items.

@GeneJ, "I prefer to work from one birth entry and include conflict or nuanced date notes in my citations."
Say in your PID preferred list you would use the information from that one birth record, say 20 Jul 1945. You can change that at any time, and it does not affect any collected record. The actual collected records will show your preferred record listed with all the other nuanced date citations and so on.

1. Preferred Person data place-marker (PID.xml)
a. records from the EID.xml
.....
z. records from the EID.xml

Sortable by time, events, or names, or filtered by class type: marital, birth, military, etc.
(Those view options are a simple app-side command to display. I think any programmer can add that to their platform rather quickly.)
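
A minimal Python sketch of that app-side select, filter and sort, assuming evidence records are simple dictionaries with `pid`, `clas`, `type` and `date` keys (hypothetical names; the actual EID.xml layout is not shown in this thread, and the sample rows are made up):

```python
MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

def sort_key(date_text):
    """Turn 'DD MON YYYY' into a sortable (year, month, day) tuple."""
    day, mon, year = date_text.split()
    return (int(year), MONTHS[mon.upper()], int(day))

def records_for(eid_records, pid, clas=None):
    """All evidence records pointing at one person marker, oldest first."""
    hits = [r for r in eid_records
            if r["pid"] == pid and (clas is None or r["clas"] == clas)]
    return sorted(hits, key=lambda r: sort_key(r["date"]))

# Made-up sample rows.
eid_records = [
    {"eid": 3, "pid": 72, "clas": "Death", "type": "Burial", "date": "12 APR 1966"},
    {"eid": 1, "pid": 72, "clas": "Marriage", "type": "Certificate", "date": "14 JUL 1921"},
    {"eid": 2, "pid": 72, "clas": "Death", "type": "Record", "date": "10 APR 1966"},
    {"eid": 4, "pid": 625, "clas": "Birth", "type": "Record", "date": "29 MAY 1900"},
]
timeline = records_for(eid_records, 72)               # full timeline for person 72
deaths = records_for(eid_records, 72, clas="Death")   # only the Death class
```

The sketch only handles complete "DD MON YYYY" dates; partial dates such as a year alone would need extra handling.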

Why this way? In the old style they stick all the records inside the INDI/PERSON area. Originally that area was meant for tree outlines and for display, to print pretty printouts. Over time no one was creative, and the thinking was "we will stick the basic records in the INDI area". Well, that is fine for 4-8 basic records.

It's now 10-15 years later and they still stick all records inside, wrapped in a person ID. That is where the problem comes into play.
Apps see these preferred slots as mandatory and as records. So when you transfer, merge, import, or export these INDI slots, like Birth or Name, the app side asks: do I keep your data or overwrite this person with your uncle's DB work?

1.) The old way was a BIG mistake; today's needs outgrew that concept. It is all about records and the facts in them. The INDI person was good as a place-marker in a display tree for printouts, and that is how I kept it.
2.) Source records are in an area of their own.
3.) Event/evidence/citation records (EID.xml) are where all the user's or researcher's real work is held.

Back to Russ: say your person is #72 in your PID.xml and you have 12 EID records of births, military, etc. Your uncle has the person as #625.
He has some records you have, plus 3 more you don't have.

Concept
Select your uncle's #625 and click the three records you want. On the import to your EID, your count was last at 7645 records, so each line item gets the count (+1) as it is inserted, now pointing at #72. His information also includes two source documents and images.
Again, your source SID.xml left off at 865, so the import will (+1) each record.
Now you have

pid 72, eid 7646, sid 866 (observed entered data line sets)
pid 72, eid 7647, (No source from your Uncle)
pid 72, eid 7648, sid 867 (observed entered data line sets)

Each line will default to not reviewed yet. This gives the user or researcher the chance to double-check the other's work and know where they left off. After review the researcher can mark each line record item as Confirmed, Disputed, Not reviewed (a new one could be Needs follow-up).
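
The import step can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionaries standing in for EID.xml and SID.xml, the `import_records` function and the record contents are all hypothetical. Only the counters (last EID 7645, last SID 865, target person 72) come from the example.

```python
def import_records(my_eids, my_sids, picked, target_pid, last_eid, last_sid):
    """Append the picked records under new IDs; nothing existing is overwritten."""
    for rec in picked:
        last_eid += 1
        new_rec = {"pid": target_pid, "eid": last_eid, "sid": None,
                   "data": rec["data"], "flag": "Not reviewed"}
        if rec.get("source") is not None:
            # Only records that arrive with a source consume a new SID.
            last_sid += 1
            my_sids[last_sid] = rec["source"]
            new_rec["sid"] = last_sid
        my_eids[last_eid] = new_rec
    return last_eid, last_sid

my_eids, my_sids = {}, {}  # stand-ins for the existing EID and SID tables
uncle_picks = [            # three records picked from the uncle's person #625
    {"data": "record A", "source": "source document 1"},
    {"data": "record B", "source": None},
    {"data": "record C", "source": "source document 2"},
]
last_eid, last_sid = import_records(my_eids, my_sids, uncle_picks, 72, 7645, 865)
for eid in sorted(my_eids):
    print(my_eids[eid])
```

The point of the design is that an import only ever adds rows under fresh numbers, so the receiving user's own records and preferred display are never touched.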

I will stop here. I was mainly trying to show that merging people, which are really not even records, is damaging. I have separated the records into an area of their own, and returned the place-marker person to its original status as a display person unique to the home user's or researcher's purpose.

You can still export PIDs like you can EIDs and SIDs.

I am thinking that a person should be able to open two DBs side by side and shift checked record sets over, like dragging from a folder view of a computer drive to a folder view of a thumb drive. You would be able to say: I am collecting all the sources and events for person 72, NOW import them. You would not take in all the other mess you never wanted, OR have the tree outline write over your own hard work, merging into one big looped bird's nest.
mstransky 2010-12-16T06:23:46-08:00
I have a basic online DB to test how separate multi-table XMLs can work independently or together. I have completed a display for pedigree views, household views, descendant views, and a name list view. This was an attempt to test the functionality of multiple separate files, and it does work quite nicely. It was not meant to be pretty; I or anyone can change that or make a pretty display template.

I stopped at that point to capture what people will call "x" as "X" for better genealogical terminology.

What I have next is universal record keeping of events, sources, and repos. Thanks to Adrian, who made a list, I saw that I can add Publishers as a new area to capture that data.

I want to have all the cards on the table for every wish list item, and also leave room for expandability and flexibility for growth. Then I will redo this 2009 project into a 2011 one.

I did not want to invent a project with terms no one can relate to, then have to return and relabel extra nodes to capture unforeseen needs.

Hidden from public view are the edit screens to modify, add, and delete persons. If one logs in, all the edit options display inside the web-based template.
mstransky 2010-12-16T07:11:15-08:00
@ GeneJ

This is just like my Flagged option per record item, which marks where the user or researcher left off. I have my own flag terms, but I see you use the GPS standard:

- a reasonably exhaustive search;
- complete and accurate source citations;
- analysis and correlation of the collected information;
- resolution of any conflicting evidence; and
- a soundly reasoned, coherently written conclusion

I have no problem relabeling the flags. I am just sort of waiting for the majority to decide what to call the different types of (levels?).

But I will wait for Russ and you to catch up with what I wrote so far.
GeneJ 2010-12-16T07:37:16-08:00
Hi Mike:

Since your model is different than GEDCOM and different than GENTECH, GRAMPS or DeadEnds, do you think it would be a good idea to set up a wiki PAGE under "Data Models" for "Mike's Model?"
mstransky 2010-12-16T07:56:47-08:00
Actually I coined it SFT.xml for simplified Family Tree xml.

I have node tags like NAM for NAME and EVE for event. Worse, what I called Observed or interpreted data, others called Citations and notes.

I have been trying to work my footing from the ground up using the same terms as others, so as not to confuse anyone. If we had mock data, like the Kennedys, each model could display how that data is stored in its format. Then any of us who is not familiar with a design can still get the gist by following the known data and how it is stored in a data structure.

If we ever get there, everyone can see all models side by side and can then pick the best concepts from each one to model a new BG from the best of the best practices.

When I first started I tried to get 20 ideas out at once. I became quiet, and now I struggle to explain one concept at a time, which creates more questions about why I do it that way. But a big-picture follow-along is worth a thousand words with no explanations.

Then once a few model types are up and people can follow along, we can say here is a record of a child that was found..., or here is a location record that must tie all these people to the same event at such and such...

Each model caretaker or rep will incorporate each needed or desired record.

When everyone stands back, we all can say: I see this model is better at keeping places universal, that model there has more flexibility in single links to other records rather than dual paths, that model is more compact, taking up less room, and so on.

We are not eliminating apps and models, but becoming able to see others' styles for a better concept, rather than the predetermined mimics that have set this agenda up for failure each time over the past 10-15 years.
GeneJ 2010-12-16T08:02:46-08:00
Hi Mike,

I'm hoping the pages under "Data Models" will advance so that we start to have some comparisons. Hope you will set a page up for Mike's Model.
mstransky 2010-12-16T08:15:35-08:00
That SFT.xml "Simplified Family Tree xml"
from 2009 can be found at
http://www.stranskyfamilytree.net/
Most of the tags are not standard, and I have a mock-up page that is not linked from the main pages. With a few invites to others here, I have slowly incorporated standard terms, where they are given with reasonable understanding, and rewritten my model's terms.

The web-based displays are from 2009, and their purpose was to see that the function can work. Since it is written with XPath and XSLT, it works very well. I use ASP to control the commands. Once it is open source, any guru can convert the ASP to a PHP version, and at any time a computer-side person could make app-side programs.

I see each researcher having the ability to display their own work with public view templates and private work boards. Also, repos and archives would not have PID DBs but would have EID and SID DBs, making collections of their own works available for searches and for importing segments of records. Even private researchers could create DBs of cemetery records held in SID and EID; each researcher could view a tombstone EID, and the image if available, and import that record set into their own work with a simple click.
This would NOT MERGE data over other data records. The imports only act as additional added records.

Anyway, my biggest thing is watching the boards and grabbing good terminology practices that almost all agree on, then prepping my work another step for clarity when I show and talk about it.
mstransky 2010-12-15T08:17:22-08:00
What is wrong with GEDCOM.

GEDCOM is OK, but the data storage structure is the problem. Soft Tags vs. Hard Tags is the first step in correcting a GEDCOM model.

Everyone is familiar with this model, in GEDCOM or XML,
such as
-------GEDCOM-------
1 BIRT
2 DATE 29 MAY 1917
2 PLAC Brookline, MA, USA

------------XML-------
<BIRT>
<DATE>29 MAY 1917</DATE>
<PLAC>Brookline, MA, USA</PLAC>
</BIRT>

A better structure, one that does not cripple how the structure can be used, can be done this way.

1 CLAS Birth
2 TYPE *I will discuss this later*
3 DATE 29 MAY 1917
3 PLAC Brookline, MA, USA
-------or--------
<CLAS>Birth</CLAS>
<TYPE></TYPE>
<DATE>29 MAY 1917</DATE>
<PLAC>Brookline, MA, USA</PLAC>

Why is this? Here is a poor design that uses a hard tag for each first name to capture people.

1 MIKE
2 name Smith

1 JOHN
2 name Collins

1 MARY
2 name Jones

This would be unacceptable. This hard-coded structure and app routine would only identify the names listed at that time. When new people are found with first names Jojo, Manjo, or Mika, the original does not have such hard tags, so the data would never be stored or transferred. That is why the original writers of GEDCOM foresaw that the various names could be listed under a universal tag, "NAME".
If this is the case, they dropped the ball on doing the same for Sources and events. Sure, they got major tags like Birth, death, date, place, but have you ever seen a Draft or Census tag? What they do for GEDCOM is this.

1 BIRT
2 DATE 29 MAY 1900
2 PLAC Burkon, Warsaw, POL
1 IMMI
2 DATE 7 JUN 1917
2 PLAC Rose Island, MY, USA
1 MARR
2 DATE 14 JUL 1921
2 PLAC Brookline, MA, USA
1 DEAT
2 DATE 10 APR 1966
2 PLAC Brookline, MA, USA
1 BURI
2 DATE 12 APR 1966
2 PLAC Brookline, MA, USA

But capturing the same data after converting from hard tags to a universal soft tag system would be

1 CLAS Birth
2 TYPE Record
3 DATE 29 MAY 1900
3 PLAC Burkon, Warsaw, POL
1 CLAS Immigration
2 TYPE Port Arrival
3 DATE 7 JUN 1917
3 PLAC Rose Island, MY, USA
1 CLAS Marriage
2 TYPE Certificate
3 DATE 14 JUL 1921
3 PLAC Brookline, MA, USA
1 CLAS Death
2 TYPE Record
3 DATE 10 APR 1966
3 PLAC Brookline, MA, USA
1 CLAS Death
2 TYPE Burial
3 DATE 12 APR 1966
3 PLAC Brookline, MA, USA

Note not everyone has one form of death record that is always a death date. But one can quickly view all DEATH-related documents under one class, seeing all the types in their proper sequence.
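
The conversion from hard tags to CLAS/TYPE can be sketched in Python. The mapping table and the `to_soft_tags` function are assumptions made for illustration; the keys use the GEDCOM 5.5 spellings IMMI and DEAT, and the class/type pairs follow the example above.

```python
# Assumed mapping from GEDCOM event tags to (class, type) pairs, following the example above.
SOFT_TAGS = {
    "BIRT": ("Birth", "Record"),
    "IMMI": ("Immigration", "Port Arrival"),
    "MARR": ("Marriage", "Certificate"),
    "DEAT": ("Death", "Record"),
    "BURI": ("Death", "Burial"),
}

def to_soft_tags(lines):
    """Rewrite level-1 event tags as CLAS/TYPE and push their sub-lines down one level."""
    out, in_event = [], False
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        if int(level) <= 1:
            in_event = level == "1" and tag in SOFT_TAGS
            if in_event:
                clas, typ = SOFT_TAGS[tag]
                out += [f"1 CLAS {clas}", f"2 TYPE {typ}"]
                continue
        elif in_event:
            # Sub-lines of a converted event move one level deeper, under TYPE.
            out.append(" ".join([str(int(level) + 1), tag, *rest]))
            continue
        out.append(line)  # anything else passes through unchanged
    return out

hard = [
    "1 NAME John /Neal/",
    "1 DEAT",
    "2 DATE 10 APR 1966",
    "1 BURI",
    "2 DATE 12 APR 1966",
]
soft = to_soft_tags(hard)
deaths = [line for line in soft if line == "1 CLAS Death"]
print("\n".join(soft))
```

After conversion, the death record and the burial both sit under the Death class, so a viewer can list them together by matching on a single value.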

I will get into showing that later, along with the app-side benefits and the display benefits for home users and researchers.
mstransky 2010-12-15T09:02:11-08:00
Before any data model is developed into a true better GEDCOM that stores data in any kind of format or structure, the foundation of HOW data can be universally captured and stored needs to be addressed.
If any person or team disregards this warning, then the same failures will be repeated over and over.
In my opinion this is one of half a dozen predetermined failures that will recur if not addressed. I hope a few can grasp this example, which I have tried my best to bring across.
Once this is understood, another major problem of GEDCOM is sticking the actual record keeping of events and sources inside the INDI record sets.
The principle by which the GEDCOM structure classifies names under a universal tag, NAME, is the same principle it fails to apply when storing Sources, Events, Repositories, places and such. A truly creative person or team must look at the root problem, which has led so many attempts to follow a poor data capture technique. Everyone has tried to create a look-alike storage file; these have failed over and over, and are so complex that other apps and software have no way to follow suit. It is set up for failure from the beginning.
I am not trying to re-create a GEDCOM structure, just to point out the very root of the problem, which many have neglected, from a code writer's point of view, so that it can be fixed for every type of end user's purposes.
DearMYRTLE 2010-12-15T09:30:17-08:00
MSTransky said:
"Note not everyone has one form of death record that is always a death date. But one can quickly view all DEATH-related documents under one class, seeing all the types in their proper sequence."

I think this is the crux of the problem. There is no one-to-one, document-to-data-field for any event in an ancestor's life, because the world has not been keeping UNIVERSAL birth, marriage and death records since Adam and Eve.

Family Historians extract information from a variety of documents (each with varying degrees of reliability) to arrive at a death date and place.

Does GEDCOM consider:
-- Multiple source citations and attached documents for an ancestor's event.

-- Multiple events for an event (like death), where there is unresolved conflicting evidence from multiple sources?

By the way, as an end-user I can see a census enumeration as an event in an ancestor's time line. Others argue against that.
mstransky 2010-12-15T10:06:38-08:00
"There is no one-to-one, document-to-data-field for any event in an ancestor's life"
-DearMYRTLE

True in that aspect, and also true of how you, myself, and others view actual documents and gathered data.

The problem is the way data is stored. The root problem falls to the developer, who structures how data is stored. The structure of how it is stored then limits (forces) the devs and software in how data is read, retrieved, and entered into a DB file.

It is because of this problem that devs and software writers refuse to give up all their complex code writing to "give in to" others' complex code writing. If the database were written for more universal storage and classification of data, then every app or code writer would have to rewrite some code. But in the long run they would only have to add identifiers to the text, not to the DB structure where it is hard-coded.

"By the way, as an end-user I can see a census enumeration as an event in an ancestor's time line. Others argue against that."-DearMYRTLE

I see it that way also. Others argue against it because they have not made hard-coded CENS or ENUM tags to match and find them as a list inside a DB structure. They then argue against other needs because the complexity of matching and finding lists of data becomes overwhelming with a poor GEDCOM hard-coded tag structure.

"There is no one-to-one, document-to-data-field for any event in an ancestor's life,"
-DearMYRTLE

Exactly my point, so why do we allow re-creating hard-coded tags for BIRT, DEAT, etc., as in my second post?

Every proposed NEW structure over the years has mimicked the same old style, just adding some new hard-coded tags to its design. After a few years some more items need to be included, making it obsolete time and time again.

GEDCOM could be better if they really thought about how to capture data more universally, rather than with specific tags only.
Just let the apps or XSLT sort and filter for matches on CLAS and TYPE. Then, on import and export, apps and software have the chance to pass data lines through without having to ignore or discard one's work.

Storage right now is specific, not universal, which makes it hard to write a software loop command to find every document type under consideration.
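
To illustrate the "one loop finds everything" argument, here is a minimal Python sketch of a reader that knows only the CLAS and TYPE tags. The function names and the sample lines (including the Census and Military classes and their dates) are made up for the example. Nothing in the reader lists the classes it accepts, so a class it has never seen is still kept and can still be found.

```python
def parse_soft_records(lines):
    """Group 'level tag value' lines into one dictionary per CLAS line."""
    records, current = [], None
    for line in lines:
        _, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if tag == "CLAS":
            current = {"CLAS": value}
            records.append(current)
        elif current is not None:
            current[tag] = value
    return records

def find(records, clas=None, typ=None):
    """Filter on class and/or type; the reader needs no list of known classes."""
    return [r for r in records
            if (clas is None or r["CLAS"] == clas)
            and (typ is None or r.get("TYPE") == typ)]

# Made-up sample data.
lines = [
    "1 CLAS Birth", "2 TYPE Record", "3 DATE 29 MAY 1900",
    "1 CLAS Census", "2 TYPE Enumeration", "3 DATE 1 JAN 1920",
    "1 CLAS Military", "2 TYPE Draft", "3 DATE 5 JUN 1917",
]
records = parse_soft_records(lines)
census = find(records, clas="Census")
```

With hard tags, the same reader would need a new branch for every tag added later; here a new class is just a new value.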


I hope I make sense in pointing out the DB structure as the culprit, not how we understand that there are many kinds of documents and data. It is the predetermined storage that says whether something is worthy or not, within its limits.

We as users are flexible with data. The storage files have not been, over time, because of this poor, recurring, NON-universal, predetermined and limited way to capture and store the data in a file.
AdrianB38 2010-12-17T12:22:16-08:00
Security / confidentiality / privacy
We started this discussion in "Looking at GEDCOM from a distance" but that thread began to split into 2 topics, which is not ideal. Hence I suggest we create a new thread for this topic. The starting posts for security / confidentiality / privacy are copied / summarised below, and apologies if I've snipped too much:

Russ started with "My point was that the User have the option on what is to be shared. I might want to share Everything, each entry with a Citation or I might want to share only my preferred information with Citations"

Tom highlighted this with:
"The topic of 'deciding what to share' is an excellent one, and it would bear on all models, not just GEDCOM. And it should probably be a consideration in the design of BG, since the model would have to support the feature.

I have tackled the problem a few times in the LifeLines program and found that there are a few dimensions to the problem:

1. Deciding which persons to share.
2. Deciding how much information about the persons to share.
3. Deciding how to handle privacy/living issues.
4. Deciding various closure issues (e.g., if a person is to be output, and the person is in families, under what conditions should those families also be output).

The ability to make these decisions might be considered requirements on applications more than on the BG model itself, but I think it is pretty clear that the model must support objects with enough characteristics that these decisions can be made.

Tom Wetmore"

Russ agreed:
"I totally agree that this is an application issue (sharing). I bring it up for this project so that we can deal with it. I didn't even get into the "living" issue ...
Russ"

gthorud continued:
"Re. Privacy - non-sharing
There is another dimension to this. Sharing may depend on who you are sharing with and how - via a gedcom file, on the web or a report. For example, Norwegian (and EU) law prohibit certain info to be published on the web, but if I print it in a report and publish it on paper there are less restrictions - and even less restrictions (none?) if the paper is distributed in the family only.
For this purpose (and others) it might be a good idea to be able to attach user definable types of 'flags' to - in principle - ANY piece of info (person, place, media, event, source, relation, name etc.) - and have the program select shareable info based on the value of a user selectable flag. (Such flags has a much wider application - eg. in selecting/organizing info for many purposes.)"
mstransky 2010-12-17T12:44:59-08:00
I see there are two topic areas:
1. Displaying as
a. private, living people
b. workboard: hidden mess, theories and stuff
c. ? others
2. Exporting
a. Y or N
b. should there be more?


Sorry, I hate to bring this up about evidence, but if we are able to tag individual records, another function we could add per record is the level of evidence it might bear for a researcher's progress:
a. not reviewed yet (like after an import)
b. confirmed
c. disputed
d. follow up
e. ? other kinds or better terms?
AdrianB38 2010-12-17T12:52:38-08:00
On this, I would agree with Tom and Russ that the whole topic of security / confidentiality / privacy would initially appear to belong to the application, but that, as Tom suggests, the model for BG should support (or at least not contradict) those requirements.

I think we need to bear in mind a couple of points:
1. One or two applications use GEDCOM as their native file format and presumably some future versions will use BG as their native file format. It would be useful therefore if we could designate a spot in the BG model to contain this data to avoid incompatibilities and gain some rough level of consistency. Other apps take their inspiration from GEDCOM and we'd like them to take similar consistent inspiration from BG.

2. If I create a file for export to someone else, I don't think there's any point in me adding _my_ security type data to that export file as it's used only to create the export and the recipient might have totally different ideas on security etc. I would also suggest that some security mavens would say that releasing my security flags to be viewed by others - even if the protected data weren't on the export file - is a risk in itself.

3. If I create a file for export to another application that _I_ use, I'd like the security data to be in the same place so that my recipient app can see it and use it the way I want.

4. It's pretty certain that different people have different requirements on this topic (as per gthorud's comments on Norway).

So, my suggestion would be that we designate an area of the model to contain security / confidentiality / privacy data (e.g. a set of attributes for the PERSON entity, and for the EVENT entity and for ...??)

This area would contain a certain number of "flags" that are agreed to have clear meaning across everyone's view of the world (e.g. "Living" is the obvious one but there might be others) plus a whole list of other "flags" and values that have meaning either
- inside one application (and are thus "user-defined" in that sense) or
- within one person's studies (e.g. "Publish to GenesReunited regardless of Privacy setting") (and are thus "user-defined" in the proper sense of that phrase)

Note that a flag "Private" is pretty meaningless in a general sense because it might mean "Don't publish outside my PC" or "Don't publish outside my PC and my web-site" or "Don't publish outside my PC except to my designated relatives" or .... That's why I think we can only designate an area, and not define the precise items.
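
One possible reading of this suggestion, as a minimal Python sketch: each entity carries a flags area, a small set of flag names is agreed across everyone, and an export for someone else drops the user-defined ones while an export to one's own applications keeps them. The flag names, the `export_person` function and the dictionary layout are assumptions for illustration only.

```python
# Assumed: "living" is the only flag treated as agreed across everyone.
STANDARD_FLAGS = {"living"}

def export_person(person, for_own_use):
    """Copy a person for export, keeping user-defined flags only for one's own apps."""
    flags = person.get("flags", {})
    if not for_own_use:
        flags = {k: v for k, v in flags.items() if k in STANDARD_FLAGS}
    return {**person, "flags": dict(flags)}

person = {
    "name": "John Neal",
    "flags": {"living": False, "publish_to_genesreunited": True},
}
for_cousin = export_person(person, for_own_use=False)
for_my_other_app = export_person(person, for_own_use=True)
```

Whether agreed flags such as "living" should travel to other people at all is still an open question in this thread; the sketch just shows that one designated area can serve both kinds of export.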
testuser42 2010-12-17T14:27:16-08:00
In my extended family, there's a person who is adopted but does not know it. The parents don't want to tell him. If I am to respect the parents' wish, he shouldn't find out by accident when looking at a printout or a BG that I made. But I still want to record the truth in my data.

I guess something like this always needs the user's manual interference. I would want to set a big fat marker on this person, and if I export a file that has this person in it, my software should warn me and allow me to do an edit just for the export file. I would change the relationship from adopted to regular child, and that's that.

Of course this is bringing up ethical questions. But that's not the software's nor BetterGedcom's problem ;-)

Long story short - for these kinds of cases, BG might need an "EXPORT WARNING" marker (if BG is used as native database format).
GeneJ 2010-12-17T14:48:14-08:00
As we move more and more into life with the Digital Millennium Copyright Act (DMCA) and other forms of copyright interpretation, at least here in the States, we may want BetterGEDCOM to accommodate some means for an application user to protect themselves from allegations of copyright infringement (long extracts/transcriptions and/or multimedia). --GJ
gthorud 2010-12-17T18:44:07-08:00
Re. "living" - Most programs have functionality that considers information about living people to be private, e.g. if info about living people shall be excluded when exporting or publishing. There are probably differences in customs/laws in this area. Here, it is allowed to publish info about living persons – even on the internet – including their parents, spouse, and children. The things you cannot publish are the exact birth date (but the year is ok) and sensitive info about the reason for death, health info, adoption, info related to criminal activity, and probably more. (These laws may change over time – e.g. due to crime based on such info – or more use of encryption based authentication to prevent crime.) In addition to laws, there is such a thing called "common sense" to be applied. So the choice to share or not to share is not as simple as living or not living, but "living" is useful info. A program should allow you to select the types of info that can be shared (or not shared) for living persons, or a user defined sensitivity type of flag.

I think Adrian has an important point #3, re. exchanging info between ONE user's applications. Thus it makes sense to export USER DEFINED flags (and not only flags for privacy related stuff) that may be of no use to other users.

The difficult flags are those that we want to share with others, they should be standardized. I think "living" (or deceased) is one – since you cannot always determine that from other info. Another is a flag indicating that info has been omitted due to "sensitivity". Perhaps there could be a value "sensitive" ("for your eyes only", it would be possible to transfer the meaning of sensitive, or other flags, in the “header” of the file). Copyright may be yet another one.

There will be a need to define what info a flag applies to – cf. Tom’s bullets 2 and 4. This is a difficult issue. E.g. if a person is sensitive, does that apply to all events that involve this person – where other persons are also involved? Also, if we end up with a family entity (I am using this as an example only) so that the parent relation can be transferred either as a birth event or as a link to the family entity – what then if these two ways of representing the parent relation have different “flags” or no flag?


One reason why I want to see a general “flag” solution used for this purpose is that I envisage that such a solution would allow you to filter on flags, thus you need no special functionality for finding e.g. persons with privacy/living/exclude flags set. And it might also be used to set/reset the p/l/e flags automatically based on a filter that considers other types of info, e.g. set a p/l/e flag on adopted persons.
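
A minimal sketch of that idea, assuming flags are just a set of user-defined names on each person. The `Person` class, the `adopted` field and the flag name "exclude" are hypothetical and only there for illustration; nothing here comes from GEDCOM or any existing program.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class Person:
    name: str
    adopted: bool = False                         # hypothetical example attribute
    flags: Set[str] = field(default_factory=set)  # user-defined flag names


def with_flag(people: Iterable[Person], flag: str) -> List[Person]:
    """Filter: everyone who carries the given flag."""
    return [p for p in people if flag in p.flags]


def set_flag_where(people: Iterable[Person], flag: str,
                   predicate: Callable[[Person], bool]) -> int:
    """Set the flag on every person the filter matches; return how many changed."""
    changed = 0
    for p in people:
        if predicate(p) and flag not in p.flags:
            p.flags.add(flag)
            changed += 1
    return changed


people = [Person("Anna"), Person("Ben", adopted=True)]
set_flag_where(people, "exclude", lambda p: p.adopted)  # flag set from a filter
flagged_names = [p.name for p in with_flag(people, "exclude")]
```

The same two functions serve any flag, so no special-purpose "find private persons" feature would be needed.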


Just so we don't forget, here is an extract from GEDCOM 5.5.1 (this may trigger some thoughts):

RESTRICTION_NOTICE:= {Size=6:7}
[confidential | locked | privacy ]
The restriction notice is defined for Ancestral File usage. Ancestral File download GEDCOM files
may contain this data.
Where:
confidential = This data was marked as confidential by the user. In some systems data marked as
confidential will be treated differently, for example, there might be an option that
would stop confidential data from appearing on printed reports or would prevent that
information from being exported.

locked = Some records in Ancestral File have been satisfactorily proven by evidence, but
because of source conflicts or incorrect traditions, there are repeated attempts to
change this record. By arrangement, the Ancestral File Custodian can lock a record so
that it cannot be changed without an agreement from the person assigned as the
steward of such a record. The assigned steward is either the submitter listed for the
record or Family History Support when no submitter is listed.

privacy = Indicate that information concerning this record is not present due to rights of or an
approved request for privacy. For example, data from requested downloads of the
Ancestral File may have individuals marked with ‘privacy’ if they are assumed living,
that is they were born within the last 110 years and there isn’t a death date. In certain
cases family records may also be marked with the RESN tag of privacy if either
individual acting in the role of HUSB or WIFE is assumed living.
hrworth 2010-12-17T18:58:16-08:00
Question about "Living"

How is someone 'defined' as 'living'?

Suppose that I have a birth date, but no death date. Haven't found any documents with that information.

I am thinking that different applications consider a person "living" based on a certain time from birth.

But, what if there isn't any birth date either?

What is "living"?

How should BetterGEDCOM define this issue?

Russ
gthorud 2010-12-17T19:11:51-08:00
I don't think we need to ...
Andy_Hatchett 2010-12-17T23:22:52-08:00
Russ,

I come at "living" from a different angle. Living=Not dead

Lacking both a birth date and a death date, it would be safe to assume a person dead 122 years after the first dated event you have for that person, simply because 122 years is the oldest documented age at death that has been recorded for a person.
gthorud 2010-12-18T05:02:33-08:00
The definition of living was probably developed a few million years ago.

This is an application/user problem. You can also assume that a child of a person born in 1800 is dead today.

It is a question if "living" should have a date - the date when the user/application determined that the person was living (cf attribute-event discussion).
AdrianB38 2010-12-18T09:32:59-08:00
Russ asks: "What is "living"? How should BetterGEDCOM define this issue?"

I think this is one area where we really shouldn't try - because different countries and cultures may set different expectations. Most people use a "Living" flag for privacy reasons, so "known to be dead" implies not-living, that's easy; "known to be alive" is easy also. It's the intermediate state that's tricky because different users might take different views on the need to set such a switch.
ACProctor 2011-12-03T14:58:03-08:00
I just want to provide another slant, just in case it helps...

I've tried to categorise my own data as {public, family, private, sensitive} and grant access based on which 'family' the accessor is. Hence, public OK to everyone, family OK to appropriate family member, private OK to ad hoc people I've selected, and sensitive to no one but me.
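
A rough sketch of how those four categories could drive access, assuming each item records its owner, a family-group label and a list of ad hoc people. All class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Visibility(Enum):
    PUBLIC = "public"
    FAMILY = "family"
    PRIVATE = "private"
    SENSITIVE = "sensitive"


@dataclass
class Item:
    owner: str
    visibility: Visibility
    family: str = ""                                # hypothetical family-group label
    allowed: Set[str] = field(default_factory=set)  # ad hoc people, used for PRIVATE


@dataclass
class Viewer:
    name: str
    families: Set[str] = field(default_factory=set)


def can_view(item: Item, viewer: Viewer) -> bool:
    """Apply the four categories in order of increasing restriction."""
    if viewer.name == item.owner:
        return True                                 # the owner sees everything
    if item.visibility is Visibility.PUBLIC:
        return True
    if item.visibility is Visibility.FAMILY:
        return item.family in viewer.families
    if item.visibility is Visibility.PRIVATE:
        return viewer.name in item.allowed
    return False                                    # SENSITIVE: no one but the owner
```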

The living/non-living is a separate issue I haven't really sorted. It's obviously problematic since the absence of a death date may just mean you don't know rather than the person is living. No one adds a separate "is-living" flag because it could never be kept up-to-date.

I sometimes feel that I should just set the bar based on births before a particular date rather than deaths. Anyone else feel that way?
GeneJ 2011-12-03T15:05:25-08:00
Hi Tony,

TMG has a user preferences option, "Assumed maximum lifespan," but I believe that is used for other purposes.

TMG uses a "Living flag." Here is the snippet of information about that flag from the program help files. "The LIVING flag defaults to ?. The program automatically sets it to N when a death or burial tag is entered, or when a birth group tag indicates that the person would be more than 110 years old. You can change the LIVING flag to Y to use it with filters."
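
Read literally, the quoted help text describes a three-valued flag. The sketch below is only an illustration of that description, not TMG's actual code; the function name and the year-only age arithmetic are assumptions.

```python
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 110  # threshold quoted in the help text above


def default_living_flag(birth_year: Optional[int],
                        has_death_or_burial: bool,
                        today: Optional[date] = None) -> str:
    """Return 'N' or '?' following the quoted rule; 'Y' is only ever set by the user."""
    today = today or date.today()
    if has_death_or_burial:
        return "N"
    # Year-only comparison: a simplification for people with partial dates.
    if birth_year is not None and today.year - birth_year > MAX_AGE_YEARS:
        return "N"
    return "?"
```

A person with neither a birth year nor a death or burial entry stays at "?", which is exactly the intermediate state discussed earlier in this thread.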
AdrianB38 2010-12-23T15:01:09-08:00
Families, step-families, biological families, etc
I've just added this to the "I'd like BG to do this..." section:
"Distinguish between a group of people who live as a family (who might include informally adopted children) and biological or step-biological children who don't live with their biological or step-biological parents."

What I'm getting at here is that GEDCOM is not good at splitting apart a family who lives together from a family that simply has some biological or quasi-biological relationship.

For instance, my GG grandfather had 3 wives in succession, with children from the 1st 2. When he married his 2nd wife, common usage says that the 5 surviving children from his 1st marriage become step-children of his 2nd wife. Yet those children were looked after by their own mother's aunts, so the 2nd wife never (as far as we know) had any particular relationship with the first 5. They are thus step-children by name only and not part of any living family with the 2nd wife.

When he married for the 3rd time, the 3 surviving children from the 2nd marriage obviously became step-children of the 3rd wife - but also she acted in place of their mother. This is a stronger relationship than that between those of the 1st marriage and the 2nd wife, and it's not just about residence.

So far as I can see, GEDCOM has no specific way of distinguishing these 2 cases and I think it should, because at the moment I'm simply recording this in notes - fine for me but someone might miss reading it.

Note that I think that I'd like to see the children of the 2nd marriage appear in TWO families, while the children of the 1st marriage appear in just one.

There's also the case of informal adoptions - while there's an adoption event in GEDCOM, that's not appropriate if it's just an informal arrangement. The informally adopted need to appear in the informally adopting family, clearly distinguished by more than just a note. And clearly distinguished from a birth child.

And we can have the case of a child born to biological parents who never lives with them - again, arguably we should be able to distinguish.

No doubt there are other combinations, e.g. children who start out fostered and then become adopted. What roles (if that's what we're doing) do we give them in the family?
AdrianB38 2011-01-02T14:05:10-08:00
As a sort of extension to the above, I've just added this to the "GEDCOM messes up" bit:
"GEDCOM has no ability to provide citations and sources for why a child is believed to be in a particular relationship with its (birth or whatever) parents. The citations and sources are either for a family as a whole or for individual birth (or whatever) events that only mention the child"

Innumerable times I discover sources that individually or together show that the parents of X are Y and Z. Yet where do I put the citation to those sources?

Against the birth fact of X? Probably the best of a bad job - but the birth fact doesn't mention Y and Z in current GEDCOM.

Against the family? Tempting but then how is it clear which of several children this is a source for?
GeneJ 2011-01-02T14:07:39-08:00
Adrian,

When I finish some graphics work, I'll open TMG and report on the sourcing methodology for relationships available in that program. --GJ
GeneJ 2011-01-02T23:28:48-08:00
Hi Adrian:

TMG has a source and memo field for the relationship tag, but the output in reports is not as clear as users would like.

The source field is there, so that, for example, separate from the birth tag, there is a place to add the source for parent-child. That same relationship source field can be used to document step-parent-child relationships.

Here's the link to a 19 Nov 2009 TMG discussion that mentions those relationship sources and the more problematic presentation of them:

http://newsarch.rootsweb.ancestry.com/th/read/TMG/2009-11/1258645458
NeilJohnParker 2011-11-15T21:43:01-08:00
I am not sure what is causing this problem; it may be because GEDCOM only allows two relationships between a child and its parents, one to the father and one to the mother within one family. This does not represent the real world. This relationship should be many-to-many, to one or more parents (male and/or female), with a type of: biological, adopted, informally adopted, step, guardian or unknown.

Perhaps this is the time to raise another issue. Why is GEDCOM based on an abstract concept of family with husband/father, wife/mother and child all linked to family? Is it not a more natural and correct database design to link husband/father and wife/mother together via a "marriage" event and then to link child to father and child to mother by a "birth" event? Marriage relationship can be of type: marriage, common law marriage, cohabiting, affair, same sex, unknown etc. Birth relationship can be of type biological, adoption, informal adoption, step, guardian, or unknown. Note that the "marriage" relationship is like a family in its initial, pre-embryo state, no pun intended. The family as used in GEDCOM can easily be derived from these two fundamental entities, which are self-referential to person (a concrete entity). Does anyone know how most modern good-quality genealogy software handles this issue and why GEDCOM chose not to?
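
A small sketch of that alternative model, assuming two link types between persons and a derived "family". Class names, field names and the person identifiers are hypothetical; the per-link `citations` field is an assumption that also speaks to Adrian's earlier point about sourcing a single parent-child relationship.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class ParentChild:
    parent: str
    child: str
    kind: str = "unknown"             # biological, adopted, step, guardian, ...
    citations: Tuple[str, ...] = ()   # sources for this one link only


@dataclass(frozen=True)
class Couple:
    partners: FrozenSet[str]          # assumed to hold exactly two person ids
    kind: str = "unknown"             # marriage, common law marriage, ...


def derive_families(couples: Iterable[Couple],
                    links: Iterable[ParentChild]) -> Dict[FrozenSet[str], Set[str]]:
    """For each couple, list the children linked to every partner."""
    children_of = defaultdict(set)
    for link in links:
        children_of[link.parent].add(link.child)
    return {
        c.partners: set.intersection(*(children_of[p] for p in c.partners))
        for c in couples
    }


couples = [Couple(frozenset({"Y", "Z"}), "marriage")]
links = [
    ParentChild("Y", "X", "biological", ("S1",)),
    ParentChild("Z", "X", "biological"),
    ParentChild("Y", "W", "step"),    # W is linked to Y only
]
families = derive_families(couples, links)
```

Here the GEDCOM-style family is computed rather than stored, so a child can carry any number of typed parent links without being forced into one FAM record.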
AdrianB38 2011-11-16T08:08:35-08:00
Neil - totally agree with you!

(If we're asking why things are, would it be rude of me to ask why GEDCOM doesn't cover polygamy when it was created by an agency of the Mormon Church? No, I don't expect an answer!)
GeneJ 2011-01-04T22:44:01-08:00
GEDCOM's QUAY; comments/feedback
Posting this here; will comment separately.

From, "The GEDCOM Standard: Release 5.5," 2 Jan 1995: [quoting]

CERTAINTY_ASSESSMENT:= {Size=1:1}
[0|1|2|3]
The QUAY tag's value conveys the submitter's quantitative evaluation of the credibility of a piece of information, based upon its supporting evidence. Some systems use this feature to rank multiple conflicting opinions for display of most likely information first. It is not intended to eliminate the receiver's need to evaluate the evidence for themselves.

0 = Unreliable evidence or estimated data
1 = Questionable reliability of evidence (interviews, census, oral genealogies, or potential for bias for example, an autobiography)
2 = Secondary evidence, data officially recorded sometime after event
3 = Direct and primary evidence used, or by dominance of the evidence
louiskessler 2011-01-04T23:06:07-08:00
I think it's perfect as it is. Simple and understandable and easy to interpret and easy to figure out (without much interpretation) which QUAY is the correct one.

That latter point is the most important.

Most programs let you record QUAY, and I've found it in lots of GEDCOMs.

Louis
GeneJ 2011-01-04T23:06:43-08:00
I've not used QUAY related features in genealogy software.

Hope those that do use this system will comment.

I have a hard time relating QUAY to the Genealogical Proof Standard (GPS) and the related evidence process map discussed earlier on the BetterGEDCOM wiki.

I work towards a reasonably exhaustive search, and separate from direct evidence, look for conflicts, indirect evidence, negative evidence ... These approaches have also been examined in BetterGEDCOM discussions.

Item 3, "dominance of the evidence" would be considered out of date by current standards in the US (specific change in 1997).
See http://www.bcgcertification.org/resources/prepond.html for "BCG Abandons the Term "Preponderance of Evidence."
AdrianB38 2011-01-05T07:28:45-08:00
I use QUAY but only for values 2 and 3, for the reasons I indicated on the main page. Unlike Louis, I do not find it easy to decide what value of QUAY I should use. Oh, I can easily make my own mind up on a system, but that rather destroys the point of interchanging data.

I need to explain, I guess. Look up any definition of "Primary" (QUAY = 3) in articles relating to genealogy. Now, what's the definition of Secondary (QUAY = 2)? In everything I've read, Secondary is ANYTHING that isn't Primary. So a source, relative to an event or attribute is _either_ Primary (QUAY 3) or it's Secondary (QUAY 2). _Nothing_, according to those definitions, can be QUAY 1 or 0.

Does that mean no fact can be estimated or questionable? Of course it doesn't. That's because the topic of the likelihood of the source being true in relation to that fact is something entirely separate from whether it's primary or not.

Let's take the issue of my granddad's birth date. I have a birth certificate issued 3w after the event, by a neutral party, that says he was born on the 8th. The birth certificate satisfies all reasonable definitions for being a primary source for his birth, so I've entered it as that (QUAY 3). _However_, there is a strong family tradition that his father thought he was over the legal time-limit for registering the birth so added 2 days on to bring his son (just) inside the legal limit again. Therefore I could say this certificate was Questionable (QUAY 1) - but then I've destroyed any code value that says this is a Primary source. (Even if I decided that 3w old certificates were actually only Secondary, I would still be destroying the code value that said it was Secondary in favour of one that said it was Questionable).

Note that QUAY 3 says Primary, it doesn't say "Primary and as certain as things ever get".

So, the current values of QUAY are trying to do two things at once - and doing one means it can't do the other.

Of course Louis can stack up the 4 codes in his mind as increasing grades of likelihood, and it's a worthwhile thing to do. But that's not what the GEDCOM Standard says.

This is definitely a case of the GEDCOM Standard being ambiguous (hmm - I did ask if anyone could demonstrate where it was ambiguous) because it is not clear what QUAY value should be assigned to a Primary source of Questionable accuracy. We have 2 ways in that Standard to assign such a QUAY value and, based on the maxim that we should not have that in BG, we need to put something different forward.

There are several possible ways forward - I would recommend we treat each factor as a different item and capture those values, e.g.:
- Primary or Secondary?
- Original or derivative (e.g. paper or scan)?
- direct or indirect?
- qualitative degree of likelihood? (NB - I can easily slip into rant mode at this point about GEDCOM saying QUAY is a "quantitative evaluation" - expressing a qualitative evaluation as a number does NOT make it quantitative!)

I'm not sure where an estimate based on other events goes. Or whether I've got all the subtly different dimensions that we need.
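
As a sketch of the "each factor as a different item" idea, the four dimensions could be captured as separate optional fields. The enum names and the four likelihood grades are hypothetical placeholders, not a proposed BG vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Information(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Form(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"


class Evidence(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


class Likelihood(Enum):              # hypothetical grades, for illustration only
    UNRELIABLE = "unreliable"
    QUESTIONABLE = "questionable"
    PROBABLE = "probable"
    CERTAIN = "certain"


@dataclass(frozen=True)
class Assessment:
    """Each dimension is recorded on its own; None means 'not assessed'."""
    information: Optional[Information] = None
    form: Optional[Form] = None
    evidence: Optional[Evidence] = None
    likelihood: Optional[Likelihood] = None


# The birth certificate example: a primary source of questionable accuracy,
# which a single QUAY digit cannot express.
birth_certificate = Assessment(information=Information.PRIMARY,
                               likelihood=Likelihood.QUESTIONABLE)
```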
GeneJ 2011-01-09T00:12:09-08:00
@Louis, I too see many folks using QUAY, and it's in a lot of GEDCOMs that exist.

Since at least part of the description is out of date, I'm wondering if the various genealogy programs even use the same definitions. If the programs don't apply identical definitions, we would have a more complex problem.

Indeed, we'd create the same problem if we opted to just assign new definitions to 0, 1, 2, 3.

Maybe this should be thought through in terms of the overall sourcing methodology of BetterGEDCOM.
testuser42 2011-01-09T06:14:16-08:00
I agree. The QUAY as defined in GEDCOM mixes two or more separate things. The four points given by Adrian would cover everything needed that I can think of. Separating the different attributes that way would probably make sourcing more understandable and accurate.

About "estimates based on other events" -
Would it be a problem to use the new "surety" indicator on estimates (=conclusions)?
The first three points (primary/secondary; original/derivative; direct/indirect) don't apply to the conclusions you make, do they?

Gene's thread about qualifiers touches this, too, I think. http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138
EssyGreen 2011-11-12T02:42:20-08:00
Copyright
There should be some way of specifying copyright ownership for media items (so that applications can do different things based on media with a non-null value here e.g. no syncing or publishing to internet etc).

Similarly it would be useful to have a "courtesy of" field to acknowledge contributions, photos etc from others.

For ease of editing, these should be a single link to the details of a contributor/researcher/repository/submitter with a flag or field indicating the type of ownership (copyright, 'courtesy of' or somesuch).
EssyGreen 2011-11-19T22:46:30-08:00
Ooops - I was replying to hrworth on previous page in my last post. Apologies for confusion.
EssyGreen 2011-11-19T22:52:09-08:00
Re twetmore's point "Why tease them?" ... I think this is the difference between private and copyright ... private = hide completely so recipient never knows it is there (even the title could be a giveaway e.g. "Adoption papers for XYZ"). Copyright = show I have the thing (proof/evidence) but don't actually transfer it.
ttwetmore 2011-11-20T02:37:19-08:00
""Re twetmore's point "Why tease them?" ... I think this is the difference between private and copyright ... private = hide completely so recipient never knows it is there (even the title could be a giveaway e.g. "Adoption papers for XYZ"). Copyright = show I have the thing (proof/evidence) but don't actually transfer it.""

I assume that if a copyrighted work is used as a source of evidence, that there will be a source record for that work and that the evidence records (if used) and conclusion records will refer to that source record. There are no copyright issues regarding having a source record about a copyrighted work.

What is the importance of letting the receiver know, "I have my own, quasi-legal copy of that source?" Does it add legitimacy to the evidence by letting the receiver know that we probably really referred to its source?

In summary I don't see any value in letting the receiver know that I have my own personal copy of a source. But it's hardly worth the effort of worrying about. If BG wants to put a tag in source records that means that a full copy of the source is in the possession of the sender of the data, I sure don't see any harm in that. But of course, as soon as the receiver of the data sends it someone else, we've basically lost track of who really does have the copy of the source, so the efficacy of the information seems suspect. Or the tag could be taken to simply mean, "the source was really checked!"
EssyGreen 2011-11-21T00:33:46-08:00
I see your point re the source documents but I still think there is a need for the flag and message so that the *sending* system can identify what to sensibly filter out and how.

For example, say I'm uploading a file to Ancestry ... I would want all copyright media to be ignored (not transferred) and all private records (of any type) to be ignored or blanked out.

If I were publishing to my own web site with restricted access then I might want to include photos from other researchers who have allowed this but with a courtesy message. Similarly I would want to include copyright media which I have permission to reproduce in this way (say a photo from a professional photographer or an old postcard or whatever).

If I were say publishing a PDF for my own interest or to show to a family member etc then I might want to include say a birth cert or census form as an illustration/for interest/as further evidence but it would be nice to have the copyright message so they don't go and reproduce it unknowingly (of course we can never prevent willful reproduction).

I guess what I'm trying to say is that I see copyright as protection/courtesy for the originator whereas privacy is protection/courtesy for the subject.

As an extra complication ... I would treat a transcription as "media" even tho' it isn't - I guess I think of the GEDCOM SOUR.TEXT as, in effect, a short-hand way of reproducing a text document, and as such it would have/need the same privacy and copyright fields. However, I concede that this could be done by the user anyway instead of using SOUR.TEXT - I just wish it were made simple/encouraged in genealogy apps.
ttwetmore 2011-11-21T05:15:51-08:00
So private and copyright can be handled by tags, as I've shown in examples. The only decision is at what level these tags apply: at the record level, or at the substructure (information within records) level, and if at the substructure level, at which ones. In your examples, for copyright, it seems you are thinking mostly about media records, mostly images, so applying the tag at the record level seems easiest. But when it comes to privacy, it seems we're mostly talking about persons. Then the question becomes: do you apply privacy at the person level, or at a deeper level, where you can transfer info about a person, but just keep some of it back?

Note that many privacy concerns are better handled by rules enforced by the application, not by tags in the data. For example, one rule could be, don't export any persons who were born in the last 80 years that don't have a death event. This will catch all living persons (with birth dates), but will miss dead people whose death data you haven't entered. I don't think you would want to go through your database every month removing or adding privacy tags based on the passage of time or the adding of death records.
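
The example rule could be sketched like this, assuming a very small person record. The `Person` fields and the year-only arithmetic are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Person:
    name: str
    birth_year: Optional[int] = None
    has_death_event: bool = False


def exportable(people: Iterable[Person], current_year: int,
               cutoff_years: int = 80) -> List[Person]:
    """Drop anyone born within the cutoff who has no death event recorded."""
    kept = []
    for p in people:
        born_recently = (p.birth_year is not None
                         and current_year - p.birth_year <= cutoff_years)
        if born_recently and not p.has_death_event:
            continue  # presumed living: withheld from the export
        kept.append(p)
    return kept
```

As noted above, the rule is applied at export time, so nothing has to be re-tagged as years pass. People without any birth date slip through, which is the limitation the rule is known to have.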

Courtesy of sounds like part of a figure caption, but if you want a tag, put it in.

Transcriptions can be all kinds of things. They can be put in note structures, or put in note records so that more than one other object can refer to the same note. Or they can be a perfectly normal media object with mime type text. In Louis's model, where the evidence record is key, they can be a structure within the evidence record, forming the "original" from which all the extracted info in the evidence record comes.
EssyGreen 2011-11-21T22:54:41-08:00
"many privacy concerns are better handled by rules enforced by the application, not by tags in the data"

I agree but the application needs to have the data (tags) in order to be able to make sensible business rules. If the data isn't available then the application will either be forced to create non-standard tags to do the job or force the user to manually select the data (and like you said in your example that isn't feasible).

Whilst strict privacy rules can be determined from a calculation based on whether the person is living or not, the more subtle rules cannot. Some sources (whether in media form or transcription) may be sensitive at a more general level (criminal records, illegitimacy documents etc) so a flag to "hide" these is necessary.



"Courtesy of sounds like part of a figure caption" - true and that is how the user is forced to enter it at the moment. However, if it were possible to link the media to an "owner" (aka Repository/Submitter) then a default message could be constructed automatically and auto-updated when the contact details changed e.g. Owner="National Archives", Copyright="yes" could produce "copyright, 2011, National Archives" as a default piece of text which would always be shown regardless of the caption. I believe this would help raise awareness of copyright issues and provide the data for software to better handle them.
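
A sketch of that default message, assuming media items hold a link to an owner record rather than a copy of its name. The class names, the `courtesy` field and the `year` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Owner:
    name: str                      # a repository, submitter or other contact


@dataclass
class MediaItem:
    caption: str
    owner: Owner                   # a single link, so edits to the owner propagate
    copyrighted: bool = False
    courtesy: bool = False
    year: Optional[int] = None


def credit_line(media: MediaItem) -> str:
    """Build the default credit text from the linked owner, independent of the caption."""
    if media.copyrighted:
        parts = ["copyright"]
        if media.year is not None:
            parts.append(str(media.year))
        parts.append(media.owner.name)
        return ", ".join(parts)
    if media.courtesy:
        return f"courtesy of {media.owner.name}"
    return ""


archives = Owner("National Archives")
census_page = MediaItem("1881 census page", archives, copyrighted=True, year=2011)
notice = credit_line(census_page)
```

Because the text is generated from the link each time, renaming the owner record changes every credit line that refers to it.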



"Transcriptions can be all kinds of things." - Although transcription *can* be put anywhere, allowing a field to be marked as a transcription (and handled as a media object) would enable the same privacy rules to be applied to it as to other media.

Re NOTEs - I personally believe that one of the failings of the existing GEDCOM standard is the total ambiguity of NOTEs. The software cannot determine anything from the text so cannot handle different types of "NOTE" differently. This makes for great flexibility but a total loss of meaning and hence the inability to create sensible business logic.
EssyGreen 2011-11-21T23:07:36-08:00
Just realised I didn't actually answer your question very clearly - duh!

"do you [apply] privacy at the person level, or a more deeper level, where you can transfer info about a person, but just keep some of it back?"

I think a privacy flag is needed for all major entities:

- person
- relationship (FAMily/ASSOciation)
- fact (life event)
- media
- note
- source
- repository/submitter (personally I see these both as a "contact" or a special type of "master source" - could be an organisation or a researcher and/or an individual in the tree but that's a whole different debate)

It could be taken even deeper (e.g. citation) but personally I think this would get too fiddly.
ttwetmore 2011-11-22T01:03:26-08:00
So the questions become, what are the entities in a genealogical data model that can have copyright and privacy tags, and what forms will those tags have.

Copyright. I can't see the basic genealogical data objects ever needing a copyright tag -- that is, persons, events, places, and any of the links or relationships between them. The only thing I can think of that would need a copyright tag would be a supporting "media" record that might hold an image file, a PDF file, a text file, a sound file, a movie file, and so on. In the DeadEnds model nearly all objects can refer to media records. So would giving media records a copyright and owner attribute take care of the copyright issue?

Privacy. In genealogical data models there are entities, relationships between entities, and attributes making up the entities. Where can privacy tags be applied? If a record is tagged private then that record cannot be exported. If a relationship is marked private, then the object "at the other end" of the relationship cannot be exported. And if an attribute within a record is marked private, then that attribute cannot be exported. It seems like that would cover just about all bases. The DeadEnds model already allows this approach, since in the DeadEnds world a privacy tag is just another attribute, and the model allows attributes to appear anywhere.

Of course, there is the issue of the type of privacy. That is, you want to archive your data for backup, keeping everything. When sharing with the public, everything marked private should probably not be exported. But are there examples (we're supposed to use the term "use cases" these days) where you would want to share some of the private data, but hold back some? If so, a simple privacy tag is not really sufficient for this granularity.

So privacy should probably be an attribute whose values come from some small enumerated set, e.g., {full, family, research}, for private to all, private to all except family, and private to all except other researchers.
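
To make the enumerated attribute concrete, here is a sketch that applies it at the record level and at the attribute level during export. The audience names, the `Record` and `Attribute` classes and the sample values are assumptions; relationships are left out to keep it short.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class Privacy(Enum):
    FULL = "full"          # private to all
    FAMILY = "family"      # private to all except family
    RESEARCH = "research"  # private to all except other researchers


@dataclass(frozen=True)
class Attribute:
    tag: str
    value: str
    privacy: Optional[Privacy] = None


@dataclass(frozen=True)
class Record:
    ident: str
    attributes: Tuple[Attribute, ...] = ()
    privacy: Optional[Privacy] = None


def visible(privacy: Optional[Privacy], audience: str) -> bool:
    """audience: 'archive', 'family', 'research' or 'public' (assumed names)."""
    if privacy is None or audience == "archive":
        return True        # untagged data, or a full backup, keeps everything
    if privacy is Privacy.FAMILY:
        return audience == "family"
    if privacy is Privacy.RESEARCH:
        return audience == "research"
    return False           # Privacy.FULL


def export_record(record: Record, audience: str) -> Optional[Record]:
    """Return None for a withheld record, else a copy without withheld attributes."""
    if not visible(record.privacy, audience):
        return None
    kept = tuple(a for a in record.attributes if visible(a.privacy, audience))
    return replace(record, attributes=kept)
```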

And, where do the rules for hiding living people when exporting data come from? In my last post I mentioned that it is much easier for the application to apply these rules than it is for the user to be continually adjusting privacy tags as time passes and people die.

Your note about notes is important. In my old LifeLines program, which uses GEDCOM as its database, but allows users to create their own tags at any point, I have had to introduce conventions for notes. For example, I only put complete sentences in NOTE lines, and have some report programs that when generating biographies will catenate together all the notes found in a person/INDI record. So if I want to add a note to a record that is really just a comment about something, not something that should be part of a real biography, I use the tag INFO for those. So an INFO tag is really analogous to a comment in a computer programming language. When I am recording, say, a date or a place, I generally want the date and place value to be standardized so indexing, searching, and other computation software can operate on them, but sometimes I don't want to lose the exact wording of the date or place as found in the evidence. So I will transcribe the original text and put that in a TEXT line. So I already have to distinguish between three types of notes -- biographical notes, comments, and transliterations. An example of the latter might be:

1 BIRT
  2 DATE 18 December 1949
    3 TEXT XII 18, '49
    3 INFO It looks like the birth date was pencilled in after the birth.
  2 PLAC New London, New London County, Connecticut, United States
    3 TEXT New London, Conn.
  2 NOTE He was born in the evening at the Lawrence and Memorial Hospital.
  2 SOUR @S3@    <<-- Reference to birth certificate
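
A rough sketch of how a report could use that convention, collecting only NOTE lines and skipping INFO and TEXT. This is not LifeLines code; the function name is made up, and continuation lines (CONC/CONT) are deliberately ignored to keep it short.

```python
from typing import Iterable

SAMPLE = [
    "1 BIRT",
    "  2 DATE 18 December 1949",
    "    3 TEXT XII 18, '49",
    "    3 INFO It looks like the birth date was pencilled in after the birth.",
    "  2 PLAC New London, New London County, Connecticut, United States",
    "    3 TEXT New London, Conn.",
    "  2 NOTE He was born in the evening at the Lawrence and Memorial Hospital.",
    "  2 SOUR @S3@",
]


def biography_notes(lines: Iterable[str]) -> str:
    """Join the values of all NOTE lines; INFO and TEXT lines are left out."""
    notes = []
    for raw in lines:
        parts = raw.strip().split(" ", 2)   # level, tag, optional value
        if len(parts) == 3 and parts[1] == "NOTE":
            notes.append(parts[2])
    return " ".join(notes)


biography = biography_notes(SAMPLE)
```

The filtering works only because the three kinds of text carry different tags, which is the point being made about a single generic NOTE.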
EssyGreen 2011-11-22T23:23:25-08:00
Copyright - agreed with one exception: Transcriptions - If BetterGEDCOM is to have a "Source From Text" or equivalent entity/attribute/property then it also needs to be included there.

Privacy:

"If a relationship is marked private, then the object "at the other end" of the relationship cannot be exported." Not necessarily. Say person A has an affair with Person B or Person X is the illegitimate child of Person Y. Neither Person A nor Person B is private but their relationship entity *is* private. Similarly, if Person A had committed a crime ("fact/event") then Person A can be exported but the fact entity cannot. And again if there is a private "Note" against a fact entity then the fact may be exportable but not the Note entity.

"if an attribute within a record is marked private, then that attribute cannot be exported" - I wouldn't allow a privacy flag on an attribute cos this would get way too fiddly.

I like the idea of an enum for privacy but I'm not sure it would be any more useful than a simple flag because Privacy only comes into play when the user is exporting or publishing and it may be easier to simply ask the user at that time whether to include Private stuff or not. If they are uploading to say Ancestry they would (hopefully!) say "No", if they were printing for self or family they would say "Yes". If they were publishing to Aunt Molly and didn't want her to see the fact that her mother was adopted then they would need to "Review" anything with a Privacy flag set and temporarily set/unset various item(s) for export.

Re NOTEs - personally I think that the current NOTE tag is more of a DataType = Text/String than a definition of context and would abandon it in favour of more specific contexts e.g. [fact] Narrative, [source] Transcription, [person] Research Notes, [repository] Other info. etc .... I am sure there would be loads more we could think of including those you have defined. Some would give up and say "let's just have a generic Note" but I believe there is much value in thinking through as much as possible. Maybe we should have a separate thread on this? Or is there one already?
EssyGreen 2011-11-22T23:32:53-08:00
Side note on media: Media always comes from somewhere (even if the somewhere is the user/researcher). Ergo a media item is simply a sub-entity of a Source entity.
ttwetmore 2011-11-23T06:31:47-08:00
EssyGreen,

I think I am in agreement with just about everything you've just said.

I have a more generic view on attributes than you, so the fact that someone might have committed a crime I might consider an attribute, but that's quibbling.

I'm glad you don't want an enumerated tag for privacy, but we must therefore realize the burden that that will put on genealogical software, as you so well pointed out, to be able to fine tune the exports for different purposes.

And I agree with your points about notes -- notes, as currently used, have many purposes, and people like you and me, who like to add ancillary information to our databases, get a little burned by how to do it. Thus my use of NOTE, INFO and TEXT to mean different types of things. Other people use notes IN SPECIFIC PLACES to mean special things. That works too, if your program gives you the ability to do it.

I might quibble a tiny bit about your idea of a media record being a source record. I would think that in most cases a media record would have a source reference pointing off to the actual source where the media item came from. That is, I would expect that most image files came from some source that contained the image.
EssyGreen 2011-11-24T00:50:22-08:00
Oh dear - now we're in agreement I will miss my morning debate :)
I've really enjoyed having an in-depth discussion about the issues here. It's such a relief to find some like-minded people who are not afraid to challenge the status quo!
ttwetmore 2011-11-17T14:04:07-08:00
Issues of copyright and privacy are orthogonal.

Privacy can be handled by tags in the data and rules in the application.
EssyGreen 2011-11-17T23:52:50-08:00
Eeesh! Doesn't it get complicated!

I still maintain (maybe *because* of the complexities) that BetterGEDCOM should aim to keep it simple. I believe there *is* a need for some form of copyright flag so that software can not share/publish/upload etc if the flag is set. It must be up to the user to determine when they set this flag.

Ditto the copyright/courtesy message.

Although privacy can be handled elsewhere in the application I would vote for a similar flag on notes, media items and sources ... e.g. It's all very well to privatise the events and facts of an individual but what if the source itself contains information which could breach privacy or embarrass (e.g. details of adoption, illegitimacy, crime etc which relate to near relatives or living people)? I have seen people uploading birth certs and all sorts on Ancestry without a care in the world for the possible implications. BetterGEDCOM can't prevent this but we can at least provide the ability for the software and/or user to be more discreet/responsible.
EssyGreen 2011-11-17T23:54:16-08:00
PS: I meant "near relatives OF living people" (not or) in my last post.
ttwetmore 2011-11-18T12:44:24-08:00
I am curious to know what kind of information would need a copyright tag when being placed in a genealogical database.

I do use copyrighted works as sources, but don't see any issue there, as I just link to them. I may refer to links on the web, and if the link points to something copyrighted then it is up to that material to say so.

I may extract birth dates, event data, or notes from a copyrighted work, but that all constitutes fair use and doesn't require special handling. Even if I quote large tracts directly, as long as I state the source, I don't think that needs any special handling; certainly I would not worry about it.

Are there any good examples where any of you see this to be a real issue? Can't we just forget about it?
GeneJ 2011-11-18T13:04:58-08:00
@Tom,

I tried to provide some examples above:

I might want to restrict the transfer of ... E-mails, images, letters or lengthy excerpts otherwise subject to third party rights that I have copied/transcribed in whole to [a tag or source memo/field or into ...] my BetterGEDCOM research log or attached as multimedia items.
AdrianB38 2011-11-18T14:24:21-08:00
"I am curious to know what kind of information would need a copyright tag when being placed in a genealogical database."

Tom - with me it's most likely to be images - e.g. photos of people that I've been sent where they'll either be in copyright still or common courtesy requires I acknowledge the source. Especially in the case of the first, I really should not be passing copies on to all and sundry. Also, images of ships or buildings - I try to find "open source" photos but it isn't always possible and I do try to download the image to my PC and not rely on a link that may disappear. My private files are pictorially richer than my published ones.

Private mails and documents strike me as the sort of source that I copy unchanged into my database but would not want to have copied out - they could be marked as "private" (assuming we have that mark) but that's not quite right - it seems fair use to quote a sentence from a "private" (i.e. copyright) mail unless otherwise asked, whereas if it were a "please don't tell" private mail, then we shouldn't even do that - so 2 different concepts there.
ttwetmore 2011-11-18T14:37:18-08:00
The problem to solve seems to be this -- when you export your data there is some information you want to hold back and not share. It seems we can invent many complicated ways to do this, or just have a private flag to cover the information not to be shared. Do we, in our own databases, require a complex taxonomy to cover different kinds of copyrights? We should be extracting evidence, all fair use of copyrighted material, and a few scanned pages probably are in that category too. If some of us truly want complete copies of whole copyrighted books loaded into our databases, I don't see any reason to make our data models worry about such things. Presumably when you export your data, those books wouldn't be part of the transported data.
hrworth 2011-11-18T15:46:54-08:00
Tom,

I think the best example is a picture from the Find-A-Grave website. That photo is copyrighted. I can use that image in my database. So far, no issue.

As you pointed out earlier, the Application must give the user the ability to mark that image so that it can't be shared.

Here is where, I think, BetterGEDCOM has a role to play.

"I have an image that supports my Citation for this event / fact but it will not be shared".

If the application has the flag for the End User and the End User shares that file, the Image shouldn't be sent by the application, but BetterGEDCOM needs to carry the "message" that the image is under copyright.

The receiving application takes that flag (no image included) and presents some sort of message to the receiving end user.

Russ
ttwetmore 2011-11-18T16:59:08-08:00
I agree except for the final point. If info is private I would think it better if the receiving user never knew it was there.

If an image were an image of the evidence that is cited, the evidence can be used in another way as well.

A private tag is all that is needed, which, I believe, is the same as your idea of a "message." If we want to give the tag a value for the reason the info is private, this is fine, and copyrighted material could be one of those values.
GeneJ 2011-11-18T17:16:30-08:00
@Tom,

You wrote, "A private tag is all that is needed.."

I'm not quite following your use of the word "private tag." Are you using "tag" to mean the same as pfact?

Say I record a pfact for someone's 1992 death, and when I create the source or citation, depending on the program, I attach a scanned image of the obituary to that source or citation. That obituary is subject to copyright. If I am sharing, I'd want the tag and the source or citation to report, but I would not want the image of the obituary to be shared.

The same thing would be true if, rather than the image, I transcribed in full that same 1992 obituary into GEDCOM's assertion level TEXT field or into BetterGEDCOM's research log.
ttwetmore 2011-11-18T21:32:08-08:00
Assume you have a scanned image of an obituary in a media record with id m123. Then BG using XML syntax could refer to that scanned image via ...

<person ...>
  ...
  <death ...>
    <date .../>
    <place .../>
    ....
    <obituary><mediaRef id="m123" tag="copyrighted"/></obituary>
    ...
  </death>
  ...
</person>

Alternatively, the tag could be in the media record itself:

<media id="m123" mimetype="...." tag="copyrighted">
  ... the mimed data for this image or a reference to an image file or a reference to a web URI/L with the image ...
</media>

In your second case you have transcribed the obituary into text, so you could ...

<obituary tag="copyrighted">
  ... the full text of the obituary ...
</obituary>

Or if you prefer, copyrighted could be the attribute with values yes and no:

<media id="m123" mimetype="...." copyrighted="yes">

Ditto for private ..., e.g.,

....
    <email private="yes">
      <to> ... </to>
      <from> ... </from>
      <subject> ... </subject>
      <text> ... </text>
    </email>
...

Stuff with private or copyrighted tags would obey rules enforced by the application when generating Better GEDCOM -- e.g., Better GEDCOM export files intended as archives of your own data would include everything with those tags; Better GEDCOM export files intended for the general public would obey another set of rules implemented by the application; Better GEDCOM export files intended for persons working on the same line would have another set of rules enforced by the application. In no cases, when rules omitted stuff from the export files, would there be any need to put in fillers to let the receivers know they are missing stuff. If they're not supposed to see it, there's no reason to let them know it exists. Why tease them?
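
A rough sketch of such application-side rules, using Python's standard `xml.etree.ElementTree`. The audience names, the `RULES` table and the sample document are assumptions made up for illustration, and the flags use the `private="yes"` / `copyrighted="yes"` attribute form shown above. Withheld elements are simply removed, with no filler left behind.

```python
import xml.etree.ElementTree as ET

# Hypothetical audiences and the flags each one withholds.
RULES = {
    "archive": set(),                      # my own backup: keep everything
    "collaborator": {"private"},           # people working on the same line
    "public": {"private", "copyrighted"},  # the general public
}

SAMPLE = """\
<person id="p1">
  <death>
    <date>1992</date>
    <obituary copyrighted="yes">Full text of the obituary.</obituary>
    <email private="yes"><subject>Family matter</subject></email>
  </death>
</person>
"""


def export(xml_text: str, audience: str) -> str:
    """Return the document with every flagged child element removed for this audience."""
    withheld = RULES[audience]
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if any(child.get(flag) == "yes" for flag in withheld):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


print(export(SAMPLE, "public"))
```

This sketch does not handle a flag on the root element itself; a real exporter would have to decide whether to skip the whole record in that case.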
EssyGreen 2011-11-19T22:44:55-08:00
Totally agree but personally I would prefer the sending system to have the ability to provide the "some sort of message" rather than the receiving system trying to guess it - hence the copyright/courtesy message field.
WesleyJohnston 2011-11-12T08:33:37-08:00
This really opens a huge can of worms. When I wrote an art history book, I had to research copyright issues on old images. (Recent images are a very different story.) The laws in different countries are VERY different about old images (let's say 1800's and prior as the meaning of "old").

One of the really surprising revelations was that all the art museums in the United States that are claiming copyright on images of old art works are in fact not only wrong but are themselves violating the copyright law by making a false claim.

Things are VERY different in France or England or other countries. But this issue of it being a violation of the copyright law to claim a copyright to something that is actually in the public domain in America is something that I doubt many are aware of.

I am not sure what American law says about old family photographs or old photographs of a church or some other place. But an assumption that a claim of copyright makes publishing to the internet a violation could be a faulty assumption. It really is a complex tangle.
GeneJ 2011-11-12T09:30:23-08:00
"Rights" is a field typically found in most of the standardized metadata I have been reviewing as part of the more general work on BetterGEDCOM.

There are limitations.

If you search the wiki for "Copyright" you'll find some related postings. A few links follow.

On this page "Shortcomings of GEDCOM," see the discussion link below in "Security / Confidentiality / Privacy."
http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/31908183#31913613

See the home page discussion "Automatic Combination of Records," for the discussion...
http://bettergedcom.wikispaces.com/message/view/home/29969583?o=20#30552489

See the discussion on "Shortcomings of GEDCOM" called "Looking at GEDCOM from a distance."
http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/31763691?o=40#31926221

See the Requirements Catalog discussion, "Admin12 - Support Privacy Settings."
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/40806995#40817641

See the Requirements Catalog discussion, "Data09 - Collections of source data."
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/38778280

See the page, Multimedia File Inclusion Issues
http://bettergedcom.wikispaces.com/Multimedia+File+Inclusion+Issues

Hope this helps.--GJ
EssyGreen 2011-11-13T23:08:32-08:00
I agree it is complex and am not suggesting that BetterGEDCOM should investigate the whole can of worms. It is up to the user to ensure they don't breach copyright but the software/standard should provide them with the tools to do this. A simple flag as to whether a media item is thought to be copyright protected would be a real benefit to enable filtering of such items in reports, published books and web sites etc.

This would hopefully discourage the type of blatant disregard for copyright recently demonstrated by FTM's Tree-Sync, which uploads all your media to Ancestry.
ttwetmore 2011-11-14T02:35:07-08:00
I agree with EssyGreen. Keep it simple.
AdrianB38 2011-11-15T09:05:47-08:00
Yes - I'd advocate 2 big text strings, one to contain a note on copyright information, the other to contain a note on reproduction rights. Both serve, as suggested, to make the user think. Actually - make that 4 items - oh dear, going up already! One text string for copyright info and, because it could say "Out of copyright in all known jurisdictions", you also need a simple flag to say "Copyright action needed yes / no". Then ditto for "reproduction rights".

You need Copyright and reproduction rights separately because - at least in the UK - stuff may be out of copyright (or never in) but you may have signed up to a no-reproduction agreement when downloading it.
EssyGreen 2011-11-16T00:06:45-08:00
I don't think most users (self included!) would understand these distinctions and it would confuse rather than help. I would keep it to 2 fields: "Copyright status" and "Copyright/courtesy message".

The Status field could have recommended preset values e.g. ("Copyright protected", "Not copyright protected", "Unknown") along with free text for other if necessary. When uploading/reporting the software could then have options to include/exclude any of those values (tho' this would obviously be up to the software producer).
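
As a rough illustration of the two fields, here is a Python sketch. The `MediaItem` class, its field names and the `select_for_export` helper are all hypothetical; the preset values are the ones suggested above, and any other string is treated as free text.

```python
from dataclasses import dataclass

# Recommended presets; the status field still accepts any other free text.
PRESET_STATUSES = ("Copyright protected", "Not copyright protected", "Unknown")


@dataclass
class MediaItem:
    title: str
    copyright_status: str = "Unknown"  # a preset value or free text
    copyright_message: str = ""        # copyright/courtesy message


def select_for_export(items, excluded_statuses):
    """Keep only the items whose status the user has not chosen to exclude."""
    return [m for m in items if m.copyright_status not in excluded_statuses]


items = [
    MediaItem("Gravestone photo", "Copyright protected", "Courtesy of the photographer."),
    MediaItem("Old family letter", "Not copyright protected"),
    MediaItem("Family portrait"),  # status defaults to "Unknown"
    MediaItem("Ship postcard", "Licensed for private use only"),  # free text
]

shared = select_for_export(items, {"Copyright protected", "Unknown"})
print([m.title for m in shared])
```

Which statuses to exclude for a given upload or report would be a choice offered by the software, not something fixed by the standard.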
AdrianB38 2011-11-16T08:23:53-08:00
Essy - while the 2 concepts are not generally understood, it's attempts to work only with copyright that actually cause confusion. For instance, we get people saying (correctly) that an 1881 census form is not in copyright (if it ever was) so "Ancestry can't stop me publishing it on the internet" - not true, read the Ts and Cs that people sign up to, the ones that talk about rights of reproduction. By all means, work with just the copyright text if you like, but I'd like both concepts to appear.
EssyGreen 2011-11-16T23:22:27-08:00
Fair enough ... for my own benefit, my understanding was that UK Census was Crown copyright. Is this incorrect?
Andy_Hatchett 2011-11-16T23:27:43-08:00
The actual UK Census Forms are each copyrighted by the Crown but the filled-in info isn't - that is the tricky part; thus you can copy any info on the form but can't copy any part of the form itself, unless things have changed recently.
AdrianB38 2011-11-17T08:46:50-08:00
UK Census is indeed Crown copyright. And given the age of the documents, I think (I am not a lawyer caveats, etc) that means the completed forms are effectively out of copyright because they're all over X years old.

However, what also applies, as I indicated, is that when anyone signs up to Ancestry / FindMyPast etc, (I imagine that they all say the same) the terms and conditions say "No reproduction" or "No reproduction without consent of The National Archives", etc. What this does is to protect the investment of Ancestry / FMP / etc, that they put into digitally imaging the microfilms, indexing them, building the site, etc.

Some people suggest that the images of the paper forms, being new, are in copyright while the paper forms themselves, being old, are out of copyright. My brain starts to ache a bit with that idea but fortunately for my brain, it's not important, as it's the conditions in the Ts and Cs that apply.

Andy, I suspect you're right about the filled in information, certainly. Probably because there's no artistic effort that has gone into the forms, you can copy the information and type it on a web-site as plain text - information is not copyrightable. At least, not in the UK. But again, you couldn't copy UK census information less than 100y old and put it on the internet, NOT because of copyright issues, but because the UK's rules set 100y as the census release limit. That would be the effective rule if you happened to have access to (say) our 1921 census. (Yes, it's 100y in the UK, not 72...)
GeneJ 2011-11-17T10:04:13-08:00
So, what are the requirements in terms of functionality?

If I'm sharing private me to private me, well, I likely "want it all." If I'm sharing private me to something other than private me (say a shared or public website--something I consider "publishing"), then I prefer some way of restricting transfer of some materials.

I might want to restrict the transfer of ... E-mails, images, letters or lengthy excerpts otherwise subject to third party rights that I have copied/transcribed in whole to my BetterGEDCOM research log or attached as multimedia items.

(The same would hold if those items were represented otherwise as "evidence" items.)