BetterGedcom

DearMYRTLE 2010-11-01T15:11:11-07:00

Non-Technie Viewpoints

This is a GOOD PLACE for the non-techies in the world of genealogy to voice their frustrations with that they are experiencing when importing and exporting genealogy data in their genealogy management program.

DearMYRTLE 2010-11-16T13:33:47-08:00

The following appeared first in my blog entry located at: http://blog.dearmyrtle.com/2010/11/saving-compiled-genealogies-for-future.html

DearREADERS,
Using computers to organize the family history I'm compiling has been a high priority for Ol' Myrt. I still haven't thrown away the family group sheets my real Gramma Myrtle created back in the late 1950s. Her work served as the source for much of my original data. Then, out of sheer curiosity, I branched out and began ordering birth, marriage and death records. Then I moved on to searching US federal census records the old-fashioned way -- on microfilm at the National Archives in Washington, DC, in the days before reliable census indexes. Now, online availability of documents, created at the time my ancestors lived, has greatly enhanced my research without requiring extensive on-site work, except at those localities where the records aren't yet filmed or digitized.

I used to keep everything in binders -- and still do for research.

But with the traveling I am now able to do since retiring from teaching at the local Vo-Tech, I realize that I am often separated from my own genealogy "paperwork", thus retarding my research in progress.

Scanning source documents and attaching them to ancestors in my genealogy management database is a workable solution, particularly when I began using DropBox as the location for my files. Then as soon as I arrive in Virginia and turn on this computer, all the data entry and scanned image attachments I completed at the Salt Lake City home are available.

SHARING WITHOUT DATA LOSS

Peer-to-peer data sharing is vital among genealogists, as is sharing of data between an individual researcher's genealogy software and a website.

While I do favor the collaboration aspects of posting one's tree on the web, I believe individuals should use a software management program with data residing in a DropBox-type resource until data is "good enough" to share with others. This is important during the process where research doesn't yet prove that individuals and family groups are indeed part of one's tree.

WORKSPACE FOR DEVELOPING PROOF ARGUMENTSAs genealogists move back in time, we typically cannot find a specific document listing parents, and turn to other documents (for instance an ancestor's sibling's documents). "Inferential genealogy" is emerging as the term to describe this type of intermediate to advanced research. Using something like a personal blog with links to personally scanned source documents and web resources while developing the proof argument is something yet to be explored fully by the genealogy community at large.

COLLABORATION & PEER REVIEW
Placing one's family tree on the web provides opportunities for critical peer review, but unfortunately, the transfer of attached source documents is an important element now missing in the creation of online trees. Peer review is an important element of the GPS Genealogical Proof Standard.

The current GEDCOM (genealogy file sharing protocol) model does not provide for the transfer of multi-media files in either the peer-to-peer or the researcher-to-website scenario. One must upload each document individually, reattaching to each ancestor mentioned in the document. This is a cumbersome task, repeating the painstaking work done on each researcher's personal computer. Consequently, few are uploading those essential source documents, in favor or spending more time on original research. This makes it impossible to fairly evaluate the reliability of a researcher's tree. It is no better than reading a genealogy book that lacks source citations. And in this digital, open source age, sharing is what it's all about.

KINSHIP AND COMMUNITY
Centering the tree around an individual and family in family group and pedigree-style layouts does not consider the importance of documenting the role that other members of a community have in the life of an ancestor.

For instance, suppose a non-relative (or not-yet-proved relative) "Mr. A" is noted as witnessing known Smith family land records, acting as an executor on several Smith family wills, appearing as a Godparent or sponsor at the christenings of the children of several Smith family siblings and cousins, and posting marriage bonds for several known Smith family ancestors.

Now suppose the researcher discovers two individuals in the same community with the unfortunate same "John Smith" name. The researcher wishes to determine which is the ancestor he seeks to add to the Smith family genealogy. Often through the indirect evidence that only one of the two "John Smith" ancestral candidates has notations on his land record and a marriage bond witnessed by "Mr. A", do we lean toward that "John Smith". The other John Smith appears to have no part in the "kinship or community" as he transacts no legal or ecclesiastical business with the known Smiths in the area where "Mr. A" has also figured prominently.

THOUGHTS
We've got some awesome genealogy software and website developers who have created valuable tools for recording our research, and discovering more in digital archives of original documents.

From an end-users standpoint, sharing data needs to be seamless. We have a difficult time merely doing the research. Sharing what we've unearthed should not be a problem.

I am confident that the talented genealogy developers will accomplish the task of data sharing standardization. No single commercial entity can force the issue as most certainly their own ideas would win out, and take precedence over another programmer's work. Hence the technical work going on over at BetterGEDCOM.

End-users like Ol' Myrt here and my DearREADERS can speak with their buying dollars and facilitate the changes to improve data sharing by purchasing products that adhere to new standards for genealogy data exchange.

WHAT'S A GENEALOGIST TO DO?
Frankly, it is keep up with the current technology. There is no point in worrying about the longevity or the lifespan of data -- just keep changing and updating it during your active research days. And sometime in the distant future, provide for turning it over to the younger set for them to carry the torch.

And DO place copies of all your genealogy data, images, videos and voice recordings in multiple places. Again, DropBox.com will come in handy there, particularly if you share the folder with others in your family association.

Happy family tree climbing!
Myrt :)
DearMYRTLE,
Your friend in genealogy.

VAURES 2010-12-04T22:54:52-08:00

hrworth wrote
We are hopeful that the various genealogy software / application developers will join this community.

I just want to inform you, that a group of 21 German speaking authors of various programs have joint into a vivid discussion on how to use GEDCOM 5.5.1 correctly.
As a simple user I was surprised about the differences in interpretation the GEDCOM draft.
The goal of this group is to offer a loss-free export-import from one program to a next one.
One problem this group cannot meet is the misuse of tags by the users I often experience when I try to import data from a friend.
They advance tag by tag and discuss the different meanings/interpretations of the tag and than the ADMIN-group of the discussion forms rules on how to use this tag for export and for import and than the participants vote on it.
The voting possibilities are
YES I agree
YES I can live with it
NO I disagree (and will not adapt my program.

In case some one is interested watching the group or even adding to the discussion, everybody is welcome, however the discussion is all in German. So if you wish, let me know

VAURES

AdrianB38 2010-12-05T05:12:25-08:00

It would be useful if someone (German speaking, which lets me out) at least monitored the discussion since it may be that it will give us some ideas about which GEDCOM tags need to be understood by an application program.

To explain by recapping my thoughts - if I create a "Burble" event (a nonsense word that would be a user defined custom extension) and put it on a BetterGEDCOM file for export, then so long as the recipient program loads my file and shows the recipient user that "John Smith went through the burble event in New York in 1901" (or something similar), then the recipient program doesn't need to understand what "burble" means - only show the event type, place, date, etc.

On the other hand if I create a custom event "death in military action" (maybe because I want to analyse my military relatives separately), then if the recipient program doesn't understand that this is nearly the same thing as a "death" event, then any software that allows a report showing (say) "Age at Death" will not show any results for my military guys who were killed in action. That's an example of an event that the software needs to understand to allow it to do other things.

This refers to custom events but it could just as easily apply to "official" events that hadn't got into the software.

If we see which tags get the comment "NO I disagree (and will not adapt my program)" then maybe these are (some of) the ones which require software to _understand_ them.

dsblank 2010-12-05T06:44:00-08:00

VAURES,

I would be interested in the 21 German Programmers results, especially as it applies to Gramps. Do you have a Gramps person on the team? Do you have a published set of tests or results?

I don't speak German, but think that your project is very worthwhile. I hope the idea could spread.

-Doug Blank

PS - "Blank" is German, from the Black Forrest region, across the river from Strasbourg.

DearMYRTLE 2010-12-13T07:37:29-08:00

Adrian,
The custom TAGS appear to be a problem, for at least one software engineer who spoke with me at a recent genealogy conference.

He suggested that initially, coders might just line up all the custom codes from each existing genealogy software program, determine the common definitions of each, and create a "standard" code for each one.

Does this make sense?

louiskessler 2010-12-13T07:52:07-08:00

There is a tradeoff.

Too few tags, and you can't do everything you want to.

Too many tags, and everything becomes confusing and tags will be used for the wrong things.

I think current GEDCOM now leans towards the latter.

I think there should be a core set of tags, and then allowance for "TYPE" attribute to further breakdown a general tag, which GEDCOM now has but is not in common use, e.g.

1 OCCU
2 TYPE Engineer

As far as other languages go, a set of translation tables could be made for the same tags for other languages - but unless this is well thought out, it will quickly grow unmanageable and no programmer will want to support it.

e.g. Dates are mostly 13 JAN 2010, but GEDCOM supports other calendars. Other calendars are tricky to implement correctly and not used by most people so most programs don't bother with them.

Let's make sure the standard does not become too onerous for the programmers.

GeneJ 2010-12-13T08:16:18-08:00

Not sure if this is proper Wiki etiquette, but I've pasted below comments from yesterday on the Shortcomings of GEDCOM discussion.

Hi VAURES

Your example is great.

We need to find the best thinking about the sometimes fine balance between consistency and accuracy.

In my main file (uses "tags"), I have 478 tag types. Now, this is a file I've been working with for many years, and I couldn't possibly keep all those lesser used tag types straight.

I know some folks who like to keep census tag type by year. Here in the US, that means they have separate tags for 1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860 ..... and so on. Many of those folks use sophisticate census entry/display systems, which are dependent on the tag being delineated by year.

I have read the GENTECH materials on this topic, and certainly understand the logic of having a set that is agreed upon. On the other hand....

Might not this be an area when innovation could play a roll? Would it be possible to have plug-ins or modules to accommodate particular* needs or user customization? (Did someone say apps?)

Thinking out loud. --GJ

*Dog pedigree, House genealogy, etc.

re: Is the GEDCOM Standard ambiguous?
P.S.

In the program I use, each "tag" is associated with a default narrative sentence. I'm assuming the tags at issue here are those that would create the life story section of a genealogical narrative, where there are few constructs.

I'll be very interested to see how Louis Kessler's word-processing-in-the-program goes. It sounds like a dream come true. Perhaps he'll chime in on how he'd approach this issue.

Separate from tags, this same issue is apparent in dealing with sources.

Even if we can't solve the 478 tag type issue in this go round with BetterGEDCOM 1, we might be able to tackle this issue with sources.

louiskessler 2010-12-13T08:35:01-08:00

So what is wrong with:

1 CENS
2 TYPE 1790
2 ...

or I've seen programs that output:

1 CENS
2 DATE 1790
2 ....

Both are allowed in GEDCOM, although as I noted, not too many programs have taken advantage of it.

Straight out of the GEDCOM 5.5.1 standard:

"The Lineage-Linked GEDCOM Form uses the TYPE tag to classify its superior tag for the viewer. The value portion given by the TYPE tag is not intended to inform a computer program how to process the data, unless there is a list of standardized or controlled line_value choices given by the definition of the line value in this standard. The difference between an uncontrolled line value and a note value is that displaying systems should always display the type value when they display the data from the associated context. This gives the user flexibility in further defining information in a compatible GEDCOM context and the reader to understand events or facts which have not been classified by a specific tag. For example:
1 EVEN
2 TYPE
"

Basically, we need a set of primary tags, and the subsclassification of those tags can be done by the program.

We can't define everything or think of everything. GEDCOM allows for custom tags because there will always be something needed that is not available. I don't see custom tags as evil. They are used in 2% of the cases and most of the time they are reasonably easy to interpret. They just don't fit into the boxes that most programs have available so most programs ignore them or add them to notes. That is not the problem of GEDCOM. That is is the decision of the programmer to relate that 1% of information into their "less important" category.

My plan right now is for my program is to allow the TYPE tag to be the user customization. It will have a place in the current GEDCOM standard to export it. And most other programs will at least be able to import it into their notes field.

Andy_Hatchett 2010-12-13T09:02:27-08:00

Wait! Stop! Hold *Everything!!

5.5.1 is *NOT* an approved standard. It was- and remains- a draft release only and as such should not serve per se as basis for anything.

louiskessler 2010-12-13T09:47:28-08:00

Andy,

In all practicality, 5.5.1 added some very necessary things to 5.5. Half the 5.5.x GEDCOMs out there claim they are 5.5.1 and most developers consider it a de-facto standard.

Louis

louiskessler 2010-12-13T09:48:15-08:00

... besides. Everything I said also applies to 5.5

AdrianB38 2010-12-13T14:56:40-08:00

Re "The custom TAGS appear to be a problem ... He suggested that initially, coders might just line up all the custom codes from each existing genealogy software program, determine the common definitions of each, and create a 'standard' code for each one"

This is potentially mixing up two different uses of "custom tags", and so there's 2 different answers to your question. (Well, 2 or more)

Use 1 is where the custom tag is recognised in the application program as meaning something. For instance, the software that I use has the concept of "Lists", which are, well, they're just lists of things in my data file (e.g. lists of people in my file with something in common, such as World War 1 soldiers). The program recognises them and allows me to jump from the member in the list to the person in the database.

Such a custom tag could well be usefully analysed by the coders as suggested. (So that's answer 1)

Use 2 is where I create a custom event or a custom property / attribute / characteristic / whatever. Whether or not anyone calls these custom tags I don't know but let's say they do, just in case. Then in that case, I do not want the coders trying to interpret what I've created. I just want them to display the value, date, place, address, note, whatever that I put in, unchanged. I don't _expect_ them to understand that my custom tag for "Book" means that this is the title of a book that someone wrote for 3 reasons:
(1) there's no need - just display the data untouched and let me update it;
(2) there's no need - it doesn't impact on anything else anywhere - no need to (say) validate whether it occurs before or after death because stuff can be published posthumously;
(3) someone else might have used "book" to record when someone was booked for speeding!

So answer 2 is that tags for custom event or a custom property / attribute / characteristic / whatever must not be interpreted by the programmers - just preserved.

Answer 3 (see, I did get 3 answers out of 2 cases!) is that, instead of answer 1, this Wiki team (or its formal successors) should come up with all the definitions that are necessary, thus removing the type 1 custom tags. In reality, we're not going to be clever enough to do that so it would be useful if the coders listed their _own_ custom tags and why they created them. Then the Wiki team can see if they should go forward into the BG standard.

DearMYRTLE 2010-11-01T15:11:52-07:00

Ol' Myrt here is not a coder, nor do I play one on the Internet. However, I do not like the problems I've encountered of late when attempting to transfer perfectly good data between my genealogy program and the one my husband uses. See my blog entry:
http://blog.dearmyrtle.com/2010/08/frustrated-by-gedcom-incompatabilities.html

It took a lot of time and effort to type in the data and sources, attached scanned images and such.

I favor a universally-accepted BetterGEDCOM that does away with the problem of stripping out some data during transfer. It should be seamless.

geni-george 2010-11-09T13:37:19-08:00

This thread will be great for feedback. I'm not sure how Geni can help yet, but we have already discussed internally that we'd like to see what the goals are before trying to figure out what resources we can provide.

DearMYRTLE 2010-11-09T14:20:47-08:00

Thanks, George. I am not sure either. But any website that permits the upload of GEDCOM files should be part of the discussion.

hrworth 2010-11-10T09:23:15-08:00

Like DearMYRTLE - I am not a coder, nor do I play one on the Internet. However ... I do help other users of my Genealogy software program work through issues, learn how to use the program, but I have been using the program for a long time. For this discussion the program is NOT the issue, it's how WE share our research between researchers.

There are user discussions on this sharing problem all over the Internet. Between computer based applications, and between web sites and computer basted applications.

This Wiki was created to help us, users, of what every application we are using, to make our needs known, then engage the Techies, like Greg, to help address this sharing of information.

We are hopeful that the various genealogy software / application developers will join this community.

Bottom line, we do want to share our information with others. Controlling What is shared going out, What is shared coming in, without loosing pieces of our research.

The best example is sharing a part of a file, only to have all of the source-citation material not received at the other end.

Thank you,

Russ

DearMYRTLE 2010-11-01T15:15:47-07:00

Techie viewpoints

This is space for the more technical genealogists to voice their opinions and suggestions.

This might include but is not limited to power users of genealogy software, genealogy software developers and genealogy website developers.

geni-george 2010-11-10T16:29:27-08:00

Geni is the same, our service allows you to create profiles, to add documents, and to create a relationship between a profile and a document.

Many profiles can be "related" to a document, and many documents can be "related" to a profile.

I can't think of any other way for this to work.

GeneJ 2010-11-10T23:36:24-08:00

Not a techie, but a user. TMG (The Master Genealogist) will allow you to enter data, but then go back and add sources and create citations. Unfortunately, most of the sources I enter in TMG look great when developed within that program, but when filtered through a GEDCOM, the data at the other end looks l bit more like a mangled serving of spaghetti.

GeneJ 2010-11-10T23:36:25-08:00

hrworth 2010-11-11T06:24:38-08:00

SueYA,

My only comment about "going back to the drawing board" is the embedded base.

I am a User of a software package, and that is where I am coming from. There are many researchers who currently use programs that allow us to Share our Research. We have the capability to Export to a GEDCOM file.

I would hope that where ever BetterGEDCOM goes or ends up, the existing base is not ignored.

Thank you,

Russ

GeneJ 2010-11-11T08:19:12-08:00

Again, a non-tech user here.

I don't think any of us want to "break" content. As such, I expect software that supports existing GEDCOM to continue to support that export/import mechanism in the way it does today.

For a lot of us, me included, existing GEDCOM doesn't allow me to share data fully and accurately. While I "can" create and share a GEDCOM, its contents would like distort/mis-communicate a fair amount of research. Alas, I already shouldn't and don't create GEDCOMs as a way of sharing my research.

GEDCOM was built before the publication of Elizabeth Shown Mills, _Evidence_ and _Evidence Explained_. As well, GEDCOM predates publication of The Genealogical Standards Manual_, 2000.

Existing GEDCOM was developed in 1996. Windows 95 had just been released; it's a good bet that lots of us were still using Windows 3.1. Apple Computer bought NeXT in 1996, bringing Steve Jobs back to the company's ranks as a "special advisor." The iMac wasn't introduced until 1998, and we saw the first "intel Mac" computers when, in 2006?

In 1996, my access to genealogical information on the Internet was growing rapidly, but it was limited by today's standards; much represented transcriptions of original sources. Today, I access a broad range of sources via the internet as digitized originals!

I think _Family Tree Maker_ was still owned by Broderbund Software; I was probably still using FTM v3.0 in 1996.

I now use an Intel Mac, on which I have installed a virtual machine. I use a variety of genealogical software programs on both the Mac and the virtual (Windows) side of my computer.

Back to the point of your posting, Russ, the "installed base." From this user's perspective, our industry standards should specify that software fully and accurately "round-trip" it's export (whether that be existing GEDCOM or BetterGEDCOM).

That point also seems to support the need for underlying industry "technology standards" or "best practices" more global than just BetterGEDCOM.

SueYA 2010-11-13T01:03:33-08:00

Hi Russ

Yes, we will need to be able to translate to and from GEDCOM for a long time to come, but we should not let that dictate the development of new standards.

Sue

hrworth 2010-11-13T04:56:57-08:00

Sue,

For my education, please clarify the development standards that I was trying to dictate. I think that all I was trying to suggest is that the BetterGEDCOM may have to, or should acknowledge that the "old" GEDCOM will be around for a while. So that any development of a BetterGEDCOM needs to address the "how to handle" the old.

If my software hasn't embraced the BetterGEDCOM, will that keep me from sharing my research with another User whose software does embrace the BetterGEDCOM.

All I am trying to do, is to Share my research. In my simple mind, the BetterGEDCOM is the vehicle for this sharing to happen.

Russ

DearMYRTLE 2010-11-17T05:35:52-08:00

Sue, I agree that BetterGEDCOM or whatever it will be called cannot dictate any single point of view.

That being said, Lackey's source citation work has been eclipsed by Mill's Evidence Explained.

We researchers have certainly learned that it is ill advised not to have the "file sharing" mechanism keep up with the advances in software and website development.

ttwetmore 2010-11-17T07:16:56-08:00

Concerning the continued use of Gedcom in the glorious BetterGedcom world to come...I think a few things are so clear that they can be stated unequivocally.

The BetterGedcom data model will be so rich that any system whose database is based on it will be able to import a Gedcom file without loosing any information.

Any system whose database is based on the BetterGedcom model will be able to export a Gedcom file that contains the lion's share of all the information that is in the BetterGedcom format, certainly all the truly significant genealogical events, but obviously it will have to use a simpler data model (just persons, families and sources for all practical purposes) and all the reformatting and shuffling around of information that that implies. NOTEs are your friends but should be used as sparingly as possible.

It is easy to support Gedcom import and export, and one should not fear that the next generation of BetterGedcom programs, shaking the foundations of the genealogical world to its very core, will eschew its support.

Tom Wetmore

GeneJ 2010-11-17T07:19:49-08:00

Warms my belly; makes me wanna sing with joy! --GJ

hrworth 2010-11-17T07:23:33-08:00

Tom,

Awesome - GeneJ, Happy Dance to follow.

Tom, thank you for your contributions to date and I look forward to all the work ahead of us. This is a community effort.

Thank you,

Russ

VAURES 2010-11-22T23:00:51-08:00

SueYA wrote

“As genealogical data is complicated, this is not a small or easy task. So, we need to identify some (fairly) discrete issues.”

It's not the genealogical date that's complicated it's the living persons and their individual wished that makes it complicated.
The simple data of one person would be date and location of birth, marriage, and death confirmed by sources. There is a need to add parents and possibly children with the same dataset to form a tree.
The rest are add-ons that are individually different.
There may be an interest in profession/occupation and much more.
To my knowledge no-one has ever thought of anthropometric data.
The problems observable in GEDCOM files are caused by the genealogical data collecting ideas of the makers and in some occasions of the users of the programs.
GEDCOM 5.5.1 offers a good way of arranging database provided the authors of programs would read it carefully and follow its rules.
Since the beginning of this year 21 German speaking authors of programs discuss the meaning of the different GEDCOM tags and agree on their meaning and use, which results in much better import-export qualities.
Ma advice to a betterCEDCOM would be:
Read the draft carefully, follow it and thereafter you may add some improvements like the various ways of getting married (Christian, Hindu, Jewish and others)/(church, temple, sheriffs office)
or of names (especially in Germany) Anna Adele Auguste Clementine Elisabeth Charlotte von Loewenstein zu Loewenstein is given in the birth and baptism document.
Later the normal use was “Elisabeth Charlotte von und zu Loewenstein“ or short „Leu v. Loewenstein“
This could be split ito given names, used name(s) and nickname (or AKA).
So far we don’t see this possibility in the GEDCOM draft 5.5.1.

Take care
Wulf

DearMYRTLE 2010-11-02T16:06:02-07:00

From: Dear MYRTLE <Myrt@DEARMYRTLE.COM>
Date: Tue, November 02, 2010 6:59 pm
To: GEDCOM-L@LISTSERV.NODAK.EDU

I've studied your comments Thomas, and favor the thought process
instigated by your comment "some scheme of probabilities would have to
be devised to represent the researcher's confidence in the
evidence-observation as a whole and to its various component parts - as
well as any external 'assertions'."

Without significant user-based pressure, software coders have no
incentive to devote time developing a betterGEDCOM or adhering to it.
Otherwise, the project would have concluded, and we wouldn't be having
this discussion.

To make the project DOABLE, how about mutually defining a betterGEDCOM
format for file transfer among researchers with industry-wide software
specific and user-created data field capability; consideration of
multi-media and multi-language formats; and the ability to rate each
event's supporting source document's perceived reliability with an eye
to rating multiple elements of each document's perceived reliability?
(Several existing programs consider the reliability of the document as a
whole, not delving into the component parts which can be quite disparate
in reliability. An elderly man's death certificate comes to mind, where
in addition to the attending physician's signifying cause of death and
date, the document lists his parents, their birth places and such,
clearly not within his purview to assert.)

Genealogy software is defined as useful for recording of proved,
disproved, biological, adopted, inferred, and highly suspected (etc.)
lineages.

This would assist 95% of the genealogy researching public. That number
represents a reasonable market for the use of a programmer's time.

Once a betterGEDCOM file is adopted, genealogy websites would have to
move to accommodate the newer format for receiving data uploads and
permitting downloads. Ancestry.com, FamilySearch.org and the like cannot
impose unique rules.

As a SEPARATE ISSUE, think of a workspace for developing research
arguments until a conclusion is reached; particularly useful in the
absence of a clearly related document-to-ancestor scenario about life
events and/or relationship to parents. This would be something like a
blog with with hyperlinks to supporting digital copies of ancient
documents in one's personal Picassa web album or DropBox folder, shared
or not as personal interests dictate.

Once results in the workspace qualify an individual or family group for
one of the accepted categories (proved, disproved, biological, adopted,
inferred, and highly suspected...) the researcher can enter data in his
genealogy management software with links to the workspace.

I'd venture to say only the cream of the crop attempts inferential
genealogical research. Consequently there is little impetus for coders
to spend time devising a file transfer scheme for this data set.

Taking into consideration documenting 'kinship and community' is a
distinctly different approach than merely documenting one's direct
lineage including collateral lines. I see 'kinship and community'
discussions as part of the 'workspace' strategy. Whether to incorporate
those individuals who surround but are not directly related to one's
ancestors e is a personal choice, unless the betterGEDCOM file transfer
takes into account a database listing those who witness wills or serve
as Godparents at christenings, etc.

As we move to the cloud, let's not forget that most genealogists prefer
to have the best copy of their genealogy on their personal computers.
From a practical standpoint, inspired genea-bloggers and instructors
have a history of being adept at training folks to use new versions of
genealogy software in ways not previously considered by typical
researchers.

Let's ensure that 21st century genealogy software won't impede the ready
exchange of data among conscientious researchers.

Happy family tree climbing!
Myrt :)
DearMYRTLE,
Your friend in genealogy.
Myrt@DearMYRTLE.com
http://blog.DearMYRTLE.com

rjseaver 2010-11-09T14:15:02-08:00

This sounds good to me, but isn't this really complicated?

Didn't a GENTECH committee try to do this ten to fifteen years ago and fall by the wayside? Has there been an attempt to contact persons on that committee to see if they can contribute to this effort? If not, there should be - we need all the help we can get and knowing what others tried to do would be informative and perhaps crucial, and might help the group master the learning curve.

My 2 cents -- Randy

geni-george 2010-11-09T14:17:03-08:00

I'd argue that information science has advanced enough in the past fifteen years that most of what they tried would be completely irrelevant. That said, it might be nice to have some notes.

DearMYRTLE 2010-11-09T14:24:17-08:00

DearRANDY,
I believe that genealogy software engineers on and off the web are acutely attuned to meeting the needs of their users.

However, as individual users group discuss problems, it never gets beyond the challenges of the fact that GEDCOM was originally developed by coders at FamilySearch, and they weren't updating their own product.

After 14 years -- its probably time for end users to pull the diverse talent together saying "THIS is what we need to get our research sharing done!"

I agree with you about the GENTECH committee contacts. Most of them who are still alive were included my initial mailing. I am sure the news of this group will spread.

M

DearMYRTLE 2010-11-09T14:26:30-08:00

I couldn't agree more, George. On the other hand, we don't want to throw out the baby with the bath water.

I am a power genealogy website and software user, not a techie.

I am sure that the detailed discussion of developing a new standard, releasing it in manageable models will prove useful to all concerned.

Then, for once, we'll be comparing apples to apples, and not oranges.

greglamberson 2010-11-09T14:27:17-08:00

I think GenTech was working from a theoretical standpoint, at least initially.

Several efforts have tried to do something to revise or replace GEDCOM, but none has succeeded for various reasons. ALL of their experience is certainly valuable to this effort, so hopefully a lot more of that content will be added shortly. We intentionally started out a little sparse on things like that to give those with real knowledge about it a chance to add information from firsthand knowledge rather than anything extensively rehashed.

Is this hard? OF COURSE!! However, every journey begins by placing one foot in front of the other, and that's what we're doing here.

One thing is for sure: Nothing will happen if everyone sits around and hopes something will change without any effort. We believe that by continuing to move forward that we will improve GEDCOM. Only by stopping to reach our goals will we fail.

xvdessel 2010-11-10T02:56:09-08:00

For those of you who want to have a look at the many failed attempts to improve GEDCOM in the last 15 years, there is a summary in the Wikipedia article on GEDCOM, including links to the relevant documentation.

SueYA 2010-11-10T15:26:13-08:00

I think we should be going back to the drawing board, rather than trying to improve GEDCOM. I believe that the data model used by GEDCOM is fundamentally flawed. The GENTECH model made a stab at re-thinking the data model, which we would be foolish to ignore.

Once we have got the data model right, we might be able to write software that better serves our needs.

As genealogical data is complicated, this is not a small or easy task. So, we need to identify some (fairly) discrete issues.

From what I can make out, most genealogy programs, due to the influence of the GEDCOM model, work from presumed relationships, rather than from the sources (e.g. documents etc.). I think The Master Genealogist and perhaps GRAMPS at least claim to use a data model based on sources. Does anyone know if this is true?

dsblank 2010-11-10T15:41:42-08:00

Gramps does not require one before the other. One can easily add sources, then add items that use sources. Or one can later can add sources to existing items.

Does that help?

GeneJ 2010-11-02T09:28:36-07:00

What are "our" electronic guidelines or standards?

There is great need for a BetterGEDCOM, but seems this should be developed within the framework of industry electronic guidelines or "best practices."

Wouldn't "we genealogical users" have more influence on the development of current and future technology if our needs were advanced within such a framework? (Said, as I glare at multiple file cabinets and boxes of CD's in formats my 'puter no longer reads.)

greglamberson 2010-11-02T09:52:19-07:00

Answer: Yes.

So far the very idea you put forth is the framework I personally have been shooting for.

This new standard is a very technical undertaking, and I liken what currently exists to a vacant house yet to be filled with furniture. The walls and other structure have been built, and now we are basically ready to start moving in.

The content and discussions that we will need in developing this standard are probably going to be unintelligible to most genealogy users, but the overall structure for a commonsense discussion of our goals has been put in place. This structure has been added so that it, rather than the technical aspects of this project, can drive the discussion.

The BetterGEDCOM Sandbox area is where the work will be done, and it is essentially empty. This has been done on purpose. However, participation by software developers and the like will quickly envelop the other discussions. The important thing is to make sure everyone knows who's running the show.

Does this help address your question?

GeneJ 2010-11-09T12:50:47-08:00

Congratulations, Greg!

We might start by seeking volunteers for the project from various certifying and/or standards boards around the world. While some such volunteers might be technology experts, it's their genealogical expertise we would be seeking.

So exciting!

dsblank 2010-11-10T05:17:23-08:00

Comments from a Gramps developer

Greetings BetterGedcom people!

I'm writing from the perspective of a Gramps developer for many years. I'm not representing Gramps, just myself, but I know my community pretty well. Some comments:

1) We would love to have a standard that everyone conforms to, whatever it is! We are with you 100%. We don't (generally) support any ad hoc extension to Gedcom, as that would be a moving target, and entire project unto itself. We read/write official Gedcom, even if that means losing data. For lossless use, we have Gramps XML.

2) You are welcome, of course, to take and build on our open source contributions (given that they adhere to the open source requirements).

3) A really great standard is worthless if no one uses it. Conversely, a standard's utility will be hard to gage without it is being used.

4) I don't think any of the commercial entities will use a new standard, unless they have to. Why would they? People could get their data out of their system and use a competitor's. Blaming bad Gedcom is a convenient excuse. The only way one could pressure commercial groups is by not buying their products.

5) Developing a freely available API is great, but does not replace the need for a file format. Gramps will probably support such an API someday, but as an additional convenience.

6) People (even genealogical experts) pick their genealogical applications for very personal reasons. The fact that an application supports a particular file exchange format will barely register in their decision making.

7) I *think* that the rest of the Gramps developers would be happy to have Gramps XML as an ISO standard. I would certainly do all that I could do to help.

8) You all should get a mailing list. This is no way to carry on a conversation. I *suspect* that the Gramps infrastructure would be happy to provide resources for your work. For example, you need a long term solution for hosting, discussing, and developing your work. Some random free wiki is not a long term solution.

-Doug

greglamberson 2010-11-10T05:43:00-08:00

1 & 2) That's great, thanks. If there is consensus around using GRAMPSXML in some way, I am sure you will be aware of it.

3 & 4) Clearly any work done will be useless sitting on a shelf. I think we have a good chance of actually getting something done. We have good indications from many software vendors that they are ready for something new. Now is the time to get some work done, though, and as they say, talk is cheap.
5) We feel strongly that an archival file format is a core user need, albeit the basis for an XML API can somewhat easily be adapted for this purpose.
6) That may be true for some people, but I can assure you this concern/issue is the absolute most important issue or nearly the most important issue for a whole lot of people.
7) I encourage you to look at AIIM.org and examine both the sort of work they do generally and what they have recently done more specifically with standards such as StratML.
8) How do you feel a mailing list is different than discussion pages on a wiki like this? I see no difference except that you can actually see what was recently said easily, and that a wiki has a space to develop and change what is said via a summary page. Regarding infrastructure, we will almost certainly need to change infrastructure at some point, including things like a development space and probably a more robust wiki. We will remain independent, however, so we would not be interested in cohosting with GRAMPS in any way, just as we have turned down offers from other entities. I don't think anyone considers this a long-term solution.

I very much look forward to your contributions and those of your community.
Greg Lamberon

dsblank 2010-11-10T06:49:05-08:00

Greg,

Answering in a slightly different order. You said:

"""8) How do you feel a mailing list is different than discussion pages on a wiki like this? I see no difference except that you can actually see what was recently said easily, and that a wiki has a space to develop and change what is said via a summary page. """

This is, of course, the least important of my points, but still an important one. Even as I try to respond to you, I have to manually cut and paste your comments. I am typing into a little box with some unfamiliar text formatting. Email is designed for exactly this purpose, and shows up on my device, whatever it is. Most listserves have archives. What happens when you want to change wikis, do your conversations go away? Having a separate system for conversations is important for long term projects. On a related point, email also has archive formats, and listserves use those.

Ok, now to the main points:

"""1 & 2) That's great, thanks. If there is consensus around using GRAMPSXML in some way, I am sure you will be aware of it."""

I'm not aware of any consensus. PhpGedcom has in the past supported an importer for Gramps XML, but I don't believe that is anywhere near current. I think we are the only system that uses it. Perhaps you mean something else?

"""3 & 4) Clearly any work done will be useless sitting on a shelf. I think we have a good chance of actually getting something done. We have good indications from many software vendors that they are ready for something new. Now is the time to get some work done, though, and as they say, talk is cheap."""

Exactly. If you had people from any of these vendors on your board of directors, I would have some hope that this project will have an impact.

"""5) We feel strongly that an archival file format is a core user need, albeit the basis for an XML API can somewhat easily be adapted for this purpose."""

What is an XML API? But doesn't an API still require some kind of server?

"""6) That may be true for some people, but I can assure you this concern/issue is the absolute most important issue or nearly the most important issue for a whole lot of people."""

You are preaching to the choir on this one, as they say. Having our data in a manner that we control is the most important point for the Gramps team. Most of us would rather suffer through years of painful development and debugging than use a commercial product. But I'm talking about everyone else. Regular people :)

"""7) I encourage you to look at AIIM.org and examine both the sort of work they do generally and what they have recently done more specifically with standards such as StratML."""

Took a look at that (Strategy Mark Up). Yes, impressive. But that will have a huge impact on groups with a lot of resources and money. And has a government mandate to make it happen (eg, "do it or you don't get a US government contract"). Genealogy is tiny, and is splintered by competing interests. Look at the trouble Open Office has had in making their XML a standard. Hint: the vendors fought back.

"""8) Regarding infrastructure, we will almost certainly need to change infrastructure at some point, including things like a development space and probably a more robust wiki. We will remain independent, however, so we would not be interested in cohosting with GRAMPS in any way, just as we have turned down offers from other entities. I don't think anyone considers this a long-term solution."""

Ok, but I think you might be a bit confused about the Gramps community. It is independent. We don't consider ourselves to have a product that competes with anything. We are a bunch of individuals that work together to make an open, useful system. For example, we also have a CD that contains many other free programs. We are not an entity that has any external motivating forces. For example, we encourage people to use other open source software (such as Linux), but our software runs everywhere.

In addition, the Gramps community will probably be your first, and best evidence that that whatever you end up standing behind actually works. We have a large number of people exactly interested in what you are proposing (and, in fact, we created our best stab at it already).

I can see that others might perceive us as competitors, and so you might want to change the name "Gramps XML" to "Open Genealogy ML" or something. But you could still work with us, and we could support a completely independent body. In fact, we have many developers that have many future ideas of where our own format should go. We have what we call Gramps Enhancement Proposals, and many of them have file format connections.

-Doug

ttwetmore 2010-11-10T06:50:30-08:00

Two Comments for a Gramps Developer.

1. How can we see the GRAMPS Schema?

2. I just looked at the GRAMPS wiki and it seems that GRAMPS does not support genealogical research by not distinguishing between evidence and conclusions. This seems clear by your notion of PERSON MERGING. It seems (correct me if I am wrong) that each person in GRAMPS is expected to represent a different real person, and that as soon as two persons are found to be the same real person the two records are DESTRUCTIVELY MERGED into one person, destructively meaning you can't go back. When third or fourth persons are merged into these growing persons the whole history of research is lost. Another symptom of this is the fact that it seems (again correct me if I am wrong) that events are not records in their own right, but are associated with person records.

The GRAMPS destructive merging is nearly the same as the one I used in my LifeLines program eons ago, except that LifeLines allowed the user to pick and choose what to keep from the two records, instead of applying fixed rules like GRAMPS does. Thinking about the implications of destructive merging was one of the main influences that caused me to start working for a better genealogical data model that supported evidence as well as conclusions.

My hope is that the better Gedcom effort will begin with a more complete data model that clearly distinguishes between person records that are tied directly to evidence and individual records that are constructed from the person records that the research believes represent the real person. The original person records should NEVER be destroyed because these records are YOUR RESEARCH and destroying them obliterates your research from your computer.

Tom Wetmore

ttwetmore 2010-11-10T08:52:08-08:00

I had never tried to use Gramps before so I just downloaded the latest stable version to my Intel iMac with a 2.66 GHz Core 2 Duo processor. I wanted to check whether I could find a way to use Gramps in an evidence and conclusions mode, since with LifeLines, which is a conclusions only program, I have found ways to fudge it. I created a database and I'm now importing my master database from a Gedcom file. That file has 13,796 persons and 5,102 families. I started the import at 11:22. The import is now 10% done and it is 11:41. That means it will take 190 minutes (over 3 hours) for Gramps to import my data. FYI, it takes 20-year-old LifeLines 15 seconds to import that data, and it takes DeadEnds (my current program under development) and GedEdit (John Nairn's Mac-only genealogy program) both about 3 seconds to import that data. Wow. I think I will let the import continue even though I can hear my poor disk working its heart out. I hope the user experience will reward my patience!

Tom Wetmore

dsblank 2010-11-10T09:07:16-08:00

Tom asked:

"""
1. How can we see the GRAMPS Schema?
"""

Gramps itself doesn't use a relational database. But, we have written importers and exporters to SQL-based systems. You can see the relational schema for those here:

http://www.gramps-project.org/wiki/index.php?title=GRAMPS_SQL_Database
http://www.gramps-project.org/wiki/index.php?title=GEPS_013:_GRAMPS_Webapp

"""
2. I just looked at the GRAMPS wiki and it seems that GRAMPS does not support genealogical research by not distinguishing between evidence and conclusions. This seems clear by your notion of PERSON MERGING. It seems (correct me if I am wrong) that each person in GRAMPS is expected to represent a different real person, and that as soon as two persons are found to be the same real person the two records are DESTRUCTIVELY MERGED into one person, destructively meaning you can't go back. When third or fourth persons are merged into these growing persons the whole history of research is lost. Another symptom of this is the fact that it seems (again correct me if I am wrong) that events are not records in their own right, but are associated with person records.

The GRAMPS destructive merging is nearly the same as the one I used in my LifeLines program eons ago, except that LifeLines allowed the user to pick and choose what to keep from the two records, instead of applying fixed rules like GRAMPS does. Thinking about the implications of destructive merging was one of the main influences that caused me to start working for a better genealogical data model that supported evidence as well as conclusions.

My hope is that the better Gedcom effort will begin with a more complete data model that clearly distinguishes between person records that are tied directly to evidence and individual records that are constructed from the person records that the research believes represent the real person. The original person records should NEVER be destroyed because these records are YOUR RESEARCH and destroying them obliterates your research from your computer.
"""

Gramps is agnostic as to one's methods of use. It allows many different objects to be the same person; however, it currently has no method to represent that fact. We are currently evaluating a global ID mechanism to do this. There are no special requirements, and no support for, evidence vs. conclusion.

The "destructive merge" is only for those appropriate uses where a user really wants to collapse two objects into one. Most of the data is not destroyed, but rather merged. For example, all events are retained. Data like ID numbers will have to decided which will be retained, as each person object is only allowed one.

Hope that helps,

-Doug

dsblank 2010-11-10T09:18:43-08:00

Tom said:

"""
I had never tried to use Gramps before so I just downloaded the latest stable version to my Intel iMac with a 2.66 GHz Core 2 Duo processor. I wanted to check whether I could find a way to use Gramps in an evidence and conclusions mode, since with LifeLines, which is a conclusions only program, I have found ways to fudge it. I created a database and I'm now importing my master database from a Gedcom file. That file has 13,796 persons and 5,102 families. I started the import at 11:22. The import is now 10% done and it is 11:41. That means it will take 190 minutes (over 3 hours) for Gramps to import my data. FYI, it takes 20-year-old LifeLines 15 seconds to import that data, and it takes DeadEnds (my current program under development) and GedEdit (John Nairn's Mac-only genealogy program) both about 3 seconds to import that data. Wow. I think I will let the import continue even though I can hear my poor disk working its heart out. I hope the user experience will reward my patience!
"""

Yes, we know about this issue, and we hate it! Remember, though, that this isn't a product, per se, it is a collaboration of many volunteers donating their time and energy. This particular issue is due to the fact that we are using an entire stack of open source software. Apparently, some combinations of OS, version of Python, and the database layer suffer this slowdown (eg, this isn't a Gramps issue) [1]. We are (constantly) working to make the Gramps experience better, but it is a community effort. Feel free to join the community, and make it better.

[1] - Comments from:

http://www.gramps-project.org/bugs/view.php?id=3750

The issue of incompatibility under Mac OS X is in
https://trac.macports.org/ticket/24310.

The issue of slowness is also in
https://trac.macports.org/ticket/23768 for Mac OS X.

I have raised [http://bugs.python.org/issue8504 a bug with python] "issue8504 bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6" for this.

greglamberson 2010-11-10T13:46:12-08:00

Doug,

Thanks for the exceedingly useful, detailed input. Let me try to address some of your questions and comments:

referring to your comment at what shows to be 10 Nov 8:49 AM:

re: 8) All I can say at this point is that a mailing list would be a redundant tool, and we're going to have a hard time keeping up with just the wiki. We won't be adding a mailing list any time soon, I don't think.

1 & 2) Let me put my original answer another way: We're not yet ready to address the use of ANY data model. If consensus build around using GrampsXML, I am sure you will be aware of that consensus as it develops. Stay tuned.

3 & 4) We have no board of directors. We are completely informally organized at this point. Were we to formally organize, I feel we would be very likely to garner very significant support. We intentionally have no affiliation with any other group of any sort right now, however. This may change as we explore standardization. Mention of a Board Of Directors makes most of us just giggle right now.
5) An XML-based API. The GrampsXML specification is in fact an XML-based API, although you may not think of it as such. There are certainly other vendors out there with similar projects (like FamilySearch).

8) I certainly don't think of GRAMPS as a competitor, but certainly more of a project with very similar goals. However, you do have a project that is available with its own data model and format, and even if we were to adopt your open-source work wholeheartedly, that's not how we're going to begin. I do a lot of work with open source projects, so it's not like I don't understand what you guys are about or what you're doing. Basically, we are starting from a point at which everyone is invited to give input. We've been public for right about 24 hours. We're not even eating solid food yet.

Reading down, I would almost pay to see you and Tom discuss things back and forth. You guys are exactly the folks we need to hear from!

dsblank 2010-11-10T14:44:32-08:00

"""Thanks for the exceedingly useful, detailed input."""

Greg,

No problem! If you have specific question, please feel free to ask me, or to join the Gramps mailing list. As we have already done most of what you are aiming to do, I probably won't pay a whole lot of attention, until you catch up.

Some other comments below:

"""re: 8) All I can say at this point is that a mailing list would be a redundant tool, and we're going to have a hard time keeping up with just the wiki. We won't be adding a mailing list any time soon, I don't think."""

I think the key phrase is "hard time keeping up". Email would make that easier, in my opinion. It is really hard to respond to a point, or to have a threaded conversation, with this.

"""1 & 2) Let me put my original answer another way: We're not yet ready to address the use of ANY data model. If consensus build around using GrampsXML, I am sure you will be aware of that consensus as it develops. Stay tuned."""

I think that you will find that you need to consider the data model. At least in the abstract. Otherwise, it seems hard to imagine how things will fit together.

"""3 & 4) We have no board of directors. We are completely informally organized at this point. Were we to formally organize, I feel we would be very likely to garner very significant support. We intentionally have no affiliation with any other group of any sort right now, however. This may change as we explore standardization. Mention of a Board Of Directors makes most of us just giggle right now."""

I just hate to see all of your energy wasted. We started in a similar manner almost 10 years ago, and have exactly zero adopters of our format (outside of ourselves). I think that there are lessons to be learned with Gramps.

"""5) An XML-based API. The GrampsXML specification is in fact an XML-based API, although you may not think of it as such. There are certainly other vendors out there with similar projects (like FamilySearch)."""

Usually, API refers to a live, functional interface (requires/provides). XML is description of data, and the data. Gramps has an internal API, but not one for exchanging data.

"""8) I certainly don't think of GRAMPS as a competitor, but certainly more of a project with very similar goals. However, you do have a project that is available with its own data model and format, and even if we were to adopt your open-source work wholeheartedly, that's not how we're going to begin. I do a lot of work with open source projects, so it's not like I don't understand what you guys are about or what you're doing. Basically, we are starting from a point at which everyone is invited to give input. We've been public for right about 24 hours. We're not even eating solid food yet."""

Ok. But most everything that your group is hashing out, we have either already done, or have a plan to include. And a lot more. For example, in has taken 10 years for us to figure out the appropriate manner to represent names. And dates. We have the most sophisticated representation of dates of any system that I am aware of (for example, alternate calendars, alternate new year's days, callnames/rufnames, multiple surnames, patronymics, etc.). We have a number of developers from all around the world. We wouldn't be able to come up with general solutions without input from all over the world.

"""Reading down, I would almost pay to see you and Tom discuss things back and forth. You guys are exactly the folks we need to hear from!"""

Well, I wish I had the time to invest in participating, but we have solved most of these issues, and we have a working system that takes advantage them, to boot.

I wish you luck!

-Doug

greglamberson 2010-11-10T16:05:58-08:00

Doug,

I am pretty sure Tom's work on this very subject predates GRAMPS by at least 5 years if not 10. Tom also has have a working solution to most of these issues.

I think you're mistaking our blank-slate approach. We feel we can catch up pretty quickly by simply evaluating the work you and many others have already done and adopting similar approaches.

No one is saying we're not going to consider a data model, but just maybe we can get by with putting that off til next week. In the meantime, is there some reason that discussions such as this one can't take place in the meantime? This certainly isn't a strictly linear process.

Anyway, frankly, I would think you would jump at the chance to compare notes with Tom Wetmore regardless of the level you perceive the rest of us to be at.

dsblank 2010-11-10T16:50:09-08:00

Greg,

Yes, it does look like LifeLines predates Gramps, by exactly 10 years. However, Gramps is still under active development---perhaps more so now than ever. Over the last 10 years, the Gramps community has grown into a very rich set of experiences to draw from, and that is a valuable resource.

I don't mean to be dismissive of you or your energetic, enthusiastic team, but as volunteers, we only have a limited amount of time to invest in genealogy. As a group, we have been discussing these issues every day/week for ten years. We only make a decision or add to the data model when absolutely necessary. But I'd be glad to discuss the weakness of our current XML. I bet many Gramps developers would too.

So, with limited amount of time, I'll probably spend that time working on unsolved issues, rather than starting from scratch.

ttwetmore 2010-11-10T18:25:12-08:00

The first version of LifeLines was around 1990. I made it open source in the mid 90's when my job changed and I no longer had access to UNIX machines (LifeLines is a UNIX program written in C during the wonderful 17 years I worked for Bell Labs). It was strictly a personal project but I let some friends know about it and before I knew it there were users all over the world. (It turns out that many UNIX geeks are closet genealogists.) Its user interface was and still is abysmal, using the UNIX curses library for an 1980's style windows interface. Its main claims to fame are the programming feature, which is powerful, and the fact that it can store anything in its databases as long as it looks like Gedcom. If it weren't for the programming feature I'm sure LifeLines would be a disappearing footnote at this point, which in fact, it is.

The main connection between LifeLines and Gedcom is that I wrote a custom B-Tree database for LifeLines that used pure Gedcom as its record format. Importantly I used Gedcom SYNTAX, but wasn't strict about Gedcom SEMANTICS. So I could avoid all controversies about Gedcom versions and standards. LifeLines can read Gedcom 3, Gedcom 4, Gedcom 5, custom Gedcom, basically anything that looks like Gedcom it will happily read and store, unchanged and unchallenged. Records can be as large or as deep as you like. You can make up your own tags. You can even invent your own custom kinds of records. The only Gedcom semantic rules I enforced were to honor the special meanings of the few tags that held lineage-linking information (INDI, FAM, SEX, FAMC, FAMS, CHIL, HUSB, WIFE, but only when they were found in the right contexts). All other tags could be used anyway one liked. If you chose to use the standard event tags and a few other common tags (BIRT, DEAT, MARR, CHR, BURI, DATE, PLAC, NOTE, SOUR) then LifeLines would handle them in the expected manner, but if you didn't use them or used them in non-standard ways, no sweat. If it adheres to Gedcom SYNTAX LifeLines figures you're the boss. Of course the problem this creates is that different users using different non-standard conventions and then trying to share LifeLines programs to process their data or generate reports from it. This rules cop-out was an early decision on my part and I decided the advantages outweighed the disadvantages. It was important to me at the time to demonstrate that Gedcom SYNTAX was an excellent database record format. This was before XML, and XML has since demonstrated exactly the same thing. When I wrote LifeLines I was embroiled in some discussions about the best database technology to use for genealogical databases, and my contention was that a relational database, which seems to be everyone's obvious first choice for a genealogical database, was not the best choice at all. I have never used a relational database for genealogical software. In order to prove the point to myself as well as to those I was arguing with, I decided on the B-Tree/Gedcom syntax approach. Relational databases always come with serious restrictions, and when the data that one wants to store in relational databases is as irregular and strange as genealogical data often is, these restrictions can be critical. Try putting "the first Tuesday in May or June in the year Tony came home from the war, probably in 1921" in the date field of a Family Tree Maker (or any other of your favorite systems) and see what happens. Give that date to LifeLines and LifeLines says yum, yum, what's your problem (and LifeLines will see that 1921 in there, and if it has to estimate that date on some user interface window or in some program it it processing will use it). Databases based solely on well-indexed Gedcom (and now XML) records suffer no restrictions and can be used to store things that were never anticipated at the time the database was created. These are mega issues for software design in my opinion.

There really are some points that apply to better Gedcom hidden in all this. For example flexibility and extensibility is critically important. For example, designing value formats so they will adhere to some future relational database formatting rules should NEVER be done.

I didn't mean to sound disparaging to Gramps before (I am after all a Gramps myself). I am anxious to see how it works and learn from it.

Tom Wetmore

jbbenni 2010-11-10T06:02:50-08:00

XML for Multimedia -- pure XML versus efficient multimedia

Multimedia (MM) files are typically large and binary. Embedding MM into BG could certainly be accomplished by encoding it for UTF8. But that has drawbacks. Most obviously, large binary files get larger with encoding. Video is already really big. A more subtle drawback is the problem that a BG file full of encoded MM might be XML but it quickly loses many of the advantages people expect from XML. It becomes difficult to edit, process, and is no longer human readable (in a practical sense).

So should BG have a provision for including MM by reference (rather than by value)? That solves many problems but creates a few more. It means BG must be expanded beyond a pure XML standard. XML is great for the content typically found in GEDCOM, but something else would be needed as a container for referenced external files. (Think ZIP, TAR or something comparable.)

The purist in me would like to keep life simple and opt for pure XML, but if BG files are bloated with encoded JPEG and MPEG content, they'll start to suck. So the pragmatist in me would opt for XML for the GEDCOM type data, with references into a container for binary files like MM. But I can see arguments on both sides.

What does the BG community want to do with MM -- encode and embed it, or externalize it and include by reference?

brantgurga 2010-11-10T07:25:30-08:00

In a goals discussion, I expressed the same concern. I'd think an approach similar to what the OpenOffice.org or Microsoft Office formats now do. The file you transfer around is not an XML file. It is a ZIP (or other archive format) of a directory of XML files as well as the associated multimedia in its binary formats.

gthorud 2010-11-10T14:05:43-08:00

I agree that a container and reference approach seems to be the best solution.

Besides a little extra work, are there other negative aspects of expanding "beyond a pure XML standard"? Are there any advantages at all by including the media into the xml?

It may be useful to come up with some more detailed requirements for a reference solution.

For example, a reference mechanism should also be able to point to things outside the zip file, e.g. media files that have been transferred separately - to be able to identify them, and possibly identify their location.

theKiwi 2010-11-11T04:13:54-08:00

GEDCOM already allows for links to external multimedia files, and did also allow for the "BLOB" method. According to

http://wiki-en.genealogy.net/Gedcom_5.5.1

the BLOB method was eliminated in GEDCOM 5.5.1. One software I know of that supports this is TNG - The Next Generation which is a PHP/MySQL based web application.

A great many of today's genealogy software support the OBJE tag.

code1 OBJE
2 FORM jpg
2 FILE ~/Documents/Documents/Genealogy/Roger/ReunionPictures/photos/people/JohnDewar.jpg
2 TITL John Dewar
2 _TYPE PHOTO
2 _PRIM N
2 _SIZE 363.000000 500.000000
code

is from a Reunion generated GEDCOM file. The "difficulty" now is in making sure that all that "stuff" before the actual file name can be translated between computers and operating systems so that the transport of the GEDCOM file and separately the media items can be lined up at the recipient end to ensure correct linking.

Roger

jbbenni 2010-11-11T07:09:00-08:00

This is a productive thread, but brantgurga is right -- I should have put my initial post in the Discussion tab on the Goals page.

I recently posted some observations and a suggestion into brantgurga's thread there. I suggest we all consolidate this discussion there.
GOALS/29930043

Andy_Hatchett 2010-11-11T09:42:16-08:00

Since 5.5.1 was never released as an actual standard and even though some software developers have embraced, it was only a draft I really don't see using it as an actual basis for anything- although some of its functions should be considered for BG

theKiwi 2010-11-11T09:56:33-08:00

BetterGedom shouldn't start out being less than what is fairly widely in use today, even if those features, like geotags for places, media etc were never part of a specification that was actually released.

greglamberson 2010-11-11T12:20:09-08:00

It's safe to say the BG will be far more functional than any released or draft version of GEDCOM.

Regarding the topic of this discussion, I believe it has moved to a similar discussion on the GOALS discussion page.

gthorud 2010-11-11T15:10:20-08:00

Although this has been included in a goal, I think the discussion should continue here since here we have a proper Subject.

hrworth 2010-11-12T11:33:26-08:00

General Comment about Media, and this is from a genealogy software User, not technical view of this topic.

I may have a variety of Media files within my genealogy file. May be Pictures, Movies, Recordings, Images of Sources, to name a few.

The issues, today, that the BetterGEDCOM needs to address are:

The ability to Read the file being sent, outside of a genealogy program. This is how some of us have to identify a problem between two software applications.

The size of the file being sent between two End Users, and one of the End Users might be a website, needs to be taken into consideration. Sending a file that includes Media can be very large.

The application being used for the information and the media MUST offer an option to include or exclude Media is the Sharing of the Information.

Data AND Media
Data Only

I am not sure that a Media Only option is necessary.

So, if I, the end user, select Data (genealogy information) AND Media, the first to requirements remain.

Zipping (or some other compression technique) of this information must be available to the Sender and to the Receiver. (for this discussion, not using the same software at both ends).

At the Receiving End, the data must be unpacked to include the Data and the Media. The software application would need how to related a Media file to a Person, Event, Source-Citation.

How the Data and Media are stored on either end would be determined by the application being used at either end.

This is One Users opinion.

Russ

greglamberson 2010-11-12T13:15:44-08:00

Russ,

I think most of these issues are issues that software developers have to grapple with. I don't think BetterGEDCOM is the proper way to deal with things like multimedia file formats or applications used to manipulate those file formats. Those things are wholly in the purview of the genealogy software developer, in my opinion.

hrworth 2010-11-12T20:59:27-08:00

Greg,

I agree with you. I am only reflecting the issues we have today with GEDCOMs in general. The good news about a GEDCOM file is that it's smaller then most genealogy files are for transmission, and you look as see, in plain text what is contained in that file. But there are not error messages, inconsistent delivery of information now. If the software developers can address the transport of the Media and provide appropriate error messages on dropped data, or errors within the data, I'll agree. I am only trying to raise the filesize issue and the ability to find when some data isn't received correctly.

For example: (today) a Data Element not being at the correct level (1, 2, 3, etc) or the sending application and the receiving application are not handling a data element the same.

Russ

ttwetmore 2010-11-10T09:15:31-08:00

Automatic Combination of Genealogical Evidence Records

I am now retired, but for the past five years I was a chief software architect at Zoom Information, Inc., a company that automatically creates person profiles by extracting 100s of millions of records from the world wide web. My main task was to figure out ways to take these 100s of millions of "evidence" records and combine them down into 10's of thousands of "conclusion records". I took this job because of my interest in solving the analogous problem in the genealogy world, where must take large sets of evidence records and decide upon a much smaller set of conclusions records representing the real set of people the evidence refers to.

The only organization in the genealogical world that I am aware of that is trying to do these kinds of things is Ancestry.com, where they have an obviously fairly sophisticated system for taking info about people and then finding other records in their database that might be the same person and suggesting them as hints. This is wonderful stuff.

Anyway, a few months ago I wrote up some notes about my thoughts and experiences with these types of algorithms, and I thought there might be some interest from others in seeing. So I just put it up on my Barton Street web site with the URL:

http://bartonstreet.com/DeadEnds/CombinationNotes.pdf

Tom Wetmore

hrworth 2010-11-29T05:42:52-08:00

Doug,

Now we are getting somewhere. I get hung up on specific terms that folks use. I see evidence person and I can't see the rest.

What Needs to be done, is what you said. the Meaning of the various "Flags" or "Tags" that might be associated with a data element.

A Person Data Element has the following Atttributes:

Name
Events
Facts
Stories (text)
Flag 1 = Evidence Person
Flag 2 = Hypothesis Person
Flag 3 = Conclusion Person

This might end up being a Flag of

Flag #,#,#,

Flag 0,2,0

and sent meaning Not an Evidence Person, a Hypotheses Person, Not a Conclusion Person.

Then, IF the sending application uses these terms, it know how to format that "Flag" statement. If it doesn't use those attributes, it send the Zeros. Then the receiving application know what to do, or drop that data on the floor.

I would further suggest that the application that drops data on the floor, in this case, would provide the End User with a stat

The sending application send Flag information that is not understood by this application.

or

The sending application did not send any Flag information to present here.

I am using the term 'Flag' to try to be consistent.

We, I think and hope, are trying to extend / expand and improve the current GEDCOM (mess) or lack of compliance.

Russ

brianjd 2010-11-29T06:16:14-08:00

Ok, to time to clarify my use of flag. Sorry for the confusion. I need to try to be more consistent with my words.

Using Russ' example.

A Person Data Element has the following Attributes:

Name
Events
Facts
Stories (text)
Person_type (choice: Evidence Person, Hypothesis Person, Conclusion Person)

So, what I mean by flag is a single field that flags a particular choice. You have a choice of one of a predefined list. Although, I'd recommend making the standard capable of being expandable to unforeseen choices.

I disagree somewhat that the eventual programs are irrelevant to the development of the new BG standard. I think that part of the standard should include a requirement that any program meet certain criteria to be certified as compliant.

I'm not certain the new BG standard should be so rigid that it becomes a burden to comply with a certification process. But I do think there should be a process we can develop so those who use it and are certified actually are able to retain the relevant details. That way users of programs can feel confident that when they import or export in BG they know the information will be there and get communicated right. We could make a simple test web application were programmers could self test.

I'm sure I could write a webservice that any program could communicate with to run a self test, and generate a level "X" certification result back. I could also write a web app that people could upload an XML file that gets tested and a result given of the level of compliance. The webservice would be a way to get automated certification, in case we don't have people to actually test potential applications for compliance.

brianjd 2010-11-29T06:24:07-08:00

I also like Doug's idea. I know the consensus is to make an XML standard. But, I really like the idea of also writing a standard way to represent BG output as extended GEDCOM. It's got a real nice backward compatible ring to it. All data is preserved, and it is readable by old GEDCOM only programs. I like backward compatible. I also like the idea that it could also be used as an alternative transport mechanism for those who really really don't like using XML. Plus it also provides an easier path to implementing BG.

Brian

hrworth 2010-11-29T06:28:25-08:00

Brian,

Who is going to determine BetterGEDCOM Compliant / Certification?

I understand what you are saying, but my question stands.

Elsewhere will we see where Greg tried to get at that point.

As of this point in time, we don't have all of the right players at the table to do other then being specific with the definitions of the data and its attributes.

Russ

brianjd 2010-11-29T09:18:02-08:00

Russ,

I'm pretty sure, I just offered to write the automated certification tool. If that is insufficient, I will also offer to host it on my server. But we are no where near needing that tool. I'd also like guidance from the community on what should be mandatory and what optional, and whether we will have levels of compliance or just and up/down verification. And also, a consensus approval of my final tool. An automated tool isn't a good as a group that actually does the task of testing a submitter's program, but it would be better than nothing.

But, I'm with you, on the point of who would be responsible for handling certification. Considering we're just a loose conglomeration of people at this point.

Now don't misunderstand me, I'm not trying to say no one can use it unless they get certified. I'm just saying as the creators of the standard, and whoever winds up maintaining should have a process whereby those who desire it can get certified as compliant. If the BG turns out to be really useful, and becomes widespread, there will undoubtedly be those who will implement it and not care about certification, but others will.

ttwetmore 2010-11-29T09:35:40-08:00

Concerning XML as the standard output format for BG and implications for other output formats:

I have had some experience with output formats in my various genealogical software endeavors. It is almost trivial to generate output from text-based trees in any syntax you like. We're talking about very few lines of code to do this. In my DeadEnds efforts the code to write a record in XML is a tiny method on the Node class (I will send it to anyone who would like to see how simple it is). I also have another trivial method on the Node class that allows the same data to be written in GEDCOM syntax, and another trivial method that writes using the example syntax I've been using for the DeadEnds examples.

This all presupposes that the internal representation of database records in programs that use the BG model are consistent with the BG model. If they are not I don't think we need worry about it.

Tom Wetmore

brianjd 2010-11-29T17:22:27-08:00

I too have some simple methods for writing XML data and other types of data. Not hard.

Andy_Hatchett 2010-11-29T20:13:39-08:00

Russ,

Here's the thing...

Any genealogical software that supports BG will have to change to meet it's requirement in order to know how to handle the data both to receive it and to send it out to another program.

It is imperative therefore that a well defined and well documented data model be developed so that those software developers know in advance just exactly how they will have to change their programs.

Just creating a new way to package/send stuff won't help at all unless the programs can handle it so instructions for doing so must be set in the data model. Otherwise we are just creating another flawed GEDCOM replacement.

hrworth 2010-11-29T20:27:24-08:00

Andy,

You are absolutely correct. The problem is that THEY, the developers, aren't at the table yet.

Russ

Andy_Hatchett 2010-11-29T20:55:50-08:00

Russ,

And, imho, most of the big player's won't be at the table until the data model and daffynitions (no- that is *not* a typo) are at a much more advanced stage than they are now.

louiskessler 2010-11-29T21:29:19-08:00

Andy,

Yes, let's get to the data model already. Goals and Shortcomings of GEDCOM can go on forever.

We can argue the abstract forever. But once a data model is proposed, then we will have something concrete that we can define.

p.s. there are a few developers here.

brianjd 2010-11-29T23:10:36-08:00

Some developers are here. Any model we come up with will be flawed from some perspective. We just need to hash it out.

romjerome 2010-11-28T11:58:22-08:00

Doug,

OK, now I know understand DB 'log' ;)

What I tried to say was that some genealogists use place on event like conclusion/evidence. To use person objects might be overkill for theirs searchs.

Events are unlimited.
To set a place on event is our evidence and conclusion. Date is the time log (on source and event).

Without any person object: time log (date) + place/source code (INSEE) + the type of event = evidence/conclusion

Agreed, this model cannot work on others countries. Maybe that's why I missed why multiple sources on persons is the right way: event evidence still sounds more natural.

brianjd 2010-11-28T12:23:23-08:00

Russ,

Yes, I'm advocating adding a field to events, people, roles, relationships, etc. that indicate if it is a fact or conclusion or something in between. After all even a person may be only a guess, or it may be a "fact" gleaned from some document. If researchers used such a field, than any other researcher would easily know which records they might be inclined to reject/accept. Not everyone would use such a field, but it's an imperfect world.

Tom, sorry, I didn't mean to imply that only professionals would use such a feature, or that you were advocating forced mergers.

On the Gramps model. I like it for the most part. I use it. It has it's own issues, but the plug-in feature was fantastic. I haven't used any yet, but plan to in the future. The Gramps merge is bizarre, and requires clean up of duplicate data when done. But is probably the right basic approach.

The way I see resolving the evidence/conclusion person is this. You get a name of a new person you need to research and you have one a fact for her, you add the person and the fact to your database as a new fact and new person. You get a new fact for this person, you have two options: add a new person, or add the fact to the existing person. Let's take the conclusion approach. Later now you have compiled 10 facts for this person, you get a new fact that proves 3 of your facts belong to a different person. So you pull up the persons list of facts select those three facts and issue a split person command.
Alternatively, with the evidence approach you would simply merge the ten people into two people, with the respective facts.

The differences in the two approaches are minimal, in the evidence model, every fact would be marked a as fact, because they always point to one piece of data. However, that doesn't mean that each fact is proven. You may have a fact for Jakob Schmidt, but in reality it was recorded wrong and it really should be Johannes Schmidt.

My point is there really isn't much need for two kinds of person entities in the database. A single person entity can serve both purposes. Merely by how a users enters them. you have a real person you are searching for, so you enter him into your db with no facts. Every fact you accumulate you enter as a new person. When you make a conclusion about a fact you merge it into the real person or create a new real person to merge it into.

One thing that could be done is to add yet another field to the person entity to set it as either a "evidence" person or a "conclusion" person. Those using the conclusion model would never need to see this field, and would want to ignore this field or always have it default to conclusion.

hrworth 2010-11-28T12:33:48-08:00

Adrian,

The End User puts information into an application. The application would do what it needs to do, to generate a file, that follows what is defined by the BetterGEDCOM file format. That BetterGEDCOM formatted file is passed along to another application. This application will unpack that file to meet the needs of that application, which, in turn is presented to the other EndUser.

How the packaging / formatting gets done, is within the application. If either application does not use the Better GEDCOM format, but a basic GEDCOM 5.5 format, what ever flavor it currently uses, that GEDCOM data should not be tossed, but presented to the application with some notification to the End User.

Russ

hrworth 2010-11-28T12:42:22-08:00

Brian,

What is an Evidence Person to you?

When do YOU make a Conclusion and How?

Are you suggesting that an Application allow the EndUser to create this 'new person' and what ever data is associated with that person, but NOT enter that new person into your file? Sorry, I don't know what you are asking the Application to do.

What does it look like to the End User?

OK, so that gets figured out. But, while this new person is in 'limbo' I want to share my research with someone else. What happens to this 'new person'?

Russ

romjerome 2010-11-28T12:47:07-08:00

brianjd,

Well explained (better than myself).
I used this method on some alternate databases: matches one primary name with multiple sources, so evidences and conclusions. One year later, most people have been merged according evidences/conclusions, and alternate databases are still in progress, safe and documented. Data related to my primary database are exported when complete and validated by evidences.

Note, using Gramps, you could try to use 'Marker' on persons: Evidence could be 'Complete' and Conclusion could be 'Todo' or custom marker value. This feature is incomplete, but you can imagine one custom tag for one specific conclusion or evidence.

hrworth 2010-11-28T16:31:38-08:00

romjerome,

What is an Evidence Person?

What is a Complete Person?

What is a Conclusion Person?

There is A Person. That Person has some information that is associated with that person and you have some documentation from where you received that information.

What makes these two or three persons?

I am trying to understand these terms that a number of folks are throwing around. As of yet been able to describe what this means.

Thank you,

Russ

brianjd 2010-11-28T20:36:18-08:00

Russ,

I don't use evidence persons. So, I'm just making examples that make sense to me, were I to use them. When I find a new person to add to my family tree , I add them in where the first piece of evidence places them. If I come across conflicting evidence, I make conclusions on whether to keep those conflicts with that person, or whether to create a new person from them. My Anneheim family is a good example. I have one ancestor with five different spellings in five different official records. Another where some relatives have joined two different people, even though there are birth records for children (yes, plural) by this couple, clearly indicating he lived beyond the death record they would posit for him.

To me an evidence person is a person defined in a single event/fact, or in multiple records that can be definitely tied together.

A conclusion person would be a person where you accumulate all the pertinent records. This is how I do it. If I come across a record or story that is conflicting or doubtful, or I only suspect to be correct, I mark it as speculative or such. Or I place it in a new person. Or add it to my list of miscellaneous records (one step away from file 86). Because I hate to throw anything away.

Although this thread, which has drifted here and there, is about the automatic combination of records. Tom, talked about his algorithm, which I'm sure is great. I doubt any of my recommendations would aid his algorithm in combining evidence people into conclusion people. It would be foolish to think that every researcher would be careful in setting evidence vs conclusion flags in records. Also, I think creating a type for conclusion people would be equally as unreliable. One person's fact is another's incorrect information. Which is another reason, I have doubts as to the usefulness of having a separate conclusion person in the data model. I think we should just have people. We can add fields to people so those who want to track evidence people vs conclusion people can do so, and when they share others can choose to accept those conclusions or reject them.

romjerome 2010-11-29T00:00:52-08:00

Russ,

Maybe I mixed some Gramps terms...
First, "destructive" is not the correct word for merging into Gramps, as Gramps also creates an attribute with merged object (person): Merged GRAMPS ID= (Doug, I never saw that before, amazing!) and will add events, names, attributes to a common existing list.
Maybe the DB log will only contain previous gender and internal object_handle (relations/associations). But user can quit the session without commiting changes, or to export to XML and use some tools (xmldiff, last modified objects, etc ...). So this destruction concept into Gramps, needs more proofs as records are still there or available with tools!

'Complete' and 'ToDo' are two markers used by Gramps for grouping object (set by the user): the way for next searchs (evidence/conclusion).

Sure, the 'Automatic Combination' sounds like a hight level concept, but an extra work for something not user friendly: an abstraction.

To set a person for matching evidence or conclusion makes sense, but what is the difference with an alternate person set with his name, event, sources, relations ? None.
Also, why should models have a limitation on events number ? Current use does not limit seizure (conclusions).

I suppose, some data models need one entry for birth, one for death, one for parents, etc ... True, in this case we need a 'standard way' or automatic combinaison for evidence records, but maybe this can get rid with an Event entity ???

hrworth 2010-11-29T03:44:20-08:00

To All:

I think I finally understand what one of the problems on this thread of messages.

The use of specific program terms. DeadEnds and GRAMPS. Terms that are used by these programs as a User Interface.

That isn't what this project is about. Its about the DATA within the program. Names, Dates, Locations, Events/Fact, and the appropriate documentation. To get that information to another application.

What the application does with the information, how it is presented to the end user, is not part of what we are trying to do. We are trying to identify the data elements that might be in that file.

These other terms, in my opinion, are user interface terms. If you sent to me an "evidence person" flag, the application that I use wouldn't know what to do with it. As the receiving User, of your data, do I care, especially if I don't know what it means, and/or my application doesn't present that to me. I am sure that the Evidence Person is important to some of you, and should continue to use it, if your application provides for it, but that's between the end user and the application.

Only one users opinion.

Russ

dsblank 2010-11-29T04:17:48-08:00

Russ,

If you don't care that information is reliably exchanged with its intended meaning intact, then GEDCOM can, today, encode all possible data. For example, one could make up codes to put in attributes that a program could unpack, and produce any possible representation.

But BetterGEDCOM is, if I understand it, about making sure that the intended meaning is transfered too.

Gramps and DeadEnds are useful programs to discuss to see what information is needed, and to look at explicit meanings. Also, because these two programs are both interested in supporting BG, then they are useful "guinea pigs" to actually use what BG produces. If BG can't get these two applications on board, there is little hope for the rest of the world of apps.

-Doug

hrworth 2010-11-29T04:37:30-08:00

Doug,

I must have said something wrong, if you came away that I didn't care. What I care about is when I send my research to another user, it gets there, reliably. NOT if the Data is reliable or not.

For example: I shared my research with another user, using different programs, and my source material was not received by the other user.

The other user deemed my data unreliable because there was not source material. Looking at the GEDCOM file, the Source Material WAS in fact in that file.

Two places that the problem could have occurred, My application didn't send it, or the one of our applications didn't find what it was looking for in the right place or the right format.

Not sure that the BetterGEDCOM goes as far as to make sure of the "intended meaning", nor what that means in my example.

I am not saying anything good or bad about Gramps or DeadEnds. I am sure that they are both powerful / great applications. But, I don't use them.

Russ

dsblank 2010-11-29T05:00:28-08:00

Russ,

You said: "If you sent to me an 'evidence person' flag, the application that I use wouldn't know what to do with it. As the receiving User, of your data, do I care, especially if I don't know what it means, and/or my application doesn't present that to me."

My point was that the meaning of this flag (or any other extension to GEDCOM) must be well-defined in order for BetterGEDCOM to work. If it is not well-defined, then, GEDCOM can, today, encode it through a variety of means. For example, through extension tags (eg, _UID), specially constructed notes, etc.

I think the discussion here is largely about what needs to be well-defined, so that applications know what to do with it. If some genealogists have multiple people records that represent "hypotheses" and some that represent "conclusions", then that is probably something that applications need to know in order to deal with the data properly.

So, it isn't about the User Interface in Gramps, DeadEnds, or any other program, but about User Meaning. And that, we need to have.

(Perhaps my only useful comment here is that whatever BG develops, it *could* be represented through a series of well-defined extensions to GEDCOM. For example, it could be encoded (as described above) so that BG extensions could make it through (in and then back out) an existing current GEDCOM-handling program, and then into a program that can actually handle BG data, where it could be unpacked into its proper meaningful way.)

-Doug

GeneJ 2010-11-20T08:21:59-08:00

P.S. Do we have any trademark, copyright or licensing conflicts as we proceed down this path? Anyone?

--gonna double post this. --GJ

brianjd 2010-11-27T21:22:43-08:00

I'm going to throw my (wordy) hat into this topic. I don't really understand why we want to have an evidence person and a conclusion person. I can see the rational for this for the professional genealogist and for transcribers.

However, to my sensibility it is at a fact/evidence level. You collect a fact, it belongs to a real person. It is your aim to compile as much evidence you can for the person of your objective. IF you have a piece of evidence or even a family lore about a person, it is that fact or lore that is in question. It should be markable as: fact, probable fact, possible fact, speculative fact, pipe dream, total fantasy, conclusion, etc.

It should then be possible to separate those pieces of evidence into a separate person(s) if desired.

I would never want a program to automatically combine persons in MY database. I have no problem getting a listing of what persons and facts the program thinks it could merge. But I want to be in control of the final decision.

It would seem to me, that the evidence/conclusion should be the place to mark it as a conclusion or a fact. I can see a field in any piece of information about a person that has a field called conclusion or assertion with values like above stated.

That field would be the key to when and where a piece of data might or should be separated out into a new person. It should be possible to select any number of records in a person and split them out into one or more new people.

Now if you're compiling data on every person in say Urloffen, Baden then it would make sense in entering facts as individual persons for later combination. But this is not the normal activity of genealogy research, unless you happen to be Ancestry or such, and it really is the job of the software and the researcher to make the distinction between facts that belong to one person or to different people.

Lastly, having such a field would allow me to import other people's data selectively. I might not want to import non-fact records into a person, I've researched fully over the past ten years, but might be more tolerant of that brickwall collecting dust in the corner. So for person A do not allow any data that is not a fact with source citations, but allow speculations for person B, with citations. etc.

Sure this puts the burden on the program or the importer/converter. But that's where it should be. The data format should just store the data however the researcher chooses to record it. You can't make researchers use an evidence person, when they want to use a conclusion person. Trying to force such behavior will only lead to bad data, where researchers fudge the data to make it fit where they want.

hrworth 2010-11-28T03:08:32-08:00

Brian,

You said:

It should be markable as: fact, probable fact, possible fact, speculative fact, pipe dream, total fantasy, conclusion, etc.

I am trying to understand what that statement has to do with the transport of the data in my file to another end user. Are you saying this should be a "property" of a fact (your term)?

As I read these discussions, I am trying to understand what belongs in the Application that a User has, and the Transport of that Information. What this group, I think, needs to think about where this belongs, then IF it (what ever "it" is) belongs in the application we need to make sure that information gets between the two applications.

If the application sees "this", here is what it means, then the application does what it wants to do.

So, if your application can mark a "fact" as "junk", isn't that between the application and the end user? The information about the fact gets passed along. Isn't the "junk" part of your evaluation of the data? Doesn't the user at the 'other end' then do their own evaluation of the data?

Russ

ttwetmore 2010-11-28T04:19:22-08:00

Response to Brian,

The rationale to distinguish between evidence and conclusion exists only for researchers who want to encode their evidence into person and event records and store those records in their databases along with their final conclusion persons. If only professional genealogists care to do this, then the distinction is one of use only to them. However, I am not a professional genealogist and I feel the need for such a distinction every time I use any of the genealogical programs I have installed. One could also imagine a setting in which a genealogist uses an entirely different program to record his/her evidence, and only adds data to his/her genealogical database when they are absolutely sure of who the persons are. I believe that Clooz (if I have the name right) is a popular Windows program with this purpose.

If you prefer to not to add any information to your database until you are sure to which real person the information applies, then you have no need to make the distinction between evidence and conclusions in your database.

I think you would be comfortable with the Gramps model, which is very good for non-professional genealogy. You enter an event record for every major event you learn about. You don't have to make an anal distinction between evidence and conclusion, just add it. Then make the persons who you believe are the role players in that event point to the event. If you decide later that those are not the right persons you break their connections to the event and maybe create new persons to point to the event. If you decide that two persons in your database are the same person, you merge them and the merged person points to the all the events that the two pre-merge persons did. This is an excellent compromise in my opinion.

By the way I have never advocated 1) automatic combination of person records for genealogical programs or 2) forcing a genealogical program or a program's users into following a particular process. My comments on automatic combination describe techniques that can find highly probably duplicate persons in a database. My comments on a research process describe the most complex process a genealogist might want to pursue in order to make sure the BG model can support it and every other process that is simpler.

And another by the way, in the DeadEnds model solution to the evidence and conclusion process that I have proposed, there is no attribute in any record that states whether that record is evidence or conclusion or is a fact or a fantasy. I make the evidence and conclusion distinction only to show how a genealogical data model can be constructed to support the typical genealogical process of taking evidence, reasoning about it, and making inferences from it. The DeadEnds model can support the professional genealogist who want to make that distinction, but the same model can support a PAF user as well.

Tom Wetmore

AdrianB38 2010-11-28T04:46:33-08:00

Russ - re "I am trying to understand what belongs in the Application that a User has, and the Transport of that Information"

It may (or may not) be helpful to refer to those people who use software that uses GEDCOM as the native file format, i.e. GEDCOM for them is not just for transport between different users, but also for how their application stores the data. For instance, the UK product Family Historian by Calico Pie uses GEDCOM as its native file format.

For those people, (myself included) all the discussions about transport have impact on the storage of the data by their application. Pretty much the same impact, clearly. If we add a user defined flag to the GEDCOM format (as FH does), to save data about privacy (say), then that same data goes out on an exported GEDCOM format file if the particular person is part of the export.

So for me, _if_ I carry that view forward, anything that gets marked by my application as "solid proof" or as "fantasy" gets stored in the BG format file that my application uses, and if the same "thing" that got marked is on an export, then that same mark will get transported out to the recipient.

Even if the application doesn't use BG format as its native file structure for saving the data (and most won't) the application's native file format for the genealogy data must look equivalent to the BG format to stop data loss.

So, for me, anything in the BG format, also belongs in the Application's own file format.

The Application does the work in dynamically translating what's in your brain as you use the keyboard and mouse and stores it as static data in its own file. It then translates the static data in its own file to BG file format on export. (Unless it's equivalent to Family Historian now, in which cases there's no need for translation, got selection and exclusions for privacy).

dsblank 2010-11-28T06:33:43-08:00

Tom said:

"I think you would be comfortable with the Gramps model, which is very good for non-professional genealogy. You enter an event record for every major event you learn about. You don't have to make an anal distinction between evidence and conclusion, just add it. Then make the persons who you believe are the role players in that event point to the event. If you decide later that those are not the right persons you break their connections to the event and maybe create new persons to point to the event. If you decide that two persons in your database are the same person, you merge them and the merged person points to the all the events that the two pre-merge persons did. This is an excellent compromise in my opinion."

I mostly agree with this assessment, except for a fine point. The Gramps "merge" is a destructive process, as Tom mentions, but can be used by professionals and non-professionals alike. It doesn't preserve anything, except perhaps a log trace, of that action. Sometimes, that is the right thing to do (for example, if there was just a simple mistake or misunderstanding... there is no reason to preserve that).

However, one need not use Gramps in such a destructive manner. One can create their own scheme (such as making distinctions between evidence and conclusions, or something else entirely). Gramps has a very rich addon/plugin mechanism that allows users to create (and share) such extension schema. For example, Gramps does not come built-in with the idea of a Census, except for a Census event. A third-party Gramps user has defined a set of attributes, file format, report, GUI, and attributes for entering Census data as a form [1].

The same could be done for the Evidence/Conclusion Process. I can't think of anything that would prevent that from working, and for being a very nice environment for a "professional" that wanted to work in that manner.

-Doug

[1] - http://gramps-project.org/wiki/index.php?title=Census_Addons

romjerome 2010-11-28T10:36:20-08:00

Note, this Evidence scheme is not the most practical, common and useful for non-US (professional or not) genealogists...

romjerome 2010-11-28T10:54:21-08:00

"The Gramps "merge" is a destructive process"

but also a constructive one!
ie. two persons (or places) with two different names, will create a primary one (selection) and an alternate name after merging for the person.

As genealogical data are merged (events, names, attributes), why should we keep the ID or gender value ?

greglamberson 2010-11-28T11:01:53-08:00

Romjerome,

The posting you bring light to merely gives a difference in opinion about the style of citation of a source. It is certainly not any commentary on "this Evidence scheme," as you so broadly mentioned.

romjerome 2010-11-28T11:05:28-08:00

Note, Gramps does not remove events/sources, names/sources or person/sources after merging, they are added to the selected person. An individual could have 10 birth events with or without sources. Where is the destruction ?

romjerome 2010-11-28T11:14:09-08:00

"From the mid 1850s to the early 1900s I have found city directories to be a rich source of info about where someone lived and what they did. So the city directory is the evidence and the residence is an event taken from that evidence. It's hard to say whether that the created event is an evidence event or a conclusion event. Certainly the city directory is not primary evidence (there are often mistakes in the data to be sure), so by that tenet one would have to say the evidence is already secondary material so any event extracted from it would not be primary."

In France, city archive (INSEE code) will tag a place (event) and a source (on person, name, group, etc ...)

This means that places/events are often linked to a physical source. Event validate by a source by adding a place reference! Archives and places use a common code. This cannot work with US city directories: an other Evidence scheme. The need is not he same.

dsblank 2010-11-28T11:37:16-08:00

The destruction is that of one person object when merging with another. The discussion is about partially about process: For example, can you undo the merge? is there a trace of the merge?

The Evidence/Conclusion process is one that Tom has argued can be better supported by the file format by allowing multiple non-conclusion people objects, and an additional (perhaps) conclusion person.

So, rather than merging, one could mark a conclusion person as the result of other non-conclusive records.

ttwetmore 2010-11-19T13:54:24-08:00

GeneJ,

I'm still not sure whether you are are are not becoming a fan of evidence persons! I took your two posts to mean you were becoming not a fan!

I was considering my question to you about how you would enter data you weren't sure about without having to create a person. I think that would meet your criteria as I understand them. Which I believe is you'd like to record all evidence, but you'd like all your person records to be people that you currently believe to all be real persons.

Gramps might have the answer. In Gramps you can create events but not have any persons link to them. So you could enter a city directory entry as a new event and somehow put a note in there about the name of the person the entry was for. That event record can then just sit in your Gramps database (how you wouldn't loose track of it I can't say!). Then later, if you decide that the city directory record does refer to one of your persons you could then link the person to it. This implies that Gramps can treat a city directory entry as an event, which I would assume it can, since it really is an event, having type, date, place, and person mention.

Tom Wetmore

hrworth 2010-11-19T14:23:55-08:00

Tom,

I was almost with you, but not sure how a City Directory is an Event. Yes, it was published at a certain place and time. BUT, what Event is it? Being published?

To me, a city directory is a piece of evidence. Like a census record, it would put a person at a specific location at a specific time (time period). Not saying anything about how Gramps works or does, just that for me, the City Directory is a source of information, especially at one of many Census Substitute for 1930 and beyond. It, to me, is recording the same thing.

Now, to the topic at hand, evidence.

Some of us, I think, will enter data into our database, with evidence. Some don't enter that data until they are sure that the evidence is conclusive.

I am entering exactly what I found from my source. Complete, Incomplete, or somewhere in between. I maybe wrong here, but I do that so that IF I were to share my research with you, and you looked at my source, you would see what I recorded.

Hopefully, at some point in time, I will have a complete record.

Now, to be clear, at least for me, this is at an Event / Fact level. I will evaluate that FACT / Event to see which one do I think is the right / most complete information.

I will then repeat this process for each Event / Fact about that person. This then will help me create a 'profile' for that person. Not sure that profile is the right word, but I'll use that term profile.

I then would expand that to the Relationships that the documentation shows me. So the Evidence in the pieces will help create the profile for the Family.

If I have 'evidence' I need to evaluate its accuracy. I have some source information, that I have looked at, over time, that at first I thought it was good information actually contains bad information.

What I don't have, in my software, is a way to Mark and Person Record(s) at 'the real deal'.

I think you will find that GeneJ and I handle our data differently. Some of that is due to the Software Features that we have. In either case, the BetterGEDCOM needs to transport that data between the two applications with enough information to define want the receiving application is looking at. Of course, the sending application would have to provide the information.

Russ

GeneJ 2010-11-19T14:52:25-08:00

I add persons to my database as I locate information about them.

In a modern era, it's not uncommon for that first information to come from the census record, where they are reported as a child of someone I've previously entered.

I'd create a master source for that census and then develop a citation to support "add person," The information I'd enter for the new person at that time would be limited to what I could learn about him/her in that census.

The next evidence I seek about our "add person" would depend largely on where they were living and when they were born. I might have several census of the family that includes an entry for that "add person."

As I find more information, it has to be assimilated. That new information may change the way I understood previously entered sources; it often just helps tell more of the story.

I do use a research log and/or write research memorandum. Some folks use the logs available from their genealogical software; I haven't.

I'm unlikely to ever stop looking for more information about those who are the focus of my family compilation.

testuser42 2010-11-19T16:23:47-08:00

GeneJ said: "How each of us interprets information available from any given source can and is wildly biased by what we know (or think we know), what we know we don't know and what we just plain don't know."

Yes, that's true.
And it's another argument for seperating the information (= fact or evidence) from the interpretation (=conclusion).
The conclusion is yours alone. You may or may not trust someone else's conclusions, and may accept or "overrule" them if you take in new data.
I would hope that conclusions will be "tagged" with the UUID or even PGP-signature of the researcher that first made them. If you accept what the first guy comes up with, you can add a "me, too" signature. Or you just make your own conclusion and work with that. Then the old conclusion could be deleted, or better just marked as "unlikely" or "wrong" or something like that.

A "conclusion" person would be subject to lots of changes as you gather up evidence. The evidence stays the same. If you come up with a better interpretation, you change the "conclusion" person but no evidence.
I guess the only time that an evidence record should be changed after its initial creation is, if you've got a typo or mis-reading of the source. But even that could be a conclusion?

Another thought: Should the BG keep track of all the changes that were made (with timestamp and the editor's UUID)?
This would be useful if you want a review of the "development" of a conclusion-person. Or, as a backup, if you just want to revert to something you had earlier.
But it may lead to huge files of not very relevant data? (But zipping these should be very effective) Maybe the "version history" should be put in an second file in the BG container, to keep the main XML as clean as possible?

bye for now, gotta sleep!

ttwetmore 2010-11-19T17:16:50-08:00

These are great discussions. Trying to understand what others are thinking and learn from them is difficult but ultimately fruitful. It seems we all have struggled with many overlapping topics, but we all have our own words and particular ways of seeing things. I sure do appreciate trying to see how others see them as it often illuminates an important point that I have failed to see, as it also often illuminates the commonality of perceived problems.

Just a quick point about city directory evidence and "what it is." I view a city directory entry as an event because I tend to think of it as the evidence behind a residence event. From the mid 1850s to the early 1900s I have found city directories to be a rich source of info about where someone lived and what they did. So the city directory is the evidence and the residence is an event taken from that evidence. It's hard to say whether that the created event is an evidence event or a conclusion event. Certainly the city directory is not primary evidence (there are often mistakes in the data to be sure), so by that tenet one would have to say the evidence is already secondary material so any event extracted from it would not be primary. By the more strict interpretation of the words primary and secondary, any info taken from a secondary source would have to be called a conclusion on somebody's part. This points out some of the problems I feel we get with the terms evidence and conclusion. What I mean by these terms is "closer to the primary data" and "further from the primary data", and these are certainly not crisp definitions that would satisfy many people. It is hard to defend such relative and slippery concepts!

What I do with city directory stuff in my own database is that I will create a residence event (date, time, address if possible, occupation if possible) and use the city directory entry as its source. The fact that the city directory is probably a year or more out of date when finally published I pretend to ignore.

Tom W.

hrworth 2010-11-19T19:01:16-08:00

testuser42,

I agree with GeneJ. My conclusions may change.

But your suggested that "you can change the "conclusion" person but not evidence".

I am not sure that would be true. I am not sure that I would change my conclusion without new evidence.

My conclusion would be 'time bound' based on evidence and the Evaluation / Analysis of that Evidence at that point in time. New evidence, re-evaluation, same or different conclusion, with a new date/time stamp.

I don't think that the BetterGEDCOM should track anything. It is the transport of the information.

If the application that I am using has an Evaluation platform or utility, I would want IT to track my Evaluation and Conclusion. When I choose to share my information with GeneJ (for example) I would want to pass along How I reached my conclusion, which would mean my Evidence as well.

When GeneJ receives my information, she should be able to see how I made my conclusion. BUT, she should be able to Accept, Reject, Ignore my conclusion. That should be an application feature.

Hopefully then, GeneJ would bounce my Evidence against her Evidence and reach her own conclusion.

I can only suggest that a Conclusion WILL BE CHANGED over time. But, the application, I think, should keep track of the conclusion, or do what ever the User wants to do with that conclusion.

Russ

Andy_Hatchett 2010-11-19T19:26:46-08:00

[QUOTE]
Ancestry.com must be attacking this problem in a big way also. When you use Family Tree Maker, your people keep getting little green leaves attached to them. This means that behind the scenes Ancestry.com has discovered information that "it believes" might be evidence for the same person. They are amazingly accurate in their suggestions. I wouldn't be surprised if sometime soon Ancestry.com starting running some combination algorithms and we will benefit from their results.

Tom Wetmore
[END QUOTE]

God help us... No!!!!!

Ancestry already tried that and called it OneWorldTree.

OneWorldTree is undoubtedly the largest single database of Junkology ever produced by mankind.

GeneJ 2010-11-20T00:09:02-08:00

These really are great discussions.

To generate its suggestions, I imagine Ancestry is using its indexes and such, and comparing those against information you enter into the predefined fields in your file.

Perhaps when we get into showing how examples translate into the models, we'll understand the different approaches and positions better.

I don't ever really "stop" reading or interpreting evidence. Newly discovered evidence changes how I read or interpreted existing evidence.

I think of evidence is best interpreted as a whole, which means it's looked at over and over again.

In the 1990s, a cousin located 1829 real estate interest transfer for my ancestor, William^7 Preston b. 1780 [NHVR], and his wife, Asenath. The way all of us read the document, the interest was sold to Collins Preston, a miner of Rumney, New Hampshire.

Together with other information gathered and studied, we believed William had sold the interest to his just younger brother, Collins, b. 1782 [NHVR]. More time passes, and we collect more family evidence including photos of the family graves from Rumney. These are all diligently shared and documented, including the grave marker for Collins^7 Preston. Some of the gravestones were hard to read.

A few years later, researching in another state, we discover Collins^7 Preston's obituary. He had died in 1812.

Wow, that changes things, doesn't it.

Yet other evidence, is discovered, again researching in other states; we learn the the gravestone in Rumney belongs to Collins^8 Preston, b. 1812 [NHVR] the nephew of William^7, and the 1829 land interest transfer can now be read as Collins Preston, "a minor" of Rumney ....

ttwetmore 2010-11-20T03:50:37-08:00

Andy,

I don't share your despair. Those green leaves are almost always spot on. And I'm not forced to do anything with them.

I had a phone interview with Ancestry.com after they saw the document that started this thread. They weren't interested in getting back into the technology because of the flop of OneWorldTree. They actually said something that seemed very strange at the time and still does actually. They said that people didn't want Ancestry.com doing their genealogy for them, that is, doing the combination and providing it. They said people wanted to do the work themselves and check all the facts and evidence themselves. Well this is true of some family historians, but as far as I have seen, most family historians take everything they find without careful scrutiny, and would be more than pleased to take on trees already created for them. Personally I think people didn't like their previous efforts because of how poorly Ancestry.com was in deriving those results, not that they wouldn't like their results if they were accurate. I don't think you can use the example of OneWorldTree to condemn the idea. Ancestry.com just didn't have me to write the algorithms for them.

Properly done this technology is incredibly accurate. I would never advocate using the technology to make the conclusions and then build the conclusion persons, and accepting them without evaluation. But I would use the technology to statistically group records that are almost definitely the same person and let users pick and choose how they want to interpret the data. The little green leaves now provide this capability and I think it is a great thing, and I can only see it getting better and better. I am unabashedly a fan of Ancestry.com. A world subscription to Ancestry.com is probably the best deal in town. I can search an incredibly rich array of sources, and get more record searching done in one night than I used to be able to get done in a year. For those of you who remember what it was like to search census records twenty years ago, you know what I mean.

FamilySearch is coming along in great strides too, and they are free, but for my money, right now, Ancestry.com can't be beat as the place to go to get started. I am in no way associated with Ancestry.com except as a very satisfied customer.

Tom Wetmore

hrworth 2010-11-20T04:34:31-08:00

Tom,

RE: "They actually said something that seemed very strange at the time and still does actually. They said that people didn't want Ancestry.com doing their genealogy for them, that is, doing the combination and providing it."

Actually, I think that is true. I think that the OneWorldTree and the WorldFamilyTree projects were so messed up with Junk, and I think that Andy would agree, the some would go so far as to tell Ancestry.com to Trash, pull of their websites both of these projects.

I think that folks are waking up to the fact that it's not about gathering every name in the world back to Adam and Eve, but to get to serious research. Yes, there are still many name gatherers. But, more and more folks are getting serious about a well documented family history.

The next step is moving from the name gathering phase into the importance of the Family Historian. Meaning, putting meat to those names, the Stories of the people.

What we haven't gotten to, yet, is the Evaluation of the Evidence, and drawing a conclusion, at a point in time, of what ever piece of information in your file.

We are talking about there here, there are genealogy organizations that are talking about this stuff.

With the resources that we have today, Ancestry.com, or anyone else, are working on the ability to locate the record(s) that we, the users need and/or want.

The "newby" may want someone, like Ancestry, to do their job for them. Going to Ancestry and "give me my tree". They want it to be like Who Do You Think You Are. All done in 43 minutes, going back to Adam and Eve.

Those same folks don't want to hand enter every piece of information from every piece of evidence in one key stroke.

I think that this is an Education opportunity, hand enter that information, key stroke by key stroke. Oh, and while you are doing that, read and evaluate what you are entering.

Now get the software vendors to provide a good platform to evaluate what we have.

What might that look like: Don't know, but going to a website that sells Computers, for example, Select a couple of products and Compare them side by side. Bring up a Fact or Event, bring source material up, side by side, to see the Comparison of the information in the source material.

Then back up to the next layer, from the Event / Fact to a group of Events or Facts, and be able to ask the question, Does this make sense. The back up to the whole person, the whole family.

Do I need some new or different information to help complete the picture of that person.

How about being able to have a check list of documents / resources that I want or need to complete that picture.

Oh, I need to get or look for an obituary for this person.

Ancestry needs to provide an accurate way to return results of those types of information when I do a search. I think they are getting better.

In the meantime, this project needs to flesh some of this 'new stuff' out, and have the vehicle to transport that information for one place to another. Let Andy and I, or GeneJ and I, share our research, without messing with the data being shared.

I used those projects (OWT / WFT) information to help with road blocks. Where didn't I look, what might that family look like, based on one of those trees. I wasn't concerned about the data, as much as the hints that might be in that data. In doing this, I learned other ways to research. Other approaches to find the missing sources.

I think and hope that Ancestry, and the others, become the Educators rather than the Doers of our family history research.

Enuf rambling for now.

Thanks for listening.

Russ

Andy_Hatchett 2010-11-20T04:38:15-08:00

Tom,

I'm on Ancestry 12-14 hours a day. Those shaking leaves are fine- as long as they lead to historical records. The problem is that they also lead to those dreaded Member Trees, most of which are put up by clueless namegatherers who are neither researchers nor family historians but merely copy from other trees and add it to their own.

There is also the fact that Ancestry seems to be on a campaign to promote those trees as actual evidence!

I,also, am a huge fan of Ancestry; although I make no hesitation to rake them over the coals if they do something I consider stupid or ill thought out - which , unfortunately, happens rather too frequently -in Provo I'm known as "Infamous Andy"
*grin*

It is exactly because Ancestry allows information from those usually sourceless member trees (which also contain a myriad of biological impossibilities) to be copied from one tree to another that most of teh real researchers and family historians that I know on Ancestry are making their trees private.

GeneJ 2010-11-20T08:09:41-08:00

Lots of good comments.

As to Evidence vs Conclusion entries.
These decisions seem best made at the base software program level by the user/file compiler. Some folks enter information and sources/citations to the base program after they have more completely identified the subjects whole life. Others, like me, maintain a more research oriented base file. Still others may a mix of methods, and some of those users may have further marking techniques by which they notate or "flag" their entries.

Working together, users, technicians and developers need to being to "map" information in the current genealogy programs so that we have a record of the fields that are in use.

Me thinks. --GJ

testuser42 2010-11-19T09:35:34-08:00

Hi,
Tom, I got an 404 on your link above.
Seems like
http://bartonstreet.com/deadends/CombinationNotes.pdf
is working, though.

testuser42 2010-11-19T09:54:41-08:00

...also, I think you are not a poor explainer ;)

Having only skimmed through the PDF, I find it fascinating.

At the same time, I guess that most geneaologists will do their basic combining of "evidence persons" into "conclusion persons" manually, while entering the data.
Only when adding an external tree would an automated combination be necessary (or at least very helpful). And even then, I suppose that geneaologists will want to "seal off" or "refuse" any resulting conclusions. So the software should offer a kind of "combination report" or preview to help people do that.

GeneJ 2010-11-19T10:01:39-08:00

Humm...

Identity is precious.

I am not a fan of automated family histories.

I am beginning to think I am a fan of the distinction between an "evidence level person" and a "conclusion level person."

This seems to me a "rationale" for creating, publishing, archiving and exchanging unproven identities.

There is a place for "unproven" links in genealogical standards, at least in the US. See the published material for the parenthetical notation that precede the "possible" or "probable" child's name in the child list in a genealogical narrative. See Henry B. Hoff, Michael J. Leclerc and Helen Schatvet Ullmann, "Jeremiah1 Rogers of Dorchester and Lancaster, Massachusetts,” _Register_ 162(January 2008), 18-22, in particular for child list entries at p. 21.

GeneJ 2010-11-19T10:03:10-08:00

Ooops... beginning to think I am NOT a fan of distinction between "evidence level ..."

ttwetmore 2010-11-19T10:22:35-08:00

Test User,

I agree. But I still have a fascination with automating part of the combination.

I went to work for Zoominfo over five years ago because I heard about what they were trying to do and immediately realized that it was an analogous "problem" to the genealogical evidence and conclusion problem. I became the software architect and implementer for all of that combination software. Over the years I implemented those combination algorithms a few different ways, as we learned a heck of a lot as we went along, and I it was a great experience. There are some fascinating problems that came up along the way. Consider, for example, problems you might get from nicknames, from people who have the same name as mega-celebtities (it basically sucks to be automatically recognized on the web if your name is Michael Jackson, just as it sucks to be a genealogist researching the name John Smith). Somewhere along the way I got wind of the "nominal record linkage" process used by historians to try to reconstruct families from church registers, and realized this was a "by hand" version of the same basic algorithms. I don't think having these automated algorithms is all that important, but it would be good to be able to suggest when a database has many records that might be the same person and help making conclusion objects from them. I think the user interface for this could be really cool, with evidence records shown as slips of paper or as index cards, that users could move around on their desktop and form group out of. I think this kind of user interface is so compelling for a genealogists that i've done some work on putting together some infrastructure for supporting it further on down the line.

Ancestry.com must be attacking this problem in a big way also. When you use Family Tree Maker, your people keep getting little green leaves attached to them. This means that behind the scenes Ancestry.com has discovered information that "it believes" might be evidence for the same person. They are amazingly accurate in their suggestions. I wouldn't be surprised if sometime soon Ancestry.com starting running some combination algorithms and we will benefit from their results.

Tom Wetmore

ttwetmore 2010-11-19T10:49:11-08:00

I am beginning to think I am NOT a fan of the distinction between an "evidence level person" and a "conclusion level person."

This statement certainly sits well with all authors of all current genealogical systems.

My only question is this. If you only want proven persons in your genealogical database, where will you keep all your evidence about persons you haven't proven yet?

You go to Ancestry.com and find a city directory entry for a person with the same name as someone you are very interested in, but you are not yet sure this record really refers to the same person. It could, but you're just not sure yet. What are you going to do with that item of evidence? I assume you want to record it in a software program somewhere and not leave it as a slip of paper on your desk.

Tom Wetmore

testuser42 2010-11-19T11:02:59-08:00

GeneJ, what do you mean with
"This seems to me a "rationale" for creating, publishing, archiving and exchanging unproven identities."
I can't seem to follow you there. Why should this be the case?

I would argue that seperating evidence and conclusion is making everything more transparent.
So that, if someone puts an unproven "conclusion person" out on the web, it will not accidentally be taken for more than it is. If there is evidence, it would be shown, with sources you can verify.

A "conclusion person" should only be taken into your data if you are satisfied with the evidence, sources and the reasoning behind it. That's why I wouldn't worry about some other people's "unproven persons". I wouldn't take other peoples GEDCOM into my trees now, either. I would look at what they've got, and decide on bits that I believe to be helpful for my research. A BG with strict seperation of evidence and conclusion would make this easier. Especially if all the evidence is backed up by things like pictures in the BG.

So, don't worry about "conclusion persons" - treat them like you would any GEDCOM files on the web now.

On the other hand, lots of "evidence persons" on the web would be a real treat. Really, every little piece of evidence that is on the web should be in a format that cites sources. The persons you can get out of this evidence are "evidence persons". A BG format that can hold sources and evidence persons would be natural for storing that. The software would let you import the evidence, and you can then decide if you believe that one of the people mentioned, is someone you have in your tree. If you want, you can link your "conclusion person" to the "evidence person" from the file you found on the web. If you include your reasoning on that link, it might be helpful for you later, or for other researchers. But if, later, you come to another conclusion, just unlink it (even better, add a note as to your new reasoning). So, no data corruption, no harm done.

If every little disconnected piece of "evidence" on the web (images of headstones, inscriptions, transcriptions, old photos...) were in a BG-kind of format, these bits and pieces might actually be much more useful for researchers!

I'm really just a user and don't know if future software will work like this at all. But I hope it would be possible with a BG to implement these things. I'm very much in the cheering section here ;-)

GeneJ 2010-11-19T11:07:05-08:00

Hiya Tom:

I do enter most "probably relevant" entries in my computer software. (Relevant includes notations that similarly named persons or families resided in an area.)

I have an "Event" that is labeled "Research note"; it has a place for date, location, memo field and citation.

You wrote, "If you only want proven persons in your genealogical database..."

I set out to prove the identity of a particular set of individuals (basically my direct ancestors' families and four generations of descent).

testuser42 2010-11-19T11:10:42-08:00

Tom -
yes, it would be VERY cool to have some kind of "suggestion" system that's really smarter than just a name comparison! And showing it in a graphical way as snippets on the screen could be very intuitive. I'm that kind of person, my mind seems to make connections easier if there is something physical to move around in front of me. Can't wait for the programs that do things like that :)
(can you code this into GRAMPS one day?)

greglamberson 2010-11-19T12:18:09-08:00

Testuser42 said, "If every little disconnected piece of "evidence" on the web (images of headstones, inscriptions, transcriptions, old photos...) were in a BG-kind of format, these bits and pieces might actually be much more useful for researchers!"

I think you've really hit on it here. Imagine there being little bits of information on the internet that is formatted specifically for genealogical use in a syntax that nearly every computer can understand (i.e., XML) pointing people to sources exactly referencing, potentially, their ancestor or relative.

WOW!

Just having your properly sourced BetterGEDCOM file ANYWHERE on the internet would be contributing to a metalibrary of genealogical information of gargantuan proportions! Truly, the implications are staggering.

greglamberson 2010-11-19T12:31:42-08:00

GeneJ said, "
I am beginning to think I am a fan of the distinction between an "evidence level person" and a "conclusion level person."

This seems to me a "rationale" for creating, publishing, archiving and exchanging unproven identities."

Let me suggest to you that all technology advances make it easier to do genealogical research badly (or not at all, relying on "automated" mashing together of information). Is this a reason to stop making technological advances?

In fact what we're talking about here is standardizing a way of adding specificity to data (or rather to metadata, the description of the actual data). How can that be bad?

I again state that while using "notes" in genealogy databases may work for individual researchers and particular cases, the practice in general represents a failure to adequately categorize genealogical data in proper fashion so that the information can faithfully be conveyed to others.

GeneJ 2010-11-19T12:45:43-08:00

How each of us interprets information available from any given source can and is wildly biased by what we know (or think we know), what we know we don't know and what we just plain don't know.

ttwetmore 2010-11-10T09:19:49-08:00

I failed to mention in my initial post the hopefully obvious point, that I am using combination example as another way to strengthen my plea that any new genealogical data model must encompass the research process by including both evidence records for people (what have been called person records, persona records, nominal records) and conclusion records for people (what have also been called person records [unfortunately] and individual records).

Tom Wetmore

dsblank 2010-11-10T09:27:21-08:00

Yes, a file format should be able to distinguish between these two types of records (if that is important to you). Of course, all file formats probably can encode this information (say through the use of an object attribute). The issue is how to create a standard method of doing so.

There may be room for a specification above the level of the file format for such type of uses. I can think of many "standard attributes" that could be defined.

ttwetmore 2010-11-10T15:22:26-08:00

Responding to dsblank...
I agree. The evidence level person/persona/nominal record can have the same basic format as the conclusion level individual record. I think that this simple fact is the basis of much of the difficulty in trying to explain the differences between evidence records and conclusion records. It is either inherent in the problem, or just an symptom of how poor an explainer I tend to be. However, I don't think they have to be the same.

I will start another thread soon where I provide a link to my DeadEnds data model. In that model I do use the same record for the evidence and conclusion concepts, but I explain how you can tell the difference.

Tom Wetmore

ttwetmore 2010-11-10T10:16:48-08:00

Please Let's Use UUIDs for Better GEDCOM's Record ID's

I wish to propose that the better Gedcom effort decide up front to use UUID's (universally unique identifiers) as the id references for every record created within the eventual better Gedcom model.

As a quick introduction a UUID is a 128-bit number usually displayed as a string of hexadecimal digits, though this is not necessary. In DeadEnds I use UUIDs encooded as a 22-character string because I like to limit space (most hexadecimal forms are 36 or more characters in length). A UUID has a VERY INTERESTING property (I really want to say a very interesting and UNIQUE property) -- every UUID created by UUID generators will be unique until the end of the universe. Some of the consequences of this include ...

Databases with records indexed by UUIDs can be merged without ever worrying about id clash.

Standard databases covering fixed applications, e.g., the Royals of England or the presidents of the U.S., can be created and provided as drop-ins that will simply add to an existing database trivially.

Every one of us could create our own personal better Gedcom record and we would know we will always be unique! What could be grander than that?

There is a big reason from the Gedcom world to make this recommendation. One of the nastiest problems with importing a Gedcom file into an existing database is resloving the id problem. Most programs that generate Gedcom files are very uncreative in how they assign id values, usually being something like I1, I2, I3, ..., F1, F2, F3, ... You can imagine the problems this causes when one imports multiple Gedcom files to the same database. And a funny thing I have discovered is that many users of databases take a very proprietary interest in their user ids. Even though these ids should be invisible to the users, some systems always show them and allow users to search using them. When people export their data to another system they expect to find their old data to show up with their old set of id's. Woe unto them. (Most people don't know that Gedcom uses the REFN tag for a user-defined unique identifier; oh, well). Certainly using a 38 character id, or even a 22-character one, will quickly break users from the habit of memorizing the ids of their "important people." Oh, how cruel it is.

Anyway, that's my plea.

Tom Wetmore

geni-george 2010-11-10T13:15:56-08:00

Let me play devil's advocate on this..

Suppose you create a profile for yourself with a UUID that has minimal/sparse information. And let's also suppose that your mother creates a profile for you with a different UUID that has a very comprehensive timeline of your life, tons of relevant facts and information...

Your mother's profile of you is probably much better for the genealogy community as a whole; so if, during a merge, her profile is declared the "master" profile, doesn't that render your original UUID to be irrelevant?

We have to work around this issue at Geni, and we have developed some great ways to do so...but for this kind of merging to be able to be shared across all sites/applications/platforms, there would have to be some kind of central, authoritative database of UUIDs and relationship data showing how they relate to other UUIDs.

For example, if my UUID is 155, and in the master database my mother's profile of me is UUID 220, and it is generally decided that her profile of me is much more authoritative, then the database would have to store the relationship of "UUID 155 has been canonicalized to UUID 220", and all interactions with UUID 155 would have to check this, and then "redirect" or update UUID 220 instead.

I hope that makes sense.

ttwetmore 2010-11-10T15:12:26-08:00

Responding to geni-george.

Good points. If there were two individual (conclusion) level persons that turned out to be the same real person, merging them would definitely make one of the UUIDs irrelevant and would remove it from the pool of UUIDs forever. I haven't thought about what problems this would cause if other databases had one or the other or both of these individuals also. It is interesting to consider. But I sure would not be in favor of an authoritative database of UUIDs to maintain relationship data. Unless, of course, FamilySearch came totally on board and were interested in providing that service. What an interesting idea though. Yes, I understand what you are saying! Thanks.
Tom Wetmore

ttwetmore 2010-11-10T15:14:15-08:00

Responding to greg ...

Yes, exactly, each record level object in the model/file, more precisely any object that needed to be referred to by any other, would have a UUID assigned to it. For all practical purposes I think this applies to all top level record concepts.

Tom Wetmore

geni-george 2010-11-10T15:26:25-08:00

@Tom

Out of curiosity, why wouldn't you be in favor of an authoritative database unless FamilySearch was the provider?

I'm almost of the opposite opinion; I wouldn't be in favor of an authoritative database unless it was completely independent and agnostic.

I don't know how these types of things work, but I would assume the operation would be run by an organization similar to ICANN.

-George

gthorud 2010-11-10T15:59:52-08:00

ttwetmore wrote: A UUID has a VERY INTERESTING property (I really want to say a very interesting and UNIQUE property) –
every UUID created by UUID generators will be unique until the end of the universe.

Well, Unique feature? There must be hundreds of such unique id schemes created by various organizations over the years :-)
For reasons of robustness long term, I think several schemes for unique id assignment could be identified as part of the id, eg.
a number or string in addition to the UUID..

My intuition tells med that a unique ids (UIDs) is a good thing – in principle. It will most likely have many applications,

I don’t think that one should prevent users from selecting objects e.g. persons by entering a short ref number – it is an efficient way of working!!!
I would not use a UUID as such a refnumber – so I don’t think there is a real conflict here, there can be several ref numbers.

One (or more?) UIDs could be assigned to many things: personas, places, sources, relations, media, collections of gedcom info (eg
a file or part thereof). Geni-george has mentioned one application, I think a similar scheme was discussed here some years ago, a historic birth identification number. Another application could be to identify the endpoints of a relation, and maintain it, between entities in different files or databases. This would allow me to e.g. establish a relation between an object in my own data and an object in a gedcom file I have received from someone. If I receive a new version of the gedcom file, I will be able to automatically re-connect these relation between the files. The relation could be something different from a family relation, eg a relation to a source.

An identifier should be accompanied by a “last change” timestamp, although I see that there may be issues related to “Does an
update change the ID”. Also, when used in a reference, the type of record may be needed.

The new standard will most likely need a general solution for global identification of identification schemes.

xvdessel 2010-11-12T08:42:28-08:00

Some reflections,

Updates may NOT change the UUID.
If you want a new UUID, you should clone the object (e.g. to separate 2 identically named but different individuals). Optionally, you could destroy the old one if it no longer makes sense.

UUIDs have many advantages:
- if you receive an updated file from someone that already gave you some data before, the UUIDs can ensure you that the right records are updated with the right information (assuming the receiving software keeps trace that his internal UUID 1234 contains matching information from imported record UUID 5678 which is now being refreshed). Without UUID, the software would need other fields to match the records, which puts limitations on which fields you can freely modify.
- Assuming the BG format supports a feature I would call "traceback source", it could allow to find redundant information when synchronizing with 3 people: If A exports to B, then B exports to C, the import for C could then mention both the UUID within A as well as the matching UUID within B (provided B confirmed the match). If C then synchronizes with A, the UUIDs can still match as they are known on both sides.

UUIDs are great for data built by individual researchers. The key concept here is that it is the researcher that creates and owns his version of the object. They are not a good idea for data that is supposed to be common knowledge. The reason Soundex was invented, was that people wrote identical items in different ways but referenced the same. Hence, a first name as such (e.g. John) should not become a UUID (although the event where a person receives that name could!). The same holds for locations. I have started a discussion around locations on another part of this forum. Users should judge whether a person in their database matches a person in someone else his database, and then log that as such. They should not have to do that for locations (or time-locations as explained there).

Feel free to join the location discussion here:

http://bettergedcom.wikispaces.com/message/view/BG+Data+Model+Discussion/29953155

Xavier

AdrianB38 2010-11-30T13:14:46-08:00

Afraid I don't agree with this premise (the use of UUIDs as record ids).

1. Feel free to have an optional UUID - obviously some people desire it, so let's the possibility.

2. Can any hardware and software platform generate a UUID? Could a Smartphone version of BG compatible software do it? An iPad? If not, it cannot be mandatory (here or anywhere).

3. I really can't see its value. What am I missing?
If I create a Person record for John Bruce, married to Annie Bruce in Scotland, with a UUID, and someone else creates another Person record for John Bruce, married to Annie Bruce in Scotland, with a UUID of its own, the UUIDs don't help me decide whether he is the same John B or not.
If I export my Person record for John Bruce, with its UUID, and eventually get a BG format file back with a John Bruce having the same UUID, what use is this to me? While I know where the Person record started (me), I still need to look at all the data that's been added because what's changed since it left me could be sheer garbage. Or sheer brilliance. Assessing that doesn't seem to be help by the UUID.

Xavier referred above to matching new data against a previous import. My view is that surely this has to be a manual process because how else can we do a sense check on the import? And what if they've done something that results in a new UUID for the same real-life person (as will happen all the time in the evidence and conclusion model)?

4. I do like Tom's comments at the start about deliberately obscuring the Ids. However, on innumerable occasions, I have needed to check when doing a merge that I've got the right John Smith - to do this, I use the Id and checking whether I've got John Smith I4472 or the wrong one, is a sight easier than checking the full length of a UUID.

So I still prefer ordinary numbers to UUIDs

greglamberson 2010-11-30T13:31:13-08:00

Please don't mix up UUIDs and personIDs (or any other identifier) assigned by a particular application. They're totally separate concepts. PersonIDs assigned by applications also have to be accommodated separately. UUIDs certainly do not replace the PersonID number an app gives.

The idea behind a UUID is that it creates a way to uniquely identify the data record that is independent of the data. Any data record handled by BetterGEDCOM is considered a separate thing under any circumstances. That's it. If an application wants to use this UUID during the import process or it already implements UUIDs for records and decides to use its number for an export, that's totally up to the developer.

dsblank 2010-11-30T14:25:12-08:00

UUIDs are necessary for a variety of functions. But as Greg says, don't confuse it with the PersonID. In Gramps, the PersonID (eg, "I0034") can be changed by the user. In fact, we don't even require that they be unique in a single database. The UUIDs are not for humans, and should not be optional.

In Gramps, we are planning a use of UUIDs that is similar to the distinction made here regarding conclusion and hypothesis people. If you find a UUID in a different database that is the same person as someone in your own, then you want to be able to keep the association between UUIDs over time. There may be a whole set of UUIDs that map to the same person.

BTW, any smartphone or calculator can create a UUID.

AdrianB38 2010-12-01T13:31:00-08:00

If any known platform can use UUIDs, then that's better than I feared.

So long as we can have a user-friendly key that I can read in a merge or whatever to tell which John Smith I'm dealing with, then that's fine by me.

So use UUID as an internal "record" key by all means, but also make provision for a simple, readable id number that can be used to assist the user to tell the difference between the different John Smiths in the file. That readable id then needs to have space allocated for it in the data model for 2 reasons -
1. To be able to round trip stuff out of your BG compatible program and back in (e.g. for back up or whatever)
2. Because some of us use software that uses GEDCOM as its native file format and it's logical that the next generation of those apps will use BG as a native file format.

(Having said that, are user defined fields in properly constructed XML easier to add if one has the right DTD or whatever?? That's where my knowledge fails)

hrworth 2010-12-01T17:09:37-08:00

Adrian,

What is a "User-Friendly key"? What does the User have to know?

The "platform" needs to, but I don't think the End User does.

Russ

AdrianB38 2010-12-02T14:40:57-08:00

"What is a 'User-Friendly key'?"
Something like P1234 where "P" means Person and 1234 is a simple number, roughly between 1 and the number of people who've ever been in the database

In contrast to the UUID which is a 24(?) digit number so somewhat tricky to read and check.

"What does the User have to know?"
Depends on how the application programmer wants to design it. Potentially, the application programmer ignores the user-friendly key and so the user doesn't use it.

Personally, if I were the application programmer, I'd stick the user-friendly key into the files and software so that the first Person that the User inputs, automatically gets a sequence number P0001. The 2nd gets P0002. All automatic. Display the user-friendly key against each person but do not allow the user to update it.

The user-friendly key comes into its own when the user realises that this John Smith here is the same as that John Smith there and wants to merge the 2 people into just one (assuming the application programmer has coded this). I'd make a note from a printed report (say) that the first John was P4472 and the 2nd was P4498 - then when I select the 2 John Smiths for merge, I'd double check those keys to ensure I've got the right two. Otherwise I'd be looking (say) for the John Smith that was born in 1820 and for the one born in 1823 with parents James and Mary - and if I try to do that, inevitably I get things wrong or find that there were two in 1820 or....

This is one possible way that a user-friendly key could be made visible to the user and could be used. If it's not visible, then there's actually no need for it, but I think it can have its uses in trying to easily distinguish which John Smith (or whatever) the user is dealing with.

GeneJ 2010-11-10T10:38:28-08:00

Not a techie, but I love this idea!

In SecondLife(R), UUID's are assigned to all things brought in or created in-world.

greglamberson 2010-11-10T12:36:35-08:00

Just to be clear, you mean the UUID should be applied to the record for each person/individual/nominal record/whatever we call it, right?

I think this is what you're saying, but this isone of those things that's right on the edge of easily understood and highly technical, so it's always best to clarify.

fisharebest 2010-11-11T03:09:24-08:00

Questions, questions, questions ...

To what extent with the file format be dictated by the data model used by the application? You might have one application that uses the GEDCOM concept of INDI. You might have another that has separate concepts of "person" and "individual" (persona, person-fragment, individual-record or whatever you call it). Should the file format attempt to include both structures, or should applications change their data model to fit?

Genealogy files/databases can be large, and XML is not a compact data format. Many XML tools require that the entire file is loaded into memory. What is the rationale behind requiring XML? Do we want files that can be archived (compact) searched (indexed), processed sequentially (objects occur before references to them), etc.? Were other formats considered?

There's a lot of GEDCOM already out there. The project would have a much better chance of success if the goals included conversion tools GEDCOM=>BG and BG=>GEDCOM.

Has any thought been given to linked/distributed family trees? Traditionally, genealogist want everything on their own computer, in their own file. Today we live in a networked, connected world. Does this monolithic approach still hold?

As well as traditional genealogy (family trees, one-name studies, etc.) should/could BG be used for storing the raw data/historical records. e.g. a parish register (baptisms, marriages) could be stored as a series of "mini" trees containing three/two individuals, instead of the traditional tabular format? Combined with UUIDs and linked/distributed trees, this could be a tremendously powerful way of searching/linking/connecting with other research.

In some other threads, the idea of transferring data using APIs was suggested, and dismissed. If the purpose of BG is simply an archive format, then fair enough. But if the purpose is to link/transfer data through distributed systems, then why not?

Greg Roach

fisharebest 2010-11-11T06:11:38-08:00

<<Converting BG to GEDCOM, however, is another matter.>>

...presumably because it will use concepts (such as person-fragments, assertions, etc.) that don't exist in GEDCOM *or other existing applications*.

This is the point I was trying to make in my first paragraph - about the data format/structure driving the application that uses it.

If the data is in a structure that cannot map to the structure used by an existing application, then you are not going to be able to add a "export to BG/import from BG" option to it.

Therefore, can we infer that this is a file format that will only be used by a future generation of applications?

Perhaps I could ask the question another way. Is this file format aimed at the existing mass market applications, such as FamilyTreeMaker, or is it aimed at "professional applications" for "serious genealogists"?

<<BG will use XML ... it is one of the few points that have been decided upon for this project>>

I didn't mean to be argumentative. I was just asking. I have only just found this site, and wasn't part of any initial discussion. All I know about BG is what is what I read on this wiki/forum (and yes, I *did* read all of it before I posted!).

My experience of XML is that large files, with lots of internal references (i.e. a typical genealogy database), tend to require a great deal of cpu/memory to process. It was a valid question to ask - and relying on Moore's Law is an acceptable answer ;-)

<<why would anyone want to essentially lose information by converting data in the new BG format down to GEDCOM?>>

I am no fan of GEDCOM. I abandoned it in favour of a relational database some time ago. I have a few stored procedures that do *exactly* this type of lossy export down to GEDCOM. Why? Because there are many great tools - especially for displaying data (charts, reports, web publishing), that require it.

Until every tool/product/applications supports BG, you will always want to be able to "downgrade" to GEDCOM.

Greg

greglamberson 2010-11-11T06:50:02-08:00

BG is meant to serve the same purposes of GEDCOM initially, only better, with plans to move far beyond this goal in the future. BG is meant to be a universal standard developed in a practical manner to foster adoption by every possible genealogical application without pressing existing applications to completely change their data model or other such radical intentions.

A great deal of the issues you bring up are issues for individual software developers of individual applications. We will have no control over a great deal of these issues.
However, we exist to involve the community, build support, develop a usable standard, codify the standard with appropriate standards bodies and afterward foster its adoption through a variety of methods.

Regarding to XML parsers and the required use of memory, these are implementation issues beyond the scope of the project. However, structurally, there is no real difference between GEDCOM 5.5 spec Chapter 1 (a.k.a. GEDCOM SYNTAX) and XML. I cannot speak to it beyond that.

Let me suggest to you this idea: Many of your questions seek answers. I would posit that your purpose here is to provide those answers, not to receive them. Some of these basic questions are pretty obvious no-brainers, but some are issues which I hope you will jump in and help answer.

Regarding downgrading to GEDCOM, every application I know of currently allows GEDCOM import. There would be no reason for existing applications to abandon this option. Thus I can't see how downgrading from BG to GEDCOM would be necessary. However, as I said, even with some data loss or ambiguation, I'm sure someone somewhere will tackle it out of sheer academic interest if nothing else.

Greg

hrworth 2010-11-11T07:19:56-08:00

fisharebest,

Just a couple of comments on your earlier message.

You ARE part of the initial discussion. It started 2 days ago. Yes, a couple of us have talked about it, but this is the beginning of something new or more importantly Improved.

From a User point of view, and that is what I am, Nothing has been dismissed nor accepted.

This 'project' started because genealogy software users want to share their information with other researchers.

I leave it to the technologist joining this community to figure our HOW to allow me to share my research. The development folks of software programs, hopefully, will be joining this community. Those developers should include personal computers programs and web based applications.

Thank you for joining us and look forward to your participation.

Russ

Andy_Hatchett 2010-11-11T09:26:14-08:00

My personal opinion on BG to GEDCOM is- absolutely Not!

The sooner that GEDCOM can be relegated to a place of NO importance to the genealogical community the better off said community will be.

I long for the day when some software developer produces a genealogical program that can directly import (by whatever means)from other major programs and doesn't even offer GEDCOM import. That will be a day true significance.

GeneJ 2010-11-11T09:29:23-08:00

Not a techie here, but I favor only requiring the genealogical software be able to "round-trip" the export file it creates.

GeneJ 2010-11-11T09:29:24-08:00

Not a techie here, but I favor only requiring the genealogical software be able to "round-trip" the export file it creates.

geni-george 2010-11-11T11:49:33-08:00

@Greg - when did people start making decisions already? I thought this was blank slate. I'm not opposed to XML but there are a lot of really bright engineers involved with genealogy, so it might be worth giving them some time to contribute before making a decision like this (which, potentially, could have implications).

greglamberson 2010-11-11T12:34:37-08:00

geni-george: What decisions are you referring to? It's hard to answer your question without knowing what you're referring to. XML? IS that it? I frankly don't think there's any reason to pretend to debate about the use of XML. We're using XML. If there is significant groundswell around something else, then great. I'm not the absolute boss by any stretch of the imagination, but I'm also interested in moving the process along. When I hear even one serious comment about the use of some other data syntax besides XML,l I'll gladly listen.
Besides, if someone has another data syntax they wanted to use, it's extremely easy to convert from XML. One of GEDCOM's biggest problems is its proprietary syntax. Had GEDCOM made it to an XML data syntax, I seriously doubt whether all development on it would have stopped.

kiwi3685 2010-11-11T12:35:46-08:00

geni-george

You asked "when did people start making decisions already? ". The answer is - further back in this discussion greglamberson clearly stated (I presume on behalf of BG) "However, BG will use XML. It's not the answer to everyone's problems, but it's a pretty great tool, and use of it is one of the few points that have been decided upon for this project."

geni-george 2010-11-11T12:47:31-08:00

Sorry guys, I wasn't familiar with how the wiki works. I have since found the discussions on other pages. It seemed like the decision was coming out of left field.

I'm going to agree with the GRAMPS guy, it's really hard to follow everything in this style. You may want to consider alternatives, because the sheer amount of emails that I receive (when I'm not even following everything) is overwhelming, and I'm used to doing as many emails per day as anyone.

gthorud 2010-11-11T14:55:57-08:00

I could not agree more. After two days, this wiki is on the fast track to chaos. Consider eg. the subject of this topic.

greglamberson 2010-11-11T21:08:26-08:00

Give it time. Frankly I think things are moving along extremely well.

A wiki, particularly one with a topic of such interest to quite a number of people, is pretty much supposed to be chaotic, especially during its second day.

If the email notifications are too much for you, remove them. Just check back periodically as you like. Things will also calm down a bit over time. The most important thing right now is for people to participate.

While it may take a few days before we can get to properly modifying the main pages to reflect the onslaught of discussion comments, there has already been very significant progress on several fronts.

The discussions taking place are really, really useful in several discussion threads. Others are more general or even repetitive. However, a wiki is all about participation, and the more people come here and actually add comments the better. Of course it's going to be chaotic in the beginning. We've had over 2,000 visitors with over 12,000 page views in barely over 2 days!

greglamberson 2010-11-11T03:52:54-08:00

Fisharebest,

I'm not sure what you mean exactly by the terms you're using. I consider "file format" to be analogous to the GEDCOM file format as described in Chapter 1 of the GEDCOM 5.5 standard. This portion of that standard corresponds pretty much exactly to the function that XML would provide. It has nothing to do with the actual genealogical information inside.
As a practical matter, the data model adopted was to map to data as organized in the software packages that exist today. We also want to accommodate better genealogical methodology and practice, and one of the key questions is can both of these goals be achieved at once or is an incremental approach required.
Regarding XML: XML data syntax is almost exactly like GEDCOM's data syntax. The differences are:
1. No one else uses the GEDCOM syntax;
2. There are tons of tools that use XML that would be of great benefit to software developers were they to adopt an XML-based data syntax underlying genealogical data import and export; and
3. There are many already developed and easily usable extensions to XML (e.g., GML or Geography Markup Language) that could be adopted for use in plug-and-play fashion, which would be a huge boon to developers, and by extension, genealogy program users.

Is XML perfect? Nope. However, BG will use XML. It's not the answer to everyone's problems, but it's a pretty great tool, and use of it is one of the few points that have been decided upon for this project.

GEDCOM will be easily converted to the new BG format. No one here is committing to developing these tools, but such tools already exist for several of the already-existent XML-based genealogy formats out there. This will be no problem.

Converting BG to GEDCOM, however, is another matter. There are many features that will be in the new BG standard that simply aren't available in GEDCOM. Some developer with insomnia will probably tackle this project at some point, but frankly, if we develop a standard that is widely adopted and use as is the goal, why would anyone want to essentially lose information by converting data in the new BG format down to GEDCOM?

Regarding your other questions, I would ask you to simply read some things around here, as we have very clear plans regarding communications and other aspects that I'm simply not going to retype here.

joehcole 2010-11-11T13:51:20-08:00

Using sources as building blocks

I've been thinking about the questions raised by Tom Whetmore and others about evidence / conclusion based approaches to the model. It struck to me that the fundamental building block of geneology is the source, and that we spend a lot of time looking at a single source and then extrapolating a number of different 'facts' that go into different places in our geneology software. Therefore, would it make sense to build the model using the source as the basic entity?

Playing around with this idea I created an xml document on this basis, which describes the data extrapolated from a single marriage certificate. It's fairly rough and ready. xml is here - and a copy of the certificate is here

The basic structure is...

><source>
>><event>
>>><role>

...so we end up with a series of events as sub-elements of the main element, which is the source itself. Individuals appear within roles, which are sub-elements of each event.

So, in our marriage certificate example we have:

*A few tags describing the source - i.e. the name, type, repository, dates of the event and of the document

*A list of events we can derive from the source, containing only the information that is contained within the source itself - i.e. the marriage, the births of the protagonists, occupations, names and so on.

*Individuals appear only as ID's - everything else about them is contained in an event (i.e. including names). When I created my example I presumed that the 'personid' that links an individual to a role would be unique to the whole file, so we could indicate that someone appearing in one source was the same person who appeared in another. However, if we made the 'personid' unique only to the source, we could model the data without requiring any 'conclusions' at all. It would mean that a chunk of data from an online source could actually be stored using the same format, making it easy to collect and drop records into our project.

However, there could be a place for a separate Person entity which contained attributes that allowed us to link to external ID's, such as an individual on a tree on ancestry.com or geni or something.

If we had a record of this type for every source we used in our project we would be able to describe it in its entirity.

It would be still be possible for software to parse the file and build up a list of individuals and their events and relationships even though the model is structured around the source. What we would have would be multiple records for a single event, one for each source that refers to it (i.e. my example contains birth events for the protagonists in the marriage; our database could well contain similar birth records extracted from parish records, censuses and so on.)

The question is, can we still build a 'conclusion' based tree on this basis? It would probably need attributes that ranked the reliability of each 'fact' within an event, or a way to mark an event from a particular source as 'Primary'. Or would all this require too much processing within genealogical software?

xvdessel 2010-11-15T04:17:15-08:00

Russ,

Remember I asked you whether BG should serve exchange between researchers as well as transfer from one software to another. I think you said that BG should support both.

Now, if one day software DOES store the complete process of sources, analysis, decision and conclusion, then BG should be capable of supporting it for the sake of data transfer when you decide to change from one software to another. That does not mean BG will require each software vendor to store the complete process: the source/deduction/conclusion model should be an optional component of a BG solution. Also, software should offer the end user the choice of the data to export: process data or only conclusion data.

Xavier

xvdessel 2010-11-15T04:44:44-08:00

Joe,

I'm not sure whether your model allow for an iterative decision approach, which I think is sometimes needed. I will try to explain what I mean by this.

Assume a set A of birth records, all with same names for father and mother, and all in one location. A fist deduction step would then be to combine these, i.e. unify the father in each of these birth records, as well as the mother. each birth record also lists a godfather and a godmother.

Now you build a list B of other birth records in an earlier period (same family name as the father of list A). These ones again have similar father & mother names, so again you make a decision to combine that family. At this point, the 2 families (one from list A and one from list B) are yet unrelated (except the family name matches, but that does not prove anything).

Lets say you find a marriage which you can relate to one of the children born in list B. He marries to a lady named X1. Decision 3 is taken.

Now comes the point where you want to prove that the father of list A is one of the children in list B. And this decision is a complex one, as it builds on multiple name matches (first names and family names), including lady X1 who appears as godmother (and thus we count on decision 3), but also on location and plausible dates/ages. This is decision 4. Depending on how strong the matching elements are and the plausibility, the researcher can then rate the confidence level of that decision.
Sometimes, missing information can also be a valid source! In the above case, it would strengthen conclusion 4 if there are no other births for a similarly named baby within a similar timeframe and in the locations in that area. The fact that the researcher searched for it, and did not find anything, is a source element as well.

A good model should be able to store all this. Note that decision 4 is NOT related to a source as such, but rather to other decisions (1, 2 and 3) and the conclusions they already provide.
Even further: if later on, the researcher (or a colleague who received that data) should be able to decline decisions 1, 2 and/or 3, whereby the system should flag decision 4 as an impacted decision: it requires a new examination as at least one of its pillars has been removed. Maybe decision 4 can still hold, but with less confidence.

Xavier

hrworth 2010-11-15T04:54:53-08:00

Xavier,

OK, but I don't think we are trying to define how the genealogy application works.

The User should be able to define What is 'exported' / sent and was is received and what to do with what is received.

I should be able to define Who, in my application, What information about that "Who" is sent.

I should be able to define what to do what that information when I receive it, create a new file, bring it into my existing file.

If there is any information that is 'dropped' / not available for inclusion into the new or existing file, I should be notified with enough information that I can find what was dropped.

Sorry, don't know what "conclusion data" really mean.

If you mean that a genealogy program allows the user to draw a conclusion after evaluating the sources and has records to reflect that evaluation and conclusion, then you bet, it needs to be sent along.

Now, if my application 'doesn't know what to do with that information, because my application hasn't gotten there yet, I should be notified that the "sender" had Conclusion information on these Sources, and have dropped that information. But, if you, the User wanted to see this conclusion information open this file and see the information that was not included ....

Russ

GeneJ 2010-11-15T08:14:22-08:00

A lot to chew on!

I'm hoping BetterGEDCOM will require programs be able to "round-trip" the user's data. Said another way, if a user chooses "full output," then that "export" should be flawlessly re-imported by the same user to the same-versioned software.

Now then, if another person imports the "full output," shouldn't ALL the sources in that file become "source of the source."

Said another way, if I import information based on sources Russ worked with, when I import a BetterGEDCOM of his file, shouldn't the preferred import mechanism show Russ' BetterGEDCOM as the source and cite his source as my "source of the source?"

GeneJ 2010-11-15T08:18:57-08:00

Err... "shouldn't ALL the sources that were ported into HIS BetterGEDCOM be become the "source of the source" to me? ... Said another way..."

testuser42 2010-11-15T09:41:21-08:00

GeneJ, yes, I guess that should be the case. But that's easy, isn't it? Just put the whole imported BG in a new <source> tag.

hrworth 2010-11-15T09:51:00-08:00

GeneJ,

You bet. In your software, you should see all of the Facts / Events that I provided, with my as the source of that data, AND you should see where I received my information from.

That does two things: 1) evaluate my data, and 2) be able to see where I received my information from. Bouncing these against your own data, allow you do evaluate the combination of data.

It might provide you with places to search that you haven't searched. See what I saw.

Russ

hrworth 2010-11-15T09:54:25-08:00

testuser42,

In addition to putting a BetterGEDCOM source (information about my file, who I am, contact information, and date of import), it must not change or deleted the Source-Citation on the data elements that are in my file.

Russ

testuser42 2010-11-16T06:17:34-08:00

Russ, you're absolutely right.
Maybe it could be like this:
If I import a BG from someone else, it gets put into my BG-Container exactly as it is. In my BG-XML there's only a link to this source, just like there would be a link to a JPG-source.
I'm no programmer, so I don't know how difficult it would be for a program to follow that link and then parse the "secondary" BG.
But a system like that would make it easy to seperate the work of others from my own. The imported BG would never ever be changed by the program. Everything I would like to take from it could be copied to the main BG (with proper sources and citations), or just referenced by links. If I want to amend the info in the source, this will only be stored in my BG.

Another thing I could imagine would be to allow the researcher to use a PGP-signature (or similar) to prove the integrity and origin of the BG. If you are a well respected professional, and anybody with a text editor could change your carefully assembled data without a trace, it might be annoying.
A proper geneaology program should keep the integrity of the sources, and add proper information about who added what. But if a PGP-signature is part of the source, you'd be able to spot if there have been "undocumented" changes or accidental data corruption.

What do you think? Does this make sense?

GeneJ 2010-11-16T08:42:15-08:00

I'm not technically inclined enough to speak to the PGP-signature option.

Expect the person creating a BetterGEDCOM should be presented with options about how the evidence should be presented* and how tightly they'd like their "conclusions" linked to their "evidence" array.

The latter options might range from "loosey-goosie" to "standards please," to "read only."

*ala, BetterGEDCOM is the source and the underlying file citations are the "source of the source."

GeneJ 2010-11-16T08:54:16-08:00

As to third part IMPORT level BetterGEDCOM exchanges, see Mills, _Evidence Explained_, [1st ed] (electronic)(2007), p. 156 for "3.44 Research Files & Reports, Personal File Copy" for a discussion and series of examples.

Note the "Source List Entry" example and the TWO citation examples (latter as First Reference Note no. 2 and no 3.)

Where there is no underlying source in the GEDCOM, Mills cites the GEDCOM ("Kincaid GEDCOM file"), specifies referenced individual by name and GEDCOM reference number, ("Lois Kincaid, no 1234"), includes what I assume is the "event" reference ("Biographical Sketch") and then adds this language, "with no citation of source."

When a "source of the source" exists, as is the case of the second "First Reference Note" example, ("Biographical sketch of John Kincaid, no 321"), Mills incorporates that reference in her citation example. She writes, "citing 'Resignations of militia officers, November 1832–January 1833 General Assembly Session, North Carolina State Archives.'”

I would expect the mechanics of BetterGEDCOM to enable software certified as compliant be just that smart!!

--from the cheering section, GJ

GeneJ 2010-11-16T08:54:17-08:00

joehcole 2010-11-12T14:26:36-08:00

Kind of - you're right that we would want to capture the different sources for a single event, and our evaluation of them. But when it comes to the data modelling, I'm approaching it from the other direction, in that the events are recorded against the source rather than against an individual.

Thus In my marriage certificate example, we have:

<source> The marriage certificate
- <event> THe marriage of the bride and groom
- <event> The birth of the groom
- <event> The name of the groom
- <event> The previous marriage of the groom (as he is a widower)
- <event> The occupation of the groom
- <event> The residence of the groom
- <event> The birth of the bride
- <event> The name of the bride
- <event> THe occupation of the bride
- <event> The residence of the bride
- <event> The birth of the bride's father
- <event> The name of the bride's father
- <event> The occupation of the bride's father
- <event> The birth of the curate
- <event> The name of the curate
- <event> The occupation of the curate
- <event> The name of the curate
- <event> The birth of the witness
- <event> The name of the witness

We could attach a different rank to each of those events, or to each separate attribute of the event itself.

By doing it this way round we can model the exact range of evidence that a single source gives us. It gives the source primacy in the process, and reflects practical tasks of genealogy, which is to take a source and work out what information we can extract from it. I think if we're taking an evidential approach to the model then we need to capture the fact that our marriage certificate source tells us that the groom was born before 1839, for example, while the birth certificate source gives us the exact date. I don't think the <person> - <event> - <source> model is so suited to doing that.

hrworth 2010-11-12T21:14:19-08:00

Joe,

How do you enter that data from the Source to Event to the Individual?

If I have a Marriage Certificate in my hand, the Source, I would identify the first person to enter information on and go to that persons record. I would then add a New Event, add another Event to that person, along with the Source-Citation, or add a Source-Citation to an existing Event.

I would then repeat this process for each person listed on the Certificate.

If I don't do that, going to the person first, I may end up with a Source listed to an event, on an incorrect person.

I don't know about you, but I have many people is certain families with the same name. Which one do I choose working from the Source creating events associated to an individual.

I am talking about data entry. Clearly I Start with the source, but my data entry is from the Person to Event and linking to the Source.

Russ

GeneJ 2010-11-13T00:01:31-08:00

Take the example of a person's date of birth. I might have 6 sources for that date. If all those sources report the same date, I wouldn't want six identical "birth events" listed in that person's profile.

I wonder/worry how that approach would actually translate when it came time to run a narrative or biography. How would I get all my citations to report if they are not all associated with a single event?

I do want to be able to retrieve source information by event and the related citation.

Hope this helps. --GeneJ

SueYA 2010-11-13T00:47:58-08:00

Both Joe and Russ have some good points. I am not sure your proposals separate evidence from conclusions.

Starting data entry with the source is a good idea. I would want to create a transcript of the document, record citation and provenance data, create a scanned image etc. The structure of the source needs to be modelled to accomodate all the information it contains.

Having captured all the information in the source, I then would link bits of it to persons and events. These are conclusions.

Taking the marriage certificate example, the following would be recorded in the source:
name of the groom
age of the groom
status of the groom
Occupation of groom
name of the groom's father
occupation of groom's father

And the following might be linked to the source elements:
For person A (the groom):
Name
date of birth - age of groom
occupation - occupation of groom

For person B (the groom's father)
Name - name of groom's father
occupation - occupation of groom's father

For person C (the groom's previous deceased wife)
date of death (before the marriage date) - groom's status (widower)

Persons A, B cnd C and thier associated events are separate pieces of data from the source data. The links between source and conclusions are more pieces of data.

Sue

hrworth 2010-11-13T04:48:21-08:00

GeneJ,

"Take the example of a person's date of birth. I might have 6 sources for that date. If all those sources report the same date, I wouldn't want six identical "birth events" listed in that person's profile. "

Just want to clarify this a little, at least in my mind.

First, might there be a couple of attributes to a Birth Event? I am thinking the Date and Location of the event.

In doing research, I might find both of those attributes in a couple of Sources, but I might find the Year of birth in another source, a location of that event in another.

For the birth year, I want to enter, just the birth year and it's source. Same for birth location. For the complete birth information each source linked to that same birth event.

What I think you are getting to, is in the presentation to the user, on that birth event with the multiple sources. In the presentation, I would not want to see the identical fields on my display, one display entry would be acceptable, but that one Event entry would have multiple source entries linked to that event.

Birth Event (date / location) Source 1, Source 2, Source 3, etc.

For the BetterGEDCOM each of these entries, one at a time, should be into the string of information that is shared between Users. At the receiving end it might be displayed as

Birth Event, Source 1, Source 2, Source 3 as above.

All I am trying to say, is what is displayed in our software may be transported to the other user differently.

Not sure that makes sense, but ...

One Users opinion.

Russ

GeneJ 2010-11-13T08:14:14-08:00

I believe we are talking about the same oranges, yet.

I currently enter "birth events" and "alternate birth events" in my genealogical software.

I'd LIKE to see the ability to associate citations to the date "attribute" or to the "location" attribute, or to the complete "attribute" (date and location), but I'm not able to do that in good fashion with my current software.

testuser42 2010-11-13T11:26:11-08:00

I've not much to add here. Only that I really think starting with sources is the only sensible way of doing family research.

I think modern genealogical software should motivate their users to work this way.
As far as I can see, many people start out with research in a very unorganized way, and some get carried away with conclusions (Oh, here's a very similar family name, 150 years before my last record -- I'm sure it's a relation, I'll put it down as a great grandfather...) An entry mask for a "source" should be the first thing that shows up on the screen, IMHO. How this method of data entry is reflected in the file-format, I don't know. But I believe this project here is already up to a good start!

hrworth 2010-11-14T06:50:17-08:00

GeneJ,

I thought about what you would like and I understand what you are saying.

I enter into my software "what I see" from a specific Source based on the attributes / properties if the Event / Fact. If the Event or Fact has three attributes, date, location, description, I record that. Reason: I want you to be able to go to that source citation and see what I saw and what I recorded.

I am not sure that I would want to look at each attribute of that event to view the source of that piece of information.

Having said that, clearly the software should allow for that.

The Transport of that information, between you and I, in the BetterGEDCOM, should be, in my opinion, the Data Elements associated with a Source-Citation. Let the software we use, display what is received, or combine them the way you are proposing.

Actually, after thinking about this topic, the Evaluation of the Evidence for a specific Event might be easier when presented the way you are taking about.

For example: If there are 3 Birth Dates, from three Sources, presented in the Birth Date field, I could evaluate the source and select the best or conclude that one of the three birth dates is correct. Said differently, help draw a conclusion for that birth date.

One user's opinion.

Thank you,

Russ

GeneJ 2010-11-14T09:00:49-08:00

Yes!

joehcole 2010-11-14T15:29:30-08:00

I agree with testuser42 and Sue about the limitations of current genealogy software. I guess all the money and focus is going into the web-based stuff, as the desktop market has hardly improved in the last 10 years despite the explosion of interest in the subject.

And I think we're all agreed that we need to be able to cite sources at the level of individual attributes (dates, locations etc.) and not just events as a whole.

Russ, I think you get to the nub of it when you say:

"If I have a Marriage Certificate in my hand, the Source, I would identify the first person to enter information on and go to that persons record. I would then add a New Event, add another Event to that person, along with the Source-Citation, or add a Source-Citation to an existing Event. Then I would repeat the process for each person listed on the certificate."

Does this not strike you as a very awkward way of getting the data out of your source? The software has not been written to match the workflow of the task involved. It's as if the first person writing a genealogy application looked at the standard type of Personnel or CRM database that you might find in industry and figured that it ought to look like that. And then everyone else copied it. In that arena it makes sense to put the data-entry process into the individual record as each bit of information will tend to concern one just that individual, and there is no debate as to who that individual is. Genealogical sources are not like that.

The ideal genealogical software interface that I picture in my mind has data entry templates for all the usual source types, that will allow me to type in the full range of data from the source in a pain-free manner. Once I've done that, it will automatically create the different events that are derived from the source, and group them by each role (i.e. all the groom's events, then all the bride's events, then the bride's father and so on. Next to each event, there would be a slider with which I could drag up and down to rank the 'surety' or reliability of the source. They would all default to sensible positions based on the source - so the marriage event from a marriage certificate would default to a higher surety that a date or birth that is derived from the age of the bride or groom.

On the right hand side of the screen I can pull up a list or a mini-tree of individuals who I have previously identified, which I can drag onto each of the roles, thereby associating that role to an individual.

Only then would I go to the 'person' screen, which would list all the events I had associated with that individual. GeneJ - you say...

"Take the example of a person's date of birth. I might have 6 sources for that date. If all those sources report the same date, I wouldn't want six identical "birth events" listed in that person's profile."

...well, actually I would want to see all those events under some circumstances, even if they were all telling me the same thing (which is evidence in itself). My ideal interface would allow me to sort the birth events from all the different sources into order of reliability (i.e. the most likely at the top). I'd want to be able to do the same thing with each separate attribute, so I could choose the date I thought was most reliable from one source, and the location from another, if necessary.

Only then would I click a button to hide all everything but the most reliable event - the familiar 'conclusion' based view.

So, what is the significance of all this for the better gedcom data model? Well, I guess I'm arguing for a data model that follows the same logical pattern as my idea software package, which follows the same workflow that we use when processing a source, which is:

1. Take the source

2. Establish the various events that we can deduce from it (and I mean events in the broadest sense, i.e. including personal attributes like name or appearance)

3. Make a judgement on how reliable the source is for each event

4. Decide whether the source is referring to one of the individuals on our tree

Mirroring this in the data model we have:

1. <source>

2. <events>

<standard attributes> (like role, date, location etc)

3. <further attributes> (for evaluating the source, i.e. ranking)

4. <individuals> (attached to roles)

I feel the model needs to be engineered with firm foundations. So you're starting with the elements of which you have the most certainty (the existence of the source itself, and then the events derived from the source) through to the elements with the least certainty (the evaluation of the reliability of the elements of the source, and the association with a particular individual). Sue, I'd argue that this does separate evidence from conclusion - steps 1 and 2 are evidence, 3 and 4 are conclusions.

It also means that it is easy to strip off the outer elements - the evaluation, and the association with individuals - and still be left with useful data. So it would be very suited to transferring pure source data, such as records from websites, transcriptions and so on. The key thing is that the model is free from the 'tyranny of the individual', because the individual is the last sub-element rather than the master element. You don't have to create individuals just to transfer event data.

I would also envisage that any 'conclusion' data - i.e. the evaluation of the source's reliability for a particular attribute, or the linking of a role in an event to a known individual - would be 'tagged' to identify the researcher. Thus we could separate our own conclusions from those of other people. And we could link to individuals in any number of ways - a person in your own tree, or a person in a public tree on a website and so on.

Finally, I see no reason why this format wouldn't still work with a poorly sourced tree with few citations - you could just wrap all non-sourced events in a source called 'no source' or something. Pretty much all the attributes, whether 'standard' or 'further' would be optional, so a file could be created even with partially complete information.

Joe

hrworth 2010-11-14T16:56:23-08:00

Joe,

To me, what you described in not the Transport of the Information from One User to another User, which is what I thought / think the BetterGEDCOM is about.

What you described is in the Data Entry / Software that I would use. And, as you point out, most software we use today, doesn't work that way.

Russ

joehcole 2010-11-15T02:39:16-08:00

Russ, the point I was trying to make is that we should not be limited in our conception of the data model by the limitations of current software. But this model will work with current software too, and user to user interactions, as it will just store and transfer what data is available.

Say you record a birth event for someone, and cite two sources. My model would extract this as two sources, each with a birth event that is then linked to the individual. Reading it back in to a current software product, you'd just get your single event with two sources again. But when more advanced software does come along, it could take that same file and create multiple events from it.

hrworth 2010-11-11T16:26:25-08:00

Joe,

Are you suggesting one source for an event would help drive a conclusion?

There are genealogical definitions for a Primary Source. A single Source, might have different roles that help document an event.

Take your marriage certificate example. Would the birth date for the bride or groom be a primary or a secondary source?

I might suggest that the attributes defining the source and not the fact. In the evaluation process, I would look to the attributes of the Source to head toward a conclusion.

Just a few random thoughts on this.

Thank you,

Russ

joehcole 2010-11-12T02:59:18-08:00

Sorry, I was being a bit free and easy with the word 'primary'. I was talking about a way of indicating that, for example, the birth date that appears on a birth certificate source is the one I currently view as 'true' rather than the one that appears on the marriage certificate.

I would envisage that any entity within the source could have attributes that define its evaluation. In the marriage certificate example, the groom's birth event could be given a lower rank or 'surety'. Equally we could add these attributes just to the birth date entity within the birth event, so we could indicate that the fact of the groom's birth was high surety but the date was low surety.

Of course, there would be no complusion for the importing system to adhere to the surety data in the file. You may wish to discard it, and evaluate each 'fact' separately. Or there could be an algorythmn in the software that could evaluate each source for us as a starting point (i.e. automatically conclude that a birth event from a birth certificate was of higher surety than that from a marriage certificate).

hrworth 2010-11-12T06:20:46-08:00

Joe,

Just to make sure we are on the same page.

The Groom would have One Event, Birth as an example.

The Birth Certificate would be a Source for that Event. A Marriage Record would also be a Source for that event. The Birth Certificate, during my evaluation of the Birth Event, would have a higher rank, and the Marriage Record a lower rank. If they both were the same date, I might reach the conclusion that that was the date of the Birth Event.

Adding the Place of that Event, the same process would happen.

Clearly, the Options on what to import is a requirement.

I am not sure that I want the Software to Evaluate the information. But, you are correct, the software needs to have a platform so that I can evaluate the Event based on the Source of the information.

One Users opinion.

Russ

Robin_L 2010-11-11T15:20:11-08:00

Exchanging data without loss?

I feel that we need carve this project into discussion areas. The evidence-conclusion data model is critical, but so is the glue that ensures the same interpretation is made in the receiving environment as the source environment.

Some areas that cause this to become unravelled are
- the representation of incomplete dates and the results of comparison of incomplete dates.
- the difficulty of representing and comparing places in time, by culture, by proximity and by aliases.
- the handling of cultural differences for naming the _same_ persona
- responding to explicit exceptional use of existing family history software to record history of non-human personas, eg the not uncommon use of pseudo-persons in The Master Genealogist for animal breeding, buildings, and social organisations, etc that have links to persons.

These are but examples of where the objects in the model, their properties and the assertion mechanism must be adequate to achieve the goal of effective data interchange. The semantics of data value comparisons under assertion tree is critical.

xvdessel 2010-11-12T10:22:00-08:00

Robin,

some ideas about the issues you raise:

- dates. Solid point. One solution that is probably followed by several software tools is a dual storage of each date. Once in a free text field, where you can write it like you find it in the source, and once in a nice syntactically enforced valid date. The former can use alternate calendars (French Republican for example, in use here around 1800, with dates like "13 Nivose An IX"), but the calculations take place on the other field. If the text date field is understandable by the software, then it calculates the correct date. If not, it is up to the end user to fill it in. I have seen births as "The Friday after Easter" etc.
Would that work, you think?

- Places. See a different discussion completely dedicated to locations:
http://bettergedcom.wikispaces.com/message/view/BG+Data+Model+Discussion/29953155
Feel free to react there.
- Depends what you mean by this. Software and exchange standards should allow for alternate names, which can include translated or culturally affected naming. But then it remains the option of the user to fill it in or not. Another way to see this, is when searching and matching data from 2 different sources. For me it is clear that Pierre Mampuys is most likely the same as Petrus Mampaey, and people probably called him still differently in his time. If neither source has indicated the possible variant, then we have an issue. With a dedicated soundex-like engine one could convert such names, but I know of no tools that do such things. I'm not sure whether this is relevant for exchange standards. Maybe it is, when a soundex-like system becomes commonly accepted.

- I was not aware that TMG supports non-human personas. I believe some data models that were proposed earlier had a notion of a group of people, with membership relations etc. But I've never seen such things as buildings in data structures ...

Xavier

Robin_L 2010-11-13T15:54:11-08:00

Thanks for the response.

Dates - I realise now that I did not include enough detail. I was particularly concerned about the representation of _incomplete_ dates (say no day of month) and _modified_ dates (say "before", "between", etc). It is critical that there be the agreed comparison rules between all forms.

As you are probably aware, there are cultural issues in the software converting a text date to a homogeneously interpreted value. It needs to know the location for which that date was recorded as for example the the Gregorian / Julian date changed at different dates in different localities. This is not an easy topic.

Personas and TMG - Just a minor correction. TMG does not distinguish non-human personas, but there are numerous examples of users of TMG creating a "person" that is not a human entity (eg. regiment, building, community, etc) but that entity has data like a person but not all such attributes. Often the relationships between these pseudo-persons with other pseudo-persons and also to real persons can require link types that are not within the usual human to human relationships. TMG's concept of Witnesses helps to encode these links. The data model is not extended enough for adequate clean representation of these linkages. This is partly a "genealogy" versus "social history" question.

If we are to build a better data model, then it needs to better address the "social history" needs. It may be that by addressing the standardisation of the social history data exchange, we could get the support of the backing of other professional bodies - like psychologists, anthropologists, historians, fraud/criminal instigators, etc. All these groups have the need exchange and research inter-relationships between personas.

greglamberson 2010-11-13T19:04:16-08:00

This is one of those discussion topics I am reluctant to wade into just yet because of its complexity. Well, here goes.

I have been thinking how templates could be used for person names, dates, and place names in particular. This is a concept that some programs use, and I think using a similar system for BetterGEDCOM would be advisable.

What I have been thinking of is making every reference to a time/date, person name or place name refer first to a naming template. There could be standard templates as well as custom templates which could be imported and exported along with the genealogical data.

For example, person name templates - defining how names are formatted in a particular area/culture - could include Modern Western; Scandanavian Patronymic; Medieval European Noble; Matronymic; etc. The labels for the name elements would help define naming rules and clarify the component parts independent of any software application.

For place names, we could use US Standard; Louisiana Standard; (ok I really don't know what's even appropriate for other locations)..

For time/date, we could have Gregorian; Julian; Muslim; Jewish; Chinese; Henry VII Regnal; etc.

These templates could in some cases be part of existing XML namespaces and in others they could be defined within the individual XML file.

There is some reference to some of these concepts on the Data Models page and also in some other discussions. This is important, and I'm not sure where this discussion fits at the moment, but it's certainly important.

gthorud 2010-11-13T04:51:52-08:00

Do we need some guidelines for writing in this wiki?

In some cases arguments are presented just by including a link to a large external document where the relevant part may be only one paragraph in a ten page document. This is a very inefficient way to accumulate knowledge.

Unless large parts of the referred document are relevant, I think it would be better to cite the relevant document (in addition to the link) or at least state where in the document the relevant info can be found. And even if large parts are referenced, a summary should be presented.

greglamberson 2010-11-13T08:25:42-08:00

gthorud,

This is a wiki. It has intentionally been started so that participants can add content. This is not meant to be a place to come and have everything already deliberated and presented, at least not on our fourth day of existence.

The idea is that if that's what you think should be done, then do it. If you're not comfortable doing it, then I'm sure someone else will, but you must be a little patient.

I expect to do a major rewrite of the pages to reflect the discussions on the discussion pages and to add some more summary content. Ideally, however, I would not need to do this because participants like you would do it instead as part of the process. However, we are just beginning, and overall, we're all extremely pleased at what has occurred so far.

gthorud 2010-11-13T09:40:05-08:00

I have written a page, but if it shall be of any use it should appear on the navigation bar to the left - How do you do that?

greglamberson 2010-11-13T09:51:44-08:00

gthorud,

I saw your page appear so I went ahead and added it to the navbar before seeing this.

To add anything to the navbar, you jsut use the "edit navigation" link below the counter on the left. Then you just edit it like any other page. To link a page, enter the title as you want it to appear, then highlight it and select "Link" within the tools. Then you can link to the page on the first screen that appears.

greglamberson 2010-11-13T09:55:10-08:00

gthorud,
By the way, I'm very glad to see you add this, and I'll build on it by recording some narrated tutorials that I'll put up on YouTube highlighting various things on the wiki.
Thanks for doing this.
Greg

gthorud 2010-11-13T10:33:46-08:00

Thanks! I was loking for the "edit navigation" link, but did not see it....

greglamberson 2010-11-20T18:31:48-08:00

Weekly or bi-weekly conference calls?

Hi,

I think we should establish weekly or bi-weekly conference calls for general discussion among anyone interested in participating. Assuming we do this, what suggestions do people have regarding technologies we could use to do this? (Pick something free, guys.)

Please give your feedback so we can get something like this put together.

If topics develop sufficiently, we could hold discussions with more specificity as well. However, this is a first step. Thoughts?

DearMYRTLE 2010-11-22T13:57:46-08:00

I'd like to meet via Skype weekly with the BetterGEDCOM organizers on Tuesdays from 5-6pm Eastern US time (daylight when applicable)using Skype.

I can also see two additional Skype calls:

1. End-Users
2. Developers

hrworth 2010-11-22T14:13:03-08:00

Greg,
Dear Myrtle,

I'll join.

Thank you,

Russ

greglamberson 2010-11-22T16:46:58-08:00

Me too. After Thanksgiving I'll set up some times and venues for developers and end-users.

Andy_Hatchett 2010-11-22T17:50:10-08:00

I'm not sure how Skype works but do have a questions...

Will these calls be recorded or transcribed so that those unable to join at that particular time can see/hear what went occurred?

Has any thought been give to a private chat room (I know one of the developers has a private chat on there web site that is available) so that a transcript would be available for posting for others.

Andy

greglamberson 2010-11-22T17:53:39-08:00

Andy,

I am certainly all ears for suggestions, and we're jsut working this out, but I did find out zoho allows web-based, decentralized group chat (for free). I have added a CHAT LIVE! link in the sidebar. Come on over and chat with me:

https://bettergedcom.wikispaces.com/Chat

hrworth 2010-11-22T20:22:40-08:00

Andy,

Skype is just a 'phone' call. I think that the phone calls are for calibration of the project, not the content of the project. The Chat room sounds interesting.

Russ

todrobbins 2010-12-09T11:22:53-08:00

Web of Kin

Just an announcement last I recently launched a new blog for semantic web technologies and genealogy. It's part of an extension of a conversation started by Jesse Stay on Facebook. The blog is called Web of Kin:

http://webkin.blogspot.com

Please visit and join the conversation. Also, contact me if you are interested in writing for the blog.

Best wishes as we seek a new open standard for genealogical data,

Tod Robbins
Web of Kin

greglamberson 2010-12-09T15:35:38-08:00

Tod,

Thanks for stopping by. I've been watching your new blog already and look forward to see what you come up with.

Greg Lamberson

todrobbins 2010-12-09T17:03:32-08:00

Has W3C been considered as a standards body?

I noticed on the Sandbox page that World Wide Web Consortium (W3C) was not mentioned. I am wondering what the reasoning is behind that. If one thing is certain about genealogy, it's that the search/study of ancestors is becoming more and more web-based. Wouldn't W3C be a great fit? Dan Brickley, formerly of W3C, has suggested elsewhere that a Incubator Group (the predecessor of a Working Group) could be arranged within the W3C organization.

Anyhow, just some thoughts. It's great to be a part of the conversation!

greglamberson 2010-12-09T21:57:21-08:00

Todd,

W3C was one I looked at, but I don't remember what became of it. Certainly nothing is set in stone or decided, and I see no reason we couldn't go the W3C route.

We've got lots of organizational stuff going on, so this will come more apropos to the discussion in the near future.

appletree2 2010-12-12T13:43:12-08:00

Working with OpenGen

Hi BetterGEDCOMers,

I want to introduce our organization after having spoken with Pat Richley (Dear Myrtle) and Greg Lamberson. I believe we have similar goals and can reach them fastest and most effectively by working together.

OpenGen.org was formed in mid-2010 to form genealogical data interoperability standards. A 501c(3) legal entity was created for the International OpenGen Alliance with the hope that the collective efforts of the genealogy industry would come together to craft a contemporary standard for genealogy data sharing and preservation. Part of our goals is to replace GEDCOM with an XML Schema designed to encapsulate both genealogists' needs with evidence, sources and research specifications and today's social media needs with specifications that include tagging, comments and multimedia. The ultimate goal of OpenGen is to facilitate lossless exchange of data between any 2 systems and be able to indefinitely preserve this data.

OpenGen's 501c(3) is an asset intended to exist outside of the control of any one single company or entity. OpenGen maintains financial, engineering, marketing and software resources that we will happily make available to the BetterGEDCOM efforts to help expand a wider involvement of participants and offer a legal entity under which the BetterGEDCOM efforts could exist.

There is an OpenGen webinar scheduled for tomorrow on Monday, December 13th. We can discuss aligning OpenGen's efforts with BetterGEDCOM. Details are available at www.OpenGen.org.

Thanks,
Scott

GeneJ 2010-12-12T15:24:12-08:00

Scott,

How can we obtain a copy of the meeting agenda and any submitted discussion topics? Would be nice to see a copy of that today sometime.

I looked at the webinar materials. From your site:

"OpenGen is back online after a 2 month break to recast the operating guidelines and structure the legal foundation. Join us for a discussion of what is next?"

And then below the registration I find a button: "Submit a Discussion Topic"

Thank you. --GJ

DearMYRTLE 2010-12-12T18:02:25-08:00

Please understand neither DearMYRTLE nor BetterGEDCOM organizers have a preference for Scott's work over the work of BetterGEDCOM contributors or the many experienced coders & developers who've created GEDCOM work-around apps currently in use.

CLARIFICATION OF PHONE CALL
Greg & I did discuss OpenGen and BetterGEDCOM with Scott Mueller and his associate Rysa via telephone a few weeks ago. From my notes, I recall that Scott explained he would write the new GEDCOM code, that the rest of the genealogy community could adopt.

Greg & I specifically declined to join OpenGen in favor of a "consensus of opinion" model for redesigning a file sharing protocol such as been accomplished by over 20 independent genealogy software programmers in Germany. We restated the BetterGEDCOM goal that no one individual or corporation, commercial or otherwise, should "own" the new GEDCOM.

Being an end user, I specifically could not and therefore did not offer an opinion about using an XML alternative during that phone call.

At the conclusion of the telephone call, Greg and I invited Scott and Rysa to begin posting at BetterGEDCOM so the genealogy "coding" community could learn more about their team. Until today, (with this posting above)there have been no postings.

CLARIFICATION ON MONDAY'S WEBINAR
An email, sent out by Rysa after 4pm Thursday US Pacific time, received by me late Friday after traveling, included information about a joint OpenGen/BetterGEDCOM webinar. This was the first I heard about such a meeting. The implication from the invitation was that I as DearMYRTLE or that BetterGEDCOM had been in on the planning.

Several have approached me privately with concerns that BetterGEDCOM has already partnered with what is construed to be a commercial entity, OpenGen. I most certainly have not partnered with anyone, nor to my knowledge has anyone from BetterGEDCOM (organizing team or otherwise) had any such conversations or made any such agreements.

While I do favor bringing all available talent to the table, let's make it clear that though I have been referred to in this invitation, at no time have I taken part in any planning for this webinar.

BetterGEDCOM
The Wiki is barely a month old, and there has been much active participation and lurking (as your private emails to me indicate). We've gathered time zones from those who wish to attend a BetterGEDCOM large-group meeting, but have not yet set the date for that meeting. Our agenda is clearly defined and will be posted once we set that date.

PERSONAL
My own attendance at the OpenGen webinar hinges on rescheduling two appointments which cannot be arranged until shortly after start of business tomorrow.

Pat Richley-Erickson
DearMYRTLE,
Your friend in genealogy
http://blog.DearMYRTLE.com
Myrt@DearMYRTLE.com

greglamberson 2010-12-12T19:30:19-08:00

I have withdrawn from BetterGEDCOM due to what I see as inappropriate conduct by certain organizing members and ask they do not further invoke my name.

I will attend the OpenGen meeting in hopes of salvaging work that has been done at BetterGEDCOM which its current organizers do not agree with and believe they are empowered to dictate.

Greg Lamberson

appletree2 2010-12-12T21:33:36-08:00

In the interest of clarification, I want to confirm what Pat said in that OpenGen and BetterGEDCOM are 2 separate organizations. We were invited to introduce ourselves and participate in the BetterGEDCOM discussions during a telephone call with Pat & Greg and so that's what I did. In that call, we discussed joining forces or at least helping each other and Pat suggested the above to see what the BetterGEDCOM community thinks we can do together.

I want to clarify that OpenGen is not about just me writing GEDCOM code and then others can choose to adopt it. OpenGen is about getting the best minds in genealogy collaborating on open standards around genealogy data exchange and preservation. I mentioned in our phone call that I personally wrote the beginnings of code and a data model to simply get the ball rolling and to satisfy a need in my company, AppleTree.com. But I believe that OpenGen and BetterGEDCOM are both very much aligned with the idea of open collaboration and no individual or corporation owning a genealogy standard. That's why a 501c(3) organization was formed for OpenGen and term limits are in place for elected board members.

Regarding the OpenGen web meeting tomorrow, this is NOT a joint OpenGen/BetterGEDCOM meeting. It is an OpenGen meeting, where hopefully some members of the BetterGEDCOM community will participate and we can discuss BetterGEDCOM. Please visit www.OpenGen.org to reserve a seat for the webinar.

Finally I'd like to clarify that OpenGen is not a commercial entity, it is a non-profit entity that isn't controlled by anyone without term-limits and an election process.

Pat, I truly hope you join us tomorrow and want to extend a warm invitation to whomever else wants to participate.

testuser42 2010-12-13T13:46:31-08:00

Greg, I really hope you will change your mind and come back. I've got no idea what happened, but I hope that not all bridges have been burned.

ttwetmore 2010-12-13T07:54:12-08:00

Loss of Greg Lamberson

I am concerned about Greg Lamberson's exit from BetterGEDCOM. In the few weeks of its existence, Greg was the only organizer who seemed to think of BetterGEDCOM as anything more than a few patches to GEDCOM. He was the only originator who contributed substantially to the Wiki, and the only one with a technical background.

The organizers stress the need to be able to fully share data between genealogical programs, and Greg understood that it was the nature of the GEDCOM model that made this impossible. The reason that genealogical programs have so much difficulty sharing data is that they each support their own internal models of what genealogical data is, and these models are all different from one another, and they are all more compelx than the ability of GEDCOM to convey their contents in transport files.

It is not the different interpretations of GEDCOM that make sharing between genealogical applications impossible, it is the simplicity of GEDCOM itself. Without a model and file format that can encompass the actual internal models used by real genealogical applications, any effort to patch up GEDCOM is meaningless. If Greg was the only organizer of BetterGEDCOM who understood this, and all indications are that this is true, then BetterGEDCOM is in trouble.

From the little bit I can ascertain about the remaining organizers, the ideas of incorporating evidence and the research process into the BetterGEDCOM model seem foreign and distasteful to them. I heard through the grapevine that they found my particular ideas and posts particularly unwelcome. This is of great concern to me because it indicates the organizers have not grasped the fundadmentals of either genealogy or what makes the sharing of genealogical data so difficult.

I would like to know what the remaining organizers think the goals of the BetterGEDCOM effort should be. If my ideas are indeed distasteful and subject to their censure, I would prefer to end my relationship with BetterGEDCOM at this point and move on to something with a chance of success.

Tom Wetmore

hrworth 2010-12-13T08:13:50-08:00

Tom,

We are sorry that Greg departed as he did. He was very clear to us, that this project was very, very, important and the is certainly needs to continue. And we do hope that is does. That has been a lot of great stuff posted here.

Yes, I for one, did not know about all of the work that has been going on since 1995. I know now, but the details are way over my head as to What needs to be done. I am learning now, with your help, and the help of others. Yes, this looked, on the surface as an easy project. Clearly it is not so.

I am not sure that anything is distasteful to "them" (I am one of them). But, the terms that have been tossed around, again to me, are foreign. I am an End User of a program that has been working for me in my research.

I don't know where you heard that your posts were not welcomed. I for one, have learned much from your postings, but I have had a problem with your terms, sometimes, and have tried to ask questions to that I could understand what you have said.

This technical aspect of this problem is new to me. As I understand it you have been at 'this' for years. All I can ask, is that you help me catch up.

I would hope that you continue your very important contributions. I have learned lots from you already and hope to learn more.

We just need to help understand some of the terms.

Please stick with us, as we grow. After all, we are only a little more than a month old.

Thank you for your comments and concerns.

Russ

DearMYRTLE 2010-12-13T08:30:46-08:00

Tom, at no time have any of the organizing team felt your particular ideas and postings were unwelcome. How you "heard that through the grapevine" merely sounds like sour grapes to Ol' Myrt here.

The organizing team at BetterGEDCOM was set in place to advertise the BetterGEDCOM workspace designed to bring groups from all parts of the net into one place for decision making. We've clearly stated our background and orientation on the "Who Are We?" page.

As BetterGEDCOM expands and actually makes decisions by consensus, the organizing team will not exercise control. Our goal is to organize large-group meetings and report progress to the world at the direction of the consensus of participants.

As with any volunteer organization, there are growing pains.

The BetterGEDCOM organizing team has expressed the desire that following Greg's departure, a "coder" type will emerge to facilitate communication and foster understanding among all BetterGEDCOM participants, be they end-users or developers.

GeneJ 2010-12-13T08:36:23-08:00

I echo Russ' comments.

Tom, you write, "From the little bit I can ascertain about the remaining organizers, the ideas of incorporating evidence and the research process into the BetterGEDCOM model seem foreign and distasteful to them.

I certainly don't feel incorporating evidence ... is "foreign and distasteful. Quite to the contrary, I've been working to get a series of reasonably varied family circumstance and research/source/evidence materials posted to the Build a BetterGEDCOM blog to support the discussions here on the BetterGEDCOM wiki. Russ and I have both been blogging their about how we use software.

What is research: Working with documents about a c1815 estate
http://bettergedcom.blogspot.com/2010/12/what-is-research-working-with-original.html

What is research: Outlining contents of an American Revolutionary War pension file
http://bettergedcom.blogspot.com/2010/12/what-is-research-outlining-contents-of.html

How do scholarly genealogists approach the evidence process?
http://bettergedcom.blogspot.com/2010/12/how-do-scholarly-genealogist-approach.html

More on ... How do I enter information .... (GeneJ)
http://bettergedcom.blogspot.com/2010/12/how-do-i-enter-information-genej_06.html

How do I enter information .... (GeneJ)
http://bettergedcom.blogspot.com/2010/12/how-do-i-enter-information-genej.html

When do you enter data into your database ?
http://bettergedcom.blogspot.com/2010/12/when-do-you-enter-data-into-your.html

Going all fundamental
http://bettergedcom.blogspot.com/2010/12/going-all-fundamental.html

More in a bit.

GeneJ 2010-12-13T10:53:46-08:00

Tom:

You wrote, "...indicates the organizers have not grasped the fundadmentals of either genealogy ..."

Humm...

And here I thought we just hypothesize data flow/data dependency differently.

testuser42 2010-12-13T13:51:10-08:00

I was really shocked to see Greg has left. I've no idea what happened. I do hope that he might reconsider his decision, because he's been a very good moderator and impulse giver, and this project would miss his energy and direction very much.

testuser42 2010-12-13T14:12:13-08:00

Tom,

I didn't get the feeling your ideas were unwelcome to anyone. For me, being a user who never before looked at a gedcom file, they were real eye openers.
I did get the feeling that for some people it might be hard to grasp the concepts other people use in their models, especially if they are used to another model. Quite a few times, when people have been arguing I had the impression that they actually both mean the same thing - they just keep misunderstanding each other. This is frustrating! As there was a lot of posting in various threads at the same time, it was easy to lose perspective. Maybe a slower pace would do the project good.

On the positive side: I have the feeling everybody here is emotionally invested. Moments of tempers flaring and frustration rising show how much people care, how much they want this project to succeed.
Maybe some care too much! After all, a BetterGedcom is NOT vital to the survival of humanity, even if it is about lives and deaths. Taking a few deep breaths or just turning off the computer for a day or two might help to cool down and relax. I did this before and it does work ;)

DearMYRTLE 2010-12-13T11:10:27-08:00

Thanks to Tamura Jones

who wrote:

2010 GeneaBlog Awards

"Best GEDCOM alternative blog: Build a Better GEDCOM

There have been quite a few attempts to replace GEDCOM in the past; GEDCOM Alternatives provides an overview.

This year saw the introduction two new ones, first OpenGen and then BetterGEDCOM. Most activity happens on the BetterGEDCOM Wiki, but there is also the Build a Better GEDCOM blog. On this blog, the four bloggers behind BetterGEDCOM blog about the problems they encounter using GEDCOM"

SOURCE: http://www.tamurajones.net/GeneaBlogAwards2010.xhtml

GeneJ 2010-12-13T11:12:09-08:00

Yes, Tamara, thank you. --GJ

testuser42 2010-12-13T15:15:10-08:00

I didn't know there was a BG Blog. Was there an announcement on this wiki that I overlooked?

DearMYRTLE 2010-12-13T15:55:19-08:00

The BetterGEDCOM Blog was added as a link to the home page, and the blog is listed in both GeneJ and Russ' profiles under "Who Are We?"

GeneJ 2010-12-13T15:59:07-08:00

Myrt set it up as the same time as the wiki, but we didn't really know to put that work space to use until a couple of weeks ago.

Russ and I started testing it's use by posting some GEDCOM sharing information and we used it to talk about how we use software.

We've added more sections and expanded the content.

Suggestions?

testuser42 2010-12-13T17:12:36-08:00

Thanks, Gene and Myrt.
I've read most of the posts now. Good start with the comparisons!
We could have/develop a Gedcom test file if we really want to see what programs do on import. It doesn't need very many people, but a lot of complications: adoptions, multiple marriages, images, multimedia, complicated names, user generated tags, incomplete data, multiple instances of tags...
I don't know what exactly Gedcom allows in which version. Has the LDS produced test-suites or examples?

But, even though it's very interesting to see the different results, maybe this is a bit off topic? We know the implementation of Gedcom varies. That's more of an issue for the software developers.

How will we make sure that the implementations of BetterGedcom will not vary this much?
Do we provide an example file / test case?
Do we just hope the documentation of BG is clear enough for everybody (in every language?)
(these questions probably should go on another page)

DearMYRTLE 2010-12-13T12:43:30-08:00

BetterGEDCOM in the news

The Tampa Tribune genealogy columnist, Sharon Tate Moody spotlights the upcoming RootsTech Conference (Feb 2011, RootsTech.org) and mentions BetterGEDCOM.

Source:

http://www2.tbo.com/content/2010/dec/12/BANEWSO8-1st-tech-convention-brings-together-techi/life/

DearMYRTLE 2010-12-19T08:48:03-08:00

This morning, James Tanner, author of the Genealogy's Star blog, published REALLY BAD ADVICE CORRECTED spotlighting the challenges of a Family History Center patron who had data on a very old Mac with a 3.5 inch floppy disk.

He mentions BetterGEDCOM is pushing for a better file transfer protocol.

My comments (not yet published) included the thought that genealogists must continue to keep their files up to date using the latest software AND hardware until it is time for the next generation to take up the standard.

Source:
http://genealogysstar.blogspot.com/2010/12/really-bad-advice-corrected.html

DearMYRTLE 2010-12-17T07:47:41-08:00

BetterGEDCOM Blog announced

The BetterGEDCOM Blog http://BetterGEDCOM.blogspot.com has been officially released, and submitted for addition to the listing at GeneaBloggers.com with the following announcement:

The BetterGEDCOM Wiki http://BetterGEDCOM.wikispaces.com is for the technical types, and the new BetterGEDCOM Blog is for the researcher types. There you'll find blog posts by family historians that describe how they use genealogy software in the research process, and report challenges when creating, exporting and importing GEDCOM files. Authors currently include: Russ Worthington, GeneJ Composer, DearMYRTLE

Both sites seek to encourage updating in the 14 year old file sharing protocol known as GEDCOM.

todrobbins 2011-01-11T21:48:52-08:00

BetterGEDCOM and OpenGen Alliance

I'm curious if there is any collaboration occurring between the various groups. Ours and OpenGen Alliance, to name one other, have a similar end in mind. I'm wondering what everyone's thoughts are about forming a greater whole, a larger open standards body?

Also, I've been following the History and Genealogy Semantics WG Google Group, another genealogy data standard group. I suppose, in my tired attempt to present an issue: is there communication happening between these groups? If not, can we organize some kind of meeting at RootsTech, online?

These are some of my thoughts tonight.

Cheers,

Tod Robbins

PS: For those unfamiliar with OpenGen see: http://www.opengen.org/

todrobbins 2011-01-11T21:50:24-08:00

Here is the URL for the History and Genealogy Semantics WG: https://groups.google.com/forum/?fromgroups#!forum/history-and-genealogy-semantics-wg

GeneJ 2011-01-11T21:59:52-08:00

HI Todd:
Welcome.

I'm not familiar with the History and Genealogy Semantics WG Google Group. I know quite a few of us are members of the GEDCOM-L.

Two of the OpenGen organizers attended the recent BetterGEDCOM Developers Meeting. Likewise, I attend the OpenGen Webinars, as do several of those contributing to BetterGEDCOM.

We all want for a similar outcome, that is a GEDCOM[-like] alternative, the two group, OpenGEN and BetterGEDCOM are organized differently.

As for BetterGEDCOM, you'll find our process to be consensus-driven and quite open book, right down to the typos. :)

todrobbins 2011-01-13T12:23:01-08:00

Gene,

Thanks for the update. I think it's paramount that we organize/coordinate the various groups working toward the same goal. Are there any particular meetings planned for RootsTech for BetterGEDCOM?

Cheers,

Tod

GeneJ 2011-01-13T12:46:40-08:00

If you search the wiki, you'll find several references to RootsTech.

I added a category to the navigation bar, "Related News and Resources," where you'll find a link to Myrt's recent post about RootsTech.

You wrote, "I think it's paramount that we organize/coordinate the various groups working toward the same goal."

BetterGEDCOM is open and consensus-driven. Like our process, the BetterGEDCOM organizers meetings are open (Mondays at 4 PM, PST; see calendar).

Hope this helps. --GJ

brichcja 2011-02-07T08:43:03-08:00

census XML data format

Hello all

A couple of months ago I posted a query as to whether there was a common format for sharing census data, to which the answer was "not really". So, I took it upon myself to try and create one, and you can view the results at

<http://www.chradams.co.uk/censusxml/index.php>

I don't expect this to be perfect straight away, but as far as I can see it would be of immense value to this project to have a common format for exchanging this kind of data. I've done it in XML, and it can easily interface with the betterGEDCOM project.

I'm all ears. All feedback is welcome (especially encouraging feedback!). At the minute it's only set up for censuses of England & Wales, Ireland and Scotland, but in principle it can be extended to anywhere else.

What do people think? Is this a worthwhile exercise, or has it already been done? Whichever, I've learned some code on the way!

Cheers,

Chris

gthorud 2011-02-07T11:03:45-08:00

Regarding Tags, have a look at the Variables in this international project working with census data:

http://www.nappdata.org/napp/

testuser42 2011-02-15T16:31:24-08:00

short notice

Hi,

I just wanted to let you know that I've not disappeared. But real life has kept me busy and will continue to do so for a while.

I'll try to drop by and read up on happenings regularly. If I disagree with anything I'll let you know ;)
Thank you, guys and gals. You are doing good work! Don't get frustrated with the process ;)

SeptemberM 2011-02-16T07:44:16-08:00

Introducing myself

Greetings to everyone! and kudos on all the great work. I have spent many hours over the past couple of days reading through everything here, trying to catch up and understand where the project stands right now. Like you, I also see the need for a better gedcom, and I believe that it is necessary for the future usefulness of all the genealogical activity, past, present, and future.

That said, I imagine you're wondering who I am. Long-story-short, I am a both a computer programmer and a professional genealogist. As part of my undergrad education (B.A. in Math) I "wrote" my first computer program in Cobol, on punch cards which were fed into a computer the size of a large room. Since then I have spent my career working for a wide variety of industries, in many positions. The last 12 years were spent working in the financial services (investments) industry developing quantitative analytic applications for investment/portfolio managers, as well as working with other systems within that environment, i.e. trading, accounting, compliance, attribution, etc. While this may not seem immediately applicable to this project, the underlying fact that all of this work involved continuous transfers of very large amounts of data between these systems is directly meaningful to this project.

One the genealogy side, I began researching my own family history about 8 years ago, and I think it was about a month into it that I realized I had found my passion. I was laid off from the investment/computer job a little over a year ago and found myself with the luxury of choosing to pursue my passion full-time (of course, reality encroaches and the possibility of my having to return to a non-genealogy but paying job is growing larger, but I'm not focusing on that yet :-)). Last winter/spring I attended the Boston University Certificate in Genealogical Research program, joined the various professional genealogy organizations, and have been lucky enough to win several professional assignments.

For this project, I think this gives me the unique position of understanding both the technical and the genealogy sides, as well as understanding the genealogy sides from the perspectives of both the hobbyist and the professional.

As I said at the beginning of this rather long message (sorry!), I've spent a great deal of time reviewing everything here. As you can guess, I have thought about it, slept on it, and have put together my thoughts on what I've read. Not being sure where to place it in this rapidly growing wiki, I have opted to put it in the "BetterGEDCOM Comparisons" area because there is another page there which seems similar. I am very curious to hear what you think, and hope you'll be gentle if I'm totally out in left field, so to speak.

I look forward to working with everyone on this most important project.

gthorud 2011-02-16T08:04:30-08:00

Hi,

First of all, Welcome.

I am not sure what your document contains, but you could add it as a model. The important thing is that it is published, and it can be moved later.

I guess most people use the Recent Changes page to see what is going on, so wherever you put it, it will be noticed.

louiskessler 2011-02-16T20:43:14-08:00

September:

Nice of you to join us.

It would be best if you could put your bio (as above) on the "Who Are We?" page. Most people will look there first.

Louis

ttwetmore 2011-04-05T01:23:17-07:00

The Chasm

I just read the following blogs from Ancestry Insider.

http://ancestryinsider.blogspot.com/2011/03/three-reasons-vendors-get-it-wrong.html
http://ancestryinsider.blogspot.com/2011/03/chasm.html

The breakdown of doing genealogy into three phases as outlined in these postings is useful and is a clear demonstration of the need for a better genealogical data model.

As you do genealogy further back in time you have to shift from a person-based methodology to a records-based methodology. Or to put it in the terms I use, you must shift from a conclusion-based methodology to an evidence-based methodology. Or to put it in process terms, you have to start doing real research and follow a reasearch-based paradigm. The "chasm" is the gap you must cross when you realize you have to make the shift to evidence-based methodology.

Most of the current generation of genealogical applications assume only the person-based methodology, so they are great for recording yourselves, your spouses and your children, your parents and grandparents and great grandparents. But further back in time, when you have no direct knowledge, or even solid evidence for your ancestors, and you must shift into the "advanced" genealogical world of real research, searching for records, analyzing the records, combining records in the ways that you feel best document past ancestors, but at this level of research you can no longer be sure of anything.

GEDCOM was designed for the person-based, conclusion-based phases of genealogy. It can't handle the records-based, evidence-based phase that we actually spend most of our time in.

The main motivation for Better GEDCOM came from the recognized and easily demonstrated inability of genealogical systems to share data because of the inadequacies of GEDCOM. With the idea that most of those inabilities came from misinterpretation of the GEDCOM standard, errors in implementing the standard, and lack of standards for some areas that caused vendors to add custom extensions to GEDCOM.

My main contention for the inadequacy of GEDCOM has always been a much larger point, that is, its inability to cross the chasm to the records-based world, the real research world of genealogy. My DeadEnds model is my attempt to cross that gap. My constant harangues on this Better GEDCOM wiki are all based on my hope that Better GEDCOM will realize the gap and also try to cross it.

Tom Wetmore

GeneJ 2011-04-05T20:38:42-07:00

At the application level, would it matter what model? I'd think all software that supports drafts of biographies, family group sheets and the like would need a full record of the evidence in citations.

What am I missing?

ttwetmore 2011-04-06T00:31:19-07:00

A model either supports the current world of applications, where evidence is not embraced as an integral concept, or it supports the richer world that embraces evidence in the database. The model matters. A simple one is fine for the conclusion world. A more complex one is needed for the world that adds in evidence.

To answer what a model must encompass you have to ask what you want your genealogical application to do for you? (And, of course, you also have to answer that question.) Where do you want your evidence, and in what form do you want it to be? How do you want your applications to support your evidence? Exactly the questions posed above. DO YOU WANT YOUR EVIDENCE IN YOUR COMPUTER DATABASE OR DON'T YOU? IF YOU WANT IT IN YOUR DATABASE WHAT DO YOU WANT TO DO WITH IT? You have to answer these questions before any other discussion about the goals of Better GEDCOM or the model that Better GEDCOM should be based on makes much sense.

If you don't want the evidence in your database, and in your case, GeneJ, I really don't think you do, then all you have to do is patch up GEDCOM a little bit, incorporate some of the current extensions, add more tags to the source world to satisfy ESM templates, and be done with it. You'll end up with a better (lower case 'b') GEDCOM that allows more sharing than the current GEDCOM, but a better GEDCOM that is wholly incapable of supporting models that "cross the chasm" that was the beginning of this thread.

If you want the evidence in your database, in a form that supports record-based genealogical research, then you have to answer my questions and then decide how to extend a GEDCOM-like model to handle the additional ideas. It's not a complex extension. All you really have to do is make person records be able to refer to lower level person records that contain evidence information. Ditto events. That is, if you first go along with using persons and events as the mechanism for holding your evidence in text form.

Better GEDCOM must decide the answer to these evidence questions before any work other than GEDCOM tweaking can get done. If the people here on Better GEDCOM would do the homework I assigned above we could answer that question. I hope that most of us want our models to hold our evidence so we can keep our evidence in a useful form in our databases. If this is not the case then those of us who insist that our evidence "be computable" will need to find another route.

A lot of this is already moot. If you look at the data structures now used in the New Family Search trees, you will see that they have already added an infrastructure to support records-based genealogy. They call these records "persona" records internally (and just records externally -- get it, records as in records-based) to distinguish them from "person" records (which they call person records externally -- get it, person-based records). Unfortunately, they don't actually ever insist, require, or even recommend that persona records be used to hold evidence, but fortunately much of their data does get structured that way. New Family Search doesn't really have a formal model behind it. It's defined as XML-based API's. You can see the murky outlines of a model by looking at the structures of the XML entities passed through the API's, but it's an after the fact model. If Better GEDCOM wishes to provide a model that would support New Family Search type of applications, it will have to add in the evidence components as I have been describing them. Likewise, Ancestry.com, and now many other organizations, are providing software or services that in one way or another use evidence records. Ancestry.com has features that are becoming ever more sophisticated to search out and suggest evidence for you based on the properties of the persons you seem to be most interested in. If that evidence were not already transcribed into textual form and organized into evidence-person records, exactly as I have been advocating, NONE OF THESE REVOLUTIONARY CAPABILITIES would exist.

Right now New Family Search and Ancestry.com don't export their data to mere users like you and me. But if they did, they would have to use a format that supports both record-based and person-based data. GEDCOM doesn't hack it. If there isn't a Better GEDCOM around to help define the model, then we'll be forced into a world of adhering to custom formats specified by the gorillas. Frankly it's probably too late already. But if Better GEDCOM really wants to stand up for itself, instead of just huff and puff on the sidelines, then this is where the real action is going to be.

Maybe this vision is just too much for Better GEDCOM. If all Better GEDCOM wants to do is figure out how to make Family Tree Maker share data with Roots Magic and vice versa, and all the other n-squared combinations, and make sure ESM citations stay the same when transported to a new program, and not tackle any of the thorny records-based issues, it's understandable and even laudable. But if Better GEDCOM wants to support a model that will be adequate for the current world dominated by the LDS and Ancestry.com, and the future access that we will have to massive record-based databases, then it has to go further.

GeneJ 2011-04-06T11:39:53-07:00

Humm... If I were FamilySearch, with "Millions of rolls of microfilm[ed] census, vital, probate and church RECORDS...," being indexed by volunteers all over the world,[3] yet unable to correctly identify my own internal collections,[1][2] what would I do?

See:
[1] http://theycamebefore.blogspot.com/2010/12/closer-look-at-familysearch-historical.html

[2] http://theycamebefore.blogspot.com/2011/01/please-lets-not-wiki-familysearch.html

[3] http://www.familysearch.org/eng/indexing/frameset_indexing.asp [emphasis added]

AdrianB38 2011-04-06T14:22:06-07:00

"At the application level, would it matter what model? I'd think all software that supports drafts of biographies, family group sheets and the like would need a full record of the evidence in citations. What am I missing?"

For me, the missing bit is - how you get there.

It's all the research objectives, log, inputs, outputs, conclusions, etc. I'd like all that lot in my application for my purposes. Right now, I get the impression that some software does some of this. But because there's no common model in the applications, it's inconsistent between apps or missing. I think.

If BG defines the model that embodies the research process in the diagrams that we've seen references to, and embodies the GPS (though since that's a process with inputs and outputs defined at a very high level, I'm not sure how realistic that is) ... if it does define that stuff then - even if the take-up is bit at a time - it gives a common standard to work towards that is robust.

Let me quote myself from the new Intro to Goal and Reqts Catalogue:
"Many of us believe that BetterGEDCOM has the opportunity to set the common framework for recording the research process, in the same way that GEDCOM set the framework for recording the results of that research. And if that happens, then we may see many more people handling their research in a robust manner, which can only be of benefit to the study of genealogy"

testuser42 2011-04-06T17:10:40-07:00

IMHO BetterGEDCOM needs to be as "big" as possible. Only then can we hope for it to become a widely used standard.

Since there are more than a few people who really want to have their evidence recorded in a digitally useful way, the standard must make this possible. Same with documenting the research process, same with usable citations etc.

And the nice thing is - in going with the "biggest" model, any other models with "smaller" needs will still be working just fine. The other way around it won't work.
Sometimes bigger is really better ;-)

ttwetmore 2011-04-07T01:02:54-07:00

GeneJ says, "Humm... If I were FamilySearch, with "Millions of rolls of microfilm[ed] census, vital, probate and church RECORDS...," being indexed by volunteers all over the world,[3] yet unable to correctly identify my own internal collections,[1][2] what would I do?"

An interesting conundrum, but I wonder how it bears on the nature of record-based versus person-based genealogy and the chasm. I was hoping you might chime in and answer my homework question. You seem to be one of the only Better GEDCOM members who doesn't want evidence records in your database. Don't you want to advocate your position?

GeneJ 2011-04-07T08:18:00-07:00

Tom wrote, "Homework for anybody reading this: In your dream software system, where and how do you want to store your evidence, and what do you want to be able to do with it?"

GeneJ wrote, "At the application level, would it matter what model? I'd think all software that supports drafts of biographies, family group sheets and the like would need a full record of the evidence in citations.
What am I missing?"

Separately see Genealogical Proof Standard
http://www.bcgcertification.org/resources/standard.html
The whole standard applies really, but for starters, in part, "Complete and accurate citation of sources."

ttwetmore 2011-04-07T12:41:20-07:00

GeneJ,

If that is your answer, thank you. I don't understand it though. I think you are one of the persons in the group who doesn't want transcribed information from your research records (e.g., certificates, census images, family histories) to appear in your database as evidence person records. If I remember you correctly you want to enter data into your genealogical database only after you have done your research and reached your conclusions, so every person in your database represents a real person. If so that makes you are a person-based "fundamentalist." I think that is your answer, but I was hoping you would say so so I wouldn't be making a mistake about your opinion.

It is important to me to know how the group feels about this question. This quote from the Evidence01 requirement has me VERY concerned:

"It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse."

I read this as meaning that Better GEDCOM is chickening out on adding evidence, record-based support to its data model. It certainly means it's being postponed to the future. My opinion has always been that adding support for record/evidence-based genealogy should be the most important goal of Better GEDCOM, the only goal that cannot be postponed. If Better GEDCOM decides not to cross the chasm it changes from a worthy enterprise to a trivial tweak of GEDCOM. Please say it ain't so.

Tom Wetmore

GeneJ 2011-04-07T15:12:23-07:00

I'm not your fundamentalist. I have "who's this" tags and roles. I'm the one with 40,000 citations.

In my mind, there is a difference between "record tags" and "evidence person tags" and even "research tags."

Identity is a priority. I think new records that might be evidence (any form), should be reviewed against all the other evidence (all forms). If someone doesn't have the time to make that review or can't confirm identity from that review, then I think they have a research note.

Record tags: I went through a "record tag" phase. They don't work for me in a biography and it takes as long or longer to get rid of them as it did to create them. I could have you in stitches with stories about attempts to put records in a database or spreadsheet. If only I'd had admin-research a long time ago.

In another life, the term, "scrubbing" referred to the process of working a file before someone made an assessment. Getting the apples and oranges at least better identified. --I feel that way about "records."

It takes a little time to sort out a new record/new record group. How much time? Humm... Often about the time it takes to set up the Citation Elements, write a Citation Template and create a new master source. If longer, then I found something really cool in the process.

Evidence Person tags/conclusion person tags... and all the related and underlying and webbed and ... : You have a passion for something that leaves me in tears.

The single richest field in my entire database is probably the Citation Element, "source of the source."

ttwetmore 2011-04-07T17:46:36-07:00

GeneJ,

I don't believe there is any point in continuing our discussion about this. I don't see how what you've written here has a bearing on what I've said or asked, so I can only assume that you too don't grasp what I'm saying.

Tom W.

igoddard 2011-10-01T05:22:34-07:00

A citation isn't evidence. It's a pointer to where evidence was found (was not is - it may not be there any longer).

I'd only make one point to Tom. Transcriptions are important but not the totality of the evidence. If the image is available then it should also be included.

I've come across a few instances where the published transcript (Almondbury PRs) has a blank because the name isn't readable. But if, from other information, the question arises "Could it be xxxxx?" then going back to the image a yes/no answer is possible.

And a translation would also be useful. My Latin is largely botanical so I occasionally get flummoxed by a Latin document.

ttwetmore 2011-10-04T06:01:47-07:00

DeadEnds model has a general "info structure" idea where just about any entity or attribute can have

1. notes
2. atttributes (so all this is recursive)
3. source references
4. media references
5. dates
6. places or place references

In sources, attributes may include transcriptions, summaries, translations, markups, or anything else one might want. And they may contain any number of notes and media references. And all this goes as recursively deep as one wishes, though one would wish that one would not wish this.

You could refer to the Google protocol buffer form of DeadEnds to see the overall structure:

http://bartonstreet.com/deadends/DeadEnds.proto

GeneJ 2011-04-05T09:03:16-07:00

I fail to see how a comprehensive GENTECH-like model is superior rather than just different.

At least as far as I have been able observe, the GENTECH-like approach emphasizes select information found in a source over all the information in and about the source, and over all the evidence (direct, indirect and negative).

I'm a sponge--I want it all, including all the evidence and all the information about the source.

When I put on my BetterGEDOM hat, however, it's not about which model works for me. I hope BetterGEDCOM is inclusive--that it helps users transfer genealogical information between different applications and even applications based on different models.

ttwetmore 2011-04-05T13:11:54-07:00

As far as GenTech goes, I wish we could just say it's out of contention. It's a loose-loose proposition that I won't have anything to do with. GenTech is a non-issue.

There is only one KEY ISSUE that must be resolved in developing the Better GEDCOM model; everything else is trivial. And that issue is --WHERE DO WE STORE OUR EVIDENCE AND IN WHAT FORM DO WE STORE IT?. Do we want to store the evidence on our computers, or just as paper in notebooks or file folders. And, if we want to store our evidence on our computers, do we want to store that evidence as images of the physical evidence, or in some form in which the information in the evidence can be easily processed by a computer? Or in both forms?

My answer is clear. I want my evidence on my computer. I want access to the images from my computer. But I ABSOLUTELY INSIST that the evidence MUST ALSO be in a textual form so I can search it, index it, view it, cogitate about it, rearrange it, and compute with it. If these are not also Better GEDCOM's answers, then I believe Better GEDCOM is not a worthwhile endeavor. I have the feeling that some on Better GEDCOM do not agree with my ideas, that they would be happy just tightening up GEDCOM a little bit, by clarifying rules about names, places, dates; by enhancing the source records so they agree with the "world according to ESM"; making it look sexy by putting pointy XML brackets around everything. If this is indeed an intellectual clash going on about Better GEDCOM, I think it's better that we get it out into the open so we can discuss it.

Homework for anybody reading this: In your dream software system, where and how do you want to store your evidence, and what do you want to be able to do with it?

I'm going to assume that we will decide we want our evidence in our computers in some textual form. Given that, WHAT FORM SHOULD THE EVIDENCE BE IN? This is the ONLY IMPORTANT question that Better GEDCOM has in front of it. Better GEDCOM's answer to this will make it or break it. What should Better GEDCOM's evidence records look like? My answer is very simple. It is almost as simple as GEDCOM. Read the DeadEnds model document once more if you like. I will state the answer here as a fact, not as an opinion, because it is the only way this can be done that makes any sense -- evidence records MUST BE records that TRANSCRIBE textual information that is taken directly from the real, physical evidence, and that is then stored in text-based computer records that represent collections of facts about persons and/or events and possibly a small set of other entities.

This is what "record-based" genealogy is. These are those "records". We create records that represent the entities that we discover in our evidence. We find a birth certificate -- that's the physical evidence. It's paper so we keep a copy in our files. We're thoroughly modern Millie's so we also scan it and keep a copy on our computer. BUT, BUT, BUT we also transcribe information from that certificate, creating a person record for each person mentioned on the certificate, and an event record for the birth as a whole. These are the records in records-based genealogy. We never change these records. They are evidence. These are what I have been calling "evidence persons" and "evidence events". If we do not have these records in our database, we don't have anything we can COMPUTE WITH. If we want to cross the chasm, we must have these records available to us. I want to cross the chasm.

I have been describing this model since I joined Better GEDCOM last November. Do people agree with it? Am I shouting into the wind? If you don't agree with it, can you say why? Can you propose an alternative? And I mean a real alternative, not just some loosely expressed wishes and hopes and ideas. A real solution to the real problem of representing real genealogical information. If you just want to tweak GEDCOM so it can handle all the different current extensions while keeping ESM happy, please say so. What do people want Better GEDCOM to be?

A Better GEDCOM based on the model I've expressed here and elsewhere, can handle the data needs of every current person-based (conclusion based) genealogical application. Therefore, it can meet the original goal of Better GEDCOM -- the ability to transfer all data between all programs with all currently existing models, without lose of any data. BUT, BUT, BUT it also will handle the NEXT GENERATION of RECORDS-BASED (evidence based) genealogical applications. It's present-proof and future-proof at the same time.

AdrianB38 2011-06-26T04:47:55-07:00

Future direction of BG?

I need to ask the question - what is BetterGEDCOM currently aiming for and has it got any chance of reaching its goal?

let me add two recent quotes from http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/39718788?o=20

Geir said "I think I have learned during the work on BG that it is unlikely that it will be the current contributors to BG that will decide what gets implemented in programs. I see our main role at the moment to describe possible extensions to GEDCOM ... If we are able to design functionality in such a way so that implementations not choosing to implement a certain feature, can interwork with those who have implemented it, that is an added bonus"

And Louis said "To me, what seems to be going on at BetterGEDCOM is very abstract. ... I'd like to see much more concrete work done. Something that might lead to a specification that BetterGEDCOM can recommend to the Genealogical community"

Now - I respect both those guys, and both are valid viewpoints. But what is the BG community's aim and view?

When I joined BG, my _perception_ of what it was about was a thorough overhaul of the GEDCOM structure, replacing what was there and ensuring successful exchange of data. I did not understand it to be about enhancements to the GEDCOM "language" that simply stayed within the existing top-level entities defined by GEDCOM, nor about designing potential extensions that were optional.

Now, if the BG community wants to either change or clarify the aim, it is free to do so, but it needs to be a conscious decision. Do we now reject the idea (from the Goal), that BG "will be more comprehensive than existing formats"?

To take Geir & Louis' comments:
- optional extensions seems to fly in the face of interchangeability - we already have optional extensions in the way apps handle citations and that was one trigger for BG.
- concrete work seems to imply actually changing the GEDCOM spec'n right now, rather than data modelling, and that implies the small changes that we can - perhaps - agree on, not a more comprehensive "language".

Whichever way you want to go, BG needs to agree.

We also (I'm afraid there's more) still have no reason for any of the software suppliers to come on board. One of the major players, FamilySearch, are sitting on the GEDCOM standard, it seems to me, and have no reason whatsoever to promote the exchange of data outside FS - or allow anyone else to build on GEDCOM to do so. Even if we were to propose minor, pragmatic changes to GEDCOM, it would be entirely unofficial without their agreement, reducing the chance of any uptake. Even the IT literate genealogical community has a substantial body within it that sees the only way forward as donating all their data into a collective, thus avoiding the need to exchange data between individuals outside the (Borg?) collective.

So - what do we want to do? Can we do it? Is it worth doing? Why should anyone listen to us?

ttwetmore 2011-07-12T01:39:24-07:00

Louis,

Yes, we don't see eye-to-eye. I think I can summarize the difference by saying that I believe the person tree idea, aka the persona idea, is the one critical key to the Better GEDCOM model, and that you see little of importance to the idea. It is the only way to get useful support for the E&C process into genealogical software. This is a fundamental difference that cannot be reconciled. I believe you are completely wrong in discounting the idea, and that you are amazingly blind to a tremendous amount of previous work that has been done that demonstrates the importance of the persona concept. I will never be able to agree with a model that does not include the persona as a full citizen of the data model.

I believe it is impossible to compromise our positions. I believe more of the active members of BG agree with me than with you. You reject the idea, GeneJ calls it a Frankenstein, but everyone else who has commented has expressed support. I believe your underlying core thought about BG, which is to modify GEDCOM to the minimum amount necessary to support your ideas for the development of the Behold program, is flawed, is rather self-serving, and would lead to failure.

However, I am not interested in arguing this anymore. If I have not been able to convince you with all that I have written before, I will never be able to. I now foresee BG taking a long trip into strange territory that it will never return from. Sorry I have to say that, and I sincerely hope I am wrong. I'm just too darned old to care about this enough to go to into battle once again. I leave the field to you.

Dovy 2011-07-12T16:40:29-07:00

I am from the company behind AncestorSync. However, I will address the above concerns from my personal experiences and opinions. I will also signify our interest in being involved as a company that will support a BetterGEDCOM standard.

I have been developing web applications and user interfaces for over 11 years. My love is the internet and the "user experience." During that time I have worked with many "firms" and learned that the traditional methodology of development isn't as satisfying a product as those created with a methodology of Agile development as Myrt outlined above. Many very successful companies employ this development method since it works. Companies such as Google, Microsoft, and Apple.

Some key principles I have learned which can directly correspond to BetterGEDCOM are as follows:
-Planning out every piece of the model before you develop anything often creates a rigid product that may not even solve all the problems that were the basis behind the conception.
-Over complicating leads to confusion for developers/users and a greater barrier for adoption.
-Talking about things gets you no where, you need to develop something then make iterations and improve it. That may even end in you scrapping the original product and beginning anew. This is acceptable in the Agile model because you haven't spent endless time planning every piece.
-You can't put everything you want into one package. The most successful products are often the most simple. Some products supersede this due to the marketing power of the company behind it, but this an exception to reality.

I want BetterGEDCOM to succeed because, thought I doubt it will be adopted by everyone for some time, it has the potential to solve many more of the international problems not addressed by the current powers that be. Genealogy programs are often so US based, it's surprising. Unless there is a real reason and successful model released, this pathern will perpetuate.

So that you are aware, my family is from Lithuania and my wife's is from Greece. We have every interest, personally, in hoping for a better mapping of genealogical information. The more I learn of my Lithuanian heritage the more I realize the ill-planning of the popular genealogical program in supporting even the naming conventions of other nations. It is quite unfortunate. Not everyone is english or had the same surname despite gender.

Yes, it is true we as a company would love to add the BetterGEDCOM standard to the list of programs/standards we support. We will, if supported by your community, create a stand-alone converter for developers to migrate their personal formats (those which we support) into the BetterGEDCOM standard. We believe this will reduce the fear of adoption of your standard.

Quite honestly we created AncestorSync with the mantra be as agnostic as possible. We hope to parter with many, but not limit any format we support. We want to facilitate communication. Regrettably, my experience in the Genealogical community is there is too much disagreement and not enough cooperation. We will support and integrate anyone who wishes to work with us, and open to them all of the website providers we support.

It is great to be passionate about something, but if your passion leads to arguments you're going to end up no where. History proves that again and again. Work together or realize you will not succeed in even your personal desires.

With that being said I think this group has immense potential. You need to refocus and create a basis whereby all your "plugins" could fit. Realize you're all working together. If you could get past your own ideas and look at the overall goal I think you will all be satisfied in the end.

We'll do our best to support you and I will do my best to give you insight. AncestorSync will happen. My hope is that BetterGEDCOM will also.

gthorud 2011-07-12T18:33:48-07:00

I have a question about "the survey". What is the point in splitting it into 4 steps? It seems to me that two steps should be enough for now, short term - long term. I don't see any point in discussing, at this stage, if something should be in step 2 or 4, 2 or 3, 3 or 4.

The problem with deciding what is long term and short term is that you will get as many answers as there are participants. What we must focus on are the thing most/all agree on, and that also have enough interested people that are prepared to do the job. If something gets a lot of votes for "short term", but no one is willing to do the job - it is automatically "long term".

GeneJ 2011-07-13T13:28:22-07:00

Oooo. I want to "+1" Geir's posting.

Andy_Hatchett 2011-07-13T16:41:59-07:00

I've seen mention of the "survey" but no link to it- is it up yet?

DearMYRTLE 2011-07-13T17:21:25-07:00

survey not up yet... we've literally been bailing out our neighbors from the third flash flood in since Sunday night.

Andy_Hatchett 2011-07-13T17:39:09-07:00

Yikes! Sounds like what we had last year!
Stay Safe.

Christine_E 2011-07-14T15:50:26-07:00

I hesitate in supporting the splitting of BetterGEDCOM development into phases, but maybe someone can tell me what I'm missing by seeing it this way:

BetterGEDCOM is not a "new" product. It is more of an upgrade to older GEDCOM. It is not software, so I don't don't that the software development cycle can apply (especially in estimating remaining time to do something you've never done before--how many here have _updated_ standards before?). We're starting with known inconsistencies in existing software and trying to see how a standard would prevent/minimize them.

Since this development is open to everyone, the first release would have to be posted here. We may get feedback on the first version, which we would assumedly "fix" before going on to the version. Meanwhile, some vendors may start writing software so they can be the _first_ to be compliant. I'm not sure it that will cause problems or not. What if they sell it at that point when BetterGEDCOM isn't finished? Will there be a test suite available? Or are versions before the completed project just "drafts"?

I'm having a hard time understanding how BetterGEDCOM can be developed in phases.

GeneJ 2011-07-14T22:22:08-07:00

Hi Christine,

As you probably understood from the meeting, I have my own take ...

(1) I would prefer folks just log into the wiki and express their opinions. The wiki forum allows us space to clarify (such as whether something is or is not possible). Each person can start their own discussion if they wish.

(2) Have we put the cart before the horse? Most members haven't even seen Geir's work on citations. Nor have we asked Geir and Adrian what they believe is really possible. In that same spirit, what members can do to maximize the value of the effort.

For example, there seem a number of details that members can help with--Geir posted a request that members help gather the details about privacy settings. (Does that kind of request should go to all project members using the wiki mail feature?) If we free up Geir and Adrian from some detail work, are they able to complete the project overview?

It seems to me we'd get further with a "staging" discussion if they had completed the overview and we had their advice in hand.

(3) "BetterGEDCOM is not a "new" product. It is more of an upgrade to older GEDCOM" ...

GEDCOM isn't just older, it's ancient. ::grin::

We don't assume vendors will make an investment in developing for BetterGEDCOM if the standard only makes it easier for users to move to other software. We want BetterGEDCOM to be forward thinking enough so vendors can develop into the standard. (Geir says this better than I do.)

Hope this helps. --GJ

louiskessler 2011-07-15T10:32:23-07:00

GeneJ,

You said: "We don't assume vendors will make an investment in developing for BetterGEDCOM if the standard only makes it easier for users to move to other software."

On the contrary, users want to have a program that they will not be locked into. So it is to the vendor's benefit to have in place a way to make it easy for users to move. And the smart vendors will incorporate it.

Louis

theKiwi 2011-07-15T21:25:26-07:00

Just so long as the vendors put in both sides of it - the sneaky ones might only put in the part that gets data into their software, but not make it possible to then get it out again. <g>

I will post a report on how Reunion handles Privacy/Sensitivity, but it won't be for a day or two.

Roger

louiskessler 2011-07-16T07:22:18-07:00

Roger,

If vendors make an import routine that they can't export and then import again and get the same results, that news will get around.

This accounts for most of the current problems with GEDCOM. It is not implemented correctly and either the import or export or both are inadequate.

Why? I suspect because most vendors don't take the time to understand GEDCOM. They implement the parts they see fit and don't even bother checking to see that those are correct.

Part of the reason is that GEDCOM has a lot of lesser used features (e.g. TYPE or RELA tags) that they don't see a use for or have a place for in their program. For that matter, the CONC tag is still not implemented consistently, and that causes import data problems with no adequate solution.

Louis

ttwetmore 2011-07-01T10:08:12-07:00

Louis,

Thanks for that URL. I've downloaded the sources of the projects and read through them. Always fun to read others' source code. I don't think these projects are active.

gthorud 2011-07-01T17:21:15-07:00

Tom,

2. Personas from Gentech have been implemented by nFS, one way or another. Genbox has also implemented some of the functionality. But I have not checked if any of these have followed Gentech to the letter.

4. I am open to proposals for a work process that seek to resolve those agreement. Endless discussions with the same arguments being restated over and over again does not help. The fact that we have discussed Evidence&Conclusion for many months without reaching agreement on one single way to do it (but have two alternatives that may even work in synergy), does not mean that we cannot reach agreement in other areas. Also, there are areas that have not had much participation, for example citations/reference notes – and there are many in the requirements catalog.

8. I take the opportunity to mention that AncestorSync have a positive attitude towards BetterGDCOM (c.f last developer meeting) , and has offered to contribute to our work and even support things that comes out of our work – although it will be premature to be too specific.

Rumcd,

23. Tom has described some aspects of the situation. Familysearch did announce in February (at Rootstech) that they are working on what I have understood will be a replacement for Gedcom, but what it is and when it will be released has not been announced.

Regarding Citations:

24. There have been some discussion about this, in several places of the wiki, one of them is http://bettergedcom.wikispaces.com/EE+%26+GPS+Support where we, among other things, have looked into the citation functionality of existing programs. Many of the most popular US programs have implemented support for Evidence Explained (EE), which many consider the de facto standard for genealogy citations in the US, but since they have chosen different approaches it is not possible to exchange that information using Gedcom ( Gedcom does not support it). One of the issues we should work on is how EE can be supported.

There are several other issues:

- 25. EE is US centric, it will not work for sources in most other countries, and a citation using the English language does often not look good in a document written in another language. My personal position on that can be found in the document linked to in the start of this discussion http://bettergedcom.wikispaces.com/message/view/Fix+the+Transfer+Problem/37421050
But I stress that everyone may not agree to everything I have written. A solution that would work internationally is desirable.

- 26. EE describes hundreds of source specific Citation elements (the data fields in a database that would be combined in a citations – if a database is used) and have a huge number of source types. Development of a smaller set of more general citation elements, that could be used internationally and that could be found in existing archive/library databases is one possibility.

- 27. It would be very useful to be able to more or less automatically download citation data (Source meta data) from databases on the Internet, from sites that hold digitalized source data or others (e.g. library databases - Worldcat– cf MARC). Such a scenario is described in a different document I have written http://bettergedcom.wikispaces.com/file/view/From%20repository%20meta%20data%20to%20BetterGEDCOM%20and%20reports.pdf but I would like to stress that the main purpose of that document is to map out the issues and possibilities. There are certainly things in the documents which are not realistic in the short term, in fact, it may turn out that initial work will not have download of citation data as a goal (?).

- 28. Tom has described some issues about information, that does not only specify the source and “where in source”, that some of us would like to be able to include in footnotes or endnotes (or other notes), but the exact list of what it should be possible to include is up for discussion, i.e. there are several views on Tom’s list (research notes, transcriptions of evidence etc.). Tom seems to indicate that this is outside EE or “Chicago” but that is not the way I read these manuals – anyway what’s important is what we see as useful – and what many current programs also allow. I guess the discussion is about how specific fields we should have for such information.

29. I have probably not covered everything I
should re. citations, so I hope others will fill in and also correct me. When I can get some time off this discussion, I hope to finalize a document that tries to further detail possible solutions for citations (or the somewhat extended concept of Reference notes – see EE – that we have discussed). I said on Monday that I would have that document into a “presentable state” within two weeks, but that limit has now been exceeded by a few days due to the above discussion. But if someone would like to resume the citation discussion now, please do so – preferably in the context of the “EE & GPS support” pages – see the left side of the wiki – start a new discussion or continue on an existing one.

gthorud 2011-07-01T18:11:21-07:00

Re. citations, the following blog posting by Tamura Jones is interesting reading, and also has a few interesting - well - "citations".

http://www.tamurajones.net/GenealogyCitationStandard.xhtml

IE 9.0 or another browser needed.

ttwetmore 2011-07-02T00:49:23-07:00

2. The concept of persona predates Gentech by at least 50 years. I presented a paper at the Gentech conference (1988) where the the Gentech modeling effort was initiated; I was one of the insistent parties that a persona concept must be included to get evidence into the model. The persona record as implemented by Gentech was not the "historical" concept I hoped for. However, the persona concept of New Family Search is the concept I hoped Gentech would define. I think it is safe to say that the way to tell if a modern program was truly influenced by Gentech would be to check whether it is based on a large and complex set of relational tables, and whether every record, fact, attribute, conclusion, ..., in the database is accompanied by an assertion record (it's a miracle that Gentech doesn't require assertions to have assertions -- I believe that John's name was John with surety level 8, and I believe that I believe John's name was John with a surety level of 9, and ...)

4. The reason things are discussed over and over is because there are fundamental difference, and there is an honest desire to get things right. For example, I will always disagree with the idea of extending the citation concept. I think the size of the Better GEDCOM staff exacerbates this problem. As I alluded to earlier, and not wishing to be offensive, there are a number of off the wall ideas among us Better GEDCOM'ers, and this is one of them. Let's say two of five active participants view the extention of citations to be a good thing. At a 40% accept ratio this makes the idea seem legitimate. This 40% idea is there because a "patch" has been found in using TMG by a participant who has found it an expedient way to work so should be supported by Better GEDCOM. If there were 100 people actively thinking about Better GEDCOM, we would not have 40% of our members trying to redefine citations in order exploit this workaround; in that case we would be working on a model to decide where the non-citation information should go. Better GEDCOM has maybe 2 or 3 people who are idea persons, pushing ideas, trying to explain them, trying to argue and convince, and we have maybe 3 or 4 people who are seriously considering those ideas and commenting cogently on them. Is this enough? The fact that we can get derailed by this citation idea suggests not.

8. For AncestorSync, Better GEDCOM is one more check mark to put on the list of standards they can translate to and from. They are not interested in Better GEDCOM as the standard they would use as their basic format.

28. See 4.

ttwetmore 2011-07-02T06:59:16-07:00

2. Sorry. The Gentech meeting where I gave a talk ("Structure and Flexibility in Genealogy Data Storage") and was part of the Gentech model kickoff was in 1994.

TamuraJones 2011-07-02T13:36:40-07:00

The future of BetterGEDCOM is discussions about the future of BetterGEDCOM?

louiskessler 2011-07-02T16:30:54-07:00

Let me add even one more level:

Should we continue to discuss the future of BetterGEDCOM in this topic: The future of BetterGEDCOM?

Louis

GeneJ 2011-07-06T11:00:11-07:00

@Tom: By my earlier comments, I was only trying to explain where I thought we were ... and where I thought then we were headed. Many of the questions you have asked give rise to a discussion about differences in our approaches to personas, citations -- maybe genealogy in general. Hoping it's okay, I'll respond with a separate posting to my genealogy methods page, "Goal oriented research." Perhaps before the Monday meeting, I'll add some additional comments in a discussion to the "About Citations" wiki page, too.

P.S. You wrote, "I beleive the conclusion person is the basic person concept of all genealogical software today. It is the record that holds all we believe we know about a real person who lives or lived and may contain any kind of notes about the person. I believe it is fairly well agreed to."

...I called it the sweet spot. With most modern software, we can record a "conclusion" and reference our source(s) in one or more reference notes. If the understanding of that event or pfact changes, we can update the various information accordingly. See the page, "Goal oriented research."
http://bettergedcom.wikispaces.com/Goal+Oriented+Research

Christine_E 2011-07-11T12:38:10-07:00

I was at today's developer's meeting where the future of the project was discussed. I'm new here and don't yet know everyone, so I'm sorry I don't know who else was there, so don't know who to acknowledge for some of the following comments.

One person suggested that it be a phased approach with a partial product being developed and more being added in future versions. Myrt arbitrarily picked 4 releases and what should be available in each. She tried to make a chart but decided that it should be done offline and voting could occur.

I disagree with this approach since I think the first release should contain all features that exist in existing genealogy programs, else users of program x would not have any incentive to update to the BetterGEDCOM if they will lose data during the data transfer. Someone in the meeting has a chart showing which programs have which features.

Of course, the first version would also have to implement unicode, data about persons, a lot of the syntax requirements, etc. So it is hard to leave most of the requirements out. A few could be left out, but not many.

Also mentioned at the end of the meeting by a developer is that we could build a quick prototype to help flush out the requirements (at least that's what I understood). This is frequently done on large projects so that users could feedback before any design/programming work is done. What this usually is, is a set of user screens (what the user would see) with drop down menus, checkboxes, radio buttons, field names, etc. There is no programming/processing behind the screen, just navigation between the screens. This does not mean that the finished product will look and feel like the prototype screens, but it could. I think this is an excellent idea to help us get moving.

It was also proposed that we have a day-long meeting (maybe 6 hours) sometime in the future. The prototype could be built/modified during this meeting.

DearMYRTLE 2011-07-11T14:43:57-07:00

Christine, rest assured that the division of the project into 4 workable segments discussed during the Developers Meeting today didn't mean all would be decided in the off-wiki survey, or that all would be decided in this coming week.

That survey is just a tool to assist in breaking into the next segment of our work at BetterGEDCOM. It took the Excel worksheet scenario in our meeting to get the point across about what we are looking for in this next level of participation and focus.

As we move from the discussion of necessary elements to actually devising versions of BetterGEDCOM, discussion is required. Planning, dividing the project into workable "sprints" and "action" must also be taken or we will be left in the "product backlog" discussion stage forever.

-- Product backlog - wish list.

By planning which features will go in each release of BetterGEDCOM, we ensure that as many desired attributes (from user input, etc) are implemented (read that also "not forgotten") as we focus on each individual product version release.

I recommend viewing the following YouTube video to see where I am going with the BetterGEDCOM project:
http://www.youtube.com/watch?v=Q5k7a9YEoUI

BetterGEDCOM must move from a "going around in circles" to actually making BetterGEDCOM project versions with scheduled release dates. Based on our work, those release dates and the elements those versions will feature may be revised.

Thanks for recently joining the BetterGEDCOM project. This is an exciting undertaking, and we're on the verge of making some real progress toward a working copy of a solution for genealogy data file sharing.

BetterGEDCOM must address current compatibility issues AND set the standard for genealogical data storage.

louiskessler 2011-07-11T16:40:23-07:00

Pat:

I was very encouraged by the meeting today. In fact, if the goal is to come out with a first version of BetterGEDCOM by November then I'd be extremely happy. (But I'd call it Version 1, or at least 0.9, rather than Tamura's 0.01 which sounds like there's still 99% to go).

I took about 3 hours (at Starbucks) this afternoon to go through the 32 printed pages of Geir and Adrian's BetterGEDCOM Requirements Catalog, which I think is very well done and covers most of the issues.

I wrote down on my papers how I think the majority of them could be covered in Version 0.9/1.0 with the 4 changes to GEDCOM that I suggested at the meeting. It is very workable and doable and it will respect GEDCOM for backwards compatibility but will enhance it to eliminate many of the concerns. It will do so while not imposing too great a burden on programmers. It won't handle all requirements (mostly the ones that extend GEDCOM) and the remainder of items can be left for future versions.

Now, Pat, yes I do think GEDCOM is pretty good. LDS put a lot of years and thinking into it and you have to study it in detail and read that history to understand that thinking. So I was very happy when you said to address what was wrong with GEDCOM first. To me that's a workable version 1.0. If LDS would have continued, their next version (after 5.6 which was published) might have been something like what I would propose.

I'm a programmer and a logical/practical guy who has done this sort of work before. I'm working with GEDCOM every day with my program and I've seen GEDCOMs from 100 different programs. I know what other programs are producing, and where they are messing up their GEDCOM output. Yes we documented most of it, and it is all in this Wiki somewhere buried amongst all the philosophical discussions.

This is not a simple job, and I expect it would take a minimum of several hundred hours of time to fully spec out a BetterGEDCOM 1.0 by a few people that would need to be heavily involved (as Geir and Adrian and GeneJ and Tom have been).

I figure it would take me about 6 hours to spec out a proposal for Version 1 based on my 4 changes and refer with examples to how it would solve the most important requirements in the requirements catalog.

But I have a full time job and I'm in the throws of trying to get version 1.0 of Behold out. If I was to invest this sort of time, I would need a little buy-in from many people and constructive criticism to make it work.

The one furthest away from me in ideas is Tom, who ironically is the other programmer here. He and I still don't see eye-to-eye on the persona/conclusion person and have a different viewpoint as to how the evidence model should work. But if I'm to be able to propose an initial model to the group, I'll need a everyone to give me a little rope and have an open mind.

I'll want criticisms in the form of concrete examples of things that are not handled by GEDCOM and still are not handled by the new model, other than the requirements that will be deferred to later versions.

Now the caveat: There still isn't consensus here as to whether or not to do this incremental version. Some still want a rewrite. We must establish this first.

Then, would people allow me to come up with a 0.8 version that can then be moulded by the group into 0.9 or 1.0.

I'll need to know before I invest my time.

Louis

DearMYRTLE 2011-07-11T17:14:03-07:00

Thank-you for spending so much time today, Louis.

Let's follow through with the survey, which is about to be published.

We are having another flash flood in our neighborhood, and I am going to help my three unfortunate neighbors with the kids for the next few hours.

I like especially the concept of a version .8

ttwetmore 2011-06-26T09:12:08-07:00

When I joined Better GEDCOM I assumed the goal was to generate a model for genealogical data that could store and transmit more complete data than GEDCOM could, with no ambiguity in semantics. I assumed that the overall idea of GEDCOM records as collections of structured attributes would not change, that we might take basic GEDCOM and add a few more record types, add a few more attributes, and shore up the syntax and semantics so there would be no need for vendors to add extensions. I also assumed that the actual GEDCOM syntax would be replaced by XML or JSON or some custom representation, though frankly, I have never had anything against good ol' GEDCOM syntax, as long as Unicode becomes the default character encoding and ANSEL is not longer used.

Since I've been working in the area of genealogical software and data models for some time I knew I would want to add a few more record types to GEDCOM, including events and places. And I wanted the model to be able to hold codified evidence as well as the usual conclusion records (INDI and FAM) of GEDCOM. After some thought and experimentation I found that this latter wish could easily be fulfilled by allowing person records to be able to refer to "lower level" person records that were created with codified information extracted from evidence. Since records of this codified evidence type are now so widespreadly available from so many sources, I assumed this idea would be obvious, and have been surprised that so many don't understand the idea. (Essentially all records retrieved from every web site that serves up genealogical data today returns those records in the form of codified evidence. Codified evidence has become the lingua franca of the genelogical world. It should be obvious to us that BG must speak that language.)

Frankly I think the core purpose of Better GEDCOM has been distracted by the areas of citations and work products. The GEDCOM model for citations is inadequate, but by simply adding more fully defined attributes to accommodate the ESM needs for describing sources, the problem is solved. I have described how easy this is to do using source references. The idea that we need a citation record to be the place to store research notes, proof statements, summaries of evidence, text that we want formatted into reports, has taken us far afield from the basic needs of storing genealogical data and backed us into a corner with no clear exit. All we need for these things are free format notes that we can attach to person records and source references. We might need a few different tags to indicate the type of note, but that should be easy to decide. I've added notes like these to my data for twenty years, notes to show up in biographical text, notes to show up in footnotes, notes to describe reasoning and how I've resolved inconsistencies, notes to remain hidden because they describe personal information that should not be shared, and so on.

I agree with the sentiment I sense that we are not getting anywhere fast, and that our relevance score is dropping. The world is passing. Ancestry Sync is a new service which has tackled the problem of being a clearing house for data for transport between all major genealogical web services and big trees and all client programs. They have accepted the fact that a new genealogical data model will not, certainly in the foreseeable future of many years, have an impact on genealogical transfer, so they are solving the problem the old fashioned way. They are probably defining their own internal data model and then writing translators to and from that format to every big service API and client program API. A good implementation of this idea, if it is possible, essentially obsoletes Better GEDCOM. Family Search is getting closer and closer to its new tree formats and models.

"What do we want to do? Can we do it? Is it worth doing? Why should anyone listen to us?"

GeneJ 2011-06-27T09:54:36-07:00

Although in a different form, some of the comments in this thread were discussed at the last Developers Meeting.

Have added this thread to the agenda for today's Developers Meeting. Hopeful those commenting are able to join in.

See "Proposed Agenda" at http://bettergedcom.wikispaces.com/Developers+Meeting

Separately ...

In advance, please pardon my use of obviously inferior layman's terms and descriptions.

Without citing discussion threads, I assume we all recall the different opinions about whether BG should begin with a blank sheet of paper and diagram the "future" genealogical process as we saw it. (As opposed to beginning with the diagram of existing GEDCOM and making modifications to same.)

Although some effort was made to encourage a dual path, prior to an early April developers meeting, it was suggested all other work should stop until the work on E&C was clarified. My take on the early April meeting was that we would consolidate our efforts on E&C.

At least from my perspective, E&C initially took a "blank sheet" approach to designing a future process.

A diagram began to unfold, but from my perspective, the initial diagram did not advance a process that "got us," to what might be described as the "sweet spot" that exists in modern software (largely associated with by current GEDCOM).

While we all had different perspectives on the gap, I saw a need to incorporate recording logic at a higher level, beyond "I think this is the same person because."

Said in another layman's way, initial E&C seemed to diagram a process of compiling documents and associating records, but it did not yet advance steps by which "data" becomes "information" and information becomes "knowledge." Finally, the process by we record our "knowledge" and document same with "evidence"--as this lay person defines that term for the purpose of modern genealogy.

While no one software package on the market may today meet all conditions all users want, at least I believe a sweet spot developers of modern software hope to offer users is this ability to record knowledge and associate same with evidence (again, as the latter term is used in a modern way).

From my perspective, Geir, Testuser and Adrian sought in late April and May to find ways to bridge this gap, and perhaps others, as part of Defining E&C for BetterGEDCOM.

At one point, I thought we were close to consensus among the technologists and wiki-ite developers about how to bridge the gap, but objections were raised.

As I understood it, the theory was advanced to look at the process in three steps. Allow me to use my own words ...

Step one ... add codified record data to the database, with each bit representing a unique fake person. Let me call this the record persona.

Step two ... using the logic "I think these are the same persons because" group together the various record personas to form a "conclusion person." [Like a "compilation" persona" (my word)]

Step three ... apply higher forms of logic, reason, and research methodologies to essentially redevelop the conclusion person (above) into a "biography person."

In haste here, in the above three-step approach, I just don't see what I otherwise consider the sweet spot of existing modern software--it's as though we've erased it.

The technology by which "conclusions" become documented in existing software goes far beyond the concept of of a compilation.

At least from my perspective, there should be a meeting of the minds about modern software's "conclusion" person.

While I can't articulate it well, it seems that if BetterGEDCOM tries to dumb it down into a "compilation" person, we will break existing content (my term).

Having spoken in what seems like the alternative, it remains my real objective that we reach consensus about E&C in order to move this baby forward.

Hope to see some or all of you at the meeting.--GJ

theKiwi 2011-06-27T10:19:00-07:00

When I first heard of and joined the BetterGEDCOM community my hope was for just that - a "better GEDCOM", not necessarily a perfect one ;-)

My angle on this was as genealogist for the Clan Moffat Society I am frequently sent GEDCOM files that others produce from their software which I then have to massage - sometimes quite seriously - to get them into Reunion on my Macintosh where the society's master file resides, and from there the data goes to the genealogy website at http://genealogy.clanmoffat.org/ which is running Darrin Lythgoe's excellent "The Next Generation of Genealogy Sitebuilding" - TNG.

Due to the willingness of Darrin to tweak his code I am able to import almost all (Child Status is the one exception) of the information that Reunion can export to a GEDCOM file into TNG. Others find the same thing - the TNG code is tweaked to allow import of features that other software companies do differently from the GEDCOM standard, and from each other - things like geodata, multimedia and notes for places for example can be imported from several different outputs.

But Leister Productions, Ancestry.com, Millennia Software, Bruce Buzbee et al aren't as nimble or accommodating as TNG is, so I had hoped that BetterGEDCOM might provide a common basis for these things that developers might be nudged towards eventually.

Then along comes AncestorSync - in many ways this to me is the immediate answer to my particular concerns, especially if I could have it as a Macintosh software running on my Mac (although if I had to run it under Windows on my Mac I'd do that too I guess <g>) that would let me take a GEDCOM file from software A and ouput it as a GEDCOM file compatible with software B, ideally with an intermediate step that says to me "I don't know what to do with this - where would you like it to show up?"

The more esoteric discussions have been interesting, if not entirely over my head at times, and I see merit in many of the ideas going forward, but it wasn't what I thought that at least the first output from this effort might be going to be.

Now the meeting is underway...

Roger

ttwetmore 2011-06-27T18:21:56-07:00

GeneJ,

"A diagram began to unfold, but from my perspective, the initial diagram did not advance a process that "got us," to what might be described as the "sweet spot" that exists in modern software (largely associated with by current GEDCOM)."

Could you explain what is the sweet spot in genealogical software?

"While we all had different perspectives on the gap, I saw a need to incorporate recording logic at a higher level, beyond "I think this is the same person because.""

Can higher level logic be placed into notes that are attached to normal data records? Or does the BG model need special records for recording and holding higher level logic? What is higher level logic?

"Said in another layman's way, initial E&C seemed to diagram a process of compiling documents and associating records, but it did not yet advance steps by which "data" becomes "information" and information becomes "knowledge." Finally, the process by we record our "knowledge" and document same with "evidence"--as this lay person defines that term for the purpose of modern genealogy."

I don't see the point about data, information and knowledge. The historical and scientific methods are based on evidence and conclusions. It seems to me that evidence and conclusions are the data, information and knowledge you are talking about. I believe we have covered these concepts from the beginning. If you believe the models do not handle data, information or knowledge, could you explain why and how you would change them so they do?

"From my perspective, Geir, Testuser and Adrian sought in late April and May to find ways to bridge this gap, and perhaps others, as part of Defining E&C for BetterGEDCOM."

The E&C process has been defined, with no gaps or bridges, since I wrote up my first descriptions of how the DeadEnds model can implement the E&C process. That was November or December.

"Step one ... add codified record data to the database, with each bit representing a unique fake person. Let me call this the record persona."

I agree, except the term "unique fake person" seems odd. The record is derived in the sense that it is codified from evidence, but it is not fake. This would imply that all records retrieved from Ancestry.com and Family Search, among others, are fake.

"Step two ... using the logic "I think these are the same persons because" group together the various record personas to form a "conclusion person." [Like a "compilation" persona" (my word)]"

I agree with this as the deductive step required by the scientific and historical methods.

"Step three ... apply higher forms of logic, reason, and research methodologies to essentially redevelop the conclusion person (above) into a "biography person.""

The biographical persons are the subset of the conclusion persons we evenutally write up in our research results. When I split things into three levels before it had to do with footnotes and other notes only -- at level one the job is to cite sources; at level two the job is to document conclusions; at level three the job is to provide additional notes and information that the researcher would like to see included in the work products that document their research. We only need to add a few types of notes to do this; we don't need to complexify the model with new record types.

"In haste here, in the above three-step approach, I just don't see what I otherwise consider the sweet spot of existing modern software--it's as though we've erased it."

Maybe if you could describe the sweet spot, we could try to understand how the models fail to support it.

"The technology by which "conclusions" become documented in existing software goes far beyond the concept of of a compilation."

The compilation person hasn't been included in any model so is a new and undefined concept. Are you implying there is something weak about the conclusion person concept as we discuss it? Would it be possible to explain?

"At least from my perspective, there should be a meeting of the minds about modern software's "conclusion" person."

I beleive the conclusion person is the basic person concept of all genealogical software today. It is the record that holds all we believe we know about a real person who lives or lived and may contain any kind of notes about the person. I believe it is fairly well agreed to.

"While I can't articulate it well, it seems that if BetterGEDCOM tries to dumb it down into a "compilation" person, we will break existing content (my term)."

How is Better GEDCOM "trying to dumb it down" and what is the "it" that BG is dumbing down? I see the Better GEDCOM models as trying to encompass a fuller process than GEDCOM. Everything we've done so far is a "smarting it up" as far as I see it. It would be helpful if you could try to explain what you mean.

gthorud 2011-06-29T16:54:11-07:00

Sorry that I have not responded earlier, but I have moved for the summer, and that is not as simple as packing a bag.

1. For several months BetterGEDCOM has had a participation that I assume everybody has understood will not be the one that set a standard. Setting a standard requires a much larger participation, and more participation from vendors.

2. The way I have seen it, BG could still produce some useful output in terms of “user requirements” (cf. Requirements Catalog), and could provide a discussion forum for development of technical solutions, and actually propose sollutions. I have not wanted to sit patiently and wait for a standard I have no idea about what it will contain, when it will arrive, and I am sorry to say, is very unlikely to solve issues that are “European” – although it may claim to do so. That has been the basis for my work since then.

3. I have had no expectation that we would produce a specification in a few months, that is not how standardization works – except when you let someone dictate it.

4. My hope has been that development of some solutions would generate more interest from the community. One or more draft specifications would be easier to sell than discussions spread all over the wiki.

5. The alternative was to kill BG in February.

6. Will we be listened to? That question depends on many things.

7. I don’t think anyone participating in this work should be disappointed if what we produce does not become a standard. One problem is that the genealogy industry is dominated by actors that think that all genealogy work should be carried out on their servers. They may not be interested in having a file standard, and I would not be surprised if they came up with a standard that extends the functionality compared to Gedcom, but going via servers. So if you want to do genealogy in the future, you may have to share your data with someone running a server. There will be no such thing as private data, you have no guarantee that your data will be available in the long run and you will not be able to store data that the big guys don’t think you should store. The other actors are too small to have the resources to develop standards, and may in the long run see their market share reduced as server functionality improves.

8. Another aspect is, do we address the most important problems experienced by users, and the most wanted features? Frankly speaking, I don’t think we have done that in the last months. But, there is also a chicken and the egg problem in this because few things can be developed in isolation. And, I am not in favor of finding quick solutions that will make it more difficult to have much better long term solutions.

9. But there might be other results from our work. For example, if we choose to work on the right things, it could demonstrate important shortcomings in the alternatives. And, it might inspire some vendors to implement extensions based on our work. If we can manage to document some of the ideas that are now hidden in discussions, it might be of interest to some actors.

10. Is it worth it? That obviously depends on the answer to the question above. I personally find the discussions very interesting, and as long as I have little to lose by participating – at my own speed – I think even the PROCESS is worth it, although it would be more interesting if we could get to a stage where more is actually documented in specifications. But, if it is the case, as Louis’s indicates, that all our discussions are just repeats of what has already been discussed on the GEDCOM-L, it is not worth it.

11. Can we do it? That depend on what “it” is. I think we can create interesting results even with the current participation, but a question is when and if we manage to avoid counterproductive discussions and negative statements about other peoples interests/work?

12. What do we want to do? Hope to write more about that tomorrow.

13. There is also a question, How do we do it?

Some comments on other postings.

14. Louis refers to 5 items (of which the two last are just there for archiving) and saying that that is not concrete goals. To me, the Requirements catalog contains lots for concrete goals, so I am not sure what Louis means. What would a diferent goal look like?

15. But, I agree that there are problems with how the wiki appears to a new reader.

16. Tom writes: “When I joined Better GEDCOM I assumed the goal was to generate a model for genealogical data that could store and transmit more complete data than GEDCOM could, with no ambiguity in semantics. I assumed that the overall idea of GEDCOM records as collections of structured attributes would not change, that we might take basic GEDCOM and add a few more record types, add a few more attributes, and shore up the syntax and semantics so there would be no need for vendors to add extensions.” Re. the last few words. Having read all the proposals about new functionality from Tom, I have a problem understanding the logic in this.

17. Several people have stated that BG should just try to fix a few small problems with GEDCOM, and by the way, include “my pet” major extension(s), and do it my way. The problem is that many of these “pets” have had only one or two really interested person, and a lot of opponents. BetterGEDCOM cannot be developed if it only incorporates one person’s pet. The only way forward that I have seen is to develop all those pets, and see if it would generate interest – the alternative was to do nothing – or just continue to repeat pros and cons statements forever.

18. Also, when there are conflicting “pets”, or alternative solutions - and the discussion goes on for a long time without any real progress – I see little point in continuing the discussion – it is simply counterproductive. Even if we had I voting way of working, it would not have any real meaning with 4-5 persons voting since the real decision makers, most of the program vendors, are not voting. In such a situation I think it is better to document the alternatives and issues.

19. If there are anyone that has an alternative way that would work, I am listening.

There are several more issues above that I have not addressed, I hope to do that during the next couple of days.

ttwetmore 2011-06-29T18:54:29-07:00

1. I have assumed our goal was to create a new standard. Though you make so many good points in this post, that I tend to agree that it will not happen.

2. I am not interested in creating user requirements. I am interested in implementing solutions. If BG decides its goals are to create those requirements, it might be a positive step, but it begs the question of who or what would implement them. Can you say "GenTech?"

3. I have (also?) come to the conclusion that BG will not get anything meaningful done in the short term.

4. BG is getting no participation from the larger "industry." This means what it seems to mean. No one "important" sees the problem the way we do.

6. We would be listened to if we had a powerful and compelling proposal.

7. Well said.

8. The biggest problem from the point of view of users is sharing data. The solution to that is either AncestorSync or an enhanced GEDCOM that manages to be the right superset of features needed for full sharing. Forget E&C, forget citations. Let's eat cake.

9. The biggest shortcoming we could overcome is the inability of the current home genealogy systems to handle evidence as well as they handle conclusions. This is my only "pet." It is the key to full support of the research process.

10. Is it worth it? Only if the results would be used.

11. Can we do it? Not as we are constituted.

12. What do we want to do? I want to create a data model that covers the genealogical research process.

13. How do we do it? Read the DeadEnds model and agree that it is a golden glow on the horizon.

16. Since I wrote that I have no problem with the logic. My DeadEnds model is GEDCOM with the addition of an event record and the ability of person records to refer to other person records. I don't think I've proposed all that much new functionality if it can be handled with such a tiny change to GEDCOM!

17. Some "pets" are important and some "pets" are pretty immaterial. I know the difference. You probably do too.

louiskessler 2011-06-29T20:35:45-07:00

14. Well, goals are not requirements. But you are correct, Geir. The requirements page contains goals - those "descriptions" are at an abstract enough level to be considered goals. I thought we were trying to come up with a concrete solution to each of those, and for a short while we were. But then we started veering off into abstract land.

16. And I have a similar simple but different solution than Tom has which takes GEDCOM and adds an evidence record. I'd also add a place record. But I wouldn't add an event record since in my idea they would be attached instead to people, places and evidence. This will most likely become Behold's data model.

Louis

rumcd 2011-06-30T15:44:42-07:00

So is GEDCOM dead going forward (as in no more development)? If so, has anyone heard that as a formal statement from the LDS?

I'm brand new to this group. What I have longed to see in GEDCOM (since early iterations) are right and proper catalog records for source citations. What can I say, I'm a Librarian ;-)

Pulling/pushing various MARC fields would provide this. <http://www.loc.gov/marc/>
If that could be accomplished, connections to OCLC/WorldCat would be a natural.

I will resume quietly reading the historic posts and keep quiet. Thanks for letting me on-board, wherever we are going!

gthorud 2011-06-30T17:08:16-07:00

Tom,

2. Several of the concepts in Gentech have been implemented, and there are more to come. Unfortunately they wrote a spec that even computer geeks have trouble understanding. Had they taken the time to sell it properly, you might have seen more of it implemented.

3. I have not come to the conclusion that BG will not do anything meaningful short term. Considering that we have spent large part of the time discussing your E&C proposal, I am surprised that you don’t think that have been meaningful.

4. “No one sees the problem the way we do.“ I am not so sure about that, but the big guys want to be in control, had you expected anything else. And, how would you expect anyone to see it our way when we have not even got around to describe “the problem the way we do”. The first step towards implementation and perhaps a standard is proper documentation of the concepts – you have to make a “seed”, and acknowledge that things take time to grow.

6. Your best statement so far. Yes, that is my hope too. I think we would get increased interest even if the proposal was not “in agreement” on all issues, and just described the alternatives. You cannot expect people to get involved in our endless discussions, which have been totally disorganized in most cases. If people got a proposal, and an organized way forward, I think we would get more interest.

8. AncestorSync will solve some problems, but how well it will do it remains to be seen. And it also remains to be seen if it will ever do user program to program syncing (without being “filtered” by an online service), and there is a limit to what they can do – you can’t get data into a program it is not designed to handle at all. But there are lots of possible ways that service, and user programs, could develop. I don’t think it is a replacement for BG, but see when the dust has settled. And, as you know, we spoke to “AncestorSync” in the last meeting.

16. You may consider it a minor change to Gedcom. Even if it was, it is no small change to programs.

I'll, go on with some other issues, extending my initial numbering.

20. Adrian writes about optional extensions and problems with interoperability. There are several aspects of this

- backwards compatibility with Gedcom and also, to some extent, with de facto implementations of extensions in current programs.

- do we expect all programs to implement all the things we “invent”. I am sure that will not happen any time soon.

- can data be converted to fit the implementing implementation, with loss, without loss, without serious loss? The possibilities will have to be investigated.

- is it possible to get total agreement on all proposed extensions in BG, with the current membership? Probably not.

- assume that it will become a standard, are we the ones that will be the only one to have a say on what goes into that standard? Probably not.

21. As an example, at least in my head, I do not expect all implementations to support personas or multilevel E&C or references to personas via reference notes. That’s why I try to see if this can be designed in a way that, with conversion rules, data can be interchanged with minimum consequences if there are differences between programs. Since I assume that BG would be something for programs to grow into (which is better than each program growing in different directions), we have to allow flexibility – and we have to consider all possible combinations of functionality. No doubt, there will be a minimum set of mandatory enhancements, but what they are, I can’t tell today.

Again, I am open to alternatives.

22. Roger want’s to fix the short term problems. I must admit, that some of that has not appealed to me (and others) with the proper level of challenge. But some of the “simple fixes” have turned out to not be that simple after all, or they could make it difficult to develop a much better long term solution (you don’t know until you have a picture of the long term solution), or they would only work in the US or one single country/language. For example, from my viewpoint, I see no very simple solution for sources/citations or Research Log. And, without looking at the other “simple things” that have been proposed, I am not sure they would have made a big difference – there would still be problems. But, one area where you could easily make a big difference would be in multimedia – creating a simple solution that it would not be a big problem if a more advanced solution came later.

Rumcd – it’s 2 am here now, will answer tomorrow.

ttwetmore 2011-06-30T17:25:03-07:00

rumcd,

Welcome. The LDS dropped any noticeable support for GEDCOM many years ago, but I don't think we can say what their position truly is. Non LDS-organizations have created their own semantic structures using the GEDCOM syntax, e.g., Event GEDCOM from CommSoft in 1994. You can think of GEDCOM syntax as analogous to XML and JSON in terms of expressive power.

In the genealogical world, the writings of Elizabeth Shown Mills are treated with great reverence, so much so that the term MARC may be unknown to many. But I would say there is agreement with you that if GEDCOM had more fields in the source record that agreed with standards that GEDCOM would properly handle citations.

One of the controversial approaches being advocated by some on Better GEDCOM is a redefinition of the citation, beyond that of MARC, or the Chicago Style manual, or Elizabeth Mills. The idea is to augment it with research notes, transcriptions of evidence, discussions of conclusions, textual paragraphs the researcher would like to see included in footnotes, basically anything the researcher thinks is relevant about the information being cited and would like to see written up in their research reports. The reason for this is that most genealogical software systems don't have records or fields set aside for this kind of information, and by placing this information in citation records, at least one popular program (TMG) can manage a good workaround. I think the proper solution to the problem is to first to decide if this kind of information has a place in a genealogical databases, and if it does, what is the proper model for handling it. It kind of boils down to the question of whether the purpose of a genealogical database is to manage your data or to write your reports. Personally I think it should do both, but this approach is not the best way to do it.

ttwetmore 2011-06-30T18:01:05-07:00

Love those numbers!!

2. I'll take your word for it that Gentech concepts have been implemented. I'm not aware of them. I dislike the Gentech model and it colors my thinking and words.

3. I won't argue more about short term results. Time will tell. I am pleased that the person/persona concept has taken hold with many Better GEDCOM folk.

4. Yes, proper documentation of the concepts is required. I don't think there is enough agreement on those concepts yet, and I'm getting pessimistic that there will be. I think the Better GEDCOM staff is too small and that some of our ideas are enough off the wall that we can never reach concensus.

8. AncestorSync will not be able to solve "you can't there from here" problems. Many transformations between genealogical representations are "lossy." The AncestorSync web page will not advertise this!

16. I take it a matter of course that programs must change. I've recently been considering how to modify my old LifeLines program to accommodate these two changes (event records and "recursive" person records), and I believe it would be straightforward. Easy to say, hard to prove!!

20. Yes.

21. Laudable goals, and well described.

22. I am not interested in simple fixes. I have nothing against them or the people who advocate them. I guess I should also say I don't belive they would prove to be worth it, and maybe that's why I not interested.

louiskessler 2011-06-30T20:46:32-07:00

There are 4 programs that I know of that say they use GenTech. I've never tried any of them so I can't comment on how well they did.

See:
http://www.gensoftreviews.com/index.php?s=gentech

louiskessler 2011-06-26T08:04:46-07:00

Adrian,

When you go to the BetterGEDCOM Goals page (top of the left menu), we have 5 different sets of goals, each so completely different from each other, its no wonder why we don't know what BG is aiming for.

1. "Introduction to Goal and Requirements" written by you in April. This is a nice set of general wants.

2. "BetterGEDCOM Requirements Catalog" has had a lot of work put into it, with a lot of details and specifics by you, Geir and Gene between February and now.

3. "BG Requirements Catalog Index" - the Requirements Catalog became so unweildy that Geir had to make up an index for it.

4. "The Original Goals Page" - started by Greg Lamberson in October and continued by many until about Februry - at which time the BetterGEDCOM left hand menus were restructured and most of the original discussions became hidden away and a new set was started.

5. Tom's Goal and Requirements. Tom added this in January with some of his higher-level requirements for BG.

So you're right. Someone coming to the BetterGEDCOM gets confused immediately by this and realizes that we don't have concrete goals.

FamilySearch has their new GEDCOM already. It is their internal database structure which they give programmers access to via their API. You are correct that they don't care and moreso want theirs to become the standard.

Would Ancestry care to get involved? If they decided to rewrite GEDCOM, theirs would become the new definition and everyone would be forced to follow. Fortunately (or unfortunately) they have chosen to basically ignore GEDCOM.

Then there's Bruce Buzbee and RootsMagic who's the next biggest player and has been mentioned as possibly attending a BetterGEDCOM meeting but never had. If he decided to issue a new standard, he might have enough clout that a large number of smaller developers would follow.

But otherwise, you are right. When you've got just me, Tom, Michael and who-else as small time developers, you don't have enough momemtum to make a difference.

Instead, what we are doing is continuing on an infinite discussion of all the different ways GEDCOM can be improved - the same discussion that has been going the GEDCOM-L mailing list for the last 20 years.

So Adrian, I'm happy to retweet your question: "What do we want to do? Can we do it? Is it worth doing? Why should anyone listen to us?"

Louis

GeneJ 2011-07-05T13:14:41-07:00

Case posted: Sheriff William Preston's identity crisis

Tom Jones' says genealogical questions (focused goals) are either identity or relationship oriented.

The subject case is an effort to present research our family has conducted to identify the parents of Sheriff William Preston, who, with his brother John, is considered the first settler at the fort in Defiance, Ohio, after the war of 1812.

Eleven articles, of a planned twelve that will make up the series, have been posted to my personal blog. The case begins at the link below.

http://bit.ly/l1ishE

The final article "Putting it all together" will cover my approach to correlating the various information in the "body of evidence."

Separate from the objective of the whole series, each of the articles focuses on at least source (some on many sources).

I had planned to hold off 10 days before posting the final article. I may still do that ... (unless I decide to post tomorrow in honor of Maj. P's birthday) --GJ

ttwetmore 2011-09-09T17:09:18-07:00

Better GEDCOM Future ?

Better GEDCOM wikispace activity has dropped off. Is interest waning? Are people consumed with end of summer chores?

louiskessler 2011-09-09T19:38:43-07:00

Tom,

There was discussion at the last Developer's meeting for BetterGEDCOM to possibly collaborate with SourceTemplates.org. See meeting notes: http://bettergedcom.wikispaces.com/DevelopersMeetingNotes29Aug2011

There was no meeting last week because of the long weekend, and there is a big convention going on that Myrtle is at.

I suspect everyone is waiting for this Monday's meeting.

Unfortunately I can't make it this Monday. My torn achilles tendon has healed enough that I'm back at work again.

Louis

I think everyone's awaiting the meeting on Monday. See agenda:

theKiwi 2011-09-12T10:07:14-07:00

Not exactly the right place for this, but anyone know where the "Chat" is - it used to be linked up from the Meetings ------> Organiser's Meeting page, but it's gone from there.

In any case, I'm "Waiting for an Organiser to arrive..." for the BetterGEDCOM meeting today 12 September. DearMRYTLE is I suspect on the road? She was at FGS and still in Springfield yesterday...

gthorud 2011-09-12T10:09:33-07:00

I am also waiting.

GeneJ 2011-09-12T10:13:37-07:00

Ditto.

Hi Kiwi. There is a note in the last Developer Meeting minutes about a change to how agenda-like items for the organizer meeting would be handled.
Myrt or Andy would probably be able to answer more general questions about the "chat" function.

P.S. I just love the culturally significant kilt, socks, red shirt.... !!

theKiwi 2011-09-12T10:23:54-07:00

Yeah the picture

http://www.facebook.com/photo.php?fbid=2175174371733

has my Scottish heritage but also my Canterbury/Christchurch Heritage with the red After Socks

http://www.facebook.com/photo.php?fbid=10150269531197820

Roger

DearMYRTLE 2011-09-17T07:28:47-07:00

Am looking forward to this coming Monday's meeting.

We will work to flesh out the SourceTemplates work group -- defining the tiers for managing the collaboration project with FS, LFT, RM, ASync and others.

I removed the "chat room" option as a cost-savings mechanism. From the logs, one appeared to be using it.

On a personal note, we should arrive home later today.

ttwetmore 2011-11-03T12:35:21-07:00

With all due respect I can no longer avoid the conclusion that Better GEDCOM has failed. In my opinion nothing significant has happened for months. There has been nary a comment about producing a model and exchange format during weekly meetings for many months. The only comments about models for months have been posts from me. In fact the weekly meetings seemingly avoid all technical issues and postpone all decisions. It seems that all members interested in and competent to work on models now eschew Better GEDCOM.

Do you not believe that Better GEDCOM should be a genealogical data model and exchange format that should encompass GEDCOM and the internal models used by the current generation of desktop and on-line genealogical systems? Is there any question about this? If you don't think this is what Better GEDCOM is, then, please, can you try to define what you think it is?

If Better GEDCOM is not a model and format for genealogical data and exchange, then, in my opinion, there is no substance to it. No beef.

If Better GEDCOM is a new format for genealogical data and data exchange, why are we spending month after month after month not working toward that goal? For goodness sake we don't even mention that goal! There is a BIG elephant in the Better GEDCOM living room!

Better GEDCOM is a little over a year old. One year ago we were further along with discussions of purpose, goals, requirements, original compilations of models, comments on those models, arguments over what should be included in the Better GEDCOM model, than we are today.

Am I just a neer'do'well seeing things in a negative light? Do others truly believe we are getting anywhere? Does anyone think anything is getting done? Can you put it into words?

Andy_Hatchett 2011-11-03T13:02:16-07:00

All I can comment on is the Meetings aspect of things. From my viewpoint it appears that the actual technical discussion stuff is postponed because most of those who actually attend the meetings aren't really qualified to discuss such items. I know I'm not!

Do we need to change meeting times so that more of the tech people can attend or have a separate tech meeting at a different time?

I'll be posting the survey results Sunday night. It is an interesting mix of comments.

Andy

WesleyJohnston 2011-12-01T00:21:25-08:00

I've noticed the discrepancy between the meeting title "Developers Meeting" and what is actually discussed. The reality of the meetings now seems to have more to do with the development of the organization -- an extremely important task of fundamental importance. This meeting probably needs to be renamed.

I'm not sure how to create a true Developers' Meeting, since there are so many issues -- and getting those of us with the technical experience to attend regularly is also a problem. But at some point, this has to happen. The discussions now occurring -- after a lull of several months with little discussion, as I have discovered -- are raising important issues, and there needs to be a way of wrestling the mass of text in the discussions into some current state of each topic in a way that has some hope of a result ever coming out of the process.

That has to be done in some way that converges on a result while still being open to some major overlooked perspective. I think there also needs to be a clearer vision of the long-term way in which we would like the future to look, without concern for current-day limitations, so that there is a true big picture that all agree with. From that context, we can then come back down to the reality of today's limitations and come to some agreement about how to create a BetterGEDCOM version 1 that works in today's world but does not lose sight of where we want to ultimately be.

Andy_Hatchett 2011-12-01T00:50:44-08:00

Wesley,

Now that we have taken care of securing a GoToMeeting account, a 'tech' meeting can be easily arranged.

If you and the other tech-types can agree on a time and date I can schedule such a meeting whenever you like.

GeneJ 2011-12-01T07:44:48-08:00

@Wesley,

You raise a host of good topics, many of which are shared by both technologist- and user-members.

The "Developers Meeting" has long been so named; the matters discussed depend on agenda topics proposed by members of the wiki--that process is mostly open.

Meetings focused on agenda topics that have not been presented and discussed on the wiki are problematic in an international setting. Even native English speakers should want a little time to review, absorb and confirm an understanding.

Your post gives rise to other topics that are important. Thank you for commenting.

Three cheers for Andy and his support! --GJ

Clarkegj 2011-09-26T07:26:45-07:00

Developing an Organization

I have added some helpful information at:

http://bettergedcom.wikispaces.com/Developing+the+Organization

GeneJ 2011-10-17T10:39:10-07:00

May we please have "Developing the Organization" added to the Home page and the wiki nav bar

Thank you.

Andy_Hatchett 2011-10-17T10:55:56-07:00

I second this idea. There may be those who read the Wiki that, while they may not feel qualified to speak to the technical aspects of BetterGEDCOM, could contribute a great deal toward the organizational effort.

hrworth 2011-10-17T12:38:50-07:00

Andy,

That is possible.

Russ

Andy_Hatchett 2011-10-17T19:03:53-07:00

Another though about this - and yes, I can hear the groans and moans already but...

Should the organizational aspects of BG be discussed in the Developers Meetings or should there be two working groups with each having their own meetings?

It is something we may want to look at if more people start taking an interest the organizational side of things

hrworth 2011-10-18T04:58:56-07:00

Andy,

OR re-activate the Organizers Meetings and have this as a task for that group and expand the membership of the Organizers.

Just a thought,

Russ

Andy_Hatchett 2011-10-30T01:12:49-07:00

Happy Birthday BetterGEDCOM!

As BetterGEDCOM celebrates its 1st anniversary the following might be of interest.

As of 30 Oct 2011 3:00 AM Central Daylight Time, the BedderGEDCOM Wiki has 123 members, 103 files, and 192 pages.

During the past year 47 (38.21%) members have made 3,302 posts to the wiki and 48 (39.02%) members have made 4,626 page edits.

Just thought you folks might like to know.

Andy

Andy_Hatchett 2011-11-02T23:21:51-07:00

Another interesting GEDCOM discussion on soc.genealogy.computing

and we are mentioned, a longish thread but worth reading.

http://groups.google.com/group/soc.genealogy.computing/browse_thread/thread/fa8dc544f22ea0bb/876e11b79b62a58a?hl=en&q=%22BetterGEDCOM%22&lnk=ol&

Andy_Hatchett 2011-11-09T10:17:35-08:00

Wiki Backup

A Backup copy of the Wiki is now available for those who may want it.

The path is:
Manage Wiki->Tools-> Export/Backup.

Andy_Hatchett 2011-11-30T11:42:56-08:00

Ancestry and GEDCOM output of Member Trees and FTM

This thread may be of interest. They have acknowledged a problem with the AMT GEDCOM output and are instituting a change to correct the problem.

http://boards.ancestry.com/topics.ancestry.membertree.membertrees/1186/mb.ashx

GeneJ 2011-11-30T11:49:57-08:00

Excellant research, presentations and all around work, Andy.

Great news; great job. --GJ

Andy3rd 2011-12-02T19:26:20-08:00

Randy Seaver, GEDCOM, RootsMagic5, FTM201

Interesting article in Randy's Follow Up Friday article he posted today.

www.geneamusings.com

GeneJ 2011-12-03T06:53:09-08:00

Great discussion there.

http://www.geneamusings.com/

Andy3rd 2011-12-11T18:42:46-08:00

Wiki Stats Update

As of 11 Dec 2011, 8:35 PM Central Standard Time:

Members: 130
Pages: 247
Posts: 4,114
Edits: 4,577

GeneJ 2011-12-16T10:28:57-08:00

Genabloggers has begun a Technology Meme

Geneabloggers has begun a technology meme, "ROOTSTECH: MY ROOTED TECHNOLOGY MEME"

Read about it on the Geneabloggers site (below). There are 27 challenges listed.
http://www.stumbleupon.com/su/19HhKi/www.geneabloggers.com/rootstech-my-rooted-technology-meme/

Caroline Pointer's meme tech article is already up:
http://www.4yourfamilystory.com/1/post/2011/12/my-rooted-technology.html

Andy_Hatchett 2011-12-17T17:37:18-08:00

Are we forgetting?

There are several discussions going on about sources, conclusions, evidence etc.

There seems to be one thing missing in all of this.

Just as the vendors, if they accept BetterGEDCOM, are going to have to change the way they do things; so are the endusers. The main change will be that they will enter stuff the way BG calls for it to be entered- thus no completely custom sources, citations, etc.

That is what a 'standard' is. To expect BG to handle data as presently input into *any* program is folly. Restraints *will* be placed on all parties that accept BG- that is part of the bargin. Those endusers who don't follow BG can't expect that all their data transfer without error- and BG shouldn't promise them that.

I'm not a technical person but I would offer this advice - if BG works get used to the idea that your old way of doing things simply will no longer apply.

ttwetmore 2011-12-17T21:51:12-08:00

@Andy,

I'm glad you have an easy to spell name.

I basically agree with you, that BG will force vendors to have to make changes. But I'm not sure that this will have as great an user impact as you may believe. If BG can be designed as a superset of the models used by current systems, while also being adequate for future, records-based systems, then the user interfaces of programs as they come to embrace BG, need not change all that much. That is, I think the fact that the programs can now export and import the richer BG model, need not have a great impact on the user interface.

Of course you are right that the user interfaces may have to be extended to display or capture additional information, but I still would maintain that this would not be so dramatic as to materially change the nature of the user interfaces. I could be wrong, now (credit to the Monk theme song).

GeneJ 2011-12-19T06:49:36-08:00

TMG v8

Good news for TMG users. It's here! TMG v8 has been released.

http://www.whollygenes.com/Merchant2/merchant.mvc?screen=TMG

In his release announcement to the TMG user mailing list, Bob Velke reported about some of the new features listed below: (quoting)
- a whole new Report Writer that supports 32- or 64-bit systems;
- a new report viewer with expanded features;
- the ability to send Pedigree charts to Word/RTF with an index;
- many other improved report formats;
- color-coded report output;
- many new default roles and sentences;
- the new ability to share your tag types, roles, and sentences;
- new "Add Multiple People" and "Add Family" screens;
- greatly-expanded web searching features;
- many new features to make data entry faster and more consistent;
- and much more!

theKiwi 2011-12-19T09:06:39-08:00

And Tamura has tweeted that it does NOT support UTF-8.

Really? How limiting!!

Andy_Hatchett 2011-12-19T09:11:44-08:00

That should not come as a surprise to anyone. It has been known for some time- like years- that it wouldn't; indeed couldn't, as it is still based on Foxpro.

GeneJ 2011-12-19T09:39:19-08:00

Hi Roger,

I've exchanged thoughts about TMG with Tamura. He does such a fine job of reviewing programs.

There are many fine programs on the market, and I think it's probably hard to fairly review TMG. It's pretty easy to brush over features that are favorites of TMG Users--event based, fully supported associates and witnesses, full service source system and extensive search/research features. Its narrative reporting formats are among the best in the industry. It hard not to mention John Cardinal's supporting programs--TMG Utility and Second Site.

The missing UTF-8? Well, the reason for that limitation is pretty well known. I'm just not sure how many users decide that limitation outweighs the benefit users seek from the other program features.

Good going to Bob Velke and his development team for gettin' v8 out! --GJ

Andy_Hatchett 2011-12-21T20:46:08-08:00

Wiki Changes

Wikispaces will soon be making changes. Read about it here:

http://blog.wikispaces.com/2011/12/changes-to-all-wikispaces-themes.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+wikispaces+%28Wikispaces+Blog%29

GeneJ 2012-01-10T02:33:32-08:00

Seems like many of these changes kicked in yesterday, Andy, about the time we finished talking about wiki organization.

theKiwi 2012-01-10T05:21:06-08:00

Well the one thing it seems to have screwed up royally is that the "Recent Changes" page, which is my normal entrance to the Wiki is now such that it needs a browser window nearly 1600px wide to see the whole thing - when I first arrived this morning I couldn't see the Nav Bar at all, nor the dates of the changes of posts because the lines showing the Post excerpt and the author were so wide and don't wrap. At least other pages wrap at a narrower width so that the Nav bar is always visible on the right now.

Not impressed so far!!

GeneJ 2012-01-10T05:34:45-08:00

Hi Roger,

Appears this right/left nav bar is a new "theme" that has been inserted by Wikispaces?

http://blog.wikispaces.com/2012/01/a-great-new-look-for-wikispaces.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+wikispaces+%28Wikispaces+Blog%29

When I visit my own personal wiki, that new "theme" is now the default. I was able to change it back to an alternative theme and restore the page presentation.

Gulp, anybody wanna risk the wiki to a test? Anybody remember the old theme? --GJ

theKiwi 2012-01-10T15:30:43-08:00

Well somebody figured it out :-)

TomAlciere 2011-12-28T09:36:19-08:00

Indexing published GEDCOM files

I don't know how many researchers have found connecting lines through my site at GedcomIndex.com

One task is resolving the place names which are typed in a variety of different ways and often spelled wrong.

Consider that the user might write the place of baptism or marriage as GRACE BAPTIST CHURCH,MERRIMACK,NEW HAMPSHIRE or another might use CONCORD,MERRIMACK,NEW HAMPSHIRE. In the first instance, MERRIMACK is the town name; in the second, they mean the city of Concord in Merrimack County.

Since the idea is to connect families, political changes should not change the indexing of an event. When Kanawha County, Virginia became Kanawha County, West Virginia, mostly the same families were present, so your ancestor born in the Virginia county in 1859 might be related to the other ancestor born in the West Virginia county in 1866.

Andy_Hatchett 2011-12-28T09:48:59-08:00

Hi Tom, Welcome to the Wiki!

Place names can only be resolved when full information is available- and that can only be provided by the person doing data entry.

In your example the person should have entered "Grace Baptist Church, Merrimack, Hillsborough County, New Hampshire" and "Concord, Merrimack County, New Hampshire" respectively.

As to county changes. etc- imho, those fall completely outside the scope of BetterGEDCOM. If a vendor wants to provide a Place Name Authority it is up to them.

AdrianB38 2011-12-28T14:47:01-08:00

Andy re "As to county changes. etc- imho, those fall completely outside the scope of BetterGEDCOM"

Except that _if_ we create the concept of a location-entity in BG, we can record relationships that document changes of county and allow the software to follow the stuff through. Of course, having the format defined would allow vendors to provide said Authority tables in BG format. So BG can facilitate the Virginia/West Virginia changes, not to mention the UK relationships between historical, administrative, ceremonial counties....

TomAlciere 2011-12-28T16:20:19-08:00

Not just that BUT: New Hampshire used to be part of Massachusetts in the 1600's, and the researcher might type in DOVER, MASSACHUSETTS for the place, and now there is a Town of Dover in Norfolk County, Massachusetts. Jefferson, Maine was incorporated by Massachusetts prior to Maine statehood, so from 1807 to 1820 the event would be classified as JEFFERSON, MASSACHUSETTS and now there's a section of Holden, Massachusetts known as Jefferson, Massachusetts. The question is whether your McClanahan from Jefferson, Massachusetts might be related to the other person's McClanahan from Jefferson, Massachusetts and if it's the same place, then it's a much stronger possibility.

TomAlciere 2011-12-28T09:47:52-08:00

crossing language barriers

The advantage of GEDCOM is that it crosses language barriers. In the olden days folks would write their genealogy in sentences in their native language and snail-mail it across the ocean. GEDCOM eliminates the difficulty in interpreting the family relationship information in those sentences whilst leaving the difficulty of interpreting the notes and sources.

Whilst we discuss any changes we offer to the developers of English-language software it is critical to maintain compatibility with developers abroad. There is a German-language list you can learn about at
http://list.genealogy.net/mm/listinfo/gedcom-l

Developers should announce their respective innovations and also produce software compatible with the innovations of developers abroad.

gthorud 2011-12-28T16:31:14-08:00

Tom,

Thanks for the link. I have subscribed to the list and will have a look at the archive.

It is absolutely not our intention to discuss the requirements of English-speaking users only, although you may perhaps get that impression from time to time. But, we have to choose one language for the discussion.....

TomAlciere 2011-12-29T06:24:05-08:00

Ooops. Actually, they are devoted to better implementation of the latest GEDCOM 5.5.1 standard, not to developing a new, better standard.

In either situation, the classification of a place will facilitate the identification of connecting lines.

Some locations may be known precisely enough for good cross-matching, but still defy categorization. If the user's informant knows only that the ancestor was born in Texarkana, and does not know whether in Arkansas or Texas, the most precise location available is USA unless a special locality is added. If your McClanahan ancestors lived in Texarkana, Arkansas, they may well be related to mine who lived in Texarkana, Texas. It's actually more precise than just MILLER COUNTY, ARKANSAS.

Another opportunity for improvement would be allow standardized entry of an event in which a death occurred, especially since so many people have no connection whatever to the locality of death. LOCKERBIE, DUMFRIESSHIRE, SCOTLAND comes to mind. The name of a shipwreck, plane crash or battle, in standardized form, would allow identification of a date and place even if the battle started in one month and ended in another. If your McClanahan ancestor died in the same shipwreck as my McClanahan ancestor, they may be related, but if they died in two separate shipwrecks, even in the same spot, the chance of a connection is reduced.

On 30 June 1971, three cosmonauts died in space when a valve remained open after undocking, allowing the oxygen to escape and exposing them to a vacuum.

TomAlciere 2011-12-28T18:03:59-08:00

Welcome to the BetterGEDCOM wiki

Comments