GEDCOM 101: EVERYTHING YOU WANTED TO KNOW ABOUT GEDCOM
by Lance J. Jacob

Welcome to GCU, Genealogical Computing University. I'm Lance Jacob, your instructor for this course. If you haven't signed up for GEDCOM 101, you are in the wrong class and are welcome to leave at this time.

This class is one of a three-part series to be taught over the next three quarters. We at this non-accredited, fictitious institute hope that you will go on to finish the complete series of classes. Let's begin.

GEDCOM

GEDCOM stands for GEnealogical Data COMmunication. It is a standard way of formatting genealogical information so that it can be exchanged between different genealogy programs. Without it, these programs would not be able to understand each other's information. Some of you may be familiar with the comparison to foreign languages that I have used in the past. If so, bear with me.

Suppose three individuals had just met and could not understand each other. The first individual could only understand English, the second French, and the third German. In order for our three friends to understand each other, they have three options.

The first option is for two of the individuals to learn the language of the third. For example, the Englishman could learn German, and the Frenchman could also learn German. This may seem like an easy solution--until the two individuals discover that there are no equivalent words in the third language to translate some of their specific words. I am not aware of any language that contains an equivalent word or phrase for every word and phrase that all other languages have to offer.

The second option is for each individual to learn the languages of both of the other two. In other words, each of the three would learn two new languages. This may solve more of the communication problems by allowing them all to speak an agreed-upon language at any one time, but it can be cumbersome as well.

The third option is for all three to learn and communicate in a new secondary language developed specifically to allow translation of any word or phrase to and from each of the three languages. Now suppose the three meet a fourth friend who speaks only Italian. The Italian would only need to learn the new language to communicate with the other three. If he uses a common Italian word or phrase that has no equivalent in the new language, they can extend the new language by making up an equivalent word or phrase. An example of this took place in the gold mines of South Africa. Over many decades, a language called Fanagalo (pronounced fun-a-ga-lo) developed in the mines to allow workers from many language backgrounds to communicate. The workers spoke English, Afrikaans, various other European languages, and many Black tribal languages (Zulu, Xhosa, etc.). A new worker arriving at the mines only had to learn Fanagalo to speak with everyone else. It was much better than learning several different languages.

GEDCOM is like the third communication option. Many genealogical software packages have their own individual ways of storing information. One may store dates and places only for births, marriages, and deaths. Another program may store the same information but allow place names to be twenty characters longer than the first program does. Still another program may allow the storage of christening, burial, occupation, and emigration information, etc. It always seems that you and your newly discovered second cousin are using different programs. For the two of you to share information by computer, you need a way to convert the information created by one program into information that the other program can understand.

Instead of writing a separate conversion utility for every other program, each genealogy program developer needs only to write utilities that convert its data to and from one standard data format known as GEDCOM. For ten different programs to exchange information, each one needs only two utilities: one to create GEDCOM files from its normal data files, and one to import information from a GEDCOM file.
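The arithmetic shows why this matters. The short Python sketch below assumes that each conversion utility works in only one direction. Under that assumption, connecting n programs directly takes n * (n - 1) utilities, while routing everything through GEDCOM takes two per program. For ten programs, that comes to 90 utilities versus 20.

```python
# Count the one-way conversion utilities needed so that n programs can all
# exchange data, with and without a shared format such as GEDCOM.


def direct_converters(n: int) -> int:
    """Each program needs its own converter to each of the other n - 1 programs."""
    return n * (n - 1)


def gedcom_converters(n: int) -> int:
    """Each program needs one exporter to GEDCOM and one importer from GEDCOM."""
    return 2 * n


for n in (3, 10, 50):
    print(f"{n} programs: {direct_converters(n)} direct vs {gedcom_converters(n)} via GEDCOM")
```

With three programs, the two approaches break even. Every program added after that widens the gap.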

A program can only import the kinds of information from a GEDCOM file that it is programmed to store. Most kinds of information that genealogy programs use can be stored in a GEDCOM file. However, if your program has no place to put a specific kind of information, it will not be able to use it, even if it's in the GEDCOM file. For example, suppose your uncle sends you a GEDCOM file that includes occupations for the different individuals, and your program doesn't store occupations. In that case, your program can't use the occupation information. You would need to do without it or switch to a program that stores occupations.
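Here is a rough sketch of what happens inside an import utility when it meets information it has no place for. The program, its set of supported tags, and the sample lines are all hypothetical. OCCU is the standard GEDCOM tag for an occupation. In this sketch it is deliberately left out of the supported set, so the occupation line and everything subordinate to it are simply dropped.

```python
# Hypothetical set of tags that our imaginary program knows how to store.
# OCCU (occupation) is deliberately missing.
SUPPORTED_TAGS = {"NAME", "SEX", "BIRT", "DEAT", "DATE", "PLAC"}


def import_lines(lines, supported=SUPPORTED_TAGS):
    """Split the lines of one record into (kept, skipped).

    Expects only the lines below a record's level 0 line. When a tag is
    not supported, that line and every line subordinate to it are skipped.
    """
    kept, skipped = [], []
    skip_level = None  # level of the unsupported line currently being skipped
    for line in lines:
        level_text, tag = line.split()[:2]
        level = int(level_text)
        if skip_level is not None and level > skip_level:
            skipped.append(line)  # belongs to an unsupported parent line
            continue
        skip_level = None
        if tag in supported:
            kept.append(line)
        else:
            skipped.append(line)
            skip_level = level
    return kept, skipped


record = [
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 21 Mar 1953",
    "1 OCCU Carpenter",
    "2 DATE 1975",
]
kept, skipped = import_lines(record)
print("imported:", kept)
print("ignored: ", skipped)
```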

History

The GEDCOM standard has been developed over time by the Family History Department of The Church of Jesus Christ of Latter-day Saints with input from developers of other genealogy programs. An informal meeting of several developers was held during the 1985 annual National Genealogical Society (NGS) Conference of the States in Salt Lake City. During this meeting the concept of GEDCOM was discussed. It was received enthusiastically by those in attendance. Since then, similar meetings have been held at subsequent annual NGS conferences. In addition to meeting in person on an annual basis, discussions have evolved on public information services such as CompuServe and, more recently, the Internet. Such communication has allowed developers to help each other understand GEDCOM.

GEDCOM has not been adopted by all genealogy program developers. The Family History Department maintains a list of registered programs that are GEDCOM-compatible. This list is the best source for determining which programs are compatible with GEDCOM. A copy of the list can be requested by contacting the FamilySearch Support Unit at (801) 240-2584. In addition, the annual Software Directory published in GC lists programs that have and have not been adapted to use GEDCOM.

The first genealogy program to incorporate an experimental version of a GEDCOM utility program was version 2.0 of Personal Ancestral File, which was released in April 1986. Since that time, the standard has been greatly expanded to fit the needs of more genealogy programs. Today's version (5.3) is compatible with the original version.

Basics

GEDCOM data is stored in what is referred to as an "ANSEL" text file. An ANSEL text file is like an ASCII text file, but it can include international diacritical marks and special characters. In other words, it can include non-English words containing letters such as ä, which a regular ASCII text file cannot represent. A GEDCOM file can be viewed with a regular text editor. However, I don't recommend doing this until you know what you are doing.
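To make the difference concrete, here is a minimal Python sketch of how a letter like ä might be written in ANSEL. It rests on two assumptions that you should check against the ANSEL standard before relying on them. The first is that ANSEL stores a diacritic as a separate byte placed before the letter it modifies. The second is the specific byte values in the small table below, which covers only a few marks. A plain ASCII encoder, by contrast, simply refuses such letters: in Python, "ä".encode("ascii") raises an error.

```python
import unicodedata

# Partial table of Unicode combining marks and the ANSEL byte for each,
# as I understand the ANSEL (ANSI Z39.47) code chart. Only a handful of
# marks are listed; check the standard before relying on these values.
ANSEL_COMBINING = {
    "\u0300": 0xE1,  # grave accent
    "\u0301": 0xE2,  # acute accent
    "\u0302": 0xE3,  # circumflex
    "\u0303": 0xE4,  # tilde
    "\u0308": 0xE8,  # umlaut (diaeresis)
}


def to_ansel(text: str) -> bytes:
    """Encode text whose base letters are plain ASCII, writing each
    diacritic byte before the letter it modifies."""
    decomposed = unicodedata.normalize("NFD", text)  # "ä" -> "a" + U+0308
    out = bytearray()
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        marks = []
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        for mark in marks:
            if mark not in ANSEL_COMBINING:
                raise ValueError(f"U+{ord(mark):04X} is not in this partial table")
            out.append(ANSEL_COMBINING[mark])
        out += base.encode("ascii")  # raises for letters like "ß" that need their own code
    return bytes(out)


print(to_ansel("Müller"))  # the umlaut byte comes before the "u"
```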

If you were to look at the contents of a GEDCOM file, you would notice that it is nothing but a long series of lines of information listed in what I would call a "hierarchical order."

The first record is the header record, which usually provides information on the name of the file, the source of the file, and the destination of the file. The second record in the illustration is an optional submitter record that includes information regarding who created the GEDCOM file. Following the submitter record are three individual records containing information about three separate individuals. Next is a family record that gives details pertaining to which individuals are in the family, and the husband and wife's marriage date and place. An essential component is the very last line, which is known as the "trailer" record. It indicates the end of the file.
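The same layout can be checked mechanically. The sketch below uses a made-up file, not the illustration referred to above. It groups the lines into records wherever a line begins with 0, then confirms that the header comes first and the trailer last. The values for SOUR, DEST, and FILE are placeholders, and the check is a simplification of what a real GEDCOM reader would verify.

```python
# A made-up, minimal GEDCOM file used only for illustration.
SAMPLE = """0 HEAD
1 SOUR EXAMPLE_PROGRAM
1 DEST OTHER_PROGRAM
1 FILE smith.ged
0 @I1@ INDI
1 NAME John /Smith/
0 TRLR"""


def split_records(text):
    """Group lines into records: each record starts at a level 0 line."""
    records = []
    for line in text.splitlines():
        if line.startswith("0 "):
            records.append([line])  # a new record begins
        elif records:
            records[-1].append(line)  # subordinate line of the current record
    return records


def check_layout(records):
    """Return a list of problems with the overall record layout."""
    problems = []
    if not records or records[0][0] != "0 HEAD":
        problems.append("first record is not the header (0 HEAD)")
    if not records or records[-1][0] != "0 TRLR":
        problems.append("last record is not the trailer (0 TRLR)")
    return problems


records = split_records(SAMPLE)
for record in records:
    print(record[0])
print(check_layout(records) or "layout looks fine")
```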

Each line in a GEDCOM file begins with a level number. Lines that begin with 0 are the start of new records. Immediately following the level number on each line is a short word known as a GEDCOM tag, which indicates what kind of data follows. Level 0 lines are the only exception: if a record number is associated with the record, it appears between the level number and the tag on that line.
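A single line can be taken apart by following exactly these rules. This Python sketch assumes that fields are separated by single spaces. It does not attempt to handle every detail of the official GEDCOM grammar.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]  # record number such as "@I1@" (normally on level 0 lines)
    tag: str
    value: str


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level number, record number, tag, and value."""
    level_text, _, rest = line.strip().partition(" ")
    xref = None
    if rest.startswith("@"):
        # e.g. "0 @I1@ INDI": the record number comes before the tag
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return GedcomLine(int(level_text), xref, tag, value)


for text in ("0 @I1@ INDI", "1 NAME John /Smith/", "2 DATE 21 Mar 1953", "0 TRLR"):
    print(parse_line(text))
```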

A line of information corresponds to the closest line above it with the next smaller level number. For example, if a line begins with a level number of 4, it pertains to the closest line above it that begins with a 3. To illustrate this, suppose that you saw the following in a GEDCOM file:

  1 BIRT
  2 DATE 21 Mar 1953
  2 PLAC Salt Lake City, Salt Lake, Utah
  3 SOUR Personal knowledge

Both the DATE and PLAC tags are prefaced with a level number of 2. They both refer to the line that begins with the tag BIRT (GEDCOM's four-letter tag for a birth), because its level number is 1 and it is the closest level 1 line above them. You've guessed it! The date and place refer to the birth. The level 3 source (SOUR) line refers to the closest level 2 line above it, the PLAC line, so it documents the place of birth.
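The rule that a line belongs to the closest line above it with a smaller level number is easy to follow with a stack. This sketch nests the four lines above into a tree, using GEDCOM's four-letter birth tag, BIRT. It is an illustration, not a complete parser, and it assumes that no record numbers appear on the lines.

```python
def build_tree(lines):
    """Nest GEDCOM lines by level number.

    Each line becomes a child of the closest line above it that has a
    smaller level number. Assumes lines without record numbers.
    """
    root = {"tag": None, "value": None, "children": []}
    stack = [(-1, root)]  # (level, node) pairs along the current branch
    for line in lines:
        level_text, _, rest = line.strip().partition(" ")
        tag, _, value = rest.partition(" ")
        level = int(level_text)
        node = {"tag": tag, "value": value, "children": []}
        while stack[-1][0] >= level:
            stack.pop()  # leave branches at the same or a deeper level
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def show(nodes, indent=0):
    """Print the tree with two spaces of indentation per level."""
    for node in nodes:
        print("  " * indent + f"{node['tag']} {node['value']}".rstrip())
        show(node["children"], indent + 1)


birth = [
    "1 BIRT",
    "2 DATE 21 Mar 1953",
    "2 PLAC Salt Lake City, Salt Lake, Utah",
    "3 SOUR Personal knowledge",
]
tree = build_tree(birth)
show(tree)
```

The printout places DATE and PLAC under BIRT, and SOUR under PLAC, which matches the reading given above.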

Another critical part of a GEDCOM file is the record number. A record number is always enclosed between two "@" signs. If a record number identifies the current record, it appears on the level 0 line immediately following the level number, as mentioned above.

If a record number refers, or "points," to another record, it follows a GEDCOM tag. The first individual record has a number of I1, which appears as "@I1@." The last line of that record reads "1 FAMS @F1@." The FAMS tag means "Family as a spouse." The line is interpreted to mean that the family that includes this individual as a spouse has a number of F1. Look at the family record. Its number is F1. The next line reads "1 HUSB @I1@." I'm sure you've caught on. It means that the husband (HUSB) of the family is the individual with the record number I1.

Look at the third individual, who has a record number of I3. The last line of the record reads "1 FAMC @F1@." The tag FAMC means "Family as a child." Your job is to interpret the whole line.

Using the pointer record numbers, you can read a printout of a GEDCOM file and reconstruct family names, dates, and places.
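Following pointers is exactly what a program does when it rebuilds a family. In this sketch, the file contents are invented for illustration and are not the course illustration. The code stores each record under its record number, so a pointer such as @I1@ can be looked up directly. It assumes that the family record has both a husband and a wife.

```python
# A tiny made-up file: a husband, a wife, and one child.
SAMPLE = """0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Robert /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@"""


def index_records(text):
    """Map each record number (e.g. "@I1@") to its (tag, value) lines."""
    records = {}
    current = None
    for line in text.splitlines():
        level_text, _, rest = line.strip().partition(" ")
        if level_text == "0":
            xref = rest.partition(" ")[0]
            # Only records that have a record number can be pointed to.
            current = records.setdefault(xref, []) if xref.startswith("@") else None
        elif current is not None:
            tag, _, value = rest.partition(" ")
            current.append((tag, value))
    return records


def first(record, tag):
    """Value of the first line in a record with the given tag, or None."""
    return next((value for t, value in record if t == tag), None)


def describe_family(records, family):
    """Follow a family's HUSB, WIFE, and CHIL pointers to names."""
    fam = records[family]
    husband = first(records[first(fam, "HUSB")], "NAME")
    wife = first(records[first(fam, "WIFE")], "NAME")
    children = [first(records[xref], "NAME") for tag, xref in fam if tag == "CHIL"]
    return husband, wife, children


records = index_records(SAMPLE)
print(describe_family(records, "@F1@"))
print(first(records["@I3@"], "FAMC"))  # the child's pointer back to the family
```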

Many people who are beginning to use their GEDCOM utility programs wonder whether the information they put in the GEDCOM file is really there. There are a few ways to tell. The safest way is to use your GEDCOM utility program to import a GEDCOM file you created into a new, empty file. You can then use your main genealogy program to inspect the contents of the new file(s) or database.

It is important to import into a new file instead of adding the contents of the GEDCOM file onto the end of your normal data file(s). If you add them to your existing files, you will end up with two of every individual.
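One more quick check can help; it is my own suggestion and not part of the procedure above. Count the records of each type in the GEDCOM file, then compare the INDI (individual) and FAM (family) totals with the numbers your program reports after importing into the new file. If the program reports twice as many individuals, the data probably went into a file that already held it. The sample text below is made up.

```python
from collections import Counter


def count_record_types(text):
    """Count level 0 records by tag (INDI, FAM, ...)."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "0":
            # The tag follows the record number when one is present.
            has_xref = parts[1].startswith("@") and len(parts) >= 3
            counts[parts[2] if has_xref else parts[1]] += 1
    return counts


sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
0 @I2@ INDI
1 NAME Mary /Jones/
0 @F1@ FAM
0 TRLR"""
print(count_record_types(sample))
```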

Another way to inspect a GEDCOM file is to use a text editor. You need to be cautious, however. If you want to use a word processor, you need to know the difference between a word processing file and a text file. A word processing file includes special control characters that tell the word processor to do things such as print in bold letters using a specific font. Text files do not include such control characters. Control characters will cause havoc with a GEDCOM file. Putting them in a GEDCOM file is like putting water in the gas tank of your car.

If you use a word processor, you will want one that has a special process for retrieving and saving text files. For example, WordPerfect has a "Text In" option ([Ctrl] + [F5]) that can retrieve a GEDCOM file to view on your monitor. When you are finished, exit without saving. If you do a normal retrieve ([Shift] + [F10]), WordPerfect will insert control characters into the GEDCOM file as it retrieves it. And if you accidentally save it when you exit, the GEDCOM file will "have water in its gas tank."
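If you suspect that a file has been damaged this way, a short script can look for stray control characters without opening the file in a word processor at all. This Python sketch assumes that carriage return, line feed, and tab are the only control characters that belong in the file. It does not check bytes 0x80 and above, because ANSEL uses those for diacritics, so some kinds of word processor damage can slip past it. The file name in the usage comment is hypothetical.

```python
def find_control_characters(data: bytes, allowed: bytes = b"\r\n\t"):
    """Return (line_number, byte_value) pairs for unexpected control bytes.

    Bytes below 0x20 and the DEL byte (0x7F) are flagged unless listed in
    `allowed`. Bytes 0x80 and above are not checked.
    """
    problems = []
    for line_no, raw in enumerate(data.split(b"\n"), start=1):
        for byte in raw:
            if (byte < 0x20 or byte == 0x7F) and byte not in allowed:
                problems.append((line_no, byte))
    return problems


# Usage with a hypothetical file name:
#   with open("family.ged", "rb") as f:
#       print(find_control_characters(f.read()))
damaged = b"0 HEAD\r\n1 SOUR EXAMPLE\x1b\r\n0 TRLR\r\n"
print(find_control_characters(damaged))  # prints [(2, 27)]
```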

In Conclusion

Use GEDCOM if you need to exchange information with someone who uses a different program, or if you want to import information into your program from the popular FamilySearch®.

 
 
Copyright The Chase Data Group 1998-2004. Last updated: January 25, 2004.