Dealing with Duplicate Records
Here’s help for a problem that plagues us all.
Dealing with Duplicate Records
Published on October 17, 2013 by Anders Monsen
Image courtesy of Victor Habbick / FreeDigitalPhotos.net
A common issue in all databases is dealing with duplicate records. In FileMaker I tend to use two different methods to identify duplicate records. Each serve their own purpose. One method uses a self-join relationship with a calculation that marks the duplicate records, while the other method uses a script that sort and loops through records.
On the surface, both methods are fairly simple, such that when memorized will come to your aid time and time again. If you maintain a database template or a technique file, you can set up both methods and refer to these as needed.
Self Join
The first method, removing duplicates via a self-join, is drawn directly from FileMaker’s Help manual. It is a technique that has existed for several years, simple and elegant. This method requires two additional fields in your table (or three, if you do not yet have a field with your “unique criteria”). Once you determine which field makes your record unique (which may require a calculation; the example uses name + phone number), you create a self-join relationship. You do this by creating a new table occurrence. I usually call this “MyTableName__Self” (with MyTableName switched to the actual table with duplicate records, of course). Make the relationship key the UniqueField = UniqueField.
With the self join in place, create your two new fields, the Counter and the DuplicateFlag (or whatever you want to call them). Set aside the counter field for now. Make your DuplicateFlag field a calculation. This is a simple IF statement:
If( MyTable::Counter = MyTable__Self::Counter ; “Unique” ; “Duplicate” )
Make the Counter “Auto-Enter Serial”. Then, exit the Manage Database window, and in your layout select the Counter field. With all fields showing, add a serial key by clicking in the field and selecting Records>>Replace Field Contents from your menu. Select Replace with Serial Numbers. All fields should populate, and the DuplicateFlag immediately will update. You then can search for any duplicates. Records update live, so to find new duplicates just run a search.
Looping
The second method loops through the found set and checks each record against the previous record. There are more steps involved, but the benefit is that you control what you want to check. This is useful for reports or displaying certain data to a user by finding or omitting records as needed. This method also does not need any schema changes, like the self-join.
via Dealing with Duplicate Records.