I imported a file that produced many Duplicate/Superseded errors. How can I fix this?

Duplicates and Supersededs

If you imported an EDD and received a large number of Duplicate or Superseded errors, you should probably undo the import and learn how Enviro Data handles duplicate and superseded results before importing the file again.

To undo an import, open the Import Wizard form and click Undo A Previous Import. This displays a list of imported EDDs from which you can select the problem file. Before clicking the Undo Selected Import button, make sure that you have selected the correct file and that you have a copy of the file being deleted so you can re-import it later. Typically, all you need to do for the new import is select a different choice from the Duplicates & Supersededs radio buttons.

An explanation of how Duplicates and Supersededs are handled follows.

Enviro Data allows duplicated data at both the sample and analysis levels. The Duplicates and Supersededs section of the Import Wizard - Data Checking Options form provides several options for handling duplicated data. If the Duplicate and Superseded fields are not correctly populated in your import file, you MUST select one of the following options that use additional fields to determine duplicate samples.

Duplicates and Supersede Options

Options for Duplicates

Auto - This option looks for unique samples based on StationName, SampleDate_D (without time), SampleMatrixCode, SampleTop, DuplicateSample, SampleBottom, FilteredSampleCode and QCSampleCode or LabSampleID (if QCSampleCode is ‘Unknown’). If all of these are the same for the next sample, then the DuplicateSample field is incremented.

The recommended import format contains a QCSampleCode field. The import logic uses this field for ordering and assigning the DuplicateSample number. Enviro Data groups samples by the unique fields, orders the records based on the DuplicateOrder field in the QCCodes table, then increments the DuplicateSample number by comparing the QCSampleCode of each sample. These codes can be entered by the lab or by the data administrator before import. If the QCSampleCode is not known or is left blank, it defaults to “Unknown”. In that case, the import logic compares the LabSampleID numbers of the samples to set the DuplicateSample number.

Field ID - This option looks for unique samples based on StationName, SampleDate_D (with time), SampleMatrixCode, SampleTop, SampleBottom, DuplicateSample, FilteredSampleCode, QCSampleCode and FieldSampleID. If QCSampleCode is ‘Unknown’ then it just looks at FieldSampleID. If all of these are the same, the DuplicateSample field is incremented.

Alt ID - This option looks for unique samples based on StationName, SampleDate_D (with time), SampleMatrixCode, SampleTop, SampleBottom, DuplicateSample, FilteredSampleCode, QCSampleCode and AltSampleID. If QCSampleCode is ‘Unknown’ then it just looks at AltSampleID. If all of these are the same, the DuplicateSample field is incremented.
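One practical difference between these options is whether the time of day takes part in the comparison. The following Python sketch illustrates that difference only; it is not Enviro Data's actual code. The helper name comparison_key and the sample values (MW-1, W, T) are hypothetical, and DuplicateSample, QCSampleCode and the ID fields are left out for brevity.

```python
from datetime import datetime

# Fields compared by all three options; DuplicateSample, QCSampleCode and the
# ID fields are intentionally omitted from this simplified sketch.
SHARED_FIELDS = ("StationName", "SampleMatrixCode", "SampleTop",
                 "SampleBottom", "FilteredSampleCode")


def comparison_key(sample, option):
    """Return the values compared for one sample (hypothetical helper)."""
    when = sample["SampleDate_D"]
    if option == "Auto":
        when = when.date()  # Auto compares the date without the time
    elif option not in ("Field ID", "Alt ID"):
        raise ValueError(f"unsupported option: {option}")
    return (when,) + tuple(sample[f] for f in SHARED_FIELDS)


# Two made-up samples from the same station on the same day.
morning = {"StationName": "MW-1", "SampleDate_D": datetime(2020, 5, 4, 8, 0),
           "SampleMatrixCode": "W", "SampleTop": 0, "SampleBottom": 0,
           "FilteredSampleCode": "T"}
afternoon = dict(morning, SampleDate_D=datetime(2020, 5, 4, 14, 0))

same_under_auto = comparison_key(morning, "Auto") == comparison_key(afternoon, "Auto")
same_under_field_id = comparison_key(morning, "Field ID") == comparison_key(afternoon, "Field ID")
```

In this sketch the two samples match under Auto but not under Field ID, which is why clients who collect several samples per day at one station may see more duplicates with Auto.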

For these import options, correctly assigning QCSampleCode is crucial, because this field is used to distinguish original data from QC data.

Clients who are dealing with large amounts of QC data and multiple samples per day may want to consider removing the Duplicate = 0 and Superseded = 0 defaults on the SELECT DATA screen. If you remove these defaults, be aware that any new program update will restore them. Call Geotech for more information about changing this program feature.

If you are importing data with QC sample information, you must populate the DuplicateOrder field in the QCCodes lookup table before importing. The Import Wizard uses this field to set your duplicate values based on the duplicate order. For example, if the duplicate order for your QC codes were Original = 1, Duplicate = 2, and Split = 3, then for samples with those codes you would end up with duplicate sample numbers of 0, 1, 2 respectively.
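The numbering in the example above can be sketched in Python as follows. This is a minimal illustration of the described behavior, not the Import Wizard's actual code: the DUPLICATE_ORDER dictionary stands in for the DuplicateOrder field of the QCCodes table, and the "group" value is a hypothetical placeholder for the set of fields that identify matching samples.

```python
from collections import defaultdict

# Stand-in for the DuplicateOrder field of the QCCodes lookup table,
# using the values from the example above.
DUPLICATE_ORDER = {"Original": 1, "Duplicate": 2, "Split": 3}


def number_duplicates(samples):
    """Set DuplicateSample on each sample dict in place and return the list."""
    groups = defaultdict(list)
    for sample in samples:
        # "group" is a hypothetical placeholder for the matching fields.
        groups[sample["group"]].append(sample)
    for members in groups.values():
        # Order each group by the duplicate order of its QC code ...
        members.sort(key=lambda s: DUPLICATE_ORDER[s["QCSampleCode"]])
        # ... then number the samples 0, 1, 2, ... in that order.
        for number, sample in enumerate(members):
            sample["DuplicateSample"] = number
    return samples


samples = [
    {"group": "A", "QCSampleCode": "Split"},
    {"group": "A", "QCSampleCode": "Original"},
    {"group": "A", "QCSampleCode": "Duplicate"},
]
number_duplicates(samples)
```

Regardless of the order in which the records appear in the file, the Original sample receives 0, the Duplicate receives 1, and the Split receives 2.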

EDD - This option uses the duplicate and superseded values as they are contained in the import file. Use this option if whoever provides you with the data sets the values correctly in the EDD.

Lab ID - This option uses the laboratory identification number from the import file to determine duplicate and superseded values.

Set Initial Dup/Sup to Zero - This option sets both the Duplicate and Superseded fields to zero prior to import.

Options for Superseded Analyses

Superseded values are set using the unique key of SampleNumber, ParameterNumber, Superseded, AnalyticMethod, LeachMethodCode, ReportUnitsCode, Basis, FilteredAnalCode, QCAnalysisCode and LabSampleID. Unless the EDD option is checked, when a new analysis matching this unique key is imported, the Superseded field for the original analyses is incremented.
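The effect on the Superseded field can be sketched in Python as follows. This is one reading of the rule above, under stated assumptions, and not Enviro Data's actual code: the function import_analysis is hypothetical, the newly imported analysis is assumed to receive Superseded = 0, and every earlier analysis that matches on the remaining key fields is assumed to be incremented by one.

```python
# Key fields from the description above, with Superseded itself left out
# because it is the value being adjusted.
KEY_FIELDS = ("SampleNumber", "ParameterNumber", "AnalyticMethod",
              "LeachMethodCode", "ReportUnitsCode", "Basis",
              "FilteredAnalCode", "QCAnalysisCode", "LabSampleID")


def import_analysis(existing, new):
    """Add one analysis to the list, bumping Superseded on earlier matches."""
    key = tuple(new[f] for f in KEY_FIELDS)
    for row in existing:
        if tuple(row[f] for f in KEY_FIELDS) == key:
            row["Superseded"] += 1  # earlier result is pushed back
    # Assumption: the newest result becomes the current one (Superseded = 0).
    existing.append({**new, "Superseded": 0})
    return existing


# Made-up analysis imported three times with different values.
base = {f: 1 for f in KEY_FIELDS}
analyses = []
for value in (5.0, 5.2, 5.1):
    import_analysis(analyses, {**base, "Value": value})

superseded_by_value = {row["Value"]: row["Superseded"] for row in analyses}
```

Under these assumptions the most recent result always has Superseded = 0, which matches the Superseded = 0 default on the SELECT DATA screen.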
