Checking SE and LFC consistency

Dealing with inconsistency between the LFC and DPM

The biomed VO team has published on GitHub some useful tools for enforcing consistency between Storage Elements (SE) and the Logical File Catalog (LFC).

In the neugrid4you project we make many large datasets available to our user community. Some of them are very big (ADNI is something like 30 GiB), and we have to copy them to multiple SEs on the grid and register them in our LFC. We use these tools to ensure that there isn't any inconsistency (i.e. to clean up all the inconsistencies ;) ).

Retrieving inconsistencies

The diff-se-dump-lfc.sh script finds inconsistencies between a dump of a DPM and a dump of the LFC. The two dumps are produced first, with dump-se-files.py for the SE and LFCBrowseSE for the LFC:

dump-se-files.py \
  --url srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name> \
  --output-file <dpm_server_name>-dpm-dump.txt
export LFC_HOST=<lfc_host>
LFCBrowseSE <dpm_server_name> --vo <vo_name> --sfn > <lfc_host>-<vo_name>-lfc-dump.txt
diff-se-dump-lfc.sh --older-than 1 \
  --se <dpm_server_name> \
  --se-dump <dpm_server_name>-dpm-dump.txt \
  --lfc-dump <lfc_host>-<vo_name>-lfc-dump.txt

The script will report zombie files, ghost entries and files found both in the SE and the LFC.

Zombie files are files that exist on the SE but are no longer referenced in the LFC.

Ghost entries are entries in the LFC that have no corresponding files on the SE.
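To make the three categories concrete, here is a minimal sketch that classifies entries with comm(1). It is not how diff-se-dump-lfc.sh is implemented: the function name classify_dumps is hypothetical, and it assumes both dumps have already been reduced to one SFN per line in the same form, which the real dump files may not be.

```shell
#!/bin/sh
# Toy illustration only (hypothetical helper, not part of the biomed tools).
# classify_dumps SE_DUMP LFC_DUMP
# Assumes each file holds one SFN per line, written identically in both.
classify_dumps() {
  se_sorted=$(mktemp)
  lfc_sorted=$(mktemp)
  # comm needs sorted input; force the C locale so sort and comm agree.
  LC_ALL=C sort -u "$1" > "$se_sorted"
  LC_ALL=C sort -u "$2" > "$lfc_sorted"
  LC_ALL=C comm -23 "$se_sorted" "$lfc_sorted" | sed 's/^/zombie /'  # only on the SE
  LC_ALL=C comm -13 "$se_sorted" "$lfc_sorted" | sed 's/^/ghost /'   # only in the LFC
  LC_ALL=C comm -12 "$se_sorted" "$lfc_sorted" | sed 's/^/ok /'      # in both
  rm -f "$se_sorted" "$lfc_sorted"
}
```

Calling classify_dumps se.txt lfc.txt prints one line per SFN, prefixed with zombie, ghost or ok.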

Cleaning ghost entries

There is no automatic way of cleaning ghost entries. To retrieve the LFN corresponding to a ghost entry, the LFC database has to be queried directly:

SELECT Cns_file_metadata.fileid,guid,name
  FROM Cns_file_metadata
  INNER JOIN Cns_file_replica
  ON Cns_file_metadata.fileid=Cns_file_replica.fileid
  WHERE sfn='srm://<dpm_server_name>:8446/dpm/<dpm_domain_name>/home/<vo_name>/generated/2014-02-10/file-121aa7e8-a9ec-4401-84f1-24341a74433c';

Script for retrieving the GUID of an SFN (called get-lfn-for-ghostfile.sh in the loop further down):

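#!/bin/sh
# Print the GUID registered in the LFC database for a given SFN.

set -e

if [ $# -ne 1 ]; then
  echo "usage: $0 <sfn>" >&2
  exit 1
fi

SFN="$1"
DB_NAME=cns_db
# The SFN is interpolated into the query as-is, so only pass trusted input.
QUERY="SELECT guid FROM Cns_file_metadata
INNER JOIN Cns_file_replica
ON Cns_file_metadata.fileid=Cns_file_replica.fileid
WHERE sfn='$SFN'"

GUID=$(mysql --batch --silent "$DB_NAME" -e "$QUERY")

echo "guid:$GUID"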

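exit 0

The script is then run on every SFN listed in <dpm_server_name>.output_lfc_lost_files, and lcg-la lists the LFNs registered for each GUID:

for sfn in $(cat <dpm_server_name>.output_lfc_lost_files); do
  ./get-lfn-for-ghostfile.sh "$sfn"
done > guid-list.txt
for guid in $(cat guid-list.txt); do
  lcg-la "$guid"
done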
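When an SFN has no matching row in the database, the script prints a bare guid: line with nothing after the colon. A small filter can drop those lines before they reach lcg-la. This is only a sketch; the function name filter_guids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "guid:<value>" lines on stdin and keep only
# those that actually carry a value after the colon.
filter_guids() {
  # grep exits with status 1 when nothing matches; "|| true" keeps that
  # from aborting a caller that runs under "set -e".
  grep -E '^guid:.+$' || true
}
```

For example, filter_guids < guid-list.txt > guid-list-clean.txt, after which the second loop can read the cleaned list.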

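Once the list of LFNs has been retrieved, the wrong entries can be removed. In the session below, lcg-lr lists the replicas of an LFN, lcg-del --force removes the ghost replica, and lcg-rep -d creates a fresh replica on the same SE.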

% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2013-09-20/file7fa6f030-f029-418c-a60c-5a8d04253a68
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69

% lcg-del --force srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2013-09-20/file7fa6f030-f029-418c-a60c-5a8d04253a68

% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69

% lcg-rep -d <dpm_server_name> lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2

% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2014-02-12/filecb922278-02c3-4642-b085-0f3695c9aaee
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69
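The final lcg-lr check can be scripted when many files have to be repaired. The sketch below reads SURLs on stdin, one per line as lcg-lr prints them, and succeeds if one of them is hosted on the given SE. The function name has_replica_on is hypothetical, and it assumes every SURL starts with srm://.

```shell
#!/bin/sh
# Hypothetical helper: has_replica_on HOST
# Reads SURLs on stdin and returns 0 if one of them lives on HOST.
has_replica_on() {
  # Drop the scheme, then everything from the first ':' or '/',
  # which leaves only the host name; compare it as a whole fixed string.
  sed -e 's|^srm://||' -e 's|[:/].*$||' | grep -Fxq -- "$1"
}
```

Used as lcg-lr <lfn> | has_replica_on <dpm_server_name>, the exit status tells whether the replica was recreated on that SE.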

grid


2014-02-11 00:00 +0000