Checking SE and LFC consistency

Dealing with inconsistency between the LFC and DPM

The biomed VO team has made available on GitHub some useful tools to
enforce consistency between Storage Elements (SE) and the Logical File
Catalog (LFC).

In the neugrid4you project we make a lot of large datasets available to
our user community. Some of them are very big (ADNI is something like
30 GiB), and we have to copy them to multiple SEs in the grid and
register them in our LFC.
We use these tools to ensure that there aren't any inconsistencies (i.e.
to clean up all the inconsistencies ;) ).

Retrieving inconsistencies

The diff-se-dump-lfc.sh script finds inconsistencies between a dump of
a DPM and a dump of the LFC, so both dumps have to be produced first.

Dumping the DPNS VO-specific folder
dump-se-files.py \
--url srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name> \
--output-file <dpm_server_name>-dpm-dump.txt
Dumping the LFC VO-specific folder
export LFC_HOST=<lfc_host>
LFCBrowseSE <dpm_server_name> --vo <vo_name> --sfn > <lfc_host>-<vo_name>-lfc-dump.txt
Searching for inconsistencies for files older than 1 month
diff-se-dump-lfc.sh --older-than 1 \
--se <dpm_server_name> \
--se-dump <dpm_server_name>-dpm-dump.txt \
--lfc-dump <lfc_host>-<vo_name>-lfc-dump.txt

The script will report zombie files, ghost entries and files found
both in the SE and the LFC.

Zombie files are files that exist on the SE but are no longer referenced
in the LFC.

Ghost entries are entries in the LFC that have no corresponding files on
the SE.
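
To make the two definitions concrete, here is a minimal sketch of the underlying set difference using the standard sort and comm tools. It is not how diff-se-dump-lfc.sh is implemented (it ignores file ages, for instance), and it assumes two hypothetical input files that each contain exactly one SFN per line in the same format. The function name classify_sfns and the sample SFNs are made up for illustration.

```shell
#!/bin/sh
# Illustration only: classify SFNs from two plain lists (one SFN per line).
# classify_sfns and its file arguments are hypothetical, not part of the biomed tools.
classify_sfns() {
    se_list="$1"; lfc_list="$2"
    tmp=$(mktemp -d)
    # comm needs sorted input; -u also drops duplicate lines.
    sort -u "$se_list"  > "$tmp/se"
    sort -u "$lfc_list" > "$tmp/lfc"
    # Lines only in the SE list: files on disk without a catalog entry.
    comm -23 "$tmp/se" "$tmp/lfc" | sed 's/^/zombie /'
    # Lines only in the LFC list: catalog entries without a file on disk.
    comm -13 "$tmp/se" "$tmp/lfc" | sed 's/^/ghost /'
    rm -rf "$tmp"
}

# Tiny example with made-up SFNs.
demo=$(mktemp -d)
printf '%s\n' srm://se/a srm://se/b > "$demo/se.txt"
printf '%s\n' srm://se/b srm://se/c > "$demo/lfc.txt"
classify_sfns "$demo/se.txt" "$demo/lfc.txt"
rm -rf "$demo"
```

In the example, srm://se/a is reported as a zombie and srm://se/c as a ghost, while srm://se/b appears in both lists and is not printed.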

Cleaning ghost entries

There is no automatic way of cleaning ghost entries: retrieving the
corresponding LFN requires direct access to the LFC database.

SELECT Cns_file_metadata.fileid,guid,name
FROM Cns_file_metadata
INNER JOIN Cns_file_replica
ON Cns_file_metadata.fileid=Cns_file_replica.fileid
WHERE sfn='srm://<dpm_server_name>:8446/dpm/<dpm_domain_name>/home/<vo_name>/generated/2014-02-10/file-121aa7e8-a9ec-4401-84f1-24341a74433c';
Script for retrieving the guid of a sfn
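#!/bin/sh
# Print the GUID (as guid:<value>) of the LFC entry that owns the given SFN.
# Needs direct access to the LFC MySQL database.
set -e
# Fail with a usage message when no SFN is given.
SFN="${1:?usage: $0 <sfn>}"
DB_NAME=cns_db
# Join file metadata with replicas to map the replica SFN back to its GUID.
QUERY="SELECT guid FROM Cns_file_metadata
INNER JOIN Cns_file_replica
ON Cns_file_metadata.fileid=Cns_file_replica.fileid
WHERE sfn='$SFN'"
GUID=$(mysql --batch --silent "$DB_NAME" -e "$QUERY")
echo "guid:$GUID"
exit 0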
Retrieving the guid of every ghost entry
for sfn in $(cat <dpm_server_name>.output_lfc_lost_files); do
./get-lfn-for-ghostfile.sh "$sfn"
done > guid-list.txt
Listing the lfn associated with each guid
for guid in $(cat guid-list.txt); do
lcg-la "$guid"
done
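
The lookup script prints a bare "guid:" line whenever the query returns no row, so it can be worth filtering guid-list.txt before handing it to lcg-la. The following is a minimal sketch of such a filter. The function name filter_guids is hypothetical, the UUID-like GUID format is an assumption that should be checked against your own catalog, and the sample GUID is made up.

```shell
#!/bin/sh
# Illustration only: keep lines that look like guid:<uuid>, drop everything else.
# The UUID-like GUID format is an assumption; adjust the pattern if yours differ.
filter_guids() {
    # "|| true" keeps the exit status at 0 when no line matches.
    grep -E '^guid:[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$' || true
}

# Made-up sample: an empty result followed by a well-formed one.
printf '%s\n' 'guid:' 'guid:121aa7e8-a9ec-4401-84f1-24341a74433c' | filter_guids
```

It reads from standard input, so it could be used as filter_guids < guid-list.txt > guid-list-clean.txt, where the output file name is only an example.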

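Once the list of LFNs has been retrieved, the wrong entries can be
removed. In the session below, lcg-lr lists the replicas of an LFN,
lcg-del --force removes the ghost replica, and lcg-rep -d creates a fresh
replica on the same SE.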

% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2013-09-20/file7fa6f030-f029-418c-a60c-5a8d04253a68
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69
% lcg-del --force srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2013-09-20/file7fa6f030-f029-418c-a60c-5a8d04253a68
% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69
% lcg-rep -d <dpm_server_name> lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
% lcg-lr lfn:/grid/<vo_name>/data/ADNI/IMAGES/128_S_4607/ADNI2/nG+ADNI2+128_S_4607+20121109+0847+S174741+3T0+T2ST+ORIG+V01.tar.bz2
srm://<dpm_server_name>/dpm/<dpm_domain_name>/home/<vo_name>/generated/2014-02-12/filecb922278-02c3-4642-b085-0f3695c9aaee
srm://<dpm2_server_name>/dpm/<dpm2_domain_name>/home/<vo_name>/generated/2013-09-20/file2e44af61-a0e0-4868-af30-d08d9e3a7a69
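
When many ghost entries have to be handled, it may help to generate the commands first and review them before running anything. The sketch below only prints the lcg-del and lcg-rep invocations used in the session above and does not execute them. The input format (one "<sfn> <lfn>" pair per line), the function name print_cleanup_commands and the example host names are assumptions made for illustration.

```shell
#!/bin/sh
# Illustration only: print, but do not run, the cleanup commands for each pair.
# Input on stdin: "<ghost sfn> <lfn>" per line (hypothetical format).
# $1: destination SE for the new replica.
print_cleanup_commands() {
    dest_se="$1"
    while read -r sfn lfn; do
        # Skip blank or incomplete lines.
        [ -n "$sfn" ] && [ -n "$lfn" ] || continue
        echo "lcg-del --force $sfn"
        echo "lcg-rep -d $dest_se $lfn"
    done
}

# Made-up example pair.
echo 'srm://se.example.org/dpm/example.org/home/vo/file1 lfn:/grid/vo/data/file1' |
    print_cleanup_commands se.example.org
```

Printing the commands instead of running them keeps the destructive step (lcg-del --force) under manual control; the reviewed output can then be saved to a file and executed once it looks right.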