Software or script to list and delete files based on header
Posted: July 19th, 2020, 20:27
Objective:
I need to scan and delete *.dat files that match defined criteria (or create a .txt list for later deletion) in a folder and all its subfolders
But there are a lot of files (more than 5,000,000)
Sample folder structure:
c:\DATA\001\file001.dat
c:\DATA\002\file005 this file name is long and have spaces.dat
c:\DATA\004\file003.dat
...
c:\DATA\500\file510.dat
...
c:\DATA\800\file910.dat
and so on ...
Criteria for files to be deleted
-- File name is *.dat
-- All *.dat files inside a given folder and all its subfolders that match the criteria below
-- *.dat file header must be (grep / hex notation) \x4F\x4C\x44\x44\x41\x54\x41
Note1: by header I mean at file offset 0x00 (the very start of the file)
So if the file does not have \x4F\x4C\x44\x44\x41\x54\x41 at offset 0x00 but has it anywhere else, it should also be deleted
Note2: scan needs to be done in grep / hexadecimal byte format, not text format
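To make the criteria above concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration, not something tested against 5,000,000 files. It assumes Python 3 is available; the function names and the `anywhere` flag are made up for this example. By default it compares only the first 7 bytes at offset 0x00; with `anywhere=True` it also matches the byte sequence elsewhere in the file (see Note1). It only writes a list and deletes nothing.

```python
import os

# Raw byte signature from the criteria (hex 4F 4C 44 44 41 54 41).
SIGNATURE = b"\x4F\x4C\x44\x44\x41\x54\x41"


def has_header(path, signature=SIGNATURE):
    """Return True if the file starts with the signature at offset 0x00."""
    with open(path, "rb") as f:  # binary mode: bytes are compared, not text
        return f.read(len(signature)) == signature


def contains_signature(path, signature=SIGNATURE, chunk_size=1 << 20):
    """Return True if the signature occurs at any offset in the file."""
    overlap = len(signature) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if signature in window:
                return True
            # Keep the last bytes so a match split across two chunks is found.
            tail = window[-overlap:] if overlap else b""


def find_matches(root, anywhere=False):
    """Yield paths of *.dat files under root that match the signature."""
    check = contains_signature if anywhere else has_header
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".dat"):
                continue
            path = os.path.join(dirpath, name)
            try:
                if check(path):
                    yield path
            except OSError:
                # Unreadable file (locked, permissions): skip it.
                continue


def write_list(root, list_path, anywhere=False):
    """Write one matching path per line to list_path; return the count."""
    count = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for path in find_matches(root, anywhere):
            out.write(path + "\n")
            count += 1
    return count
```

Something like `write_list(r"c:\DATA", r"c:\to_delete.txt")` would then produce the .txt list for later deletion (the output path is just a placeholder). Paths are handled as whole strings, so long file names with spaces should not need special quoting.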
- Preferably a solution that works on Windows, like a batch file or a PowerShell script
- Can make use of non-native Windows third-party utilities, e.g. Cygwin
If not possible on Windows, a Linux solution would also be welcome
(maybe something like grep piped with hexdump and rm?)
Any suggestions?
I uploaded a sample dataset to this post so one can run tests on files with and without the specified criteria
File names in the dataset are self-explanatory