All times are UTC - 5 hours [ DST ]




 Post subject: Software or script to list and delete based on header
PostPosted: July 19th, 2020, 20:27 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
Objective:
I need to scan for and delete *.dat files that match defined criteria (or create a .txt list for later deletion) in a folder and all its subfolders.

But it's a lot of files (more than 5,000,000)

Sample folder structure:
c:\DATA\001\file001.dat
c:\DATA\002\file005 this file name is long and have spaces.dat
c:\DATA\004\file003.dat
...
c:\DATA\500\file510.dat
...
c:\DATA\800\file910.dat

and so on ...

Criteria for files to be deleted
-- File name is *.dat
-- All *.dat files inside a given folder and all its subfolders that match the criteria below
-- *.dat file header must be (grep / hex notation) \x4F\x4C\x44\x44\x41\x54\x41

Note1: by header I mean at file offset 0x00 (the very start of the file)
So if the file does not have \x4F\x4C\x44\x44\x41\x54\x41 at offset 0x00 but has it anywhere else, it should also be deleted

Note2: scan needs to be done in grep / hexadecimal byte format, not text format

- Preferably a solution that works on Windows, like a batch file or a PowerShell script
- It can make use of non-native third-party utilities for Windows, e.g. Cygwin

If that is not possible on Windows, a Linux solution would also be welcome
(maybe something like grep piped with hexdump and rm?)

Any suggestions?

I have uploaded a sample dataset to this post so that anyone can test against files with and without the specified criteria.
The file names in the dataset are self-explanatory.


Attachments:
File comment: Sample dataset for testing a working solution
dataset for testing.zip [3.89 KiB]
Downloaded 139 times
 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 20th, 2020, 0:13 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
I could write a program to do it, but it would require at least 10 ms of access time for each file. For 5,000,000 files, that's 50,000 seconds in total, which is about 14 hours just to read all the files. Then there's the computation time and file deletion time on top of that.
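
The estimate can be checked with a quick calculation. This is only a sketch: the 10 ms access time is the figure assumed in this post, and the file count is the one given in the original question.

```python
# Back-of-the-envelope check of the read-time estimate.
FILE_COUNT = 5_000_000      # "more than 5,000,000" files, per the question
ACCESS_TIME_S = 0.010       # assumed access time of 10 ms per file

total_seconds = FILE_COUNT * ACCESS_TIME_S
total_hours = total_seconds / 3600

print(f"{total_seconds:.0f} s is roughly {total_hours:.1f} h")
```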

In fact I have already written a program which I could adapt for your purposes, but I would hope that someone may have a readymade solution:
http://www.hddoracle.com/viewtopic.php?f=22&t=2918

_________________
A backup a day keeps DR away.


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 20th, 2020, 2:06 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
Hi, and thank you for your response and for sharing that link.
I will try both programs posted there.

14 hours would not be that big of a problem, but of course it would be great if the time could be reduced.

Do you think your software could be altered to check for 2 hex sequences in each file at the same time, using an inverse lookup (file does NOT have the XXXXXXXX hex sequence)?

The idea would be a search on simultaneous criteria, for example:

Delete files that:

1. do NOT start with 44415441

and

2. do NOT end with FFFFFFFF <----- here "end" = the very last 4 bytes

If it fails either of the criteria above, the file is NOT to be deleted
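
A minimal sketch of this two-part test in Python, just to make the logic concrete. The function name `should_delete` is hypothetical; the byte values are the ones listed above, and nothing here is taken from the program under discussion.

```python
import os

HEADER = bytes.fromhex("44415441")   # the ASCII bytes "DATA"
FOOTER = bytes.fromhex("FFFFFFFF")

def should_delete(path):
    """Return True only if the file neither starts with HEADER nor ends with FOOTER."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(len(HEADER))
        # Jump to the last len(FOOTER) bytes (or to the start if the file is shorter).
        f.seek(max(size - len(FOOTER), 0))
        tail = f.read(len(FOOTER))
    # Both "NOT" conditions must hold for the file to be selected.
    return head != HEADER and tail != FOOTER
```

Only the first and last 4 bytes are read, so the cost per file is dominated by opening it, not by its size.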

P.S.
If you think you need to charge for writing customized software, we can discuss it via PM.


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 20th, 2020, 15:34 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
Your requirements seem trivial. I don't want any money for my program, just give me some time to write it. I think it's best if the program produces a list of files to be deleted rather than actually deleting them.

 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 12:50 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
fzabkar wrote:
I think it's best if the program produces a list of files to be deleted rather than actually deleting them.

Totally agree with you about the file listing first!

Below are some extra suggestions to make the software even more useful and flexible

#1. Besides creating a list of files, also have an option to actually delete the files

-BR- wrote:
do NOT end with FFFFFFFF <----- here "end" = the very last 4 bytes

#2. Have an option so that the footer search is not restricted to the very end of the file.
For example, an option to search the last XX MB, as some files have additional data appended at the very end, such as some extra "00" bytes

#3. Having an option to change the required search parameters
- change the extension being searched
- change the hex code being searched at the top part
- change the hex code being searched at the bottom

Suggestion #3 above would be interesting so that the software can be easily adapted to other file types

If you need any help with testing, please let me know


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 16:49 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
I have a preliminary version that incorporates your earlier requirements. I'll try to add your latest requirements soon.

I fear that adding too many "features" will slow down the program, but I'll leave that up to you.

BTW, it's a Win32 program that runs from the command line in a CMD window. It could be recompiled for Win64, but I would have no way of testing it.


Attachments:
srchhdft.7z [22.55 KiB]
Downloaded 135 times

 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 17:31 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
I propose a new command line as follows:

SRCHHDFT inf=file_list outf=bat_file_for_deletion [-][h(ex)/t(ext)]hdr=header_data [AND/OR] [-][h(ex)/t(ext)]ftr=footer_data

Searching for a footer at any place other than the end of the file will make the program run very slowly

BTW. I recommend the following command line for generating a file list:

dir pathname\*.ext /b /s > filelist.txt


Just one question. Can you not specify a custom header and footer search in your data recovery tool?

 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 18:50 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
fzabkar wrote:
Just one question. Can you not specify a custom header and footer search in your data recovery tool?

Not in any that I know of, especially if it has to consider both footer and header at the same time
Do you know one that can?


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 18:54 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
-BR- wrote:
fzabkar wrote:
Just one question. Can you not specify a custom header and footer search in your data recovery tool?

Not in any that I know of, especially if it has to consider both footer and header at the same time
Do you know one that can?

No, but I haven't looked for one.

 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 22:25 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
fzabkar wrote:
I have a preliminary version that incorporates your earlier requirements. I'll try to add your latest requirements soon.

Worked very well, congratulations !!! :D :D :D

If you would allow me (and if you like the challenge 8) ), some feature suggestions and a question:

1. Do you think you could implement an OR operator for the footer and header hex values?
Logical example: if footer = AA or BB or CC
---> in this case it would match a file if any of the 3 criteria is found
---> or, if inverse is set, then it would match a file if none of the 3 criteria is found

2. About the inverse option
Does it refer to both header and footer at the same time?
Would it be possible to enable it for footer or header individually (one separately from the other)?


3. About footer searching not at the very end
fzabkar wrote:
Searching for a footer at any place other than the end of the file will make the program run very slowly

I understand it would make the program run slower, but sometimes it would be needed.

Some data files will have the footer at the very end, but other files might have a few extra bytes after the real footer, for example:
<file-header><file-data><file-footer><00000000> or some other data here at the very end

In my experience, this trailing data is at most 4096 bytes in most files and generally less than 512 bytes; in real cases it is usually less than 32 or 16 bytes.

A footer length option would be very useful.

For example, an ftrlen=4096 option would instruct the program to search for the footer within the last 4096 bytes of the file.
Would it be possible for you to compile a version that has this feature?
If it would decrease speed even when the option is not used, maybe compile separate executables (one with the feature, one without).

Thanks again, you have excellent coding skills !!!


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 21st, 2020, 22:52 

Joined: May 2nd, 2009, 0:17
Posts: 80
Location: Brazil
-BR- wrote:
1. Do you think you could implement an OR operator for the footer and header hex values?
Logical example: if footer = AA or BB or CC
---> in this case it would match a file if any of the 3 criteria is found
---> or, if inverse is set, then it would match a file if none of the 3 criteria is found

Here maybe something like

hhdr=AA|BB|CC

where "|" stands for "OR"

And to avoid any misunderstanding, the above example is showing a header search for just 1 byte (AA or BB or CC)
If the search would be for 3 bytes, it would be like:

hhdr=AAAAAA|BBBBBB|CCCCCC
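
To illustrate the idea, here is a rough Python sketch of how such an "OR" list could be parsed and tested. The `|` separator is only the proposal made in this post, and the helper names `parse_alternatives` and `header_matches` are made up for the example.

```python
def parse_alternatives(spec, sep="|"):
    """Split a spec such as 'AAAAAA|BBBBBB|CCCCCC' into a tuple of byte strings."""
    # bytes.fromhex raises ValueError for an odd number of hex digits.
    return tuple(bytes.fromhex(part) for part in spec.split(sep))

def header_matches(data, alternatives, inverse=False):
    """True if data starts with any alternative (or with none of them, if inverse)."""
    # bytes.startswith accepts a tuple and succeeds if any one prefix matches.
    hit = data.startswith(alternatives)
    return (not hit) if inverse else hit
```

For example, `header_matches(first_bytes, parse_alternatives("AAAAAA|BBBBBB|CCCCCC"))` would be true for a file whose first 3 bytes are BB BB BB.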


 Post subject: Re: Software or script to list and delete based on header
PostPosted: July 30th, 2020, 15:32 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
This version allows for several headers and footers (as discussed via PM). I haven't tested it exhaustively. I'll try to add the 4KB feature on my next attempt.

Code:
Usage:  SRCHHDFT inf=file_list outf=log/bat_file delete=[now/later] [+/-]hdr=data1/data2/dataN [AND/OR] [+/-]ftr=data1/data2/dataN

This program searches for all files in the file list with matching header and footer.

If "delete=now" is specified, then the matching files are deleted and the results logged to "outf".
If "delete=later" is specified, then the del commands are written to the BATch file specified by "outf".

The header/footer can be any of data1, data2, ... dataN (hexadecimal only).

Both header and footer are optional.

If "+hdr" or "+ftr" is specified, then the header/footer must match one of data1, data2, ... dataN.
If "-hdr" or "-ftr" is specified, then the header/footer must not match any of data1, data2, ... dataN.

Hexadecimal strings must have an even number of characters.
Use a leading 0 if necessary, eg 0ABC.

Examples:

SRCHHDFT inf=DATlist.txt outf=delold.log delete=now hdr=41424344 AND -ftr=ff0123FF
SRCHHDFT inf=DATlist.txt outf=delold.bat delete=later -hdr=012345 OR ftr=ffFFFF/414E44
SRCHHDFT inf="list of files to search.txt" outf="delete bad files.bat" delete=later hdr=424144

A file count is displayed after every 1000 files have been processed.

Type F to print the current file count.
Type Q to save results and quit program.
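
For readers who want to see what the "delete=later" idea amounts to, here is a small Python sketch that writes one del command per matching file. It is an illustration only: the exact format of the BATch file produced by SRCHHDFT is not shown in this thread, and the function name `write_delete_batch` is hypothetical.

```python
def write_delete_batch(paths, bat_path):
    """Write one quoted 'del' command per path to a batch file."""
    # newline="\r\n" makes every "\n" come out as a Windows line ending.
    with open(bat_path, "w", newline="\r\n") as bat:
        for p in paths:
            # Quote the path so names with spaces survive, and double any '%'
            # because CMD treats a single '%' as a variable marker in batch files.
            bat.write('del "{}"\n'.format(p.replace("%", "%%")))
```

Writing the commands to a file first means the list can be reviewed before anything is actually deleted.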


Attachments:
srchhdft_5.7z [25.94 KiB]
Downloaded 124 times

 Post subject: Re: Software or script to list and delete based on header
PostPosted: August 3rd, 2020, 16:23 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
I have fixed a DELETE=NOW bug (I was trying to delete open files) and a keypress bug. I have also added the ftrlen feature.

However, I think I may still need to improve the error checking. For example, the program might do stupid things if you specify a ftrlen such as 1 instead of 4096.

http://www.users.on.net/~fzabkar/temp/srchhdft_7.exe
http://www.users.on.net/~fzabkar/temp/srchhdft_7.bas

Code:
Usage:

  SRCHHDFT inf=file_list outf=log/bat_file delete=[now/later] ftrlen=size_of_footer_block [+/-]hdr=data1/data2/dataN [AND/OR] [+/-]ftr=data1/data2/dataN

This program searches for all files in the file list with matching header and footer.

If "delete=now" is specified, then the matching files are deleted and the results logged to "outf".
If "delete=later" is specified, then the del commands are written to the BATch file specified by "outf".

The header/footer can be any of data1, data2, ... dataN (hexadecimal only).

Both header and footer are optional.

If "+hdr" or "+ftr" is specified, then the header/footer must match one of data1, data2, ... dataN.
If "-hdr" or "-ftr" is specified, then the header/footer must not match any of data1, data2, ... dataN.

Hexadecimal strings must have an even number of characters.
Use a leading 0 if necessary, eg 0ABC.

Specifying ftrlen = 0 searches for a footer at the end of the file.
Ftrlen = N (decimal) searches for footers in the last N bytes of the file.

Examples:

SRCHHDFT inf=DATlist.txt outf=delold.log delete=now ftrlen=0 hdr=41424344 AND -ftr=ff0123FF
SRCHHDFT inf=DATlist.txt outf=delold.bat delete=later ftrlen=4096 -hdr=012345 OR ftr=ffFFFF/414E44
SRCHHDFT inf="list of files to search.txt" outf="delete bad files.bat" delete=later ftrlen=0 hdr=424144

A file count is displayed after every 1000 files have been processed.

Type F to print the current file count.
Type Q to save results and quit program.

 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 1st, 2020, 14:37 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
Latest version:

http://www.users.on.net/~fzabkar/temp/srchhdft_8.bas
http://www.users.on.net/~fzabkar/temp/srchhdft_8.exe

Code:
Usage:

  SRCHHDFT inf=file_list outf=log/bat/list_file delete=[now/later/list] hdrlen=size_of_header_block ftrlen=size_of_footer_block [+/-]hdr=data1/data2/dataN [AND/OR] [+/-]ftr=data1/data2/dataN

This program searches for all files in the file list with matching header and footer.

If "delete=now" is specified, then the matching files are deleted and the results logged to "outf".
If "delete=later" is specified, then the del commands are written to the BATch file specified by "outf".
If "delete=list" is specified, then the file specs are written to the list file specified by "outf".

The header/footer can be any of data1, data2, ... dataN (hexadecimal only).

Both header and footer are optional.

If "+hdr" or "+ftr" is specified, then the header/footer must match one of data1, data2, ... dataN.
If "-hdr" or "-ftr" is specified, then the header/footer must not match any of data1, data2, ... dataN.

Hexadecimal strings must have an even number of characters.
Use a leading 0 if necessary, eg 0ABC.

Specifying hdrlen = 0 searches for a header at the start of the file.
Hdrlen = N (decimal) searches for headers in the first N bytes of the file.

Specifying ftrlen = 0 searches for a footer at the end of the file.
Ftrlen = N (decimal) searches for footers in the last N bytes of the file.

Examples:

  SRCHHDFT inf=DATlist.txt outf=delold.log delete=now hdrlen=0 ftrlen=0 hdr=41424344 AND -ftr=ff0123FF
  SRCHHDFT inf=DATlist.txt outf=delold.lst delete=list hdrlen=64 ftrlen=4096 -hdr=012345 OR ftr=ffFFFF/414E44
  SRCHHDFT inf="list of files to search.txt" outf="delete bad files.bat" delete=later hdrlen=0 ftrlen=0 hdr=424144

A file count is displayed after every 1000 files have been processed.

Type F to print the current file count.
Type Q to save results and quit program.
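
The matching rules in the usage text can be sketched in Python as follows. This is not the SRCHHDFT source (that is the .bas file linked above), just one reading of the documented behaviour. In particular, it assumes that hdrlen/ftrlen = N means "the pattern may appear anywhere inside the first/last N bytes", and the names `parse_hex_list`, `region_hit` and `file_matches` are invented for the example.

```python
import os

def parse_hex_list(spec):
    """Turn 'ffFFFF/414E44' into a tuple of byte strings."""
    return tuple(bytes.fromhex(part) for part in spec.split("/"))

def region_hit(block, patterns, window, at_end):
    """window == 0: a pattern must sit exactly at the start/end of the file.
    window > 0: a pattern may appear anywhere inside the block (assumption)."""
    if window == 0:
        return block.endswith(patterns) if at_end else block.startswith(patterns)
    return any(p in block for p in patterns)

def file_matches(path, hdr=(), ftr=(), hdrlen=0, ftrlen=0,
                 hdr_negate=False, ftr_negate=False, combine="AND"):
    """hdr/ftr are tuples of bytes; *_negate models the '-' prefix."""
    results = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if hdr:
            head = f.read(hdrlen or max(map(len, hdr)))
            results.append(region_hit(head, hdr, hdrlen, at_end=False) != hdr_negate)
        if ftr:
            n = ftrlen or max(map(len, ftr))
            f.seek(max(size - n, 0))   # start of the footer block
            tail = f.read(n)
            results.append(region_hit(tail, ftr, ftrlen, at_end=True) != ftr_negate)
    if not results:
        return False  # no criteria given, so nothing is selected
    return all(results) if combine == "AND" else any(results)
```

Under these assumptions, the second example above would correspond to `file_matches(path, hdr=parse_hex_list("012345"), ftr=parse_hex_list("ffFFFF/414E44"), hdrlen=64, ftrlen=4096, hdr_negate=True, combine="OR")`.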

 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 2nd, 2020, 15:32 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
The following command creates a list file without directories:

Code:
dir filespec /b /s /a-d
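
For anyone not working in a CMD window, a comparable list can be produced with a short Python sketch. It only approximates the dir command above (wildcard handling is not identical), and the function name `list_files` and the `.dat` default are assumptions made for the example.

```python
import os

def list_files(root, ext=".dat"):
    """Yield full paths of files (never directories) under root whose names end with ext."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk reports directories separately, so filenames holds files only.
        for name in filenames:
            if name.lower().endswith(ext.lower()):
                yield os.path.join(dirpath, name)

def write_file_list(root, list_path, ext=".dat"):
    """Write one path per line, like redirecting the dir output to a file."""
    with open(list_path, "w", encoding="utf-8") as out:
        for path in list_files(root, ext):
            out.write(path + "\n")
```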

 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 2nd, 2020, 16:55 

Joined: May 13th, 2019, 7:50
Posts: 205
Location: Nederland
I like it fzabkar but it needs more cowbell

_________________
www.disktuna.com - photo repair service


 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 6th, 2020, 16:08 

Joined: November 22nd, 2017, 21:47
Posts: 293
Location: France
@fzabkar : By any chance, could this tool be adapted to deal with this situation ?
It's significantly more complex, as I need to search for thousands of strings simultaneously. In my tests, WinHex choked on the total number (more than 42000), but worked fine with about 5000.

The problem is that, when running the “Simultaneous search” with a “Grep Hex” search string, the output is missing the actual search string, or it is translated to ASCII characters. So I can't match files from group A (obtained by “raw” carving) with files from group B (obtained by filesystem-based recovery) that were found with the same search string, which defeats the whole purpose.

If I run the search with ASCII strings, the strings should normally appear in the output, which allows sorting the list and grouping files matching the same search string. But this is much less reliable, since "00" bytes or line breaks prevent the search strings containing them from being processed as a mere sequence of bytes. I would therefore have to run multiple passes with different strings extracted at different offsets, until all files have been duly processed.
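
To make the requirement concrete, here is a naive Python sketch of a multi-string search that keeps the hex search string in its output, so files found with the same string can be grouped. It is an illustration, not a tested solution for 42000 strings: it reads each file whole and tries every pattern in turn, so a proper multi-pattern matcher would probably be needed at that scale. The function name `group_files_by_pattern` is hypothetical.

```python
from collections import defaultdict

def group_files_by_pattern(paths, hex_patterns):
    """Map each hex search string to the list of files that contain it."""
    # Keep the hex text as the key so the result shows the actual search string.
    patterns = {h.upper(): bytes.fromhex(h) for h in hex_patterns}
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()          # whole file in memory; fine only for modest sizes
        for hex_str, raw in patterns.items():
            if raw in data:          # plain byte search, so 00 bytes are no problem
                groups[hex_str].append(path)
    return dict(groups)
```

Passing the files from both groups in `paths` would put a carved file and its filesystem-recovered counterpart under the same key whenever they share a search string.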


 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 6th, 2020, 22:30 

Joined: September 8th, 2009, 18:21
Posts: 12666
Location: Australia
@abolibibelot, here is a tool that may be what you are looking for (bgrep):
https://github.com/nneonneo/bgrep
https://github.com/nneonneo/bgrep/archive/master.zip

Code:
bgrep

Binary grep with support for sophisticated regexes and grep(1)-like usage.
Usage

bgrep's command-line options mirror those of grep(1) very closely. The main difference is that bgrep operates on hex strings instead of text strings.

Examples:

  bgrep -r 'ffd9' /home/user/pictures - find all files with a JPEG header in them

  bgrep '00??00' binary - find one-byte strings in a binary

  bgrep -C 16 -t hex '09f91102' dvdcss - find instances of a certain encryption key in a program

  bgrep -F 'PK' file.zip - find zip entry headers in a zip file

  bgrep -E '\0[\x20-\x7e]{1,8}\0' unknown.exe - find printable strings between 1 and 8 chars long in a program (using Python regex syntax)

  bgrep -W -w 4 '0000f4ce' input.bin - find the word 0x0000f4ce in little-endian order (ce f4 00 00)

bgrep defaults to displaying binary content in a hexdump format, and even supports colour by default on supported terminals, just like grep.

Installing

As a prerequisite, you will need Python 3, at least 3.2 (higher preferred). After installing that, a simple ...

wget 'https://raw.githubusercontent.com/nneonneo/bgrep/master/bgrep.py' -O /usr/local/bin/bgrep

... will do the trick.

 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 6th, 2020, 23:26 

Joined: November 22nd, 2017, 21:47
Posts: 293
Location: France
@fzabkar : Thanks, I have seen it in the meantime in the thread you linked in your first reply. The problem is that each command can search for only a single string; I don't see a way to search for several specific strings simultaneously, let alone thousands, which makes it far less powerful / practical than WinHex for that kind of situation.

By the way, WinHex has another feature which may work wonders in that kind of situation, called “block-wise hashing and matching”, but is only available with a ‘forensic’ license, which I can't afford in the foreseeable future.
https://www.cjoint.com/c/JIhdv5LU23A


 Post subject: Re: Software or script to list and delete based on header
PostPosted: September 7th, 2020, 6:50 

Joined: November 22nd, 2017, 21:47
Posts: 293
Location: France
A compromise nonetheless might be to use that tool to quickly identify “bad” ASCII search terms.
But I'll continue in the other thread for the sake of clarity.

