MAMEWorld >> EmuChat

midget35
MAME Fan
Reged: 02/20/07
Posts: 41


Chdman questions
#342940 - 07/26/15 12:58 PM


I'm a developer. I'm working on DAT management software and looking for some information on chdman's functionality.

Specifically - I'd like to get the hashes (crc, sha1, md5, etc.) of all files inside a chd (before they were compressed), without having to extract them first. Is this possible, or could I make a feature request?

If anyone can put me in touch with the right people, I'd be appreciative!



mw
MAME Fan
Reged: 01/01/07
Posts: 76

Re: Chdman questions [Re: midget35]
#342958 - 07/26/15 06:25 PM


> I'm a developer. I'm working on DAT management software and looking for some
> information on chdman's functionality.
>
> Specifically - I'd like to get the hashes (crc, sha1, md5, etc.) of all files inside
> a chd (before they were compressed), without having to extract them first. Is this
> possible, or could I make a feature request?
>
> If anyone can put me in touch with the right people, I'd be appreciative!

Question: what exactly is the "etc."

The sha1 of the uncompressed data is already available with the "info" command of chdman.exe. But you really have to run a "verify" command to trust it (in case the CHD file is damaged), which takes as long as an extraction.
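For anyone scripting this, here is a minimal Python sketch of pulling the hash lines out of "info" output. It assumes the output contains lines of the form "SHA1: <40 hex digits>" and "Data SHA1: <40 hex digits>", and that a chdman executable is on the PATH. The function names parse_chd_info and chd_info_hashes are made up for this example.

```python
import re
import subprocess

# Matches lines such as "SHA1: <40 hex digits>" or "Data SHA1: <40 hex digits>".
# Extra padding after the colon is tolerated.
_HASH_LINE = re.compile(r"^\s*(Data SHA1|SHA1):\s*([0-9a-fA-F]{40})\s*$")


def parse_chd_info(text):
    """Return a dict with whichever of 'SHA1' / 'Data SHA1' appear in the text."""
    hashes = {}
    for line in text.splitlines():
        match = _HASH_LINE.match(line)
        if match:
            hashes[match.group(1)] = match.group(2).lower()
    return hashes


def chd_info_hashes(chd_path, chdman="chdman"):
    """Run `chdman info -i <file>` and parse its output (requires chdman)."""
    result = subprocess.run(
        [chdman, "info", "-i", chd_path],
        capture_output=True, text=True, check=True,
    )
    return parse_chd_info(result.stdout)
```

Only parse_chd_info is pure text handling; chd_info_hashes needs a real chdman binary and a real .chd file. Note that this reads what the header claims, so it says nothing about whether the file is damaged.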

I gather that, to save HD space and time, you want to extract to memory and compute various other hashes.

So it depends on what the "etc" is, because that will have to be coded into the variant of chdman you are asking for.

Your best bet for a work-around at this time is to use PeaZip. It has a tool (called PeaUtils on the tool menu) that computes various (15*) kinds of hashes on a file. But you will have to use the chdman extract command to create the file for PeaZip to look at.

* from the help .pdf, supported hashes: Adler32, CRC16/24/32/64, eDonkey, MD4, MD5, Ripemd160, SHA1, SHA224/256/384/512, Whirlpool512

Edit:

Here is a link to a more ambitious CRC calculator, which allows you to define your own algorithm.

http://reveng.sourceforge.net/

Edited by mw (07/26/15 06:33 PM)



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: mw]
#342960 - 07/26/15 07:27 PM


Hey mw. Really appreciate the thought you put into your answer. I have overcomplicated things with my little 'etc.' up there, so please disregard it.

Also - let's assume the chd itself is NEVER damaged.

It boils down to this:

I don't really understand where the sha1 value (that we find in UME softwarelist xml dats) comes from. For example, the pcecd game 'ys3' has an sha1 of:

2c9be3926d7cb5e5d46e5418be8fd0c01deb309f

... but that is not a) the hash of the chd itself, or b) the hash of the Trurip iso (which I think it is a clone of). Further - what is the hash of the cue file it would also contain?

Understanding this would be a big help!

Thanks again



CiroConsentino
Frontend freak!
Reged: 09/21/03
Posts: 6211
Loc: Alien from Terra Prime... and Brazil

Re: Chdman questions [Re: midget35]
#342963 - 07/26/15 07:44 PM


You can get the SHA-1 from the .chd's header. Just take a look at the mame\source\mame\src\lib\util\chd.h source file, declare those header types in your application, and read the checksum from them. Make sure you support all header versions from v1 to v5...
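As a rough illustration of the header approach, here is a Python sketch that reads the two SHA-1 fields from a version 5 header only. The field offsets (big-endian length and version after the 8-byte "MComprHD" tag, raw SHA-1 at byte 64, combined SHA-1 at byte 84, 124 bytes in total) are my reading of the v5 layout documented in chd.h and should be checked against the source before relying on them. Versions 1 to 4 use different layouts and are not handled. The function names are invented for this example.

```python
import struct

CHD_TAG = b"MComprHD"
V5_HEADER_LENGTH = 124  # assumed size of a v5 header, per the chd.h comments


def read_chd_v5_hashes(header_bytes):
    """Extract the SHA-1 fields from the first 124 bytes of a v5 CHD."""
    if len(header_bytes) < V5_HEADER_LENGTH:
        raise ValueError("header too short for a v5 CHD")
    # Tag, header length and version are stored big-endian.
    tag, _length, version = struct.unpack_from(">8sII", header_bytes, 0)
    if tag != CHD_TAG:
        raise ValueError("not a CHD file")
    if version != 5:
        raise ValueError(f"this sketch only handles v5, got v{version}")
    raw_sha1 = header_bytes[64:84]   # what chdman info labels "Data SHA1"
    sha1 = header_bytes[84:104]      # what chdman info labels "SHA1"
    return {"SHA1": sha1.hex(), "Data SHA1": raw_sha1.hex()}


def read_chd_v5_hashes_from_file(path):
    """Read only the header bytes from disk, then parse them."""
    with open(path, "rb") as handle:
        return read_chd_v5_hashes(handle.read(V5_HEADER_LENGTH))
```

Reading 124 bytes is effectively instant, which is the appeal of this approach, but it only reports what the header says. It does not check the data against those hashes.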

That's how my frontend Emu Loader validates CHD files. What compiler is your application created with?
My frontend is made with Borland Delphi 7.



Emu Loader
Ciro Alfredo Consentino
home: http://emuloader.mameworld.info



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: CiroConsentino]
#342971 - 07/26/15 09:24 PM


Hey Ciro. Good to speak with you bud. Big fan of your software

Built with C#. The tool I am working on is here:
www.datinate.com

I'll have some tutorial vids up there tomorrow, too.

Does that include the cue and iso sha1s? Gonna check it out in a bit. Still - a shame they aren't included in the DATs...

Thanks!



CiroConsentino
Frontend freak!
Reged: 09/21/03
Posts: 6211
Loc: Alien from Terra Prime... and Brazil

Re: Chdman questions [Re: midget35]
#342983 - 07/26/15 11:37 PM


>> Does that include the cue and iso sha1s? Gonna check it out in a bit.
>> Still - a shame they aren't included in the DATs...

no, just SHA-1 for the .chd file.
The checksum in the chd header is correct: a long time ago I was using a function to generate SHA-1 checksums, but it took several minutes for each CHD file, and the generated checksum was exactly the same as the one in the file's header.

Since your app is in C# you'll have no problem using the source from MAME.
Note that the checksum is for the entire .chd file. There are no multiple checksums for the files inside the .chd.






mw
MAME Fan
Reged: 01/01/07
Posts: 76

Re: Chdman questions [Re: midget35]
#342991 - 07/27/15 04:52 AM


> It boils down to this:
>
> I don't really understand where the sha1 value (that we find in UME softwarelist xml
> dats) comes from. For example, the pcecd game 'ys3' has an sha1 of:
>
> 2c9be3926d7cb5e5d46e5418be8fd0c01deb309f
>
> ... but that is not a) the hash of the chd itself, or b) the hash of the Trurip iso
> (which I think it is a clone of). Further - what is the hash of the cue file it would
> also contain?
>
> Understanding this would be a big help!
>
> Thanks again

The hash that appears in the .xml files is the sha1 hash of the compressed data in the .chd file, _not_ the whole .chd file. A .chd file consists of a header area and a data area. The two sha1 hashes that appear in the chdman "info" command are the "SHA1" and the "Data SHA1". It is the "SHA1" that corresponds to the hash listed in the .xml files. It is a hash of only the data portion of the .chd file.

If you wished, you could verify the "SHA1" by cutting off the header portion of the .chd file and running a conventional sha1 pass on that remaining data portion of the file - _or_ you could use the .chd file without cutting it, but start the sha1 computation at the beginning of the data portion of the .chd file.

The other sha1 hash, named "Data SHA1" by chdman, is the sha1 hash of the data before it was compressed. To re-compute that, you have to expand the .chd back to its original .raw format, and then run the sha1 pass on that.

These two hashes are tabulated when the .chd file is created. The two hashes and some other things are placed in the header portion of the .chd file, and the compressed data from the original media occupies the data portion.

There is a third sha1 that can be produced - if you run the sha1 tabulation on the entire .chd file. This is the sha1 you will see when you have a dir2dat. The sha1 on the entire .chd file is different from the "SHA1" and "Data SHA1", of course.

None of these three sha1 hashes can be calculated from the other.
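The whole-file hash mentioned above (the one a dir2dat would report) is the easiest of the three to reproduce. A minimal Python sketch, reading in chunks so a multi-gigabyte .chd does not have to fit in memory; the name sha1_of_file and the 1 MiB chunk size are arbitrary choices for this example:

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of an entire file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

This treats the .chd as an opaque file, so the result will not match either value stored in the header.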

As for what sha1 hashes are correct for .iso, or .cue, who knows? Those sha1 hashes are not as clearly defined, because chdman has nothing to do with them. The sha1 hashes for .iso and .cue are only defined by the person who made those files. There isn't necessarily a standard to adhere to. There may be a standard practice, which might be good enough. But the different sha1 hashes are not related to one another. The only way to compare one to another is through a "rosetta stone" - that is, a list that someone has compiled. There is no computational way of converting one sha1 hash to another - short of obtaining the .chd and hashing it.



R. Belmont
Cuckoo for IGAvania
Reged: 09/21/03
Posts: 9716
Loc: ECV-197 The Orville

Re: Chdman questions [Re: midget35]
#343004 - 07/27/15 04:42 PM


> ... but that is not a) the hash of the chd itself, or b) the hash of the Trurip iso
> (which I think it is a clone of). Further - what is the hash of the cue file it would
> also contain?

The CHD's hash is a hash of the data itself (minus any header or other data) and the chdman metadata stream (which contains the geometry/track types/etc for CDs). You cannot directly recover the hashes of the source file(s) from a CHD; CHDMAN can and does do things like separating sector, subchannel, and audio data in order to gain better compression ratios.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: CiroConsentino]
#343008 - 07/27/15 05:13 PM


Thanks Ciro. I'd like to raise a feature request so that I can get the sha1 of both the cue and iso by just running a quick chdman command line request.

Not sure how I'd go about logging a request with the MAME team, though...



R. Belmont
Cuckoo for IGAvania
Reged: 09/21/03
Posts: 9716
Loc: ECV-197 The Orville

Re: Chdman questions [Re: midget35]
#343009 - 07/27/15 05:20 PM


> Thanks Ciro. I'd like to raise a feature request so that I can get the sha1 of both
> the cue and iso by just running a quick chdman command line request.
>
> Not sure how I'd go about logging a request with the MAME team, though...

The CHD didn't necessarily come from cue/bin. It could also have come from toc/bin, bare iso, or that Dreamcast specialty format. And we're looking to add more formats. What you seek is not possible.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: R. Belmont]
#343011 - 07/27/15 05:49 PM


Thanks all for the ongoing information

@R.Belmont: I know I was oversimplifying things by going on about the cue/bin format. I know there can be different 'contained' files inside the CHD.

But my broader point was: would it be possible to store the hash information of the child file(s) as part of the CHD header? Why do you say what I seek is not possible?

@mw: "The sha1 hashes for .iso and .cue are only defined by the person who made those files. There isn't necessarily a standard to adhere to."

Forgive me, mw, but isn't that all a hash ever is anyway? What I'm struggling to get my head around is how one might quickly report on the CHD content (like one can view 7z file content instantly before the lengthy extraction process).



Haze
Reged: 09/23/03
Posts: 5245

Re: Chdman questions [Re: midget35]
#343014 - 07/27/15 06:07 PM


the problem is that things like .cue and .bin are really bad approximations of a CD and eventually the internal format of the CHD will progress well beyond a level where those things can even be represented.

the SHA1 of the CHD is the SHA1 of all the data and metadata that the CHD format uses to store a CD.

cuesheets can contain information like the filename of the .bin file (and even the path it's stored on) as well as comment fields and the like, completely arbitrary information that we have no reason to store. It's also a loose and badly defined format (or at least badly used, many pieces of software interpret the same cues in different ways which has actually been a major problem in building software lists)

When CHDMAN outputs a .cue and .bin (or any other format) it attempts to build them based on the data that is stored in the CHD the best it can, nowhere is the original .cue or .bin being stored. If we were to store the original cue data in the CHD as non-hashed data people would quickly remove it anyway (as we see with romsets, readmes getting stripped etc.) If we store the original cue data and hash it we'd be hashing arbitrary data meaning you could have a CHD with the exact same data but different internal hashes because of an additional comment in the cue file or similar.
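The point about arbitrary cue content is easy to demonstrate. In this Python sketch the two cue sheets are invented for the example and describe the same single-track layout; they differ only by a REM comment line. The helper strip_cue_comments is a toy normalisation written for this illustration, not anything chdman does.

```python
import hashlib


def sha1_hex(data):
    """SHA-1 hex digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def strip_cue_comments(cue_text):
    """Drop REM lines and indentation; a toy normalisation only."""
    kept = []
    for line in cue_text.splitlines():
        stripped = line.strip()
        if stripped and stripped.split(None, 1)[0].upper() != "REM":
            kept.append(stripped)
    return "\n".join(kept)


# Two made-up cue sheets describing the same disc layout.
CUE_A = 'FILE "game.bin" BINARY\n  TRACK 01 MODE1/2352\n    INDEX 01 00:00:00\n'
CUE_B = "REM ripped on a rainy day\n" + CUE_A

# The raw files hash differently even though the disc they describe is the same.
different_raw = sha1_hex(CUE_A.encode()) != sha1_hex(CUE_B.encode())
# Once the comment is ignored, the descriptions are identical.
same_after_strip = strip_cue_comments(CUE_A) == strip_cue_comments(CUE_B)
print(different_raw, same_after_strip)
```

Both comparisons come out true, which is why hashing an original cue file would tie the hash to details that have nothing to do with the disc itself.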

Also remember CHD is used for a lot more than CDs, and is completely unaware of the underlying file systems.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: Haze]
#343018 - 07/27/15 06:26 PM


Thanks Haze,

So... is this statement correct? (I am simplistically assuming everything is cue + bin again):

The Data SHA1 is the combined data of the source (pre-chd compressed) cue and bin files, minus cue metadata like comments (and other arbitrary variable data).

i.e. The combined cue + bin checksums of the original (pre-chd compressed) source files would not match the Data SHA1.

Am I getting it?



Haze
Reged: 09/23/03
Posts: 5245

Re: Chdman questions [Re: midget35]
#343020 - 07/27/15 06:57 PM


> Thanks Haze,
>
> So... is this statement correct? (I am simplistically assuming everything is cue +
> bin again):
>
> The Data SHA1 is the combined data of the source (pre-chd compressed) cue and bin
> files, minus cue metadata like comments (and other arbitrary variable data).
>

if you do a chdman info -i e:\mamechds\ddr2m\885jaa02.chd

SHA1: f02bb09f41533c6ec496a662d815e85b304fcc72
Data SHA1: 5c57eeafa4c8dd8e62bab04fcabd46e586d1b59d

The "SHA1" I believe is the full hash of the uncompressed metadata and data, this is the one that gets listed in the drivers as it identifies the individual CHD. (Metadata can include things like disk geometry on HDD images, and the lock code on certain types of Flash cards too for example - ie things that are just as important as the data)

the "Data SHA1:" I believe is just the hash of the uncompressed data part. It means you can see if 2 CHDs are the same except for the metadata part more easily.
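Taking that description at face value, a tool could classify a pair of CHDs from their two header hashes alone. A small Python sketch; the function name and the returned strings are made up for this example, and the inputs are assumed to be dicts holding the "SHA1" and "Data SHA1" hex strings:

```python
def compare_chd_hashes(a, b):
    """Classify two CHDs given dicts with 'SHA1' and 'Data SHA1' hex strings."""
    if a["SHA1"].lower() == b["SHA1"].lower():
        # The combined hash covers data and metadata.
        return "identical data and metadata"
    if a["Data SHA1"].lower() == b["Data SHA1"].lower():
        # Same raw data, so the difference must be in the metadata.
        return "same data, different metadata"
    return "different data"
```

This only compares what the headers claim; it does not confirm that either file is intact.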

> i.e. The combined cue + bin checksums of the original (pre-chd compressed) source
> files would not match the Data SHA1.
>

correct, it would not match, remember even with CDs some of the source images have multiple tracks split into multiple files (eg .bin and .wav files, in which case obviously we don't want to be storing wav headers either, but rather the actual data) The data SHA1 we store is after we've ensured all that is in a standard internal format.

> Am I getting it?

basically as long as you understand that the only way to verify a CHD is to decompress it, ideally by calling CHDMAN and asking it to perform a verify operation, yes.

The idea of the format is that the CHD file can be verified as non-corrupt by using CHDMAN without the need for extra cue files etc. because the CHD itself contains the hash for all the data it stores, should the CHD become corrupted then verification against that hash would fail.
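For a tool that wants to automate this, a verify pass can be driven from a script. The Python sketch below makes two assumptions that should be checked against your chdman build: that "verify" accepts the same -i input flag shown for "info" earlier in the thread, and that chdman exits with status 0 on success and non-zero on failure. The function names are invented for this example.

```python
import subprocess


def build_verify_command(chd_path, chdman="chdman"):
    """Build the argument list for a verify run (assumed flag: -i)."""
    return [chdman, "verify", "-i", chd_path]


def verify_chd(chd_path, chdman="chdman"):
    """Return True if chdman reports success; assumes exit status 0 means OK."""
    result = subprocess.run(
        build_verify_command(chd_path, chdman),
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

As noted in the thread, this decompresses everything, so expect it to take about as long as an extraction.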



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: Haze]
#343021 - 07/27/15 07:59 PM


Hmmm... I would really like to trace the extracted chd content back to its source.

Please indulge me again: if I store a trurip disc in chd, and then extract it, am I right in saying the sha1s of the cue and bin will not be the same as the source files I added to the chd in the 1st instance? That's the crux for me.

It looks like chd compression is irreversible?



R. Belmont
Cuckoo for IGAvania
Reged: 09/21/03
Posts: 9716
Loc: ECV-197 The Orville

Re: Chdman questions [Re: midget35]
#343022 - 07/27/15 08:20 PM


> Hmmm... I would really like to trace the extracted chd content back to its source.
>
> Please indulge me again: if I store a trurip disc in chd, and then extract it, am I
> right in saying the sha1s of the cue and bin will not be the same as the source files
> I added to the chd in the 1st instance? That's the crux for me.
>
> It looks like chd compression is irreversible?

It's reversible; for well-formed source CD inputs, -extractcd to the same format will give hash-matching binary data and semantically-matching metadata (cue/toc).

But getting full hash-matching everything back is unlikely; as Haze alluded to, for each .cue file, there are dozens of other .cue files which aren't perfect matches but ultimately create the same disc.



mw
MAME Fan
Reged: 01/01/07
Posts: 76

Re: Chdman questions [Re: midget35]
#343025 - 07/27/15 08:40 PM


> @mw: "The sha1 hashes for .iso and .cue are only defined by the person who made those
> files. There isn't necessarily a standard to adhere to."
>
> Forgive me, mw, but isn't that all a hash ever is anyway? What I'm struggling to get
> my head around is how one might quickly report on the CHD content (like one can
> view 7z file content instantly before the lengthy extraction process).

I think the replies by rbelmont and haze have gone into more depth about the lack of standards among .iso and .cue files.

A .7z file has something called the "central directory record" (or something like that), which contains a description of the contents of each of the .7z file items. A CHD has no such record in it, other than the header. The chdman.exe "info" command shows all the contents of the header. That's all you get. If what you want to do is "preview" the contents of a .chd file, all you can do is examine that header information.

A "hash" is not a descriptor of the contents of anything. A "hash" is a (hopefully unique) signature of the contents of something.

Each .chd file only contains the data for one item. It may contain various items (such as metadata) which explain the contents of the data area of the .chd file.

A .7z file is a general purpose storage archiving system. It can contain all kinds of stuff. A .chd file only contains one item, but has a variety of ways of describing it via the information in the header.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: R. Belmont]
#343026 - 07/27/15 08:55 PM


Ah... So this is the thing that bothers me - if each component file in the chd had its pre-compressed sha1 stored, I could probably legitimately identify the source files on at least some occasions by looking for those hashes in other dats. It frustrates me that only the combined sha1 of the cue and bin files is stored. That makes tracing the original source files substantially more time consuming (got to extract and sha1 scan every chd).

Am I making sense here? Wouldn't this be a reasonable feature request for chdman?



R. Belmont
Cuckoo for IGAvania
Reged: 09/21/03
Posts: 9716
Loc: ECV-197 The Orville

Re: Chdman questions [Re: midget35]
#343030 - 07/27/15 09:48 PM


> Ah... So this is the thing that bothers me - if each component file in the chd had
> its pre-compressed sha1 stored, I could probably legitimately identify the source
> files on at least some occasions by looking for those hashes in other dats. It
> frustrates me that only the combined sha1 of the cue and bin files is stored. That
> makes tracing the original source files substantially more time consuming (got to
> extract and sha1 scan every chd).
>
> Am I making sense here? Wouldn't this be a reasonable feature request for chdman?

I don't understand how that would be useful. At this point it's kind of a weird feature from outer space.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: R. Belmont]
#343034 - 07/27/15 11:22 PM


I am developing Datinate - an application that compares dats. Please have a look R Belmont - hopefully my feature will seem more terrestrial!

www.datinate.com



Moose
Don't make me assume my ultimate form!
Reged: 05/03/04
Posts: 1483
Loc: Outback, Australia

Re: Chdman questions [Re: midget35]
#343038 - 07/28/15 02:17 AM


> I am developing Datinate - an application that compares dats. Please have a look R
> Belmont - hopefully my feature will seem more terrestrial!
>
> www.datinate.com

This looks very interesting !

If you could select file(s) on your PC (e.g. such as a ROM file you have just dumped) and see if they were already cataloged in any of the DATs (anything with the same SHA1, etc) then that would be a nice feature.
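A lookup like that mostly comes down to indexing every sha1 attribute found in the DATs. A minimal Python sketch, assuming XML DATs that carry hashes in a sha1 attribute on elements such as rom or disk. The sample XML is a hand-written fragment for illustration (the hash is the ys3 value quoted earlier in the thread, the element names and "example disc" are placeholders), and index_sha1 is a name made up here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_sha1(dat_xml_text, source_name):
    """Map each sha1 attribute in the XML to (source, element tag, name)."""
    index = defaultdict(list)
    root = ET.fromstring(dat_xml_text)
    for element in root.iter():
        sha1 = element.get("sha1")
        if sha1:
            index[sha1.lower()].append(
                (source_name, element.tag, element.get("name"))
            )
    return index


# Hand-written sample, not copied from a real software list.
SAMPLE = (
    '<softwarelist name="pcecd"><software name="ys3">'
    '<part name="cdrom"><diskarea name="cdrom">'
    '<disk name="example disc" sha1="2c9be3926d7cb5e5d46e5418be8fd0c01deb309f"/>'
    "</diskarea></part></software></softwarelist>"
)

index = index_sha1(SAMPLE, "pcecd.xml")
print(index.get("2c9be3926d7cb5e5d46e5418be8fd0c01deb309f"))
```

Merging the indexes from several DATs then gives a single dictionary to check a freshly hashed file against. For CHDs, the hash to look up is the header "SHA1" rather than a hash of the .chd file itself.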

And, if it is a new file (or a different version of an already cataloged file), then have the ability to add it to a DAT ?

Anyway, the sky's the limit. I think what you are developing could well be *very* useful. There's tons of things you could do (if only there was time / we were immortal ....)



Moose



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: mw]
#343064 - 07/28/15 08:32 PM


Given that the individual files are listed, I don't get why the file size and sha1 can't be as well



mw
MAME Fan
Reged: 01/01/07
Posts: 76

Re: Chdman questions [Re: midget35]
#343101 - 07/29/15 08:40 AM


> Given that the individual files are listed, I don't get why the file size and sha1
> can't be as well

I'm not sure if I understand what you mean by this. A .chd has a file, not files. The metadata describes what went into that file, but not the way DOS does; the descriptions are not based on conventional filenames, filesizes and individual hashes.

The .chd format was not designed to be another archive format like .rar or .7z. Conventional archive formats are designed from the ground up to be general purpose.

At first, the .chd format was designed to be very specific to compressing the data found on the hard disks of arcade games. It needed to be fast: it had to unpack the data much more quickly than conventional archive systems, because there's a game going on in the background. One doesn't have the luxury to stop the game to unpack a large or tightly-compressed file. There was no need to catalog the files on the compressed hard disk; the decompression system unpacked the raw sectors as needed. The only info needed was the cylinder and sector layout to describe the physical properties of the uncompressed media.

Later, when it came time to emulate arcade games that used CDROM (and later, flash) as the media, the .chd format was expanded to cover those. (This is the metadata information.) The emphasis continues to be to support the raw data, not caring what the game calls its files and folders internally.

This is why your inquiries are not making sense to the MAMEdevs. They don't have any interest in developing another archive format. The .chd format as it stands works fine for its intended purpose.

Although you can see great benefit if there was an easier way to translate the .chd format to a conventional view of files and folders, there is no point in doing that because it does not contribute to the emulation process.

I'm not trying to tell you to stop trying to convince them, perhaps you can think of another tack to make it a benefit to have more extended information as part of the .chd format, instead of a hindrance.



midget35
MAME Fan
Reged: 02/20/07
Posts: 41

Re: Chdman questions [Re: mw]
#343102 - 07/29/15 10:06 AM


Yep, Thanks all for your great help guys. And thanks especially to you, mw, for your time here. Didn't mean to overstay my welcome on this topic, if indeed I have... passion is a driver!

I'll go away, do my homework, and hopefully sell the pros of my vision a little better next time.

Cheers

