MD5 or SHA checksum support in Daz Studio?

mrinal Posts: 641
edited June 2016 in The Commons

Does anyone know if Daz Studio supports MD5, SHA or any other checksum algorithm to protect against accidental file system errors due to bad sectors? Is there any utility to verify the integrity of (non-encrypted) content files provided by Daz?

 

Post edited by mrinal on

Comments

  • I believe that Connect uses the MD5 signature when checking whether it needs to download an updated file or can use the local copy, but I don't think it actively monitors the value otherwise.

  • To answer my own question: no, it doesn't actively monitor the signatures (because of the potential performance drag), but it is possible to generate and compare signatures through scripting with the createDigest functions of DzApp, though I'm not at all clear on how that works: http://docs.daz3d.com/doku.php/public/software/dazstudio/4/referenceguide/scripting/api_reference/object_index/app_dz

  • mrinal Posts: 641

    That would be very desirable for any of the following scenarios:

    1) Catching accidental, unintended modifications to the library content. Connect or DIM should flag those packages and allow the user to redownload them if the changes were unintended. No more surprises when texture files, pose presets or morph dials turn up missing. A few times I have accidentally overwritten geometries in the original content and only realised it when using them in another project. Making the library read-only is not an option, as that would make installing updates difficult.

    2) Content integrity for network shares. I would very much like to trigger the checks regularly to identify content corruption due to bad disk sectors or file index corruption. There are other ways (OS utility tools) to monitor disk health, but most of those approaches rely on the disk being directly connected to the system via PCI(e) or USB, with limited options for network shared drives (NAS). For content libraries spanning several hundred gigabytes it would be convenient to have a solution that could check the health of the entire library irrespective of how the storage volumes are managed. Though it might technically be possible to implement RAID mirroring on NAS drives, it would be far simpler and more approachable to run a checksum test overnight, maybe once a week.

    3) Confidence while working offline. Run a checksum test before taking the laptop offline and you can be assured of the integrity of your library, without worrying about running into file corruption mid-project.

    How hard would it be to implement that in Daz Studio?

    Qt already provides a checksum API, so no new inventions are required here. MD5/SHA checksum values can be calculated for the existing files as well as for the packages (.zip and .sep), and those can be shipped as part of the content metadata. All Studio needs to do is calculate the checksum of the local files and validate it against the checksum in the metadata whenever validation is triggered. The duration would vary with the number (and size) of files - hence the suggestion to run it overnight.

    As you mentioned, I assume a similar approach is already used, but only during the first installation or subsequent updates of individual content. Many of us will have moved content around (drive upgrades, backup and restore), so it would be very helpful if checksum validation were provided as a menu option or as a separate utility.

    I assume that Connect already takes care of checksums for encrypted content, as decryption would fail on even the slightest error. Redownloading may be handled transparently there, but that requires Studio to be online.

    By the way, how are others dealing with these issues presently?

  • mrinal Posts: 641
    edited June 2016

    To answer my own question: no, it doesn't actively monitor the signatures (because of the potential performance drag), but it is possible to generate and compare signatures through scripting with the createDigest functions of DzApp, though I'm not at all clear on how that works: http://docs.daz3d.com/doku.php/public/software/dazstudio/4/referenceguide/scripting/api_reference/object_index/app_dz

    I was already writing my earlier response and saw your comments only after posting. Yes, the performance drag could be significant, as checksum validation depends on the size of the files in addition to their count. That is why it should be triggered manually, whenever convenient, and should not start automatically (other than on installation of new content or upgrades).

    Now that Connect downloads individual items into separate folders (identified by their SKUs), it would help to be able to run the checksum on selected folders at a time. That would be better than running chkdsk or other system utilities, which require dedicated access to the drive and block access while they run. Also, for network shares without RAID mirroring there are not many options.
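
    As a rough illustration of what a per-folder check might look like in a script (untested, and purely a sketch on my part - I am assuming DzDir offers a recursive file-listing helper such as getFilesFromDir, the folder path is invented, and I don't know which algorithm createDigest actually uses):

    // Untested sketch: print a checksum manifest for one product folder (e.g. one SKU folder).
    // Assumes DzDir.getFilesFromDir( filters, recursive ) exists - please check the scripting docs.
    // The path below is made up for illustration.
    var oDir = new DzDir( "E:/DazLibrary/data/cloud/1_12345" );
    var aFiles = oDir.getFilesFromDir( [ "*" ], true ); // all files, recursively
    for ( var i = 0; i < aFiles.length; i++ ) {
        var oFile = new DzFile( aFiles[i] );
        if ( oFile.open( DzFile.ReadOnly ) ) {
            var baContent = new ByteArray( oFile.read() );
            oFile.close();
            print( aFiles[i] + " : " + App.createDigest( baContent ) );
        }
    }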

    Thanks for the API reference link. Plugin developers could write a plugin that calculates the signature of the files the first time they get installed, but treating that as a "source of truth" would be risky - what if the local installation itself had unidentified errors, or was written to a bad sector? I would consider it more reliable if the checksums were calculated on Daz servers and provided as part of the metadata download. Calculating checksum signatures for files can be automated on the Daz servers and should not require any manual effort from PAs or Daz staff. A plugin, if developed, would calculate the checksum of the local files, but it would still need to validate them against a "source of truth" - hence the need to provide them as part of the metadata download.
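
    For illustration, here is roughly how I imagine a single-file check against such a "source of truth" could look, based on a quick read of the API page (untested; treat the exact calls as my guesses, and the path and hash below as placeholders for values that would really come from Daz's metadata):

    // Untested sketch: verify one local file against a checksum supplied by Daz.
    // The expected hash would come from the content metadata; the value below is only a placeholder.
    function verifyFile( sPath, sExpectedHash )
    {
        var oFile = new DzFile( sPath );
        if ( !oFile.open( DzFile.ReadOnly ) ) {
            print( "Could not open: " + sPath );
            return false;
        }
        var baContent = new ByteArray( oFile.read() ); // hash the contents, not the file name
        oFile.close();
        var sLocalHash = App.createDigest( baContent );
        if ( sLocalHash == sExpectedHash ) {
            print( sPath + " : OK" );
            return true;
        }
        print( sPath + " : MISMATCH (local " + sLocalHash + ", expected " + sExpectedHash + ")" );
        return false;
    }

    // Hypothetical usage - the path and hash are made up for illustration:
    verifyFile( "E:/DazLibrary/data/SomeProduct/someFile.dsf", "7ca527f1f980aea7406271a134b97dfb" );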

    Post edited by mrinal on
  • mrinal Posts: 641

    For those of us wondering what an MD5/SHA checksum is and why it is relevant for Daz content libraries, here is a quick explanation: http://www.geeksengine.com/article/checksum.html

     

  • Taoz Posts: 9,739

    As an alternative you could try this:

    http://www.ajcsoft.com/active-backup.htm

    It will monitor your files and report if any changes are made.

  • This seems to work, though it needs more/better error checking. I'm not sure how the version that takes an array of tokens would work - passing an array containing a file name used the file name itself, not the contents.

    var testFile = new DzFile( "E:/Documents/HashTest.txt" );
    if ( testFile.open( DzFile.ReadOnly ) ) {
        var fileContent = new ByteArray( testFile.read() );
        testFile.close();
        var hash = App.createDigest( fileContent );
        print( hash );
    }

     

  • mrinal Posts: 641
    edited June 2016
    Taozen said:

    As an alternative you could try this:

    http://www.ajcsoft.com/active-backup.htm

    It will monitor your files and report if any changes are made.

    Most backup solutions rely on comparing file metadata (timestamps, size) to decide whether a file has changed between two locations. They do not (usually) compare file contents, which is what is necessary to identify bad sectors. If an underlying sector goes bad on the backup disk itself, there is no way to identify the corruption without actually reading the contents of that file.

    Also, file/directory comparison only highlights the difference and tells you that the copies differ. Judging which copy is the "source of truth" is still a challenge, i.e. one would still have to work out whether the change was a genuine content update or an accidental modification. And with the (false) sense of security that the backups are the "source of truth", one may be tempted to restore from them as the first option - only to then discover that the backup disks themselves have developed errors over time. How many generations of backups can one afford to keep?

    Having said that, backups are good protection against whole-disk failures or partition table corruption, as they may be the only option short of redownloading the entire library.

     

     

    This seems to work, though it needs more/better error checking. I'm not sure how the version that takes an array of tokens would work - passing an array containing a file name used the file name itself, not the contents.

    var testFile = new DzFile( "E:/Documents/HashTest.txt" );
    if ( testFile.open( DzFile.ReadOnly ) ) {
        var fileContent = new ByteArray( testFile.read() );
        testFile.close();
        var hash = App.createDigest( fileContent );
        print( hash );
    }

     

    The Daz API/utility will always return a hash value, even if the file is corrupted - which can happen at first install if the file gets written to a bad sector. We would still require a source of truth for the hash to be validated against. The checksum in the distribution (as part of the metadata or as an additional file) would be "the single source of truth". The checksum utility/API would then calculate the hash of the local file and compare it with the checksum specified by the distribution. The checksum file can be distributed alongside the binaries as .md5 or .sha1 files, or both, as Apache does: http://www-us.apache.org/dist//httpd/binaries/netware/

    The checksum could be provided by Daz in either of two ways:

    1) As part of the metadata, which would require a plugin to be developed for generating the hash and comparing it with the checksum, as you mentioned. OR

    2) As an additional .md5 or .sha1 file for each binary file, which can be read by any freely available checksum tool and applied recursively to all directories and files through batch or shell scripts. The .md5 and .sha1 files simply contain the checksum value (for the respective algorithm) of the original file as it was made available for download. Here are some more examples and links: https://www.openoffice.org/download/checksums.html (A rough scripted sketch of this kind of check follows below.)

    But in either case, the original checksum value has to be provided by Daz in order to avoid any ambiguity in determining the source of truth.
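
    To make option 2 a bit more concrete, here is a rough, untested sketch of how a script could verify a handful of local files against checksums Daz has published, and report mismatches in the spirit of fciv's "List of modified files" (the paths and hash values below are invented placeholders; in practice they would be read from the .md5 files or the metadata):

    // Untested sketch: verify local files against published checksums and report mismatches.
    // All paths and hashes below are placeholders - the real values would come from Daz
    // (.md5 sidecar files or the content metadata).
    var oExpected = {
        "E:/DazLibrary/data/SomeProduct/geometry.dsf" : "7ca527f1f980aea7406271a134b97dfb",
        "E:/DazLibrary/People/SomeProduct/pose.duf" : "79ac8d043dc8739f661c45cc33fc07ac"
    };

    var aModified = [];
    for ( var sPath in oExpected ) {
        var oFile = new DzFile( sPath );
        if ( !oFile.open( DzFile.ReadOnly ) ) {
            aModified.push( sPath + " (could not be read)" );
            continue;
        }
        var baContent = new ByteArray( oFile.read() );
        oFile.close();
        if ( App.createDigest( baContent ) != oExpected[sPath] ) {
            aModified.push( sPath );
        }
    }

    print( "List of modified files:" );
    for ( var i = 0; i < aModified.length; i++ ) {
        print( "  " + aModified[i] );
    }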

    Since Daz would update the published checksum every time a file is updated, this would prevent false alarms, which are otherwise difficult to rule out with simple folder-diffing tools. And did I already mention that a checksum run would also identify bad sectors on your disks (because it actually reads the entire file to calculate the hash)?

    P.S. I would be careful with the term "signature", as it is usually associated with PGP signing, which is not what I am referring to here.

    Post edited by mrinal on
  • mrinal Posts: 641
    edited June 2016

    Can't explain it better than this:

    Download the small FCIV utility from Microsoft (https://support.microsoft.com/en-us/kb/841290) and extract it to get fciv.exe and ReadMe.txt (we need both). Then run:

    fciv.exe -md5 -add ReadMe.txt -xml checksums.xml

    This creates a checksums.xml containing the checksum information for ReadMe.txt.

    I would expect Daz to provide this checksum data for each binary file of the content; that is what I would treat as the source of truth.

    Now modify the ReadMe.txt any way you want. Save the file and then run the command:

    fciv.exe -v -md5 -xml checksums.xml

    This command recalculates the checksum by actually reading the contents of the file, and validates it against the checksum stored in checksums.xml.

    If everything goes right you should see something like:

    List of modified files:
    -----------------------
    readme.txt
            Hash is         : 79ac8d043dc8739f661c45cc33fc07ac
            It should be    : 7ca527f1f980aea7406271a134b97dfb

    This is a clear indication that the file contents have been modified locally (deliberately or accidentally). Had Daz updated the file contents, they would also have released an updated checksum, signifying a genuine modification, and the validation would have passed without issues. This would be difficult with the file comparison tools of backup utilities, which would only say that the two versions are different and leave you to decide which one to retain; that is where the confusion and ambiguity lie.

    Now, don't quote me literally as asking for an XML file. The information contained in the XML file could be provided either through the metadata or via an external .md5 file. Either way is fine, as long as Daz provides those checksums along with their content files.

    Post edited by mrinal on