Ignore:
Timestamp:
2000-07-13T10:21:53+12:00 (24 years ago)
Author:
sjboddie
Message:

merged changes to trunk into New_Config_Format branch

File:
1 edited

Legend:

Unmodified
Added
Removed
  • branches/New_Config_Format-branch/gsdl/src/hashfile/hashfile.cpp

    r915 r1279  
    143143
    144144
     145
     146// create a hash string from the contents of a file.
     147//
     148// The file is treated as a large base 256 number (each char is a digit),
     149// and the hash value is the remainder when this number is divided by a
     150// very large prime.
     151//
     152// PROBLEM: This is a flawed hash function because the rightmost (highest)
     153// value in remainder is more likely to be a "1" than it is to be any other
     154// number. 
     155//
     156// EVIDENCE: About 50% of the files in any GSDL directory have a hash code
     157// that starts with 01* which implies (based on the my_convert_num function)
     158// that the rightmost "digit" of the remainder = z such that (z % 16 = 0)
     159// and (z / 16 = 1) which means (z = 1).
     160//
     161// MAJOR REASON: suppose our prime number was 19.  Then, if we have a
     162// reasonably random distribution of numbers for which we are going to
     163// calculate (N % 19), we expect to get a roughly uniform distribution
     164// of remainders where the possible values are 0, 1, 2, 3... 18.  The
     165// problem is that 9 out of the 19 possible values (10,11... 18) start
     166// with the digit 1.  Thus our hascode will start with "01".
     167//
     168// ANOTHER PROBLEM: Characters in the file are read one at a time; after each
     169// one is read it is prepended to the remainder, then the remainder is
     170// recalculated on the string thus far seen.  I am sure the math here
     171// is wrong - if I try calculating (111 mod 7) by the same algorithm, it
     172// simply does nor work.
     173//
     174// ANOTHER POSSIBLE PROLEM: Each character from the file is read into
     175// remainder at the most significant end, and when that character
     176// is a zero you get a number like "01" which would be considered
     177// larger than a number like "8" because it is longer (two  digits
     178// instead of 1). 
     179//
     180// These comments added by Gordon Paynter ([email protected]) in
     181// June 2000.  I didn't write any code, however.
     182
    145183char *hashfile (char *filename) {
    146184  FILE *infile = (FILE *)NULL;
    147185  int i;
    148186
    149   // calculate the 8 multiples of the prime number to use
    150   // in the long division
    151187  number primepow[8];
    152188  number pow;
     
    163199  pow.len = 12;
    164200
     201  // calculate 8 multiples of the prime number.
     202  // These are used to find the remainder using only subtraction operations
    165203  for (i=0; i<8; i++) {
    166204    primepow[i] = pow;
     
    168206  }
    169207
     208  // The "remainder" after division by the prime.  Our result.
     209  remainder.len = 0;
     210
     211  // open the file
    170212  infile = fopen (filename, "rb");
    171213  if (infile == NULL) {
    172214    return (char *)NULL;
    173215  }
     216  c = (unsigned char)fgetc(infile);
    174217 
    175   remainder.len = 0;
    176   c = (unsigned char)fgetc(infile);
    177218  while (!feof (infile)) {
    178     // remainder = remainder * 256 + c
     219
     220    // make sure the remainder has not grown too large
    179221    if (remainder.len == MAXNUMLEN-1) {
    180222      fprintf (stderr, "ERROR - number overflow\n");
    181223      return (char *)NULL;
    182224    }
     225
     226    // remainder = remainder * 256 + c
    183227    for (i=remainder.len; i>0; i--) {
    184228      remainder.num[i] = remainder.num[i-1];
    185229    }
    186230    remainder.num[0] = c;
    187     if (remainder.len > 0 || c != 0) remainder.len = remainder.len+1;
     231    if (remainder.len > 0 || c != 0) remainder.len = remainder.len+1;
    188232   
     233    // remainder = (remainder % large-prime-number)
    189234    for (i=7; i>=0; i--) {
    190235      my_ifpos_dec (remainder, primepow[i]);
    191236    }
    192237
     238    // read a new character from the file
    193239    c = (unsigned char)fgetc(infile);
    194240  }
Note: See TracChangeset for help on using the changeset viewer.