I released this under GPL some time ago (2003-May), and have been
advertising it occasionally here, where it seemed applicable. I
have received no bug reports.

I have just gotten around to writing a usage manual for it, which
follows. I would like some opinions on it. Please don't quote the
whole thing back at me; a short excerpt followed by your pithy
commentary will do nicely. I am off for an operation Monday,
so I won't be available for a while after that, and you might as
well hold things after Saturday.

HOW TO USE hashlib
==================

<http://cbfalconer.home.att.net/download/hashlib.zip>

To use this easily you should also have a copy of hashlib.h
printed out, or easily available in another editor window. It
describes the complete interface to hashlib, and this is just an
explanation of the functions, why they exist, etc. hashlib.h
is extensively commented.

What is it for?
===============

You may be wondering "for what is hashlib useful". The answer
is that it is a storage facility. You can hand it things, and
it will tuck them away, and make it easy for you to find them
later.

A major point is that the time it takes to store, find, delete,
or retrieve an item is almost constant no matter how big the
table gets. Also, you don't have to worry about the table size,
because it will automatically adapt itself. It may hold 5 items
or millions. The limit is your memory.

What does it do?
================

For a list of the things it will do, you should have the file
"hashlib.h" handy. This details all the things you can do, and
how to customize the system to your data. The interface
functions are:

   hshinit, hshkill               Make or destroy a hashtable
   hshfind, hshinsert, hshdelete  Find, insert, take out items
   hshwalk                        For advanced usage, later
   hshstatus                      Such things as how many stored

Customizing to your data:
=========================

In order to use a table, the first thing you have to do is to
create it with hshinit. At that time you tell hashlib how to
process your data.
I will return to this later.

Your actual data takes some form, which is entirely up to you.
It must be possible to refer to a complete data item by a
single pointer. Your data also will have some sort of key, or
even multiple keys. It can have whatever auxiliary data you
like. This implies you must define a structure somewhere for
your own benefit:

   typedef struct hashitem {
      sometype   yourkey;
      otherstuff yourdata;
   } item, *itemptr;

The field names, structure name, typedef'd names, etc. are
entirely up to you. Somewhere in your program you will have
at least one of these things. hashlib will make more of them
in which to store copies of the data you insert.

Equality
========

Since hashlib works on all forms of data, it obviously can't
read your data description. So you have to tell it how to
find out that two data items have the identical key. This
introduces the type (defined in hashlib.h):

   typedef int (*hshcmpfn)(void *litem, void *ritem);

which is a function you will design and program, and one of
the items you pass in the hshinit call is a pointer to that
function. Let us assume that in the item definition above
sometype is int (such as the typedef below under copying).
Then your comparison function could be:

   int mycmp(void *litem, void *ritem)
   {
      itemptr left  = litem;
      itemptr right = ritem;
      int     lvalue, rvalue;

      lvalue = left->yourkey;
      rvalue = right->yourkey;
      return lvalue == rvalue;
   }

NOTE: I have made this function more complex than it need
be, in order to emphasize how it goes about it.

The left and right pointers come from hashlib, and hashlib
doesn't know about your data type. Therefore it converts them
into the C universal pointer, a "void *". When you get them
back you have to convert them back into itemptr, so you can
access the fields of your data.

All hashlib cares about is "are they equal", so the above
returns only 0 or 1, for notequal and equal. The comparison
routine will be useful for other things if you make it return
-1, 0, or +1 for less, equal, greater.
To do this you could
make the return statement say:

   return (lvalue > rvalue) - (lvalue < rvalue);

which will turn out to be 0-1, 0-0, or 1-0 (that is -1, 0, or +1)
for the three cases. The point is not to return (lvalue - rvalue),
because this can run into overflow and give erroneous results.

Copying
=======

When you pass an item to hashlib you don't want to worry about
who owns the space it takes. Therefore the principle is
"hashlib owns all the items it stores". Thus hashlib makes a
copy of any data item it inserts into the table. Once more,
only you know how to do this, and you have to tell hashlib.

   typedef void *(*hshdupfn)(void *item);

in hashlib.h specifies what this function must look like. For
the simple structure above, all it would have to do is malloc
space for a copy, and copy the fields. Remember it is dealing
with a pointer to data, and the first thing you have to do is
make the item pointer into a pointer to your structure.

Let's make the simple data structure above more concrete:

   typedef struct hashitem {
      int yourkey;
      int yourdata;
   } item, *itemptr;

Then the hshdupfn (notice how the function is defined by
editing the typedef for hshdupfn) could be:

   void *mydupe(void *item)
   {
      itemptr myitem = item;
      itemptr newitem;

      if ((newitem = malloc(sizeof *newitem))) {
         newitem->yourkey  = myitem->yourkey;
         newitem->yourdata = myitem->yourdata;
         /* or "*newitem = *myitem" in this case */
      }
      return newitem;
   }

Notice again that only your code knows what is in the items to
be stored, and thus how to copy them. Your item can be as
complicated as you wish. So let's make it store strings:

   typedef struct hashitem {
      char *yourkey;
      int   yourdata;
   } item, *itemptr;

and see how it affects the hshdupfn. yourkey is now just a
pointer to a string somewhere, which may want to be modified
or used in some manner.
So we have to do what is called a deep
copy.

   void *mydupe(void *item)
   {
      itemptr myitem = item;
      itemptr newitem;

      if ((newitem = malloc(sizeof *newitem))) {
         if ((newitem->yourkey =
                 malloc(1 + strlen(myitem->yourkey)))) {
            strcpy(newitem->yourkey, myitem->yourkey);
            newitem->yourdata = myitem->yourdata;
         }
         else { /* we ran out of memory, release and fail */
            free(newitem);
            newitem = NULL;
         }
      }
      return newitem;
   }

Notice how it returns NULL if malloc fails to secure the
necessary memory anywhere. This allows hashlib to do the
right things under nasty cases, such as exhausting memory.

The need for a deep copy is generally signalled by having
pointers in your data type description. All those pointers have
to be resolved to data that can belong to the hash table.

Letting Go
==========

Once you have thrown a whole mess of data at hashlib, and it is
keeping track, you may decide to release it all. While you
could often just abandon it, and let the operating system clean
up after you when your program ends, this is not a good
practice. Besides, your program may not end. So you have to
tell hashlib how to get rid of one item, which it will use to
get rid of all of them when you use the hshkill function
(described later).

   typedef void (*hshfreefn)(void *item);

in hashlib.h describes that function. Now we will assume the
complex hshdupfn last described above, and the corresponding
type definition for an item. Again, we build the function
header by editing the typedef and converting the passed void *
pointer:

   void myundupe(void *item)
   {
      itemptr myitem = item;

      free(myitem->yourkey);  /* First, because this won't */
      free(myitem);           /* exist after this one.     */
   }

thus returning all the allocated memory. Notice how it undoes
everything that mydupe did.
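
As a sanity check, the dupe/undupe pair can be exercised on its own,
without hashlib at all. What follows is only a sketch under stated
assumptions: demo_roundtrip is a hypothetical helper that is not part
of the hashlib package, and the key "apple" and the value 42 are
arbitrary sample data. It copies an item, clobbers the original key,
and checks that the copy is unaffected before releasing it.

```c
/* Sketch only: mydupe/myundupe exercised without hashlib.       */
/* demo_roundtrip is a hypothetical helper, not part of hashlib. */
#include <stdlib.h>
#include <string.h>

typedef struct hashitem {
   char *yourkey;
   int   yourdata;
} item, *itemptr;

/* Deep copy: the new item owns its own copy of the key string. */
static void *mydupe(void *item)
{
   itemptr myitem = item;
   itemptr newitem;

   if ((newitem = malloc(sizeof *newitem))) {
      if ((newitem->yourkey = malloc(1 + strlen(myitem->yourkey)))) {
         strcpy(newitem->yourkey, myitem->yourkey);
         newitem->yourdata = myitem->yourdata;
      }
      else { /* out of memory: release the partial copy and fail */
         free(newitem);
         newitem = NULL;
      }
   }
   return newitem;
}

/* Undo everything mydupe did, innermost allocation first. */
static void myundupe(void *item)
{
   itemptr myitem = item;

   free(myitem->yourkey);
   free(myitem);
}

/* Returns 1 if the copy is independent of the original, else 0. */
int demo_roundtrip(void)
{
   char    key[] = "apple";   /* arbitrary sample key */
   item    original;
   itemptr copy;
   int     ok;

   original.yourkey  = key;
   original.yourdata = 42;    /* arbitrary sample data */

   copy = mydupe(&original);
   if (!copy) return 0;

   key[0] = 'X';              /* clobber the original key */
   ok = (copy->yourkey != original.yourkey)
     && (0 == strcmp(copy->yourkey, "apple"))
     && (42 == copy->yourdata);

   myundupe(copy);
   return ok;
}
```

If the copy were shallow (just the pointer copied), clobbering the
original key would also change the stored key, which is exactly the
thing a hash table cannot tolerate.
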
The mydupe/myundupe pair could even
open and close files, but you will rarely want to handle
thousands of open files at once.

Hashing
=======

This is fundamental to the efficient operation of a hashtable,
although hashlib can put up with pretty rotten hashing and still
grind out answers (but it may take a long time). What we need
to do is calculate a single unsigned long value from the key.
Designing these functions is basically black magic; therefore
hashlib contains a couple of utility functions usable for
hashing strings. There are also examples of hashing integers
in the hashtest.c program, along with some references to the
subject of creating hash functions.

Because of the efficient way hashlib handles overflows (it
basically just corrects them) it is necessary to have two
hash functions. The type they must match is:

   typedef unsigned long (*hshfn)(void *item);

for reference, which we edit again. For the above item type
with strings we get:

   unsigned long myhash(void *item)
   {
      itemptr myitem = item;  /* getting used to this? */

      return hshstrhash(myitem->yourkey);
   }

and we need two such functions, so:

   unsigned long myrehash(void *item)
   {
      itemptr myitem = item;  /* getting used to this? */

      return hshstrehash(myitem->yourkey);
   }

which basically differ only in their names and in the
convenience hash function they call.

Now we have finally customized the system to our own data
format. We will tell hashlib about these functions when
we create a hashtable with hshinit.

Using hashlib
=============

First, we need some way to refer to the table. So we must
have a data item of type hshtbl* to hold it. We will initialize
that by calling hshinit. This is much like opening a file.
For
convenience here is the prototype for hshinit again:

   /* initialize and return a pointer to the data base */
   hshtbl *hshinit(hshfn hash, hshfn rehash,
                   hshcmpfn cmp,
                   hshdupfn dupe, hshfreefn undupe,
                   int hdebug);

Now the following is a fragment from your code:

   hshtbl *mytable;

   /* initialize and return a pointer to the data base */
   mytable = hshinit(myhash, myrehash,
                     mycmp,
                     mydupe, myundupe,
                     0);

which tells hashlib all about the customizing functions you have
created. Note that all those functions can be static, unless
you have other uses for them outside your source file. You can
use those functions yourself as you please.

Don't forget the final 0 in the call to hshinit. That parameter
provides for future extensions and debugging abilities, and
passing a zero here will maintain compatibility.

You can create more than one hash table if you desire. If they
handle the same data format you can just do exactly the same
call as above, except you will need a new variable of type
hshtbl* to hold the table identification. If they don't hold
the same data type you can supply different functions to
hshinit. It is up to you.

   hshtbl *mysecondtable;

   mysecondtable = hshinit(....);  /* as before */

These tables will live until you exterminate them. Meanwhile
you can store, find, delete, etc. items from the table. You
destroy the table by calling hshkill with the pointer that
hshinit returned.

   hshkill(mytable);  /* all gone */

but until that is done, let's use the functions:

Inserting (storing) data:
=========================

From here on I am assuming you have opened the hash table with
mytable = hshinit(...), and that you have defined your data
with:

   typedef struct hashitem {
      char *yourkey;
      int   yourdata;
   } item, *itemptr;

Surprise, you store data by calling hshinsert.
Here is the
prototype, for reference:

   void *hshinsert(hshtbl *master, void *item);

and you call it with a pointer to the table in which to insert
the item, and a pointer to the item to insert.

You may have a variable of type item (after all, you know what
it is, even if hashlib does not). So the critical items are:

   hshtbl *mytable;
   item    myitem;
   item   *something;

You will put the data you want into myitem, filling its fields
as needed. Then you call:

   something = hshinsert(mytable, &myitem);

If, after this, 'something' is NULL, the insertion failed
(probably because you ran out of memory). Otherwise 'something'
points to the piece of memory owned by hashlib which stores a
copy of myitem. You can use 'something' to modify the stored
copy, but you MUST NOT do anything that would change the value
of the key, and thus change what a hshfn such as myhash or
myrehash returns when passed that item. NEVER EVER do that.

One thing you might want to do is have a field in an item that
holds a count. You could have the dupe function zero this
field, so that you know how it is initialized. Then, when
hshinsert returns an itemptr you can use that to increment
that field. That way you can keep track of how many times a
given key has been inserted.

NOTE: If hshinsert finds an item already stored, it simply
returns a pointer to that storage. It does not use the dupe
function to make another copy.

Finding a data item by the key:
===============================

Again we have the same variables as above for insertion. We
simply call:

   something = hshfind(mytable, &myitem);

and if 'something' is NULL the item is not present, otherwise
it is a pointer to the memory holding it. The same cautions
as for hshinsert hold, i.e. you MUST NOT do anything that
affects the key and thus the hash functions. Being present
means only that 'something' and &myitem have identical keys, as
defined by the mycmp() function.

Deleting stored items:
======================

Again, we have the same variables.
Surprise, the calling format
is the same:

   something = hshdelete(mytable, &myitem);

but now there is a significant difference. The hash table no
longer owns the memory that stored that item, you do. So you
have to do something with it, assuming it isn't NULL (meaning
that the value in myitem was never stored in the table). What
you do is up to you, but sooner or later you should release
it by:

   myundupe(something);

which you designed specifically for this purpose.

Other abilities
===============

I plan to add information about walking the entire contents of
the table, and performing operations on each stored item. There
are illustrations of these operations in the demonstration
applications (markov and wdfreq) in the hashlib package.

-- 
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


CBFalconer wrote:
> I released this under GPL some time ago, (2003-May) and have been
> advertising it occasionally here, where it seemed applicable. I
> have received no bug reports.

No comments for the manual, but for the hashlib - constructive, as I
hope. The most important thing first: No bugs found so far. ;)

Your hashlib is "generously" commented. Unfortunately, there are a lot
of comments like this one:

   master->hstatus.probes++; /* count total probes */

while the more important comments are completely missing: comments on
the algorithm and design decisions. Here are some ideas for the
missing comments.

The general element insertion looks like this:

   +--------------------------+
   | I   x   x   D   x   N    |
   +--------------------------+

Array, size is a prime.
I is the initial search position denoted by the first hash value.
The step size for the search, if the I position is not NULL, is denoted
by the second hash value. Step is added with wrap-around.
x are arbitrary elements.
D is an entry marked as DELETED.
N is an empty entry (NULL).

- Why is it important that the array size is a prime?
  (mathematically guaranteed to cover all array entries for *any*
  step size below the array size, except 0)
- Why is it important to mark deleted entries as DELETED instead of
  simply resetting them to NULL? (breaking the search chains)
- Why are DELETED entries skipped instead of being reused? (possible
  by memorizing the first DELETED/NULL entry, nonetheless completely
  searching the current chain to the end, then inserting on the
  memorized position)
- Why is a second hash function used for the step size, instead of
  calculating it from the first value, e.g. instead of

     h2 = master->rehash(item) % (master->currentsz >> 3) + 1;

  calculating it by

     h2 = ((h >> 13) | (h << 19)) % (master->currentsz >> 3) + 1;

  If the first hash function is weak (but fast), then there will
  probably be a lot of collisions, making it necessary to call the
  second (better and slower) hash function anyway. So why not use one
  single, strong hash function right from the beginning? Was this an
  arbitrary choice, or backed up by literature or profiling?
- How were the thresholds chosen? Arbitrarily (for easy binary
  calculation), or by literature or profiling?

     #define TTHRESH(sz) (sz - (sz >> 3))

     if (master->hstatus.hdeleted > (master->hstatus.hentries / 4))

Holger


In article <41***************@yahoo.com>,
CBFalconer <cb********@worldnet.att.net> wrote:
> I released this under GPL some time ago, (2003-May) and have been
> advertising it occasionally here, where it seemed applicable. I
> have received no bug reports.
>
> I have just gotten around to writing a usage manual for it, which
> follows.
> I would like some opinions on it.

It would also be helpful to have an abbreviated reference manual, with
a list of what functions you define, what functions the user needs to
define, what they do, and requirements on the input and output, but
without the verbose commentary.
(I haven't looked at the package, only the usage manual you posted, so
if you already have this I can safely be ignored, at least on this point.)

F'rexample, this is nice when you're not familiar with the library:

> Inserting (storing) data:
> =========================
>
> From here on I am assuming you have opened the hash table with
> mytable = hshinit(...), and that you have defined your data
> with:
>
>    typedef struct hashitem {
>       char *yourkey;
>       int   yourdata;
>    } item, *itemptr;
>
> Surprise, you store data by calling hshinsert. Here is the
> prototype, for reference:
>
>    void *hshinsert(hshtbl *master, void *item);
>
> and you call it with a pointer to the table in which to insert
> the item, and a pointer to the item to insert.
>
> You may have a variable of type item (after all, you know what
> it is, even if hashlib does not). So the critical items are:
>
>    hshtbl *mytable;
>    item    myitem;
>    item   *something;
>
> You will put the data you want into myitem, filling its fields
> as needed. Then you call:
>
>    something = hshinsert(mytable, &myitem);
>
> If, after this, 'something' is NULL, the insertion failed
> (probably because you ran out of memory). Otherwise 'something'
> points to the piece of memory owned by hashlib which stores a
> copy of myitem. You can use 'something' to modify the stored
> copy, but you MUST NOT do anything that would change the value
> of the key, and thus change what a hshfn such as myhash or
> myrehash returns when passed that item. NEVER EVER do that.
>
> One thing you might want to do is have a field in an item that
> holds a count. You could have the dupe function zero this
> field, so that you know how it is initialized. Then, when
> hshinsert returns an itemptr you can use that to increment
> that field.
> That way you can keep track of how many times a
> given key has been inserted.
>
> NOTE: If hshinsert finds an item already stored, it simply
> returns a pointer to that storage. It does not use the dupe
> function to make another copy.

But once you've got a working knowledge of it, it's a lot easier to go
to this to refresh your memory for, say, "What's the significance of
the return value for that one?":

}Inserting (storing) data
}------------------------
}void *hshinsert(hshtbl *, void *);
}handleptr=hshinsert(mytable, &myitem);
}
}Inserts item myitem into the table referenced by mytable.
}Returns:
}Pointer to newly allocated (with user-specified dupe function) internal
} item storage if the item was not already in the table
}Pointer to already-allocated internal item storage if the item was
} already in the table
}NULL on failure (most likely out-of-memory)
}
}The handle pointer may be used to modify non-key data in the item.
}It MUST NOT be used to modify key data.

dave

-- 
Dave Vandervies dj******@csclub.uwaterloo.ca
I'm ashamed to admit it, but I actually thought of this possibility when
writing the code shown. That means I've been hanging around comp.lang.c
*much* too long. --Eric Sosman in comp.lang.c


Dave Vandervies wrote:
> CBFalconer <cb********@worldnet.att.net> wrote:
>> I released this under GPL some time ago, (2003-May) and have been
>> advertising it occasionally here, where it seemed applicable. I
>> have received no bug reports.
>>
>> I have just gotten around to writing a usage manual for it, which
>> follows. I would like some opinions on it.
>
> It would also be helpful to have an abbreviated reference manual, with
> a list of what functions you define, what functions the user needs to
> define, what they do, and requirements on the input and output, but
> without the verbose commentary.

But I thought that was just what I was doing!
These things are all
detailed in the .h file and here I was trying to tie them together
as a logical entity, so it could be used intelligently.

-- 
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!