This section describes the algorithms in the Util directory of the SML/NJ library.
The utility library provides a functional polymorphic merge-sort for lists. It is in the ListMergeSort structure of the list-mergesort.sml source file. Here is the signature.
```sml
signature LIST_SORT =
  sig
    val sort : ('a * 'a -> bool) -> 'a list -> 'a list
      (* (sort gt l) sorts the list l in ascending order using the
       * ``greater-than'' relationship defined by gt.
       *)

    val uniqueSort : ('a * 'a -> order) -> 'a list -> 'a list
      (* uniquesort produces an increasing list, removing equal
       * elements
       *)

    val sorted : ('a * 'a -> bool) -> 'a list -> bool
      (* (sorted gt l) returns true if the list is sorted in ascending
       * order under the ``greater-than'' predicate gt.
       *)
  end
```
Here is an example of their use.
```sml
val strings = [
    "fred", "wilma", "barney", "betty", "wilma", "pebbles", "bambam"
    ]

fun sort_strings (data: string list) =
(
    ListMergeSort.sort (op >) data
)

fun unique_strings (data: string list) =
(
    ListMergeSort.uniqueSort String.compare data
)
```
The expression (op >) has an ambiguous overloaded type. SML resolves overloading at compile time, so a function cannot be polymorphic over every type that has a greater-than operator. In sort_strings the type annotation on data fixes the type as string, so the string comparison is chosen. I could have been more explicit and written String.> for the operator.
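The unique_strings function sorts the list and removes duplicate strings from it. Instead of a greater-than predicate, uniqueSort takes a three-way comparison function. The compare functions for the basic types, such as String.compare, return a value of type order: LESS, EQUAL or GREATER.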
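Here is a small sketch of the three LIST_SORT functions side by side. It relies only on the signature above and the Basis operators String.>, Int.< and String.compare; the name list is made-up sample data. Passing Int.< as the "greater-than" argument shows that the predicate only defines the order, so reversing it gives a descending sort.

```sml
(* Made-up sample data. *)
val names = ["wilma", "fred", "betty", "fred", "barney"]

(* Ascending order: String.> is the "greater-than" predicate. *)
val ascending = ListMergeSort.sort String.> names

(* Using Int.< as the "greater-than" predicate reverses the order. *)
val descending = ListMergeSort.sort Int.< [3, 1, 4, 1, 5]

(* uniqueSort takes a three-way comparison and drops duplicates. *)
val unique = ListMergeSort.uniqueSort String.compare names

(* sorted checks the order under the same kind of predicate as sort. *)
val isSorted = ListMergeSort.sorted String.> ascending
```

Here ascending keeps both copies of "fred", while unique keeps only one.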
If you are working with arrays, there is a quick-sort implementation in ArrayQSort with the following signature. It sorts an array in place, so it is imperative. It takes a three-way comparison function such as the String.compare mentioned above.
```sml
signature ARRAY_SORT =
  sig
    type 'a array
    val sort : ('a * 'a -> order) -> 'a array -> unit
    val sorted : ('a * 'a -> order) -> 'a array -> bool
  end
```
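A minimal sketch of ArrayQSort on an array of ints. It assumes that ArrayQSort's 'a array is the ordinary Basis 'a Array.array, which the Array.fromList call depends on. The sample values are arbitrary.

```sml
(* Arbitrary sample values in a Basis polymorphic array. *)
val arr = Array.fromList [5, 3, 9, 1, 7]

(* sort returns unit: the array itself is rearranged. *)
val () = ArrayQSort.sort Int.compare arr

(* Copy the elements back into a list to inspect them. *)
val contents = Array.foldr (op ::) [] arr

val ok = ArrayQSort.sorted Int.compare arr
```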
A more general version, for any monomorphic array type that matches the MONO_ARRAY signature, is provided by the ArrayQSortFn functor.
```sml
functor ArrayQSortFn (A : MONO_ARRAY) : MONO_ARRAY_SORT

signature MONO_ARRAY_SORT =
  sig
    structure A : MONO_ARRAY
    val sort : (A.elem * A.elem -> order) -> A.array -> unit
    val sorted : (A.elem * A.elem -> order) -> A.array -> bool
  end
```
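As a sketch of instantiating the functor: the Basis CharArray structure matches MONO_ARRAY, so it can be passed straight in. The name CharArraySort is just an illustrative choice.

```sml
(* CharArray from the Basis library matches MONO_ARRAY. *)
structure CharArraySort = ArrayQSortFn(CharArray)

val chars = CharArray.fromList (String.explode "sorting")
val () = CharArraySort.sort Char.compare chars

(* CharArray.vector copies the contents out; a CharVector.vector is a string. *)
val result = CharArray.vector chars
```

After sorting, result holds the letters of "sorting" in alphabetical order.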
Once you've sorted your array you can do a binary search on it.
```sml
functor BSearchFn (A : MONO_ARRAY) : sig
    structure A : MONO_ARRAY

    (* binary search on ordered monomorphic arrays. The comparison function
     * cmp embeds a projection function from the element type to the key
     * type.
     *)
    val bsearch : (('a * A.elem) -> order)
          -> ('a * A.array) -> (int * A.elem) option
  end
```
The first argument to bsearch is the comparison function. It compares the key you are searching for with an element of the array. For example, the array element might be a pair and you want to compare the key against the first component of the pair. The second argument is a pair of the key to search for and the array, which must already be sorted in the order the comparison function expects. If a matching element is found, bsearch returns SOME of its index and value; otherwise it returns NONE. Here is an example of table lookup using binary search. The table is "static", i.e. it is built and sorted before being saved to the heap.
```sml
local
    datatype Gender = Male | Female

    type Pair = string * Gender

    (* Compare two pairs. *)
    fun compare ((n1, _), (n2, _)) = String.compare(n1, n2)

    structure PairArray = MonoArrayFn(type elem = Pair)
    structure Searcher  = BSearchFn(PairArray)
    structure Sorter    = ArrayQSortFn(PairArray)

    val gender = [
        ("fred",    Male),
        ("wilma",   Female),
        ("barney",  Male),
        ("betty",   Female),
        ("wilma",   Female),
        ("pebbles", Female),
        ("bambam",  Male)
        ]

    val sorted_gender = PairArray.fromList gender
    val _ = Sorter.sort compare sorted_gender
in
    fun find_gender name =
    let
        (* Compare a key with a pair *)
        fun cmp (key, (n, _)) = String.compare(key, n)
    in
        case Searcher.bsearch cmp (name, sorted_gender) of
          NONE => NONE
        | SOME (_, (_, g)) => SOME g
    end

    fun show_gender Male   = "male"
    |   show_gender Female = "female"
end
```
Since BSearchFn needs a MONO_ARRAY, I have to be consistent and sort the array with ArrayQSortFn too. For example, the following replacement for the definition of sorted_gender does not compile.
```sml
val sorted_gender = Array.fromList gender
val _ = ArrayQSort.sort compare sorted_gender
```

```
Error: case object and rules don't agree [tycon mismatch]
  operator domain: string * PairArray.array
  operand:         string * (string * Gender) array
  in expression:
    (Searcher.bsearch cmp) (name,sorted_gender)
```
The problem is that PairArray.array is an opaque type whose implementation is hidden, whereas ArrayQSort.sort works on the polymorphic array type instantiated at Pair, that is (string * Gender) array. As far as the compiler can tell, these are two different types.
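For comparison, here is a sketch where the consistency requirement is met with a Basis structure instead of MonoArrayFn: IntArray is passed to both ArrayQSortFn and BSearchFn, so the sorter and the searcher agree on the array type. The names IntSort, IntSearch, primes and cmp are illustrative.

```sml
(* One MONO_ARRAY structure for both functors keeps the array types equal. *)
structure IntSort   = ArrayQSortFn(IntArray)
structure IntSearch = BSearchFn(IntArray)

val primes = IntArray.fromList [13, 2, 7, 11, 3, 5]
val () = IntSort.sort Int.compare primes

(* Keys and elements are both ints, so no projection is needed. *)
fun cmp (key : int, elem : int) = Int.compare (key, elem)

(* SOME (index, element) when found, NONE otherwise. *)
val found   = IntSearch.bsearch cmp (7, primes)
val missing = IntSearch.bsearch cmp (8, primes)
```

After sorting, the array is [2, 3, 5, 7, 11, 13], so 7 is found at index 3.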
The utility library provides a format function which emulates the C sprintf function. It appears in the Format structure of the format.sml source file. Here is the signature.
```sml
signature FORMAT =
  sig
    datatype fmt_item
      = ATOM of Atom.atom
      | LINT of LargeInt.int
      | INT of Int.int
      | LWORD of LargeWord.word
      | WORD of Word.word
      | WORD8 of Word8.word
      | BOOL of bool
      | CHR of char
      | STR of string
      | REAL of Real.real
      | LREAL of LargeReal.real
      | LEFT of (int * fmt_item)    (* left justify in field *)
      | RIGHT of (int * fmt_item)   (* right justify in field *)

    exception BadFormat             (* bad format string *)
    exception BadFmtList            (* raised on type mismatch *)

    val format  : string -> fmt_item list -> string
    val formatf : string -> (string -> unit) -> fmt_item list -> unit
  end
```
The first argument to the format function is a printf-style format string. In place of the C varargs mechanism, the values to be printed must be wrapped in the fmt_item datatype. The formatf function passes the string to its second argument as the string is being formed, so making that argument the TextIO.print function prints the output directly.
Format specifiers have the form

```
% <flags> <width> <prec> <type>
```

(written without the white space), or "%%" for a literal percent character. The flags are listed in Table 5-1; you can give more than one flag. The width is a decimal integer. The precision, written as a period followed by a decimal integer as in "%.4f", is only allowed for real number formats.
Table 5-1. Format flags.

| Flag | Meaning |
|------|---------|
| " " | A blank character means put a blank in the sign field for positive numbers. Negative signs appear as usual. |
| + | Put a plus sign for positive numbers. |
| - | Put a minus sign for negative numbers. |
| ~ | Put a tilde for negative numbers. This includes any exponent. |
| # | Include a base indicator: "0" for octal numbers and "0x" for hexadecimal numbers. |
| 0 | Pad the number with zeros on the left. |
The type characters are listed in Table 5-2.
Table 5-2. Format types.

| Type | Meaning |
|------|---------|
| d | Signed decimal. |
| X | Uppercase hexadecimal. |
| x | Lowercase hexadecimal. |
| o | Octal. |
| c | Char. |
| s | String. |
| b | Bool. |
| E | Scientific notation with an uppercase exponent. |
| e | Scientific notation with a lowercase exponent. |
| f | Floating point. |
| G | Automatic choice of E or f. |
| g | Automatic choice of e or f. |
Here is a simple example.
```sml
structure F = Format

fun test_format() =
(
    F.formatf "A decimal %d, some hex %#08x and some real %.4f\n" print
        [F.INT ~23, F.WORD 0wxbeef, F.REAL 3.14159265]
)
```
It produces this output.
```
A decimal -23, some hex 0x00beef and some real 3.1416
```
Note that the 0x prefix counts towards the width of the hexadecimal field, but that's the way C's printf works too.
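Here is a sketch of a few more specifiers, using Format.format, which returns the string instead of printing it. The padding and sign behaviour follows the flag descriptions in Table 5-1, and the mismatch case relies on the BadFmtList exception from the signature; the sample values are arbitrary.

```sml
structure F = Format

(* Width 5: padded with blanks on the left, or with zeros using the 0 flag. *)
val padded = F.format "[%5d] [%05d]" [F.INT 42, F.INT 42]

(* The default minus sign versus the ~ flag. *)
val minus = F.format "%d" [F.INT ~5]
val tilde = F.format "%~d" [F.INT ~5]

(* An item whose type doesn't match its specifier raises BadFmtList. *)
val mismatch = (ignore (F.format "%d" [F.STR "oops"]); false)
               handle F.BadFmtList => true
```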
To go with the format function there are scanning functions in the Scan structure. Here is the signature.
```sml
signature SCAN =
  sig
    datatype fmt_item
      = ATOM of Atom.atom
      | LINT of LargeInt.int
      | INT of Int.int
      | LWORD of LargeWord.word
      | WORD of Word.word
      | WORD8 of Word8.word
      | BOOL of bool
      | CHR of char
      | STR of string
      | REAL of Real.real
      | LREAL of LargeReal.real
      | LEFT of (int * fmt_item)    (* left justify in field *)
      | RIGHT of (int * fmt_item)   (* right justify in field *)

    exception BadFormat             (* bad format string *)

    val sscanf : string -> string -> fmt_item list option
    val scanf  : string -> (char, 'a) StringCvt.reader
                        -> (fmt_item list, 'a) StringCvt.reader
  end
```
Although it is not obvious, the fmt_item type in the Scan structure is the same one as in the Format structure, not just a different type with the same name, so you can use them interchangeably. In the current implementation, flags and field widths in the format string are ignored.
The scanf function is designed to work with the StringCvt scanning infrastructure (see the section called Text Scanning in Chapter 3). The sscanf function is just defined as scanf applied to strings using StringCvt.scanString. If the return value is NONE then the scan failed. Here is a simple function to test sscanf.
```sml
structure S = Scan

fun test_scan() : unit =
let
    val items = valOf(S.sscanf "%d %s %f" "123 abc 3.45")
    val display = ListFormat.fmt {
        init  = "[",
        sep   = " ",
        final = "]",
        fmt   = show_item
        } items
in
    print display;
    print "\n"
end

and show_item (S.INT n)  = Int.toString n
|   show_item (S.STR s)  = s
|   show_item (S.REAL r) = Real.toString r
|   show_item _          = "unknown"
```
This example also demonstrates a use of the utilities in the ListFormat structure. See the list-format-sig.sml source file for more details.
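Because the two fmt_item types are the same, the result of Scan.sscanf can be handed straight to Format.format. This sketch round-trips a made-up input string and also shows the NONE result for input that does not match the format.

```sml
(* Scan two items from a made-up input string. *)
val scanned = Scan.sscanf "%d %s" "42 apples"

(* The scanned items are Format.fmt_item values, so they format directly. *)
val reformatted =
    case scanned of
        SOME items => Format.format "%d %s" items
      | NONE => "scan failed"

(* NONE means the input did not match the format. *)
val failed = Scan.sscanf "%d" "abc"
```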
Here is a demonstration of scanf. It will continue to read until it finds three integers separated by white space, even over several lines. Any other input will result in failure.
```sml
fun test_scan_io() =
let
    val _ = print "Enter 3 integers\n"
in
    case TextIO.scanStream (S.scanf "%d %d %d") TextIO.stdIn of
      SOME items =>
        (
            print "got ";
            print (ListFormat.listToString show_item items);
            print "\n"
        )
    | NONE => print "The reading failed\n"
end
```
The recommended random number generator is Random in the random.sml source file. According to the blurb in the source file it uses a subtract-with-borrow (SWB) generator as described in Marsaglia and Zaman, "A New Class of Random Number Generators," Ann. Applied Prob. 1(3), 1991, pp. 462-480. Here is an extract from the signature.
```sml
signature RANDOM =
  sig
    type rand
      (* the internal state of a random number generator *)

    val rand : (int * int) -> rand
      (* create rand from initial seed *)

    val toString : rand -> string
    val fromString : string -> rand

    val randInt : rand -> int
      (* generate ints uniformly in [minInt,maxInt] *)
```
A generator has the type rand. You can create as many generators as you like. A generator is updated imperatively by functions like randInt. The toString and fromString functions would be useful to save the state of a generator in a file.
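A short sketch of saving and restoring a generator with toString and fromString, using only the functions in the extract above. It assumes, as the description suggests, that a generator rebuilt from a saved string continues with exactly the same sequence. The seed (17, 42) is arbitrary.

```sml
val gen = Random.rand (17, 42)

(* Save the state before drawing any numbers. *)
val saved = Random.toString gen

(* randInt updates gen imperatively. *)
val first  = Random.randInt gen
val second = Random.randInt gen

(* A generator rebuilt from the saved state replays the same values. *)
val gen'    = Random.fromString saved
val first'  = Random.randInt gen'
val second' = Random.randInt gen'
```

In a real program the saved string would be written to a file and read back later.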
The IOUtil structure in io-util.sml contains some functions which perform a work function with the standard input or output redirected to a file. They match some utility functions available in the Scheme language. The functions in the PathUtil structure in path-util.sml search for files in lists of directories in the Unix PATH format.
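A sketch of redirecting output with IOUtil. The signature is not shown in this section, so withOutputFile : string * ('a -> 'b) -> 'a -> 'b is an assumption here, mirroring Scheme's with-output-to-file: the work function runs with TextIO.stdOut sent to the named file. The temporary file name comes from the Basis function OS.FileSys.tmpName.

```sml
(* Assumed: IOUtil.withOutputFile : string * ('a -> 'b) -> 'a -> 'b *)
val file = OS.FileSys.tmpName ()

(* Everything the work function prints goes to the file. *)
val () = IOUtil.withOutputFile (file, fn () => print "hello, file\n") ()

(* Read the file back with ordinary TextIO calls, then remove it. *)
val contents =
    let
      val ins = TextIO.openIn file
    in
      TextIO.inputAll ins before TextIO.closeIn ins
    end

val () = OS.FileSys.remove file
```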
The Iterate structure in iterate.sml provides some simple functions for looping by performing a function multiple times. It includes a generic "for" loop, in case you're hankering for one.
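A minimal sketch of Iterate, assuming the signature iterate : ('a -> 'a) -> int -> 'a -> 'a, where iterate f n x applies f to x n times. The example functions are arbitrary.

```sml
(* Assumed: Iterate.iterate : ('a -> 'a) -> int -> 'a -> 'a *)

(* Doubling 1 ten times. *)
val powerOfTwo = Iterate.iterate (fn x => 2 * x) 10 1

(* The same idea on strings: append "=" five times. *)
val banner = Iterate.iterate (fn s => s ^ "=") 5 ""
```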
The TimeLimit structure in time-limit.sml provides a function to perform a work function and interrupt it if it runs for too long. It uses the SML/NJ interval timer facility (see the section called The Interval Timer in Chapter 4) which uses the SIGALRM signal.
Don't use TimeLimit with CML

TimeLimit uses SIGALRM, which is the same signal that the CML library uses for pre-emption, so it will break CML's pre-emptive scheduling of threads.
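A sketch of TimeLimit, assuming the interface exception TimeOut and timeLimit : Time.time -> ('a -> 'b) -> 'a -> 'b. The runWithLimit wrapper and the spin loop are hypothetical helpers for illustration. The loop allocates on every iteration to give the runtime regular points at which it can deliver the timer signal. As the warning above says, this is not for CML programs.

```sml
(* Assumed: exception TimeLimit.TimeOut and
   TimeLimit.timeLimit : Time.time -> ('a -> 'b) -> 'a -> 'b *)

(* Hypothetical endless job; List.rev allocates a fresh list each time. *)
fun spin (xs : int list) : int = spin (List.rev xs)

(* Hypothetical wrapper: SOME result, or NONE if the limit expires. *)
fun runWithLimit limit f x =
    SOME (TimeLimit.timeLimit limit f x) handle TimeLimit.TimeOut => NONE

(* A quick computation finishes within the limit. *)
val quick = runWithLimit (Time.fromSeconds 2) (fn x => x + 1) 41

(* The endless loop is interrupted after about 200 milliseconds. *)
val slow : int option = runWithLimit (Time.fromMilliseconds 200) spin [1, 2, 3]
```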
The remaining few structures in the utility library are oriented towards compiler writers. The GraphSCCFn functor in graph-scc.sml is a strongly-connected components algorithm for finding cycles in directed graphs. The "uref" source files provide special-purpose reference types that look like they would be useful for type-checking algorithms in compilers. The ParserComb structure in parser-comb.sml provides some utility functions for hand-written recursive-descent parsers but ML-Lex and ML-Yacc would probably be easier to use.
Avoid the IntInf structure in int-inf.sml which implements arbitrary precision integers. A more polished implementation is part of the Basis library (see the section called Integers in Chapter 3).