author | Emilio Jesus Gallego Arias <e+git@x80.org> | 2017-12-15 18:51:45 +0100 |
---|---|---|
committer | Emilio Jesus Gallego Arias <e+git@x80.org> | 2017-12-23 19:20:30 +0100 |
commit | 5ffa147bd2fe548df3ac9053fe497d0871a5f6df | |
tree | cc62882184c34e33e2995a5a4ff4ebfcbd0defe0 /clib/unicode.mli | |
parent | dea75d74c222c25f6aa6c38506ac7a51b339e9c6 | |
[lib] Split auxiliary libraries into Coq-specific and general.

Up to this point the `lib` directory contained two different library
archives, `clib.cma` and `lib.cma`, with a rough split between
Coq-specific libraries and general-purpose ones.

We now split the directory in two, so as to make the distinction clear:
- `clib`: contains libraries that are not Coq-specific and implement
  common data structures and programming patterns. These libraries
  could eventually be replaced with external dependencies and the rest
  of the code base wouldn't notice much.
- `lib`: contains Coq-specific common libraries in widespread use
  throughout the codebase, but that are not considered part of other
  components. Examples are printing, error handling, or flags.

In some cases we have coupling due to utility files depending on
Coq-specific flags. However, this commit doesn't modify any files; it only
moves them around. Further cleanup is welcome, as indeed a few files
in `lib` should likely be placed in `clib`.

Also note that `Deque` is not used at the moment.

Diffstat (limited to 'clib/unicode.mli')
-rw-r--r-- | clib/unicode.mli | 58 |
1 files changed, 58 insertions, 0 deletions
diff --git a/clib/unicode.mli b/clib/unicode.mli
new file mode 100644
index 000000000..32ffbb8e9
--- /dev/null
+++ b/clib/unicode.mli
@@ -0,0 +1,58 @@
+(************************************************************************)
+(*  v      *   The Coq Proof Assistant  /  The Coq Development Team     *)
+(* <O___,, *   INRIA - CNRS - LIX - LRI - PPS - Copyright 1999-2017     *)
+(*   \VV/  **************************************************************)
+(*    //   *      This file is distributed under the terms of the       *)
+(*         *       GNU Lesser General Public License Version 2.1        *)
+(************************************************************************)
+
+(** Unicode utilities *)
+
+type status
+
+(** Classify a unicode char into 3 classes or unknown. *)
+val classify : int -> status
+
+(** Return [None] if a given string can be used as a (Coq) identifier.
+    Return [Some (b,s)] otherwise, where [s] is an explanation and [b] is severity. *)
+val ident_refutation : string -> (bool * string) option
+
+(** Tells if a valid initial character for an identifier *)
+val is_valid_ident_initial : status -> bool
+
+(** Tells if a valid non-initial character for an identifier *)
+val is_valid_ident_trailing : status -> bool
+
+(** Tells if a character is unclassified *)
+val is_unknown : status -> bool
+
+(** First char of a string, converted to lowercase
+    @raise Assert_failure if the input string is empty. *)
+val lowercase_first_char : string -> string
+
+(** Split a string supposed to be an ident at the first letter;
+    as an optimization, return None if the first character is a letter *)
+val split_at_first_letter : string -> (string * string) option
+
+(** Return [true] if all UTF-8 characters in the input string are just plain
+    ASCII characters. Returns [false] otherwise. *)
+val is_basic_ascii : string -> bool
+
+(** [ascii_of_ident s] maps UTF-8 string to a string composed solely from ASCII
+    characters. The non-ASCII characters are translated to ["_UUxxxx_"] where
+    {i xxxx} is the Unicode index of the character in hexadecimal (from four
+    to six hex digits). To avoid potential name clashes, any preexisting
+    substring ["_UU"] is turned into ["_UUU"]. *)
+val ascii_of_ident : string -> string
+
+(** Validate an UTF-8 string *)
+val is_utf8 : string -> bool
+
+(** Return the length of a valid UTF-8 string. *)
+val utf8_length : string -> int
+
+(** Variant of {!String.sub} for UTF-8 strings. *)
+val utf8_sub : string -> int -> int -> string
+
+(** Return a "%XX"-escaped string if it contains non UTF-8 characters. *)
+val escaped_if_non_utf8 : string -> string
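
The escaping scheme documented for `ascii_of_ident` can be illustrated with a small standalone sketch. This is not the code in `clib/unicode.ml`: the names `is_basic_ascii_sketch` and `ascii_of_ident_sketch` are hypothetical, the lowercase hex digits are an assumption (the interface only says "hexadecimal"), and the handling of malformed UTF-8 is an arbitrary choice made here. It assumes OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* Illustrative sketch only; not the implementation shipped in clib.
   Requires OCaml >= 4.14 (String.get_utf_8_uchar, String.for_all). *)

(* A UTF-8 string is plain ASCII exactly when every byte is below 0x80. *)
let is_basic_ascii_sketch (s : string) : bool =
  String.for_all (fun c -> Char.code c < 0x80) s

(* Map a UTF-8 string to ASCII:
   - a preexisting "_UU" becomes "_UUU", so escapes cannot clash;
   - each non-ASCII character becomes "_UUxxxx_", with its code point
     printed in hexadecimal, padded to at least four digits.
   Assumption: hex digits are lowercase. Malformed byte sequences decode
   to U+FFFD here, which is a choice of this sketch. *)
let ascii_of_ident_sketch (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if i + 3 <= n && String.sub s i 3 = "_UU" then begin
        Buffer.add_string buf "_UUU";
        go (i + 3)
      end else begin
        let d = String.get_utf_8_uchar s i in
        let code = Uchar.to_int (Uchar.utf_decode_uchar d) in
        if code < 0x80 then Buffer.add_char buf s.[i]
        else Printf.bprintf buf "_UU%04x_" code;
        (* Advance by the number of bytes the decoder consumed. *)
        go (i + Uchar.utf_decode_length d)
      end
  in
  go 0;
  Buffer.contents buf
```

With these definitions, `ascii_of_ident_sketch "\xce\xb1"` (the UTF-8 encoding of U+03B1, Greek alpha) yields `"_UU03b1_"`, and `ascii_of_ident_sketch "a_UUb"` yields `"a_UUUb"`. Escaping the literal `"_UU"` prefix first is what keeps an already-ASCII identifier from being confused with an encoded one.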