aboutsummaryrefslogtreecommitdiffhomepage
path: root/clib/unicode.mli
diff options
context:
space:
mode:
authorGravatar Emilio Jesus Gallego Arias <e+git@x80.org>2017-12-15 18:51:45 +0100
committerGravatar Emilio Jesus Gallego Arias <e+git@x80.org>2017-12-23 19:20:30 +0100
commit5ffa147bd2fe548df3ac9053fe497d0871a5f6df (patch)
treecc62882184c34e33e2995a5a4ff4ebfcbd0defe0 /clib/unicode.mli
parentdea75d74c222c25f6aa6c38506ac7a51b339e9c6 (diff)
[lib] Split auxiliary libraries into Coq-specific and general.
Up to this point the `lib` directory contained two different library archives, `clib.cma` and `lib.cma`, which a rough splitting between Coq-specific libraries and general-purpose ones. We know split the directory in two, as to make the distinction clear: - `clib`: contains libraries that are not Coq specific and implement common data structures and programming patterns. These libraries could be eventually replace with external dependencies and the rest of the code base wouldn't notice much. - `lib`: contains Coq-specific common libraries in widespread use along the codebase, but that are not considered part of other components. Examples are printing, error handling, or flags. In some cases we have coupling due to utility files depending on Coq specific flags, however this commit doesn't modify any files, but only moves them around, further cleanup is welcome, as indeed a few files in `lib` should likely be placed in `clib`. Also note that `Deque` is not used ATM.
Diffstat (limited to 'clib/unicode.mli')
-rw-r--r--clib/unicode.mli58
1 files changed, 58 insertions, 0 deletions
diff --git a/clib/unicode.mli b/clib/unicode.mli
new file mode 100644
index 000000000..32ffbb8e9
--- /dev/null
+++ b/clib/unicode.mli
@@ -0,0 +1,58 @@
+(************************************************************************)
+(* v * The Coq Proof Assistant / The Coq Development Team *)
+(* <O___,, * INRIA - CNRS - LIX - LRI - PPS - Copyright 1999-2017 *)
+(* \VV/ **************************************************************)
+(* // * This file is distributed under the terms of the *)
+(* * GNU Lesser General Public License Version 2.1 *)
+(************************************************************************)
+
+(** Unicode utilities *)
+
+type status
+
+(** Classify a unicode char into 3 classes or unknown. *)
+val classify : int -> status
+
+(** Return [None] if a given string can be used as a (Coq) identifier.
+ Return [Some (b,s)] otherwise, where [s] is an explanation and [b] is severity. *)
+val ident_refutation : string -> (bool * string) option
+
+(** Tells if a valid initial character for an identifier *)
+val is_valid_ident_initial : status -> bool
+
+(** Tells if a valid non-initial character for an identifier *)
+val is_valid_ident_trailing : status -> bool
+
+(** Tells if a character is unclassified *)
+val is_unknown : status -> bool
+
+(** First char of a string, converted to lowercase
+ @raise Assert_failure if the input string is empty. *)
+val lowercase_first_char : string -> string
+
+(** Split a string supposed to be an ident at the first letter;
+ as an optimization, return None if the first character is a letter *)
+val split_at_first_letter : string -> (string * string) option
+
+(** Return [true] if all UTF-8 characters in the input string are just plain
+ ASCII characters. Returns [false] otherwise. *)
+val is_basic_ascii : string -> bool
+
+(** [ascii_of_ident s] maps UTF-8 string to a string composed solely from ASCII
+ characters. The non-ASCII characters are translated to ["_UUxxxx_"] where
+ {i xxxx} is the Unicode index of the character in hexadecimal (from four
+ to six hex digits). To avoid potential name clashes, any preexisting
+ substring ["_UU"] is turned into ["_UUU"]. *)
+val ascii_of_ident : string -> string
+
+(** Validate an UTF-8 string *)
+val is_utf8 : string -> bool
+
+(** Return the length of a valid UTF-8 string. *)
+val utf8_length : string -> int
+
+(** Variant of {!String.sub} for UTF-8 strings. *)
+val utf8_sub : string -> int -> int -> string
+
+(** Return a "%XX"-escaped string if it contains non UTF-8 characters. *)
+val escaped_if_non_utf8 : string -> string