1 files changed, 367 insertions, 0 deletions
diff --git a/pcre2-10.20/ChangeLog b/pcre2-10.20/ChangeLog
new file mode 100644
index 00000000..196e9532
--- /dev/null
+++ b/pcre2-10.20/ChangeLog
@@ -0,0 +1,367 @@
+Change Log for PCRE2
+--------------------
+
+Version 10.20 30-June-2015
+--------------------------
+
+1. Callouts with string arguments have been added.
+
+2. Assertion code generator in JIT has been optimized.
+
+3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
+pcre2_compile() function read past the end of the input before diagnosing an
+error. This bug was discovered by the LLVM fuzzer.
+
+4. Implemented pcre2_callout_enumerate().
+
+5. Fix JIT compilation of conditional blocks whose assertion is converted to
+(*FAIL). E.g: /(?(?!))/.
+
+6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
+discovered by the LLVM fuzzer.
+
+7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
+when this assertion was used as a condition, for example (?(?!)a|b). In
+pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
+error about an unsupported item.
+
+8. For some types of pattern, for example /Z*(|d*){216}/, the auto-
+possessification code could take exponential time to complete. A recursion
+depth limit of 1000 has been imposed to limit the resources used by this
+optimization. This infelicity was discovered by the LLVM fuzzer.
+
+9. A pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
+such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored
+because \S ensures they are all in the class. The code for doing this was
+interacting badly with the code for computing the amount of space needed to
+compile the pattern, leading to a buffer overflow. This bug was discovered by
+the LLVM fuzzer.
+
+10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
+other kinds of group caused stack overflow at compile time. This bug was
+discovered by the LLVM fuzzer.
+
+11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
+between a subroutine call and its quantifier was incorrectly compiled, leading
+to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.
+
+12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
+assertion after (?(. The code was failing to check the character after (?(?<
+for the ! or = that would indicate a lookbehind assertion. This bug was
+discovered by the LLVM fuzzer.
+
+13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
+a fixed maximum following a group that contains a subroutine reference was
+incorrectly compiled and could trigger buffer overflow. This bug was discovered
+by the LLVM fuzzer.
+
+14. Negative relative recursive references such as (?-7) to non-existent
+subpatterns were not being diagnosed and could lead to unpredictable behaviour.
+This bug was discovered by the LLVM fuzzer.
+
+15. The bug fixed in 14 was due to an integer variable that was unsigned when
+it should have been signed. Some other "int" variables, having been checked,
+have either been changed to uint32_t or commented as "must be signed".
+
+16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
+caused a stack overflow instead of the diagnosis of a non-fixed length
+lookbehind assertion. This bug was discovered by the LLVM fuzzer.
+
+17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
+(e.g. /(?<=\Ka)/) could make pcre2grep loop.
+
+18. There was a similar problem to 17 in pcre2test for global matches, though
+the code there did catch the loop.
+
+19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
+and a subsequent item in the pattern caused a non-match, backtracking over the
+repeated \X did not stop, but carried on past the start of the subject, causing
+reference to random memory and/or a segfault. There were also some other cases
+where backtracking after \C could crash. This set of bugs was discovered by the
+LLVM fuzzer.
+
+20. The function for finding the minimum length of a matching string could take
+a very long time if mutual recursion was present many times in a pattern, for
+example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
+been implemented. This infelicity was discovered by the LLVM fuzzer.
+
+21. Implemented PCRE2_NEVER_BACKSLASH_C.
+
+22. The feature for string replication in pcre2test could read from freed
+memory if the replication required a buffer to be extended, and it was not
+working properly in 16-bit and 32-bit modes. This issue was discovered by a
+fuzzer: see http://lcamtuf.coredump.cx/afl/.
+
+23. Added the PCRE2_ALT_CIRCUMFLEX option.
+
+24. Adjust the treatment of \8 and \9 to be the same as the current Perl
+behaviour.
+
+25. Static linking against the PCRE2 library using the pkg-config module was
+failing on missing pthread symbols.
+
+26. If a group that contained a recursive back reference also contained a
+forward reference subroutine call followed by a non-forward-reference
+subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
+compile correct code, leading to undefined behaviour or an internally detected
+error. This bug was discovered by the LLVM fuzzer.
+
+27. Quantification of certain items (e.g. atomic back references) could cause
+incorrect code to be compiled when recursive forward references were involved.
+For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
+discovered by the LLVM fuzzer.
+
+28. A repeated conditional group whose condition was a reference by name caused
+a buffer overflow if there was more than one group with the given name. This
+bug was discovered by the LLVM fuzzer.
+
+29. A recursive back reference by name within a group that had the same name as
+another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
+This bug was discovered by the LLVM fuzzer.
+
+30. A forward reference by name to a group whose number is the same as the
+current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
+buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.
+
+31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
+as an int; fixed by writing it as 1u).
+
+32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
+a warning for "fileno" unless -std=gnu99 us used.
+
+33. A lookbehind assertion within a set of mutually recursive subpatterns could
+provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
+
+34. Give an error for an empty subpattern name such as (?'').
+
+35. Make pcre2test give an error if a pattern that follows #forbud_utf contains
+\P, \p, or \X.
+
+36. The way named subpatterns are handled has been refactored. There is now a
+pre-pass over the regex which does nothing other than identify named
+subpatterns and count the total captures. This means that information about
+named patterns is known before the rest of the compile. In particular, it means
+that forward references can be checked as they are encountered. Previously, the
+code for handling forward references was contorted and led to several errors in
+computing the memory requirements for some patterns, leading to buffer
+overflows.
+
+37. There was no check for integer overflow in subroutine calls such as (?123).
+
+38. The table entry for \l in EBCDIC environments was incorrect, leading to its
+being treated as a literal 'l' instead of causing an error.
+
+39. If a non-capturing group containing a conditional group that could match
+an empty string was repeated, it was not identified as matching an empty string
+itself. For example: /^(?:(?(1)x|)+)+$()/.
+
+40. In an EBCDIC environment, pcretest was mishandling the escape sequences
+\a and \e in test subject lines.
+
+41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
+instead of the EBCDIC value.
+
+42. The handling of \c in an EBCDIC environment has been revised so that it is
+now compatible with the specification in Perl's perlebcdic page.
+
+43. Single character repetition in JIT has been improved. 20-30% speedup
+was achieved on certain patterns.
+
+44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
+ASCII/Unicode. This has now been added to the list of characters that are
+recognized as white space in EBCDIC.
+
+45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
+an error (correctly) when used outside a class, but did not give an error
+within a class.
+
+46. \h within a class was incorrectly compiled in EBCDIC environments.
+
+47. JIT should return with error when the compiled pattern requires
+more stack space than the maximum.
+
+48. Fixed a memory leak in pcre2grep when a locale is set.
+
+
+Version 10.10 06-March-2015
+---------------------------
+
+1. When a pattern is compiled, it remembers the highest back reference so that
+when matching, if the ovector is too small, extra memory can be obtained to
+use instead. A conditional subpattern whose condition is a check on a capture
+having happened, such as, for example in the pattern /^(?:(a)|b)(?(1)A|B)/, is
+another kind of back reference, but it was not setting the highest
+backreference number. This mattered only if pcre2_match() was called with an
+ovector that was too small to hold the capture, and there was no other kind of
+back reference (a situation which is probably quite rare). The effect of the
+bug was that the condition was always treated as FALSE when the capture could
+not be consulted, leading to a incorrect behaviour by pcre2_match(). This bug
+has been fixed.
+
+2. Functions for serialization and deserialization of sets of compiled patterns
+have been added.
+
+3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
+excess code units at the end of the data block that may occasionally occur if
+the code for calculating the size over-estimates. This change stops the
+serialization code copying uninitialized data, to which valgrind objects. The
+documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
+include the general overhead. This has been corrected.
+
+4. All code units in every slot in the table of group names are now set, again
+in order to avoid accessing uninitialized data when serializing.
+
+5. The (*NO_JIT) feature is implemented.
+
+6. If a bug that caused pcre2_compile() to use more memory than allocated was
+triggered when using valgrind, the code in (3) above passed a stupidly large
+value to valgrind. This caused a crash instead of an "internal error" return.
+
+7. A reference to a duplicated named group (either a back reference or a test
+for being set in a conditional) that occurred in a part of the pattern where
+PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
+to be incorrectly calculated, leading to overwriting.
+
+8. A mutually recursive set of back references such as (\2)(\1) caused a
+segfault at compile time (while trying to find the minimum matching length).
+The infinite loop is now broken (with the minimum length unset, that is, zero).
+
+9. If an assertion that was used as a condition was quantified with a minimum
+of zero, matching went wrong. In particular, if the whole group had unlimited
+repetition and could match an empty string, a segfault was likely. The pattern
+(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
+quantified, but not if they are being used as conditions, so the above pattern
+is faulted by Perl. PCRE2 has now been changed so that it also rejects such
+patterns.
+
+10. The error message for an invalid quantifier has been changed from "nothing
+to repeat" to "quantifier does not follow a repeatable item".
+
+11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
+scanning the compiled pattern in subsequent auto-possessification can get out
+of step and lead to an unknown opcode. Previously this could have caused an
+infinite loop. Now it generates an "internal error" error. This is a tidyup,
+not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
+undefined outcome.
+
+12. A UTF pattern containing a "not" match of a non-ASCII character and a
+subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.
+
+13. The locale test (RunTest 3) has been upgraded. It now checks that a locale
+that is found in the output of "locale -a" can actually be set by pcre2test
+before it is accepted. Previously, in an environment where a locale was listed
+but would not set (an example does exist), the test would "pass" without
+actually doing anything. Also the fr_CA locale has been added to the list of
+locales that can be used.
+
+14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
+capturing group number without parentheses, the last character was incorrectly
+literally included at the end of the replacement string.
+
+15. A possessive capturing group such as (a)*+ with a minimum repeat of zero
+failed to allow the zero-repeat case if pcre2_match() was called with an
+ovector too small to capture the group.
+
+16. Improved error message in pcre2test when setting the stack size (-S) fails.
+
+17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
+transfer from PCRE1, meaning that CMake configuration failed if "build tests"
+was selected. (2) The file src/pcre2_serialize.c had not been added to the list
+of PCRE2 sources, which caused a failure to build pcre2test.
+
+18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
+only on Windows.
+
+19. Use binary input when reading back saved serialized patterns in pcre2test.
+
+20. Added RunTest.bat for running the tests under Windows.
+
+21. "make distclean" was not removing config.h, a file that may be created for
+use with CMake.
+
+22. A pattern such as "((?2){0,1999}())?", which has a group containing a
+forward reference repeated a large (but limited) number of times within a
+repeated outer group that has a zero minimum quantifier, caused incorrect code
+to be compiled, leading to the error "internal error: previously-checked
+referenced subpattern not found" when an incorrect memory address was read.
+This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
+FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)
+
+23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
+call within a group that also contained a recursive back reference caused
+incorrect code to be compiled. This bug was reported as "heap overflow",
+discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
+CVE-2015-2326 was given to this.)
+
+24. Computing the size of the JIT read-only data in advance has been a source
+of various issues, and new ones are still appear unfortunately. To fix
+existing and future issues, size computation is eliminated from the code,
+and replaced by on-demand memory allocation.
+
+25. A pattern such as /(?i)[A-`]/, where characters in the other case are
+adjacent to the end of the range, and the range contained characters with more
+than one other case, caused incorrect behaviour when compiled in UTF mode. In
+that example, the range a-j was left out of the class.
+
+
+Version 10.00 05-January-2015
+-----------------------------
+
+Version 10.00 is the first release of PCRE2, a revised API for the PCRE
+library. Changes prior to 10.00 are logged in the ChangeLog file for the old
+API, up to item 20 for release 8.36.
+
+The code of the library was heavily revised as part of the new API
+implementation. Details of each and every modification were not individually
+logged. In addition to the API changes, the following changes were made. They
+are either new functionality, or bug fixes and other noticeable changes of
+behaviour that were implemented after the code had been forked.
+
+1. Including Unicode support at build time is now enabled by default, but it
+can optionally be disabled. It is not enabled by default at run time (no
+change).
+
+2. The test program, now called pcre2test, was re-specified and almost
+completely re-written. Its input is not compatible with input for pcretest.
+
+3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
+PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
+matched by that pattern.
+
+4. For the benefit of those who use PCRE2 via some other application, that is,
+not writing the function calls themselves, it is possible to check the PCRE2
+version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
+string such as "yesno".
+
+5. There are case-equivalent Unicode characters whose encodings use different
+numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
+theoretically possible for this to happen in UTF-16 too.) If a backreference to
+a group containing one of these characters was greedily repeated, and during
+the match a backtrack occurred, the subject might be backtracked by the wrong
+number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
+(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
+capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
+Incorrect backtracking meant that group 2 captured only the last two bytes.
+This bug has been fixed; the new code is slower, but it is used only when the
+strings matched by the repetition are not all the same length.
+
+6. A pattern such as /()a/ was not setting the "first character must be 'a'"
+information. This applied to any pattern with a group that matched no
+characters, for example: /(?:(?=.)|(?<!x))a/.
+
+7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
+those parentheses to be closed with whatever has been captured so far. However,
+it was failing to mark any other groups between the highest capture so far and
+the currrent group as "unset". Thus, the ovector for those groups contained
+whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
+matched against "abcd".
+
+8. The pcre2_substitute() function has been implemented.
+
+9. If an assertion used as a condition was quantified with a minimum of zero
+(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
+occur.
+
+10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.
+
+****