Merge new string builtin

This adds the new builtin 'string' which supports various string manipulation and matching algorithms, including PCRE based regular expressions. Fixes #2296 Squashed commit of the following: commit 4c3eaeb6e57d76463e9683c327142b0aeafb92b8 Author: ridiculousfish <corydoras@ridiculousfish.com> Date: Sat Sep 12 12:51:30 2015 -0700 Remove testdata and doc dirs from pcre2 source commit b2a8b4b50f2398b204fb72cfe4b5ba77ece2e1ab Merge: 11c8a47 7974aab Author: ridiculousfish <corydoras@ridiculousfish.com> Date: Sat Sep 12 12:32:40 2015 -0700 Merge branch 'string' of git://github.com/msteed/fish-shell into string-test commit 7974aab6d367f999f1140ab34c2535cef5cf3b00 Author: Michael Steed <msteed@saltstack.com> Date: Fri Sep 11 13:00:02 2015 -0600 build pcre2 lib only, no docs commit eb20b43d2d96b7e6d24618158ce71078de83c40b Merge: 1a09e70 5f519cb Author: Michael Steed <msteed68@gmail.com> Date: Thu Sep 10 20:00:47 2015 -0600 Merge branch 'string' of github.com:msteed/fish-shell into string commit 1a09e709d028393c9e9e6dc9a84278f399a15f3d Author: Michael Steed <msteed68@gmail.com> Date: Thu Sep 10 19:58:24 2015 -0600 rebase on master & address the fallout commit a0ec9772cd1a0a548a501a7633be05dab4e5ee46 Author: Michael Steed <msteed68@gmail.com> Date: Thu Sep 10 19:26:45 2015 -0600 use fish's wildcard_match() for glob matching commit 64c25a01e3f7234f220ba13545cf658a7492b1a4 Author: Michael Steed <msteed68@gmail.com> Date: Thu Aug 27 08:19:23 2015 -0600 some fixes from review - string_get_arg_stdin(): simplify and don't discard the argument when the trailing newline is absent - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b' - correct test for args coming from stdin commit ece7f35ec5f4093763627d68d671b6c0c876896d Author: Michael Steed <msteed68@gmail.com> Date: Sat Aug 22 19:35:56 2015 -0600 fixes from review - Makefile.in: restore iwyu target - regex_replacer_t::replace_matches(): correct size passed to realloc() commit 9ff7477a926c4572e26171cab3cd42f8086be678 Author: Michael Steed <msteed68@gmail.com> Date: Thu Aug 20 13:08:33 2015 -0600 Minor doc improvements commit baf4e096b22dde3063b85b833795eb570d660ba7 Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 19 18:29:02 2015 -0600 another attempt to fix the ci build commit 896a2c2b279a419747bea26102229fbe84534a6f Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 19 18:03:49 2015 -0600 Updates after review comments - make match/replace without -a operate on the first match on each argument - use different exit codes for "no operation performed" and errors, as grep does - refactor regex compile code - use human-friendly error messages from pcre2 - improve error handling & reporting elsewhere - add a few tests - make some doc fixes - some simplification & cleanup - fix ci build failure (I hope) commit efd47dcbda2ca247d58bee56a7774cd75a1062fd Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 12 00:26:07 2015 -0600 fix dependencies for parallel make commit ed0850e2db467362066a3d94e3ececd17c1756cd Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 23:37:22 2015 -0600 Add missing pcre2 files + .gitignore commit 9492e7a7e929c03554336be1ddf80ca6b37f53c5 Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 22:44:05 2015 -0600 add pcre2-10.20 and update license.hdr commit 1a60b933718feb20c0bf7c9e257b8e495014ea1b Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 22:41:19 2015 -0600 add string builtin files - string builtin source, tests, & docs - changes to configure.ac & Makefile.in commit 5f519cb2a2c05213e0a88a7add7af288bc1c1352 Author: Michael Steed <msteed68@gmail.com> Date: Thu Sep 10 19:26:45 2015 -0600 use fish's wildcard_match() for glob matching commit 2ecd24f79500879e2de5bdf1b4c19dd44fc6ac85 Author: Michael Steed <msteed68@gmail.com> Date: Thu Aug 27 08:19:23 2015 -0600 some fixes from review - string_get_arg_stdin(): simplify and don't discard the argument when the trailing newline is absent - fix calls to pcre2 for e.g. string match -r -a 'a*' 'b' - correct test for args coming from stdin commit 45b777e4dc85c05cd4a186f4bdcae543c21aaf08 Author: Michael Steed <msteed68@gmail.com> Date: Sat Aug 22 19:35:56 2015 -0600 fixes from review - Makefile.in: restore iwyu target - regex_replacer_t::replace_matches(): correct size passed to realloc() commit 981cbb6ddf742a5fe8881af916e7b870b7e6422a Author: Michael Steed <msteed68@gmail.com> Date: Thu Aug 20 13:08:33 2015 -0600 Minor doc improvements commit ddb6a2a8fdb6aa31aad41e80d5481bb32c6ed8ff Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 19 18:29:02 2015 -0600 another attempt to fix the ci build commit 1e34e3191b028162863d263e9868052f75194aa5 Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 19 18:03:49 2015 -0600 Updates after review comments - make match/replace without -a operate on the first match on each argument - use different exit codes for "no operation performed" and errors, as grep does - refactor regex compile code - use human-friendly error messages from pcre2 - improve error handling & reporting elsewhere - add a few tests - make some doc fixes - some simplification & cleanup - fix ci build failure (I hope) commit 34232e152df17a3cfbf0a094dd51d148a4f04e6f Author: Michael Steed <msteed68@gmail.com> Date: Wed Aug 12 00:26:07 2015 -0600 fix dependencies for parallel make commit 00d7e781697f53454beb91c1d0fc4b2d28d6e034 Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 23:37:22 2015 -0600 Add missing pcre2 files + .gitignore commit 4498aa5f576e09634f7f619443e74d2f33c108e4 Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 22:44:05 2015 -0600 add pcre2-10.20 and update license.hdr commit 290c58c72e22db644ccf6fa9088051644980ed0a Author: Michael Steed <msteed68@gmail.com> Date: Tue Aug 11 22:41:19 2015 -0600 add string builtin files - string builtin source, tests, & docs - changes to configure.ac & Makefile.in
author: Michael Steed <msteed@saltstack.com> 2015-09-12 12:59:40 -0700
committer: ridiculousfish <corydoras@ridiculousfish.com> 2015-09-21 16:41:25 -0700
commit: d83ef07ca76c03852366e4e810053edc19796761 (patch)
tree: 93f671c5fe577128fd14a1f013fff764e5c5abba /src
parent: e70ed961eab80dab41ebfda7b91d80d9ef041be7 (diff)
5 files changed, 1667 insertions, 1 deletions
diff --git a/src/builtin.cpp b/src/builtin.cpp
index bc1b18de..a55916f7 100644
--- a/src/builtin.cpp
+++ b/src/builtin.cpp
@@ -399,6 +399,7 @@ static void builtin_missing_argument(parser_t &parser, const wchar_t *cmd, const
 #include "builtin_jobs.cpp"
 #include "builtin_set_color.cpp"
 #include "builtin_printf.cpp"
+#include "builtin_string.cpp"
 
 /* builtin_test lives in builtin_test.cpp */
 int builtin_test(parser_t &parser, wchar_t **argv);
@@ -4123,6 +4124,7 @@ static const builtin_data_t builtin_datas[]=
     { 		L"set_color",  &builtin_set_color, N_(L"Set the terminal color")   },
     { 		L"source",  &builtin_source, N_(L"Evaluate contents of file")   },
     { 		L"status",  &builtin_status, N_(L"Return status information about fish")  },
+    { 		L"string",  &builtin_string, N_(L"Manipulate strings")  },
     { 		L"switch",  &builtin_generic, N_(L"Conditionally execute a block of commands")   },
     { 		L"test",  &builtin_test, N_(L"Test a condition")   },
     { 		L"true",  &builtin_true, N_(L"Return a successful result") },
diff --git a/src/builtin_string.cpp b/src/builtin_string.cpp
new file mode 100644
index 00000000..6a53e548
--- /dev/null
+++ b/src/builtin_string.cpp
@@ -0,0 +1,1375 @@
+/** \file builtin_string.cpp
+  Implementation of the string builtin.
+*/
+
+#define PCRE2_CODE_UNIT_WIDTH WCHAR_T_BITS
+#ifdef _WIN32
+#define PCRE2_STATIC
+#endif
+#include "pcre2.h"
+
+#include "wildcard.h"
+
+#define MAX_REPLACE_SIZE    size_t(1048576)  // pcre2_substitute maximum output size in wchar_t
+#define STRING_ERR_MISSING  _(L"%ls: Expected argument\n")
+
+enum
+{
+    BUILTIN_STRING_OK = 0,
+    BUILTIN_STRING_NONE = 1,
+    BUILTIN_STRING_ERROR = 2
+};
+
+static void string_error(const wchar_t *fmt, ...)
+{
+    va_list va;
+    va_start(va, fmt);
+    wcstring errstr = vformat_string(fmt, va);
+    va_end(va);
+
+    stderr_buffer += L"string ";
+    stderr_buffer += errstr;
+}
+
+static void string_unknown_option(parser_t &parser, const wchar_t *subcmd, const wchar_t *opt)
+{
+    string_error(BUILTIN_ERR_UNKNOWN, subcmd, opt);
+    builtin_print_help(parser, L"string", stderr_buffer);
+}
+
+static bool string_args_from_stdin()
+{
+    return builtin_stdin != STDIN_FILENO || !isatty(builtin_stdin);
+}
+
+static const wchar_t *string_get_arg_stdin()
+{
+    static wcstring warg;
+
+    std::string arg;
+    for (;;)
+    {
+        char ch = '\0';
+        int rc = read_blocked(builtin_stdin, &ch, 1);
+
+        if (rc < 0)
+        {
+            // failure
+            return 0;
+        }
+
+        if (rc == 0)
+        {
+            // eof
+            if (arg.empty())
+            {
+                return 0;
+            }
+            else
+            {
+                break;
+            }
+        }
+
+        if (ch == '\n')
+        {
+            break;
+        }
+
+        arg += ch;
+    }
+
+    warg = str2wcstring(arg.c_str(), arg.size());
+    return warg.c_str();
+}
+
+static const wchar_t *string_get_arg_argv(int *argidx, wchar_t **argv)
+{
+    return (argv && argv[*argidx]) ? argv[(*argidx)++] : 0;
+}
+
+static const wchar_t *string_get_arg(int *argidx, wchar_t **argv)
+{
+    if (string_args_from_stdin())
+    {
+        return string_get_arg_stdin();
+    }
+    else
+    {
+        return string_get_arg_argv(argidx, argv);
+    }
+}
+
+static int string_escape(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L"n";
+    const struct woption long_options[] =
+    {
+        { L"no-quoted", no_argument, 0, 'n' },
+        { 0, 0, 0, 0 }
+    };
+
+    escape_flags_t flags = ESCAPE_ALL;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'n':
+                flags |= ESCAPE_NO_QUOTED;
+                break;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    int nesc = 0;
+    const wchar_t *arg;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        stdout_buffer += escape(arg, flags);
+        stdout_buffer += L'\n';
+        nesc++;
+    }
+
+    return (nesc > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+static int string_join(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L"q";
+    const struct woption long_options[] =
+    {
+        { L"quiet", no_argument, 0, 'q'},
+        { 0, 0, 0, 0 }
+    };
+
+    bool quiet = false;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'q':
+                quiet = true;
+                break;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    const wchar_t *sep;
+    if ((sep = string_get_arg_argv(&i, argv)) == 0)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    int nargs = 0;
+    const wchar_t *arg;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        if (!quiet)
+        {
+            stdout_buffer += arg;
+            stdout_buffer += sep;
+        }
+        nargs++;
+    }
+    if (nargs > 0 && !quiet)
+    {
+        stdout_buffer.resize(stdout_buffer.length() - wcslen(sep));
+        stdout_buffer += L'\n';
+    }
+
+    return (nargs > 1) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+static int string_length(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L"q";
+    const struct woption long_options[] =
+    {
+        { L"quiet", no_argument, 0, 'q'},
+        { 0, 0, 0, 0 }
+    };
+
+    bool quiet = false;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'q':
+                quiet = true;
+                break;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    const wchar_t *arg;
+    int nnonempty = 0;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        size_t n = wcslen(arg);
+        if (n > 0)
+        {
+            nnonempty++;
+        }
+        if (!quiet)
+        {
+            stdout_buffer += to_string(int(n));
+            stdout_buffer += L'\n';
+        }
+    }
+
+    return (nnonempty > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+struct match_options_t
+{
+    bool all;
+    bool ignore_case;
+    bool index;
+    bool quiet;
+
+    match_options_t(): all(false), ignore_case(false), index(false), quiet(false) { }
+};
+
+class string_matcher_t
+{
+protected:
+    match_options_t opts;
+    int total_matched;
+
+public:
+    string_matcher_t(const match_options_t &opts_)
+        : opts(opts_), total_matched(0)
+    { }
+
+    virtual ~string_matcher_t() { }
+    virtual bool report_matches(const wchar_t *arg) = 0;
+    int match_count() { return total_matched; }
+};
+
+class wildcard_matcher_t: public string_matcher_t
+{
+    wcstring wcpattern;
+
+public:
+    wildcard_matcher_t(const wchar_t * /*argv0*/, const wchar_t *pattern, const match_options_t &opts)
+        : string_matcher_t(opts)
+    {
+        wcpattern = parse_util_unescape_wildcards(pattern);
+
+        if (opts.ignore_case)
+        {
+            for (int i = 0; i < wcpattern.length(); i++)
+            {
+                wcpattern[i] = towlower(wcpattern[i]);
+            }
+        }
+    }
+
+    virtual ~wildcard_matcher_t() { }
+
+    bool report_matches(const wchar_t *arg)
+    {
+        // Note: --all is a no-op for glob matching since the pattern is always
+        // matched against the entire argument
+        bool match;
+        if (opts.ignore_case)
+        {
+            wcstring s = arg;
+            for (int i = 0; i < s.length(); i++)
+            {
+                s[i] = towlower(s[i]);
+            }
+            match = wildcard_match(s, wcpattern, false);
+        }
+        else
+        {
+            match = wildcard_match(arg, wcpattern, false);
+        }
+        if (match)
+        {
+            total_matched++;
+        }
+        if (!opts.quiet)
+        {
+            if (match)
+            {
+                if (opts.index)
+                {
+                    stdout_buffer += L"1 ";
+                    stdout_buffer += to_string(wcslen(arg));
+                    stdout_buffer += L'\n';
+                }
+                else
+                {
+                    stdout_buffer += arg;
+                    stdout_buffer += L'\n';
+                }
+            }
+        }
+        return true;
+    }
+};
+
+static const wchar_t *pcre2_strerror(int err_code)
+{
+    static wchar_t buf[128];
+    pcre2_get_error_message(err_code, (PCRE2_UCHAR *)buf, sizeof(buf) / sizeof(wchar_t));
+    return buf;
+}
+
+struct compiled_regex_t
+{
+    pcre2_code *code;
+    pcre2_match_data *match;
+
+    compiled_regex_t(const wchar_t *argv0, const wchar_t *pattern, bool ignore_case)
+        : code(0), match(0)
+    {
+        // Disable some sequences that can lead to security problems
+        uint32_t options = PCRE2_NEVER_UTF;
+#if PCRE2_CODE_UNIT_WIDTH < 32
+        options |= PCRE2_NEVER_BACKSLASH_C;
+#endif
+
+        int err_code = 0;
+        PCRE2_SIZE err_offset = 0;
+
+        code = pcre2_compile(
+            PCRE2_SPTR(pattern),
+            PCRE2_ZERO_TERMINATED,
+            options | (ignore_case ? PCRE2_CASELESS : 0),
+            &err_code,
+            &err_offset,
+            0);
+        if (code == 0)
+        {
+            string_error(_(L"%ls: Regular expression compile error: %ls\n"),
+                    argv0, pcre2_strerror(err_code));
+            string_error(L"%ls: %ls\n", argv0, pattern);
+            string_error(L"%ls: %*ls\n", argv0, err_offset, L"^");
+            return;
+        }
+
+        match = pcre2_match_data_create_from_pattern(code, 0);
+        if (match == 0)
+        {
+            DIE_MEM();
+        }
+    }
+
+    ~compiled_regex_t()
+    {
+        if (match != 0)
+        {
+            pcre2_match_data_free(match);
+        }
+        if (code != 0)
+        {
+            pcre2_code_free(code);
+        }
+    }
+};
+
+class pcre2_matcher_t: public string_matcher_t
+{
+    const wchar_t *argv0;
+    compiled_regex_t regex;
+
+    int report_match(const wchar_t *arg, int pcre2_rc)
+    {
+        // Return values: -1 = error, 0 = no match, 1 = match
+        if (pcre2_rc == PCRE2_ERROR_NOMATCH)
+        {
+            return 0;
+        }
+        if (pcre2_rc < 0)
+        {
+            string_error(_(L"%ls: Regular expression match error: %ls\n"),
+                    argv0, pcre2_strerror(pcre2_rc));
+            return -1;
+        }
+        if (pcre2_rc == 0)
+        {
+            // The output vector wasn't big enough. Should not happen.
+            string_error(_(L"%ls: Regular expression internal error\n"), argv0);
+            return -1;
+        }
+        PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(regex.match);
+        for (int j = 0; j < pcre2_rc; j++)
+        {
+            PCRE2_SIZE begin = ovector[2*j];
+            PCRE2_SIZE end = ovector[2*j + 1];
+            if (!opts.quiet)
+            {
+                if (begin != PCRE2_UNSET && end != PCRE2_UNSET)
+                {
+                    if (opts.index)
+                    {
+                        stdout_buffer += to_string(begin + 1);
+                        stdout_buffer += ' ';
+                        stdout_buffer += to_string(end - begin);
+                    }
+                    else if (end > begin) // may have end < begin if \K is used
+                    {
+                        stdout_buffer += wcstring(&arg[begin], end - begin);
+                    }
+                    stdout_buffer += L'\n';
+                }
+            }
+        }
+        return 1;
+    }
+
+public:
+    pcre2_matcher_t(const wchar_t *argv0_, const wchar_t *pattern, const match_options_t &opts)
+        : string_matcher_t(opts),
+          argv0(argv0_),
+          regex(argv0_, pattern, opts.ignore_case)
+    { }
+
+    virtual ~pcre2_matcher_t() { }
+
+    bool report_matches(const wchar_t *arg)
+    {
+        // A return value of true means all is well (even if no matches were
+        // found), false indicates an unrecoverable error.
+        if (regex.code == 0)
+        {
+            // pcre2_compile() failed
+            return false;
+        }
+
+        int matched = 0;
+
+        // See pcre2demo.c for an explanation of this logic
+        PCRE2_SIZE arglen = wcslen(arg);
+        int rc = report_match(arg, pcre2_match(regex.code, PCRE2_SPTR(arg), arglen, 0, 0, regex.match, 0));
+        if (rc < 0)
+        {
+            // pcre2 match error
+            return false;
+        }
+        if (rc == 0)
+        {
+            // no match
+            return true;
+        }
+        matched++;
+        total_matched++;
+
+        // Report any additional matches
+        PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(regex.match);
+        while (opts.all || matched == 0)
+        {
+            uint32_t options = 0;
+            PCRE2_SIZE offset = ovector[1]; // Start at end of previous match
+
+            if (ovector[0] == ovector[1])
+            {
+                if (ovector[0] == arglen)
+                {
+                    break;
+                }
+                options = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+            }
+
+            rc = report_match(arg, pcre2_match(regex.code, PCRE2_SPTR(arg), arglen, offset, options, regex.match, 0));
+            if (rc < 0)
+            {
+                return false;
+            }
+            if (rc == 0)
+            {
+                if (options == 0)
+                {
+                    // All matches found
+                    break;
+                }
+                ovector[1] = offset + 1;
+                continue;
+            }
+            matched++;
+            total_matched++;
+        }
+        return true;
+    }
+};
+
+static int string_match(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L"ainqr";
+    const struct woption long_options[] =
+    {
+        { L"all", no_argument, 0, 'a'},
+        { L"ignore-case", no_argument, 0, 'i'},
+        { L"index", no_argument, 0, 'n'},
+        { L"quiet", no_argument, 0, 'q'},
+        { L"regex", no_argument, 0, 'r'},
+        { 0, 0, 0, 0 }
+    };
+
+    match_options_t opts;
+    bool regex = false;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'a':
+                opts.all = true;
+                break;
+
+            case 'i':
+                opts.ignore_case = true;
+                break;
+
+            case 'n':
+                opts.index = true;
+                break;
+
+            case 'q':
+                opts.quiet = true;
+                break;
+
+            case 'r':
+                regex = true;
+                break;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    const wchar_t *pattern;
+    if ((pattern = string_get_arg_argv(&i, argv)) == 0)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    string_matcher_t *matcher;
+    if (regex)
+    {
+        matcher = new pcre2_matcher_t(argv[0], pattern, opts);
+    }
+    else
+    {
+        matcher = new wildcard_matcher_t(argv[0], pattern, opts);
+    }
+
+    const wchar_t *arg;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        if (!matcher->report_matches(arg))
+        {
+            delete matcher;
+            return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int rc = matcher->match_count() > 0 ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+    delete matcher;
+    return rc;
+}
+
+struct replace_options_t
+{
+    bool all;
+    bool ignore_case;
+    bool quiet;
+
+    replace_options_t(): all(false), ignore_case(false), quiet(false) { }
+};
+
+class string_replacer_t
+{
+protected:
+    const wchar_t *argv0;
+    replace_options_t opts;
+    int total_replaced;
+
+public:
+    string_replacer_t(const wchar_t *argv0_, const replace_options_t &opts_)
+        : argv0(argv0_), opts(opts_), total_replaced(0)
+    { }
+
+    virtual ~string_replacer_t() {}
+    virtual bool replace_matches(const wchar_t *arg) = 0;
+    int replace_count() { return total_replaced; }
+};
+
+class literal_replacer_t: public string_replacer_t
+{
+    const wchar_t *pattern;
+    const wchar_t *replacement;
+    int patlen;
+
+public:
+    literal_replacer_t(const wchar_t *argv0, const wchar_t *pattern_, const wchar_t *replacement_,
+                        const replace_options_t &opts)
+        : string_replacer_t(argv0, opts),
+          pattern(pattern_), replacement(replacement_), patlen(wcslen(pattern))
+    { }
+
+    virtual ~literal_replacer_t() { }
+
+    bool replace_matches(const wchar_t *arg)
+    {
+        wcstring result;
+        if (patlen == 0)
+        {
+            result = arg;
+        }
+        else
+        {
+            int replaced = 0;
+            const wchar_t *cur = arg;
+            while (*cur != L'\0')
+            {
+                if ((opts.all || replaced == 0) &&
+                    (opts.ignore_case ? wcsncasecmp(cur, pattern, patlen) : wcsncmp(cur, pattern, patlen)) == 0)
+                {
+                    result += replacement;
+                    cur += patlen;
+                    replaced++;
+                    total_replaced++;
+                }
+                else
+                {
+                    result += *cur;
+                    cur++;
+                }
+            }
+        }
+        if (!opts.quiet)
+        {
+            stdout_buffer += result;
+            stdout_buffer += L'\n';
+        }
+        return true;
+    }
+};
+
+class regex_replacer_t: public string_replacer_t
+{
+    compiled_regex_t regex;
+    wcstring replacement;
+
+    wcstring interpret_escapes(const wchar_t *orig)
+    {
+        wcstring result;
+
+        while (*orig != L'\0')
+        {
+            if (*orig == L'\\')
+            {
+                orig += read_unquoted_escape(orig, &result, true, false);
+            }
+            else
+            {
+                result += *orig;
+                orig++;
+            }
+        }
+
+        return result;
+    }
+
+public:
+    regex_replacer_t(const wchar_t *argv0, const wchar_t *pattern, const wchar_t *replacement_,
+                        const replace_options_t &opts)
+        : string_replacer_t(argv0, opts),
+          regex(argv0, pattern, opts.ignore_case),
+          replacement(interpret_escapes(replacement_))
+    { }
+
+    virtual ~regex_replacer_t() { }
+
+    bool replace_matches(const wchar_t *arg)
+    {
+        // A return value of true means all is well (even if no replacements
+        // were performed), false indicates an unrecoverable error.
+        if (regex.code == 0)
+        {
+            // pcre2_compile() failed
+            return false;
+        }
+
+        uint32_t options = opts.all ? PCRE2_SUBSTITUTE_GLOBAL : 0;
+        int arglen = wcslen(arg);
+        PCRE2_SIZE outlen = (arglen == 0) ? 16 : 2 * arglen;
+        wchar_t *output = (wchar_t *)malloc(sizeof(wchar_t) * outlen);
+        if (output == 0)
+        {
+            DIE_MEM();
+        }
+        int pcre2_rc = 0;
+        for (;;)
+        {
+            pcre2_rc = pcre2_substitute(
+                            regex.code,
+                            PCRE2_SPTR(arg),
+                            arglen,
+                            0,  // start offset
+                            options,
+                            regex.match,
+                            0,  // match context
+                            PCRE2_SPTR(replacement.c_str()),
+                            PCRE2_ZERO_TERMINATED,
+                            (PCRE2_UCHAR *)output,
+                            &outlen);
+
+            if (pcre2_rc == PCRE2_ERROR_NOMEMORY)
+            {
+                if (outlen < MAX_REPLACE_SIZE)
+                {
+                    outlen = std::min(2 * outlen, MAX_REPLACE_SIZE);
+                    output = (wchar_t *)realloc(output, sizeof(wchar_t) * outlen);
+                    if (output == 0)
+                    {
+                        DIE_MEM();
+                    }
+                    continue;
+                }
+                string_error(_(L"%ls: Replacement string too large\n"), argv0);
+                free(output);
+                return false;
+            }
+            break;
+        }
+
+        bool rc = true;
+        if (pcre2_rc < 0)
+        {
+            string_error(_(L"%ls: Regular expression substitute error: %ls\n"),
+                    argv0, pcre2_strerror(pcre2_rc));
+            rc = false;
+        }
+        else
+        {
+            if (!opts.quiet)
+            {
+                stdout_buffer += output;
+                stdout_buffer += L'\n';
+            }
+            total_replaced += pcre2_rc;
+        }
+
+        free(output);
+        return rc;
+    }
+};
+
+static int string_replace(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L"aiqr";
+    const struct woption long_options[] =
+    {
+        { L"all", no_argument, 0, 'a'},
+        { L"ignore-case", no_argument, 0, 'i'},
+        { L"quiet", no_argument, 0, 'q'},
+        { L"regex", no_argument, 0, 'r'},
+        { 0, 0, 0, 0 }
+    };
+
+    replace_options_t opts;
+    bool regex = false;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'a':
+                opts.all = true;
+                break;
+
+            case 'i':
+                opts.ignore_case = true;
+                break;
+
+            case 'q':
+                opts.quiet = true;
+                break;
+
+            case 'r':
+                regex = true;
+                break;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    const wchar_t *pattern, *replacement;
+    if ((pattern = string_get_arg_argv(&i, argv)) == 0)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+    if ((replacement = string_get_arg_argv(&i, argv)) == 0)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    string_replacer_t *replacer;
+    if (regex)
+    {
+        replacer = new regex_replacer_t(argv[0], pattern, replacement, opts);
+    }
+    else
+    {
+        replacer = new literal_replacer_t(argv[0], pattern, replacement, opts);
+    }
+
+    const wchar_t *arg;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        if (!replacer->replace_matches(arg))
+        {
+            delete replacer;
+            return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int rc = replacer->replace_count() > 0 ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+    delete replacer;
+    return rc;
+}
+
+static int string_split(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L":m:qr";
+    const struct woption long_options[] =
+    {
+        { L"max", required_argument, 0, 'm'},
+        { L"quiet", no_argument, 0, 'q'},
+        { L"right", no_argument, 0, 'r'},
+        { 0, 0, 0, 0 }
+    };
+
+    long max = LONG_MAX;
+    bool quiet = false;
+    bool right = false;
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'm':
+            {
+                errno = 0;
+                wchar_t *endptr = 0;
+                max = wcstol(w.woptarg, &endptr, 10);
+                if (*endptr != L'\0' || errno != 0)
+                {
+                    string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
+                    return BUILTIN_STRING_ERROR;
+                }
+                break;
+            }
+
+            case 'q':
+                quiet = true;
+                break;
+
+            case 'r':
+                right = true;
+                break;
+
+            case ':':
+                string_error(STRING_ERR_MISSING, argv[0]);
+                return BUILTIN_STRING_ERROR;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    const wchar_t *sep;
+    if ((sep = string_get_arg_argv(&i, argv)) == 0)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    std::list<wcstring> splits;
+    int seplen = wcslen(sep);
+    int nsplit = 0;
+    const wchar_t *arg;
+    if (right)
+    {
+        while ((arg = string_get_arg(&i, argv)) != 0)
+        {
+            int nargsplit = 0;
+            if (seplen == 0)
+            {
+                // Split to individual characters
+                const wchar_t *cur = arg + wcslen(arg) - 1;
+                while (cur > arg && nargsplit < max)
+                {
+                    splits.push_front(wcstring(cur, 1));
+                    cur--;
+                    nargsplit++;
+                    nsplit++;
+                }
+                splits.push_front(wcstring(arg, cur - arg + 1));
+            }
+            else
+            {
+                const wchar_t *end = arg + wcslen(arg);
+                const wchar_t *cur = end - seplen;
+                while (cur >= arg && nargsplit < max)
+                {
+                    if (wcsncmp(cur, sep, seplen) == 0)
+                    {
+                        splits.push_front(wcstring(cur + seplen, end - cur - seplen));
+                        end = cur;
+                        cur -= seplen;
+                        nargsplit++;
+                        nsplit++;
+                    }
+                    else
+                    {
+                        cur--;
+                    }
+                }
+                splits.push_front(wcstring(arg, end - arg));
+            }
+        }
+    }
+    else
+    {
+        while ((arg = string_get_arg(&i, argv)) != 0)
+        {
+            const wchar_t *cur = arg;
+            int nargsplit = 0;
+            if (seplen == 0)
+            {
+                // Split to individual characters
+                const wchar_t *last = arg + wcslen(arg) - 1;
+                while (cur < last && nargsplit < max)
+                {
+                    splits.push_back(wcstring(cur, 1));
+                    cur++;
+                    nargsplit++;
+                    nsplit++;
+                }
+                splits.push_back(cur);
+            }
+            else
+            {
+                while (cur != 0)
+                {
+                    const wchar_t *ptr = (nargsplit < max) ? wcsstr(cur, sep) : 0;
+                    if (ptr == 0)
+                    {
+                        splits.push_back(cur);
+                        cur = 0;
+                    }
+                    else
+                    {
+                        splits.push_back(wcstring(cur, ptr - cur));
+                        cur = ptr + seplen;
+                        nargsplit++;
+                        nsplit++;
+                    }
+                }
+            }
+        }
+    }
+
+    if (!quiet)
+    {
+        std::list<wcstring>::const_iterator si = splits.begin();
+        while (si != splits.end())
+        {
+            stdout_buffer += *si;
+            stdout_buffer += L'\n';
+            si++;
+        }
+    }
+
+    return (nsplit > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+static int string_sub(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L":l:qs:";
+    const struct woption long_options[] =
+    {
+        { L"length", required_argument, 0, 'l'},
+        { L"quiet", no_argument, 0, 'q'},
+        { L"start", required_argument, 0, 's'},
+        { 0, 0, 0, 0 }
+    };
+
+    int start = 0;
+    int length = -1;
+    bool quiet = false;
+    wgetopter_t w;
+    wchar_t *endptr = 0;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'l':
+                errno = 0;
+                length = int(wcstol(w.woptarg, &endptr, 10));
+                if (*endptr != L'\0' || errno != 0)
+                {
+                    string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
+                    return BUILTIN_STRING_ERROR;
+                }
+                if (length < 0)
+                {
+                    string_error(_(L"%ls: Invalid length value '%d'\n"), argv[0], length);
+                    return BUILTIN_STRING_ERROR;
+                }
+                break;
+
+            case 'q':
+                quiet = true;
+                break;
+
+            case 's':
+                errno = 0;
+                start = int(wcstol(w.woptarg, &endptr, 10));
+                if (*endptr != L'\0' || errno != 0)
+                {
+                    string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
+                    return BUILTIN_STRING_ERROR;
+                }
+                if (start == 0)
+                {
+                    string_error(_(L"%ls: Invalid start value '%d'\n"), argv[0], start);
+                    return BUILTIN_STRING_ERROR;
+                }
+                break;
+
+            case ':':
+                string_error(STRING_ERR_MISSING, argv[0]);
+                return BUILTIN_STRING_ERROR;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    int nsub = 0;
+    const wchar_t *arg;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        wcstring::size_type pos = 0;
+        wcstring::size_type count = wcstring::npos;
+        wcstring s(arg);
+        if (start > 0)
+        {
+            pos = start - 1;
+        }
+        else if (start < 0)
+        {
+            wcstring::size_type n = -start;
+            pos = n > s.length() ? 0 : s.length() - n;
+        }
+        if (pos > s.length())
+        {
+            pos = s.length();
+        }
+
+        if (length >= 0)
+        {
+            count = length;
+        }
+        if (pos + count > s.length())
+        {
+            count = wcstring::npos;
+        }
+
+        if (!quiet)
+        {
+            stdout_buffer += s.substr(pos, count);
+            stdout_buffer += L'\n';
+        }
+        nsub++;
+    }
+
+    return (nsub > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+static int string_trim(parser_t &parser, int argc, wchar_t **argv)
+{
+    const wchar_t *short_options = L":c:lqr";
+    const struct woption long_options[] =
+    {
+        { L"chars", required_argument, 0, 'c'},
+        { L"left", no_argument, 0, 'l'},
+        { L"quiet", no_argument, 0, 'q'},
+        { L"right", no_argument, 0, 'r'},
+        { 0, 0, 0, 0 }
+    };
+
+    int leftright = 0;
+    bool quiet = false;
+    wcstring chars = L" \f\n\r\t";
+    wgetopter_t w;
+    for (;;)
+    {
+        int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
+
+        if (c == -1)
+        {
+            break;
+        }
+        switch (c)
+        {
+            case 0:
+                break;
+
+            case 'c':
+                chars = w.woptarg;
+                break;
+
+            case 'l':
+                leftright |= 1;
+                break;
+
+            case 'q':
+                quiet = true;
+                break;
+
+            case 'r':
+                leftright |= 2;
+                break;
+
+            case ':':
+                string_error(STRING_ERR_MISSING, argv[0]);
+                return BUILTIN_STRING_ERROR;
+
+            case '?':
+                string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
+                return BUILTIN_STRING_ERROR;
+        }
+    }
+
+    int i = w.woptind;
+    if (string_args_from_stdin() && argc > i)
+    {
+        string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    const wchar_t *arg;
+    int ntrim = 0;
+    while ((arg = string_get_arg(&i, argv)) != 0)
+    {
+        const wchar_t *begin = arg;
+        const wchar_t *end = arg + wcslen(arg);
+        if (!leftright || (leftright & 1))
+        {
+            while (begin != end && chars.find_first_of(begin, 0, 1) != wcstring::npos)
+            {
+                begin++;
+                ntrim++;
+            }
+        }
+        if (!leftright || (leftright & 2))
+        {
+            while (begin != end && chars.find_first_of(end - 1, 0, 1) != wcstring::npos)
+            {
+                end--;
+                ntrim++;
+            }
+        }
+        if (!quiet)
+        {
+            stdout_buffer += wcstring(begin, end - begin);
+            stdout_buffer += L'\n';
+        }
+    }
+
+    return (ntrim > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
+}
+
+static const struct string_subcommand
+{
+    const wchar_t *name;
+    int (*handler)(parser_t &, int argc, wchar_t **argv);
+}
+string_subcommands[] =
+{
+    { L"escape", &string_escape },
+    { L"join", &string_join },
+    { L"length", &string_length },
+    { L"match", &string_match },
+    { L"replace", &string_replace },
+    { L"split", &string_split },
+    { L"sub", &string_sub },
+    { L"trim", &string_trim },
+    { 0, 0 }
+};
+
+/**
+   The string builtin, for manipulating strings.
+*/
+/*static*/ int builtin_string(parser_t &parser, wchar_t **argv)
+{
+    int argc = builtin_count_args(argv);
+    if (argc <= 1)
+    {
+        string_error(STRING_ERR_MISSING, argv[0]);
+        builtin_print_help(parser, L"string", stderr_buffer);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    if (wcscmp(argv[1], L"-h") == 0 || wcscmp(argv[1], L"--help") == 0)
+    {
+        builtin_print_help(parser, L"string", stderr_buffer);
+        return BUILTIN_STRING_OK;
+    }
+
+    const string_subcommand *subcmd = &string_subcommands[0];
+    while (subcmd->name != 0 && wcscmp(subcmd->name, argv[1]) != 0)
+    {
+        subcmd++;
+    }
+    if (subcmd->handler == 0)
+    {
+        string_error(_(L"%ls: Unknown subcommand '%ls'\n"), argv[0], argv[1]);
+        builtin_print_help(parser, L"string", stderr_buffer);
+        return BUILTIN_STRING_ERROR;
+    }
+
+    argc--;
+    argv++;
+    return subcmd->handler(parser, argc, argv);
+}
diff --git a/src/common.cpp b/src/common.cpp
index 78d0e238..50cb1102 100644
--- a/src/common.cpp
+++ b/src/common.cpp
@@ -1106,7 +1106,7 @@ static wint_t string_last_char(const wcstring &str)
 }
 
 /* Given a null terminated string starting with a backslash, read the escape as if it is unquoted, appending to result. Return the number of characters consumed, or 0 on error */
-static size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special)
+size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special)
 {
     if (input[0] != L'\\')
     {
diff --git a/src/common.h b/src/common.h
index e27968fd..88bbf480 100644
--- a/src/common.h
+++ b/src/common.h
@@ -825,6 +825,9 @@ wcstring escape_string(const wcstring &in, escape_flags_t flags);
    character set.
 */
 
+/** Given a null terminated string starting with a backslash, read the escape as if it is unquoted, appending to result. Return the number of characters consumed, or 0 on error */
+size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special);
+
 /** Unescapes a string in-place. A true result indicates the string was unescaped, a false result indicates the string was unmodified. */
 bool unescape_string_in_place(wcstring *str, unescape_flags_t escape_special);
 
diff --git a/src/fish_tests.cpp b/src/fish_tests.cpp
index 3b36b577..1b314799 100644
--- a/src/fish_tests.cpp
+++ b/src/fish_tests.cpp
@@ -4017,6 +4017,291 @@ static void test_wcstring_tok(void)
     }
 }
 
+int builtin_string(parser_t &parser, wchar_t **argv);
+extern wcstring stdout_buffer;
+static void run_one_string_test(const wchar_t **argv, int expected_rc, const wchar_t *expected_out)
+{
+    parser_t parser(PARSER_TYPE_GENERAL, true);
+    wcstring &out = stdout_buffer;
+    out.clear();
+    int rc = builtin_string(parser, const_cast<wchar_t**>(argv));
+    wcstring args;
+    for (int i = 0; argv[i] != 0; i++)
+    {
+        args += escape_string(argv[i], ESCAPE_ALL) + L' ';
+    }
+    args.resize(args.size() - 1);
+    if (rc != expected_rc)
+    {
+        err(L"Test failed on line %lu: [%ls]: expected return code %d but got %d",
+                __LINE__, args.c_str(), expected_rc, rc);
+    }
+    else if (out != expected_out)
+    {
+        err(L"Test failed on line %lu: [%ls]: expected [%ls] but got [%ls]",
+                __LINE__, args.c_str(),
+                escape_string(expected_out, ESCAPE_ALL).c_str(),
+                escape_string(out, ESCAPE_ALL).c_str());
+    }
+}
+
+static void test_string(void)
+{
+    static struct string_test
+    {
+        const wchar_t *argv[15];
+        int expected_rc;
+        const wchar_t *expected_out;
+    }
+    string_tests[] =
+    {
+        { {L"string", L"escape", 0},                                1, L"" },
+        { {L"string", L"escape", L"", 0},                           0, L"''\n" },
+        { {L"string", L"escape", L"-n", L"", 0},                    0, L"\n" },
+        { {L"string", L"escape", L"a", 0},                          0, L"a\n" },
+        { {L"string", L"escape", L"\x07", 0},                       0, L"\\cg\n" },
+        { {L"string", L"escape", L"\"x\"", 0},                      0, L"'\"x\"'\n" },
+        { {L"string", L"escape", L"hello world", 0},                0, L"'hello world'\n" },
+        { {L"string", L"escape", L"-n", L"hello world", 0},         0, L"hello\\ world\n" },
+        { {L"string", L"escape", L"hello", L"world", 0},            0, L"hello\nworld\n" },
+        { {L"string", L"escape", L"-n", L"~", 0},                   0, L"\\~\n" },
+
+        { {L"string", L"join", 0},                                  2, L"" },
+        { {L"string", L"join", L"", 0},                             1, L"" },
+        { {L"string", L"join", L"", L"", L"", L"", 0},              0, L"\n" },
+        { {L"string", L"join", L"", L"a", L"b", L"c", 0},           0, L"abc\n" },
+        { {L"string", L"join", L".", L"fishshell", L"com", 0},      0, L"fishshell.com\n" },
+        { {L"string", L"join", L"/", L"usr", 0},                    1, L"usr\n" },
+        { {L"string", L"join", L"/", L"usr", L"local", L"bin", 0},  0, L"usr/local/bin\n" },
+        { {L"string", L"join", L"...", L"3", L"2", L"1", 0},        0, L"3...2...1\n" },
+        { {L"string", L"join", L"-q", 0},                           2, L"" },
+        { {L"string", L"join", L"-q", L".", 0},                     1, L"" },
+        { {L"string", L"join", L"-q", L".", L".", 0},               1, L"" },
+
+        { {L"string", L"length", 0},                                1, L"" },
+        { {L"string", L"length", L"", 0},                           1, L"0\n" },
+        { {L"string", L"length", L"", L"", L"", 0},                 1, L"0\n0\n0\n" },
+        { {L"string", L"length", L"a", 0},                          0, L"1\n" },
+        { {L"string", L"length", L"\U0002008A", 0},                 0, L"1\n" },
+        { {L"string", L"length", L"um", L"dois", L"três", 0},       0, L"2\n4\n4\n" },
+        { {L"string", L"length", L"um", L"dois", L"três", 0},       0, L"2\n4\n4\n" },
+        { {L"string", L"length", L"-q", 0},                         1, L"" },
+        { {L"string", L"length", L"-q", L"", 0},                    1, L"" },
+        { {L"string", L"length", L"-q", L"a", 0},                   0, L"" },
+
+        { {L"string", L"match", 0},                                         2, L"" },
+        { {L"string", L"match", L"", 0},                                    1, L"" },
+        { {L"string", L"match", L"", L"", 0},                               0, L"\n" },
+        { {L"string", L"match", L"?", L"a", 0},                             0, L"a\n" },
+        { {L"string", L"match", L"*", L"", 0},                              0, L"\n" },
+        { {L"string", L"match", L"**", L"", 0},                             0, L"\n" },
+        { {L"string", L"match", L"*", L"xyzzy", 0},                         0, L"xyzzy\n" },
+        { {L"string", L"match", L"**", L"plugh", 0},                        0, L"plugh\n" },
+        { {L"string", L"match", L"a*b", L"axxb", 0},                        0, L"axxb\n" },
+        { {L"string", L"match", L"a??b", L"axxb", 0},                       0, L"axxb\n" },
+        { {L"string", L"match", L"-i", L"a??B", L"axxb", 0},                0, L"axxb\n" },
+        { {L"string", L"match", L"-i", L"a??b", L"Axxb", 0},                0, L"Axxb\n" },
+        { {L"string", L"match", L"a*", L"axxb", 0},                         0, L"axxb\n" },
+        { {L"string", L"match", L"*a", L"xxa", 0},                          0, L"xxa\n" },
+        { {L"string", L"match", L"*a*", L"axa", 0},                         0, L"axa\n" },
+        { {L"string", L"match", L"*a*", L"xax", 0},                         0, L"xax\n" },
+        { {L"string", L"match", L"*a*", L"bxa", 0},                         0, L"bxa\n" },
+        { {L"string", L"match", L"*a", L"a", 0},                            0, L"a\n" },
+        { {L"string", L"match", L"a*", L"a", 0},                            0, L"a\n" },
+        { {L"string", L"match", L"a*b*c", L"axxbyyc", 0},                   0, L"axxbyyc\n" },
+        { {L"string", L"match", L"a*b?c", L"axxbyc", 0},                    0, L"axxbyc\n" },
+        { {L"string", L"match", L"*?", L"a", 0},                            0, L"a\n" },
+        { {L"string", L"match", L"*?", L"ab", 0},                           0, L"ab\n" },
+        { {L"string", L"match", L"?*", L"a", 0},                            0, L"a\n" },
+        { {L"string", L"match", L"?*", L"ab", 0},                           0, L"ab\n" },
+        { {L"string", L"match", L"\\*", L"*", 0},                           0, L"*\n" },
+        { {L"string", L"match", L"a*\\", L"abc\\", 0},                      0, L"abc\\\n" },
+        { {L"string", L"match", L"a*\\?", L"abc?", 0},                      0, L"abc?\n" },
+
+        { {L"string", L"match", L"?", L"", 0},                              1, L"" },
+        { {L"string", L"match", L"?", L"ab", 0},                            1, L"" },
+        { {L"string", L"match", L"??", L"a", 0},                            1, L"" },
+        { {L"string", L"match", L"?a", L"a", 0},                            1, L"" },
+        { {L"string", L"match", L"a?", L"a", 0},                            1, L"" },
+        { {L"string", L"match", L"a??B", L"axxb", 0},                       1, L"" },
+        { {L"string", L"match", L"a*b", L"axxbc", 0},                       1, L"" },
+        { {L"string", L"match", L"*b", L"bbba", 0},                         1, L"" },
+        { {L"string", L"match", L"0x[0-9a-fA-F][0-9a-fA-F]", L"0xbad", 0},  1, L"" },
+
+        { {L"string", L"match", L"-a", L"*", L"ab", L"cde", 0},             0, L"ab\ncde\n" },
+        { {L"string", L"match", L"*", L"ab", L"cde", 0},                    0, L"ab\ncde\n" },
+        { {L"string", L"match", L"-n", L"*d*", L"cde", 0},                  0, L"1 3\n" },
+        { {L"string", L"match", L"-n", L"*x*", L"cde", 0},                  1, L"" },
+        { {L"string", L"match", L"-q", L"a*", L"b", L"c", 0},               1, L"" },
+        { {L"string", L"match", L"-q", L"a*", L"b", L"a", 0},               0, L"" },
+
+        { {L"string", L"match", L"-r", 0},                                  2, L"" },
+        { {L"string", L"match", L"-r", L"", 0},                             1, L"" },
+        { {L"string", L"match", L"-r", L"", L"", 0},                        0, L"\n" },
+        { {L"string", L"match", L"-r", L".", L"a", 0},                      0, L"a\n" },
+        { {L"string", L"match", L"-r", L".*", L"", 0},                      0, L"\n" },
+        { {L"string", L"match", L"-r", L"a*b", L"b", 0},                    0, L"b\n" },
+        { {L"string", L"match", L"-r", L"a*b", L"aab", 0},                  0, L"aab\n" },
+        { {L"string", L"match", L"-r", L"-i", L"a*b", L"Aab", 0},           0, L"Aab\n" },
+        { {L"string", L"match", L"-r", L"-a", L"a[bc]", L"abadac", 0},      0, L"ab\nac\n" },
+        { {L"string", L"match", L"-r", L"a", L"xaxa", L"axax", 0},          0, L"a\na\n" },
+        { {L"string", L"match", L"-r", L"-a", L"a", L"xaxa", L"axax", 0},   0, L"a\na\na\na\n" },
+        { {L"string", L"match", L"-r", L"a[bc]", L"abadac", 0},             0, L"ab\n" },
+        { {L"string", L"match", L"-r", L"-q", L"a[bc]", L"abadac", 0},      0, L"" },
+        { {L"string", L"match", L"-r", L"-q", L"a[bc]", L"ad", 0},          1, L"" },
+        { {L"string", L"match", L"-r", L"(a+)b(c)", L"aabc", 0},            0, L"aabc\naa\nc\n" },
+        { {L"string", L"match", L"-r", L"-a", L"(a)b(c)", L"abcabc", 0},    0, L"abc\na\nc\nabc\na\nc\n" },
+        { {L"string", L"match", L"-r", L"(a)b(c)", L"abcabc", 0},           0, L"abc\na\nc\n" },
+        { {L"string", L"match", L"-r", L"(a|(z))(bc)", L"abc", 0},          0, L"abc\na\nbc\n" },
+        { {L"string", L"match", L"-r", L"-n", L"a", L"ada", L"dad", 0},     0, L"1 1\n2 1\n" },
+        { {L"string", L"match", L"-r", L"-n", L"-a", L"a", L"bacadae", 0},  0, L"2 1\n4 1\n6 1\n" },
+        { {L"string", L"match", L"-r", L"-n", L"(a).*(b)", L"a---b", 0},    0, L"1 5\n1 1\n5 1\n" },
+        { {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"ab", 0}, 0, L"1 2\n1 1\n2 1\n" },
+        { {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"abab", 0}, 0, L"1 2\n1 1\n2 1\n" },
+        { {L"string", L"match", L"-r", L"-n", L"-a", L"(a)(b)", L"abab", 0}, 0, L"1 2\n1 1\n2 1\n3 2\n3 1\n4 1\n" },
+        { {L"string", L"match", L"-r", L"*", L"", 0},                       2, L"" },
+        { {L"string", L"match", L"-r", L"-a", L"a*", L"b", 0},              0, L"\n\n" },
+        { {L"string", L"match", L"-r", L"foo\\Kbar", L"foobar", 0},         0, L"bar\n" },
+        { {L"string", L"match", L"-r", L"(foo)\\Kbar", L"foobar", 0},       0, L"bar\nfoo\n" },
+        { {L"string", L"match", L"-r", L"(?=ab\\K)", L"ab", 0},             0, L"\n" },
+        { {L"string", L"match", L"-r", L"(?=ab\\K)..(?=cd\\K)", L"abcd", 0}, 0, L"\n" },
+
+        { {L"string", L"replace", 0},                                       2, L"" },
+        { {L"string", L"replace", L"", 0},                                  2, L"" },
+        { {L"string", L"replace", L"", L"", 0},                             1, L"" },
+        { {L"string", L"replace", L"", L"", L"", 0},                        1, L"\n" },
+        { {L"string", L"replace", L"", L"", L" ", 0},                       1, L" \n" },
+        { {L"string", L"replace", L"a", L"b", L"", 0},                      1, L"\n" },
+        { {L"string", L"replace", L"a", L"b", L"a", 0},                     0, L"b\n" },
+        { {L"string", L"replace", L"a", L"b", L"xax", 0},                   0, L"xbx\n" },
+        { {L"string", L"replace", L"a", L"b", L"xax", L"axa", 0},           0, L"xbx\nbxa\n" },
+        { {L"string", L"replace", L"bar", L"x", L"red barn", 0},            0, L"red xn\n" },
+        { {L"string", L"replace", L"x", L"bar", L"red xn", 0},              0, L"red barn\n" },
+        { {L"string", L"replace", L"--", L"x", L"-", L"xyz", 0},            0, L"-yz\n" },
+        { {L"string", L"replace", L"--", L"y", L"-", L"xyz", 0},            0, L"x-z\n" },
+        { {L"string", L"replace", L"--", L"z", L"-", L"xyz", 0},            0, L"xy-\n" },
+        { {L"string", L"replace", L"-i", L"z", L"X", L"_Z_", 0},            0, L"_X_\n" },
+        { {L"string", L"replace", L"-a", L"a", L"A", L"aaa", 0},            0, L"AAA\n" },
+        { {L"string", L"replace", L"-i", L"a", L"z", L"AAA", 0},            0, L"zAA\n" },
+        { {L"string", L"replace", L"-q", L"x", L">x<", L"x", 0},            0, L"" },
+        { {L"string", L"replace", L"-a", L"x", L"", L"xxx", 0},             0, L"\n" },
+        { {L"string", L"replace", L"-a", L"***", L"_", L"*****", 0},        0, L"_**\n" },
+        { {L"string", L"replace", L"-a", L"***", L"***", L"******", 0},     0, L"******\n" },
+        { {L"string", L"replace", L"-a", L"a", L"b", L"xax", L"axa", 0},    0, L"xbx\nbxb\n" },
+
+        { {L"string", L"replace", L"-r", 0},                                2, L"" },
+        { {L"string", L"replace", L"-r", L"", 0},                           2, L"" },
+        { {L"string", L"replace", L"-r", L"", L"", 0},                      1, L"" },
+        { {L"string", L"replace", L"-r", L"", L"", L"", 0},                 0, L"\n" },  // pcre2 behavior
+        { {L"string", L"replace", L"-r", L"", L"", L" ", 0},                0, L" \n" }, // pcre2 behavior
+        { {L"string", L"replace", L"-r", L"a", L"b", L"", 0},               1, L"\n" },
+        { {L"string", L"replace", L"-r", L"a", L"b", L"a", 0},              0, L"b\n" },
+        { {L"string", L"replace", L"-r", L".", L"x", L"abc", 0},            0, L"xbc\n" },
+        { {L"string", L"replace", L"-r", L".", L"", L"abc", 0},             0, L"bc\n" },
+        { {L"string", L"replace", L"-r", L"(\\w)(\\w)", L"$2$1", L"ab", 0}, 0, L"ba\n" },
+        { {L"string", L"replace", L"-r", L"(\\w)", L"$1$1", L"ab", 0},      0, L"aab\n" },
+        { {L"string", L"replace", L"-r", L"-a", L".", L"x", L"abc", 0},     0, L"xxx\n" },
+        { {L"string", L"replace", L"-r", L"-a", L"(\\w)", L"$1$1", L"ab", 0}, 0, L"aabb\n" },
+        { {L"string", L"replace", L"-r", L"-a", L".", L"", L"abc", 0},      0, L"\n" },
+        { {L"string", L"replace", L"-r", L"a", L"x", L"bc", L"cd", L"de", 0}, 1, L"bc\ncd\nde\n" },
+        { {L"string", L"replace", L"-r", L"a", L"x", L"aba", L"caa", 0},    0, L"xba\ncxa\n" },
+        { {L"string", L"replace", L"-r", L"-a", L"a", L"x", L"aba", L"caa", 0}, 0, L"xbx\ncxx\n" },
+        { {L"string", L"replace", L"-r", L"-i", L"A", L"b", L"xax", 0},     0, L"xbx\n" },
+        { {L"string", L"replace", L"-r", L"-i", L"[a-z]", L".", L"1A2B", 0}, 0, L"1.2B\n" },
+        { {L"string", L"replace", L"-r", L"A", L"b", L"xax", 0},            1, L"xax\n" },
+        { {L"string", L"replace", L"-r", L"a", L"$1", L"a", 0},             2, L"" },
+        { {L"string", L"replace", L"-r", L"(a)", L"$2", L"a", 0},           2, L"" },
+        { {L"string", L"replace", L"-r", L"*", L".", L"a", 0},              2, L"" },
+        { {L"string", L"replace", L"-r", L"^(.)", L"\t$1", L"abc", L"x", 0}, 0, L"\tabc\n\tx\n" },
+
+        { {L"string", L"split", 0},                                         2, L"" },
+        { {L"string", L"split", L":", 0},                                   1, L"" },
+        { {L"string", L"split", L".", L"www.ch.ic.ac.uk", 0},               0, L"www\nch\nic\nac\nuk\n" },
+        { {L"string", L"split", L"..", L"....", 0},                         0, L"\n\n\n" },
+        { {L"string", L"split", L"-m", L"x", L"..", L"....", 0},            2, L"" },
+        { {L"string", L"split", L"-m1", L"..", L"....", 0},                 0, L"\n..\n" },
+        { {L"string", L"split", L"-m0", L"/", L"/usr/local/bin/fish", 0},   1, L"/usr/local/bin/fish\n" },
+        { {L"string", L"split", L"-m2", L":", L"a:b:c:d", L"e:f:g:h", 0},   0, L"a\nb\nc:d\ne\nf\ng:h\n" },
+        { {L"string", L"split", L"-m1", L"-r", L"/", L"/usr/local/bin/fish", 0}, 0, L"/usr/local/bin\nfish\n" },
+        { {L"string", L"split", L"-r", L".", L"www.ch.ic.ac.uk", 0},        0, L"www\nch\nic\nac\nuk\n" },
+        { {L"string", L"split", L"--", L"--", L"a--b---c----d", 0},         0, L"a\nb\n-c\n\nd\n" },
+        { {L"string", L"split", L"-r", L"..", L"....", 0},                  0, L"\n\n\n" },
+        { {L"string", L"split", L"-r", L"--", L"--", L"a--b---c----d", 0},  0, L"a\nb-\nc\n\nd\n" },
+        { {L"string", L"split", L"", L"", 0},                               1, L"\n" },
+        { {L"string", L"split", L"", L"a", 0},                              1, L"a\n" },
+        { {L"string", L"split", L"", L"ab", 0},                             0, L"a\nb\n" },
+        { {L"string", L"split", L"", L"abc", 0},                            0, L"a\nb\nc\n" },
+        { {L"string", L"split", L"-m1", L"", L"abc", 0},                    0, L"a\nbc\n" },
+        { {L"string", L"split", L"-r", L"", L"", 0},                        1, L"\n" },
+        { {L"string", L"split", L"-r", L"", L"a", 0},                       1, L"a\n" },
+        { {L"string", L"split", L"-r", L"", L"ab", 0},                      0, L"a\nb\n" },
+        { {L"string", L"split", L"-r", L"", L"abc", 0},                     0, L"a\nb\nc\n" },
+        { {L"string", L"split", L"-r", L"-m1", L"", L"abc", 0},             0, L"ab\nc\n" },
+        { {L"string", L"split", L"-q", 0},                                  2, L"" },
+        { {L"string", L"split", L"-q", L":", 0},                            1, L"" },
+        { {L"string", L"split", L"-q", L"x", L"axbxc", 0},                  0, L"" },
+
+        { {L"string", L"sub", 0},                                   1, L"" },
+        { {L"string", L"sub", L"abcde", 0},                         0, L"abcde\n"},
+        { {L"string", L"sub", L"-l", L"x", L"abcde", 0},            2, L""},
+        { {L"string", L"sub", L"-s", L"x", L"abcde", 0},            2, L""},
+        { {L"string", L"sub", L"-l0", L"abcde", 0},                 0, L"\n"},
+        { {L"string", L"sub", L"-l2", L"abcde", 0},                 0, L"ab\n"},
+        { {L"string", L"sub", L"-l5", L"abcde", 0},                 0, L"abcde\n"},
+        { {L"string", L"sub", L"-l6", L"abcde", 0},                 0, L"abcde\n"},
+        { {L"string", L"sub", L"-l-1", L"abcde", 0},                2, L""},
+        { {L"string", L"sub", L"-s0", L"abcde", 0},                 2, L""},
+        { {L"string", L"sub", L"-s1", L"abcde", 0},                 0, L"abcde\n"},
+        { {L"string", L"sub", L"-s5", L"abcde", 0},                 0, L"e\n"},
+        { {L"string", L"sub", L"-s6", L"abcde", 0},                 0, L"\n"},
+        { {L"string", L"sub", L"-s-1", L"abcde", 0},                0, L"e\n"},
+        { {L"string", L"sub", L"-s-5", L"abcde", 0},                0, L"abcde\n"},
+        { {L"string", L"sub", L"-s-6", L"abcde", 0},                0, L"abcde\n"},
+        { {L"string", L"sub", L"-s1", L"-l0", L"abcde", 0},         0, L"\n"},
+        { {L"string", L"sub", L"-s1", L"-l1", L"abcde", 0},         0, L"a\n"},
+        { {L"string", L"sub", L"-s2", L"-l2", L"abcde", 0},         0, L"bc\n"},
+        { {L"string", L"sub", L"-s-1", L"-l1", L"abcde", 0},        0, L"e\n"},
+        { {L"string", L"sub", L"-s-1", L"-l2", L"abcde", 0},        0, L"e\n"},
+        { {L"string", L"sub", L"-s-3", L"-l2", L"abcde", 0},        0, L"cd\n"},
+        { {L"string", L"sub", L"-s-3", L"-l4", L"abcde", 0},        0, L"cde\n"},
+        { {L"string", L"sub", L"-q", 0},                            1, L"" },
+        { {L"string", L"sub", L"-q", L"abcde", 0},                  0, L""},
+
+        { {L"string", L"trim", 0},                                  1, L""},
+        { {L"string", L"trim", L""},                                1, L"\n"},
+        { {L"string", L"trim", L" "},                               0, L"\n"},
+        { {L"string", L"trim", L"  \f\n\r\t"},                      0, L"\n"},
+        { {L"string", L"trim", L" a"},                              0, L"a\n"},
+        { {L"string", L"trim", L"a "},                              0, L"a\n"},
+        { {L"string", L"trim", L" a "},                             0, L"a\n"},
+        { {L"string", L"trim", L"-l", L" a"},                       0, L"a\n"},
+        { {L"string", L"trim", L"-l", L"a "},                       1, L"a \n"},
+        { {L"string", L"trim", L"-l", L" a "},                      0, L"a \n"},
+        { {L"string", L"trim", L"-r", L" a"},                       1, L" a\n"},
+        { {L"string", L"trim", L"-r", L"a "},                       0, L"a\n"},
+        { {L"string", L"trim", L"-r", L" a "},                      0, L" a\n"},
+        { {L"string", L"trim", L"-c", L".", L" a"},                 1, L" a\n"},
+        { {L"string", L"trim", L"-c", L".", L"a "},                 1, L"a \n"},
+        { {L"string", L"trim", L"-c", L".", L" a "},                1, L" a \n"},
+        { {L"string", L"trim", L"-c", L".", L".a"},                 0, L"a\n"},
+        { {L"string", L"trim", L"-c", L".", L"a."},                 0, L"a\n"},
+        { {L"string", L"trim", L"-c", L".", L".a."},                0, L"a\n"},
+        { {L"string", L"trim", L"-c", L"\\/", L"/a\\"},             0, L"a\n"},
+        { {L"string", L"trim", L"-c", L"\\/", L"a/"},               0, L"a\n"},
+        { {L"string", L"trim", L"-c", L"\\/", L"\\a/"},             0, L"a\n"},
+        { {L"string", L"trim", L"-c", L"", L".a."},                 1, L".a.\n"},
+
+        { {0}, 0, 0 }
+    };
+
+    struct string_test *t = string_tests;
+    while (t->argv[0] != 0)
+    {
+        run_one_string_test(t->argv, t->expected_rc, t->expected_out);
+        t++;
+    }
+}
+
 /**
    Main test
 */
@@ -4108,6 +4393,7 @@ int main(int argc, char **argv)
     if (should_test_function("history_races")) history_tests_t::test_history_races();
     if (should_test_function("history_formats")) history_tests_t::test_history_formats();
     //history_tests_t::test_history_speed();
+    if (should_test_function("string")) test_string();
 
     say(L"Encountered %d errors in low-level tests", err_count);
     if (s_test_run_count == 0)
author	Michael Steed <msteed@saltstack.com>	2015-09-12 12:59:40 -0700
committer	ridiculousfish <corydoras@ridiculousfish.com>	2015-09-21 16:41:25 -0700
commit	d83ef07ca76c03852366e4e810053edc19796761 (patch)
tree	93f671c5fe577128fd14a1f013fff764e5c5abba /src
parent	e70ed961eab80dab41ebfda7b91d80d9ef041be7 (diff)