notmuch - thread-based email index, search and tagging

	Commit message	Author	Age
*	Store "from" and "subject" headers in the database.	Austin Clements	2011-11-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a rebase and cleanup of Istvan Marko's patch from id:m3pqnj2j7a.fsf@zsu.kismala.com Search retrieves these headers for every message in the search results. Previously, this required opening and parsing every message file. Storing them directly in the database significantly reduces IO and computation, speeding up search by between 50% and 10X. Taking full advantage of this requires a database rebuild, but it will fall back to the old behavior for messages that do not have headers stored in the database.
*	lib: make find_message{,_by_filename} report errors	Ali Polatel	2011-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, notmuch_database_find_message() and notmuch_database_find_message_by_filename() did not properly report error conditions to the library user. For more information, read the thread on the notmuch mailing list starting with my mail "id:871uv2unfd.fsf@gmail.com". Make these functions accept a pointer to 'notmuch_message_t' as argument and return notmuch_status_t, which may be used to check for any error condition. restore: Modify for the new notmuch_database_find_message(). new: Modify for the new notmuch_database_find_message_by_filename().
*	lib: Remove message document directly after removing the last file name.	Austin Clements	2011-09-23
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, notmuch_database_remove_message would remove the message file name, sync the change to the message document, re-find the message document, and then delete it if there were no more file names. An interruption after sync'ing would result in a file-name-less, permanently un-removable zombie message that would produce errors and odd results in searches. We could wrap this in an atomic section, but it's much simpler to eliminate the round-about approach and just delete the message document instead of sync'ing it if we removed the last filename.
*	lib: Indicate if there are more filenames after removal.	Austin Clements	2011-09-23
\| \| \| \| \| \|	Make _notmuch_message_remove_filename return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID if the message has more filenames and fix callers to handle this.
*	lib: Add missing status check in _notmuch_message_remove_filename.	Austin Clements	2011-09-12
\| \| \| \| \| \|	Previously, this function would synchronize the folder list even if removing the file name failed. Now it returns immediately if removing the file name fails.
*	Fix folder: coherence issue	Mark Anderson	2011-06-29
\| \| \| \| \| \| \| \| \| \|	Add removal of all ZXFOLDER terms to removal of all XFOLDER terms for each message filename removal. The existing filename-list reindexing will put all the needed terms back in. Test search-folder-coherence now passes. Signed-off-by:Mark Anderson <ma.skies@gmail.com>
*	fix sum moar typos [comments in source code]	Pieter Praet	2011-06-23
\| \| \| \| \| \| \| \| \| \|	Various typo fixes in comments within the source code. Signed-off-by: Pieter Praet <pieter@praet.org> Edited-by: Carl Worth <cworth@cworth.org> Restricted to just source-code comments, (and fixed fix of "descriptios" to "descriptors" rather than "descriptions").
*	Mark some structures in the library interface with visibility=default attribute.	Carl Worth	2011-05-11
\| \| \| \| \| \| \| \| \| \| \| \| \|	As of gcc 4.6, there are new warnings from -Wattributes along the lines of: warning: ‘_notmuch_messages’ declared with greater visibility than the type of its field ‘_notmuch_messages::iterator’ [-Wattributes] To squelch these, we decorate all such containing structs with __attribute__((visibility("default"))). We take care to let only the C++ compiler see this, (since the C compiler would otherwise warn about ignored visibility attributes on types).
*	Remove some variables which were set but not used.	Carl Worth	2011-05-11
\| \| \| \| \| \| \| \| \| \| \| \|	gcc (at least as of version 4.6.0) is kind enough to point these out to us, (when given -Wunused-but-set-variable explicitly or implicitly via -Wunused or -Wall). One of these cases was a legitimately unused variable. Two were simply variables (named ignored) we were assigning only to squelch a warning about unused function return values. I don't seem to be getting those warnings even without setting the ignored variable. And the gcc docs. say that the correct way to squelch that warning is with a cast to (void) anyway.
*	Add the tag list to the unified message metadata pass.	Austin Clements	2011-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	Now each caller of notmuch_message_get_tags only gets a new iterator, instead of a whole new list. In principle this could cause problems with iterating while modifying tags, but through the magic of talloc references, we keep the old tag list alive even after the cache in the message object is invalidated. This reduces my inbox search from 3.102 seconds before the unified metadata pass to 1.811 seconds (1.7X faster). Combined with the thread search optimization in b3caef1f0659dac8183441357c8fee500a940889, that makes this query 2.5X faster than when I started.
*	Add the file name list to the unified message metadata pass.	Austin Clements	2011-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	Even if the caller never uses the file names, there is little cost to simply fetching the file name terms. However, retrieving the full paths requires additional database work, so the expansion from terms to full paths is performed lazily. This also simplifies clearing the filename cache, since that's now handled by the generic metadata cache code. This further reduces my inbox search from 3.102 seconds before the unified metadata pass to 2.206 seconds (1.4X faster).
*	Add a generic function to get a list of terms with some prefix.	Austin Clements	2011-03-21
\| \| \| \| \| \|	Replace _notmuch_convert_tags with this and simplify _create_filenames_for_terms_with_prefix. This will also come in handy shortly to get the message file name list.
*	Implement an internal generic string list and use it.	Austin Clements	2011-03-21
\| \| \| \| \| \| \| \| \| \| \| \|	This replaces the guts of the filename list and tag list, making those interfaces simple iterators over the generic string list. The directory, message filename, and tags-related code now build generic string lists and then wrap them in specific iterators. The real wins come in later patches, when we use these for even more generic functionality. As a nice side-effect, this also eliminates the annoying dependency on GList in the tag list.
*	Use a single unified pass to fetch scalar message metadata.	Austin Clements	2011-03-21
\| \| \| \| \| \| \| \| \| \|	This performs a single pass over a message's term list to fetch the thread ID, message ID, and reply-to, rather than requiring a pass for each. Xapian decompresses the term list anew for each iteration, so this reduces the amount of time spent decompressing message metadata. This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X faster).
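The single-pass idea can be sketched in plain C. This is only an illustration over an in-memory array of prefixed terms, not the library's code: the struct, the function names, and the prefixes "G", "Q", and "XREPLYTO" are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the scalar metadata gathered in one pass. */
typedef struct {
    const char *thread_id;
    const char *message_id;
    const char *reply_to;
} message_metadata_t;

/* Return the text after `prefix` if `term` starts with it, else NULL. */
static const char *
strip_prefix (const char *term, const char *prefix)
{
    size_t len = strlen (prefix);
    return strncmp (term, prefix, len) == 0 ? term + len : NULL;
}

/* Walk the term list once and pick out all three values, instead of
 * walking (and decompressing) the list once per value.
 * The prefixes used here are assumptions for this sketch. */
static message_metadata_t
fetch_metadata (const char *const *terms, size_t count)
{
    message_metadata_t meta = { NULL, NULL, NULL };

    assert (terms != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const char *value;

        if ((value = strip_prefix (terms[i], "G")) != NULL)
            meta.thread_id = value;
        else if ((value = strip_prefix (terms[i], "Q")) != NULL)
            meta.message_id = value;
        else if ((value = strip_prefix (terms[i], "XREPLYTO")) != NULL)
            meta.reply_to = value;
    }
    return meta;
}
```

Terms with other prefixes (tags, file names) are simply skipped by the loop, which is what lets later passes be folded into the same walk.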
*	lib: Save and restore term position in message while indexing.	Carl Worth	2011-01-26
\| \| \| \| \|	This fixes the recently added search-position-overlap bug as demonstrated in the test of the same name.
*	Add support for folder-based searching.	Carl Worth	2011-01-15
\| \| \| \| \| \| \| \|	A new "folder:" prefix in the query string can now be used to match the directories in which mail files are stored. The addition of this feature causes the recently added search-by-folder tests to now pass.
*	Correct some minor typos in a comment	Carl Worth	2011-01-15
\| \| \| \| \|	Nothing too important here. Just some misspellings I noticed while reading nearby code.
*	Optimize thread search using matched docid sets.	Austin Clements	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reduces thread search's 1+2t Xapian queries (where t is the number of matched threads) to 1+t queries and constructs exactly one notmuch_message_t for each message instead of 2 to 3. notmuch_query_search_threads eagerly fetches the docids of all messages matching the user query instead of lazily constructing message objects and fetching thread ID's from term lists. _notmuch_thread_create takes a seed docid and the set of all matched docids and uses a single Xapian query to expand this docid to its containing thread, using the matched docid set to determine which messages in the thread match the user query instead of using a second Xapian query. This reduces the amount of time required to load my inbox from 4.523 seconds to 3.025 seconds (1.5X faster).
*	lib: Fix missing initialization of status field.	Carl Worth	2010-11-11
\| \| \| \| \|	This could have been a problematic bug. Fortunately "gcc -O2" warns about it.
*	lib: Add two missing static qualifiers	Carl Worth	2010-11-11
\| \| \| \| \|	The debian packaging is nice enough to notice when we accidentally leak private symbols to the public interface.
*	tags_to_maildir_flags: Fix to preserve existing, unsupported flags	Carl Worth	2010-11-11
\| \| \| \| \| \|	This is to prevent notmuch from destroying any information the user has encoded as flags in the maildir filename. Tests are also added to the test suite to verify the documented behavior.
*	notmuch_message_tags_to_maildir_flags: Do nothing outside of "new" and "cur"	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \|	Some people use notmuch with non-maildir files, (for example, email messages in MH format, or else cool things like using sluk[] to suck down feeds into a format that notmuch can index). To better support uses like that, don't do any renaming for files that are not in a directory named either "new" or "cur". [] https://github.com/krl/sluk/
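A minimal sketch of the directory test described above, assuming '/'-separated paths. The function name is hypothetical and this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the file's immediate parent directory is named exactly
 * "new" or "cur", 0 otherwise. Hypothetical helper for illustration. */
static int
in_new_or_cur (const char *filename)
{
    assert (filename != NULL);

    const char *slash = strrchr (filename, '/');
    if (slash == NULL)
        return 0; /* No directory component at all. */

    /* Walk back to the start of the parent directory's name. */
    const char *dir = slash;
    while (dir > filename && dir[-1] != '/')
        dir--;

    size_t len = (size_t) (slash - dir);
    return len == 3 &&
           (strncmp (dir, "new", 3) == 0 || strncmp (dir, "cur", 3) == 0);
}
```

A caller would skip any renaming when this returns 0, so MH-style stores and feed directories are left untouched.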
*	notmuch_message_tags_to_maildir_flags: Don't exit on failure to rename.	Carl Worth	2010-11-11
\| \| \| \| \| \|	It is totally legitimate for a non-maildir directory to be named "new" (and not have a directory next to it named "cur"). To support this case at least, be silent about any rename failure.
*	notmuch_message_tags_to_maildir_flags: Fix to rename multiple files	Carl Worth	2010-11-11
\| \| \| \| \|	This function was documented as modifying every filename associated with the message. Fix it to actually do that.
*	maildir_flags_to_tags: Avoid interpreting "no info" as "no flags set".	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a filename has no maildir info at all, (that is, it does not contain the sequence ":2,"), we consider this distinct from a filename with an empty maildir info, (the ":2," separator is present, but no flags characters follow). Specifically, we regard a missing info field as providing no information, so tags will remain unchanged. On the other hand, an info field that is present but has no flags set will cause various tags to be cleared, (or in the case of "unread", added). This fixes the "remove info" case of the maildir-sync tests in the test suite.
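The distinction between "no info" and "empty info" can be sketched as follows. The helper names and the enum are hypothetical; the only maildir facts assumed are the ":2," separator named above and 'S' as the "seen" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome for the "unread" tag. */
typedef enum {
    UNREAD_UNCHANGED, /* no maildir info: the file tells us nothing */
    UNREAD_ADD,       /* info present, 'S' (seen) absent */
    UNREAD_REMOVE     /* info present, 'S' (seen) set */
} unread_action_t;

/* Return the flag characters after ":2," in the basename, or NULL if the
 * basename has no info field. An empty string means "info present, no
 * flags set", which is deliberately distinct from NULL. */
static const char *
maildir_flags (const char *filename)
{
    assert (filename != NULL);

    const char *slash = strrchr (filename, '/');
    const char *base = slash ? slash + 1 : filename;
    const char *info = strstr (base, ":2,");

    return info ? info + 3 : NULL;
}

static unread_action_t
unread_action (const char *filename)
{
    const char *flags = maildir_flags (filename);

    if (flags == NULL)
        return UNREAD_UNCHANGED;
    return strchr (flags, 'S') ? UNREAD_REMOVE : UNREAD_ADD;
}
```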
*	Fix notmuch_message_tags_to_maildir_flags to effect rename immediately	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have tests to ensure that when the notmuch library renames a file that that rename takes place immediately in the database, (without requiring something like "notmuch new" to notice the change). This was working when the code was first added, but recently broke in the reworking of the maildir-synchronization interface since the tags_to_maildir_flags function can no longer assume that it is being called as part of _notmuch_message_sync. Fortunately, the fix is as simple as adding an explicit call to _notmuch_message_sync.
*	Fix notmuch_message_maildir_flags_to_tags to iterate over filenames	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \|	As documented, this function now iterates over all filenames for the message, computing a logical OR of the flags set on the filenames, then uses the final result to set tags on the message. This change fixes 3 of the 10 maildir-sync tests that have been failing since being added.
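The logical OR over all of a message's file names might look like the sketch below. The bit names and functions are hypothetical, and the flag letters are the standard maildir ones (D, F, P, R, S), used here as an assumption about which flags matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for the maildir flags considered here. */
enum {
    FLAG_DRAFT   = 1 << 0, /* 'D' */
    FLAG_FLAGGED = 1 << 1, /* 'F' */
    FLAG_PASSED  = 1 << 2, /* 'P' */
    FLAG_REPLIED = 1 << 3, /* 'R' */
    FLAG_SEEN    = 1 << 4  /* 'S' */
};

/* Flags set on a single file name; 0 if it has no ":2," info field. */
static unsigned
flags_of_filename (const char *filename)
{
    unsigned mask = 0;
    const char *info = strstr (filename, ":2,");

    if (info == NULL)
        return 0;
    for (const char *p = info + 3; *p != '\0'; p++) {
        switch (*p) {
        case 'D': mask |= FLAG_DRAFT;   break;
        case 'F': mask |= FLAG_FLAGGED; break;
        case 'P': mask |= FLAG_PASSED;  break;
        case 'R': mask |= FLAG_REPLIED; break;
        case 'S': mask |= FLAG_SEEN;    break;
        default:  break; /* Unknown flags are ignored in this sketch. */
        }
    }
    return mask;
}

/* OR together the flags of every file that carries this message. */
static unsigned
combined_flags (const char *const *filenames, size_t count)
{
    unsigned mask = 0;

    assert (filenames != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        mask |= flags_of_filename (filenames[i]);
    return mask;
}
```

The combined mask, rather than the flags of any one file, is then what decides which tags to set on the message.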
*	lib: Add new, public notmuch_message_get_filenames	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This augments the existing notmuch_message_get_filename by allowing the caller access to all filenames in the case of multiple files for a single message. To support this, we split the iterator (notmuch_filenames_t) away from the list storage (notmuch_filename_list_t) where previously these were a single object (notmuch_filenames_t). Then, whenever the user asks for a file or filename, the message object lazily creates a complete notmuch_filename_list_t and then: For notmuch_message_get_filename, returns the first filename in the list. For notmuch_message_get_filenames, creates and returns a new iterator for the filename list.
*	lib: Remove the notion of TAGS_INVALID	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \|	This rather ugly hack was recently obviated by the removal of the notmuch_database_set_maildir_sync function. Now, clients must make explicit calls to do any synchronization between maildir flags and tags. So the library no longer needs to worry about doing inconsistent synchronization while a message is only partially added.
*	lib: Rework interface for maildir_flags synchronization	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of having an API for setting a library-wide flag for synchronization (notmuch_database_set_maildir_sync) we instead implement maildir synchronization with two new library functions: notmuch_message_maildir_flags_to_tags and notmuch_message_tags_to_maildir_flags. These functions are nicely documented here, (though the implementation does not quite match the documentation yet---as plainly evidenced by the current results of the test suite).
*	lib: Remove the synchronization of 'T' flag with "deleted" tag.	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tags in a notmuch database affect all messages with the identical message-ID. But maildir flags affect individual files. And since multiple files can contain the identical message-ID, there is not a one-to-one correspondence between messages affected by tags and flags. This is particularly dangerous with the 'T' (== "trashed") maildir flag and the corresponding "deleted" tag in the notmuch database. Since these flags/tags are often used to trigger irreversible deletion operations, the lack of one-to-one correspondence can be potentially dangerous. For example, consider the following sequence: 1. A third-party application is used to identify duplicate messages in the mail store, and mark all-but-one of each duplicate with the 'T' flag for subsequent deletion. 2. A "notmuch new" operation reads that 'T' flag, adding the "deleted" tag to the corresponding messages within the notmuch database. 3. A subsequent notmuch operation, (such as a "notmuch dump; notmuch restore" cycle) synchronizes the "deleted" tag back to the mail store, applying the 'T' flag to all(!) filenames with duplicate message IDs. 4. A third-party application reads the 'T' flags and irreversibly deletes all mail messages which had any duplicates(!). In order to avoid this scenario, we simply refuse to synchronize the 'T' flag with the "deleted" tag. Instead, applications can set 'T' and act on it to delete files, or can set "deleted" and act on it to delete files. But in either case the semantics are clear and there is never dangerous propagation through the one-to-many mapping of notmuch message objects to files.
*	Make maildir synchronization configurable	Michal Sojka	2010-11-10
\| \| \| \| \| \| \|	This adds group [maildir] and key 'synchronize_flags' to the configuration file. Its value enables (true) or disables (false) the synchronization between notmuch tags and maildir flags. By default, the synchronization is disabled.
*	Maildir synchronization	Michal Sojka	2010-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows bi-directional synchronization between maildir flags and certain tags. The flag-to-tag mapping is defined by the flag2tag array. The synchronization works this way: 1) Whenever notmuch new is executed, the following happens: o New messages are tagged with configured new_tags. o For new or renamed messages with maildir info present in the file name, the tags defined in flag2tag are either added or removed depending on the flags from the file name. 2) Whenever notmuch tag (or notmuch restore) is executed, a new set of flags based on the tags is constructed for every message and a new file name is prepared based on the old file name but with the new flags. If the flags differ and the old message was in the 'new' directory then this is replaced with 'cur' in the new file name. If the new and old file names differ, the file is renamed and the notmuch database is updated accordingly. The rename happens before the database is updated. In case of a crash between rename and database update, the next run of notmuch new brings the database in sync with the mail store again.
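Step 2 above (preparing the new file name) can be sketched like this. It is a simplified illustration rather than the patch's code: the function name is hypothetical, the flags are passed in as a ready-made string, and the actual rename and database update are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the file name that `old_name` should have once it carries
 * `new_flags`. If the flags differ and the file sits in "new", the
 * directory becomes "cur". Returns a malloc'ed string the caller must
 * free, or NULL on allocation failure. Hypothetical helper. */
static char *
filename_with_flags (const char *old_name, const char *new_flags)
{
    assert (old_name != NULL && new_flags != NULL);

    const char *info = strstr (old_name, ":2,");
    const char *old_flags = info ? info + 3 : "";
    size_t stem_len = info ? (size_t) (info - old_name) : strlen (old_name);
    int changed = strcmp (old_flags, new_flags) != 0;

    size_t size = changed ? stem_len + 3 + strlen (new_flags) + 1
                          : strlen (old_name) + 1;
    char *result = malloc (size);
    if (result == NULL)
        return NULL;

    if (! changed) {
        /* Same flags: keep the old name, so no rename is needed. */
        memcpy (result, old_name, size);
        return result;
    }

    snprintf (result, size, "%.*s:2,%s", (int) stem_len, old_name, new_flags);

    /* Replace a parent directory named exactly "new" with "cur". */
    char *slash = strrchr (result, '/');
    if (slash != NULL && slash - result >= 3 &&
        strncmp (slash - 3, "new", 3) == 0 &&
        (slash - result == 3 || slash[-4] == '/'))
        memcpy (slash - 3, "cur", 3);

    return result;
}
```

Comparing the returned name with the old one tells the caller whether a rename is needed at all.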
*	lib: Eliminate some redundant includes of xapian.h	Carl Worth	2010-11-01
\| \| \| \| \|	Most files including this already include database-private.h which includes xapian.h already.
*	Avoid database corruption by not adding partially-constructed mail documents.	Carl Worth	2010-06-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we were using Xapian's add_document to allocate document ID values for notmuch_message_t objects. This had the drawback of adding a partially constructed mail document to the database. If notmuch was subsequently interrupted before fully populating this document, then later runs would be quite confused when seeing the partial documents. There are reports from the wild of people hitting internal errors of the form "Message ... has no thread ID" for example, (which is currently an unrecoverable error). We fix this by manually allocating document IDs without adding documents. With this change, we never call Xapian's add_document method, but only replace_document with either the current document ID of a message or a new one that we have allocated.
*	Fix misnamed function in internal documentation.	Carl Worth	2010-06-04
\| \| \| \| \| \|	The documentation for several functions mentioned _notmuch_message_set_sync which doesn't exist. Fix these to reference _notmuch_message_sync instead.
*	Add authors member to message	Dirk Hohndel	2010-04-26
\| \| \| \| \| \| \|	message->authors contains the author's name (as we want to print it). The get / set methods are declared in notmuch-private.h. Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
*	lib: Silence a compiler warning.	Carl Worth	2010-03-09
\| \| \| \| \|	The original code was harmless, but apparently some compilers aren't able to think deep enough to catch that.
*	lib: Rename iterator functions to prepare for reverse iteration.	Carl Worth	2010-03-09
\| \| \| \| \| \| \| \|	We rename 'has_more' to 'valid' so that it can function whether iterating in a forward or reverse direction. We also rename 'advance' to 'move_to_next' to setup parallel naming with the proposed functions 'move_to_first', 'move_to_last', and 'move_to_previous'.
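To illustrate why 'valid' reads naturally in both directions where 'has_more' did not, here is a small stand-alone iterator using the same naming scheme. The type and functions are hypothetical and are not the library's iterators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an array of strings. */
typedef struct {
    const char *const *items;
    size_t count;
    size_t pos;
} string_iter_t;

/* True while the iterator points at an element, whichever way it moves. */
static int
string_iter_valid (const string_iter_t *it)
{
    return it->pos < it->count;
}

static const char *
string_iter_get (const string_iter_t *it)
{
    assert (string_iter_valid (it));
    return it->items[it->pos];
}

static void
string_iter_move_to_first (string_iter_t *it)
{
    it->pos = 0;
}

static void
string_iter_move_to_last (string_iter_t *it)
{
    /* For an empty list this wraps to SIZE_MAX, which is not valid. */
    it->pos = it->count - 1;
}

static void
string_iter_move_to_next (string_iter_t *it)
{
    it->pos++;
}

static void
string_iter_move_to_previous (string_iter_t *it)
{
    /* Stepping back from position 0 wraps to SIZE_MAX, ending the loop. */
    it->pos--;
}
```

A forward loop is then `for (string_iter_move_to_first (&it); string_iter_valid (&it); string_iter_move_to_next (&it))`, and the reverse loop swaps in `move_to_last` and `move_to_previous` with the same `valid` test.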
*	Switch from random to sequential thread identifiers.	Carl Worth	2010-02-09
\| \| \| \| \| \| \| \| \| \| \| \| \|	The sequential identifiers have the advantage of being guaranteed to be unique (until we overflow a 64-bit unsigned integer), and also take up half as much space in the "notmuch search" output (16 columns rather than 32). This change also has the side effect of fixing a bug where notmuch could block on /dev/random at startup (waiting for some entropy to appear). This bug was hit hard by the test suite, (which could easily exhaust the available entropy on common systems---resulting in large delays of the test suite).
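A sequential identifier that fills 16 columns can be produced by printing a 64-bit counter as zero-padded hexadecimal. The sketch below assumes the counter is kept somewhere persistent; the helper name and buffer macro are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 16 hexadecimal digits plus the terminating NUL. */
#define THREAD_ID_BUF_SIZE 17

/* Advance the counter and format it as a 16-column thread ID.
 * `last_thread_id` stands in for a value stored with the database. */
static const char *
next_thread_id (uint64_t *last_thread_id, char buf[THREAD_ID_BUF_SIZE])
{
    assert (last_thread_id != NULL && buf != NULL);

    (*last_thread_id)++;
    snprintf (buf, THREAD_ID_BUF_SIZE, "%016" PRIx64, *last_thread_id);
    return buf;
}
```

No entropy source is involved, which is why this approach cannot block on /dev/random.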
*	lib: Add non-content terms with a WDF value of 0.	Carl Worth	2010-01-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The WDF is the "within-document frequency" value for a particular term. It's intended to provide an indication of how frequent a term is within a document, (for use in computing relevance). Xapian's term generator already computes WDF values when we use that, (which we do for indexing all mail content). We don't use the term generator when adding single terms for things that don't actually appear in the mail document, (such as tags, the filename, etc.). In this case, the WDF value for these terms doesn't matter much. But Xapian's flint backend can be more efficient with changes to terms that don't affect the document "length". So there's a performance advantage for manipulating tags (with the flint backend) if the WDF of these terms is 0.
*	lib: Split the database upgrade into two phases for safer operation.	Carl Worth	2010-01-09
\| \| \| \| \| \| \| \| \|	The first phase copies data from the old format to the new format without deleting anything. This allows an old notmuch to still use the database if the upgrade process gets interrupted. The second phase performs the deletion (after updating the database version number). If the second phase is interrupted, there will be some unused data in the database, but it shouldn't cause any actual harm.
*	lib: Implement versioning in the database and provide upgrade function.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recent support for renames in the database is our first time (since notmuch has had more than a single user) that we have a database format change. To support smooth upgrades we now encode a database format version number in the Xapian metadata. Going forward notmuch will emit a warning if used to read from a database with a newer version than it natively supports, and will refuse to write to a database with a newer version. The library also provides functions to query the database format version (notmuch_database_get_version), to ask if notmuch wants a newer version than that (notmuch_database_needs_upgrade), and to actually perform that upgrade (notmuch_database_upgrade).
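The version policy described above amounts to a small decision table. The following sketch uses hypothetical names and plain integers for the versions; it does not call the library functions listed in the commit message.

```c
#include <stdbool.h>

/* Hypothetical result of checking a database's format version. */
typedef enum {
    ACCESS_OK,
    ACCESS_OK_WITH_WARNING, /* newer database opened read-only */
    ACCESS_REFUSED          /* newer database opened for writing */
} access_t;

/* Decide what to do given the version stored in the database and the
 * newest version this build understands. */
static access_t
check_database_version (unsigned db_version, unsigned supported_version,
                        bool want_write)
{
    if (db_version <= supported_version)
        return ACCESS_OK;
    return want_write ? ACCESS_REFUSED : ACCESS_OK_WITH_WARNING;
}

/* An older database can still be used, but would benefit from an upgrade. */
static bool
database_needs_upgrade (unsigned db_version, unsigned supported_version)
{
    return db_version < supported_version;
}
```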
*	lib: Consolidate checks for read-only database.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \|	Previously, many checks were deep in the library just before a cast operation. These have now been replaced with internal errors and new checks have instead been added at the beginning of all top-level entry points requiring a read-write database. The new checks now also use a single function for checking and printing the error message. This will give us a convenient location to extend the check, (such as based on database version as well).
*	notmuch_message_get_filename: Support old-style filename storage.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \|	When a notmuch database is upgraded to the new database format, (to support file rename and deletion), any message documents corresponding to deleted files will not currently be upgraded. This means that a search matching these documents will find no filenames in the expected place. Go ahead and return the filename as originally stored, (rather than aborting with an internal error), in this case.
*	lib: Implement new notmuch_directory_t API.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	This new directory object provides all the infrastructure needed to detect when files or directories are deleted or renamed. There's still code needed on top of this (within "notmuch new") to actually do that detection.
*	database: Abstract _filename_to_direntry from _add_message	Carl Worth	2010-01-06
\| \| \| \| \| \|	The code to map a filename to a direntry is something that we're going to want in a future _remove_message function, so put it in a new function _notmuch_database_filename_to_direntry .
*	database: Allowing storing multiple filenames for a single message ID.	Carl Worth	2010-01-06
\| \| \| \| \| \|	The library interface is unchanged so far, (still just notmuch_database_add_message), but internally, the old _set_filename function is now _add_filename instead.
*	database: Store mail filename as a new 'direntry' term, not as 'data'.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \|	Instead of storing the complete message filename in the data portion of a mail document we now store a 'direntry' term that contains the document ID of a directory document and also the basename of the message filename within that directory. This will allow us to easily store multiple filenames for a single message, and will also allow us to find mail documents for files that previously existed in a directory but that have since been deleted.
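A 'direntry' value combining a directory's document ID with a basename could be built as below. The "<id>:<basename>" layout and the function name are assumptions made for this sketch; the commit message only states which two pieces the term contains.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "<directory_id>:<basename>" into `buf`.
 * Returns 0 on success, -1 if the result does not fit in `size` bytes.
 * The separator and layout are assumptions for illustration. */
static int
make_direntry (unsigned directory_id, const char *path,
               char *buf, size_t size)
{
    assert (path != NULL && buf != NULL);

    const char *slash = strrchr (path, '/');
    const char *basename = slash ? slash + 1 : path;
    int n = snprintf (buf, size, "%u:%s", directory_id, basename);

    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Because the directory part is stored once as its own document, a message with several files only needs one short term per file.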
*	lib: Abstract the extraction of a relative path from set_filename	Carl Worth	2010-01-06
\| \| \| \| \| \|	We'll soon be having multiple entry points that accept a filename path, so we want common code for getting a relative path from a potentially absolute path.