notmuch - thread-based email index, search and tagging

	Commit message	Author	Age
*	lib: Implement versioning in the database and provide upgrade function.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recent support for renames in the database is our first time (since notmuch has had more than a single user) that we have a database format change. To support smooth upgrades we now encode a database format version number in the Xapian metadata. Going forward notmuch will emit a warning if used to read from a database with a newer version than it natively supports, and will refuse to write to a database with a newer version. The library also provides a function to query the database format version (notmuch_database_get_version), a function to ask if notmuch wants a newer version than that (notmuch_database_needs_upgrade), and a function to actually perform that upgrade (notmuch_database_upgrade).
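The read/write policy described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: every name prefixed with sketch_ or SKETCH_ is hypothetical, the supported version number is made up, and none of this is the notmuch library API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: the newest database format this build understands. */
#define SKETCH_SUPPORTED_VERSION 1u

typedef enum {
    SKETCH_MODE_READ_ONLY,
    SKETCH_MODE_READ_WRITE
} sketch_mode_t;

typedef enum {
    SKETCH_OPEN_OK,              /* database version is supported */
    SKETCH_OPEN_OK_WITH_WARNING, /* newer database, reading allowed */
    SKETCH_OPEN_REFUSED          /* newer database, writing refused */
} sketch_open_result_t;

/* Decide whether a database with the given format version may be
 * opened in the requested mode. */
sketch_open_result_t
sketch_check_version (unsigned int db_version, sketch_mode_t mode)
{
    assert (mode == SKETCH_MODE_READ_ONLY || mode == SKETCH_MODE_READ_WRITE);

    if (db_version <= SKETCH_SUPPORTED_VERSION)
        return SKETCH_OPEN_OK;

    if (mode == SKETCH_MODE_READ_WRITE) {
        fprintf (stderr, "Error: database version %u is newer than "
                 "supported version %u; refusing to write.\n",
                 db_version, SKETCH_SUPPORTED_VERSION);
        return SKETCH_OPEN_REFUSED;
    }

    fprintf (stderr, "Warning: database version %u is newer than "
             "supported version %u.\n",
             db_version, SKETCH_SUPPORTED_VERSION);
    return SKETCH_OPEN_OK_WITH_WARNING;
}

/* An older database is a candidate for an upgrade. */
int
sketch_needs_upgrade (unsigned int db_version)
{
    return db_version < SKETCH_SUPPORTED_VERSION;
}
```

Keeping the policy in one function means a read-only client still works against a newer database, while a read-write client cannot write to a format it does not fully understand.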
*	notmuch new: Fix deletion support to recurse on removed directories.	Carl Worth	2010-01-07
\| \| \| \| \| \| \|	Previously, when notmuch detected that a directory had been deleted it was only removing files immediately in that directory. We now correctly recurse to also remove any directories (and files, etc.) within sub-directories, etc.
*	TODO: Add a couple of ideas that came up during recent coding.	Carl Worth	2010-01-07
\| \| \| \| \|	The notmuch_query_count_messages function duplicates a lot of code undesirably.
*	Prefer READ_ONLY consistently over READONLY.	Carl Worth	2010-01-07
\| \| \| \| \| \|	Previously we had NOTMUCH_DATABASE_MODE_READ_ONLY but NOTMUCH_STATUS_READONLY_DATABASE which was ugly and confusing. Rename the latter to NOTMUCH_STATUS_READ_ONLY_DATABASE for consistency.
*	lib: Consolidate checks for read-only database.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \|	Previously, many checks were deep in the library just before a cast operation. These have now been replaced with internal errors and new checks have instead been added at the beginning of all top-level entry points requiring a read-write database. The new checks now also use a single function for checking and printing the error message. This will give us a convenient location to extend the check, (such as based on database version as well).
*	lib: Clarify internal documentation of _notmuch_database_filename_to_direntry	Carl Worth	2010-01-07
\| \| \| \| \| \| \|	The original wording made it sound like this function was just doing some string manipulation. But this function actually creates new directory documents as a side effect. So make that explicit in its documentation.
*	notmuch_message_get_filename: Support old-style filename storage.	Carl Worth	2010-01-07
\| \| \| \| \| \| \| \| \| \| \|	When a notmuch database is upgraded to the new database format, (to support file rename and deletion), any message documents corresponding to deleted files will not currently be upgraded. This means that a search matching these documents will find no filenames in the expected place. Go ahead and return the filename as originally stored, (rather than aborting with an internal error), in this case.
*	notmuch new: Never ask the database for any names from a new directory.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we know that we are adding a new directory to the database, (and we therefore are using inode rather than strcmp-based sorting of the filenames), then we never want to see any names from the database. If we got any names, they could only make us inadvertently remove files that we just added. It's also not obvious from the Xapian documentation whether new terms being added as part of new documents will appear in the in-progress all-terms iteration we are using, (and this might differ based on Xapian backend and also might differ based on how many new directories are added and whether a flush threshold is reached). For all of these reasons, we play it safe and use NULL rather than a real notmuch_filenames_t iterator in this case to avoid any problem.
*	lib: Treat NULL as a valid (and empty) notmuch_filenames_t iterator.	Carl Worth	2010-01-06
\| \| \| \| \|	This will be convenient to avoid some special-casing in higher-level code.
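A minimal sketch of the idea, assuming a simple array-backed iterator: the sketch_filenames_t type and its functions are hypothetical stand-ins for illustration, not the library's notmuch_filenames_t implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-backed iterator over filenames. */
typedef struct {
    const char **names;
    size_t count;
    size_t pos;
} sketch_filenames_t;

/* NULL is accepted and behaves like an iterator that is already
 * exhausted. */
int
sketch_filenames_has_more (const sketch_filenames_t *it)
{
    if (it == NULL)
        return 0;
    assert (it->pos <= it->count); /* internal invariant */
    return it->pos < it->count;
}

/* Returns the current name, or NULL if there is none. */
const char *
sketch_filenames_get (const sketch_filenames_t *it)
{
    if (! sketch_filenames_has_more (it))
        return NULL;
    return it->names[it->pos];
}

void
sketch_filenames_advance (sketch_filenames_t *it)
{
    if (sketch_filenames_has_more (it))
        it->pos++;
}

/* Higher-level code can loop without a NULL special case. */
size_t
sketch_count_names (sketch_filenames_t *it)
{
    size_t n = 0;

    for (; sketch_filenames_has_more (it); sketch_filenames_advance (it))
        n++;

    return n;
}
```

With this convention a caller that has no database names to compare against can simply pass NULL and reuse the same loop.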
*	notmuch new: Fix bug resulting in file removal on initial build of database.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bug here was that we would see that the database did not know anything about a directory so would get results from the filesystem in inode rather than strcmp order. However, we wouldn't actually ask for the list of files from the database until after recursing into the sub-directories. So by the time we traverse the filenames looking for deletions, the database does have entries and we end up detecting erroneous deletions because our filename list from the filesystem isn't in strcmp order. So ask for the list of names from the database before doing any additions to avoid this problem.
*	notmuch new: Fix to detect deletions of names at the end of the list.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \|	Previously we only scanned the list of filenames in the filesystem and detected a deletion whenever that scan skipped a name that existed in the database. That much was fine, but we also need to continue walking the list of names from the database when the filesystem list is exhausted. Without this, removing the last file or directory within any particular directory would go undetected.
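The scan described here is essentially a merge walk over two sorted lists. The following is a sketch under the assumption that both lists are plain arrays in strcmp order; sketch_find_deleted is a hypothetical helper, not code from notmuch-new.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Collect every name in db[] that is missing from fs[].
 * Both arrays must be sorted in strcmp order. deleted[] must have
 * room for db_count entries. Returns the number of names stored. */
size_t
sketch_find_deleted (const char **fs, size_t fs_count,
                     const char **db, size_t db_count,
                     const char **deleted)
{
    size_t i = 0, j = 0, n = 0;

    assert (deleted != NULL || db_count == 0);

    while (i < fs_count && j < db_count) {
        int cmp = strcmp (fs[i], db[j]);

        if (cmp == 0) {
            /* Present in both: nothing to do. */
            i++;
            j++;
        } else if (cmp < 0) {
            /* Only in the filesystem: a new file, not a deletion. */
            i++;
        } else {
            /* The filesystem scan skipped a database name. */
            deleted[n++] = db[j++];
        }
    }

    /* The filesystem list is exhausted, so every remaining database
     * name has been deleted. Without this loop, removing the last
     * entry in a directory would go undetected. */
    while (j < db_count)
        deleted[n++] = db[j++];

    return n;
}
```

For example, with filesystem names {"a", "c"} and database names {"a", "b", "c", "d"}, the first loop reports "b" and the trailing loop reports "d".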
*	notmuch new: Fix regression preventing addition of symlinked mail files.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \|	As described in the previous commit message, we introduced multiple symlink-based regressions in commit 3df737bc4addfce71c647792ee668725e5221a98. Here, we fix the case of symlinks to regular files by doing an extra stat of any DT_LNK files to determine if they do, in fact, link to regular files.
*	notmuch new: Fix regression preventing recursion through symlinks.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 3df737bc4addfce71c647792ee668725e5221a98 we switched from using stat() to using the d_type field in the result of scandir() to determine whether a filename is a regular file or a directory. This change introduced a regression in that the recursion would no longer traverse through a symlink to a directory. (Since stat() would resolve the symlink but with scandir() we see a distinct DT_LNK value in d_type). We fix this for directories by allowing both DT_DIR and DT_LNK values to recurse, and then downgrading the existing not-a-directory check within the recursion to not be an error. We also add a new not-a-directory check outside the recursion that is an error.
*	Fix typo in comment.	Carl Worth	2010-01-06
\| \| \| \|	The difference between "now" and "not" ends up being fairly dramatic.
*	notmuch new: Print counts of deleted and renamed messages.	Carl Worth	2010-01-06
\| \| \| \| \|	It's nice to be able to see a report indicating that the recently added support for detecting file rename and deletion is working.
*	lib: Indicate whether notmuch_database_remove_message removed anything.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	Similar to the return value of notmuch_database_add_message, we now enhance the return value of notmuch_database_remove_message to indicate whether the message document was entirely removed (SUCCESS) or whether only this filename was removed and the document exists under other filenames (DUPLICATE_MESSAGE_ID).
*	lib: Update documentation of notmuch_database_add_message.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	Previously, adding a filename with the same message ID as an existing message would do nothing. But we recently fixed this to instead add the new filename to the existing message document. So update the documentation to match now.
*	Index content from citations and signatures.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	In the presentation we often omit citations and signatures, but this is not content that should be omitted from the index, (especially when the citation detection is wrong---see cases where a line beginning with "From" is corrupted to ">From" by mail processing tools).
*	notmuch new: Proper support for renamed and deleted files.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "notmuch new" command will now efficiently notice if any files or directories have been removed from the mail store and will appropriately update its database. Any given mail message (as determined by the message ID) may have multiple corresponding filenames, and notmuch will return one of them. When a file is deleted, the corresponding filename will be removed from the message in the database. When the last filename is removed from a message, that message will be entirely removed from the database. All file additions are handled before any file removals so that rename is supported properly.
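To see why additions must come before removals, consider a toy in-memory model of one message and its filenames. Everything here is hypothetical (the sketch_ types, the fixed-size array, the SKETCH_NOT_FOUND status); it only mirrors the SUCCESS and DUPLICATE_MESSAGE_ID meanings described in this log, not the library's actual code.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_NAMES 8

typedef enum {
    SKETCH_SUCCESS,              /* add: first filename; remove: message gone */
    SKETCH_DUPLICATE_MESSAGE_ID, /* message exists under other filenames */
    SKETCH_NOT_FOUND             /* hypothetical: filename was not stored */
} sketch_status_t;

/* One message, identified elsewhere by its message ID. A count of
 * zero means the message is not in the database. */
typedef struct {
    const char *names[SKETCH_MAX_NAMES];
    int count;
} sketch_message_t;

sketch_status_t
sketch_add_filename (sketch_message_t *m, const char *name)
{
    assert (m->count < SKETCH_MAX_NAMES);
    m->names[m->count++] = name;
    return m->count == 1 ? SKETCH_SUCCESS : SKETCH_DUPLICATE_MESSAGE_ID;
}

sketch_status_t
sketch_remove_filename (sketch_message_t *m, const char *name)
{
    for (int i = 0; i < m->count; i++) {
        if (strcmp (m->names[i], name) == 0) {
            /* Fill the hole with the last stored name. */
            m->names[i] = m->names[--m->count];
            return m->count == 0 ? SKETCH_SUCCESS
                                 : SKETCH_DUPLICATE_MESSAGE_ID;
        }
    }
    return SKETCH_NOT_FOUND;
}
```

A rename from "cur/old" to "cur/new" handled as add-then-remove never drops the filename count to zero, so the message survives. Handled as remove-then-add, the message would be removed entirely and then re-created as new.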
*	notmuch new: Store detected removed filenames for later processing.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \|	It is essential to defer the actual removal of any filenames from the database until we are entirely done adding any new files. This is to avoid any information loss from the database in the case of a renamed file or directory. Note that we're still not actually doing any removal---still just printing messages indicating the filenames that were detected as removed. But we're at least now printing those messages at a time when we actually can do the actual removal.
*	notmuch new: Detect deleted (renamed) files and directories.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This takes advantage of the notmuch_directory_t interfaces added recently (with corresponding storage of directory documents in the database) to detect when files or entire directories are deleted or renamed within the mail store. This also fixes the recent regression where all files would be processed by every run of "notmuch new", (now only new files are processed once again). The deleted files and directories are only detected so far. They aren't properly removed from the database.
*	add_files_recursive: Make the maildir detection more efficient.	Carl Worth	2010-01-06
\| \| \| \| \| \|	Previously, we were re-scanning the entire list of entries for every directory entry. Instead, we can simply check if the entries look like a maildir once, up-front.
*	add_files_recursive: Separate scanning for directories and files for legibility.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \|	We now do two scans over the entries returned from scandir. The first scan is looking for directories (and making the recursive call). The second scan is looking for new files to add to the database. This is easier to read than the previous code which had a single loop and some if statements with ridiculously long bodies. It also has the advantage that once the directory scan is complete we can do a single comparison of the filesystem and database mtimes and entirely skip the second scan if it's not needed.
*	add_files_recursive: Use consistent naming for array and count variables.	Carl Worth	2010-01-06
\| \| \| \| \| \|	Previously we had an array named "namelist" and its count named "num_entries". We now use an array name of "fs_entries" and a count named "num_fs_entries" to try to preserve sanity.
*	notmuch new: Remove an unnecessary stat of every regular file in the mail store.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were previously using the stat for two reasons. One was to obtain the mtime of the file. This usage was removed in the previous commit, (since the mtime is unreliable in the case of a file being moved into the mail store). The second reason was to identify regular and directory file types. But this information is already available in the result we get from scandir. What's left is simply a stat for each directory in the mailstore, (which we are still using to compare filesystem mtime with the mtime stored in the database).
*	notmuch new: Eliminate the check on the mtime of regular files before adding.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This check was buggy in that moving a pre-existing file into the mail store, (where the file existed before the last run of "notmuch new"), does not update the mtime of the file. So the message would never be added to the database. The fix here is not practical in the long run, (since it causes all files in the mail store to be processed in every run of "notmuch new" (!)). But this change will let us drop a stat() call that we don't otherwise need and will help move us toward proper database-backed detection of new files, (which will fix the bug without the performance impact of the current fix).
*	notmuch new: Fix internal documentation of add_files_recursive.	Carl Worth	2010-01-06
\| \| \| \| \| \|	To make it more clear that the mtime of a directory does not affect whether further sub-directories are examined, (they are examined unconditionally).
*	notmuch new: Rename the various timestamp variables to be more clear.	Carl Worth	2010-01-06
\| \| \| \| \| \|	The previous name of "path_mtime" was very ambiguous. The new names are much more obvious (fs_mtime is the mtime from the filesystem and db_mtime is the mtime from the database).
*	notmuch new: Avoid updating directory timestamp if interrupted.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	This was a very dangerous bug. An interrupted "notmuch new" session would still update the timestamp for the directory in the database. This would result in mail files that were not processed due to the original interruption never being picked up by future runs of "notmuch new". Yikes!
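The shape of the fix can be sketched as follows, assuming an interruption flag set by a signal handler. The flag, the handler, and sketch_scan_directory are hypothetical names for illustration, and the loop body stands in for the real work of adding a file.

```c
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Set asynchronously when the user interrupts the run. */
volatile sig_atomic_t sketch_interrupted = 0;

void
sketch_handle_sigint (int sig)
{
    (void) sig;
    sketch_interrupted = 1;
}

/* Hypothetical directory record holding the mtime stored in the
 * database. */
typedef struct {
    time_t db_mtime;
} sketch_directory_t;

/* Process up to file_count files and return how many were handled.
 * The stored mtime is advanced only if the scan ran to completion. */
int
sketch_scan_directory (sketch_directory_t *dir, time_t fs_mtime,
                       int file_count)
{
    int processed = 0;

    assert (dir != NULL);

    while (processed < file_count && ! sketch_interrupted)
        processed++; /* stand-in for adding one file */

    /* Skipping this on interruption means the next run sees a stale
     * db_mtime and rescans the directory. */
    if (! sketch_interrupted)
        dir->db_mtime = fs_mtime;

    return processed;
}
```

The cost of leaving the timestamp untouched is one redundant rescan, which is far cheaper than silently missing mail.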
*	notmuch-new: Remove dead add_files_callback code.	Carl Worth	2010-01-06
\| \| \| \|	Always satisfying to delete code (even if tiny).
*	Make the add_files function static within notmuch-new.c.	Carl Worth	2010-01-06
\| \| \| \| \|	No other files need this function so we don't need it exported in notmuch-client.h.
*	Makefiles: Use .DEFAULT to support arbitrary targets from sub directories.	Carl Worth	2010-01-06
\| \| \| \| \|	Taking advantage of the .DEFAULT construct means that we won't need to explicitly list targets such as "clean", etc. in each sub-Makefile.
*	Add missing comment for NOTMUCH_STATUS_READONLY_DATABASE.	Carl Worth	2010-01-06
\| \| \| \|	And adjust the string representation of the same to match.
*	lib: Implement new notmuch_directory_t API.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	This new directory object provides all the infrastructure needed to detect when files or directories are deleted or renamed. There's still code needed on top of this (within "notmuch new") to actually do that detection.
*	Revamp the proposed directory-tracking API slightly.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	This commit contains my changes to the API proposed by Keith. Nothing is dramatically different. There are minor things like changing notmuch_files_t to notmuch_filenames_t and then various things needed for completeness as noticed while implementing this, (such as notmuch_directory_destroy and notmuch_directory_set_mtime).
*	Prototypes for directory tracking	Keith Packard	2010-01-06
\| \| \| \| \|	There's no functionality here yet---just a sketch of what the interface could look like.
*	database: Add new, public notmuch_database_remove_message	Carl Worth	2010-01-06
\| \| \| \| \| \|	This will allow applications to support the removal of messages, (such as when a file is deleted from the mail store). No removal support is provided yet in commands such as "notmuch new".
*	database: Add new find_doc_ids_for_term interface.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	The existing find_doc_ids function is convenient when the caller doesn't want to be bothered constructing a term. But when the caller does have the term already, that interface is just wasteful. So we export a lower-level interface that maps a pre-constructed term to a document-ID iterator.
*	database: Make find_unique_doc_id enforce uniqueness (for a debug build)	Carl Worth	2010-01-06
\| \| \| \| \|	Catching any violation of this uniqueness constraint is very much in line with similar, existing INTERNAL_ERROR cases.
*	database: Abstract _filename_to_direntry from _add_message	Carl Worth	2010-01-06
\| \| \| \| \| \|	The code to map a filename to a direntry is something that we're going to want in a future _remove_message function, so put it in a new function, _notmuch_database_filename_to_direntry.
*	database: Allowing storing multiple filenames for a single message ID.	Carl Worth	2010-01-06
\| \| \| \| \| \|	The library interface is unchanged so far, (still just notmuch_database_add_message), but internally, the old _set_filename function is now _add_filename instead.
*	database: Store mail filename as a new 'direntry' term, not as 'data'.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \|	Instead of storing the complete message filename in the data portion of a mail document we now store a 'direntry' term that contains the document ID of a directory document and also the basename of the message filename within that directory. This will allow us to easily store multiple filenames for a single message, and will also allow us to find mail documents for files that previously existed in a directory but that have since been deleted.
*	database: Split _find_parent_id into _split_path and _find_directory_id	Carl Worth	2010-01-06
\| \| \| \| \| \|	Some pending commits want the _split_path functionality separate from mapping a directory to a document ID. The split_path function now returns the basename as well as the directory name.
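Splitting a path into a directory name and a basename might look like the sketch below. It assumes a relative path with no trailing slash; sketch_split_path is a hypothetical helper and not the library's internal _split_path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split path at its last '/'.
 * *dir_out receives a newly allocated copy of the directory part
 * (empty if path has no '/'); the caller frees it.
 * *base_out points at the basename inside path itself.
 * Returns 0 on success, -1 if allocation fails. */
int
sketch_split_path (const char *path, char **dir_out, const char **base_out)
{
    const char *slash;
    size_t dir_len;

    assert (path != NULL && dir_out != NULL && base_out != NULL);

    slash = strrchr (path, '/');
    dir_len = slash ? (size_t) (slash - path) : 0;

    *dir_out = malloc (dir_len + 1);
    if (*dir_out == NULL)
        return -1;

    memcpy (*dir_out, path, dir_len);
    (*dir_out)[dir_len] = '\0';

    *base_out = slash ? slash + 1 : path;

    return 0;
}
```

Given "inbox/cur/msg1" this yields the directory "inbox/cur" and the basename "msg1". The directory half is what would be mapped to a directory document, and the basename is what a direntry term would carry.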
*	database: Store directory path in 'data' of directory documents.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	We're planning to have mail documents refer to directory documents for the path of the containing directory. To support this, we need the path in the data, (since the path in the 'directory' term can be irretrievable as it will be the SHA1 sum of the path for a very long path).
*	database: Export _notmuch_database_find_parent_id for internal use.	Carl Worth	2010-01-06
\| \| \| \| \| \|	We'll soon have mail documents referring to their parent directory's directory documents, so we'll need access to _find_parent_id in files such as message.cc.
*	database: Store the parent ID for each directory document.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	Storing the document ID of the parent of each directory document will allow us to find all child-directory documents for a given directory document. We will need this in order to detect directories that have been removed from the mail store, (though we aren't yet doing this).
*	database: Rename internal directory value from XTIMESTAMP to XDIRECTORY.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	The recent change from storing absolute paths to relative paths means that new directory documents will already be created, (and the old ones will just linger stale in the database). Given that, we might as well put a clean name on the term in the new documents, (and no real flag day is needed).
*	database: Store directory paths as relative, not absolute.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	We were already storing relative mail filenames, so this is consistent with that. Additionally, it means that directory documents remain valid even if the database is relocated within its containing filesystem.
*	lib: Document that the filename is stored in the 'data' of a mail document	Carl Worth	2010-01-06
\| \| \| \| \| \|	Our database schema documentation previously didn't give any indication of where this most essential piece of information is stored.
*	lib: Rename set/get_timestamp to set/get_directory_mtime.	Carl Worth	2010-01-06
\| \| \| \| \| \|	I've been suitably scolded by Keith for doing a premature generalization that ended up just making the documentation more convoluted. Fix that.