notmuch - thread-based email index, search and tagging

	Commit message	Author	Age
...
*	new: read db_files and db_subdirs only if mtime changed	Karel Zak	2011-03-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The db_files and db_subdirs are unnecessary for unchanged directories. maildir with 10000 e-mails: old version: $ time ./notmuch new No new mail. real 0m0.053s user 0m0.028s sys 0m0.026s new version: $ time ./notmuch new No new mail. real 0m0.032s user 0m0.009s sys 0m0.023s Signed-off-by: Karel Zak <kzak@redhat.com> Reviewed-by: Austin Clements <amdragon@mit.edu> Looks good (faster than, but provably equivalent to the original code! notmuch_directory_get_child_* are side-effect free, db_files/db_subdirs aren't used between where they were set in the old code and where they are set in the new code, and db_files/db_subdirs are initialized to NULL when declared). Another timing data point: Old code: ./notmuch new 0.77s user 0.28s system 99% cpu 1.051 total New code: ./notmuch new 0.09s user 0.27s system 98% cpu 0.368 total
*	new: Print progress estimates only when we have sufficient information	Michal Sojka	2011-01-26
\| \| \| \| \| \| \| \| \| \|	Without this patch, the remaining time or processing rate could be calculated just after start, when nothing had been processed yet. This resulted in division by a very small number (or zero) and the printed information was of little value. Instead of printing nonsense, we print only that the operation is in progress. The estimates will be printed later, after there is enough data.
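A minimal sketch of the guard this commit describes, not the actual notmuch code: the rate and remaining time are computed only once some files have been processed and some time has elapsed. The function name, the struct, and the one-second threshold are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical progress estimate; names and threshold are illustrative. */
typedef struct {
    double rate;              /* files per second */
    double seconds_remaining; /* estimated time left */
} progress_estimate_t;

/* Fill *est and return true only when enough data exists for a
 * meaningful estimate. Otherwise return false so the caller can print
 * a plain "in progress" message instead. */
static bool
estimate_progress (int processed, int total, double elapsed_seconds,
                   progress_estimate_t *est)
{
    const double min_elapsed = 1.0; /* assumed threshold, in seconds */

    /* Avoid dividing by zero or by a very small number. */
    if (processed <= 0 || elapsed_seconds < min_elapsed)
        return false;

    est->rate = processed / elapsed_seconds;
    est->seconds_remaining = (total - processed) / est->rate;
    return true;
}
```

A caller would print the rate and remaining time when the function returns true, and fall back to a bare progress message otherwise.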
*	new: Enhance progress reporting	Michal Sojka	2011-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	notmuch new reports progress only during the "first" phase when the files on disk are traversed and indexed. After this phase, other operations like rename detection and maildir flags synchronization are performed, but the user is not informed about them. Since these operations can take significant time, we want to inform the user about them. This patch enhances the progress reporting facility that was already present. The timer that triggers reporting is not stopped after the first phase but continues to run until all operations are finished. The rename detection and maildir flag synchronization are enhanced to report their progress.
*	new: Add all initial tags at once	Michal Sojka	2011-01-26
\| \| \| \| \| \| \| \|	If there are several tags applied to the new messages, it is beneficial to store them to the database at once, because it saves some time, especially when notmuch new is run for the first time. This patch decreased the time for initial import from 1h 35m to 1h 14m.
*	Do not defer maildir flag synchronization for new messages	Austin Clements	2011-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a simplified version of a patch originally by Michal Sojka <sojkam1@fel.cvut.cz> which is designed to have the same performance benefits. Michal said the following: When notmuch new is run for the first time, it is not necessary to defer maildir flags synchronization to later because we already know that no files will be removed. Performing the maildir flag synchronization immediately after the message is added to the database has the advantage that the message is likely hot in the disk cache so the synchronization is faster. Additionally, we also save one database query for each message, which must be performed when the operation is deferred. Without this patch, the first notmuch new of 200k messages (3 GB) took 1h and 46m, out of which 20m was maildir flags synchronization. With this patch, the whole operation took only 1h and 36m. Unlike Michal's patch, this version does the deferral for any new message, rather than doing it only on the first run of "notmuch new".
*	notmuch new: Scan directory whenever fs mtime is not equal to db mtime	Carl Worth	2010-12-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we would only scan a directory if the filesystem modification time was strictly newer than the database modification time for the directory. This would cause a problem for systems with an unstable clock, (if a new mail was added to the filesystem, then the system clock rolled backward, "notmuch new" would not find the message until the clock caught up and the directory was modified again). Now, we always scan the directory if the modification time of the directory is not exactly the same between the filesystem and the database. This avoids the problem described above even with an unstable system clock.
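The change in comparison can be sketched as follows. This is an illustration only: the helper name is hypothetical, while fs_mtime and db_mtime follow the variable names mentioned elsewhere in this log.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: decide whether a directory must be rescanned. */
static bool
directory_needs_scan (time_t fs_mtime, time_t db_mtime)
{
    /* The older check was effectively fs_mtime > db_mtime, which misses
     * changes made while the system clock was running behind. Any
     * mismatch now triggers a scan. */
    return fs_mtime != db_mtime;
}
```

With an inequality test, a directory whose filesystem mtime is older than the stored one is still rescanned, which is the unstable-clock case described above.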
*	notmuch new: Defer maildir_flags synchronization until after removals	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a file in the mailstore is renamed, this appears to "notmuch new" as both an added file and a removed file (for the same message). We want the synchronization of the maildir_flags to reflect the final state, (after the rename is complete). Therefore, it's incorrect to perform the synchronization immediately after adding a new file. Instead we queue up these synchronizations (by message ID[]) and perform them after the removals are complete. With this change, the "dump/restore" case of the maildir-sync tests, as well as the recent "remove 'S'" case both now pass where they were failing before. Interestingly, the "remove info" test was passing before, but now fails. This is actually due to a separate bug, (and the bug just fixed was masking it, by preventing the test from performing as desired). [] It's important to queue by message ID---queueing actual message objects does not work since the message objects will retain stale data such as the old filenames.
*	lib: Rework interface for maildir_flags synchronization	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of having an API for setting a library-wide flag for synchronization (notmuch_database_set_maildir_sync) we instead implement maildir synchronization with two new library functions: notmuch_message_maildir_flags_to_tags and notmuch_message_tags_to_maildir_flags These functions are nicely documented here, (though the implementation does not quite match the documentation yet---as plainly evidenced by the current results of the test suite).
*	Avoid abbreviation, preferring notmuch_config_get_maildir_synchronize_flags	Carl Worth	2010-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since the name of the configuration parameter here is: maildir.synchronize_flags the convention is that the functions to get and set this parameter should match it in name. Hence: notmuch_config_get_maildir_synchronize_flags etc. (as opposed to notmuch_config_get_maildir_sync).
*	Make maildir synchronization configurable	Michal Sojka	2010-11-10
\| \| \| \| \| \| \|	This adds group [maildir] and key 'synchronize_flags' to the configuration file. Its value enables (true) or disables (false) the synchronization between notmuch tags and maildir flags. By default, the synchronization is disabled.
*	Maildir synchronization	Michal Sojka	2010-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows bi-directional synchronization between maildir flags and certain tags. The flag-to-tag mapping is defined by the flag2tag array. The synchronization works this way: 1) Whenever notmuch new is executed, the following happens: o New messages are tagged with configured new_tags. o For new or renamed messages with maildir info present in the file name, the tags defined in flag2tag are either added or removed depending on the flags from the file name. 2) Whenever notmuch tag (or notmuch restore) is executed, a new set of flags based on the tags is constructed for every message and a new file name is prepared based on the old file name but with the new flags. If the flags differ and the old message was in the 'new' directory, then this is replaced with 'cur' in the new file name. If the new and old file names differ, the file is renamed and the notmuch database is updated accordingly. The rename happens before the database is updated. In case of a crash between the rename and the database update, the next run of notmuch new brings the database in sync with the mail store again.
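The flag-to-tag direction can be sketched roughly as below. This is a hedged illustration, not the patch itself: the table contents, the struct layout, and both helper names are assumptions, and the real flag2tag array may differ. The sketch assumes the usual maildir convention that flags follow ":2," in the file name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char flag;        /* maildir flag character */
    const char *tag;  /* corresponding tag */
    bool inverse;     /* tag is present when the flag is absent */
} flag_tag_t;

/* Illustrative mapping only; the real flag2tag contents may differ. */
static const flag_tag_t flag2tag[] = {
    { 'D', "draft",   false },
    { 'F', "flagged", false },
    { 'P', "passed",  false },
    { 'R', "replied", false },
    { 'S', "unread",  true  },
};

/* Return the flags portion following ":2,", or NULL when the file
 * name carries no maildir info. */
static const char *
maildir_flags (const char *filename)
{
    const char *info = strstr (filename, ":2,");
    return info ? info + 3 : NULL;
}

/* Decide whether the tag of one table entry should be set, given the
 * flags found in a file name. */
static bool
tag_wanted (const flag_tag_t *entry, const char *flags)
{
    bool flag_set = strchr (flags, entry->flag) != NULL;
    return entry->inverse ? ! flag_set : flag_set;
}
```

For a file name ending in ":2,FS", this sketch would want the "flagged" tag and would not want "unread", since the S (seen) flag is treated as the inverse of that tag.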
*	Sprinkle some const-correctness around new_tags.	Carl Worth	2010-04-23
\| \| \| \|	To eliminate a compiler warning.
*	notmuch-config: make new message tags configurable	Ben Gamari	2010-04-23
\| \| \| \| \| \|	Add a new_tags option in the [messages] section of the configuration file to allow the user to specify which tags should be added to new messages by notmuch new.
*	Prevent data loss caused by SIGINT during notmuch new	Michal Sojka	2010-04-13
\| \| \| \| \| \| \| \| \| \|	When Ctrl-C is pressed at the wrong time during notmuch new, it can lead to removal of messages from the database even if the files were not removed. It happened at least once to me. Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
*	lib: Rename iterator functions to prepare for reverse iteration.	Carl Worth	2010-03-09
\| \| \| \| \| \| \| \|	We rename 'has_more' to 'valid' so that it can function whether iterating in a forward or reverse direction. We also rename 'advance' to 'move_to_next' to setup parallel naming with the proposed functions 'move_to_first', 'move_to_last', and 'move_to_previous'.
*	Fix misspelling of DT_UNKNOWN.	Carl Worth	2010-01-23
\| \| \| \| \|	How foolish of me to advertise the fact that I pushed a commit without compiling it first...
*	Add some comments to document the recently-fixed handling of d_type.	Carl Worth	2010-01-23
\| \| \| \| \|	The fix was subtle, (requiring less code than originally expected), so it behooves us to document it well.
*	notmuch new: Fix to work on filesystems returning DT_UNKNOWN	Geo Carncross	2010-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Such as reiserfs or xfs. This has been broken since the merge of support for rename and deletion of files from the mail store. Here's the original justification for the patch: A review of notmuch-new.c shows three uses of ->d_type: Near line 153, in _entries_resemble_maildir() we can simply allow for DT_UNKNOWN. This would fail if people have MH-style folders which have three folders called "new" "cur" and "tmp", but that seems unlikely, in which case the "tmp" folder would simply not be scanned. Near line 273 in add_files_recursive() we have another check. If DT_UNKNOWN, we fall through, then add_files_recursive() does a stat almost immediately, returning with success if the path isn't a directory. Thus, the fallback is already written. Finally, near line 343, in add_files_recursive() (a long function) we have another check. Here we can simply treat DT_UNKNOWN as DT_LNK, since the logic for the stat() results is the same.
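A rough sketch of the fallback idea, under the assumption that an entry whose d_type is DT_DIR, DT_LNK, or DT_UNKNOWN is a directory candidate and that stat() then settles the question. Both helper names are hypothetical and are not taken from notmuch-new.c.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes the DT_* constants on glibc */
#endif
#include <dirent.h>
#include <stdbool.h>
#include <sys/stat.h>

/* True when a directory entry might be a directory. DT_LNK may point
 * at one, and DT_UNKNOWN means the filesystem did not report a type. */
static bool
entry_may_be_directory (unsigned char d_type)
{
    return d_type == DT_DIR || d_type == DT_LNK || d_type == DT_UNKNOWN;
}

/* Resolve the real type with stat(), which follows symlinks. */
static bool
path_is_directory (const char *path)
{
    struct stat st;

    return stat (path, &st) == 0 && S_ISDIR (st.st_mode);
}
```

Only candidates need the extra stat() call, so filesystems that do report d_type keep the cheaper path for regular files.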
*	notmuch new: Print upgrade progress report as a percentage.	Carl Worth	2010-01-09
\| \| \| \| \| \| \| \| \| \| \| \|	Previously we were printing a number of messages upgraded so far. The original motivation for this was to accurately reflect the fact that there are two passes, (so each message is processed twice and it's not accurate to represent with a single count). But as it turns out, the second pass takes zero time (relatively speaking) so we're still not accounting for it. If nothing else, the percentage-based reporting makes for a cleaner API for the progress_notify function.
*	notmuch new: Don't prevent database upgrade from being interrupted.	Carl Worth	2010-01-08
\| \| \| \| \| \| \| \| \| \|	Our signal handler is designed to quickly flush out changes and then exit. But if a database upgrade is in progress when the user interrupts, then we just want to immediately abort. We could do something fancy like add a return value to our progress_notify function to allow it to tell the upgrade process to abort. But it's actually much cleaner and more robust to delay the installation of our signal handler so that the default abort happens on SIGINT.
*	notmuch new: Automatically upgrade the database if necessary.	Carl Worth	2010-01-07
\| \| \| \| \| \|	This takes advantage of the recently added library support to detect if the database needs to be upgraded and then automatically performs that upgrade, (with a nice progress report).
*	notmuch new: Fix deletion support to recurse on removed directories.	Carl Worth	2010-01-07
\| \| \| \| \| \| \|	Previously, when notmuch detected that a directory had been deleted it was only removing files immediately in that directory. We now correctly recurse to also remove any directories (and files, etc.) within sub-directories, etc.
*	Prefer READ_ONLY consistently over READONLY.	Carl Worth	2010-01-07
\| \| \| \| \| \|	Previously we had NOTMUCH_DATABASE_MODE_READ_ONLY but NOTMUCH_STATUS_READONLY_DATABASE which was ugly and confusing. Rename the latter to NOTMUCH_STATUS_READ_ONLY_DATABASE for consistency.
*	notmuch new: Never ask the database for any names from a new directory.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we know that we are adding a new directory to the database, (and we therefore are using inode rather than strcmp-based sorting of the filenames), then we never want to see any names from the database. If we get any names, that could only make us inadvertently remove files that we just added. It's not obvious from the Xapian documentation whether new terms being added as part of new documents will appear in the in-progress all-terms iteration we are using, (and this might differ based on Xapian backend and also might differ based on how many new directories are added and whether a flush threshold is reached). For all of these reasons, we play it safe and use NULL rather than a real notmuch_filenames_t iterator in this case to avoid any problem.
*	notmuch new: Fix bug resulting in file removal on initial build of database.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bug here was that we would see that the database did not know anything about a directory so would get results from the filesystem in inode rather than strcmp order. However, we wouldn't actually ask for the list of files from the database until after recursing into the sub-directories. So by the time we traverse the filenames looking for deletions, the database does have entries and we end up detecting erroneous deletions because our filename list from the filesystem isn't in strcmp order. So ask for the list of names from the database before doing any additions to avoid this problem.
*	notmuch new: Fix to detect deletions of names at the end of the list.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \|	Previously we only scanned the list of filenames in the filesystem and detected a deletion whenever that scan skipped a name that existed in the database. That much was fine, but we also need to continue walking the list of names from the database when the filesystem list is exhausted. Without this, removing the last file or directory within any particular directory would go undetected.
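The walk described here resembles a merge of two sorted lists. The following sketch assumes both lists are in strcmp order and uses plain string arrays in place of the notmuch iterators; the function name and signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Compare two strcmp-sorted name lists and record the names that are
 * in the database but missing from the filesystem. The caller provides
 * room for num_db entries in removed[]. Returns the number recorded. */
static size_t
find_removed_names (const char **fs_names, size_t num_fs,
                    const char **db_names, size_t num_db,
                    const char **removed)
{
    size_t i = 0, j = 0, n = 0;

    while (i < num_fs && j < num_db) {
        int cmp = strcmp (fs_names[i], db_names[j]);

        if (cmp == 0) {         /* present in both lists */
            i++;
            j++;
        } else if (cmp < 0) {   /* only on the filesystem: a new name */
            i++;
        } else {                /* only in the database: removed */
            removed[n++] = db_names[j++];
        }
    }

    /* The filesystem list is exhausted, so every remaining database
     * name has been removed. Without this loop, deletions at the end
     * of the list go undetected. */
    while (j < num_db)
        removed[n++] = db_names[j++];

    return n;
}
```

The trailing loop corresponds to the fix in this commit: the first loop alone stops as soon as the filesystem list runs out.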
*	notmuch new: Fix regression preventing addition of symlinked mail files.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \|	As described in the previous commit message, we introduced multiple symlink-based regressions in commit 3df737bc4addfce71c647792ee668725e5221a98 Here, we fix the case of symlinks to regular files by doing an extra stat of any DT_LNK files to determine if they do, in fact, link to regular files.
*	notmuch new: Fix regression preventing recursion through symlinks.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 3df737bc4addfce71c647792ee668725e5221a98 we switched from using stat() to using the d_type field in the result of scandir() to determine whether a filename is a regular file or a directory. This change introduced a regression in that the recursion would no longer traverse through a symlink to a directory. (Since stat() would resolve the symlink but with scandir() we see a distinct DT_LNK value in d_type). We fix this for directories by allowing both DT_DIR and DT_LNK values to recurse, and then downgrading the existing not-a-directory check within the recursion to not be an error. We also add a new not-a-directory check outside the recursion that is an error.
*	Fix typo in comment.	Carl Worth	2010-01-06
\| \| \| \|	The difference between "now" and "not" ends up being fairly dramatic.
*	notmuch new: Print counts of deleted and renamed messages.	Carl Worth	2010-01-06
\| \| \| \| \|	It's nice to be able to see a report indicating that the recently added support for detecting file rename and deletion is working.
*	notmuch new: Proper support for renamed and deleted files.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "notmuch new" command will now efficiently notice if any files or directories have been removed from the mail store and will appropriately update its database. Any given mail message (as determined by the message ID) may have multiple corresponding filenames, and notmuch will return one of them. When a file is deleted, the corresponding filename will be removed from the message in the database. When the last filename is removed from a message, that message will be entirely removed from the database. All file additions are handled before any file removals so that rename is supported properly.
*	notmuch new: Store detected removed filenames for later processing.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \|	It is essential to defer the actual removal of any filenames from the database until we are entirely done adding any new files. This is to avoid any information loss from the database in the case of a renamed file or directory. Note that we're still not actually doing any removal---still just printing messages indicating the filenames that were detected as removed. But we're at least now printing those messages at a time when we actually can do the actual removal.
*	notmuch new: Detect deleted (renamed) files and directories.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This takes advantage of the notmuch_directory_t interfaces added recently (with corresponding storage of directory documents in the database) to detect when files or entire directories are deleted or renamed within the mail store. This also fixes the recent regression where all files would be processed by every run of "notmuch new", (now only new files are processed once again). The deleted files and directories are only detected so far. They aren't properly removed from the database.
*	add_files_recursive: Make the maildir detection more efficient.	Carl Worth	2010-01-06
\| \| \| \| \| \|	Previously, we were re-scanning the entire list of entries for every directory entry. Instead, we can simply check if the entries look like a maildir once, up-front.
*	add_files_recursive: Separate scanning for directories and files for legibility.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \|	We now do two scans over the entries returned from scandir. The first scan is looking for directories (and making the recursive call). The second scan is looking for new files to add to the database. This is easier to read than the previous code which had a single loop and some if statements with ridiculously long bodies. It also has the advantage that once the directory scan is complete we can do a single comparison of the filesystem and database mtimes and entirely skip the second scan if it's not needed.
*	add_files_recursive: Use consistent naming for array and count variables.	Carl Worth	2010-01-06
\| \| \| \| \| \|	Previously we had an array named "namelist" and its count named "num_entries". We now use an array name of "fs_entries" and a count named "num_fs_entries" to try to preserve sanity.
*	notmuch new: Remove an unnecessary stat of every regular file in the mail store.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were previously using the stat for two reasons. One was to obtain the mtime of the file. This usage was removed in the previous commit, (since the mtime is unreliable in the case of a file being moved into the mail store). The second reason was to identify regular and directory file types. But this information is already available in the result we get from scandir. What's left is simply a stat for each directory in the mailstore, (which we are still using to compare filesystem mtime with the mtime stored in the database).
*	notmuch new: Eliminate the check on the mtime of regular files before adding.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This check was buggy in that moving a pre-existing file into the mail store, (where the file existed before the last run of "notmuch new"), does not update the mtime of the file. So the message would never be added to the database. The fix here is not practical in the long run, (since it causes all files in the mail store to be processed in every run of "notmuch new" (!)). But this change will let us drop a stat() call that we don't otherwise need and will help move us toward proper database-backed detection of new files, (which will fix the bug without the performance impact of the current fix).
*	notmuch new: Fix internal documentation of add_files_recursive.	Carl Worth	2010-01-06
\| \| \| \| \| \|	To make it more clear that the mtime of a directory does not affect whether further sub-directories are examined, (they are examined unconditionally).
*	notmuch new: Rename the various timestamp variables to be more clear.	Carl Worth	2010-01-06
\| \| \| \| \| \|	The previous name of "path_mtime" was very ambiguous. The new names are much more obvious (fs_mtime is the mtime from the filesystem and db_mtime is the mtime from the database).
*	notmuch new: Avoid updating directory timestamp if interrupted.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \|	This was a very dangerous bug. An interrupted "notmuch new" session would still update the timestamp for the directory in the database. This would result in mail files that were not processed due to the original interruption never being picked up by future runs of "notmuch new". Yikes!
*	notmuch-new: Remove dead add_files_callback code.	Carl Worth	2010-01-06
\| \| \| \|	Always satisfying to delete code (even if tiny).
*	Make the add_files function static within notmuch-new.c.	Carl Worth	2010-01-06
\| \| \| \| \|	No other files need this function so we don't need it exported in notmuch-client.h.
*	lib: Implement new notmuch_directory_t API.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	This new directory object provides all the infrastructure needed to detect when files or directories are deleted or renamed. There's still code needed on top of this (within "notmuch new") to actually do that detection.
*	lib: Rename set/get_timestamp to set/get_directory_mtime.	Carl Worth	2010-01-06
\| \| \| \| \| \|	I've been suitably scolded by Keith for doing a premature generalization that ended up just making the documentation more convoluted. Fix that.
*	notmuch new: Remove hack to ignore read-only directories in mail store.	Carl Worth	2010-01-06
\| \| \| \| \| \| \|	This was really the last thing keeping the initial run of "notmuch new" being different from all other runs. And I'm taking a fresh look at the performance of "notmuch new" anyway, so I think we can safely drop this optimization.
*	notmuch new: Restrict the "not much" pun to the first run.	Carl Worth	2010-01-06
\| \| \| \| \| \| \| \| \|	Several people complained that the humor wore thin very quickly. The most significant case of "not much mail" is when counting the user's initial mail collection. We've promised on the web page that no matter how much mail the user has, notmuch will consider it to be "not much" so let's say so. (This message was in place very early on, but was inadvertently dropped at some point.)
*	Avoid compiler warnings due to ignored write return values	Dirk-Jan C. Binnema	2009-12-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	Glibc (at least) provides the warn_unused_result attribute on write, (if optimizing and _FORTIFY_SOURCE is defined). So we explicitly ignore the return value in our signal handler, where we couldn't do anything anyway. Compile with: make CFLAGS="-O -D_FORTIFY_SOURCE" before this commit to see the warning.
*	notmuch-new: Check for non-fatal errors from stat()	Chris Wilson	2009-11-27
\| \| \| \| \| \| \| \| \| \| \|	Currently we assume that every error from stat() on a dname is fatal (but continue anyway and report the error at the end). However, some errors reported by stat(), such as a missing file or insufficient privilege, can simply be ignored and the file skipped. For the others, such as a fault (unlikely!) or out-of-memory, we handle them like the other fatal errors by jumping to the end. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
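The split between skippable and fatal errors might look like the sketch below. The helper name and the exact set of errno values are assumptions: ENOENT and EACCES stand in for "missing file" and "insufficient privilege", and the commit itself may check a different set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a failed stat() should merely
 * cause the entry to be skipped instead of aborting the scan. */
static bool
stat_error_is_skippable (int err)
{
    switch (err) {
    case ENOENT: /* the file vanished between scandir and stat */
    case EACCES: /* insufficient privilege */
        return true;
    default:     /* EFAULT, ENOMEM and others are treated as fatal */
        return false;
    }
}
```

A caller would pass errno immediately after stat() returns -1, then either continue with the next entry or jump to its cleanup path.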
*	Fix up whitespace styling from previous commit.	Carl Worth	2009-11-27
\| \| \| \| \|	Function names in definitions belong left-aligned. The body of an if statement cannot be on the same line as the "if".