Commit message | Author | Age | |
---|---|---|---|
* | Update scraper rule for heise.de | Frédéric Guillot | 2018-08-25 |
* | Use canonical imports | Frédéric Guillot | 2018-08-24 |
* | Add support for published tag in Atom feeds | neepl | 2018-07-17 |
* | Add embedly.com to iframe whitelist | Frédéric Guillot | 2018-07-10 |
* | New `add_dynamic_image` rewriter for JavaScript-loaded images. | dzaikos | 2018-07-09 |
| | Searches tags for various `data-*` attributes and sets the `img` tag's `src` attribute appropriately. Falls back to searching `noscript` for `img` tags. Includes unit tests. | | |
* | Processor: Do rewriter before sanitizer for `entry.Content`. | dzaikos | 2018-07-06 |
| | Addresses #163. | | |
* | Add support for protocol relative YouTube URLs | Frédéric Guillot | 2018-07-04 |
* | Sandbox iframes when sanitizing. | dzaikos | 2018-07-03 |
| | Updated iframe unit tests. Refactored `sanitizer.getExtraAttributes()` to use `switch` instead of multiple `if` statements. | | |
* | Add specific 404 and 401 error messages | Frédéric Guillot | 2018-06-30 |
* | Refactor AddImageTitle rewriter. | dzaikos | 2018-06-26 |
| | Only processes images with `src` **and** `title` attributes (others are ignored). Processes **all** images in the document (not just the first one). Wraps each image in a `figure` tag, with the title attribute's contents in a `figcaption` tag. Updated the xkcd rewriter unit test and added another xkcd rewriter unit test to check rendering of images without title attributes. | | |
* | Improve sanitizer to remove style tag contents. | dzaikos | 2018-06-24 |
| | See #157. Refactored how blacklisted tags are handled so they're easier to manage in the future. | | |
* | Improve sanitizer to remove script and noscript contents | Dave Z | 2018-06-23 |
| | These tags were removed, but their content was rendered as escaped HTML. See #157. | | |
* | Add new fields for feed username/password | Frédéric Guillot | 2018-06-19 |
* | Rewrite iframe YouTube URLs to https://www.youtube-nocookie.com | Frédéric Guillot | 2018-06-12 |
* | Handle feeds with dates formatted as Unix timestamp | Frédéric Guillot | 2018-05-08 |
* | Add API endpoint to import OPML file | Frédéric Guillot | 2018-04-29 |
* | Move HTTP client to its own package | Frédéric Guillot | 2018-04-28 |
* | Scrape parent element for iframe | aniran | 2018-04-27 |
| | Current behavior: if you have an `iframe` scraper rule, `scrapContent` tries to return the inner HTML of the `iframe`, which turns up blank. New behavior: as with `img` elements, if an `iframe` is matched by a scraper rule, the parent element's inner HTML (i.e. the `iframe` itself) is returned. | | |
* | Add soundcloud and bandcamp iframe sources | aniran | 2018-04-27 |
* | Add support for Dublin Core date in RDF feeds | Frédéric Guillot | 2018-04-10 |
* | Handle some non-English date formats | Frédéric Guillot | 2018-04-09 |
* | Rename RSS parser getters | Frédéric Guillot | 2018-04-09 |
* | Get the right comments URL when having multiple namespaces | Frédéric Guillot | 2018-04-09 |
* | Add unit test for comments URL and French translation | Frédéric Guillot | 2018-04-07 |
* | Add CommentsURL to entry | Ben Brooks | 2018-04-07 |
* | Handle RSS author elements with inner HTML | Frédéric Guillot | 2018-03-18 |
* | Convert enclosure size field to bigint | Frédéric Guillot | 2018-03-14 |
* | Fix broken OPML import with Go 1.10 | Frédéric Guillot | 2018-03-14 |
* | Improve parser error messages | Frédéric Guillot | 2018-02-27 |
* | Support localized feed errors generated by background workers | Frédéric Guillot | 2018-02-27 |
* | Handle Atom feeds with HTML title | Frédéric Guillot | 2018-02-17 |
* | Improve error handling for HTTP client | Frédéric Guillot | 2018-02-08 |
* | Strip invalid XML characters to avoid parsing errors | Frédéric Guillot | 2018-02-07 |
* | Remove period for feed errors | Frédéric Guillot | 2018-02-07 |
* | Improve error handling when the response is empty | Frédéric Guillot | 2018-02-07 |
* | Show API URL endpoints in user interface | Frédéric Guillot | 2018-01-31 |
* | Do not override existing entries when the crawler is enabled | Frédéric Guillot | 2018-01-20 |
* | Handle more encoding edge cases | Frédéric Guillot | 2018-01-20 |
| | Covers three cases: feeds with the charset specified only in the Content-Type header and not in the XML document; feeds with the charset specified in both places; and feeds with the charset specified only in the XML document and not in the HTTP header. | | |
* | Do not crawl existing entry URLs | Frédéric Guillot | 2018-01-20 |
* | Add more comments (GoDoc) | Frédéric Guillot | 2018-01-11 |
* | Add scraper rule for darkreading.com | Frédéric Guillot | 2018-01-06 |
* | Add more scraper rules | Frédéric Guillot | 2018-01-04 |
* | Add content length check when refreshing feeds | Frédéric Guillot | 2018-01-04 |
* | Handle more date formats | Frédéric Guillot | 2018-01-03 |
* | If the website URL is empty, assign the feed URL | Frédéric Guillot | 2018-01-03 |
* | Rename helper packages | Frédéric Guillot | 2018-01-02 |
* | Make sure the scraper parses only HTML documents | Frédéric Guillot | 2018-01-02 |
* | Add scraper rules for version2.dk and ing.dk | Frédéric Guillot | 2017-12-27 |
* | Add more scraper rules | Frédéric Guillot | 2017-12-27 |
* | Add support for data URL favicons | Frédéric Guillot | 2017-12-22 |
| |
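
The "Handle more encoding edge cases" entry (2018-01-20) lists three situations: charset only in the Content-Type header, charset in both places, and charset only in the XML document. The following is a minimal sketch of how such a decision could be made, not the code from that commit. The helper name `detectCharset`, the UTF-8 default, and the choice to let the HTTP header win when both are present are all assumptions made for illustration; the log does not say which source takes precedence. Byte-order marks are not handled.

```go
package main

import (
	"fmt"
	"mime"
	"regexp"
	"strings"
)

// xmlEncodingRe matches the encoding pseudo-attribute of a leading XML declaration.
var xmlEncodingRe = regexp.MustCompile(`^\s*<\?xml[^>]*?\bencoding=["']([A-Za-z0-9._-]+)["']`)

// detectCharset is a hypothetical helper (not taken from the repository).
// It returns a lower-cased charset name for a feed response.
func detectCharset(contentType string, body []byte) string {
	// Header only, or header and XML declaration: the header wins (assumption).
	if _, params, err := mime.ParseMediaType(contentType); err == nil {
		if cs := params["charset"]; cs != "" {
			return strings.ToLower(cs)
		}
	}

	// XML declaration only: inspect the first bytes of the document.
	head := body
	if len(head) > 256 {
		head = head[:256]
	}
	if m := xmlEncodingRe.FindSubmatch(head); m != nil {
		return strings.ToLower(string(m[1]))
	}

	// Neither source names a charset: fall back to UTF-8 (assumption).
	return "utf-8"
}

func main() {
	plain := []byte(`<?xml version="1.0"?><rss/>`)
	declared := []byte(`<?xml version="1.0" encoding="Windows-1252"?><rss/>`)

	fmt.Println(detectCharset("application/rss+xml; charset=ISO-8859-1", plain)) // header only
	fmt.Println(detectCharset("text/xml; charset=UTF-8", declared))              // both places
	fmt.Println(detectCharset("text/xml", declared))                             // XML only
}
```

A real feed reader would then wrap the response body in a decoder for the detected charset before handing it to the XML parser.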