- Dec 21, 2018
-
-
- Dec 17, 2018
-
-
-
When a row has no <td>, TableCell.call_without_colspan returned an empty list, but TableCell.call_with_colspan crashed because it couldn't count the <td> elements. The compatible behavior is to return an empty list as well.
-
It's not exactly a timeout, but for now it seems sufficient to make requests retry. I don't know exactly in what circumstances this error occurs in the wild, but it can tentatively be reproduced experimentally with a very long wait (minutes or hours) before doing the handshake.
-
If "," is the thousands separator, "12,345" should be accepted but not "123,45".
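A minimal sketch of that validation, using only the standard library. The function name `parse_amount` and its parameters are hypothetical and are not the library's API. The pattern also tolerates whitespace between the sign and the digits, e.g. "- 123".

```python
import re
from decimal import Decimal


def parse_amount(text, thousands=",", decimal_point="."):
    """Parse text such as "12,345.67" or "- 123" into a Decimal.

    Hypothetical helper: raises ValueError when the thousands separator
    does not split the digits into groups of exactly three.
    """
    t = re.escape(thousands)
    d = re.escape(decimal_point)
    # Optional sign, optional whitespace after it, then either digits grouped
    # by three with the thousands separator, or a plain run of digits.
    pattern = r"\s*([-+]?)\s*(\d{1,3}(?:%s\d{3})+|\d+)(?:%s(\d+))?\s*" % (t, d)
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError("not a valid number: %r" % text)
    sign, integer, fraction = match.groups()
    digits = integer.replace(thousands, "")
    if fraction:
        digits += "." + fraction
    return Decimal(sign + digits)
```

With "," as the thousands separator, `parse_amount("12,345")` is accepted while `parse_amount("123,45")` raises `ValueError`.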
-
"- 123" should reasonably be accepted.
-
-
- Dec 02, 2018
-
-
The "colspan" attribute enables handling of <td> tags whose "colspan" attribute is higher than 1. These cells occupy more than one slot in the table, creating a column shift that we must handle, otherwise the col_names no longer line up with the column heads.
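The column shift can be illustrated with a small standard-library sketch. `RowParser`, `cells_by_slot` and `cell_at` are hypothetical names and this is not the TableCell implementation; it only shows how each cell's starting slot is derived from the colspans of the cells before it, and that a row without <td> yields an empty list.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect [text, colspan] pairs for every <td> in a row."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # A missing or empty colspan counts as 1.
            colspan = int(dict(attrs).get("colspan") or 1)
            self.cells.append(["", colspan])
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1][0] += data


def cells_by_slot(row_html):
    """Map the starting slot of each cell to its text."""
    parser = RowParser()
    parser.feed(row_html)
    parser.close()
    slots = {}
    position = 0
    for text, colspan in parser.cells:
        slots[position] = text.strip()
        position += colspan  # a wide cell shifts every later cell
    return slots


def cell_at(row_html, slot):
    """Return the cell starting at `slot` as a list, or [] if there is none."""
    slots = cells_by_slot(row_html)
    return [slots[slot]] if slot in slots else []
```

For a row whose first cell has colspan 2, the second cell starts at slot 2 rather than slot 1, which is the shift the column names must account for.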
-
- Nov 10, 2018
-
-
NSS seems to behave differently on different distros, e.g. Debian and CentOS, and the behavior also depends on the NSS version, which creates a whole matrix of cases. Try to force SQL database use for versions >=3.35 and the default (probably DBM) for versions <3.35. NSS might still ask the infamous question: Enter Password or Pin for "NSS Certificate DB": but deleting the old generated *.db files should solve it.
-
Some shit sites like cragr/lcl/bforbank currently cause NSS to have error SEC_ERROR_OCSP_UNKNOWN_CERT, even in Firefox. Since disabling cannot be done per module, just disable it for those dumbasses.
-
-
-
All Page classes have a logger, mimic it.
-
- Oct 11, 2018
-
-
CleanDecimal used to parse "123foo456" as "123456". Now it raises an exception if there are multiple numbers in the parsed text.
-
FilterError is dedicated to filters and thus can be used.
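A sketch of the stricter parsing described for CleanDecimal, assuming a simplified number pattern. The `FilterError` class below is a local stand-in defined only for this example and `clean_decimal` is a hypothetical function; neither is the library's actual code.

```python
import re
from decimal import Decimal


class FilterError(Exception):
    """Local stand-in for the library's filter exception (assumption)."""


# Simplified pattern: optional sign, digits, optional fractional part.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def clean_decimal(text):
    """Return the single number found in `text`, or raise FilterError."""
    numbers = NUMBER.findall(text)
    if len(numbers) != 1:
        # Zero numbers or several numbers: refuse to guess.
        raise FilterError("expected exactly one number in %r" % text)
    return Decimal(numbers[0])
```

Here "123foo456" contains two numbers and is rejected instead of being silently glued into "123456".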
-
-
-
- Oct 09, 2018
-
-
python3 requires str, but python2 requires bytes.
-
- Sep 16, 2018
-
-
-
-
-
Some sites rely on the X.509 certificate extension "AIA" (Authority Information Access) and do not provide a complete certificate chain up to the CA. The AIA extension lets the site define an HTTP URL from which to fetch the parent certificates. When encountering this case, the fetched parent certificate must be checked to really be the parent of the certificate being validated. Also, the parent certificate must chain to one of the trusted CAs.
-
- Aug 18, 2018
-
-
-
This can be useful for responsive sites.
-
Sometimes, nss_get_version will return "3.21.3 Extended ECC" which can't be parsed. Trim junk to be able to parse it.
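Trimming the junk can be sketched with a regular expression that keeps only the leading dotted digits. `parse_nss_version` is a hypothetical helper name; the input string is the one quoted above.

```python
import re


def parse_nss_version(raw):
    """Turn a version string into a tuple of ints, ignoring trailing text."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", raw)
    if match is None:
        raise ValueError("no version number in %r" % raw)
    return tuple(int(part) for part in match.group(1).split("."))
```

Tuples compare element by element, so the result can be checked directly against a threshold such as `(3, 35)`.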
-
- Aug 09, 2018
-
-
- Jul 29, 2018
-
-
With NSS, unlike python sockets, the timeout should be passed on every recv call. But since it's implemented in C, we're forced to reimplement read/readinto/etc. Use io.BufferedRWPair and io.RawIOBase to implement some of them and implement the others by hand.
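A rough sketch of the io.RawIOBase plus io.BufferedRWPair arrangement. `FakeTimeoutSocket` is a hypothetical stand-in for an NSS socket whose `recv` and `send` take a timeout argument; the real NSS binding's signatures are not shown in this log and may differ.

```python
import io


class FakeTimeoutSocket:
    """Hypothetical socket: every recv/send call requires a timeout."""

    def __init__(self, data):
        self._data = data
        self.sent = b""
        self.timeouts = []  # records the timeout passed to each call

    def recv(self, size, timeout):
        self.timeouts.append(timeout)
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk

    def send(self, data, timeout):
        self.timeouts.append(timeout)
        self.sent += data
        return len(data)


class RawSocketIO(io.RawIOBase):
    """Raw stream that forwards the timeout on every socket call."""

    def __init__(self, sock, timeout):
        super().__init__()
        self._sock = sock
        self._timeout = timeout

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, buffer):
        data = self._sock.recv(len(buffer), self._timeout)
        buffer[:len(data)] = data
        return len(data)  # 0 signals end of stream

    def write(self, data):
        return self._sock.send(bytes(data), self._timeout)


def make_file(sock, timeout):
    """Wrap the raw stream so buffered reads and writes come for free."""
    raw = RawSocketIO(sock, timeout)
    return io.BufferedRWPair(raw, raw)
```

Only `readinto` and `write` are written by hand here; the buffered wrapper supplies `read`, `peek`, `flush` and similar methods on top of them.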
-
NSS uses different filenames for its certificate database depending on its version (cert8.db before NSS 3.35, cert9.db from 3.35 on). This filename is checked to determine whether the certificate db must be created, so we need to find the correct filename.
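The filename choice can be sketched as follows, assuming the version is already available as a tuple of ints. `cert_db_filename` and `needs_creation` are hypothetical names, and the 3.35 threshold is taken from the description above.

```python
import os


def cert_db_filename(version):
    """Pick the certificate database filename for an NSS version tuple."""
    return "cert9.db" if version >= (3, 35) else "cert8.db"


def needs_creation(db_dir, version):
    """True when the database file expected for this version is missing."""
    return not os.path.exists(os.path.join(db_dir, cert_db_filename(version)))
```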
-
- Jun 29, 2018
-
-
- Jun 09, 2018
-
-
-
It may be executed multiple times per source page though, since the "page" property generates a Page object on every access. It should therefore be idempotent.
-
-
Since pages can contain a lot of javascript, a URL change does not reliably indicate that the page changed. Use the full page_source instead, and save it when the browser.page attribute is used. Requests can't be saved. page_source contains inline images which can be heavy, so support a configurable size quota.
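One possible shape for such a saver is sketched below. `SourceSaver` is a hypothetical class, and skipping unchanged sources by comparing a hash is an assumption of this sketch, not something stated in the commit; it is one way to keep repeated accesses from writing duplicate files.

```python
import hashlib
import os


class SourceSaver:
    """Save page sources to a directory, within a total size quota."""

    def __init__(self, directory, quota_bytes):
        self.directory = directory
        self.remaining = quota_bytes
        self._last_digest = None
        self._counter = 0

    def save(self, page_source):
        """Write the source to a new file; return its path or None if skipped."""
        data = page_source.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._last_digest:
            return None  # same source as the last save: nothing to do
        if len(data) > self.remaining:
            return None  # would exceed the quota: skip this page
        self._last_digest = digest
        self._counter += 1
        self.remaining -= len(data)
        path = os.path.join(self.directory, "%03d.html" % self._counter)
        with open(path, "wb") as handle:
            handle.write(data)
        return path
```

Calling `save` twice in a row with the same source writes only one file, so it is safe to call on every access.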
-
By default, phantomjs might write its log in the current directory. Use a temp path, or responses_dirname if available.
-
Romain Bignon authored
-
- May 28, 2018
-
-
hydrargyrum authored
Having "class obj_x(ItemElement)" to do nested object parsing is possible. Support "class obj_x(ListElement)" to parse a list of sub-objects, and coerce to list instead of an iterator.
-
- May 12, 2018
-
-