- Sep 09, 2020
-
-
- Aug 26, 2020
-
-
When enabled, the document is pre-processed so that all links on the page are made absolute using the <base> href (if present) or the page URL. AbsoluteLink becomes pointless when this option is enabled. It is not enabled by default because it would break some existing XPaths such as: starts-with(@href, "/foo")
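As a rough illustration of the resolution rule described here (not the browser's actual code), the sketch below resolves a link against the <base> href when there is one, else against the page URL. The helper name `absolute_link` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def absolute_link(href, page_url, base_href=None):
    """Resolve href against the <base> href if present, else the page URL."""
    # The <base> href may itself be relative, so resolve it against the page URL first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)
```

Once links are rewritten this way, an href such as "/foo" becomes a full URL, which is why an XPath test on the "/foo" prefix would stop matching.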
-
-
If we try to create an AbstractPage from another AbstractPage, the inherited class will be AbstractPage and, consequently, Page. Therefore we need to handle the inheritance recursively.
-
- Aug 07, 2020
-
-
This reverts commit fd0ebef2ca7ea013cf1cc620911b59836d2adee0.
-
- Jul 12, 2020
-
-
Romain Bignon authored
-
- Jul 02, 2020
-
-
ntome authored
pickle may be unsafe for loading data. All we want is a cookie jar, so we can just serialize its cookies, not necessarily the whole jar with its type and policy.

Old format: base64(compress(pickle(jar)))
New format: base64(compress(json(jar))), where the JSON conversion applies to the list of cookies (taking name, value, domain, path, secure (https) and expires).

dump_state will now save using the new format, while load_state supports both the new and old formats. This keeps compatibility for some time; later, old format support in load_state will be dropped too.
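A minimal sketch of the new format, under stated assumptions: the commit does not name the compression algorithm, so zlib is assumed here, and the standard library `http.cookiejar` stands in for the browser's jar. The function names `dump_cookies` and `load_cookies` are hypothetical, not the real dump_state/load_state.

```python
import base64
import json
import zlib
from http.cookiejar import Cookie, CookieJar

# Only these attributes are kept, as listed in the commit message.
FIELDS = ("name", "value", "domain", "path", "secure", "expires")


def dump_cookies(jar):
    """Serialize the cookies of a jar as base64(compress(json(cookies)))."""
    cookies = [{field: getattr(cookie, field) for field in FIELDS} for cookie in jar]
    raw = json.dumps(cookies).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def load_cookies(data):
    """Rebuild a plain CookieJar from the serialized list of cookies."""
    jar = CookieJar()
    for item in json.loads(zlib.decompress(base64.b64decode(data))):
        jar.set_cookie(Cookie(
            version=0, name=item["name"], value=item["value"],
            port=None, port_specified=False,
            domain=item["domain"], domain_specified=bool(item["domain"]),
            domain_initial_dot=item["domain"].startswith("."),
            path=item["path"], path_specified=bool(item["path"]),
            secure=item["secure"], expires=item["expires"],
            discard=item["expires"] is None,
            comment=None, comment_url=None, rest={},
        ))
    return jar
```

Nothing executable is ever deserialized here, which is the point of moving away from pickle; the jar's type and policy are simply recreated with defaults.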
-
ntome authored
-
ntome authored
HAR file was rewritten on every request, because JSON libs don't allow inserting data in an existing JSON without rewriting the whole file. However, if we put request/response entries at the end of the HAR data, only a fixed suffix exists after the entries. Then we can seek near the file end to a computed position, write the new entry (which overwrites terminators), and we can rewrite the overwritten terminators right after. Not only can we write only the new data, but we do not need to keep track of the shifted bytes. If the HAR wasn't written with the exact same options, we won't seek accurately though.
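The seek-and-overwrite idea can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the file is assumed to be written with compact separators and with "entries" as the last key, so the fixed suffix is exactly `]}}`. The names `create_har` and `append_entry` and the creator fields are hypothetical.

```python
import json
import os

# Closes the "entries" array, the "log" object and the root object.
SUFFIX = b"]}}"


def create_har(path):
    """Write an empty HAR skeleton whose last key is "entries"."""
    header = {"log": {"version": "1.2",
                      "creator": {"name": "sketch", "version": "0"},
                      "entries": []}}
    with open(path, "wb") as fd:
        fd.write(json.dumps(header, separators=(",", ":")).encode("utf-8"))


def append_entry(path, entry):
    """Append one entry by overwriting the suffix, then rewriting it."""
    data = json.dumps(entry, separators=(",", ":")).encode("utf-8")
    with open(path, "r+b") as fd:
        # Look at the byte just before the suffix: "[" means no entry yet.
        fd.seek(-len(SUFFIX) - 1, os.SEEK_END)
        is_first = fd.read(1) == b"["
        # Position explicitly at the start of the suffix before writing.
        fd.seek(-len(SUFFIX), os.SEEK_END)
        if not is_first:
            fd.write(b",")
        fd.write(data)
        fd.write(SUFFIX)
```

Only the new entry and the three terminator bytes are written on each call. As the message notes, a file produced with different serialization options would have a different suffix and the seek would land in the wrong place.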
-
- Jun 17, 2020
-
-
hydrargyrum authored
This reverts commit 30082774.
-
Should be improved in the future by not filtering URL and verb and replicating transfer behavior.
-
-
HAR is used by many tools, though it is not standardized. Some fields, like timings or pages, can't be filled.
-
-
Passing data='' or json={} to browser methods like open() or build_request() used to make a GET, but this is incorrect. We can simply check whether one of those params is not None. If a Request parameter is used though, the intention is less explicit: we cannot guess what was meant, because we can't easily distinguish the default value from an intentionally empty value:
Request().data == []
Request().json is None
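The check described here boils down to testing for None rather than truthiness. The helper below is a hypothetical sketch, not the browser's code:

```python
def guess_method(data=None, json=None):
    """Pick the HTTP verb: an empty body is still a body."""
    # "if data or json" would wrongly treat '' and {} as "no body".
    if data is not None or json is not None:
        return "POST"
    return "GET"
```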
-
It's purely HTML, no reason to put it in "standard" filters.
-
- May 07, 2020
-
-
NSS uses a certificate database that is empty by default and is not updated automatically when new CAs are added in /etc/ssl/certs. So we are forced to recreate the database from scratch, and since that takes about 1 minute because the "certutil" command is slow, we can't do it every time.

By implementing an update operation that only adds new certificates and removes obsolete ones, we run certutil much less often, so the update is significantly faster.

In order to detect changes to certificates, and because NSS databases and PEM data are very painful to introspect, we will rely mostly on NSS cert "nicknames". As /etc/ssl/certs contains a lot of duplicates, we will hash the PEM data instead of using filenames to tell certificates apart, and those hashes will be the nicknames.

Simplified, an update operation goes like this:
- list all db cert hashes (the nicknames)
- hash all system certs
- add to the db all system certs whose nickname is missing
- remove from the db the nicknames whose hash is not in the system list

Migrating to the new nickname format basically requires purging the db and recreating it.
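The update operation described in this message amounts to a set difference between the nicknames in the db and the hashes of the system certificates. The sketch below only computes the plan and does not call certutil. The hash algorithm is not stated in the commit, so SHA-256 is an assumption, and `nickname` and `plan_update` are hypothetical names.

```python
import hashlib


def nickname(pem_data):
    """Derive a nickname from PEM bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(pem_data).hexdigest()


def plan_update(db_nicknames, system_pems):
    """Return (certs to add, nicknames to remove) for an incremental update."""
    # Hashing the content collapses duplicate files into a single key.
    system = {nickname(pem): pem for pem in system_pems}
    to_add = {nick: pem for nick, pem in system.items() if nick not in db_nicknames}
    to_remove = set(db_nicknames) - set(system)
    return to_add, to_remove
```

When nothing changed on the system, both results are empty and no slow command needs to run at all.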
-
CleanDecimal has a "sign" parameter accepting a callable, but most use cases are simply about forcing a sign because websites get numbers wrong. As a result, this contrived code is frequently pasted: "sign=lambda _: -1". Also, this "sign" function is poorly designed because one could return "2" and get a doubled value. Allow the param to be simply a sign string (like "-") that forces the sign, to fit the common case.
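A sketch of how the two forms of the parameter could coexist, with assumptions stated: the function `apply_sign` is hypothetical, the callable is assumed to return a multiplier, and here it receives the value for simplicity, which may differ from what CleanDecimal passes.

```python
from decimal import Decimal


def apply_sign(value, sign=None):
    """Force or compute the sign of an already parsed Decimal."""
    if sign is None:
        return value
    if callable(sign):
        # Legacy form: the result is used as a multiplier (assumption).
        return value * sign(value)
    if sign == "-":
        return -abs(value)
    if sign == "+":
        return abs(value)
    raise ValueError("sign must be a callable, '+' or '-'")
```

With the string form, a value that is already negative stays negative, whereas multiplying by -1 would flip it back to positive.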
-
-
- Apr 22, 2020
-
-
Recently, geckodriver file logs were disabled, so they are output on stdout, which breaks apps relying on stdout containing only weboob output. Some code for outputting file logs in responses folder or in $TMPDIR was already written for phantomjs, let's just reuse it for other drivers.
-
http://foo#BAR should be kept as is by normalization. We were looking for a "/" to end the authority of the URL, but the path can be empty and thus authority may end with "?" or "#". See RFC3986.
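To illustrate the fix, the sketch below ends the authority at the first "/", "?" or "#", as RFC 3986 allows. It assumes that normalization lowercases the scheme and host only, which is what the RFC suggests; the actual normalization in the code base may do more. The function name `normalize` is hypothetical.

```python
import re


def normalize(url):
    """Lowercase scheme and host, leaving path, query and fragment untouched."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    # The path may be empty, so the authority can also end at "?" or "#".
    match = re.search(r"[/?#]", rest)
    end = match.start() if match else len(rest)
    authority, tail = rest[:end], rest[end:]
    # Userinfo is case-sensitive: only lowercase what follows the last "@".
    userinfo, at, hostport = authority.rpartition("@")
    return scheme.lower() + sep + userinfo + at + hostport.lower() + tail
```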
-
- Apr 15, 2020
-
-
The unicode character `−` (U+2212: MINUS SIGN) was ignored by CleanDecimal. This causes amounts that have this character as a sign to be positive instead of negative, for example on boursorama.
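A simplified illustration of the problem, not CleanDecimal itself: if parsing keeps only ASCII digits, "." and "-", then U+2212 is silently dropped unless it is mapped to the ASCII hyphen-minus first. The function `parse_amount` is a hypothetical stand-in.

```python
from decimal import Decimal


def parse_amount(text):
    """Parse an amount, treating U+2212 MINUS SIGN like an ASCII minus."""
    text = text.replace("\u2212", "-")
    # Keep only the characters Decimal understands in this simple sketch.
    cleaned = "".join(ch for ch in text if ch in "0123456789.-")
    return Decimal(cleaned)
```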
-
- Apr 08, 2020
-
-
- Mar 18, 2020
-
-
- Mar 05, 2020
-
-
-
The use case of this patch is being able to perform:
Base(TableCell('foo'), CleanText('./span'))
It was impossible without it because CleanText was called with a list of matching elements, instead of a single lxml element. Not sure if it's the right way to fix the problem. Other ways include:
- modifying TableCell to return the element instead of a list of elements
- modifying CleanText to accept a list of elements
- introducing another filter to take only the first element
-
- Takes in params as None (no params) or as a dict or list<tuple>
- Will append the specified params to the url's existing params
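The behavior listed above can be sketched with the standard library. The function name `add_params` is hypothetical and this is not necessarily how the patch implements it:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_params(url, params=None):
    """Append params (None, dict or list of tuples) to the URL's query."""
    if not params:
        return url
    parts = urlsplit(url)
    # Keep existing params, including blank ones, and append the new ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items() if isinstance(params, dict) else params)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Accepting a list of tuples in addition to a dict makes it possible to repeat the same key several times.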
-
Closes: 60283@sibi
-
This is useful when we need to override the XPath.
-
And it was removed in recent selenium versions.
-
When passing an integer to CleanText we get this crash:
AttributeError: 'int' object has no attribute 'itertext'
This correction enables doing CleanText(Dict(something)) when 'something' is an integer without using Eval(str, Dict(something)).
-
This allows customising the behavior of the browser.
-
The date state needs to be stored as a string, but get_expire() needs a date type, so we have to convert it back.
-
-
-
- Feb 12, 2020
-
-
A logger named only "iter_accounts" was not explicit enough anyway.
-
-
- Jan 30, 2020
-
-
When using proxies with selenium (e.g. https_proxy=... boobank ...) it crashed with the error:
Specified proxy type ({'ff_value': 1, 'string': 'MANUAL'}) not compatible with current setting ({'ff_value': 0, 'string': 'DIRECT'})
-
-