bfontaine.net

Original site: bfontaine.net

Typing Tuples in Python

Saturday 25 September 2021 at 09:51

Python added support for type hints in 3.5. They are not types as you may know them from other languages: they have no effect on the compilation to bytecode or at runtime. Rather, they are hints for the developer (and the developer’s tools).

def print_int(n: int):
    print(n)

print_int(1)
print_int("foo")

The code above doesn’t fail when you call print_int("foo") even though n is “typed” as an int. This is because this n: int is just a hint.
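To make the “just a hint” part concrete, here is a small sketch (the -> None return annotation is an addition for illustration). Python stores the hints on the function object but never checks them:

```python
def print_int(n: int) -> None:
    print(n)

# The hint is ignored at runtime: this call runs fine and prints "foo"
print_int("foo")

# The hints are only stored as metadata, for tools to read
print(print_int.__annotations__)
```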

While you can check for type issues by running mypy by hand, type hints become really powerful when your editor/IDE supports them.

Types for collections can specify the inner types: a list (List) that contains strings (str) would be List[str].

from typing import List

stuff: list = []  # a list of anything (equivalent to List[Any])
stuff.append(1)
stuff.append("foo")

offsets: List[int] = []  # a list of ints
offsets.append(1)
offsets.append("foo")

In the snippet above, the last line is highlighted as an error in any good IDE, and mypy would complain about it.

Other container types exist as well, and they can be nested:

from typing import List, Dict, Iterable

# a dictionary where keys are str and values are List[str], i.e. lists of strings
friends: Dict[str, List[str]] = {}
friends["Alice"] = ["Sam", "Maria"]

# a function that takes an iterable of ints. It can be a list, a tuple, a generator, a set, etc
def average(s: Iterable[int]):
    total = 0
    count = 0
    for element in s:
        total += element
        count += 1

    return total / count
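
As a quick check of the Iterable[int] hint, the sketch below repeats the average function (with a float return hint added for illustration) and calls it with several kinds of iterables:

```python
from typing import Iterable

def average(s: Iterable[int]) -> float:
    total = 0
    count = 0
    for element in s:
        total += element
        count += 1
    return total / count

# All of these are valid Iterable[int] arguments
print(average([1, 2, 3]))               # a list
print(average((1, 2, 3)))               # a tuple
print(average({1, 2, 3}))               # a set
print(average(n for n in range(1, 4)))  # a generator
```

Each call prints 2.0.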

Given List[x], Set[x], Collection[x], Sequence[x] and others, one would expect Tuple[x] to be a hint for a tuple that contains x. Well… no.

This is confusing at first, but Tuple[str] types a tuple of a single element of type str. To add more elements, you need to type them as well: a pair of ints would be Tuple[int, int], while a triplet of a string, a float and a boolean would be Tuple[str, float, bool].
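
A minimal sketch of these fixed-length hints (the variable names are only for illustration):

```python
from typing import Tuple

single: Tuple[str] = ("foo",)                          # exactly one str
pair: Tuple[int, int] = (1, 2)                         # exactly two ints
triplet: Tuple[str, float, bool] = ("foo", 1.5, True)  # a str, a float and a bool

# A type checker such as mypy is expected to reject the next line, because
# Tuple[str] only allows one element. Python itself runs it without complaint.
too_long: Tuple[str] = ("foo", "bar")
```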

While tuples can be used as sequences (e.g. for immutable/hashable equivalents to lists), I’d argue that their primary use is for fixed-length representations, such as pairs of results:

def match_object(data: bytes):
    # example code
    distances = [0.99, 0.97, 0.96]
    indices = [432, 12, 3923]
    return distances, indices

In this snippet, match_object returns a tuple of a list of floats and a list of integers (aka Tuple[List[float], List[int]]).
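
Written out explicitly, the return hint would look like the sketch below (same example data as above):

```python
from typing import List, Tuple

def match_object(data: bytes) -> Tuple[List[float], List[int]]:
    # example code
    distances = [0.99, 0.97, 0.96]
    indices = [432, 12, 3923]
    return distances, indices

# The pair can be unpacked, and tools know the type of each half
distances, indices = match_object(b"")
```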

If you still want to type arbitrary-length homogeneous tuples, there’s a syntax for that: Tuple[int, ...] types a tuple of any length, including 0, that contains only int elements (and yes, ... is valid in Python).
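
For instance, with a hypothetical total function that accepts a tuple of ints of any length:

```python
from typing import Tuple

def total(values: Tuple[int, ...]) -> int:
    return sum(values)

print(total(()))         # an empty tuple is accepted: prints 0
print(total((1, 2, 3)))  # prints 6

# `...` is a regular Python expression: it evaluates to the Ellipsis object
print(... is Ellipsis)   # prints True
```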

For this and other questions (how do you type a generator?), mypy has a useful type hints cheat sheet.

TL;DR: if you know the size of the tuple, use Tuple[x], Tuple[x, x], Tuple[x, x, x], etc. If you don’t, use Tuple[x, ...], but all elements must be of type x.

Fix bin/magento taking all the RAM

Wednesday 16 December 2020 at 12:08

While working with Magento 2.3.6 on Bixoto I hit a weird issue: the bin/magento command-line tool was always eating all the RAM, even with a simple command:

$ ./bin/magento --help
PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 262144 bytes) in /home/baptiste/.../vendor/magento/module-store/Model/Config/Placeholder.php on line 146
Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.

The issue, as weird as it sounds, is an empty configuration value that causes Magento to end up in an infinite loop.

When I installed Magento on my local machine, I deactivated HTTPS by setting web/secure/base_url to NULL in the table core_config_data. This alone is the cause of the issue.

Check in MySQL:

SELECT * FROM core_config_data WHERE path = 'web/secure/base_url' LIMIT 1;

If this shows a row with a NULL value, either delete it or replace it with some non-null value:

UPDATE core_config_data SET value='http://...' WHERE path = 'web/secure/base_url' limit 1;

This has been reported to Magento but the report was closed because “it’s not a bug”. I don’t think falling into an infinite loop on --help because some config value is NULL should be considered normal behavior, but at least now you know how to solve it.

Fix Virtualbox installation for Docker on macOS

Friday 20 November 2020 at 17:08

While following a tutorial to install Virtualbox in order to get Docker working on macOS, I hit an issue where the docker-machine create command fails with an error that looks like this:

VBoxManage: error: Failed to create the host-only adapter
VBoxManage: error: VBoxNetAdpCtl: Error while adding new interface: failed to open /dev/vboxnetctl: No such file or directory
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component HostNetworkInterfaceWrap, interface IHostNetworkInterface
VBoxManage: error: Context: "RTEXITCODE handleCreate(HandlerArg *)" at line 95 of file VBoxManageHostonly.cpp

If you search the Web, everybody says you have to open the Security & Privacy settings window and allow the Oracle kernel extensions to run. But I didn’t have that option. I tried uninstalling Virtualbox, re-installing it from the official website, rebooting, uninstalling again, and re-installing with brew cask, but I always had the issue. Some people reported having a failed Virtualbox installation, but mine seemed OK.

I tried the spctl solution but it didn’t change anything.

In the end, I tried this StackOverflow answer:

sudo "/Library/Application Support/VirtualBox/LaunchDaemons/VirtualBoxStartup.sh" restart

It failed, but it told me to check the Security & Privacy settings window. I did, and this time I had the button everyone was talking about. I enabled the kernel extension, rebooted, and it worked.

Hope this can save some time for anyone having the same issue!

Introduction to Code-Golf in Clojure

Sunday 1 December 2019 at 18:12

Code-Golf is the art of writing the shortest program in a given language that implements some given algorithm. It started in the 90s in the Perl community and spread to other languages; there are now languages dedicated to code-golfing and StackExchange has a Q&A website for it.

In 2015, for example, I wrote a blog post showing how to write a JavaScript modules manager that fits in 140 chars (the maximum length of a tweet at that time).

4clojure is a well-known website to learn Clojure through exercises of increasing difficulty, but it has a lesser-known code-golf challenge which you can enable by clicking on “Leagues” in the top menu. If you check the code-golf checkbox, you then get a score on each problem that is the number of non-whitespace characters of your solution; the smaller the better.
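
To check a score locally, the count described above can be sketched in a few lines of Python. This only assumes what the paragraph says (every non-whitespace character counts); golf_score is a hypothetical helper, not 4clojure’s actual scoring code.

```python
def golf_score(code: str) -> int:
    """Count the non-whitespace characters of a solution."""
    return sum(1 for char in code if not char.isspace())

print(golf_score("(fn [a b c] (* (+ a b) c))"))  # prints 18
print(golf_score("#(* (+ % %2) %3)"))            # prints 12
```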

The first thing you’ll note when code-golfing is that the reader syntax for anonymous functions is a lot shorter than using fn:

; 18 chars
(fn [a b c] (* (+ a b) c))

; 13 chars
#(* (+ %1 %2) %3)
; 12 chars: -1 char because '%' is equivalent to '%1'
#(* (+ % %2) %3)

Unfortunately you can’t have a reader-syntax function inside another reader-syntax one, so you often have to transform the code not to use anonymous functions.

for is a very powerful tool for that, because it allows you to do the equivalent of map, and a lot more, with no function:

; invalid!
#(map #(* 2 %) %)

; 19 chars
#(map (fn [x] (* 2 x)) %)
; 18 chars
#(map (partial * 2) %)
; 15 chars
#(for [x %] (* 2 x))

; Note that for this specific example
; the best solution uses `map`:
#(map + % %)

On some problems it can even be shorter than using map + filter:

; 31 chars
(fn [x a]
  (map inc (filter #(< % a) x)))

; 26 chars
#(for [e x :when (< e a)] (inc e))

Some core functions are equivalent in some contexts and so the shorter one can substitute a longer one:

; 18 chars
#(filter identity %)
; 14 chars
#(filter comp %)

; 6 chars
(inc x)
(dec x)
; 5 chars
(+ x 1)
(- x 1)

; 12 chars
(reduce str x)
; 11 chars
(apply str x)

; 14 chars
(apply concat x)
; 13 chars
(mapcat comp x)

When you must use a long function name in multiple places, it might be shorter to bind that function to a one-letter symbol with let:

; 120 chars
#(clojure.set/difference
   (clojure.set/union % %2)
   (clojure.set/union
     (clojure.set/difference % %2)
     (clojure.set/difference %2 %)))

; 73 chars
#(let [d clojure.set/difference u clojure.set/union]
   (d (u % %2) (u (d % %2) (d %2 %))))

; Note that for this specific example
; there is a 17-chars solution
#(set (filter %2 %))

Other tricks

Use indexed access on vectors:

; 15 chars
(first [:a :b :c])
; 11 chars
([:a :b :c] 0)

Use set literals as functions:

; 16 chars
(remove #(= :a %) x)
; 14 chars
(remove #{:a} x)

Invert conditions to use shorter functions:

; 15 chars
(if (empty? p) a b)
; 12 chars
(if (seq p) b a)

Inlined code is sometimes shorter:

; 24 chars
(let [p (* 3 a)]
  (if (< p 5)
    a
    p))
; 19 chars
(if (< (* 3 a) 5)
  a
  (* 3 a))

Use 1 instead of :else/:default in cond:

; 24 chars
(cond
  (= m p) a
  (< m p) b
  :else c)

; 20 chars
(cond
  (= m p) a
  (< m p) b
  1 c)

Use maps instead of ifs for conditions on equality (this one really makes the code harder to read):

; 13 chars
(if (= "L" x) a b)
; 12 chars
(case x "L" a b)
; 10 chars
({"L" a} x b)

Make a Gif of a Website’s Evolution

Monday 3 December 2018 at 20:19

For the latest StackExchange “time”-themed contest, I made a gif showing the evolution of StackOverflow from 2008 to today:

[image: the gif]

The first step was to find a decent API for the Internet Archive. It supports Memento, an HTTP-based protocol defined in RFC 7089 in 2013. Using the memento_client wrapper, we can get the closest snapshot of a website at a given date with the following Python code:

from datetime import datetime, timedelta
from memento_client import MementoClient

mc = MementoClient(timegate_uri="https://web.archive.org/web/",
        check_native_timegate=False)

def get_snapshot_url(url, dt):
    info = mc.get_memento_info(url, dt)
    closest = info.get("mementos", {}).get("closest")
    if closest:
        return closest["uri"][0]

# As an example, let’s look at StackOverflow two weeks ago
url = "https://stackoverflow.com/"
two_weeks_ago = datetime.now() - timedelta(weeks=2)
snapshot_url = get_snapshot_url(url, two_weeks_ago)

print("StackOverflow from ~2 weeks ago: %s" % snapshot_url)

Don’t forget to install the memento_client lib:

pip install memento_client

Note this gives us the closest snapshot, so it might not be exactly two weeks ago.

We can use this code in a loop with an increasing time delta in order to get snapshots at different times. But we don’t only want to get the URLs. We want to make a screenshot of each one.
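
The increasing time delta can be sketched as a small generator. This is only an illustration: snapshot_dates is a hypothetical helper, and the start and end dates are made up; the 5-week step matches the interval used for the final gif.

```python
from datetime import datetime, timedelta
from typing import Iterator

def snapshot_dates(start: datetime, end: datetime, step: timedelta) -> Iterator[datetime]:
    """Yield dates from start to end (inclusive), one every `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    while current <= end:
        yield current
        current += step

# Example: one date every 5 weeks over the last months of 2008
dates = list(snapshot_dates(datetime(2008, 9, 1), datetime(2008, 12, 31), timedelta(weeks=5)))
for dt in dates:
    print(dt.date())
```

Each date can then be passed to get_snapshot_url to get the closest archived snapshot.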

The easiest way to programmatically take a screenshot of a webpage is probably to use Selenium. I used Chrome as a driver; you can either download it from the ChromeDriver website or run the following command if you’re on a Mac with Homebrew:

brew install bfontaine/utils/chromedriver

We also need to install Selenium for Python:

pip install selenium

The code is pretty short:

from selenium import webdriver

def make_screenshot(url, filename):
    # With no argument, Selenium locates chromedriver itself (e.g. in the PATH)
    driver = webdriver.Chrome()
    driver.get(url)
    driver.save_screenshot(filename)
    driver.quit()

url = "https://web.archive.org/web/20181119211854/https://stackoverflow.com/"

make_screenshot(url, "stackoverflow_20181119211854.png")

If you run the code above, you should see a Chrome window open, go to the URL by itself, then close once the page is fully loaded. You now have a screenshot of this page in stackoverflow_20181119211854.png! However, you’ll quickly notice the screenshot includes the Wayback Machine’s header at the top of the website.

This is handy when browsing through snapshots by hand, but not so much when we access them from Python.

Fortunately, we can get a header-less URL by changing it a bit: we can append id_ to the end of the date in order to get the page exactly as it was when the bot crawled it. However, this means it links to CSS and JS files that may not exist anymore. We can get a URL to an archived page that has been slightly modified to replace links with their archived version using im_ instead.

Page with header and rewritten links:
https://web.archive.org/web/20181119211854/...
Original page, as it was when crawled:
https://web.archive.org/web/20181119211854id_/...
Original page with rewritten links:
https://web.archive.org/web/20181119211854im_/...
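
Rewriting a snapshot URL this way can be automated. The sketch below uses a hypothetical with_modifier helper and assumes the URL contains a 14-digit timestamp right after /web/, as in the examples above:

```python
import re

# The 14-digit timestamp of a snapshot URL, plus any modifier already present
_TIMESTAMP = re.compile(r"(/web/\d{14})(?:[a-z]{2}_)?/")

def with_modifier(snapshot_url: str, modifier: str) -> str:
    """Return snapshot_url with `modifier` (e.g. "id_" or "im_") after its timestamp."""
    new_url, count = _TIMESTAMP.subn(
        lambda match: match.group(1) + modifier + "/", snapshot_url, count=1
    )
    if count == 0:
        raise ValueError("not a Wayback Machine snapshot URL: %s" % snapshot_url)
    return new_url

url = "https://web.archive.org/web/20181119211854/https://stackoverflow.com/"
print(with_modifier(url, "im_"))
```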

Re-running the code using the modified URL gives us the correct screenshot:

url = "https://web.archive.org/web/20181119211854im_/https://stackoverflow.com/"
make_screenshot(url, "stackoverflow_20181119211854.png")

Joining the two bits of code we can make screenshots of a URL at different intervals. You may want to check the images once it’s done to remove inconsistencies. For example, the archived snapshots of Google’s homepage aren’t all in the same language.
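
One way to join them is sketched below. It is not the exact script used for the gif: capture_timeline is a hypothetical function that receives the two functions defined earlier as arguments, so it can be tried with stand-ins that don’t need the network or a browser.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Optional

def capture_timeline(
    url: str,
    dates: Iterable[datetime],
    get_snapshot_url: Callable[[str, datetime], Optional[str]],
    make_screenshot: Callable[[str, str], None],
) -> List[str]:
    """Screenshot the closest snapshot of `url` for each date; return the filenames."""
    filenames: List[str] = []
    seen = set()
    for dt in dates:
        snapshot_url = get_snapshot_url(url, dt)
        # Skip dates without a snapshot, and snapshots we already captured
        # (two close dates can resolve to the same closest snapshot)
        if snapshot_url is None or snapshot_url in seen:
            continue
        seen.add(snapshot_url)
        # Zero-padded index so that *.png expands in chronological order
        filename = "snapshot_%04d.png" % len(filenames)
        make_screenshot(snapshot_url, filename)
        filenames.append(filename)
    return filenames
```

With the functions from this post, the call would be capture_timeline(url, dates, get_snapshot_url, make_screenshot), where dates is any iterable of datetime objects in chronological order.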

Once we have all images, we can generate a gif using Imagemagick:

convert -delay 50 -loop 1 *.png stackoverflow.gif

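I used the following parameters:

-delay 50: the time between two frames, in hundredths of a second, so 50 means 2 frames per second
-loop 1: play the animation only once instead of looping forever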

You may want to play with the -delay parameter depending on how many images you have as well as how often the website changes.

I also made a version with Google (~10MB) at 5 frames per second, with -delay 20. I used the same interval between snapshots as for the StackOverflow gif: at least 5 weeks between each screenshot. You can see which year each screenshot is from by looking at the bottom of the image.
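
The relation between frames per second and -delay is simple arithmetic, since ImageMagick counts the delay in hundredths of a second by default. A tiny sketch, with a hypothetical delay_for_fps helper:

```python
def delay_for_fps(fps: float) -> int:
    """Convert frames per second to an ImageMagick -delay value (1/100 s ticks)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(100 / fps)

print(delay_for_fps(2))  # prints 50, the value used for the StackOverflow gif
print(delay_for_fps(5))  # prints 20, the value used for the Google gif
```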