When using any library for Google Cloud you can specify a service account with GOOGLE_APPLICATION_CREDENTIALS, but unfortunately that doesn't work when using gsutil in shell scripts. The documentation suggests using gcloud auth activate-service-account, but that "activates" the service account for all gsutil invocations, and it doesn't work if you installed a standalone version of gsutil (without gcloud).
I wanted to have one service account per project so that each project has access to the relevant resources only. The solution I found is to use a Boto file: this is an INI-like file format used for AWS configuration, but gsutil also supports it. You can tell gsutil where to find such a file with BOTO_CONFIG, or give it a list of paths to look in with BOTO_PATH.
In a simple project where the main code is a shell script, the setup would look like this:
$ ls -a
.boto
script.sh
service-account.json
In .boto:
[Credentials]
gs_service_key_file=/app/service-account.json
In script.sh:
#!/usr/bin/env bash
set -e

export BOTO_CONFIG=/app/.boto
gsutil ...
This is a bit cumbersome compared to GOOGLE_APPLICATION_CREDENTIALS, but it works well.
For the past three years I've been running a Twitter bot called @lemondeadit, a French equivalent of @NYT_first_said: it tweets new words as they appear in the French newspaper Le Monde.
The account was created in June 2019 but the first commit dates back to 2018. One day I might blog a bit about how it works under the hood, but today I'd like to talk about a small bug that drove me crazy for a while.
The bot has been running stably for a couple of years without me touching the code at all. But recently, I noticed that sometimes it would tweet some very common words as if they were new, like "économie" or "opéra". I deleted them when I saw them, but I didn't understand why it happened.
I initially attributed the bug to an issue with the search engine: when the bot sees a word it never saw before, it uses Le Monde's internal search engine to verify that this word indeed appears in a single article. Around the end of 2019, Le Monde ditched their previous search engine, which was slow but exact, in favor of a Qwant-based one which is fast but inexact: it doesn't respect your query (it may or may not autocorrect it, and there's no way to prevent it from doing so), it's not accurate (an empty result doesn't necessarily mean there is no match), and it's not stable (run the same query twice and you can get different results, sometimes by orders of magnitude). In French, we would say this is "de la merde".
Anyway, I thought it was an issue where maybe the internal search engine would give a single result instead of multiple ones, but this doesn't make sense: the bot doesn't check every single word with the search engine, only those it thinks are unique. Since it has already scraped virtually all articles from the newspaper, there's no way it would consider "économie" a new word.
Today, I decided to tackle the issue.
First, I searched the database for a recent example and got "vaccinées" (the feminine plural form of "vaccinated"). This is a very old word; it first appeared in Le Monde in 1946 and has shown up more than 1.7k times since then. The word was even present twice in the database, despite a UNIQUE constraint on the field:
word
-----------
vaccinées
vaccinées
I inspected the strings and noticed they were encoded differently, although both were valid UTF-8. The difference was in the "é" character: one is represented as the single Unicode character LATIN SMALL LETTER E WITH ACUTE, while the other is a combination of LATIN SMALL LETTER E and COMBINING ACUTE ACCENT. Both render exactly the same on screen, which makes this issue so hard for a human to detect.
>>> "é" == "é"
False
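One way to see the difference is to print the name of each code point (a quick REPL sketch, using explicit escapes for the two variants):

>>> import unicodedata
>>> [unicodedata.name(c) for c in "\u00e9"]
['LATIN SMALL LETTER E WITH ACUTE']
>>> [unicodedata.name(c) for c in "e\u0301"]
['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']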
The fix was to normalize the text before interpreting its words. I did it in Python using unicodedata.normalize:
text = unicodedata.normalize("NFKC", text)
There are multiple normalization forms, and I chose NFKC: normalize equivalent characters to their canonical form, and compose all combining characters such that the non-combined and the combined versions become the same.
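After normalization, the two variants compare equal (same escapes as above):

>>> unicodedata.normalize("NFKC", "\u00e9") == unicodedata.normalize("NFKC", "e\u0301")
True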
After fixing the code, I needed to fix the database to clean up already-parsed words. In PostgreSQL, we can use SIMILAR TO to search for strings matching a certain character range: in my case, the "Combining Diacritical Marks", aka 0300-036F:
SELECT word FROM word WHERE word SIMILAR TO '%[\u0300-\u036F]%' ;
I use Peewee to interact with Postgres from Python. While it doesn't support SIMILAR TO out of the box, it's simple to use a custom expression:
Word.select().where(Expression(Word.word, "SIMILAR TO", "%[\\u0300-\\u036F]%"))
Unfortunately this feature is not available in SQLite, the database I use to run my integration tests, so I had to adapt the code a little bit: first use SqliteExtDatabase instead of SqliteDatabase to get support for REGEXP, and then use the .regexp operator:
# SQLite
Word.select().where(Word.word.regexp("[\u0300-\u036F]"))
I was then able to run a quick function to normalize the ~1k words affected by the issue.
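A minimal sketch of such a cleanup function, assuming the Word model from the snippets above (duplicates whose normalized form already exists in the table would still hit the UNIQUE constraint and need manual handling):

import unicodedata

from peewee import Expression


def normalize_words():
    # Select the words that contain combining diacritical marks (U+0300-U+036F)
    query = Word.select().where(
        Expression(Word.word, "SIMILAR TO", "%[\\u0300-\\u036F]%")
    )
    for w in query:
        normalized = unicodedata.normalize("NFKC", w.word)
        if normalized != w.word:
            w.word = normalized
            w.save()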
To conclude, it seemed like a very weird issue at first, but in the end it allowed me to learn a few things about Unicode, Postgres, and Python's fantastic standard library.
Python added support for type hints in 3.5. This is not typing as you may be used to in other languages: the hints have no effect on the compilation to bytecode nor at runtime; they are only hints for the developer (and their tools).
def print_int(n: int):
    print(n)

print_int(1)
print_int("foo")
The code above doesn't fail when you call print_int("foo") even though n is "typed" as an int. This is because n: int is just a hint.
While you can check for type issues by running mypy by hand, type hints become really powerful when your editor/IDE supports them.
Types for collections can specify the inner types: a list (List) that contains strings (str) would be List[str].
from typing import List
stuff: list = [] # a list of anything (equivalent to List[Any])
stuff.append(1)
stuff.append("foo")
offsets: List[int] = [] # a list of ints
offsets.append(1)
offsets.append("foo")
In the snippet above, the last line is highlighted as an error in any good IDE, and mypy would complain about it.
Other container types exist as well, and they can be nested:
from typing import List, Dict, Iterable

# a dictionary where keys are str and values are List[str], i.e. lists of strings
friends: Dict[str, List[str]] = {}
friends["Alice"] = ["Sam", "Maria"]

# a function that takes an iterable of ints. It can be a list, a tuple, a generator, a set, etc.
def average(s: Iterable[int]):
    total = 0
    count = 0
    for element in s:
        total += element
        count += 1
    return total / count
Given List[x], Collection[x], Sequence[x], Set[x] and others, one would expect Tuple[x] to be a hint for a tuple that contains x. Well… no.
This is confusing at first, but Tuple[str] types a tuple of a single element of type str. To add more elements, you need to type them as well: a pair of ints would be Tuple[int, int], while a triplet of a string, a float and a boolean would be Tuple[str, float, bool].
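A few examples (the variable names are only for illustration):

from typing import Tuple

single: Tuple[str] = ("hello",)                      # a tuple of exactly one str
point: Tuple[int, int] = (3, 4)                      # a pair of ints
record: Tuple[str, float, bool] = ("x", 1.5, True)   # a str, a float and a bool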
While tuples can be used as sequences (e.g. for immutable/hashable equivalents to lists), I’d argue that their primary use is for fixed-length representations, such as pairs of results:
def match_object(data: bytes):
    # example code
    distances = [0.99, 0.97, 0.96]
    indices = [432, 12, 3923]
    return distances, indices
In this snippet, match_object returns a tuple of a list of floats and a list of integers (aka Tuple[List[float], List[int]]).
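Spelled out as a return type hint, the signature would become:

from typing import List, Tuple

def match_object(data: bytes) -> Tuple[List[float], List[int]]:
    ...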
If you still want to type arbitrary-length homogeneous tuples, there's a syntax for that: Tuple[int, ...] types a tuple of any length, including 0, that contains only int elements (and yes, ... is valid in Python).
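For example:

from typing import Tuple

primes: Tuple[int, ...] = (2, 3, 5, 7, 11)  # any number of ints...
empty: Tuple[int, ...] = ()                 # ...including zero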
For this and other questions (how to type a generator?), Mypy has a useful type hints cheat sheet.
TL;DR: if you know the size of the tuple, use Tuple[x], Tuple[x, x], Tuple[x, x, x], etc. If you don't, use Tuple[x, ...], but all elements must be of type x.
While working with Magento 2.3.6 on Bixoto I hit a weird issue: the bin/magento command-line tool was always eating all the RAM, even with a simple command:
$ ./bin/magento --help
PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 262144 bytes) in /home/baptiste/.../vendor/magento/module-store/Model/Config/Placeholder.php on line 146
Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.
The issue, as weird as it sounds, is an empty configuration value that causes Magento to end up in an infinite loop.
When I installed Magento on my local machine, I deactivated HTTPS by setting web/secure/base_url to NULL in the table core_config_data. This alone is the cause of the issue.
Check in MySQL:
SELECT * FROM core_config_data WHERE path = 'web/secure/base_url' LIMIT 1;
If this shows a row with a NULL value, either delete it or replace it with some non-null value:
UPDATE core_config_data SET value='http://...' WHERE path = 'web/secure/base_url' LIMIT 1;
This has been reported to Magento but was closed because "it's not a bug". I don't think falling into an infinite loop on --help because some config value is NULL should really be normal behavior, but at least now you know how to solve it.
While following a tutorial to install VirtualBox in order to have docker working on macOS, I hit an issue where the docker-machine create command fails with an error that looks like this:
VBoxManage: error: Failed to create the host-only adapter
VBoxManage: error: VBoxNetAdpCtl: Error while adding new interface: failed to open /dev/vboxnetctl: No such file or directory
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component HostNetworkInterfaceWrap, interface IHostNetworkInterface
VBoxManage: error: Context: "RTEXITCODE handleCreate(HandlerArg *)" at line 95 of file VBoxManageHostonly.cpp
If you search on the Web, everybody says you have to open the Security & Privacy settings window and allow the Oracle kernel extensions to run. But I didn't have that option. I tried uninstalling VirtualBox, re-installing it through the official website, rebooting, uninstalling, re-installing with brew cask, but I always had the issue. Some people reported having a failed VirtualBox installation, but mine seemed OK.
I tried the spctl solution but it didn't change anything.
In the end, I tried this StackOverflow answer:
sudo "/Library/Application Support/VirtualBox/LaunchDaemons/VirtualBoxStartup.sh" restart
It failed, but it told me to check the Security & Privacy settings window. I did, and I had the button everyone was talking about. I enabled the kernel extension, rebooted, and it worked.
Hope this saves some time for anyone having the same issue!