The name follows the same logic:
strcpy: string copy
scrcpy: screen copy
sndcpy: sound copy
This is a quick proof-of-concept, composed of:
The long-term goal is to implement this feature properly in scrcpy.
VLC must be installed on the computer.
Plug an Android 10 device with USB debugging enabled, and execute:
If several devices are connected (listed by
./ on Windows)
It will install the app on the device, and request permission to start audio capture:
Once you have clicked on START NOW, press Enter in the console to start playing
on the computer. Press
Ctrl+c in the terminal to stop (except on Windows, where you should
just disconnect the device or stop capture from the device notifications).
The sound continues to be played on the device. The volume can be adjusted independently on the device and on the computer.
- By default, apps that target versions up to and including Android 9.0 do not permit playback capture. To enable it, include android:allowAudioPlaybackCapture="true" in the app’s manifest.
- By default, apps that target Android 10 (API level 29) or higher allow their audio to be captured. To disable playback capture, include android:allowAudioPlaybackCapture="false" in the app’s manifest.
So some apps might need to be updated to support audio capture.
Ideally, I would like
scrcpy to support audio forwarding directly. However,
this will require quite a lot of work.
It will require implementing audio playback (done by VLC in this
PoC), but also audio recording (for
scrcpy --record file.mkv), encoding and
decoding to transmit a compressed stream, and handling audio-video synchronization…
Since I develop scrcpy in my free time, this feature will probably not be integrated very soon. Therefore, I prefer to release a working proof-of-concept which does the job.
In order to support audio forwarding in scrcpy, I first implemented an experiment on a separate branch (see issue #14). But it was too hacky and fragile to be merged (and it does not work on all platforms).
So I decided to write a separate tool: USBaudio.
It works on Linux with PulseAudio.
First, you need to build it (follow the instructions).
Plug an Android device. If USB debugging is enabled, just execute:
If USB debugging is disabled (or if multiple devices are connected), you need to
specify a device, either by its serial or by its vendor id and product_id (as
The audio should be played on the computer.
If it’s stuttering, try increasing the live caching value (at the cost of a higher latency):
Note that it can also be directly captured by OBS:
USBaudio executes 3 steps successively:
Note that enabling audio accessory changes the USB device product id, so it will close any adb connection (and scrcpy). Therefore, you should enable audio forwarding before running scrcpy.
To only enable audio accessory without playing:
The audio input sources can be listed by:
$ pactl list short sources
...
5  alsa_input.usb-LGE_Nexus_5_05f5e60a0ae518e5-01.analog-stereo  module-alsa-card.c  s16le 2ch 44100Hz  RUNNING
Use the number (here
5) to play it with VLC:
Alternatively, you can use ALSA directly:
$ cat /proc/asound/cards
...
 1 [N5             ]: USB-Audio - Nexus 5
                      LGE Nexus 5 at usb-0000:00:14.0-4, high speed
Use the device number (here
1) as follows:
If it works manually but not automatically (without
-n), then please open an
It does not work on all devices; it seems that audio accessory is not always well supported. But it’s better than nothing.
In practice, the playlist was also controlling playback (start, stop, change volume…), configuring audio and video outputs, storing media detected by discovery…
For VLC 4, we wanted a new playlist API, containing a simple list of items (instead of a tree), acting as a media provider for a player, without unrelated responsibilities.
One major design goal is to expose what UI frameworks need. Several user interfaces, like Qt, Mac OS and Android¹, will use this API to display and interact with the main VLC playlist.
The playlist must be performant for common use cases and usable from multiple threads.
Indeed, in VLC, user interfaces are implemented as modules loaded dynamically. In general, there is exactly one user interface, but there may be none or (in theory) several. Thus, the playlist may not be bound to the event loop of some specific user interface. Moreover, the playlist may be modified from a player thread; for example, playing a zip archive will replace the item with its content automatically.
User interfaces need random access to the playlist items, so a vector is the
most natural structure to store the items. A vector is provided by the
standard library of many languages (
vector in C++,
Vec in Rust,
ArrayList in Java…). But here, we’re in C, so there is nothing.
In the playlist, we only need a vector of pointers, so I first proposed
improvements to an existing type,
vlc_array_t, which only supports
void * as items. But it was considered useless
(1, 2) because it is too limited and
Therefore, I wrote
vlc_vector. It is implemented using macros so that it’s
generic over its item type. For example, we can use a vector of
Internally, the playlist uses a vector of playlist items:
UI frameworks typically use list models to bind items to a list view component. A list model must provide:
In addition, the model must notify its view when items are inserted, removed, moved or updated, and when the model is reset (the whole content should be invalidated).
The playlist API exposes the functions and callbacks providing these features.
However, the core playlist may not be used as a direct data source for a list model. In other words, the functions of a list model must not delegate the calls to the core playlist.
To understand why, let’s consider a typical sequence of calls executed by a view on its model, from the UI thread:
If we implemented
get(index) by delegating to the playlist, we
would have to lock each call individually:
Note that locking and unlocking from the UI thread for every playlist item is not a good idea for responsiveness, but this is a minor issue here.
The real problem is that locking is not sufficient to guarantee correctness: the list view expects its model to return consistent values. Our implementation can break this assumption, because the playlist content could change asynchronously between calls. Here is an example:
The view could not process any notification of the item removal before the end
of the current execution in its event loop… that is, at least after
model.get(4). To avoid this problem, the data provided by view models must
always live in the UI thread.
This implies that the UI has to manage a copy of the playlist content. The UI playlist should be considered as a remote out-of-sync view of the core playlist.
Note that the copy must not be limited to the list of pointers to playlist items: the content which is displayed and susceptible to change asynchronously (media metadata, like title or duration) must also be copied. The UI needs a deep copy; otherwise, the content could change (and be exposed) before the list view was notified… which, again, would break assumptions about the model.
The core playlist and the UI playlist are out-of-sync. So we need to “synchronize” them:
The core playlist is the source of truth.
Every change to the UI playlist must occur in the UI thread, yet the core playlist notification handlers are executed from any thread. Therefore, playlist callback handlers must retrieve appropriate data from the playlist, then post an event to the UI event loop², which will be handled from the UI thread. From there, the core playlist will be out-of-sync, so it would be incorrect to access it.
The order of events forwarded to the UI thread must be preserved. That way, the indices notified by the core playlist are necessarily valid within the context of the list model in the UI thread. The core playlist events can be understood as a sequence of “patches” that the UI playlist must apply to its own copy.
This only works if only the core playlist callbacks modify the list model content.
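Here is a minimal Rust sketch of a UI-side copy that applies forwarded events as ordered patches. The PlaylistEvent enum, its variants and the apply/process_pending functions are hypothetical. They only mirror the kinds of notifications listed earlier (inserted, removed, moved, updated, reset). An mpsc channel stands in for the UI event loop. This is not VLC's actual API.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical events posted by core playlist callbacks to the UI thread.
/// Indices are only valid if events are applied in the order they were sent.
#[derive(Debug, Clone, PartialEq)]
pub enum PlaylistEvent<T> {
    Reset(Vec<T>),
    Inserted { index: usize, items: Vec<T> },
    Removed { index: usize, count: usize },
    /// Assumption: `target` is the index of the first moved item in the resulting list.
    Moved { index: usize, count: usize, target: usize },
    Updated { index: usize, items: Vec<T> },
}

/// Apply one "patch" to the UI copy of the playlist (UI thread only).
pub fn apply<T>(model: &mut Vec<T>, event: PlaylistEvent<T>) {
    match event {
        PlaylistEvent::Reset(items) => *model = items,
        PlaylistEvent::Inserted { index, items } => {
            model.splice(index..index, items);
        }
        PlaylistEvent::Removed { index, count } => {
            model.drain(index..index + count);
        }
        PlaylistEvent::Moved { index, count, target } => {
            let moved: Vec<T> = model.drain(index..index + count).collect();
            model.splice(target..target, moved);
        }
        PlaylistEvent::Updated { index, items } => {
            // Deep-copied data replaces the displayed values
            for (i, item) in items.into_iter().enumerate() {
                model[index + i] = item;
            }
        }
    }
}

/// Called from the UI event loop: apply pending events in the order they were posted.
pub fn process_pending<T>(rx: &Receiver<PlaylistEvent<T>>, model: &mut Vec<T>) {
    for event in rx.try_iter() {
        apply(model, event);
    }
}
```

The channel preserves the order of events. Each index is therefore interpreted against the state produced by the previous events, which is why the order must not be broken.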
Since the list model can only be modified by the core playlist callbacks, it is incorrect to modify it on user actions. As a consequence, the changes must be requested from the core playlist, which will, in turn, notify the actual changes.
The synchronization is trickier in that direction.
To understand why, suppose the user selects items 10 to 20, then drags & drops them to index 42. Once the user releases the mouse button to “drop” the items, we need to lock the core playlist to apply the changes.
The problem is that, before we successfully acquire the lock, another client may have modified the playlist: it may have cleared it, or shuffled it, or removed items 5 to 15… As a consequence, we cannot apply the “move” request as is, because it was created from a previous playlist state.
To solve the issue, we need to adapt the request to make it fit the current playlist state. In other words, resolve conflicts: find the items if they had been moved, ignore the items not found for removal…
For that purpose, in addition to functions modifying the content directly, the playlist exposes functions to request “desynchronized” changes, which automatically resolve conflicts and generate an appropriate sequence of events to notify the clients of the actual changes.
Let’s take an example. Initially, our playlist contains 10 items:
[A, B, C, D, E, F, G, H, I, J]
The user selects
[C, D, E, F, G] and presses the
Del key to remove the items.
To apply the change, we need to lock the core playlist.
But at that time, another thread was holding the lock to apply some other
changes. It removed
F and I, and shuffled the playlist:
[E, B, D, J, C, G, H, A]
Once the other thread unlocks the playlist, our lock finally succeeds. Then, we
call request_remove([C, D, E, F, G]) (this is pseudo-code, the real function
Internally, it triggers several calls:
Thus, every client (including the UI from which the user requested to remove the
items) will receive a sequence of 3 removal events, one
for each removed slice.
The slices are removed in descending order for both optimization (it minimizes the number of shifts) and simplicity (the index of a removal does not depend on previous removals).
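Here is a minimal Rust sketch of how such a desynchronized removal could resolve conflicts. It makes a few simplifying assumptions:

- Items are compared with PartialEq, whereas VLC would compare item pointers.
- Missing items are silently ignored.
- The returned (index, count) pairs stand for the removal events.

request_remove here is a hypothetical function, not the real VLC one.

```rust
/// Resolve a "desynchronized" removal request against the current playlist.
/// Items that are no longer present are ignored; the others are grouped into
/// contiguous slices, removed in descending order. Returns (index, count)
/// for each removal, i.e. the events that clients would be notified of.
pub fn request_remove<T: PartialEq>(playlist: &mut Vec<T>, to_remove: &[T]) -> Vec<(usize, usize)> {
    // Find where the requested items are now (linear search)
    let mut indices: Vec<usize> = to_remove
        .iter()
        .filter_map(|item| playlist.iter().position(|x| x == item))
        .collect();
    indices.sort_unstable();
    indices.dedup();

    // Group sorted indices into contiguous slices
    let mut slices: Vec<(usize, usize)> = Vec::new();
    for i in indices {
        if let Some(last) = slices.last_mut() {
            if last.0 + last.1 == i {
                last.1 += 1;
                continue;
            }
        }
        slices.push((i, 1));
    }

    // Remove from the end, so that the indices of earlier slices stay valid
    let mut events = Vec::with_capacity(slices.len());
    for &(index, count) in slices.iter().rev() {
        playlist.drain(index..index + count);
        events.push((index, count));
    }
    events
}
```

Applied to the example above, it yields 3 removals in order: (4, 2) for C and G, (2, 1) for D, then (0, 1) for E. F was already removed by the other thread, so it is ignored.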
In practice, it is very likely that the request will apply exactly to the
current state of the playlist. To avoid unnecessary linear searches to find the
items, these functions accept an additional
index_hint parameter, giving the
index of the items when the request was created. It should (hopefully) almost
always be the same as the index in the current playlist state.
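The hint only changes how an item is located. A hypothetical helper could look like this, assuming again that items are comparable with PartialEq:

```rust
/// Find the current index of `item`, trying `index_hint` first.
/// When nothing changed concurrently, this avoids a linear search.
pub fn find_with_hint<T: PartialEq>(items: &[T], item: &T, index_hint: usize) -> Option<usize> {
    if items.get(index_hint) == Some(item) {
        return Some(index_hint); // fast path: O(1)
    }
    items.iter().position(|x| x == item) // fallback: O(n)
}
```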
Contrary to shuffle, random playback does not move the items within the playlist; instead, it plays them in a non-sequential order.
To select the next item to play, we could just pick one at random.
But this is not ideal: some items will be selected several times (possibly in a row) while some others will not be selected at all. And if loop is disabled, when should we stop? After all n items have been selected at least once or after n playbacks?
Instead, we would like some desirable properties that work both with loop enabled and disabled:
In addition, if loop is enabled:
I wrote a
randomizer to select items “randomly” within all these
To get an idea of the results, here is a sequence produced for a playlist
containing 5 items (A to E), with loop enabled (so that it goes on over several cycles), grouped here by cycle:
E D A B C  E B C A D  C B E D A  C E A D B  A D C E B  A B D E C  B C A E D  E D B C A
E C B D A  C A E B D  C D E A B  E D B A C  D C B A E  D A B C E  B D C A E  D C A B E
B A E C D  C E D A B  C E B A D  E C B D A  D B A C E  C E B A D  B C E D A  E A C B D
A D E B C  D C A E B  E A D C B  C D B A E  C E A B D  C D E A B  D A E C B  C A D B E
A B E C D  A C B E D  E D A B C  D E C A B  C A E B D  E B D C A  C A E D B  D B E C A
Here is how it works.
The randomizer stores a single vector containing all the items of the playlist. This vector is not shuffled at once. Instead, steps of the Fisher-Yates algorithm are executed one-by-one on demand. This has several advantages:
It also maintains 3 indexes:
head indicates the end of the items already determinated for the current cycle (if loop is disabled, there is only one cycle),
next points to the item after the current one³,
history points to the first item of ordered history from the last cycle.
 0               next  head          history       size
 |---------------|-----|.............|-------------|
  <------------------->               <----------->
   determinated range                 history range
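To illustrate the on-demand Fisher-Yates steps, here is a simplified Rust sketch. It covers a single cycle only: it tracks head and next as described above, but omits loop handling, the history range and item insertion/removal. The xorshift generator is an arbitrary stand-in for a real random source. None of these names come from the VLC code.

```rust
/// Tiny xorshift64 generator: an arbitrary stand-in for a real random source.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in [0, bound); bound must be > 0 (modulo bias ignored here).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Simplified randomizer: one cycle, no loop, no history range.
pub struct Randomizer<T> {
    items: Vec<T>,
    head: usize, // items[..head] are already determinated
    next: usize, // index of the item after the current one
    rng: XorShift64,
}

impl<T: Clone> Randomizer<T> {
    pub fn new(items: Vec<T>, seed: u64) -> Self {
        // xorshift must not start from 0
        Randomizer { items, head: 0, next: 0, rng: XorShift64(seed.max(1)) }
    }

    pub fn has_next(&self) -> bool {
        self.next < self.items.len()
    }

    pub fn has_prev(&self) -> bool {
        self.next > 1
    }

    pub fn next(&mut self) -> Option<T> {
        if !self.has_next() {
            return None;
        }
        if self.next == self.head {
            // One Fisher-Yates step: pick among the undeterminated items,
            // then swap the chosen item with the current head
            let j = self.head + self.rng.below(self.items.len() - self.head);
            self.items.swap(self.head, j);
            self.head += 1;
        }
        let item = self.items[self.next].clone();
        self.next += 1;
        Some(item)
    }

    pub fn prev(&mut self) -> Option<T> {
        if !self.has_prev() {
            return None;
        }
        self.next -= 1;
        // The new current item is the one just before the old current one
        Some(self.items[self.next - 1].clone())
    }
}
```

After a Prev(), the following Next() returns an item that is already determinated, without consuming any randomness. This matches the behavior described in the walkthrough below.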
Let’s reuse the example I wrote in the documentation.
Here is the initial state with our 5 items:
                           history
  next                     |
  head                     |
  |                        |
  A    B    C    D    E
The playlist calls Next() to retrieve the next random item. The randomizer picks one item (say, D), and swaps it with the current head (A):
                           history
       next                |
       head                |
       |                   |
  D    B    C    A    E
<--->
determinated range
The playlist calls Next() one more time. The randomizer selects one item outside the determinated range (say, E):
                           history
            next           |
            head           |
            |              |
  D    E    C    A    B
<-------->
determinated range
The playlist calls Next() one more time. The randomizer selects C:
                           history
                 next      |
                 head      |
                 |         |
  D    E    C    A    B
<------------->
determinated range
The playlist then calls Prev(). Since the “current” item is C, the previous one is E, so the randomizer returns E, and next moves back.
                           history
            next           |
            |    head      |
            |    |         |
  D    E    C    A    B
<------------->
determinated range
The playlist calls
Next(), which returns
C, as expected.
                           history
                 next      |
                 head      |
                 |         |
  D    E    C    A    B
<------------->
determinated range
The playlist calls
Next(), the randomizer selects
B, and returns it.
                           history
                      next |
                      head |
                      |    |
  D    E    C    B    A
<------------------>
determinated range
The playlist calls Next(), and the randomizer selects the last item (it has no choice). next and head now point one item past the end (their value is the vector size).
                           history
                           next
                           head
                           |
  D    E    C    B    A
<----------------------->
determinated range
At this point, if loop is disabled, it is not possible to call Next() anymore. So let’s enable it by calling SetLoop(), then let’s call Next() again.
This will start a new loop cycle. Firstly, next and head are reset, and the whole vector belongs to the last cycle history.
  history
  next
  head
  |
  D    E    C    B    A
<----------------------->
history range
Secondly, to avoid selecting A twice in a row (as the last item of the previous cycle and the first item of the new one), the randomizer will immediately determine another item in the vector (say C) to be the first of the new cycle. The items that belong to the history are kept in order. head and history move forward.
       history
  next |
  |    head
  |    |
  C    D    E    B    A
<---><------------------>
determinated  history
range         range
Finally, it will actually select and return the first item (C):
       history
       next
       head
       |
  C    D    E    B    A
<---><------------------>
determinated  history
range         range
Then, the user adds an item to the playlist (F). This item is added in front of the history:
            history
       next |
       head |
       |    |
  C    F    D    E    B    A
<--->     <------------------>
determinated  history
range         range
The playlist calls Next(), and the randomizer randomly selects E. E “disappears” from the history of the last cycle. This is a general property: no item may appear more than once in the “history” (both from the last and the new cycle). The history order is preserved.
                 history
            next |
            head |
            |    |
  C    E    F    D    B    A
<-------->     <------------->
determinated   history
range          range
The playlist then calls Prev() 3 times; next is decremented (modulo size) on each call.
                 history
                 |         next
            head |         |
            |    |         |
  C    E    F    D    B    A
<-------->     <------------->
determinated   history
range          range
Hopefully, the resulting randomness will match what people expect in practice.
The implementation is complicated by the fact that item metadata can change asynchronously (for example if the player is parsing it), making the comparison function inconsistent.
As a benefit, the items are locked only once to retrieve their metadata.
A playlist, on the other hand, needs a player, and registers itself as its media provider. They are tightly coupled:
To keep them synchronized:
This poses a lock-order inversion problem: for example, if thread A locks the playlist then waits for the player lock, while thread B locks the player then waits for the playlist lock, then threads A and B are deadlocked.
To avoid the problem, the player and the playlist share the same lock.
vlc_playlist_Lock() delegates to
vlc_player_Lock(). In practice,
the lock should be held only for short periods of time.
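In Rust terms, the idea can be sketched as a single mutex guarding both states. There is then no second lock whose acquisition order could differ between threads. Player, Playlist, Shared, Handle and play_index are hypothetical types and functions for illustration, not VLC's API:

```rust
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical player state.
#[derive(Default)]
pub struct Player {
    pub current: Option<String>,
}

/// Hypothetical playlist state.
#[derive(Default)]
pub struct Playlist {
    pub items: Vec<String>,
    pub current_index: Option<usize>,
}

/// Both states live behind one mutex.
#[derive(Default)]
pub struct Shared {
    pub player: Player,
    pub playlist: Playlist,
}

#[derive(Clone, Default)]
pub struct Handle(Arc<Mutex<Shared>>);

impl Handle {
    /// The "playlist lock" and the "player lock" are the same lock.
    pub fn lock(&self) -> MutexGuard<'_, Shared> {
        self.0.lock().unwrap()
    }
}

/// Start playing item `index`: playlist and player are updated atomically.
pub fn play_index(handle: &Handle, index: usize) -> bool {
    let mut shared = handle.lock();
    match shared.playlist.items.get(index).cloned() {
        Some(media) => {
            shared.playlist.current_index = Some(index);
            shared.player.current = Some(media);
            true
        }
        None => false,
    }
}
```

Any thread holding the guard sees a playlist and a player that agree with each other. As in the text, the guard should be released quickly.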
A separate API (media source and media tree) was necessary to expose what is called services discovery (used to detect media from various sources like Samba or MTP), which were previously managed by the old playlist.
Thus, we could kill the old playlist.
¹ Actually, the Android app may continue to implement its own playlist in Java/Kotlin, to avoid additional layers (Java/JNI and LibVLC).
² Even in the case where a core playlist callback is executed from the UI
thread, the event must be posted to the event queue, to avoid breaking
the order. Concretely, in Qt, this means connecting signals to slots using
Qt::QueuedConnection instead of the default.
³ We use next instead of
current so that all indexes are unsigned, while
current could be
Tile encoding consists of splitting video frames into tiles that can be encoded and decoded independently in parallel (to use several CPUs), at the cost of a small loss in compression efficiency.
This speeds up encoding and increases decoding frame rate.
To prepare for tiling, some refactoring was necessary.
To illustrate it, here is a mini-plane containing 6×3 pixels. Padding is added for alignment (and other details), so its physical size is 8×4 pixels:
In memory, it is stored in a single array:
The number of array items separating one pixel from the one below is called the stride. Here, the stride is 8.
The encoder often needs to process rectangular regions. For that purpose, many functions received a slice of the plane array and the stride value:
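For instance, a pre-tiling function might look like the following Rust sketch. fill_region is hypothetical, not an actual rav1e function. The caller passes a slice starting at the top-left pixel of the region, so that slice mutably borrows everything up to the end of the plane, including pixels outside the region:

```rust
/// Pre-tiling style: `slice` starts at the top-left pixel of the region,
/// and `stride` is the distance (in items) from one pixel to the one below.
pub fn fill_region(slice: &mut [u16], stride: usize, width: usize, height: usize, value: u16) {
    for row in 0..height {
        let start = row * stride;
        slice[start..start + width].fill(value);
    }
}
```

With the 8×4 plane above, filling a 3×2 region at (2, 1) means calling fill_region(&mut plane[8 + 2..], 8, 3, 2, value).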
This works fine, but the plane slice spans multiple rows.
Let’s split our planes into 4 tiles (2 columns × 2 rows):
In memory, the resulting plane regions are not contiguous:
In Rust, it is not sufficient not to read/write the same memory from several threads, it must be impossible to write (safe) code that could do it. More precisely, a mutable reference may not alias any other reference to the same memory.
As a consequence, passing a mutable slice (
&mut [u16]) spanning multiple
rows is incompatible with tiling. Instead, we need some structure, implemented
with unsafe code, providing a view of the authorized region of the underlying plane.
As a first step, I replaced every piece of code which used a raw slice and the
stride value by the existing
structures (which first required to make planes generic after
After these changes, our function could be rewritten as follows:
So now, all the code using a raw slice and a stride value has been replaced. But
if we look at the definition of
PlaneMutSlice, we see that it still borrows
the whole plane:
So the refactoring, in itself, does not solve the problem.
What is needed now is a structure that exposes a bounded region of the plane.
For illustration purposes, let’s consider a minimal example, solving a similar problem: split a matrix into columns.
In memory, the matrix is stored in a single array:
To do so, let’s define a
ColumnMut type, and split the raw array into columns:
PhantomData is necessary to bind the lifetime (in practice,
when we store a raw pointer, we often need a
The iterator returned by
columns() yields a different column every time, so
the borrowing rules are respected.
Now, we can read from and write to a matrix via temporary column views:
Even if the columns are interlaced in memory, from a
ColumnMut instance, it is
not possible to access data belonging to another column.
Its fields, such as rows, must be kept private, otherwise they could
be changed from safe code in such a way that breaks boundaries and violates the borrowing rules.
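Here is a sketch of what such a ColumnMut could look like. It illustrates the technique and is not the author's exact code. It assumes:

- a row-major matrix of u8;
- a data field holding a pointer to the column's first item;
- a cols field holding the distance between two items of a column;
- a rows field holding the column length.

```rust
use std::marker::PhantomData;

/// Mutable view of one column of a row-major matrix.
pub struct ColumnMut<'a> {
    data: *mut u8,  // first item of the column
    cols: usize,    // distance between two consecutive items of the column
    rows: usize,    // number of items in the column
    phantom: PhantomData<&'a mut u8>, // binds the view to the matrix borrow
}

impl<'a> ColumnMut<'a> {
    pub fn len(&self) -> usize {
        self.rows
    }

    /// Borrowing `self` mutably prevents two live references to the same item.
    pub fn get_mut(&mut self, row: usize) -> &mut u8 {
        assert!(row < self.rows, "row out of bounds");
        // SAFETY: row < rows keeps the offset inside the borrowed matrix,
        // and no other ColumnMut covers the items of this column.
        unsafe { &mut *self.data.add(row * self.cols) }
    }
}

/// Split a row-major matrix with `cols` columns into disjoint column views.
pub fn columns(data: &mut [u8], cols: usize) -> impl Iterator<Item = ColumnMut<'_>> {
    assert!(cols > 0 && data.len() % cols == 0);
    let rows = data.len() / cols;
    let ptr = data.as_mut_ptr();
    (0..cols).map(move |c| ColumnMut {
        // wrapping_add: no UB even for an empty matrix (never dereferenced then)
        data: ptr.wrapping_add(c),
        cols,
        rows,
        phantom: PhantomData,
    })
}
```

The raw pointer is only dereferenced after a bounds check. PhantomData ties every view to the mutable borrow of the matrix, so the matrix cannot be accessed directly while a view is alive.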
A plane is split in a similar way, except that it provides plane regions instead of columns.
In practice, more structures related to the encoding state are split into tiles, provided both in const and mut versions, so there is a whole hierarchy of tiling structures:
+- FrameState → TileState
|  +- Frame → Tile
|  |  +- Plane → PlaneRegion
|  +- RestorationState → TileRestorationState
|  |  +- RestorationPlane → TileRestorationPlane
|  |  +- FrameRestorationUnits → TileRestorationUnits
|  +- FrameMotionVectors → TileMotionVectors
+- FrameBlocks → TileBlocks
The split is done by a separate component (see
tiler.rs), which yields a
tile context containing an instance of the hierarchy of tiling views for each tile.
A priori, there are mainly two possibilities to express offsets during tile encoding:
The usage of tiling views strongly favors the first choice. For example, it would be confusing if a bounded region could not be indexed from 0:
Worse, this would not be possible at all for the second dimension:
Therefore, offsets used in tiling views are relative to the tile (contrary to libaom and AV1 specification).
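A read-only variant of this idea can be sketched in Rust as follows. This PlaneRegion is hypothetical and much simpler than rav1e's type. Offsets passed to region[row][col] are relative to the region origin, and the conversion to plane coordinates happens once, inside Index. Sharing the whole plane slice is fine for reads; a mutable version would need raw pointers, as with ColumnMut.

```rust
use std::ops::Index;

/// Read-only view of a rectangular region of a plane,
/// indexed relative to its own origin: region[row][col], both from 0.
pub struct PlaneRegion<'a> {
    data: &'a [u16],
    stride: usize,
    x: usize, // origin of the region, in plane coordinates
    y: usize,
    width: usize,
    height: usize,
}

impl<'a> PlaneRegion<'a> {
    pub fn new(data: &'a [u16], stride: usize, x: usize, y: usize, width: usize, height: usize) -> Self {
        assert!(x + width <= stride && (y + height) * stride <= data.len());
        PlaneRegion { data, stride, x, y, width, height }
    }
}

impl Index<usize> for PlaneRegion<'_> {
    type Output = [u16];

    /// Row `row` of the region, limited to the region width.
    fn index(&self, row: usize) -> &[u16] {
        assert!(row < self.height, "row outside the region");
        let start = (self.y + row) * self.stride + self.x;
        &self.data[start..start + self.width]
    }
}
```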
Encoding a frame first involves frame-wise accesses (initialization), then tile-wise accesses (to encode tiles in parallel), then frame-wise accesses using the results of tile-encoding (deblocking, CDEF, loop restoration, …).
All the frame-level structures have been replaced by tiling views where necessary.
The tiling views exist only temporarily, during the calls to
encode_tile(). While they are alive, it is not possible to
access frame-level structures (the borrow checker statically prevents it).
Then the tiling structures vanish, and frame-level processing can continue.
This schema gives an overview:
                                      \
      +----------------+              |
      |                |              |
      |                |              |  Frame-wise accesses
      |                |               >
      |                |              |   - FrameState<T>
      |                |              |   - Frame<T>
      +----------------+              |   - Plane<T>
                                      /   - ...

              ||   tiling views
              \/
                                      \
  +---+  +---+  +---+  +---+          |
  |   |  |   |  |   |  |   |          |  Tile encoding (possibly in parallel)
  +---+  +---+  +---+  +---+          |
                                      |
  +---+  +---+  +---+  +---+          |  Tile-wise accesses
  |   |  |   |  |   |  |   |           >
  +---+  +---+  +---+  +---+          |   - TileStateMut<'_, T>
                                      |   - TileMut<'_, T>
  +---+  +---+  +---+  +---+          |   - PlaneRegionMut<'_, T>
  |   |  |   |  |   |  |   |          |
  +---+  +---+  +---+  +---+          |
                                      /

              ||   vanishing of tiling views
              \/
                                      \
      +----------------+              |
      |                |              |
      |                |              |  Frame-wise accesses
      |                |               >
      |                |              |  (deblocking, CDEF, ...)
      |                |              |
      +----------------+              |
                                      /
To enable tile encoding, parameters have been added to pass the (log2) number of
tile columns and rows: --tile-cols-log2 and --tile-rows-log2. For example, to request 2×2 tiles:
rav1e video.y4m -o video.ivf --tile-cols-log2 1 --tile-rows-log2 1
Currently, we need to pass the log2 of the number of tiles (like in libaom,
even if the
aomenc options are called …), to
avoid any confusion. Maybe we could later find a better option which is correct,
non-confusing and user-friendly.
Now that we can encode tiles, we must write them according to the AV1 bitstream specification, so that decoders can read the resulting file correctly.
Before tile encoding (i.e. with a single tile), rav1e produced a correct bitstream. Several changes were necessary to write multiple tiles.
In addition, when there are several tiles, it signals two more values, described below.
For entropy coding, the encoder maintains and updates a CDF (Cumulative Distribution Function), representing the probabilities of symbols.
After a frame is encoded, the current CDF state is saved to be possibly used as a starting state for future frames.
But with tile encoding, each tile finishes with its own CDF state, so which one
should we associate with the reference frame? The answer is: any of them. But we
must signal the one we choose, in
context_update_tile_id; the decoder needs it
to decode the bitstream.
In practice, we keep the CDF from the biggest tile.
The size of an encoded tile, in bytes, is variable (of course). Therefore, we will need to signal the size of each tile.
To gain a few bytes, the number of bytes used to store the size itself is also
variable, and signaled by 2 bits in the frame header
Concretely, we must choose the smallest size that is sufficient to encode all the tile sizes for the frame.
tile_start_and_end_present_flag (we always disable it);
The tile size (minus 1) is written in little endian, and uses the number of bytes we signaled in the frame header.
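These two rules can be sketched in Rust as follows. The helper names are hypothetical, and rav1e's actual code is organized differently. The 2-bit field allows 1 to 4 bytes, and each value written is the tile size minus 1:

```rust
/// Smallest number of bytes (1 to 4) able to store `size - 1`
/// for every tile size of the frame. Sizes must be >= 1.
pub fn tile_size_bytes(tile_sizes: &[usize]) -> usize {
    let max = tile_sizes.iter().map(|&size| size - 1).max().unwrap_or(0);
    assert!(max <= u32::MAX as usize, "tile too large for 4 bytes");
    let mut bytes = 1;
    // Grow until the remaining high-order bits are all zero
    while bytes < 4 && max >> (8 * bytes) != 0 {
        bytes += 1;
    }
    bytes
}

/// Append `tile_size - 1` in little endian, using exactly `bytes` bytes.
pub fn write_tile_size(out: &mut Vec<u8>, tile_size: usize, bytes: usize) {
    assert!((1..=4).contains(&bytes) && tile_size >= 1);
    let value = (tile_size - 1) as u32;
    out.extend_from_slice(&value.to_le_bytes()[..bytes]);
}
```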
That’s all. This is sufficient to produce a correct bitstream with multiple tiles.
I ran several encodings on my laptop (8 CPUs) to compare encoding performance (this is not a rigorous benchmark, but it gives an idea; you are encouraged to run your own tests). Here are the results:
tiles   time          speedup
1       7mn02,336s    1.00×
2       3mn53,578s    1.81×
4       2mn12,995s    3.05×
8*      1mn57,533s    3.59×
Speedups are quite good for 2 and 4 tiles.
Why not 2×, 4× and 8× speedup? Mainly because of Amdahl’s law.
Tile encoding parallelizes only a part of the whole process: there is still single-threaded processing at frame level.
Suppose that a proportion p (between 0 and 1) of a given task can be
parallelized. Then its theoretical speedup is
1 / ((p/n) + (1-p)), where n
is the number of threads.
tiles   speedup (p=0.9)   speedup (p=0.95)   speedup (p=0.98)
2       1.82×             1.90×              1.96×
4       3.08×             3.48×              3.77×
8       4.71×             5.93×              7.02×
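The table values can be reproduced with a few lines of Rust. The second function inverts the formula to estimate p from a measured speedup. This is a sketch for checking the arithmetic, not part of rav1e:

```rust
/// Amdahl's law: theoretical speedup when a proportion `p` (0..=1)
/// of a task is parallelized over `n` threads: 1 / ((p/n) + (1-p)).
pub fn amdahl_speedup(p: f64, n: u32) -> f64 {
    assert!((0.0..=1.0).contains(&p) && n > 0);
    1.0 / (p / f64::from(n) + (1.0 - p))
}

/// Inverse: proportion `p` implied by a measured `speedup` on `n` threads.
/// From 1/S = p/n + (1 - p), we get p = (1 - 1/S) / (1 - 1/n).
pub fn parallel_fraction(speedup: f64, n: u32) -> f64 {
    assert!(n >= 2 && speedup > 0.0);
    (1.0 - 1.0 / speedup) / (1.0 - 1.0 / f64::from(n))
}
```

For example, parallel_fraction(1.81, 2) gives about 0.895, in line with the ~90% estimate below.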
Maybe counterintuitively, to increase the speedup brought by parallelization, non-parallelized code must be optimized (the more threads are used, the larger the share of the total time spent in non-parallelized code).
The (not-so-reliable) benchmark results for 2 and 4 tiles suggest that tile encoding represents ~90% of the whole encoding process.
Not everything worked the first time.
The most common source of errors while implementing tile encoding was related to offsets.
When there was only one tile, all offsets were relative to the frame. With several tiles, some offsets are relative to the current tile, but some others are still relative to the whole frame. For example, during motion estimation, a motion vector can point outside tile boundaries in the reference frame, so we must take care to convert offsets accordingly.
The most obvious errors were caught by plane regions (which prevent access outside boundaries), but some others were more subtle.
Such errors could produce interesting images. For example, here is a screenshot of my first tiled video:
But the final boss bug was way more sneaky: it corrupted the bitstream (so the decoder was unable to decode), but not always, and never the first frame. When an inter-frame could be decoded, it was sometimes visually corrupted, but only for some videos and for some encoding parameters.
After more than one week of investigations, I finally found it.
Implementing this feature was an awesome journey. I learned a lot, both about Rust and video encoding (I didn’t even know what a tile was before I started).
Big thanks to the Mozilla/Xiph/Daala team, who has been very welcoming and helpful, and who does amazing work!
I developed an application to display and control Android devices connected over USB. It does not require any root access. It works on GNU/Linux, Windows and Mac OS.
It focuses on:
You can build, install and run it.
The application executes a server on the device. The client and the server communicate via a socket over an adb tunnel.
The server streams an H.264 video of the device screen. The client decodes the video frames and displays them.
The client captures input (keyboard and mouse) events, sends them to the server, which injects them into the device.
The documentation gives more details.
Here, I will detail several technical aspects of the application likely to interest developers.
It takes time to encode, transmit and decode the video stream. To minimize latency, we must avoid any additional delay.
For example, let’s stream the screen with
screenrecord and play it with VLC:
adb exec-out screenrecord --output-format=h264 - | vlc - --demux h264
Initially, it works, but quickly the latency increases and frames are broken. The reason is that VLC associates a PTS to frames, and buffers the stream to play frames at some target time.
As a consequence, it sometimes prints such errors on stderr:
ES_OUT_SET_(GROUP_)PCR is called too late (pts_delay increased to 300 ms)
Just before I started the project, Philippe, a colleague who played with WebRTC, advised me to “manually” decode (using FFmpeg) and render frames, to avoid any additional latency. This saved me from wasting time; it was the right solution.
If, for any reason, the rendering is delayed, decoded frames are dropped so that scrcpy always displays the last decoded frame.
Note that this behavior may be changed with a configuration flag:
mesonconf x -Dskip_frames=false
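The frame-dropping behavior can be modeled with a single-slot buffer. This is a hypothetical Rust sketch; scrcpy's client is written in C and is organized differently. The decoder overwrites the pending frame, so a late renderer always gets the most recent one:

```rust
use std::sync::{Condvar, Mutex};

/// Single-slot buffer: a new frame replaces the pending one, if any.
pub struct LatestFrame<F> {
    slot: Mutex<Option<F>>,
    available: Condvar,
}

impl<F> LatestFrame<F> {
    pub fn new() -> Self {
        LatestFrame { slot: Mutex::new(None), available: Condvar::new() }
    }

    /// Decoder side: returns true if a frame not yet rendered was dropped.
    pub fn push(&self, frame: F) -> bool {
        let dropped = self.slot.lock().unwrap().replace(frame).is_some();
        self.available.notify_one();
        dropped
    }

    /// Renderer side: blocks until a frame is available, then takes it.
    pub fn take(&self) -> F {
        let mut slot = self.slot.lock().unwrap();
        loop {
            if let Some(frame) = slot.take() {
                return frame;
            }
            slot = self.available.wait(slot).unwrap();
        }
    }
}
```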
Capturing the device screen requires some privileges, which are granted to
It is possible to execute Java code as
shell on Android, by invoking
Here is a simple Java application:
Let’s compile and dex it:
javac -source 1.7 -target 1.7 HelloWorld.java
"$ANDROID_HOME"/build-tools/27.0.2/dx \
    --dex --output classes.dex HelloWorld.class
Then, we push
classes.dex to an Android device:
adb push classes.dex /data/local/tmp/
And execute it:
$ adb shell CLASSPATH=/data/local/tmp/classes.dex app_process / HelloWorld
Hello, world!
The application can access the Android framework at runtime.
For example, let’s use
We link our class against
javac -source 1.7 -target 1.7 \
    -cp "$ANDROID_HOME"/platforms/android-27/android.jar HelloWorld.java
Then run it as before.
The execution also works if
classes.dex is embedded in a zip/jar:
jar cvf hello.jar classes.dex
adb push hello.jar /data/local/tmp/
adb shell CLASSPATH=/data/local/tmp/hello.jar app_process / HelloWorld
You know an example of a zip containing
classes.dex? An APK!
Therefore, it works for any installed APK containing a class with a main method:
$ adb install myapp.apk
…
$ adb shell pm path my.app.package
package:/data/app/my.app.package-1/base.apk
$ adb shell CLASSPATH=/data/app/my.app.package-1/base.apk \
    app_process / HelloWorld
To simplify the build system, I decided to build the server as an APK using gradle, even if it’s not a real Android application: gradle provides tasks for running tests, checking style, etc.
Invoked that way, the server is authorized to capture the device screen.
Nothing is required to be installed on the device by the user: at startup, the client is responsible for executing the server on the device.
We saw that we can execute the main method of the server from an APK either:
Which one to choose?
$ time adb install server.apk
…
real    0m0,963s
…

$ time adb push server.apk /data/local/tmp/
…
real    0m0,022s
…
So I decided to push.
/data/local/tmp is readable and writable by
shell, but not
world-writable, so a malicious application may not replace the server just
before the client executes it.
If you executed the Hello, world! in the previous section, you may have
noticed that running
app_process takes some time:
Hello, world! is only
printed after some delay (between 0.5 and 1 second).
In the client, initializing SDL also takes some time.
Therefore, these initialization steps have been parallelized.
After usage, we want to remove the server (
from the device.
We could remove it on exit, but then, it would be left on device disconnection.
Instead, once the server is opened by
app_process, scrcpy unlinks
it. Thus, the file is present for less than 1 second (it is removed even
before the screen is displayed).
The file itself (not its name) is actually removed when the last associated open file
descriptor is closed (at the latest, when
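This behavior is standard on Unix-like systems such as Android and Linux. The small Rust sketch below checks it on the local machine; read_after_unlink is a hypothetical helper, not part of scrcpy. It would fail on Windows, where removing a file that is open is generally refused:

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Removing a file only removes its name: the content stays readable
/// through descriptors opened before the removal.
pub fn read_after_unlink(path: &Path, content: &[u8]) -> io::Result<Vec<u8>> {
    File::create(path)?.write_all(content)?; // push the "server"
    let mut opened = File::open(path)?;       // the process opens it...
    fs::remove_file(path)?;                   // ...then its name is unlinked
    let mut data = Vec::new();
    opened.read_to_end(&mut data)?;           // the content is still available
    Ok(data)
}
```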
Handling input received from a keyboard is more complicated than I thought.
There are 2 kinds of “keyboard” events:
Key events provide both the scancode (the physical location of a key on the keyboard) and the keycode (which depends on the keyboard layout). Only keycodes are used by scrcpy (it doesn’t need the location of physical keys).
However, key events are not sufficient to handle text input:
Sometimes it can take multiple key presses to produce a character. Sometimes a single key press can produce multiple characters.
Even simple characters may not be handled easily with key events, since they
depend on the layout. For example, on a French keyboard, typing
Therefore, scrcpy forwards key events to the device only for a limited set of keys. The remaining ones are handled by text input events.
On the Android side, we may not inject text directly (injecting a KeyEvent
created by the relevant constructor does not work).
Instead, we can retrieve a list of
KeyEvents to generate for a
events is initialized with an array of 4 events:
Injecting those events correctly generates the char
Unfortunately, the previous method only works for ASCII characters:
I first thought there was no way to inject such events from there, until I talked with Philippe (yes, the same as earlier), who knew the solution: it works when we decompose the characters using combining diacritical dead key characters.
Concretely, instead of injecting
"é", we inject
The application window may have an icon, used in the title bar (for some desktop environments) and/or in the desktop taskbar.
The window icon must be set from an
Creating the surface with the icon content is up to the developer. For example,
we could decide to load the icon from a PNG file, or directly from its raw
pixels in memory.
Note that the image is not the content of the variable
icon_xpm declared in
icon.xpm: it’s the whole file! Thus,
icon.xpm may be both directly opened in
Gimp and included in C source code:
As a benefit, we directly “recognize” the icon from the source code, and we can patch it easily: in debug mode, the icon color is changed.
Developing this project was an awesome and motivating experience. I’ve learned a lot (I never used SDL or libav/FFmpeg before).
The resulting application works better than I initially expected, and I’m happy to have been able to open source it.