Nick Dow's blog : What is Deepfake, why is it dangerous and how to deal with it

The term deepfake, a blend of deep learning and fake, is used quite broadly: it covers almost any alteration of digital media intended to mislead either a computer pattern-recognition system or a human observer. For example, one person's face in a video can be "attached" to another person's body (this is called Face Swap, or face replacement), or a satisfied smile can be realistically replaced with a grimace of indignation (Face Reenactment, or facial-expression replacement).

In addition to deepfakes, there are two main attack vectors against recognition systems: attacks on the computational algorithms and attacks on the cameras themselves. The first type includes adversarial attacks: carefully prepared changes to the input data that cause the neural network to make an erroneous decision; say, an image that a human unambiguously sees as a turtle is perceived by the machine as a racing car. Attacks "on the camera", whose popularity peaked in the first years of widespread facial-recognition deployment, are liveness attacks: the living face in front of the camera is replaced with an artificial likeness, such as a photo printed on paper, a portrait shown on a tablet screen, or a realistically painted, 3D-printed mask.
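To make the adversarial-attack idea concrete, here is a minimal pure-Python sketch. The classifier, its weights, and the feature values are all hypothetical toys (real attacks target deep networks with gradient methods such as FGSM), but the mechanism is the same: a small, carefully aimed change to the input flips the model's decision.

```python
# Toy illustration of an adversarial (FGSM-style) perturbation against a
# hypothetical linear classifier; score > 0 means class "turtle".
WEIGHTS = [0.9, -0.4, 0.2]  # made-up "trained" weights

def score(x):
    # Linear decision function of the toy model.
    return sum(w * xi for w, xi in zip(WEIGHTS, x))

def fgsm_perturb(x, epsilon):
    # For a linear score, the gradient w.r.t. the input is just the weight
    # vector, so stepping against its sign lowers the score the fastest.
    return [xi - epsilon * (1 if w > 0 else -1) for xi, w in zip(x, WEIGHTS)]

x = [0.5, 0.1, 0.3]          # a "turtle" image, flattened to 3 features
adv = fgsm_perturb(x, 0.4)   # small, targeted change per feature

print(score(x) > 0)    # True: original is classified as "turtle"
print(score(adv) > 0)  # False: the perturbed input flips the decision
```

A human looking at the perturbed features would see almost the same input; the model's verdict nevertheless changes, which is exactly what makes such attacks hard to spot by eye.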

Samuel Stefenson from Deep Nudes says: "Fortunately, modern recognition systems are already quite good at resisting liveness attacks at the algorithm level." The picture is analyzed in detail for telltale features: for a printed photo this could be the edge of the paper sheet; for a picture on a tablet, iridescent reflections from external light sources on the LCD matrix, and so on. The person in front of the camera can also be asked to nod or turn the head in different directions, which both increases the reliability of identification and confirms that no face substitution is taking place.
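The head-turn instruction described above is a challenge-response check, and its core logic fits in a few lines. This is only an illustrative sketch: the pose estimator that turns video frames into head-yaw angles is assumed to exist upstream, and the thresholds are arbitrary.

```python
import random

# Hypothetical challenge-response liveness check. Input is a per-frame head-yaw
# track (degrees; negative = turned left, positive = turned right) produced by
# an assumed upstream pose estimator.
CHALLENGES = {
    "turn_left": lambda yaw: yaw < -15,
    "turn_right": lambda yaw: yaw > 15,
}

def issue_challenge(rng=random):
    # Pick an unpredictable instruction so a prerecorded video cannot comply.
    return rng.choice(sorted(CHALLENGES))

def passed_challenge(challenge, yaw_track, min_frames=3):
    # A printed photo or a replayed screen yields a flat yaw track and fails;
    # a live subject following the instruction produces matching frames.
    hits = sum(1 for yaw in yaw_track if CHALLENGES[challenge](yaw))
    return hits >= min_frames

live_track = [0, -5, -18, -25, -30, -12]   # subject actually turned left
photo_track = [0, 0, 1, -1, 0, 0]          # printed photo: no head motion

print(passed_challenge("turn_left", live_track))   # True
print(passed_challenge("turn_left", photo_track))  # False
```

The randomness of the challenge is the point: an attacker holding up a static photo, or replaying a recorded clip, cannot know in advance which motion will be demanded.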

The situation is somewhat more complicated when the computing module of the recognition system and the camera are not combined in one housing (as with Face ID on the iPhone or the Face Pay system in the Moscow Metro) but interact over the Internet. For example, a person applies for a loan during a Zoom call with a bank employee, and the bank must verify that he is who he claims to be and that the documents he presents are his. Here the danger of traffic interception arises: the attacker feeds the PC a stream from a virtual camera, replacing his own face with a computer model, similar to the popular "live masks" in today's video chats but far more believable.

A fake image, capable of misleading even a live operator, is fed into the video conference, so the bank's recognition system receives deliberately false information.

Liveness attacks are the least resource-intensive for an attacker and therefore the most dangerous. But serious resources were invested in countering them early on, so today their effectiveness is very low. Adversarial attacks, by contrast, are extremely complex to execute: they require deep knowledge of the structure and operating principles of the specific image-recognition system. That is why their relevance as a practical danger is close to zero.

A deepfake in itself is neither good nor bad: computing tools have objectively developed to the point where a human face can be simulated realistically in near real time. The technology has many legitimate applications; recall the appearance of Leia Organa in "Rogue One", set in the Star Wars universe immediately before "Episode IV: A New Hope", filmed back in 1977. Leia in the new film was portrayed by actress Ingvild Deila, but her face was replaced with a neural-network-generated likeness of Carrie Fisher, the original Leia, who passed away in 2016. In the same way, a famous actor can agree to the use of his face in a commercial without adjusting his busy schedule: a body double and a deepfake system will do everything for him.

The entertainment industry is perhaps best positioned to embrace deepfakes as a legitimate technology. In film dubbing, the approach can eliminate the annoying mismatch between the on-screen movements of characters' lips and the sounds they pronounce. In computer games, automatic procedural generation of believable human faces will also be in great demand: artists and 3D modelers will be able to focus more effort on designing armor, weapons, interiors, buildings, and other elements. The already mentioned "live masks" in video chats will become even more impressive: you can not only put a cartoon image over your face but also change your hairstyle and hair color, or acquire a very naturalistic beard or movable cat ears on the fly.

But there are also many negative uses of deepfakes. On social networks, where artificial intelligence tries to automatically identify bots (fake accounts used to promote blogs or inflate news ratings), detection often relies on comparing profile photos against available databases. Plausible faces generated by a neural network can successfully deceive the bot-countering AI, so the energy and time of live moderators must then be spent to identify the fake accounts.
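One common way to compare a profile photo against a database of known images is a perceptual hash such as dHash, matched by Hamming distance. The sketch below is a simplified illustration, not a real moderation pipeline: image decoding and downscaling to a small grayscale grid are assumed to happen upstream, and the grids here are tiny made-up examples.

```python
# Simplified dHash: for each row, compare adjacent pixels; a near-duplicate
# photo (recompressed, slightly edited) keeps almost the same bit pattern.
def dhash(pixels):
    """pixels: rows of grayscale values; emits one bit per adjacent pair."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(h1, h2):
    # Number of differing bits; small distance suggests the same source image.
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

known = dhash([[10, 20, 30], [40, 30, 20]])       # photo already in the database
candidate = dhash([[11, 22, 29], [41, 29, 21]])   # near-duplicate upload
unrelated = dhash([[90, 10, 80], [5, 60, 7]])     # different image

print(hamming(known, candidate))  # 0: likely the same source photo
print(hamming(known, unrelated))  # 2: clearly a different image
```

This also shows why generated faces are troublesome: a face that never existed matches nothing in any database, so hash lookups return no hits and the account looks legitimate to the automated check.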

Finally, we must not forget about time: reliably identifying a deepfake is not a quick matter. A balance must be struck depending on the task: a social network searching for bots by profile photo can afford to spend more time analyzing each picture, while a transaction-verification system working through a webcam must respond to a request within seconds. New approaches have to be invented constantly.

In:
  • Digital
On: 2024-03-25 02:54:35.086 http://jobhop.co.uk/blog/8343/what-is-deepfake-why-is-it-dangerous-and-how-to-deal-with-it