Errant: The Kinetic Propensity of Images 遊子：動勢考 (2018)
Algorithmic video installation, multiple projection with digital documentation
Floor space: open space, 2 perpendicular walls, 10m (for projection) and 3m (research and process material)
佔地面積：開放空間, 兩面牆成垂直角度, 10米(投影) 和 3米(研究和過程資料)
artist’s website on the work […]
Errant: The Kinetic Propensity of Images is the first in a projected series of works about the analysis of cinema by means of machine learning methods. The work currently focuses on Chinese-language films, and specifically selected sequences of movements in King Hu’s films from the 1970s, including A Touch of Zen (1971, 俠女) and Raining in the Mountain (1979, 空山靈雨).
A shot or sequence in a film often contains several on-screen motions, for instance the movements of different fighters, as well as the motion of the camera itself. An important contribution of Chinese-language cinema heritage, particularly evident in action and swordplay films, lies in the ways that filmmakers organize these various motions into a visual ballet. When we focus our attention on the narrative information, we fail to notice the organization of movement in those films.
This project uses unsupervised machine learning to visualize cinematic movement as an end in itself. The algorithm segments each film into short sections and generates a graphic dictionary of basic movements using matrix factorization techniques. These fundamental motions are basic factors or elements that can be combined to represent other, more complex motions in the movie(s). Their combinations generate abstract visual representations of the movement in every sequence of the film.
The late Taiwan-based filmmaker King Hu 胡金銓 is well known for his carefully choreographed swordplay sequences. His films are more concerned with the formal quality of body movements than with plot and story content. Paying tribute to King Hu’s aesthetics, this work uses machine learning to analyze body movements in his films and to generate a corpus-specific taxonomy of basic motion patterns. The system then identifies motions that have similar patterns and displays them side by side. The aim is to encourage viewers to focus on pure movement rather than on narrative content, and to pay close attention to the balletic quality of movement in King Hu’s work. The two-part King Hu series in this exhibition combines Rodriguez’s earlier experiments in his works Gestus (2014, 2018) and Uncertainty Principle (2017), in which he uses mathematical concepts to analyze motion and composition in films.
《遊子：動勢考》（2018）是羅海德用「機器學習」分析電影語言的新系列的首作，著眼於探索華語電影。作品目前主要選取了華語電影片段作分析，尤其選取了胡金銓導演七十年代作品如《俠女》(1971) 以及《空山靈雨》(1979) 的片段。
作品的目標是透過「無監督機器學習」(unsupervised machine learning) 展示電影動勢。過程中，一部電影被分拆成各部分，以「矩陣分解」(matrix factorization) 技術為其建構出一系列的基礎動態，作為該片的電影語言的專屬字典。當中蘊含的基礎動態，亦可視之為電影的基本元素，由此組成每段選片中更為繁複的動勢；這些組合亦生成了各組鏡頭的抽象再現。
已故台灣電影導演胡金銓以他精心編排的武俠場面聞名。他的電影着重身體動態多於故事情節和內容。作為對胡金銓美學的致敬，作品利用機器學習分析胡金銓電影的身體動態，產生專屬其作品的基本動態模式詞彙庫；系統繼而辨識擁有類近模式的動態並將之並列。這樣做，目的是讓觀眾專注於純粹的動態多於敘事內容，並密切關注胡金銓電影裏彷如舞蹈般的動作特質。本展覽中的胡金銓系列兩部曲揉合了藝術家在他過往的作品《Gestus》(2014, 2018) 和《Uncertainty Principle》(2017) 的實驗：以數學概念分析與解構電影中的動勢。
Most film analysis and criticism describe movement in cinema by reference to the object that moves. Descriptions of scene motion typically focus on the nature of the moving object (whether it is a person, a car, etc.), its velocity, and perhaps certain aspects of its rhythm. Descriptions of global motion mainly focus on the camera as the source of that motion. Writers characterize camera movement as, for instance, a “pan”, “tilt”, “track”, “dolly”, “zoom”, etc. These terms presuppose a privileged object, the camera, as the source of the visible movement. The conventional vocabulary of critical analysis guides the expectations of the critic or theoretician, who sees only what they expect to find, and they expect to find only that for which they have acquired words. Writers on cinema almost invariably presuppose a mobile camera viewing mobile objects in a three-dimensional world. In other words, the focus is on the causes or sources of the movement rather than on its visible quality. Under these conditions, we lack the resources to describe or represent the phenomenological quality of motion in the cinema.
主流電影分析與評論在敍述電影動態時多指向運動中的物件：場景中的動態描述大多與物件的本體（如人、車等等）、速率、或特定的韵律相關；而總體的動態則主要來源於運鏡。評論人會為各式鏡頭技巧分門別類，諸如橫搖 (pan)、直搖 (tilt)、平移 (track)、推移/拉移 (dolly)、變焦 (zoom)等。這些詞彙假定了一種特殊的存在 — 鏡頭 — 作為可見動態的來源。這種評論常見的傳統詞彙主導了評論者或理論者的期望，使其觀察只限於其所能想像的；其能想像的亦已包含在他的詞彙之中，固此難有嶄新的發現。他們幾乎假定了一種虛構的情景：於三維空間內有一部流動攝影機在拍攝跟隨運動中的物件。換言之，評論重點在於想象動態的來源所引來的顯而易見的現象。這種情況下我們將無法描述或表現出電影動勢的現象學特質。
Our awareness of movement in mainstream films, advertisements and so on is typically bound to specific objects and locations in support of story content. Viewers do not typically attend to the visual qualities of movement itself. Our attention is directed to what is moving, not how it moves. In opposition to this dominant approach, this project aims to focus deep perception on motion. Its aim is, moreover, not purely formal. It embodies a reaction against the denigration of close attention that accompanies the “attention economy,” the commodification of attention in which we are currently immersed, and provides a medium for cultivating and enriching the content and manner of sense perception. In other words, the algorithm developed in this project makes movement perceptually salient as an end in itself.
我們對於主流電影和廣告裏動態的認知，往往規限於個別場景和物體與故事內容的關係，觀看者並不會特別留意動態的視覺特質。我們的關注被引導到甚麽在移動，而不是如何移動。與主流做法截然不同，《遊子：動勢考》以至整個發展下去的系列希望更深入集中於動態的感知，其目的亦不止於純粹的形式。時下我們所浸淫的「注意力經濟」商品化了「注意力」；對於伴隨「注意力經濟」而來的對專注力的忽視，《遊子：動勢考》提出了反抗的回應，並創作了一個培養和豐富感官知覺的內容及方式的媒介。換句話說，本作品所開發的演算法讓動態本身在感知的層次上變得顯著，而不再淪為情節內容的輔助。
The philosophical and conceptual aspects of this work are mediated by an awareness of computational technology. In particular, the methodology employed here relies on unsupervised machine learning to produce a visual dictionary of motion patterns. To understand this method, it is helpful to introduce in non-technical terms the concept of a dictionary, as that term is used in machine learning. Suppose we are given a set of objects, such as images of human faces. Every object in the dataset is stored as a description in a standardized format. For instance, a digital grayscale (“black and white”) image of n pixels is normally described as a list of n numbers. Each number represents the brightness of a pixel as a value between 0 (black) and 1 (white). The objects in question (the images) may be said to exist in a space of n dimensions. The dimensionality is determined by the number of independent elements required for a complete description of each object, in this instance the number of pixels in an image. For certain purposes, however, a full description of an object contains too much information. To recognize a digital image as the image of a face, for instance, it is not necessary to see every pixel in the image. Often a blurred or down-sampled version of the image remains recognizable as a face. Many applications in image analysis depend on the availability of methods for shortening the description of each image in a given database. These methods are generically known as dimensionality reduction methods. They are important not only because they can decrease the computational cost of certain operations on those objects, but also because a shorter description can capture essential features of each object and so eliminate superfluous information and highlight otherwise hidden properties of the data.
若要了解箇中的方法，必要先以非技術性的語言介紹「字典」一詞於機器學習中的概念。假設有一類的物件，例如人面的圖像，每一個物件都能化作一種可被儲藏的規範性的數據。舉例說明，一張具有 n 像素的灰階數碼影像一般能以 n 個數字代表，每個像素的明暗度 (brightness) 皆能以一個由 0（黑）到 1（白）的數值表示，我們可以形容該物件（圖像）為存在於一個 n 維的空間之內。空間的維度取決於能夠完整地表現一個物件所需的獨立元素的數量，亦即例子中圖像的像素數量。然而，於特定的目標或情況下，物件的完整數據包含了過多的資訊。就像要從數碼影像中識別出人面影像，根本無需要仔細審視每個像素，因為大多數模糊化或向下採樣 (down-sampled) 後的人面圖像，仍然能被識別出來。很多影像分析的應用依賴簡化數據庫中的影像數據的方法，這些方法通稱為「維度減少」(dimensionality reduction) 方法。其重要性不僅在於能減低運算成本，同時也能更有效地捕捉到物件的本質特徵 (essential feature)，排除冗贅的訊息，以及勾勒出資料中潛藏的特性。
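The idea of an image as a list of n numbers, and of a shorter description obtained by down-sampling, can be illustrated with a minimal sketch. The 8x8 random image and the `downsample` helper are assumptions made for illustration only; they are not part of the artist's software.

```python
import numpy as np

# Hypothetical 8x8 grayscale image: each value is a brightness
# between 0 (black) and 1 (white).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Full description: a list of n = 64 numbers, i.e. one point in a
# 64-dimensional space.
full_description = image.reshape(-1)

def downsample(img, factor):
    """Average non-overlapping factor x factor blocks of pixels.

    Assumes both image sides are divisible by `factor`.
    """
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Shorter description: a 4x4 image, i.e. only 16 numbers.
small = downsample(image, 2)
short_description = small.reshape(-1)

print(full_description.size, "->", short_description.size)
```

Block averaging is only the crudest form of dimensionality reduction: it shortens the description without learning anything from the dataset, unlike the dictionary methods discussed in the text.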
One approach proceeds by finding or constructing a set of paradigmatic or representative objects from the dataset. These objects are then used as a basic vocabulary or “dictionary” to represent other objects in the dataset. For instance, we may generate a vocabulary of simple facial parts. The unsupervised machine learning method known as Non-Negative Matrix Factorization (NNMF) is designed to generate a vocabulary of parts from any given database. Every face can then be represented by weighting and superposing these various parts. Instead of describing the face by listing every one of its pixels, we can simply list the weights of the facial parts used to construct any face image in the database. Each weight represents the contribution of the corresponding facial part to the construction of that particular face.
其中一種做法是從資料庫中找出或建構出一組模範式或具代表性的物件，並以此作為一個基本的詞彙或字典，用來代表資料庫中其他的物件，如在上述人面圖像的例子中，生成一個由各個簡單的人面部分組成的字典。而一種名為非負矩陣分解 (Non-Negative Matrix Factorization, 簡稱NNMF)的無監督機器學習(unsupervised machine learning)方法可以用於任意資料庫，並生成相應的字典。由此，每張人面能以字典中不同的部分透過加權(weighting)與重疊(superposing)組合而成。不同於以各個像素數值表現人面，資料庫中每張人面圖像都能夠簡化地以字典中不同的部分的加權值代表，並以之重構，而每一個加權值則代表著相應人面部分於特定人面中所佔的比重。
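A minimal sketch of this kind of factorization follows, using the classic multiplicative-update rules for non-negative matrix factorization. It is an illustration under stated assumptions, not the artist's implementation: the data are synthetic non-negative vectors standing in for face images, and the function name `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(V, n_parts, n_iter=500, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (features x objects) as V ~ W @ H.

    Columns of W are the dictionary "parts"; column j of H holds the
    weights that mix those parts to approximate object j.
    Uses multiplicative updates that keep W and H non-negative.
    """
    rng = np.random.default_rng(seed)
    n_features, n_objects = V.shape
    W = rng.random((n_features, n_parts)) + eps
    H = rng.random((n_parts, n_objects)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a database: 30 "images" of 64 pixels each,
# built from 4 hidden non-negative parts.
rng = np.random.default_rng(1)
true_parts = rng.random((64, 4))
true_weights = rng.random((4, 30))
V = true_parts @ true_weights

parts, weights = nmf(V, n_parts=4)

# Each object is now described by 4 weights instead of 64 pixel values.
reconstruction = parts @ weights
rel_error = np.linalg.norm(V - reconstruction) / np.linalg.norm(V)
print("weights per object:", weights.shape[0])
print("relative reconstruction error:", rel_error)
```

The non-negativity constraint is what makes the parts additive: an object can only be built by superposing parts, never by subtracting them.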
In this work, however, the entries in the dictionary are not faces. They are not still images at all. They are movement patterns, i.e., kinetic objects, obtained by analyzing optical flow.
《遊子 ：動勢考》的字典中包含的並非人面的部分，甚至也非固定的影像，而是動態的規律，以及透過分析光流 (optical flow)而得出的動態物件(kinetic objects)。
So-called optical flow techniques receive a video clip as input and estimate the movement depicted in every pair of consecutive frames in that clip. This estimation involves assigning a motion vector to every pixel (or region of pixels) in the first image. A motion vector can be visualized as an arrow. Its orientation represents the direction of the movement depicted in that pixel. Its magnitude represents the apparent speed of the motion. The following images show a short movie excerpt together with the matrix of vectors that represent the movement visible in every pair of consecutive frames in that excerpt.
所謂的「光流」技術用以分析影片中每對連續影格之間的動態，當中的估算涉及為前影格中每個或每組像素訂立一個移動向量 (motion vector)。移動向量能以一個箭頭表示，箭頭的指向代表像素(組)移動的方向，而箭頭的長度則表現了像素(組)移動的可見速度。以下的圖像表示了該影片中某幾對連續影格之間的可見動態。
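To make the notion of a motion vector concrete, here is a minimal sketch of one of the simplest motion estimators, exhaustive block matching, which assigns one vector to each region of pixels. This technique is swapped in purely for illustration; the text does not say which optical flow method the work uses, and the function name, block size, and search radius are all assumptions.

```python
import numpy as np

def block_matching_flow(frame_a, frame_b, block=8, radius=3):
    """Estimate one motion vector (dx, dy) per block of frame_a.

    For each block, search displacements within +/- radius pixels and
    keep the one whose patch in frame_b differs least (sum of squares).
    """
    h, w = frame_a.shape
    rows, cols = h // block, w // block
    flow = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            patch = frame_a[y:y + block, x:x + block]
            best_err, best = np.inf, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    # Skip candidates that fall outside the second frame.
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    cand = frame_b[yy:yy + block, xx:xx + block]
                    err = np.sum((patch - cand) ** 2)
                    if err < best_err:
                        best_err, best = err, (dx, dy)
            flow[r, c] = best
    return flow

# Synthetic pair of frames: random texture moved 2 px right and 1 px down.
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, shift=(1, 2), axis=(0, 1))

flow = block_matching_flow(frame_a, frame_b)

# Each vector can be read as an arrow: direction and apparent speed.
speed = np.hypot(flow[..., 0], flow[..., 1])
direction = np.arctan2(flow[..., 1], flow[..., 0])
print("vector of an interior block (dx, dy):", flow[1, 1])
```

Dense optical flow methods refine the same idea to one vector per pixel, but the output has the same form: a matrix of arrows for every pair of consecutive frames.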
Unsupervised machine learning can be used to decompose or factorize any movement block into a set (“dictionary”) of basic motion matrices together with a set of weights or mixing coefficients. The following diagram shows a selection of the dictionary extracted from the sequence that contains the above excerpt by using a variant of NNMF.
The decomposition assigns to every dictionary entry a sequence of weights, one weight per pair of consecutive frames in the sequence. These weights express the amount by which the elements in the dictionary are to be mixed in each frame to reconstruct approximately the original movement block.
We can think of the dictionary as akin to the periodic table of elements in chemistry, although the dictionary is constructed for a specific video sequence and is in this sense contextual rather than general. The point of the comparison is this: Just as complex chemical substances can be decomposed into chemical elements, a video’s movement blocks can be broken down into weighted combinations or mixtures of the matrices in the dictionary.
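The mixing step described above can be sketched in a few lines. The dictionary and weights below are hand-made for illustration rather than learned, and the array layout (entries x height x width x 2 vector components, and entries x frame pairs) is an assumption, not the layout used in the work.

```python
import numpy as np

def reconstruct_block(dictionary, weights):
    """Mix dictionary entries into a movement block.

    dictionary: (K, H, W, 2) array, K basic motion fields of (dx, dy) vectors.
    weights:    (K, T) array, one weight per entry and per frame pair.
    Returns a (T, H, W, 2) array: one flow field per frame pair.
    """
    return np.einsum("kt,khwc->thwc", weights, dictionary)

# Two hand-made dictionary entries on a 4x4 grid.
rightward = np.zeros((4, 4, 2))
rightward[..., 0] = 1.0          # every vector points right
downward = np.zeros((4, 4, 2))
downward[..., 1] = 1.0           # every vector points down
dictionary = np.stack([rightward, downward])

# Weights over three frame pairs: the motion turns from right to down.
weights = np.array([
    [1.0, 0.5, 0.0],   # contribution of `rightward` over time
    [0.0, 0.5, 1.0],   # contribution of `downward` over time
])

block = reconstruct_block(dictionary, weights)
print(block.shape)
print(block[1, 0, 0])  # middle frame pair: an equal mix of both entries
```

In the chemistry analogy, the dictionary entries play the role of elements and each column of `weights` is the recipe for one frame pair.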
A sequence often contains complex movements involving various characters as well as the camera. Even a simple case, that of a camera viewing an empty corridor with a wide angle while tracking forward, may be kinetically quite complex if the layout of the scene contains many objects distributed from the foreground to the background. To capture this potential complexity, it is helpful to cluster the entries in the dictionary into groups of related matrices. We can think of each cluster as a smaller dictionary or sub-dictionary for this sequence. Each sub-dictionary represents one kind of movement, i.e. a kinetic object, contained within the sequence. In this way, unsupervised machine learning methods afford the separation of the kinetic content of a video sequence into distinct components.
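Grouping dictionary entries into sub-dictionaries can be sketched with a basic k-means clustering. The text does not name the clustering method used in the work, so k-means is an assumption chosen for simplicity, as are the function name and the synthetic entries.

```python
import numpy as np

def cluster_entries(entries, n_clusters, n_iter=20):
    """Group dictionary entries (K, H, W, 2) into clusters with k-means.

    Initial centers are picked deterministically: the first entry, then
    repeatedly the entry farthest from all centers chosen so far.
    Returns one cluster label per entry.
    """
    X = entries.reshape(len(entries), -1)
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):      # leave empty clusters unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Six synthetic entries: three noisy rightward fields, three noisy upward ones.
rng = np.random.default_rng(0)
rightward = np.zeros((4, 4, 2))
rightward[..., 0] = 1.0
upward = np.zeros((4, 4, 2))
upward[..., 1] = -1.0
entries = np.stack(
    [rightward + 0.05 * rng.standard_normal((4, 4, 2)) for _ in range(3)]
    + [upward + 0.05 * rng.standard_normal((4, 4, 2)) for _ in range(3)]
)

labels = cluster_entries(entries, n_clusters=2)
print(labels)  # entries with the same label form one sub-dictionary
```

Each group of entries sharing a label corresponds to one sub-dictionary, that is, one kind of movement within the sequence.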
We can think of a dictionary or sub-dictionary as a framework for describing movement in a specific video sequence. This framework need not correspond to the conventional categories of cinematic criticism and analysis. The algorithm is not “trained” by exposure to already known examples or “model answers” that embed familiar ways of understanding the subject matter. Rather, the algorithm extracts a dictionary by using optimization techniques. The algorithm identifies a dictionary of latent motions for each shot in a movie. The next step is to visualize these representations. In the first draft of this project, they were visualized as moving points of light, using a variant of Laplacian interpolation.
可以想像每個字典或字典的某些部分為一個框架，用以描述特定影片的動勢，而該框架更無須對應於現存電影評論和分析中的某個傳統範疇。整個演算法並非以特定或已知的例子及標準「訓練」出來，亦避免了當中包藏著針對特定主題而設的既有理解方法，純粹以最佳化 (optimization) 技術提取出相應的字典。
這些抽象的字典或象徵隨後以移動光點的方式展現，該方法特別為此作品設計，當中應用了拉普拉斯內插法 (Laplacian Interpolation)。
The final version uses a technique known as streaklines. This method was developed in fluid dynamics, to represent, analyze, and visualize unsteady (time-varying) flows. It has also been used in computer vision, for instance, to study the behavior of crowds in public places.
The components of each dictionary in this project are all unsteady flow fields. It is therefore appropriate to visualize every component of a dictionary using streaklines. Given a dictionary of latent motions, it is possible to reconstruct the original flow of a shot by combining the various dictionary representations. This reconstruction can also be represented using streaklines.
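A streakline is the curve formed by all particles that have been released, one after another, from the same fixed point into a flow. The following minimal sketch traces one streakline through a time-varying flow field. It assumes simple Euler steps and nearest-neighbor sampling of the grid; the function name and the array layout are hypothetical and not taken from the work.

```python
import numpy as np

def streakline(flows, seed):
    """Trace a streakline through an unsteady flow.

    flows: (T, H, W, 2) array of (dx, dy) vectors, one field per time step.
    seed:  (x, y) position where a new particle is released at every step.
    Returns the (T, 2) positions of all released particles, oldest first.
    """
    seed = np.asarray(seed, dtype=float)
    h, w = flows.shape[1:3]
    particles = np.empty((0, 2))
    for field in flows:
        # Release a new particle at the seed point.
        particles = np.vstack([particles, seed])
        # Sample the current field at each particle (nearest grid cell).
        cols = np.clip(np.rint(particles[:, 0]).astype(int), 0, w - 1)
        rows = np.clip(np.rint(particles[:, 1]).astype(int), 0, h - 1)
        # Move every particle one Euler step along the flow.
        particles = particles + field[rows, cols]
    return particles

# Unsteady flow on a 16x16 grid: rightward for 3 steps, then downward.
flows = np.zeros((6, 16, 16, 2))
flows[:3, ..., 0] = 1.0
flows[3:, ..., 1] = 1.0

line = streakline(flows, seed=(2.0, 2.0))
print(line)  # connect these points in order to draw the streakline
```

Because the flow changes over time, the streakline differs from the path of any single particle: older particles carry the trace of the earlier rightward motion, while the newest ones have only moved downward.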
The final version of the work is a two-channel video projection. The left channel shows the decomposition of a shot into a dictionary of motions. Each dictionary is visualized using the streakline method.
The right channel shows the reconstruction of the optical flow of the shot by combining all the dictionary components.
It is best not to think of a dictionary as an ideal type or timeless model discovered by the algorithm. Rather, it is only one of several possible dictionaries that could be extracted from any given sequence. The dictionary provides a tool for the cultivation of attention towards patterns of movement in the cinematic material, and is legitimate only if understood in this way. It is intrinsically embodied and embedded. It is grounded in particular ways of organizing the kinetic flow of images, and it is directed towards deepening our attention to them.
+ + +
Hector RODRIGUEZ is a Hong Kong-based digital artist and theorist whose work explores the unique possibilities of computational technologies to reconfigure the history and aesthetics of moving images. He received a commendation award from the Hong Kong Government for his contributions to art and culture in 2014. He was awarded the Best Digital Work in the Hong Kong Art Biennial 2003, an Achievement Award at the Hong Kong Contemporary Art Awards 2012, and the Jury Selection Award of the Japan Media Art Festival 2012.
His works have been exhibited internationally in Taiwan, Singapore, the US, Poland, Germany, Spain, Greece, France, London, and elsewhere. His recent exhibitions include the 15th & 16th WRO Media Art Biennale, Poland (2013, 2015), European Conference on Computer Vision (2018), A.I. Art Gallery, Conference on Neural Information Processing Systems (2018, Montreal), RIXC Art Science Festival (2017, Riga), Generative Art Conference/Exhibition (2017), Athens Media Art Festival (2018, Greece), xCoAx conference on computation and art (2016, GAMEC, Bergamo), CyNetArt competition (2016, Dresden) and many more. His solo retrospective “Hidden Variables” was held in Hong Kong in October 2018. He was the Artistic Director of the Microwave International Media Art Festival from 2004 to 2006, and is Director for Research and Education for the Writing Machine Collective. He currently teaches at the School of Creative Media, City University of Hong Kong, where he founded the undergraduate program in art and science.