
phylos, I knew these people…
2026
Phylos, I knew these people… is a video installation created in collaboration with large language models (LLMs), including GPT-4 by OpenAI, the text-to-speech AI voice by CapCut, and AI Monica. Rather than focusing on the quality or realism these powerful tools offer, Phylos questions the fragility of human perception and pursues a fundamental, cross-cultural inquiry into spectatorship. The work approaches moving images as a constellation of unstable elements (image, sound, and text) rather than a unified whole. By isolating and recombining these layers, it tests the viewer’s reliance on the cinematic apparatus and examines how language and algorithmic logic mediate, and at times colonise, our understanding of reality.
The visual material derives from my personal archive of Super 8 footage: fragments of daily life and travel across Europe and the UK, collected over recent years. These images operate as slices of memory. Through fragmentation and reassembly, they are stripped of their original context and reduced to malleable data. To reimagine and design the sound that Super 8 film, by its nature, lacks, I prompted GPT-4 to ‘translate these words into English in a poetic way’, fed the translations into the editing software CapCut, and, using its text-to-speech AI, summoned an echo of the authority of canonical nature documentaries. The work asks how such a voice acquires legitimacy, and how easily memory can be reframed, overwritten, or misread. Because of the black-box nature of models like GPT-4, we will never access their logic or training data; the same is true of companies like CapCut. When a real human thought has been transformed twice, each time discreetly, what has it become?
The viewing process is carefully designed to deconstruct the same sequence into three versions: sequence 1 with only audio and subtitles; sequence 2 with visuals and subtitles but no audio; and sequence 3 with audio, subtitles, and visuals without colour grading. The audience is invited to immerse themselves in the screening space, follow the instructions, watch the three sequences, and then, drawing on their perception of the videos, answer eight questions on a form; the total duration is under four minutes. What interests me is comparing the film apparatus to the ‘artificial intelligence apparatus’, seeing how machine-generated materials can function as the most explicit instrument of manipulation.
At first, the subtitles adopt the familiar conventions of institutional cinema, subtly persuading the viewer to trust the text over the image: suggesting rainfall, for instance, while the screen shows sunlight. Gradually, however, the typography destabilises. Influenced by the psychic friction between interior depression and an overstimulated social environment, the subtitles fracture into shifting scales and erratic compositions. They cease to translate and instead construct an architecture of unease: a visualisation of mismatched frequencies. By rearranging these elements, at times removing sound, at others isolating text, the installation foregrounds what might be termed the violence of translation. In an age governed by opaque systems, the one who controls the subtitles possesses the power to reshape memory itself.