
Something fun: Cats & A.I.

Updated: Jul 9, 2023

Recently, I utilised artificial intelligence (A.I.) to generate images of cats modelled after my tabby, Ducky.

(Cats are wonderful creatures. In Vietnam, the Year of the Cat is celebrated in 2023. In 2012, Google Brain built a neural network that learnt to recognise pictures of cats with the help of 16,000 computer processors and a deep learning algorithm.)


The recent explosion of ChatGPT has directed mainstream attention to the field, though practical uses of A.I. had already been adopted by various industries years earlier (e.g. facial recognition, medical imaging, and video editing).

[GIF: VQGAN+CLIP generating images]

I enlisted the help of VQGAN and CLIP (gif above) to generate the images. Phil Torrone describes VQGAN+CLIP as “a bunch of Python that can take words and make pictures based on trained data sets.” In technical terms, it is an interaction between two neural network architectures working in conjunction: VQGAN generates candidate images, while CLIP scores how closely each image matches the text prompt and steers the generation accordingly.
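For the curious, here is a minimal sketch of that interaction in Python. It is purely illustrative, not the actual Colab notebook code: the pretrained `clip_model` and `vqgan` objects, the latent shape, and the `to_clip_input` helper (which would resize and normalise images for CLIP) are all assumptions of mine.

```python
# A minimal sketch of the VQGAN+CLIP feedback loop (not the real notebook code).
# Assumes a pretrained CLIP model (openai/CLIP package) and a pretrained VQGAN
# (taming-transformers) are already loaded; `to_clip_input` is a hypothetical
# helper that resizes/normalises images for CLIP.
import torch
import clip  # openai/CLIP


def generate(prompt, clip_model, vqgan, to_clip_input, steps=300, lr=0.1):
    device = next(vqgan.parameters()).device

    # Encode the text prompt once; the image latent will be pushed towards it.
    with torch.no_grad():
        text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))

    # Start from a random latent in VQGAN's code space and make it trainable.
    z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
    optimiser = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        image = vqgan.decode(z)                      # VQGAN: latent -> pixels
        image_features = clip_model.encode_image(to_clip_input(image))
        # CLIP judges how well the picture matches the words; minimising the
        # negative cosine similarity nudges the image towards the prompt.
        loss = -torch.cosine_similarity(image_features, text_features).mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

    return vqgan.decode(z).detach()
```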


As the machine generated these images, I wondered: how does a machine ‘see’? What is the end game of A.I., and who are the winners and losers of the A.I. ‘revolution’?


How do machines ‘see’?


The discipline within A.I. that allows machines to ‘see’ is known as computer vision. ‘Seeing’ involves an element of perception, rather than solely ‘looking’.


The very first experiments in allowing machines to see began in the 1950s, and current developments in A.I. build on a corpus of work many years in the making. In essence, computer vision works with neural networks and other machine learning algorithms. Beyond computer vision, the dominant technique in contemporary A.I. stems from neural networks: self-learning algorithms that recognise and exploit patterns in data.
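To make “recognising patterns” concrete, here is a toy sketch (in PyTorch, purely illustrative) of the kind of small convolutional network that learns to label pictures by adjusting its weights to fit examples. The architecture, the 32×32 input size, and the random stand-in data are arbitrary choices of mine.

```python
# A toy illustration of "seeing" as pattern recognition: a tiny convolutional
# network that learns to label small images (e.g. cat vs. not-cat).
import torch
import torch.nn as nn


class TinyVision(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(   # convolutions pick up local visual patterns
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                # x: (batch, 3, 32, 32) images
        return self.classifier(self.features(x).flatten(1))


# Training is just nudging the weights so predictions match the labels.
model = TinyVision()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)       # stand-in for real photos
labels = torch.randint(0, 2, (8,))        # stand-in for "cat" / "not cat"
loss = loss_fn(model(images), labels)
loss.backward()
optimiser.step()
```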


For this particular set of images, the generator works off Generative Adversarial Networks (GANs), a class of machine learning frameworks. Katherine Crowson and Ryan Murdock are the early masterminds behind combining VQGAN with CLIP, a technique they made public on Google Colab. Many other articles detail how to work through VQGAN+CLIP, and wonderful alternative tools (Stable Diffusion, Midjourney) have sprung up in the A.I. world. The papers and research on GANs (such as this one) are fantastic resources if you are interested in this field.
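As a rough illustration of the adversarial idea behind GANs, a bare-bones training step might look like the sketch below: a generator learns to produce convincing samples while a discriminator learns to tell them apart from real data, and each one’s mistakes become the other’s training signal. The network sizes and the one-dimensional “data” are toy choices of mine, not any published model.

```python
# A bare-bones sketch of one adversarial training step in a GAN.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.randn(32, data_dim)    # stand-in for real training data

# Discriminator step: score real data as 1 and generated fakes as 0.
fake_batch = generator(torch.randn(32, latent_dim)).detach()
d_loss = (bce(discriminator(real_batch), torch.ones(32, 1))
          + bce(discriminator(fake_batch), torch.zeros(32, 1)))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# Generator step: produce fakes that the discriminator scores as real.
fake_batch = generator(torch.randn(32, latent_dim))
g_loss = bce(discriminator(fake_batch), torch.ones(32, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```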


What is the A.I. end game?

Unfortunately, not cats.


It seems the aim is artificial general intelligence (A.G.I.), where the machine possesses all of the human mind’s capabilities. After over 70 years of experimentation (Alan Turing first posed the question "Can machines think?" in 1950), scientists are still working towards this vision.


What we have today is “mediocre A.I.”, according to Gary Marcus, emeritus professor of psychology and neural science at N.Y.U. He argues that today’s neural networks work on glorified cut and paste – the “king of pastiche”. They gather data, carry out pattern recognition, and employ a paradigm where infinite data is required to solve things by brute force. But they fundamentally do not understand. In a podcast with Ezra Klein, Marcus argues that it is “mysticism to think that if we just make the systems that we have now bigger with more data, that we’re actually going to get to general intelligence.” His paper ‘Deep Learning is Hitting a Wall’ details what he believes could be done to achieve A.G.I. He has his fair share of critics, but that does not void his viewpoints.


Meta’s chief A.I. scientist, Yann LeCun, also shares the view that current approaches will not lead to true artificial intelligence (Marcus and LeCun don't really see eye to eye, but they converge on this point), and recently commented that ChatGPT is not “particularly innovative”. While critics read his comments as merely a dig at ChatGPT, they missed the larger point he was making: that current approaches to A.I. will never lead to true intelligence.


The biggest hurdle that researchers are currently tackling is symbolic reasoning. Put very simply, LeCun and Jacob Browning explain that symbolic reasoning is the capacity for machines to "manipulate symbols in the ways familiar from algebra or logic", and there are two camps on this. In one camp, researchers argue that neural networks struggle with symbol manipulation because it needs to be hard-coded into the machine (i.e. it cannot be learnt); in the other, researchers take the view that neural networks are already engaging in symbolic reasoning (i.e. machines can learn how to do so), albeit not reliably. GPT-3 and LaMDA are some examples of machines learning to manipulate symbols in the world, although these are still very much a work in progress.
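A toy example may help show what “manipulating symbols” means in practice. The hand-written rewrite rule below (a hypothetical `distribute` function of my own) applies algebra’s distributive law exactly, to any symbols it has never seen, with no training data at all. Whether such rules must be built into machines, or can be learnt reliably from examples, is precisely what the two camps disagree on.

```python
# A hard-coded symbolic rule: rewrite a * (b + c) as a*b + a*c.
# Expressions are nested tuples, e.g. ('*', 'x', ('+', 'y', 'z')).

def distribute(expr):
    """Apply the distributive law once, if it matches."""
    if (isinstance(expr, tuple) and expr[0] == '*'
            and isinstance(expr[2], tuple) and expr[2][0] == '+'):
        a, (_, b, c) = expr[1], expr[2]
        return ('+', ('*', a, b), ('*', a, c))
    return expr

# The rule applies exactly, to any symbols, seen before or not:
print(distribute(('*', 'x', ('+', 'y', 'z'))))        # ('+', ('*', 'x', 'y'), ('*', 'x', 'z'))
print(distribute(('*', 'cat', ('+', 'dog', 'duck'))))  # works just as well on new symbols
```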


The future of A.I.


As researchers work out the relationship between neural networks and symbolic reasoning, how does A.I. development then trickle down to the rest of us?


In 2021, Sam Altman, CEO of OpenAI (the company behind ChatGPT), wrote a very optimistic essay about the future of A.I. He posits that, first, the A.I. revolution will create phenomenal wealth, and the price of many types of labour will fall towards zero. Second, the world will change drastically, and policy changes will be needed to redistribute this wealth. Third, the standard of living can be improved in very significant ways.


Altman has since nuanced his stance significantly, and now acknowledges that there is still some way to go. In Dec 2022, he tweeted “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. it's a mistake to be relying on it for anything important right now. it’s a preview of progress; we have lots of work to do on robustness and truthfulness.”


Currently, A.I. researchers argue that the best use cases for deep learning are matters that can be easily quantified and where data is abundant, or where the stakes are low and rough or inaccurate results can be accepted. Researchers have relied on A.I. for comprehensive genome interpretation, where machine learning can help identify meaningful patterns for healthcare. In filmmaking and video editing, A.I. can enhance the efficiency of visual effects and support casting decisions. These are industries that will benefit enormously from current forms of deep learning. Perhaps another cause for cheer is how China and the United States – despite all their geopolitical tensions – are currently the world’s leading collaborators in A.I. research, according to Stanford University’s Institute for Human-Centered Artificial Intelligence.


But one need not dig very deep to unearth the pernicious aspects of A.I. During the 2016 Trump campaign, Cambridge Analytica and its parent company SCL Group used A.I. to make targeted advertisements, illustrating how A.I. can be an integral element of digital political campaigning and opinion shaping. Large tech firms also rely on A.I. to drive their newsfeeds, which can perpetuate the polarisation of society by predicting what the viewer is most likely to engage with or be interested in.


The (New) Age of Mechanical Reproduction

There is undeniably a lot of hype around A.I. While it is probably not a panacea for society’s ills, many of the aforementioned issues can be fixed by human intervention, and in its current form its opportunity for mass participation is promising.


In ‘The Work of Art in the Age of Mechanical Reproduction’, Walter Benjamin focused on film as a medium that could emancipate art from traditional notions rooted in bourgeois tendencies and fascist ideologies. He called the latter the “aestheticisation of politics”, where the individual is erased or portrayed as self-contained and able (as opposed to their true social condition). Technological reproduction, by contrast, frees the work of art from its “parasitical dependence on ritual”. Benjamin calls this the “politicisation of aesthetics”, where technology enables individuals to identify and resist the way art is exploited, and where art originates from the people.


If I were to extrapolate the “politicisation of aesthetics” to neural networks: their strength lies in their ability to scale collected data to sharpen their pattern recognition, which allows for mass participation in unprecedented fashion. This could tip the balance in terms of collective bias. While deep learning does not yet support fields where genuine comprehension is required (e.g. autonomous vehicles, medical advice), for fields that thrive off the collection of data there is reason for celebration.


A New World

I do think the public hype around ChatGPT could instead be directed towards understanding the fundamental frameworks and how to make A.I. better, but that's just me! Deep learning and A.I. are tools in our hands that extend beyond generating trippy images of cats and asking a chatbot to complete school work.


Meanwhile, we must question how such a tool can be wielded responsibly and equitably for those with less access and means. For a tool that allows for and thrives off mass participation, this is yet another area where the digital divide needs to be bridged.
