The Serious Computer Vision Blog
(By Li Yang Ku) Recently I’ve been on leave to take care of my little one, born earlier this year, which has left me with time to listen to quite a few audiobooks. This post is about some of the thoughts I gathered after all the listening while trying to grasp the meaning of my […]
(By Li Yang Ku) In my last post I talked about how generative diffusion models (such as DALLE, Imagen, and Stable Diffusion) work. I also mentioned that I would talk about specific models and tools like Stable Diffusion and ControlNet. I admit that this second post took a bit longer than I expected, mostly due to my laziness […]
(By Li Yang Ku) These are interesting times to be in the field of Computer Vision. In the past I judged the quality of a Computer Vision publication based on its accuracy on benchmarks and the number of citations. Now I also consider how popular it is on Reddit and YouTube. With all the Computer Vision […]
(By Li Yang Ku) I worked at Vicarious, a robotics AI startup, from mid-2018 until it was acquired by Alphabet in 2022. Vicarious was founded before the deep learning boom and had been approaching AI through a more neuroscience-based graphical model path. Nowadays it is definitely rare for AI startups […]
(By Li Yang Ku) In the past I’ve always avoided commenting on consciousness. My view was that because consciousness is internal to ourselves, it is extremely difficult, if not impossible, to evaluate scientifically. Also, why talk about consciousness when we can’t even understand intelligence? However, some recent readings have changed my view […]
(By Li Yang Ku) August 2023 update: an install file for Mac is now available on my personal site. Visual Loop Machine is my new side project, following the Rap Machine I made that completes rap sentences. It is a tool that plays visual loops generated by StyleGAN2 along with music in real time. One of the reasons I […]
(By Li Yang Ku) For many researchers in the field of Computer Vision, coming up with “the” object representation is a lifetime goal. An object representation is the result of mapping an image to a feature space such that an agent can recognize or interact with the objects in it. The field has come a long way from […]
(By Li Yang Ku) I was at the Bay Area Robotics Symposium (BARS) at Stanford in person last week. It’s nice to see real people even though there is a mask mandate (which could be a good thing, since the audience won’t be biased by the speaker’s looks). Faculty talks can be found in the […]
(By Li Yang Ku) In my previous post I talked about a web app I made that can generate rap lyrics using the transformer network. The transformer is currently the most popular approach for natural-language-related tasks (I am counting OpenAI’s GPT-3 as a transformer extension). In this post I am going to talk about […]
(By Li Yang Ku) In this post I’ll briefly go through the problem of Task and Motion Planning (TAMP) and talk about some recent works that try to tackle it. One of the main motivations for solving the TAMP problem is to allow robots to solve household tasks like the robot Rosey in the cartoon […]
(By Li Yang Ku) Just like CVPR, RSS (Robotics: Science and Systems) is virtual this year and all the videos are free of charge. You can find all the papers here and the corresponding videos on the RSS YouTube page once you have finished bingeing Netflix, Hulu, Amazon Prime, and Disney+. In this post, I am going […]
(By Li Yang Ku) CVPR is virtual this year for obvious reasons, and if you did not pay the $325 registration fee to attend this ‘prerecorded’ live event, you can now have a similar experience by watching all the recorded videos on their YouTube channel for free. Of course it’s not exactly the same since […]
by Li Yang Ku (Gooly) I was reading a neuroscience textbook recently and came across a paragraph about the discovery of the Fusiform Face Area (FFA) in the human brain. It was not news to me that there is a face area in the brain that activates when a face is observed, but I was […]
This is a guest post by Lila Mullany and Stephanie Casola from alwaysAI (in exchange, they will post one of my articles on their company blog). What this startup is developing might be useful to some of my readers who just want to implement deep learning vision apps without having to go through a steep […]
by Li Yang Ku (Gooly) The one thing that all computer vision scientists can agree on is probably that, as of today, human vision is a lot better than computer vision algorithms (in the range of visible light) at understanding the world around us. However, most computer vision scientists don’t usually look into our vision system […]
by Li Yang Ku (Gooly) (link to the rap machine if you prefer to try it out first) In my previous post, I gave a short tutorial on how to use the Google AI platform for small garage projects. In this post, I am going to follow up and talk about how I built (or […]
by Li Yang Ku (Gooly) In this post I am going to talk about the Google AI platform (previously called Google ML engine) and how to use it if deep learning is just your after-work hobby. I will provide links to other tutorials and details at the end so that you can try it […]
by Li Yang Ku (Gooly) In this post I am going to talk about a fascinating talk by Nati Srebro at ICML this June. Srebro has given similar talks at many places, but I think he really nailed it this time. This talk is interesting not only because he provided a different view of the […]
by Li Yang Ku (Gooly) Deep learning is one of the most successful scientific stories in modern history, attracting billions in investment money in half a decade. However, there is always the other side of the story, where people discover the less magical part of deep learning. This post is about a few research (quite […]