Chatterbox Turbo TTS with MLX-Audio
In my last post, I tried Qwen3-TTS... This time, I test Chatterbox Turbo by Resemble.ai, an open-source, MIT-licensed text-to-speech model with zero-shot voice cloning.
Alibaba’s Qwen3-TTS speech-synthesis (text-to-speech) model was open-sourced (Apache-2.0) on 22 Jan 2026, and within the last couple of weeks it has gained Apple Silicon optimization via MLX-Audio. Here is code to create an audiobook from an ePub.
In March 2025, I posted an opinion piece entitled “My Disillusionment with Generative AI”. Things have gotten worse since then, to the extent that “Slop” and “AI Slop” have been picked as the Word(s) of the Year for 2025. What follows are my thoughts on the trajectory of technology and, more specifically, AI, as reflected in the Word(s) of the Year (WOTY).
In my last post, I described trying out the Kokoro text-to-speech (TTS) model via the Kokoro-FastAPI web UI in a macOS (native) container. Here, I install Kokoro-TTS and Abogen on Windows to take advantage of my Nvidia GPU.
A short 2-in-1 post of two things I’ve been meaning to try out on macOS - first, to try the new macOS Container framework on macOS 15.5... and second, to spin up Kokoro Text-to-Speech (TTS) in a container.
I’m a techno-optimist by default: I believe technology can solve the problems we face today and make our lives better. But I do have concerns with the direction we are taking with Generative AI, and the future we are heading towards. This is an opinion piece...
city96 has published GGUF versions of the Flux1.Dev model and T5 XXL text encoder, along with custom nodes to use them in ComfyUI - thought I’d try them on my M2 Mac mini, hoping for faster inference!
More Flux.1-based models! Go faster with FP8 or NF4! New LoRAs and ControlNets! There is quite a bit of interest in this model, as evidenced by the speed of community-led enhancements.
While I have many posts about SDXL, I do not use Stable Diffusion 3 at all - license concerns aside, it is simply not good, and may never get better. But just a few days ago, a new, freely available, offline model that is better than SDXL was released by the team that presented Latent Diffusion and created Stable Diffusion: Flux.1 by Black Forest Labs!
I saw a post on Reddit, entitled “Llama 3 rocks with taking on a personality!”. A fun experiment! I thought to replicate it, blatantly copying the puzzle presented. Llama 3, released by Meta just before the weekend, is impressive.
I know it’s bad form to start off with a disclaimer, but the truth is, I do not know what I am doing. I am just testing out two new ComfyUI nodes, PerturbedAttentionGuidance and PerpNegGuider.
In my last post, I used ComfyUI-IF_AI_tools to integrate to the brxce/stable-diffusion-prompt-generator model running in Ollama. I wonder if I could use the base Mistral 7B model to help improve my uncreative prompts instead...
In my last post, I described running Mistral, a Large Language Model, locally using Ollama. To accompany that piece, I created a prompt and manually used AI to generate an image. Today, I’ll wire up a ComfyUI workflow to Ollama to do this seamlessly, thanks to ComfyUI-IF_AI_tools.
I keep posting about Stable Diffusion, but I do experiment with Large Language Models too! I do not have much to contribute in this regard, instead, here is the transcript of a game I played with the open source Mistral 7B model via Ollama.
More and more AI-generated images are shared as short video clips. So, here is a quick test of Stable Video Diffusion - which was released back in November last year. I don’t know why I didn’t post this when I posted about AnimateDiff and the Hotshot Motion model around the same time.
Do you want to convert a 2D image into a 3D model auto-magically? On 5 March 2024, Stability AI and Tripo AI released TripoSR: Fast 3D Object Generation from Single Images that does exactly that!
Differential Diffusion is the newest method (framework) of in-painting without an in-painting model. Instead, all that is needed is a mask (map) where the lighter the area, the more re-painting is applied.
Ever wished you could generate Stable Diffusion XL images with transparent backgrounds? Well, your wish has been answered by the smart people behind the Transparent Image Layer Diffusion using Latent Transparency paper. They have made their code and models available, and what do you know, Chenlei Hu has ported it to ComfyUI!
With the advent of techniques like Adversarial Diffusion Distillation and Latent Consistency models, A.I. image synthesis based on Stable Diffusion XL has been getting faster and faster. Here is just a quick comparison of a few models at 4 steps, some of which are fine-tuned and trained for realism.
Not long ago, in an attempt to obtain consistent portraits using IP-Adapters for SDXL, I shared a comparison between IP-Adapter-Plus-Face and IP-Adapter-FaceID. Today I’ll look at InstantID.