Happy Lunar New Year of the Horse: NVIDIA AI That Talks With Us, Google Creates Music with Lyrics for Us
- Jack Lau
- 22 hours ago
- 3 min read
First of all, Happy Lunar New Year of the Horse to all readers.
I have always liked the symbolism of the Horse — energy, motion, and momentum, but also the importance of direction. Technology often runs fast; leadership is about deciding where it should go.
Recently, two AI developments caught my attention. On the surface they look unrelated, but together they tell an interesting story about where AI is heading.
One is NVIDIA’s PersonaPlex, a conversational voice AI that sounds remarkably natural. The other is Google Gemini’s Lyria 3, which can generate original music instantly from a simple prompt.
One changes how AI interacts. The other changes how AI creates.
Together, they hint that AI is shifting from being a tool we operate to something we increasingly work alongside.
When Conversation Starts Feeling Natural: NVIDIA PersonaPlex
Here’s the demo if you haven’t seen it:
When I watched this, what struck me was not the intelligence of the responses, but the feel of the interaction.
The AI reacts quickly. It acknowledges naturally. It sounds present.
Most voice assistants today still feel like machines:
You speak. It pauses. Then it answers.
That pause is enough to remind you that you are interacting with software.
PersonaPlex aims to remove that feeling by allowing the AI to listen and speak simultaneously, closer to how humans converse.
This small technical change creates a big experiential shift:
interruptions feel natural
responses feel immediate
rhythm feels human
When conversation flows naturally, the interface disappears.
We stop navigating software. We simply talk.
For organizations, this matters enormously. Customer engagement, internal workflows, training, healthcare intake, education — all of these depend heavily on interaction quality.
The breakthrough is not just smarter AI.
It is AI that feels present in the moment.
When Creation Becomes Conversational: Gemini + Lyria 3
Now consider this demo:
With Gemini’s Lyria 3, you describe a mood or idea, and within seconds the system generates:
lyrics
melody
vocals
arrangement
even artwork
When I first saw this, I didn’t think “AI can compose music.”
I thought:
Creation is becoming conversational.
For most of history, making music required instruments, software, and technical skills. Now the primary requirement is simply being able to describe the idea.
The AI handles execution.
We have already seen this pattern with writing, design, and coding. Music is just the latest creative domain to follow.
But the deeper change is this:
Creation no longer happens before the conversation. It happens inside the conversation.
A teacher can create a soundtrack while planning a lesson. A marketing team can test campaign ideas instantly. A creator can explore variations in real time.
Creation becomes part of thinking itself.
The Signal Behind Both
PersonaPlex improves how AI interacts. Lyria 3 improves how AI creates.
Put together, they point in the same direction:
AI is becoming less like software and more like a collaborator.
In the past, we gave software commands. Today, we ask AI for help. Tomorrow, we may simply work with it continuously.
For leaders, this is the more important question:
Not “Which AI model should we use?” but “How do we redesign work when interaction is natural and creation is instant?”
That answer will shape organizations far more than any individual technology choice.
And perhaps that is a fitting reflection for the Year of the Horse:
Speed is valuable. Direction is decisive.
Hugging Face Snapshot: Popular Open Models by Downloads (Past 7 Days)
(Hugging Face is the world’s largest open AI model hub, where developers share and download models used in real applications.)
| Model (recently trending) | What it does | Organization |
| --- | --- | --- |
| Qwen3 / Qwen3.5 variants | Multimodal language, reasoning, coding | Alibaba |
| DeepSeek reasoning models | Advanced reasoning & problem-solving LLMs | DeepSeek |
| MiniMax agentic model | Workflow automation & agent-style tasks | MiniMax |
| GLM / ChatGLM family | Dialogue & multilingual language tasks | |
| Meta segmentation foundation model | Image/video understanding | Meta |
| Mistral small & efficient LLM variants | Lightweight general language models | Mistral AI |
| Sentence-Transformer updates | Embeddings & semantic search | Community / UKPLab |
| Stable Diffusion fine-tunes | Image generation variants | Stability ecosystem |
| Whisper derivatives | Speech recognition / transcription | OpenAI ecosystem |
| Lightweight reranker & retrieval models | Search ranking & RAG pipelines | Various community teams |



