Z-Image Made n8n Automation on My Own PC Easier 🚀

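If every image generation API call costs money, it's hard to use freely. A flat-rate plan does cover a decent volume, but between a Midjourney subscription and DALL-E API fees, a month easily runs past tens of thousands of won. Add network latency on top and it often gets frustrating. But what if my own PC could generate images automatically, just like an API?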

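I pulled this off with an RTX 5070 Ti.
The combo is Z-Image Turbo + n8n + ComfyUI.
Now I just type a prompt and a high-quality image pops out in about 10 seconds.
Monthly subscription? Zero. External API calls? Not needed. My PC simply became an image generation server.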

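Why Z-Image, and why now? 🤔

Z-Image is a Diffusion Transformer-based image generation model made by Tongyi-MAI.
The one to pay attention to here is the Turbo version.
The regular version needs around 50 steps, while Turbo gets similar quality out of just 4-8 steps.
It reportedly gets a speedup of 3x or more through distillation.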

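3 key advantages of Z-Image Turbo

Speed: generates a 1024x1024 image within 10 seconds (on an RTX 5070 Ti)
VRAM efficiency: runs fine on 12GB of VRAM (16GB leaves headroom)
Quality: in my experience roughly on par with FLUX-dev, and especially strong at depicting people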

⚡ Performance on an RTX 5070 Ti

1024x1024 image / 8 steps / generation time 8 s / VRAM usage 8-10GB
2048x2048 image / 8 steps / generation time 30 s / VRAM usage 8-10GB

How does it compare with other models?
SDXL Turbo is fast but short on detail.
FLUX has great quality but needs many steps, so it's slow.
Think of Z-Image Turbo as taking the best of both: fast, with decent quality.

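The Florence-2 + Z-Image combo I tried myself 🎨

At first I kept it simple: "I'll just use Z-Image on its own, right?" But once I actually used it, writing prompts turned out trickier than expected. Z-Image is optimized for English prompts, so I needed a prompt generator.

Why do I need Florence-2?

Florence-2 is a Vision-Language model made by Microsoft. It's good at image captioning, object detection and so on, but what I focused on here is prompt expansion. In my setup, I enter a short description and get back a detailed English prompt.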

💡 An actual test case

My input: "석양이 지는 해변" (a beach at sunset)

Florence-2 output: "A serene beach at sunset with golden light reflecting on calm waves, palm trees silhouetted against the orange sky, gentle breeze creating ripples on the water"

Result: Z-Image generated a high-quality image in 8 seconds ✨

When Florence-2 enriches the prompt like this, Z-Image's output improves noticeably. The image comes out far more specific and beautiful than with just "beach". That's why I recommend this combo. And here's a not-so-secret secret: writing the prompt in even finer detail produces even better images (best discovered by trying it yourself).

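n8n automation: all you need is a GPU ⚙️

This is where the real core begins. With n8n + ComfyUI + Z-Image, full automation is possible. There's no need to build a separate API server; install ComfyUI on your PC, connect n8n, and that's it.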

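Overall workflow structure

Receive the request with an n8n Webhook (prompt + image size)
Run Z-Image through the ComfyUI node
Automatically save the generated image or return it in the response

n8n workflow node layout

1. Webhook (receives the POST request)

2. Set node (cleans up the prompt)

3. ComfyUI node (runs Z-Image)

4. Return the result (image URL or Base64)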
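
From the caller's side, the four nodes above boil down to a single POST request. Here's a minimal sketch in Python, assuming n8n runs on its default port 5678 and the Webhook node's path is `z-image` (a made-up name; use whatever path your Webhook node shows). The function names are mine, not part of n8n.

```python
import json
import urllib.request

# Assumption: n8n runs locally on its default port (5678) and the Webhook
# node's path is "z-image". Both are placeholders for illustration.
WEBHOOK_URL = "http://localhost:5678/webhook/z-image"


def build_webhook_request(prompt: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build the POST request carrying the prompt and image size as JSON."""
    body = json.dumps({"prompt": prompt, "size": size}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_image(prompt: str, size: str = "1024x1024") -> bytes:
    """Send the request and return the raw response body from n8n."""
    req = build_webhook_request(prompt, size)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.read()
```

Calling `request_image("a beach at sunset")` returns whatever the last node sends back, so how you decode it (image URL or Base64) depends on how step 4 is set up.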

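My actual settings

Here are the settings I use on my 5070 Ti. Follow them as-is and it should run right away.

🔧 Recommended settings (for an RTX 5070 Ti)

Image size: 1024x1024 (default), 1536x1024 (wide)
Inference steps: 8 (adjustable between 4 and 12; 8 recommended)
CFG Scale: 1.0 (Turbo models want a low value, and 1.0 is the default)
Sampler: Euler a (fast and stable)
VRAM usage: 8-10GB
Average generation time: 8-15 seconds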
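
If you'd rather apply these values to an exported workflow in code than click through the UI, something like this works. It's a sketch that assumes the API-format export is a dict of node id to `{"class_type": ..., "inputs": {...}}` and that sampling happens in a `KSampler` node; your Z-Image workflow may use a different sampler node, so check the JSON first. `euler_ancestral` is the name ComfyUI uses for Euler a.

```python
import copy

# Values taken from the recommended settings above.
# "euler_ancestral" is ComfyUI's internal name for the "Euler a" sampler.
SETTINGS = {"steps": 8, "cfg": 1.0, "sampler_name": "euler_ancestral"}


def apply_sampler_settings(workflow: dict, settings: dict = SETTINGS) -> dict:
    """Return a copy of the workflow with every KSampler node updated.

    Assumption: sampling is done by nodes whose class_type is "KSampler".
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"].update(settings)
    return patched
```

The original dict is left untouched, so you can keep one clean export and derive variants (say, a 12-step version) from it.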

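What impressed me is the power draw. The 5070 Ti is rated at about 300W, and it only pulls that much while an image is actually being generated. So even with the workflow available all day, the electricity bill stays small, somewhere from a few hundred won to around a thousand won a day, which I find far more economical than a cloud GPU running an API server.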

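Using the ComfyUI nodes is easier than you'd think 🎯

"Isn't ComfyUI complicated?" I get this question a lot, and it really isn't. Z-Image in particular is easy to install and simple to use.

Installing Z-Image in ComfyUI

Download the JSON Workflow File from the link on the ComfyUI homepage and load it in the ComfyUI window. You'll then see a note box called 'Model link' containing links for downloading the models. Download them from there and save them to the folders named in the note.
That's it. Really, that's all. Coding? Not needed. Terminal commands? Not needed. A few clicks and the install is done.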

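Connecting ComfyUI from n8n

There is a dedicated ComfyUI node for n8n. It's much more convenient than making the API calls by hand, so install it and use it.

ComfyUI node settings

Take the Z-Image workflow that's running in your own ComfyUI, download it with Export (API), paste its contents into the n8n ComfyUI node, and edit the parts listed below. If you can read JSON, just edit it as you go. If you can't, hand the whole JSON file to ChatGPT, Claude, or Gemini, tell it you want n8n to supply the Prompt and Size, iterate in conversation until it's right, then copy and paste the resulting JSON.

• Host: http://localhost:8188 (default)

• Prompt: {{$json.prompt}} (the prompt received from the webhook)

• Size: {{$json.size}} (e.g. 1024x1024)

With this in place, entering just a prompt and an image size in n8n makes ComfyUI run Z-Image automatically. The result comes back to n8n, where you can email it, post it to Slack, store it in a database, whatever you like.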
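
For anyone curious what "filling in Prompt and Size" means at the JSON level, here's a small sketch. It assumes the exported workflow has a text-encode node with a `text` input and a latent-image node with `width` and `height` inputs; the node ids you pass in are hypothetical and must be read off your own export.

```python
import copy


def parse_size(size: str) -> tuple[int, int]:
    """Turn a string like "1536x1024" into (1536, 1024)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def inject_inputs(workflow: dict, prompt: str, size: str,
                  prompt_node_id: str, latent_node_id: str) -> dict:
    """Return a copy of the workflow with the prompt and size filled in.

    Assumption: prompt_node_id points at a node with a "text" input and
    latent_node_id at a node with "width" and "height" inputs.
    """
    width, height = parse_size(size)
    patched = copy.deepcopy(workflow)
    patched[prompt_node_id]["inputs"]["text"] = prompt
    patched[latent_node_id]["inputs"]["width"] = width
    patched[latent_node_id]["inputs"]["height"] = height
    return patched
```

In n8n the same substitution is done by putting the `{{$json.prompt}}` style expressions into the pasted JSON, so this is only meant to show what ends up changing.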

⚠️ Caveats

ComfyUI and n8n need to be on the same network. If they run on different PCs, check your firewall settings. If everything is local, connecting via localhost works.
If you run n8n in Docker, enter host.docker.internal instead of localhost.
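
To check which address actually works before wiring it into n8n, you can queue a workflow by hand from wherever n8n runs. This is a sketch assuming ComfyUI's standard HTTP endpoint `/prompt` (a POST whose JSON body carries the API-format workflow under `prompt`); the helper names are mine. One thing to watch: on Linux, `host.docker.internal` generally only resolves if the container was started with `--add-host=host.docker.internal:host-gateway`.

```python
import json
import urllib.request


def comfyui_base_url(n8n_in_docker: bool, port: int = 8188) -> str:
    """Pick the host name n8n should use to reach ComfyUI on the same machine."""
    host = "host.docker.internal" if n8n_in_docker else "localhost"
    return f"http://{host}:{port}"


def build_prompt_request(workflow: dict, base_url: str) -> urllib.request.Request:
    """Build a POST to ComfyUI's /prompt endpoint, which queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` from inside the container tells you right away whether the host name and firewall are set up correctly.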

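Cost comparison: API vs local GPU 💰

Let's work out how much money this actually saves.

Based on generating 1,000 images a month

Midjourney: $30 (Standard plan)

DALL-E 3 API: $40 (at 1024x1024)

Stable Diffusion API: $20-30 (varies by service)

Local GPU (5070 Ti): about 5,000 won in electricity

That's 30,000-40,000 won saved per month, or 360,000-480,000 won a year. Even if the GPU costs around 1,000,000 won, doesn't that pay for itself in about 3 years? And a GPU isn't only for image generation. It can handle other AI work too, so I think it's worth a lot more.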
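
The break-even estimate is simple division, and it's easy to redo with your own numbers. A quick sketch using the figures above (a GPU price of 1,000,000 won and monthly savings of 30,000 to 40,000 won); the function name is just for illustration.

```python
def breakeven_months(gpu_price_krw: float, monthly_saving_krw: float) -> float:
    """Months until the accumulated savings cover the GPU price."""
    if monthly_saving_krw <= 0:
        raise ValueError("monthly saving must be positive")
    return gpu_price_krw / monthly_saving_krw


# Figures from the comparison above: a 1,000,000 won GPU,
# saving 30,000 to 40,000 won per month.
for saving in (30_000, 40_000):
    months = breakeven_months(1_000_000, saving)
    print(f"{saving:,} won/month -> {months:.1f} months ({months / 12:.1f} years)")
```

With these inputs it comes out to about 33.3 months (2.8 years) at the low end and 25.0 months (2.1 years) at the high end, so "about 3 years" is on the cautious side.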

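Practical tips: want even better quality? ✨

1. Prompt optimization

Z-Image performs best with English prompts. It helps to expand prompts with Florence-2, or add detail using ChatGPT or Claude.

2. Step tuning

8 steps is usually the best value for the time, but if you want more detail, try going up to 12. Beyond that the gains are small.

3. Image size strategy

1024x1024: general use (8 s)
1536x1024: wide images (10 s)
2048x2048: high resolution (30 s, needs 12GB+ VRAM)

4. Batch generation

To generate several images in one go, use n8n's Loop node (Loop Over Items). Feed it a list of prompts and it generates them one after another.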

🎯 Pro Tip

Add an upscaler to the Florence-2 + Z-Image combo and you can get genuinely pro-grade results. Generate quickly with Z-Image, then raise the resolution with an upscaler like ESRGAN. All of this runs locally too!

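Real-world use cases 📱

Here's how this automation can be put to use.

1. SNS content creation

You can generate images for Instagram and Twitter on a schedule. Prepare a keyword list and set things up so an image is generated automatically every day. Hook it up to scheduled posting and it's fully automated.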
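
One simple way to turn a keyword list into "one image per day" is to rotate through the list by date. A minimal sketch; the function name and the idea of indexing by day number are my own illustration, not something n8n provides.

```python
from datetime import date


def keyword_for_day(keywords: list[str], day: date) -> str:
    """Pick a keyword by rotating through the list, one step per calendar day."""
    if not keywords:
        raise ValueError("keyword list is empty")
    return keywords[day.toordinal() % len(keywords)]
```

Calling `keyword_for_day(keywords, date.today())` from a daily scheduled run gives a different keyword each day and wraps around when the list runs out.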

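2. Presentation material

Quickly generate the images you need for slides. Enter keywords like "data analysis concept diagram" or "business growth graph" and a suitable image comes out. That said, for infographics Nano Banana Pro has the best quality.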

Prompt: A 3d rendering, photo realistic shoot from a front camera angle about a fluffy gray cat with an angry expression holding a white coffee mug with the text "I'm da boss" written on it, sitting on a wooden surface in the middle of the image. the cat is a 3d render, with a soft, blurred background in shades of pink, green, and brown, giving it a warm and cozy atmosphere. it appears to be a grumpy-looking grey cat with yellow eyes, looking directly at the viewer with a serious expression, sitting with its paws clasped together in front of the mug. the mug is white and has the text written in black, with the cat's fur slightly fuzzy and textured. the background is blurred, but it seems to be an indoor setting with soft, warm colors, creating a cozy and inviting atmosphere. the image is high quality and has a watermark in the bottom right corner, making it suitable for use as a desktop wallpaper.

Prompt: A digital illustration shoot from a portrait angle about a young woman with short, wavy blonde hair and striking blue eyes, dressed in ornate blue and gold armor adorned with floral patterns, holding a katana over her shoulder. the image also shows intricate details and vibrant colors. on the middle of the image, a woman appears to be in her late twenties, with pale skin and a serious expression, standing in the middle, facing the viewer with her eyes looking directly at the viewer. she has short, curly blonde hair styled in a way that frames her face. her eyes are a striking blue color, and she has a delicate, feminine look. she is wearing a blue kimono with intricate floral patterns and a fur trim, which is adorned with gold accents. the woman is holding a large, ornate katana in her right hand, which she is gripping tightly with both hands. the background is a solid pink color, which contrasts nicely with the vibrant colors of her armor and sword, creating a striking visual effect. the overall style of the illustration is highly detailed and realistic, with a focus on the woman's delicate features and the intricate details of her outfit.

Prompt: A digital illustration shoot from a profile camera angle about a close-up portrait of a black cat with sparkles in the background. the image also shows a black background with a dark backdrop and glitter particles scattered throughout, creating a magical atmosphere. on the middle of the image, a black and white cat appears to be looking to the side, with its profile facing towards the right side of the frame. the cat's fur is detailed and fluffy, and its whiskers are visible. its eyes are large and expressive, with a hint of a smile on its face. its ears are perked up, giving it an alert and curious expression. its nose is slightly upturned and its mouth is slightly open, as if it is sniffing something. its whisker is long and thin, and it has a fluffy texture. its fur is illuminated by a soft, ethereal light, with gold glitter particles adding a touch of sparkle to the otherwise monochromatic image.

Prompt: A whimsical digital illustration of a cute black cat wearing a festive red knitted hat and scarf, set against a dark grey background with snowflakes falling gently. the cat is positioned in the middle of the image, looking up at the viewer with its golden eyes, giving off a curious expression. its fur is fluffy and black, and its ears are perked up, giving it an alert and inquisitive look. the hat is knitted with a white pom-pom on the top, and the scarf is red with white snowflake patterns, adding a festive touch to the image. the background is a soft, muted grey with subtle white snow flakes scattered throughout, creating a wintery atmosphere. the overall style is whimsical and festive, perfect for the holiday season.

Prompt: A photo-realistic shoot from the side about a contemplative Asian woman sitting at a table in a cafe, looking out a large window. the woman, who appears to be in her mid-twenties, has long black hair and is wearing a beige shirt. she is sitting in the middle of the image, with her upper body facing the viewer and her eyes looking off into the distance. her expression is serene and contemplative, with a slight smile on her lips. the cafe has a warm, inviting atmosphere with natural light streaming through the large window, creating a soft, blurred background of people walking on the street outside. a small potted plant sits on the table next to her, adding a touch of greenery to the scene.

Prompt: A whimsical, animated scene featuring three characters from the studio ghibli movie "my neighbor totoro" standing in a grassy field under a large full moon. the scene is set against a dark, starry night sky with a full moon in the background, creating a magical atmosphere. the main focus of the image is a large, glowing gray totoro with large eyes and a smiling expression, standing in the center of the scene. On the left side of the totoro, a young girl with short brown hair, wearing a yellow dress and a straw hat, stands with her hands clasped together in front of her, while on the right side, a black cat with yellow eyes stands beside her, also holding an umbrella. in the foreground, two smaller totoro figures, one grey and one black, are standing close together, adding to the whimsical and magical feel of the artwork. the grass is lush and green, illuminated by the soft glow of the full moon, with a misty atmosphere surrounding them. the overall mood is peaceful and serene, evoking a sense of adventure and wonder.

Prompt: A digital illustration shoot from above about a cheerful young woman lying on a blue blanket surrounded by tropical plants, with three cats snuggled up close to her chest. the image also shows a young woman with long brown hair, lying on her back in the center of the image, wearing a white t-shirt, smiling widely with her eyes closed and her mouth open in a joyful expression. she appears to be in her late teens or early twenties, with a fair complexion and a relaxed posture. on the left side of her chest, a gray and white cat is sleeping peacefully, while on the right side, an orange tabby cat is curled up beside her, also sleeping peacefully. the background is filled with lush green monstera leaves and a bright blue sky, creating a peaceful and serene atmosphere. the lighting is soft and warm, casting gentle shadows on the woman's face and the cats' fur. the overall style is anime-inspired, with detailed shading and vibrant colors that bring the scene to life.

Prompt: A photo-realistic portrait of a korean beautiful young woman with long, straight black hair, sitting at a table with a light green background. she is in the middle of the image, wearing a light grey ribbed top that accentuates her slim body and has a pleasant smile on her face. her right hand is resting on her chin, gently touching her cheek with one hand, while her left hand is gently touching the surface of the table. she appears to be looking directly at the camera with a slight smile, her brown eyes are looking straight at the viewer, and her black hair cascades down her back in a neat, straight style. her skin is smooth and even, with a subtle hint of blemishes on her teeth. the lighting is soft and natural, highlighting her delicate features and creating a gentle glow on her skin. the overall effect is one of serenity and beauty.

Prompt: A photo-realistic portrait of a young woman with long, wavy dark brown hair, adorned with gold and turquoise jewelry. she is positioned in the middle of the image, looking directly at the viewer with a neutral expression. her eyes are a deep brown color, and she has a slight smile on her lips. her hair is styled in loose waves, framing her face and cascading down her back. her face is adorned with a gold headpiece adorned with green and gold coins and dangling earrings, along with several gold bracelets and necklaces. her hands are clasped together in front of her face, and her fingers are adorned with rings. the background is a soft, muted beige color, providing a subtle contrast to the woman's elegant and sophisticated look. the lighting is soft and warm, highlighting her delicate features and the intricate details of her jewelry.

Prompt: A photo-realistic shoot from a side angle about a young woman sitting at a round wooden table in a cozy room with large windows. the image also shows a vase with white flowers on the table and scattered yellow flowers. on the middle of the image, a 20-year-old woman with light skin and long black hair is sitting on a wooden chair. she is wearing a white, sleeveless dress and has a small tattoo on her left arm. her expression is thoughtful and contemplative, with her hand resting on her chin. her eyes are closed, and she appears to be looking off into the distance. the room has a warm, inviting atmosphere with natural light coming through the large windows, which offer a view of greenery outside. the wooden chair is placed next to the table, adding to the cozy ambiance of the scene.

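Limitations and room for improvement ⚠️

Of course it isn't perfect. Let's be honest about the limitations too.

1. Text rendering

Like other diffusion models, Z-Image is weak at text rendering. If you want lettering inside the image, it's better to add it afterwards in an editor. Large English text does come out well, though.

2. Reproducing specific styles

Accurately imitating a specific artist's style the way FLUX can is difficult. Overall quality is good, but fine-grained style control seems limited.

3. VRAM constraints

With 8GB of VRAM I'd say only up to 1024x1024 is stable. For anything larger you'll probably need 12GB or more. The 5070 Ti has 16GB so it's no problem, but older GPUs may struggle.

💭 Suggested improvements

It would be nice if future updates added things like LoRA support or ControlNet integration. That would open up more uses.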

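Wrapping up 🎉

Image generation automation isn't as hard as you thought, is it? One GPU is enough. No worrying about API costs, it's fast, and the quality is decent. The Z-Image Turbo + Florence-2 + n8n combo in particular is really powerful.

I generate thousands of images a month on my 5070 Ti. Blog thumbnails, SNS content, presentation material: all of it can be automated. Monthly subscription? Zero (though I do still pay subscriptions for vibe coding, writing and so on T_T). For image generation at least, electricity is the only cost, and I take comfort in that.

We're now at the point where image generation can be handled in your own hands, at your own cost, at your own speed. One GPU is enough. 🚀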
