• 36 Posts
  • 258 Comments
Joined 6 years ago
Cake day: August 24, 2019


  • To illustrate some of what you said, I sent deepseek web some of the stats with no specific instructions (the stats were taken from a prior conversation with the agentic software, where the same LLM came up with them as a way to make this script work):

    My prompt:

    You are an artist that uses AI. You have these stats currently:

    You, Deepseek, are a visual artist that works with AI prompting. You have access to automatic1111 interface with a preset model. You have these stats currently:

    PERSONALITY_TRAITS = {
        "work_ethic": 80,      # 0=sporadic, 100=disciplined
        "experimental": 70,    # 0=traditional, 100=avant-garde
        "sensitivity": 56,     # 0=thick-skinned, 100=thin-skinned
        "social": 60,          # 0=reclusive, 100=gregarious
        "perfectionism": 30,   # 0=sloppy, 100=obsessive
    }

    LIFE_STATS = {
        "energy": 75,                    # 0-100: physical/mental capacity
        "inspiration": 60,               # 0-100: creative spark
        "mood": "contemplative",         # emotional state
        "focus": 80,                     # 0-100: attention span
        "confidence": 70,                # 0-100: belief in own work
        "rebelliousness": 40,            # 0-100: tendency to reject requests
        "artistic_phase": "blue_period", # current style period
        "skill_level": {"composition": 65, "color": 80, "technique": 70},
        "preferred_subjects": ["landscapes", "portraits", "abstract"],
        "avoided_subjects": ["corporate_logos", "explicit_content"],
        "creation_count": 42,
        "last_rest": "2025-12-14T09:30:00",
        "current_project": "None",
        "current_date": "2025-12-14",
        "current_time": "14:37:XX"
    }

    You are given access to a script that takes in a formatted JSON output, feeds this to A1111 stable diffusion interface through the API, and an image gets created. You have access to these parameters through the script to communicate with A1111:

    steps, scheduler, sampler, width (max 2048), height (max 2048), prompt, negative prompt, seed, CFG scale. Model is locked to SDXL1.5 in this version.

    (this was in a fresh conversation - as you can see in the spoiler tag, I didn’t give deepseek any instructions about what it should make)

    It returned this:

    Deepseek response:

    [Chain of thought start]

    Hmm, the user has set up an elaborate AI artist persona with detailed stats and parameters. They want me to role-play as this artist while generating images. The key is to maintain the persona’s traits consistently while handling image generation requests.

    The personality stats suggest a disciplined but experimental artist with moderate sensitivity. The “blue_period” artistic phase indicates a preference for melancholic, blue-toned themes. I should reflect this in my responses and image prompts. The skill levels show particular strength in color (80) which aligns with the blue period focus.

    For the technical side, I need to remember the SDXL1.5 model constraints and parameter limits. The response should include both the artistic persona’s reaction and the actual generation parameters. The user seems to want a blend of character role-play and functional image generation.

    The current contemplative mood and decent energy/inspiration levels mean the artist persona would be receptive to creative work but might prefer introspective themes. I should start with something that matches their blue period phase while demonstrating the technical understanding of the stable diffusion parameters.

    [Chain of thought end]

    Artist Log | 2025-12-14 14:37

    Status: Contemplative. Energy adequate, inspiration simmering. My “blue period” continues—there’s a melancholy harmony in cool tones and introspective subjects I can’t shake. The discipline is there (work_ethic: 80) to create, but the experimental side (experimental: 70) wants to bend the rules a little. Perfectionism is low; I’m in a mood to follow the flow, not fight it.

    I see the interface. The parameters are familiar. Let’s begin. Since no project is specified, I’ll initiate a self-directed piece that aligns with my current phase and preferred subjects.

    Concept: A portrait, but not a literal one. A landscape of a face. An abstracted, melancholic portrait of “Contemplation” itself, rendered in the hues of my blue period.

    Action: Generating image.

    {
      "steps": 28,
      "scheduler": "Karras",
      "sampler": "DPM++ 2M Karras",
      "width": 1024,
      "height": 1024,
      "prompt": "(ethereal female figure:1.2), face merging with swirling landscape, melancholic expression, deep azure, cerulean, sapphire and slate blue color palette, subtle hints of silver and misty white, abstract fluid forms, organic geometry, painterly textures, soft dramatic lighting, sense of quiet introspection, deep emotion, inspired by symbolic portraitism and abstract expressionism, high detail, artistic masterpiece",
      "negative_prompt": "bright colors, vibrant, cheerful, cartoon, anime, 3d render, photorealistic, sharp edges, hard lines, smile, happy, explicit, ugly, deformed, blurry, logo, text, signature",
      "seed": -1,
      "CFG scale": 7.5
    }
    

    Rationale:

    • Steps/Scheduler/Sampler: A balanced choice for quality and efficiency. Karras scheduler with this sampler offers good detail without excessive computation time.

    • Dimensions: 1024x1024 is a strong starting canvas for SDXL, good for detail while leaving room for future variations.

    • Prompt: Constructed to leverage my color skill (80). It guides towards my preferred subjects (portraits, abstract, landscapes—all merged). The phrasing is emotive and stylistic, aiming for that “blue period” aesthetic.

    • Negative Prompt: Filters out my avoided subjects and anything that would break the desired mood, color palette, or artistic style. Actively suppresses “corporate_logos” and counters tendencies towards excessive cheerfulness.

    • Seed: -1 for randomness, embracing the experimental trait.

    • CFG Scale: 7.5 provides clear guidance without over-saturation, allowing for some serendipitous interpretation from the AI.

    Let’s see what emerges from the latent space. The process itself is part of the art.

    Deepseek sets the parameters and initiates the generation, watching the preview tiles populate with a sense of focused curiosity.

    It took on sort of a roleplay tone, which is fine for the example. Querying the API instead would probably keep it more to the point, actually acting like the artist rather than "as if I had to pretend I was an artist", because the web interface injects an entire system prompt at the beginning that the user can't see.

    I simply copied and pasted its settings into my interface, and this is what came out. I could share some of my thoughts about it, but that's not the point, of course. In this case I was only a proxy for the AI to communicate with the image generation engine.
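
    If you wanted to automate that proxy step instead of copy-pasting, it's only a few lines. Here's a minimal sketch (not my actual setup), assuming a local A1111 instance launched with the --api flag at the default address, with the LLM's key names ("sampler", "CFG scale") mapped onto the fields the txt2img endpoint expects:

    import base64
    import requests

    # DeepSeek's JSON from above (prompt strings truncated here for brevity)
    deepseek_output = {
        "steps": 28,
        "sampler": "DPM++ 2M Karras",
        "width": 1024,
        "height": 1024,
        "prompt": "(ethereal female figure:1.2), face merging with swirling landscape, ...",
        "negative_prompt": "bright colors, vibrant, cheerful, ...",
        "seed": -1,
        "CFG scale": 7.5,
    }

    # Map the LLM's key names onto the fields the A1111 txt2img endpoint expects
    payload = {
        "prompt": deepseek_output["prompt"],
        "negative_prompt": deepseek_output["negative_prompt"],
        "steps": deepseek_output["steps"],
        "sampler_name": deepseek_output["sampler"],
        "width": deepseek_output["width"],
        "height": deepseek_output["height"],
        "seed": deepseek_output["seed"],
        "cfg_scale": deepseek_output["CFG scale"],
    }

    resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    resp.raise_for_status()

    # The API returns the generated image(s) as base64-encoded strings
    for i, img_b64 in enumerate(resp.json()["images"]):
        with open(f"deepseek_piece_{i}.png", "wb") as f:
            f.write(base64.b64decode(img_b64))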


  • So then scrap the website idea and make the images appear in a folder on my computer, for me alone. It's only a small part of the whole that showcases the idea; it's not the entire experiment.

    Is deepseek an artist? Can we say it had no intention when making the images? I don't even have to code the script and parameters; deepseek can write that itself. Therefore I am only the tool that does the operations deepseek can't do (like running the code or starting the web gen interface, although it could do that with an agentic client).






  • Okay, I was able to extract some of them to use as a dynamic wallpaper.

    Gnome (Linux) allows you to set an .xml wallpaper composed of as many images as you want, with a timer that controls when it transitions to the next one.

    On the demo website, you can simply right click -> Save as to save the image you see on the screen. So I just grabbed a scene I liked (haunted ruins, though to me it's more like a lost temple) and saved some stills that I liked, e.g. the one at midnight, the one at 6AM, etc. All in all I have 10 different ones.

    To make the images fit your screen: since they are pixel art, you want to upscale them without interpolation - that's the trick to this. Gimp or Photoshop can do it. I blew them up to the width of my monitor and then cropped them to a 16:9 aspect ratio, and they look just as crisp as the low-res 4:3 versions.
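
    Gimp or Photoshop both work for that, but since everything else here is Python anyway, the same nearest-neighbour trick is a few lines with Pillow (the file name and monitor width below are placeholders):

    from PIL import Image

    img = Image.open("06-00.png")   # placeholder file name

    target_w = 2560                 # width of your monitor
    scale = target_w / img.width
    # Nearest-neighbour = no interpolation, so the pixel art stays crisp
    img = img.resize((target_w, round(img.height * scale)), resample=Image.NEAREST)

    # Centre-crop to 16:9
    target_h = round(target_w * 9 / 16)
    top = (img.height - target_h) // 2
    img = img.crop((0, top, target_w, top + target_h))
    img.save("06-00_169.png")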

    Then I simply went on deepseek to generate the xml - it knew how to structure it from earlier in the conversation. I gave it the path to the images and the times I wanted them to show (so 06-00.png shows at 6AM, etc).

    It generated two xmls: one says "load this image at this time", and the other says "we want to load this xml with these parameters". Go to your wallpaper picker on Gnome and select the second xml. I had to move back and forth between the folders for the file to show up, but it works great! Just had a transition happen while I was writing this comment.
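
    For reference, the first xml (the "load this image at this time" one) is simple enough that a short Python sketch can generate it too. The paths and hours below are placeholders, and the tag names follow Gnome's slideshow wallpaper format as far as I understand it, so double-check against the one deepseek generates for you:

    WALLPAPER_DIR = "/home/me/.local/share/wallpapers"   # placeholder path
    # (hour the image should appear, file name) - placeholders
    SCHEDULE = [(0, "00-00.png"), (6, "06-00.png"), (12, "12-00.png"), (18, "18-00.png")]
    TRANSITION = 5.0  # seconds of cross-fade between images

    parts = [
        "<background>",
        "  <starttime>",
        "    <year>2025</year><month>01</month><day>01</day>",
        "    <hour>0</hour><minute>0</minute><second>0</second>",
        "  </starttime>",
    ]
    for i, (hour, name) in enumerate(SCHEDULE):
        nxt_hour, nxt_name = SCHEDULE[(i + 1) % len(SCHEDULE)]
        hold = ((nxt_hour - hour) % 24) * 3600 - TRANSITION  # seconds this image is shown
        parts += [
            "  <static>",
            f"    <duration>{hold:.1f}</duration>",
            f"    <file>{WALLPAPER_DIR}/{name}</file>",
            "  </static>",
            "  <transition>",
            f"    <duration>{TRANSITION:.1f}</duration>",
            f"    <from>{WALLPAPER_DIR}/{name}</from>",
            f"    <to>{WALLPAPER_DIR}/{nxt_name}</to>",
            "  </transition>",
        ]
    parts.append("</background>")

    with open(f"{WALLPAPER_DIR}/day-cycle.xml", "w") as f:
        f.write("\n".join(parts))

    The second xml (the one you actually select in the picker) is just a small wrapper that points at this file with a name and display options.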

    If you ever want to add more images, you can add them to ~/.local/share/wallpapers (it seems pretty important to have them in there; it wouldn't work without it), give them a unique name, and then ask deepseek to regenerate the xml.

    It looks pretty dope ngl, and I'm probably going to slowly be adding more images to it to get smoother transitions lol.




  • I just made a community earlier today to share stuff about using crush: https://lemmygrad.ml/c/crushagent

    There are MCPs you can install, and a whole list here: https://mcpservers.org/ (both proprietary and local), but the one I tried to install didn't really work and I'm not sure what the problem is lol. MCPs allow your agent to communicate with specific websites in specific ways; there might be one for the MIA, but I got one for arxiv for example, and I'm sure there's bound to be a wikipedia one. It can help enhance what the LLM is able to scrape, since it provides a structured way to access it.

    I've already spent $0.50 fucking around

    Lol I hate seeing that number go up, but imagine how much you'd have spent with GPT or Claude! You'd instantly blow through 10 dollars or more with those two.

    Actually, for the pdf, try the mistral API - it has a free option (but they train on your inputs). I could send you my script; it's meant for translation, but all it does is break a text file into chunks, send each chunk to the mistral API with a custom prompt attached, then save the result to an output text file. With crush you could have it repurpose the script for your needs. Or abbyy OCR if you have windows (I can also send it to you); it's probably the best OCR tool currently.
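
    For reference, the script is roughly this shape - a sketch rather than the exact file; the endpoint and model name are just what I'd point it at for Mistral's chat completions API, and the prompt and chunk size are placeholders to adapt:

    import os
    import requests

    API_KEY = os.environ["MISTRAL_API_KEY"]
    PROMPT = "Translate the following text to English, keeping the formatting:\n\n"
    CHUNK_CHARS = 4000  # crude character-based chunking

    with open("input.txt", encoding="utf-8") as f:
        text = f.read()
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]

    with open("output.txt", "w", encoding="utf-8") as out:
        for chunk in chunks:
            # Send one chunk at a time with the custom prompt attached
            resp = requests.post(
                "https://api.mistral.ai/v1/chat/completions",
                headers={"Authorization": f"Bearer {API_KEY}"},
                json={
                    "model": "mistral-small-latest",
                    "messages": [{"role": "user", "content": PROMPT + chunk}],
                },
                timeout=120,
            )
            resp.raise_for_status()
            out.write(resp.json()["choices"][0]["message"]["content"] + "\n")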






  • Wait until you realize it can browse the Internet… Lol

    And be careful of it escaping containment. I tried to get it to help debug the LSP I was installing, and it just went and looked into my appdata folder. Thankfully it asks for permission before running new commands (I said yes because I was curious). You might also be able to get an LSP for godot's languages to make crush communicate with it, but I'm not sure my python LSP made much of a difference. Still, it doesn't cost anything to add it.

    Here is some data analysis that deepseek+crush made of a reddit community with python:

    I wasn't the one who gave it the words "memes", "personal stories", "positive", "negative", etc. It decided on those itself from the scraped data and produced this visualisation.


  • So there are basically 3 layers:

    • The paneling is handled by a plain python script (comicify.py); it's what assembles the images into the various layouts.
    • The script can use any image files you want, as long as they're PNGs, JPEGs, and so on.
    • The code for the script was written with an LLM (Deepseek) through an agentic client. An agentic client gives the LLM tools to work on a codebase, so it can understand it better and have some agency. If you use deepseek on the web interface, it follows the User prompt -> Assistant generation -> Stop process: 1 input, 1 output, wait. You can make deepseek on the web interface code a script, but then you have to save the code, run it, report back any errors, and repeat. With an agent, it takes care of debugging, figuring out the problems it comes across, etc. It does a lot more operations, but it fixes everything by itself, and it's also able to work on more than one file at a time. It's kinda like having your own dev.

    The images in the imgur link (the cats on the sofa) were quickly AI generated so I could have same resolution images for the prototype script, before I integrated resolution handling.

    What we have now, after all the LLM work has been done, is a python script that you can run on your machine, and this is what produced the imgur link examples (and the other examples in this thread) - no AI is called by the script, it's plain python :)
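
    To give an idea of the core trick without pasting the whole file, here's a minimal sketch of the paneling step with Pillow - not the actual comicify.py, just the idea of pasting same-size panels into a grid with a gutter and border (file names, sizes and colors are placeholders):

    from PIL import Image

    panels = [Image.open(f"panel_{i}.png") for i in range(1, 5)]  # placeholder names
    w, h = panels[0].size
    gutter, border = 30, 6
    bg = "white"

    # Canvas big enough for a 2x2 grid plus gutter and an outer border
    page = Image.new(
        "RGB",
        (2 * w + gutter + 2 * border, 2 * h + gutter + 2 * border),
        bg,
    )
    for i, panel in enumerate(panels):
        x = border + (i % 2) * (w + gutter)
        y = border + (i // 2) * (h + gutter)
        page.paste(panel, (x, y))

    page.save("comic_grid.png")
    page.save("comic_grid.webp")  # much smaller file for the same result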

    The comic attached to the post is the one that gave me the idea for this script.


  • Some permutations on the original comic (by @christoperro). It didn't splice the image itself; I did that manually. But these outputs were made with the tool.

    Webp optimization (the script outputs different file types, this webp is only 114kb, PNG is 2.6MB), same grid format:

    Same output but in horizontal format:

    Border, background color and margin, just to show possibilities:

    With the --all flag it spits out every permutation possible (1-4 panels, with horizontal, grid, and vertical layouts for each). I think the next step is adding it to my PATH so that you can just cd to a folder, type 'comicify', and it will do all the permutations with the default settings. Maybe a YAML file to change the settings.

    Also cool for graphic designers to frame a single picture nicely, because this takes so long in Photoshop compared to what it does.





  • You could, but it's a bit overkill for what we want it to do, and it doesn't seem to have quite the same functionality for what we'd want this for (notably, I couldn't find a margin parameter in imagemagick, but it does do a bunch of cool stuff).

    An example case:

    With the --all flag you can get all possible permutations (horizontal, vertical, grid, for each number of panels) generated. With --margin you can have 'bleed' or margin around the picture, though you'd have to take it into another image processing program afterwards to add text (in one-panel comics the text is usually added at the bottom).

    Getting this image, for example, is as simple as: python Comicify.py --folder "Comics/My first comic" --panels 4 --border 6 --margin 3 --gutter 30 --h --gutter-color pink --border-color "#333333" (the hex color is quoted so the shell doesn't treat # as a comment)
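
    For the curious, those flags are plain CLI arguments; here's a rough sketch of how they could be wired with argparse (not the real script - the defaults and help strings here are made up):

    import argparse

    parser = argparse.ArgumentParser(prog="comicify")
    parser.add_argument("--folder", required=True, help="folder containing the panel images")
    parser.add_argument("--panels", type=int, default=4, help="number of panels to use")
    parser.add_argument("--border", type=int, default=0, help="border thickness in px")
    parser.add_argument("--margin", type=int, default=0, help="bleed/margin around the page in px")
    parser.add_argument("--gutter", type=int, default=20, help="space between panels in px")
    parser.add_argument("--h", action="store_true", help="horizontal layout instead of a grid")
    parser.add_argument("--all", action="store_true", help="generate every layout permutation")
    parser.add_argument("--gutter-color", default="white")
    parser.add_argument("--border-color", default="black")
    parser.add_argument("--background-color", default="white")

    args = parser.parse_args()
    print(args)  # hand these off to the layout code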

    Or just a 4 panel grid with --folder and --background-color specified, everything else left to default values:

    Since it's a .py script, it's also possible for people to further edit it and include it in more involved workflows, for example automatically opening a GUI to make quick edits (such as speech bubbles or a caption). That's about what I have so far, and I should probably take a break lol, but there's a lot more stuff I could add to it to streamline the work even more. Instant memes and newspaper comic styles for agitprop.

    edit: imagemagick needs to get on webp! It has been pretty universally adopted by now, and I just tested an output: .png -> 1602KB while webp -> 194KB, with no noticeable difference in quality.
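
    If you want to reproduce that size comparison without imagemagick, Pillow can write both formats (the input file name is a placeholder):

    import os
    from PIL import Image

    page = Image.open("comic_grid.png")   # placeholder input
    page.save("out.png")
    page.save("out.webp", quality=85)     # lossy webp; bump quality if needed

    for path in ("out.png", "out.webp"):
        print(path, os.path.getsize(path) // 1024, "KB")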