• 0 Posts
  • 186 Comments
Joined 1 year ago
Cake day: June 15th, 2023

  • The majority of people using AI will not insist you use it; at most they’re trying to give others a realistic look at what the technology actually is. Just like a photographer won’t preach to a painter that they should pick up a camera. But that does not mean there isn’t benefit to be derived from understanding how the other produces art. And if the painter thinks the camera is doing all the work and the photographer is a fraud, it’s probably good for them to get some exposure and realize it’s not just pressing a button. I explain here how that metaphor works for AI.

    Most people I know that use AI are, shocker, artists from before AI was a thing. They know how to draw with pencils, brushes, and sponges, but also know painting programs like Photoshop, sculpting programs, modeling programs, surface painting programs, shader production, and algorithmic art. They are through and through artists. Them adding AI to their toolbox does not change that.



  • Hi, person here who drew extensively with pencil as a kid but then slowed down. AI has reignited my passion for art, because unlike pencil or even digital drawing, you can iterate much more quickly, which lets me keep making art while working a demanding job and having a life besides it. You are severely underestimating the number of micro-decisions that go into AI art, and more importantly AI-assisted art. If you’ll allow me, let me explain.

    Let’s break it down into levels of effort and creative input. I like to refer back to photography, since it offers a useful comparison:


    1. Empty canvas prompting.

    To me that is essentially the equivalent of taking a selfie or a random shot. On the scale of effort this is none to very little. But a prompt can be unique if you put an extreme amount of effort into specifying the exact details, just like a random shot can in fact be a very informed random shot.

    You can put a rather massive number of tokens into your prompt that all further specify the result. At that point you can reasonably say you imagined at least the general look of the image before it was created. But you can’t say you had any part in actually drawing it, or that you significantly determined how it was drawn. It’s basically impossible to get any kind of copyright protection over this unless you can back up your prompt very well, and even then you would only get protection over your prompt, not what the AI drew. (A minimal sketch of this level is below.)
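
    For the curious, this level really is only a few lines of code. A minimal sketch using the Hugging Face diffusers library (the model name, prompt, and settings are illustrative picks of mine, not recommendations):

    ```python
    # Minimal text-to-image sketch with Hugging Face diffusers.
    # Model name, prompt and settings are illustrative, not recommendations.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # At this level, the prompt text is the only creative input.
    image = pipe(
        prompt="a lighthouse at dusk, oil painting, warm palette",
        negative_prompt="blurry, low quality",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("txt2img.png")
    ```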

    2. Image to image.

    You can feed an image into the AI: you add noise to the image, and let the AI try to remove that noise (after all, this is exactly what it does on an empty canvas, except there the starting point is pure random noise). This means that a large part of your original image determines how the AI will denoise further. As such, you are guiding a large part of how the AI moves forward. No other artist would likely use the same input image as you, so human decision-making plays a bigger part here.

    To me this is the equivalent of choosing a location where you want to take a picture, and then scouting several locations to see which one works best. That’s the micro-decisions sneaking in. You are giving the AI an existing image, either created by yourself or taken from previous iterations.

    At this point, you are essentially evolving an image. You are selecting attributes and design choices in the image that you want to enhance and amplify. These are decisions the artist makes based on their vision of how the final image should look. Every iteration adds more decisions that no other artist would take the same way. (A minimal sketch of this loop is below.)
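
    As a sketch of that loop: img2img in diffusers exposes the amount of added noise directly as `strength` (model and file names are, again, just examples of mine):

    ```python
    # Image-to-image sketch: `strength` controls how much noise is added
    # to the input, i.e. how much of the original image survives.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = load_image("my_sketch.png")  # your own drawing, or a previous iteration
    image = pipe(
        prompt="a lighthouse at dusk, oil painting",
        image=init,
        strength=0.5,   # 0.0 keeps the input as-is, 1.0 is pure noise (empty canvas)
        guidance_scale=7.5,
    ).images[0]
    image.save("iteration_02.png")
    ```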

    3. Collaboration.

    The point where AI starts becoming a tool. You don’t start with point 1, or at least only use point 1 for brainstorming. You imagine the image beforehand, just like you would with pencil. You can develop the image as much as you like before going to the AI. You are making the exact same micro-decisions you make when drawing by hand, since it is essentially the same up to this point. A photographer at this point would work out every fine detail before snapping the picture.

    Except you know you are going to be using an AI, so certain aspects need more or less refinement to be properly enhanced by it. Just like you don’t start the same way depending on whether you’re making a painting, a silhouette, or using any other technique. At some point, you return the image to the AI and mostly perform step #2, perhaps going back to brainstorming with step #1 if you want to add to or remove from your existing design.

    4. AI truly as a tool.

    Now, to actually make something with #3, you do this process in iterations. You constantly go back and forth between Photoshop and the AI; sometimes you spend days in Photoshop, other times you spend days refining a part of the image with AI. There are also additional techniques like ControlNet, LoRAs, different models, and different services that can drastically improve how well you get to what you want (a sketch of stacking these is below). A photographer at this point would take as many shots as they need using their creatively controlled setup, and find the best one among them. Different lenses, different focal lengths, different lighting (if applicable), different actions in the shot.
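
    As a rough sketch of stacking those extras: here a ControlNet conditioned on edges pins the composition to your own linework, and a LoRA adds a style. The repository names are examples and the LoRA file is hypothetical:

    ```python
    # Sketch of stacking extra control: a ControlNet conditioned on edges,
    # plus a LoRA for style. Repo names are examples; the LoRA file is hypothetical.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.load_lora_weights("./loras", weight_name="my_style.safetensors")

    edges = load_image("lineart_from_photoshop.png")  # your composition as an edge map
    image = pipe(
        prompt="a lighthouse at dusk, oil painting",
        image=edges,  # the ControlNet pins the composition to your lines
        num_inference_steps=30,
    ).images[0]
    image.save("controlled.png")
    ```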


    Sadly, most people that talk confidently about how much they hate AI only know point #1 and maybe point #2. But I see points #3 and #4. When I talk to artists that haven’t yet picked up AI, and they are aware (or made aware) of #3 and #4, their perspective on AI suddenly changes too. But the hostility and blind anger make it quite hard to get through to people that not all art made with AI is made equally. We should encourage people that want to use AI to reach points #3 and #4, so that their creative input is significant to the point that the result is actually something produced by their human creativity. This is also where an existing artist switching to AI will most likely start off from.

    Also, in terms of time: #1 might take seconds, #2 might take minutes to hours, but #3 and #4 can take days or weeks. Still not as long as drawing those same pieces by hand, but sometimes I wish it was as easy as people make it out to be.





  • I’m not a vegan, but we are omnivores; we can eat plants. There is nothing unnatural about it, especially compared to our modern ‘normal’ food, which is chock full of extra sugar, extra fat, extra protein, and artificial additives like preservatives and sweeteners. It’s also factual that you get more energy out of directly consuming plant material than out of eating an animal that consumed said plant material. Take the biggest offender: cows. You need about 8 kg of feed for a cow to produce 1 kg of meat; this is known as its feed conversion ratio (source). Other animals (like chicken and fish) are better, but a ratio below 1 is essentially impossible. (A quick back-of-the-envelope sketch is below.)
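
    The back-of-the-envelope version (the 8 kg figure is the one above; the chicken and fish ratios are rough illustrative values I’m filling in, not sourced data):

    ```python
    # Feed conversion ratio (FCR): kg of feed per kg of weight gained.
    # The 8.0 for cattle is the figure from the comment; the chicken and
    # fish values are rough illustrative numbers, not sourced data.
    def feed_needed(meat_kg: float, fcr: float) -> float:
        """Feed (kg) required to produce meat_kg of meat at a given FCR."""
        return meat_kg * fcr

    for animal, fcr in {"cattle": 8.0, "chicken": 1.8, "fish": 1.2}.items():
        print(f"{animal}: {feed_needed(1.0, fcr):.1f} kg feed per kg of meat")
    ```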

    I like the taste of meat as much as the next (average) person, but vegans do have a factual basis for their stance. The non-vegan rebuttal to that is realistically just “I don’t want to give up meat because I like it”, not “the facts aren’t on your side.” Let’s be honest about that.



  • I feel you on avoiding public transit. That’s where my hate comes from as well. And yes, many people that do these things have excuses. Because they need to, to justify doing their business in a place where their habit unavoidably harms and frustrates other people. I hate the fact that we as a society still allow that so readily. At the least we should restrict it further, to the point that a normal person doesn’t have to be bothered by people like that in public. It undermines public services to an extent.

    But after I no longer needed to use public transit, I did start to see things in a slightly different light. And that’s the only thing I wanted to say. People that are conscientious about enjoying any kind of mind-altering substance will choose to do so safely and harmlessly away from the public, or in designated places like clubs meant for that substance. Harm reduction must be central to substance use, and I know now that many people have that mentality. But that mentality goes unseen precisely because those people make sure nobody is bothered by them. It causes the public experience to be defined by the loud minority who do their business in public places.


  • I hate public smokers with a passion. But you must realize that you have effectively zero exposure to the people that contain their smoke by doing it at home, or by using a method that produces no smoke. And there could be a lot more of those.

    The last line is especially golden to me, since I live in the Netherlands: we have plenty of weed being smoked, but the vast, vast majority of public smoke hindrance comes from tobacco smokers. If they decide to smoke in public they have absolutely no shame and will literally do it at places like bus stops and just outside restaurants. Weed smokers rarely do that here. So if I were to believe you, it seems to be correlated with shitty attitudes rather than with the substance.

    But there’s no denying that if everyone dropped alcohol for weed, it would be an improvement. Not because weed is harmless, but because alcohol is pretty terrible health-wise.


  • I never anthropomorphized the technology; unfortunately, due to how language works, it’s easy to misinterpret it as such. I was indeed trying to explain overfitting. You are forgetting that current AI technology (artificial neural networks) is based on biological neural networks. It exhibits a range of quirks that biological neural networks exhibit as well. It is not human, nor anything close. But that does not mean there are no similarities that can rightfully be pointed out.

    Overfitting isn’t just what you describe, though. It also occurs when the prompt guides the AI towards a very specific part of its training data, to the point where the calculations it performs are extremely certain about which words come next. Overfitting here isn’t caused by an abundance of data, but rather by a lack of it. The training data isn’t being produced from within the model, but as a statistical inevitability of the mathematical version of your prompt. Which is why it’s tricking the AI: an AI doesn’t understand copyright, it just performs the calculations. But you do. So using that as an example is like saying “Ha, stupid gun. I pulled the trigger and you shot this man in front of me. Don’t you know murder is illegal, buddy?”

    Nobody should be expecting a machine to use itself ethically. Ethics is a human thing.

    People that use AI have an ethical obligation to avoid overfitting. People that produce AI also have an ethical obligation to reduce overfitting. But a prompt quite literally has infinite combinations (within the token limits) to consider, so overfitting will happen in fringe situations. That’s not because the data is actually present in the model, but because the combination of the prompt with the model pushes the calculation towards a very specific prediction, which can heavily resemble, or even be verbatim, the original text. (Note: I really do dislike companies that try to hide the existence of overfitting from users, and you can rightfully criticize them for claiming it doesn’t exist.) A rough sketch of what “guiding the calculation” means in practice is below.
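
    The sketch: greedy decoding on a small open model, given a famous opening line. With sampling off, the model always picks its single most likely next token, which is the regime where near-verbatim continuations of common passages can appear. Whether the continuation actually comes out verbatim depends on the model and the passage, so treat this as an illustration, not a guarantee:

    ```python
    # Greedy decoding on GPT-2 given a famous opening line. With do_sample=False
    # the model always takes its most likely next token, the regime where
    # near-verbatim continuations of well-known passages can surface.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "It was the best of times, it was the worst of times,"
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
    ```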

    > This isn’t akin to anything human, people can’t repeat pages of text verbatim like this and no toddler can be tricked into repeating a random page from a random book as you say.

    This is incorrect. A toddler can and will verbatim repeat nursery rhymes that it hears. It’s literally one of their defining features, to the dismay of parents and grandparents around the world. I can also whistle pretty much my entire music collection exactly as it was produced, because I’ve listened to each song hundreds if not thousands of times. And I’m quite certain you too have a situation like that. An AI’s mind does not decay or degrade (nor does it change for the better, like humans do), and the data encoded in it is far greater, so it will present more of these situations at its fringes.

    > but it isn’t crafting its own sentences, it’s using everyone else’s.

    How do you think toddlers learn to make their first sentences of their own? It’s why parents spend so much time saying “Papa” or “Mama” to their toddler: exactly because they want them to copy them verbatim. Eventually the corpus of their knowledge grows big enough that they start to experiment and develop their own style of talking. But it’s still heavily based on the information they take in; it’s why we have dialects and languages. Take a look at what happens when children don’t learn from others: https://en.wikipedia.org/wiki/Feral_child So yes, the AI is using its training data; nobody’s arguing it doesn’t. But it’s trivial to see how it crafts its own sentences from that data in the vast majority of situations. It’s also why you can ask it to talk like a pirate, and it will suddenly know how to mix the essence of pirate speech into its responses. Or how it can remember names and mix those into sentences.

    > Therefore it is factually wrong to state that it doesn’t keep the training data in a usable format

    If your argument is that it can produce something that happens to align with its training data given the right prompt, well, that’s not incorrect. But it is heavily misguided, and borders on bad faith, to suggest that this tiny minority of cases where overfitting occurs is indicative of the rest. LLMs are prediction machines, so if you know how to guide one towards what you want it to predict, and that is in the training data, it’s most likely going to predict that. Under normal circumstances, where the prompt you give it is neutral and unique, you will basically never encounter overfitting. You really have to try with most AI models.

    Then again, you might be arguing this based on a specific AI model that is very prone to overfitting, while I am arguing about the technology as a whole.

    > This isn’t originality, creativity or anything that it is marketed as. It is storing, encoding and copying information to reproduce in a slightly different format.

    It is originality, as these AIs can easily produce material never seen before in the vast, vast majority of situations. That is also what we often refer to as creativity, because it has to be able to mix information and still retain legibility. Humans also constantly reuse phrases, ideas, visions, and ideals of other people. It is intellectually dishonest to ignore these similarities in human psychology and then demand AI be perfect all the time, never once saying the same thing as someone else. To convey certain information, there are only finitely many ways to do so within the English language.





  • Your first point is misguided and incorrect. If you’ve ever learned something by ‘cramming’, a.k.a. repeatedly ingesting material until you remember it completely, you know you don’t need the book in front of you anymore to write the material down verbatim in a test. You still discarded your training material, despite knowing its exact contents. If this was all the AI could do, it would indeed be an infringement machine. But you said it yourself: you need to trick the AI to do this. It’s not made to do this, though certain sentences are indeed almost certain to show up with the right conditioning. Which is something anyone using an AI should be aware of and avoid (which in practice often just means: don’t ask the AI to make something infringing).


  • This would be a good point if that were the explicit purpose of the AI, which it isn’t. It can quote certain information verbatim despite not containing that data verbatim, through the process of learning, for the same reason we can.

    I can ask you to quote famous lines from books all day as well. That doesn’t mean that knowing those lines means you infringed on copyright. Now, if you were to put those to paper and sell them, you might get a cease and desist or a lawsuit. Therein lies the difference: your goal would be explicitly to infringe on the specific expression of those words. Any human that explicitly tries to get an AI to produce infringing material… would be infringing. And as for unknowing infringement, well, there are countless court cases where both sides think they did nothing wrong.

    You don’t even need AI for that: if you followed the Infinite Monkey Theorem and just happened to stumble upon a work falling under copyright, you still could not sell it, even though it was produced by a purely random process.

    Another great example is the Mona Lisa. Most people know what it looks like and, with sufficient talent, could mimic it 1:1. Yet there are numerous adaptations of the Mona Lisa that are not infringing (by today’s standards), because they transform the work to the point where it is no longer the original expression, but a re-expression of the same idea. Anything at least that transformative is pretty much completely safe infringement-wise.

    You’re right, though, that OpenAI tries to cover their ass by implementing safeguards. Which is to be expected, because it’s a legal argument in court that once they became aware of such situations they had to take steps to limit harm. They indeed cannot prevent it completely, but it’s the effort that counts. Practically no moderation of that kind is 100% effective; otherwise we’d live in a pretty good world.



  • I think I got the point just fine… You’re wasting a ton of electricity, and potentially your own money, on producing text that is not bad training data. Which is exactly what I said would happen.

    LLMs are trained on billions of lines of text; the last figures we know are for GPT-3, with sources ranging from 570 GB to 45 TB of text. A short reddit comment is quite literally a drop in a swimming pool (see the rough numbers below). Its word-prediction ability isn’t going to change for the worse if you just post a readable comment. It will simply be reinforced.
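
    The “drop in a swimming pool” is easy to put numbers on, assuming a generous 1 kB for a short comment:

    ```python
    # A 1 kB comment against the 570 GB - 45 TB range quoted above.
    comment_bytes = 1_000          # a short reddit comment, generous estimate
    low, high = 570e9, 45e12       # 570 GB and 45 TB, in bytes
    print(f"{comment_bytes / low:.2e} of the smallest corpus")   # ~1.75e-09
    print(f"{comment_bytes / high:.2e} of the largest corpus")   # ~2.22e-11
    ```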

    And sure, you can lie in it, but LLMs are trained on fiction and have to deal with that as well. There are supplementary techniques applied to make the AI less prone to hallucinations that don’t involve the training data, such as RLHF (reinforcement learning from human feedback). But honestly, speaking the truth is a dumb thing to use the AI for anyway. Its primary function has always been to predict words, not truth.

    You would have to do this at such a scale, and so successfully vote-wise, that by the time you are represented significantly enough in the data to poison it, you are either dead, banned, bankrupt, excluded from the data, or Google will have moved on from Reddit.

    If you hate or dislike LLMs and want to stop them, let your voice be known. Talk to people about it. Convincing one person will be worth more than a thousand reddit comments. Poisoning the data directly is a thing, but it’s essentially impossible to pull off alone. It’s more a consequence of bad data gathering, bad storage practice, and bad training; none of those are in your control through a reddit comment.