Revolutionizing Urban Design and Architecture: How AI Could Transform the Built Environment
When I discovered Midjourney AI, I tried it out of curiosity. But when I started using it to reimagine our cities, and ChatGPT was introduced, I realised this could change everything.
*This newsletter post is too long for e-mail; I encourage you to read it on Substack instead. My apologies!*
What is Midjourney AI?
Let's start with an explanation of Midjourney AI for those who don't know exactly how it works. It is an artificial intelligence model that creates images from textual input, trained on millions of images. With the power of machine learning, the algorithm is continually improved to create ever better results. It can create, let's say, an image of an alligator wearing a Christmas hat, painted in the style of Rembrandt - or anything else you could imagine.
The Discovery
When I discovered Midjourney AI, I initially tried it out of curiosity. Making images in the Discord app was a bit of a strange user experience at first, with the endless stream of prompts and images appearing. A few months went by without me looking at the software (although I had taken out a subscription for later use), until I decided to give it another try.
‘What would happen,’ I thought, ‘if I asked it to visualise a city in bird's eye view? And what would happen if I asked it to make a city to my specification - one that is human scaled, attractive, beautiful, with varied architecture and nice public space?’
I crafted a few prompts and tried them out, but the results were disappointing. The buildings looked interesting, but ‘off’ in a way: not as realistic as I wanted them to be.
Then something happened out of pure chance: another person in the same Discord channel (the user interface where you enter the prompt) saw my prompt and edited it a bit, adding a few extra parameters: --v 4 and --q 3 - and it changed everything.
--v 4 sets the model version Midjourney uses to version 4, and with --q you set the quality: the amount of time the algorithm spends on the image.
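In Discord, these parameters are simply appended after the prompt text of the /imagine command. As a minimal illustration (the prompt wording here is made up; the parameter values are the ones mentioned above):

```
/imagine prompt: bird's eye view of a human scaled city with attractive squares --v 4 --q 3
```

Higher --q values trade longer generation time (and more of your GPU minutes) for more detail.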
Bird's Eye View
This time around, using the new parameters, I was able to create way more beautiful and realistic bird's eye view images:
I was flabbergasted, to say the least. The amount of detail was staggering, and the urban fabric looked… convincing.
Naturally, my curiosity was piqued and I started generating tons of images - tweaking the architectural style, to see how the algorithm would handle it. I also started a Twitter thread to share the results, which was entertaining - and stimulated me to make even more images.
The prompt I used was the following:
Bird's eye view of a beautiful, human scaled city with beautiful buildings, attractive squares, happy people, hyperrealistic, 4K, liveable, urban design, [".... architecture"] --v 4 --q 3
Try it out for yourself!
And the results? Well, here are some:
The First Lessons
The interesting thing is how consistently it was able to create cities that looked pleasant to live in. The details don't really add up if you zoom in and look at them closely, but the general layout, the style of the buildings, the ambience - they are all right.
And that is a very useful thing to have - it can give an idea of what principles to apply in a new urban area to get to a pleasing result. We shouldn't think in terms of copying, but in learning the language of the attractive city. Think of principles like diversity of the facades, the density, function mix, what happens at eye level, corner solutions and even some general ideas for how to lay out squares and streets.
Because what matters is creating true urban fabric - the type of fabric we know has been successful for generations: interconnected, offering plenty of opportunities for small businesses to be set up, and open to incremental adaptation thanks to small plot sizes.
We still haven't learned the lessons of Jane Jacobs and Jan Gehl - we continue to build ‘towers in the green’, even though so much research already shows that this concept has failed miserably at creating cohesive, pleasant areas that are conducive to human contact and that prevent alienation.
The City at Eye Level
But after the bird's eye view images, I wanted to go one step further. Because although the bird's eye view gives an interesting view from above, the real experience of the city happens at eye level.
First I wanted to compare what difference Midjourney would make between a prompt asking to create a pleasant, but brutalist, Le Corbusier / Mies van der Rohe inspired environment, versus a pleasant, more traditional urban environment.
The results were as follows:
The differences are striking. Many of our cities are starting to resemble the second picture nowadays. For some reason, the algorithm had a hard time combining the words pushing the image towards an attractive, human scaled street with the required type of modernist architecture. Does this mean the two are mutually exclusive? I do believe that modernist buildings often function more like ‘autonomous objects’ than as integrated parts of the urban fabric: they often have bigger plot sizes, more introverted facades and, of course, less detail and a lack of ornament - all of which affect the experience at eye level.
I decided to generate some more examples of streets with more traditional fabric, and it didn't disappoint:
One interesting thing that often popped up in the images was the potted plant: one of the few directly applicable design interventions that rolled straight out of the software. Another thing that drew my attention was the pedestrian-friendly streets. The realism of the generated streets shocked me as well: some of them are hard to distinguish from actual photographs.
Transforming a Square
The next step was to enter an image of a boring, not so beautiful square into Midjourney, and to let it generate an improved version of that square. This was a bit of a guess, as I could imagine it would lead to strange or unrealistic results, but I simply had to try.
I chose a city in the Netherlands that was built completely anew, a ‘new town’: Lelystad. It is known for being less aesthetically pleasing than other cities, to put it mildly.
The results were quite good, but not fully convincing yet:
I had hoped it would create some of the urban fabric generated in the streets and bird's eye views I made before, but it consistently stuck to the horizontality of the first floor of the current, modern buildings and pasted a more traditional building on top. However, it did conform to the general layout of the space - and successfully inserted a fountain.
Applications & Consequences
These results made me think about the applications and further repercussions of this software. Could we use it in our democratic process, to show people what else is possible in their cities, by letting it generate new options? Will it lead to more people questioning what is possible in our cities, to more people realising there are more types of urban fabric that can be built rather than only the apartment block facing a parking lot or a barely used, windy park?
Then, a Dutch group based in Amsterdam pointed me to a platform aiming to do just that, only using another image generation algorithm (DALL-E). The team at https://transformyour.city/ has already explored these possibilities: rapidly generating reimagined streets, making them suitable for pedestrians and cyclists, and introducing attractive greenery.
Knowing that the opportunities of this software are already being picked up in an effort to do something good for our cities was reassuring. But I believe this is only the beginning.
Future use & ChatGPT
What will this software be used for in the future?
Right now, we know what Midjourney can do, as I demonstrated. DALL-E can already be used to fill in ‘gaps’ in a picture and generate something new there. Think of redesigning a building in a street, or changing the street itself. Adobe Photoshop has features like automatic sky replacement, along with other increasingly intelligent image editing tools.
What would happen if Midjourney or DALL-E were integrated into more types of software, like Photoshop, to easily generate a new facade or part of a facade for a street? You could select a place in your image and tell the software to insert an object there, like a pedestrian or a fountain. You would not need stock photos or cutouts anymore, as everything would be generated by AI, and you could tweak it according to your wishes.
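As a rough sketch of what such a workflow might look like today: OpenAI's DALL-E image-edit endpoint already accepts an original image, a mask marking the region to repaint, and a text prompt. The snippet below is only an illustration - the file names and prompt are made up, and the commented-out call follows the classic `openai` v0.x Python library, which requires an API key.

```python
# Hypothetical sketch: inpainting a new object (e.g. a fountain) into a photo
# of a square. The helper just collects the parameters such a call would need;
# file names and the prompt text are invented for illustration.

def build_edit_request(image_path, mask_path, prompt, size="1024x1024"):
    """Collect parameters for an image-edit (inpainting) request.

    The mask is a copy of the image in which the area to be repainted
    (the spot where the fountain should appear) is made transparent.
    """
    return {
        "image": image_path,
        "mask": mask_path,
        "prompt": prompt,
        "n": 1,          # number of variants to generate
        "size": size,    # output resolution
    }

request = build_edit_request(
    "square.png",       # original photo of the square
    "square_mask.png",  # same photo with the target area made transparent
    "a small ornamental fountain surrounded by potted plants",
)

# The actual network call would look roughly like this (API key required):
# import openai
# response = openai.Image.create_edit(
#     image=open(request["image"], "rb"),
#     mask=open(request["mask"], "rb"),
#     prompt=request["prompt"],
#     n=request["n"],
#     size=request["size"],
# )
```

An integration into Photoshop-like software would essentially wire the selection tool to the mask and a text box to the prompt.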
So, integrations in other software will be hugely important, in my opinion.
Also, 3D models could be created automatically, and perhaps even floor plans and technical details. In the long term, the full stack from architectural concept to the working drawings could be generated. This could happen when it is linked to software like ChatGPT.
ChatGPT is software created by OpenAI that lets you chat with an AI that can generate text and code - from convincing, well written letters, abstracts, essays, recipes and workout schedules to even Wordpress plugins. ChatGPT gained one million users in under a week, making it the fastest-growing tech platform ever.
The use of this tool can significantly increase productivity to seemingly divine levels, but its ultimate impact in the long term depends on the quality of the input provided and the quality of the output produced.
Fun fact, and case in point: I wasn't happy with how I originally worded the sentence above, so I fed it into ChatGPT to improve it, and it gave me the line above.
Speculation
This software gives us a glimpse into a future in which human production of imagery is not needed anymore, making it easier for people to express what they want in the form of images instead of in words. This can greatly improve public participation processes.
Also, it creates a new way for people to get inspired and enthusiastic - as with this software, an aspirational view of the future can easily be made.
With more software integration, many professions will be severely disrupted. Quality of work will go up, creating a new problem: people's expectations will rise.
Will it threaten ‘real human output’? I believe that true creativity still lies within the human mind - it is we who create the prompts and put the AI to work, for now at least. But the nature of human output will change. We will need to learn how to work with these tools, and we will need to adapt to a world in which some types of work no longer seem worthwhile. There is a risk that this demotivation will lead to people creating less and less, depending more and more on this type of software - as it is so powerful and generative.
But perhaps the real power of this software comes paired with the power of the human mind to steer it, and to improve it.
Regarding our cities, I see opportunities in the fields of democratic processes, increased design work output, and increased output by government officials who handle permits - and thus, perhaps, an increase in building production. I see processes becoming more fluid with this software, and perhaps the quality of drawings will go up. By inspiring people with these newly imagined, highly detailed and human scaled cities, we might also see more support for traditional urban fabric.
Conclusion
All in all, it is hard to say what will happen in the long term. I gave some examples of shorter term consequences and speculations, but what will eventually happen depends on the pace of innovation, on software integrations and, of course, on us.
Technological progress seems to be accelerating, bringing Ray Kurzweil's vision of the Singularity ever nearer - as well as the dystopian, nightmarish situations described in Nick Bostrom's Superintelligence. Let us hope the long term consequences will be benign - and let us do all we can to make it so.
For the city, I am optimistic. The AIs we create now seem to be on the emotional and artistic side, rather than the cold and calculating side we used to imagine back in the '60s and '70s (think Data from Star Trek).
And cities that respond more to our emotions, to our human nature and our creativity, are a good thing, in my opinion. If anything, we should try to use these new tools in a way that fosters the development of a more aesthetic, liveable city.
Have you seen Vizcom? It's a good example of how design tools and AI are converging, especially in the idea/concept generation phase. https://www.vizcom.ai/
Midjourney AI cannot pass a Turing test, and a Turing test itself is not any kind of definitive test of whether there is actual intelligence in the interlocutor. But a decade ago, a veteran AI researcher and regular keynote speaker at AI conferences told me that applying the qualification “AI” to things like Midjourney is right, because human intelligence is just composed of many small skills - mind modules - that are to be replicated.
But from that it follows that the pocket calculators of our childhoods are also “AI”, which rather waters down any distinction between ordinary computing and artificial intelligence.
On the other hand, the retired AI veteran Roger Schank, when he was on Twitter, constantly said that so-called AI is "BS", "pure BS". Here is one of the articles he forwarded that describes his reasons:
https://www.wired.com/story/how-we-learn-machine-learning-human-teachers/
And all of this issue does touch on the crucial questions in architecture.
Let's ask ourselves what our reactions are to the words "artificial intelligence", and what our reactions are to the words "advanced software."
See the differences, and reflect on them for a moment.
In a similar vein, some well established and influential forces want us to perceive matters of architecture and urbanism in one way, while a smaller and less audible group of people wants us to perceive them in a way that's pretty much the opposite.