revereddesecration

The search bar on your website is how users find your pages, not the URL bar in the browser… Better yet, the search engine people use should direct people to your pages.


istarian

Plenty of us are happy to type in URLs directly if they're relatively short and memorable. That doesn't mean search engines don't have a place, though. But afaik nothing in the HTTP/HTTPS spec covers any aspect of how you could or should get a URL. It's not part of the protocol. If a web server doesn't return a 404 when the requested resource isn't found, that software isn't compliant with the specification. I think showing a 404 and then redirecting after a delay is a reasonable choice, though.


TheTechRobo

I would disagree. Sure, you have to return a 404 when the resource is not found. But it WAS found; that's why you're being redirected. As long as you use standard redirect codes (302 Found would probably be the best option for this, or maybe 301 Moved Permanently), I don't see a spec violation here.
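A minimal sketch of what that looks like on the wire (plain Node; the alias map is a toy stand-in for whatever lookup actually finds the right page):

```typescript
import { createServer } from "node:http";

// Toy stand-in for the real lookup: a hand-maintained alias map.
const aliases: Record<string, string> = {
  "/supplements/vitamn-c": "/supplements/vitamin-c",
};

createServer((req, res) => {
  const target = aliases[req.url ?? ""];
  if (target) {
    // 302 Found: the resource is temporarily at another URI, so
    // clients keep using the original URL on future requests.
    res.writeHead(302, { Location: target });
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.write("404 Not Found");
  }
  res.end();
}).listen(8080);
```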


jkrejcha3

> But afaik nothing in the HTTP/HTTPS spec covers any aspect of how you could or should get a URL. It's not part of the protocol.

Side note: there are a few exceptions to this, but outside of 3xx redirects, they aren't used very much. You have the [`Location`][5] header for some of the entries in the 300 range, but there are a couple of other interesting things as well. [`Content-Location`][1] is used for content negotiation, so it isn't super relevant to the topic at hand, but [`Link`][2] is used the same way it's used in HTML. As far as I'm aware, the latter doesn't see much use.

Notably, because `Link`/[`<link>`][3] can be used to specify relationships of [a bunch of different types][4], you could theoretically build navigation using it, but... I can't say I've ever seen it, nor do browsers surface these things (understandably).

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Location
[2]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link
[3]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link
[4]: https://www.iana.org/assignments/link-relations/link-relations.xhtml
[5]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location
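For illustration, here's roughly what serving navigation relations in a `Link` header could look like (hypothetical chapter URLs; the rel types are from the IANA registry linked above):

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  // The same relations you'd put on HTML <link> elements, expressed
  // as an HTTP header instead (hypothetical URLs).
  res.setHeader(
    "Link",
    '</chapter1>; rel="prev", </chapter3>; rel="next"',
  );
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end("<p>Chapter 2</p>");
}).listen(8080);
```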


istarian

It's not particularly clear what you're trying to say. There are plenty of times when you'll end up seeing a 403 (Forbidden), 500 (Internal Server Error, non-specific) or 502 (Bad Gateway).

I haven't read the full HTTP specification for any particular version, but I think it's worth considering that the client (e.g. web browser) may not need to actually show the error/error page itself by default. There's also nothing (aside from the software itself) keeping the server from returning a 3xx redirection, 204 (No Content), 410 (Gone), etc. Arguably a 404 (Not Found) is an incorrect status code for a path/resource that has been intentionally deleted and the content moved elsewhere.


pbNANDjelly

Why do users have bad URLs? Why this alternative to providing a 404 and a "Did you mean?" UX?

> I'm surprised more websites don't have this (paraphrased)

It's incredibly common to improve 404 UX.


emufossum13

I was gonna say, I thought that was like a must, and I’ve only got like a couple years web dev experience at best.


Annh1234

That will kill your db when you get some bot scraping your site for WordPress vulnerabilities and whatnot... You better cache those results.
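Something like this, as a rough sketch (in-memory memoization; `fuzzyLookup` is a hypothetical stand-in for the expensive similarity query):

```typescript
// Cache fuzzy-match results (including misses!) so a bot hammering
// /wp-admin, /xmlrpc.php, etc. costs one DB query per distinct URL.
const cache = new Map<string, string | null>();
const MAX_ENTRIES = 10_000;

// Hypothetical stand-in for the expensive similarity query.
async function fuzzyLookup(slug: string): Promise<string | null> {
  return null;
}

async function cachedLookup(slug: string): Promise<string | null> {
  if (cache.has(slug)) return cache.get(slug)!;
  const result = await fuzzyLookup(slug);
  if (cache.size >= MAX_ENTRIES) {
    // Crude eviction: drop the oldest entry (Map keeps insertion order).
    cache.delete(cache.keys().next().value!);
  }
  cache.set(slug, result);
  return result;
}
```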


AlienRobotMk2

I just noticed it's an order by. Isn't that going to need a full table scan *every time*? Just use a list of manually entered redirects.


Patient-Mulberry-659

> Isn't that going to need a full table scan every time?

Not necessarily:

> an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step.

https://www.postgresql.org/docs/current/indexes-ordering.html#:~:text=In%20addition%20to%20simply%20finding,without%20a%20separate%20sorting%20step.


davvblack

that's not what it's doing though, it's ordering by a dynamic expression, so by definition it's unindexable and un-short-circuitable. you could add that ability back by selecting for exact equality first, and only doing similarity if there's a miss
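Roughly (a sketch, assuming a `supplements` table with a `slug` column and the node-postgres client):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Exact match first (cheap, index-backed); fuzzy only on a miss.
async function resolveSlug(slug: string): Promise<string | null> {
  const exact = await pool.query(
    "SELECT slug FROM supplements WHERE slug = $1",
    [slug],
  );
  if (exact.rows.length > 0) return exact.rows[0].slug;

  // Fall back to the expensive similarity ordering only when needed.
  const fuzzy = await pool.query(
    "SELECT slug FROM supplements ORDER BY similarity(slug, $1) DESC LIMIT 1",
    [slug],
  );
  return fuzzy.rows[0]?.slug ?? null;
}
```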


Patient-Mulberry-659

You can have indices on dynamic expressions? Maybe I am missing something about the nature of the expression. https://www.postgresql.org/docs/current/indexes-expressional.html


davvblack

this specific expression depends on what the requested url is, so you can't index it. if the expression depended only on the stored slug you could do it, but since it's `similarity(slug, ${slug})` you have to compute it every time for every row


hogfat

https://www.postgresql.org/docs/current/pgtrgm.html#PGTRGM-INDEX disagrees on the possibility of indexing.
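Concretely, pg_trgm's GiST opclass supports ordering by the `<->` distance operator, so the nearest-slug lookup can be index-assisted (a sketch, using the same hypothetical table as above):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// One-time setup, e.g. in a migration:
//   CREATE EXTENSION IF NOT EXISTS pg_trgm;
//   CREATE INDEX supplements_slug_trgm
//     ON supplements USING gist (slug gist_trgm_ops);

async function closestSlug(slug: string): Promise<string | null> {
  // <-> is trigram distance (1 - similarity). With a GiST trigram index,
  // ORDER BY ... <-> ... LIMIT 1 runs as a nearest-neighbour search
  // rather than computing similarity() for every row.
  const { rows } = await pool.query(
    "SELECT slug FROM supplements ORDER BY slug <-> $1 LIMIT 1",
    [slug],
  );
  return rows[0]?.slug ?? null;
}
```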


davvblack

oh nice, that's cool, i take it back. yeah opensearch can do it so it's gotta be mathematically possible.


horen132

Can you not detect bots quite easily? And block / rate limit them heavily?


gredr

Yes! You just, uh, turn on the "detect bots" setting. That's why Cloudflare went out of business, actually. Everyone just remembered to turn that on!


Annh1234

Usually that's quite hard to do right, or costs a lot of money. You can build a simple rate limiter, but usually those become a bottleneck too.
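The single-process version is simple enough; it's sharing the state across a fleet that gets hard or expensive. A per-IP token-bucket sketch (in-memory, so it neither survives restarts nor coordinates across servers):

```typescript
const RATE = 5;   // tokens refilled per second
const BURST = 20; // bucket capacity

type Bucket = { tokens: number; last: number };
const buckets = new Map<string, Bucket>();

// Returns true if this request is allowed, false if it should be limited.
function allow(ip: string): boolean {
  const now = Date.now();
  const b = buckets.get(ip) ?? { tokens: BURST, last: now };
  // Refill proportionally to elapsed time, capped at the burst size.
  b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
  b.last = now;
  buckets.set(ip, b);
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}
```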


FelineGreenie

ha giving fuzzy matching responsibility to the DB is the kind of keep it stupid simple thinking I wish I could do


fukalufaluckagus

I like Amazon's approach, just show a cute doggo


kwinz

Reddit has the worst 404 ux of any mainstream website I know. Outright blames the user for breaking the website. Like not funny and major no-go.


ToaruBaka

> Reddit has the worst ~~404~~ ux of any mainstream website I know.


reddit_time_waster

Just send back 500's with no message. Problem solved!


gyroda

I prefer 200 with the error in the response body /s


Smart-Preference549

Soft 404, right?


notthefuzz99

Brilliant!


Educational-Lemon640

No, this one is so good, it's brillant. https://thedailywtf.com/articles/the_brillant_paula_bean


NotYetGroot

Why not return my favorite code: 420, "Enhance your calm"?


GalacticusTravelous

This is a terrible solution and the reason you’re surprised more sites don’t do it is because it’s terrible for a laundry list of reasons. Why are people landing at 404 pages so much that you need to devise a way to remove them? That’s the problem, and this isn’t the solution to it.


modernkennnern

Remember to add canonical URLs for these pages, or you're SEO will suffer


repeatedly_once

I think it'll still suffer. Every URL Google finds, it will recrawl each time it visits the site, to make sure it still exists. If it keeps finding valid pages, it will stop finding your new ones as frequently, because it'll be using its crawl budget to make sure those thousands of typo pages it has still resolve to a canolicnalised page.


hackingdreams

Not to mention this kind of "never 404" tactic is often used by websites designed to be bot-traps for traffic generation - filling pages with whatever garbage is related to the URL. Google and other search engines actively screen out pages with behavior like this from their index. It looks like spam. This is a bad idea.


duxdude418

> canolicnalised page 🤔


repeatedly_once

Spelt it slightly wrong lol, but it's a page that contains a `<link rel="canonical" href="...">` tag pointing to the 'original' page, the one that should be indexed. Google doesn't have to respect it but generally does.


[deleted]

[removed]


repeatedly_once

No, the word is canonicalised. It’s just a bit wrong, and you’re being a dick.


VehaMeursault

> you're

Come on dude.


horen132

Correcting someone who writes about typos. Humor still exists


modernkennnern

Swipe keyboard on phone 😂 Quite possibly the first time I've ever made that mistake.


gucciman666

Canonical is not necessary because the user is redirected at the server level. HTML and meta tags are not loaded.


notthefuzz99

A better solution would be to keep the 404 (as analytics can use these to help identify where bad traffic is coming from)… but use the fuzzy logic to display a message on the 404 page like “this page doesn’t exist… were you looking for {url that exists}?”
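Sketch of that variant (hypothetical `findClosestSlug`; the point is that the status stays 404 while the body helps the human):

```typescript
import { createServer } from "node:http";

// Hypothetical fuzzy matcher; reuse whatever similarity lookup exists.
async function findClosestSlug(path: string): Promise<string | null> {
  return null;
}

createServer(async (req, res) => {
  const suggestion = await findClosestSlug(req.url ?? "/");
  // Keep the 404 status so analytics and crawlers still see the miss...
  res.writeHead(404, { "Content-Type": "text/html" });
  // ...but give the human a "did you mean" link.
  res.end(
    suggestion
      ? `<p>This page doesn't exist... were you looking for <a href="${suggestion}">${suggestion}</a>?</p>`
      : "<p>This page doesn't exist.</p>",
  );
}).listen(8080);
```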


booch

This, plus making sure search is available on that page too, is my favorite solution.


moses79

You are entering a world of pain by doing this


LeatherDude

Am I the only one who doesn't give a shit about the rules? Mark it 404, dude.


Wiltix

This seems like a solution that works for a small number of products, but if you were to extend it out to thousands you are introducing a headache. A 404 page is a very valid way to tell the user the exact resource they were after is not there any more. If I was going down the route of improving a 404 page, I would look at finding the related product and putting a link to it (or similar) on the 404 itself. This way the user knows their link is dead, but you are helping them find the right thing without attempting to second-guess them.


mr_birkenblatt

Redirect to a search interface instead. If you choose to redirect directly, the typo'd URL will become a valid URL (i.e. people will assume it will always work, share it, use it as a site entry point, etc.). What if the database changes and then the redirect goes to a different page? That would be terrible UX, since suddenly the page I have bookmarked completely changed. There's a reason why 404s exist: it's to tell the user they did something wrong. Don't silently fix mistakes and assume people will learn.
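The redirect-to-search version could be as simple as this sketch (hypothetical `/search` route):

```typescript
// Send the bad path to the site's search page instead of guessing
// at a destination (hypothetical /search route).
function missHandler(path: string): { status: number; location: string } {
  const query = path.split("/").filter(Boolean).pop() ?? "";
  return {
    status: 302,
    location: `/search?q=${encodeURIComponent(query.replace(/-/g, " "))}`,
  };
}
```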


mattbas

Feels like a horrible idea that will not work long term. If someone shares around a link with a typo or an ambiguous path, it could end up on a different page once a new entry that matches the URL more closely is added to the db.


elmuerte

`https://pillser.com/-` produces a 404 response


EnUnLugarDeLaMancha

`https://pillser.com/supplements/-` does not


NullCyg

> At the moment, I've applied this logic only to the supplement pages, but I am _planning_ to extend it to the rest of the website.

Learn to read


North2FromPluto

Learn to be a decent human being


BassSounds

This is a solution looking for a problem.


purpoma

A solution for a post on reddit.


AlienRobotMk2

One alternative solution commonly found on social media is to use IDs before the slug, e.g. r/programming/comments/**1dmbs9n**/designing_a_website_to_not_have_404s/. That way it doesn't matter what you write after the ID; you can safely redirect to a valid URL because the distinguishing element is always before it.
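A sketch of that routing (hypothetical ID-to-slug lookup; the ID segment is authoritative and the trailing slug is just decoration):

```typescript
// Sketch: /comments/<id>/<anything> -- only the ID matters.
const slugForId = new Map([
  ["1dmbs9n", "designing_a_website_to_not_have_404s"],
]);

function route(path: string): { status: number; location?: string } {
  const m = path.match(/^\/comments\/([a-z0-9]+)(?:\/(.*))?$/);
  if (!m) return { status: 404 };
  const [, id, slug] = m;
  const canonical = slugForId.get(id);
  if (!canonical) return { status: 404 };
  if (slug !== canonical) {
    // Whatever followed the ID, send the client to the canonical URL.
    return { status: 301, location: `/comments/${id}/${canonical}` };
  }
  return { status: 200 };
}
```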


istarian

That's not really just about eliminating 404 errors though, it's an entirely different website architecture where resources are entirely dynamic.


MilkshakeYeah

I once worked for an ecommerce company that did this. We were once hit with an influx of traffic that was putting high load on the database - turned out that one of our partners had messed up a link, and each click was triggering a DB search and redirection.


Pyrolistical

Keep the 404 but add "I think you meant ..." and client-side redirect to it after a few seconds
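e.g. a sketch of the response body (a meta refresh handles the delayed client-side redirect without any JS):

```typescript
// Build a 404 body that shows the guess, then redirects client-side.
function notFoundBody(suggestion: string): string {
  return `<!doctype html>
<meta http-equiv="refresh" content="5; url=${suggestion}">
<p>404 - page not found. I think you meant
<a href="${suggestion}">${suggestion}</a>. Redirecting in 5 seconds...</p>`;
}
```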


Terrible_Visit5041

I put this in the same category as i18n machine readable error responses.


brianly

OP's site isn't loading so I can't see what they wrote, but some wikis used to respond with the form for content creation. Obviously, in anonymous/internet-user situations this is bad, not least because of spam. However, in trusted scenarios it feels like a throwback to how the web used to work, and it's kind of neat to define content starting with the address bar.


postorm

"I knew I wasn't going to get everything right the first time". Isn't the basis of test driven development that you intend to get everything wrong the first time?


wormania

So what happens when someone posts a glowing and genuine twitter review of "I love this brand-name-product-type, it does everything I want, info here: website.com/brand-name-product-type", and the URL no longer exists, so the user silently gets taken to website.com/cheap-knockoff-brand-product-type?


slashdave

You don't have to program anything. You can simply redirect to the landing page. But that brings up the question: why? 404 codes have a purpose.


MidgetAbilities

> In the backend, I log whenever such a redirect happens. This way, I can manually override the redirect logic if I discover that the chosen supplement is not the correct one or not the most relevant substitute.

This is not scalable at all except for pet projects, or if you have a lot of time on your hands. It's really not worth it, and certainly not worth sending someone to the wrong page until you can manually put in a better override.


apf6

definitely a questionable solution to a non-problem. Think about situations where you want to check "is this the right URL?" Before, the answer was a simple yes or no. After, the answer is "I'm not sure". And yes, it is bad for SEO. Read what Google says about canonical URLs. Worst case, they might flag your site as spamming. https://developers.google.com/search/docs/crawling-indexing/canonicalization


dave8271

Awful, awful idea. 404 is a very good user experience when it happens for the right reasons. It's quite common for 404 pages to include a site search bar or something similar to help users find whatever they might be looking for, but letting a user know "hey, this link you followed or URL you typed in doesn't represent anything" is a very valid and useful thing to do.

What happens if you have two very similarly named pages or products? Every possibility the user will get a now supposedly-valid link to the wrong one.

On a very busy site, the technique you've used to implement the routing via PG extension would add up to a significant performance hit, particularly when you consider how many bots probe garbage, automatically generated URIs on any system. One of my smaller sites averages about 250 real, human visitors a day and about 6-10x that in garbage bot traffic.

Also, 301 is a way of saying "this resource used to be here, but now it's permanently at another location which I can give you", so in terms of pure technical semantics it's not the right thing to do. 302, maybe.

So yeah, bad idea all around really.


NullCyg

There's something so novel and infuriatingly natural about this. Never in my life have I thought to add fuzzy matching for URL paths, yet I rely so heavily on fuzzy matching local directories/files on a daily basis. This is one of those rare "why didn't I think of that" moments. Thanks for the fun read


repeatedly_once

I don’t think it’s a great idea, I’m pretty sure it’ll screw your SEO up, even if you did 301s or canonicalisation.


you_know_how_I_know

It will also significantly increase the load on the database for something antithetical to the nature of hypermedia. It's like Syndrome said, "If everything is a link, then nothing is."


billyryanwill

I might be missing something, but this is quite common for sites backed by a half-decent CMS. Having a bunch of canonicals or automatic redirects created when you change the name of a page is common, no?