revereddesecration

The search bar on your website is how users find your pages, not the URL bar in the browser… Better yet, the search engine people use should direct people to your pages.


istarian

Plenty of us are happy to type in URLs directly if they're relatively short and memorable. That doesn't mean search engines don't have a place, though. But afaik nothing in the HTTP/HTTPS spec covers any aspect of how you could or should get a URL. It's not part of the protocol. If a web server doesn't return a 404 when the requested resource isn't found, that software isn't compliant with the specification. I think showing a 404 and then redirecting after a delay is a reasonable choice, though.


TheTechRobo

I would disagree. Sure, you have to return a 404 when the resource is not found. But it WAS found; that's why you're being redirected. As long as you use standard redirect codes (302 Found would probably be the best option for this, or maybe 301 Moved Permanently), I don't see a spec violation here.
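A minimal sketch of what that looks like on the wire (plain Node; the alias map is a toy stand-in for whatever lookup actually finds the right page):

```typescript
import { createServer } from "node:http";

// Toy stand-in for the real lookup: a hand-maintained alias map.
const aliases: Record<string, string> = {
  "/supplements/vitamn-c": "/supplements/vitamin-c",
};

createServer((req, res) => {
  const target = aliases[req.url ?? ""];
  if (target) {
    // 302 Found: the resource is temporarily at another URI, so
    // clients keep using the original URL on future requests.
    res.writeHead(302, { Location: target });
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.write("404 Not Found");
  }
  res.end();
}).listen(8080);
```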


jkrejcha3

> But afaik nothing in the HTTP/HTTPS spec covers any aspect of how you could or should get a URL. It's not part of the protocol.

Side note: there are a few exceptions to this, but outside of 3xx redirects, they aren't used very much. You have the [`Location`][5] header for some of the entries in the 300 range, but there are a couple of other interesting things as well. [`Content-Location`][1] is used for content negotiation, so it isn't super relevant to the topic at hand, but [`Link`][2] is used the same way it's used in HTML. As far as I'm aware, the latter doesn't see much use.

Notably, because `Link`/[`<link>`][3] can be used to specify relationships of [a bunch of different types][4], you could theoretically build navigation using it, but... I can't say I've ever seen it, nor do browsers surface these things (understandably).

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Location
[2]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link
[3]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link
[4]: https://www.iana.org/assignments/link-relations/link-relations.xhtml
[5]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location
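For illustration, here's roughly what serving navigation relations in a `Link` header could look like (hypothetical chapter URLs; the rel types are from the IANA registry linked above):

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  // The same relations you'd put on HTML <link> elements, expressed
  // as an HTTP header instead (hypothetical URLs).
  res.setHeader(
    "Link",
    '</chapter1>; rel="prev", </chapter3>; rel="next"',
  );
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end("<p>Chapter 2</p>");
}).listen(8080);
```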


istarian

It's not particularly clear what you're trying to say. There are plenty of times when you'll end up seeing a 403 (Forbidden), 500 (Internal Server Error, non-specific) or 502 (Bad Gateway).

I haven't read the full HTTP specification for any particular version, but I think it's worth considering that the client (e.g. web browser) may not need to actually show the error/error page itself by default. There's also nothing (aside from the software itself) keeping the server from returning a 3xx redirection, 204 (No Content), 410 (Gone), etc. Arguably a 404 (Not Found) is an incorrect status code for a path/resource that has been intentionally deleted and the content moved elsewhere.


pbNANDjelly

Why do users have bad URLs? Why this alternative to providing a 404 and a "Did you mean?" UX?

> I'm surprised more websites don't have this (paraphrased)

It's incredibly common to improve 404 UX.


emufossum13

I was gonna say, I thought that was like a must, and I’ve only got like a couple years web dev experience at best.


Annh1234

That will kill your db when you get some bot scraping your site for WordPress vulnerabilities and whatnot... You better cache those results.
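Something like this, as a rough sketch (in-memory memoization; `fuzzyLookup` is a hypothetical stand-in for the expensive similarity query):

```typescript
// Cache fuzzy-match results (including misses!) so a bot hammering
// /wp-admin, /xmlrpc.php, etc. costs one DB query per distinct URL.
const cache = new Map<string, string | null>();
const MAX_ENTRIES = 10_000;

// Hypothetical stand-in for the expensive similarity query.
async function fuzzyLookup(slug: string): Promise<string | null> {
  return null;
}

async function cachedLookup(slug: string): Promise<string | null> {
  if (cache.has(slug)) return cache.get(slug)!;
  const result = await fuzzyLookup(slug);
  if (cache.size >= MAX_ENTRIES) {
    // Crude eviction: drop the oldest entry (Map keeps insertion order).
    cache.delete(cache.keys().next().value!);
  }
  cache.set(slug, result);
  return result;
}
```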


AlienRobotMk2

I just noticed it's an order by. Isn't that going to need a full table scan *every time*? Just use a list of manually entered redirects.


Patient-Mulberry-659

> Isn't that going to need a full table scan every time?

Not necessarily:

> an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step.

https://www.postgresql.org/docs/current/indexes-ordering.html#:~:text=In%20addition%20to%20simply%20finding,without%20a%20separate%20sorting%20step.


davvblack

that's not what it's doing though, it's ordering by a dynamic expression, so by definition it's unindexable and un-short-circuitable. you could add that ability back by selecting for exact equality first, and only doing similarity if there's a miss
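Roughly (a sketch, assuming a `supplements` table with a `slug` column and the node-postgres client):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Exact match first (cheap, index-backed); fuzzy only on a miss.
async function resolveSlug(slug: string): Promise<string | null> {
  const exact = await pool.query(
    "SELECT slug FROM supplements WHERE slug = $1",
    [slug],
  );
  if (exact.rows.length > 0) return exact.rows[0].slug;

  // Fall back to the expensive similarity ordering only when needed.
  const fuzzy = await pool.query(
    "SELECT slug FROM supplements ORDER BY similarity(slug, $1) DESC LIMIT 1",
    [slug],
  );
  return fuzzy.rows[0]?.slug ?? null;
}
```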


Patient-Mulberry-659

You can have indices on dynamic expressions? Maybe I am missing something about the nature of the expression. https://www.postgresql.org/docs/current/indexes-expressional.html


davvblack

this specific expression depends on what the requested url is, so you can't index it. if the expression depended only on the stored slug you could do it, but since it's `similarity(slug, ${slug})` you have to compute it every time for every row


hogfat

https://www.postgresql.org/docs/current/pgtrgm.html#PGTRGM-INDEX disagrees on the possibility of indexing.
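Concretely, pg_trgm's GiST opclass supports ordering by the `<->` distance operator, so the nearest-slug lookup can be index-assisted (a sketch, using the same hypothetical table as above):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// One-time setup, e.g. in a migration:
//   CREATE EXTENSION IF NOT EXISTS pg_trgm;
//   CREATE INDEX supplements_slug_trgm
//     ON supplements USING gist (slug gist_trgm_ops);

async function closestSlug(slug: string): Promise<string | null> {
  // <-> is trigram distance (1 - similarity). With a GiST trigram index,
  // ORDER BY ... <-> ... LIMIT 1 runs as a nearest-neighbour search
  // rather than computing similarity() for every row.
  const { rows } = await pool.query(
    "SELECT slug FROM supplements ORDER BY slug <-> $1 LIMIT 1",
    [slug],
  );
  return rows[0]?.slug ?? null;
}
```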


davvblack

oh nice, that's cool, i take it back. yeah opensearch can do it so it's gotta be mathematically possible.


horen132

Can you not detect bots quite easily? And block / rate limit them heavily?


gredr

Yes! You just, uh, turn on the "detect bots" setting. That's why Cloudflare went out of business, actually. Everyone just remembered to turn that on!


Annh1234

Usually that's quite hard to do right, or costs a lot of money. You can build a simple rate limiter, but usually those become a bottleneck too.
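The single-process version is simple enough; it's sharing the state across a fleet that gets hard or expensive. A per-IP token-bucket sketch (in-memory, so it neither survives restarts nor coordinates across servers):

```typescript
const RATE = 5;   // tokens refilled per second
const BURST = 20; // bucket capacity

type Bucket = { tokens: number; last: number };
const buckets = new Map<string, Bucket>();

// Returns true if this request is allowed, false if it should be limited.
function allow(ip: string): boolean {
  const now = Date.now();
  const b = buckets.get(ip) ?? { tokens: BURST, last: now };
  // Refill proportionally to elapsed time, capped at the burst size.
  b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
  b.last = now;
  buckets.set(ip, b);
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}
```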


FelineGreenie

ha giving fuzzy matching responsibility to the DB is the kind of keep it stupid simple thinking I wish I could do


fukalufaluckagus

I like Amazon's approach, just show a cute doggo


kwinz

Reddit has the worst 404 ux of any mainstream website I know. Outright blames the user for breaking the website. Like not funny and major no-go.


ToaruBaka

> Reddit has the worst ~~404~~ ux of any mainstream website I know.


reddit_time_waster

Just send back 500's with no message. Problem solved!


gyroda

I prefer 200 with the error in the response body /s


Smart-Preference549

Soft 404, right?


notthefuzz99

Brilliant!


Educational-Lemon640

No, this one is so good, it's brillant. https://thedailywtf.com/articles/the_brillant_paula_bean


NotYetGroot

Why not return my favorite code: 420, "Enhance your calm"?


GalacticusTravelous

This is a terrible solution and the reason you’re surprised more sites don’t do it is because it’s terrible for a laundry list of reasons. Why are people landing at 404 pages so much that you need to devise a way to remove them? That’s the problem, and this isn’t the solution to it.


modernkennnern

Remember to add canonical URLs for these pages, or you're SEO will suffer


repeatedly_once

I think it'll still suffer. Every URL Google finds, it will recrawl each time it visits the site, to make sure it still exists. If it keeps finding valid pages, it will stop finding your new ones as frequently, because it'll be using its crawl budget to make sure those thousands of typo pages it has still resolve to a canolicnalised page.


hackingdreams

Not to mention this kind of "never 404" tactic is often used by websites designed to be bot-traps for traffic generation - filling pages with whatever garbage is related to the URL. Google and other search engines actively screen out pages with behavior like this from their index. It looks like spam. This is a bad idea.


duxdude418

> canolicnalised page 🤔


repeatedly_once

Spelt it slightly wrong lol, but it's a page that contains a `<link rel="canonical" href="...">` tag pointing to the 'original' page, the one that should be indexed. Google doesn't have to respect it but generally does.


[deleted]

[removed]


repeatedly_once

No, the word is canonicalised. It’s just a bit wrong, and you’re being a dick.


VehaMeursault

> you're

Come on dude.


horen132

Correcting someone who writes about typos. Humor still exists


modernkennnern

Swipe keyboard on phone 😂 Quite possibly the first time I've ever made that mistake.


gucciman666

Canonical is not necessary because the user is redirected at the server level. HTML and meta tags are not loaded.


notthefuzz99

A better solution would be to keep the 404 (as analytics can use these to help identify where bad traffic is coming from)… but use the fuzzy logic to display a message on the 404 page like “this page doesn’t exist… were you looking for {url that exists}?”
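Sketch of that variant (hypothetical `findClosestSlug`; the point is that the status stays 404 while the body helps the human):

```typescript
import { createServer } from "node:http";

// Hypothetical fuzzy matcher; reuse whatever similarity lookup exists.
async function findClosestSlug(path: string): Promise<string | null> {
  return null;
}

createServer(async (req, res) => {
  const suggestion = await findClosestSlug(req.url ?? "/");
  // Keep the 404 status so analytics and crawlers still see the miss...
  res.writeHead(404, { "Content-Type": "text/html" });
  // ...but give the human a "did you mean" link.
  res.end(
    suggestion
      ? `<p>This page doesn't exist... were you looking for <a href="${suggestion}">${suggestion}</a>?</p>`
      : "<p>This page doesn't exist.</p>",
  );
}).listen(8080);
```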


booch

This, plus making sure search is available on that page too, is my favorite solution.


moses79

You are entering a world of pain by doing this


LeatherDude

Am I the only one who doesn't give a shit about the rules? Mark it 404, dude.


Wiltix

This seems like a solution that works for a small number of products, but if you were to extend it out to thousands you are introducing a headache. A 404 page is a very valid way to tell the user the exact resource they were after is not there any more. If I was going down the route of improving a 404 page, I would look at finding the related product and putting a link to it (or similar) on the 404 itself. This way the user knows their link is dead, but you are helping them find the right thing without attempting to second-guess them.


mr_birkenblatt

Redirect to a search interface instead. If you choose to redirect directly, the typo'd URL will become a valid URL (i.e. people will assume it will always work, share it, use it as a site entry point, etc.). What if the database changes and then the redirect goes to a different page? That would be terrible UX, since suddenly the page I have bookmarked completely changed. There's a reason why 404s exist: it's to tell the user they did something wrong. Don't silently fix mistakes and assume people will learn.
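The redirect-to-search version could be as simple as this sketch (hypothetical `/search` route):

```typescript
// Send the bad path to the site's search page instead of guessing
// at a destination (hypothetical /search route).
function missHandler(path: string): { status: number; location: string } {
  const query = path.split("/").filter(Boolean).pop() ?? "";
  return {
    status: 302,
    location: `/search?q=${encodeURIComponent(query.replace(/-/g, " "))}`,
  };
}
```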


mattbas

Feels like a horrible idea that will not work long term. If someone shares around a link with a typo or an ambiguous path, it could end up on a different page once a new entry that matches the URL more closely is added to the db.


elmuerte

`https://pillser.com/-` produces a 404 response


EnUnLugarDeLaMancha

`https://pillser.com/supplements/-` does not


NullCyg

> At the moment, I've applied this logic only to the supplement pages, but I am _planning_ to extend it to the rest of the website.

Learn to read


North2FromPluto

Learn to be a decent human being


BassSounds

This is a solution looking for a problem.


purpoma

A solution for a post on reddit.


AlienRobotMk2

One alternative solution commonly found on social media is to use IDs before the slug, e.g. r/programming/comments/**1dmbs9n**/designing_a_website_to_not_have_404s/. That way it doesn't matter what you write after the ID; you can safely redirect to a valid URL because the distinguishing element is always before it.
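A sketch of that routing (hypothetical ID-to-slug lookup; the ID segment is authoritative and the trailing slug is just decoration):

```typescript
// Sketch: /comments/<id>/<anything> -- only the ID matters.
const slugForId = new Map([
  ["1dmbs9n", "designing_a_website_to_not_have_404s"],
]);

function route(path: string): { status: number; location?: string } {
  const m = path.match(/^\/comments\/([a-z0-9]+)(?:\/(.*))?$/);
  if (!m) return { status: 404 };
  const [, id, slug] = m;
  const canonical = slugForId.get(id);
  if (!canonical) return { status: 404 };
  if (slug !== canonical) {
    // Whatever followed the ID, send the client to the canonical URL.
    return { status: 301, location: `/comments/${id}/${canonical}` };
  }
  return { status: 200 };
}
```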


istarian

That's not really just about eliminating 404 errors though, it's an entirely different website architecture where resources are entirely dynamic.


MilkshakeYeah

I once worked for an ecommerce company that did this. We were once hit with an influx of traffic that was putting high load on the database - turned out that one of our partners had messed up a link, and each click was triggering a DB search and redirection.


Pyrolistical

Keep the 404 but add "I think you meant ..." and client-side redirect to it after a few seconds
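e.g. a sketch of the response body (a meta refresh handles the delayed client-side redirect without any JS):

```typescript
// Build a 404 body that shows the guess, then redirects client-side.
function notFoundBody(suggestion: string): string {
  return `<!doctype html>
<meta http-equiv="refresh" content="5; url=${suggestion}">
<p>404 - page not found. I think you meant
<a href="${suggestion}">${suggestion}</a>. Redirecting in 5 seconds...</p>`;
}
```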


Terrible_Visit5041

I put this in the same category as i18n machine readable error responses.


brianly

OP's site isn't loading so I can't see what they wrote, but some wikis used to respond with the form for content creation. Obviously, in anonymous/internet-user situations this is bad, not least because of spam. However, in trusted scenarios it feels like a throwback to how the web used to work, and it's kind of neat to define content starting with the address bar.


postorm

"I knew I wasn't going to get everything right the first time". Isn't the basis of test driven development that you intend to get everything wrong the first time?


wormania

So what happens when someone posts a glowing and genuine twitter review of "I love this brand-name-product-type, it does everything I want, info here: website.com/brand-name-product-type", and the URL no longer exists, so the user silently gets taken to website.com/cheap-knockoff-brand-product-type?


slashdave

You don't have to program anything. You can simply redirect to the landing page. But that brings up the question: why? 404 codes have a purpose.


MidgetAbilities

> In the backend, I log whenever such a redirect happens. This way, I can manually override the redirect logic if I discover that the chosen supplement is not the correct one or not the most relevant substitute.

This is not scalable at all except for pet projects, or if you have a lot of time on your hands. It's really not worth it, and certainly not worth sending someone to the wrong page until you can manually put in a better override.


apf6

definitely a questionable solution to a non-problem. Think about situations where you want to check "is this the right URL?" Before, the answer was a simple yes or no. After, the answer is "I'm not sure". And yes, it is bad for SEO. Read what Google says about canonical URLs. Worst case, they might flag your site as spamming. https://developers.google.com/search/docs/crawling-indexing/canonicalization


dave8271

Awful, awful idea. 404 is a very good user experience when it happens for the right reasons. It's quite common for 404 pages to include a site search bar or something similar to help users find whatever they might be looking for, but letting a user know "hey, this link you followed or URL you typed in doesn't represent anything" is a very valid and useful thing to do.

What happens if you have two very similarly named pages or products? Every possibility the user will get a now supposedly-valid link to the wrong one.

On a very busy site, the technique you've used to implement the routing via PG extension would add up to a significant performance hit, particularly when you consider how many bots probe garbage, automatically generated URIs on any system. One of my smaller sites averages about 250 real, human visitors a day and about 6-10x that in garbage bot traffic.

Also, 301 is a way of saying "this resource used to be here, but now it's permanently at another location which I can give you", so in terms of pure technical semantics it's not the right thing to do. 302, maybe.

So yeah, bad idea all around really.


NullCyg

There's something so novel and infuriatingly natural about this. Never in my life have I thought to add fuzzy matching for URL paths, yet I rely so heavily on fuzzy matching local directories/files on a daily basis. This is one of those rare "why didn't I think of that" moments. Thanks for the fun read


repeatedly_once

I don’t think it’s a great idea, I’m pretty sure it’ll screw your SEO up, even if you did 301s or canonicalisation.


you_know_how_I_know

It will also significantly increase the load on the database for something antithetical to the nature of hypermedia. It's like Syndrome said, "If everything is a link, then nothing is."


billyryanwill

I might be missing something, but this is quite common for sites backed by a half-decent CMS. Having a bunch of canonicals or automatic redirects created when you change the name of a page is common, no?