> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.
It seems to me that it's become standard practice on marketing-type blogs for corporate websites to remove the date from their posts. I think it's because (from personal experience) the company will go through a burst of "blog productivity" and create a load of content, but then not touch it for years. They don't want that content to look out of date or their website to look stagnant.
Removing the date from their posts, or any other content, hides how old it is and therefore obscures how active they are at creating new content.
Most companies try to use their blogs to attract new customers. A new customer may visit their website once or twice and never see the blog again, and it's not important that they do. They don't want it to look stale.
As a counterexample, an interesting thread from yesterday [0] was about how Cloudflare use their blog not as a marketing tool but for technical content and attracting employees. They use their blog very regularly, and so keep the date on it, showing how fresh it is.
There’s a popular concept in the content industry of “evergreen content”, meaning posts that are always relevant or useful regardless of when they were originally posted. The idea is that articles will have a longer “shelf life” if they are not tied to a particular date or recent event.
Of course, producing true evergreen content takes more effort than just removing the publish date but that’s one easy way to fake it.
Edit: it’s only a matter of time till an article stops getting updated. Of course I doubt people in the “content industry” (tell me that’s not a real thing) care what happens to anyone else after they stop updating their ‘evergreen’ page.
A past employer almost went with this approach, but with a technical audience it's hard to cover all the edge cases with RSS. People get cranky when they get pinged because an article was updated superficially, but you also want people notified when the article is basically completely rewritten.
Safer to just publish a new article when there's a sufficient amount of content to update. Link the articles together somehow. Maybe add a disclaimer to older (and not updated) articles that they might no longer be valid.
This also solves the problem others have mentioned where they don't trust the date on articles. If you have a solid previous article to link to, you're more likely to build trust that the new one really is new.
I don't see how this is a problem with RSS. RSS (and the other feed formats) have a concept of an entry ID. If users don't want to see updates, their reader only shows items with a new entry ID.
To get even more nuanced: generally a superficial update shouldn't change the "updated" time, which gives another level of control.
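To make that concrete, here is a rough sketch of how a reader could apply both levels of control. It assumes the feed has already been parsed into simple records; the `Entry` type and the `select_entries` helper are made up for illustration and are not part of any feed library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    entry_id: str      # the feed's entry ID (e.g. Atom <id> or RSS <guid>)
    updated: datetime  # the "updated" time; left alone for superficial edits
    title: str


def select_entries(entries, seen, show_updates=False):
    """Return the entries a reader should surface to the user.

    `seen` maps entry_id -> the `updated` time last shown to the user.
    Entries with a new ID are always shown. Entries with a known ID are
    shown again only if the user opted in to updates and the publisher
    bumped the `updated` time.
    """
    selected = []
    for entry in entries:
        last_shown = seen.get(entry.entry_id)
        if last_shown is None:
            selected.append(entry)
        elif show_updates and entry.updated > last_shown:
            selected.append(entry)
    return selected
```

With `show_updates=False` a rewritten article never pings anyone; with `show_updates=True` it pings only when the publisher decided the change was significant enough to bump the "updated" time.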
It’s too general though, isn’t it? It would be like the “doing stuff” industry being a thing. There might be people who use the term, but that doesn’t mean the term points to any actual phenomenon in the real world.
That was my reaction. It’s scary, right? Like, need an article on safely installing a light switch? Call the content industry. Managing stored PII and passwords? Content industry. Medical information? Content industry.
I see that a lot these days with "most recent update" that I'm pretty sure is a lie. Or maybe they changed some tiny thing to make it not a technical lie.
Fiction is lies only when you want to pass them off as the truth:
"That's right -- and when you get to the human world, the Nothing will cling to you. You'll be like a contagious disease that makes humans blind, so they can no longer distinguish between reality and illusion. Do you know what you and your kind are called there?"
In the same dialogue there's this warning about advertising and politics:
“When it comes to controlling human beings there is no better instrument than lies. Because, you see, humans live by beliefs. And beliefs can be manipulated. The power to manipulate beliefs is the only thing that counts
... Who knows what use they’ll make of you? Maybe you’ll help them to persuade people to buy things they don’t need, or hate things they know nothing about, or hold beliefs that make them easy to handle, or doubt the truths that might save them.”
It is! Ende was raised under the esoteric Anthroposophy philosophy, and I suspect much of his work is dedicated to debunking its nonsensical beliefs. The second half of The Neverending Story is a philosophical treatise in itself (it hurts the book as an action story, but gives you a lot to ponder).
His children's books certainly reward a deeper second reading as an adult.
After all, it claims the unreal as real. Fiction, lies.
The difference with fiction in books and movies is that the listener knows it is a lie and listens for entertainment.
I would argue that a lie involves deliberate deception, not just untruth. Fiction, then, is not a lie, as by definition it is something imagined or invented, and therefore not created to deceive.
I first heard about "evergreen content" from the local news industry. To add examples of what it looks like (to demonstrate the extra effort required as you mentioned), here are a couple recent examples from The Guardian (because there's no paywall):
These are unlikely to pull lots of traffic compared with a more timely piece (e.g. an interview with an author for a recent book release, for the first section; the second may be an exception, as it's a regular column). However, the usefulness of evergreen content for a local print publication was to fill the print edition during a slow news week (not enough timely news to fill the pages).
The ultimate piece of evergreen content I’ve heard of is Peter Stark’s “Frozen Alive” for Outside Magazine.[0] It’s a story about what hypothermia does to you, told as a second person narrative. It was first written 25 years ago. It was a hit article in 1997, and since at least 2016, it is one of Outside Magazine’s most read articles every year.[1][2]
But that can also be extremely counterproductive when the content you publish has a natural expiry date, and in many areas of expertise (pretty much anything other than pure marketing talk) things change over time. A potential client seeing obsolete information might rightfully presume that the company is out of the loop or just plain unprofessional. If you have a date on the page, that is far less likely to happen.
COVID recommendations and measures are one such (rather extreme) example where many big players endangered their credibility because they've failed to properly mark the outdated content.
That's why The Guardian is a pretty great source for news, they add a banner on all old articles saying it's X years old, so beware, information might be stale.
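A banner like that is cheap to generate if the publish date is stored. A minimal sketch, assuming a one-year threshold; the function name, the threshold, and the banner wording are all illustrative and not The Guardian's actual implementation.

```python
from datetime import date


def staleness_banner(published, today, min_years=1):
    """Return a warning string for old articles, or None for recent ones."""
    years = today.year - published.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (published.month, published.day):
        years -= 1
    if years < min_years:
        return None
    unit = "year" if years == 1 else "years"
    return f"This article is {years} {unit} old; information might be stale."
```

For example, `staleness_banner(date(2019, 3, 1), date(2022, 1, 27))` yields a "2 years old" banner, while an article from the previous month yields `None`.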
I must say that I put more trust in blog posts that put the notice "Updated: 10/2021" at the top of their posts. This tells me that the topic was important at some time in the past and that someone is still taking care of the content and updating it from time to time.
Stability and old content can be good. Not everything is being updated and not everything needs to be updated. I'm all for putting dates on pages and blog posts :)
When they are just dishonest, saying that it's updated today or recently using a script is just as easy.
They just lie. I was trying to find out why VLC stopped working for me a few months ago [0] and landed on a terrible site that suggested a lot of cargo-cult driver updating and whatnot. A few "comments" thanked the author for the comprehensive (and totally fake) information.
This kind of crap seems to be winning. My other recent quest, finding a uni site about consciousness, psychedelia, Emerson and 60s music that I used to visit 20 years ago, was also unsuccessful.
On that note, Reddit seems to be faking their last modified date metadata for SEO purposes. I’ve seen many old threads (where all the comments are 5 years old with no edits) show up in my Google search results with today’s date below the link. This coincided with and is likely related to the removal of automatic thread archiving 4 months ago.
This also signals an end game, where the last vestiges of true, customer-facing improvements are over.
With nothing left, one delves into short term growth numbers.
Because of this, some indexes may de-emphasize Reddit in date-restricted searches and in freshness ranking. And some end users will click, become annoyed more often, and stop clicking on Reddit links.
They clearly have nothing left, and no other idea how to improve reddit. They have signaled they are well past peak growth.
I understand you. I've noticed this kind of behavior on some review sites. They put the latest month in the title, but who knows when they actually updated it?
Just today I was reading a blog post about some framework (KubeFlow) not being “ready for prime time yet”, and even though the article had great technical details, the fact that it lacked a date made it so much less valuable: it’s highly relevant whether this conclusion was drawn last month or two years ago.
I understand why this happens, but part of me really wishes we could stop this. Maybe there is some archive.org extension that can show me “this page first appeared on $date”.
Open developer tools and type `document.lastModified` in the console. Though it is painful to do this for every web page, it is useful in these instances. Maybe someone with experience in developing Chrome extensions can build one that shows this plus other useful page info with a single click in the toolbar.
Why would companies who are trying to deceive not go to the trouble of changing the Last-Modified header? At any rate, lots of sites don't set Last-Modified anyway, so that makes it January 1, 1970, GMT.
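The same header can be checked outside the browser. A minimal sketch, assuming you already have the response headers as a plain dict; `last_modified` is a hypothetical helper name. As noted above, a site can put anything it likes in this header, so treat the result as a hint rather than proof.

```python
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    value = None
    for name, raw in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "last-modified":
            value = raw
            break
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on unparseable dates, newer ones ValueError.
        return None
```

Returning `None` for a missing header keeps "the site did not say" distinct from any real date.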
I have the same feeling. Actually, I've seen it in real life. A company I consulted a few years ago created lots of "evergreen content." Now, they had a reason to do so, but...
It would be cool if archive.org (or any others) had an API that made it easy and quick to look up "first seen" timestamp for any given URL.
Here is the API you are looking for: -- https://archive.org/wayback/available?url=example.com&timest... -- this will show the oldest timestamp of an archive that the Wayback Machine has. The trick is to set the timestamp to be /really/ old and it will show the first snapshot it has.
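A small sketch of that trick, split so the parsing can be tried without a network call. The helper names and the default timestamp `19900101` are my own choices, and the response shape (an `archived_snapshots.closest` object with `available`, `timestamp` and `url` fields) is what I understand the availability endpoint to return, so verify it against a live response.

```python
import json
from urllib.parse import urlencode

WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp="19900101"):
    """Build a query asking for the snapshot closest to a very old timestamp."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return WAYBACK_AVAILABLE + "?" + query


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from the JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]
```

Fetch the URL from `availability_url(...)` with any HTTP client and pass the body to `closest_snapshot`. The timestamp comes back as a `YYYYMMDDhhmmss` string.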
I don't think you can tell if "the text" of the page has changed since then without manually looking though. I'm not sure how you'd solve this technically, it's probably more than just telling you if the html bytes are identical, since small changes probably happen all the time to "the same article".
Or it might be good enough, because this kind of content is rarely updated at the same url?
I wish Wayback had an indicator for how much the page has changed from one capture to the next. A simple count of the diffs should give you a good idea.
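Something like that count is easy to approximate yourself once you have the text of two captures. A rough sketch using Python's standard `difflib`; `changed_line_count` is a made-up helper, and it assumes the page text has already been extracted from the HTML, which is the hard part in practice.

```python
import difflib


def changed_line_count(old_text, new_text):
    """Count lines added or removed between two captures of a page."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))
```

A count of zero means the extracted text is identical, a handful suggests a superficial edit, and a number close to the page length suggests a rewrite.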
BBC television programmes used to give the year in Roman numerals, in the copyright notice, I think. It has been suggested that this was to make it harder for people to notice how old the programme was. The same technique wouldn't work so well on a web site, because you couldn't do what the BBC did and leave the Roman numeral on the screen for about 0.5 s so that most people don't have time to decipher it. Also, the dates at the end of the last century were particularly hard to read in Roman numerals. Here's an example. Time yourself:
> It has been suggested that this was to make it harder for people to notice how old the programme was
This is tempting but I don't know how much credence to give it.
It's more to do with a style that was taken on and kept. My father still writes the month in a date using a Roman numeral, e.g. 25/XII/21.
The date format has been around for many years. There weren't that many BBC programmes made, even over the course of many years. To try and convince viewers that a programme was not "old" was difficult. It might be black and white, and the quality certainly would have looked dated.
Just to add an interesting example of a middle ground, the fly.io blog post [0] currently top of HN [1] has the published date "hidden" at the bottom of the article. Their content is technical content that can go out of date but is also useful marketing content. The post is from August 2021 but has been posted to HN today.
FWIW, I started hating putting dates on most of my stuff because I simply got sick of people incessantly asking "is this still up to date?" because the date happened to be from even just a few months ago much less a year ago.
Why not add something together with the date telling people for how long this information will be valid? Does not have to be a date, can be "indefinitely" or, in the case of software "until new major version update". Works best when you also revisit old content occasionally and update that information. I think you drew the wrong conclusions from those inquiries.
> I simply got sick of people incessantly asking "is this still up to date?"
I don't know what or where 'your stuff' is - but isn't that a fair question for a lot of topics? Writing about data structures or political protests might be valid for decades but a lot of technical writing about languages, platforms, even products and companies can age very quickly and unlike news or culture might be of less interest to people in the future.
Many people find technical writing while searching Google for a solution to a problem. I love the articles that have a date and the version of whatever platform they are relevant for (even better when someone adds a "this was written for version X.Y, things have changed in version X.Z, see the doc at..." etc.)
It's a valid question, you don't have to respond to them but at least if the date is there the user themselves can have a fighting chance at looking up what's changed since then.
It's entirely possible a tutorial made for software 3 months ago or something might have outdated information if there was a new, backwards incompatible release, or that a news article might be missing information that was revealed only a few days ago, etc.
The problem is the downloaded pages of the papers themselves, which often completely lack proper bibliographic data (or even header/footer info other than a page number). Compare this with a page from a corporate tech report, where each page might have the title and document number somewhere.
Depending on the source, it might not be published yet, or the PDF you grabbed might be a pre-print while the 'officially' published article is behind a journal paywall.
Generally, I take the publication dates of the cited works as representative of the age of the paper.
I've heard of some marketing things updating the date periodically without changing the contents in order to trick people and/or search engines into thinking it's new.
AWS has tons of training content they make freely available to their partners and wider community.
In many cases, though, there are no dates! (YouTube shows the date of upload, but AWS's own training sites lack such markers. We have to guess based on the copyright year in the slide footers.)
It is clear that they invest heavily in creating new training content. In fact they essentially repeat the same content multiple times in many live tech talks and partnercasts etc. So there is no dearth of new content. It is also well known that they release new features very frequently -- so knowing how recent the content is helps a lot. But they still do this -- seemingly deliberately.
They recently overhauled their whole digital learning portal -- renamed it AWS Skill Builder, built it using the Docebo LMS/CMS portal -- changed a lot of things but didn't make any effort to add a published date to any of the courses.
Their API docs are incorrect half the time because they rarely update them. Finding documentation on their dynamic parameter system that all APIs use dumps you out on one page with every possible parameter domain.
At Snyk (https://snyk.io) we're actually working on a new blog process to refine this. Essentially, a problem almost every technical blog has is: when we publish articles, are they ephemeral -- or are they evergreen?
If you treat blog posts as ephemeral, it means you'll write them once, ensure they're accurate, then leave them there forever. Unfortunately, with technology stuff, that rarely works. Technologies change, libraries break, facts now might be different in two years, etc.
One of the things we're currently working on is tagging all of our technical content so that once a year it pops up in a review board somewhere and someone reviews it for accuracy, updates it if necessary, etc.
This way, technical stuff will still be useful to readers (hopefully) a couple of years from now.
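For anyone wanting to try the same thing, the core of such a review queue can be very small. This is only a sketch of the idea, not Snyk's actual process; the `due_for_review` helper, the slug-to-date mapping, and the 365-day interval are assumptions for illustration.

```python
from datetime import date, timedelta


def due_for_review(posts, today, interval_days=365):
    """Return slugs of posts whose last review is at least `interval_days` old.

    `posts` maps a post slug to the date it was last reviewed (or published).
    """
    cutoff = today - timedelta(days=interval_days)
    return sorted(slug for slug, last_reviewed in posts.items() if last_reviewed <= cutoff)
```

Run it on a schedule and feed the result into whatever review board or ticket system you already use.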
This is absolutely infuriating when it's meant to be informative, especially about something in a fast-changing landscape, like how-tos on Kubernetes, for instance.
Honestly although I've occasionally derived benefit from these, I think I'm reaching a tipping point where I feel the plethora of how-to articles with their ads and newsletter pop-ups and everything are less productive than just some plain old documentation and taking the time to fundamentally understand the tech so that I no longer need the how-to. Mainly because I trust Kubernetes to keep their documentation up-to-date but who knows how current a random how-to article is, never mind the marketing bloat they're usually polluted with.
I agree. The date itself is irrelevant if the content stands on its own (if it mentions the version being used, for example, or lays out its arguments explicitly). If bad SEO articles stop being read (independently of date), better content with proper versions listed and real argumentation (like official documentation) could take their place.
A very easy solution I use when searching for something with an expiry date is to look for the date first. If it's not there, I immediately leave the site and don't even bother to read anything. If this is regular practice for a domain that is frequently at the top of the results, then after a few times I just stop visiting it.
It's a sad state of affairs that marketing needs drive Google. I can hear the "duh" response to that in people's minds (yes we know how Google gets revenue) but - this is the index to the world's data. The World's data portal. It has such repercussions that it's adjusted to suit the needs of online sales and marketing sensibilities.
> the company will go though a burst of "blog productivity"
One of the most insightful comments I ever read on HN pointed out that marketing folks are good at selling things, period, and that includes selling things internally. So when nascent companies are wondering why the product doesn't sell itself, the first thing they do is hire a Director of Marketing. That person promises deliverables from day 1, and what's more deliverable/visible than a blog? Then they leave -- in my experience, the typical marketing exec's tenure at a startup is about a year -- and no one else feels like putting in the work. Also, by that time, most people have seen that the blog never really drove engagement in the first place.
Increasingly I feel like this is one of those pieces of metadata that we should be moving out of the page.
I would suggest moving it into the browser (i.e. read a meta tag or header) but the obvious problem is that they’ll just be forged and it would almost immediately become pointless.
Search engines could help here. If Google were to provide a last cached date (or a date of the last significant change) in the search result that would be far more useful. They certainly have this information from crawling, and it would be difficult to forge as constant substantial changes to game the system would be both expensive for the author, and harmful to the page’s ranking.
> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.
Moreover, since static pages are no longer a thing, the system cannot even retrieve the date the file was created or modified; it always returns the timestamp from when the page was presented in the browser.
Getting EXIF data from images might provide a clue, but the image creation/edit dates often do not correlate with the text...
If anyone knows how to extract such date info, it'd be helpful.
I've experienced some major frustration due to Azure's documentation doing exactly this. Though it may not be limited to Azure. There's just a date on every document and usually it's a month or two old. Many of the examples I've found via Google don't even compile or aren't relevant any more.
> It seems to me that its become standard practice on marketing type blogs for corporate websites to remove the date from their posts.
Worse still, I’ve encountered sites that automatically update their edit dates to be current as a way to optimize SEO. I’ve found articles with decades old information claiming to have been written mere hours or days prior.
I wonder how much of this is misguided SEO, thinking that if they leave the date off Google will assume it's fresh content and always surface it in search results, and not realize that Google knows exactly when they first crawled a page.
To be fair, bit-rot is a real thing for software, because the world keeps changing. In most cases it takes effort and upkeep to keep everything working.
Annoying, for sure, but this is the reality of software and technology as the landscape continues to evolve over time.
0: https://news.ycombinator.com/item?id=30070422