Setting Robots Tags on Amazon S3 Objects

Robots meta directives (sometimes called "meta tags") are pieces of code that provide crawlers instructions for how to crawl or index web page content. Whereas robots.txt file directives give bots suggestions for how to crawl a website's pages, robots meta directives provide more firm instructions on how to crawl and index a page's content.

There are two types of robots meta directives: those that are part of the HTML page (like the meta robots tag) and those that the web server sends as HTTP headers (such as x-robots-tag). The same parameters (i.e., the crawling or indexing instructions a directive provides, such as "noindex" and "nofollow") can be used with both meta robots and the x-robots-tag; what differs is how those parameters are communicated to crawlers.
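As a rough illustration of the two delivery mechanisms, the sketch below renders one set of directives both ways. The helper names `toMetaTag` and `toHeader` are hypothetical, made up for this example; only the `robots` meta name and the `X-Robots-Tag` header name come from the text above.

```javascript
// Hypothetical helpers (not part of any library): same directives, two carriers.

// Render the directives as an HTML meta robots tag for the page <head>.
function toMetaTag(directives) {
  return '<meta name="robots" content="' + directives.join(", ") + '">';
}

// Render the directives as an HTTP response header sent by the web server.
function toHeader(directives) {
  return { "X-Robots-Tag": directives.join(", ") };
}

const directives = ["noindex", "nofollow"];
console.log(toMetaTag(directives));
console.log(toHeader(directives)["X-Robots-Tag"]);
```

The header form is the only option for files that have no HTML head, such as images, PDFs, or plain text objects.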

This tutorial explains how to set noindex, nofollow, noarchive, or other robots tags on Amazon S3 objects.

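JavaScript
// AWS SDK for JavaScript (v2, callback style).
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

s3.putObject({
  ACL: "public-read",
  Body: "hello world",
  Bucket: "my-bucket",
  CacheControl: "public, max-age=31536000",
  ContentType: "text/plain",
  Key: "hello.txt",
  // Kept from the original snippet. XRobotsTag is not among the documented
  // putObject parameters, so check it against your SDK version; parameter
  // validation may reject unknown keys.
  XRobotsTag: "noindex, nofollow"
}, function (err, resp) {
  // Surface upload failures instead of silently ignoring them.
  if (err) console.error(err);
});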

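Java
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.File;
import java.net.URL;

// The class name is a placeholder; the original snippet did not show one.
public class S3RobotsUploader {
    // Replace the placeholders with real credentials (ideally loaded from the environment).
    private static final AWSCredentials credentials = new BasicAWSCredentials(
        "<Access Key>",
        "<Secret Key>"
    );

    private static final AmazonS3 s3client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withRegion(Regions.AP_SOUTHEAST_1)
        .build();

    // Uploads the file as a publicly readable object and returns its URL.
    public static URL upload(String bucketName, String fileKeyName, File file) {
        PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, fileKeyName, file)
            .withCannedAcl(CannedAccessControlList.PublicRead);

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setCacheControl("public");
        metadata.setHeader("Pragma", "public");
        // setHeader writes a raw header into the request metadata; confirm that
        // the stored object is actually served with X-Robots-Tag afterwards.
        metadata.setHeader("X-Robots-Tag", "noindex, nofollow, noarchive, noimageindex, nosnippet, noodp, nodir");

        putObjectRequest.setMetadata(metadata);

        s3client.putObject(putObjectRequest);

        return s3client.getUrl(bucketName, fileKeyName);
    }
}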
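
After an upload, it can be worth checking that the object is actually served with the header. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in `fetch`) and a publicly readable object. The names `parseRobotsHeader` and `checkRobotsHeader` and the URL form in the usage comment are hypothetical, chosen for illustration.

```javascript
// Split an X-Robots-Tag header value into lowercase, trimmed directives.
// Returns an empty array when the header is missing.
// Simplification: the optional user-agent prefix ("googlebot: noindex") is not handled.
function parseRobotsHeader(value) {
  if (!value) return [];
  return value
    .split(",")
    .map(function (d) { return d.trim().toLowerCase(); })
    .filter(function (d) { return d.length > 0; });
}

// Send a HEAD request and report which directives the server returned.
async function checkRobotsHeader(url) {
  const res = await fetch(url, { method: "HEAD" });
  return parseRobotsHeader(res.headers.get("x-robots-tag"));
}

// Usage (assumed URL form, adjust to your bucket and region):
// checkRobotsHeader("https://my-bucket.s3.amazonaws.com/hello.txt")
//   .then(function (found) { console.log(found); });
```

An empty result suggests the header was not stored with the object, in which case the upload parameters are the first thing to recheck.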
