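Submitting a sitemap to Google is an important way to make a website more search-engine friendly. A sitemap.xml file guides the crawler through the site more thoroughly and more quickly. For a static site, the sitemap generator provided by Google makes producing a good sitemap.xml file almost effortless. For a dynamic site, building a sitemap takes more thought.
Any webmaster who has tried it knows that if you use Google's generator to build sitemap.xml for a dynamic site (using the first setting), the file comes out quite small. Under that setting the generator records only URLs that correspond to actual files under the site root and largely ignores dynamically generated URLs. Yet the most important content of a dynamic site lives at exactly those generated URLs. This conflict leads to the following advice:
If you submit a sitemap, submit a sitemap.xml that contains every important URL. (If you cannot build one, you are better off not submitting at all. One of guagua's dynamic sites had more than 30,000 pages indexed before any sitemap was submitted; after a sitemap made with the generator was submitted, the indexed count dropped to 9,000.)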
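The way to build a sitemap for a dynamic site is: first write a program that exports all dynamic URLs into a txt file, then use the generator's second setting (urllist) to produce the sitemap.xml file.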
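To make the second step less abstract, here is a minimal sketch of what turning a URL list into a sitemap amounts to. It is written in Python (the language of sitemap_gen.py) and is not the generator's actual code. The function name `urls_to_sitemap` is hypothetical, and the output uses only the `urlset`/`url`/`loc` elements of the sitemaps.org protocol, with no `lastmod` or `priority`.

```python
from xml.sax.saxutils import escape

# Namespace of the sitemaps.org 0.9 protocol (assumed target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def urls_to_sitemap(urls):
    """Build a minimal sitemap.xml string from an iterable of absolute URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        seen.add(url)
        # escape() turns &, < and > into XML entities
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        "http://www.websitename.com.au/",
        "http://www.websitename.com.au/categories.php/example",
    ]
    print(urls_to_sitemap(sample))
```

In practice the generator does this work for you; the sketch only shows why a complete URL list is all it needs.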
Here is a worked example:
1. Export all URLs with a program
Most important of all: each line may contain only one URL.
getsitemap.php:
<?php
include("./include/config.php");
include("./include/con_db.php");
include("./include/function.php"); // site configuration, database connection, helper functions
ob_start(); // buffer everything echoed below so it can be written to a file
$rr="http://www.websitename.com.au/";
echo $rr."\n"; // the first URL, i.e. the home page
$id=array(1,2,3,4,6,7,8,10,11,12,13,15,19,20); // product category ids
foreach($id as $categories_id)
{
$sql="select distinct categories_id,categories_name from categories where categories_id='$categories_id'";
$result=mysql_query($sql);
while($row=mysql_fetch_array($result))
{
$cname=param_bncode($row["categories_name"]); // encode the category name once, reuse it below
$rr="http://www.websitename.com.au/categories.php/".$cname;
echo $rr."\n"; // category page, listing all brands in the category (level-2 URL)
$cid=(int)$row["categories_id"];
$sql2="select distinct models_brand from models m,models_to_products m2p, products p WHERE m.models_id=m2p.models_id and m2p.products_id=p.products_id and m.categories_id=$cid";
$result2=mysql_query($sql2);
while($row1=mysql_fetch_array($result2))
{
$brand_sql=mysql_real_escape_string($row1["models_brand"]); // escape quotes before reusing the value in SQL
$brand=param_encode($row1["models_brand"]);
$rr="http://www.websitename.com.au/brands.php/".$cname."/".$brand;
echo $rr."\n"; // brand page within the category, listing all models of the brand (level-3 URL)
$sql3="select distinct models_name from models m,models_to_products mp,products p where m.models_id = mp.models_id and p.products_id=mp.products_id and m.models_brand='$brand_sql' and m.categories_id=$cid";
$result3=mysql_query($sql3);
while($row2=mysql_fetch_array($result3))
{
$model=param_encode($row2["models_name"]);
$rr="http://www.websitename.com.au/models.php/".$cname."/".$brand."/".$model;
echo $rr."\n"; // products of this category and brand that fit the model, i.e. the product page (level-4 URL)
}
}
}
}
$str=ob_get_contents(); // collect everything echoed above
$file=fopen("urllist.txt","w"); // write it to a file named urllist.txt
fwrite($file,$str);
fclose($file);
?>
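Because the one-URL-per-line rule is easy to break by accident, it can help to check urllist.txt before running the generator. The following is a rough Python sketch, not part of the original workflow. The function name `check_urllist` is hypothetical, and the check assumes every line is a bare absolute URL, which is what getsitemap.php writes; it would need relaxing if you append extra per-URL attributes to a line.

```python
from urllib.parse import urlsplit


def check_urllist(lines):
    """Return (line_number, problem) pairs for lines that are not one bare URL."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            problems.append((number, "blank line"))
            continue
        if len(line.split()) != 1:
            problems.append((number, "more than one token on the line"))
            continue
        parts = urlsplit(line)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((number, "not an absolute http(s) URL"))
    return problems


if __name__ == "__main__":
    # urllist.txt is the file written by getsitemap.php
    with open("urllist.txt", encoding="utf-8") as handle:
        for number, problem in check_urllist(handle):
            print("line %d: %s" % (number, problem))
```

An empty result means every line holds exactly one absolute URL.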
2. Set up your My_config.xml file
<?xml version="1.0" encoding="UTF-8"?>
<!--
sitemap_gen.py example configuration script
This file specifies a set of sample input parameters for the
sitemap_gen.py client.
You should copy this file into "config.xml" and modify it for
your server.
********************************************************* -->
<!-- ** MODIFY **
The "site" node describes your basic web site.
Required attributes:
base_url - the top-level URL of the site being mapped
store_into - the webserver path to the desired output file.
This should end in '.xml' or '.xml.gz'
(the script will create this file)
Optional attributes:
verbose - an integer from 0 (quiet) to 3 (noisy) for
how much diagnostic output the script gives
suppress_search_engine_notify="1"
- disables notifying search engines about the new map
(same as the "testing" command-line argument.)
default_encoding
- names a character encoding to use for URLs and
file paths. (Example: "UTF-8")
-->
<site
base_url="http://www.websitename.com.au"
store_into="/var/website/websitename/sitemap.xml"
verbose="1"
>
<url
href="http://www.websitename.com.au"
lastmod="2007-01-01"
changefreq="daily"
priority="1.0" />
<directory path="/var/website/websitename/" url="http://www.websitename.com.au/" />
<urllist path="/var/website/websitename/urllist.txt" encoding="UTF-8" />
<filter action="drop" type="wildcard" pattern="*.css" />
<filter action="drop" type="wildcard" pattern="*.js" />
<filter action="drop" type="wildcard" pattern="*.inc" />
<filter action="drop" type="wildcard" pattern="*.jpg" />
<filter action="drop" type="wildcard" pattern="*/search" />
<filter action="drop" type="wildcard" pattern="*/images" />
<!-- ** MODIFY or DELETE **
"sitemap" nodes tell the script to scan other Sitemap files. This can
be useful to aggregate the results of multiple runs of this script into
a single Sitemap.
Required attributes:
path - path to the file
<sitemap path="/sitemap.xml" />
-->
<!-- ********************************************************
FILTERS
Filters specify wild-card patterns that the script compares
against all URLs it finds. Filters can be used to exclude
certain URLs from your Sitemap, for instance if you have
hidden content that you hope the search engines don't find.
Filters can be either type="wildcard", which means standard
path wildcards (* and ?) are used to compare against URLs,
or type="regexp", which means regular expressions are used
to compare.
Filters are applied in the order specified in this file.
An action="drop" filter causes exclusion of matching URLs.
An action="pass" filter causes inclusion of matching URLs,
shortcutting any other later filters that might also match.
If no filter at all matches a URL, the URL will be included.
Together you can build up fairly complex rules.
The default action is "drop".
The default type is "wildcard".
You can MODIFY or DELETE these entries as appropriate for
your site. However, unlike above, the example entries in
this section are not contrived and may be useful to you as
they are.
********************************************************* -->
<!-- Exclude URLs that end with a '~' (IE: emacs backup files) -->
<filter action="drop" type="wildcard" pattern="*~" />
<!-- Exclude URLs within UNIX-style hidden files or directories -->
<filter action="drop" type="regexp" pattern="/\.[^/]*" />
</site>
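The filter rules described in the config comments (filters are tried in order, the first match decides, and a URL that matches nothing is kept) can be sketched in Python as follows. This is an illustration of those stated semantics under simple assumptions, not the generator's own code. The name `apply_filters` and the tuple layout are hypothetical.

```python
import fnmatch
import re


def apply_filters(url, filters):
    """Return True if the URL is kept, False if it is dropped.

    filters is a list of (action, kind, pattern) tuples, where action is
    "drop" or "pass" and kind is "wildcard" or "regexp".
    """
    for action, kind, pattern in filters:
        if kind == "wildcard":
            # shell-style wildcards: * and ?
            matched = fnmatch.fnmatchcase(url, pattern)
        else:
            # regular expression, matched anywhere in the URL
            matched = re.search(pattern, url) is not None
        if matched:
            # the first matching filter decides; later filters are skipped
            return action == "pass"
    # no filter matched: the URL is included
    return True


if __name__ == "__main__":
    filters = [
        ("drop", "wildcard", "*.css"),
        ("drop", "wildcard", "*~"),
        ("drop", "regexp", r"/\.[^/]*"),
    ]
    for url in (
        "http://www.websitename.com.au/style.css",
        "http://www.websitename.com.au/models.php/a/b/c",
    ):
        print(url, apply_filters(url, filters))
```

Placing a "pass" filter before a broader "drop" filter is how you would keep a single exception to an otherwise excluded pattern.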
Then simply run the generator, and a brand-new sitemap.xml file is created in the site's root directory. Submit that file to Google and you are done.
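Since the whole point is a sitemap that contains every important URL, it may be worth confirming, before submitting, that the URLs in urllist.txt actually made it into sitemap.xml. Below is a small Python sketch of such a check. The function names are hypothetical, and it assumes an uncompressed sitemap.xml whose `loc` values are written exactly as they appear in urllist.txt; URLs the generator re-encodes or filters out would be reported as missing.

```python
import xml.etree.ElementTree as ET


def sitemap_locs(xml_data):
    """Return the set of <loc> values found in sitemap XML (bytes or str)."""
    root = ET.fromstring(xml_data)
    locs = set()
    for element in root.iter():
        # tags look like "{namespace}loc" when the sitemap declares a namespace
        if element.tag == "loc" or element.tag.endswith("}loc"):
            if element.text:
                locs.add(element.text.strip())
    return locs


def missing_from_sitemap(urllist_lines, xml_data):
    """Return the sorted URLs that are in the list but absent from the sitemap."""
    wanted = {line.strip() for line in urllist_lines if line.strip()}
    return sorted(wanted - sitemap_locs(xml_data))


if __name__ == "__main__":
    with open("sitemap.xml", "rb") as handle:
        xml_data = handle.read()
    with open("urllist.txt", encoding="utf-8") as handle:
        for url in missing_from_sitemap(handle, xml_data):
            print("missing:", url)
```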