Environment: CentOS 6.5, x86_64

I. MySQL installation

   1. Check whether the system already has MySQL installed

[root@slave2 ~]# rpm -q mysql

package mysql is not installed

   2. Install MySQL with yum, start the service, and set the root password

[root@slave2 ~]# yum install -y mysql-server mysql mysql-devel

[root@slave2 ~]# rpm -qi  mysql-server

[root@slave2 ~]# service mysqld start

Initializing MySQL database:  Installing MySQL system tables...

OK

Filling help tables...

OK

To start mysqld at boot time you have to copy

support-files/mysql.server to the right place for your system

PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !

To do so, start the server, then issue the following commands:

/usr/bin/mysqladmin -u root password 'new-password'

/usr/bin/mysqladmin -u root -h slave2.hadoop password 'new-password'

Alternatively you can run:

/usr/bin/mysql_secure_installation

which will also give you the option of removing the test

databases and anonymous user created by default.  This is

strongly recommended for production servers.

See the manual for more instructions.

You can start the MySQL daemon with:

cd /usr ; /usr/bin/mysqld_safe &

You can test the MySQL daemon with mysql-test-run.pl

cd /usr/mysql-test ; perl mysql-test-run.pl

Please report any problems with the /usr/bin/mysqlbug script!

                                                          [  OK  ]

Starting mysqld:                                           [  OK  ]

[root@slave2 ~]# /usr/bin/mysqladmin -u root password '111111'

[root@slave2 ~]# /usr/bin/mysqladmin -u root -h slave2.hadoop password '111111'

[root@slave2 ~]# /usr/bin/mysql_secure_installation  

                --- run the command above to make the security-related adjustments

---------------

[root@slave2 ~]# cat /etc/my.cnf

[mysqld]

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

user=mysql

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

----------------

[root@slave2 ~]# which  mysqld

/usr/bin/which: no mysqld in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)

[root@slave2 ~]# find / -name mysqld

/usr/libexec/mysqld

/var/lock/subsys/mysqld

/var/run/mysqld

/etc/logrotate.d/mysqld

/etc/rc.d/init.d/mysqld

[root@slave2 ~]# /usr/libexec/mysqld --verbose --help | grep -A 1 'Default options'

Default options are read from the following files in the given order:

/etc/mysql/my.cnf /etc/my.cnf ~/.my.cnf

As the output above shows, the server reads each of these files in order when it exists; MySQL does not stop at the first file it finds, and options read from later files override those read earlier.
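
To see exactly which options the server ends up with, every MySQL program supports --print-defaults and --defaults-file; the commands below are a suggested check, not part of the original session:

/usr/libexec/mysqld --print-defaults                                # print the merged option list and exit

/usr/libexec/mysqld --defaults-file=/etc/my.cnf --print-defaults    # read options from this one file only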

[root@slave2 ~]# chkconfig --list | grep mysqld

mysqld         0:off   1:off   2:off   3:off   4:off   5:off   6:off

[root@slave2 ~]# chkconfig mysqld on

[root@slave2 ~]# chkconfig --list | grep mysqld

mysqld         0:off   1:off   2:on    3:on    4:on    5:on    6:off

[root@slave2 ~]# mysql -u root -p

Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 12

Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;

ERROR 1046 (3D000): No database selected

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

+--------------------+

2 rows in set (0.01 sec)

mysql>

--------------

II. Hive

   1. Download, installation, and environment setup

[hadoop@slave2 ~]$ wget  

[hadoop@slave2 ~]$ tar xvf  apache-hive-0.13.0-bin.tar.gz

[hadoop@slave2 ~]$ vi .bash_profile

export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

export MAVEN_HOME=/home/hadoop/apache-maven-3.1.1

export PATH=/home/hadoop/apache-maven-3.1.1/bin:$PATH

export HADOOP_PREFIX=/home/hadoop/hadoop-2.2.0

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

export HADOOP_COMMON_HOME=${HADOOP_PREFIX}

export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}

export HADOOP_YARN_HOME=${HADOOP_PREFIX}

export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop

export HADOOP_HOME=/home/hadoop/hadoop-2.2.0

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

export YARN_HOME=${HADOOP_PREFIX}

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export SCALA_HOME=/home/hadoop/scala-2.10.1

export PATH=$PATH:$SCALA_HOME/bin

export SPARK_HOME=/home/hadoop/spark-0.9.1-bin-hadoop2

export FLUME_HOME=/home/hadoop/apache-flume-1.4.0-bin

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=.:$PATH:$FLUME_HOME/bin

export HIVE_HOME=/home/hadoop/apache-hive-0.13.0-bin

export PATH=$HIVE_HOME/bin:$PATH

[hadoop@slave2 ~]$ .  .bash_profile

[hadoop@slave2 ~]$ hdfs  dfs  -mkdir  /tmp

[hadoop@slave2 ~]$ hdfs  dfs  -mkdir  -p  /user/hive/warehouse

[hadoop@slave2 ~]$ hdfs  dfs  -chmod +w  /tmp

[hadoop@slave2 ~]$ hdfs  dfs  -chmod +w  /user/hive/warehouse
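
(The warehouse path is /user/hive/warehouse, which matches the HDFS path Hive reports later in this walkthrough. Note that the Hive Getting Started guide grants group write specifically, g+w, rather than a bare +w; to match the official docs exactly:

hdfs dfs -chmod g+w /tmp

hdfs dfs -chmod g+w /user/hive/warehouse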

[hadoop@slave2 ~]$ cd apache-hive-0.13.0-bin/conf/

[hadoop@slave2 conf]$ cp hive-default.xml.template hive-site.xml

[hadoop@slave2 conf]$ cp hive-env.sh.template hive-env.sh

[hadoop@slave2 conf]$ cp hive-exec-log4j.properties.template hive-exec-log4j.properties

[hadoop@slave2 conf]$ cp hive-log4j.properties.template hive-log4j.properties

[hadoop@slave2 conf]$ hive

hive> add jar /home/hadoop/apache-hive-0.13.0-bin/lib/hive-contrib-0.13.0.jar;

hive> exit;

[hadoop@slave2 ~]$ vi .bash_profile

Append the following (the `add jar` above only lasts for that Hive session, hence putting $HIVE_HOME/lib on the classpath permanently):

export CLASSPATH=.:$HIVE_HOME/lib:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

[hadoop@slave2 ~]$ source  .bash_profile

[hadoop@slave2 conf]$ mysql  -u root  -p

Enter password:

mysql> create database  hadoop;

mysql> create  user 'hive'@'slave2.hadoop'  identified by  '111111';

Query OK, 0 rows affected (0.07 sec)

mysql> GRANT ALL PRIVILEGES ON hadoop.* TO 'hive'@'slave2.hadoop' WITH GRANT OPTION;

Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;

mysql> exit

Bye
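
Before pointing Hive at this account, it's worth confirming the grant took effect; these are standard MySQL statements:

mysql> SHOW GRANTS FOR 'hive'@'slave2.hadoop';

mysql> SELECT user, host FROM mysql.user WHERE user = 'hive';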

Edit the configuration file:

[hadoop@slave2 conf]$ pwd

/home/hadoop/apache-hive-0.13.0-bin/conf

[hadoop@slave2 conf]$ vi hive-site.xml

Delete:

<property>

 <name>hive.metastore.local</name>

 <value>true</value>  

<description>controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM</description>

</property>

--- before this property was deleted, running hive printed the following warning:

~~~~~~~~~

WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

This warning is caused by a Hive bug; see https://issues.apache.org/jira/browse/HIVE-6159 for details. The bug is actually fixed in 0.13, but the Hive binary from the official site is compiled against the Hadoop 0.20 platform, so Hive would have to be recompiled for this Hadoop 2.2 environment.
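
(For reference, such a rebuild would be a Maven build roughly along the line below; the hadoop-2 profile name comes from the Hive 0.13 source tree, and this walkthrough does not actually perform the rebuild:

mvn clean package -DskipTests -Phadoop-2

Here we simply delete the deprecated property and carry on with the prebuilt binary.)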

~~~~~~~~~~

Modify:

<property>

 <name>javax.jdo.option.ConnectionURL</name>

 <value>jdbc:mysql://slave2.hadoop:3306/hive?createDatabaseIfNotExist=true</value>

 <description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

 <name>javax.jdo.option.ConnectionDriverName</name>

 <value>com.mysql.jdbc.Driver</value>

 <description>Driver class name for a JDBC metastore</description>

</property>

<property>

 <name>javax.jdo.option.ConnectionUserName</name>

 <value>hive</value>

 <description>username to use against metastore database</description>

</property>

<property>

 <name>javax.jdo.option.ConnectionPassword</name>

 <value>hivepasswd</value>

 <description>password to use against metastore database</description>

</property>
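
One thing to watch in the values above: the JDBC URL names a database called hive and the password is hivepasswd, while the account created earlier was granted only hadoop.* with password '111111'. The two ends have to agree; for example, to match hive-site.xml as written, extend the grant (plain MySQL 5.1 syntax, where GRANT ... IDENTIFIED BY also sets the password):

mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'slave2.hadoop' IDENTIFIED BY 'hivepasswd' WITH GRANT OPTION;

mysql> FLUSH PRIVILEGES;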

--

Add the JDBC driver jar:

Copy the driver jar into $HIVE_HOME/lib/.

The server installed above is mysql-server-5.1.73, so a matching mysql-connector-java-5.1.30 is used:

[hadoop@slave2 ~]$ tar xvf  mysql-connector-java-5.1.30.tar.gz

[hadoop@slave2 ~]$ cp  ~/mysql-connector-java-5.1.30/mysql-connector-java-5.1.30-bin.jar  ~/apache-hive-0.13.0-bin/lib/
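
A quick sanity check that the driver jar landed where Hive looks for it:

ls ~/apache-hive-0.13.0-bin/lib/ | grep mysql-connector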

[hadoop@slave2 ~]$ hive

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize

14/05/22 11:08:24 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

14/05/22 11:08:24 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in

hive> CREATE TABLE maptile (ipaddress STRING, time STRING, method STRING, request STRING, protocol STRING, status STRING, size STRING, referer STRING, agent STRING)

    > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

    > WITH SERDEPROPERTIES (

    >   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) (\"[^ ]*) ([^ ]*) ([^ ]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*| \".*\") ([^ \"]*|\".*\"))?",

    >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s")

    > STORED AS TEXTFILE;

OK

Time taken: 0.18 seconds

hive> load data  inpath '/flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172' overwrite into table maptile;

Loading data to table default.maptile

rmr: DEPRECATED: Please use 'rm -r' instead.

Deleted hdfs://master.hadoop:9000/user/hive/warehouse/maptile

Table default.maptile stats: [numFiles=1, numRows=0, totalSize=40138292, rawDataSize=0]

OK

Time taken: 1.435 seconds
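
Before building the summary table, a cheap sanity check on the parse is worthwhile: with RegexSerDe, rows that fail to match input.regex come back as all-NULL columns, so sampling a few rows shows immediately whether the regex fits this log format (a suggested check, not part of the original session):

hive> select ipaddress, status, size from maptile limit 5;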

hive> create table result (ip string,num int) partitioned by (dt string);     

OK

Time taken: 0.107 seconds

hive> insert overwrite table result partition (dt='2014-5-20') select ipaddress,count(1) as numrequest from maptile group by ipaddress sort by numrequest desc;

Total jobs = 2

Launching Job 1 out of 2

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1400782075030_0001, Tracking URL = http://master.hadoop:8088/proxy/application_1400782075030_0001/

Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1400782075030_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2014-05-22 11:59:26,803 Stage-1 map = 0%,  reduce = 0%

2014-05-22 11:59:42,986 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.15 sec

2014-05-22 12:00:02,351 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.59 sec

MapReduce Total cumulative CPU time: 4 seconds 590 msec

Ended Job = job_1400782075030_0001

Launching Job 2 out of 2

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1400782075030_0002, Tracking URL = http://master.hadoop:8088/proxy/application_1400782075030_0002/

Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1400782075030_0002

Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1

2014-05-22 12:00:32,149 Stage-2 map = 0%,  reduce = 0%

2014-05-22 12:00:41,397 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.05 sec

2014-05-22 12:01:23,745 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 2.77 sec

2014-05-22 12:01:27,819 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.92 sec

MapReduce Total cumulative CPU time: 3 seconds 920 msec

Ended Job = job_1400782075030_0002

Loading data to table default.result partition (dt=2014-5-20)

[Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to

Partition default.result{dt=2014-5-20} stats: [numFiles=1, numRows=-1, totalSize=10, rawDataSize=-1]

MapReduce Jobs Launched: 

Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.59 sec   HDFS Read: 40138523 HDFS Write: 117 SUCCESS

Job 1: Map: 1  Reduce: 1   Cumulative CPU: 3.92 sec   HDFS Read: 486 HDFS Write: 10 SUCCESS

Total MapReduce CPU Time Spent: 8 seconds 510 msec

OK

Time taken: 206.925 seconds

~~~~~~~~~~~~~~~

--- I don't yet know what causes this "[Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to"; I'll dig into it later.
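
(A workaround often suggested for this, untested here, is to disable automatic stats gathering before running the insert; hive.stats.autogather and hive.stats.dbclass are both standard Hive settings:

hive> set hive.stats.autogather=false;

Note that both MapReduce jobs reported SUCCESS above, so the error only affects table statistics, not the data itself.)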

~~~~~~~~~~~~~~~~~~~~~~

hive> show tables;

OK

maptile

result

Time taken: 0.349 seconds, Fetched: 2 row(s)

hive> select * from result;

OK

NULL 138265 2014-5-20

Time taken: 0.766 seconds, Fetched: 1 row(s)