Environment: centos-6.5_x86_64
I. MySQL installation
1. Check whether the system already ships with MySQL installed
[root@slave2 ~]# rpm -q mysql
package mysql is not installed
2. Install MySQL via yum, start the service, and set the root password
[root@slave2 ~]# yum install -y mysql-server mysql mysql-devel
[root@slave2 ~]# rpm -qi mysql-server
[root@slave2 ~]# service mysqld start
Initializing MySQL database: Installing MySQL system tables...
OK
Filling help tables...
OK
To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system
PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
/usr/bin/mysqladmin -u root password 'new-password'
/usr/bin/mysqladmin -u root -h slave2.hadoop password 'new-password'
Alternatively you can run:
/usr/bin/mysql_secure_installation
which will also give you the option of removing the test
databases and anonymous user created by default. This is
strongly recommended for production servers.
See the manual for more instructions.
You can start the MySQL daemon with:
cd /usr ; /usr/bin/mysqld_safe &
You can test the MySQL daemon with mysql-test-run.pl
cd /usr/mysql-test ; perl mysql-test-run.pl
Please report any problems with the /usr/bin/mysqlbug script!
[ OK ]
Starting mysqld: [ OK ]
[root@slave2 ~]# /usr/bin/mysqladmin -u root password '111111'
[root@slave2 ~]# /usr/bin/mysqladmin -u root -h slave2.hadoop password '111111'
[root@slave2 ~]# /usr/bin/mysql_secure_installation
---Run the commands above to take care of the security-side setup (mysql_secure_installation can also remove the test database and anonymous users).
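A quick way to confirm the new root password works before moving on (password as set above):
# Log in non-interactively and print the server version
mysql -u root -p111111 -e "SELECT VERSION();"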
---------------
[root@slave2 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
----------------
[root@slave2 ~]# which mysqld
/usr/bin/which: no mysqld in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@slave2 ~]# find / -name mysqld
/usr/libexec/mysqld
/var/lock/subsys/mysqld
/var/run/mysqld
/etc/logrotate.d/mysqld
/etc/rc.d/init.d/mysqld
[root@slave2 ~]# /usr/libexec/mysqld --verbose --help | grep -A 1 'Default options'
Default options are read from the following files in the given order:
/etc/mysql/my.cnf /etc/my.cnf ~/.my.cnf
As the output above shows, the server reads its options from these files in the given order: /etc/mysql/my.cnf, then /etc/my.cnf, then ~/.my.cnf. Files that do not exist are simply skipped, and when the same option appears in more than one file the value read last takes precedence.
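To see the option values mysqld actually resolves after reading all of those files, the my_print_defaults utility shipped with the MySQL packages can be used:
# Print every option the [mysqld] group picks up from the config files, in effective order
my_print_defaults mysqld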
[root@slave2 ~]# chkconfig --list | grep mysqld
mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@slave2 ~]# chkconfig mysqld on
[root@slave2 ~]# chkconfig --list | grep mysqld
mysqld          0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@slave2 ~]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show tables;
ERROR 1046 (3D000): No database selected
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
+--------------------+
2 rows in set (0.01 sec)
mysql>
--------------
II. Hive
1. Download, installation, and environment configuration
[hadoop@slave2 ~]$ wget
[hadoop@slave2 ~]$ tar xvf apache-hive-0.13.0-bin.tar.gz
[hadoop@slave2 ~]$ vi .bash_profile
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export MAVEN_HOME=/home/hadoop/apache-maven-3.1.1
export PATH=/home/hadoop/apache-maven-3.1.1/bin:$PATH
export HADOOP_PREFIX=/home/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export YARN_HOME=${HADOOP_PREFIX}
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SCALA_HOME=/home/hadoop/scala-2.10.1
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/hadoop/spark-0.9.1-bin-hadoop2
export FLUME_HOME=/home/hadoop/apache-flume-1.4.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=.:$PATH:$FLUME_HOME/bin
export HIVE_HOME=/home/hadoop/apache-hive-0.13.0-bin
export PATH=$HIVE_HOME/bin:$PATH
[hadoop@slave2 ~]$ . .bash_profile
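After sourcing the profile it is worth confirming the new variables took effect (paths as set above):
# Verify HIVE_HOME resolves and the hive binary is on PATH
echo $HIVE_HOME
which hive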
[hadoop@slave2 ~]$ hdfs dfs -mkdir -p /tmp
[hadoop@slave2 ~]$ hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@slave2 ~]$ hdfs dfs -chmod +w /tmp
[hadoop@slave2 ~]$ hdfs dfs -chmod +w /user/hive/warehouse
(/user/hive/warehouse is Hive's default hive.metastore.warehouse.dir; the load output further down confirms this path.)
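A quick check that both directories were created with the write bit set:
# List the HDFS root and the warehouse parent directory
hdfs dfs -ls /
hdfs dfs -ls /user/hive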
[hadoop@slave2 ~]$ cd apache-hive-0.13.0-bin/conf/
[hadoop@slave2 conf]$ cp hive-default.xml.template hive-site.xml
[hadoop@slave2 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@slave2 conf]$ cp hive-exec-log4j.properties.template hive-exec-log4j.properties
[hadoop@slave2 conf]$ cp hive-log4j.properties.template hive-log4j.properties
[hadoop@slave2 conf]$ hive
hive> add jar /home/hadoop/apache-hive-0.13.0-bin/lib/hive-contrib-0.13.0.jar;
hive> exit;
[hadoop@slave2 ~]$ vi .bash_profile
Add:
export CLASSPATH=.:$HIVE_HOME/lib:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
[hadoop@slave2 ~]$ source .bash_profile
[hadoop@slave2 conf]$ mysql -u root -p
Enter password:
mysql> create database hadoop;
mysql> create user 'hive'@'slave2.hadoop' identified by '111111';
Query OK, 0 rows affected (0.07 sec)
mysql> GRANT ALL PRIVILEGES ON hadoop.* TO 'hive'@'slave2.hadoop' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
mysql> exit
Bye
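Before pointing Hive at MySQL, confirm the new account can actually log in from this host (user, host, and password as created above):
# Connect as the hive user and list the databases it has been granted
mysql -h slave2.hadoop -u hive -p111111 -e "SHOW DATABASES;"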
Edit the configuration file:
[hadoop@slave2 conf]$ pwd
/home/hadoop/apache-hive-0.13.0-bin/conf
[hadoop@slave2 conf]$ vi hive-site.xml
Delete:
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM</description>
</property>
---Before this property is deleted, running hive prints the following warning:
~~~~~~~~~
WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
The warning above is caused by a Hive bug; see https://issues.apache.org/jira/browse/HIVE-6159 for details. The bug has actually been fixed in 0.13, but the release downloaded from the official site was compiled against the Hadoop 0.20 platform, so Hive would need to be recompiled for the current Hadoop 2.2 environment.
~~~~~~~~~~
Modify:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://slave2.hadoop:3306/hadoop?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
<description>password to use against metastore database</description>
</property>
--
Add the JDBC driver jar:
Copy the MySQL connector jar into $HIVE_HOME/lib/ so Hive can reach the metastore.
The server installed above is mysql-server-5.1.73, so the matching mysql-connector-java-5.1.30 is used here, downloaded from:
[hadoop@slave2 ~]$ tar xvf mysql-connector-java-5.1.30.tar.gz
[hadoop@slave2 ~]$ cp ~/mysql-connector-java-5.1.30/mysql-connector-java-5.1.30-bin.jar ~/apache-hive-0.13.0-bin/lib/
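With the driver jar in place, the metastore schema can also be created in MySQL up front instead of on first use. Hive 0.13 ships a schematool for this; a one-time sketch, assuming the hive-site.xml settings above are in effect:
# Initialize the metastore tables in the MySQL database configured in hive-site.xml
~/apache-hive-0.13.0-bin/bin/schematool -dbType mysql -initSchema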
[hadoop@slave2 ~]$ hive
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/05/22 11:08:24 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/05/22 11:08:24 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in
hive> CREATE TABLE maptile (
          ipaddress STRING, time STRING, method STRING, request STRING, protocol STRING,
          status STRING, size STRING, referer STRING, agent STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) (\"[^ ]*) ([^ ]*) ([^ ]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*| \".*\") ([^ \"]*|\".*\"))?",
          "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s")
      STORED AS TEXTFILE;
OK
Time taken: 0.18 seconds
hive> load data inpath '/flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172' overwrite into table maptile;
Loading data to table default.maptile
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://master.hadoop:9000/user/hive/warehouse/maptile
Table default.maptile stats: [numFiles=1, numRows=0, totalSize=40138292, rawDataSize=0]
OK
Time taken: 1.435 seconds
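With the data loaded, a quick sanity check that RegexSerDe is parsing lines as intended (the contrib jar must be on the session path, as added earlier):
# Pull a few parsed rows; NULL columns here would mean the regex is not matching
hive -e "ADD JAR $HIVE_HOME/lib/hive-contrib-0.13.0.jar; SELECT ipaddress, status, agent FROM maptile LIMIT 5;"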
hive> create table result (ip string,num int) partitioned by (dt string);
OK
Time taken: 0.107 seconds
hive> insert overwrite table result partition (dt='2014-5-20') select ipaddress,count(1) as numrequest from maptile group by ipaddress sort by numrequest desc;
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1400782075030_0001, Tracking URL = http://master.hadoop:8088/proxy/application_1400782075030_0001/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1400782075030_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-05-22 11:59:26,803 Stage-1 map = 0%, reduce = 0%
2014-05-22 11:59:42,986 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.15 sec
2014-05-22 12:00:02,351 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.59 sec
MapReduce Total cumulative CPU time: 4 seconds 590 msec
Ended Job = job_1400782075030_0001
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1400782075030_0002, Tracking URL = http://master.hadoop:8088/proxy/application_1400782075030_0002/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1400782075030_0002
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2014-05-22 12:00:32,149 Stage-2 map = 0%, reduce = 0%
2014-05-22 12:00:41,397 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2014-05-22 12:01:23,745 Stage-2 map = 100%, reduce = 67%, Cumulative CPU 2.77 sec
2014-05-22 12:01:27,819 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.92 sec
MapReduce Total cumulative CPU time: 3 seconds 920 msec
Ended Job = job_1400782075030_0002
Loading data to table default.result partition (dt=2014-5-20)
[Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to
Partition default.result{dt=2014-5-20} stats: [numFiles=1, numRows=-1, totalSize=10, rawDataSize=-1]
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.59 sec HDFS Read: 40138523 HDFS Write: 117 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 3.92 sec HDFS Read: 486 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 510 msec
OK
Time taken: 206.925 seconds
~~~~~~~~~~~~~~~
---Still not sure what is behind "[Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to". The insert itself succeeds, so only table statistics are affected; it appears to be tied to the counter-based hive.stats.dbclass default in 0.13, and setting hive.stats.autogather=false is a commonly suggested workaround. To be checked later.
~~~~~~~~~~~~~~~~~~~~~~
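The stats error aside, the insert succeeded; a quick check that the partition was created:
# List partitions of the result table
hive -e "SHOW PARTITIONS result;"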
hive> show tables;
OK
maptile
result
Time taken: 0.349 seconds, Fetched: 2 row(s)
hive> select * from result;
OK
NULL 138265 2014-5-20
Time taken: 0.766 seconds, Fetched: 1 row(s)
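One thing worth flagging in this last result: the ip value is NULL. The contrib RegexSerDe returns NULL for every column of a line that does not match input.regex, so a single NULL key carrying the whole count (138265) suggests the pattern is not matching this log format and deserves a re-check against the actual Flume output. A quick way to measure it:
# Rows that failed to match the regex have NULL in every column
hive -e "SELECT COUNT(*) FROM maptile WHERE ipaddress IS NULL;"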